Add ARCHITECTURE + FOLLOWUPS docs and README pointers

Two new docs filling the gaps in the prior set:

docs/ARCHITECTURE.md
- Component map + runtime flow diagrams (install-time and per-stream).
- Cert pipeline walk-through end-to-end (CA bootstrap, op:// references,
  per-host mint, idempotency conditions).
- State directory inventory (where things write at runtime).
- Idempotency contract — explicit rules every script in this repo follows.
- Full file map of the repo.

docs/FOLLOWUPS.md
- Promoted the punch list out of the TROUBLESHOOTING.md trailing section.
- Each item now has: symptom, current workaround, fix sketch (with the
  actual code change, not vague intent), and a complexity estimate.
- Tracks: screensaver inhibit, busiest-workspace auto-switch (2-line
  patch), 1Password black-rectangle workarounds (untested), host.lan
  DNS (out-of-repo), 1P SSH-agent timeout, cert renewal timer, stale
  config keys, single-user assumption.

README.md
- New "Documentation" section between Clients and Diagnostics points at
  each of the three doc files plus client/README.md, with a one-line
  description for each so readers can navigate without spelunking.
This commit is contained in:
2026-05-20 06:54:07 -06:00
parent 16e2465cf5
commit be34fb0dc6
3 changed files with 539 additions and 0 deletions

View File

@@ -188,6 +188,15 @@ The host-side installer handles Linux clients via `moonlight-qt`. For everything
| Android | Play Store: Moonlight Game Streaming. |
| Apple TV (tvOS) | App Store: Moonlight Game Streaming. |
## Documentation
| Doc | What it covers |
|---|---|
| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | How the pieces fit together at install time and at stream time; component map; runtime flow diagrams; idempotency contract; file map. |
| [`docs/TROUBLESHOOTING.md`](docs/TROUBLESHOOTING.md) | Every failure mode hit during the first end-to-end install, in order, with symptom → cause → fix. Custom-keybinding setup for escaping Moonlight on macOS. |
| [`docs/FOLLOWUPS.md`](docs/FOLLOWUPS.md) | Outstanding work not yet on `main`: screensaver inhibit, busiest-workspace auto-switch, 1Password streaming workarounds, cert renewal automation, stale Sunshine config keys. |
| [`client/README.md`](client/README.md) | Per-platform Moonlight install and the first-pair walkthrough. |
## Diagnostics
```bash

297
docs/ARCHITECTURE.md Normal file
View File

@@ -0,0 +1,297 @@
# Architecture
How the pieces fit together at install time and at stream time. Pair this
with `TROUBLESHOOTING.md` (what went wrong + why) and the top-level `README.md`
(how to use it).
---
## Component map
```
┌─────────────────────────────────────┐
│ install.sh │
└──┬──────────────────────────────────┘
│ sources & orchestrates
┌──────────┬──────────┬────┼──────────┬──────────┬──────────┐
▼ ▼ ▼ ▼ ▼ ▼ ▼
common.sh detect.sh preflight packages permissions config certs
(logging, (GPU, (env (yay, (input grp, (writes (1P CA →
sudo, hostname, sanity setcap, uinput sunshine. host cert,
yay) session) checks) encoder) udev) conf) trust store)
├─→ headless.sh
│ (hook +
│ drop-in install)
├─→ firewall.sh
│ (ufw / firewalld
│ port opening)
└─→ service.sh
(unit detection
+linger+enable)
```
At runtime the moving parts are:
```
Moonlight client ───TCP/UDP─→ sunshine (Linux process)
┌───────── service ─────┤ on connect / disconnect
▼ ▼
hypridle Hyprland global_prep_cmd
(sleeps) ◄───────── hooks (do/undo)
HEADLESS-N output
(resized per client)
```
---
## Install-time flow
`install.sh` is intentionally linear and idempotent. Each step is "check, then
act" — re-running is safe.
1. **`require_*`** sanity guards: not root, on Arch, `yay` present.
2. **`detect_all`** sets `$HOSTNAME_SHORT`, `$GPU_VENDOR`, `$SESSION_TYPE` for the rest of the run.
3. **`compute_stream_mode`** picks `mirror` or `headless`:
- `--mirror` or `--headless` wins
- Otherwise: hostname matches an entry in `$HEADLESS_HOSTS` (case-insensitive, comma-separated) → headless; everything else → mirror.
4. **`preflight_all`** surfaces problems early (Wayland session? GPU driver responsive? `nvidia-drm.modeset` not explicitly disabled? `pipewire-pulse` installed? `hyprctl`+`jq`+Hyprland reachable if headless?). Mostly warn-only — install proceeds even with warnings.
5. **`install_sunshine`** + **`install_gpu_encoder_packages`**:
- Skip if `sunshine` or `sunshine-bin` is already installed AND `ldd` reports no missing libs.
- Otherwise `yay -S $SUNSHINE_PKG` (default `sunshine-bin`). `ldd`-check; if anything is "not found", remove `sunshine-bin`+`-debug` and rebuild from source.
6. **Permissions**: add user to `input` group, drop the uinput udev rule if missing, `setcap cap_sys_admin+p` on the sunshine binary (KMS capture needs this).
7. **Headless hooks** (only in headless mode): install `sunshine-stream-do.sh`, `sunshine-stream-undo.sh`, `sunshine-prestart.sh` into `~/.local/share/omarchy-moonlight/bin/`.
8. **`write_sunshine_config`**: generate `~/.config/sunshine/sunshine.conf` with a `# managed-by:` marker. Per-vendor encoder block + capture method per stream mode. Mirror mode writes `capture = kms` only; headless mode writes `capture = wlr`, `output_name = HEADLESS-1`, and a `global_prep_cmd` JSON referencing the installed hooks.
9. **Certs** (default on): pull CA from `op://Private/Omarchy-Stream Root CA/{cert,key}`, mint a host cert with SANs (`<host>.lan`, LAN IP, `localhost`, `127.0.0.1`), install host cert + key into Sunshine's `credentials/`, install CA into `/etc/ca-certificates/trust-source/anchors/`, `update-ca-trust extract`. Idempotent — skips re-mint if the existing cert is signed by the current CA and has all expected SANs and is >30d from expiry.
10. **Firewall**: detect ufw / firewalld via systemd unit state. Open Sunshine's TCP (47984/47989/47990/48010) and UDP (47998/47999/48000/48010) ports if either is active.
11. **Service**: resolve which unit was packaged (`sunshine.service` or `app-dev.lizardbyte.app.Sunshine.service`), install the headless prestart drop-in if mode is headless, `loginctl enable-linger`, enable + restart.
12. **`verify_install`** runs the same checks as `--doctor`: cap_sys_admin set, group resolves, web UI listening on `:47990`, encoder reachable, hook scripts present, CA in trust store.
---
## Runtime flow — headless mode
Mirror mode is simpler: Sunshine just captures the real DRM output via KMS,
hooks aren't involved. Headless mode is where the orchestration lives.
### Service start
```
systemd-user starts sunshine.service
├─ ExecStartPre=/bin/sleep 5 (packaged by AUR sunshine, gives the
│ graphical session time to come up)
├─ ExecStartPre=-${HOME}/.local/share/omarchy-moonlight/bin/sunshine-prestart.sh
│ │ (non-fatal: '-' prefix)
│ │
│ ├─ recovers $HYPRLAND_INSTANCE_SIGNATURE from $XDG_RUNTIME_DIR/hypr/
│ │ if it wasn't propagated by systemd-user env
│ │
│ ├─ sort -V the existing HEADLESS-* outputs (oldest first)
│ │
│ ├─ if zero exist → 'hyprctl output create headless'
│ ├─ if more than one exists → remove all but the lowest-numbered
│ │ (Hyprland's HEADLESS-N counter is monotonic; without active
│ │ dedupe extras accumulate forever)
│ │
│ └─ rewrite sunshine.conf's `output_name = ...` line to match the
│ surviving output's actual name (sed-in-place, idempotent)
└─ ExecStart=/usr/bin/sunshine
├─ reads sunshine.conf (now has correct output_name)
├─ enumerates Wayland outputs (finds HEADLESS-N)
├─ probes encoders against that output (nvenc / vaapi / etc.)
└─ binds :47984/:47989/:47990/:48010, advertises via Avahi
```
### Client connect
```
Moonlight client opens a stream
├─ TCP handshake to sunshine on :47989
├─ Sunshine spawns global_prep_cmd's `do` script:
│ env: SUNSHINE_CLIENT_WIDTH, _HEIGHT, _FPS, _HOST_AUDIO, _HDR
│ cwd: sunshine's working directory
│ sunshine-stream-do.sh
│ │
│ ├─ recover $HYPRLAND_INSTANCE_SIGNATURE (defensive)
│ ├─ snapshot prior workspace + monitor state to
│ │ $XDG_RUNTIME_DIR/sunshine-headless/{prev-monitors.json,
│ │ prev-workspace-id,
│ │ headless-name}
│ ├─ discover the existing HEADLESS-* (no hardcoded name)
│ │ if none → 'hyprctl output create headless' and wait for it
│ ├─ 'hyprctl keyword monitor <name>,${W}x${H}@${FPS},auto,1'
│ ├─ 'hyprctl dispatch moveworkspacetomonitor <prev_ws> <name>'
│ └─ 'hyprctl dispatch focusmonitor <name>'
├─ Sunshine binds encoder to the (now resized) output
└─ stream begins
```
### Client disconnect
```
Moonlight tears down the stream
├─ Sunshine runs global_prep_cmd's `undo` script:
│ sunshine-stream-undo.sh
│ │
│ ├─ read headless-name from state dir; fall back to discovery
│ ├─ find a non-headless monitor (if any are connected)
│ │ └─ 'hyprctl dispatch moveworkspacetomonitor <prev_ws> <real>'
│ │ └─ 'hyprctl dispatch focusmonitor <real>'
│ └─ DO NOT remove HEADLESS-N — it stays so Sunshine's next
│ startup encoder probe still has a surface to bind to.
│ (Create-then-destroy per session raced with Sunshine's
│ startup probe and tripped fatal errors.)
└─ clean state files
```
---
## Cert pipeline
A separate one-time bootstrap creates the CA in 1Password. Every host then
mints itself from that CA.
### One-time bootstrap (run on any single machine)
```
scripts/cert-bootstrap.sh
├─ require op CLI signed in
├─ openssl genrsa 4096 → CA private key (in tmpfs)
├─ openssl req -new -x509 -days 3650 → CA cert (signed by itself)
├─ 'op item create --category "Secure Note" --vault Private
│ --title "Omarchy-Stream Root CA"
│ cert[text]=<ca-cert.pem>
│ key[concealed]=<ca-key.pem>'
└─ print op://Private/Omarchy-Stream Root CA/{cert,key} references
```
### Per-host (built into install.sh)
```
fetch_and_install_certs (lib/certs.sh)
├─ op_require_signin (op whoami)
├─ stage CA in tmpfs ($XDG_RUNTIME_DIR/omarchy-certs.XXXX)
│ │
│ ├─ 'op read op://Private/Omarchy-Stream Root CA/cert' → ca-cert.pem
│ └─ 'op read op://Private/Omarchy-Stream Root CA/key' → ca-key.pem
├─ install CA into /etc/ca-certificates/trust-source/anchors/
│ omarchy-stream-ca.pem; update-ca-trust extract
├─ idempotency: existing host cert signed by current CA AND has all
│ expected SANs AND >30d from expiry → skip mint, exit
└─ openssl x509 -req → host cert (RSA 2048, 365 days)
-extensions v3_req
SANs:
DNS = <hostname>.lan
DNS = localhost
IP = <LAN-IP>
IP = 127.0.0.1
EKU = serverAuth, clientAuth
├─ install -m 0644 → ~/.config/sunshine/credentials/cacert.pem
└─ install -m 0600 → ~/.config/sunshine/credentials/cakey.pem
```
After the cert is replaced, all previously-paired Moonlight clients see a
fingerprint mismatch and must re-pair once. Subsequent installs preserve the
cert as long as the CA + SAN set hasn't changed.
---
## State directories
Everything the system writes lives in well-known paths so it can be inspected
or cleaned without code archaeology.
| Path | Owner | Contents |
|---|---|---|
| `~/.config/sunshine/` | sunshine | runtime config + state (user-managed) |
| `~/.config/sunshine/sunshine.conf` | this installer (when marker present) | tuned config; managed-by marker on line 1 |
| `~/.config/sunshine/credentials/` | this installer (cert step) | `cacert.pem` (host cert) + `cakey.pem` (private key) |
| `~/.local/share/omarchy-moonlight/bin/` | this installer | prep-cmd hook scripts in stable location |
| `~/.config/systemd/user/<unit>.service.d/` | this installer | headless prestart drop-in |
| `$XDG_RUNTIME_DIR/sunshine-headless/` | sunshine-stream-do.sh at runtime | per-stream state, cleared on reboot |
| `/etc/ca-certificates/trust-source/anchors/` | this installer (cert step) | the omarchy-stream Root CA, picked up by `update-ca-trust` |
---
## Idempotency contract
Every script in this repo is designed to be re-run safely. The patterns:
- `pkg_installed X || yay_install X` — only install missing packages.
- "Managed-by marker" on generated config files — only rewrite the file if
we wrote it last time (marker present) OR no file exists. User edits
that remove the marker are durable.
- Cert minting skips when the on-disk cert matches CA + SANs + expiry.
- `setcap` is a no-op if the cap is already set.
- `loginctl enable-linger` is a no-op if linger is already on.
- `systemctl --user enable` doesn't fail on an already-enabled unit.
- `hyprctl output create headless` runs only when no headless output exists
(the prestart script counts first).
- The prestart script *removes* extras to converge on exactly one — this is
the only "destructive" idempotency action, and it preserves the
lowest-numbered output (most likely to have workspaces bound to it).
A `./install.sh` re-run with no changes to environment should report a long
list of "Already installed" / "already present" lines and end with a clean
verify.
---
## File map
```
omarchy-moonlight/
├── install.sh Orchestrator
├── uninstall.sh Reverse install (preserves user data by default)
├── README.md User-facing install + usage
├── scripts/
│ └── cert-bootstrap.sh One-time CA generation + 1P upload
├── bin/ Repo source for runtime scripts
│ ├── sunshine-prestart.sh ExecStartPre: dedupe headless + sync conf
│ ├── sunshine-stream-do.sh prep-cmd `do`: resize + workspace migrate
│ └── sunshine-stream-undo.sh prep-cmd `undo`: workspace return
├── lib/ Installer library (sourced by install.sh)
│ ├── common.sh Logging, sudo helpers, idempotency primitives
│ ├── detect.sh GPU vendor, session type, hostname
│ ├── preflight.sh Pre-install sanity checks
│ ├── packages.sh yay -S + auto-fall-back from bin to source
│ ├── permissions.sh input group, uinput udev, cap_sys_admin
│ ├── config.sh Generates sunshine.conf
│ ├── certs.sh 1P-backed CA, host cert minting, trust install
│ ├── headless.sh Installs hook scripts + prestart drop-in
│ ├── firewall.sh ufw / firewalld port opening
│ ├── service.sh Unit detection, fallback, linger, enable
│ └── verify.sh Post-install checks (also reused by --doctor)
├── files/ Static files installed by lib functions
│ ├── headless-prestart.conf systemd drop-in template
│ └── sunshine.service Fallback unit if no AUR variant ships one
├── client/ Moonlight client side
│ ├── install-macos.sh brew install + add CA to System keychain
│ └── README.md Per-platform install + first-pair flow
└── docs/
├── ARCHITECTURE.md this file
├── TROUBLESHOOTING.md failure modes hit during install, with fixes
└── FOLLOWUPS.md outstanding work / known limitations
```

233
docs/FOLLOWUPS.md Normal file
View File

@@ -0,0 +1,233 @@
# Outstanding follow-ups
Tracked work that isn't on `main` yet. Each item lists the symptom, current
workaround, the fix sketch, and rough complexity so it can be picked up
in any order.
---
## P1 — Inhibit screensaver during streaming
**Symptom**: Omarchy's hypridle fires `omarchy-screensaver` after the host's
idle timeout, even when a Moonlight client is actively streaming. Sunshine
input arrives via `/dev/uinput` virtual devices, which hypridle's input
watcher doesn't always treat as activity. From the client's perspective the
stream "freezes" on the screensaver screen until the user wiggles input
enough to break hypridle's threshold.
**Current workaround**: wiggle the mouse / tap a key in the stream window.
**Fix sketch**:
1. In `bin/sunshine-stream-do.sh`, on stream start, take a `systemd-inhibit`
lock or `systemctl --user stop hypridle.service`:
```bash
# at the top, after env recovery
if command -v systemctl >/dev/null && systemctl --user is-active --quiet hypridle.service; then
touch "$STATE_DIR/hypridle-was-active"
systemctl --user stop hypridle.service
fi
```
2. In `bin/sunshine-stream-undo.sh`, on disconnect, restore:
```bash
if [[ -f "$STATE_DIR/hypridle-was-active" ]]; then
systemctl --user start hypridle.service
rm -f "$STATE_DIR/hypridle-was-active"
fi
```
**Complexity**: low. ~10 lines split across the two hook scripts. Self-healing
(if a stream crashes and undo doesn't run, next connect will be a no-op on
the inhibit and undo will idempotently leave the marker file).
**Alternative**: pass `--inhibit` to systemd-inhibit and run sunshine wrapped
in it. Cleaner separation but requires changing the service unit, which
fights the AUR package every upgrade.
---
## P1 — Auto-switch to the busiest workspace on connect
**Symptom**: when a client connects, the prep-cmd migrates the
*active* workspace to `HEADLESS-N`. If active was empty (e.g. workspace 4)
and the user's apps are on workspace 1, the stream shows an empty desktop
until the user presses `Super+1`.
**Current workaround**: press `Cmd+1` (or whatever maps to the apps workspace)
in the stream.
**Fix sketch**: change the workspace-detection in `bin/sunshine-stream-do.sh`
from `activeworkspace` to "workspace with the most windows":
```bash
# replace this:
PREV_WS="$(hyprctl activeworkspace -j 2>/dev/null | jq -r '.id // 1' || echo 1)"
# with this:
PREV_WS="$(hyprctl workspaces -j 2>/dev/null \
| jq -r 'sort_by(-.windows) | .[0].id // 1')"
```
**Complexity**: trivial. 2-line patch.
**Edge case**: if every workspace is empty (no apps running), `sort_by` is
still deterministic — picks whichever workspace happens to be first. That's
fine; the user just lands on an empty workspace either way.
---
## P2 — 1Password streams as a black rectangle
**Symptom**: the 1Password Linux app deliberately masks its window from
screen-capture surfaces. Through Moonlight, the 1Password window outline is
visible but the interior is solid black. Untested whether this is recoverable.
**Current workaround**: use the 1Password browser extension during the stream
(it lives inside Chromium and captures normally), or unlock 1Password on the
host before walking away (the SSH-agent / browser extension / `op` CLI all
still work even though the desktop window is unviewable).
**Things to test, in order of "least invasive"**:
1. Launch 1Password with `--disable-gpu --no-sandbox`. Software rendering
path may bypass the anti-capture flag.
```bash
pkill -f 1password
1password --disable-gpu --no-sandbox &
```
2. Force XWayland: `1password --ozone-platform=x11`. XWayland surfaces are
captured through a different code path than native-Wayland Electron
surfaces.
3. Patch the launcher (`~/.local/share/applications/1password.desktop`) with
whichever flag works.
4. If neither works: this is by-design upstream behavior and the realistic
answer is the browser-extension workaround. Document in
`client/README.md` under "Limitations."
**Complexity**: low (test command-line flags) to medium (patching .desktop
file properly).
---
## P2 — Resolve `<host>.lan` from clients
**Symptom**: `getent hosts <host>.lan` returns nothing — there's no DNS or
mDNS entry for the friendly LAN name. Clients work via raw IP
(`192.168.x.y`), and the host cert SAN covers both the .lan name and the IP,
so cert trust works either way. The friendly name just isn't reachable.
**Current workaround**: use the LAN IP, or add `<IP> <host> <host>.lan` to
the Mac's `/etc/hosts`.
**Fix sketch**: this is a network-layer change, not a repo change.
- **Unifi**: Settings → Networks → (your network) → DHCP → static reservation
for the host machine + Settings → DNS or "Local domain name" → set to
`lan`. Then the controller publishes `A` records for every reserved
device.
- **pi-hole / dnsmasq**: add an `address=/<host>.lan/<IP>` entry.
- **avahi-daemon**: would publish via mDNS, but Macs and some Linux clients
don't always pick it up. Less reliable than proper DNS.
**Complexity**: out-of-repo (depends on the user's network stack). Document
the choice once made.
---
## P2 — 1Password SSH-agent timeout breaks git signing
**Symptom**: long sessions interspersed with idle time cause 1Password's
SSH agent to go to sleep. Subsequent `git commit` fails with:
```
error: 1Password: failed to fill whole buffer
fatal: failed to write commit object
```
**Current workaround**: touch the 1Password desktop app or run
`eval $(op signin)` to revive the agent.
**Fix sketch** (none are perfect):
1. **Lengthen the agent timeout** — 1Password app → Settings → Developer →
SSH agent → bump the lock-after timeout. Cap is configurable but capped.
2. **Watchdog script** — periodically `ssh-add -l` and warn the user if it
fails. Adds noise.
3. **Per-repo `signingkey` config** to a different key not held by 1P —
defeats the purpose.
**Complexity**: low. Setting (1) is the realistic choice; (2) is overkill.
---
## P3 — Cert renewal automation
**Symptom**: host certs are valid 365 days. The installer re-mints when
`<30 days remaining` on the next `install.sh` run, but if you don't re-run
the installer in 11 months the cert silently expires and the next run
(somewhere between day 365 and day 395) re-mints — leaving a gap.
**Current workaround**: re-run `install.sh` periodically.
**Fix sketch**: ship a systemd `--user` timer that runs
`install.sh --doctor && install.sh --force-certs` weekly. Tied to the same
op-signin requirement, so it won't run unattended if 1P isn't unlocked —
that's actually fine (failure is loud, not silent).
**Complexity**: low-medium. New `.timer` + `.service` units; tricky bit is
making the timer fire only when both Hyprland and `op` are reachable.
---
## P3 — Stale config keys produce harmless warnings
**Symptom**: Sunshine logs `Unrecognized configurable option [nvenc_rc]` (and
`nvenc_tune`, `nvenc_coder`) on every startup. Recent Sunshine versions
renamed these keys — the encoder still works because the rename is
backward-compatible for the most important ones (`nvenc_preset`), but the
specific tuning keys we set don't take effect.
**Current workaround**: ignore the warnings; encoder defaults still work.
**Fix sketch**: update `lib/config.sh` to emit the new key names. Need to
verify the current spelling against the Sunshine version installed (likely
`nv_preset`/`nv_tune`/`nv_rc`/`nv_coder` without the `enc`, but the user
should `sunshine --help` to confirm). Same situation likely on AMD's
`amd_*` keys — verify when first Framework install actually exercises that
code path.
**Complexity**: trivial once the right key names are confirmed.
---
## P3 — Single-user assumption in everything
The installer + scripts assume a single human-user / single-host model. Not
a problem now, but if someone wants to share a JARVIS install across
multiple Sunshine instances (one per logged-in user), the headless prestart
script and `output_name` would collide.
**Fix sketch**: introduce a username-scoped `output_name` (e.g.
`HEADLESS-${USER}-1`) and scope the prestart's dedupe to outputs matching
that prefix. Substantial work; not justified without a real second user.
---
## Not on the list (intentionally)
- **TLS for the stream itself.** Sunshine and Moonlight handle this with
their own pinned-cert protocol; we don't touch it.
- **AV1 encoder.** RTX 3070 Ti is HEVC-capable but not AV1; users with
Ada/40-series can opt in via the web UI without code changes.
- **Remote-access layer** (Tailscale / WireGuard / Cloudflare). LAN-only
for now; the design assumes RFC1918 only and would need real DNS + DNS-01
cert issuance for proper remote.
- **Windows host support.** Sunshine works on Windows but the installer is
Arch-only by design.