This bundles every fix we made debugging the first real install plus a comprehensive troubleshooting reference. Working tree is now PII-safe for public distribution: hostname-based default mode is driven by a HEADLESS_HOSTS env var instead of a hardcoded literal; docs use placeholders for hostnames and LAN IPs. Self-healing headless management - bin/sunshine-prestart.sh (new): runs as systemd ExecStartPre. Resolves the Hyprland instance signature from XDG_RUNTIME_DIR/hypr when systemd-user env didn't propagate it. Reduces to exactly one headless output by keeping the lowest-numbered HEADLESS-N and removing the rest. Rewrites the managed sunshine.conf's output_name line to match the surviving name — Hyprland's HEADLESS-N counter is monotonic and ignores the optional name argument to 'output create headless', so without active sync output_name drifts off HEADLESS-1 after the first restart cycle. - bin/sunshine-stream-do.sh: dropped the hardcoded MON=HEADLESS-1. Now discovers whatever HEADLESS-* exists via jq. Resize and workspace migration target the actual output. - bin/sunshine-stream-undo.sh: reads the headless name from a state file the do-script wrote, with discovery fallback. Stops removing the output between sessions — the create/destroy race caused fatal startup encoder errors on the next Sunshine restart. - files/headless-prestart.conf, files/sunshine.service: ExecStartPre now points at the new prestart script. - lib/headless.sh: install_headless_hooks now installs all three scripts. New install_headless_prestart_dropin resolves the actual systemd unit name (sunshine.service vs app-dev.lizardbyte.app.Sunshine.service) and lands the drop-in under <unit>.service.d/. Firewall detection - lib/firewall.sh: _ufw_active now uses 'systemctl is-active ufw.service' instead of 'ufw status'. The latter requires root to read /etc/ufw state, so the unprivileged probe returned false and we silently skipped opening Sunshine's ports on hosts where ufw was actively dropping packets. Service unit fallbacks - lib/service.sh: ensure_sunshine_unit_present looks for sunshine.service in every systemd-user path first; falls back to the reverse-DNS AUR-source unit name; last resort drops a repo-provided fallback unit. systemctl reset-failed before each restart so a previous start-limit-hit doesn't immediately reject the new attempt. Preflight - lib/preflight.sh: new preflight_headless step that, only when STREAM_MODE is headless, surfaces missing hyprctl / jq / Hyprland reachability before install proceeds. Public-safe defaults - install.sh: streaming-mode default is now driven by HEADLESS_HOSTS env var (comma-separated, case-insensitive). Unset by default — every host gets mirror mode unless its hostname is listed or --headless is passed explicitly. Past versions hardcoded a specific hostname. - README.md: replaced JARVIS-specific examples with HEADLESS_HOSTS prose. Docs - docs/TROUBLESHOOTING.md (new): comprehensive failure-mode reference. Every issue hit during the first end-to-end install, in order, with symptom → cause → fix → permanent prevention. Plus a "Custom keybinding to escape Moonlight" section and an outstanding-followups punch list (1Password black-rectangle workarounds, hypridle inhibit during stream, busiest- workspace auto-switch, jarvis.lan DNS, 1Password SSH agent timeouts).
18 KiB
Troubleshooting & lessons from the first end-to-end install
Every failure mode below was hit in practice during the first end-to-end install on a real Omarchy/NVIDIA/Wayland host. The fixes landed in this repo so future installs (e.g. on the Framework) shouldn't repeat them — this doc explains the why so you can recognize them if they come back with a twist.
The "Real failures hit on first install" section is in roughly the order they surfaced.
Real failures hit on first install
1. sunshine-bin breaks on rolling Arch (library drift)
Symptom
sunshine: error while loading shared libraries: libicuuc.so.76:
cannot open shared object file: No such file or directory
Service hits the systemd start-limit after five rapid failures and won't try
again until reset-failed.
Cause
The AUR sunshine-bin package ships a binary built against whatever ICU was
current at AUR-build time (ICU 76). Arch had moved to ICU 78. Library SONAME is
hard-pinned at build time, so the binary fails to load any of its ICU calls.
This recurs whenever Arch bumps a major library and the sunshine-bin
maintainer hasn't rebuilt yet.
Fix in this repo
lib/packages.sh → install_sunshine now runs ldd $(command -v sunshine)
after install. If anything is "not found", removes sunshine-bin (and
sunshine-bin-debug) and rebuilds from source. The source-build linking pulls
in the current ICU, so it works as long as Arch ships any version.
lib/service.sh → enable_sunshine_service runs systemctl --user reset-failed before each restart, so a previous failure doesn't preempt the
recovery.
Manual recovery if you hit it on a non-installer flow
sudo pacman -Rns --noconfirm sunshine-bin sunshine-bin-debug
yay -S --needed --noconfirm sunshine
sudo setcap cap_sys_admin+p "$(readlink -f "$(command -v sunshine)")"
systemctl --user reset-failed sunshine.service
2. Source build is slow (~10 min); cached pkg.tar.zst install collides on debug symbols
Symptom
error: failed to commit transaction (conflicting files)
sunshine-debug: /usr/lib/debug/usr/bin/sunshine.debug exists in filesystem
(owned by sunshine-bin-debug)
After a successful makepkg, pacman -U of the resulting .pkg.tar.zst
refuses to install.
Cause
sunshine-bin ships with a separate sunshine-bin-debug package for debug
symbols, owned by sunshine-bin-debug. The source build's
sunshine-debug package wants to put its symbols at the same path. Removing
sunshine-bin doesn't auto-remove its -debug sibling.
Fix in this repo
lib/packages.sh removes both sunshine-bin AND sunshine-bin-debug before
the source rebuild. uninstall.sh similarly drops all four variants
(sunshine, sunshine-debug, sunshine-bin, sunshine-bin-debug).
3. Source sunshine doesn't ship sunshine.service
Symptom
Failed to restart sunshine.service: Unit sunshine.service not found.
After replacing sunshine-bin with the source sunshine build, the service
that worked before is gone.
Cause
The AUR source PKGBUILD installs its systemd unit under a Flatpak-style
reverse-DNS filename: app-dev.lizardbyte.app.Sunshine.service. The
sunshine-bin variant ships sunshine.service directly.
Fix in this repo
lib/service.sh → ensure_sunshine_unit_present resolves the actual unit
name in three steps:
- Look for
sunshine.servicein every systemd-user lookup path. - If not found, look for
app-dev.lizardbyte.app.Sunshine.serviceand use that name directly (we no longer try to symlink it assunshine.service— systemd refuses toenablelinked unit files). - Last resort: install the repo's
files/sunshine.servicefallback unit.
The drop-in path (for the headless prestart hook) is computed against the
resolved unit name so <unit-name>.service.d/ always lands in the right
place.
4. The fatal-but-misleading "Unable to find display or encoder during startup"
Symptom
[wlgrab] -------- Start of Wayland monitor list --------
[wlgrab] --------- End of Wayland monitor list ---------
Error: Unable to initialize capture method
...
Encoder [nvenc] failed
Encoder [vulkan] failed
Encoder [vaapi] failed
Encoder [software] failed
Fatal: Unable to find display or encoder during startup.
Fatal: Please ensure your manually chosen GPU and monitor are connected
Visible in the web UI as red banner warnings. Streaming may still appear to work later because Sunshine re-probes at stream time, but the startup state is broken.
Cause
In headless mode, sunshine.conf pins output_name = HEADLESS-1 and
capture = wlr. At service start, Sunshine connects to Wayland, enumerates
outputs, and finds nothing — because the Hyprland headless output is created
by the per-stream prep-cmd, not at boot. With no output, every encoder probe
fails on platform-init.
Fix in this repo
bin/sunshine-prestart.sh runs as ExecStartPre for the Sunshine service. It
guarantees one (and exactly one) Hyprland headless output exists before
Sunshine probes encoders. It also rewrites sunshine.conf's output_name
line to match whatever name Hyprland assigned the surviving headless (see
issue 6).
The drop-in lives at
~/.config/systemd/user/<unit-name>.service.d/headless-prestart.conf,
managed by lib/headless.sh → install_headless_prestart_dropin.
The bin/sunshine-stream-undo.sh was also changed to keep the headless
across stream sessions — create/destroy per stream raced with the startup
encoder probe.
5. Web UI 0.0.0.0 binding was open to the LAN
Symptom
By default Sunshine binds the web UI on 0.0.0.0:47990. Anyone on your LAN
who can reach the host can hit the admin panel; only a username/password
guards it.
Fix in this repo
lib/config.sh writes origin_web_ui_allowed = pc. Sunshine then rejects
web-UI requests from any source other than localhost regardless of bind
address. Streaming/pairing on port 47989 stays LAN-accessible — only the
admin surface is locked down.
Recommended access pattern (also documented inline in sunshine.conf):
# From a client machine:
ssh -L 47990:127.0.0.1:47990 user@host
# Then browse to:
https://localhost:47990
The host cert SANs include localhost and 127.0.0.1 (see issue 11) so the
tunneled URL doesn't trigger hostname-mismatch warnings.
6. Hyprland headless counter never decrements
Symptom
After several restarts, hyprctl monitors all shows
HEADLESS-4, HEADLESS-9, HEADLESS-10, ...
Selected monitor [] for streaming in the journal — Sunshine can't match
output_name = HEADLESS-1 against any existing output.
Cause
Hyprland's hyprctl output create headless ignores the optional name
argument (at least through 0.54.x) and auto-numbers as HEADLESS-N. The
counter is monotonic across the Hyprland session — removing HEADLESS-1 and
re-creating gives you HEADLESS-2, not HEADLESS-1. Combine that with an
old version of the prep-cmd script that hardcoded MON=HEADLESS-1 and
created a new output every connect, and you accumulate a herd.
Fix in this repo
Two-layer:
bin/sunshine-prestart.shactively removes extras, keeping only the lowest-numberedHEADLESS-N. Then it rewritessunshine.conf'soutput_nameto that surviving name. Self-healing.bin/sunshine-stream-do.shandsunshine-stream-undo.shnow discover the existing headless name viajqinstead of hardcodingHEADLESS-1. Resize and workspace-migration target the actual output.
7. ufw was active but our detection missed it
Symptom
ARP between Mac and the host worked fine (layer 2 visible). TCP to port
47989 hung from the Mac. nmap -p 47989 192.168.x.y said "Host seems
down."
Cause
lib/firewall.sh → _ufw_active previously ran ufw status | grep -q "Status: active". ufw status requires root to read /etc/ufw state, so
an unprivileged check returned nothing → returned false → we silently
skipped opening Sunshine's ports. ufw was happily dropping inbound packets.
Fix in this repo
_ufw_active now uses systemctl is-active ufw.service. ufw runs as a
Type=oneshot unit with RemainAfterExit, so is-active reports active
while the rules are loaded — and the check works without sudo.
One-shot LAN allow if you hit it
for port in 47984 47989 47990 48010; do
sudo ufw allow from 192.168.x.0/24 to any port $port proto tcp
done
for port in 47998 47999 48000 48010; do
sudo ufw allow from 192.168.x.0/24 to any port $port proto udp
done
sudo ufw status verbose
8. Existing prep-cmd hooks were a stale version
Symptom
Headless output stayed at 1920x1080 regardless of what Moonlight requested.
hyprctl monitors all showed extra HEADLESS-N appearing on each
reconnect. Journal had no [sunshine-do] log lines.
Cause
~/.local/share/omarchy-moonlight/bin/sunshine-stream-do.sh was an early
version that hardcoded MON=HEADLESS-1. Each connect: didn't find
HEADLESS-1, ran hyprctl output create headless (spawned a new
unnamed one), then tried to resize the non-existent HEADLESS-1 — silent
no-op. The repo's bin/ had the discovery-based version, but it had never
been re-installed after the first install.
Sunshine's stderr capture also doesn't propagate prep-cmd output to the journal, so the silent failures were invisible.
Fix in this repo
lib/headless.sh → install_headless_hooks now installs all three scripts
(do / undo / prestart) into ~/.local/share/omarchy-moonlight/bin/ and is
re-run on every install.sh invocation — re-installing keeps them in
lockstep with the repo.
For ad-hoc debugging of the prep-cmd, add a file-logging shim:
sed -i '/^set -euo pipefail/a exec > >(tee -a /tmp/sunshine-prepcmd.log) 2>&1' \
~/.local/share/omarchy-moonlight/bin/sunshine-stream-do.sh
then cat /tmp/sunshine-prepcmd.log after a reconnect to see exactly what
the script saw and what it tried to do.
9. omarchy-screensaver shows in the stream instead of your apps
Symptom
Stream connects, shows the omarchy screensaver. Workspace shows
hasfullscreen: 1, lastwindowtitle: omarchy-screensaver.
Cause
Omarchy's hypridle config runs the screensaver after the host's idle
threshold. Input arriving via Sunshine's /dev/uinput virtual devices
doesn't always reset hypridle's input-detection (depends on whether
hypridle treats uinput devices as activity sources). When you sit on the
Mac for a while, the host thinks the host is idle, fires the screensaver.
Current workaround
Wiggle the mouse / tap a key in the stream window — hypridle will register the activity and dismiss the screensaver.
Outstanding fix (not yet implemented):
sunshine-stream-do.sh should issue hyprctl dispatch exec hyprctl reload or systemctl --user stop hypridle.service-style
inhibition when a stream starts, and re-enable on stream stop via the
undo hook. Alternatively, add a hypridle.conf rule that excludes the
streaming session from idle counting. See follow-up #1 below.
10. Moonlight Mac defaults make it a roach motel
Symptom
After enabling "Capture system keyboard shortcuts: Always," all standard Mac escapes (Cmd+Tab, Cmd+Q, even Cmd+Option+Esc) get forwarded to the host. Mouse is also captured in game-mode. No clear way out.
Causes & fixes (Mac-side settings)
In Moonlight on the Mac → gear icon → Input section:
| Setting | Value | Why |
|---|---|---|
| Optimize mouse for remote desktop | ON | Absolute-position cursor; can mouse out of Moonlight onto other Mac apps |
| Capture system keyboard shortcuts | Only in fullscreen | Windowed-mode Mac shortcuts still work; capture kicks in only on explicit fullscreen |
| Swap Cmd and Ctrl keys | OFF | Keep Cmd → Super mapping for Hyprland binds |
Emergency exit that always works (macOS-level, can't be captured):
- Three-finger swipe up (Mission Control)
Cmd + Option + Esc(Force Quit window) — usuallyCtrl + Shift + Q(system logout dialog — steals focus from Moonlight, then cancel)
Reliable custom exit shortcut — see "Custom keybinding" section below.
11. Web UI cert hostname mismatch under SSH tunnel
Symptom
After bringing the SSH tunnel up (ssh -L 47990:127.0.0.1:47990 ...),
visiting https://localhost:47990 shows a browser cert warning even
though the cert IS signed by the omarchy-stream CA.
Cause
The host cert's SAN list was originally only <hostname>.lan and the LAN
IP. localhost and 127.0.0.1 weren't valid for that cert, so the browser
correctly rejected the hostname.
Fix in this repo
lib/certs.sh now mints host certs with SANs:
DNS.1 = ${host_lc}.lan
DNS.2 = localhost
IP.1 = ${lan_ip}
IP.2 = 127.0.0.1
And the idempotency check in fetch_and_install_certs requires those SANs
— existing hosts re-mint on next install.
Custom keybinding to escape Moonlight (Mac)
Three options. Option C is most reliable.
A. macOS App Shortcut (5 min, no extra software)
System Settings → Keyboard → Keyboard Shortcuts → App Shortcuts → +:
- Application: Moonlight Game Streaming
- Menu Title:
Quit Moonlight Game Streaming(orDisconnect) - Keyboard Shortcut: e.g.
⌘⎋(Cmd+Esc)
Requires Moonlight's "Capture system keyboard shortcuts" to be Only in fullscreen so the shortcut reaches macOS in windowed mode.
B. Karabiner-Elements (best for fullscreen)
Karabiner intercepts at HID, below the Moonlight app layer. The setup pasted
in the troubleshooting chat: bind Right ⌘ + Esc → killall Moonlight.
{
"title": "Force-quit Moonlight",
"rules": [{
"description": "Right-Cmd + Escape → killall Moonlight",
"manipulators": [{
"type": "basic",
"from": {
"key_code": "escape",
"modifiers": { "mandatory": ["right_command"] }
},
"to": [{ "shell_command": "killall Moonlight" }]
}]
}]
}
C. Stream Deck (since you have one)
System → Multimedia → Hotkey → send ⌃⌥⇧Q (Moonlight's built-in quit-stream
combo) or System → Open with command /usr/bin/killall Moonlight. Stream
Deck input is HID; Moonlight can't capture it.
Outstanding follow-ups
1. Inhibit hypridle / screensaver during stream
Currently the omarchy screensaver fires after the host's idle timeout even when a stream is active. Either:
- Have
sunshine-stream-do.shsend asystemd-inhibitlock orhyprctl dispatch global hyprlock:lock-off(does that exist?) when a stream starts, release insunshine-stream-undo.sh. - Or add an
unbindfor the input watcher in~/.config/hypr/hypridle.confwhen aSUNSHINE_STREAMING=1env marker is present.
Outcome wanted: no screensaver while any Moonlight client is connected.
2. Auto-switch to the workspace with the most windows on connect
Currently sunshine-stream-do.sh migrates the active workspace to the
headless. If active was empty (workspace 4) and the user's apps are on
workspace 1, they see nothing until pressing Cmd+1.
Two-line patch in sunshine-stream-do.sh:
TARGET_WS="$(hyprctl workspaces -j | jq -r 'sort_by(-.windows) | .[0].id')"
PREV_WS="$TARGET_WS" # instead of activeworkspace
Pick the busiest workspace and migrate that. User lands on their work.
3. 1Password window streams as a black rectangle
By design — 1Password masks its window from screen capture. Workarounds in order of preference:
- Launch 1Password with
--disable-gpu --no-sandbox— software rendering path may evade the anti-capture. - Launch with
--ozone-platform=x11— XWayland surface captured differently than native-Wayland surface. - Use the 1Password browser extension during the streamed session — the extension's UI lives in Chromium, which captures normally.
- Use
opCLI for everything CLI-driven (cert pipeline already does this). - Unlock 1Password locally on the host before walking away; other apps that
talk to the 1Password agent (browser extension,
opCLI, SSH signing agent) work fine even though the desktop app's window can't be seen.
We haven't picked one yet — see chat for next steps.
4. mDNS / your-host.lan doesn't resolve
Even on the host, getent hosts your-host.lan returns nothing. To use the
friendly hostname instead of the LAN IP from clients, set up the DNS entry
in Unifi (the user's network gear). Until then, clients use 192.168.x.y
directly.
5. Periodic 1Password SSH-agent timeout breaks git signing
During this install the 1P SSH agent went idle several times mid-session,
causing git commit to fail with 1Password: failed to fill whole buffer.
Touching the 1Password desktop app revives it.
Possible mitigations:
- Set a longer SSH-agent timeout in 1Password app settings.
- Add a watchdog that warns if
ssh-add -lfails for >N minutes. - Disable commit signing for this repo specifically (not recommended; just noting the option).
Quick reference — the install sequence as it actually runs end-to-end
After ./install.sh (with all the fixes above):
- Preflight checks (GPU, modeset, audio, headless prereqs)
yay -S sunshine-bin moonlight-qt pipewire-pulse vulkan-tools libva-utils jq- If
sunshine-binhas unresolved libs → rebuild from source (sunshine) - Add user to
inputgroup, install uinput udev rule,setcap cap_sys_admin+p - Write tuned
~/.config/sunshine/sunshine.conf:capture = wlr,output_name = HEADLESS-1(will be re-synced by prestart)encoder = nvenc+ low-latency NVENC paramsglobal_prep_cmd→ discovery-based do/undo scriptsorigin_web_ui_allowed = pc
- Install hook scripts (
do,undo,prestart) into~/.local/share/omarchy-moonlight/bin/ - Install prestart drop-in into
~/.config/systemd/user/<unit-name>.service.d/headless-prestart.conf - Open Sunshine's LAN ports on whatever firewall is active
- Enable the service,
loginctl enable-linger, start - (1Password-CLI signed-in) Fetch the CA from 1Password, mint a host cert with
SANs for
<hostname>.lan, LAN IP,localhost,127.0.0.1. Install CA into/etc/ca-certificates. - Restart service so it picks up the new cert
- Verify: cap_sys_admin set, service active on :47990, encoder reachable, hooks present, CA in trust store
Manual one-time setup on each Moonlight client:
- Mac:
./client/install-macos.sh(brew cask + CA add to System keychain) - Other Linux host: same
./install.shputs the CA into its system trust store - iOS/Android/Apple TV: install Moonlight from app store + import CA manually