# publish

Go CLI that turns a local audio/video recording of a church service into:

1. A markdown summary (`--summerize`)
2. A 60–90s social-media hook clip cut from the source (`--clip`)
3. (Future) A post to Spotify for Podcasters (`--post`, currently a stub)

The repo directory is still `summerize/` (historical), but the module and binary are both `publish`.

## Pipeline (one pass, shared by all modes)

```
input ──ffmpeg──► 16kHz mono WAV ──whisper.cpp -oj──► []Segment{Start,End,Text}
                                                            │
                                          ┌─────────────────┴─────────────────┐
                                          │                                   │
                                   PlainText(segs)                   FormatForLLM(segs)
                                          │                                   │
                                          ▼                                   ▼
                                     Summarizer                           clip.Pick
                                  (Anthropic API or                   (same Summarizer,
                               shelled-out claude CLI)            different prompt → JSON)
                                          │                                   │
                                          ▼                                   ▼
                                  markdown summary                 ffmpeg cut [start,end]
                                          │
                                          └─► output.MarkdownToSpotifyHTML
                                              (b/i/a/ul/ol/li/p subset that
                                               Spotify show notes accept)
```

Whisper output is cached at `.segments.json`. Subsequent runs (different modes, different prompt or clip parameters) skip whisper entirely.

## Layout

```
main.go                       flat flagset, mode dispatch, orchestration
prompts/church-service.md     default --summerize prompt (go:embed)
prompts/clip-selector.md      default --clip prompt; templated with
                              {{MIN_SECONDS}} / {{MAX_SECONDS}} (go:embed)
internal/audio/audio.go       ffmpeg → 16kHz mono PCM WAV
internal/transcribe/
  transcribe.go               Transcriber interface, Segment type
  segments.go                 Segment, PlainText, FormatForLLM, mm:ss helper
  whispercpp.go               shells out to whisper-cli with -oj; parses JSON
internal/summarize/
  summarize.go                Summarizer interface
  anthropic.go                direct Messages API via net/http (no SDK dep;
                              reads ANTHROPIC_API_KEY)
  claudecli.go                `claude -p ` with transcript on stdin
internal/clip/
  clip.go                     Selection, Pick (LLM JSON parse), Extract (ffmpeg)
  clip_test.go                JSON object extraction edge cases
internal/output/
  spotify.go                  markdown → Spotify-safe HTML
  spotify_test.go
  clipboard.go                wl-copy / xclip / pbcopy
Makefile                      build/install/link/doctor/uninstall targets
scripts/install.sh            interactive setup (OS + GPU detect → deps,
                              whisper.cpp build, model download, link)
```

**Zero external Go dependencies.** Stdlib only.

## CLI surface

```
publish [mode...] [flags]

modes (combine freely; defaults to --summerize):
  --summerize   write a markdown summary
  --clip        cut a 60-90s social hook clip
  --post        post to Spotify (not implemented yet)
```

Modes share whisper output, so `publish --summerize --clip sermon.mp4` only transcribes once.

### Shared flags

| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` or `claude-api` | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override summary prompt path | bundled |
| `--prompt-clip` | override clip-selector prompt path | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects best backend (see "Backend auto-detect" below) | auto |
| `--whisper-model` | path to ggml model | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `.segments.json` |
| `--keep-transcript` | also write `.transcript.txt` | off |
| `--keep-wav` | keep the normalized WAV instead of tempdir | off |
| `-v` | verbose progress to stderr | off |

### --summerize flags

| flag | purpose | default |
|---|---|---|
| `--prompt` | producer's notes (any pre-written framing, title, key points) that anchor the summary | empty |
| `--md PATH` | markdown output; `-` = stdout, `""` = disable | `.summary.md` |
| `--spotify PATH` | Spotify HTML output; `-` = stdout | disabled |
| `--copy` | copy Spotify HTML to clipboard | off |

When `--prompt` is set, the value is prepended to the user message as a "Producer's notes" block above the transcript. The bundled prompt instructs the LLM to treat producer's notes as authoritative for titles, speaker names, framing, and key points, then use the transcript to expand and enrich them.
Use `--prompt` when the Spotify show notes you've already drafted should drive the summary's framing rather than the LLM inferring everything from scratch. For longer notes, use shell expansion: `--prompt "$(cat notes.md)"`.

Note: `--prompt-summary` (system prompt template path) and `--prompt` (user notes content) are different flags. The former overrides the *system* prompt; the latter adds *user content* to the user message.

### --clip flags

| flag | purpose | default |
|---|---|---|
| `--min` | minimum clip length (seconds) | 60 |
| `--max` | maximum clip length (seconds) | 90 |
| `--out PATH` | clip output path | `.clip` (`.clip.m4a` for audio) |
| `--copy-codec` | ffmpeg `-c copy` (fast, keyframe-aligned); **skips the 9:16 portrait crop**, since stream copy can't apply video filters | off |
| `--dry-run` | print the picked window but don't run ffmpeg | off |

Unless `--copy-codec` is set, video clips are re-encoded as **1080×1920 portrait (9:16)** with a center crop, capped at **1 GiB** via ffmpeg's `-fs`. The crop filter is `crop=min(iw,ih*9/16):min(ih,iw*16/9)`, so any source aspect (16:9, 4:3, 1:1, or already-portrait) yields the largest 9:16 sub-rectangle without distortion. See `portraitFilter` and `MaxClipBytes` in `internal/clip/clip.go`.

## Conventions / non-obvious choices

- **Spelling: `summerize` is intentional.** It's the original name of the project and the user's preferred spelling. Use `summerize` (e.g. for `--summerize`) rather than auto-correcting to `summarize` in user-facing surfaces. Internal Go package `internal/summarize` keeps the standard spelling.
- **Pluggable Summarizer is shared between modes.** `--clip` reuses the same Summarizer interface; the only difference is the prompt and the expectation of JSON output. If you add a new mode that talks to an LLM, plug it in there.
- **Summarizer.Summarize takes the user content verbatim.** No implicit `"Transcript:"` prefix or other framing. Callers (`doSummerize` in main.go, `clip.Pick`) build the full user message themselves; that's how `--prompt` (producer's notes) prepends a "Producer's notes:" block above the transcript without the message getting mislabeled.
- **Whisper output is the source of truth.** All text-only consumers go through `transcribe.PlainText(segs)`; we don't run whisper twice.
- **JSON parsing for clip selection is defensive.** `clip.extractJSONObject` walks balanced braces (skipping strings) so the model can wrap its answer in prose despite the prompt asking for raw JSON.
- **Clip extraction defaults to re-encode.** Frame-accurate cuts matter for short social hooks; `--copy-codec` trades that for speed.
- **Anthropic API call uses net/http directly.** Adding the SDK was tempting, but the request is one POST and avoiding the dep keeps go.sum empty.
- **`prepareWAV` cleanup is owned by the caller.** It returns a `func()` you must `defer`. Don't call `os.RemoveAll` on the wav path yourself.
- **No subcommands.** The CLI is one flat flagset. Modes are boolean flags so multiple can run in one invocation and share state.

## Build / install

**Fresh machine (recommended)**: clone, then run the interactive installer. It detects OS + GPU, builds whisper.cpp with the right backend, downloads a ggml model, and links `publish` + `whisper-cli-` into `~/.local/bin`:

```bash
git clone ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install   # interactive
make doctor    # just print detected platform/GPU/dependencies
```

Re-runnable; each step is idempotent and skippable. The script supports Arch (`pacman`), Debian/Ubuntu (`apt`), Fedora (`dnf`), and macOS (`brew`); for unknown distros it prints the package list and skips the install command.

**Already built once**: just rebuild:

```bash
go build -o publish .
# or
make link   # rebuilds + (re)points ~/.local/bin/publish at the repo
```

The symlink at `~/.local/bin/publish` is the canonical install location; rebuilds update in place via the symlink.

## External dependencies (runtime)

| tool | required for | install |
|---|---|---|
| `ffmpeg` | always (audio extraction + clip cut) | `pacman -S ffmpeg` |
| `whisper-cli` (whisper.cpp) | transcription | `pacman -S whisper.cpp` for CPU; for GPU acceleration see "GPU builds" below |
| ggml whisper model | transcription | `curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin` into `~/.cache/whisper.cpp/` |
| `claude` CLI | `--summarizer claude-cli` (default) | already installed (Claude Code) |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag (Spotify HTML to clipboard) | wl-copy ships with wayland on omarchy |

## Backend auto-detect

When `--whisper-bin` is not set, `resolveBin` in `internal/transcribe/whispercpp.go` picks a backend at runtime:

1. **CUDA**: if `~/.local/bin/whisper-cli-cuda` (or `whisper-cli-cuda` on PATH) exists *and* `nvidia-smi -L` exits 0.
2. **ROCm**: if `whisper-cli-rocm` exists *and* `rocminfo` exits 0.
3. **Vulkan**: if `whisper-cli-vulkan` exists *and* `vulkaninfo --summary` exits 0.
4. **CPU fallback**: first of `whisper-cli` / `whisper-cpp` / `main` on PATH.

Each probe is gated on a 5s timeout. The chosen backend is logged on a single stderr line (`whisper: using CUDA backend (/path)`); `-v` adds diagnostics about which probes were skipped or failed.

The convention is one whisper.cpp checkout per host with a per-backend symlink in `~/.local/bin/whisper-cli-`, so the same `publish` binary works across machines without machine-specific flags.

### CUDA build (RTX 3070 Ti / desktop)

```
sudo pacman -S --needed cuda            # ~3GB; installs to /opt/cuda
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
PATH=/opt/cuda/bin:$PATH cmake -B build \
  -DGGML_CUDA=1 \
  -DCMAKE_CUDA_ARCHITECTURES=86 \
  -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15
PATH=/opt/cuda/bin:$PATH cmake --build build -j$(nproc) --config Release
ln -sf "$PWD/build/bin/whisper-cli" ~/.local/bin/whisper-cli-cuda
```

CUDA 13.2 caps the host compiler at GCC 15; system gcc is 16, so the `-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15` line is required (the `gcc15` package ships `g++-15` alongside the default toolchain). `sm_86` matches the RTX 3070 Ti compute capability; adjust if the GPU changes.

CUDA smoke test: these stderr lines should appear in any run:

```
whisper_init_with_params_no_state: use gpu = 1
ggml_cuda_init: found 1 CUDA devices
...
whisper_backend_init_gpu: using CUDA0 backend
```

### ROCm build (Framework 16 / Radeon RX 7700S)

The 7700S is RDNA3 (gfx1102). ROCm 6.x supports it.

```
sudo pacman -S --needed rocm-hip-sdk rocm-hip-runtime hipblas rocblas
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build-rocm \
  -DGGML_HIP=1 \
  -DAMDGPU_TARGETS=gfx1102 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm -j$(nproc)
ln -sf "$PWD/build-rocm/bin/whisper-cli" ~/.local/bin/whisper-cli-rocm
```

If ROCm doesn't recognize gfx1102 (older ROCm releases), set `HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell before invoking `publish` to spoof gfx1100 (same RDNA3 ISA, supported kernels).

ROCm smoke test: look for `ggml_cuda_init` (HIP reuses the CUDA backend naming in whisper.cpp) plus a ROCm device line on stderr.

### Vulkan build (universal GPU fallback)

Vulkan is the easiest cross-vendor path; it uses any GPU with a working Vulkan driver (Mesa RADV for AMD/Intel, Nvidia proprietary, etc.).

```
sudo pacman -S --needed vulkan-headers vulkan-icd-loader shaderc
cd ~/Git\ Repos/whisper.cpp
cmake -B build-vulkan -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)
ln -sf "$PWD/build-vulkan/bin/whisper-cli" ~/.local/bin/whisper-cli-vulkan
```

Slower than native CUDA/ROCm but works on machines where the vendor toolchain is too painful to install. Useful as a portable fallback for laptops with iGPUs.

### Metal build (Apple Silicon)

`make install` handles this automatically; the manual recipe is short because cmake on macOS picks up Metal by default, with no special flag.

Prerequisites:

```
xcode-select --install                  # one-time
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install go cmake ffmpeg
```

Build:

```
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
cmake -B build-metal -DCMAKE_BUILD_TYPE=Release
cmake --build build-metal -j$(sysctl -n hw.ncpu)
ln -sf "$PWD/build-metal/bin/whisper-cli" ~/.local/bin/whisper-cli-metal
```

The resolver special-cases Darwin: if `whisper-cli-metal` exists it's used immediately (no probe, since Metal is always available on macOS). On a Mac without that symlink, the CPU fallback finds `whisper-cli` / `whisper-cpp` from brew (which is itself Metal-enabled by default), so a plain `brew install whisper-cpp` is a workable lazy path. It just shows "CPU backend" in the publish log line even though whisper.cpp is in fact running Metal kernels.

Metal smoke test: these stderr lines should appear in any run:

```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 ...
whisper_backend_init_gpu: using Metal backend
```

## Tests

```
go test ./...
```

Covered:

- `internal/output/spotify_test.go`: markdown→Spotify-HTML conversion, escaping
- `internal/clip/clip_test.go`: JSON object extraction, including prose-wrapped and fence-wrapped model output

There are no integration tests for whisper or the LLM calls; those depend on external binaries and remote APIs.

## Future work

- **`--post`**: post the markdown summary as a Spotify for Podcasters episode description. Requires the Spotify show-notes API or Spotify for Podcasters upload integration. Reuse `output.MarkdownToSpotifyHTML` since their show-notes editor accepts that subset.
- **Multi-clip output**: pick the top-N hooks instead of one. The current `Selection` would become `[]Selection` and the prompt would request an array.
- **Faster `--summarizer` for short transcripts**: default to Haiku for very short inputs to save on API costs.