publish
Go CLI that turns a local audio/video recording of a church service into:
- A markdown summary (`--summerize`)
- A 60–90s social-media hook clip cut from the source (`--clip`)
- (Future) A post to Spotify for Podcasters (`--post`; currently a stub)

The repo directory is still `summerize/` (historical), but the module and binary
are both `publish`.
Pipeline (one pass, shared by all modes)
input ──ffmpeg──► 16kHz mono WAV ──whisper.cpp -oj──► []Segment{Start,End,Text}
                                       │
                     ┌─────────────────┴─────────────────┐
                     │                                   │
              PlainText(segs)                   FormatForLLM(segs)
                     │                                   │
                     ▼                                   ▼
                Summarizer                           clip.Pick
             (Anthropic API or                   (same Summarizer,
          shelled-out claude CLI)            different prompt → JSON)
                     │                                   │
                     ▼                                   ▼
             markdown summary                 ffmpeg cut [start,end]
                     │
                     └─► output.MarkdownToSpotifyHTML
                         (b/i/a/ul/ol/li/p subset
                          that Spotify show notes accept)
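The two transcript views in the diagram differ only in whether timestamps survive. Below is a minimal Go sketch of what `PlainText`, `FormatForLLM`, and the mm:ss helper might look like. It assumes `Segment` stores start/end as float64 seconds and that the LLM view uses one `[mm:ss-mm:ss] text` line per segment; both are assumptions, not the code in `segments.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is an assumed shape: times in seconds, text as whisper emitted it.
type Segment struct {
	Start, End float64
	Text       string
}

// mmss formats seconds as mm:ss. Minutes are not wrapped at 60, so a
// 75-minute service reads "75:03" (an assumption about the real helper).
func mmss(sec float64) string {
	s := int(sec)
	return fmt.Sprintf("%02d:%02d", s/60, s%60)
}

// PlainText joins trimmed segment text with single spaces for text-only
// consumers such as the summary prompt.
func PlainText(segs []Segment) string {
	parts := make([]string, 0, len(segs))
	for _, s := range segs {
		if t := strings.TrimSpace(s.Text); t != "" {
			parts = append(parts, t)
		}
	}
	return strings.Join(parts, " ")
}

// FormatForLLM keeps a time range on every line so a clip-selection prompt
// can answer with concrete start/end times.
func FormatForLLM(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		fmt.Fprintf(&b, "[%s-%s] %s\n", mmss(s.Start), mmss(s.End), strings.TrimSpace(s.Text))
	}
	return b.String()
}
```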
Whisper output is cached at `<input>.segments.json` (override with `--segments`).
Subsequent runs (different modes, different prompts or clip settings) skip whisper entirely.
Layout
main.go                      flat flagset, mode dispatch, orchestration
prompts/church-service.md    default --summerize prompt (go:embed)
prompts/clip-selector.md     default --clip prompt; templated with
                             {{MIN_SECONDS}} / {{MAX_SECONDS}} (go:embed)
internal/audio/audio.go      ffmpeg → 16kHz mono PCM WAV
internal/transcribe/
  transcribe.go              Transcriber interface, Segment type
  segments.go                Segment, PlainText, FormatForLLM, mm:ss helper
  whispercpp.go              shells out to whisper-cli with -oj; parses JSON
internal/summarize/
  summarize.go               Summarizer interface
  anthropic.go               direct Messages API via net/http
                             (no SDK dep; reads ANTHROPIC_API_KEY)
  claudecli.go               `claude -p <prompt>` with transcript on stdin
internal/clip/
  clip.go                    Selection, Pick (LLM JSON parse), Extract (ffmpeg)
  clip_test.go               JSON object extraction edge cases
internal/output/
  spotify.go                 markdown → Spotify-safe HTML
  spotify_test.go
  clipboard.go               wl-copy / xclip / pbcopy
Makefile                     build/install/link/doctor/uninstall targets
scripts/install.sh           interactive setup (OS + GPU detect → deps,
                             whisper.cpp build, model download, link)
Zero external Go dependencies. Stdlib only.
CLI surface
publish [mode...] [flags] <input>
modes (combine freely; defaults to --summerize):
  --summerize    write a markdown summary
  --clip         cut a 60-90s social hook clip
  --post         post to Spotify (not implemented yet)

Modes share whisper output, so `publish --summerize --clip sermon.mp4` only
transcribes once.
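A minimal sketch of the flat flagset, assuming the standard library `flag` package (which accepts both `-clip` and `--clip`). The `modes` struct and `parseModes` are hypothetical names; only the three mode flags and the `--summerize` default come from the usage above.

```go
package main

import (
	"flag"
	"fmt"
)

// modes holds the boolean mode flags; any combination may be set.
type modes struct {
	summerize, clip, post bool
}

// parseModes parses one flat FlagSet (no subcommands), defaults to
// --summerize when no mode is given, and returns the single <input> path.
func parseModes(args []string) (modes, string, error) {
	fs := flag.NewFlagSet("publish", flag.ContinueOnError)
	var m modes
	fs.BoolVar(&m.summerize, "summerize", false, "write a markdown summary")
	fs.BoolVar(&m.clip, "clip", false, "cut a 60-90s social hook clip")
	fs.BoolVar(&m.post, "post", false, "post to Spotify (not implemented yet)")
	if err := fs.Parse(args); err != nil {
		return m, "", err
	}
	if !m.summerize && !m.clip && !m.post {
		m.summerize = true
	}
	if fs.NArg() != 1 {
		return m, "", fmt.Errorf("expected exactly one <input>, got %d", fs.NArg())
	}
	return m, fs.Arg(0), nil
}
```

Because `flag` stops at the first positional argument, flags must come before `<input>`, which matches the usage line.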
Shared flags
| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` or `claude-api` | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override summary prompt path | bundled |
| `--prompt-clip` | override clip-selector prompt path | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects best backend (see "Backend auto-detect" below) | auto |
| `--whisper-model` | path to ggml model | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `<input>.segments.json` |
| `--keep-transcript` | also write `<input>.transcript.txt` | off |
| `--keep-wav` | keep the normalized WAV instead of writing it to a tempdir | off |
| `-v` | verbose progress to stderr | off |
--summerize flags
| flag | purpose | default |
|---|---|---|
| `--prompt` | producer's notes (any pre-written framing, title, key points) that anchor the summary | empty |
| `--md PATH` | markdown output; `-` = stdout, `""` = disable | `<input>.summary.md` |
| `--spotify PATH` | Spotify HTML output; `-` = stdout | disabled |
| `--copy` | copy Spotify HTML to clipboard | off |
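The Spotify output is restricted to the b/i/a/ul/ol/li/p subset. Below is a deliberately small sketch of such a converter: it handles blank-line-separated paragraphs, `- ` bullets, `**bold**`, `*italic*`, and `[text](url)` links, and HTML-escapes everything else. Rendering headings as bold paragraphs is an assumption, ordered lists are omitted, and none of this is the real `MarkdownToSpotifyHTML`.

```go
package main

import (
	"html"
	"regexp"
	"strings"
)

var (
	linkRe   = regexp.MustCompile(`\[([^\]]+)\]\(([^)\s]+)\)`)
	boldRe   = regexp.MustCompile(`\*\*([^*]+)\*\*`)
	italicRe = regexp.MustCompile(`\*([^*]+)\*`)
)

// inline escapes the text first, then rewrites links, bold, and italic.
func inline(s string) string {
	s = html.EscapeString(s)
	s = linkRe.ReplaceAllString(s, `<a href="$2">$1</a>`)
	s = boldRe.ReplaceAllString(s, "<b>$1</b>")
	return italicRe.ReplaceAllString(s, "<i>$1</i>")
}

// toSpotifyHTML converts a small markdown subset into Spotify-safe HTML.
func toSpotifyHTML(md string) string {
	var out, para []string
	inList := false
	flushPara := func() {
		if len(para) > 0 {
			out = append(out, "<p>"+inline(strings.Join(para, " "))+"</p>")
			para = nil
		}
	}
	closeList := func() {
		if inList {
			out = append(out, "</ul>")
			inList = false
		}
	}
	for _, line := range strings.Split(md, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			flushPara()
			closeList()
		case strings.HasPrefix(line, "- "):
			flushPara()
			if !inList {
				out = append(out, "<ul>")
				inList = true
			}
			out = append(out, "<li>"+inline(line[2:])+"</li>")
		case strings.HasPrefix(line, "#"):
			// h1-h6 are not in the subset, so headings become bold paragraphs.
			flushPara()
			closeList()
			out = append(out, "<p><b>"+inline(strings.TrimLeft(line, "# "))+"</b></p>")
		default:
			closeList()
			para = append(para, line)
		}
	}
	flushPara()
	closeList()
	return strings.Join(out, "")
}
```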
When `--prompt` is set, the value is prepended to the user message as a "Producer's notes" block above the transcript. The bundled prompt instructs the LLM to treat producer's notes as authoritative for titles, speaker names, framing, and key points, then use the transcript to expand and enrich them. Use this when the Spotify show notes you've already drafted should drive the summary's framing, rather than having the LLM infer everything from scratch.
For longer notes, use shell expansion: `--prompt "$(cat notes.md)"`.
Note: `--prompt-summary` (system prompt template path) and `--prompt` (user notes content) are different flags. The former overrides the system prompt; the latter adds content to the user message.
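A sketch of how the user message could be assembled when `--prompt` is set. Only the "Producer's notes:" block placed above the transcript comes from this document; the `Transcript:` label, the blank-line spacing, and `buildUserMessage` itself are assumptions.

```go
package main

import "strings"

// buildUserMessage assembles the content passed verbatim to
// Summarizer.Summarize: optional producer's notes first, then the transcript.
func buildUserMessage(notes, transcript string) string {
	var b strings.Builder
	if n := strings.TrimSpace(notes); n != "" {
		b.WriteString("Producer's notes:\n")
		b.WriteString(n)
		b.WriteString("\n\n")
	}
	b.WriteString("Transcript:\n")
	b.WriteString(transcript)
	return b.String()
}
```

Because the caller owns all labeling, an empty `--prompt` leaves a transcript-only message rather than an empty notes block.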
--clip flags
| flag | purpose | default |
|---|---|---|
| `--min` | minimum clip length (seconds) | 60 |
| `--max` | maximum clip length (seconds) | 90 |
| `--out PATH` | clip output path | `<input>.clip<ext>` (`.clip.m4a` for audio) |
| `--copy-codec` | ffmpeg `-c copy` (fast, keyframe-aligned); skips the 9:16 portrait crop, since stream copy can't apply video filters | off |
| `--dry-run` | print the picked window but don't run ffmpeg | off |
Unless `--copy-codec` is set, video clips are re-encoded as 1080×1920 portrait (9:16) with a center
crop, capped at 1 GiB via ffmpeg's `-fs`. The crop filter is
`crop=min(iw,ih*9/16):min(ih,iw*16/9)`, so any source aspect (16:9, 4:3, 1:1, or
already-portrait) yields the largest 9:16 sub-rectangle without distortion. See
`portraitFilter` and `MaxClipBytes` in `internal/clip/clip.go`.
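To see why that expression always yields a 9:16 window, here is the same arithmetic in Go. It uses integer division, so fractional widths truncate; ffmpeg evaluates the expression itself and may round differently. `cropDims` and `maxClipBytes` are illustrative names, not `portraitFilter` or `MaxClipBytes`.

```go
package main

// maxClipBytes mirrors the documented 1 GiB cap passed to ffmpeg's -fs.
const maxClipBytes = 1 << 30

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// cropDims evaluates crop=min(iw,ih*9/16):min(ih,iw*16/9) for an iw x ih
// source. At least one side always matches the source: wide sources keep
// full height, sources narrower than 9:16 keep full width.
func cropDims(iw, ih int) (w, h int) {
	return minInt(iw, ih*9/16), minInt(ih, iw*16/9)
}
```

A 1920×1080 source keeps full height and crops to 607×1080, while a portrait source narrower than 9:16 (e.g. 720×1920) keeps full width and crops to 720×1280; either way the result is then scaled to 1080×1920.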
Conventions / non-obvious choices
- Spelling: `summerize` is intentional. It's the original name of the project and the user's preferred spelling. Use `summerize` (e.g. for `--summerize`) rather than auto-correcting to `summarize` in user-facing surfaces. The internal Go package `internal/summarize` keeps the standard spelling.
- Pluggable `Summarizer` is shared between modes. `--clip` reuses the same `Summarizer` interface; the only differences are the prompt and the expectation of JSON output. If you add a new mode that talks to an LLM, plug it in there.
- `Summarizer.Summarize` takes the user content verbatim. No implicit `"Transcript:"` prefix or other framing. Callers (`doSummerize` in `main.go`, `clip.Pick`) build the full user message themselves; that's how `--prompt` (producer's notes) prepends a "Producer's notes:" block above the transcript without the message getting mislabeled.
- Whisper output is the source of truth. All text-only consumers go through `transcribe.PlainText(segs)`; we don't run whisper twice.
- JSON parsing for clip selection is defensive. `clip.extractJSONObject` walks balanced braces (skipping strings), so the model can wrap its answer in prose even though the prompt asks for raw JSON.
- Clip extraction defaults to re-encode. Frame-accurate cuts matter for short social hooks; `--copy-codec` trades that for speed.
- The Anthropic API call uses `net/http` directly. Adding the SDK was tempting, but the request is one POST and avoiding the dep keeps `go.sum` empty.
- `prepareWAV` cleanup is owned by the caller. It returns a `func()` you must `defer`. Don't call `os.RemoveAll` on the wav path yourself.
- No subcommands. The CLI is one flat flagset. Modes are boolean flags, so multiple can run in one invocation and share state.
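The balanced-brace walk described for `clip.extractJSONObject` can be sketched as follows. This version only starts tracking JSON strings once the first `{` is seen, so quotes or apostrophes in leading prose don't confuse it; that detail, and the body as a whole, are assumptions rather than the project's implementation.

```go
package main

import "errors"

// extractJSONObject returns the first balanced {...} object in s. Braces
// inside JSON strings (including after escaped quotes) are ignored.
func extractJSONObject(s string) (string, error) {
	start, depth := -1, 0
	inString, escaped := false, false
	for i := 0; i < len(s); i++ { // byte scan is safe: all delimiters are ASCII
		c := s[i]
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			if start >= 0 {
				inString = true
			}
		case '{':
			if start < 0 {
				start = i
			}
			depth++
		case '}':
			if start >= 0 {
				depth--
				if depth == 0 {
					return s[start : i+1], nil
				}
			}
		}
	}
	return "", errors.New("no complete JSON object found")
}
```

The returned substring can then go straight to `json.Unmarshal`, which reports any remaining syntax errors.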
Build / install
Fresh machine (recommended): clone, then run the interactive installer.
It detects OS + GPU, builds whisper.cpp with the right backend, downloads a
ggml model, and links `publish` + `whisper-cli-<backend>` into `~/.local/bin`:
git clone <repo-url> ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install # interactive
make doctor # just print detected platform/GPU/dependencies
Re-runnable; each step is idempotent and skippable. The script supports Arch
(pacman), Debian/Ubuntu (apt), Fedora (dnf), and macOS (brew); for
unknown distros it prints the package list and skips the install command.
Already built once — just rebuild:
go build -o publish .
# or
make link # rebuilds + (re)points ~/.local/bin/publish at the repo
The symlink at ~/.local/bin/publish is the canonical install location;
rebuilds update in place via the symlink.
External dependencies (runtime)
| tool | required for | install |
|---|---|---|
| `ffmpeg` | always (audio extraction + clip cut) | `pacman -S ffmpeg` |
| `whisper-cli` (whisper.cpp) | transcription | `pacman -S whisper.cpp` for CPU; for GPU acceleration see the CUDA, ROCm, Vulkan, and Metal build sections below |
| ggml whisper model | transcription | `curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin` into `~/.cache/whisper.cpp/` |
| `claude` CLI | `--summarizer claude-cli` (default) | already installed (Claude Code) |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag (Spotify HTML to clipboard) | `wl-copy` ships with Wayland on Omarchy |
Backend auto-detect
When `--whisper-bin` is not set, `resolveBin` in
`internal/transcribe/whispercpp.go` picks a backend at runtime:

- CUDA: if `~/.local/bin/whisper-cli-cuda` (or `whisper-cli-cuda` on PATH) exists and `nvidia-smi -L` exits 0.
- ROCm: if `whisper-cli-rocm` exists and `rocminfo` exits 0.
- Vulkan: if `whisper-cli-vulkan` exists and `vulkaninfo --summary` exits 0.
- CPU fallback: first of `whisper-cli` / `whisper-cpp` / `main` on PATH.

Each probe is gated on a 5s timeout. The chosen backend is logged on a
single stderr line (`whisper: using CUDA backend (/path)`); `-v` adds
diagnostics about which probes were skipped or failed. The convention is
one whisper.cpp checkout per host with a per-backend symlink in
`~/.local/bin/whisper-cli-<backend>`, so the same `publish` binary works
across machines without machine-specific flags.
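The resolution order above, sketched in Go with `exec.CommandContext` and a 5s `context.WithTimeout` per probe. It also includes the Darwin special case described under the Metal build. Checking `~/.local/bin` before PATH for every backend (not just CUDA) and the helper names are assumptions; the real `resolveBin` additionally logs skipped probes under `-v`.

```go
package main

import (
	"context"
	"os"
	"os/exec"
	"path/filepath"
	"runtime"
	"time"
)

// probeOK runs a detection command with a 5s timeout; true means exit 0.
func probeOK(argv ...string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, argv[0], argv[1:]...).Run() == nil
}

// findBin checks ~/.local/bin/<name> first, then PATH.
func findBin(name string) (string, bool) {
	if home, err := os.UserHomeDir(); err == nil {
		p := filepath.Join(home, ".local", "bin", name)
		if _, err := os.Stat(p); err == nil {
			return p, true
		}
	}
	if p, err := exec.LookPath(name); err == nil {
		return p, true
	}
	return "", false
}

// resolveBinSketch: Metal on macOS (no probe), then CUDA, ROCm, Vulkan
// (binary present and probe exits 0), then the first CPU build on PATH.
func resolveBinSketch() (backend, path string, ok bool) {
	if runtime.GOOS == "darwin" {
		if p, found := findBin("whisper-cli-metal"); found {
			return "Metal", p, true
		}
	}
	gpus := []struct {
		name, bin string
		probe     []string
	}{
		{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
		{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
		{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
	}
	for _, g := range gpus {
		if p, found := findBin(g.bin); found && probeOK(g.probe...) {
			return g.name, p, true
		}
	}
	for _, name := range []string{"whisper-cli", "whisper-cpp", "main"} {
		if p, err := exec.LookPath(name); err == nil {
			return "CPU", p, true
		}
	}
	return "", "", false
}
```

Checking for the binary before running its probe means hosts without, say, a ROCm build never pay for a `rocminfo` call.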
CUDA build (RTX 3070 Ti / desktop)
sudo pacman -S --needed cuda # ~3GB; installs to /opt/cuda
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
PATH=/opt/cuda/bin:$PATH cmake -B build \
-DGGML_CUDA=1 \
-DCMAKE_CUDA_ARCHITECTURES=86 \
-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15
PATH=/opt/cuda/bin:$PATH cmake --build build -j$(nproc) --config Release
ln -sf "$PWD/build/bin/whisper-cli" ~/.local/bin/whisper-cli-cuda
CUDA 13.2 caps the host compiler at GCC 15 and the system gcc is 16, so the
`-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15` line is required (the `gcc15`
package ships `g++-15` alongside the default toolchain). `CMAKE_CUDA_ARCHITECTURES=86`
(sm_86) matches the RTX 3070 Ti's compute capability 8.6; adjust it if the GPU changes.
CUDA smoke test — these stderr lines should appear in any run:
whisper_init_with_params_no_state: use gpu = 1
ggml_cuda_init: found 1 CUDA devices ...
whisper_backend_init_gpu: using CUDA0 backend
ROCm build (Framework 16 / Radeon RX 7700S)
The 7700S is RDNA3 (gfx1102). ROCm 6.x supports it.
sudo pacman -S --needed rocm-hip-sdk rocm-hip-runtime hipblas rocblas
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build-rocm \
-DGGML_HIP=1 \
-DAMDGPU_TARGETS=gfx1102 \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm -j$(nproc)
ln -sf "$PWD/build-rocm/bin/whisper-cli" ~/.local/bin/whisper-cli-rocm
If ROCm doesn't recognize gfx1102 (older ROCm releases), set
`HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell before invoking `publish`
to spoof gfx1100, which shares the RDNA3 ISA and has supported kernels.
ROCm smoke test — look for ggml_cuda_init (HIP reuses the CUDA backend
naming in whisper.cpp) plus a ROCm device line on stderr.
Vulkan build (universal GPU fallback)
Vulkan is the easiest cross-vendor path: it uses any GPU with a working Vulkan driver (Mesa RADV for AMD, Mesa ANV for Intel, the Nvidia proprietary driver, etc.).
sudo pacman -S --needed vulkan-headers vulkan-icd-loader shaderc
cd ~/Git\ Repos/whisper.cpp
cmake -B build-vulkan -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)
ln -sf "$PWD/build-vulkan/bin/whisper-cli" ~/.local/bin/whisper-cli-vulkan
Slower than native CUDA/ROCm but works on machines where the vendor toolchain is too painful to install. Useful as a portable fallback for laptops with iGPUs.
Metal build (Apple Silicon)
`make install` handles this automatically. The manual recipe is short
because whisper.cpp's CMake build enables Metal by default on macOS, so no special flag is needed.
Prerequisites:
xcode-select --install # one-time
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install go cmake ffmpeg
Build:
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
cmake -B build-metal -DCMAKE_BUILD_TYPE=Release
cmake --build build-metal -j$(sysctl -n hw.ncpu)
ln -sf "$PWD/build-metal/bin/whisper-cli" ~/.local/bin/whisper-cli-metal
The resolver special-cases Darwin: if `whisper-cli-metal` exists it's
used immediately (no probe, since Metal is always available on macOS). On a
Mac without that symlink, the CPU fallback finds `whisper-cli` /
`whisper-cpp` from brew (which is itself Metal-enabled by default), so a
plain `brew install whisper-cpp` is a workable lazy path. The only catch is
that the `publish` log line says "CPU backend" even though whisper.cpp is in fact
running Metal kernels.
Metal smoke test — these stderr lines should appear in any run:
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 ...
whisper_backend_init_gpu: using Metal backend
Tests
go test ./...
Covered:
- `internal/output/spotify_test.go`: markdown→Spotify-HTML conversion, escaping
- `internal/clip/clip_test.go`: JSON object extraction, including prose-wrapped and fence-wrapped model output
There are no integration tests for whisper or the LLM calls — those depend on external binaries and remote APIs.
Future work
- `--post`: post the markdown summary as a Spotify for Podcasters episode description. Requires the Spotify show-notes API or Spotify for Podcasters upload integration. Reuse `output.MarkdownToSpotifyHTML`, since their show-notes editor accepts that subset.
- Multi-clip output: pick the top-N hooks instead of one. The current `Selection` would become `[]Selection` and the prompt would request an array.
- Faster `--summarizer` for short transcripts: default to Haiku for very short inputs to save on API costs.
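For the multi-clip idea, a sketch of decoding a JSON array of selections and dropping windows outside the `--min`/`--max` bounds. The `Selection` fields and JSON keys here are hypothetical, not the current `clip.Selection`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Selection is a hypothetical shape for one picked window, in seconds.
type Selection struct {
	Start  float64 `json:"start"`
	End    float64 `json:"end"`
	Reason string  `json:"reason"`
}

// parseSelections decodes an array and keeps windows whose length is
// within [minSec, maxSec], preserving the model's ranking order.
func parseSelections(raw string, minSec, maxSec float64) ([]Selection, error) {
	var all []Selection
	if err := json.Unmarshal([]byte(raw), &all); err != nil {
		return nil, fmt.Errorf("parse selections: %w", err)
	}
	var kept []Selection
	for _, s := range all {
		if d := s.End - s.Start; d >= minSec && d <= maxSec {
			kept = append(kept, s)
		}
	}
	return kept, nil
}
```

The existing brace-walking extractor only finds objects, so an array response would also need a bracket-aware variant before this parse step.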