Initial push to gitea

2026-05-10 13:37:17 -06:00
commit 54629aecad
20 changed files with 2381 additions and 0 deletions

CLAUDE.md

# publish
Go CLI that turns a local audio/video recording of a church service into:
1. A markdown summary (`--summerize`)
2. A 60-90s social-media hook clip cut from the source (`--clip`)
3. (Future) A post to Spotify for Podcasters (`--post`, currently a stub)
The repo directory is still `summerize/` (historical), but the module and binary
are both `publish`.
## Pipeline (one pass, shared by all modes)
```
input ──ffmpeg──► 16kHz mono WAV ──whisper.cpp -oj──► []Segment{Start,End,Text}
                                                ┌─────────────────┴─────────────────┐
                                                │                                   │
                                         PlainText(segs)                   FormatForLLM(segs)
                                                │                                   │
                                                ▼                                   ▼
                                           Summarizer                           clip.Pick
                                        (Anthropic API or                   (same Summarizer,
                                     shelled-out claude CLI)            different prompt → JSON)
                                                │                                   │
                                                ▼                                   ▼
                                        markdown summary                 ffmpeg cut [start,end]
                                                │
                                                └─► output.MarkdownToSpotifyHTML
                                                    (b/i/a/ul/ol/li/p subset
                                                     that Spotify show notes accept)
```
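The two text views in the diagram can be sketched roughly as below. This is a minimal illustration, not the contents of `segments.go`: it assumes `Segment` stores seconds as `float64` and that `FormatForLLM` prefixes each line with an `[mm:ss-mm:ss]` range so the clip selector can answer with concrete times. `mmss` is a hypothetical stand-in for the mm:ss helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment mirrors the {Start, End, Text} shape in the diagram.
// Seconds as float64 is an assumption for this sketch.
type Segment struct {
	Start, End float64 // seconds from the start of the recording
	Text       string
}

// mmss renders seconds as mm:ss; minutes may exceed 59 for long services.
func mmss(sec float64) string {
	s := int(sec)
	return fmt.Sprintf("%02d:%02d", s/60, s%60)
}

// PlainText joins the trimmed segment text into one block for the summary prompt.
func PlainText(segs []Segment) string {
	parts := make([]string, 0, len(segs))
	for _, s := range segs {
		if t := strings.TrimSpace(s.Text); t != "" {
			parts = append(parts, t)
		}
	}
	return strings.Join(parts, " ")
}

// FormatForLLM writes one line per segment with its time range, so the
// clip selector can reply with start/end times (hypothetical line format).
func FormatForLLM(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		fmt.Fprintf(&b, "[%s-%s] %s\n", mmss(s.Start), mmss(s.End), strings.TrimSpace(s.Text))
	}
	return b.String()
}
```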
Whisper output is cached at `<input>.segments.json`. Subsequent runs (different
modes, different prompts, different clip parameters) skip whisper entirely.
## Layout
```
main.go                    flat flagset, mode dispatch, orchestration
prompts/church-service.md  default --summerize prompt (go:embed)
prompts/clip-selector.md   default --clip prompt; templated with
                           {{MIN_SECONDS}} / {{MAX_SECONDS}} (go:embed)
internal/audio/audio.go    ffmpeg → 16kHz mono PCM WAV
internal/transcribe/
  transcribe.go            Transcriber interface, Segment type
  segments.go              Segment, PlainText, FormatForLLM, mm:ss helper
  whispercpp.go            shells out to whisper-cli with -oj; parses JSON
internal/summarize/
  summarize.go             Summarizer interface
  anthropic.go             direct Messages API via net/http
                           (no SDK dep; reads ANTHROPIC_API_KEY)
  claudecli.go             `claude -p <prompt>` with transcript on stdin
internal/clip/
  clip.go                  Selection, Pick (LLM JSON parse), Extract (ffmpeg)
  clip_test.go             JSON object extraction edge cases
internal/output/
  spotify.go               markdown → Spotify-safe HTML
  spotify_test.go
  clipboard.go             wl-copy / xclip / pbcopy
Makefile                   build/install/link/doctor/uninstall targets
scripts/install.sh         interactive setup (OS + GPU detect → deps,
                           whisper.cpp build, model download, link)
```
**Zero external Go dependencies.** Stdlib only.
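Since everything is stdlib, parsing whisper.cpp's `-oj` output is just `encoding/json`. The sketch below assumes the JSON shape recent whisper.cpp builds write: a top-level `transcription` array whose entries carry `offsets.from`/`offsets.to` in milliseconds plus `text`. Verify the field names against the JSON your build emits; `parseWhisperJSON` and `whisperJSON` are hypothetical names, not the code in `whispercpp.go`.

```go
package main

import "encoding/json"

// Segment in seconds, as consumed by the rest of the pipeline (assumed shape).
type Segment struct {
	Start, End float64
	Text       string
}

// whisperJSON models only the fields used here. The "timestamps" strings
// (HH:MM:SS,mmm) that whisper.cpp also writes are ignored.
type whisperJSON struct {
	Transcription []struct {
		Offsets struct {
			From int64 `json:"from"` // milliseconds
			To   int64 `json:"to"`   // milliseconds
		} `json:"offsets"`
		Text string `json:"text"`
	} `json:"transcription"`
}

// parseWhisperJSON converts whisper.cpp -oj output into Segments.
func parseWhisperJSON(data []byte) ([]Segment, error) {
	var w whisperJSON
	if err := json.Unmarshal(data, &w); err != nil {
		return nil, err
	}
	segs := make([]Segment, 0, len(w.Transcription))
	for _, t := range w.Transcription {
		segs = append(segs, Segment{
			Start: float64(t.Offsets.From) / 1000,
			End:   float64(t.Offsets.To) / 1000,
			Text:  t.Text,
		})
	}
	return segs, nil
}
```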
## CLI surface
```
publish [mode...] [flags] <input>

modes (combine freely; defaults to --summerize):
  --summerize    write a markdown summary
  --clip         cut a 60-90s social hook clip
  --post         post to Spotify (not implemented yet)
```
Modes share whisper output, so `publish --summerize --clip sermon.mp4` only
transcribes once.
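One way to get "combine freely; defaults to --summerize" out of a single flat flagset is sketched below. `parseModes` and the `modes` struct are hypothetical; only the flag names come from the usage text.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// modes holds one boolean per mode flag (hypothetical struct).
type modes struct{ summerize, clip, post bool }

// parseModes registers each mode as a boolean flag on one flat flagset and
// implies --summerize when no mode is given.
func parseModes(args []string) (modes, string, error) {
	fs := flag.NewFlagSet("publish", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet; real code would print usage
	var m modes
	fs.BoolVar(&m.summerize, "summerize", false, "write a markdown summary")
	fs.BoolVar(&m.clip, "clip", false, "cut a 60-90s social hook clip")
	fs.BoolVar(&m.post, "post", false, "post to Spotify (not implemented yet)")
	if err := fs.Parse(args); err != nil {
		return m, "", err
	}
	if !m.summerize && !m.clip && !m.post {
		m.summerize = true // default mode
	}
	if fs.NArg() != 1 {
		return m, "", errors.New("usage: publish [mode...] [flags] <input>")
	}
	return m, fs.Arg(0), nil
}
```

Go's `flag` package stops parsing at the first non-flag argument, so this shape works when `<input>` comes last, as in the usage line; it also accepts the double-dash `--clip` spelling.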
### Shared flags
| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` or `claude-api` | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override summary prompt path | bundled |
| `--prompt-clip` | override clip-selector prompt path | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects the best backend (see "Backend auto-detect" below) | auto |
| `--whisper-model` | path to ggml model | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `<input>.segments.json` |
| `--keep-transcript` | also write `<input>.transcript.txt` | off |
| `--keep-wav` | keep the normalized WAV instead of tempdir | off |
| `-v` | verbose progress to stderr | off |
### --summerize flags
| flag | purpose | default |
|---|---|---|
| `--prompt` | producer's notes (any pre-written framing, title, key points) that anchor the summary | empty |
| `--md PATH` | markdown output; `-` = stdout, `""` = disable | `<input>.summary.md` |
| `--spotify PATH` | Spotify HTML output; `-` = stdout | disabled |
| `--copy` | copy Spotify HTML to clipboard | off |
When `--prompt` is set, the value is prepended to the user message as a "Producer's notes" block above the transcript. The bundled prompt instructs the LLM to treat producer's notes as authoritative for titles, speaker names, framing, and key points, then use the transcript to expand and enrich them. Use this when the Spotify show notes you've already drafted should drive the summary's framing rather than the LLM inferring everything from scratch.
For longer notes, use shell expansion: `--prompt "$(cat notes.md)"`.
Note: `--prompt-summary` (system prompt template path) and `--prompt` (user notes content) are different flags. The former overrides the *system* prompt; the latter adds *user content* above the transcript in the user message.
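The user message handed to the Summarizer might be assembled like this. A sketch only: the `Producer's notes:` label follows the prose above, while the `Transcript:` label, the blank-line layout, and the name `buildUserMessage` are assumptions.

```go
package main

import "strings"

// buildUserMessage assembles the user content passed verbatim to the
// Summarizer: an optional producer's-notes block, then the transcript.
func buildUserMessage(notes, transcript string) string {
	var b strings.Builder
	if n := strings.TrimSpace(notes); n != "" {
		b.WriteString("Producer's notes:\n")
		b.WriteString(n)
		b.WriteString("\n\n")
	}
	b.WriteString("Transcript:\n") // label is an assumption, added by the caller
	b.WriteString(transcript)
	return b.String()
}
```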
### --clip flags
| flag | purpose | default |
|---|---|---|
| `--min` | minimum clip length (seconds) | 60 |
| `--max` | maximum clip length (seconds) | 90 |
| `--out PATH` | clip output path | `<input>.clip<ext>` (`.clip.m4a` for audio) |
| `--copy-codec` | ffmpeg `-c copy` (fast, keyframe-aligned) — **skips the 9:16 portrait crop**, since stream copy can't apply video filters | off |
| `--dry-run` | print the picked window but don't run ffmpeg | off |
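`--min` and `--max` presumably feed the `{{MIN_SECONDS}}` / `{{MAX_SECONDS}}` placeholders in `prompts/clip-selector.md`. A sketch using `strings.NewReplacer`; the helper name `renderClipPrompt` and the up-front range check are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// renderClipPrompt fills the clip-length placeholders in the selector prompt
// and rejects an empty or inverted range before any LLM call is made.
func renderClipPrompt(tmpl string, minSec, maxSec int) (string, error) {
	if minSec <= 0 || maxSec < minSec {
		return "", fmt.Errorf("invalid clip range %d-%d", minSec, maxSec)
	}
	r := strings.NewReplacer(
		"{{MIN_SECONDS}}", strconv.Itoa(minSec),
		"{{MAX_SECONDS}}", strconv.Itoa(maxSec),
	)
	return r.Replace(tmpl), nil
}
```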
Video clips are always re-encoded as **1080×1920 portrait (9:16)** with a center
crop, capped at **1 GiB** via ffmpeg's `-fs`. The crop filter is
`crop=min(iw,ih*9/16):min(ih,iw*16/9)` so any source aspect (16:9, 4:3, 1:1, or
already-portrait) yields the largest 9:16 sub-rectangle without distortion. See
`portraitFilter` and `MaxClipBytes` in `internal/clip/clip.go`.
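To see why that expression always yields a 9:16 window, here it is evaluated in Go. This is a sketch; `portraitCrop`, `portraitVF`, and `maxClipBytes` are hypothetical, not the `portraitFilter` / `MaxClipBytes` definitions. For landscape input the height is the binding side (1920×1080 crops to 607.5×1080); for portrait input the width is. The `-vf` string escapes the commas inside `min(...)` because an unescaped comma separates filters in an ffmpeg filtergraph; the trailing `scale=1080:1920` is an assumption about how the 1080×1920 output is produced.

```go
package main

import "math"

// portraitCrop evaluates crop=min(iw,ih*9/16):min(ih,iw*16/9) for an
// input of iw x ih pixels and returns the crop width and height.
func portraitCrop(iw, ih float64) (w, h float64) {
	return math.Min(iw, ih*9/16), math.Min(ih, iw*16/9)
}

// Hypothetical -vf value: "\," keeps the commas inside the crop expressions
// from being read as filter separators.
const portraitVF = `crop=min(iw\,ih*9/16):min(ih\,iw*16/9),scale=1080:1920`

// 1 GiB in bytes, the value passed to ffmpeg's -fs.
const maxClipBytes = 1 << 30
```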
## Conventions / non-obvious choices
- **Spelling: `summerize` is intentional.** It's the original name of the project
and the user's preferred spelling. Use `summerize` (e.g. for `--summerize`)
rather than auto-correcting to `summarize` in user-facing surfaces. Internal
Go package `internal/summarize` keeps the standard spelling.
- **Pluggable Summarizer is shared between modes.** `--clip` reuses the same
Summarizer interface; the only difference is the prompt and the expectation
of JSON output. If you add a new mode that talks to an LLM, plug it in there.
- **Summarizer.Summarize takes the user content verbatim.** No implicit
`"Transcript:"` prefix or other framing. Callers (`doSummerize` in main.go,
`clip.Pick`) build the full user message themselves — that's how
`--prompt` (producer's notes) prepends a "Producer's notes:" block above
the transcript without the message getting mislabeled.
- **Whisper output is the source of truth.** All text-only consumers go through
`transcribe.PlainText(segs)`; we don't run whisper twice.
- **JSON parsing for clip selection is defensive.** `clip.extractJSONObject`
walks balanced braces (skipping strings) so the model can wrap its answer in
prose despite the prompt asking for raw JSON.
- **Clip extraction defaults to re-encode.** Frame-accurate cuts matter for
short social hooks; `--copy-codec` trades that for speed.
- **Anthropic API call uses net/http directly.** Adding the SDK was tempting,
but the request is one POST and avoiding the dep keeps go.sum empty.
- **`prepareWAV` cleanup is owned by the caller.** It returns a `func()` you
must `defer`. Don't call `os.RemoveAll` on the wav path yourself.
- **No subcommands.** The CLI is one flat flagset. Modes are boolean flags so
multiple can run in one invocation and share state.
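The balanced-brace walk described under **JSON parsing for clip selection is defensive** can be sketched as follows. It illustrates the technique and is not the actual `clip.extractJSONObject`: it returns the first balanced `{...}`, tracks string literals (including backslash escapes) so braces inside `"reason"` text don't count, and ignores quote characters in prose before the object starts.

```go
package main

import "errors"

// extractJSONObject returns the first balanced {...} object in s, ignoring
// braces that appear inside JSON string literals.
func extractJSONObject(s string) (string, error) {
	start, depth := -1, 0
	inString, escaped := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if inString {
			switch {
			case escaped:
				escaped = false // this byte was escaped; take it literally
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			if start >= 0 { // quotes in prose before the object are ignored
				inString = true
			}
		case '{':
			if start < 0 {
				start = i
			}
			depth++
		case '}':
			if start >= 0 {
				depth--
				if depth == 0 {
					return s[start : i+1], nil
				}
			}
		}
	}
	return "", errors.New("no complete JSON object found")
}
```

A stricter variant could run `json.Valid` on the candidate and keep scanning if prose before the answer happened to contain a stray `{`.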
## Build / install
**Fresh machine (recommended)** — clone, then run the interactive installer.
It detects OS + GPU, builds whisper.cpp with the right backend, downloads a
ggml model, and links `publish` + `whisper-cli-<backend>` into `~/.local/bin`:
```bash
git clone <repo-url> ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install # interactive
make doctor # just print detected platform/GPU/dependencies
```
Re-runnable; each step is idempotent and skippable. The script supports Arch
(`pacman`), Debian/Ubuntu (`apt`), Fedora (`dnf`), and macOS (`brew`); for
unknown distros it prints the package list and skips the install command.
**Already built once** — just rebuild:
```bash
go build -o publish .
# or
make link # rebuilds + (re)points ~/.local/bin/publish at the repo
```
The symlink at `~/.local/bin/publish` is the canonical install location;
rebuilds update in place via the symlink.
## External dependencies (runtime)
| tool | required for | install |
|---|---|---|
| `ffmpeg` | always (audio extraction + clip cut) | `pacman -S ffmpeg` |
| `whisper-cli` (whisper.cpp) | transcription | `pacman -S whisper.cpp` for CPU; for GPU acceleration see "GPU builds" below |
| ggml whisper model | transcription | `curl -L --create-dirs -o ~/.cache/whisper.cpp/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin` |
| `claude` CLI | `--summarizer claude-cli` (default) | already installed (Claude Code) |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag (Spotify HTML to clipboard) | `wl-copy` ships with Wayland on Omarchy |
## Backend auto-detect
When `--whisper-bin` is not set, `resolveBin` in
`internal/transcribe/whispercpp.go` picks a backend at runtime:
1. **CUDA** — if `~/.local/bin/whisper-cli-cuda` (or `whisper-cli-cuda` on PATH) exists *and* `nvidia-smi -L` exits 0.
2. **ROCm** — if `whisper-cli-rocm` exists *and* `rocminfo` exits 0.
3. **Vulkan** — if `whisper-cli-vulkan` exists *and* `vulkaninfo --summary` exits 0.
4. **CPU fallback** — first of `whisper-cli` / `whisper-cpp` / `main` on PATH.
Each probe is gated on a 5s timeout. The chosen backend is logged on a
single stderr line (`whisper: using CUDA backend (/path)`); `-v` adds
diagnostics about which probes were skipped or failed. The convention is
one whisper.cpp checkout per host with a per-backend symlink in
`~/.local/bin/whisper-cli-<backend>`, so the same `publish` binary works
across machines without machine-specific flags.
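The probe order above could be expressed roughly like this. A sketch only: `backend`, `pickBackend`, and `probeOK` are hypothetical names, the `find` callback stands in for the `~/.local/bin`-then-PATH lookup, and the Darwin branch mirrors the Metal special case described under the Metal build. Injecting `find` and `probe` lets the ordering be tested on a machine with no GPU.

```go
package main

import (
	"context"
	"os/exec"
	"time"
)

type backend struct {
	name  string   // label for the stderr log line
	bin   string   // per-backend symlink name
	probe []string // detection command that must exit 0
}

// Probe order follows the numbered list above.
var backends = []backend{
	{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
	{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
	{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
}

// probeOK runs argv with a 5s timeout and reports whether it exited 0.
func probeOK(argv []string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, argv[0], argv[1:]...).Run() == nil
}

// pickBackend returns the chosen backend name and binary path, or "" if none.
// find returns a binary's path or "" when it is absent.
func pickBackend(goos string, find func(string) string, probe func([]string) bool) (name, path string) {
	if goos == "darwin" {
		if p := find("whisper-cli-metal"); p != "" {
			return "Metal", p // no probe: Metal is always available on macOS
		}
	}
	for _, b := range backends {
		if p := find(b.bin); p != "" && probe(b.probe) {
			return b.name, p
		}
	}
	for _, bin := range []string{"whisper-cli", "whisper-cpp", "main"} {
		if p := find(bin); p != "" {
			return "CPU", p
		}
	}
	return "", ""
}
```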
### CUDA build (RTX 3070 Ti / desktop)
```
sudo pacman -S --needed cuda # ~3GB; installs to /opt/cuda
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
PATH=/opt/cuda/bin:$PATH cmake -B build \
-DGGML_CUDA=1 \
-DCMAKE_CUDA_ARCHITECTURES=86 \
-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15
PATH=/opt/cuda/bin:$PATH cmake --build build -j$(nproc) --config Release
ln -sf "$PWD/build/bin/whisper-cli" ~/.local/bin/whisper-cli-cuda
```
CUDA 13.2 caps the host compiler at GCC 15; system gcc is 16, so the
`-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15` line is required (the `gcc15`
package ships `g++-15` alongside the default toolchain). `sm_86` matches
the RTX 3070 Ti compute capability — adjust if the GPU changes.
CUDA smoke test — these stderr lines should appear in any run:
```
whisper_init_with_params_no_state: use gpu = 1
ggml_cuda_init: found 1 CUDA devices ...
whisper_backend_init_gpu: using CUDA0 backend
```
### ROCm build (Framework 16 / Radeon RX 7700S)
The 7700S is RDNA3 (gfx1102). ROCm 6.x supports it.
```
sudo pacman -S --needed rocm-hip-sdk rocm-hip-runtime hipblas rocblas
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build-rocm \
-DGGML_HIP=1 \
-DAMDGPU_TARGETS=gfx1102 \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm -j$(nproc)
ln -sf "$PWD/build-rocm/bin/whisper-cli" ~/.local/bin/whisper-cli-rocm
```
If ROCm doesn't recognize gfx1102 (older ROCm releases), set
`HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell before invoking `publish`
to spoof gfx1100 (same RDNA3 ISA, supported kernels). The runtime then
looks for gfx1100 kernels, so build with `-DAMDGPU_TARGETS=gfx1100` in
that case.
ROCm smoke test — look for `ggml_cuda_init` (HIP reuses the CUDA backend
naming in whisper.cpp) plus a ROCm device line on stderr.
### Vulkan build (universal GPU fallback)
Vulkan is the easiest cross-vendor path: it works on any GPU with a working
Vulkan driver (Mesa RADV for AMD, Mesa ANV for Intel, the NVIDIA proprietary driver, etc.).
```
sudo pacman -S --needed vulkan-headers vulkan-icd-loader shaderc
cd ~/Git\ Repos/whisper.cpp
cmake -B build-vulkan -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)
ln -sf "$PWD/build-vulkan/bin/whisper-cli" ~/.local/bin/whisper-cli-vulkan
```
Slower than native CUDA/ROCm but works on machines where the vendor
toolchain is too painful to install. Useful as a portable fallback for
laptops with iGPUs.
### Metal build (Apple Silicon)
`make install` handles this automatically; the manual recipe is short
because cmake on macOS picks up Metal by default — no special flag.
Prerequisites:
```
xcode-select --install # one-time
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install go cmake ffmpeg
```
Build:
```
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
cmake -B build-metal -DCMAKE_BUILD_TYPE=Release
cmake --build build-metal -j$(sysctl -n hw.ncpu)
ln -sf "$PWD/build-metal/bin/whisper-cli" ~/.local/bin/whisper-cli-metal
```
The resolver special-cases Darwin: if `whisper-cli-metal` exists it's
used immediately (no probe — Metal is always available on macOS). On a
Mac without that symlink, the CPU fallback finds `whisper-cli` /
`whisper-cpp` from brew (which is itself Metal-enabled by default), so a
plain `brew install whisper-cpp` is a workable lazy path. It just shows
"CPU backend" in the publish log line even though whisper.cpp is in fact
running Metal kernels.
Metal smoke test — these stderr lines should appear in any run:
```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 ...
whisper_backend_init_gpu: using Metal backend
```
## Tests
```
go test ./...
```
Covered:
- `internal/output/spotify_test.go` — markdown→Spotify-HTML conversion, escaping
- `internal/clip/clip_test.go` — JSON object extraction, including prose-wrapped
and fence-wrapped model output
There are no integration tests for whisper or the LLM calls — those depend on
external binaries and remote APIs.
## Future work
- **`--post`**: post the markdown summary as a Spotify for Podcasters episode
description. Requires the Spotify show-notes API or Spotify for Podcasters
upload integration. Reuse `output.MarkdownToSpotifyHTML` since their show-
notes editor accepts that subset.
- **Multi-clip output**: pick the top-N hooks instead of one. The current
`Selection` would become `[]Selection` and the prompt would request an array.
- **Faster, cheaper `--summarizer` for short transcripts**: default to Haiku for very
short inputs to save on latency and API costs.