# publish
Turn a recorded church service into show-notes and a social hook clip in one
pass. Local transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp),
LLM summary via Claude, ffmpeg-cut portrait clip — wired together in a single Go
CLI.
```
publish [--summerize] [--clip] [--post] [flags] <audio-or-video>
```
## What it does
Given an audio or video recording (mp4, m4a, mp3, wav, ...), `publish` will:
1. **Transcribe** the audio locally with whisper.cpp (CUDA / ROCm / Vulkan /
Metal / CPU — picked automatically per machine).
2. **Summarize** (`--summerize`) the sermon into a Markdown document with
speaker, scripture references (KJV), key points, and a memorable quote.
Optionally also emit Spotify-for-Podcasters-friendly HTML.
3. **Clip** (`--clip`) a 60–90 second hook from the preaching, re-encoded to
1080×1920 portrait (9:16) with a center-crop, capped at 1 GiB — ready to
upload to Reels / Shorts / TikTok / X.
4. **Post** (`--post`): Spotify upload integration, not implemented yet.

The transcript is cached at `<input>.segments.json`, so running multiple modes
or re-tuning prompt parameters costs one whisper run.
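The caching behaviour can be pictured with a minimal Go sketch. It assumes `<input>` means the input path with its extension removed (as the `sermon.summary.md` example in this README suggests). The `Segment` fields and the `cachePath` and `loadOrTranscribe` names are hypothetical and for illustration only; the real code in `internal/transcribe` may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// Segment is a hypothetical shape for one cached transcript segment;
// the real schema may differ.
type Segment struct {
	Start float64 `json:"start"` // seconds from the start of the recording
	End   float64 `json:"end"`   // seconds from the start of the recording
	Text  string  `json:"text"`
}

// cachePath maps "sermon.mp4" to "sermon.segments.json", assuming <input>
// is the input path without its extension.
func cachePath(input string) string {
	return strings.TrimSuffix(input, filepath.Ext(input)) + ".segments.json"
}

// loadOrTranscribe returns cached segments when the cache file exists.
// Otherwise it calls transcribe once and writes the result for later runs.
func loadOrTranscribe(input string, transcribe func(string) ([]Segment, error)) ([]Segment, error) {
	path := cachePath(input)
	data, err := os.ReadFile(path)
	if err == nil {
		var segs []Segment
		if err := json.Unmarshal(data, &segs); err != nil {
			return nil, fmt.Errorf("corrupt cache %s: %w", path, err)
		}
		return segs, nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}
	segs, err := transcribe(input)
	if err != nil {
		return nil, err
	}
	data, err = json.Marshal(segs)
	if err != nil {
		return nil, err
	}
	return segs, os.WriteFile(path, data, 0o644)
}

func main() {
	fmt.Println(cachePath("sermon.mp4")) // prints "sermon.segments.json"
}
```

Because the transcribe step is passed in as a function, both `--summerize` and `--clip` could share the same helper and only the first caller would pay for the whisper run.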
## Quick start
```bash
git clone <repo-url> ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install
```
`make install` is interactive: it detects your OS and GPU, walks you through
installing system dependencies, builds whisper.cpp with the right backend,
downloads a ggml model, and links `publish` + `whisper-cli-<backend>` into
`~/.local/bin`. Re-runnable; each step is idempotent.
Then:
```bash
publish --summerize sermon.mp4
publish --clip sermon.mp4
publish --summerize --clip sermon.mp4 # both, one transcribe pass
```
Make sure `~/.local/bin` is on your `PATH`.
### Other Make targets
| target | what it does |
|---|---|
| `make` / `make build` | build `./publish` in the repo |
| `make link` | rebuild + link `./publish` into `~/.local/bin` |
| `make install` | interactive end-to-end setup |
| `make doctor` | print detected OS / GPU / dependencies and exit |
| `make uninstall` | remove the `publish` symlink |
| `make clean` | remove the local `publish` binary |
| `make test` | `go test ./...` |
## Modes
Modes are boolean flags; combine them freely. If no mode flag is set, `publish` defaults to `--summerize`.
### `--summerize`
Markdown summary of the message.
```bash
publish --summerize sermon.mp4
publish --summerize --spotify sermon.html sermon.mp4
publish --summerize --copy sermon.mp4 # Spotify HTML -> clipboard
publish --summerize --prompt "$(cat notes.md)" sermon.mp4
```
Key flags:
| flag | purpose |
|---|---|
| `--md PATH` | Markdown output path; `-` = stdout, `""` = disable. Default `<input>.summary.md` |
| `--spotify PATH` | Also write Spotify-show-notes HTML (subset of HTML their editor accepts) |
| `--copy` | Copy the Spotify HTML to the clipboard (`wl-copy` / `xclip` / `pbcopy`) |
| `--prompt TEXT` | Producer's notes — pre-written framing the LLM treats as authoritative for title, speaker name, key points. The transcript expands and enriches it |
### `--clip`
Pick the best 60–90 second sermon clip and cut it to a portrait social video.
```bash
publish --clip sermon.mp4
publish --clip --min 75 --max 90 sermon.mp4
publish --clip --dry-run sermon.mp4 # show the picked window only
publish --clip --copy-codec sermon.mp4 # fast stream copy (skips 9:16 crop)
```
Key flags:
| flag | purpose |
|---|---|
| `--min` / `--max` | clip length bounds in seconds (default 60 / 90) |
| `--out PATH` | clip output path (default `<input>.clip<ext>`) |
| `--copy-codec` | use `ffmpeg -c copy` — fast, but **skips the 9:16 portrait crop** (stream copy can't apply video filters) |
| `--dry-run` | print the picked window but don't run ffmpeg |

Video clips are re-encoded to **1080×1920 portrait** with a safe center-crop
(`crop=min(iw,ih*9/16):min(ih,iw*16/9)`) that handles any source aspect ratio
without distortion, and capped at 1 GiB via ffmpeg's `-fs`.
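To make the crop expression concrete, here is a small Go sketch that evaluates it for a given source size and assembles a plausible ffmpeg argument list. The function names `cropSize` and `clipArgs`, the `scale=1080:1920` step after the crop, and the exact ordering of arguments are assumptions for illustration; the project's real invocation in `internal/clip` may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// cropSize returns the largest 9:16 window that fits inside a w x h frame,
// mirroring crop=min(iw,ih*9/16):min(ih,iw*16/9). Integer division truncates
// here, whereas ffmpeg evaluates the expression in floating point.
func cropSize(w, h int) (cw, ch int) {
	return minInt(w, h*9/16), minInt(h, w*16/9)
}

// clipArgs builds a hypothetical ffmpeg argument list that cuts the window
// [start, end) in seconds. With copyCodec the streams are copied untouched,
// so no crop or scale filter can be applied.
func clipArgs(in, out string, start, end float64, copyCodec bool) []string {
	args := []string{
		"-y",
		"-ss", strconv.FormatFloat(start, 'f', 3, 64),
		"-i", in,
		"-t", strconv.FormatFloat(end-start, 'f', 3, 64),
	}
	if copyCodec {
		args = append(args, "-c", "copy")
	} else {
		// Single quotes keep the commas inside min() from being read as
		// filter separators by ffmpeg's filtergraph parser.
		args = append(args, "-vf",
			"crop='min(iw,ih*9/16)':'min(ih,iw*16/9)',scale=1080:1920")
	}
	// -fs takes a size limit in bytes; 1073741824 bytes is 1 GiB.
	return append(args, "-fs", "1073741824", out)
}

func main() {
	fmt.Println(cropSize(1920, 1080)) // prints "607 1080"
	fmt.Println(cropSize(1080, 1920)) // prints "1080 1920"
	fmt.Println(clipArgs("sermon.mp4", "sermon.clip.mp4", 10, 85, false))
}
```

A landscape 1920×1080 source keeps its full height and loses the sides, while a source that is already 1080×1920 passes through the crop unchanged.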
### `--post`
Stub. Will eventually push the markdown summary to a Spotify-for-Podcasters
episode description.
## Shared flags
| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` (shells out to `claude -p`) or `claude-api` (direct Messages API) | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override the bundled summary system prompt | bundled |
| `--prompt-clip` | override the bundled clip-selector system prompt | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects best backend if empty | auto |
| `--whisper-model` | path to a ggml whisper model (.bin) | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force whisper language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `<input>.segments.json` |
| `--keep-transcript` | also write `<input>.transcript.txt` | off |
| `--keep-wav` | keep the normalized 16kHz WAV instead of using a tempdir | off |
| `-v` | verbose progress to stderr | off |
> **Note on `--prompt` vs `--prompt-summary`:**
> `--prompt` is **content** (producer's notes that anchor the summary).
> `--prompt-summary` is a **path** to override the system prompt template.
> Different things; both are intentional.
## Backends
When `--whisper-bin` is not set, `publish` picks a whisper.cpp backend at
runtime by walking this order:
1. **Metal** (macOS): uses `~/.local/bin/whisper-cli-metal` if present
2. **CUDA**: uses `whisper-cli-cuda` if `nvidia-smi -L` succeeds
3. **ROCm**: uses `whisper-cli-rocm` if `rocminfo` succeeds
4. **Vulkan**: uses `whisper-cli-vulkan` if `vulkaninfo --summary` succeeds
5. **CPU**: uses `whisper-cli` / `whisper-cpp` / `main` on `PATH`

Each step requires both the binary and a working runtime probe; failures fall
through to the next backend. The chosen backend is logged on stderr; `-v`
adds diagnostics about which probes were skipped or failed.
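The fall-through logic can be sketched in Go as a table of candidates walked in order. This is an illustration under assumptions, not the project's actual detector: the `backend` type, the `pick` function, and the injected `lookPath` and `run` callbacks are hypothetical, and for simplicity the Metal entry is looked up like the others instead of at a fixed `~/.local/bin` path.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
)

// backend pairs candidate binary names with the probe command that must
// succeed before they are used. An empty probe means no probe is needed.
type backend struct {
	name  string
	bins  []string
	probe []string
	goos  string // when non-empty, only consider this backend on that OS
}

var order = []backend{
	{name: "metal", bins: []string{"whisper-cli-metal"}, goos: "darwin"},
	{name: "cuda", bins: []string{"whisper-cli-cuda"}, probe: []string{"nvidia-smi", "-L"}},
	{name: "rocm", bins: []string{"whisper-cli-rocm"}, probe: []string{"rocminfo"}},
	{name: "vulkan", bins: []string{"whisper-cli-vulkan"}, probe: []string{"vulkaninfo", "--summary"}},
	{name: "cpu", bins: []string{"whisper-cli", "whisper-cpp", "main"}},
}

// pick returns the first backend whose binary is found and whose probe
// succeeds. lookPath and run are injected so the walk can be tested
// without any GPU tooling installed.
func pick(goos string, lookPath func(string) (string, bool), run func([]string) bool) (name, path string, ok bool) {
	for _, b := range order {
		if b.goos != "" && b.goos != goos {
			continue
		}
		for _, bin := range b.bins {
			p, found := lookPath(bin)
			if !found {
				continue
			}
			if len(b.probe) > 0 && !run(b.probe) {
				break // binary exists but the runtime is not usable
			}
			return b.name, p, true
		}
	}
	return "", "", false
}

func main() {
	lookPath := func(bin string) (string, bool) {
		p, err := exec.LookPath(bin)
		return p, err == nil
	}
	run := func(argv []string) bool {
		return exec.Command(argv[0], argv[1:]...).Run() == nil
	}
	fmt.Println(pick(runtime.GOOS, lookPath, run))
}
```

Keeping the order in a single slice means that adding or reordering a backend is a one-line change, and a failed probe simply moves on to the next entry.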
`make install` builds the right backend for your machine. Per-platform build
recipes (CUDA, ROCm, Vulkan, Metal) live in [CLAUDE.md](./CLAUDE.md#backend-auto-detect).
## Platform support
| OS | tested | notes |
|---|---|---|
| Arch Linux | yes | `pacman` for system deps; CUDA / ROCm / Vulkan all supported |
| macOS (Apple Silicon) | install path | `brew` + Xcode CLT; Metal acceleration |
| Debian / Ubuntu | install path | `apt`; GPU runtime install is distro-specific, prints the package list |
| Fedora | install path | `dnf`; same caveat as Debian |
| Windows | no | not supported |
## Output
A typical `publish --summerize sermon.mp4` produces `sermon.summary.md` that
looks roughly like:
```markdown
# Fatal Sleep
**Speaker:** Rev. Hayford
**Scripture:** Romans 13:11–14, 1 Thessalonians 5:6–8, Ephesians 5:14
## Overview
A 2-4 sentence plain-English summary of the central message.
## Key Points
- ...
## Scripture & References
- Romans 13:11 — "And that, knowing the time, that now it is high time to awake out of sleep..."
- ...
## Application / Call to Action
- ...
## Memorable Quote
> "..."
```
`--spotify path.html` writes the same content as the small HTML subset that
Spotify-for-Podcasters' show-notes editor accepts (`<b>`, `<i>`, `<a>`,
`<ul>`, `<ol>`, `<li>`, `<p>`).
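One way to check that generated show notes stay inside that subset is a simple tag allowlist. The Go sketch below is an assumption about how such a check could look, not the converter in `internal/output`; `disallowedTags` is a hypothetical helper, and the regular expression is a rough tag matcher rather than a full HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allowed lists the tags this README says the show-notes editor accepts.
var allowed = map[string]bool{
	"b": true, "i": true, "a": true,
	"ul": true, "ol": true, "li": true, "p": true,
}

// tagRE matches opening and closing tags and captures the tag name.
var tagRE = regexp.MustCompile(`</?([A-Za-z][A-Za-z0-9]*)[^>]*>`)

// disallowedTags returns the lowercase names of tags in html that fall
// outside the allowlist, in order of first appearance, without repeats.
func disallowedTags(html string) []string {
	var out []string
	seen := map[string]bool{}
	for _, m := range tagRE.FindAllStringSubmatch(html, -1) {
		name := strings.ToLower(m[1])
		if !allowed[name] && !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	return out
}

func main() {
	html := "<p><b>Fatal Sleep</b></p><h2>Key Points</h2><ul><li>...</li></ul>"
	fmt.Println(disallowedTags(html)) // prints "[h2]"
}
```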
`--clip` writes `<input>.clip.mp4` (or `.m4a` for audio inputs). Video clips
are 1080×1920 portrait, ≤1 GiB.
## Requirements
| tool | required for | install on Arch | install on macOS |
|---|---|---|---|
| `ffmpeg` | always | `pacman -S ffmpeg` | `brew install ffmpeg` |
| `whisper-cli` | transcription | `pacman -S whisper.cpp` (CPU) or build from source for GPU | `brew install whisper-cpp` (Metal) or build |
| ggml model | transcription | downloaded by `make install` | downloaded by `make install` |
| `claude` CLI | `--summarizer claude-cli` (default) | comes with [Claude Code](https://claude.com/claude-code) | same |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag | wayland default; `pacman -S xclip` for X11 | `pbcopy` ships with macOS |

`make install` walks you through these.
## Building from source
```bash
go build -o publish .
```
Zero external Go dependencies — stdlib only. `go.sum` is empty.
```
.
├── main.go                  flat flagset, mode dispatch, orchestration
├── prompts/
│   ├── church-service.md    summary system prompt
│   └── clip-selector.md     clip-selector system prompt
├── internal/
│   ├── audio/               ffmpeg → 16kHz mono WAV
│   ├── transcribe/          whisper.cpp wrapper, segments, mm:ss helper
│   ├── summarize/           pluggable LLM backends (CLI + Anthropic API)
│   ├── clip/                LLM clip selection + ffmpeg cut
│   └── output/              markdown→Spotify HTML, clipboard
├── scripts/install.sh       interactive setup
├── Makefile
└── CLAUDE.md                deep architecture / pipeline docs
```
## Spelling note
Yes, `--summerize` is intentionally spelled with an "e" — sermon + summarize,
the original name of the project. The internal Go package uses standard
`summarize`; only the user-facing flag and binary keep the pun.