Initial push to gitea

2026-05-10 13:37:17 -06:00
commit 54629aecad
20 changed files with 2381 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,9 @@
/publish
*.summary.md
*.spotify.html
*.transcript.txt
*.segments.json
*.16k.wav
*.clip.mp4
*.clip.m4a
.env

CLAUDE.md Normal file

@@ -0,0 +1,343 @@
# publish
Go CLI that turns a local audio/video recording of a church service into:
1. A markdown summary (`--summerize`)
2. A 60-90s social-media hook clip cut from the source (`--clip`)
3. (Future) A post to Spotify for Podcasters (`--post`; currently a stub)
The repo directory is still `summerize/` (historical), but the module and binary
are both `publish`.
## Pipeline (one pass, shared by all modes)
```
input ──ffmpeg──► 16kHz mono WAV ──whisper.cpp -oj──► []Segment{Start,End,Text}
                                          ┌─────────────────┴─────────────────┐
                                          │                                   │
                                   PlainText(segs)                   FormatForLLM(segs)
                                          │                                   │
                                          ▼                                   ▼
                                     Summarizer                           clip.Pick
                                  (Anthropic API or                   (same Summarizer,
                               shelled-out claude CLI)            different prompt → JSON)
                                          │                                   │
                                          ▼                                   ▼
                                  markdown summary                 ffmpeg cut [start,end]
                                          └─► output.MarkdownToSpotifyHTML
                                              (b/i/a/ul/ol/li/p subset
                                               that Spotify show notes accept)
```
Whisper output is cached at `<input>.segments.json`. Subsequent runs (different
modes, different prompt-clip params) skip whisper entirely.
## Layout
```
main.go                       flat flagset, mode dispatch, orchestration
prompts/church-service.md     default --summerize prompt (go:embed)
prompts/clip-selector.md      default --clip prompt; templated with
                              {{MIN_SECONDS}} / {{MAX_SECONDS}} (go:embed)
internal/audio/audio.go       ffmpeg → 16kHz mono PCM WAV
internal/transcribe/
  transcribe.go               Transcriber interface, Segment type
  segments.go                 Segment, PlainText, FormatForLLM, mm:ss helper
  whispercpp.go               shells out to whisper-cli with -oj; parses JSON
internal/summarize/
  summarize.go                Summarizer interface
  anthropic.go                direct Messages API via net/http
                              (no SDK dep; reads ANTHROPIC_API_KEY)
  claudecli.go                `claude -p <prompt>` with transcript on stdin
internal/clip/
  clip.go                     Selection, Pick (LLM JSON parse), Extract (ffmpeg)
  clip_test.go                JSON object extraction edge cases
internal/output/
  spotify.go                  markdown → Spotify-safe HTML
  spotify_test.go
  clipboard.go                wl-copy / xclip / pbcopy
Makefile                      build/install/link/doctor/uninstall targets
scripts/install.sh            interactive setup (OS + GPU detect → deps,
                              whisper.cpp build, model download, link)
```
**Zero external Go dependencies.** Stdlib only.
## CLI surface
```
publish [mode...] [flags] <input>

modes (combine freely; defaults to --summerize):
  --summerize   write a markdown summary
  --clip        cut a 60-90s social hook clip
  --post        post to Spotify (not implemented yet)
```
Modes share whisper output, so `publish --summerize --clip sermon.mp4` only
transcribes once.
### Shared flags
| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` or `claude-api` | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override summary prompt path | bundled |
| `--prompt-clip` | override clip-selector prompt path | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects best backend (see "Backend auto-detect" below) | auto |
| `--whisper-model` | path to ggml model | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `<input>.segments.json` |
| `--keep-transcript` | also write `<input>.transcript.txt` | off |
| `--keep-wav` | keep the normalized WAV instead of tempdir | off |
| `-v` | verbose progress to stderr | off |
### --summerize flags
| flag | purpose | default |
|---|---|---|
| `--prompt` | producer's notes (any pre-written framing, title, key points) that anchor the summary | empty |
| `--md PATH` | markdown output; `-` = stdout, `""` = disable | `<input>.summary.md` |
| `--spotify PATH` | Spotify HTML output; `-` = stdout | disabled |
| `--copy` | copy Spotify HTML to clipboard | off |
When `--prompt` is set, the value is prepended to the user message as a "Producer's notes" block above the transcript. The bundled prompt instructs the LLM to treat producer's notes as authoritative for titles, speaker names, framing, and key points, then use the transcript to expand and enrich them. Use this when the Spotify show notes you've already drafted should drive the summary's framing rather than the LLM inferring everything from scratch.
For longer notes, use shell expansion: `--prompt "$(cat notes.md)"`.
Note: `--prompt-summary` (system prompt template path) and `--prompt` (user notes content) are different flags. The former overrides the *system* prompt; the latter feeds *user content* into it.
### --clip flags
| flag | purpose | default |
|---|---|---|
| `--min` | minimum clip length (seconds) | 60 |
| `--max` | maximum clip length (seconds) | 90 |
| `--out PATH` | clip output path | `<input>.clip<ext>` (`.clip.m4a` for audio) |
| `--copy-codec` | ffmpeg `-c copy` (fast, keyframe-aligned) — **skips the 9:16 portrait crop**, since stream copy can't apply video filters | off |
| `--dry-run` | print the picked window but don't run ffmpeg | off |
Video clips are always re-encoded as **1080×1920 portrait (9:16)** with a center
crop, capped at **1 GiB** via ffmpeg's `-fs`. The crop filter is
`crop=min(iw,ih*9/16):min(ih,iw*16/9)` so any source aspect (16:9, 4:3, 1:1, or
already-portrait) yields the largest 9:16 sub-rectangle without distortion. See
`portraitFilter` and `MaxClipBytes` in `internal/clip/clip.go`.
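To see why the crop expression always yields a 9:16 box, the sketch below evaluates the same two `min()` terms for a few common source sizes. `cropSize` is a hypothetical helper written for this illustration. It keeps fractional values as floats, whereas ffmpeg works in whole pixels.

```go
package main

import (
	"fmt"
	"math"
)

// cropSize evaluates the crop expression for a source of iw x ih pixels:
// width = min(iw, ih*9/16), height = min(ih, iw*16/9).
func cropSize(iw, ih float64) (w, h float64) {
	return math.Min(iw, ih*9/16), math.Min(ih, iw*16/9)
}

func main() {
	sources := []struct {
		name   string
		iw, ih float64
	}{
		{"16:9 landscape", 1920, 1080},
		{"4:3", 1440, 1080},
		{"1:1", 1080, 1080},
		{"9:16 portrait", 1080, 1920},
	}
	for _, s := range sources {
		w, h := cropSize(s.iw, s.ih)
		// Every ratio printed here should be 9/16 = 0.5625.
		fmt.Printf("%-15s %4.0fx%-4.0f -> crop %.1fx%.1f (ratio %.4f)\n",
			s.name, s.iw, s.ih, w, h, w/h)
	}
}
```

For landscape and square sources the height is kept and the sides are cropped; for an already-portrait 9:16 source both terms return the full frame.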
## Conventions / non-obvious choices
- **Spelling: `summerize` is intentional.** It's the original name of the project
and the user's preferred spelling. Use `summerize` (e.g. for `--summerize`)
rather than auto-correcting to `summarize` in user-facing surfaces. Internal
Go package `internal/summarize` keeps the standard spelling.
- **Pluggable Summarizer is shared between modes.** `--clip` reuses the same
Summarizer interface; the only difference is the prompt and the expectation
of JSON output. If you add a new mode that talks to an LLM, plug it in there.
- **Summarizer.Summarize takes the user content verbatim.** No implicit
`"Transcript:"` prefix or other framing. Callers (`doSummerize` in main.go,
`clip.Pick`) build the full user message themselves — that's how
`--prompt` (producer's notes) prepends a "Producer's notes:" block above
the transcript without the message getting mislabeled.
- **Whisper output is the source of truth.** All text-only consumers go through
`transcribe.PlainText(segs)`; we don't run whisper twice.
- **JSON parsing for clip selection is defensive.** `clip.extractJSONObject`
walks balanced braces (skipping strings) so the model can wrap its answer in
prose despite the prompt asking for raw JSON.
- **Clip extraction defaults to re-encode.** Frame-accurate cuts matter for
short social hooks; `--copy-codec` trades that for speed.
- **Anthropic API call uses net/http directly.** Adding the SDK was tempting,
but the request is one POST and avoiding the dep keeps go.sum empty.
- **`prepareWAV` cleanup is owned by the caller.** It returns a `func()` you
must `defer`. Don't call `os.RemoveAll` on the wav path yourself.
- **No subcommands.** The CLI is one flat flagset. Modes are boolean flags so
multiple can run in one invocation and share state.
## Build / install
**Fresh machine (recommended)** — clone, then run the interactive installer.
It detects OS + GPU, builds whisper.cpp with the right backend, downloads a
ggml model, and links `publish` + `whisper-cli-<backend>` into `~/.local/bin`:
```bash
git clone <repo-url> ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install # interactive
make doctor # just print detected platform/GPU/dependencies
```
Re-runnable; each step is idempotent and skippable. The script supports Arch
(`pacman`), Debian/Ubuntu (`apt`), Fedora (`dnf`), and macOS (`brew`); for
unknown distros it prints the package list and skips the install command.
**Already built once** — just rebuild:
```bash
go build -o publish .
# or
make link # rebuilds + (re)points ~/.local/bin/publish at the repo
```
The symlink at `~/.local/bin/publish` is the canonical install location;
rebuilds update in place via the symlink.
## External dependencies (runtime)
| tool | required for | install |
|---|---|---|
| `ffmpeg` | always (audio extraction + clip cut) | `pacman -S ffmpeg` |
| `whisper-cli` (whisper.cpp) | transcription | `pacman -S whisper.cpp` for CPU; for GPU acceleration see "GPU builds" below |
| ggml whisper model | transcription | `curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin` into `~/.cache/whisper.cpp/` |
| `claude` CLI | `--summarizer claude-cli` (default) | already installed (Claude Code) |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag (Spotify HTML to clipboard) | wl-copy ships with wayland on omarchy |
## Backend auto-detect
When `--whisper-bin` is not set, `resolveBin` in
`internal/transcribe/whispercpp.go` picks a backend at runtime:
1. **CUDA** — if `~/.local/bin/whisper-cli-cuda` (or `whisper-cli-cuda` on PATH) exists *and* `nvidia-smi -L` exits 0.
2. **ROCm** — if `whisper-cli-rocm` exists *and* `rocminfo` exits 0.
3. **Vulkan** — if `whisper-cli-vulkan` exists *and* `vulkaninfo --summary` exits 0.
4. **CPU fallback** — first of `whisper-cli` / `whisper-cpp` / `main` on PATH.
Each probe is gated on a 5s timeout. The chosen backend is logged on a
single stderr line (`whisper: using CUDA backend (/path)`); `-v` adds
diagnostics about which probes were skipped or failed. The convention is
one whisper.cpp checkout per host with a per-backend symlink in
`~/.local/bin/whisper-cli-<backend>`, so the same `publish` binary works
across machines without machine-specific flags.
### CUDA build (RTX 3070 Ti / desktop)
```
sudo pacman -S --needed cuda # ~3GB; installs to /opt/cuda
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
PATH=/opt/cuda/bin:$PATH cmake -B build \
-DGGML_CUDA=1 \
-DCMAKE_CUDA_ARCHITECTURES=86 \
-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15
PATH=/opt/cuda/bin:$PATH cmake --build build -j$(nproc) --config Release
ln -sf "$PWD/build/bin/whisper-cli" ~/.local/bin/whisper-cli-cuda
```
CUDA 13.2 caps the host compiler at GCC 15; system gcc is 16, so the
`-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15` line is required (the `gcc15`
package ships `g++-15` alongside the default toolchain). `sm_86` matches
the RTX 3070 Ti compute capability — adjust if the GPU changes.
CUDA smoke test — these stderr lines should appear in any run:
```
whisper_init_with_params_no_state: use gpu = 1
ggml_cuda_init: found 1 CUDA devices ...
whisper_backend_init_gpu: using CUDA0 backend
```
### ROCm build (Framework 16 / Radeon RX 7700S)
The 7700S is RDNA3 (gfx1102). ROCm 6.x supports it.
```
sudo pacman -S --needed rocm-hip-sdk rocm-hip-runtime hipblas rocblas
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build-rocm \
-DGGML_HIP=1 \
-DAMDGPU_TARGETS=gfx1102 \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm -j$(nproc)
ln -sf "$PWD/build-rocm/bin/whisper-cli" ~/.local/bin/whisper-cli-rocm
```
If ROCm doesn't recognize gfx1102 (older ROCm releases), set
`HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell before invoking `publish`
to spoof gfx1100 — same RDNA3 ISA, supported kernels.
ROCm smoke test — look for `ggml_cuda_init` (HIP reuses the CUDA backend
naming in whisper.cpp) plus a ROCm device line on stderr.
### Vulkan build (universal GPU fallback)
Vulkan is the easiest cross-vendor path; uses any GPU with a working
Vulkan driver (Mesa RADV for AMD/Intel, Nvidia proprietary, etc.).
```
sudo pacman -S --needed vulkan-headers vulkan-icd-loader shaderc
cd ~/Git\ Repos/whisper.cpp
cmake -B build-vulkan -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)
ln -sf "$PWD/build-vulkan/bin/whisper-cli" ~/.local/bin/whisper-cli-vulkan
```
Slower than native CUDA/ROCm but works on machines where the vendor
toolchain is too painful to install. Useful as a portable fallback for
laptops with iGPUs.
### Metal build (Apple Silicon)
`make install` handles this automatically; the manual recipe is short
because cmake on macOS picks up Metal by default — no special flag.
Prerequisites:
```
xcode-select --install # one-time
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install go cmake ffmpeg
```
Build:
```
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
cmake -B build-metal -DCMAKE_BUILD_TYPE=Release
cmake --build build-metal -j$(sysctl -n hw.ncpu)
ln -sf "$PWD/build-metal/bin/whisper-cli" ~/.local/bin/whisper-cli-metal
```
The resolver special-cases Darwin: if `whisper-cli-metal` exists it's
used immediately (no probe — Metal is always available on macOS). On a
Mac without that symlink, the CPU fallback finds `whisper-cli` /
`whisper-cpp` from brew (which is itself Metal-enabled by default), so a
plain `brew install whisper-cpp` is a workable lazy path. It just shows
"CPU backend" in the publish log line even though whisper.cpp is in fact
running Metal kernels.
Metal smoke test — these stderr lines should appear in any run:
```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 ...
whisper_backend_init_gpu: using Metal backend
```
## Tests
```
go test ./...
```
Covered:
- `internal/output/spotify_test.go` — markdown→Spotify-HTML conversion, escaping
- `internal/clip/clip_test.go` — JSON object extraction, including prose-wrapped
and fence-wrapped model output
There are no integration tests for whisper or the LLM calls — those depend on
external binaries and remote APIs.
## Future work
- **`--post`**: post the markdown summary as a Spotify for Podcasters episode
description. Requires the Spotify show-notes API or Spotify for Podcasters
upload integration. Reuse `output.MarkdownToSpotifyHTML` since their show-
notes editor accepts that subset.
- **Multi-clip output**: pick the top-N hooks instead of one. The current
`Selection` would become `[]Selection` and the prompt would request an array.
- **Faster `--summarizer` for short transcripts**: default to Haiku for very
short inputs to save on API costs.
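For the multi-clip idea, one possible shape is sketched below, assuming the prompt is changed to request a JSON array. `parseSelections` is hypothetical, the `Selection` struct is trimmed to three fields for brevity, and the filtering and ordering rules are illustrative choices rather than settled design.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Selection is a trimmed copy of the clip window type, for illustration.
type Selection struct {
	StartSeconds float64 `json:"start_seconds"`
	EndSeconds   float64 `json:"end_seconds"`
	Title        string  `json:"title"`
}

// parseSelections (hypothetical) decodes a JSON array of windows, drops any
// with a non-positive duration, and orders the rest by start time.
func parseSelections(raw string) ([]Selection, error) {
	var sels []Selection
	if err := json.Unmarshal([]byte(raw), &sels); err != nil {
		return nil, fmt.Errorf("parsing selections: %w", err)
	}
	valid := sels[:0]
	for _, s := range sels {
		if s.EndSeconds > s.StartSeconds {
			valid = append(valid, s)
		}
	}
	sort.Slice(valid, func(i, j int) bool {
		return valid[i].StartSeconds < valid[j].StartSeconds
	})
	return valid, nil
}

func main() {
	raw := `[
	  {"start_seconds": 1210, "end_seconds": 1285, "title": "Second hook"},
	  {"start_seconds": 95, "end_seconds": 170, "title": "First hook"}
	]`
	sels, err := parseSelections(raw)
	if err != nil {
		panic(err)
	}
	for i, s := range sels {
		fmt.Printf("%d. %s [%.0f, %.0f]\n", i+1, s.Title, s.StartSeconds, s.EndSeconds)
	}
}
```

Each surviving window would still need the same bounds and duration checks that the single-clip path applies.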

Makefile Normal file

@@ -0,0 +1,52 @@
# publish — Makefile
#
# Common targets:
#   make            - build the publish binary in the repo
#   make install    - interactive setup: detect OS/GPU, build whisper.cpp
#                     with the right backend, download a model, and link
#                     publish + whisper-cli-<backend> into ~/.local/bin
#   make doctor     - print detected platform/GPU/dependencies and exit
#   make link       - just link the existing publish binary into PREFIX/bin
#   make uninstall  - remove the publish symlink (leaves whisper.cpp alone)
#   make clean      - remove the local publish binary

PREFIX ?= $(HOME)/.local
BINDIR := $(PREFIX)/bin

.PHONY: all build link install doctor uninstall clean test help

all: build

build:
	go build -o publish .

link: build
	@mkdir -p "$(BINDIR)"
	@ln -sf "$(CURDIR)/publish" "$(BINDIR)/publish"
	@echo "linked $(BINDIR)/publish -> $(CURDIR)/publish"

install:
	@bash scripts/install.sh

doctor:
	@bash scripts/install.sh --doctor

uninstall:
	@rm -f "$(BINDIR)/publish"
	@echo "removed $(BINDIR)/publish (whisper.cpp checkout and whisper-cli-* symlinks left intact)"

clean:
	rm -f publish

test:
	go test ./...

help:
	@echo "Targets:"
	@echo "  make build      build ./publish"
	@echo "  make link       symlink ./publish into \$$PREFIX/bin (default ~/.local)"
	@echo "  make install    interactive end-to-end setup (deps + whisper + model + publish)"
	@echo "  make doctor     show detected platform/GPU/dependencies"
	@echo "  make uninstall  remove the publish symlink"
	@echo "  make clean      remove the built publish binary"
	@echo "  make test       go test ./..."

go.mod Normal file

@@ -0,0 +1,3 @@
module publish
go 1.26.3

internal/audio/audio.go Normal file

@@ -0,0 +1,42 @@
// Package audio normalizes arbitrary audio/video inputs into a whisper.cpp-friendly
// 16 kHz mono PCM WAV file using ffmpeg.
package audio
import (
"context"
"fmt"
"os"
"os/exec"
"path/filepath"
)
// ExtractWAV runs ffmpeg to convert input (audio or video) into a 16kHz mono
// signed-16-bit PCM WAV file at outPath. ffmpeg must be on PATH.
func ExtractWAV(ctx context.Context, input, outPath string) error {
if _, err := exec.LookPath("ffmpeg"); err != nil {
return fmt.Errorf("ffmpeg not found on PATH: %w", err)
}
if _, err := os.Stat(input); err != nil {
return fmt.Errorf("input not readable: %w", err)
}
if err := os.MkdirAll(filepath.Dir(outPath), 0o755); err != nil {
return err
}
cmd := exec.CommandContext(ctx, "ffmpeg",
"-y",
"-loglevel", "error",
"-i", input,
"-vn",
"-ac", "1",
"-ar", "16000",
"-c:a", "pcm_s16le",
outPath,
)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("ffmpeg: %w", err)
}
return nil
}

internal/clip/clip.go Normal file

@@ -0,0 +1,204 @@
// Package clip selects the best 60-90s window from a timestamped transcript
// (using a Summarizer to do the picking) and runs ffmpeg to cut that window
// out of the original media.
package clip
import (
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
"publish/internal/summarize"
"publish/internal/transcribe"
)
// Selection is the LLM's chosen clip window plus metadata.
type Selection struct {
StartSeconds float64 `json:"start_seconds"`
EndSeconds float64 `json:"end_seconds"`
Title string `json:"title"`
Hook string `json:"hook"`
Quote string `json:"quote"`
Reasoning string `json:"reasoning"`
}
// Duration returns the selected window length in seconds.
func (s Selection) Duration() float64 { return s.EndSeconds - s.StartSeconds }
// Pick asks the summarizer to choose the best window in the given segments,
// using promptTemplate (which may contain {{MIN_SECONDS}} / {{MAX_SECONDS}}
// placeholders). It validates the returned window against minSec and maxSec
// (with a small tolerance) and returns an error rather than clamping it.
func Pick(ctx context.Context, sum summarize.Summarizer, promptTemplate string, segs []transcribe.Segment, minSec, maxSec float64) (Selection, string, error) {
if len(segs) == 0 {
return Selection{}, "", fmt.Errorf("no transcript segments to choose from")
}
prompt := strings.NewReplacer(
"{{MIN_SECONDS}}", fmt.Sprintf("%g", minSec),
"{{MAX_SECONDS}}", fmt.Sprintf("%g", maxSec),
).Replace(promptTemplate)
body := transcribe.FormatForLLM(segs)
raw, err := sum.Summarize(ctx, prompt, body)
if err != nil {
return Selection{}, "", err
}
jsonText, err := extractJSONObject(raw)
if err != nil {
return Selection{}, raw, fmt.Errorf("could not find JSON object in model output: %w", err)
}
var sel Selection
if err := json.Unmarshal([]byte(jsonText), &sel); err != nil {
return Selection{}, raw, fmt.Errorf("parsing selection JSON: %w\n--- raw ---\n%s", err, jsonText)
}
if err := validate(&sel, segs, minSec, maxSec); err != nil {
return sel, raw, err
}
return sel, raw, nil
}
func validate(sel *Selection, segs []transcribe.Segment, minSec, maxSec float64) error {
if sel.EndSeconds <= sel.StartSeconds {
return fmt.Errorf("invalid window: end (%g) <= start (%g)", sel.EndSeconds, sel.StartSeconds)
}
maxEnd := segs[len(segs)-1].End
if sel.StartSeconds < 0 || sel.EndSeconds > maxEnd+1.0 {
return fmt.Errorf("window [%g, %g] is outside transcript bounds [0, %g]",
sel.StartSeconds, sel.EndSeconds, maxEnd)
}
dur := sel.Duration()
// Allow small slop on either side; otherwise reject.
if dur < minSec-2 || dur > maxSec+2 {
return fmt.Errorf("window duration %.1fs is outside requested bounds [%g, %g]",
dur, minSec, maxSec)
}
return nil
}
// extractJSONObject pulls the first balanced {...} object out of s, ignoring
// braces that appear inside JSON strings. Useful when the model wraps its
// answer in prose despite being told not to.
func extractJSONObject(s string) (string, error) {
start := strings.Index(s, "{")
if start < 0 {
return "", fmt.Errorf("no '{' in response")
}
depth := 0
inStr := false
esc := false
for i := start; i < len(s); i++ {
c := s[i]
if inStr {
switch {
case esc:
esc = false
case c == '\\':
esc = true
case c == '"':
inStr = false
}
continue
}
switch c {
case '"':
inStr = true
case '{':
depth++
case '}':
depth--
if depth == 0 {
return s[start : i+1], nil
}
}
}
return "", fmt.Errorf("unbalanced braces")
}
// portraitFilter center-crops any source aspect ratio to a 9:16 sub-rectangle
// (no distortion, just cropping) and scales to 1080x1920. The min() expressions
// pick the largest 9:16 box that fits inside the source: 16:9 sources lose the
// left/right edges, 9:16 sources are unchanged, and 4:3 / 1:1 sources crop the
// sides. setsar=1 forces square pixels.
const portraitFilter = `crop=min(iw\,ih*9/16):min(ih\,iw*16/9),scale=1080:1920,setsar=1`
// MaxClipBytes is the hard size ceiling enforced by ffmpeg's -fs flag.
// Realistic 60-90s 1080x1920 H.264 clips at CRF 23 land at 30-100 MB, so this is
// a safety cap rather than a target.
const MaxClipBytes = 1 << 30 // 1 GiB
// Extract runs ffmpeg to cut [start, end) seconds out of input into outPath.
// For video inputs, the clip is re-encoded as a 1080x1920 portrait (9:16
// center-crop) under a 1 GiB size cap. If reencode is false, stream copy is
// used (fast, keyframe-aligned, but the source aspect ratio is preserved).
func Extract(ctx context.Context, input string, sel Selection, outPath string, reencode bool) error {
if _, err := exec.LookPath("ffmpeg"); err != nil {
return fmt.Errorf("ffmpeg not on PATH: %w", err)
}
if err := os.MkdirAll(filepath.Dir(outPath), 0o755); err != nil {
return err
}
dur := sel.EndSeconds - sel.StartSeconds
args := []string{
"-y",
"-loglevel", "error",
"-ss", fmt.Sprintf("%.3f", sel.StartSeconds),
"-i", input,
"-t", fmt.Sprintf("%.3f", dur),
}
if reencode {
if hasVideoExt(input) {
args = append(args,
"-vf", portraitFilter,
"-c:v", "libx264",
"-preset", "fast",
"-crf", "23",
"-c:a", "aac",
"-b:a", "128k",
"-movflags", "+faststart",
)
} else {
args = append(args,
"-vn",
"-c:a", "aac",
"-b:a", "128k",
)
}
} else {
args = append(args, "-c", "copy")
}
args = append(args, "-fs", fmt.Sprintf("%d", MaxClipBytes), outPath)
cmd := exec.CommandContext(ctx, "ffmpeg", args...)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("ffmpeg cut: %w", err)
}
return nil
}
func hasVideoExt(p string) bool {
switch strings.ToLower(filepath.Ext(p)) {
case ".mp4", ".mov", ".mkv", ".webm", ".avi", ".m4v", ".flv", ".ts":
return true
}
return false
}
// DefaultOutputPath builds <input-without-ext>.clip<ext> for video inputs and
// <input-without-ext>.clip.m4a for audio inputs.
func DefaultOutputPath(input string) string {
base := strings.TrimSuffix(input, filepath.Ext(input))
if hasVideoExt(input) {
return base + ".clip" + filepath.Ext(input)
}
return base + ".clip.m4a"
}

internal/clip/clip_test.go Normal file

@@ -0,0 +1,37 @@
package clip
import "testing"
func TestExtractJSONObject(t *testing.T) {
cases := []struct {
name string
in string
want string
}{
{"raw json", `{"a":1}`, `{"a":1}`},
{"with prose", "Sure, here you go:\n{\"a\":1}\nThanks", `{"a":1}`},
{"with fence", "```json\n{\"a\":1}\n```", `{"a":1}`},
{"nested", `prelude {"a":{"b":2},"c":3} trailing`, `{"a":{"b":2},"c":3}`},
{"brace in string", `{"text":"hello {world}"}`, `{"text":"hello {world}"}`},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
got, err := extractJSONObject(c.in)
if err != nil {
t.Fatalf("err: %v", err)
}
if got != c.want {
t.Errorf("got %q want %q", got, c.want)
}
})
}
}
func TestExtractJSONObjectMissing(t *testing.T) {
if _, err := extractJSONObject("no json here"); err == nil {
t.Error("expected error for missing JSON")
}
if _, err := extractJSONObject(`{"unterminated":`); err == nil {
t.Error("expected error for unbalanced braces")
}
}

internal/output/clipboard.go Normal file

@@ -0,0 +1,30 @@
package output
import (
"fmt"
"os/exec"
"strings"
)
// CopyToClipboard tries platform-appropriate clipboard tools and writes data
// to the first one available: wl-copy (Wayland), xclip (X11), pbcopy (macOS).
// Returns the tool name used or an error if none are available.
func CopyToClipboard(data string) (string, error) {
candidates := [][]string{
{"wl-copy"},
{"xclip", "-selection", "clipboard"},
{"pbcopy"},
}
for _, c := range candidates {
if _, err := exec.LookPath(c[0]); err != nil {
continue
}
cmd := exec.Command(c[0], c[1:]...)
cmd.Stdin = strings.NewReader(data)
if err := cmd.Run(); err != nil {
return "", fmt.Errorf("%s: %w", c[0], err)
}
return c[0], nil
}
return "", fmt.Errorf("no clipboard tool found (tried wl-copy, xclip, pbcopy)")
}

internal/output/spotify.go Normal file

@@ -0,0 +1,154 @@
// Package output renders summaries to user-visible formats. Markdown is
// passed through; Spotify HTML uses the small tag subset that Spotify for
// Podcasters' show-notes editor accepts (b, i, a, ul/ol/li, paragraphs).
package output
import (
"regexp"
"strings"
)
var (
reBoldStar = regexp.MustCompile(`\*\*([^*\n]+)\*\*`)
reBoldUnder = regexp.MustCompile(`__([^_\n]+)__`)
reItalicStar = regexp.MustCompile(`\*([^*\n]+)\*`)
reItalicUnder = regexp.MustCompile(`(^|[\s(])_([^_\n]+)_($|[\s).,!?;:])`)
reLink = regexp.MustCompile(`\[([^\]]+)\]\(([^)\s]+)\)`)
reInlineCode = regexp.MustCompile("`([^`\n]+)`")
)
// MarkdownToSpotifyHTML converts a markdown summary into the limited HTML
// subset Spotify for Podcasters renders. Unknown markdown structures degrade
// to plain text rather than producing rejected tags.
func MarkdownToSpotifyHTML(md string) string {
lines := strings.Split(strings.ReplaceAll(md, "\r\n", "\n"), "\n")
var out strings.Builder
listKind := "" // "ul" or "ol" while we're inside a list
flushList := func() {
if listKind != "" {
out.WriteString("</" + listKind + ">\n")
listKind = ""
}
}
openList := func(kind string) {
if listKind != kind {
flushList()
out.WriteString("<" + kind + ">\n")
listKind = kind
}
}
paragraph := []string{}
flushPara := func() {
if len(paragraph) == 0 {
return
}
text := strings.Join(paragraph, " ")
out.WriteString("<p>" + inline(text) + "</p>\n")
paragraph = paragraph[:0]
}
for _, raw := range lines {
line := strings.TrimRight(raw, " \t")
trim := strings.TrimSpace(line)
// Blank line: end current paragraph/list block.
if trim == "" {
flushPara()
flushList()
continue
}
// Horizontal rule.
if trim == "---" || trim == "***" || trim == "___" {
flushPara()
flushList()
continue
}
// Heading -> bold paragraph.
if h := headingText(trim); h != "" {
flushPara()
flushList()
out.WriteString("<p><b>" + inline(h) + "</b></p>\n")
continue
}
// Blockquote -> italic paragraph.
if strings.HasPrefix(trim, "> ") {
flushPara()
flushList()
out.WriteString("<p><i>" + inline(strings.TrimPrefix(trim, "> ")) + "</i></p>\n")
continue
}
// Unordered list item.
if strings.HasPrefix(trim, "- ") || strings.HasPrefix(trim, "* ") || strings.HasPrefix(trim, "+ ") {
flushPara()
openList("ul")
out.WriteString(" <li>" + inline(trim[2:]) + "</li>\n")
continue
}
// Ordered list item like "1. text".
if item, ok := orderedItem(trim); ok {
flushPara()
openList("ol")
out.WriteString(" <li>" + inline(item) + "</li>\n")
continue
}
// Anything else: append to current paragraph.
flushList()
paragraph = append(paragraph, trim)
}
flushPara()
flushList()
return strings.TrimRight(out.String(), "\n")
}
func headingText(s string) string {
// Up to 6 leading '#' followed by a space.
hashes := 0
for hashes < len(s) && s[hashes] == '#' {
hashes++
}
if hashes == 0 || hashes > 6 || hashes >= len(s) || s[hashes] != ' ' {
return ""
}
return strings.TrimSpace(s[hashes+1:])
}
func orderedItem(s string) (string, bool) {
i := 0
for i < len(s) && s[i] >= '0' && s[i] <= '9' {
i++
}
if i == 0 || i+1 >= len(s) || s[i] != '.' || s[i+1] != ' ' {
return "", false
}
return strings.TrimSpace(s[i+2:]), true
}
func inline(s string) string {
s = escapeHTML(s)
s = reInlineCode.ReplaceAllString(s, "$1")
s = reBoldStar.ReplaceAllString(s, "<b>$1</b>")
s = reBoldUnder.ReplaceAllString(s, "<b>$1</b>")
s = reItalicStar.ReplaceAllString(s, "<i>$1</i>")
s = reItalicUnder.ReplaceAllString(s, "$1<i>$2</i>$3")
s = reLink.ReplaceAllString(s, `<a href="$2">$1</a>`)
return s
}
func escapeHTML(s string) string {
r := strings.NewReplacer(
"&", "&amp;",
"<", "&lt;",
">", "&gt;",
`"`, "&quot;", // keep a quote in a link URL from closing the href attribute early
)
return r.Replace(s)
}

internal/output/spotify_test.go Normal file

@@ -0,0 +1,67 @@
package output
import (
"strings"
"testing"
)
func TestMarkdownToSpotifyHTML(t *testing.T) {
in := `# Sermon Title
**Speaker:** Pastor Bob
**Scripture:** John 3:16
## Overview
This was a *short* message about hope. See [the site](https://example.com).
## Key Points
- First point
- Second point with **bold** text
- Third one
1. Step one
2. Step two
> A pithy quote.
`
got := MarkdownToSpotifyHTML(in)
mustContain := []string{
"<p><b>Sermon Title</b></p>",
"<b>Speaker:</b>",
"<p><b>Overview</b></p>",
"<i>short</i>",
`<a href="https://example.com">the site</a>`,
"<ul>",
"<li>First point</li>",
"<li>Second point with <b>bold</b> text</li>",
"</ul>",
"<ol>",
"<li>Step one</li>",
"</ol>",
"<p><i>A pithy quote.</i></p>",
}
for _, s := range mustContain {
if !strings.Contains(got, s) {
t.Errorf("expected output to contain %q\n--- got ---\n%s", s, got)
}
}
mustNotContain := []string{"<h1>", "<h2>", "<blockquote>", "**", "##"}
for _, s := range mustNotContain {
if strings.Contains(got, s) {
t.Errorf("did not expect output to contain %q\n--- got ---\n%s", s, got)
}
}
}
func TestEscapesHTML(t *testing.T) {
got := MarkdownToSpotifyHTML("A <script>tag</script> & ampersand")
if strings.Contains(got, "<script>") {
t.Errorf("unescaped <script>: %s", got)
}
if !strings.Contains(got, "&amp;") {
t.Errorf("expected &amp; in: %s", got)
}
}

internal/summarize/anthropic.go Normal file

@@ -0,0 +1,123 @@
package summarize
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"time"
)
// Anthropic talks to the Claude Messages API directly via net/http to avoid an
// SDK dependency. Requires ANTHROPIC_API_KEY (or APIKey set explicitly).
type Anthropic struct {
APIKey string
Model string
MaxTokens int
BaseURL string // optional override; defaults to https://api.anthropic.com
Client *http.Client
}
func (a *Anthropic) Name() string { return "anthropic-api" }
type anthroMessage struct {
Role string `json:"role"`
Content string `json:"content"`
}
type anthroRequest struct {
Model string `json:"model"`
MaxTokens int `json:"max_tokens"`
System string `json:"system,omitempty"`
Messages []anthroMessage `json:"messages"`
}
type anthroContentBlock struct {
Type string `json:"type"`
Text string `json:"text"`
}
type anthroResponse struct {
Content []anthroContentBlock `json:"content"`
Error *struct {
Type string `json:"type"`
Message string `json:"message"`
} `json:"error,omitempty"`
}
func (a *Anthropic) Summarize(ctx context.Context, systemPrompt, userContent string) (string, error) {
key := a.APIKey
if key == "" {
key = os.Getenv("ANTHROPIC_API_KEY")
}
if key == "" {
return "", fmt.Errorf("ANTHROPIC_API_KEY is not set")
}
model := a.Model
if model == "" {
model = "claude-sonnet-4-6"
}
maxTokens := a.MaxTokens
if maxTokens == 0 {
maxTokens = 4096
}
baseURL := a.BaseURL
if baseURL == "" {
baseURL = "https://api.anthropic.com"
}
client := a.Client
if client == nil {
client = &http.Client{Timeout: 5 * time.Minute}
}
body := anthroRequest{
Model: model,
MaxTokens: maxTokens,
System: systemPrompt,
Messages: []anthroMessage{
{Role: "user", Content: userContent},
},
}
buf, err := json.Marshal(body)
if err != nil {
return "", err
}
req, err := http.NewRequestWithContext(ctx, "POST", baseURL+"/v1/messages", bytes.NewReader(buf))
if err != nil {
return "", err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("x-api-key", key)
req.Header.Set("anthropic-version", "2023-06-01")
resp, err := client.Do(req)
if err != nil {
return "", err
}
defer resp.Body.Close()
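// Surface read failures instead of decoding a truncated body.
respBody, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("reading anthropic response: %w", err)
}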
if resp.StatusCode/100 != 2 {
return "", fmt.Errorf("anthropic API %d: %s", resp.StatusCode, string(respBody))
}
var out anthroResponse
if err := json.Unmarshal(respBody, &out); err != nil {
return "", fmt.Errorf("decoding anthropic response: %w", err)
}
if out.Error != nil {
return "", fmt.Errorf("anthropic error: %s: %s", out.Error.Type, out.Error.Message)
}
var text bytes.Buffer
for _, c := range out.Content {
if c.Type == "text" {
text.WriteString(c.Text)
}
}
return text.String(), nil
}
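
The response handling above can be tried without calling the real API. The following is a minimal sketch, assuming a local httptest server stands in for the Messages endpoint in the way the BaseURL override would allow. The names joinText, fakeMessages, and summarizeVia are hypothetical and not part of this commit, and the canned payload is invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

type contentBlock struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// joinText concatenates only the "text" blocks of a Messages-style response,
// mirroring the loop at the end of Summarize.
func joinText(body []byte) (string, error) {
	var out struct {
		Content []contentBlock `json:"content"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		return "", err
	}
	var b bytes.Buffer
	for _, c := range out.Content {
		if c.Type == "text" {
			b.WriteString(c.Text)
		}
	}
	return b.String(), nil
}

// fakeMessages is a hypothetical stand-in for the API (an assumption for this
// sketch). It only checks the path and that an x-api-key header is present.
func fakeMessages() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/messages" || r.Header.Get("x-api-key") == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"content":[{"type":"text","text":"# Title"},{"type":"tool_use"},{"type":"text","text":"\n\nBody"}]}`)
	}))
}

// summarizeVia posts an empty JSON body to baseURL and joins the text blocks.
func summarizeVia(baseURL string) (string, error) {
	req, err := http.NewRequest("POST", baseURL+"/v1/messages", bytes.NewReader([]byte(`{}`)))
	if err != nil {
		return "", err
	}
	req.Header.Set("x-api-key", "test-key")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode/100 != 2 {
		return "", fmt.Errorf("status %d: %s", resp.StatusCode, body)
	}
	return joinText(body)
}

func main() {
	srv := fakeMessages()
	defer srv.Close()
	out, err := summarizeVia(srv.URL)
	fmt.Printf("%q %v\n", out, err)
}
```

Non-text blocks are skipped rather than treated as errors, which matches the behavior of the loop in Summarize.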

@@ -0,0 +1,49 @@
package summarize
import (
"bytes"
"context"
"fmt"
"os/exec"
"strings"
)
// ClaudeCLI shells out to the `claude` CLI in print mode. The transcript is
// sent on stdin so we don't bump into ARG_MAX for very long services.
type ClaudeCLI struct {
// Bin is the binary name; defaults to "claude".
Bin string
// Model passes through to `claude --model`. Empty leaves the CLI default.
Model string
// ExtraArgs are appended verbatim before the prompt arg.
ExtraArgs []string
}
func (c *ClaudeCLI) Name() string { return "claude-cli" }
func (c *ClaudeCLI) Summarize(ctx context.Context, systemPrompt, userContent string) (string, error) {
bin := c.Bin
if bin == "" {
bin = "claude"
}
if _, err := exec.LookPath(bin); err != nil {
return "", fmt.Errorf("%q not on PATH: %w", bin, err)
}
args := []string{"-p"}
if c.Model != "" {
args = append(args, "--model", c.Model)
}
args = append(args, c.ExtraArgs...)
args = append(args, systemPrompt)
cmd := exec.CommandContext(ctx, bin, args...)
cmd.Stdin = strings.NewReader(userContent)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return "", fmt.Errorf("%s: %w (stderr: %s)", bin, err, strings.TrimSpace(stderr.String()))
}
return strings.TrimSpace(stdout.String()), nil
}

@@ -0,0 +1,13 @@
// Package summarize turns a transcript + system prompt into a markdown summary.
package summarize
import "context"
// Summarizer produces a markdown summary (or other generation) guided by
// systemPrompt and given the user-message body. The body is passed verbatim:
// callers are responsible for any framing like "Transcript:", "Producer's
// notes:", or timestamped segment formatting.
type Summarizer interface {
Summarize(ctx context.Context, systemPrompt, userContent string) (string, error)
Name() string
}

@@ -0,0 +1,49 @@
package transcribe
import (
"fmt"
"strings"
)
// Segment is one timestamped chunk of a transcript.
type Segment struct {
Start float64 // seconds from start of audio
End float64
Text string
}
// PlainText joins all segments into a single transcript.
func PlainText(segs []Segment) string {
var b strings.Builder
for _, s := range segs {
b.WriteString(strings.TrimSpace(s.Text))
b.WriteByte(' ')
}
return strings.TrimSpace(b.String())
}
// FormatForLLM renders segments as one timestamped line each, suitable for
// feeding to a model that needs to pick a time window.
//
// [mm:ss] [mm:ss] text
func FormatForLLM(segs []Segment) string {
var b strings.Builder
for _, s := range segs {
fmt.Fprintf(&b, "[%s] [%s] %s\n", formatTS(s.Start), formatTS(s.End), strings.TrimSpace(s.Text))
}
return b.String()
}
func formatTS(seconds float64) string {
if seconds < 0 {
seconds = 0
}
total := int(seconds)
h := total / 3600
m := (total % 3600) / 60
s := total % 60
if h > 0 {
return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
}
return fmt.Sprintf("%02d:%02d", m, s)
}
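
To make the timestamp format concrete, here is a small sketch that copies Segment, formatTS, and FormatForLLM from the file above into a standalone program. The sample segments are invented. Note that formatTS truncates to whole seconds, so the lines the model sees carry one-second precision even though Segment stores fractional seconds.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment mirrors transcribe.Segment.
type Segment struct {
	Start float64
	End   float64
	Text  string
}

// formatTS is copied from segments.go: mm:ss, or hh:mm:ss from one hour on.
func formatTS(seconds float64) string {
	if seconds < 0 {
		seconds = 0
	}
	total := int(seconds) // fractional seconds are dropped here
	h := total / 3600
	m := (total % 3600) / 60
	s := total % 60
	if h > 0 {
		return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
	}
	return fmt.Sprintf("%02d:%02d", m, s)
}

// FormatForLLM is copied from segments.go.
func FormatForLLM(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		fmt.Fprintf(&b, "[%s] [%s] %s\n", formatTS(s.Start), formatTS(s.End), strings.TrimSpace(s.Text))
	}
	return b.String()
}

func main() {
	segs := []Segment{
		{Start: 0, End: 4.2, Text: " Turn with me to Romans chapter five."},
		{Start: 3725.9, End: 3731, Text: " Let us pray."},
	}
	fmt.Print(FormatForLLM(segs))
	// [00:00] [00:04] Turn with me to Romans chapter five.
	// [01:02:05] [01:02:11] Let us pray.
}
```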

@@ -0,0 +1,10 @@
// Package transcribe converts a normalized WAV file into a plain-text transcript.
package transcribe
import "context"
// Transcriber turns a 16kHz mono WAV at wavPath into a plaintext transcript.
type Transcriber interface {
Transcribe(ctx context.Context, wavPath string) (string, error)
Name() string
}

@@ -0,0 +1,213 @@
package transcribe
import (
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"runtime"
"strings"
"time"
)
// WhisperCPP shells out to a whisper.cpp CLI binary (whisper-cli, whisper-cpp,
// or legacy `main`) and reads its `-oj` JSON output. The binary must write a
// .json file at the requested output basename.
type WhisperCPP struct {
// Bin is the whisper.cpp binary name or absolute path.
Bin string
// Model is the path to a ggml whisper model (.bin).
Model string
// Language to force; empty means auto-detect.
Language string
// Threads to use; 0 lets whisper.cpp pick.
Threads int
// ExtraArgs are appended to the command verbatim.
ExtraArgs []string
// Verbose enables per-step diagnostic logging to stderr (which probe ran,
// which backend was selected, etc.). The selected backend is always logged
// on a single stderr line regardless of this flag.
Verbose bool
}
func (w *WhisperCPP) Name() string { return "whisper.cpp" }
func (w *WhisperCPP) Transcribe(ctx context.Context, wavPath string) (string, error) {
segs, err := w.TranscribeSegments(ctx, wavPath)
if err != nil {
return "", err
}
return PlainText(segs), nil
}
// TranscribeSegments runs whisper.cpp with JSON output and returns the
// per-segment timestamps (in seconds) and text.
func (w *WhisperCPP) TranscribeSegments(ctx context.Context, wavPath string) ([]Segment, error) {
bin, err := w.resolveBin()
if err != nil {
return nil, err
}
if w.Model == "" {
return nil, fmt.Errorf("whisper.cpp model path is required (--whisper-model)")
}
if _, err := os.Stat(w.Model); err != nil {
return nil, fmt.Errorf("whisper model not readable at %s: %w", w.Model, err)
}
dir := filepath.Dir(wavPath)
base := strings.TrimSuffix(filepath.Base(wavPath), filepath.Ext(wavPath))
outBase := filepath.Join(dir, base)
jsonPath := outBase + ".json"
_ = os.Remove(jsonPath)
args := []string{
"-m", w.Model,
"-f", wavPath,
"-oj",
"-of", outBase,
"--no-prints",
}
if w.Language != "" {
args = append(args, "-l", w.Language)
}
if w.Threads > 0 {
args = append(args, "-t", fmt.Sprintf("%d", w.Threads))
}
args = append(args, w.ExtraArgs...)
cmd := exec.CommandContext(ctx, bin, args...)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("%s: %w", bin, err)
}
data, err := os.ReadFile(jsonPath)
if err != nil {
return nil, fmt.Errorf("reading whisper json %s: %w", jsonPath, err)
}
return parseWhisperJSON(data)
}
// gpuBackend describes one accelerated whisper.cpp build we may pick at
// runtime. The binary is conventionally installed at ~/.local/bin/<bin> (or
// anywhere on PATH); the probe is a fast command that exits 0 only when the
// matching GPU runtime is actually usable on this machine.
type gpuBackend struct {
name string
bin string
probe []string
}
var gpuBackends = []gpuBackend{
{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
}
func (w *WhisperCPP) resolveBin() (string, error) {
if w.Bin != "" {
if _, err := exec.LookPath(w.Bin); err == nil {
return w.Bin, nil
}
if _, err := os.Stat(w.Bin); err == nil {
return w.Bin, nil
}
return "", fmt.Errorf("whisper.cpp binary %q not found on PATH or as a file path", w.Bin)
}
// Metal is always usable on macOS — no separate probe needed; if the
// binary exists we trust it.
if runtime.GOOS == "darwin" {
if path := findBinary("whisper-cli-metal"); path != "" {
fmt.Fprintf(os.Stderr, "whisper: using Metal backend (%s)\n", path)
return path, nil
}
}
for _, b := range gpuBackends {
path := findBinary(b.bin)
if path == "" {
if w.Verbose {
fmt.Fprintf(os.Stderr, "whisper: no %s binary (%s) installed; skipping\n", b.name, b.bin)
}
continue
}
if !probeSucceeds(b.probe) {
if w.Verbose {
fmt.Fprintf(os.Stderr, "whisper: %s binary present at %s but %s probe failed; trying next\n", b.name, path, b.probe[0])
}
continue
}
fmt.Fprintf(os.Stderr, "whisper: using %s backend (%s)\n", b.name, path)
return path, nil
}
for _, alt := range []string{"whisper-cli", "whisper-cpp", "main"} {
if path, e := exec.LookPath(alt); e == nil {
fmt.Fprintf(os.Stderr, "whisper: using CPU backend (%s)\n", path)
return path, nil
}
}
return "", fmt.Errorf("no whisper.cpp binary found (tried GPU builds whisper-cli-{cuda,rocm,vulkan} in ~/.local/bin and PATH, then CPU whisper-cli/whisper-cpp/main on PATH); pass --whisper-bin")
}
// findBinary looks for an executable first in ~/.local/bin (the convention
// for hand-built backends), then on PATH. Returns "" if neither has it.
func findBinary(name string) string {
if home, err := os.UserHomeDir(); err == nil {
candidate := filepath.Join(home, ".local", "bin", name)
if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
return candidate
}
}
if path, err := exec.LookPath(name); err == nil {
return path
}
return ""
}
// probeSucceeds runs the probe with a short timeout and reports whether it
// exited 0. Used to confirm the GPU runtime is actually usable before we
// commit to its whisper-cli build.
func probeSucceeds(argv []string) bool {
if _, err := exec.LookPath(argv[0]); err != nil {
return false
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
cmd := exec.CommandContext(ctx, argv[0], argv[1:]...)
return cmd.Run() == nil
}
// whisperJSONFile mirrors the structure whisper.cpp writes with -oj.
type whisperJSONFile struct {
Transcription []struct {
Offsets struct {
From int64 `json:"from"`
To int64 `json:"to"`
} `json:"offsets"`
Text string `json:"text"`
} `json:"transcription"`
}
func parseWhisperJSON(data []byte) ([]Segment, error) {
var f whisperJSONFile
if err := json.Unmarshal(data, &f); err != nil {
return nil, fmt.Errorf("parsing whisper JSON: %w", err)
}
if len(f.Transcription) == 0 {
return nil, fmt.Errorf("whisper produced no transcription segments")
}
out := make([]Segment, 0, len(f.Transcription))
for _, s := range f.Transcription {
out = append(out, Segment{
Start: float64(s.Offsets.From) / 1000.0,
End: float64(s.Offsets.To) / 1000.0,
Text: s.Text,
})
}
return out, nil
}
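
The conversion in parseWhisperJSON can be illustrated with a trimmed input. This sketch assumes the `-oj` file contains a `transcription` array whose entries carry millisecond `offsets`, as the struct above expects. The sample JSON is invented and omits every other field whisper.cpp writes, and the names segment, whisperJSON, and parse are local to the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type segment struct {
	Start float64 // seconds
	End   float64 // seconds
	Text  string
}

// whisperJSON keeps only the fields the parser reads.
type whisperJSON struct {
	Transcription []struct {
		Offsets struct {
			From int64 `json:"from"` // milliseconds
			To   int64 `json:"to"`   // milliseconds
		} `json:"offsets"`
		Text string `json:"text"`
	} `json:"transcription"`
}

// parse converts millisecond offsets to float seconds, like parseWhisperJSON.
func parse(data []byte) ([]segment, error) {
	var f whisperJSON
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parsing whisper JSON: %w", err)
	}
	if len(f.Transcription) == 0 {
		return nil, fmt.Errorf("whisper produced no transcription segments")
	}
	out := make([]segment, 0, len(f.Transcription))
	for _, s := range f.Transcription {
		out = append(out, segment{
			Start: float64(s.Offsets.From) / 1000.0,
			End:   float64(s.Offsets.To) / 1000.0,
			Text:  s.Text,
		})
	}
	return out, nil
}

// sample is an invented, heavily trimmed stand-in for a real output file.
const sample = `{"transcription":[
  {"offsets":{"from":0,"to":4200},"text":" Good morning."},
  {"offsets":{"from":4200,"to":9750},"text":" Please be seated."}
]}`

func main() {
	segs, err := parse([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range segs {
		fmt.Printf("%.2f-%.2f %q\n", s.Start, s.End, s.Text)
	}
}
```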

400
main.go Normal file

@@ -0,0 +1,400 @@
// publish generates a markdown summary, a 60-90s social hook clip, or both
// from a local audio/video file. Each mode is enabled by its own boolean flag.
package main
import (
"context"
_ "embed"
"encoding/json"
"flag"
"fmt"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"time"
"publish/internal/audio"
"publish/internal/clip"
"publish/internal/output"
"publish/internal/summarize"
"publish/internal/transcribe"
)
//go:embed prompts/church-service.md
var defaultSummaryPrompt string
//go:embed prompts/clip-selector.md
var defaultClipPrompt string
func main() {
if err := run(os.Args[1:]); err != nil {
fmt.Fprintln(os.Stderr, "publish: "+err.Error())
os.Exit(1)
}
}
type config struct {
input string
// mode selection
modeSummerize bool
modeClip bool
modePost bool
// shared
summarizer string
model string
promptSummary string
promptClip string
whisperBin string
whisperModel string
whisperLang string
whisperThreads int
segmentsCache string
keepWAV bool
keepTranscript bool
verbose bool
// --summerize inputs/outputs
prompt string
mdOut string
spotifyOut string
copyHTML bool
// --clip outputs
minSec float64
maxSec float64
clipOut string
copyCodec bool
dryRun bool
}
func run(args []string) error {
var cfg config
fs := flag.NewFlagSet("publish", flag.ContinueOnError)
// Mode flags.
fs.BoolVar(&cfg.modeSummerize, "summerize", false, "produce a markdown summary (default if no mode is set)")
fs.BoolVar(&cfg.modeClip, "clip", false, "pick a 60-90s hook clip and cut it out of the source")
fs.BoolVar(&cfg.modePost, "post", false, "post the summary to Spotify (not implemented yet)")
// Shared flags.
fs.StringVar(&cfg.summarizer, "summarizer", "claude-cli", "LLM backend: claude-cli | claude-api")
fs.StringVar(&cfg.model, "model", "", "model name (claude-api default: claude-sonnet-4-6)")
fs.StringVar(&cfg.promptSummary, "prompt-summary", "", "summary prompt path; empty uses bundled prompts/church-service.md")
fs.StringVar(&cfg.promptClip, "prompt-clip", "", "clip-selection prompt path; empty uses bundled prompts/clip-selector.md")
fs.StringVar(&cfg.whisperBin, "whisper-bin", "", "whisper.cpp binary (auto-detect if empty)")
fs.StringVar(&cfg.whisperModel, "whisper-model", defaultWhisperModel(), "whisper.cpp ggml model path")
fs.StringVar(&cfg.whisperLang, "whisper-lang", "", "force whisper language code (empty = auto)")
fs.IntVar(&cfg.whisperThreads, "whisper-threads", 0, "whisper.cpp thread count (0 = library default)")
fs.StringVar(&cfg.segmentsCache, "segments", "", `path to read/write whisper segments JSON; default: <input>.segments.json`)
fs.BoolVar(&cfg.keepWAV, "keep-wav", false, "keep the normalized 16kHz WAV next to the input")
fs.BoolVar(&cfg.keepTranscript, "keep-transcript", false, "also write <input>.transcript.txt")
fs.BoolVar(&cfg.verbose, "v", false, "verbose progress output")
// --summerize inputs/outputs.
fs.StringVar(&cfg.prompt, "prompt", "", "[--summerize] producer's notes to anchor the summary (titles, framing, key points). For longer notes use shell expansion: --prompt \"$(cat notes.txt)\"")
fs.StringVar(&cfg.mdOut, "md", "", `[--summerize] markdown output; "-" for stdout; default: <input>.summary.md`)
fs.StringVar(&cfg.spotifyOut, "spotify", "", `[--summerize] Spotify HTML output; "-" for stdout (default: disabled)`)
fs.BoolVar(&cfg.copyHTML, "copy", false, "[--summerize] copy Spotify HTML to clipboard")
// --clip outputs.
fs.Float64Var(&cfg.minSec, "min", 60, "[--clip] minimum clip length in seconds")
fs.Float64Var(&cfg.maxSec, "max", 90, "[--clip] maximum clip length in seconds")
fs.StringVar(&cfg.clipOut, "out", "", `[--clip] clip output path; default: <input>.clip<ext> (or .clip.m4a for audio)`)
fs.BoolVar(&cfg.copyCodec, "copy-codec", false, "[--clip] use ffmpeg stream copy instead of re-encoding (faster, keyframe-aligned)")
fs.BoolVar(&cfg.dryRun, "dry-run", false, "[--clip] pick the clip and print metadata, but skip the ffmpeg cut")
fs.Usage = func() {
fmt.Fprintf(os.Stderr, `usage: publish [mode...] [flags] <input>
modes (combine freely; defaults to --summerize):
--summerize write a markdown summary
--clip cut a 60-90s social hook clip
--post post to Spotify (not implemented yet)
flags:
`)
fs.PrintDefaults()
}
if err := fs.Parse(args); err != nil {
return err
}
if fs.NArg() != 1 {
fs.Usage()
return fmt.Errorf("exactly one input file is required")
}
cfg.input = fs.Arg(0)
// Default to --summerize if no mode flag was passed.
if !cfg.modeSummerize && !cfg.modeClip && !cfg.modePost {
cfg.modeSummerize = true
}
if cfg.modePost {
return fmt.Errorf("--post is not implemented yet")
}
// Output path defaults that depend on input.
if cfg.mdOut == "" {
cfg.mdOut = cfg.input + ".summary.md"
}
if cfg.mdOut == "-" && cfg.spotifyOut == "-" {
return fmt.Errorf("--md and --spotify cannot both be \"-\"")
}
if cfg.segmentsCache == "" {
cfg.segmentsCache = cfg.input + ".segments.json"
}
if cfg.clipOut == "" {
cfg.clipOut = clip.DefaultOutputPath(cfg.input)
}
if cfg.minSec <= 0 || cfg.maxSec <= 0 || cfg.maxSec < cfg.minSec {
return fmt.Errorf("invalid --min/--max bounds")
}
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer cancel()
segs, err := loadOrTranscribeSegments(ctx, cfg)
if err != nil {
return err
}
if cfg.keepTranscript {
if err := os.WriteFile(cfg.input+".transcript.txt", []byte(transcribe.PlainText(segs)), 0o644); err != nil {
return fmt.Errorf("writing transcript: %w", err)
}
}
sum, err := buildSummarizer(cfg.summarizer, cfg.model)
if err != nil {
return err
}
if cfg.modeSummerize {
if err := doSummerize(ctx, cfg, sum, segs); err != nil {
return err
}
}
if cfg.modeClip {
if err := doClip(ctx, cfg, sum, segs); err != nil {
return err
}
}
return nil
}
func doSummerize(ctx context.Context, cfg config, sum summarize.Summarizer, segs []transcribe.Segment) error {
systemPrompt, err := loadPrompt(cfg.promptSummary, defaultSummaryPrompt)
if err != nil {
return err
}
body := "Transcript:\n\n" + transcribe.PlainText(segs)
if notes := strings.TrimSpace(cfg.prompt); notes != "" {
body = "Producer's notes (treat these as authoritative for titles, framing, and key points; expand and enrich them using the transcript that follows):\n\n" +
notes + "\n\n---\n\n" + body
}
logIf(cfg.verbose, "summarizing with %s", sum.Name())
t0 := time.Now()
md, err := sum.Summarize(ctx, systemPrompt, body)
if err != nil {
return fmt.Errorf("summarize: %w", err)
}
md = strings.TrimSpace(md)
logIf(cfg.verbose, "summary ready (%d chars, %s)", len(md), time.Since(t0).Round(time.Second))
if err := writeOutput(cfg.mdOut, md); err != nil {
return fmt.Errorf("writing markdown: %w", err)
}
var html string
if cfg.spotifyOut != "" || cfg.copyHTML {
html = output.MarkdownToSpotifyHTML(md)
}
if cfg.spotifyOut != "" {
if err := writeOutput(cfg.spotifyOut, html); err != nil {
return fmt.Errorf("writing spotify HTML: %w", err)
}
}
if cfg.copyHTML {
tool, err := output.CopyToClipboard(html)
if err != nil {
return fmt.Errorf("clipboard: %w", err)
}
logIf(cfg.verbose, "Spotify HTML copied via %s", tool)
}
return nil
}
func doClip(ctx context.Context, cfg config, sum summarize.Summarizer, segs []transcribe.Segment) error {
prompt, err := loadPrompt(cfg.promptClip, defaultClipPrompt)
if err != nil {
return err
}
logIf(cfg.verbose, "selecting clip with %s (looking for %g-%gs window)", sum.Name(), cfg.minSec, cfg.maxSec)
t0 := time.Now()
sel, raw, err := clip.Pick(ctx, sum, prompt, segs, cfg.minSec, cfg.maxSec)
if err != nil {
if raw != "" {
fmt.Fprintf(os.Stderr, "model output:\n%s\n", raw)
}
return fmt.Errorf("selecting clip: %w", err)
}
logIf(cfg.verbose, "selection ready (%s)", time.Since(t0).Round(time.Second))
fmt.Printf("Title: %s\n", sel.Title)
fmt.Printf("Hook: %s\n", sel.Hook)
fmt.Printf("Quote: %s\n", sel.Quote)
fmt.Printf("Window: %s -> %s (%.1fs)\n", mmss(sel.StartSeconds), mmss(sel.EndSeconds), sel.Duration())
fmt.Printf("Reason: %s\n", sel.Reasoning)
if cfg.dryRun {
return nil
}
logIf(cfg.verbose, "cutting clip with ffmpeg -> %s", cfg.clipOut)
if err := clip.Extract(ctx, cfg.input, sel, cfg.clipOut, !cfg.copyCodec); err != nil {
return err
}
fmt.Printf("Wrote: %s\n", cfg.clipOut)
return nil
}
// loadOrTranscribeSegments reads cached whisper JSON if available; otherwise
// extracts audio, runs whisper, writes the cache, and returns segments.
func loadOrTranscribeSegments(ctx context.Context, cfg config) ([]transcribe.Segment, error) {
if data, err := os.ReadFile(cfg.segmentsCache); err == nil {
var segs []transcribe.Segment
if jerr := json.Unmarshal(data, &segs); jerr == nil && len(segs) > 0 {
logIf(cfg.verbose, "reusing cached segments from %s (%d segments)", cfg.segmentsCache, len(segs))
return segs, nil
}
}
wavPath, cleanup, err := prepareWAV(ctx, cfg.input, cfg.keepWAV, cfg.verbose)
if err != nil {
return nil, err
}
defer cleanup()
tr := buildTranscriber(cfg)
logIf(cfg.verbose, "transcribing with %s", tr.Name())
t0 := time.Now()
segs, err := tr.TranscribeSegments(ctx, wavPath)
if err != nil {
return nil, fmt.Errorf("transcribe: %w", err)
}
logIf(cfg.verbose, "transcript ready (%d segments, %s)", len(segs), time.Since(t0).Round(time.Second))
if data, err := json.Marshal(segs); err == nil {
_ = os.WriteFile(cfg.segmentsCache, data, 0o644)
logIf(cfg.verbose, "cached segments to %s", cfg.segmentsCache)
}
return segs, nil
}
// prepareWAV normalizes input to 16 kHz mono WAV. Returns the wav path and a
// cleanup function (no-op if keep is true).
func prepareWAV(ctx context.Context, input string, keep, verbose bool) (string, func(), error) {
wavPath := input + ".16k.wav"
cleanup := func() {}
if !keep {
tmpDir, err := os.MkdirTemp("", "publish-")
if err != nil {
return "", cleanup, err
}
wavPath = filepath.Join(tmpDir, "audio.wav")
cleanup = func() { _ = os.RemoveAll(tmpDir) }
}
logIf(verbose, "extracting audio -> %s", wavPath)
if err := audio.ExtractWAV(ctx, input, wavPath); err != nil {
cleanup()
return "", func() {}, fmt.Errorf("audio extraction: %w", err)
}
return wavPath, cleanup, nil
}
func loadPrompt(path, fallback string) (string, error) {
if path == "" {
return fallback, nil
}
b, err := os.ReadFile(expand(path))
if err != nil {
return "", fmt.Errorf("reading prompt %s: %w", path, err)
}
return string(b), nil
}
func buildTranscriber(cfg config) *transcribe.WhisperCPP {
return &transcribe.WhisperCPP{
Bin: cfg.whisperBin,
Model: expand(cfg.whisperModel),
Language: cfg.whisperLang,
Threads: cfg.whisperThreads,
Verbose: cfg.verbose,
}
}
func buildSummarizer(kind, model string) (summarize.Summarizer, error) {
switch kind {
case "claude-cli", "cli":
return &summarize.ClaudeCLI{Model: model}, nil
case "claude-api", "anthropic", "api":
return &summarize.Anthropic{Model: model}, nil
default:
return nil, fmt.Errorf("unknown summarizer %q", kind)
}
}
func writeOutput(path, data string) error {
if path == "" {
return nil
}
if path == "-" {
_, err := os.Stdout.WriteString(data + "\n")
return err
}
return os.WriteFile(expand(path), []byte(data+"\n"), 0o644)
}
func expand(p string) string {
if strings.HasPrefix(p, "~/") {
if home, err := os.UserHomeDir(); err == nil {
return filepath.Join(home, p[2:])
}
}
return p
}
func defaultWhisperModel() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
return filepath.Join(home, ".cache", "whisper.cpp", "ggml-base.en.bin")
}
func logIf(on bool, format string, args ...any) {
if !on {
return
}
fmt.Fprintf(os.Stderr, "[publish] "+format+"\n", args...)
}
func mmss(seconds float64) string {
if seconds < 0 {
seconds = 0
}
total := int(seconds)
h := total / 3600
m := (total % 3600) / 60
s := total % 60
if h > 0 {
return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
}
return fmt.Sprintf("%02d:%02d", m, s)
}
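
One detail of loadOrTranscribeSegments is worth spelling out: the cache is the JSON encoding of []transcribe.Segment, not whisper.cpp's own output file. Since Segment has no struct tags, the keys are the Go field names. The sketch below assumes nothing beyond encoding/json; encodeCache and decodeCache are hypothetical helpers that mirror the marshal and unmarshal calls above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Segment mirrors transcribe.Segment (no JSON tags, so keys are field names).
type Segment struct {
	Start float64
	End   float64
	Text  string
}

// encodeCache mirrors the json.Marshal call that writes the cache.
func encodeCache(segs []Segment) ([]byte, error) {
	return json.Marshal(segs)
}

// decodeCache mirrors the cache read: any decode error or an empty slice is
// treated as a miss, which sends the caller back to whisper.
func decodeCache(data []byte) ([]Segment, bool) {
	var segs []Segment
	if err := json.Unmarshal(data, &segs); err != nil || len(segs) == 0 {
		return nil, false
	}
	return segs, true
}

func main() {
	data, err := encodeCache([]Segment{{Start: 0, End: 4.2, Text: " Good morning."}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data))

	// A whisper-style object is not a valid cache and counts as a miss.
	_, ok := decodeCache([]byte(`{"transcription":[]}`))
	fmt.Println("whisper-style file accepted:", ok)
}
```

Under this reading, pointing --segments at a raw whisper.cpp JSON file would not raise an error. It would be treated as a cache miss, and the file would be overwritten after re-transcription.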

58
prompts/church-service.md Normal file

@@ -0,0 +1,58 @@
You are summarizing a recorded Christian church service. The transcript may include
welcome/announcements, worship songs, scripture readings, a sermon, prayer, and a
closing/benediction. The transcript is auto-generated and may contain misheard words,
missing punctuation, or speaker confusion — use context to interpret.
This is a King-James-Bible-preaching congregation. The summary must focus on the
preaching of the Bible (KJV). When you reference or quote scripture in any field
of the output, render it in King James Version English using KJV book/chapter/verse
formatting (e.g. "Romans 5:8", "1 Corinthians 13:13"). If the preacher quoted
scripture and the transcript mishears the wording, restore the KJV phrasing.
Reject purely motivational framing that lacks a biblical anchor — every section
should connect back to what the Bible says.
If the user message begins with a "Producer's notes" section before the transcript,
treat those notes as authoritative. Use the producer's title, speaker name, framing,
and any specific points they emphasize verbatim — do not contradict or rewrite them.
Use the transcript to expand, enrich, and back up what the producer wrote (filling
in scripture references, application points, memorable quotes, etc.).
Produce a faithful, neutral summary written in clear, accessible English. Do not
invent facts, doctrines, scripture references, or quotes that are not present in the
transcript or the producer's notes. If something is unclear, say so rather than guessing.
Output the summary in Markdown with this structure:
# {Sermon title or topic — infer from the message; if truly unknown, write "Sunday Service Summary"}
**Speaker:** {pastor/teacher name if identifiable, else "Unknown"}
**Scripture:** {primary passages referenced, comma-separated; "—" if none}
## Overview
A 2-4 sentence plain-English summary of the central message.
## Key Points
- 4-7 bullets capturing the main teaching points in the order they were made.
- Keep each bullet to one or two sentences.
## Scripture & References
- Bullet list of every scripture reference, book/author quoted, or notable resource
mentioned. Include the verse text only if it was read aloud and you can quote it
accurately from the transcript.
## Application / Call to Action
- What the speaker asked listeners to do, believe, or reflect on this week.
## Announcements & Prayer Requests
- Brief bullets for anything congregational: events, missions updates, prayer
requests, baptisms, etc. Omit this section entirely if none were mentioned.
## Memorable Quote
> One short, verbatim quote from the speaker that captures the heart of the message.
> Only include if you can quote it accurately from the transcript.
Style rules:
- Be concise. Total length: 250-500 words.
- Use the speaker's own framing and terminology when possible.
- Do not add commentary, critique, or your own theological interpretation.
- Do not include timestamps, filler ("uh", "you know"), or worship lyrics.

45
prompts/clip-selector.md Normal file

@@ -0,0 +1,45 @@
You are selecting the single best 60-90 second clip from a recorded Christian
church service to use as a social-media hook (Reels, Shorts, TikTok, X video).
This is a King-James-Bible-preaching congregation; the clip MUST come from the
preaching of the sermon and MUST be rooted in the Bible (KJV).
You will be given a timestamped transcript. Each line is formatted:
[mm:ss] [mm:ss] text
The first timestamp is the segment start, the second is its end. Times are
relative to the start of the recording.
Pick the clip that:
- Comes from the preaching (sermon exposition) — NOT worship music, opening
prayer, announcements, offering, "turn to verse X" housekeeping, altar call
logistics, or the benediction.
- Is anchored in the Bible — exposition of scripture, a scriptural truth being
applied, or a story tied directly to a biblical principle. Reject purely
motivational content with no biblical anchor.
- Stands alone without prior context — a viewer who jumps in cold should still
get the point.
- Has emotional weight, vivid language, a sharp insight, or a memorable story.
- Avoids mid-sentence cuts. Start at a natural beginning, end at a natural end.
- Is between {{MIN_SECONDS}} and {{MAX_SECONDS}} seconds long.
Respond with ONLY a JSON object — no preamble, no code fence, no commentary.
Use this exact shape:
{
"start_seconds": 123.4,
"end_seconds": 198.7,
"title": "short punchy title for the clip",
"hook": "one-sentence reason a viewer should stop scrolling",
"quote": "the most quotable line in the clip, verbatim from the transcript",
"reasoning": "1-2 sentences on why this is the best window in the service"
}
Use the segment timestamps you were given for start_seconds and end_seconds —
do not invent times. Keep the duration within the requested bounds.
Any scripture references you cite (in the "quote" field, the "hook", or the
"reasoning") must use the King James Version. If the preacher quoted scripture
within the chosen window, render that quote in KJV English even if the
transcript misheard the wording, and use KJV book/chapter/verse formatting
(e.g. "Romans 5:8", "1 Corinthians 13:13").
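
How a caller might fill in this template and check the reply can be sketched as follows. The real clip.Pick is not shown in this part of the diff, so everything here is an assumption: the selection struct is inferred from the JSON shape above, and fillPrompt and parseSelection are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// selection follows the JSON shape the prompt asks the model to return.
type selection struct {
	StartSeconds float64 `json:"start_seconds"`
	EndSeconds   float64 `json:"end_seconds"`
	Title        string  `json:"title"`
	Hook         string  `json:"hook"`
	Quote        string  `json:"quote"`
	Reasoning    string  `json:"reasoning"`
}

// fillPrompt substitutes the two placeholders used by clip-selector.md.
func fillPrompt(tmpl string, lo, hi float64) string {
	f := func(v float64) string { return strconv.FormatFloat(v, 'g', -1, 64) }
	return strings.NewReplacer(
		"{{MIN_SECONDS}}", f(lo),
		"{{MAX_SECONDS}}", f(hi),
	).Replace(tmpl)
}

// parseSelection decodes the model reply and rejects out-of-bounds windows.
func parseSelection(raw string, lo, hi float64) (selection, error) {
	var sel selection
	if err := json.Unmarshal([]byte(strings.TrimSpace(raw)), &sel); err != nil {
		return sel, fmt.Errorf("decoding selection: %w", err)
	}
	d := sel.EndSeconds - sel.StartSeconds
	if d < lo || d > hi {
		return sel, fmt.Errorf("clip is %.1fs, want %g-%gs", d, lo, hi)
	}
	return sel, nil
}

func main() {
	fmt.Println(fillPrompt("Is between {{MIN_SECONDS}} and {{MAX_SECONDS}} seconds long.", 60, 90))

	reply := `{"start_seconds": 123.4, "end_seconds": 198.7, "title": "example"}`
	sel, err := parseSelection(reply, 60, 90)
	fmt.Println(sel.Title, err)
}
```

Validating the duration in code rather than trusting the model keeps the --min/--max bounds enforceable even when the reply drifts from the requested window.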

480
scripts/install.sh Executable file

@@ -0,0 +1,480 @@
#!/usr/bin/env bash
# install.sh — interactive setup for `publish`.
#
# Detects OS + GPU, walks through:
# 1. system dependencies (ffmpeg, cmake, git, go, GPU runtime)
# 2. whisper.cpp checkout + build for the chosen backend
# 3. ggml model download
# 4. publish binary build + symlink into ~/.local/bin
#
# Re-runnable. Each step is idempotent and skippable.
#
# Pass --doctor to print detection info and exit.
set -euo pipefail
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
PREFIX="${PREFIX:-$HOME/.local}"
BINDIR="$PREFIX/bin"
MODELDIR="$HOME/.cache/whisper.cpp"
WHISPER_REPO="${WHISPER_REPO:-$HOME/Git Repos/whisper.cpp}"
WHISPER_GIT="https://github.com/ggerganov/whisper.cpp"
MODEL_BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
DOCTOR_ONLY=0
[[ "${1:-}" == "--doctor" ]] && DOCTOR_ONLY=1
bold() { printf '\033[1m%s\033[0m\n' "$*"; }
info() { printf ' %s\n' "$*"; }
warn() { printf '\033[33m warn: %s\033[0m\n' "$*" >&2; }
err() { printf '\033[31m error: %s\033[0m\n' "$*" >&2; }
hr() { printf -- '----------------------------------------\n'; }
# Portable CPU count: nproc on Linux, sysctl on macOS, fallback 4.
ncpu() {
if command -v nproc >/dev/null 2>&1; then nproc
elif command -v sysctl >/dev/null 2>&1; then sysctl -n hw.ncpu 2>/dev/null || echo 4
else echo 4; fi
}
ask_yn() {
# ask_yn "prompt" default(Y|N)
local prompt="$1" default="${2:-Y}" reply
local hint="[Y/n]"; [[ "$default" == "N" ]] && hint="[y/N]"
while true; do
read -r -p " $prompt $hint " reply || true
reply="${reply:-$default}"
case "$reply" in
[Yy]*) return 0 ;;
[Nn]*) return 1 ;;
esac
done
}
ask_choice() {
# ask_choice "prompt" default option1 option2 ...
local prompt="$1" default="$2"; shift 2
local options=("$@") reply
local opts_joined; opts_joined="$(IFS='/'; echo "${options[*]}")"
while true; do
read -r -p " $prompt [$opts_joined] (default: $default): " reply || true
reply="${reply:-$default}"
for opt in "${options[@]}"; do
[[ "$reply" == "$opt" ]] && { echo "$reply"; return 0; }
done
warn "must be one of: $opts_joined"
done
}
# ---------- detection ----------
detect_os() {
case "$(uname -s)" in
Linux*)
if command -v pacman >/dev/null 2>&1; then echo "arch"
elif command -v apt-get >/dev/null 2>&1; then echo "debian"
elif command -v dnf >/dev/null 2>&1; then echo "fedora"
else echo "linux-other"; fi ;;
Darwin*) echo "macos" ;;
*) echo "unknown" ;;
esac
}
detect_gpu() {
# apple silicon
if [[ "$(uname -s)" == "Darwin" ]]; then
[[ "$(uname -m)" == "arm64" ]] && { echo "apple"; return; }
echo "none"; return
fi
# nvidia
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
echo "nvidia"; return
fi
if command -v lspci >/dev/null 2>&1; then
local vga; vga="$(lspci 2>/dev/null | grep -iE 'vga|3d|display' || true)"
if echo "$vga" | grep -iq 'nvidia'; then echo "nvidia"; return; fi
if echo "$vga" | grep -iqE 'amd|ati|radeon'; then echo "amd"; return; fi
if echo "$vga" | grep -iq 'intel'; then echo "intel"; return; fi
fi
echo "none"
}
default_backend_for() {
case "$1" in
nvidia) echo "cuda" ;;
amd) echo "vulkan" ;; # easier to set up than rocm; user can pick rocm
apple) echo "metal" ;;
intel) echo "vulkan" ;;
*) echo "cpu" ;;
esac
}
OS="$(detect_os)"
GPU="$(detect_gpu)"
DEFAULT_BACKEND="$(default_backend_for "$GPU")"
# ---------- doctor ----------
print_detection() {
bold "=== publish — environment ==="
info "OS: $OS ($(uname -s) $(uname -r))"
info "Arch: $(uname -m)"
info "GPU: $GPU"
info "Default backend: $DEFAULT_BACKEND"
info "Repo dir: $REPO_DIR"
info "Install prefix: $PREFIX"
info "Whisper repo: $WHISPER_REPO"
info "Model dir: $MODELDIR"
hr
bold "Dependencies"
local deps=(go ffmpeg cmake git curl)
case "$DEFAULT_BACKEND" in
cuda) deps+=(nvidia-smi nvcc) ;;
rocm) deps+=(rocminfo hipcc) ;;
vulkan) deps+=(vulkaninfo glslc) ;;
esac
case "$OS" in
macos) deps+=(brew xcode-select pbcopy) ;;
*) deps+=(claude wl-copy xclip) ;;
esac
# Known off-PATH locations for tools that linux distros tuck away.
extra_path_for() {
case "$1" in
nvcc) echo /opt/cuda/bin/nvcc ;;
hipcc) echo /opt/rocm/bin/hipcc ;;
rocminfo) echo /opt/rocm/bin/rocminfo ;;
*) echo "" ;;
esac
}
for d in "${deps[@]}"; do
if command -v "$d" >/dev/null 2>&1; then
info "$(printf '%-14s %s' "$d:" "$(command -v "$d")")"
else
extra="$(extra_path_for "$d")"
if [[ -n "$extra" && -x "$extra" ]]; then
info "$(printf '%-14s %s' "$d:" "$extra (not on PATH)")"
else
info "$(printf '%-14s %s' "$d:" "MISSING")"
fi
fi
done
}
print_detection
if [[ $DOCTOR_ONLY -eq 1 ]]; then
exit 0
fi
hr
echo
# ---------- pick backend ----------
bold "Step 1 — pick whisper.cpp backend"
echo
echo " Detected default: $DEFAULT_BACKEND"
echo " Options: cuda | rocm | vulkan | metal | cpu | skip"
echo " cuda — NVIDIA GPU, fastest if you have one"
echo " rocm — AMD GPU via ROCm/HIP, fastest on supported AMD cards"
echo " vulkan — any GPU with a Vulkan driver (good cross-vendor fallback)"
echo " metal — Apple Silicon"
echo " cpu — no GPU, use the system whisper.cpp package"
echo " skip — leave whisper.cpp alone (e.g. you've already built it)"
echo
BACKEND="$(ask_choice "Backend" "$DEFAULT_BACKEND" cuda rocm vulkan metal cpu skip)"
# ---------- macOS preflight ----------
if [[ "$OS" == "macos" ]]; then
hr
bold "macOS preflight — Xcode Command Line Tools + Homebrew"
echo
if ! xcode-select -p >/dev/null 2>&1; then
warn "Xcode Command Line Tools not installed."
echo " Run: xcode-select --install"
echo " Then re-run 'make install'."
exit 1
else
info "Xcode CLT present at $(xcode-select -p)"
fi
if ! command -v brew >/dev/null 2>&1; then
warn "Homebrew not installed."
echo
echo " Install with:"
echo ' /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"'
echo
if ask_yn "Install Homebrew now?" Y; then
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Make brew available for the rest of this run (Apple Silicon installs to /opt/homebrew, Intel to /usr/local).
if [[ -x /opt/homebrew/bin/brew ]]; then
eval "$(/opt/homebrew/bin/brew shellenv)"
elif [[ -x /usr/local/bin/brew ]]; then
eval "$(/usr/local/bin/brew shellenv)"
fi
else
err "Homebrew is required for the macOS install path. Install it and re-run."
exit 1
fi
else
info "Homebrew present at $(command -v brew)"
fi
fi
# ---------- system deps ----------
hr
bold "Step 2 — system dependencies"
echo
base_pkgs_arch=(go ffmpeg cmake git curl)
base_pkgs_debian=(golang-go ffmpeg cmake git curl build-essential)
base_pkgs_fedora=(golang ffmpeg cmake git curl gcc-c++)
base_pkgs_macos=(go ffmpeg cmake git curl)
backend_pkgs_arch_cuda=(cuda gcc15)
backend_pkgs_arch_rocm=(rocm-hip-sdk rocm-hip-runtime hipblas rocblas)
backend_pkgs_arch_vulkan=(vulkan-headers vulkan-icd-loader shaderc)
backend_pkgs_arch_cpu=(whisper.cpp)
backend_pkgs_macos_metal=() # xcode-select on macOS provides Metal toolchain
backend_pkgs_macos_cpu=(whisper-cpp)
pkgs_to_install=()
case "$OS" in
arch)
pkgs_to_install+=("${base_pkgs_arch[@]}")
case "$BACKEND" in
cuda) pkgs_to_install+=("${backend_pkgs_arch_cuda[@]}") ;;
rocm) pkgs_to_install+=("${backend_pkgs_arch_rocm[@]}") ;;
vulkan) pkgs_to_install+=("${backend_pkgs_arch_vulkan[@]}") ;;
cpu) pkgs_to_install+=("${backend_pkgs_arch_cpu[@]}") ;;
esac
;;
macos)
pkgs_to_install+=("${base_pkgs_macos[@]}")
[[ "$BACKEND" == "cpu" ]] && pkgs_to_install+=("${backend_pkgs_macos_cpu[@]}")
;;
debian)
pkgs_to_install+=("${base_pkgs_debian[@]}")
[[ "$BACKEND" == "skip" ]] || warn "Debian/Ubuntu: only the base packages are installed here; set up whatever the '$BACKEND' backend needs manually."
;;
fedora)
pkgs_to_install+=("${base_pkgs_fedora[@]}")
[[ "$BACKEND" == "skip" ]] || warn "Fedora: only the base packages are installed here; set up whatever the '$BACKEND' backend needs manually."
;;
*)
warn "Unknown OS '$OS' — install $BACKEND runtime, ffmpeg, cmake, git, and Go manually."
;;
esac
missing=()
for p in "${pkgs_to_install[@]}"; do
case "$p" in
# package name -> binary (or path) used to decide whether it is already installed
go) command -v go >/dev/null 2>&1 || missing+=("$p") ;;
golang-go|golang) command -v go >/dev/null 2>&1 || missing+=("$p") ;;
ffmpeg) command -v ffmpeg >/dev/null 2>&1 || missing+=("$p") ;;
cmake) command -v cmake >/dev/null 2>&1 || missing+=("$p") ;;
git) command -v git >/dev/null 2>&1 || missing+=("$p") ;;
curl) command -v curl >/dev/null 2>&1 || missing+=("$p") ;;
cuda) command -v nvcc >/dev/null 2>&1 || missing+=("$p") ;;
gcc15) [[ -x /usr/bin/g++-15 ]] || missing+=("$p") ;;
rocm-hip-sdk) command -v hipcc >/dev/null 2>&1 || missing+=("$p") ;;
rocm-hip-runtime|hipblas|rocblas) [[ -d /opt/rocm ]] || missing+=("$p") ;;
vulkan-headers|vulkan-icd-loader) command -v vulkaninfo >/dev/null 2>&1 || missing+=("$p") ;;
shaderc) command -v glslc >/dev/null 2>&1 || missing+=("$p") ;;
whisper.cpp|whisper-cpp) command -v whisper-cli >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || missing+=("$p") ;;
build-essential|gcc-c++) command -v g++ >/dev/null 2>&1 || missing+=("$p") ;;
*) missing+=("$p") ;;
esac
done
if (( ${#missing[@]} == 0 )); then
info "All system dependencies present."
else
info "Missing: ${missing[*]}"
case "$OS" in
arch) cmd="sudo pacman -S --needed ${missing[*]}" ;;
debian) cmd="sudo apt-get update && sudo apt-get install -y ${missing[*]}" ;;
fedora) cmd="sudo dnf install -y ${missing[*]}" ;;
macos) cmd="brew install ${missing[*]}" ;;
*) cmd="" ;;
esac
if [[ -n "$cmd" ]]; then
echo
echo " Suggested install command:"
echo " $cmd"
echo
if ask_yn "Run it now?" Y; then
bash -c "$cmd"
else
warn "Skipped. Re-run after installing manually."
fi
fi
fi
# ---------- whisper.cpp build ----------
build_whisper() {
local backend="$1"
if [[ "$backend" == "cpu" ]]; then
if command -v whisper-cli >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1; then
info "Using system whisper.cpp; no build needed."
return 0
fi
err "No system whisper.cpp found and 'cpu' chosen; install the package or pick another backend."
return 1
fi
if [[ "$backend" == "skip" ]]; then
info "Skipping whisper.cpp."
return 0
fi
if [[ ! -d "$WHISPER_REPO/.git" ]]; then
info "Cloning whisper.cpp into $WHISPER_REPO"
mkdir -p "$(dirname "$WHISPER_REPO")"
git clone --depth=1 "$WHISPER_GIT" "$WHISPER_REPO"
else
if ask_yn "whisper.cpp already at $WHISPER_REPO — git pull latest?" N; then
git -C "$WHISPER_REPO" pull --ff-only || warn "git pull failed; continuing with current checkout"
fi
fi
local build_dir="$WHISPER_REPO/build-$backend"
info "Configuring $backend build in $build_dir"
case "$backend" in
cuda)
local host_cxx=""
[[ -x /usr/bin/g++-15 ]] && host_cxx="-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15"
local arch="86"
if command -v nvidia-smi >/dev/null 2>&1; then
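# '|| true' keeps a failing nvidia-smi from aborting the script if 'set -e' and 'pipefail' are on.
local cap; cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -1 | tr -d '. ' || true)"
# Accept only a purely numeric value so unexpected output (for example an error message) is ignored.
[[ "$cap" =~ ^[0-9]+$ ]] && arch="$cap"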
fi
info " CUDA arch: sm_$arch"
PATH="/opt/cuda/bin:$PATH" cmake -S "$WHISPER_REPO" -B "$build_dir" \
-DGGML_CUDA=1 \
-DCMAKE_CUDA_ARCHITECTURES="$arch" \
$host_cxx \
-DCMAKE_BUILD_TYPE=Release
PATH="/opt/cuda/bin:$PATH" cmake --build "$build_dir" -j"$(ncpu)" --config Release
;;
rocm)
local gpu_arch="gfx1102"
if command -v rocminfo >/dev/null 2>&1; then
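# awk exits at the first match, which can end rocminfo with SIGPIPE; '|| true' keeps that from aborting the script when 'pipefail' is on.
local detected; detected="$(rocminfo 2>/dev/null | awk '/Name:[[:space:]]+gfx/ {print $2; exit}' || true)"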
[[ -n "$detected" ]] && gpu_arch="$detected"
fi
info " AMDGPU target: $gpu_arch"
HIPCXX="${HIPCXX:-/opt/rocm/llvm/bin/clang++}" \
cmake -S "$WHISPER_REPO" -B "$build_dir" \
-DGGML_HIP=1 \
-DAMDGPU_TARGETS="$gpu_arch" \
-DCMAKE_BUILD_TYPE=Release
cmake --build "$build_dir" -j"$(ncpu)"
;;
vulkan)
cmake -S "$WHISPER_REPO" -B "$build_dir" \
-DGGML_VULKAN=1 \
-DCMAKE_BUILD_TYPE=Release
cmake --build "$build_dir" -j"$(ncpu)"
;;
metal)
# Metal is on by default on Apple Silicon; no special flag.
cmake -S "$WHISPER_REPO" -B "$build_dir" -DCMAKE_BUILD_TYPE=Release
cmake --build "$build_dir" -j"$(ncpu)"
;;
*)
err "Unknown backend: $backend"; return 1 ;;
esac
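# Verify the build actually produced the binary before linking, so we never leave a dangling symlink.
local bin="$build_dir/bin/whisper-cli"
if [[ ! -x "$bin" ]]; then
err "Build finished but $bin was not found."
return 1
fi
mkdir -p "$BINDIR"
local link="$BINDIR/whisper-cli-$backend"
ln -sf "$bin" "$link"
info "linked $link -> $bin"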
}
hr
bold "Step 3 — whisper.cpp"
echo
build_whisper "$BACKEND"
# ---------- model download ----------
hr
bold "Step 4 — whisper model"
echo
mkdir -p "$MODELDIR"
existing_models=()
# Glob instead of parsing ls; an unmatched pattern stays literal and is skipped by the -e test.
for m in "$MODELDIR"/ggml-*.bin; do
[[ -e "$m" ]] || continue
existing_models+=("$(basename "$m")")
done
if (( ${#existing_models[@]} > 0 )); then
info "Existing models in $MODELDIR:"
for m in "${existing_models[@]}"; do info " - $m"; done
fi
if ask_yn "Download a model now?" "$([ ${#existing_models[@]} -eq 0 ] && echo Y || echo N)"; then
echo
echo " Sizes (English-only suffix .en is faster on English audio):"
echo " tiny.en ~75MB"
echo " base.en ~142MB (default; good speed/quality balance)"
echo " small.en ~466MB"
echo " medium.en ~1.5GB"
echo " large-v3 ~2.9GB (multilingual, highest quality)"
echo
SIZE="$(ask_choice "Model" "base.en" tiny.en base.en small.en medium.en large-v3)"
target="$MODELDIR/ggml-$SIZE.bin"
if [[ -f "$target" ]]; then
info "$target already exists; skipping download."
else
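info "Downloading ggml-$SIZE.bin ..."
# Download to a temporary name so an interrupted transfer is not mistaken for a complete model on the next run.
curl -L --fail -o "$target.part" "$MODEL_BASE_URL/ggml-$SIZE.bin"
mv "$target.part" "$target"
info "saved to $target"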
fi
fi
# ---------- publish build + symlink ----------
hr
bold "Step 5 — publish binary"
echo
if ! command -v go >/dev/null 2>&1; then
err "Go is not installed; cannot build publish."
exit 1
fi
(cd "$REPO_DIR" && go build -o publish .)
info "built $REPO_DIR/publish"
mkdir -p "$BINDIR"
ln -sf "$REPO_DIR/publish" "$BINDIR/publish"
info "linked $BINDIR/publish -> $REPO_DIR/publish"
# ---------- summarizer hint ----------
hr
bold "Step 6 — summarizer"
echo
if command -v claude >/dev/null 2>&1; then
info "Found 'claude' CLI — default --summarizer claude-cli will work."
elif [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then
info "ANTHROPIC_API_KEY set — use --summarizer claude-api."
else
warn "Neither 'claude' CLI nor ANTHROPIC_API_KEY found. Install Claude Code or export ANTHROPIC_API_KEY before running summaries."
fi
# ---------- done ----------
hr
bold "Done."
echo
echo " Quick sanity check:"
echo " publish --help"
echo " bash scripts/install.sh --doctor # re-check the environment without installing anything"
echo
echo " Make sure $BINDIR is on your PATH."
case ":$PATH:" in
*":$BINDIR:"*) ;;
*) warn "$BINDIR is not currently on your PATH; add it with: export PATH=\"$BINDIR:\$PATH\"" ;;
esac