Initial push to gitea
9
.gitignore
vendored
Normal file
@@ -0,0 +1,9 @@
/publish
*.summary.md
*.spotify.html
*.transcript.txt
*.segments.json
*.16k.wav
*.clip.mp4
*.clip.m4a
.env
343
CLAUDE.md
Normal file
@@ -0,0 +1,343 @@
# publish

Go CLI that turns a local audio/video recording of a church service into:

1. A markdown summary (`--summerize`)
2. A 60–90s social-media hook clip cut from the source (`--clip`)
3. (Future) A post to Spotify for Podcasters (`--post`, currently a stub)

The repo directory is still `summerize/` (historical), but the module and binary
are both `publish`.

## Pipeline (one pass, shared by all modes)

```
input ──ffmpeg──► 16kHz mono WAV ──whisper.cpp -oj──► []Segment{Start,End,Text}
                                                                  │
                                                ┌─────────────────┴─────────────────┐
                                                │                                   │
                                         PlainText(segs)                   FormatForLLM(segs)
                                                │                                   │
                                                ▼                                   ▼
                                           Summarizer                           clip.Pick
                                        (Anthropic API or                   (same Summarizer,
                                         shelled-out claude CLI)             different prompt → JSON)
                                                │                                   │
                                                ▼                                   ▼
                                        markdown summary                 ffmpeg cut [start,end]
                                                │
                                                └─► output.MarkdownToSpotifyHTML
                                                    (b/i/a/ul/ol/li/p subset
                                                     that Spotify show notes accept)
```

Whisper output is cached at `<input>.segments.json`. Subsequent runs (different
modes, a different `--prompt-clip`, or different `--min`/`--max`) skip whisper entirely.

## Layout

```
main.go                       flat flagset, mode dispatch, orchestration
prompts/church-service.md     default --summerize prompt (go:embed)
prompts/clip-selector.md      default --clip prompt; templated with
                              {{MIN_SECONDS}} / {{MAX_SECONDS}} (go:embed)
internal/audio/audio.go       ffmpeg → 16kHz mono PCM WAV
internal/transcribe/
  transcribe.go               Transcriber interface, Segment type
  segments.go                 Segment, PlainText, FormatForLLM, mm:ss helper
  whispercpp.go               shells out to whisper-cli with -oj; parses JSON
internal/summarize/
  summarize.go                Summarizer interface
  anthropic.go                direct Messages API via net/http
                              (no SDK dep; reads ANTHROPIC_API_KEY)
  claudecli.go                `claude -p <prompt>` with transcript on stdin
internal/clip/
  clip.go                     Selection, Pick (LLM JSON parse), Extract (ffmpeg)
  clip_test.go                JSON object extraction edge cases
internal/output/
  spotify.go                  markdown → Spotify-safe HTML
  spotify_test.go
  clipboard.go                wl-copy / xclip / pbcopy
Makefile                      build/install/link/doctor/uninstall targets
scripts/install.sh            interactive setup (OS + GPU detect → deps,
                              whisper.cpp build, model download, link)
```
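
This sketch shows the `-oj` parsing step in `whispercpp.go`. It assumes whisper.cpp's JSON output has a `transcription` array whose `offsets.from` / `offsets.to` values are in milliseconds; verify that against your whisper.cpp build. `parseWhisperJSON` is a hypothetical name, and dropping whitespace-only segments is a choice made only in this sketch:

```go
package main

import (
	"encoding/json"
	"strings"
)

// Segment is the pipeline's unit: a time window (in seconds) plus its text.
type Segment struct {
	Start, End float64 // seconds
	Text       string
}

// whisperJSON covers only the fields this sketch reads from the -oj output.
type whisperJSON struct {
	Transcription []struct {
		Offsets struct {
			From int64 `json:"from"` // milliseconds (assumed)
			To   int64 `json:"to"`
		} `json:"offsets"`
		Text string `json:"text"`
	} `json:"transcription"`
}

// parseWhisperJSON converts whisper.cpp JSON into Segments in seconds.
func parseWhisperJSON(data []byte) ([]Segment, error) {
	var doc whisperJSON
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	segs := make([]Segment, 0, len(doc.Transcription))
	for _, t := range doc.Transcription {
		text := strings.TrimSpace(t.Text)
		if text == "" {
			continue // skip silent or empty segments
		}
		segs = append(segs, Segment{
			Start: float64(t.Offsets.From) / 1000,
			End:   float64(t.Offsets.To) / 1000,
			Text:  text,
		})
	}
	return segs, nil
}
```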

**Zero external Go dependencies.** Stdlib only.

## CLI surface

```
publish [mode...] [flags] <input>

modes (combine freely; defaults to --summerize):
  --summerize    write a markdown summary
  --clip         cut a 60-90s social hook clip
  --post         post to Spotify (not implemented yet)
```

Modes share whisper output, so `publish --summerize --clip sermon.mp4` only
transcribes once.
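
Here is a sketch of a flat flagset with boolean modes, using the stdlib `flag` package. `parseModes` and the `modes` struct are hypothetical; the real dispatch lives in `main.go`:

```go
package main

import (
	"flag"
	"io"
)

// modes holds the boolean mode flags; names match the CLI surface above.
type modes struct {
	Summerize, Clip, Post bool
}

// parseModes parses mode flags from args and returns the remaining
// positional arguments (the input path). --summerize is implied when no
// mode flag is given.
func parseModes(args []string) (modes, []string, error) {
	fs := flag.NewFlagSet("publish", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how to report errors
	var m modes
	fs.BoolVar(&m.Summerize, "summerize", false, "write a markdown summary")
	fs.BoolVar(&m.Clip, "clip", false, "cut a 60-90s social hook clip")
	fs.BoolVar(&m.Post, "post", false, "post to Spotify (not implemented yet)")
	if err := fs.Parse(args); err != nil {
		return modes{}, nil, err
	}
	if !m.Summerize && !m.Clip && !m.Post {
		m.Summerize = true // default mode
	}
	return m, fs.Args(), nil
}
```

Go's `flag` package stops parsing at the first non-flag argument, so in this sketch modes and flags must come before `<input>`, as in the usage line.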

### Shared flags

| flag | purpose | default |
|---|---|---|
| `--summarizer` | `claude-cli` or `claude-api` | `claude-cli` |
| `--model` | model name (Anthropic API path defaults to `claude-sonnet-4-6`) | empty |
| `--prompt-summary` | override summary prompt path | bundled |
| `--prompt-clip` | override clip-selector prompt path | bundled |
| `--whisper-bin` | whisper.cpp binary; auto-detects best backend (see "Backend auto-detect" below) | auto |
| `--whisper-model` | path to ggml model | `~/.cache/whisper.cpp/ggml-base.en.bin` |
| `--whisper-lang` | force language code | auto-detect |
| `--whisper-threads` | thread count | library default |
| `--segments` | segments JSON cache path | `<input>.segments.json` |
| `--keep-transcript` | also write `<input>.transcript.txt` | off |
| `--keep-wav` | keep the normalized WAV instead of writing it to a temp dir | off |
| `-v` | verbose progress to stderr | off |

### --summerize flags

| flag | purpose | default |
|---|---|---|
| `--prompt` | producer's notes (any pre-written framing, title, key points) that anchor the summary | empty |
| `--md PATH` | markdown output; `-` = stdout, `""` = disable | `<input>.summary.md` |
| `--spotify PATH` | Spotify HTML output; `-` = stdout | disabled |
| `--copy` | copy Spotify HTML to clipboard | off |

When `--prompt` is set, its value is prepended to the user message as a "Producer's notes" block above the transcript. The bundled prompt instructs the LLM to treat producer's notes as authoritative for titles, speaker names, framing, and key points, and to use the transcript to expand and enrich them. Use this when Spotify show notes you've already drafted should drive the summary's framing, rather than having the LLM infer everything from scratch.

For longer notes, use shell expansion: `--prompt "$(cat notes.md)"`.

Note: `--prompt-summary` (path to a system-prompt template) and `--prompt` (the notes text itself) are different flags. The former overrides the *system* prompt; the latter prepends *user content* to the transcript in the user message.

### --clip flags

| flag | purpose | default |
|---|---|---|
| `--min` | minimum clip length (seconds) | 60 |
| `--max` | maximum clip length (seconds) | 90 |
| `--out PATH` | clip output path | `<input-without-ext>.clip<ext>` (`<input-without-ext>.clip.m4a` for audio) |
| `--copy-codec` | ffmpeg `-c copy` (fast, keyframe-aligned); **skips the 9:16 portrait crop**, since stream copy can't apply video filters | off |
| `--dry-run` | print the picked window but don't run ffmpeg | off |

Unless `--copy-codec` is set, video clips are re-encoded as **1080×1920 portrait (9:16)**
with a center crop. Every clip, re-encoded or stream-copied, is capped at **1 GiB** via
ffmpeg's `-fs`. The crop filter is `crop=min(iw,ih*9/16):min(ih,iw*16/9)`, so any
source aspect (16:9, 4:3, 1:1, or already-portrait) yields the largest 9:16
sub-rectangle without distortion. See `portraitFilter` and `MaxClipBytes` in
`internal/clip/clip.go`.
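
Here is the crop arithmetic sketched in Go with integer math. `portraitCrop` is hypothetical, since the real code only passes the expression to ffmpeg. ffmpeg evaluates in floating point and may round down further for chroma-subsampled formats, so treat the result as approximate:

```go
package main

// minInt returns the smaller of a and b.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// portraitCrop mirrors crop=min(iw,ih*9/16):min(ih,iw*16/9): the largest
// 9:16 box that fits inside an iw x ih frame. Integer division truncates,
// so results may differ from ffmpeg's by a pixel.
func portraitCrop(iw, ih int) (w, h int) {
	w = minInt(iw, ih*9/16) // wide sources: width is limited by height
	h = minInt(ih, iw*16/9) // very tall sources: height is limited by width
	return w, h
}
```

For a 1920×1080 source this keeps a 607×1080 center strip, which `scale=1080:1920` then upscales. A 1080×1920 source passes through uncropped.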

## Conventions / non-obvious choices

- **Spelling: `summerize` is intentional.** It's the original name of the project
  and the user's preferred spelling. Use `summerize` (e.g. for `--summerize`)
  rather than auto-correcting to `summarize` in user-facing surfaces. The internal
  Go package `internal/summarize` keeps the standard spelling.
- **Pluggable Summarizer is shared between modes.** `--clip` reuses the same
  Summarizer interface; the only differences are the prompt and the expectation
  of JSON output. If you add a new mode that talks to an LLM, plug it in there.
- **Summarizer.Summarize takes the user content verbatim.** No implicit
  `"Transcript:"` prefix or other framing. Callers (`doSummerize` in main.go,
  `clip.Pick`) build the full user message themselves. That's how
  `--prompt` (producer's notes) prepends a "Producer's notes:" block above
  the transcript without the message getting mislabeled.
- **Whisper output is the source of truth.** All text-only consumers go through
  `transcribe.PlainText(segs)`; we don't run whisper twice.
- **JSON parsing for clip selection is defensive.** `clip.extractJSONObject`
  walks balanced braces (skipping strings), so parsing still succeeds when the
  model wraps its answer in prose or a code fence despite the prompt asking for raw JSON.
- **Clip extraction defaults to re-encode.** Frame-accurate cuts matter for
  short social hooks; `--copy-codec` trades that for speed.
- **Anthropic API call uses net/http directly.** Adding the SDK was tempting,
  but the request is one POST and avoiding the dep keeps go.sum empty.
- **`prepareWAV` cleanup is owned by the caller.** It returns a `func()` you
  must `defer`. Don't call `os.RemoveAll` on the wav path yourself.
- **No subcommands.** The CLI is one flat flagset. Modes are boolean flags so
  multiple can run in one invocation and share state.

## Build / install

**Fresh machine (recommended):** clone, then run the interactive installer.
It detects OS + GPU, builds whisper.cpp with the right backend, downloads a
ggml model, and links `publish` + `whisper-cli-<backend>` into `~/.local/bin`:

```bash
git clone <repo-url> ~/Git\ Repos/summerize
cd ~/Git\ Repos/summerize
make install   # interactive
make doctor    # just print detected platform/GPU/dependencies
```

Re-runnable; each step is idempotent and skippable. The script supports Arch
(`pacman`), Debian/Ubuntu (`apt`), Fedora (`dnf`), and macOS (`brew`); for
unknown distros it prints the package list and skips the install command.

**Already built once?** Just rebuild:

```bash
go build -o publish .
# or
make link   # rebuilds + (re)points ~/.local/bin/publish at the repo
```

The symlink at `~/.local/bin/publish` is the canonical install location;
rebuilds update in place via the symlink.

## External dependencies (runtime)

| tool | required for | install |
|---|---|---|
| `ffmpeg` | always (audio extraction + clip cut) | `pacman -S ffmpeg` |
| `whisper-cli` (whisper.cpp) | transcription | `pacman -S whisper.cpp` for CPU; for GPU acceleration see the per-backend builds below |
| ggml whisper model | transcription | `curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin` into `~/.cache/whisper.cpp/` |
| `claude` CLI | `--summarizer claude-cli` (default) | already installed (Claude Code) |
| `ANTHROPIC_API_KEY` | `--summarizer claude-api` | env var |
| `wl-copy` / `xclip` / `pbcopy` | `--copy` flag (Spotify HTML to clipboard) | `wl-copy` ships with Omarchy's Wayland setup |

## Backend auto-detect

When `--whisper-bin` is not set, `resolveBin` in
`internal/transcribe/whispercpp.go` picks a backend at runtime:

1. **CUDA**: if `~/.local/bin/whisper-cli-cuda` (or `whisper-cli-cuda` on PATH) exists *and* `nvidia-smi -L` exits 0.
2. **ROCm**: if `whisper-cli-rocm` exists *and* `rocminfo` exits 0.
3. **Vulkan**: if `whisper-cli-vulkan` exists *and* `vulkaninfo --summary` exits 0.
4. **CPU fallback**: first of `whisper-cli` / `whisper-cpp` / `main` on PATH.

Each probe is subject to a 5s timeout. The chosen backend is logged on a
single stderr line (`whisper: using CUDA backend (/path)`); `-v` adds
diagnostics about which probes were skipped or failed. The convention is
one whisper.cpp checkout per host with a per-backend symlink in
`~/.local/bin/whisper-cli-<backend>`, so the same `publish` binary works
across machines without machine-specific flags.
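
Below is a sketch of the resolution order above. The lookup and probe functions are injected so the policy can be unit-tested without GPUs. The names (`resolve`, `lookupBin`, `probeCmd`) are hypothetical. This sketch also checks `~/.local/bin` and then PATH for every binary, which may differ from what `resolveBin` actually does:

```go
package main

import (
	"context"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// backend pairs a whisper-cli-<name> binary with the probe command that
// must exit 0 before that binary is trusted.
type backend struct {
	name  string
	bin   string
	probe []string
}

var gpuBackends = []backend{
	{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
	{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
	{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
}

// resolve picks a backend in priority order. On darwin a whisper-cli-metal
// binary wins immediately with no probe; otherwise GPU backends need both
// the binary and a passing probe, and the CPU names are the last resort.
func resolve(goos string, lookup func(string) (string, bool), probe func([]string) bool) (name, path string, ok bool) {
	if goos == "darwin" {
		if p, found := lookup("whisper-cli-metal"); found {
			return "Metal", p, true
		}
	}
	for _, b := range gpuBackends {
		if p, found := lookup(b.bin); found && probe(b.probe) {
			return b.name, p, true
		}
	}
	for _, bin := range []string{"whisper-cli", "whisper-cpp", "main"} {
		if p, found := lookup(bin); found {
			return "CPU", p, true
		}
	}
	return "", "", false
}

// lookupBin checks ~/.local/bin first, then PATH.
func lookupBin(bin string) (string, bool) {
	if home, err := os.UserHomeDir(); err == nil {
		p := filepath.Join(home, ".local", "bin", bin)
		if _, err := os.Stat(p); err == nil {
			return p, true
		}
	}
	p, err := exec.LookPath(bin)
	return p, err == nil
}

// probeCmd runs args with a 5s timeout and reports whether it exited 0.
func probeCmd(args []string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, args[0], args[1:]...).Run() == nil
}
```

In `main` this would be called as `resolve(runtime.GOOS, lookupBin, probeCmd)`. Verbose logging of skipped probes is left out.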

### CUDA build (RTX 3070 Ti / desktop)

```
sudo pacman -S --needed cuda   # ~3GB; installs to /opt/cuda
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
PATH=/opt/cuda/bin:$PATH cmake -B build \
  -DGGML_CUDA=1 \
  -DCMAKE_CUDA_ARCHITECTURES=86 \
  -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15
PATH=/opt/cuda/bin:$PATH cmake --build build -j$(nproc) --config Release
ln -sf "$PWD/build/bin/whisper-cli" ~/.local/bin/whisper-cli-cuda
```

CUDA 13.2 caps the host compiler at GCC 15; system gcc is 16, so the
`-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15` line is required (the `gcc15`
package ships `g++-15` alongside the default toolchain). `sm_86` matches
the RTX 3070 Ti's compute capability (8.6); adjust it if the GPU changes.

CUDA smoke test: these stderr lines should appear in any run:

```
whisper_init_with_params_no_state: use gpu = 1
ggml_cuda_init: found 1 CUDA devices ...
whisper_backend_init_gpu: using CUDA0 backend
```

### ROCm build (Framework 16 / Radeon RX 7700S)

The 7700S is RDNA3 (gfx1102). ROCm 6.x supports it.

```
sudo pacman -S --needed rocm-hip-sdk rocm-hip-runtime hipblas rocblas
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build-rocm \
  -DGGML_HIP=1 \
  -DAMDGPU_TARGETS=gfx1102 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm -j$(nproc)
ln -sf "$PWD/build-rocm/bin/whisper-cli" ~/.local/bin/whisper-cli-rocm
```

If ROCm doesn't recognize gfx1102 (older ROCm releases), set
`HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell before invoking `publish`
so the runtime treats the GPU as gfx1100 (same RDNA3 ISA, supported kernels).
The binary then has to contain gfx1100 kernels, so build with
`-DAMDGPU_TARGETS=gfx1100` in that case.

ROCm smoke test: look for `ggml_cuda_init` (HIP reuses the CUDA backend
naming in whisper.cpp) plus a ROCm device line on stderr.

### Vulkan build (universal GPU fallback)

Vulkan is the easiest cross-vendor path: it uses any GPU with a working
Vulkan driver (Mesa RADV for AMD, Mesa ANV for Intel, the Nvidia proprietary driver, etc.).

```
sudo pacman -S --needed vulkan-headers vulkan-icd-loader shaderc
cd ~/Git\ Repos/whisper.cpp
cmake -B build-vulkan -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)
ln -sf "$PWD/build-vulkan/bin/whisper-cli" ~/.local/bin/whisper-cli-vulkan
```

Slower than native CUDA/ROCm, but it works on machines where the vendor
toolchain is too painful to install. Useful as a portable fallback for
laptops with iGPUs.

### Metal build (Apple Silicon)

`make install` handles this automatically. The manual recipe is short
because cmake on macOS enables Metal by default, so no special flag is needed.

Prerequisites:

```
xcode-select --install   # one-time
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install go cmake ffmpeg
```

Build:

```
git clone --depth=1 https://github.com/ggerganov/whisper.cpp ~/Git\ Repos/whisper.cpp
cd ~/Git\ Repos/whisper.cpp
cmake -B build-metal -DCMAKE_BUILD_TYPE=Release
cmake --build build-metal -j$(sysctl -n hw.ncpu)
ln -sf "$PWD/build-metal/bin/whisper-cli" ~/.local/bin/whisper-cli-metal
```

The resolver special-cases Darwin: if `whisper-cli-metal` exists, it's
used immediately (no probe; Metal is always available on macOS). On a
Mac without that symlink, the CPU fallback finds `whisper-cli` /
`whisper-cpp` from brew (which is itself Metal-enabled by default), so a
plain `brew install whisper-cpp` is a workable lazy path. The only catch is
that the publish log line says "CPU backend" even though whisper.cpp is in
fact running Metal kernels.

Metal smoke test: these stderr lines should appear in any run:

```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 ...
whisper_backend_init_gpu: using Metal backend
```

## Tests

```
go test ./...
```

Covered:
- `internal/output/spotify_test.go`: markdown→Spotify-HTML conversion, escaping
- `internal/clip/clip_test.go`: JSON object extraction, including prose-wrapped
  and fence-wrapped model output

There are no integration tests for whisper or the LLM calls; those depend on
external binaries and remote APIs.

## Future work

- **`--post`**: post the markdown summary as a Spotify for Podcasters episode
  description. Requires the Spotify show-notes API or a Spotify for Podcasters
  upload integration. Reuse `output.MarkdownToSpotifyHTML`, since their
  show-notes editor accepts that subset.
- **Multi-clip output**: pick the top-N hooks instead of one. The current
  `Selection` would become `[]Selection` and the prompt would request an array.
- **Faster `--summarizer` for short transcripts**: default to Haiku for very
  short inputs to save on API costs.
52
Makefile
Normal file
@@ -0,0 +1,52 @@
# publish — Makefile
#
# Common targets:
#   make            - build the publish binary in the repo
#   make install    - interactive setup: detect OS/GPU, build whisper.cpp
#                     with the right backend, download a model, and link
#                     publish + whisper-cli-<backend> into ~/.local/bin
#   make doctor     - print detected platform/GPU/dependencies and exit
#   make link       - build, then link the publish binary into PREFIX/bin
#   make uninstall  - remove the publish symlink (leaves whisper.cpp alone)
#   make clean      - remove the local publish binary

PREFIX ?= $(HOME)/.local
BINDIR := $(PREFIX)/bin

.PHONY: all build link install doctor uninstall clean test help

all: build

build:
	go build -o publish .

link: build
	@mkdir -p "$(BINDIR)"
	@ln -sf "$(CURDIR)/publish" "$(BINDIR)/publish"
	@echo "linked $(BINDIR)/publish -> $(CURDIR)/publish"

install:
	@bash scripts/install.sh

doctor:
	@bash scripts/install.sh --doctor

uninstall:
	@rm -f "$(BINDIR)/publish"
	@echo "removed $(BINDIR)/publish (whisper.cpp checkout and whisper-cli-* symlinks left intact)"

clean:
	rm -f publish

test:
	go test ./...

help:
	@echo "Targets:"
	@echo "  make build      build ./publish"
	@echo "  make link       build, then symlink ./publish into \$$PREFIX/bin (default ~/.local)"
	@echo "  make install    interactive end-to-end setup (deps + whisper + model + publish)"
	@echo "  make doctor     show detected platform/GPU/dependencies"
	@echo "  make uninstall  remove the publish symlink"
	@echo "  make clean      remove the built publish binary"
	@echo "  make test       go test ./..."
42
internal/audio/audio.go
Normal file
@@ -0,0 +1,42 @@
// Package audio normalizes arbitrary audio/video inputs into a whisper.cpp-friendly
// 16 kHz mono PCM WAV file using ffmpeg.
package audio

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// ExtractWAV runs ffmpeg to convert input (audio or video) into a 16kHz mono
// signed-16-bit PCM WAV file at outPath. ffmpeg must be on PATH.
func ExtractWAV(ctx context.Context, input, outPath string) error {
	if _, err := exec.LookPath("ffmpeg"); err != nil {
		return fmt.Errorf("ffmpeg not found on PATH: %w", err)
	}
	if _, err := os.Stat(input); err != nil {
		return fmt.Errorf("input not readable: %w", err)
	}
	if err := os.MkdirAll(filepath.Dir(outPath), 0o755); err != nil {
		return err
	}

	cmd := exec.CommandContext(ctx, "ffmpeg",
		"-y",
		"-loglevel", "error",
		"-i", input,
		"-vn",
		"-ac", "1",
		"-ar", "16000",
		"-c:a", "pcm_s16le",
		outPath,
	)
	cmd.Stdout = os.Stderr
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("ffmpeg: %w", err)
	}
	return nil
}
204
internal/clip/clip.go
Normal file
@@ -0,0 +1,204 @@
// Package clip selects the best 60–90s window from a timestamped transcript
// (using a Summarizer to do the picking) and runs ffmpeg to cut that window
// out of the original media.
package clip

import (
	"context"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"

	"publish/internal/summarize"
	"publish/internal/transcribe"
)

// Selection is the LLM's chosen clip window plus metadata.
type Selection struct {
	StartSeconds float64 `json:"start_seconds"`
	EndSeconds   float64 `json:"end_seconds"`
	Title        string  `json:"title"`
	Hook         string  `json:"hook"`
	Quote        string  `json:"quote"`
	Reasoning    string  `json:"reasoning"`
}

// Duration returns the selected window length in seconds.
func (s Selection) Duration() float64 { return s.EndSeconds - s.StartSeconds }

// Pick asks the summarizer to choose the best window in the given segments,
// using promptTemplate (which may contain {{MIN_SECONDS}} / {{MAX_SECONDS}}
// placeholders). It validates the returned window against the transcript
// bounds and against minSec/maxSec (allowing 2s of slop) but does not clamp
// it. The second return value is the raw model output, for debugging.
func Pick(ctx context.Context, sum summarize.Summarizer, promptTemplate string, segs []transcribe.Segment, minSec, maxSec float64) (Selection, string, error) {
	if len(segs) == 0 {
		return Selection{}, "", fmt.Errorf("no transcript segments to choose from")
	}
	prompt := strings.NewReplacer(
		"{{MIN_SECONDS}}", fmt.Sprintf("%g", minSec),
		"{{MAX_SECONDS}}", fmt.Sprintf("%g", maxSec),
	).Replace(promptTemplate)

	body := transcribe.FormatForLLM(segs)

	raw, err := sum.Summarize(ctx, prompt, body)
	if err != nil {
		return Selection{}, "", err
	}

	jsonText, err := extractJSONObject(raw)
	if err != nil {
		return Selection{}, raw, fmt.Errorf("could not find JSON object in model output: %w", err)
	}
	var sel Selection
	if err := json.Unmarshal([]byte(jsonText), &sel); err != nil {
		return Selection{}, raw, fmt.Errorf("parsing selection JSON: %w\n--- raw ---\n%s", err, jsonText)
	}

	if err := validate(&sel, segs, minSec, maxSec); err != nil {
		return sel, raw, err
	}
	return sel, raw, nil
}

func validate(sel *Selection, segs []transcribe.Segment, minSec, maxSec float64) error {
	if sel.EndSeconds <= sel.StartSeconds {
		return fmt.Errorf("invalid window: end (%g) <= start (%g)", sel.EndSeconds, sel.StartSeconds)
	}
	maxEnd := segs[len(segs)-1].End
	if sel.StartSeconds < 0 || sel.EndSeconds > maxEnd+1.0 {
		return fmt.Errorf("window [%g, %g] is outside transcript bounds [0, %g]",
			sel.StartSeconds, sel.EndSeconds, maxEnd)
	}
	dur := sel.Duration()
	// Allow 2s of slop on either side; otherwise reject.
	if dur < minSec-2 || dur > maxSec+2 {
		return fmt.Errorf("window duration %.1fs is outside requested bounds [%g, %g]",
			dur, minSec, maxSec)
	}
	return nil
}

// extractJSONObject pulls the first balanced {...} object out of s, ignoring
// braces that appear inside JSON strings. Useful when the model wraps its
// answer in prose despite being told not to.
func extractJSONObject(s string) (string, error) {
	start := strings.Index(s, "{")
	if start < 0 {
		return "", fmt.Errorf("no '{' in response")
	}
	depth := 0
	inStr := false
	esc := false
	for i := start; i < len(s); i++ {
		c := s[i]
		if inStr {
			switch {
			case esc:
				esc = false
			case c == '\\':
				esc = true
			case c == '"':
				inStr = false
			}
			continue
		}
		switch c {
		case '"':
			inStr = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return s[start : i+1], nil
			}
		}
	}
	return "", fmt.Errorf("unbalanced braces")
}

// portraitFilter center-crops any source aspect ratio to a 9:16 sub-rectangle
// (no distortion, just cropping) and scales to 1080x1920. The min() expressions
// pick the largest 9:16 box that fits inside the source: 16:9 sources lose the
// left/right edges, 9:16 sources are unchanged, and 4:3 / 1:1 sources crop the
// sides. setsar=1 forces square pixels.
const portraitFilter = `crop=min(iw\,ih*9/16):min(ih\,iw*16/9),scale=1080:1920,setsar=1`

// MaxClipBytes is the hard size ceiling enforced by ffmpeg's -fs flag.
// Realistic 60–90s 1080x1920 H.264 clips at CRF 23 land 30–100 MB, so this is
// a safety cap rather than a target.
const MaxClipBytes = 1 << 30 // 1 GiB

// Extract runs ffmpeg to cut [start, end) seconds out of input into outPath.
// For video inputs, the clip is re-encoded as a 1080x1920 portrait (9:16
// center-crop). If reencode is false, stream copy is used (fast,
// keyframe-aligned, but the source aspect ratio is preserved). Either way the
// output is capped at MaxClipBytes via -fs.
func Extract(ctx context.Context, input string, sel Selection, outPath string, reencode bool) error {
	if _, err := exec.LookPath("ffmpeg"); err != nil {
		return fmt.Errorf("ffmpeg not on PATH: %w", err)
	}
	if err := os.MkdirAll(filepath.Dir(outPath), 0o755); err != nil {
		return err
	}

	dur := sel.EndSeconds - sel.StartSeconds
	args := []string{
		"-y",
		"-loglevel", "error",
		"-ss", fmt.Sprintf("%.3f", sel.StartSeconds),
		"-i", input,
		"-t", fmt.Sprintf("%.3f", dur),
	}
	if reencode {
		if hasVideoExt(input) {
			args = append(args,
				"-vf", portraitFilter,
				"-c:v", "libx264",
				"-preset", "fast",
				"-crf", "23",
				"-c:a", "aac",
				"-b:a", "128k",
				"-movflags", "+faststart",
			)
		} else {
			args = append(args,
				"-vn",
				"-c:a", "aac",
				"-b:a", "128k",
			)
		}
	} else {
		args = append(args, "-c", "copy")
	}
	args = append(args, "-fs", fmt.Sprintf("%d", MaxClipBytes), outPath)

	cmd := exec.CommandContext(ctx, "ffmpeg", args...)
	cmd.Stdout = os.Stderr
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("ffmpeg cut: %w", err)
	}
	return nil
}

func hasVideoExt(p string) bool {
	switch strings.ToLower(filepath.Ext(p)) {
	case ".mp4", ".mov", ".mkv", ".webm", ".avi", ".m4v", ".flv", ".ts":
		return true
	}
	return false
}

// DefaultOutputPath builds <input-without-ext>.clip<ext> for video inputs and
// <input-without-ext>.clip.m4a for audio inputs.
func DefaultOutputPath(input string) string {
	base := strings.TrimSuffix(input, filepath.Ext(input))
	if hasVideoExt(input) {
		return base + ".clip" + filepath.Ext(input)
	}
	return base + ".clip.m4a"
}
37
internal/clip/clip_test.go
Normal file
@@ -0,0 +1,37 @@
package clip

import "testing"

func TestExtractJSONObject(t *testing.T) {
	cases := []struct {
		name string
		in   string
		want string
	}{
		{"raw json", `{"a":1}`, `{"a":1}`},
		{"with prose", "Sure, here you go:\n{\"a\":1}\nThanks", `{"a":1}`},
		{"with fence", "```json\n{\"a\":1}\n```", `{"a":1}`},
		{"nested", `prelude {"a":{"b":2},"c":3} trailing`, `{"a":{"b":2},"c":3}`},
		{"brace in string", `{"text":"hello {world}"}`, `{"text":"hello {world}"}`},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			got, err := extractJSONObject(c.in)
			if err != nil {
				t.Fatalf("err: %v", err)
			}
			if got != c.want {
				t.Errorf("got %q want %q", got, c.want)
			}
		})
	}
}

func TestExtractJSONObjectMissing(t *testing.T) {
	if _, err := extractJSONObject("no json here"); err == nil {
		t.Error("expected error for missing JSON")
	}
	if _, err := extractJSONObject(`{"unterminated":`); err == nil {
		t.Error("expected error for unbalanced braces")
	}
}
30
internal/output/clipboard.go
Normal file
@@ -0,0 +1,30 @@
package output

import (
	"fmt"
	"os/exec"
	"strings"
)

// CopyToClipboard tries platform-appropriate clipboard tools and writes data
// to the first one available: wl-copy (Wayland), xclip (X11), pbcopy (macOS).
// Returns the tool name used, or an error if none is available or the first
// available tool fails (there is no fallback to the next tool).
func CopyToClipboard(data string) (string, error) {
	candidates := [][]string{
		{"wl-copy"},
		{"xclip", "-selection", "clipboard"},
		{"pbcopy"},
	}
	for _, c := range candidates {
		if _, err := exec.LookPath(c[0]); err != nil {
			continue
		}
		cmd := exec.Command(c[0], c[1:]...)
		cmd.Stdin = strings.NewReader(data)
		if err := cmd.Run(); err != nil {
			return "", fmt.Errorf("%s: %w", c[0], err)
		}
		return c[0], nil
	}
	return "", fmt.Errorf("no clipboard tool found (tried wl-copy, xclip, pbcopy)")
}
154
internal/output/spotify.go
Normal file
@@ -0,0 +1,154 @@
// Package output renders summaries to user-visible formats. Markdown is
// passed through; Spotify HTML uses the small tag subset that Spotify for
// Podcasters' show-notes editor accepts (b, i, a, ul/ol/li, paragraphs).
package output

import (
	"regexp"
	"strings"
)

var (
	reBoldStar    = regexp.MustCompile(`\*\*([^*\n]+)\*\*`)
	reBoldUnder   = regexp.MustCompile(`__([^_\n]+)__`)
	reItalicStar  = regexp.MustCompile(`\*([^*\n]+)\*`)
	reItalicUnder = regexp.MustCompile(`(^|[\s(])_([^_\n]+)_($|[\s).,!?;:])`)
	reLink        = regexp.MustCompile(`\[([^\]]+)\]\(([^)\s]+)\)`)
	reInlineCode  = regexp.MustCompile("`([^`\n]+)`")
)

// MarkdownToSpotifyHTML converts a markdown summary into the limited HTML
// subset Spotify for Podcasters renders. Unknown markdown structures degrade
// to plain text rather than producing rejected tags.
func MarkdownToSpotifyHTML(md string) string {
	lines := strings.Split(strings.ReplaceAll(md, "\r\n", "\n"), "\n")

	var out strings.Builder
	listKind := "" // "ul" or "ol" while we're inside a list
	flushList := func() {
		if listKind != "" {
			out.WriteString("</" + listKind + ">\n")
			listKind = ""
		}
	}
	openList := func(kind string) {
		if listKind != kind {
			flushList()
			out.WriteString("<" + kind + ">\n")
			listKind = kind
		}
	}

	paragraph := []string{}
	flushPara := func() {
		if len(paragraph) == 0 {
			return
		}
		text := strings.Join(paragraph, " ")
		out.WriteString("<p>" + inline(text) + "</p>\n")
		paragraph = paragraph[:0]
	}

	for _, raw := range lines {
		line := strings.TrimRight(raw, " \t")
		trim := strings.TrimSpace(line)

		// Blank line: end current paragraph/list block.
		if trim == "" {
			flushPara()
			flushList()
			continue
		}

		// Horizontal rule: the tag subset has no <hr>, so it only ends the current block.
		if trim == "---" || trim == "***" || trim == "___" {
			flushPara()
			flushList()
			continue
		}

		// Heading -> bold paragraph.
		if h := headingText(trim); h != "" {
			flushPara()
			flushList()
			out.WriteString("<p><b>" + inline(h) + "</b></p>\n")
			continue
		}

		// Blockquote -> italic paragraph.
		if strings.HasPrefix(trim, "> ") {
			flushPara()
			flushList()
			out.WriteString("<p><i>" + inline(strings.TrimPrefix(trim, "> ")) + "</i></p>\n")
			continue
		}

		// Unordered list item.
		if strings.HasPrefix(trim, "- ") || strings.HasPrefix(trim, "* ") || strings.HasPrefix(trim, "+ ") {
			flushPara()
			openList("ul")
			out.WriteString(" <li>" + inline(trim[2:]) + "</li>\n")
			continue
		}

		// Ordered list item like "1. text".
		if item, ok := orderedItem(trim); ok {
			flushPara()
			openList("ol")
			out.WriteString(" <li>" + inline(item) + "</li>\n")
			continue
		}

		// Anything else: append to current paragraph.
		flushList()
		paragraph = append(paragraph, trim)
	}

	flushPara()
	flushList()

	return strings.TrimRight(out.String(), "\n")
}

func headingText(s string) string {
	// Up to 6 leading '#' followed by a space.
	hashes := 0
	for hashes < len(s) && s[hashes] == '#' {
		hashes++
	}
	if hashes == 0 || hashes > 6 || hashes >= len(s) || s[hashes] != ' ' {
		return ""
	}
	return strings.TrimSpace(s[hashes+1:])
}

func orderedItem(s string) (string, bool) {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == 0 || i+1 >= len(s) || s[i] != '.' || s[i+1] != ' ' {
		return "", false
	}
	return strings.TrimSpace(s[i+2:]), true
}

// inline HTML-escapes s, then rewrites inline markdown into the tag subset.
// Inline code keeps its text but loses the backticks, since <code> is not in
// that subset.
func inline(s string) string {
	s = escapeHTML(s)
	s = reInlineCode.ReplaceAllString(s, "$1")
	s = reBoldStar.ReplaceAllString(s, "<b>$1</b>")
	s = reBoldUnder.ReplaceAllString(s, "<b>$1</b>")
	s = reItalicStar.ReplaceAllString(s, "<i>$1</i>")
	s = reItalicUnder.ReplaceAllString(s, "$1<i>$2</i>$3")
	s = reLink.ReplaceAllString(s, `<a href="$2">$1</a>`)
	return s
}

// escapeHTML escapes the characters that are significant in HTML text.
func escapeHTML(s string) string {
	r := strings.NewReplacer(
		"&", "&amp;",
		"<", "&lt;",
		">", "&gt;",
	)
	return r.Replace(s)
}
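The order inside `inline` matters: text is HTML-escaped before any markdown becomes tags, so a literal `<` or `&` in a sermon quote cannot open a tag or entity, while the `<b>`/`<a>` tags added afterwards stay live. A minimal standalone sketch of that ordering, copying two of the regexes above; `renderInline` and `htmlEscaper` are hypothetical names, not part of the package:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Two of the patterns from spotify.go, copied so the sketch stands alone.
var (
	reBoldStar = regexp.MustCompile(`\*\*([^*\n]+)\*\*`)
	reLink     = regexp.MustCompile(`\[([^\]]+)\]\(([^)\s]+)\)`)
)

// htmlEscaper mirrors escapeHTML; a strings.Replacer is safe for concurrent
// use, so one package-level instance can be shared.
var htmlEscaper = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

// renderInline is a hypothetical, trimmed-down inline(): escape first, then
// convert markdown, so the only live tags are the ones added here.
func renderInline(s string) string {
	s = htmlEscaper.Replace(s)
	s = reBoldStar.ReplaceAllString(s, "<b>$1</b>")
	s = reLink.ReplaceAllString(s, `<a href="$2">$1</a>`)
	return s
}

func main() {
	fmt.Println(renderInline("Grace > law & **hope** wins. See [notes](https://example.com/?a=1&b=2)."))
	// Grace &gt; law &amp; <b>hope</b> wins. See <a href="https://example.com/?a=1&amp;b=2">notes</a>.
}
```

Because the replacer makes a single pass, the `&amp;` it emits is not escaped again, and an `&` inside a link URL ends up as `&amp;` in the `href`, which is valid HTML.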
67
internal/output/spotify_test.go
Normal file
@@ -0,0 +1,67 @@
package output

import (
	"strings"
	"testing"
)

func TestMarkdownToSpotifyHTML(t *testing.T) {
	in := `# Sermon Title

**Speaker:** Pastor Bob
**Scripture:** John 3:16

## Overview
This was a *short* message about hope. See [the site](https://example.com).

## Key Points
- First point
- Second point with **bold** text
- Third one

1. Step one
2. Step two

> A pithy quote.
`

	got := MarkdownToSpotifyHTML(in)

	mustContain := []string{
		"<p><b>Sermon Title</b></p>",
		"<b>Speaker:</b>",
		"<p><b>Overview</b></p>",
		"<i>short</i>",
		`<a href="https://example.com">the site</a>`,
		"<ul>",
		"<li>First point</li>",
		"<li>Second point with <b>bold</b> text</li>",
		"</ul>",
		"<ol>",
		"<li>Step one</li>",
		"</ol>",
		"<p><i>A pithy quote.</i></p>",
	}
	for _, s := range mustContain {
		if !strings.Contains(got, s) {
			t.Errorf("expected output to contain %q\n--- got ---\n%s", s, got)
		}
	}

	mustNotContain := []string{"<h1>", "<h2>", "<blockquote>", "**", "##"}
	for _, s := range mustNotContain {
		if strings.Contains(got, s) {
			t.Errorf("did not expect output to contain %q\n--- got ---\n%s", s, got)
		}
	}
}

func TestEscapesHTML(t *testing.T) {
	got := MarkdownToSpotifyHTML("A <script>tag</script> & ampersand")
	if strings.Contains(got, "<script>") {
		t.Errorf("unescaped <script>: %s", got)
	}
	if !strings.Contains(got, "&amp;") {
		t.Errorf("expected &amp; in: %s", got)
	}
}
123
internal/summarize/anthropic.go
Normal file
@@ -0,0 +1,123 @@
package summarize

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// Anthropic talks to the Claude Messages API directly via net/http to avoid an
// SDK dependency. Requires ANTHROPIC_API_KEY (or APIKey set explicitly).
type Anthropic struct {
	APIKey    string
	Model     string
	MaxTokens int
	BaseURL   string // optional override; defaults to https://api.anthropic.com
	Client    *http.Client
}

func (a *Anthropic) Name() string { return "anthropic-api" }

type anthroMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type anthroRequest struct {
	Model     string          `json:"model"`
	MaxTokens int             `json:"max_tokens"`
	System    string          `json:"system,omitempty"`
	Messages  []anthroMessage `json:"messages"`
}

type anthroContentBlock struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

type anthroResponse struct {
	Content []anthroContentBlock `json:"content"`
	Error   *struct {
		Type    string `json:"type"`
		Message string `json:"message"`
	} `json:"error,omitempty"`
}

func (a *Anthropic) Summarize(ctx context.Context, systemPrompt, userContent string) (string, error) {
	key := a.APIKey
	if key == "" {
		key = os.Getenv("ANTHROPIC_API_KEY")
	}
	if key == "" {
		return "", fmt.Errorf("ANTHROPIC_API_KEY is not set")
	}
	model := a.Model
	if model == "" {
		model = "claude-sonnet-4-6"
	}
	maxTokens := a.MaxTokens
	if maxTokens == 0 {
		maxTokens = 4096
	}
	baseURL := a.BaseURL
	if baseURL == "" {
		baseURL = "https://api.anthropic.com"
	}
	client := a.Client
	if client == nil {
		client = &http.Client{Timeout: 5 * time.Minute}
	}

	body := anthroRequest{
		Model:     model,
		MaxTokens: maxTokens,
		System:    systemPrompt,
		Messages: []anthroMessage{
			{Role: "user", Content: userContent},
		},
	}
	buf, err := json.Marshal(body)
	if err != nil {
		return "", err
	}

	req, err := http.NewRequestWithContext(ctx, "POST", baseURL+"/v1/messages", bytes.NewReader(buf))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-api-key", key)
	req.Header.Set("anthropic-version", "2023-06-01")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	respBody, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading anthropic response: %w", err)
	}

	if resp.StatusCode/100 != 2 {
		return "", fmt.Errorf("anthropic API %d: %s", resp.StatusCode, string(respBody))
	}

	var out anthroResponse
	if err := json.Unmarshal(respBody, &out); err != nil {
		return "", fmt.Errorf("decoding anthropic response: %w", err)
	}
	if out.Error != nil {
		return "", fmt.Errorf("anthropic error: %s: %s", out.Error.Type, out.Error.Message)
	}

	// Concatenate only the text blocks, in order; other block types are skipped.
	var text bytes.Buffer
	for _, c := range out.Content {
		if c.Type == "text" {
			text.WriteString(c.Text)
		}
	}
	return text.String(), nil
}
49
internal/summarize/claudecli.go
Normal file
@@ -0,0 +1,49 @@
package summarize

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// ClaudeCLI shells out to the `claude` CLI in print mode. The transcript is
// sent on stdin so we don't bump into ARG_MAX for very long services.
type ClaudeCLI struct {
	// Bin is the binary name; defaults to "claude".
	Bin string
	// Model passes through to `claude --model`. Empty leaves the CLI default.
	Model string
	// ExtraArgs are appended verbatim before the prompt arg.
	ExtraArgs []string
}

func (c *ClaudeCLI) Name() string { return "claude-cli" }

func (c *ClaudeCLI) Summarize(ctx context.Context, systemPrompt, userContent string) (string, error) {
	bin := c.Bin
	if bin == "" {
		bin = "claude"
	}
	if _, err := exec.LookPath(bin); err != nil {
		return "", fmt.Errorf("%q not on PATH: %w", bin, err)
	}

	args := []string{"-p"}
	if c.Model != "" {
		args = append(args, "--model", c.Model)
	}
	args = append(args, c.ExtraArgs...)
	args = append(args, systemPrompt)

	cmd := exec.CommandContext(ctx, bin, args...)
	cmd.Stdin = strings.NewReader(userContent)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("%s: %w (stderr: %s)", bin, err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}
13
internal/summarize/summarize.go
Normal file
@@ -0,0 +1,13 @@
// Package summarize turns a transcript + system prompt into a markdown summary.
package summarize

import "context"

// Summarizer produces a markdown summary (or other generated text, such as
// the JSON clip selection) guided by systemPrompt and given the user-message
// body. The body is passed verbatim: callers are responsible for any framing
// like "Transcript:", "Producer's notes:", or timestamped segment formatting.
type Summarizer interface {
	Summarize(ctx context.Context, systemPrompt, userContent string) (string, error)
	Name() string
}
49
internal/transcribe/segments.go
Normal file
@@ -0,0 +1,49 @@
package transcribe

import (
	"fmt"
	"strings"
)

// Segment is one timestamped chunk of a transcript.
type Segment struct {
	Start float64 // seconds from start of audio
	End   float64
	Text  string
}

// PlainText joins the trimmed text of all segments, separated by single
// spaces, into one transcript.
func PlainText(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		b.WriteString(strings.TrimSpace(s.Text))
		b.WriteByte(' ')
	}
	return strings.TrimSpace(b.String())
}

// FormatForLLM renders segments as one timestamped line each, suitable for
// feeding to a model that needs to pick a time window:
//
//	[mm:ss] [mm:ss] text
//
// The two stamps are the segment's start and end; they switch to hh:mm:ss
// once the audio passes the one-hour mark.
func FormatForLLM(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		fmt.Fprintf(&b, "[%s] [%s] %s\n", formatTS(s.Start), formatTS(s.End), strings.TrimSpace(s.Text))
	}
	return b.String()
}

// formatTS renders whole seconds (fractions are truncated) as mm:ss, or
// hh:mm:ss from one hour on; negative values clamp to zero.
func formatTS(seconds float64) string {
	if seconds < 0 {
		seconds = 0
	}
	total := int(seconds)
	h := total / 3600
	m := (total % 3600) / 60
	s := total % 60
	if h > 0 {
		return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
	}
	return fmt.Sprintf("%02d:%02d", m, s)
}
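A worked example of the timestamp format: `formatTS` truncates toward zero rather than rounding, clamps negatives to `00:00`, and only adds an hour field from 3600 s on. The helpers are copied from `segments.go` so the sketch runs standalone; the segment values are made up:

```go
package main

import (
	"fmt"
	"strings"
)

// Segment, FormatForLLM, and formatTS are copied from segments.go.
type Segment struct {
	Start float64
	End   float64
	Text  string
}

func FormatForLLM(segs []Segment) string {
	var b strings.Builder
	for _, s := range segs {
		fmt.Fprintf(&b, "[%s] [%s] %s\n", formatTS(s.Start), formatTS(s.End), strings.TrimSpace(s.Text))
	}
	return b.String()
}

func formatTS(seconds float64) string {
	if seconds < 0 {
		seconds = 0
	}
	total := int(seconds)
	h := total / 3600
	m := (total % 3600) / 60
	s := total % 60
	if h > 0 {
		return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
	}
	return fmt.Sprintf("%02d:%02d", m, s)
}

func main() {
	segs := []Segment{
		{Start: 12.5, End: 17.9, Text: "  Welcome, everyone. "},
		{Start: 3725.0, End: 3731.2, Text: "Let us pray."},
	}
	fmt.Print(FormatForLLM(segs))
	// [00:12] [00:17] Welcome, everyone.
	// [01:02:05] [01:02:11] Let us pray.
}
```

Since fractions are dropped, the model only ever sees whole-second boundaries.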
10
internal/transcribe/transcribe.go
Normal file
@@ -0,0 +1,10 @@
// Package transcribe converts a normalized WAV file into a transcript, as
// plain text or as timestamped segments.
package transcribe

import "context"

// Transcriber turns a 16kHz mono WAV at wavPath into a plain-text transcript.
type Transcriber interface {
	Transcribe(ctx context.Context, wavPath string) (string, error)
	Name() string
}
213
internal/transcribe/whispercpp.go
Normal file
@@ -0,0 +1,213 @@
package transcribe

import (
	"context"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"runtime"
	"strings"
	"time"
)

// WhisperCPP shells out to a whisper.cpp CLI binary (whisper-cli, whisper-cpp,
// or legacy `main`) and reads its `-oj` JSON output. The binary must write a
// .json file at the output basename passed with `-of`.
type WhisperCPP struct {
	// Bin is the whisper.cpp binary name or absolute path.
	Bin string
	// Model is the path to a ggml whisper model (.bin).
	Model string
	// Language to force; empty means auto-detect.
	Language string
	// Threads to use; 0 lets whisper.cpp pick.
	Threads int
	// ExtraArgs are appended to the command verbatim.
	ExtraArgs []string
	// Verbose enables per-step diagnostic logging to stderr (which probe ran,
	// which backend was selected, etc.). The selected backend is always logged
	// on a single stderr line regardless of this flag.
	Verbose bool
}

func (w *WhisperCPP) Name() string { return "whisper.cpp" }

func (w *WhisperCPP) Transcribe(ctx context.Context, wavPath string) (string, error) {
	segs, err := w.TranscribeSegments(ctx, wavPath)
	if err != nil {
		return "", err
	}
	return PlainText(segs), nil
}

// TranscribeSegments runs whisper.cpp with JSON output and returns the
// per-segment timestamps (in seconds) and text.
func (w *WhisperCPP) TranscribeSegments(ctx context.Context, wavPath string) ([]Segment, error) {
	bin, err := w.resolveBin()
	if err != nil {
		return nil, err
	}
	if w.Model == "" {
		return nil, fmt.Errorf("whisper.cpp model path is required (--whisper-model)")
	}
	if _, err := os.Stat(w.Model); err != nil {
		return nil, fmt.Errorf("whisper model not readable at %s: %w", w.Model, err)
	}

	dir := filepath.Dir(wavPath)
	base := strings.TrimSuffix(filepath.Base(wavPath), filepath.Ext(wavPath))
	outBase := filepath.Join(dir, base)
	jsonPath := outBase + ".json"
	// Remove stale JSON from an earlier run so it can't be mistaken for this run's output.
	_ = os.Remove(jsonPath)

	args := []string{
		"-m", w.Model,
		"-f", wavPath,
		"-oj",
		"-of", outBase,
		"--no-prints",
	}
	if w.Language != "" {
		args = append(args, "-l", w.Language)
	}
	if w.Threads > 0 {
		args = append(args, "-t", fmt.Sprintf("%d", w.Threads))
	}
	args = append(args, w.ExtraArgs...)

	cmd := exec.CommandContext(ctx, bin, args...)
	// Send whisper's console output to stderr so stdout stays reserved for
	// publish's own results.
	cmd.Stdout = os.Stderr
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s: %w", bin, err)
	}

	data, err := os.ReadFile(jsonPath)
	if err != nil {
		return nil, fmt.Errorf("reading whisper json %s: %w", jsonPath, err)
	}
	return parseWhisperJSON(data)
}

// gpuBackend describes one accelerated whisper.cpp build we may pick at
// runtime. The binary is conventionally installed at ~/.local/bin/<bin> (or
// anywhere on PATH); the probe is a fast command that exits 0 only when the
// matching GPU runtime is actually usable on this machine.
type gpuBackend struct {
	name  string
	bin   string
	probe []string
}

var gpuBackends = []gpuBackend{
	{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
	{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
	{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
}

func (w *WhisperCPP) resolveBin() (string, error) {
	if w.Bin != "" {
		if _, err := exec.LookPath(w.Bin); err == nil {
			return w.Bin, nil
		}
		if _, err := os.Stat(w.Bin); err == nil {
			return w.Bin, nil
		}
		return "", fmt.Errorf("whisper.cpp binary %q not found on PATH", w.Bin)
	}

	// Metal is always usable on macOS, so there is no separate probe; if the
	// binary exists we trust it.
	if runtime.GOOS == "darwin" {
		if path := findBinary("whisper-cli-metal"); path != "" {
			fmt.Fprintf(os.Stderr, "whisper: using Metal backend (%s)\n", path)
			return path, nil
		}
	}

	for _, b := range gpuBackends {
		path := findBinary(b.bin)
		if path == "" {
			if w.Verbose {
				fmt.Fprintf(os.Stderr, "whisper: no %s binary (%s) installed; skipping\n", b.name, b.bin)
			}
			continue
		}
		if !probeSucceeds(b.probe) {
			if w.Verbose {
				fmt.Fprintf(os.Stderr, "whisper: %s binary present at %s but %s probe failed; trying next\n", b.name, path, b.probe[0])
			}
			continue
		}
		fmt.Fprintf(os.Stderr, "whisper: using %s backend (%s)\n", b.name, path)
		return path, nil
	}

	for _, alt := range []string{"whisper-cli", "whisper-cpp", "main"} {
		if path, e := exec.LookPath(alt); e == nil {
			fmt.Fprintf(os.Stderr, "whisper: using CPU backend (%s)\n", path)
			return path, nil
		}
	}
	return "", fmt.Errorf("no whisper.cpp binary found (tried GPU builds whisper-cli-{cuda,rocm,vulkan} in ~/.local/bin and PATH, then CPU whisper-cli/whisper-cpp/main on PATH); pass --whisper-bin")
}

// findBinary looks for an executable first in ~/.local/bin (the convention
// for hand-built backends), then on PATH. Returns "" if neither has it.
func findBinary(name string) string {
	if home, err := os.UserHomeDir(); err == nil {
		candidate := filepath.Join(home, ".local", "bin", name)
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate
		}
	}
	if path, err := exec.LookPath(name); err == nil {
		return path
	}
	return ""
}

// probeSucceeds runs the probe with a short timeout and reports whether it
// exited 0. Used to confirm the GPU runtime is actually usable before we
// commit to its whisper-cli build.
func probeSucceeds(argv []string) bool {
	if _, err := exec.LookPath(argv[0]); err != nil {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, argv[0], argv[1:]...)
	return cmd.Run() == nil
}

// whisperJSONFile mirrors the parts of the structure whisper.cpp writes with
// -oj that we use; offsets are in milliseconds.
type whisperJSONFile struct {
	Transcription []struct {
		Offsets struct {
			From int64 `json:"from"`
			To   int64 `json:"to"`
		} `json:"offsets"`
		Text string `json:"text"`
	} `json:"transcription"`
}

func parseWhisperJSON(data []byte) ([]Segment, error) {
	var f whisperJSONFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parsing whisper JSON: %w", err)
	}
	if len(f.Transcription) == 0 {
		return nil, fmt.Errorf("whisper produced no transcription segments")
	}
	out := make([]Segment, 0, len(f.Transcription))
	for _, s := range f.Transcription {
		out = append(out, Segment{
			Start: float64(s.Offsets.From) / 1000.0,
			End:   float64(s.Offsets.To) / 1000.0,
			Text:  s.Text,
		})
	}
	return out, nil
}
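`resolveBin` is hard to unit-test as written because it reads the filesystem and `PATH` and runs real probe commands. One possible refactor, sketched below with hypothetical names (`pickBackend`, `fakeFS`), passes those lookups in as functions so the selection order (explicit `--whisper-bin`, then Metal on macOS, then the first GPU build whose probe passes, then the CPU fallbacks) can be checked with fakes. It simplifies one detail: a single `find` stands in for both `findBinary` and the plain `exec.LookPath` used for the CPU fallbacks, and the paths in `main` are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// backend mirrors gpuBackend from whispercpp.go.
type backend struct {
	name  string
	bin   string
	probe []string
}

var gpus = []backend{
	{"CUDA", "whisper-cli-cuda", []string{"nvidia-smi", "-L"}},
	{"ROCm", "whisper-cli-rocm", []string{"rocminfo"}},
	{"Vulkan", "whisper-cli-vulkan", []string{"vulkaninfo", "--summary"}},
}

// pickBackend returns a backend label and binary path in resolveBin's order.
// find returns a path or ""; probeOK reports whether a probe command exits 0.
func pickBackend(explicit, goos string, find func(string) string, probeOK func([]string) bool) (string, string, error) {
	if explicit != "" {
		if p := find(explicit); p != "" {
			return "explicit", p, nil
		}
		return "", "", fmt.Errorf("whisper.cpp binary %q not found", explicit)
	}
	if goos == "darwin" {
		if p := find("whisper-cli-metal"); p != "" {
			return "Metal", p, nil
		}
	}
	for _, b := range gpus {
		if p := find(b.bin); p != "" && probeOK(b.probe) {
			return b.name, p, nil
		}
	}
	for _, alt := range []string{"whisper-cli", "whisper-cpp", "main"} {
		if p := find(alt); p != "" {
			return "CPU", p, nil
		}
	}
	return "", "", errors.New("no whisper.cpp binary found")
}

// fakeFS builds a find func from a fixed name -> path table.
func fakeFS(installed map[string]string) func(string) string {
	return func(name string) string { return installed[name] }
}

func main() {
	find := fakeFS(map[string]string{
		"whisper-cli-cuda":   "/home/u/.local/bin/whisper-cli-cuda",
		"whisper-cli-vulkan": "/home/u/.local/bin/whisper-cli-vulkan",
		"whisper-cli":        "/usr/bin/whisper-cli",
	})
	// No usable NVIDIA driver, but Vulkan works: CUDA is skipped, Vulkan wins.
	onlyVulkan := func(argv []string) bool { return argv[0] == "vulkaninfo" }
	name, path, err := pickBackend("", "linux", find, onlyVulkan)
	fmt.Println(name, path, err)
}
```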
400
main.go
Normal file
@@ -0,0 +1,400 @@
|
||||
// publish — generate a markdown summary, a 60–90s social hook clip, or both
|
||||
// from a local audio/video file. Each mode is enabled by its own boolean flag.
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
_ "embed"
|
||||
"encoding/json"
|
||||
"flag"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/signal"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"syscall"
|
||||
"time"
|
||||
|
||||
"publish/internal/audio"
|
||||
"publish/internal/clip"
|
||||
"publish/internal/output"
|
||||
"publish/internal/summarize"
|
||||
"publish/internal/transcribe"
|
||||
)
|
||||
|
||||
//go:embed prompts/church-service.md
|
||||
var defaultSummaryPrompt string
|
||||
|
||||
//go:embed prompts/clip-selector.md
|
||||
var defaultClipPrompt string
|
||||
|
||||
func main() {
|
||||
if err := run(os.Args[1:]); err != nil {
|
||||
fmt.Fprintln(os.Stderr, "publish: "+err.Error())
|
||||
os.Exit(1)
|
||||
}
|
||||
}
|
||||
|
||||
type config struct {
|
||||
input string
|
||||
|
||||
// mode selection
|
||||
modeSummerize bool
|
||||
modeClip bool
|
||||
modePost bool
|
||||
|
||||
// shared
|
||||
summarizer string
|
||||
model string
|
||||
promptSummary string
|
||||
promptClip string
|
||||
whisperBin string
|
||||
whisperModel string
|
||||
whisperLang string
|
||||
whisperThreads int
|
||||
segmentsCache string
|
||||
keepWAV bool
|
||||
keepTranscript bool
|
||||
verbose bool
|
||||
|
||||
// --summerize inputs/outputs
|
||||
prompt string
|
||||
mdOut string
|
||||
spotifyOut string
|
||||
copyHTML bool
|
||||
|
||||
// --clip outputs
|
||||
minSec float64
|
||||
maxSec float64
|
||||
clipOut string
|
||||
copyCodec bool
|
||||
dryRun bool
|
||||
}
|
||||
|
||||
func run(args []string) error {
|
||||
var cfg config
|
||||
fs := flag.NewFlagSet("publish", flag.ContinueOnError)
|
||||
|
||||
// Mode flags.
|
||||
fs.BoolVar(&cfg.modeSummerize, "summerize", false, "produce a markdown summary (default if no mode is set)")
|
||||
fs.BoolVar(&cfg.modeClip, "clip", false, "pick a 60-90s hook clip and cut it out of the source")
|
||||
fs.BoolVar(&cfg.modePost, "post", false, "post the summary to Spotify (not implemented yet)")
|
||||
|
||||
// Shared flags.
|
||||
fs.StringVar(&cfg.summarizer, "summarizer", "claude-cli", "LLM backend: claude-cli | claude-api")
|
||||
fs.StringVar(&cfg.model, "model", "", "model name (claude-api default: claude-sonnet-4-6)")
|
||||
fs.StringVar(&cfg.promptSummary, "prompt-summary", "", "summary prompt path; empty uses bundled prompts/church-service.md")
|
||||
fs.StringVar(&cfg.promptClip, "prompt-clip", "", "clip-selection prompt path; empty uses bundled prompts/clip-selector.md")
|
||||
fs.StringVar(&cfg.whisperBin, "whisper-bin", "", "whisper.cpp binary (auto-detect if empty)")
|
||||
fs.StringVar(&cfg.whisperModel, "whisper-model", defaultWhisperModel(), "whisper.cpp ggml model path")
|
||||
fs.StringVar(&cfg.whisperLang, "whisper-lang", "", "force whisper language code (empty = auto)")
|
||||
fs.IntVar(&cfg.whisperThreads, "whisper-threads", 0, "whisper.cpp thread count (0 = library default)")
|
||||
fs.StringVar(&cfg.segmentsCache, "segments", "", `path to read/write whisper segments JSON; default: <input>.segments.json`)
|
||||
fs.BoolVar(&cfg.keepWAV, "keep-wav", false, "keep the normalized 16kHz WAV next to the input")
|
||||
fs.BoolVar(&cfg.keepTranscript, "keep-transcript", false, "also write <input>.transcript.txt")
|
||||
fs.BoolVar(&cfg.verbose, "v", false, "verbose progress output")
|
||||
|
||||
// --summerize inputs/outputs.
|
||||
fs.StringVar(&cfg.prompt, "prompt", "", "[--summerize] producer's notes to anchor the summary (titles, framing, key points). For longer notes use shell expansion: --prompt \"$(cat notes.txt)\"")
|
||||
fs.StringVar(&cfg.mdOut, "md", "", `[--summerize] markdown output; "-" for stdout, "" disables; default: <input>.summary.md`)
|
||||
fs.StringVar(&cfg.spotifyOut, "spotify", "", `[--summerize] Spotify HTML output; "-" for stdout (default: disabled)`)
|
||||
fs.BoolVar(&cfg.copyHTML, "copy", false, "[--summerize] copy Spotify HTML to clipboard")
|
||||
|
||||
// --clip outputs.
|
||||
fs.Float64Var(&cfg.minSec, "min", 60, "[--clip] minimum clip length in seconds")
|
||||
fs.Float64Var(&cfg.maxSec, "max", 90, "[--clip] maximum clip length in seconds")
|
||||
fs.StringVar(&cfg.clipOut, "out", "", `[--clip] clip output path; default: <input>.clip<ext> (or .clip.m4a for audio)`)
|
||||
fs.BoolVar(&cfg.copyCodec, "copy-codec", false, "[--clip] use ffmpeg stream copy instead of re-encoding (faster, keyframe-aligned)")
|
||||
fs.BoolVar(&cfg.dryRun, "dry-run", false, "[--clip] pick the clip and print metadata, but skip the ffmpeg cut")
|
||||
|
||||
fs.Usage = func() {
|
||||
fmt.Fprintf(os.Stderr, `usage: publish [mode...] [flags] <input>
|
||||
|
||||
modes (combine freely; defaults to --summerize):
|
||||
--summerize write a markdown summary
|
||||
--clip cut a 60-90s social hook clip
|
||||
--post post to Spotify (not implemented yet)
|
||||
|
||||
flags:
|
||||
`)
|
||||
fs.PrintDefaults()
|
||||
}
|
||||
if err := fs.Parse(args); err != nil {
|
||||
return err
|
||||
}
|
||||
if fs.NArg() != 1 {
|
||||
fs.Usage()
|
||||
return fmt.Errorf("exactly one input file is required")
|
||||
}
|
||||
cfg.input = fs.Arg(0)
|
||||
|
||||
// Default to --summerize if no mode flag was passed.
|
||||
if !cfg.modeSummerize && !cfg.modeClip && !cfg.modePost {
|
||||
cfg.modeSummerize = true
|
||||
}
|
||||
if cfg.modePost {
|
||||
return fmt.Errorf("--post is not implemented yet")
|
||||
}
|
||||
|
||||
// Output path defaults that depend on input.
|
||||
if cfg.mdOut == "" {
|
||||
cfg.mdOut = cfg.input + ".summary.md"
|
||||
}
|
||||
if cfg.mdOut == "-" && cfg.spotifyOut == "-" {
|
||||
return fmt.Errorf("--md and --spotify cannot both be \"-\"")
|
||||
}
|
||||
if cfg.segmentsCache == "" {
|
||||
cfg.segmentsCache = cfg.input + ".segments.json"
|
||||
}
|
||||
if cfg.clipOut == "" {
|
||||
cfg.clipOut = clip.DefaultOutputPath(cfg.input)
|
||||
}
|
||||
if cfg.minSec <= 0 || cfg.maxSec <= 0 || cfg.maxSec < cfg.minSec {
|
||||
return fmt.Errorf("invalid --min/--max bounds")
|
||||
}
|
||||
|
||||
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
|
||||
defer cancel()
|
||||
|
||||
segs, err := loadOrTranscribeSegments(ctx, cfg)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if cfg.keepTranscript {
|
||||
if err := os.WriteFile(cfg.input+".transcript.txt", []byte(transcribe.PlainText(segs)), 0o644); err != nil {
|
||||
return fmt.Errorf("writing transcript: %w", err)
|
||||
}
|
||||
}
|
||||
|
||||
sum, err := buildSummarizer(cfg.summarizer, cfg.model)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if cfg.modeSummerize {
|
||||
if err := doSummerize(ctx, cfg, sum, segs); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
if cfg.modeClip {
|
||||
if err := doClip(ctx, cfg, sum, segs); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func doSummerize(ctx context.Context, cfg config, sum summarize.Summarizer, segs []transcribe.Segment) error {
|
||||
systemPrompt, err := loadPrompt(cfg.promptSummary, defaultSummaryPrompt)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
body := "Transcript:\n\n" + transcribe.PlainText(segs)
|
||||
if notes := strings.TrimSpace(cfg.prompt); notes != "" {
|
||||
body = "Producer's notes (treat these as authoritative for titles, framing, and key points; expand and enrich them using the transcript that follows):\n\n" +
|
||||
notes + "\n\n---\n\n" + body
|
||||
}
|
||||
|
||||
logIf(cfg.verbose, "summarizing with %s", sum.Name())
|
||||
t0 := time.Now()
|
||||
md, err := sum.Summarize(ctx, systemPrompt, body)
|
||||
if err != nil {
|
||||
return fmt.Errorf("summarize: %w", err)
|
||||
}
|
||||
md = strings.TrimSpace(md)
|
||||
logIf(cfg.verbose, "summary ready (%d chars, %s)", len(md), time.Since(t0).Round(time.Second))
|
||||
|
||||
if err := writeOutput(cfg.mdOut, md); err != nil {
|
||||
return fmt.Errorf("writing markdown: %w", err)
|
||||
}
|
||||
|
||||
var html string
|
||||
if cfg.spotifyOut != "" || cfg.copyHTML {
|
||||
html = output.MarkdownToSpotifyHTML(md)
|
||||
}
|
||||
if cfg.spotifyOut != "" {
|
||||
if err := writeOutput(cfg.spotifyOut, html); err != nil {
|
||||
return fmt.Errorf("writing spotify HTML: %w", err)
|
||||
}
|
||||
}
|
||||
if cfg.copyHTML {
|
||||
tool, err := output.CopyToClipboard(html)
|
||||
if err != nil {
|
||||
return fmt.Errorf("clipboard: %w", err)
|
||||
}
|
||||
logIf(cfg.verbose, "Spotify HTML copied via %s", tool)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func doClip(ctx context.Context, cfg config, sum summarize.Summarizer, segs []transcribe.Segment) error {
|
||||
prompt, err := loadPrompt(cfg.promptClip, defaultClipPrompt)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
logIf(cfg.verbose, "selecting clip with %s (looking for %g-%gs window)", sum.Name(), cfg.minSec, cfg.maxSec)
|
||||
t0 := time.Now()
|
||||
sel, raw, err := clip.Pick(ctx, sum, prompt, segs, cfg.minSec, cfg.maxSec)
|
||||
if err != nil {
|
||||
if raw != "" {
|
||||
fmt.Fprintf(os.Stderr, "model output:\n%s\n", raw)
|
||||
}
|
||||
return fmt.Errorf("selecting clip: %w", err)
|
||||
}
|
||||
logIf(cfg.verbose, "selection ready (%s)", time.Since(t0).Round(time.Second))
|
||||
|
||||
fmt.Printf("Title: %s\n", sel.Title)
|
||||
fmt.Printf("Hook: %s\n", sel.Hook)
|
||||
fmt.Printf("Quote: %s\n", sel.Quote)
|
||||
fmt.Printf("Window: %s -> %s (%.1fs)\n", mmss(sel.StartSeconds), mmss(sel.EndSeconds), sel.Duration())
|
||||
fmt.Printf("Reason: %s\n", sel.Reasoning)
|
||||
|
||||
if cfg.dryRun {
|
||||
return nil
|
||||
}
|
||||
|
||||
logIf(cfg.verbose, "cutting clip with ffmpeg -> %s", cfg.clipOut)
|
||||
if err := clip.Extract(ctx, cfg.input, sel, cfg.clipOut, !cfg.copyCodec); err != nil {
|
||||
return err
|
||||
}
|
||||
fmt.Printf("Wrote: %s\n", cfg.clipOut)
|
||||
return nil
|
||||
}
|
||||
|
||||
// loadOrTranscribeSegments reads cached whisper JSON if available; otherwise
|
||||
// extracts audio, runs whisper, writes the cache, and returns segments.
|
||||
func loadOrTranscribeSegments(ctx context.Context, cfg config) ([]transcribe.Segment, error) {
|
||||
if data, err := os.ReadFile(cfg.segmentsCache); err == nil {
|
||||
var segs []transcribe.Segment
|
||||
if jerr := json.Unmarshal(data, &segs); jerr == nil && len(segs) > 0 {
|
||||
logIf(cfg.verbose, "reusing cached segments from %s (%d segments)", cfg.segmentsCache, len(segs))
|
||||
return segs, nil
|
||||
}
|
||||
}
|
||||
|
||||
wavPath, cleanup, err := prepareWAV(ctx, cfg.input, cfg.keepWAV, cfg.verbose)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
defer cleanup()
|
||||
|
||||
tr := buildTranscriber(cfg)
|
||||
logIf(cfg.verbose, "transcribing with %s", tr.Name())
|
||||
t0 := time.Now()
|
||||
segs, err := tr.TranscribeSegments(ctx, wavPath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("transcribe: %w", err)
|
||||
}
|
||||
logIf(cfg.verbose, "transcript ready (%d segments, %s)", len(segs), time.Since(t0).Round(time.Second))
|
||||
|
||||
if data, err := json.Marshal(segs); err == nil {
|
||||
_ = os.WriteFile(cfg.segmentsCache, data, 0o644)
|
||||
logIf(cfg.verbose, "cached segments to %s", cfg.segmentsCache)
|
||||
}
|
||||
return segs, nil
|
||||
}
|
||||
|
||||
// prepareWAV normalizes input to 16 kHz mono WAV. Returns the wav path and a
// cleanup function (no-op if keep is true).
func prepareWAV(ctx context.Context, input string, keep, verbose bool) (string, func(), error) {
	wavPath := input + ".16k.wav"
	cleanup := func() {}
	if !keep {
		tmpDir, err := os.MkdirTemp("", "publish-")
		if err != nil {
			return "", cleanup, err
		}
		wavPath = filepath.Join(tmpDir, "audio.wav")
		cleanup = func() { _ = os.RemoveAll(tmpDir) }
	}
	logIf(verbose, "extracting audio -> %s", wavPath)
	if err := audio.ExtractWAV(ctx, input, wavPath); err != nil {
		cleanup()
		return "", func() {}, fmt.Errorf("audio extraction: %w", err)
	}
	return wavPath, cleanup, nil
}

// loadPrompt returns the contents of the prompt file at path (after ~/
// expansion), or fallback when path is empty.
func loadPrompt(path, fallback string) (string, error) {
	if path == "" {
		return fallback, nil
	}
	b, err := os.ReadFile(expand(path))
	if err != nil {
		return "", fmt.Errorf("reading prompt %s: %w", path, err)
	}
	return string(b), nil
}

// buildTranscriber configures the whisper.cpp transcriber from the CLI config.
func buildTranscriber(cfg config) *transcribe.WhisperCPP {
	return &transcribe.WhisperCPP{
		Bin:      cfg.whisperBin,
		Model:    expand(cfg.whisperModel),
		Language: cfg.whisperLang,
		Threads:  cfg.whisperThreads,
		Verbose:  cfg.verbose,
	}
}

// buildSummarizer maps a summarizer name (or one of its aliases) to an implementation.
func buildSummarizer(kind, model string) (summarize.Summarizer, error) {
	switch kind {
	case "claude-cli", "cli":
		return &summarize.ClaudeCLI{Model: model}, nil
	case "claude-api", "anthropic", "api":
		return &summarize.Anthropic{Model: model}, nil
	default:
		return nil, fmt.Errorf("unknown summarizer %q", kind)
	}
}

// writeOutput writes data plus a trailing newline to path. An empty path
// writes nothing and "-" writes to stdout.
func writeOutput(path, data string) error {
	if path == "" {
		return nil
	}
	if path == "-" {
		_, err := os.Stdout.WriteString(data + "\n")
		return err
	}
	return os.WriteFile(expand(path), []byte(data+"\n"), 0o644)
}

// expand replaces a leading "~/" with the user's home directory; any other
// path is returned unchanged.
func expand(p string) string {
	if strings.HasPrefix(p, "~/") {
		if home, err := os.UserHomeDir(); err == nil {
			return filepath.Join(home, p[2:])
		}
	}
	return p
}

// defaultWhisperModel returns ~/.cache/whisper.cpp/ggml-base.en.bin, or "" if
// the home directory cannot be determined.
func defaultWhisperModel() string {
	home, err := os.UserHomeDir()
	if err != nil {
		return ""
	}
	return filepath.Join(home, ".cache", "whisper.cpp", "ggml-base.en.bin")
}

// logIf prints a "[publish]"-prefixed line to stderr when on is true.
func logIf(on bool, format string, args ...any) {
	if !on {
		return
	}
	fmt.Fprintf(os.Stderr, "[publish] "+format+"\n", args...)
}

// mmss formats seconds as mm:ss, or hh:mm:ss from one hour up. Fractions are
// truncated and negative values are clamped to zero.
func mmss(seconds float64) string {
	if seconds < 0 {
		seconds = 0
	}
	total := int(seconds)
	h := total / 3600
	m := (total % 3600) / 60
	s := total % 60
	if h > 0 {
		return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
	}
	return fmt.Sprintf("%02d:%02d", m, s)
}
58
prompts/church-service.md
Normal file
@@ -0,0 +1,58 @@
You are summarizing a recorded Christian church service. The transcript may include
welcome/announcements, worship songs, scripture readings, a sermon, prayer, and a
closing/benediction. The transcript is auto-generated and may contain misheard words,
missing punctuation, or speaker confusion — use context to interpret.

This is a King-James-Bible-preaching congregation. The summary must focus on the
preaching of the Bible (KJV). When you reference or quote scripture in any field
of the output, render it in King James Version English using KJV book/chapter/verse
formatting (e.g. "Romans 5:8", "1 Corinthians 13:13"). If the preacher quoted
scripture and the transcript mishears the wording, restore the KJV phrasing.
Reject purely motivational framing that lacks a biblical anchor — every section
should connect back to what the Bible says.

If the user message begins with a "Producer's notes" section before the transcript,
treat those notes as authoritative. Use the producer's title, speaker name, framing,
and any specific points they emphasize verbatim — do not contradict or rewrite them.
Use the transcript to expand, enrich, and back up what the producer wrote (filling
in scripture references, application points, memorable quotes, etc.).

Produce a faithful, neutral summary written in clear, accessible English. Do not
invent facts, doctrines, scripture references, or quotes that are not present in the
transcript or the producer's notes. If something is unclear, say so rather than guessing.

Output the summary in Markdown with this structure:

# {Sermon title or topic — infer from the message; if truly unknown, write "Sunday Service Summary"}

**Speaker:** {pastor/teacher name if identifiable, else "Unknown"}
**Scripture:** {primary passages referenced, comma-separated; "—" if none}

## Overview
A 2–4 sentence plain-English summary of the central message.

## Key Points
- 4–7 bullets capturing the main teaching points in the order they were made.
- Keep each bullet to one or two sentences.

## Scripture & References
- Bullet list of every scripture reference, book/author quoted, or notable resource
  mentioned. Include the verse text only if it was read aloud and you can quote it
  accurately from the transcript.

## Application / Call to Action
- What the speaker asked listeners to do, believe, or reflect on this week.

## Announcements & Prayer Requests
- Brief bullets for anything congregational: events, missions updates, prayer
  requests, baptisms, etc. Omit this section entirely if none were mentioned.

## Memorable Quote
> One short, verbatim quote from the speaker that captures the heart of the message.
> Only include if you can quote it accurately from the transcript.

Style rules:
- Be concise. Total length: 250–500 words.
- Use the speaker's own framing and terminology when possible.
- Do not add commentary, critique, or your own theological interpretation.
- Do not include timestamps, filler ("uh", "you know"), or worship lyrics.
45
prompts/clip-selector.md
Normal file
@@ -0,0 +1,45 @@
You are selecting the single best 60–90 second clip from a recorded Christian
church service to use as a social-media hook (Reels, Shorts, TikTok, X video).
This is a King-James-Bible-preaching congregation; the clip MUST come from the
preaching of the sermon and MUST be rooted in the Bible (KJV).

You will be given a timestamped transcript. Each line is formatted:

[mm:ss] [mm:ss] text

The first timestamp is the segment start, the second is its end. Times are
relative to the start of the recording.

Pick the clip that:
- Comes from the preaching (sermon exposition) — NOT worship music, opening
  prayer, announcements, offering, "turn to verse X" housekeeping, altar call
  logistics, or the benediction.
- Is anchored in the Bible — exposition of scripture, a scriptural truth being
  applied, or a story tied directly to a biblical principle. Reject purely
  motivational content with no biblical anchor.
- Stands alone without prior context — a viewer who jumps in cold should still
  get the point.
- Has emotional weight, vivid language, a sharp insight, or a memorable story.
- Avoids mid-sentence cuts. Start at a natural beginning, end at a natural end.
- Is between {{MIN_SECONDS}} and {{MAX_SECONDS}} seconds long.

Respond with ONLY a JSON object — no preamble, no code fence, no commentary.
Use this exact shape:

{
  "start_seconds": 123.4,
  "end_seconds": 198.7,
  "title": "short punchy title for the clip",
  "hook": "one-sentence reason a viewer should stop scrolling",
  "quote": "the most quotable line in the clip, verbatim from the transcript",
  "reasoning": "1-2 sentences on why this is the best window in the service"
}

Use the segment timestamps you were given for start_seconds and end_seconds —
do not invent times. Keep the duration within the requested bounds.

Any scripture references you cite (in the "quote" field, the "hook", or the
"reasoning") must use the King James Version. If the preacher quoted scripture
within the chosen window, render that quote in KJV English even if the
transcript misheard the wording, and use KJV book/chapter/verse formatting
(e.g. "Romans 5:8", "1 Corinthians 13:13").
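The clip prompt pins down both its template placeholders and the JSON shape of the reply, so the round trip can be sketched in Go. This is an illustrative sketch, not the repo's `clip.Pick`: the `Selection` type, `renderPrompt`, and `parseSelection` are hypothetical names (only the JSON keys come from the prompt), and trimming the reply to its outermost braces is a defensive assumption in case a model ignores the "no code fence" instruction.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Selection matches the JSON shape requested by prompts/clip-selector.md.
// Illustrative only; the clip package may define its own type.
type Selection struct {
	StartSeconds float64 `json:"start_seconds"`
	EndSeconds   float64 `json:"end_seconds"`
	Title        string  `json:"title"`
	Hook         string  `json:"hook"`
	Quote        string  `json:"quote"`
	Reasoning    string  `json:"reasoning"`
}

// Duration is the clip length in seconds.
func (s Selection) Duration() float64 { return s.EndSeconds - s.StartSeconds }

// renderPrompt substitutes every occurrence of the duration placeholders.
func renderPrompt(tmpl string, minSec, maxSec int) string {
	return strings.NewReplacer(
		"{{MIN_SECONDS}}", strconv.Itoa(minSec),
		"{{MAX_SECONDS}}", strconv.Itoa(maxSec),
	).Replace(tmpl)
}

// parseSelection decodes the model's reply and enforces the requested bounds.
func parseSelection(reply string, minSec, maxSec float64) (Selection, error) {
	var sel Selection
	// Defensive: keep only the outermost {...} in case stray text surrounds it.
	i, j := strings.Index(reply, "{"), strings.LastIndex(reply, "}")
	if i < 0 || j <= i {
		return sel, errors.New("no JSON object in reply")
	}
	if err := json.Unmarshal([]byte(reply[i:j+1]), &sel); err != nil {
		return sel, fmt.Errorf("decode selection: %w", err)
	}
	if d := sel.Duration(); d < minSec || d > maxSec {
		return sel, fmt.Errorf("clip is %.1fs, want %.0f-%.0fs", d, minSec, maxSec)
	}
	return sel, nil
}

func main() {
	fmt.Println(renderPrompt("Is between {{MIN_SECONDS}} and {{MAX_SECONDS}} seconds long.", 60, 90))

	reply := `{"start_seconds": 123.4, "end_seconds": 198.7, "title": "t", "hook": "h", "quote": "q", "reasoning": "r"}`
	sel, err := parseSelection(reply, 60, 90)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1fs clip: %s\n", sel.Duration(), sel.Title)
}
```

Rejecting an out-of-bounds window before cutting, rather than trusting the model, keeps the ffmpeg step inside whatever bounds were substituted into the template.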
480
scripts/install.sh
Executable file
@@ -0,0 +1,480 @@
#!/usr/bin/env bash
# install.sh — interactive setup for `publish`.
#
# Detects OS + GPU, walks through:
# 1. system dependencies (ffmpeg, cmake, git, go, GPU runtime)
# 2. whisper.cpp checkout + build for the chosen backend
# 3. ggml model download
# 4. publish binary build + symlink into ~/.local/bin
#
# Re-runnable. Each step is idempotent and skippable.
#
# Pass --doctor to print detection info and exit.

set -euo pipefail

REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
PREFIX="${PREFIX:-$HOME/.local}"
BINDIR="$PREFIX/bin"
MODELDIR="$HOME/.cache/whisper.cpp"
WHISPER_REPO="${WHISPER_REPO:-$HOME/Git Repos/whisper.cpp}"
WHISPER_GIT="https://github.com/ggerganov/whisper.cpp"
MODEL_BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

DOCTOR_ONLY=0
[[ "${1:-}" == "--doctor" ]] && DOCTOR_ONLY=1

bold() { printf '\033[1m%s\033[0m\n' "$*"; }
info() { printf ' %s\n' "$*"; }
warn() { printf '\033[33m warn: %s\033[0m\n' "$*" >&2; }
err() { printf '\033[31m error: %s\033[0m\n' "$*" >&2; }
hr() { printf -- '----------------------------------------\n'; }

# Portable CPU count: nproc on Linux, sysctl on macOS, fallback 4.
ncpu() {
  if command -v nproc >/dev/null 2>&1; then nproc
  elif command -v sysctl >/dev/null 2>&1; then sysctl -n hw.ncpu 2>/dev/null || echo 4
  else echo 4; fi
}

ask_yn() {
  # ask_yn "prompt" default(Y|N)
  local prompt="$1" default="${2:-Y}" reply
  local hint="[Y/n]"; [[ "$default" == "N" ]] && hint="[y/N]"
  while true; do
    read -r -p " $prompt $hint " reply || true
    reply="${reply:-$default}"
    case "$reply" in
      [Yy]*) return 0 ;;
      [Nn]*) return 1 ;;
    esac
  done
}

ask_choice() {
  # ask_choice "prompt" default option1 option2 ...
  local prompt="$1" default="$2"; shift 2
  local options=("$@") reply
  local opts_joined; opts_joined="$(IFS='/'; echo "${options[*]}")"
  while true; do
    read -r -p " $prompt [$opts_joined] (default: $default): " reply || true
    reply="${reply:-$default}"
    for opt in "${options[@]}"; do
      [[ "$reply" == "$opt" ]] && { echo "$reply"; return 0; }
    done
    warn "must be one of: $opts_joined"
  done
}

# ---------- detection ----------

detect_os() {
  case "$(uname -s)" in
    Linux*)
      if command -v pacman >/dev/null 2>&1; then echo "arch"
      elif command -v apt-get >/dev/null 2>&1; then echo "debian"
      elif command -v dnf >/dev/null 2>&1; then echo "fedora"
      else echo "linux-other"; fi ;;
    Darwin*) echo "macos" ;;
    *) echo "unknown" ;;
  esac
}

detect_gpu() {
  # apple silicon
  if [[ "$(uname -s)" == "Darwin" ]]; then
    [[ "$(uname -m)" == "arm64" ]] && { echo "apple"; return; }
    echo "none"; return
  fi
  # nvidia
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "nvidia"; return
  fi
  if command -v lspci >/dev/null 2>&1; then
    local vga; vga="$(lspci 2>/dev/null | grep -iE 'vga|3d|display' || true)"
    if echo "$vga" | grep -iq 'nvidia'; then echo "nvidia"; return; fi
    # -w: match whole words only, so "ati" does not hit "Intel Corporation".
    if echo "$vga" | grep -iqwE 'amd|ati|radeon'; then echo "amd"; return; fi
    if echo "$vga" | grep -iq 'intel'; then echo "intel"; return; fi
  fi
  echo "none"
}

default_backend_for() {
  case "$1" in
    nvidia) echo "cuda" ;;
    amd) echo "vulkan" ;; # easier to set up than rocm; user can pick rocm
    apple) echo "metal" ;;
    intel) echo "vulkan" ;;
    *) echo "cpu" ;;
  esac
}

OS="$(detect_os)"
GPU="$(detect_gpu)"
DEFAULT_BACKEND="$(default_backend_for "$GPU")"

# ---------- doctor ----------

print_detection() {
  bold "=== publish — environment ==="
  info "OS: $OS ($(uname -s) $(uname -r))"
  info "Arch: $(uname -m)"
  info "GPU: $GPU"
  info "Default backend: $DEFAULT_BACKEND"
  info "Repo dir: $REPO_DIR"
  info "Install prefix: $PREFIX"
  info "Whisper repo: $WHISPER_REPO"
  info "Model dir: $MODELDIR"
  hr
  bold "Dependencies"
  local deps=(go ffmpeg cmake git curl)
  case "$DEFAULT_BACKEND" in
    cuda) deps+=(nvidia-smi nvcc) ;;
    rocm) deps+=(rocminfo hipcc) ;;
    vulkan) deps+=(vulkaninfo glslc) ;;
  esac
  case "$OS" in
    macos) deps+=(brew xcode-select pbcopy) ;;
    *) deps+=(claude wl-copy xclip) ;;
  esac
  # Known off-PATH locations for tools that Linux distros tuck away.
  extra_path_for() {
    case "$1" in
      nvcc) echo /opt/cuda/bin/nvcc ;;
      hipcc) echo /opt/rocm/bin/hipcc ;;
      rocminfo) echo /opt/rocm/bin/rocminfo ;;
      *) echo "" ;;
    esac
  }
  local d extra
  for d in "${deps[@]}"; do
    if command -v "$d" >/dev/null 2>&1; then
      info "$(printf '%-14s %s' "$d:" "$(command -v "$d")")"
    else
      extra="$(extra_path_for "$d")"
      if [[ -n "$extra" && -x "$extra" ]]; then
        info "$(printf '%-14s %s' "$d:" "$extra (not on PATH)")"
      else
        info "$(printf '%-14s %s' "$d:" "MISSING")"
      fi
    fi
  done
}

print_detection

if [[ $DOCTOR_ONLY -eq 1 ]]; then
  exit 0
fi

hr
echo

# ---------- pick backend ----------

bold "Step 1 — pick whisper.cpp backend"
echo
echo " Detected default: $DEFAULT_BACKEND"
echo " Options: cuda | rocm | vulkan | metal | cpu | skip"
echo " cuda — NVIDIA GPU, fastest if you have one"
echo " rocm — AMD GPU via ROCm/HIP, fastest on supported AMD cards"
echo " vulkan — any GPU with a Vulkan driver (good cross-vendor fallback)"
echo " metal — Apple Silicon"
echo " cpu — no GPU, use the system whisper.cpp package"
echo " skip — leave whisper.cpp alone (e.g. you've already built it)"
echo
BACKEND="$(ask_choice "Backend" "$DEFAULT_BACKEND" cuda rocm vulkan metal cpu skip)"

# ---------- macOS preflight ----------

if [[ "$OS" == "macos" ]]; then
  hr
  bold "macOS preflight — Xcode Command Line Tools + Homebrew"
  echo
  if ! xcode-select -p >/dev/null 2>&1; then
    warn "Xcode Command Line Tools not installed."
    echo " Run: xcode-select --install"
    echo " Then re-run 'make install'."
    exit 1
  else
    info "Xcode CLT present at $(xcode-select -p)"
  fi
  if ! command -v brew >/dev/null 2>&1; then
    warn "Homebrew not installed."
    echo
    echo " Install with:"
    echo ' /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"'
    echo
    if ask_yn "Install Homebrew now?" Y; then
      /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
      # Make brew available for the rest of this run on Apple Silicon (default prefix /opt/homebrew).
      if [[ -x /opt/homebrew/bin/brew ]]; then
        eval "$(/opt/homebrew/bin/brew shellenv)"
      elif [[ -x /usr/local/bin/brew ]]; then
        eval "$(/usr/local/bin/brew shellenv)"
      fi
    else
      err "Homebrew is required for the macOS install path. Install it and re-run."
      exit 1
    fi
  else
    info "Homebrew present at $(command -v brew)"
  fi
fi

# ---------- system deps ----------

hr
bold "Step 2 — system dependencies"
echo

base_pkgs_arch=(go ffmpeg cmake git curl)
base_pkgs_debian=(golang-go ffmpeg cmake git curl build-essential)
base_pkgs_fedora=(golang ffmpeg cmake git curl gcc-c++)
base_pkgs_macos=(go ffmpeg cmake git curl)

backend_pkgs_arch_cuda=(cuda gcc15)
backend_pkgs_arch_rocm=(rocm-hip-sdk rocm-hip-runtime hipblas rocblas)
backend_pkgs_arch_vulkan=(vulkan-headers vulkan-icd-loader shaderc)
backend_pkgs_arch_cpu=(whisper.cpp)
backend_pkgs_macos_metal=() # xcode-select on macOS provides Metal toolchain
backend_pkgs_macos_cpu=(whisper-cpp)

pkgs_to_install=()
case "$OS" in
  arch)
    pkgs_to_install+=("${base_pkgs_arch[@]}")
    case "$BACKEND" in
      cuda) pkgs_to_install+=("${backend_pkgs_arch_cuda[@]}") ;;
      rocm) pkgs_to_install+=("${backend_pkgs_arch_rocm[@]}") ;;
      vulkan) pkgs_to_install+=("${backend_pkgs_arch_vulkan[@]}") ;;
      cpu) pkgs_to_install+=("${backend_pkgs_arch_cpu[@]}") ;;
    esac
    ;;
  macos)
    pkgs_to_install+=("${base_pkgs_macos[@]}")
    [[ "$BACKEND" == "cpu" ]] && pkgs_to_install+=("${backend_pkgs_macos_cpu[@]}")
    ;;
  debian)
    pkgs_to_install+=("${base_pkgs_debian[@]}")
    warn "Debian/Ubuntu: GPU runtime install for $BACKEND is distro-specific; you'll need to handle it manually."
    ;;
  fedora)
    pkgs_to_install+=("${base_pkgs_fedora[@]}")
    warn "Fedora: GPU runtime install for $BACKEND is distro-specific; you'll need to handle it manually."
    ;;
  *)
    warn "Unknown OS '$OS' — install $BACKEND runtime, ffmpeg, cmake, git, and Go manually."
    ;;
esac

missing=()
for p in "${pkgs_to_install[@]}"; do
  case "$p" in
    # package name (any supported distro) -> binary or path used for "is it installed?" checks
    go) command -v go >/dev/null 2>&1 || missing+=("$p") ;;
    golang-go|golang) command -v go >/dev/null 2>&1 || missing+=("$p") ;;
    ffmpeg) command -v ffmpeg >/dev/null 2>&1 || missing+=("$p") ;;
    cmake) command -v cmake >/dev/null 2>&1 || missing+=("$p") ;;
    git) command -v git >/dev/null 2>&1 || missing+=("$p") ;;
    curl) command -v curl >/dev/null 2>&1 || missing+=("$p") ;;
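    # nvcc and hipcc often live off PATH (/opt/cuda, /opt/rocm); check there too.
    cuda) command -v nvcc >/dev/null 2>&1 || [[ -x /opt/cuda/bin/nvcc ]] || missing+=("$p") ;;
    gcc15) [[ -x /usr/bin/g++-15 ]] || missing+=("$p") ;;
    rocm-hip-sdk) command -v hipcc >/dev/null 2>&1 || [[ -x /opt/rocm/bin/hipcc ]] || missing+=("$p") ;;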
    rocm-hip-runtime|hipblas|rocblas) [[ -d /opt/rocm ]] || missing+=("$p") ;;
    vulkan-headers|vulkan-icd-loader) command -v vulkaninfo >/dev/null 2>&1 || missing+=("$p") ;;
    shaderc) command -v glslc >/dev/null 2>&1 || missing+=("$p") ;;
    whisper.cpp|whisper-cpp) command -v whisper-cli >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || missing+=("$p") ;;
    build-essential|gcc-c++) command -v g++ >/dev/null 2>&1 || missing+=("$p") ;;
    *) missing+=("$p") ;;
  esac
done

if (( ${#missing[@]} == 0 )); then
  info "All system dependencies present."
else
  info "Missing: ${missing[*]}"
  case "$OS" in
    arch) cmd="sudo pacman -S --needed ${missing[*]}" ;;
    debian) cmd="sudo apt-get update && sudo apt-get install -y ${missing[*]}" ;;
    fedora) cmd="sudo dnf install -y ${missing[*]}" ;;
    macos) cmd="brew install ${missing[*]}" ;;
    *) cmd="" ;;
  esac
  if [[ -n "$cmd" ]]; then
    echo
    echo " Suggested install command:"
    echo " $cmd"
    echo
    if ask_yn "Run it now?" Y; then
      bash -c "$cmd"
    else
      warn "Skipped. Re-run after installing manually."
    fi
  fi
fi

# ---------- whisper.cpp build ----------

build_whisper() {
  local backend="$1"
  if [[ "$backend" == "cpu" ]]; then
    if command -v whisper-cli >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1; then
      info "Using system whisper.cpp; no build needed."
      return 0
    fi
    err "No system whisper.cpp found and 'cpu' chosen; install the package or pick another backend."
    return 1
  fi
  if [[ "$backend" == "skip" ]]; then
    info "Skipping whisper.cpp."
    return 0
  fi

  if [[ ! -d "$WHISPER_REPO/.git" ]]; then
    info "Cloning whisper.cpp into $WHISPER_REPO"
    mkdir -p "$(dirname "$WHISPER_REPO")"
    git clone --depth=1 "$WHISPER_GIT" "$WHISPER_REPO"
  else
    if ask_yn "whisper.cpp already at $WHISPER_REPO — git pull latest?" N; then
      git -C "$WHISPER_REPO" pull --ff-only || warn "git pull failed; continuing with current checkout"
    fi
  fi

  local build_dir="$WHISPER_REPO/build-$backend"
  info "Configuring $backend build in $build_dir"
  case "$backend" in
    cuda)
      local host_cxx=""
      [[ -x /usr/bin/g++-15 ]] && host_cxx="-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-15"
      local arch="86"
      if command -v nvidia-smi >/dev/null 2>&1; then
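        # "|| true": with set -e and pipefail, a failed query would otherwise abort
        # the whole script instead of falling back to the sm_86 default.
        local cap; cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -1 | tr -d '.' || true)"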
        [[ -n "$cap" ]] && arch="$cap"
      fi
      info " CUDA arch: sm_$arch"
      PATH="/opt/cuda/bin:$PATH" cmake -S "$WHISPER_REPO" -B "$build_dir" \
        -DGGML_CUDA=1 \
        -DCMAKE_CUDA_ARCHITECTURES="$arch" \
        $host_cxx \
        -DCMAKE_BUILD_TYPE=Release
      PATH="/opt/cuda/bin:$PATH" cmake --build "$build_dir" -j"$(ncpu)" --config Release
      ;;
    rocm)
      local gpu_arch="gfx1102"
      if command -v rocminfo >/dev/null 2>&1; then
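        # "|| true": keep the gfx1102 default if rocminfo fails (or is cut off by
        # awk's early exit) rather than aborting under set -e and pipefail.
        local detected; detected="$(rocminfo 2>/dev/null | awk '/Name:[[:space:]]+gfx/ {print $2; exit}' || true)"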
        [[ -n "$detected" ]] && gpu_arch="$detected"
      fi
      info " AMDGPU target: $gpu_arch"
      HIPCXX="${HIPCXX:-/opt/rocm/llvm/bin/clang++}" \
        cmake -S "$WHISPER_REPO" -B "$build_dir" \
        -DGGML_HIP=1 \
        -DAMDGPU_TARGETS="$gpu_arch" \
        -DCMAKE_BUILD_TYPE=Release
      cmake --build "$build_dir" -j"$(ncpu)"
      ;;
    vulkan)
      cmake -S "$WHISPER_REPO" -B "$build_dir" \
        -DGGML_VULKAN=1 \
        -DCMAKE_BUILD_TYPE=Release
      cmake --build "$build_dir" -j"$(ncpu)"
      ;;
    metal)
      # Metal is on by default on Apple Silicon; no special flag.
      cmake -S "$WHISPER_REPO" -B "$build_dir" -DCMAKE_BUILD_TYPE=Release
      cmake --build "$build_dir" -j"$(ncpu)"
      ;;
    *)
      err "Unknown backend: $backend"; return 1 ;;
  esac

  mkdir -p "$BINDIR"
  local link="$BINDIR/whisper-cli-$backend"
  ln -sf "$build_dir/bin/whisper-cli" "$link"
  info "linked $link -> $build_dir/bin/whisper-cli"
}

hr
bold "Step 3 — whisper.cpp"
echo
build_whisper "$BACKEND"

# ---------- model download ----------

hr
bold "Step 4 — whisper model"
echo

mkdir -p "$MODELDIR"
existing_models=()
while IFS= read -r m; do existing_models+=("$(basename "$m")"); done < <(ls "$MODELDIR"/ggml-*.bin 2>/dev/null || true)
if (( ${#existing_models[@]} > 0 )); then
  info "Existing models in $MODELDIR:"
  for m in "${existing_models[@]}"; do info " - $m"; done
fi

if ask_yn "Download a model now?" $([ ${#existing_models[@]} -eq 0 ] && echo Y || echo N); then
  echo
  echo " Sizes (English-only suffix .en is faster on English audio):"
  echo " tiny.en ~75MB"
  echo " base.en ~142MB (default; good speed/quality balance)"
  echo " small.en ~466MB"
  echo " medium.en ~1.5GB"
  echo " large-v3 ~2.9GB (multilingual, highest quality)"
  echo
  SIZE="$(ask_choice "Model" "base.en" tiny.en base.en small.en medium.en large-v3)"
  target="$MODELDIR/ggml-$SIZE.bin"
  if [[ -f "$target" ]]; then
    info "$target already exists; skipping download."
  else
    info "Downloading ggml-$SIZE.bin ..."
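    # Download to a .part file first so an interrupted transfer is not later
    # mistaken for a complete model by the -f check on re-run.
    curl -L --fail -o "$target.part" "$MODEL_BASE_URL/ggml-$SIZE.bin"
    mv "$target.part" "$target"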
    info "saved to $target"
  fi
fi

# ---------- publish build + symlink ----------

hr
bold "Step 5 — publish binary"
echo

if ! command -v go >/dev/null 2>&1; then
  err "Go is not installed; cannot build publish."
  exit 1
fi

(cd "$REPO_DIR" && go build -o publish .)
info "built $REPO_DIR/publish"

mkdir -p "$BINDIR"
ln -sf "$REPO_DIR/publish" "$BINDIR/publish"
info "linked $BINDIR/publish -> $REPO_DIR/publish"

# ---------- summarizer hint ----------

hr
bold "Step 6 — summarizer"
echo
if command -v claude >/dev/null 2>&1; then
  info "Found 'claude' CLI — default --summarizer claude-cli will work."
elif [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then
  info "ANTHROPIC_API_KEY set — use --summarizer claude-api."
else
  warn "Neither 'claude' CLI nor ANTHROPIC_API_KEY found. Install Claude Code or export ANTHROPIC_API_KEY before running summaries."
fi

# ---------- done ----------

hr
bold "Done."
echo
echo " Quick sanity check:"
echo " publish --help"
echo " bash scripts/install.sh --doctor # re-print environment detection"
echo
echo " Make sure $BINDIR is on your PATH."
case ":$PATH:" in
  *":$BINDIR:"*) ;;
  *) warn "$BINDIR is not currently on your PATH." ;;
esac
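The installer derives the GPU build target from tool output with two small pipelines (`nvidia-smi ... | head -1 | tr -d '.'` and `rocminfo | awk ...`). For readers porting that logic, a Go equivalent of the two parsers is sketched below for illustration only; `cudaArch`, `amdgpuTarget`, and `run` are hypothetical names, the defaults `86` and `gfx1102` are copied from the script, and trimming whitespace around the compute capability is an extra step the shell version does not take.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

// cudaArch mirrors `head -1 | tr -d '.'` applied to nvidia-smi's compute_cap
// query: keep the first line, drop the dots ("8.6" -> "86"), fall back if empty.
func cudaArch(smiOutput, fallback string) string {
	first, _, _ := strings.Cut(smiOutput, "\n")
	arch := strings.ReplaceAll(strings.TrimSpace(first), ".", "")
	if arch == "" {
		return fallback
	}
	return arch
}

// gfxName is the same pattern the script hands to awk.
var gfxName = regexp.MustCompile(`Name:[[:space:]]+gfx`)

// amdgpuTarget mirrors awk '/Name:[[:space:]]+gfx/ {print $2; exit}':
// the second whitespace-separated field of the first matching line.
func amdgpuTarget(rocminfoOutput, fallback string) string {
	for _, line := range strings.Split(rocminfoOutput, "\n") {
		if !gfxName.MatchString(line) {
			continue
		}
		if f := strings.Fields(line); len(f) >= 2 {
			return f[1]
		}
	}
	return fallback
}

// run returns a command's stdout, or "" when the tool is missing or fails,
// which is what lets the fallbacks above apply.
func run(name string, args ...string) string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	smi := run("nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader")
	fmt.Println("CUDA arch: sm_" + cudaArch(smi, "86"))
	fmt.Println("AMDGPU target:", amdgpuTarget(run("rocminfo"), "gfx1102"))
}
```

On a machine without either tool, both lines simply print the fallback values, matching what the script does when detection yields nothing.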