Hardening & operations
Operator playbook for running GopherTrunk in production: API authentication, Prometheus metrics, graceful shutdown, the Docker assets, and the RTL-SDR USB pass-through recipe.
API authentication
Every HTTP mutation endpoint (end-call, talkgroup priority/lockout/
scan, retention sweep, tone-detector reset, scanner cockpit, audio
cockpit, manual tune) is gated by api.auth in config.yaml. The
default policy (mode: auto) is loopback-permissive but requires a
bearer token on every public-interface bind.
Policy modes
auto(default). Require a bearer token on non-loopback binds; bypass the check when bound to127.0.0.1/::1. Reasonable for single-host operator boxes — kernel-enforced reachability is a peer-cred proxy. The daemon refuses to start inautomode on a public bind without a configured token.required. Every mutation request must carry a valid Bearer token, even from loopback. Use when the daemon shares a host with untrusted users.disabled. Wide-open mutations, no auth. Equivalent to the legacyallow_mutations: truebehaviour; the daemon logs a warning at startup. Only safe behind an external proxy that does its own auth, or on a strictly trusted segment.
Generating a token
# 32 bytes of urandom, hex-encoded — 64 ASCII chars.
openssl rand -hex 32 > /etc/gophertrunk/api-token
chmod 600 /etc/gophertrunk/api-token
chown gophertrunk:gophertrunk /etc/gophertrunk/api-token
Reference the file from config:
api:
http_addr: "0.0.0.0:8080"
auth:
mode: "required"
token_file: "/etc/gophertrunk/api-token"
The daemon re-reads token_file on every mutation request, so
rotation is a one-step openssl rand + file write — no SIGHUP, no
restart.
Client usage
TOKEN=$(cat /etc/gophertrunk/api-token)
# Probe capability first — always open.
curl -sS http://daemon:8080/api/v1/mutations
# Mutation — Authorization header required.
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"reason":"manual"}' \
http://daemon:8080/api/v1/calls/00000001/end
GET /api/v1/mutations reports auth_mode and can_mutate (plus a
allow_mutations legacy alias) so TUI / scripts can light up
write-side keybindings without probing real endpoints for 401.
Trusted networks (LAN bypass)
If the daemon binds to a LAN address and you trust everything on that
segment, list the prefix under auth.trusted_networks and run in
auto mode:
api:
http_addr: "192.168.1.10:8080"
auth:
mode: "auto"
trusted_networks:
- "192.168.0.0/16"
The middleware honours RemoteAddr only — X-Forwarded-For is
intentionally ignored so the bypass isn’t forgeable by a hostile
upstream proxy. If you front the daemon with an authenticating
reverse proxy (nginx, Caddy, Envoy), point its upstream at the daemon
on loopback and let the proxy handle auth; auth.mode: auto then
bypasses on the loopback hop.
Migrating from allow_mutations
Existing configs with allow_mutations: true still work: the daemon
logs a deprecation warning at startup and maps the flag to
auth.mode: disabled (the wide-open legacy behaviour). Migrate to
explicit auth.mode at the next config edit:
api:
http_addr: "127.0.0.1:8080"
- allow_mutations: true
+ auth:
+ mode: "auto" # loopback-bypass; switch to required on public binds
Metrics
GopherTrunk exposes Prometheus metrics on the HTTP API at
GET /metrics when the daemon is started with metrics enabled. The
collector lives in internal/metrics.
Series exposed:
| Series | Type | Description |
|---|---|---|
gophertrunk_events_total{kind=...} |
counter | Every event observed on the internal bus, by kind |
gophertrunk_calls_total{reason=...} |
counter | Calls completed, labelled by EndReason (normal/preempted/…) |
gophertrunk_calls_active |
gauge | Currently-active call count |
gophertrunk_control_channel_locked{system=...} |
gauge | 1 while the named system has its CC locked, 0 otherwise |
gophertrunk_sdr_iq_underruns_total{driver,serial} |
counter | IQ pipeline drops because a downstream stage was too slow |
gophertrunk_sdr_usb_reconnects_total{driver,serial} |
counter | Times the USB driver had to re-open a device |
gophertrunk_decode_errors_total{protocol,stage} |
counter | Decode failures by protocol + stage (e.g. p25/tsbk-crc) |
gophertrunk_sdr_attached{driver,serial,role} |
gauge | 1 per currently-attached SDR |
gophertrunk_build_info{version} |
gauge | Always 1; carries the build version as a label |
The bus-driven counters (events_total, calls_total, calls_active,
control_channel_locked) update automatically. Subsystems push the
others via Metrics.RecordIQUnderrun, RecordUSBReconnect,
RecordDecodeError, and SetSDRAttached.
Graceful shutdown
The daemon (cmd/gophertrunk run) installs a signal.NotifyContext
for SIGINT / SIGTERM. Each long-lived component (api.Server,
metrics.Metrics, storage.CallLog, trunking.Engine) accepts a
context and a Close() method, so the shutdown sequence is:
- Signal cancels the root context.
- Each component returns from
Run/ cleans up its bus subscription. - The engine drains active calls — every active
ActiveCallgets a finalevents.KindCallEndwithEndReasonNormalso the call log captures it before the database closes.
The pre-built engine.Engine.shutdown() already implements step 3;
the wiring for steps 1-2 lives in cmd/gophertrunk (and lands in
the demod-pipeline composer follow-up).
Connection-drain window
The HTTP server’s Shutdown ctx is bounded at 30 s (was 5 s before).
The window covers in-flight SSE / WebSocket / per-call audio-stream
subscribers — those connections see a clean close instead of being
torn down mid-frame on a daemon restart. Static REST requests
complete in milliseconds either way; the extra headroom only
matters for streaming endpoints. gRPC uses
grpc.Server.GracefulStop(), which waits for every in-flight RPC
(including AudioService.StreamAudio subscribers) to finish or
voluntarily close.
Transport encryption (TLS)
Both the HTTP API (REST + SSE + WebSocket) and the gRPC server
support optional TLS. Plain TCP stays the default — bearer-token
auth in api.auth is sufficient for loopback / trusted-LAN
deployments and TLS adds operational overhead (cert rotation, CA
choice). Public-bind operators should enable TLS.
Wire up by adding two paths to api in config.yaml:
api:
http_addr: ":8080"
grpc_addr: ":50051"
tls_cert: "/etc/gophertrunk/tls/cert.pem"
tls_key: "/etc/gophertrunk/tls/key.pem"
Both keys must be set together — setting one without the other is
a config error the daemon refuses to start with rather than
silently falling back to plain HTTP. The same cert / key pair
is used for both the HTTP and gRPC listeners. Generate a real
cert with certbot, your CA’s tooling, or for testing:
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
-keyout /etc/gophertrunk/tls/key.pem \
-out /etc/gophertrunk/tls/cert.pem \
-subj "/CN=gophertrunk.example.com" \
-addext "subjectAltName=DNS:gophertrunk.example.com,IP:10.0.0.1"
Cert rotation requires a daemon restart — internal/api/server.go’s
ServeTLS reads both files at start-up. SIGHUP / auto-reload is a
follow-up.
Once TLS is up, clients use https:// and grpc+tls://:
curl --cacert /etc/gophertrunk/tls/cert.pem \
-H "Authorization: Bearer $TOKEN" \
https://gophertrunk.example.com:8080/api/v1/health
grpcurl -cacert /etc/gophertrunk/tls/cert.pem \
gophertrunk.example.com:50051 \
gophertrunk.v1.AudioService/StreamAudio
Health endpoint diagnostics
GET /api/v1/health returns a JSON body shaped like:
{
"status": "ok",
"now": "2026-05-13T19:00:00Z",
"version": "v1.2.3",
"pool_attached_count": 2,
"active_calls": 1,
"db_connected": true,
"metrics_enabled": true,
"auth_mode": "auto"
}
The endpoint is always open (no token required) so liveness /
readiness probes can hit it from outside the auth boundary. Every
field beyond status and now is best-effort — a daemon
configured without a particular collaborator (no SDR pool, no
engine, no history DB) just reports the corresponding field at its
zero value rather than failing the request.
Suggested probe semantics:
| Probe type | Condition |
|---|---|
| Liveness | HTTP 200 + body decodes |
| Readiness | status == "ok" AND pool_attached_count >= 1 AND db_connected == true |
The two-field readiness check distinguishes “the daemon process is up” from “the daemon process is actually doing work” so orchestrators (k8s, Nomad) can roll deployments cleanly.
Timeouts and keep-alive
| Knob | Value | What it bounds |
|---|---|---|
http.Server.ReadHeaderTimeout |
10 s | Slowloris-style header send delays |
http.Server.ReadTimeout |
30 s | Total request-read time (bounds slow uploads) |
http.Server.WriteTimeout |
30 s | Per-request write window for non-streaming handlers; disabled per-request for SSE (/api/v1/events) and audio-streaming endpoints via http.ResponseController.SetWriteDeadline(time.Time{}); not enforced after WebSocket Upgrade hijacks the connection |
http.Server.IdleTimeout |
120 s | Idle keep-alive socket lifetime |
grpc.KeepaliveParams.Time |
30 s | Server pings idle clients after this much inactivity |
grpc.KeepaliveParams.Timeout |
10 s | Server waits this long for a ping ack before closing the connection |
grpc.KeepaliveEnforcementPolicy.MinTime |
5 s | Floors how often clients are allowed to ping the server (anti-flood) |
grpc.KeepaliveEnforcementPolicy.PermitWithoutStream |
true |
Long-lived idle StreamAudio subscribers stay open through their keep-alive pings |
The bounds are conservative — every standard REST endpoint
completes in well under 30 s, and live streaming endpoints (SSE,
WebSocket, gRPC StreamAudio) opt out of HTTP WriteTimeout
explicitly so they aren’t torn down mid-frame.
Docker
The repository ships a multi-stage Dockerfile:
docker build -t gophertrunk:dev .
For a single-daemon, single-dongle deployment use the
docker-compose.yml at the repo root. It
bind-mounts config.yaml, recordings/, and calls.db from the
host so data persists across container restarts.
USB pass-through
RTL-SDR dongles need three things to work inside a container:
-
Host udev rules that grant a non-root group access to the device node. Place this at
/etc/udev/rules.d/20-rtlsdr.rules:SUBSYSTEM=="usb", ATTRS{idVendor}=="0bda", ATTRS{idProduct}=="2838", \ MODE="0660", GROUP="plugdev"Reload with
sudo udevadm control --reload && sudo udevadm trigger, then unplug / replug the dongle. -
DVB blacklist so the kernel doesn’t claim the device first:
# /etc/modprobe.d/blacklist-dvb_usb_rtl28xxu.conf blacklist dvb_usb_rtl28xxu -
Container privileges:
- Map the device node into the container with
devices:in compose (or--devicewithdocker run). - Use
group_add:to put the container’s user in the same group as the host’s udev rule (plugdev= GID 46 on Debian; checkgetent group plugdev). - Add
cap_add: DAC_OVERRIDEso the non-root user inside the container can open the device node.
- Map the device node into the container with
The provided docker-compose.yml does all of this; verify the
devices: path matches your lsusb output.
Validation
After docker compose up -d, smoke-test:
curl -s http://localhost:8080/api/v1/health
curl -s http://localhost:8080/api/v1/version
curl -s http://localhost:8080/metrics | grep gophertrunk_build_info
docker exec gophertrunk gophertrunk sdr list
The last command should list the dongle by index, serial, tuner type,
and supported gains. If it reports no devices, the udev rules and / or
USB pass-through are misconfigured — check dmesg on the host for
DVB driver claims, and ls -l /dev/bus/usb/... from inside the
container.
Integration test
make integration boots the wired daemon end-to-end (no real SDR),
publishes a synthetic call grant on the events bus, and asserts every
component agrees: the call appears in the SQLite history, a .wav
lands under <recordings>/<system>/<talkgroup>/, /metrics reflects
gophertrunk_calls_active 1, and CallEnd ticks
gophertrunk_calls_total{reason="normal"}. The target runs on every
CI build alongside the unit tests.
A live-IQ replay variant (replay a recorded .cfile through the
real SDR mock + protocol decoders + composer chain and diff the
resulting event totals against a golden manifest) is future work —
it needs live grant publication from the protocol decoders, which is
gated on the channel-coding pieces still in flight (P25 trellis +
TSBK interleaver, DMR slot-type Hamming(20,8), NXDN SACCH FEC).