t1k:nakama:deploy

Field	Value
Module	`base`
Version	`1.6.3`
Effort	`medium`
Tools	—

Keywords: cluster, deploy, deployment, docker, hardening, helm, kubernetes, nakama, observability, postgres, production, prometheus, scaling, security, tls

How to invoke

/t1k:nakama:deploy

Nakama Deployment

Overview

Nakama is stateful. Matches, parties, chat rooms, and presence live in process memory, and that shapes every deployment decision: horizontal scaling needs Enterprise (clustering) or sharding, load balancers need session affinity for WebSocket, and Go plugin .so files must be built with the same Go toolchain and the same versions of every package they share with the host binary.

This skill covers the eight axes of a Nakama deployment: local dev, production targets, database, config, plugin CI/CD, observability, scaling, and security. For pattern-level help on the plugin code itself, see t1k:nakama:plugin. For config-template syntax, see t1k:nakama:config.

Deployment Targets

Target	When to use	Caveats
Single-node Docker	<10k CCU, single region	Simplest; vertical-scale only
Docker Swarm	Stateless API replicas	Stateful WS sessions need sticky LB; not recommended at scale
Kubernetes	Multi-region, full ops stack	No official Helm chart; community charts only (louis030195's chart is the de facto choice)
Nakama Enterprise	>10k CCU, horizontal scale	Paid; provides built-in clustering + CRDT state sync

OSS has no built-in clustering. Two or more OSS nodes behind one load balancer do not share in-memory state. A client that reconnects to a different node loses its matches, party, and presence.

Local Dev (Docker Compose)

Standard layout: postgres → config-renderer (gomplate) → nakama-migrate → nakama-proxy.

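Fixed ports (Nakama defaults; keep them consistent across compose, load balancer, and client config):

7349 gRPC API, 7350 HTTP API and WebSocket (/ws), 7351 Console, 5432 Postgres, 9100 Prometheus metrics (when metrics.prometheus_port is set to 9100)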

Volumes:

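postgres-data:/var/lib/postgresql for persistence (Postgres 18+ images; older images keep data in /var/lib/postgresql/data, so mount that path instead)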
nakama-config:/config:ro for rendered config
Plugin reload (dev only): bind-mount the .so file directly, then run docker compose restart nakama-proxy. Modules are loaded only at startup, so there is no true hot reload.

Security basics in compose: cap_drop: [ALL], read_only: true (config-renderer), security_opt: [no-new-privileges:true], run as unprivileged user.

Database

Postgres 12+ required (Nakama 3.34+ uses pgx/v5).
Connection pooling: database.max_open_conns=100, database.max_idle_conns=50 (tune per CCU). Alternative: pgBouncer sidecar.
Migrations: nakama migrate up is idempotent. Run once before the server starts.
Backup: pg_dump or WAL archiving (pgBackRest → S3). High availability via Patroni — external to Nakama.
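Each Nakama process can hold up to database.max_open_conns connections. The pool settings only work if the total across all processes fits under Postgres's max_connections (100 by default), with headroom left for migrations, psql sessions, and backup agents. Below is a minimal Go sketch of that arithmetic. The poolBudget type and the reserved figure are illustrative, not Nakama settings.

```go
package main

import "fmt"

// poolBudget describes one deployment's connection demand.
// Field names are illustrative, not Nakama config keys.
type poolBudget struct {
	Nodes        int // Nakama processes sharing one Postgres
	MaxOpenConns int // database.max_open_conns per node
	Reserved     int // headroom for migrations, psql, backup agents
	PGMaxConns   int // Postgres max_connections
}

// headroom returns how many connections remain once every node has
// saturated its pool; a negative value means Postgres will refuse
// connections at peak load.
func (b poolBudget) headroom() int {
	return b.PGMaxConns - b.Reserved - b.Nodes*b.MaxOpenConns
}

func main() {
	// One node with max_open_conns=100 against a stock Postgres.
	b := poolBudget{Nodes: 1, MaxOpenConns: 100, Reserved: 10, PGMaxConns: 100}
	if h := b.headroom(); h < 0 {
		fmt.Printf("over budget by %d connections: lower max_open_conns, raise max_connections, or add pgBouncer\n", -h)
	}
}
```

With default Postgres settings, a single node at max_open_conns=100 already leaves no room for anything else. That is the usual reason to either raise max_connections or put pgBouncer in front.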

Plugin Distribution

Build inside heroiclabs/nakama-pluginbuilder:<host-tag> (must match host Nakama tag).

go build -trimpath -buildmode=plugin -o modules/backend.so ./modules

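-trimpath is mandatory: it keeps builds reproducible and matches how the host binary is built. Without it, the plugin's package paths differ from the host's and loading fails.
-buildmode=plugin is mandatory (otherwise the output is not loadable as a plugin).
amd64 only today: pluginbuilder has no arm64 image. Plugin builds need cgo, so cross-compiling is impractical; build on a native arm64 runner if you need arm64.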

Pin every shared dep in go.mod to match the host binary; extract versions via:

docker create --name tmp heroiclabs/nakama:<tag>
docker cp tmp:/nakama/nakama /tmp/nakama-bin && docker rm tmp
go version -m /tmp/nakama-bin | grep -E 'google\.golang\.org|golang\.org/x'

Document the extraction in a DEPENDENCY_PINS.md at the project root.

Observability

Nakama exports Prometheus metrics at :9100/metrics once metrics.prometheus_port is set to 9100 (the default, 0, disables the exporter). Key series:

api_rpc_calls_total, api_rpc_latency_ms_bucket — RPC throughput & latency.
authoritative_match_count, active_sessions_count — stateful workload size.
database_pool_conns_open, database_query_latency_ms — DB health.
socket_connections_active — WS load.

Recommended stack: Prometheus (15s scrape) + Grafana + Loki (logs via Promtail) + Jaeger (traces). For RPC→gRPC chain visibility, propagate trace IDs through gRPC metadata.
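The W3C Trace Context traceparent header is one common format for carrying a trace ID across the RPC → gRPC hop. Below is a stdlib-only Go sketch that mints a root header and derives the child header a plugin would send to its backend. The helper names are hypothetical. With grpc-go you would attach the value via metadata.AppendToOutgoingContext(ctx, "traceparent", child). In practice an OpenTelemetry propagator normally handles all of this.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// traceparentRE matches version-00 headers: 32-hex trace ID,
// 16-hex parent (span) ID, 2-hex flags.
var traceparentRE = regexp.MustCompile(`^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$`)

// newTraceparent mints a sampled (flags=01) header with random IDs.
func newTraceparent() (string, error) {
	traceID := make([]byte, 16)
	spanID := make([]byte, 8)
	if _, err := rand.Read(traceID); err != nil {
		return "", err
	}
	if _, err := rand.Read(spanID); err != nil {
		return "", err
	}
	return fmt.Sprintf("00-%s-%s-01", hex.EncodeToString(traceID), hex.EncodeToString(spanID)), nil
}

// childTraceparent keeps the caller's trace ID and flags but issues a new
// span ID, which is what the plugin should send on its outgoing call.
func childTraceparent(parent string) (string, error) {
	m := traceparentRE.FindStringSubmatch(parent)
	// All-zero trace or span IDs are invalid per the spec.
	if m == nil || m[1] == strings.Repeat("0", 32) || m[2] == strings.Repeat("0", 16) {
		return "", fmt.Errorf("invalid traceparent %q", parent)
	}
	spanID := make([]byte, 8)
	if _, err := rand.Read(spanID); err != nil {
		return "", err
	}
	return fmt.Sprintf("00-%s-%s-%s", m[1], hex.EncodeToString(spanID), m[3]), nil
}

func main() {
	root, err := newTraceparent()
	if err != nil {
		panic(err)
	}
	child, err := childTraceparent(root)
	if err != nil {
		panic(err)
	}
	fmt.Println(root)
	fmt.Println(child) // same trace ID, different span ID
}
```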

Scaling Reality Check

~10k CCU per node (Heroic Labs benchmarks).
WS connections need session affinity at the LB layer (source-IP hash, sticky cookies, or L4 NLB).
HTTP S2S calls are stateless — round-robin is fine.
For >10k CCU without Enterprise: (a) shard by game mode, (b) vertical scale (32-core/128GB), or (c) buy Enterprise.
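Source-IP hashing is what sends a reconnecting WebSocket client back to the same node. Below is a minimal Go sketch of rendezvous (highest-random-weight) hashing over a node list; the node names are hypothetical. Unlike a plain hash mod N, removing a node only moves the clients that were on that node. In practice nginx `hash $remote_addr consistent;` or an L4 NLB does this for you. Note that any IP-based scheme breaks when a client's address changes, as on mobile networks or behind carrier NAT.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickNode returns the node with the highest hash(clientIP, node) score.
// The same IP always maps to the same node while the node list is unchanged.
func pickNode(clientIP string, nodes []string) string {
	var best string
	var bestScore uint64
	for _, n := range nodes {
		h := fnv.New64a()
		h.Write([]byte(clientIP))
		h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
		h.Write([]byte(n))
		if s := h.Sum64(); best == "" || s > bestScore {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	nodes := []string{"nakama-a", "nakama-b", "nakama-c"}
	for _, ip := range []string{"203.0.113.7", "198.51.100.23"} {
		fmt.Println(ip, "->", pickNode(ip, nodes))
	}
}
```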

Security Hardening (Pre-Production)

See references/production-checklist.md for the full list. Must-do items:

Generate fresh 32-byte keys for socket.server_key, session.encryption_key, session.refresh_encryption_key, runtime.http_key, console.signing_key. Default values shipped in compose files are insecure.
Lock down port 7351 (console) — never publicly accessible. Bastion or VPN only.
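Terminate TLS at a reverse proxy or load balancer. Nakama can serve TLS directly via socket.ssl_certificate and socket.ssl_private_key, but that is intended for development, not production.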
Secure plugin → backend gRPC: mTLS or API-key-in-metadata, plus VPC network isolation.
Add rate limiting at the proxy (Nakama OSS has none) — nginx limit_req, AWS WAF, or plugin-side checks.
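Below is a minimal Go sketch for generating the five keys from the OS CSPRNG. Each value is 32 random bytes, hex-encoded (64 characters), printed as `setting=value` lines. How these values reach Nakama (a rendered config file, command-line flags, or your secret manager's injection) depends on your deployment.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// keyNames lists the Nakama settings that must never keep their defaults.
var keyNames = []string{
	"socket.server_key",
	"session.encryption_key",
	"session.refresh_encryption_key",
	"runtime.http_key",
	"console.signing_key",
}

// newKey returns 32 bytes from crypto/rand, hex-encoded (64 characters).
func newKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// generateKeys returns one fresh key per setting name.
func generateKeys(names []string) (map[string]string, error) {
	keys := make(map[string]string, len(names))
	for _, n := range names {
		k, err := newKey()
		if err != nil {
			return nil, err
		}
		keys[n] = k
	}
	return keys, nil
}

func main() {
	keys, err := generateKeys(keyNames)
	if err != nil {
		panic(err)
	}
	// Print in a stable order; send to a secret manager, never commit to git.
	for _, n := range keyNames {
		fmt.Printf("%s=%s\n", n, keys[n])
	}
}
```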

Common Gotchas

See references/troubleshooting.md for the full 12-item decision tree. The top three:

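Plugin ABI mismatch ("plugin was built with a different version of package <name>") → re-extract host dep versions, re-pin, rebuild.
Postgres collation version drift after an OS or base-image upgrade → REINDEX DATABASE nakama; then ALTER DATABASE nakama REFRESH COLLATION VERSION; (PG 15+). Refreshing alone only silences the warning; indexes built under the old collation can still be wrong.
Config changes don't apply → env vars override the config file, and Nakama reads config only at startup; re-render via docker compose up --force-recreate config-renderer, then restart nakama-proxy.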
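Once DEPENDENCY_PINS.md exists, checking a plugin for ABI drift comes down to diffing module versions. Below is a minimal Go sketch. It assumes both inputs are maps of module path → version: the host side from the extraction step, the plugin side from go list -m all. The function name is hypothetical. The Go toolchain must also match, which building in the matching pluginbuilder image takes care of.

```go
package main

import (
	"fmt"
	"sort"
)

// pinDrift lists every module the plugin shares with the host whose
// version differs. Modules only the plugin uses are skipped: the host
// does not link them, so they cannot clash.
func pinDrift(host, plugin map[string]string) []string {
	var out []string
	for mod, pv := range plugin {
		if hv, shared := host[mod]; shared && hv != pv {
			out = append(out, fmt.Sprintf("%s: host %s, plugin %s", mod, hv, pv))
		}
	}
	sort.Strings(out) // map iteration order is random; keep output stable
	return out
}

func main() {
	// Illustrative versions only.
	host := map[string]string{"google.golang.org/protobuf": "v1.34.2", "golang.org/x/sys": "v0.21.0"}
	plugin := map[string]string{"google.golang.org/protobuf": "v1.34.1", "golang.org/x/sys": "v0.21.0", "example.com/mylib": "v0.1.0"}
	for _, d := range pinDrift(host, plugin) {
		fmt.Println("re-pin", d)
	}
}
```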

Checklist (Before Each Production Deploy)

Plugin built in matching nakama-pluginbuilder image; DEPENDENCY_PINS.md current
All security keys (5 above) regenerated, stored in secret manager, loaded via env at startup
Postgres on managed instance with backups + monitoring; collation locked to a pinned base image
Reverse proxy terminates TLS for :7350 (HTTP and WebSocket) and :7349 (gRPC); console (:7351) is not exposed
Prometheus scraping :9100; Grafana dashboards cover CCU, RPC latency, DB pool, match count
Rate limiting configured at proxy (no OSS built-in)
Plugin → backend gRPC secured (mTLS or API key + VPC SG)
Migrations container has restart: "no" so only one process runs migrate up
Health check timeout ≥ 60s (Nakama warms match cache on startup; large registries need more)

References

references/troubleshooting.md — 12 deployment failure modes with symptom → root cause → fix
references/production-checklist.md — security hardening + scaling decision tree
Heroic Labs docs: https://heroiclabs.com/docs/nakama/
Community Helm chart: https://louis030195.github.io/helm-charts/
Forum thread on K8s deployment: https://forum.heroiclabs.com/t/non-official-open-source-helm-chart-for-deploying-nakama-on-kubernetes/1284