t1k:nakama:deploy
| Field | Value |
|---|---|
| Module | base |
| Version | 1.6.2 |
| Effort | medium |
| Tools | — |
Keywords: cluster, deploy, deployment, docker, hardening, helm, kubernetes, nakama, observability, postgres, production, prometheus, scaling, security, tls
How to invoke
`/t1k:nakama:deploy`
Nakama Deployment
Overview
Nakama is stateful: matches, parties, chat rooms, and presence live in process memory. This shapes every deployment decision: horizontal scaling needs Enterprise (clustering) or sharding, load balancers need session affinity for WebSocket, and Go plugin `.so` files must match the host binary’s ABI exactly.
This skill covers the eight axes of a Nakama deployment: local dev, production targets, database, config, plugin CI/CD, observability, scaling, and security. For pattern-level help on the plugin code itself, see t1k:nakama:plugin. For config-template syntax, see t1k:nakama:config.
Deployment Targets
| Target | When to use | Caveats |
|---|---|---|
| Single-node Docker | <10k CCU, single region | Simplest; vertical-scale only |
| Docker Swarm | Stateless API replicas | Stateful WS sessions need sticky LB; not recommended at scale |
| Kubernetes | Multi-region, full ops stack | No official Helm chart (community only — louis030195’s chart is the de-facto choice) |
| Nakama Enterprise | >10k CCU, horizontal scale | Paid; provides built-in clustering + CRDT state sync |
OSS has no built-in clustering. Running two or more OSS nodes without Enterprise means session loss whenever a client reconnects to a different node.
Local Dev (Docker Compose)
Standard layout: postgres → config-renderer (gomplate) → nakama-migrate → nakama-proxy.
Fixed ports (not negotiable):
- `7348` gRPC, `7349` WS, `7350` HTTP, `7351` Console, `5432` Postgres, `9100` Prometheus metrics
Volumes:
- `postgres-data:/var/lib/postgresql` for persistence
- `nakama-config:/config:ro` for rendered config
- Plugin hot-reload (dev only): bind-mount the `.so` file directly, then `docker compose restart nakama-proxy`
Security basics in compose: cap_drop: [ALL], read_only: true (config-renderer), security_opt: [no-new-privileges:true], run as unprivileged user.
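
The hardening settings above can be spot-checked before bringing the stack up. This is a crude sketch, not part of Nakama or Compose: the function name `check_compose_hardening` is hypothetical, and it assumes the unprivileged user is declared with a `user:` key. It only confirms that each setting appears somewhere in the file, not that every service carries it.

```sh
# Crude lint: report hardening keys that do not appear anywhere in a compose file.
# Assumption: "run as unprivileged user" is expressed with a `user:` key.
check_compose_hardening() {
  file=$1
  missing=0
  for key in 'cap_drop:' 'read_only: true' 'no-new-privileges:true' 'user:'; do
    if ! grep -qF -- "$key" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  # Exit status 0 when all keys were found, 1 otherwise.
  return "$missing"
}
```

A possible use is `check_compose_hardening docker-compose.yml || exit 1` as an early CI step (the file name is an assumption).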
Database
- Postgres 12+ required (Nakama 3.34+ uses `pgx/v5`).
- Connection pooling: `database.max_open_conns=100`, `max_idle_conns=50` (tune per CCU). Alternative: pgBouncer sidecar.
- Migrations: `nakama migrate up` is idempotent. Run once before the server starts.
- Backup: `pg_dump` or WAL archiving (pgBackRest → S3). High availability via Patroni, which is external to Nakama.
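
Because migrations must finish before the server starts, a migrate step usually has to wait for Postgres to accept connections first. The helper below is a minimal sketch of that wait; `retry` is a hypothetical name, and the host, credentials, and database name in the usage comment are placeholders rather than values from this project.

```sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Only sleep when another attempt will follow.
    [ "$attempt" -le "$max" ] && sleep "$delay"
  done
  return 1
}

# Example (placeholder address; adjust to your compose service names):
#   retry 30 2 pg_isready -h postgres -p 5432 \
#     && /nakama/nakama migrate up --database.address "user:password@postgres:5432/nakama"
```

Since `migrate up` is idempotent, re-running this step after a failed deploy should be harmless.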
Plugin Distribution
Build inside `heroiclabs/nakama-pluginbuilder:<host-tag>` (the tag must match the host Nakama tag).

```sh
go build -trimpath -buildmode=plugin -o modules/backend.so ./modules
```

- `-trimpath` is mandatory (reproducible builds).
- `-buildmode=plugin` is mandatory (otherwise the `.so` is not loadable).
- amd64 only today: `pluginbuilder` has no arm64 image. Build on a native arm64 runner if needed.
- Pin every shared dep in `go.mod` to match the host binary; extract versions via:

  ```sh
  docker create --name tmp heroiclabs/nakama:<tag>
  docker cp tmp:/nakama/nakama /tmp/nakama-bin && docker rm tmp
  go version -m /tmp/nakama-bin | grep -E 'google\.golang\.org|golang\.org/x'
  ```

Document the extraction in a DEPENDENCY_PINS.md at the project root.
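
The extraction output can be reduced to one `module version` pair per line, which is easier to diff against `go.mod` and to paste into `DEPENDENCY_PINS.md`. This is a sketch under assumptions: `list_shared_pins` is a hypothetical helper, and the layout of `DEPENDENCY_PINS.md` is not specified here, so the output format is only a suggestion.

```sh
# list_shared_pins: read `go version -m` output on stdin and print
# "module version" for each dependency under google.golang.org or golang.org/x.
list_shared_pins() {
  awk '$1 == "dep" && $2 ~ /^(google\.golang\.org|golang\.org\/x)\// { print $2, $3 }'
}

# Example:
#   go version -m /tmp/nakama-bin | list_shared_pins
```

Filtering on the `dep` rows skips the binary's own `path`, `mod`, and `build` lines, so only third-party module versions remain.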
Observability
Nakama exports Prometheus metrics at `:9100/metrics`. Key series:
- `api_rpc_calls_total`, `api_rpc_latency_ms_bucket`: RPC throughput & latency.
- `authoritative_match_count`, `active_sessions_count`: stateful workload size.
- `database_pool_conns_open`, `database_query_latency_ms`: DB health.
- `socket_connections_active`: WS load.
Recommended stack: Prometheus (15s scrape) + Grafana + Loki (logs via Promtail) + Jaeger (traces). For RPC→gRPC chain visibility, propagate trace IDs through gRPC metadata.
Scaling Reality Check
- ~10k CCU per node (Heroic Labs benchmarks).
- WS connections need session affinity at the LB layer (source-IP hash, sticky cookies, or L4 NLB).
- HTTP S2S calls are stateless — round-robin is fine.
- For >10k CCU without Enterprise: (a) shard by game mode, (b) vertical scale (32-core/128GB), or (c) buy Enterprise.
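
The per-node figure above turns into a rough shard count with ceiling division. This is a back-of-the-envelope sketch only: `nodes_needed` is a hypothetical helper, and the 10,000 default simply restates the approximate benchmark quoted in this section, so measure your own workload before sizing.

```sh
# nodes_needed CCU [PER_NODE]: ceiling of CCU / PER_NODE using integer arithmetic.
# PER_NODE defaults to 10000 (the approximate per-node figure quoted above).
nodes_needed() {
  ccu=$1
  per_node=${2:-10000}
  echo $(( (ccu + per_node - 1) / per_node ))
}

# Example: 25,000 expected CCU at the default capacity needs 3 shards.
#   nodes_needed 25000
```

Each shard counted this way is an independent OSS node with its own in-memory state, so players on different shards cannot share a match.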
Security Hardening (Pre-Production)
See references/production-checklist.md for the full list. Must-do items:
- Generate fresh 32-byte keys for `socket.server_key`, `session.encryption_key`, `session.refresh_encryption_key`, `runtime.http_key`, `console.signing_key`. Default values shipped in compose files are insecure.
- Lock down port `7351` (console): never publicly accessible. Bastion or VPN only.
- Terminate TLS at a reverse proxy rather than in Nakama itself.
- Secure plugin → backend gRPC: mTLS or API-key-in-metadata, plus VPC network isolation.
- Add rate limiting at the proxy (Nakama OSS has none): nginx `limit_req`, AWS WAF, or plugin-side checks.
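
Key generation for the first item can be scripted with standard tools. This is a sketch under assumptions: the environment variable names are hypothetical placeholders for however your config renderer injects the five settings, and each key is 32 random bytes printed as 64 hex characters.

```sh
# gen_key: print 32 random bytes as 64 lowercase hex characters.
gen_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# print_key_env: emit one NAME=value line per key.
# Assumption: these variable names are illustrative, not defined by Nakama.
print_key_env() {
  for name in NAKAMA_SOCKET_SERVER_KEY NAKAMA_SESSION_ENCRYPTION_KEY \
              NAKAMA_SESSION_REFRESH_ENCRYPTION_KEY NAKAMA_RUNTIME_HTTP_KEY \
              NAKAMA_CONSOLE_SIGNING_KEY; do
    printf '%s=%s\n' "$name" "$(gen_key)"
  done
}
```

Pipe the output of `print_key_env` into your secret manager rather than committing it to a file in the repository.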
Common Gotchas
See references/troubleshooting.md for the full 12-item decision tree. The top three:
- Plugin ABI mismatch (`plugin was built with a different version`) → re-extract host dep versions, re-pin, rebuild.
- Postgres collation version drift after OS upgrade → `ALTER DATABASE nakama REFRESH COLLATION VERSION;` or `REINDEX DATABASE nakama;`.
- Config hot-reload doesn’t apply → env vars override config file; re-render via `docker compose up --force-recreate config-renderer`.
Checklist (Before Each Production Deploy)
- Plugin built in matching `nakama-pluginbuilder` image; `DEPENDENCY_PINS.md` current
- All security keys (5 above) regenerated, stored in secret manager, loaded via env at startup
- Postgres on managed instance with backups + monitoring; collation locked to a pinned base image
- Reverse proxy terminates TLS for `:7350` (HTTP) and `:7348` (gRPC); console (`:7351`) is not exposed
- Prometheus scraping `:9100`; Grafana dashboards cover CCU, RPC latency, DB pool, match count
- Rate limiting configured at proxy (no OSS built-in)
- Plugin → backend gRPC secured (mTLS or API key + VPC SG)
- Migrations container has `restart: "no"` so only one process runs `migrate up`
- Health check timeout ≥ 60s (Nakama warms match cache on startup; large registries need more)
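
The console-exposure item can be partly automated on a Docker host. The check below is a sketch with a hypothetical function name; it assumes port listings in the `host:port->container/proto` form printed by `docker ps --format '{{.Ports}}'`, and it only sees Docker port publishing, not firewalls or load balancers in front of the host.

```sh
# console_published_publicly: read Docker port listings on stdin and succeed
# (exit 0) if container port 7351 is published on a non-loopback address.
console_published_publicly() {
  tr ',' '\n' \
    | sed 's/^ *//' \
    | grep -e '->7351/' \
    | grep -qvE '^(127\.0\.0\.1|\[::1\]):'
}

# Example:
#   docker ps --format '{{.Ports}}' | console_published_publicly \
#     && echo "console port 7351 is published beyond loopback"
```

Matching on the container side of the mapping catches the case where the console is published under a different host port.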
References
- `references/troubleshooting.md`: 12 deployment failure modes with symptom → root cause → fix
- `references/production-checklist.md`: security hardening + scaling decision tree
- Heroic Labs docs: https://heroiclabs.com/docs/nakama/
- Community Helm chart: https://louis030195.github.io/helm-charts/
- Forum thread on K8s deployment: https://forum.heroiclabs.com/t/non-official-open-source-helm-chart-for-deploying-nakama-on-kubernetes/1284