
t1k:nakama:deploy

| Field | Value |
| --- | --- |
| Module | base |
| Version | 1.6.2 |
| Effort | medium |
| Tools | |

Keywords: cluster, deploy, deployment, docker, hardening, helm, kubernetes, nakama, observability, postgres, production, prometheus, scaling, security, tls

/t1k:nakama:deploy

Nakama is stateful — matches, parties, chat rooms, and presence live in process memory. This shapes every deployment decision: horizontal scaling needs Enterprise (clustering) or sharding, load balancers need session affinity for WebSocket, and Go plugin .so files must match the host binary’s ABI exactly.

This skill covers the eight axes of a Nakama deployment: local dev, production targets, database, config, plugin CI/CD, observability, scaling, and security. For pattern-level help on the plugin code itself, see t1k:nakama:plugin. For config-template syntax, see t1k:nakama:config.

| Target | When to use | Caveats |
| --- | --- | --- |
| Single-node Docker | <10k CCU, single region | Simplest; vertical-scale only |
| Docker Swarm | Stateless API replicas | Stateful WS sessions need sticky LB; not recommended at scale |
| Kubernetes | Multi-region, full ops stack | No official Helm chart (community only; louis030195’s chart is the de facto choice) |
| Nakama Enterprise | >10k CCU, horizontal scale | Paid; provides built-in clustering + CRDT state sync |

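OSS has no built-in clustering. If you run two or more OSS nodes behind one load balancer without Enterprise, a client that reconnects to a different node loses its in-memory session state (matches, parties, presence).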

Standard layout: postgres → config-renderer (gomplate) → nakama-migrate → nakama-proxy.

Fixed ports (not negotiable):

  • 7348 gRPC, 7349 WS, 7350 HTTP, 7351 Console, 5432 Postgres, 9100 Prometheus metrics

Volumes:

  • postgres-data:/var/lib/postgresql for persistence
  • nakama-config:/config:ro for rendered config
  • Plugin hot-reload (dev only): bind-mount the .so file directly, then docker compose restart nakama-proxy

Security basics in compose: cap_drop: [ALL], read_only: true (config-renderer), security_opt: [no-new-privileges:true], run as unprivileged user.
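
The hardening options above can be checked mechanically before a deploy. The following is a minimal sketch, assuming the compose file has already been parsed into plain Python dictionaries; the function name `audit_service` and the sample service definitions are hypothetical and not part of any Nakama or Docker tooling.

```python
from typing import Any


def audit_service(name: str, service: dict[str, Any]) -> list[str]:
    """Return hardening findings for one parsed compose service definition."""
    findings = []
    if "ALL" not in service.get("cap_drop", []):
        findings.append(f"{name}: cap_drop should include ALL")
    if "no-new-privileges:true" not in service.get("security_opt", []):
        findings.append(f"{name}: security_opt should include no-new-privileges:true")
    # Assumption: a missing `user` key is treated as root. The image may set
    # its own USER, so treat this finding as a prompt to double-check.
    user = str(service.get("user", ""))
    if user in ("", "0", "root") or user.startswith(("0:", "root:")):
        findings.append(f"{name}: should run as an unprivileged user")
    return findings


# Hypothetical service definitions, for illustration only.
services = {
    "nakama-proxy": {
        "cap_drop": ["ALL"],
        "security_opt": ["no-new-privileges:true"],
        "user": "1000:1000",
    },
    "config-renderer": {"cap_drop": ["ALL"], "read_only": True},
}

for svc_name, svc in services.items():
    for finding in audit_service(svc_name, svc):
        print(finding)
```

A check like this could run in CI next to the compose file, so a service added later without the hardening options is flagged before it ships.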

  • Postgres 12+ required (Nakama 3.34+ uses pgx/v5).
  • Connection pooling: database.max_open_conns=100, database.max_idle_conns=50 (tune per CCU). Alternative: PgBouncer sidecar.
  • Migrations: nakama migrate up is idempotent. Run once before the server starts.
  • Backup: pg_dump or WAL archiving (pgBackRest → S3). High availability via Patroni — external to Nakama.
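
When tuning the pool, it helps to compare it against the connection limit on the Postgres side. The sketch below is a rough budget check, assuming stock PostgreSQL defaults (`max_connections = 100`, `superuser_reserved_connections = 3`); the function name `connection_budget` is hypothetical.

```python
def connection_budget(
    nodes: int,
    max_open_conns: int,
    pg_max_connections: int = 100,  # stock PostgreSQL default
    reserved: int = 3,              # stock superuser_reserved_connections
) -> int:
    """Connections left after every node opens its full pool.

    A negative result means the pools can exceed what Postgres will accept.
    """
    return pg_max_connections - reserved - nodes * max_open_conns


# One node with a pool of 100 against an untuned Postgres.
print(connection_budget(nodes=1, max_open_conns=100))
# The same pool after raising max_connections to 200.
print(connection_budget(nodes=1, max_open_conns=100, pg_max_connections=200))
```

If the budget is negative, either raise `max_connections` on the database, lower the pool size, or put a pooler such as PgBouncer in between.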

Build inside heroiclabs/nakama-pluginbuilder:<host-tag> (must match host Nakama tag).

go build -trimpath -buildmode=plugin -o modules/backend.so ./modules
  • -trimpath is mandatory (reproducible builds).
  • -buildmode=plugin is mandatory (else not loadable).
  • amd64 only today: pluginbuilder has no arm64 image. If you need arm64, build natively on an arm64 runner.
  • Pin every shared dep in go.mod to match the host binary; extract versions via:
    docker create --name tmp heroiclabs/nakama:<tag>
    docker cp tmp:/nakama/nakama /tmp/nakama-bin && docker rm tmp
    go version -m /tmp/nakama-bin | grep -E 'google\.golang\.org|golang\.org/x'

Document the extraction in a DEPENDENCY_PINS.md at the project root.
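
The extraction step can be scripted so that DEPENDENCY_PINS.md is regenerated rather than edited by hand. This is a minimal sketch that parses the text printed by `go version -m`; the helper names, the layout of the rendered file, and the sample module versions are all made up for illustration.

```python
PREFIXES = ("google.golang.org/", "golang.org/x/")


def parse_pins(go_version_output: str, prefixes: tuple = PREFIXES) -> dict[str, str]:
    """Extract {module: version} from `go version -m` output.

    Only `dep` lines are read; `=>` replacement lines are not handled here.
    """
    pins = {}
    for line in go_version_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "dep" and fields[1].startswith(prefixes):
            pins[fields[1]] = fields[2]
    return pins


def render_pins_md(tag: str, pins: dict[str, str]) -> str:
    """Render the pins as a short list (hypothetical file layout)."""
    lines = [f"Pins extracted from heroiclabs/nakama:{tag}", ""]
    lines += [f"- {mod} {ver}" for mod, ver in sorted(pins.items())]
    return "\n".join(lines) + "\n"


# Placeholder output; the versions and hashes are NOT real Nakama pins.
SAMPLE = """/tmp/nakama-bin: go1.0.0
\tpath\tgithub.com/heroiclabs/nakama/v3
\tdep\tgolang.org/x/sys\tv0.0.1\th1:placeholder=
\tdep\tgoogle.golang.org/grpc\tv0.0.2\th1:placeholder=
\tdep\tgithub.com/example/other\tv0.0.3\th1:placeholder=
"""

print(render_pins_md("<tag>", parse_pins(SAMPLE)))
```

In practice the input would come from the `go version -m /tmp/nakama-bin` command shown above, for example by piping it into the script on stdin.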

Nakama exports Prometheus metrics at :9100/metrics. Key series:

  • api_rpc_calls_total, api_rpc_latency_ms_bucket — RPC throughput & latency.
  • authoritative_match_count, active_sessions_count — stateful workload size.
  • database_pool_conns_open, database_query_latency_ms — DB health.
  • socket_connections_active — WS load.

Recommended stack: Prometheus (15s scrape) + Grafana + Loki (logs via Promtail) + Jaeger (traces). For RPC→gRPC chain visibility, propagate trace IDs through gRPC metadata.
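
A quick way to confirm the exporter is wired up is to fetch the metrics page and check that the expected series are present. The sketch below is a simplified parser for the Prometheus text exposition format (it sums samples per metric name and ignores label contents); the series names are the ones listed above, and the sample payload is invented.

```python
REQUIRED_SERIES = (
    "api_rpc_calls_total",
    "authoritative_match_count",
    "active_sessions_count",
    "database_pool_conns_open",
    "socket_connections_active",
)


def parse_metrics(text: str) -> dict[str, float]:
    """Sum sample values per metric name from Prometheus text format."""
    totals: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "}" in line:
            head, rest = line.rsplit("}", 1)
            name = head.split("{", 1)[0]
        else:
            name, rest = line.split(None, 1)
        value = float(rest.split()[0])  # an optional timestamp may follow
        totals[name] = totals.get(name, 0.0) + value
    return totals


def missing_series(totals: dict[str, float], required=REQUIRED_SERIES) -> list[str]:
    """Return the required series names that did not appear in the scrape."""
    return [name for name in required if name not in totals]


# Invented payload for illustration.
SAMPLE = """# TYPE socket_connections_active gauge
socket_connections_active 42
api_rpc_calls_total{rpc="a"} 10
api_rpc_calls_total{rpc="b"} 5
"""

totals = parse_metrics(SAMPLE)
print(totals)
print(missing_series(totals))
```

Against a live node, the text would come from an HTTP GET on `:9100/metrics`; a non-empty result from `missing_series` suggests the metrics config or the series names need a second look.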

  • ~10k CCU per node (Heroic Labs benchmarks).
  • WS connections need session affinity at the LB layer (source-IP hash, sticky cookies, or L4 NLB).
  • HTTP S2S calls are stateless — round-robin is fine.
  • For >10k CCU without Enterprise: (a) shard by game mode, (b) vertical scale (32-core/128GB), or (c) buy Enterprise.
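
Sharding by game mode only helps if each mode fits on one node, since OSS shards cannot share load. The following sketch checks a per-mode CCU forecast against the ~10k figure above; the function name `plan_shards` and the mode names and numbers are hypothetical.

```python
CCU_PER_NODE = 10_000  # per-node figure quoted above


def plan_shards(mode_ccu: dict[str, int], ccu_per_node: int = CCU_PER_NODE) -> dict[str, bool]:
    """Map each game mode to True if its peak CCU fits on a single node."""
    return {mode: ccu <= ccu_per_node for mode, ccu in mode_ccu.items()}


# Hypothetical forecast: one dedicated OSS node per game mode.
forecast = {"ranked": 8_000, "casual": 12_000, "tutorial": 500}

for mode, fits in plan_shards(forecast).items():
    status = "fits on one node" if fits else "too large for one node"
    print(f"{mode}: {status}")
```

A mode that does not fit needs a finer split (for example by region), a larger machine, or Enterprise.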

See references/production-checklist.md for the full list. Must-do items:

  1. Generate fresh 32-byte keys for socket.server_key, session.encryption_key, session.refresh_encryption_key, runtime.http_key, console.signing_key. Default values shipped in compose files are insecure.
  2. Lock down port 7351 (console) — never publicly accessible. Bastion or VPN only.
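  3. Terminate TLS at a reverse proxy or load balancer rather than in Nakama itself (the built-in socket.ssl_certificate / socket.ssl_private_key options are not recommended for production).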
  4. Secure plugin → backend gRPC: mTLS or API-key-in-metadata, plus VPC network isolation.
  5. Add rate limiting at the proxy (Nakama OSS has none) — nginx limit_req, AWS WAF, or plugin-side checks.
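
Item 1 can be scripted with a cryptographically secure random source. This is a minimal sketch, assuming hex encoding of 32 random bytes is an acceptable key format for your setup; the function name `generate_keys` is hypothetical, and how the values reach Nakama (secret manager, env, rendered config) is left to your pipeline.

```python
import secrets

KEY_SETTINGS = (
    "socket.server_key",
    "session.encryption_key",
    "session.refresh_encryption_key",
    "runtime.http_key",
    "console.signing_key",
)


def generate_keys(settings=KEY_SETTINGS, num_bytes: int = 32) -> dict[str, str]:
    """Return one independent random key per setting, hex-encoded."""
    return {name: secrets.token_hex(num_bytes) for name in settings}


keys = generate_keys()
for name, value in keys.items():
    # Print only the length so the secret never lands in logs.
    print(f"{name}: {len(value)} hex chars")
```

Each setting gets its own value; reusing one key across settings would mean a single leak compromises all of them.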

See references/troubleshooting.md for the full 12-item decision tree. The top three:

  • Plugin ABI mismatch (load fails with "plugin was built with a different version of package ...") → re-extract host dep versions, re-pin, rebuild.
  • Postgres collation version drift after OS upgrade → run REINDEX DATABASE nakama; to rebuild the affected indexes, then ALTER DATABASE nakama REFRESH COLLATION VERSION; (PostgreSQL 15+) to record the new version.
  • Config hot-reload doesn’t apply → env vars override config file; re-render via docker compose up --force-recreate config-renderer.

Pre-deployment checklist:

  • Plugin built in a matching nakama-pluginbuilder image; DEPENDENCY_PINS.md is current
  • All security keys (5 above) regenerated, stored in secret manager, loaded via env at startup
  • Postgres on managed instance with backups + monitoring; collation locked to a pinned base image
  • Reverse proxy terminates TLS for :7350 (HTTP) and :7348 (gRPC); console (:7351) is not exposed
  • Prometheus scraping :9100; Grafana dashboards cover CCU, RPC latency, DB pool, match count
  • Rate limiting configured at proxy (no OSS built-in)
  • Plugin → backend gRPC secured (mTLS or API key + VPC SG)
  • Migrations container has restart: "no" so only one process runs migrate up
  • Health check timeout ≥ 60s (Nakama warms match cache on startup; large registries need more)
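
The health-check timeout can be mirrored in deploy scripts that wait for a node before shifting traffic. Below is a minimal polling sketch with a 60-second default; the helper names are hypothetical, and the URL to probe is left to the caller because the exact health endpoint depends on your Nakama version and proxy setup.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage (the URL is an assumption; substitute your own health endpoint):
# ok = wait_until_healthy(lambda: http_probe("http://localhost:7350/"))
```

The probe is injected so the same loop works for HTTP checks, a TCP connect, or a container-level health command; raise `timeout_s` for large match registries.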