Scaling Strategies

KanjiIQ is designed to scale from its current single-node deployment to a multi-node, multi-region architecture as traffic grows.

Current State

| Metric | Value |
|---|---|
| Cluster nodes | 1 (Hetzner dedicated) |
| Application replicas | 2 |
| Database | Single PostgreSQL instance |
| Traffic handling | ~100 req/min per IP (rate limited) |

This handles the current traffic comfortably. The sections below outline the scaling path as demand increases.

Horizontal Pod Autoscaling (HPA)

The first scaling step is adding HPA to automatically adjust replica count based on load:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jlpt-kanji-hpa
  namespace: jlpt-kanji
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jlpt-kanji
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Requirements: Metrics Server must be installed (included in k3s by default).
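
Utilization targets are percentages of each container's resource requests, so the HPA can only compute CPU and memory utilization if the Deployment declares them. A minimal sketch of the relevant Deployment fragment follows; the container name and the request values are illustrative assumptions, not measured figures:

```yaml
# Fragment of the jlpt-kanji Deployment (only the fields relevant to the HPA)
spec:
  template:
    spec:
      containers:
        - name: jlpt-kanji          # hypothetical container name
          resources:
            requests:
              cpu: 250m             # assumed value; 70% target = 175m average per pod
              memory: 256Mi         # assumed value; 80% target = ~205Mi average per pod
```

With these numbers the HPA would add replicas once average usage across pods exceeds roughly 175m CPU or 205Mi memory. Real values should come from observed usage.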

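Impact: The application scales automatically between 2 and 10 replicas based on CPU and memory utilization, with no code changes. With two metrics configured, the HPA computes a desired replica count for each and applies the larger one.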

Multi-Node Cluster

When a single node reaches resource limits, add worker nodes to the k3s cluster:

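# On the new worker node.
# "master" stands for the server node's hostname or IP.
# <node-token> is the contents of /var/lib/rancher/k3s/server/node-token on the server node.
curl -sfL https://get.k3s.io | K3S_URL=https://master:6443 \
  K3S_TOKEN=<node-token> sh -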

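The Kubernetes scheduler places new pods on any node with sufficient resources. Pods that are already running stay where they are until they are recreated, for example by a rollout or a scale-up. The application requires no changes because it is already stateless.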

Pod Anti-Affinity (Optional)

To prefer placing replicas on different nodes, add a pod anti-affinity rule to the pod template:

# Pod template spec, i.e. spec.template.spec of the Deployment
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: jlpt-kanji
            topologyKey: kubernetes.io/hostname

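Because the rule is preferred rather than required, the scheduler spreads replicas across nodes when it can but still co-locates them if no other node has capacity. Spreading improves fault tolerance: losing one node no longer takes down every replica.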

Database Scaling

Connection Pooling

Add PgBouncer as a sidecar container to pool database connections:

containers:
  - name: pgbouncer
    image: edoburu/pgbouncer
    ports:
      - containerPort: 6432
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: jlpt-kanji-secrets
            key: database-url

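A sidecar shares the pod's network namespace, so the backend connects to PgBouncer at localhost:6432 instead of connecting to PostgreSQL directly. Many application connections then share a small pool of PostgreSQL connections, which reduces connection overhead on the database.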
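
Pointing the backend at the pooler is a configuration change rather than a code change. A minimal sketch follows, assuming the backend reads its connection string from a `DATABASE_URL` variable and that a separate secret key (the hypothetical `pgbouncer-url`) holds a URL whose host is `localhost:6432`. The container name is also an assumption:

```yaml
containers:
  - name: backend                   # hypothetical name of the application container
    env:
      - name: DATABASE_URL          # assumed variable name read by the backend
        valueFrom:
          secretKeyRef:
            name: jlpt-kanji-secrets
            # Hypothetical key; its value would look like
            # postgres://<user>:<password>@localhost:6432/<database>
            key: pgbouncer-url
```

It is worth confirming that the chosen PgBouncer image actually listens on 6432. Some images default to a different port and need it set explicitly.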

Read Replicas

For read-heavy workloads (flashcard queries), add PostgreSQL streaming replication:

  1. Primary instance handles writes (study sessions, quiz results)
  2. Read replicas handle reads (kanji/vocabulary queries)
  3. Backend routes queries based on operation type
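
Routing by operation type implies that the backend knows two connection targets. The sketch below shows how they might be supplied. The variable names `DATABASE_WRITE_URL` and `DATABASE_READ_URL`, the secret keys, and the container name are all hypothetical and do not come from the existing manifests:

```yaml
containers:
  - name: backend                   # hypothetical name of the application container
    env:
      - name: DATABASE_WRITE_URL    # hypothetical: primary, used for study sessions and quiz results
        valueFrom:
          secretKeyRef:
            name: jlpt-kanji-secrets
            key: database-primary-url   # hypothetical key
      - name: DATABASE_READ_URL     # hypothetical: replica, used for kanji/vocabulary queries
        valueFrom:
          secretKeyRef:
            name: jlpt-kanji-secrets
            key: database-replica-url   # hypothetical key
```

Streaming replication is asynchronous by default, so a read issued immediately after a write may not yet see it on a replica. Reads that must reflect a just-recorded study session may need to go to the primary.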

Managed Database

The simplest database scaling path is migrating to a managed service:

  • AWS RDS: Multi-AZ, automated backups, read replicas
  • GCP Cloud SQL: HA configuration, automatic failover
  • Hetzner Managed PostgreSQL: When available

See Portability for migration details.

CDN Layer

Static frontend assets can be served through a CDN for global performance:

graph LR
    U[User] --> CF[Cloudflare CDN]
    CF -->|Cache HIT| U
    CF -->|Cache MISS| N[Nginx Frontend]
    N --> CF

Since the Flutter Web frontend produces static files (JS, CSS, images), these are ideal CDN candidates:

  • Cache policy: 1 year for hashed assets, no-cache for index.html
  • Global PoPs: Content served from the nearest edge location
  • DDoS protection: CDN absorbs volumetric attacks

Cloudflare DNS is already in place, so switching the DNS records to proxied mode activates the CDN layer.
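
The cache policy above is set by the origin through `Cache-Control` headers. One way to express it is sketched below as an Nginx snippet carried in a ConfigMap. The ConfigMap name, the file name, and the filename pattern used to recognise hashed assets are assumptions, and the pattern should be checked against the actual build output before use:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: frontend-nginx-cache        # hypothetical name
  namespace: jlpt-kanji
data:
  cache.conf: |
    # Include this file from inside the server { } block.
    # The entry point is revalidated on every request so new deployments are picked up.
    location = /index.html {
      add_header Cache-Control "no-cache";
    }
    # Assumed naming scheme: a content hash in the filename, e.g. app.3f9a1c2b.js.
    # 31536000 seconds = 365 days.
    location ~* "\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$" {
      add_header Cache-Control "public, max-age=31536000, immutable";
    }
```

Files that do not carry a content hash should not get the one-year policy, since clients and edge caches would keep serving the old version after a release.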

Scaling Roadmap

| Traffic Level | Infrastructure | Key Changes |
|---|---|---|
| Current (low) | 1 node, 2 replicas, single PG | None needed |
| Growing (moderate) | 1 node, HPA (2-10 replicas) | Add HPA manifest |
| High | 2-3 nodes, HPA, PgBouncer | Add worker nodes + connection pooling |
| Very High | Multi-node, managed DB, CDN | Migrate DB to RDS/Cloud SQL, enable CDN |
| Global | Multi-region clusters, read replicas | Significant architecture evolution |

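Each step builds on the previous one, so no rewrite is required. Application code stays unchanged through the Very High tier. The Global tier is the exception: routing reads to replicas requires the backend to distinguish read queries from write queries.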