Skip to content

Database Architecture

KanjiIQ uses PostgreSQL 15 as its primary data store, deployed within the Kubernetes cluster with persistent storage.

Schema Overview

erDiagram
    users ||--o{ study_sessions : has
    users ||--o{ quiz_results : has
    users ||--o{ test_results : has
    kanji ||--o{ quiz_results : referenced_in
    vocabulary ||--o{ quiz_results : referenced_in

    users {
        uuid id PK
        text email
        text password_hash
        jsonb preferences
        jsonb stats
        timestamp created_at
    }

    kanji {
        uuid id PK
        text character
        int jlpt_level
        jsonb meanings
        jsonb readings
        text example_sentences
    }

    vocabulary {
        uuid id PK
        text expression
        text reading
        int jlpt_level
        jsonb meanings
        text part_of_speech
    }

    locale_configs {
        uuid id PK
        text locale_code
        text[] default_languages
        text[] available_languages
    }

    regional_analytics {
        uuid id PK
        text country_code
        text request_path
        text user_agent
        text device_type
        boolean is_suspicious
        int response_status
        timestamp created_at
    }

    ip_blocklist {
        uuid id PK
        text ip_address
        text reason
        text blocked_by
        timestamp expires_at
        boolean is_active
    }

Multilingual Content Storage

KanjiIQ stores translations using PostgreSQL's JSONB columns rather than separate translation tables. This provides flexible, schemaless multilingual storage:

// kanji.meanings column
{
  "en": "mountain",
  "es": "montaña",
  "fr": "montagne",
  "ja": "やま",
  "pt": "montanha",
  "ar": "جبل",
  "zh-CN": "山"
}

Why JSONB?

  • No schema migration needed when adding new languages
  • Single query retrieves all translations for a kanji
  • PostgreSQL JSONB operators enable efficient per-language lookups
  • GIN indexing available for full-text search across translations

Kubernetes Storage

PostgreSQL runs as a Kubernetes Deployment with a PersistentVolumeClaim:

# k8s/02-postgres-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: jlpt-kanji
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

The database is not exposed outside the cluster — only the backend pod can reach it via the jlpt-postgres Service on port 5432.

Key Design Decisions

UUID Primary Keys

All tables use UUID primary keys instead of auto-incrementing integers. This supports:

  • Distributed ID generation (no central sequence)
  • Safe ID exposure in APIs (not guessable)
  • Future multi-region replication without ID conflicts

Cascade Deletions

Foreign keys use ON DELETE CASCADE to maintain referential integrity. Deleting a user automatically removes their study sessions, quiz results, and test results.

User Preferences as JSONB

User preferences and stats are stored as JSONB rather than fixed columns:

// users.preferences
{
  "defaultLanguages": ["en", "pt", "es"],
  "studyLevels": ["N5", "N4"],
  "showAllLanguages": false
}

// users.stats
{
  "totalKanjiStudied": 245,
  "averageScore": 78.5,
  "streakDays": 12
}

This avoids schema migrations for every new preference or stat added to the application.

Backup Strategy

The database follows a 3-2-1 backup rule:

  • 3 copies of data (live + 2 backups)
  • 2 different media (PVC + object storage)
  • 1 offsite copy (Hetzner Object Storage)

Daily pg_dump exports are stored with 30-day retention. See the Deployment section for operational details.