Database Architecture

TamborraData uses Supabase PostgreSQL with Row Level Security (RLS) to ensure data protection at the database level. The application is read-only from the web interface, with write access reserved for the data pipeline.

Database Stack

PostgreSQL: Relational database with JSONB support
Supabase: Managed PostgreSQL with built-in RLS
Row Level Security: Database-enforced access control

Database Schema

The database consists of three main tables:

Schema Diagram

Table Details

statistics
available_years
sys_status

statistics

Purpose: Stores all statistical data aggregated by category, scope, and year.

CREATE TABLE statistics (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  category text NOT NULL,
  scope text NOT NULL,
  year text NOT NULL,
  public_data jsonb,
  full_data jsonb,
  summary text,
  created_at timestamp DEFAULT now() NOT NULL,
  UNIQUE (category, scope, year)
);

Columns:

id: Primary key (UUID)
category: Type of statistic (e.g., “top-names”, “top-schools”)
scope: Data scope (“year” or “global”)
year: Year or “global” for aggregated data
public_data: Public-facing data (JSONB for flexibility)
full_data: Complete data (includes sensitive info)
summary: Human-readable description
created_at: Creation timestamp

Unique Constraint: (category, scope, year) ensures no duplicates

JSONB columns allow flexible schema while maintaining queryability. PostgreSQL can index and query JSONB efficiently.
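One way to keep the flexibility of JSONB while still getting type checking in the web app is to describe the expected shape in TypeScript and validate it where the data enters the application. The sketch below assumes that public_data for the "top-names" category is an array of entries with rank, name, count, and percentage, as in the example row under JSONB Data Structure. The names TopNameEntry, isTopNameEntry, and parseTopNames are illustrative and not part of the codebase.

```typescript
// Shape assumed for one entry of public_data in the "top-names" category.
interface TopNameEntry {
  rank: number;
  name: string;
  count: number;
  percentage: number;
}

// Runtime check, because JSONB values arrive as untyped JSON.
function isTopNameEntry(value: unknown): value is TopNameEntry {
  if (typeof value !== 'object' || value === null) return false;
  const entry = value as Record<string, unknown>;
  return (
    typeof entry.rank === 'number' &&
    typeof entry.name === 'string' &&
    typeof entry.count === 'number' &&
    typeof entry.percentage === 'number'
  );
}

// Keep only well-formed entries; anything else is dropped rather than rendered.
function parseTopNames(publicData: unknown): TopNameEntry[] {
  return Array.isArray(publicData) ? publicData.filter(isTopNameEntry) : [];
}
```

Other categories would need their own entry types; dropping malformed entries silently is a design choice for this sketch, and logging them may be preferable in practice.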

available_years

Purpose: Tracks which years have complete data ready for display.

CREATE TABLE available_years (
  year text UNIQUE PRIMARY KEY,
  is_ready boolean DEFAULT FALSE NOT NULL,
  created_at timestamp DEFAULT now() NOT NULL,
  updated_at timestamptz DEFAULT now() NOT NULL
);

Columns:

year: Year identifier (e.g., “2024”, “global”)
is_ready: Whether data is complete and ready
created_at: When year was added
updated_at: Last modification timestamp

Usage:

-- Frontend queries only ready years
SELECT year FROM available_years WHERE is_ready = true;

sys_status

Purpose: Global system status flag for coordinating updates.

CREATE TABLE sys_status (
  id integer PRIMARY KEY DEFAULT 1,
  is_updating boolean DEFAULT FALSE NOT NULL,
  updated_at timestamptz DEFAULT now() NOT NULL,
  notes text
);

Columns:

id: Always 1 (singleton pattern)
is_updating: Whether data pipeline is running
updated_at: Last status change
notes: Optional status message

Singleton Pattern:

-- Seed the singleton row (id = 1); ON CONFLICT makes this safe to re-run
-- 'Sistema iniciado' means 'System started'
INSERT INTO sys_status (id, is_updating, notes)
VALUES (1, false, 'Sistema iniciado')
ON CONFLICT (id) DO NOTHING;

This table prevents the frontend from showing stale data during pipeline updates.
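How the flag is consumed is not shown on this page, so the following is only a sketch of one plausible approach: check is_updating before loading data and answer with a temporary-unavailable response while the pipeline is writing. The ApiResponse type, the respondUnlessUpdating helper, and the choice of HTTP 503 are assumptions made for illustration.

```typescript
// Hypothetical response shape; the real API routes may differ.
type ApiResponse<T> =
  | { status: 200; body: T }
  | { status: 503; body: { message: string } };

// Decide what to return given the sys_status.is_updating flag.
function respondUnlessUpdating<T>(
  isUpdating: boolean,
  load: () => T,
): ApiResponse<T> {
  if (isUpdating) {
    // Skip the data tables entirely while the pipeline is mid-write.
    return { status: 503, body: { message: 'Data update in progress' } };
  }
  return { status: 200, body: load() };
}
```

Passing the loader as a callback means no statistics query runs at all while an update is in progress.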

Row Level Security (RLS)

TamborraData implements defense in depth with database-enforced access control:

Enable RLS

-- Enable RLS on all tables
ALTER TABLE statistics ENABLE ROW LEVEL SECURITY;
ALTER TABLE available_years ENABLE ROW LEVEL SECURITY;
ALTER TABLE sys_status ENABLE ROW LEVEL SECURITY;

RLS Policies

Read-Only for Anonymous

-- Anonymous users (web app) can only read
CREATE POLICY "Anon read access on statistics"
ON statistics
FOR SELECT
TO anon
USING (true);

CREATE POLICY "Anon read access on available_years"
ON available_years
FOR SELECT
TO anon
USING (true);

CREATE POLICY "Anon read access on sys_status"
ON sys_status
FOR SELECT
TO anon
USING (true);

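USING (true) makes every row visible to anon, but the policy covers SELECT only. With RLS enabled and no policy for INSERT, UPDATE, or DELETE, those commands are denied by default. Note that RLS filters rows, not columns: this policy alone does not stop anon from selecting full_data, so keeping that column private depends on column privileges or on queries that never request it.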
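The default-deny behavior can be pictured with a small model. This is a toy illustration in TypeScript, not how PostgreSQL is implemented: it ignores the USING expression (always true here) and roles that bypass RLS such as service_role. The Policy type and isAllowed function are hypothetical names.

```typescript
type Command = 'SELECT' | 'INSERT' | 'UPDATE' | 'DELETE';

interface Policy {
  table: string;
  command: Command;
  role: string;
}

// The three policies defined above: anon may SELECT from each table.
const policies: Policy[] = [
  { table: 'statistics', command: 'SELECT', role: 'anon' },
  { table: 'available_years', command: 'SELECT', role: 'anon' },
  { table: 'sys_status', command: 'SELECT', role: 'anon' },
];

// With RLS enabled, a command is allowed only if some policy covers it.
function isAllowed(role: string, command: Command, table: string): boolean {
  return policies.some(
    (p) => p.role === role && p.command === command && p.table === table,
  );
}
```

In this model isAllowed('anon', 'SELECT', 'statistics') is true, while any write by anon finds no matching policy and is refused.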

Write Access (Pipeline)

Write access is reserved for the service role, which the data pipeline uses:

-- No write policy is defined: in Supabase the service_role
-- bypasses RLS, so it keeps full access.

-- Pipeline uses SUPABASE_SERVICE_KEY
-- Web app uses SUPABASE_ANON_KEY

Environment separation:

# Web app (.env)
SUPABASE_ANON_KEY=eyJ...    # Read-only

# Pipeline (.env)
SUPABASE_SERVICE_KEY=eyJ... # Read-write

RLS vs API Middleware

Aspect	API Middleware	Row Level Security
Enforcement	Application layer	Database layer
Bypassable	✅ Yes (if API compromised)	❌ Not with the anon key (database-enforced)
Performance	⚠️ Additional app logic	✅ Native PostgreSQL
Maintainability	❌ Duplicate in multiple APIs	✅ Single source of truth
Auditing	⚠️ Application logs	✅ Database logs
Testing	❌ Must test in each API	✅ Test once at DB level

RLS is the last line of defense. Even if an API route is compromised, a request made with the anon key still cannot modify data, because the database itself rejects the write.

Data Access Patterns

Repository Queries

All database access goes through repositories:

Fetch Statistics
Fetch Available Years
Check System Status

// app/(backend)/api/statistics/repositories/statistics.repo.ts
export async function fetchStatistics(year: string) {
  const { data, error } = await supabaseClient
    .from('statistics')
    .select('category, public_data, summary')
    .eq('year', year)
    .order('public_data', { ascending: false })
    .limit(30);

  if (error) {
    return { statistics: null, error: 'Error de la base de datos' };
  }

  return { statistics: data, error: null };
}

Query breakdown:

Select only needed columns (not full_data)
Filter by year
Order by public_data (PostgreSQL's built-in JSONB comparison)
Limit results for performance
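A caller usually wants the rows for one year addressable by category rather than as a flat list. The sketch below shows one way to do that; StatisticRow mirrors the three selected columns, and indexByCategory is a hypothetical helper. It assumes at most one row per category for a given year, which holds when every row for that year has the same scope.

```typescript
// Mirrors the columns selected by the query above.
interface StatisticRow {
  category: string;
  public_data: unknown;
  summary: string | null;
}

// Build a lookup from category to row; a later duplicate overwrites an earlier one.
function indexByCategory(rows: StatisticRow[]): Map<string, StatisticRow> {
  const byCategory = new Map<string, StatisticRow>();
  for (const row of rows) {
    byCategory.set(row.category, row);
  }
  return byCategory;
}
```

A page component could then read indexByCategory(rows).get('top-names') instead of searching the array each time.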

// app/(backend)/api/years/repositories/years.repo.ts
export async function fetchYears() {
  const { data, error } = await supabaseClient
    .from('available_years')
    .select('year, is_ready')
    .eq('is_ready', true)
    .order('year', { ascending: false });

  if (error) {
    return { years: null, error: 'Error de la base de datos' };
  }

  return { years: data, error: null };
}

Query breakdown:

Select year and ready status
Filter for ready years only
Order descending (newest first)

// app/(backend)/shared/utils/getSysStatus.ts
export async function getSysStatus(): Promise<boolean> {
  const { data, error } = await supabaseClient
    .from('sys_status')
    .select('is_updating')
    .eq('id', 1)
    .single();

  if (error) return false;

  return data?.is_updating ?? false;
}

Query breakdown:

Select only is_updating column
Get singleton row (id=1)
Use .single() to return object, not array

JSONB Data Structure

TamborraData uses JSONB for flexible statistical data:

Example Data

A statistics row, followed by queries against its JSONB column:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "category": "top-names",
  "scope": "year",
  "year": "2024",
  "public_data": [
    {
      "rank": 1,
      "name": "Juan",
      "count": 245,
      "percentage": 8.5
    },
    {
      "rank": 2,
      "name": "María",
      "count": 230,
      "percentage": 8.0
    }
  ],
  "summary": "Top 10 most common names in 2024",
  "created_at": "2024-11-07T18:14:24.891472Z"
}

-- Query JSONB fields
SELECT 
  category,
  public_data->0->>'name' as top_name,
  public_data->0->>'count' as count
FROM statistics
WHERE year = '2024'
  AND category = 'top-names';

-- Filter by JSONB content
SELECT *
FROM statistics
WHERE public_data @> '[{"name": "Juan"}]';

-- Get array length
SELECT 
  category,
  jsonb_array_length(public_data) as item_count
FROM statistics;
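The containment operator @> matches structure as well as values, which is easy to get wrong when public_data is an array. As a rough illustration, the sketch below re-implements a simplified version of that rule in TypeScript. The Json type and the contains function are illustrative names, and the model deliberately omits PostgreSQL's special case that lets an array contain a bare scalar.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Simplified model of `left @> right` for JSONB values.
function contains(left: Json, right: Json): boolean {
  if (Array.isArray(left) && Array.isArray(right)) {
    // Every element on the right must be contained in some element on the left.
    return right.every((r) => left.some((l) => contains(l, r)));
  }
  if (isObject(left) && isObject(right)) {
    // Every key on the right must exist on the left with a contained value.
    return Object.entries(right).every(
      ([key, value]) =>
        Object.prototype.hasOwnProperty.call(left, key) &&
        contains(left[key], value),
    );
  }
  if (Array.isArray(left) || Array.isArray(right) || isObject(left) || isObject(right)) {
    return false; // array vs object, or container vs scalar: structure mismatch
  }
  return left === right; // two scalars
}
```

Under this model an array of objects contains [{"name": "Juan"}] but not the bare object {"name": "Juan"}, which is why the filter above wraps its pattern in an array.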

Indexing Strategy

Indexes improve query performance by allowing PostgreSQL to find rows without scanning entire tables.

Recommended Indexes

-- Composite index for common query pattern
CREATE INDEX idx_statistics_year_category 
ON statistics (year, category);

-- Index on is_ready for filtering
CREATE INDEX idx_years_ready 
ON available_years (is_ready) 
WHERE is_ready = true;

-- GIN index for JSONB queries
CREATE INDEX idx_statistics_public_data 
ON statistics USING GIN (public_data);

-- Index on created_at for time-based queries
CREATE INDEX idx_statistics_created 
ON statistics (created_at DESC);

Index Usage Examples

Examples for the composite index, the GIN index, and EXPLAIN:

-- Uses idx_statistics_year_category
SELECT * 
FROM statistics 
WHERE year = '2024' 
  AND category = 'top-names';

-- Index covers both conditions efficiently

-- Uses idx_statistics_public_data
-- public_data is an array, so the pattern must be an array too
SELECT * 
FROM statistics 
WHERE public_data @> '[{"rank": 1}]';

-- GIN index enables fast JSONB containment queries

-- Check if index is used
EXPLAIN ANALYZE
SELECT * 
FROM statistics 
WHERE year = '2024';

-- When the planner picks the index, the plan names idx_statistics_year_category;
-- on a small table it may choose a sequential scan instead

Connection Management

The web app reaches the database through a single supabase-js client created on the server:

// app/(backend)/core/db/supabaseClient.ts
import 'server-only';
import { createClient } from '@supabase/supabase-js';

const supabaseUrl = process.env.SUPABASE_URL!;
const supabaseAnonKey = process.env.SUPABASE_ANON_KEY!;

export const supabaseClient = createClient(supabaseUrl, supabaseAnonKey);

supabase-js sends queries over HTTPS to Supabase's API instead of opening PostgreSQL connections itself, so connection pooling happens inside Supabase's infrastructure and the web app needs no pool configuration.

Environment Variables

# .env.local
SUPABASE_URL=https://xxxxx.supabase.co
SUPABASE_ANON_KEY=eyJhbGc...  # Read-only key

Never commit .env.local to version control! Use .env.example as a template.
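A missing variable is easier to diagnose when it fails at startup with a clear message rather than surfacing later as a failed request. The following is a minimal sketch of such a check, assuming the two variable names used on this page. The requireEnv and loadSupabaseConfig helpers are hypothetical, and the environment is passed in as a plain object so the functions can be tested without touching the real process environment.

```typescript
type Env = Record<string, string | undefined>;

// Return the variable's value, or fail loudly if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect the settings the web app's Supabase client needs.
function loadSupabaseConfig(env: Env): { url: string; anonKey: string } {
  return {
    url: requireEnv(env, 'SUPABASE_URL'),
    anonKey: requireEnv(env, 'SUPABASE_ANON_KEY'),
  };
}
```

In a Next.js server module this would typically be called once as loadSupabaseConfig(process.env) before creating the client.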

Data Pipeline Separation

TamborraData separates data generation (pipeline) from data visualization (web app):


Pipeline (Private Repository)

Python scripts for data scraping
Data cleaning and aggregation
Database write access
Uses SUPABASE_SERVICE_KEY

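import os

from supabase import create_client

# Pipeline uses service key
supabase = create_client(
    os.environ["SUPABASE_URL"],
    os.environ["SUPABASE_SERVICE_KEY"],  # Write access
)

# Insert statistics; `data` is the aggregated payload built earlier in the pipeline
supabase.table("statistics").insert({
    "category": "top-names",
    "scope": "year",  # NOT NULL in the schema, so it must be supplied
    "year": "2024",
    "public_data": data,
}).execute()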

Web App (Public Repository)

TypeScript/Next.js frontend
Read-only database access
Uses SUPABASE_ANON_KEY
No sensitive data in code

// Web app uses anon key
const supabaseClient = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!  // Read-only
);

// Can only SELECT; request public columns explicitly rather than '*'
const { data } = await supabaseClient
  .from('statistics')
  .select('category, public_data, summary');

Backup and Recovery

Automatic Backups

Supabase provides automatic backups:

Daily backups (availability and retention, stated here as 7 days, depend on the Supabase plan; confirm what your plan includes)
Point-in-time recovery available (paid tiers)
Manual backups via SQL dump

# Manual backup
pg_dump "$DATABASE_URL" > backup.sql

# Restore from backup
psql "$DATABASE_URL" < backup.sql

Migration Strategy

-- migrations/001_initial_schema.sql
CREATE TABLE statistics (
  -- schema definition
);

-- migrations/002_add_indexes.sql
CREATE INDEX idx_statistics_year 
ON statistics (year);

-- Track migrations in version control
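Numbered file names make the order of migrations explicit, which a small script can rely on. The sketch below is a hypothetical helper, not part of the project: it assumes files follow the NNN_description.sql pattern shown above and that the list of already applied files is tracked somewhere, for example in a table or a checked-in manifest.

```typescript
// Numeric prefix of a migration file name, e.g. "002_add_indexes.sql" -> 2.
function migrationNumber(fileName: string): number {
  const match = /^(\d+)_/.exec(fileName);
  if (!match) {
    throw new Error(`Unexpected migration file name: ${fileName}`);
  }
  return Number(match[1]);
}

// Files that still need to run, in ascending numeric order.
function pendingMigrations(allFiles: string[], applied: string[]): string[] {
  const done = new Set(applied);
  return allFiles
    .filter((file) => !done.has(file))
    .sort((a, b) => migrationNumber(a) - migrationNumber(b));
}
```

Sorting by the parsed number rather than by string keeps the order correct even if the zero padding is inconsistent.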

Performance Considerations

Query Optimization

Use indexes for frequently queried columns
Select only needed columns, not SELECT *
Use LIMIT to prevent large result sets
Avoid N+1 queries with proper joins

-- ❌ Bad: Select everything
SELECT * FROM statistics;

-- ✅ Good: Select only needed columns
SELECT category, public_data 
FROM statistics 
WHERE year = '2024' 
LIMIT 30;

JSONB Performance

Use GIN indexes for JSONB queries
Avoid deep nesting (>3 levels)
Consider denormalization for hot paths
Use jsonb_array_elements for array queries

-- Create GIN index
CREATE INDEX ON statistics USING GIN (public_data);

-- Efficient JSONB query (array pattern, because public_data is an array)
SELECT * FROM statistics 
WHERE public_data @> '[{"rank": 1}]';

Connection Pooling

Supabase handles pooling automatically
Reuse one client instance across invocations in serverless environments instead of creating a client per request
Monitor connection count in Supabase dashboard
Set appropriate timeout values

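Connection limits depend on the Supabase plan and compute size (the free tier is cited here at 500 concurrent connections). Check the dashboard for your project's actual limit and upgrade if you need more.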

Monitoring

Monitor these metrics in Supabase dashboard:

Query performance (slow queries)
Database size and growth
Connection count
Cache hit ratio
Index usage

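-- Find slow queries (requires the pg_stat_statements extension)
-- PostgreSQL 13+ names the column total_exec_time; older versions use total_time
SELECT * FROM pg_stat_statements 
ORDER BY total_exec_time DESC 
LIMIT 10;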
