Skip to main content

Database Architecture

TamborraData uses Supabase PostgreSQL with Row Level Security (RLS) to ensure data protection at the database level. The application is read-only from the web interface, with write access reserved for the data pipeline.

Database Stack

PostgreSQL

Relational database with JSONB support

Supabase

Managed PostgreSQL with built-in RLS

Row Level Security

Database-enforced access control

Database Schema

The database consists of three main tables:

Schema Diagram

Table Details

Purpose: Stores all statistical data aggregated by category, scope, and year.
CREATE TABLE statistics (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  category text NOT NULL,
  scope text NOT NULL,
  year text NOT NULL,
  public_data jsonb,
  full_data jsonb,
  summary text,
  created_at timestamp DEFAULT now() NOT NULL,
  UNIQUE (category, scope, year)
);
Columns:
  • id: Primary key (UUID)
  • category: Type of statistic (e.g., “top-names”, “top-schools”)
  • scope: Data scope (“year” or “global”)
  • year: Year or “global” for aggregated data
  • public_data: Public-facing data (JSONB for flexibility)
  • full_data: Complete data (includes sensitive info)
  • summary: Human-readable description
  • created_at: Creation timestamp
Unique Constraint: (category, scope, year) ensures no duplicates
JSONB columns allow flexible schema while maintaining queryability. PostgreSQL can index and query JSONB efficiently.

Row Level Security (RLS)

TamborraData implements defense in depth with database-enforced access control:

Enable RLS

-- Enable RLS on all tables
ALTER TABLE statistics ENABLE ROW LEVEL SECURITY;
ALTER TABLE available_years ENABLE ROW LEVEL SECURITY;
ALTER TABLE sys_status ENABLE ROW LEVEL SECURITY;

RLS Policies

-- Anonymous users (web app) can only read
CREATE POLICY "Anon read access on statistics"
ON statistics
FOR SELECT
TO anon
USING (true);

CREATE POLICY "Anon read access on available_years"
ON available_years
FOR SELECT
TO anon
USING (true);

CREATE POLICY "Anon read access on sys_status"
ON sys_status
FOR SELECT
TO anon
USING (true);
USING (true) means all rows are visible, but only for SELECT. INSERT, UPDATE, DELETE are denied by default.

RLS vs API Middleware

AspectAPI MiddlewareRow Level Security
EnforcementApplication layerDatabase layer
Bypassable✅ Yes (if API compromised)❌ No (database-enforced)
Performance⚠️ Additional app logic✅ Native PostgreSQL
Maintainability❌ Duplicate in multiple APIs✅ Single source of truth
Auditing⚠️ Application logs✅ Database logs
Testing❌ Must test in each API✅ Test once at DB level
RLS is the last line of defense. Even if your API routes are compromised, RLS ensures data cannot be modified.

Data Access Patterns

Repository Queries

All database access goes through repositories:
// app/(backend)/api/statistics/repositories/statistics.repo.ts
export async function fetchStatistics(year: string) {
  const { data, error } = await supabaseClient
    .from('statistics')
    .select('category, public_data, summary')
    .eq('year', year)
    .order('public_data', { ascending: false })
    .limit(30);

  if (error) {
    return { statistics: null, error: 'Error de la base de datos' };
  }

  return { statistics: data, error: null };
}
Query breakdown:
  1. Select only needed columns (not full_data)
  2. Filter by year
  3. Order by data (JSONB ordering)
  4. Limit results for performance

JSONB Data Structure

TamborraData uses JSONB for flexible statistical data:

Example Data

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "category": "top-names",
  "scope": "year",
  "year": "2024",
  "public_data": [
    {
      "rank": 1,
      "name": "Juan",
      "count": 245,
      "percentage": 8.5
    },
    {
      "rank": 2,
      "name": "María",
      "count": 230,
      "percentage": 8.0
    }
  ],
  "summary": "Top 10 most common names in 2024",
  "created_at": "2024-11-07T18:14:24.891472Z"
}

Indexing Strategy

Indexes improve query performance by allowing PostgreSQL to find rows without scanning entire tables.
-- Composite index for common query pattern
CREATE INDEX idx_statistics_year_category 
ON statistics (year, category);

-- Index on is_ready for filtering
CREATE INDEX idx_years_ready 
ON available_years (is_ready) 
WHERE is_ready = true;

-- GIN index for JSONB queries
CREATE INDEX idx_statistics_public_data 
ON statistics USING GIN (public_data);

-- Index on created_at for time-based queries
CREATE INDEX idx_statistics_created 
ON statistics (created_at DESC);

Index Usage Examples

-- Uses idx_statistics_year_category
SELECT * 
FROM statistics 
WHERE year = '2024' 
  AND category = 'top-names';

-- Index covers both conditions efficiently

Connection Management

Supabase handles connection pooling automatically:
// app/(backend)/core/db/supabaseClient.ts
import 'server-only';
import { createClient } from '@supabase/supabase-js';

const supabaseUrl = process.env.SUPABASE_URL!;
const supabaseAnonKey = process.env.SUPABASE_ANON_KEY!;

export const supabaseClient = createClient(supabaseUrl, supabaseAnonKey);
Connection pooling is handled by Supabase’s infrastructure. The client automatically manages connections, retries, and timeouts.

Environment Variables

# .env.local
SUPABASE_URL=https://xxxxx.supabase.co
SUPABASE_ANON_KEY=eyJhbGc...  # Read-only key
Never commit .env.local to version control! Use .env.example as a template.

Data Pipeline Separation

TamborraData separates data generation (pipeline) from data visualization (web app):

Backup and Recovery

Supabase provides automatic backups:
  • Daily backups retained for 7 days (free tier)
  • Point-in-time recovery available (paid tiers)
  • Manual backups via SQL dump
# Manual backup
pg_dump $DATABASE_URL > backup.sql

# Restore from backup
psql $DATABASE_URL < backup.sql

Performance Considerations

  • Use indexes for frequently queried columns
  • Select only needed columns, not SELECT *
  • Use LIMIT to prevent large result sets
  • Avoid N+1 queries with proper joins
-- ❌ Bad: Select everything
SELECT * FROM statistics;

-- ✅ Good: Select only needed columns
SELECT category, public_data 
FROM statistics 
WHERE year = '2024' 
LIMIT 30;
  • Use GIN indexes for JSONB queries
  • Avoid deep nesting (>3 levels)
  • Consider denormalization for hot paths
  • Use jsonb_array_elements for array queries
-- Create GIN index
CREATE INDEX ON statistics USING GIN (public_data);

-- Efficient JSONB query
SELECT * FROM statistics 
WHERE public_data @> '{"rank": 1}';
  • Supabase handles pooling automatically
  • Use persistent connections in serverless
  • Monitor connection count in Supabase dashboard
  • Set appropriate timeout values
Supabase free tier allows 500 concurrent connections. Upgrade if you need more.
Monitor these metrics in Supabase dashboard:
  • Query performance (slow queries)
  • Database size and growth
  • Connection count
  • Cache hit ratio
  • Index usage
-- Find slow queries
SELECT * FROM pg_stat_statements 
ORDER BY total_time DESC 
LIMIT 10;

Next Steps

Backend

Learn how repositories access this database

Frontend

See how data flows from database to UI

References

Build docs developers (and LLMs) love