Rosie Overview

Rosie reads reimbursements filed under Brazil’s Quota for Exercising Parliamentary Activity (CEAP) and applies a pipeline of classifiers to detect irregularities. When a reimbursement looks suspicious, she records why and posts about it on Twitter as @RosieDaSerenata.

How Rosie Works

Fetch datasets

Rosie downloads the reimbursement CSVs and the companies dataset from public government sources via serenata-toolbox. Reimbursement data goes back to 2009.

Merge and normalize

The Adapter class merges reimbursements with company registration data (left-joined on CNPJ), renames columns to the Serenata standard, coerces dates, and normalizes category labels.
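The merge step can be pictured with a small pandas sketch. This is not the Adapter's actual code: the column names (`cnpj_cpf`, `cnpj`, `situation`, `issue_date`) and the sample rows are assumptions made up for illustration. It only shows what a left join on CNPJ followed by date coercion looks like.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative, not Rosie's real schema.
reimbursements = pd.DataFrame({
    "document_id": [1, 2, 3],
    "cnpj_cpf": ["11222333000181", "00000000000191", "99999999999999"],
    "issue_date": ["2016-03-01", "not a date", "2016-05-20"],
})
companies = pd.DataFrame({
    "cnpj": ["11222333000181", "00000000000191"],
    "situation": ["ATIVA", "BAIXADA"],
})

# Left join: every reimbursement is kept, even when no company matches.
merged = reimbursements.merge(
    companies, how="left", left_on="cnpj_cpf", right_on="cnpj"
)

# Coerce dates: values that cannot be parsed become NaT instead of raising.
merged["issue_date"] = pd.to_datetime(merged["issue_date"], errors="coerce")

print(merged[["document_id", "situation", "issue_date"]])
```

With a left join, a reimbursement whose supplier is missing from the companies dataset still reaches the classifiers; its company columns are simply empty.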

Run classifiers

The Core engine iterates over every classifier defined in the module’s settings.py. Each classifier receives the full dataset and returns a prediction per row: suspicious (True / -1) or normal (False / 1).
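A minimal sketch of that loop, in plain Python, is shown below. The two classifier classes, the `to_flag` helper, and the `total_net_value` field are hypothetical and exist only to illustrate how predictions in either convention (booleans, or -1 / 1) could be reduced to one boolean column per classifier. Rosie's real classifiers and Core engine may differ.

```python
class PriceAboveClassifier:
    """Hypothetical rule returning booleans: True means suspicious."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, rows):
        return self

    def predict(self, rows):
        return [row["total_net_value"] > self.threshold for row in rows]


class NegativeValueClassifier:
    """Hypothetical rule using the -1 = suspicious, 1 = normal convention."""

    def fit(self, rows):
        return self

    def predict(self, rows):
        return [-1 if row["total_net_value"] < 0 else 1 for row in rows]


def to_flag(prediction):
    """Map either prediction convention to a boolean 'suspicious' flag."""
    if isinstance(prediction, bool):
        return prediction
    return prediction == -1


def run_classifiers(rows, classifiers):
    """Give every classifier the full dataset and collect one flag per row."""
    suspicions = [{"document_id": row["document_id"]} for row in rows]
    for name, classifier in classifiers.items():
        predictions = classifier.fit(rows).predict(rows)
        for record, prediction in zip(suspicions, predictions):
            record[name] = to_flag(prediction)
    return suspicions


rows = [
    {"document_id": 1, "total_net_value": 50.0},
    {"document_id": 2, "total_net_value": 900.0},
    {"document_id": 3, "total_net_value": -10.0},
]
classifiers = {
    "price_above": PriceAboveClassifier(threshold=500.0),
    "negative_value": NegativeValueClassifier(),
}
suspicions = run_classifiers(rows, classifiers)
```

The boolean check comes first in `to_flag` because `True == 1` in Python, so comparing a boolean against the numeric convention would silently invert its meaning.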

Write suspicions

Results are collected into a suspicions DataFrame keyed by unique identifiers and written to a compressed CSV at /tmp/serenata-data/suspicions.xz.
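Writing an xz-compressed CSV needs nothing beyond the standard library. The sketch below is an illustration rather than Rosie's own writer: the `write_suspicions` helper and the `meal_price_outlier` column are hypothetical, and it writes to a temporary directory instead of /tmp/serenata-data.

```python
import csv
import lzma
import os
import tempfile


def write_suspicions(rows, fieldnames, path):
    """Write rows as a UTF-8 CSV compressed with xz."""
    with lzma.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical rows keyed by the Chamber of Deputies unique identifiers.
rows = [
    {"applicant_id": 10, "year": 2016, "document_id": 1, "meal_price_outlier": False},
    {"applicant_id": 10, "year": 2016, "document_id": 2, "meal_price_outlier": True},
]
fieldnames = ["applicant_id", "year", "document_id", "meal_price_outlier"]

output_path = os.path.join(tempfile.mkdtemp(), "suspicions.xz")
write_suspicions(rows, fieldnames, output_path)
```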

Modules

Rosie covers two legislative bodies, each with its own adapter and classifier settings:

Chamber of Deputies

The primary module. It uses six classifiers covering meal prices, travel speeds, election expenses, irregular companies, monthly subquota limits, and invalid tax IDs.

Unique IDs: applicant_id, year, document_id

Federal Senate

A lighter module that currently runs the InvalidCnpjCpfClassifier against Senate reimbursements. Document types are normalized to unknown, since the Senate data does not include a document type column.

Unique IDs: none (full dataset is kept)
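For readers unfamiliar with what makes a CNPJ invalid, here is a hand-written sketch of the standard numeric check-digit rule. Rosie delegates validation to brutils, so this is not the code the classifier runs; the function names are hypothetical, and the sketch only covers the 14-digit numeric CNPJ (not CPF).

```python
def cnpj_check_digit(digits):
    """Compute one CNPJ check digit from the 12 or 13 digits that precede it."""
    # Weights count down to 2, then restart at 9: 5,4,3,2,9,...,2 for the
    # first check digit and 6,5,4,3,2,9,...,2 for the second.
    weights = list(range(len(digits) - 7, 1, -1)) + list(range(9, 1, -1))
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(value):
    """Return True when value holds 14 digits with correct check digits."""
    digits = [int(char) for char in value if char.isdecimal()]
    # Reject wrong lengths and sequences of one repeated digit.
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    first = cnpj_check_digit(digits[:12])
    second = cnpj_check_digit(digits[:12] + [first])
    return digits[12:] == [first, second]


print(is_valid_cnpj("11.222.333/0001-81"))
print(is_valid_cnpj("11.222.333/0001-82"))
```

Punctuation is ignored, so formatted and unformatted values are treated alike.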

Output

After a run finishes, Rosie writes a single compressed file:

/tmp/serenata-data/suspicions.xz

This is a UTF-8 CSV compressed with xz. Each row represents one reimbursement and each classifier column contains True (suspicious) or False (normal).
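One way to inspect such a file is with the standard library alone. The sketch below first writes a tiny sample file so that it runs on its own; the column names and the `read_suspicions` and `flagged_rows` helpers are assumptions for illustration, not part of Rosie.

```python
import csv
import lzma
import os
import tempfile

# Hypothetical identifier columns; everything else is treated as a classifier.
ID_COLUMNS = ("applicant_id", "year", "document_id")


def read_suspicions(path):
    """Read an xz-compressed UTF-8 CSV into a list of dicts (values are strings)."""
    with lzma.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def flagged_rows(rows, id_columns=ID_COLUMNS):
    """Yield (row, reasons) for rows where at least one classifier says True."""
    for row in rows:
        reasons = [
            column for column, value in row.items()
            if column not in id_columns and value == "True"
        ]
        if reasons:
            yield row, reasons


# Build a small sample file so the example is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "suspicions.xz")
with lzma.open(sample_path, "wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["applicant_id", "year", "document_id", "invalid_cnpj_cpf"])
    writer.writerow([10, 2016, 1, False])
    writer.writerow([11, 2016, 2, True])

flagged = list(flagged_rows(read_suspicions(sample_path)))
for row, reasons in flagged:
    print(row["document_id"], reasons)
```

Values read through `csv` arrive as strings, which is why the flag is compared against `"True"` rather than the boolean.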

Key Dependencies

Package	Role
`scikit-learn`	Classifier base classes (`TransformerMixin`) and KMeans clustering
`pandas`	DataFrame manipulation and CSV I/O
`numpy`	Numerical operations and array helpers
`geopy`	Geodesic distance calculation (Vincenty formula) for the travel speed classifier
`brutils`	Brazilian CPF and CNPJ validation
`serenata-toolbox`	Fetches reimbursement and company datasets from government sources
`docopt`	CLI argument parsing for `rosie.py`
`scipy`	Scientific computing (must be installed before scikit-learn)

scipy must appear before scikit-learn in requirements.txt so the wheel builds correctly inside Docker.
