Rosie is invoked through rosie.py, a small CLI wrapper built with docopt. It supports two sub-commands: run (produce suspicions) and test (run the test suite).
Usage:
  rosie.py run (chamber_of_deputies|federal_senate) [--output=<directory>]
  rosie.py test [chamber_of_deputies|federal_senate|core]
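For the usage patterns above, docopt returns a dictionary in which each sub-command and target is a boolean key and the option is keyed as '--output'. The sketch below shows one way such a dictionary could be turned into an action. It is not Rosie's actual dispatch code: the names plan, TARGETS and DEFAULT_OUTPUT are hypothetical, and the dictionary is written out by hand so the example runs without docopt installed.

```python
# Sketch only: plan, TARGETS and DEFAULT_OUTPUT are hypothetical names.
DEFAULT_OUTPUT = '/tmp/serenata-data'  # default output location used on this page

TARGETS = ('chamber_of_deputies', 'federal_senate', 'core')


def plan(args):
    """Turn a docopt-style dict into a (command, target, output) tuple."""
    # Targets appear as boolean keys; pick the first one set to True, if any.
    target = next((name for name in TARGETS if args.get(name)), None)
    if args.get('run'):
        return 'run', target, args.get('--output') or DEFAULT_OUTPUT
    if args.get('test'):
        # No target means the whole test suite.
        return 'test', target, None
    raise ValueError('expected either the run or the test sub-command')


# Hand-written dict in the shape docopt produces for:
#   rosie.py run chamber_of_deputies --output=/data
example = {
    'run': True, 'test': False,
    'chamber_of_deputies': True, 'federal_senate': False, 'core': False,
    '--output': '/data',
}
print(plan(example))  # ('run', 'chamber_of_deputies', '/data')
```

When --output is omitted, docopt sets the key to None, which is why the sketch falls back to the default directory.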
Docker is the simplest way to run Rosie because all Python dependencies and the correct environment are bundled in the image.
1. Pull or build the image

The pre-built image is serenata/rosie. No extra configuration is needed.
2. Run the analysis

Mount a host directory to /tmp/serenata-data so the output file is accessible after the container exits.
docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py run chamber_of_deputies
3. Retrieve the output

After the container exits, the results are at:
/tmp/serenata-data/suspicions.xz

Running tests with Docker

docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py test

Running without Docker

1. Install Anaconda

Download and install Anaconda for your platform.
2. Create and activate the environment

conda update conda
conda create --name serenata python=3
conda activate serenata
3. Install Python dependencies

pip install -r requirements.txt
scipy is intentionally listed before scikit-learn in requirements.txt; this ordering is required for the wheel to build correctly, so keep it when editing the file.
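If requirements.txt is edited often, the ordering can be checked mechanically. The following is a minimal sketch, not part of Rosie: package_names and scipy_precedes_sklearn are hypothetical helpers, and the version pins in the sample are made up for illustration.

```python
import re


def package_names(requirements_text):
    """Return the package names in a requirements file, in file order."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith('-'):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, marker or space.
        names.append(re.split(r'[=<>!~\[; ]', line, maxsplit=1)[0].lower())
    return names


def scipy_precedes_sklearn(requirements_text):
    """True when scipy is listed before scikit-learn."""
    names = package_names(requirements_text)
    return names.index('scipy') < names.index('scikit-learn')


sample = "numpy>=1.11\nscipy>=0.18\nscikit-learn>=0.17\n"  # made-up pins
print(scipy_precedes_sklearn(sample))  # True
```

The check raises ValueError if either package is missing from the file, which is usually the desired behaviour for a guard like this.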
4. Run the analysis

python rosie.py run chamber_of_deputies
Output is written to /tmp/serenata-data/suspicions.xz by default.

Custom output directory

Use the --output flag to write the output to a different location:
python rosie.py run chamber_of_deputies --output /my/serenata/directory/
The directory will be created automatically if it does not exist.
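Creating a directory only when it is missing is commonly done with os.makedirs and exist_ok=True. The sketch below illustrates that pattern in a temporary location; ensure_output_dir is a hypothetical helper, and this page does not show how Rosie itself creates the directory.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create the directory and any missing parents; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated inside a temporary directory so nothing else is touched.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'my', 'serenata', 'directory')
    ensure_output_dir(target)
    ensure_output_dir(target)  # the second call is a no-op
    print(os.path.isdir(target))  # True
```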

Running Tests

python rosie.py test
Tests are discovered automatically using unittest.TestLoader.discover starting from the rosie/ directory (or a subdirectory when a module name is passed). The runner exits with code 1 if any test fails.
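import os
import sys
import unittest


def test(module=None):
    # Discover tests under rosie/, or under rosie/<module> when a module is given
    loader = unittest.TestLoader()
    tests_path = 'rosie'

    if module:
        tests_path = os.path.join(tests_path, module)

    tests = loader.discover(tests_path)
    testRunner = unittest.runner.TextTestRunner()
    result = testRunner.run(tests)
    if not result.wasSuccessful():
        # A non-zero exit code signals the failure to the shell or CI
        sys.exit(1)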

Output File

After a successful run, Rosie produces:
<output_directory>/suspicions.xz
This is a UTF-8 CSV compressed with xz. Each row corresponds to one reimbursement and includes:
  • The unique identifiers for the reimbursement (applicant_id, year, document_id for Chamber of Deputies)
  • One boolean column per classifier (e.g. meal_price_outlier, invalid_cnpj_cpf)
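A file in this format can be read with the standard library alone. The sketch below is one possible approach, assuming boolean columns are serialized as the text True or False (what pandas writes by default). read_suspicions and flagged are hypothetical helpers, and the sample rows are made up so the example can run without a real suspicions.xz.

```python
import csv
import lzma
import os
import tempfile


def read_suspicions(path):
    """Yield one dict per row from an xz-compressed, UTF-8 CSV file."""
    with lzma.open(path, 'rt', encoding='utf-8', newline='') as handle:
        yield from csv.DictReader(handle)


def flagged(rows, classifier):
    """Keep rows whose classifier column holds the text 'True'."""
    return [row for row in rows if row.get(classifier) == 'True']


# Write a tiny stand-in file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'suspicions.xz')
    with lzma.open(sample_path, 'wt', encoding='utf-8', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['applicant_id', 'year', 'document_id', 'meal_price_outlier'])
        writer.writerow(['1', '2016', '100', 'True'])   # made-up row
        writer.writerow(['2', '2016', '101', 'False'])  # made-up row
    hits = flagged(read_suspicions(sample_path), 'meal_price_outlier')

print([row['document_id'] for row in hits])  # ['100']
```

Reading row by row keeps memory use low for large files; pandas.read_csv with compression='xz' is the natural alternative when the whole table is needed at once.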
The file is written by the Core engine:
output = os.path.join(self.data_path, 'suspicions.xz')
kwargs = dict(compression='xz', encoding='utf-8', index=False)
self.suspicions.to_csv(output, **kwargs)
Trained classifier models (except MonthlySubquotaLimitClassifier) are cached as .pkl files in the output directory using joblib. Re-running Rosie will reuse these cached models, making subsequent runs faster.
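The caching behaviour described above follows a load-or-train pattern. The sketch below illustrates it with pickle from the standard library as a stand-in for joblib, so that it runs without extra dependencies. load_or_train, train_stub and the file naming scheme are assumptions made for illustration, not Rosie's implementation.

```python
import os
import pickle
import tempfile


def load_or_train(cache_dir, name, train):
    """Return the cached model if its .pkl file exists, else train and cache it."""
    path = os.path.join(cache_dir, name + '.pkl')  # hypothetical naming scheme
    if os.path.isfile(path):
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    model = train()
    with open(path, 'wb') as handle:
        pickle.dump(model, handle)
    return model


calls = []


def train_stub():
    """Stand-in for fitting a classifier; records how often it is called."""
    calls.append(1)
    return {'threshold': 3}


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_train(cache_dir, 'meal_price_outlier', train_stub)
    second = load_or_train(cache_dir, 'meal_price_outlier', train_stub)

print(first == second, len(calls))  # True 1
```

In this sketch, deleting the .pkl file is what forces the model to be trained again on the next call.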
