Running Analysis

Rosie is invoked through rosie.py, a small CLI wrapper built with docopt. It supports two sub-commands: run (produce suspicions) and test (run the test suite).

Usage:
  rosie.py run (chamber_of_deputies|federal_senate) [--output=<directory>]
  rosie.py test [chamber_of_deputies|federal_senate|core]
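The grammar above can be approximated with the standard library's argparse, which may help readers who know argparse better than docopt. This is an illustrative equivalent only. Rosie itself parses its arguments with docopt, and the names build_parser, RUN_MODULES, TEST_MODULES, command and module are hypothetical.

```python
import argparse

# Hypothetical argparse mirror of rosie.py's docopt usage string.
# Rosie itself uses docopt; this only illustrates the same command grammar.
RUN_MODULES = ("chamber_of_deputies", "federal_senate")
TEST_MODULES = RUN_MODULES + ("core",)


def build_parser():
    parser = argparse.ArgumentParser(prog="rosie.py")
    commands = parser.add_subparsers(dest="command", required=True)

    # rosie.py run (chamber_of_deputies|federal_senate) [--output=<directory>]
    run = commands.add_parser("run", help="produce suspicions")
    run.add_argument("module", choices=RUN_MODULES)
    run.add_argument("--output", metavar="<directory>", default=None)

    # rosie.py test [chamber_of_deputies|federal_senate|core]
    test = commands.add_parser("test", help="run the test suite")
    test.add_argument("module", nargs="?", choices=TEST_MODULES)
    return parser
```

For example, build_parser().parse_args(["run", "federal_senate", "--output=/tmp/out"]) yields a namespace with command set to "run", module set to "federal_senate" and output set to "/tmp/out". Note that core is accepted only by test, matching the usage string.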

Running with Docker (Recommended)

Docker is the simplest way to run Rosie because all Python dependencies and the correct environment are bundled in the image.

Pull or build the image

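The pre-built image is serenata/rosie. You can fetch it ahead of time with docker pull serenata/rosie. Otherwise, docker run pulls it automatically on first use. No extra configuration is needed.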

Run the analysis

Mount a host directory to /tmp/serenata-data so the output file is accessible after the container exits.

Chamber of Deputies:

docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py run chamber_of_deputies

Federal Senate:

docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py run federal_senate

Retrieve the output

After the container exits, the results are available on the host at:

/tmp/serenata-data/suspicions.xz
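If you script several runs, you can build the same docker run invocation programmatically. The sketch below is a minimal example, assuming only what this page states: the serenata/rosie image and the /tmp/serenata-data mount point inside the container. The helper name rosie_docker_command is hypothetical. Both analyses write suspicions.xz to the same directory, so a second run into the same host directory would presumably replace the first file. For that reason, the host directory is a parameter.

```python
import shlex

# Hypothetical helper that reproduces the `docker run` commands shown above.
# Image name and container-side mount point come from this page.
IMAGE = "serenata/rosie"
CONTAINER_DATA_DIR = "/tmp/serenata-data"


def rosie_docker_command(module, host_data_dir="/tmp/serenata-data"):
    if module not in ("chamber_of_deputies", "federal_senate"):
        raise ValueError(f"unknown module: {module}")
    return [
        "docker", "run", "--rm",
        # Mount the host directory so suspicions.xz survives the container
        "-v", f"{host_data_dir}:{CONTAINER_DATA_DIR}",
        IMAGE,
        "python", "rosie.py", "run", module,
    ]


if __name__ == "__main__":
    for module in ("chamber_of_deputies", "federal_senate"):
        # Only prints the command; pass the list to subprocess.run(cmd, check=True) to execute it
        print(shlex.join(rosie_docker_command(module, f"/tmp/serenata-data/{module}")))
```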

Running tests with Docker

docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py test

Running without Docker

Install Anaconda

Download and install Anaconda for your platform.

Create and activate the environment

conda update conda
conda create --name serenata python=3
conda activate serenata

Install Python dependencies

pip install -r requirements.txt

scipy is intentionally listed before scikit-learn in requirements.txt. This ordering is required for the wheel to build correctly.

Run the analysis

Chamber of Deputies:

python rosie.py run chamber_of_deputies

Federal Senate:

python rosie.py run federal_senate

Output is written to /tmp/serenata-data/suspicions.xz by default.

Custom output directory

Use the --output flag to write the output to a different location:

python rosie.py run chamber_of_deputies --output /my/serenata/directory/

The directory will be created automatically if it does not exist.
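The behavior described above can be sketched in a few lines. This is not Rosie's actual implementation. It assumes the output path is resolved roughly as follows, and the names resolve_output_path and DEFAULT_DATA_PATH are hypothetical. The default directory and the suspicions.xz file name come from this page.

```python
import os

# Hypothetical sketch: pick the output directory, create it if needed,
# and return where suspicions.xz would be written.
DEFAULT_DATA_PATH = "/tmp/serenata-data"


def resolve_output_path(output_dir=None):
    data_path = output_dir or DEFAULT_DATA_PATH
    # Create the directory and any missing parents; no error if it already exists
    os.makedirs(data_path, exist_ok=True)
    return os.path.join(data_path, "suspicions.xz")
```

Using os.makedirs with exist_ok=True makes repeated runs against the same directory safe.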

Running Tests

All tests:

python rosie.py test

Core module only:

python rosie.py test core

Chamber of Deputies only:

python rosie.py test chamber_of_deputies

Federal Senate only:

python rosie.py test federal_senate

Tests are discovered automatically with unittest.TestLoader.discover. Discovery starts from the rosie/ directory, or from rosie/<module> when a module name is passed. The runner exits with code 1 if any test fails.

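import os
import sys
import unittest


def test(module=None):
    # Discover tests under rosie/, or rosie/<module> when a module is given
    loader = unittest.TestLoader()
    tests_path = 'rosie'
    if module:
        tests_path = os.path.join(tests_path, module)

    tests = loader.discover(tests_path)
    runner = unittest.TextTestRunner()
    result = runner.run(tests)

    # Report failure to the shell or CI with a non-zero exit status
    if not result.wasSuccessful():
        sys.exit(1)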

Output File

After a successful run, Rosie produces:

<output_directory>/suspicions.xz

This is a UTF-8 CSV compressed with xz. Each row corresponds to one reimbursement and includes:

The unique identifiers for the reimbursement (applicant_id, year, document_id for Chamber of Deputies)
One boolean column per classifier (e.g. meal_price_outlier, invalid_cnpj_cpf)

The file is written by the Core engine:

output = os.path.join(self.data_path, 'suspicions.xz')
kwargs = dict(compression='xz', encoding='utf-8', index=False)
self.suspicions.to_csv(output, **kwargs)
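Because the output is a plain xz-compressed CSV, it can be read back without Rosie. pandas.read_csv(path) handles it directly and infers xz compression from the file extension. The sketch below uses only the standard library to count how many reimbursements each classifier flagged. It assumes that pandas wrote the boolean classifier columns as the strings True and False. The function name count_flags is hypothetical.

```python
import csv
import lzma
from collections import Counter


# Hypothetical sketch: count True values per column in suspicions.xz.
# Assumes boolean columns were serialized by pandas as "True"/"False".
def count_flags(path):
    counts = Counter()
    # Decompress on the fly and decode the UTF-8 text
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            for column, value in row.items():
                if value == "True":
                    counts[column] += 1
    return counts
```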

Trained classifier models (except MonthlySubquotaLimitClassifier) are cached as .pkl files in the output directory using joblib. Re-running Rosie will reuse these cached models, making subsequent runs faster.
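The caching described above follows a common load-or-train pattern. The sketch below illustrates that pattern under stated assumptions. Rosie uses joblib, but pickle is swapped in here so that the example needs no third-party packages. joblib.dump(value, filename) and joblib.load(filename) take the same arguments. The function load_or_train, the <name>.pkl naming and the train callable are hypothetical. The sketch does not model the MonthlySubquotaLimitClassifier exception; a caller would simply skip caching for that classifier.

```python
import os
import pickle


# Hypothetical load-or-train helper; Rosie itself uses joblib, not pickle.
def load_or_train(name, train, data_path):
    model_path = os.path.join(data_path, f"{name}.pkl")
    if os.path.exists(model_path):
        # Reuse the model persisted by a previous run
        with open(model_path, "rb") as fh:
            return pickle.load(fh)
    # No cached file: train now and persist the result for the next run
    model = train()
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model
```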
