The Innova AI Engine testing strategy is built around a single principle: pure domain logic is tested deterministically; every external dependency is mocked. BKT grid search and IRT 2PL MLE are mathematical algorithms with known properties, so they are covered withDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/vruizz22/innova-ai-engine/llms.txt
Use this file to discover all available pages before exploring further.
hypothesis property tests that verify parameter recovery. LLM and OCR providers are replaced with fakes that assert the engine sends the correct prompt structure (prompt caching headers, forced tool_choice). AWS services (SQS, S3) are emulated in-process with moto. Real provider calls are quarantined behind a smoke marker and must be run manually by developers who have live API keys — they are excluded from all CI jobs.
Test categories
Unit Tests
BKT/IRT math, LLM classifier (mocked Anthropic provider), guide pipeline components, exercise generator. No I/O; all fast and deterministic.
Property Tests
BKT calibration with
hypothesis — generates synthetic attempt histories and verifies that the grid-search recovers the ground-truth parameters within tolerance.Integration Tests
SQS/S3 flows using
moto — exercises the full handler path including queue polling, S3 uploads, and Postgres writes against in-process AWS mocks.Smoke Tests
Real Anthropic/Gemini API calls. Marked
@pytest.mark.smoke. Excluded from all CI jobs — must be run manually by a developer with live API keys.Running tests
The CI workflow (
ci.yml) runs pytest tests/ -m "not smoke" --cov=src --cov-fail-under=75 -q on every push and PR. Coverage below 75% fails the build.The smoke marker
The smoke pytest marker is declared in pyproject.toml:
@pytest.mark.smoke. These tests are excluded from the default uv run pytest run and from all CI jobs — ci.yml always passes -m "not smoke", so smoke tests must be run manually by a developer who has live API keys available.
Coverage gate
The--cov-fail-under=75 flag in CI enforces a minimum line coverage of 75% across the src/ package. Coverage is measured with pytest-cov, configured in pyproject.toml:
What the tests cover
| Area | Test approach |
|---|---|
| BKT parameter calibration | hypothesis property tests — generate synthetic attempt histories from known (p_l0, p_transit, p_slip, p_guess), run grid search, assert recovery within ±tolerance |
| IRT 2PL calibration | Unit tests on scipy.optimize L-BFGS-B fit; boundary checks on a ∈ [0.5, 3.0] and b ∈ [-3, 3] |
| LLM classifier | Mocked Anthropic provider; asserts cache_control on the system block and forced tool_choice in the request; verifies batch-20 grouping by domain |
| Guide pipeline (A6–A8) | End-to-end handler test with moto S3 + mocked Gemini/Anthropic; asserts question extraction, solution key generation, and submission grading each write the correct Postgres rows |
| SQS/S3 flows | moto-backed SQS and S3; asserts message acknowledgement (ReportBatchItemFailures) and dead-letter semantics on provider error |
| OCR worker | Mocked Gemini response below confidence threshold → asserts escalation to Claude vision |
Type checking with Pyright
Pyright is configured in strict mode inpyproject.toml:
Optional. The reportUnknown* suppressions are pragmatic exceptions for third-party libraries that ship without stubs (e.g., asyncpg, boto3).
Lint with Ruff
Ruff is configured inpyproject.toml with the following rule sets:
| Rule set | What it checks |
|---|---|
E / F | PEP 8 style + pyflakes (unused imports, undefined names) |
I | Import order (isort-compatible) |
N | PEP 8 naming conventions |
UP | pyupgrade — modernise syntax to Python 3.11 |
B | flake8-bugbear — likely bugs and design issues |
RUF | Ruff-native rules |
T201 | Disallows bare print() calls in production code |
ruff format --check src/ tests/ to enforce consistent formatting (double quotes, line length 100).