The Prepare module corrects, normalizes, and validates OCDS data. It applies defaults, redacts sensitive information, fixes common data quality issues, and validates against OCDS codelists.
Function Signature
impl Prepare {
    pub fn run<W: Write + Send>(
        buffer: impl BufRead + Send,
        settings: Settings,
        output: &mut W,
        errors: &mut W,
    ) -> Result<(), anyhow::Error>
}
- buffer (impl BufRead + Send, required): Buffered reader containing line-delimited JSON releases to process.
- settings (Settings, required): Configuration for data preparation, including defaults, redactions, corrections, and modifications.
- output (&mut W, required): Output writer where corrected releases are written (one JSON object per line).
- errors (&mut W, required): Error writer where data quality issues are logged in CSV format.
Basic Usage
use std::fs::File;
use std::io::{BufReader, BufWriter};
use ocdscardinal::{Prepare, Settings};

fn main() -> Result<(), anyhow::Error> {
    // Input: raw OCDS data
    let input = File::open("raw_releases.jsonl")?;
    let reader = BufReader::new(input);

    // Output: corrected data
    let output_file = File::create("corrected_releases.jsonl")?;
    let mut output = BufWriter::new(output_file);

    // Errors: data quality report
    let errors_file = File::create("errors.csv")?;
    let mut errors = BufWriter::new(errors_file);

    // Run preparation
    let settings = Settings::default();
    Prepare::run(reader, settings, &mut output, &mut errors)?;
    Ok(())
}
Settings Configuration
Defaults
Apply default values to missing fields:
use ocdscardinal::{Settings, Defaults};

let mut settings = Settings::default();
settings.defaults = Some(Defaults {
    currency: Some("USD".to_string()),
    item_classification_scheme: Some("UNSPSC".to_string()),
    bid_status: Some("valid".to_string()),
    award_status: Some("active".to_string()),
    party_roles: Some(true),
});
- currency: Default currency for bids/awards without value.currency
- item_classification_scheme: Default scheme for items without classification.scheme
- bid_status: Default status for bids without status
- award_status: Default status for awards without status
- party_roles: If true, populate parties[].roles based on where organizations appear
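To make the "only when missing" behaviour concrete, here is a minimal sketch of how a default currency could be applied. The `Value` struct and `apply_default_currency` function are hypothetical stand-ins written for illustration; they are not part of the ocdscardinal API, and the library's internal logic may differ.

```rust
/// Hypothetical, simplified stand-in for an OCDS value object
/// (not a type from ocdscardinal).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Value {
    pub amount: Option<f64>,
    pub currency: Option<String>,
}

/// Fills `currency` only when it is missing; an existing currency is left untouched.
/// If no default is configured (`None`), nothing changes.
pub fn apply_default_currency(value: &mut Value, default: Option<&str>) {
    if value.currency.is_none() {
        if let Some(code) = default {
            value.currency = Some(code.to_string());
        }
    }
}
```

With a default of "USD", a value without a currency would gain "USD", while a value already marked "EUR" would stay "EUR".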
Redactions
Remove sensitive information:
use ocdscardinal::{Settings, Redactions};

let mut settings = Settings::default();
settings.redactions = Some(Redactions {
    amount: Some("0|999999".to_string()), // Pipe-separated amounts
    organization_id: Some("REDACTED|UNKNOWN".to_string()),
});
- amount: Remove value.amount if it matches any of these values
- organization_id: Remove id from organizations matching these IDs
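The pipe-separated format can be illustrated with a short sketch. The helpers below (`parse_pipe_separated`, `redact_id`) are hypothetical and use only the standard library. They assume exact string matching on organization IDs, which the source does not confirm, and they do not cover how amounts are compared.

```rust
use std::collections::HashSet;

/// Splits a pipe-separated setting such as "REDACTED|UNKNOWN" into a lookup set.
/// Empty segments (e.g. from a trailing pipe) are ignored.
pub fn parse_pipe_separated(setting: &str) -> HashSet<String> {
    setting
        .split('|')
        .filter(|part| !part.is_empty())
        .map(str::to_string)
        .collect()
}

/// Returns `None` when the ID matches a redaction value, which models
/// removing the field entirely rather than masking it.
pub fn redact_id(id: Option<String>, redacted: &HashSet<String>) -> Option<String> {
    id.filter(|value| !redacted.contains(value))
}
```

Parsing the setting once into a set keeps the per-release check cheap, which matters when the same list is tested against every organization reference.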
Corrections
Fix common data quality issues:
use ocdscardinal::{Settings, Corrections};

let mut settings = Settings::default();
settings.corrections = Some(Corrections {
    award_status_by_contract_status: Some(true),
});
- award_status_by_contract_status: If all contracts for an award are cancelled, set the award status to “cancelled”
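The rule above can be sketched with plain structs. `Award`, `Contract`, and `correct_award_statuses` are hypothetical names used only for illustration. The sketch also assumes that an award with no contracts at all is left unchanged, a detail the source does not state.

```rust
/// Hypothetical, simplified award (not a type from ocdscardinal).
#[derive(Debug, Clone, PartialEq)]
pub struct Award {
    pub id: String,
    pub status: Option<String>,
}

/// Hypothetical, simplified contract that points at its award by ID.
#[derive(Debug, Clone, PartialEq)]
pub struct Contract {
    pub award_id: String,
    pub status: Option<String>,
}

/// Sets an award's status to "cancelled" when every contract linked to it is cancelled.
pub fn correct_award_statuses(awards: &mut [Award], contracts: &[Contract]) {
    for award in awards.iter_mut() {
        let related: Vec<&Contract> = contracts
            .iter()
            .filter(|contract| contract.award_id == award.id)
            .collect();
        // `all` is vacuously true for an empty list, so require at least one contract
        // (an assumption of this sketch).
        let all_cancelled = !related.is_empty()
            && related
                .iter()
                .all(|contract| contract.status.as_deref() == Some("cancelled"));
        if all_cancelled {
            award.status = Some("cancelled".to_string());
        }
    }
}
```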
Modifications
Transform data structure:
use ocdscardinal::{Settings, Modifications};

let mut settings = Settings::default();
settings.modifications = Some(Modifications {
    move_auctions: Some(true),
    prefix_buyer_or_procuring_entity_id: Some("PE-".to_string()),
    prefix_tenderer_or_supplier_id: Some("ORG-".to_string()),
    split_procurement_method_details: Some("-".to_string()),
});
- move_auctions: Move bids from /auctions to /bids/details
- prefix_buyer_or_procuring_entity_id: Add prefix to buyer/procuring entity IDs
- prefix_tenderer_or_supplier_id: Add prefix to tenderer/supplier IDs
- split_procurement_method_details: Split procurementMethodDetails on this separator and keep only the first part
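The prefixing and splitting behaviour described above amounts to simple string operations. The following sketch uses hypothetical helper names (`prefix_id`, `keep_first_part`) and assumes the first part is kept verbatim, without trimming whitespace; the library may treat whitespace differently.

```rust
/// Prepends a prefix to an organization ID, e.g. "PE-" + "GOV001".
pub fn prefix_id(id: &str, prefix: &str) -> String {
    format!("{}{}", prefix, id)
}

/// Splits on `separator` and keeps only the first part.
/// If the separator does not occur, the whole string is returned.
pub fn keep_first_part<'a>(details: &'a str, separator: &str) -> &'a str {
    // Splitting on an empty pattern yields an empty first piece,
    // so treat an empty separator as "do not split".
    if separator.is_empty() {
        return details;
    }
    details.split(separator).next().unwrap_or(details)
}
```

For example, splitting "Open tender - framework" on "-" keeps "Open tender " (note the trailing space, since this sketch does not trim).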
Codelists
Map non-standard codelist values to standard OCDS codes:
- codelists (Option<HashMap<Codelist, HashMap<String, String>>>): For each codelist, a map from non-standard values to standard OCDS codes
use std::collections::HashMap;
use ocdscardinal::{Settings, Codelist};
let mut settings = Settings::default();
let mut bid_status_map = HashMap::new();
bid_status_map.insert("qualified".to_string(), "valid".to_string());
bid_status_map.insert("passed".to_string(), "valid".to_string());
let mut award_status_map = HashMap::new();
award_status_map.insert("Active".to_string(), "active".to_string());
let mut codelists = HashMap::new();
codelists.insert(Codelist::BidStatus, bid_status_map);
codelists.insert(Codelist::AwardStatus, award_status_map);
settings.codelists = Some(codelists);
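Conceptually, each inner map acts as a lookup table applied to a field's value. The sketch below shows that idea with a hypothetical `map_code` helper; it assumes exact, case-sensitive matching (consistent with the "Active" to "active" entry above) and that unmapped values pass through unchanged.

```rust
use std::collections::HashMap;

/// Returns the mapped code if the value has an entry, otherwise the value unchanged.
/// Matching is exact and case-sensitive in this sketch.
pub fn map_code(value: &str, mapping: &HashMap<String, String>) -> String {
    mapping
        .get(value)
        .cloned()
        .unwrap_or_else(|| value.to_string())
}
```

Under these assumptions, "qualified" would become "valid", while a value with no entry, such as "pending", would be returned as-is.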
Corrected Data
The output writer receives one JSON object per line:
{"ocid":"ocds-213czf-1","buyer":{"id":"PE-GOV001"},"tender":{...},"bids":{"details":[...]},"awards":[...]}
{"ocid":"ocds-213czf-2","buyer":{"id":"PE-GOV002"},"tender":{...},"bids":{"details":[...]},"awards":[...]}
Error Log
The errors writer receives a CSV with data quality issues:
line,ocid,path,index,value,message
15,ocds-213czf-1,/bids/details[]/value/currency,0,,not set
42,ocds-213czf-5,/bids/details[]/status,1,"pending",invalid
78,ocds-213czf-9,/awards[]/items[]/classification/scheme,0.2,,not set
- line: Line number in the input file (1-based)
- ocid: OCID of the release with the issue
- path: JSON path to the problematic field
- index: Array index or indices (e.g., “0” for single array, “2.1” for nested arrays)
- value: The problematic value (empty if missing)
- message: Error description (e.g., “not set”, “invalid”, “is zero”)
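When post-processing the error log, the dot-separated index column usually needs to be turned back into numeric positions. The helpers below are a hypothetical sketch (`parse_index` and `format_index` are not part of ocdscardinal) and assume the column contains only non-negative integers joined by dots.

```rust
/// Parses an index column such as "0" or "2.1" into positions, outermost array first.
/// Returns `None` if any segment is not a non-negative integer.
/// An empty column yields an empty list.
pub fn parse_index(column: &str) -> Option<Vec<usize>> {
    if column.is_empty() {
        return Some(Vec::new());
    }
    column
        .split('.')
        .map(|part| part.parse::<usize>().ok())
        .collect()
}

/// Joins positions back into the dot-separated form used in the CSV.
pub fn format_index(indices: &[usize]) -> String {
    indices
        .iter()
        .map(|index| index.to_string())
        .collect::<Vec<_>>()
        .join(".")
}
```

Read against the path /awards[]/items[]/classification/scheme, an index of "0.2" would then point at the first award's third item.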
Complete Example
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufReader, BufWriter, Write};
use ocdscardinal::{Prepare, Settings, Defaults, Modifications, Codelist};

fn prepare_ocds_data() -> Result<(), anyhow::Error> {
    // Setup input and outputs
    let input = File::open("raw_releases.jsonl")?;
    let reader = BufReader::new(input);
    let output_file = File::create("prepared_releases.jsonl")?;
    let mut output = BufWriter::new(output_file);
    let errors_file = File::create("quality_issues.csv")?;
    let mut errors = BufWriter::new(errors_file);

    // Configure comprehensive settings
    let mut settings = Settings::default();

    // Apply defaults
    settings.defaults = Some(Defaults {
        currency: Some("USD".to_string()),
        item_classification_scheme: Some("UNSPSC".to_string()),
        bid_status: Some("valid".to_string()),
        award_status: Some("active".to_string()),
        party_roles: Some(true),
    });

    // Prefix organization IDs
    settings.modifications = Some(Modifications {
        move_auctions: Some(true),
        prefix_buyer_or_procuring_entity_id: Some("GOV-".to_string()),
        prefix_tenderer_or_supplier_id: Some("ORG-".to_string()),
        split_procurement_method_details: None,
    });

    // Map non-standard codes
    let mut bid_status_map = HashMap::new();
    bid_status_map.insert("qualified".to_string(), "valid".to_string());
    let mut codelists = HashMap::new();
    codelists.insert(Codelist::BidStatus, bid_status_map);
    settings.codelists = Some(codelists);

    // Run preparation
    Prepare::run(reader, settings, &mut output, &mut errors)?;

    // Ensure all data is flushed
    output.flush()?;
    errors.flush()?;

    println!("Data preparation complete!");
    println!("Corrected data: prepared_releases.jsonl");
    println!("Quality issues: quality_issues.csv");
    Ok(())
}
Validation
Prepare automatically validates codelist values against OCDS standards:
- bid_status: Must be one of invited, pending, valid, disqualified, withdrawn
- award_status: Must be one of pending, active, unsuccessful, cancelled
Invalid values are logged to the errors output but not modified.
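The check itself is a membership test against the lists above. This is a minimal sketch with a hypothetical `check_status` helper; the messages "not set" and "invalid" are borrowed from the error log example, and treating a missing status as "not set" is an assumption of the sketch.

```rust
/// Allowed codes, as listed in this section.
pub const BID_STATUSES: [&str; 5] = ["invited", "pending", "valid", "disqualified", "withdrawn"];
pub const AWARD_STATUSES: [&str; 4] = ["pending", "active", "unsuccessful", "cancelled"];

/// Returns the message to log, or `None` if the value is acceptable.
/// The value itself is never modified, mirroring the "logged but not modified" rule.
pub fn check_status(value: Option<&str>, allowed: &[&str]) -> Option<&'static str> {
    match value {
        None => Some("not set"),
        Some(code) if allowed.contains(&code) => None,
        Some(_) => Some("invalid"),
    }
}
```

Because the comparison is case-sensitive here, "Active" would be reported as invalid unless a codelist mapping rewrites it to "active" first.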
Prepare performs these transformations:
- ID normalization: Converts numeric IDs to strings
- Object coercion: Converts single objects to arrays where OCDS expects arrays (e.g., suppliers, tenderers)
- Role inference: Populates parties[].roles based on where organizations appear in the release
- Auction migration: Moves bid data from /auctions/*/stages/*/bids to /bids/details
Performance
- Parallel processing: Uses Rayon for multi-threaded execution
- Streaming I/O: Buffers both input and output for efficiency
- Error isolation: Invalid lines don’t stop processing
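Two of the transformations listed above, ID normalization and object coercion, can be sketched with small enums. The types `Id` and `OneOrMany` and both functions are hypothetical illustrations built on the standard library only; the library itself operates on JSON values and may implement these steps differently.

```rust
/// Hypothetical model of an ID that may arrive as a number or a string.
#[derive(Debug, Clone, PartialEq)]
pub enum Id {
    Number(i64),
    Text(String),
}

/// ID normalization: numeric IDs become strings, string IDs are kept as-is.
pub fn normalize_id(id: Id) -> String {
    match id {
        Id::Number(number) => number.to_string(),
        Id::Text(text) => text,
    }
}

/// Hypothetical model of a field that should be an array but may hold a single object.
#[derive(Debug, Clone, PartialEq)]
pub enum OneOrMany<T> {
    One(T),
    Many(Vec<T>),
}

/// Object coercion: a single object is wrapped in a one-element array.
pub fn coerce_to_array<T>(value: OneOrMany<T>) -> Vec<T> {
    match value {
        OneOrMany::One(item) => vec![item],
        OneOrMany::Many(items) => items,
    }
}
```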
For deterministic output order, set RAYON_NUM_THREADS=1. This is useful for testing but reduces performance.
Notes
- Lines that are not JSON objects are skipped with a warning
- Empty lines (whitespace only) are silently skipped
- The function flushes output buffers before returning to ensure all data is written
- Organization IDs that match redaction patterns are completely removed (not just masked)
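The line-handling rules in these notes can be approximated as follows. This sketch is not the library's implementation: `LineKind`, `classify_line`, and `candidate_lines` are hypothetical, and because it uses only the standard library it checks for surrounding braces instead of actually parsing JSON.

```rust
use std::io::BufRead;

/// How a line of input is treated in this sketch.
#[derive(Debug, PartialEq)]
pub enum LineKind {
    /// Whitespace only: skipped silently.
    Blank,
    /// Not a JSON object (crude brace check): skipped, would trigger a warning.
    NotObject,
    /// Looks like a JSON object: passed on for processing.
    Candidate,
}

pub fn classify_line(line: &str) -> LineKind {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        LineKind::Blank
    } else if trimmed.starts_with('{') && trimmed.ends_with('}') {
        LineKind::Candidate
    } else {
        LineKind::NotObject
    }
}

/// Returns the lines worth processing, paired with their 1-based line numbers
/// so that later error reports can refer back to the input file.
pub fn candidate_lines(reader: impl BufRead) -> std::io::Result<Vec<(usize, String)>> {
    let mut kept = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        if classify_line(&line) == LineKind::Candidate {
            kept.push((index + 1, line));
        }
    }
    Ok(kept)
}
```

Keeping the original line number alongside each retained line is what lets skipped lines coexist with a 1-based line column in the error log.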