Overview
Before generating wrappers from file-based data sources, you need to upload your CSV or Excel files to the Resource Service. The system validates the files, extracts preview data, and returns a file_id that you’ll use for wrapper generation.
Supported File Types
CSV Comma-separated values (.csv)
XLSX Excel 2007+ format (.xlsx)
XLS Legacy Excel format (.xls)
File Requirements
Files must meet these requirements to be accepted:
Maximum file size: 50 MB
Minimum 2 columns (for time-series data)
Valid CSV or Excel format
At least one row of data
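The requirements above can be checked locally before uploading. The following is a minimal sketch, not the service's own validator: the function name `precheck_upload` is hypothetical, the 50 MB limit is assumed to mean 50 * 1024 * 1024 bytes, and the first CSV row is assumed to be a header. Only CSV content is inspected; Excel files are checked by extension and size only.

```python
import csv
import io

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB
ALLOWED_EXTENSIONS = (".csv", ".xlsx", ".xls")


def precheck_upload(filename: str, data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    name = filename.lower()
    if not name.endswith(ALLOWED_EXTENSIONS):
        problems.append("unsupported extension")
    if len(data) > MAX_BYTES:
        problems.append("file larger than 50 MB")
    if name.endswith(".csv"):
        text = data.decode("utf-8", errors="replace")
        # Skip blank lines so a trailing newline is not counted as a row.
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
        if not rows or len(rows[0]) < 2:
            problems.append("fewer than 2 columns")
        if len(rows) < 2:  # assumption: the first row is a header
            problems.append("no data rows")
    return problems
```

Calling `precheck_upload("carbon_emissions.csv", open("carbon_emissions.csv", "rb").read())` before the upload request can save a round trip for files that would be rejected anyway.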
Uploading a File
Using cURL
curl -X POST "http://localhost:8000/api/v1/files/upload" \
-F "file=@/path/to/carbon_emissions.csv"
Using Python
import requests

# Open in binary mode so the bytes are sent unchanged.
with open('carbon_emissions.csv', 'rb') as f:
    files = {'file': ('carbon_emissions.csv', f, 'text/csv')}
    response = requests.post(
        'http://localhost:8000/api/v1/files/upload',
        files=files,
    )

if response.status_code == 200:
    upload_result = response.json()
    print(f"File ID: {upload_result['file_id']}")
    print(f"Status: {upload_result['validation_status']}")
    print(f"\nPreview:\n{upload_result['preview_data']}")
else:
    print(f"Upload failed: {response.json()['detail']}")
Using JavaScript (Browser)
const uploadFile = async (file) => {
  const formData = new FormData();
  formData.append('file', file);

  const response = await fetch('http://localhost:8000/api/v1/files/upload', {
    method: 'POST',
    body: formData
  });

  if (response.ok) {
    const result = await response.json();
    console.log('File ID:', result.file_id);
    console.log('Validation:', result.validation_status);
    console.log('Preview:', result.preview_data);
    return result;
  } else {
    const error = await response.json();
    throw new Error(error.detail);
  }
};

// Usage with file input
document.querySelector('#fileInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  try {
    const result = await uploadFile(file);
    // Use result.file_id for wrapper generation
  } catch (error) {
    console.error('Upload failed:', error.message);
  }
});
Upload Response
Successful Upload:
{
"file_id" : "123e4567-e89b-12d3-a456-426614174000" ,
"filename" : "carbon_emissions.csv" ,
"file_size" : 15420 ,
"preview_data" : "year emissions \n 2020 5234.2 \n 2021 5198.3 \n 2022 5087.1 \n 2023 4956.8 \n 2024 4823.5" ,
"validation_status" : "valid" ,
"validation_errors" : null ,
"message" : "File uploaded successfully"
}
Upload with Validation Errors:
{
"file_id" : "987f6543-e21b-43d1-a654-426614174001" ,
"filename" : "invalid_data.csv" ,
"file_size" : 234 ,
"preview_data" : null ,
"validation_status" : "invalid" ,
"validation_errors" : [
"File must have at least 2 columns for time series data"
],
"message" : "File uploaded but has validation errors"
}
Even files with validation errors are stored and assigned a file_id. You can inspect the errors and fix the data before re-uploading.
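Because an upload can succeed at the HTTP level and still be marked invalid, client code may want to check `validation_status` before passing the `file_id` on. A minimal sketch, assuming the response shape shown above; the helper name `file_id_if_valid` is hypothetical:

```python
def file_id_if_valid(upload_result: dict) -> str:
    """Return the file_id for a valid upload; raise ValueError listing the errors otherwise."""
    status = upload_result.get("validation_status")
    if status == "valid":
        return upload_result["file_id"]
    # validation_errors is null for valid files, so fall back to an empty list.
    errors = upload_result.get("validation_errors") or []
    raise ValueError(
        f"Upload {upload_result.get('file_id')} is {status}: " + "; ".join(errors)
    )
```

Raising here keeps an invalid `file_id` from reaching wrapper generation by accident.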
File Validation
The system performs automatic validation:
File Type Check
Verifies the file extension and content type are supported.
Size Check
Ensures file size doesn’t exceed 50 MB.
Content Parsing
Attempts to parse the file as CSV or Excel.
Structure Validation
Checks for minimum 2 columns and at least one data row.
Preview Generation
Extracts first 5 rows for preview display.
Validation Statuses
Status: Description
valid: File passed all validation checks
invalid: File has validation errors (see validation_errors)
pending: Validation in progress (rare, for very large files)
Get details about an uploaded file:
curl "http://localhost:8000/api/v1/files/123e4567-e89b-12d3-a456-426614174000"
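For the rare `pending` case, a client could poll this endpoint until validation finishes. The sketch below assumes the single-file response carries the same `validation_status` field as the upload and list responses, which this page does not show explicitly. The function names, attempt count, and delay are illustrative choices.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/files"


def fetch_file_info(file_id: str) -> dict:
    """GET the file record from the endpoint shown above."""
    with urllib.request.urlopen(f"{BASE_URL}/{file_id}") as resp:
        return json.load(resp)


def wait_for_validation(file_id, fetch=fetch_file_info, attempts=10, delay=1.0):
    """Poll until validation_status is no longer 'pending'; return the final record."""
    for _ in range(attempts):
        info = fetch(file_id)
        # Assumption: the record exposes validation_status like the upload response.
        if info.get("validation_status") != "pending":
            return info
        time.sleep(delay)
    raise TimeoutError(f"{file_id} still pending after {attempts} attempts")
```

The `fetch` parameter exists so the polling loop can be exercised without a running server.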
Listing Uploaded Files
View all uploaded files (useful for debugging):
curl "http://localhost:8000/api/v1/files/"
Response:
{
"files" : [
{
"file_id" : "123e4567-e89b-12d3-a456-426614174000" ,
"filename" : "carbon_emissions.csv" ,
"file_size" : 15420 ,
"content_type" : "text/csv" ,
"upload_timestamp" : "2026-03-03T10:15:00Z" ,
"preview_data" : "..." ,
"validation_status" : "valid" ,
"validation_errors" : null
},
{
"file_id" : "987f6543-e21b-43d1-a654-426614174001" ,
"filename" : "energy_data.xlsx" ,
"file_size" : 28630 ,
"content_type" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" ,
"upload_timestamp" : "2026-03-03T09:45:00Z" ,
"preview_data" : "..." ,
"validation_status" : "valid" ,
"validation_errors" : null
}
],
"count" : 2
}
The file_path field is excluded from list responses for security. It’s only available internally.
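When debugging, it can help to reduce the list response to just the entries that failed validation. A small sketch over the response shape shown above; `invalid_files` is a hypothetical helper name:

```python
def invalid_files(listing: dict) -> dict:
    """Map file_id -> validation_errors for every entry whose status is not 'valid'."""
    return {
        entry["file_id"]: entry.get("validation_errors") or []
        for entry in listing.get("files", [])
        if entry.get("validation_status") != "valid"
    }
```

Note that this also reports `pending` entries, with an empty error list, since their validation has not completed.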
Deleting Uploaded Files
Remove an uploaded file from storage:
curl -X DELETE "http://localhost:8000/api/v1/files/123e4567-e89b-12d3-a456-426614174000"
Response:
{
"message" : "File deleted successfully"
}
Deleting a file does NOT delete wrappers that were generated from it. Wrappers store their own copy of the file path.
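Since files persist until deleted, a periodic cleanup might combine the list and delete endpoints. The sketch below picks entries older than a cutoff from a list response and issues a DELETE for each. The 30-day default and both function names are assumptions for illustration, not part of the service.

```python
import urllib.request
from datetime import datetime, timedelta, timezone

BASE_URL = "http://localhost:8000/api/v1/files"


def stale_file_ids(listing: dict, now: datetime, max_age_days: int = 30) -> list[str]:
    """Return file_ids whose upload_timestamp is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for entry in listing.get("files", []):
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        uploaded = datetime.fromisoformat(entry["upload_timestamp"].replace("Z", "+00:00"))
        if uploaded < cutoff:
            stale.append(entry["file_id"])
    return stale


def delete_file(file_id: str) -> int:
    """Send DELETE for one file and return the HTTP status code."""
    req = urllib.request.Request(f"{BASE_URL}/{file_id}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A cleanup run would then be `for fid in stale_file_ids(listing, datetime.now(timezone.utc)): delete_file(fid)`.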
Using Files for Wrapper Generation
Once uploaded, use the file_id to generate a wrapper:
curl -X POST "http://localhost:8000/api/v1/wrappers/generate" \
-H "Content-Type: application/json" \
-d '{
"source_type": "CSV",
"source_config": {
"file_id": "123e4567-e89b-12d3-a456-426614174000"
},
"metadata": {
"name": "Carbon Emissions",
"domain": "Environment",
"subdomain": "Climate",
"description": "Annual CO2 emissions",
"unit": "tons",
"source": "EPA",
"scale": "National",
"governance_indicator": false,
"periodicity": "annual"
}
}'
You don’t need to provide the location field - the backend automatically resolves the file_id to the internal file path.
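The same request can be assembled in Python. This sketch only builds the request object, using the standard library so it has no extra dependencies; `build_generate_request` is a hypothetical helper, and `"CSV"` is the only `source_type` value this page shows, so other values are not assumed here.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:8000/api/v1/wrappers/generate"


def build_generate_request(file_id: str, metadata: dict) -> urllib.request.Request:
    """Build the POST request for wrapper generation from an uploaded file."""
    payload = {
        "source_type": "CSV",
        # Only the file_id is sent; the backend resolves the internal path.
        "source_config": {"file_id": file_id},
        "metadata": metadata,
    }
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_generate_request(file_id, metadata))`.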
For best results, structure your CSV files with:
Header row with column names
Date/time column for temporal data
Value column(s) with numeric data
Consistent formatting (dates, numbers, missing values)
Example CSV:
year, emissions, temperature_change
2020, 5234.2, 1.02
2021, 5198.3, 1.09
2022, 5087.1, 1.15
2023, 4956.8, 1.17
2024, 4823.5, 1.24
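To sanity-check a file in this layout before uploading, it can be parsed with Python's `csv` module. This is a client-side sketch and says nothing about how the backend parses files. `skipinitialspace=True` handles the space after each comma in the example, and `read_series` is a hypothetical name.

```python
import csv
import io

SAMPLE = """year, emissions, temperature_change
2020, 5234.2, 1.02
2021, 5198.3, 1.09
2022, 5087.1, 1.15
2023, 4956.8, 1.17
2024, 4823.5, 1.24
"""


def read_series(text: str) -> list[dict]:
    """Parse rows into dicts with an integer year and float value columns."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        parsed = {"year": int(row["year"])}
        # Every other column is expected to hold numeric data.
        parsed.update({k: float(v) for k, v in row.items() if k != "year"})
        rows.append(parsed)
    return rows
```

A `ValueError` from `int` or `float` here points to the kind of inconsistent formatting the guidelines warn about.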
For Excel files:
Use the first sheet for data (or specify sheet name)
Include headers in the first row
Avoid merged cells in data area
Use consistent data types per column
Remove formulas - values only
Both .xlsx (Excel 2007+) and .xls (legacy) formats are supported, but .xlsx is recommended for better compatibility.
Error Handling
Common Upload Errors
File too large
Error: {
"detail" : "File too large. Maximum size: 50.0MB"
}
Solution: Compress the file, remove unnecessary data, or split it into multiple files.
Unsupported file type (400)
Error: {
"detail" : "File type not supported. Allowed: .csv, .xlsx, .xls"
}
Solution: Convert your file to CSV or Excel format.
Corrupted or unreadable file
Error: {
"detail" : "File upload failed: Error reading file: File appears to be corrupted"
}
Solution: Verify that the file isn’t corrupted. Try opening it in Excel or Numbers and re-saving it.
Missing file
Error: {
"detail" : "No file provided"
}
Solution: Ensure you’re sending the file as multipart/form-data with the field name “file”.
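A client can turn these `detail` messages into user-facing hints. The sketch below matches on substrings of the messages listed above. That is an assumption about message stability rather than a documented contract, so treat it as illustrative; `hint_for` is a hypothetical name.

```python
# (substring of the detail message, suggested fix) pairs taken from the errors above
HINTS = [
    ("file too large", "Compress the file, remove unnecessary data, or split it."),
    ("file type not supported", "Convert the file to CSV or Excel format."),
    ("corrupted", "Open the file in a spreadsheet application and re-save it."),
    ("no file provided", "Send multipart/form-data with the field name 'file'."),
]


def hint_for(detail: str) -> str:
    """Return a suggested fix for an upload error detail message."""
    lowered = detail.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return "No specific hint; inspect the detail message."
```

Where the API returns distinct HTTP status codes per error, branching on the status code would be more robust than matching message text.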
Storage Details
Uploaded files are stored:
Location: /app/uploads/{file_id}/{filename}
Organization: Each file gets its own subdirectory by file_id
Metadata: Stored in MongoDB uploaded_files collection
Retention: Files persist until explicitly deleted
File paths are internal and not exposed in API responses for security. Use the file_id to reference files.
Best Practices
Validate Before Upload: Check file format, size, and data quality locally before uploading.
Use Descriptive Filenames: Name files clearly to identify data source and content.
Review Preview Data: Always check the preview_data in the response to verify correct parsing.
Clean Up Old Files: Delete files you no longer need to save storage space.
Next Steps
Generate Wrappers: Use your uploaded file to generate an AI-powered wrapper
Wrapper Execution: Learn how wrappers execute and process file data
API Reference: View complete file upload API documentation
Monitoring: Monitor wrapper health after generation