
Overview

The Resource Service supports three types of data sources for collecting sustainability indicators. Each source type has its own configuration schema and wrapper behavior:

API

Real-time data from REST APIs with authentication support

CSV

Comma-separated value files with structured tabular data

XLSX

Excel spreadsheets with support for multiple sheets

Source Type Enumeration

Source types are defined as an enumeration:
class SourceType(str, Enum):
    API = "API"
    CSV = "CSV"
    XLSX = "XLSX"
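Because SourceType subclasses str, enum members compare equal to raw strings, so values deserialized from JSON request bodies need no special conversion. A quick sketch:

```python
from enum import Enum

class SourceType(str, Enum):
    API = "API"
    CSV = "CSV"
    XLSX = "XLSX"

# str subclassing means members compare directly against plain strings
print(SourceType("CSV") == "CSV")  # True
print(SourceType.API.value)        # API
```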

API Sources

API sources enable continuous, real-time data collection from REST endpoints.

Configuration Schema

class APISourceConfig(BaseModel):
    location: str                      # API endpoint URL
    auth_type: str = "none"            # Authentication method
    api_key: Optional[str] = None      # API key value
    api_key_header: str = "X-API-Key"  # Header name for API key
    bearer_token: Optional[str] = None # Bearer token
    username: Optional[str] = None     # Basic auth username
    password: Optional[str] = None     # Basic auth password
    timeout_seconds: int = 30          # Request timeout
    date_field: Optional[str] = None   # Timestamp field name
    value_field: Optional[str] = None  # Value field name
    custom_headers: Dict[str, str] = {}     # Additional headers
    query_params: Dict[str, str] = {}       # Default query parameters

Authentication Methods

The service supports four authentication types: none (no authentication), api_key (key sent in a custom header), bearer (token in the Authorization header), and basic (username and password). For example, to pass an API key in a custom header:
{
  "source_type": "API",
  "source_config": {
    "location": "https://api.example.com/data",
    "auth_type": "api_key",
    "api_key": "sk_live_abc123xyz",
    "api_key_header": "X-API-Key"
  }
}
Generated Auth Code:
headers = {"X-API-Key": "sk_live_abc123xyz"}
response = requests.get(endpoint, headers=headers)
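The other auth types follow the same requests-based pattern. A sketch of the headers each would produce, using fabricated placeholder credentials:

```python
# Hypothetical credentials for illustration only -- never hard-code real ones.
bearer_token = "tok_abc123"

# auth_type = "bearer": the token is sent in the Authorization header
bearer_headers = {"Authorization": f"Bearer {bearer_token}"}
# requests.get(endpoint, headers=bearer_headers)

# auth_type = "basic": username/password are passed to requests directly,
# which encodes them into an Authorization: Basic header
# requests.get(endpoint, auth=(username, password))

# auth_type = "none": the request is sent with no auth headers
# requests.get(endpoint)
```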

Custom Headers and Parameters

You can specify additional headers and query parameters:
{
  "source_type": "API",
  "source_config": {
    "location": "https://api.example.com/data",
    "auth_type": "none",
    "custom_headers": {
      "Accept": "application/json",
      "User-Agent": "ResourceService/1.0"
    },
    "query_params": {
      "region": "europe",
      "format": "json"
    }
  }
}
Generated Request:
headers = {
    "Accept": "application/json",
    "User-Agent": "ResourceService/1.0"
}
params = {"region": "europe", "format": "json"}
response = requests.get(
    "https://api.example.com/data",
    headers=headers,
    params=params,
    timeout=30
)

AI-Assisted API Exploration

For API sources, the AI can use tools to explore the endpoint during wrapper generation:
async def _call_model_with_tools(
    self,
    prompt: str,
    auth_config: Dict[str, Any],
    max_tool_calls: int = 15,
    max_chars: int = 2500,
    wrapper_id: str = None,
) -> str:
    runtime = create_tool_runtime(
        auth_config=auth_config,
        max_chars=max_chars,
    )
    
    config = types.GenerateContentConfig(
        tools=runtime.get_tools(),
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(mode="AUTO")
        ),
    )
    
    contents = [prompt]  # seed the conversation with the generation prompt
    response = await self.client.aio.models.generate_content(
        model=self.model_name,
        contents=contents,
        config=config,
    )
    return response.text
The AI makes actual API calls during generation to understand response structure, field naming, and data formats. This results in more accurate, robust wrappers.

API Sample Extraction

Before generation, the system fetches a sample response:
def get_api_sample(
    self, endpoint: str, auth_config: Dict[str, Any], max_chars: int = 2500
) -> str:
    # Prepare headers
    headers = auth_config.get("headers", {})
    
    # Add API key if specified
    if "api_key" in auth_config and "header_name" in auth_config:
        headers[auth_config["header_name"]] = auth_config["api_key"]
    
    # Make API call
    response = requests.get(
        endpoint,
        headers=headers,
        params=auth_config.get("params", {}),
        timeout=30
    )
    
    # Parse and truncate response
    json_data = response.json()
    sample_text = json.dumps(json_data, indent=2)
    if len(sample_text) > max_chars:
        sample_text = sample_text[:max_chars] + "\n... (truncated)"
    
    return f"API Response (Status: {response.status_code}):\n{sample_text}"
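The truncation behavior at the end of get_api_sample can be seen in isolation. A standalone replica of that step, using a fabricated payload and a deliberately small limit:

```python
import json

max_chars = 50  # small limit to force truncation
json_data = {"readings": [{"date": "2024-01-01", "value": 15.5}] * 3}

# Same parse-and-truncate logic as the extractor above
sample_text = json.dumps(json_data, indent=2)
if len(sample_text) > max_chars:
    sample_text = sample_text[:max_chars] + "\n... (truncated)"

print(sample_text.endswith("(truncated)"))  # True
```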

API Wrapper Behavior

Execution Mode: Continuous (runs indefinitely)
Phase Progression:
  1. Historical Phase: Collects all past data
  2. Continuous Phase: Polls for new data based on periodicity
Lifecycle:
  • Created → Generating → Executing → (runs until stopped)
  • Automatically resumes after service restart using checkpoints
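The two phases and checkpoint-based resume can be sketched with a hypothetical in-memory checkpoint; the real service persists checkpoints so wrappers survive restarts, and the names here are illustrative:

```python
from datetime import datetime, timedelta

# Hypothetical checkpoint store -- the real service persists this state
checkpoint = {"last_fetched": None}

def poll_once(now: datetime) -> str:
    # Historical phase: no checkpoint yet, so collect all past data
    if checkpoint["last_fetched"] is None:
        checkpoint["last_fetched"] = now
        return "historical"
    # Continuous phase: fetch only data newer than the checkpoint
    checkpoint["last_fetched"] = now
    return "continuous"

t0 = datetime(2024, 1, 1)
print(poll_once(t0))                       # historical
print(poll_once(t0 + timedelta(hours=1)))  # continuous
```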

CSV Sources

CSV sources are ideal for one-time bulk imports or periodic file uploads.

Configuration Schema

class CSVSourceConfig(BaseModel):
    file_id: str                       # Uploaded file identifier
    location: Optional[str] = None     # Computed file path (backend-populated)
The location field is automatically populated by the backend when a file is uploaded. You only need to provide the file_id received from the file upload endpoint.

File Upload Flow

Step 1: Upload CSV file

curl -X POST "http://api.example.com/resources/wrappers/files/upload" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@temperature_data.csv"
Response:
{
  "file_id": "csv_a1b2c3d4e5f6",
  "filename": "temperature_data.csv",
  "size": 45678
}
Step 2: Create wrapper with file_id

{
  "source_type": "CSV",
  "source_config": {
    "file_id": "csv_a1b2c3d4e5f6"
  },
  "metadata": {
    "name": "Temperature Data - Lisbon",
    "domain": "Environment",
    ...
  }
}
Step 3: Backend computes location

The service automatically computes and stores the file path:
file_path = f"/app/uploads/{file_id}.csv"
source_config.location = file_path

CSV Sample Extraction

The generator extracts the first 20 lines for AI analysis:
def get_csv_sample(self, file_path: str, max_lines: int = 20) -> str:
    sample_lines = []
    with open(file_path, "r", encoding="utf-8") as file:
        for i, line in enumerate(file):
            if i >= max_lines:
                break
            sample_lines.append(line.strip())
    
    return "\n".join(sample_lines)
Example Sample:
date,temperature,humidity
2024-01-01,15.5,72
2024-01-02,16.2,68
2024-01-03,14.8,75
...
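A self-contained version of the extractor (the same logic, minus self) can be exercised against a fabricated file:

```python
import os
import tempfile

def get_csv_sample(file_path: str, max_lines: int = 20) -> str:
    # Same line-limited read as the extractor above
    sample_lines = []
    with open(file_path, "r", encoding="utf-8") as file:
        for i, line in enumerate(file):
            if i >= max_lines:
                break
            sample_lines.append(line.strip())
    return "\n".join(sample_lines)

# Write a small fabricated CSV, then sample it
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("date,temperature\n")
    f.writelines(f"2024-01-{d:02d},15.{d}\n" for d in range(1, 31))
    path = f.name

sample = get_csv_sample(path, max_lines=5)
print(len(sample.splitlines()))  # 5 (header + 4 data rows)
os.unlink(path)
```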

CSV Wrapper Behavior

Execution Mode: Once (processes file and completes)
Lifecycle:
  • Created → Generating → Executing → Completed
  • Status changes to COMPLETED after all rows are processed
CSV wrappers process the entire file in a single execution. For large files (>100,000 rows), consider splitting into smaller chunks or using an API source if possible.
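One way to split a large file, as suggested above, is by row count with the header repeated in each chunk. A minimal sketch (the helper name is illustrative):

```python
def split_csv(text: str, rows_per_chunk: int) -> list[str]:
    """Split CSV text into chunks, repeating the header row in each."""
    lines = text.strip().splitlines()
    header, rows = lines[0], lines[1:]
    return [
        "\n".join([header] + rows[i:i + rows_per_chunk])
        for i in range(0, len(rows), rows_per_chunk)
    ]

data = "date,value\n" + "\n".join(f"2024-01-01,{i}" for i in range(10))
chunks = split_csv(data, rows_per_chunk=4)
print(len(chunks))  # 3 chunks: 4 + 4 + 2 data rows
```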

XLSX Sources

XLSX sources support Excel spreadsheets with multiple sheets and complex structures.

Configuration Schema

class XLSXSourceConfig(BaseModel):
    file_id: str                       # Uploaded file identifier
    location: Optional[str] = None     # Computed file path (backend-populated)

File Upload Flow

Identical to CSV, but with .xlsx files:
curl -X POST "http://api.example.com/resources/wrappers/files/upload" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@emissions_report.xlsx"

XLSX Sample Extraction

Extracts metadata and samples from all sheets:
def get_xlsx_sample(self, file_path: str, max_lines_per_sheet: int = 15) -> str:
    excel_file = pd.ExcelFile(file_path)
    sheet_names = excel_file.sheet_names
    
    sample_data = []
    sample_data.append(f"XLSX File: {file_path}")
    sample_data.append(f"Total sheets: {len(sheet_names)}")
    sample_data.append(f"Sheet names: {sheet_names}")
    
    for sheet_name in sheet_names:
        sample_data.append(f"\n=== Sheet: {sheet_name} ===")
        df = pd.read_excel(file_path, sheet_name=sheet_name, nrows=max_lines_per_sheet)
        sample_data.append(f"Columns: {list(df.columns)}")
        sample_data.append(f"Shape: {df.shape}")
        sample_data.append("Sample data:")
        sample_data.append(df.to_string())
    
    return "\n".join(sample_data)
Example Sample:
XLSX File: /app/uploads/emissions_report.xlsx
Total sheets: 3
Sheet names: ['2023', '2024', 'Metadata']

=== Sheet: 2023 ===
Columns: ['Month', 'CO2_tons', 'Region']
Shape: (12, 3)
Sample data:
   Month  CO2_tons     Region
0    Jan     1250.5     North
1    Feb     1180.2     North
...

=== Sheet: 2024 ===
...

XLSX Wrapper Behavior

Execution Mode: Once (processes all sheets and completes)
Multi-Sheet Handling:
  • AI determines which sheet(s) contain the actual data
  • Can extract from multiple sheets if needed
  • Handles varying column structures across sheets
The AI is particularly good at identifying the correct sheet when files contain metadata sheets, summary sheets, and data sheets together.
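The sheet selection itself is done by the AI, but a simple name-based heuristic illustrates the idea of separating data sheets from metadata and summary sheets (the hint list is hypothetical):

```python
# Sheet names that typically indicate non-data content (illustrative list)
NON_DATA_HINTS = ("metadata", "summary", "readme", "notes", "info")

def likely_data_sheets(sheet_names: list[str]) -> list[str]:
    # Keep sheets whose names carry no non-data hint
    return [
        name for name in sheet_names
        if not any(hint in name.lower() for hint in NON_DATA_HINTS)
    ]

print(likely_data_sheets(["2023", "2024", "Metadata"]))  # ['2023', '2024']
```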

Source Configuration Comparison

| Feature | API | CSV | XLSX |
|---|---|---|---|
| Execution | Continuous | Once | Once |
| Authentication | Yes | No | No |
| File Upload | No | Yes | Yes |
| Historical Data | Yes | Full file | Full file |
| Real-time Updates | Yes | No | No |
| Multiple Sources | Single endpoint | Single file | Multiple sheets |
| Checkpointing | Yes | No | No |
| Auto-resume | Yes | No | No |

Data Source Selection Guide

Choose the appropriate source type based on your needs:
1. Assess data update frequency

  • Real-time or frequent updates → API
  • Periodic updates → API with appropriate periodicity
  • One-time or rare updates → CSV or XLSX
2. Evaluate data structure

  • Simple tabular data → CSV
  • Multiple related datasets → XLSX
  • Dynamic/nested structures → API
3. Consider data source constraints

  • Authentication required → API (supports auth)
  • No API available → File upload (CSV/XLSX)
  • Rate limits → Adjust API periodicity
4. Plan for scalability

  • Growing dataset → API (incremental)
  • Fixed historical data → CSV/XLSX (one-time)
  • Continuous monitoring → API
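The decision steps above can be condensed into a small helper. This is a sketch, not part of the service; the function and parameter names are illustrative:

```python
def choose_source_type(needs_realtime: bool, has_api: bool, multi_dataset: bool) -> str:
    # Real-time needs or an available API favor the API source type,
    # which brings auth, checkpointing, and continuous collection
    if needs_realtime or has_api:
        return "API"
    # Otherwise fall back to file upload: XLSX for related datasets,
    # CSV for simple tabular data
    return "XLSX" if multi_dataset else "CSV"

print(choose_source_type(needs_realtime=True, has_api=True, multi_dataset=False))   # API
print(choose_source_type(needs_realtime=False, has_api=False, multi_dataset=True))  # XLSX
```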

Best Practices

For API sources:
Do:
  • Test the API endpoint before creating a wrapper
  • Use the most specific date_field and value_field possible
  • Set appropriate timeout values for slow APIs
  • Use query parameters to filter data at the source
Don’t:
  • Hard-code credentials in the configuration
  • Use extremely short periodicities that could hit rate limits
  • Ignore authentication errors (check logs immediately)
For CSV files:
Do:
  • Use UTF-8 encoding
  • Include clear column headers in the first row
  • Use consistent date formats (ISO 8601 recommended)
  • Keep files under 50MB for best performance
Don’t:
  • Use non-standard delimiters (stick to commas)
  • Include summary rows or merged cells
  • Use multiple date/time formats in the same file
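The single-date-format rule for CSV files can be checked before upload. A small sketch, assuming ISO 8601 (the helper name is illustrative):

```python
from datetime import datetime

def inconsistent_dates(values: list[str], fmt: str = "%Y-%m-%d") -> list[str]:
    """Return the values that do not parse with the expected ISO 8601 format."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

print(inconsistent_dates(["2024-01-01", "01/02/2024", "2024-01-03"]))  # ['01/02/2024']
```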
For XLSX files:
Do:
  • Use descriptive sheet names
  • Place data in the first row/column without gaps
  • Use consistent formatting within each sheet
  • Name sheets according to their content (e.g., “2024_Data”)
Don’t:
  • Use heavily formatted spreadsheets with merged cells
  • Include charts or pivot tables in data sheets
  • Use formulas in data cells (export values only)
  • Mix data types in the same column

Troubleshooting

API Connection Failures

Symptoms: Wrapper status is ERROR, logs show connection timeouts
Solutions:
  • Verify the endpoint URL is correct and accessible
  • Check authentication credentials are valid
  • Increase timeout_seconds for slow APIs
  • Ensure network connectivity from the service
CSV Produces No Data

Symptoms: Wrapper completes but no data points sent
Solutions:
  • Verify file encoding is UTF-8
  • Check for consistent column structure
  • Ensure date columns use recognizable formats
  • Look for special characters or malformed rows
XLSX Sheet Selection Issues

Symptoms: AI selects wrong sheet or misses data
Solutions:
  • Use descriptive sheet names (avoid generic names like “Sheet1”)
  • Move metadata/summary sheets to the end
  • Ensure data sheets have clear headers in row 1
  • Manually specify the sheet in indicator description

Next Steps

Create a Wrapper

Step-by-step guide to creating wrappers for each source type

Wrappers

Learn more about how wrappers work and execute

File Upload API

API reference for uploading CSV and XLSX files

Wrapper API

Complete API reference for wrapper management
