The File Upload Agent processes uploaded files by parsing them, generating AI descriptions, uploading to storage, and updating conversation state with dataset metadata.

Function Signature

// src/agents/fileUpload/index.ts
export async function fileUploadAgent(input: {
  conversationState: ConversationState;
  files: File[];
  userId: string;
}): Promise<{
  uploadedDatasets: Array<{
    id: string;
    filename: string;
    description: string;
    path?: string;
    size?: number;
  }>;
  errors: string[];
}>;

Supported File Types

Spreadsheets

  • CSV (.csv)
  • Excel (.xlsx, .xls)

Documents

  • PDF (.pdf)
  • Markdown (.md)
  • Text (.txt)

Data

  • JSON (.json)

Images

  • PNG, JPG (OCR with Tesseract)
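Supported extensions are routed to format-specific parsers. The mapping below is an illustrative sketch under that assumption; EXTENSION_MAP and detectParser are hypothetical names, not the project's actual code.

```typescript
// Illustrative sketch: map a filename's extension to a parser kind.
// The real routing lives in src/agents/fileUpload/parsers.ts.
type ParserKind = "csv" | "excel" | "pdf" | "markdown" | "text" | "json" | "image";

const EXTENSION_MAP: Record<string, ParserKind> = {
  csv: "csv",
  xlsx: "excel",
  xls: "excel",
  pdf: "pdf",
  md: "markdown",
  txt: "text",
  json: "json",
  png: "image",
  jpg: "image",
  jpeg: "image",
};

function detectParser(filename: string): ParserKind | null {
  // Case-insensitive match on the last extension segment.
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return EXTENSION_MAP[ext] ?? null;
}
```

Unrecognized extensions return null, which would surface as a per-file error rather than crashing the agent.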

File Processing Pipeline

1. Parse file content

Extract text from each file using format-specific parsers:
// src/agents/fileUpload/parsers.ts
const parsed = await parseFile(buffer, filename, mimeType);
Returns { text, metadata } with extracted content.

2. Upload to storage

Upload raw file to S3-compatible storage:
const uploadedFiles = await uploadFilesToStorage(
  userId,
  conversationStateId,
  rawFiles
);
Files are stored at uploads/{conversationStateId}/…

3. Generate AI descriptions

Use LLM to generate dataset descriptions:
const dataset = {
  id: generateUUID(),
  filename: file.name,
  description: await generateDescription(parsedText),
  path: uploadPath,
  size: buffer.length
};

4. Update conversation state

Add datasets to conversation state:
const existing = conversationState.values.uploadedDatasets ?? [];
conversationState.values.uploadedDatasets = [
  ...existing,
  ...newDatasets
];
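The merge step can be sketched as a small pure helper. Note that mergeDatasets and its de-duplication by id are illustrative assumptions; the page itself only shows plain array concatenation.

```typescript
// Illustrative sketch only: merge newly uploaded datasets into the existing
// list, skipping any id already present. De-duplication is an assumption;
// the documented behavior is plain concatenation.
interface DatasetRef {
  id: string;
  filename: string;
}

function mergeDatasets(existing: DatasetRef[], incoming: DatasetRef[]): DatasetRef[] {
  const seen = new Set(existing.map((d) => d.id));
  return [...existing, ...incoming.filter((d) => !seen.has(d.id))];
}
```

Since step 3 assigns each dataset a fresh UUID via generateUUID, id collisions should not occur in practice; the guard only matters if the same dataset object is merged twice.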

File Size Limits

// src/agents/fileUpload/config.ts
export const MAX_FILE_SIZE_MB = 50;
Files exceeding 50 MB are rejected with an error message.
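A minimal size check consistent with the error format shown in the Error Handling section below. The checkFileSize helper is an illustrative sketch, not the real config module:

```typescript
// Illustrative helper: returns an error string for oversized files,
// or null when the file is within the limit.
const MAX_FILE_SIZE_MB = 50;

function checkFileSize(filename: string, sizeBytes: number): string | null {
  const sizeMB = sizeBytes / (1024 * 1024);
  if (sizeMB > MAX_FILE_SIZE_MB) {
    return `${filename}: File too large (${sizeMB.toFixed(1)} MB, max ${MAX_FILE_SIZE_MB}MB)`;
  }
  return null;
}
```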

Usage Example

// src/routes/chat.ts
if (files.length > 0) {
  const fileResult = await fileUploadAgent({
    conversationState,
    files,
    userId: conversationState.values.userId || "unknown"
  });
  
  logger.info({
    uploadedDatasets: fileResult.uploadedDatasets,
    errors: fileResult.errors,
    fileCount: files.length
  }, "file_upload_agent_completed");
}

AI-Generated Descriptions

The agent generates concise, informative descriptions.

Example CSV (gene_expression.csv):
gene_id,sample1,sample2,sample3
TP53,12.3,14.1,11.8
MYC,8.2,9.1,7.9
...
Generated description:
Gene expression matrix with 3 samples. Contains normalized expression values 
for genes including TP53 and MYC. Suitable for differential expression analysis.
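One way such a description might be requested from the LLM. The buildDescriptionPrompt helper below is a hypothetical sketch; the actual prompt and model call live in the agent's source.

```typescript
// Hypothetical prompt builder (illustrative names). Long files are truncated
// to a sample before being sent to the LLM.
function buildDescriptionPrompt(
  filename: string,
  sampleText: string,
  maxChars = 2000
): string {
  const sample = sampleText.slice(0, maxChars);
  return [
    `Describe the dataset "${filename}" in 1-3 sentences.`,
    `Mention its structure and the analyses it could support.`,
    `Sample content:`,
    sample,
  ].join("\n");
}
```

Truncating to a fixed character budget keeps prompt cost bounded even for 50 MB uploads.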

Parser Implementations

CSV Parser

// src/agents/fileUpload/parsers.ts
import Papa from "papaparse";

const result = Papa.parse(content, {
  header: true,
  skipEmptyLines: true
});

const rows = result.data.slice(0, 100); // First 100 rows
const text = JSON.stringify(rows, null, 2);

PDF Parser

import pdfParse from "pdf-parse";

const data = await pdfParse(buffer);
const text = data.text;

Excel Parser

import XLSX from "xlsx";

const workbook = XLSX.read(buffer, { type: "buffer" });
const sheetName = workbook.SheetNames[0];
const worksheet = workbook.Sheets[sheetName];
const jsonData = XLSX.utils.sheet_to_json(worksheet);

Image Parser (OCR)

import Tesseract from "tesseract.js";

const { data: { text } } = await Tesseract.recognize(buffer, "eng");

Error Handling

The agent collects per-file errors and returns them instead of throwing:
const result = await fileUploadAgent({
  conversationState,
  files: [largePDF, corruptedExcel, validCSV],
  userId
});

// result.errors:
// [
//   "large.pdf: File too large (52.3 MB, max 50MB)",
//   "corrupted.xlsx: Failed to parse Excel file"
// ]
// 
// result.uploadedDatasets:
// [
//   { id: "...", filename: "valid.csv", description: "...", ... }
// ]
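The collect-errors-per-file behavior above can be sketched as a generic loop. The processFiles function is an illustrative name, not the agent's internal API:

```typescript
// Illustrative sketch: process each file independently, recording failures
// as "<filename>: <message>" strings instead of aborting the batch.
async function processFiles<T>(
  files: { name: string }[],
  processOne: (f: { name: string }) => Promise<T>
): Promise<{ results: T[]; errors: string[] }> {
  const results: T[] = [];
  const errors: string[] = [];
  for (const file of files) {
    try {
      results.push(await processOne(file));
    } catch (err) {
      errors.push(`${file.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { results, errors };
}
```

One corrupted spreadsheet therefore never blocks the valid CSV uploaded alongside it.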

Storage Configuration

Files are uploaded to S3-compatible storage:
# .env
STORAGE_PROVIDER=s3
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET=your-bucket
See Storage Configuration for details.

Integration with Analysis Agent

Uploaded datasets flow to the Analysis Agent:
// Planning agent includes datasets in ANALYSIS tasks
const analysisTasks = plan.filter(t => t.type === "ANALYSIS");

for (const task of analysisTasks) {
  const result = await analysisAgent({
    objective: task.objective,
    datasets: task.datasets, // From uploaded files
    type: "BIO",
    userId,
    conversationStateId
  });
}

Related

  • File Upload API: S3 presigned URL flow
  • Analysis Agent: process uploaded datasets
  • Storage Config: configure S3 storage
