Overview

The pubchem_fetcher.py module handles all interactions with the PubChem PUG REST API. It fetches compound descriptions using three lookup strategies: by CID, by molecular formula, and by name.

Constants

PUBCHEM_API_BASE

PUBCHEM_API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
Base URL for the PubChem PUG REST API.

Functions

fetch_pubchem_data

Retrieves chemical compound descriptions from PubChem using multiple API endpoints.

Parameters
  • terms (list[str], required): List of chemistry terms extracted from the user query.

Returns
  • context (str): Formatted string containing all retrieved PubChem descriptions, or an empty string if no data was found.

Search Strategy

For each term, the function attempts three different PubChem endpoints:
  1. CID (Compound ID): /compound/cid/{term}/description/JSON
  2. Formula: /compound/formula/{term}/description/JSON
  3. Name: /compound/name/{term}/description/JSON
Trying all three namespaces means a term can match whether the user supplies a numeric CID (e.g. 2244), a molecular formula (e.g. C9H8O4), or a compound name (e.g. aspirin).
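A minimal sketch of how the three endpoint URLs for one term could be built from PUBCHEM_API_BASE. The helper name candidate_urls and the percent-encoding of the term are assumptions for illustration, not taken from pubchem_fetcher.py.

```python
from urllib.parse import quote

PUBCHEM_API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Namespaces tried for each term, in the documented order: CID, formula, name.
NAMESPACES = ("cid", "formula", "name")


def candidate_urls(term: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (endpoint, full_url) pairs for one term."""
    # Assumption: percent-encode the term so names containing spaces or
    # slashes stay a single, valid path segment.
    encoded = quote(term.strip(), safe="")
    pairs = []
    for namespace in NAMESPACES:
        endpoint = f"/compound/{namespace}/{encoded}/description/JSON"
        pairs.append((endpoint, PUBCHEM_API_BASE + endpoint))
    return pairs


for endpoint, url in candidate_urls("acetylsalicylic acid"):
    print(endpoint)
```

Only one of the three URLs will normally resolve for a given term (a name is not a valid CID, for instance); the others are expected to fail and be skipped.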

Error Handling

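  • requests.exceptions.RequestException: the failed request is skipped silently instead of raising
  • ValueError: responses whose body cannot be parsed as JSON are skipped
  • Each request uses a 10-second timeout; a timeout raises requests.exceptions.Timeout, a RequestException subclass, so it is skipped as well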
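The error-handling rules above can be sketched as a per-term loop that swallows RequestException and ValueError and moves on. This is an illustration under assumptions, not the module's code: lookup_term and get_json are hypothetical names, the fetcher is injectable so the loop can be exercised offline, and calling raise_for_status() to treat HTTP errors (such as a 404 for no match) as failures is an assumption.

```python
from typing import Callable

try:
    from requests.exceptions import RequestException
except ImportError:  # fallback so the sketch still imports without requests installed
    class RequestException(Exception):
        """Stand-in for requests.exceptions.RequestException."""

PUBCHEM_API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
TIMEOUT_SECONDS = 10  # per-request timeout described above


def get_json(url: str) -> dict:
    """Hypothetical default fetcher: one GET request, parsed as JSON."""
    import requests  # imported here so callers can inject a fake fetcher offline

    response = requests.get(url, timeout=TIMEOUT_SECONDS)
    response.raise_for_status()  # assumption: HTTP errors count as failed requests
    return response.json()  # raises a ValueError subclass on a non-JSON body


def lookup_term(term: str, fetch: Callable[[str], dict] = get_json) -> list[tuple[str, dict]]:
    """Try the cid, formula, and name endpoints; keep only those that succeed."""
    successes = []
    for namespace in ("cid", "formula", "name"):
        endpoint = f"/compound/{namespace}/{term}/description/JSON"
        try:
            data = fetch(PUBCHEM_API_BASE + endpoint)
        except (RequestException, ValueError):
            continue  # network error, timeout, HTTP error, or bad JSON: skip quietly
        successes.append((endpoint, data))
    return successes
```

Because failures are swallowed per endpoint, one unreachable or malformed response never prevents the remaining endpoints or terms from being tried.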

extract_pubchem_descriptions

Parses a PubChem API response and extracts its description text.

Parameters
  • data (dict, required): JSON response from the PubChem API.

Returns
  • descriptions (str | None): Concatenated description strings, or None if no descriptions were found.
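A minimal sketch of the extraction step, assuming the InformationList → Information → Description layout documented under PubChem API Structure. The name extract_descriptions is a hypothetical stand-in for extract_pubchem_descriptions, and the newline separator is an assumption (the source only says the strings are concatenated). Entries without a Description key are skipped, since PubChem responses can include entries that carry only a CID and Title.

```python
from typing import Optional


def extract_descriptions(data: dict) -> Optional[str]:
    """Hypothetical stand-in: join every Description under InformationList.Information."""
    # `or {}` / `or []` keep a missing or null level from raising.
    info = (data.get("InformationList") or {}).get("Information") or []
    texts = [
        entry["Description"]
        for entry in info
        if isinstance(entry, dict) and entry.get("Description")
    ]
    # Assumption: descriptions are joined with newlines.
    return "\n".join(texts) if texts else None


sample = {
    "InformationList": {
        "Information": [
            {"CID": 2244, "Title": "Aspirin"},
            {"CID": 2244, "Description": "Chemical description text..."},
        ]
    }
}
print(extract_descriptions(sample))
```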

Example Usage

from plan_execute_agent.pubchem_rag.pubchem_fetcher import fetch_pubchem_data

# Fetch data for multiple chemistry terms
terms = ["aspirin", "C9H8O4", "acetylsalicylic acid"]
context = fetch_pubchem_data(terms)
print(context)

Response Format

The function returns formatted text blocks for each successful query:
**Term**: aspirin
**Endpoint**: /compound/name/aspirin/description/JSON
Aspirin is a salicylate drug that inhibits cyclooxygenase...

**Term**: C9H8O4
**Endpoint**: /compound/formula/C9H8O4/description/JSON
Acetylsalicylic acid is an organic compound...
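The block layout shown above can be reproduced with two small helpers. This is a sketch of the documented output shape only; format_block and join_blocks are hypothetical names, and the blank-line separator between blocks is inferred from the example rather than confirmed in the source.

```python
def format_block(term: str, endpoint: str, description: str) -> str:
    """Hypothetical helper: render one successful lookup as a Term/Endpoint block."""
    return f"**Term**: {term}\n**Endpoint**: {endpoint}\n{description}"


def join_blocks(blocks: list[str]) -> str:
    """Separate blocks with a blank line; an empty list yields an empty string."""
    return "\n\n".join(blocks)


context = join_blocks([
    format_block("aspirin", "/compound/name/aspirin/description/JSON",
                 "Aspirin is a salicylate drug that inhibits cyclooxygenase..."),
])
print(context)
```

Returning an empty string when no block was produced matches the documented return value of fetch_pubchem_data when no data is found.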

PubChem API Structure

extract_pubchem_descriptions expects PubChem responses in this format:
{
  "InformationList": {
    "Information": [
      {
        "Description": "Chemical description text..."
      }
    ]
  }
}

Integration Notes

  • Called by query_chemistry_related() in the RAG pipeline
  • Provides the context that feeds into the LLM response generator
  • Designed to be fault-tolerant, continuing even if some requests fail
  • No API key required (uses public PubChem endpoints)

Source Location

  • fetch_pubchem_data: plan_execute_agent/pubchem_rag/pubchem_fetcher.py:6
  • extract_pubchem_descriptions: plan_execute_agent/pubchem_rag/pubchem_fetcher.py:40
