ChemAgent can extract chemical names, formulas, structures, and other chemistry-related text from images using the gpt4o_chem_extract function.

Overview

The image processing pipeline uses GPT-4o’s vision capabilities to:
  • Extract chemical names and formulas from diagrams
  • Read SMILES strings from images
  • Identify structural features from molecular drawings
  • Process scanned documents and handwritten notes

Basic Usage

Standalone Image Extraction

import asyncio
from plan_execute_agent.rdkit_agent import gpt4o_chem_extract

# Extract chemistry text from an image
image_path = "molecule_diagram.png"
input_prompt = "Extract all chemical information from this image"

result = asyncio.run(gpt4o_chem_extract(input_prompt, image_path))
print(result)

With Agent Integration

The --image flag automatically integrates image extraction into the agent workflow:
python plan_execute_agent/rdkit_agent.py \
  --query "What is the IUPAC name of this molecule?" \
  --image "path/to/molecule.png"

How It Works

The image processing workflow (plan_execute_agent/rdkit_agent.py:257). The excerpt relies on module-level imports of base64 and openai:
async def gpt4o_chem_extract(input_prompt: str, image_path: str = None) -> str:
    """
    Calls GPT-4o with optional image input to extract relevant chemistry-related text.

    Args:
        input_prompt (str): The input query.
        image_path (str, optional): Path to the image file containing chemistry text.

    Returns:
        str: Extracted relevant chemistry-related text.
    """
    client = openai.AsyncOpenAI()

    messages = [
        {
            "role": "system",
            "content": "You are an expert chemistry assistant. Extract only chemical names, formulas, or related text from the given prompt or image.",
        },
        {"role": "user", "content": input_prompt},
    ]

    if image_path:
        with open(image_path, "rb") as image_file:
            image_data = base64.b64encode(image_file.read()).decode("utf-8")

        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please extract relevant chemistry-related text from this image as well.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_data}"},
                    },
                ],
            }
        )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=500,
        temperature=0.2,
    )

    chem_extracted = response.choices[0].message.content.strip()
    return chem_extracted
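
The function above labels every image as image/png in the data URL, whatever the file's real type. A minimal sketch of a MIME-aware alternative follows, assuming a guess from the file extension via the standard-library mimetypes module is good enough. image_to_data_url is a hypothetical helper, not part of ChemAgent.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str, default_mime: str = "image/png") -> str:
    """Return a base64 data URL whose MIME type matches the file extension.

    Hypothetical helper (not part of ChemAgent). Falls back to default_mime
    when the extension is unknown or does not map to an image type.
    """
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    # Same encoding step as in gpt4o_chem_extract: raw bytes -> base64 text.
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string could stand in for the hard-coded f"data:image/png;base64,{image_data}" value if you maintain your own copy of the extraction function.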

Integration with Queries

When using process_input() with an image, the extracted text is automatically combined with your query:
async def process_input(
    input_prompt: str, image_path: str = None, use_rag: bool = False
) -> tuple:
    # ... (from rdkit_agent.py:318)
    
    # GPT-4o extraction (optional)
    extracted_text = ""
    if image_path:
        print("Extracting Chemistry text from image...")
        extracted_text = await gpt4o_chem_extract(input_prompt, image_path)
        print("Chemistry text extracted from image:", extracted_text)
    
    # The extracted text is combined with the original query
    edited_prompt = (
        "'" + input_prompt +
        (f"\nExtracted Chemistry Text from image: {extracted_text}\n"
         if extracted_text else "\n")
        + "..."
    )
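
To see what the combined prompt looks like, the string assembly from the excerpt can be reproduced in isolation. This is a sketch only: combine_prompt is a hypothetical name, and its tail parameter stands in for the part of the prompt that the excerpt elides with "...".

```python
def combine_prompt(input_prompt: str, extracted_text: str = "", tail: str = "...") -> str:
    """Mirror the string assembly shown in the process_input excerpt.

    Hypothetical helper. `tail` is a placeholder for the elided remainder
    of the real prompt, whose content is not shown in the excerpt.
    """
    return (
        "'" + input_prompt
        + (f"\nExtracted Chemistry Text from image: {extracted_text}\n"
           if extracted_text else "\n")
        + tail
    )


# With extracted text, an extra labelled line is added after the query.
print(combine_prompt("What is this molecule?", "C6H6 (benzene)"))
# Without an image (empty extraction), only a newline follows the query.
print(combine_prompt("What is benzene?"))
```

When extraction returns an empty string, the query passes through unchanged, so calls without an image behave as if the extraction step were absent.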

Supported Image Formats

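gpt4o_chem_extract reads the file as raw bytes and always labels it image/png in the data URL, so actual format support depends on what the GPT-4o API accepts. Commonly used formats: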
  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • GIF (.gif)
  • BMP (.bmp)
  • WebP (.webp)
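
A quick extension check before calling the API can save a wasted request. The sketch below assumes the extension list above is the set you want to allow. SUPPORTED_EXTENSIONS and is_supported_image are hypothetical names, not part of ChemAgent.

```python
from pathlib import Path

# Extensions taken from the list above; adjust to whatever your model accepts.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is in the allowed set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This checks the file name only. It does not inspect the file contents, so a mislabelled file still passes.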

Use Cases

Document OCR

Extract chemical data from scanned papers and patents

Structure Recognition

Read molecular structures from diagrams

Lab Notes

Process handwritten chemical formulas

Whiteboard Capture

Digitize reactions from classroom photos

Examples

Extract SMILES from Structural Diagram

import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Convert the molecule in this image to SMILES notation"
image_path = "benzene_structure.png"

result, completed, attempts, _, errors, _ = \
    asyncio.run(process_input(query, image_path=image_path))

if completed:
    print(f"SMILES: {result}")

Identify Compound from Image

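import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "What is the IUPAC name of this compound?"
image_path = "unknown_molecule.jpg"

result, completed, _, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path))

# Only trust the result if the agent reports completion
if completed:
    print(f"IUPAC name: {result}")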

Extract Reaction from Scheme

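import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "What is the reaction shown in this scheme?"
image_path = "reaction_scheme.png"

result, completed, _, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path))

# Only trust the result if the agent reports completion
if completed:
    print(f"Reaction: {result}")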

Batch Processing

import asyncio
import os
from plan_execute_agent.rdkit_agent import gpt4o_chem_extract

async def process_images(image_dir):
    # Images are processed one at a time: each extraction is awaited
    # before the next one starts.
    results = {}
    
    for filename in os.listdir(image_dir):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_path = os.path.join(image_dir, filename)
            extracted = await gpt4o_chem_extract(
                "Extract all chemical information",
                image_path
            )
            results[filename] = extracted
    
    return results

# Usage
results = asyncio.run(process_images("molecule_images/"))
for filename, extracted in results.items():
    print(f"\n{filename}:")
    print(extracted)
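
The loop above awaits each image in turn, so total time grows linearly with the number of files. A concurrent variant can be sketched with asyncio.gather and a semaphore that caps in-flight requests. extract_concurrently and its max_concurrency parameter are hypothetical and not part of ChemAgent. The extraction coroutine is passed in as an argument, so any function with the same (prompt, image_path) shape as gpt4o_chem_extract should work.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional, Tuple

# Any coroutine with the (prompt, image_path) -> str shape, e.g. gpt4o_chem_extract.
ExtractFn = Callable[[str, str], Awaitable[str]]


async def extract_concurrently(
    extract: ExtractFn,
    image_paths: Iterable[str],
    prompt: str = "Extract all chemical information",
    max_concurrency: int = 4,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run extractions concurrently; return (results, errors) keyed by path.

    Hypothetical helper, not part of ChemAgent.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> Tuple[str, Optional[str], Optional[str]]:
        async with semaphore:  # cap the number of in-flight requests
            try:
                return path, await extract(prompt, path), None
            except Exception as exc:  # one failure should not abort the batch
                return path, None, str(exc)

    outcomes = await asyncio.gather(*(worker(p) for p in image_paths))

    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path, text, error in outcomes:
        if error is None:
            results[path] = text
        else:
            errors[path] = error
    return results, errors
```

A call might look like asyncio.run(extract_concurrently(gpt4o_chem_extract, paths)). A small max_concurrency is a cautious default, since API rate limits vary by account.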

Combining with RAG

Use both image extraction and PubChem RAG for comprehensive analysis:
python plan_execute_agent/rdkit_agent.py \
  --query "What are the properties of this molecule?" \
  --image "molecule.png" \
  --use_rag

The same request from Python:
import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Describe the biological activity of this compound"
image_path = "drug_molecule.png"

result, completed, attempts, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path, use_rag=True))

print(result)
# Combines:
# 1. Extracted chemical name/structure from image
# 2. PubChem data about the compound
# 3. LlaSMol analysis

Best Practices

Image Quality

  • Use high-resolution images (at least 300 DPI for scans)
  • Ensure good contrast between text/structures and background
  • Avoid blurry or distorted images
  • Crop to relevant content when possible

Prompt Design

  • Be specific about what to extract
  • Mention if the image contains multiple compounds
  • Indicate the expected format (SMILES, IUPAC, formula)
  • Provide context when ambiguity is possible

Validation

  • Always validate extracted SMILES with validate_smiles_rdkit
  • Cross-reference extracted names with databases
  • Review complex structures manually
  • Use multiple angles/views for 3D structures

Performance

  • Process images asynchronously for better performance
  • Cache results for repeated queries
  • Batch similar images together
  • Consider preprocessing (crop, enhance) before extraction
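
The advice to cache results for repeated queries can be sketched as a small in-memory cache keyed by the prompt plus the image bytes, so an edited image with the same file name is not served a stale result. ExtractionCache is a hypothetical helper, not part of ChemAgent. It wraps any coroutine with the (prompt, image_path) shape of gpt4o_chem_extract.

```python
import asyncio
import hashlib
from pathlib import Path
from typing import Awaitable, Callable, Dict

ExtractFn = Callable[[str, str], Awaitable[str]]


class ExtractionCache:
    """In-memory cache keyed by prompt text plus image content (hypothetical helper)."""

    def __init__(self, extract: ExtractFn) -> None:
        self._extract = extract
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str, image_path: str) -> str:
        # Hash the prompt and the file bytes; the NUL byte separates the two parts.
        digest = hashlib.sha256()
        digest.update(prompt.encode("utf-8"))
        digest.update(b"\0")
        digest.update(Path(image_path).read_bytes())
        return digest.hexdigest()

    async def get(self, prompt: str, image_path: str) -> str:
        """Return the cached extraction, calling the wrapped coroutine on a miss."""
        key = self._key(prompt, image_path)
        if key not in self._store:
            self._store[key] = await self._extract(prompt, image_path)
        return self._store[key]
```

Usage would be along the lines of cache = ExtractionCache(gpt4o_chem_extract) followed by await cache.get(prompt, image_path). The cache lives only for the lifetime of the process, and because the model's output is not guaranteed to be deterministic, a cached answer is one sample rather than the only possible one.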

Error Handling

import asyncio
import os
from plan_execute_agent.rdkit_agent import process_input

async def safe_image_processing(query, image_path):
    # Validate image path
    if not os.path.isfile(image_path):
        return {"error": f"Image not found: {image_path}"}
    
    # Check file size (GPT-4o has limits)
    if os.path.getsize(image_path) > 20 * 1024 * 1024:  # 20MB
        return {"error": "Image too large (max 20MB)"}
    
    try:
        result, completed, attempts, _, errors, _ = \
            await process_input(query, image_path=image_path)
        
        if completed:
            return {"result": result, "attempts": attempts}
        else:
            return {"error": f"Processing failed: {errors}"}
    
    except Exception as e:
        return {"error": f"Exception: {str(e)}"}

# Usage
result = asyncio.run(safe_image_processing(
    "What is this molecule?",
    "molecule.png"
))

if "error" in result:
    print(f"Error: {result['error']}")
else:
    print(f"Success: {result['result']}")
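
Network calls to a hosted model can fail transiently, and the wrapper above reports such failures without retrying. A generic retry with exponential backoff is one way to handle this. The sketch below is a general pattern, not something ChemAgent provides. retry_async, attempts, and base_delay are hypothetical names, and retrying on every exception type is a deliberate simplification.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await make_call(), retrying on any exception with exponential backoff.

    Hypothetical helper. Re-raises the last exception once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

It could wrap the earlier function as await retry_async(lambda: safe_image_processing(query, image_path)), although safe_image_processing returns an error dict instead of raising, so in practice the retry would more naturally wrap process_input directly. In production you would likely narrow the except clause to the client's timeout and rate-limit errors.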

Limitations

  • Complex 3D structures may not be accurately interpreted
  • Handwritten formulas might be misread
  • Very low quality or heavily annotated images can cause issues
  • Stereochemistry in 2D projections may be ambiguous
  • Non-standard notation might not be recognized

Advanced Usage

Custom Extraction Prompts

from plan_execute_agent.rdkit_agent import gpt4o_chem_extract
import asyncio

async def extract_with_context(image_path, context):
    prompt = f"""
    Context: {context}
    
    Extract all chemical information from the image that relates to this context.
    Focus on:
    - SMILES representations
    - IUPAC names
    - Molecular formulas
    - Reaction conditions
    """
    
    result = await gpt4o_chem_extract(prompt, image_path)
    return result

# Usage
result = asyncio.run(extract_with_context(
    "synthesis_scheme.png",
    "Aspirin synthesis from salicylic acid"
))
print(result)
