ChemAgent can extract chemical names, formulas, structures, and other chemistry-related text from images using the gpt4o_chem_extract function.
Overview
The image processing pipeline uses GPT-4o’s vision capabilities to:
Extract chemical names and formulas from diagrams
Read SMILES strings from images
Identify structural features from molecular drawings
Process scanned documents and handwritten notes
Basic Usage
Standalone Image Extraction
import asyncio
from plan_execute_agent.rdkit_agent import gpt4o_chem_extract

# Extract chemistry text from an image
image_path = "molecule_diagram.png"
input_prompt = "Extract all chemical information from this image"
result = asyncio.run(gpt4o_chem_extract(input_prompt, image_path))
print(result)
With Agent Integration
The --image flag automatically integrates image extraction into the agent workflow:
python plan_execute_agent/rdkit_agent.py \
--query "What is the IUPAC name of this molecule?" \
--image "path/to/molecule.png"
How It Works
The image processing workflow is implemented in gpt4o_chem_extract (plan_execute_agent/rdkit_agent.py:257):
async def gpt4o_chem_extract(input_prompt: str, image_path: str = None) -> str:
    """
    Calls GPT-4o with optional image input to extract relevant chemistry-related text.

    Args:
        input_prompt (str): The input query.
        image_path (str, optional): Path to the image file containing chemistry text.

    Returns:
        str: Extracted relevant chemistry-related text.
    """
    client = openai.AsyncOpenAI()
    messages = [
        {
            "role": "system",
            "content": "You are an expert chemistry assistant. Extract only chemical names, formulas, or related text from the given prompt or image.",
        },
        {"role": "user", "content": input_prompt},
    ]
    if image_path:
        with open(image_path, "rb") as image_file:
            image_data = base64.b64encode(image_file.read()).decode("utf-8")
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please extract relevant chemistry-related text from this image as well.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_data}"},
                    },
                ],
            }
        )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=500,
        temperature=0.2,
    )
    chem_extracted = response.choices[0].message.content.strip()
    return chem_extracted
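Because the request caps the reply at max_tokens=500, a long extraction (for example, a dense scanned page) may be cut off mid-answer. The sketch below is not part of ChemAgent. It assumes the standard Chat Completions response shape, in which finish_reason is "length" when the token cap was hit; was_truncated is a hypothetical helper name, and stand-in objects replace a real API response so the example runs offline.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit the token cap."""
    # Hypothetical helper: relies only on response.choices[0].finish_reason.
    return response.choices[0].finish_reason == "length"


# Stand-in objects that mimic the response shape (not real API responses).
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])

print(was_truncated(complete))  # False
print(was_truncated(cut_off))   # True
```

If truncation turns out to be common for your images, cropping to the relevant region or splitting the image are possible workarounds.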
Integration with Queries
When using process_input() with an image, the extracted text is automatically combined with your query:
async def process_input(
    input_prompt: str, image_path: str = None, use_rag: bool = False
) -> tuple:
    # ... (from rdkit_agent.py:318)
    # GPT-4o extraction (optional)
    extracted_text = ""
    if image_path:
        print("Extracting Chemistry text from image...")
        extracted_text = await gpt4o_chem_extract(input_prompt, image_path)
        print("Chemistry text extracted from image:", extracted_text)
    # The extracted text is combined with the original query
    edited_prompt = (
        "'" + input_prompt +
        (f"\nExtracted Chemistry Text from image: {extracted_text}\n"
         if extracted_text else "\n")
        + "..."
    )
The function supports common image formats:
PNG (.png)
JPEG (.jpg, .jpeg)
GIF (.gif)
BMP (.bmp)
WebP (.webp)
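Note that gpt4o_chem_extract, as shown in How It Works, always labels the payload as data:image/png regardless of the file type. If that matters for your inputs, one option is to choose the MIME type from the file extension. This is a sketch only: image_to_data_url and the MIME_TYPES table are hypothetical names and not part of ChemAgent.

```python
import base64
import os
import tempfile

# Extension-to-MIME table for the formats listed above (illustrative).
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def image_to_data_url(image_path: str) -> str:
    """Base64-encode a file and label it with a MIME type chosen by extension."""
    ext = os.path.splitext(image_path)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {ext!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{MIME_TYPES[ext]};base64,{encoded}"


# Demo with a throwaway file; the bytes are not a real image.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.JPG")
    with open(demo_path, "wb") as f:
        f.write(b"abc")
    demo_url = image_to_data_url(demo_path)

print(demo_url)  # data:image/jpeg;base64,YWJj
```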
Use Cases
Document OCR: Extract chemical data from scanned papers and patents
Structure Recognition: Read molecular structures from diagrams
Lab Notes: Process handwritten chemical formulas
Whiteboard Capture: Digitize reactions from classroom photos
Examples
import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Convert the molecule in this image to SMILES notation"
image_path = "benzene_structure.png"
result, completed, attempts, _, errors, _ = \
    asyncio.run(process_input(query, image_path=image_path))
if completed:
    print(f"SMILES: {result}")
Identify Compound from Image
query = "What is the IUPAC name of this compound?"
image_path = "unknown_molecule.jpg"
result, completed, _, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path))
print(f"IUPAC name: {result}")

query = "What is the reaction shown in this scheme?"
image_path = "reaction_scheme.png"
result, completed, _, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path))
print(f"Reaction: {result}")
Batch Processing
import asyncio
import os
from plan_execute_agent.rdkit_agent import gpt4o_chem_extract

async def process_images(image_dir):
    results = {}
    for filename in os.listdir(image_dir):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_path = os.path.join(image_dir, filename)
            extracted = await gpt4o_chem_extract(
                "Extract all chemical information",
                image_path
            )
            results[filename] = extracted
    return results

# Usage
results = asyncio.run(process_images("molecule_images/"))
for filename, extracted in results.items():
    print(f"\n{filename}:")
    print(extracted)
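The loop above awaits one image at a time. Since each extraction is an independent request, the calls could also run concurrently. A minimal sketch, assuming you want to cap concurrency to stay within API rate limits: extract_many, its extract parameter, and the limit of 4 are illustrative choices, and a stub stands in for gpt4o_chem_extract so the example runs offline.

```python
import asyncio


async def extract_many(extract, prompt, image_paths, max_concurrent=4):
    """Run extract(prompt, path) for every path, at most max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path):
        async with semaphore:
            try:
                return path, await extract(prompt, path)
            except Exception as exc:  # keep one failure from aborting the batch
                return path, f"ERROR: {exc}"

    pairs = await asyncio.gather(*(worker(p) for p in image_paths))
    return dict(pairs)


async def fake_extract(prompt, path):
    """Offline stand-in for gpt4o_chem_extract (hypothetical behavior)."""
    await asyncio.sleep(0.01)
    if path.endswith("bad.png"):
        raise ValueError("unreadable image")
    return f"text from {path}"


results = asyncio.run(
    extract_many(fake_extract, "Extract all chemical information", ["a.png", "bad.png"])
)
print(results)
```

To try this against real images, pass gpt4o_chem_extract as the extract argument in place of the stub.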
Combining with RAG
Use both image extraction and PubChem RAG for comprehensive analysis:
python plan_execute_agent/rdkit_agent.py \
--query "What are the properties of this molecule?" \
--image "molecule.png" \
--use_rag
import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Describe the biological activity of this compound"
image_path = "drug_molecule.png"
result, completed, attempts, _, _, _ = \
    asyncio.run(process_input(query, image_path=image_path, use_rag=True))
print(result)
# Combines:
# 1. Extracted chemical name/structure from image
# 2. PubChem data about the compound
# 3. LlaSMol analysis
Best Practices
Use high-resolution images (at least 300 DPI for scans)
Ensure good contrast between text/structures and background
Avoid blurry or distorted images
Crop to relevant content when possible
Always validate extracted SMILES with validate_smiles_rdkit
Cross-reference extracted names with databases
Review complex structures manually
Use multiple angles/views for 3D structures
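The signature of validate_smiles_rdkit is not shown on this page, so the sketch below calls RDKit directly to illustrate what validating an extracted SMILES string can look like. It assumes the rdkit package is installed; is_parseable_smiles is a hypothetical helper, and it returns None rather than failing when RDKit is unavailable.

```python
try:
    from rdkit import Chem  # requires the rdkit package
except ImportError:  # lets the sketch load even where RDKit is absent
    Chem = None


def is_parseable_smiles(smiles: str):
    """Return True/False if RDKit can/cannot parse the string, None if RDKit is missing."""
    if Chem is None:
        return None
    # MolFromSmiles returns None when the string is not valid SMILES.
    return Chem.MolFromSmiles(smiles) is not None


# "c1ccccc1" is benzene; "C1CC" has an unclosed ring bond.
for candidate in ["c1ccccc1", "C1CC"]:
    print(candidate, is_parseable_smiles(candidate))
```

A string that parses is syntactically valid SMILES, which does not by itself confirm that it matches the structure in the image.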
Error Handling
import asyncio
import os
from plan_execute_agent.rdkit_agent import process_input

async def safe_image_processing(query, image_path):
    # Validate image path
    if not os.path.isfile(image_path):
        return {"error": f"Image not found: {image_path}"}
    # Check file size (GPT-4o has limits)
    if os.path.getsize(image_path) > 20 * 1024 * 1024:  # 20MB
        return {"error": "Image too large (max 20MB)"}
    try:
        result, completed, attempts, _, errors, _ = \
            await process_input(query, image_path=image_path)
        if completed:
            return {"result": result, "attempts": attempts}
        else:
            return {"error": f"Processing failed: {errors}"}
    except Exception as e:
        return {"error": f"Exception: {str(e)}"}

# Usage
result = asyncio.run(safe_image_processing(
    "What is this molecule?",
    "molecule.png"
))
if "error" in result:
    print(f"Error: {result['error']}")
else:
    print(f"Success: {result['result']}")
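The wrapper above catches exceptions but does not bound how long a call may take. A sketch of adding a time limit with asyncio.wait_for: with_timeout and the 60-second default are assumptions rather than ChemAgent features, and a stub coroutine replaces process_input so the example runs offline.

```python
import asyncio


async def with_timeout(coro_fn, *args, timeout_s=60.0, **kwargs):
    """Await coro_fn(*args, **kwargs); return an error dict if it exceeds timeout_s."""
    try:
        value = await asyncio.wait_for(coro_fn(*args, **kwargs), timeout_s)
        return {"result": value}
    except asyncio.TimeoutError:
        return {"error": f"Timed out after {timeout_s} seconds"}


async def slow_stub(query, image_path=None):
    """Offline stand-in for process_input that never finishes in time."""
    await asyncio.sleep(1)
    return "unreachable"


outcome = asyncio.run(with_timeout(slow_stub, "What is this molecule?", timeout_s=0.05))
print(outcome)  # {'error': 'Timed out after 0.05 seconds'}
```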
Limitations
Complex 3D structures may not be accurately interpreted
Handwritten formulas might be misread
Very low quality or heavily annotated images can cause issues
Stereochemistry in 2D projections may be ambiguous
Non-standard notation might not be recognized
Advanced Usage
import asyncio
from plan_execute_agent.rdkit_agent import gpt4o_chem_extract

async def extract_with_context(image_path, context):
    prompt = f"""
    Context: {context}
    Extract all chemical information from the image that relates to this context.
    Focus on:
    - SMILES representations
    - IUPAC names
    - Molecular formulas
    - Reaction conditions
    """
    result = await gpt4o_chem_extract(prompt, image_path)
    return result

# Usage
result = asyncio.run(extract_with_context(
    "synthesis_scheme.png",
    "Aspirin synthesis from salicylic acid"
))
print(result)
See Also