ChemAgent can predict the products of chemical reactions (forward synthesis) and suggest reactants needed to synthesize a target molecule (retrosynthesis).

Overview

Forward Synthesis

Predict products from reactants and reagents

Retrosynthesis

Identify reactants needed for a target product

Forward Synthesis

Given reactants and reagents, predict the resulting product(s).

Basic Usage

from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "<SMILES> NC1=CC=C2OCOC2=C1.O=CO </SMILES> Based on the reactants and reagents given above, suggest a possible product."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: A possible product can be <SMILES> O=CNC1=CC=C2OCOC2=C1 </SMILES> .

Reaction SMILES use the format reactants>reagents>products.
For queries, separate multiple reactants and reagents with dots: reactant1.reactant2.reagent1
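The dot-separated query convention can be wrapped in a small helper. This is only a sketch: `build_forward_query` is a hypothetical name, not part of LLM4Chem, and the prompt wording simply mirrors the Basic Usage example.

```python
def build_forward_query(components):
    """Join reactant/reagent SMILES with dots and wrap them in <SMILES> tags.

    Hypothetical helper for illustration; not part of LLM4Chem.
    """
    # Drop empty entries and surrounding whitespace before joining
    cleaned = [c.strip() for c in components if c and c.strip()]
    if not cleaned:
        raise ValueError("at least one reactant SMILES is required")
    joined = ".".join(cleaned)
    return (
        f"<SMILES> {joined} </SMILES> "
        "Based on the reactants and reagents given above, suggest a possible product."
    )


query = build_forward_query(["NC1=CC=C2OCOC2=C1", "O=CO"])
print(query)
```

The returned string can be passed straight to `generator.generate(query)`.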

Multiple Reactants

query = """What product is formed from the following reaction?
<SMILES> CC(=O)OC1=CC=CC=C1C(=O)O.[Na+].[OH-] </SMILES>"""

result = generator.generate(query)
print(result[0]['output'][0])

Alternative Query Formats

query = "Predict the product: <SMILES> C1=CC=CC=C1Br.[Mg] </SMILES>"
result = generator.generate(query)

With Validation

Validate the predicted product:
from plan_execute_agent.chem_tools import validate_smiles_rdkit
import re

query = "<SMILES> NC1=CC=CC=C1.O=CO </SMILES> Predict the product."
result = generator.generate(query)

# Extract product SMILES
match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
if match:
    product_smiles = match.group(1)
    validation = validate_smiles_rdkit.invoke({"smiles_string": product_smiles})
    
    if validation['valid']:
        print(f"Valid product: {product_smiles}")
    else:
        print(f"Invalid product: {validation['error_message']}")
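The regular expression above returns only the first tagged span. When a response may contain several `<SMILES>` blocks, a helper that collects all of them can be handy. The following is a minimal sketch; `extract_smiles` is a hypothetical name, and the response string is copied from the Basic Usage output rather than produced by a live model call.

```python
import re

# Same pattern as in the validation example, compiled once for reuse
SMILES_TAG = re.compile(r"<SMILES>\s*(.+?)\s*</SMILES>", re.DOTALL)


def extract_smiles(text):
    """Return every SMILES string wrapped in <SMILES> tags, in order of appearance."""
    return [s.strip() for s in SMILES_TAG.findall(text)]


response = "A possible product can be <SMILES> O=CNC1=CC=C2OCOC2=C1 </SMILES> ."
print(extract_smiles(response))    # ['O=CNC1=CC=C2OCOC2=C1']
print(extract_smiles("no tags"))   # []
```

Returning a list (possibly empty) avoids the `if match:` branch at each call site.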

Retrosynthesis

Given a target molecule, identify possible reactants and reagents needed for its synthesis.

Basic Usage

query = "Identify possible reactants that could have been used to create the specified product. <SMILES> CC1=CC=C(N)N=C1N </SMILES>"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <SMILES> CC(C#N)CCC#N.N </SMILES>

Query Variations

query = "What reactants are needed to synthesize <SMILES> CCO </SMILES>?"
result = generator.generate(query)

Multi-Step Retrosynthesis

For complex molecules, you may need iterative retrosynthesis:
def retrosynthetic_analysis(target_smiles, max_steps=3):
    """Perform multi-step retrosynthetic analysis"""
    steps = []
    current_target = target_smiles
    
    for i in range(max_steps):
        query = f"What reactants are needed for <SMILES> {current_target} </SMILES>?"
        result = generator.generate(query)
        
        # Extract reactants
        match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
        if not match:
            break
            
        reactants = match.group(1)
        steps.append({
            'step': i + 1,
            'target': current_target,
            'reactants': reactants
        })
        
        # For next iteration, analyze the most complex reactant
        # (simplified - just take the first reactant)
        if '.' in reactants:
            current_target = reactants.split('.')[0]
        else:
            break
    
    return steps

# Example usage
target = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
analysis = retrosynthetic_analysis(target)

for step in analysis:
    print(f"\nStep {step['step']}:")
    print(f"Target: {step['target']}")
    print(f"Reactants: {step['reactants']}")
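The loop above always follows the first reactant, although its comment mentions the most complex one. One crude way to approximate that is to pick the component with the most letters. This is a sketch under stated assumptions: `pick_largest_component` is a hypothetical helper, and counting alphabetic characters is only a rough proxy for molecular size (two-letter elements such as Cl or Br count twice). A real implementation would more likely count heavy atoms with RDKit.

```python
def pick_largest_component(reactants):
    """Return the dot-separated SMILES component with the most letters.

    Hypothetical heuristic for illustration: letter count stands in for
    molecular size. Ties resolve to the first component.
    """
    components = [c for c in reactants.split(".") if c]
    if not components:
        raise ValueError("empty reactant string")
    return max(components, key=lambda c: sum(ch.isalpha() for ch in c))


print(pick_largest_component("N.CC(C#N)CCC#N"))  # CC(C#N)CCC#N
```

Inside `retrosynthetic_analysis`, this could replace `reactants.split('.')[0]` when choosing the next target.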

With Agent Integration

The agent automatically validates and formats reaction predictions:
import asyncio
from plan_execute_agent.rdkit_agent import process_input

# Forward synthesis with agent
query = "<SMILES> C1=CC=CC=C1.BrBr </SMILES> Predict the product."

result, completed, attempts, llasmol_response, errors, formatted_input = \
    asyncio.run(process_input(query))

if completed:
    print(f"Product: {result}")
    if errors:
        print(f"Validation warnings: {errors}")
else:
    print(f"Prediction failed: {errors}")
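Unpacking six positional values is easy to get wrong. One option is to give the tuple named fields. The sketch below assumes only what the example above shows, namely that `process_input` returns six values in this order. `AgentResult` and `summarize` are hypothetical names, the field types are guesses, and the sample values are made up for demonstration.

```python
from typing import Any, NamedTuple


class AgentResult(NamedTuple):
    """Named view of the six values unpacked in the example above (types assumed)."""
    result: Any
    completed: bool
    attempts: Any
    llasmol_response: Any
    errors: Any
    formatted_input: Any


def summarize(outcome):
    """Build the same messages the example prints, as a single string."""
    if outcome.completed:
        message = f"Product: {outcome.result}"
        if outcome.errors:
            message += f" (validation warnings: {outcome.errors})"
        return message
    return f"Prediction failed: {outcome.errors}"


# Stand-in values; in practice: AgentResult(*asyncio.run(process_input(query)))
outcome = AgentResult("BrC1=CC=CC=C1", True, 1, "raw model text", [], "formatted query")
print(summarize(outcome))  # Product: BrC1=CC=CC=C1
```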

Reaction SMILES Format

Reaction SMILES follow the pattern: reactants>reagents>products
# Reactants and product only; the middle (reagents) section is empty
"C1=CC=CC=C1.BrBr>>C1=CC=C(Br)C=C1"

When querying ChemAgent:
  • Use only the reactants section in <SMILES> tags
  • Separate multiple components with dots (.)
  • The model will predict the products
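To turn an existing reaction SMILES into a query, the string can be split on `>` and the left-hand sections reused. This is a minimal sketch: `split_reaction_smiles` and `reaction_to_query` are hypothetical helpers, not ChemAgent functions, and the prompt wording is copied from the examples on this page.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>reagents>products' into three lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got {len(parts) - 1}")

    def components(section):
        # An empty section (as in 'A.B>>C') yields an empty list
        return [c for c in section.split(".") if c]

    reactants, reagents, products = (components(p) for p in parts)
    return {"reactants": reactants, "reagents": reagents, "products": products}


def reaction_to_query(rxn):
    """Build a forward-synthesis query from the non-product sections."""
    parsed = split_reaction_smiles(rxn)
    joined = ".".join(parsed["reactants"] + parsed["reagents"])
    return f"<SMILES> {joined} </SMILES> Predict the product."


parsed = split_reaction_smiles("C1=CC=CC=C1.BrBr>>C1=CC=C(Br)C=C1")
print(parsed["reactants"])   # ['C1=CC=CC=C1', 'BrBr']
print(parsed["products"])    # ['C1=CC=C(Br)C=C1']
print(reaction_to_query("C1=CC=CC=C1.BrBr>>C1=CC=C(Br)C=C1"))
```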

Practical Applications

Reaction Planning

def plan_synthesis(target_smiles):
    """Plan a synthesis route for a target molecule"""
    
    # Step 1: Retrosynthesis
    retro_query = f"Identify reactants for <SMILES> {target_smiles} </SMILES>"
    retro_result = generator.generate(retro_query)
    
    # Extract reactants
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', retro_result[0]['output'][0])
    if not match:
        return "No retrosynthesis found"
    
    reactants = match.group(1)
    print(f"Proposed reactants: {reactants}")
    
    # Step 2: Forward synthesis verification
    forward_query = f"<SMILES> {reactants} </SMILES> Predict the product."
    forward_result = generator.generate(forward_query)
    
    # Extract product
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', forward_result[0]['output'][0])
    if match:
        predicted_product = match.group(1)
        print(f"Predicted product: {predicted_product}")
        print(f"Target product: {target_smiles}")
        
        # Ideally, compare canonicalized forms
        return {
            'reactants': reactants,
            'predicted_product': predicted_product,
            'target': target_smiles,
            'match': predicted_product == target_smiles
        }
    
    return "Forward synthesis failed"

# Example
result = plan_synthesis("CC(=O)O")
print(result)
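The `match` field above compares raw strings, so two different spellings of the same molecule count as a mismatch. A sketch of the canonicalized comparison mentioned in the code comment follows. It uses RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles`; the helper names `canonical` and `same_molecule` are hypothetical, and the fallback to plain string comparison when RDKit is missing is an assumption added so the snippet runs anywhere.

```python
try:
    from rdkit import Chem  # used for canonicalization when available
except ImportError:
    Chem = None


def canonical(smiles):
    """Return RDKit's canonical SMILES, or None if RDKit cannot parse the input.

    Without RDKit installed, the stripped input is returned unchanged.
    """
    if Chem is None:
        return smiles.strip()
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def same_molecule(a, b):
    """True when both SMILES parse and share the same canonical form."""
    ca, cb = canonical(a), canonical(b)
    return ca is not None and ca == cb


print(same_molecule("CC(=O)O", "CC(=O)O"))  # True
print(same_molecule("CC(=O)O", "CCO"))      # False
```

With RDKit installed, `same_molecule(predicted_product, target_smiles)` could replace the `==` check in `plan_synthesis`, so that equivalent spellings such as `OC(C)=O` and `CC(=O)O` are treated as the same molecule.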

Finding Similar Reactions

def find_similar_reactions(query_reactants, reaction_database):
    """Find similar reactions in a database"""
    
    # Predict product for query
    query = f"<SMILES> {query_reactants} </SMILES> Predict the product."
    result = generator.generate(query)
    
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
    if not match:
        return []
    
    query_product = match.group(1)
    
    # Search database for similar transformations
    similar = []
    for rxn in reaction_database:
        if rxn['product'] == query_product:
            similar.append(rxn)
    
    return similar

Common Reaction Types

# Nucleophilic substitution
query = "<SMILES> CCBr.[Na+].[OH-] </SMILES> Predict the product."
result = generator.generate(query)
# Expected: CCO (ethanol)

# Addition to alkene
query = "<SMILES> C=C.Br </SMILES> What is the product?"
result = generator.generate(query)
# Expected: CCBr (bromoethane)

# Dehydration
query = "<SMILES> CCO.O=S(=O)(O)O </SMILES> Predict the product."
result = generator.generate(query)
# Expected: C=C (ethene) or CCOCC (diethyl ether)

# Esterification
query = "<SMILES> CCO.CC(=O)O </SMILES> What product forms?"
result = generator.generate(query)
# Expected: CCOC(=O)C (ethyl acetate)

Best Practices

  1. Always validate - Use validate_smiles_rdkit on both reactants and products
  2. Canonicalize - Ensure SMILES are in canonical form for consistent predictions
  3. Verify stoichiometry - Check that products make chemical sense
  4. Consider alternatives - Some reactions have multiple possible products
  5. Use context - Include relevant reagents and conditions when known

Limitations

  • Predictions are based on learned patterns, not mechanistic chemistry
  • May not account for reaction conditions (temperature, pressure, solvents)
  • Multiple products or regioselectivity may not be fully captured
  • Novel reactions outside training data may not be predicted accurately
