Chemical Reactions

ChemAgent can predict the products of chemical reactions (forward synthesis) and suggest reactants needed to synthesize a target molecule (retrosynthesis).

Overview

Forward Synthesis: predict products from reactants and reagents.

Retrosynthesis: identify the reactants needed for a target product.

Forward Synthesis

Given reactants and reagents, predict the resulting product(s).

Basic Usage

from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "<SMILES> NC1=CC=C2OCOC2=C1.O=CO </SMILES> Based on the reactants and reagents given above, suggest a possible product."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: A possible product can be <SMILES> O=CNC1=CC=C2OCOC2=C1 </SMILES> .

Reaction SMILES use the format: reactants.reagents>conditions>product

For queries, separate multiple reactants/reagents with dots: reactant1.reactant2.reagent1
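The dot-separated input described above can also be assembled programmatically. The following is a minimal sketch: `build_forward_query` and its default instruction text are illustrative names chosen here, not part of ChemAgent or LLM4Chem.

```python
def build_forward_query(
    components,
    instruction="Based on the reactants and reagents given above, suggest a possible product.",
):
    """Join reactant/reagent SMILES with dots and wrap them in <SMILES> tags.

    Hypothetical helper for illustration; not part of the ChemAgent API.
    """
    # Drop empty entries and surrounding whitespace so no stray dots appear.
    cleaned = [c.strip() for c in components if c and c.strip()]
    if not cleaned:
        raise ValueError("at least one reactant SMILES is required")
    return f"<SMILES> {'.'.join(cleaned)} </SMILES> {instruction}"


query = build_forward_query(["NC1=CC=C2OCOC2=C1", "O=CO"])
print(query)
```

The resulting string can be passed to generator.generate in the same way as the hand-written query in Basic Usage.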

Multiple Reactants

query = """What product is formed from the following reaction?
<SMILES> CC(=O)OC1=CC=CC=C1C(=O)O.[Na+].[OH-] </SMILES>"""

result = generator.generate(query)
print(result[0]['output'][0])

Alternative Query Formats

query = "Predict the product: <SMILES> C1=CC=CC=C1Br.[Mg] </SMILES>"
result = generator.generate(query)

With Validation

Validate the predicted product:

from plan_execute_agent.chem_tools import validate_smiles_rdkit
import re

query = "<SMILES> NC1=CC=CC=C1.O=CO </SMILES> Predict the product."
result = generator.generate(query)

# Extract product SMILES
match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
if match:
    product_smiles = match.group(1)
    validation = validate_smiles_rdkit.invoke({"smiles_string": product_smiles})
    
    if validation['valid']:
        print(f"Valid product: {product_smiles}")
    else:
        print(f"Invalid product: {validation['error_message']}")
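A response may contain more than one tagged SMILES string, and a single string may hold several dot-separated molecules. The sketch below, which uses only the standard library, collects every tagged span and splits it into components so each one can be validated individually. The helper names `extract_smiles` and `split_components` are assumptions made for this example.

```python
import re

# Same tag pattern as in the validation example, compiled once for reuse.
SMILES_TAG = re.compile(r"<SMILES>\s*(.+?)\s*</SMILES>", re.DOTALL)


def extract_smiles(text):
    """Return every SMILES string found between <SMILES> tags, in order."""
    return [m.strip() for m in SMILES_TAG.findall(text)]


def split_components(smiles):
    """Split a dot-separated SMILES string into its individual molecules."""
    return [part for part in smiles.split(".") if part]


response = "A possible product can be <SMILES> O=CNC1=CC=C2OCOC2=C1 </SMILES> ."
for smiles in extract_smiles(response):
    print(split_components(smiles))
```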

Retrosynthesis

Given a target molecule, identify possible reactants and reagents needed for its synthesis.

Basic Usage

query = "Identify possible reactants that could have been used to create the specified product. <SMILES> CC1=CC=C(N)N=C1N </SMILES>"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <SMILES> CC(C#N)CCC#N.N </SMILES>

Query Variations

Standard

query = "What reactants are needed to synthesize <SMILES> CCO </SMILES>?"
result = generator.generate(query)

Formal

query = "Suggest a retrosynthetic route for <SMILES> CC(=O)O </SMILES>"
result = generator.generate(query)

Detailed

query = """Propose reactants and reagents for synthesizing the target:
<SMILES> C1=CC=C(C=C1)O </SMILES>"""
result = generator.generate(query)
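To compare how the phrasings behave for one target, the three variations can be run in a loop. This is a sketch under stated assumptions: `ask_with_variations` and `RETRO_TEMPLATES` are names invented here, and `fake_generate` is a stand-in that mimics the return shape of generator.generate so the example runs without loading the model. Its answer is a placeholder, not a real prediction.

```python
RETRO_TEMPLATES = [
    "What reactants are needed to synthesize <SMILES> {smiles} </SMILES>?",
    "Suggest a retrosynthetic route for <SMILES> {smiles} </SMILES>",
    "Propose reactants and reagents for synthesizing the target:\n<SMILES> {smiles} </SMILES>",
]


def ask_with_variations(generate, target_smiles, templates=RETRO_TEMPLATES):
    """Send one query per template and return (query, raw_answer) pairs.

    `generate` is any callable shaped like generator.generate: it takes a
    query string and returns a list whose first item holds the answer text
    under ['output'][0].
    """
    answers = []
    for template in templates:
        query = template.format(smiles=target_smiles)
        result = generate(query)
        answers.append((query, result[0]["output"][0]))
    return answers


def fake_generate(query):
    """Placeholder for generator.generate; always returns the same text."""
    return [{"output": ["<SMILES> C=C.O </SMILES>"]}]


for query, answer in ask_with_variations(fake_generate, "CCO"):
    print(query, "->", answer)
```

In real use, pass generator.generate in place of `fake_generate`.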

Multi-Step Retrosynthesis

For complex molecules, you may need iterative retrosynthesis:

import re

def retrosynthetic_analysis(target_smiles, max_steps=3):
    """Perform multi-step retrosynthetic analysis"""
    steps = []
    current_target = target_smiles
    
    for i in range(max_steps):
        query = f"What reactants are needed for <SMILES> {current_target} </SMILES>?"
        result = generator.generate(query)
        
        # Extract reactants
        match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
        if not match:
            break
            
        reactants = match.group(1)
        steps.append({
            'step': i + 1,
            'target': current_target,
            'reactants': reactants
        })
        
        # For next iteration, analyze the most complex reactant
        # (simplified - just take the first reactant)
        if '.' in reactants:
            current_target = reactants.split('.')[0]
        else:
            break
    
    return steps

# Example usage
target = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
analysis = retrosynthetic_analysis(target)

for step in analysis:
    print(f"\nStep {step['step']}:")
    print(f"Target: {step['target']}")
    print(f"Reactants: {step['reactants']}")
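The loop above simply follows the first reactant. One possible refinement is to follow the largest one instead. The sketch below uses a crude size estimate (counting atom symbols in the SMILES string) as a proxy for complexity. Both `heavy_atom_estimate` and `pick_most_complex` are hypothetical helpers, and the count is only approximate: bracket atoms and explicit hydrogens are not handled precisely.

```python
import re

# Rough atom-symbol matcher: two-letter halogens first, then any uppercase
# element letter, then common aromatic lowercase atoms.
ATOM_PATTERN = re.compile(r"Cl|Br|[A-Z]|[cnops]")


def heavy_atom_estimate(smiles):
    """Approximate the number of atoms by counting atom symbols."""
    return len(ATOM_PATTERN.findall(smiles))


def pick_most_complex(reactants):
    """Return the dot-separated component with the most atoms, or None."""
    components = [c for c in reactants.split(".") if c]
    if not components:
        return None
    # max() keeps the first component when several tie.
    return max(components, key=heavy_atom_estimate)


print(pick_most_complex("N.CC(C#N)CCC#N"))
```

Inside retrosynthetic_analysis, this could replace `reactants.split('.')[0]` when choosing the next target.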

With Agent Integration

The agent automatically validates and formats reaction predictions:

import asyncio
from plan_execute_agent.rdkit_agent import process_input

# Forward synthesis with agent
query = "<SMILES> C1=CC=CC=C1.BrBr </SMILES> Predict the product."

result, completed, attempts, llasmol_response, errors, formatted_input = \
    asyncio.run(process_input(query))

if completed:
    print(f"Product: {result}")
    if errors:
        print(f"Validation warnings: {errors}")
else:
    print(f"Prediction failed: {errors}")
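Unpacking six positional values is easy to get wrong, so it can help to give them names. The sketch below wraps the tuple in a NamedTuple whose field names mirror the variables in the example above. `AgentResult` and `summarize` are hypothetical, the field types are not documented here and are left as Any, and the sample tuples are placeholders rather than real agent output.

```python
from typing import Any, NamedTuple


class AgentResult(NamedTuple):
    """Named view over the six values unpacked from process_input (assumed order)."""
    result: Any
    completed: bool
    attempts: Any
    llasmol_response: Any
    errors: Any
    formatted_input: Any


def summarize(outcome):
    """Turn an AgentResult into a one-line status message."""
    if not outcome.completed:
        return f"Prediction failed: {outcome.errors}"
    if outcome.errors:
        return f"Product: {outcome.result} (warnings: {outcome.errors})"
    return f"Product: {outcome.result}"


# Placeholder values standing in for asyncio.run(process_input(query)).
ok = AgentResult("BrC1=CC=CC=C1", True, 1, "raw model text", None, "formatted query")
print(summarize(ok))
```

With the real agent, the call would be `AgentResult(*asyncio.run(process_input(query)))`, assuming the tuple order shown in the example above.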

Reaction SMILES Format

Reaction SMILES follow the pattern: reactants>conditions>product

# Reactants and product, no conditions
"C1=CC=CC=C1.BrBr>>C1=CC=C(Br)C=C1"

When querying ChemAgent:

Use only the reactants section in <SMILES> tags
Separate multiple components with dots (.)
The model will predict the products
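The rules above can be sketched in code: split a reaction SMILES on `>` and keep only the reactants section for the query. `parse_reaction_smiles` and `to_query` are illustrative helpers written for this example, not part of ChemAgent.

```python
def parse_reaction_smiles(rxn):
    """Split 'reactants>conditions>product' into three lists of SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("reaction SMILES must contain exactly two '>' separators")

    def components(section):
        # Each section is a dot-separated list; an empty section gives [].
        return [c for c in section.split(".") if c]

    reactants, conditions, products = (components(p) for p in parts)
    return {"reactants": reactants, "conditions": conditions, "products": products}


def to_query(rxn, instruction="Predict the product."):
    """Build a forward-synthesis query from the reactants section only."""
    reactants = parse_reaction_smiles(rxn)["reactants"]
    return f"<SMILES> {'.'.join(reactants)} </SMILES> {instruction}"


print(to_query("C1=CC=CC=C1.BrBr>>C1=CC=C(Br)C=C1"))
```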

Practical Applications

Reaction Planning

def plan_synthesis(target_smiles):
    """Plan a synthesis route for a target molecule"""
    
    # Step 1: Retrosynthesis
    retro_query = f"Identify reactants for <SMILES> {target_smiles} </SMILES>"
    retro_result = generator.generate(retro_query)
    
    # Extract reactants
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', retro_result[0]['output'][0])
    if not match:
        return "No retrosynthesis found"
    
    reactants = match.group(1)
    print(f"Proposed reactants: {reactants}")
    
    # Step 2: Forward synthesis verification
    forward_query = f"<SMILES> {reactants} </SMILES> Predict the product."
    forward_result = generator.generate(forward_query)
    
    # Extract product
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', forward_result[0]['output'][0])
    if match:
        predicted_product = match.group(1)
        print(f"Predicted product: {predicted_product}")
        print(f"Target product: {target_smiles}")
        
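        # Note: 'match' below is a raw string comparison, so equivalent SMILES
        # written differently will not match; ideally, compare canonicalized forms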
        return {
            'reactants': reactants,
            'predicted_product': predicted_product,
            'target': target_smiles,
            'match': predicted_product == target_smiles
        }
    
    return "Forward synthesis failed"

# Example
result = plan_synthesis("CC(=O)O")
print(result)
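The same molecule can be written as several different SMILES strings, so comparing raw strings may report a mismatch for a correct round trip. A minimal sketch of a canonical comparison follows. It assumes RDKit is installed and falls back to plain string comparison when it is not. `canonical` and `same_molecule` are names chosen for this example.

```python
try:
    from rdkit import Chem  # optional dependency
except ImportError:  # fall back to string comparison without RDKit
    Chem = None


def canonical(smiles):
    """Return RDKit's canonical SMILES, or None if the string cannot be parsed.

    Without RDKit, the stripped input is returned unchanged.
    """
    smiles = smiles.strip()
    if Chem is None:
        return smiles
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def same_molecule(a, b):
    """True when both SMILES parse and share the same canonical form."""
    ca, cb = canonical(a), canonical(b)
    return ca is not None and ca == cb


print(same_molecule("CC(=O)O", "CC(=O)O"))
```

In plan_synthesis, `same_molecule(predicted_product, target_smiles)` could replace the `==` check.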

Reaction Database Search

def find_similar_reactions(query_reactants, reaction_database):
    """Find similar reactions in a database"""
    
    # Predict product for query
    query = f"<SMILES> {query_reactants} </SMILES> Predict the product."
    result = generator.generate(query)
    
    match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
    if not match:
        return []
    
    query_product = match.group(1)
    
    # Search database for similar transformations
    similar = []
    for rxn in reaction_database:
        if rxn['product'] == query_product:
            similar.append(rxn)
    
    return similar
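find_similar_reactions scans the whole database on every call and matches on exact product strings. When many lookups are expected, the database can be indexed by product once. The sketch below assumes the same record shape as above (dicts with a 'product' key). `index_by_product` and the sample records are illustrative only.

```python
from collections import defaultdict


def index_by_product(reaction_database):
    """Group reaction records by their exact product SMILES string."""
    index = defaultdict(list)
    for rxn in reaction_database:
        index[rxn["product"]].append(rxn)
    return index


# Illustrative records, not taken from a real database.
reaction_database = [
    {"reactants": "CCO.CC(=O)O", "product": "CCOC(=O)C"},
    {"reactants": "CCBr.[Na+].[OH-]", "product": "CCO"},
    {"reactants": "C=C.O", "product": "CCO"},
]

index = index_by_product(reaction_database)
for rxn in index.get("CCO", []):
    print(rxn["reactants"])
```

Because the lookup is still an exact string match, storing and querying canonical SMILES makes it more reliable.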

Common Reaction Types

Substitution Reactions

# Nucleophilic substitution
query = "<SMILES> CCBr.[Na+].[OH-] </SMILES> Predict the product."
result = generator.generate(query)
# Expected: CCO (ethanol)

Addition Reactions

# Addition to alkene
query = "<SMILES> C=C.Br </SMILES> What is the product?"
result = generator.generate(query)
# Expected: CCBr (bromoethane)

Elimination Reactions

# Dehydration
query = "<SMILES> CCO.O=S(=O)(O)O </SMILES> Predict the product."
result = generator.generate(query)
# Expected: C=C (ethene) or CCOCC (diethyl ether)

Condensation Reactions

# Esterification
query = "<SMILES> CCO.CC(=O)O </SMILES> What product forms?"
result = generator.generate(query)
# Expected: CCOC(=O)C (ethyl acetate)

Best Practices

Always validate - Use validate_smiles_rdkit on both reactants and products
Canonicalize - Ensure SMILES are in canonical form for consistent predictions
Verify stoichiometry - Check that products make chemical sense
Consider alternatives - Some reactions have multiple possible products
Use context - Include relevant reagents and conditions when known
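Before validating chemistry, a cheap formatting check on the query text can catch common slips. The sketch below is one possible pre-flight check, assuming each query carries exactly one <SMILES> span as in the examples in this guide. `check_query` is a hypothetical helper and does not replace validate_smiles_rdkit.

```python
import re


def check_query(query):
    """Return a list of formatting problems found in a query (empty if none)."""
    problems = []
    spans = re.findall(r"<SMILES>(.*?)</SMILES>", query, flags=re.DOTALL)
    if len(spans) != 1:
        problems.append(
            f"expected exactly one <SMILES>...</SMILES> span, found {len(spans)}"
        )
        return problems

    body = spans[0].strip()
    if not body:
        problems.append("SMILES span is empty")
    elif any(not part for part in body.split(".")):
        problems.append("empty component (leading, trailing, or doubled dot)")
    if re.search(r"\s", body):
        problems.append("whitespace inside the SMILES string")
    return problems


print(check_query("<SMILES> CCO.CC(=O)O </SMILES> What product forms?"))
```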

Limitations

Predictions are based on learned patterns, not mechanistic chemistry
May not account for reaction conditions (temperature, pressure, solvents)
Multiple products or regioselectivity may not be fully captured
Novel reactions outside training data may not be predicted accurately
