ChemAgent supports molecule-to-text and text-to-molecule tasks: it can describe a molecular structure given as SMILES, or propose a SMILES structure from a natural language description.

Overview

Two core operations are supported:

Molecule Captioning

Generate human-readable descriptions from SMILES structures

Molecule Generation

Create SMILES structures from text descriptions

Molecule Captioning

Convert a SMILES representation into a detailed natural language description of the molecule’s properties, structure, and potential uses.

Basic Usage

from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Describe this molecule: <SMILES> CCOC(=O)C1=CN=CN1[C@H](C)C1=CC=CC=C1 </SMILES>"
result = generator.generate(query)
print(result[0]['output'][0])
Output:
The molecule is an imidazole derivative with short-acting sedative, hypnotic, 
and general anesthetic properties. Etomidate appears to have gamma-aminobutyric 
acid (GABA) like effects, mediated through GABA-A receptor. The action enhances 
the inhibitory effect of GABA on the central nervous system by causing chloride 
channel opening events which leads to membrane hyperpolarization.

Alternative Phrasings

query = "What does this molecule do? <SMILES> CCO </SMILES>"
result = generator.generate(query)
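
The phrasing can change, but both examples wrap the input in the same `<SMILES> ... </SMILES>` tags, so the query string can be built by a small helper. This is a minimal sketch; `caption_query` and its `question` parameter are hypothetical names for illustration, not part of LLM4Chem.

```python
def caption_query(smiles: str, question: str = "Describe this molecule:") -> str:
    """Wrap a SMILES string in the <SMILES> tags used by the captioning examples."""
    smiles = smiles.strip()
    if not smiles:
        raise ValueError("SMILES string must not be empty")
    return f"{question} <SMILES> {smiles} </SMILES>"


print(caption_query("CCO"))
print(caption_query("CCO", "What does this molecule do?"))
```

The returned string can be passed to `generator.generate` as in the examples above.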

Batch Descriptions

Generate descriptions for multiple molecules:
smiles_list = [
    "CCO",  # Ethanol
    "C1=CC=CC=C1",  # Benzene
    "CC(=O)O"  # Acetic acid
]

for smiles in smiles_list:
    query = f"Describe this molecule: <SMILES> {smiles} </SMILES>"
    result = generator.generate(query)
    print(f"\n{smiles}:")
    print(result[0]['output'][0])
Molecule captioning is particularly useful for:
  • Generating dataset annotations
  • Creating chemical documentation
  • Understanding drug mechanisms
  • Educational purposes
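
For larger batches, such as dataset annotation, it can help to collect captions in a dictionary and keep going when a single molecule fails. The sketch below assumes only that the generation function returns the `result[0]['output'][0]` shape shown above; `caption_batch` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in so the example runs without loading the model.

```python
from typing import Callable, Dict, Iterable


def caption_batch(generate: Callable[[str], list], smiles_list: Iterable[str]) -> Dict[str, str]:
    """Caption each SMILES string; record failures instead of aborting the batch."""
    captions = {}
    for smiles in smiles_list:
        query = f"Describe this molecule: <SMILES> {smiles} </SMILES>"
        try:
            captions[smiles] = generate(query)[0]["output"][0]
        except Exception as exc:  # one bad molecule should not stop the run
            captions[smiles] = f"ERROR: {exc}"
    return captions


# Stand-in for generator.generate (assumption: same return shape as the real call).
def fake_generate(query: str) -> list:
    return [{"output": [f"caption for {query}"]}]


results = caption_batch(fake_generate, ["CCO", "CC(=O)O"])
for smiles, caption in results.items():
    print(smiles, "->", caption)
```

With the real model, pass `generator.generate` in place of `fake_generate`.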

Molecule Generation

Create novel molecular structures from natural language descriptions. The model generates a SMILES representation intended to match the specified properties.

Basic Usage

query = """Give me a molecule that satisfies the conditions outlined in the description: 
The molecule is a member of the class of tripyrroles that is a red-coloured pigment 
with antibiotic properties produced by Serratia marcescens. It has a role as an 
antimicrobial agent, a biological pigment, a bacterial metabolite, an apoptosis 
inducer and an antineoplastic agent. It is a tripyrrole, an aromatic ether and 
a ring assembly."""

result = generator.generate(query)
print(result[0]['output'][0])
# Output: Here is a potential molecule: <SMILES> CCCCCC1=C(C)NC(/C=C2\N=C(C3=CC=CN3)C=C2OC)=C1 </SMILES>
Unlike tasks that take a SMILES input wrapped in <SMILES> tags, molecule generation needs no tags around the input description. Provide the natural language description directly.

Generating by Properties

query = """Generate a molecule with the following properties:
- Antimicrobial activity
- Ability to cross blood-brain barrier
- Low toxicity
- Water soluble
"""
result = generator.generate(query)
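
When the property list is assembled programmatically, the same prompt can be built from a Python list. This is a minimal sketch that reproduces the prompt layout above; `properties_query` is a hypothetical helper name, not part of the library.

```python
from typing import Iterable


def properties_query(properties: Iterable[str]) -> str:
    """Build a bulleted property prompt in the same layout as the example above."""
    lines = ["Generate a molecule with the following properties:"]
    # Skip blank entries so the prompt never contains an empty bullet.
    lines += [f"- {prop.strip()}" for prop in properties if prop.strip()]
    return "\n".join(lines) + "\n"


query = properties_query([
    "Antimicrobial activity",
    "Ability to cross blood-brain barrier",
    "Low toxicity",
    "Water soluble",
])
print(query)
```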

Validating Generated Molecules

Always check that a generated SMILES string parses as a valid molecule:
import re

from plan_execute_agent.chem_tools import validate_smiles_rdkit

query = "Generate a simple aromatic compound"
result = generator.generate(query)

# Extract the SMILES string from between the <SMILES> tags
match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', result[0]['output'][0])
if match:
    smiles = match.group(1)
    validation = validate_smiles_rdkit.invoke({"smiles_string": smiles})

    if validation['valid']:
        print(f"Valid molecule generated: {smiles}")
    else:
        print(f"Invalid SMILES: {validation['error_message']}")
else:
    # The model may answer without tags; report it instead of failing silently
    print("No <SMILES> tag found in the model output")
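
`re.search` returns only the first tagged structure, and the model output may contain several tagged structures or none. The sketch below collects every tagged SMILES string so each one can be validated in turn; `extract_smiles` is a hypothetical helper name, and the sample string is the generation output shown earlier on this page.

```python
import re

# Same tag pattern as the example above, compiled once for reuse.
SMILES_TAG = re.compile(r"<SMILES>\s*(.+?)\s*</SMILES>", re.DOTALL)


def extract_smiles(text: str) -> list:
    """Return every SMILES string found between <SMILES> tags, in order."""
    return [found.strip() for found in SMILES_TAG.findall(text)]


# Raw string: the SMILES contains a backslash (bond stereo marker).
sample = r"Here is a potential molecule: <SMILES> CCCCCC1=C(C)NC(/C=C2\N=C(C3=CC=CN3)C=C2OC)=C1 </SMILES>"
print(extract_smiles(sample))
print(extract_smiles("no tagged structure here"))
```

An empty list signals that the output contained no tagged structure, which is worth treating as a failed generation rather than skipping silently.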

With Agent Integration

Use the agent for automatic validation:
import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Generate a molecule that is an NMDA receptor antagonist"

result, completed, attempts, llasmol_response, errors, formatted_input = \
    asyncio.run(process_input(query))

if completed and not errors:
    print(f"Generated and validated: {result}")
else:
    print(f"Generation issues: {errors}")
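
The six-element tuple is easy to unpack in the wrong order. One option is to wrap it in a named tuple whose field names follow the unpacking shown above. This is a sketch under assumptions: `AgentResult` and its `ok` property are hypothetical, the field types are left as `Any` because this page does not state them, and `fake_process_input` is a stand-in so the example runs without the agent.

```python
import asyncio
from typing import Any, NamedTuple


class AgentResult(NamedTuple):
    """Named view of the tuple returned by process_input (field order as above)."""
    result: Any
    completed: Any
    attempts: Any
    llasmol_response: Any
    errors: Any
    formatted_input: Any

    @property
    def ok(self) -> bool:
        # Same success check as the example above: completed and no errors.
        return bool(self.completed) and not self.errors


# Stand-in for process_input (assumption: the real coroutine returns six values).
async def fake_process_input(query: str):
    return ("CCO", True, 1, "<SMILES> CCO </SMILES>", [], query)


outcome = AgentResult(*asyncio.run(fake_process_input("Generate ethanol")))
if outcome.ok:
    print(f"Generated and validated: {outcome.result}")
else:
    print(f"Generation issues: {outcome.errors}")
```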

Combining Caption and Generation

Create a description-generation-validation workflow:
# Step 1: Start with a reference molecule
reference_smiles = "CC(C)Cl"

# Step 2: Get its description
caption_query = f"Describe this molecule: <SMILES> {reference_smiles} </SMILES>"
description = generator.generate(caption_query)[0]['output'][0]

print(f"Original description:\n{description}\n")

# Step 3: Modify the description
modified_desc = description + " Additionally, the molecule should have a hydroxyl group."

# Step 4: Generate a new molecule
gen_query = f"Generate a molecule: {modified_desc}"
new_molecule = generator.generate(gen_query)[0]['output'][0]

print(f"New molecule:\n{new_molecule}\n")

# Step 5: Verify the new molecule
import re
match = re.search(r'<SMILES>\s*(.+?)\s*</SMILES>', new_molecule)
if match:
    new_smiles = match.group(1)
    verify_query = f"Describe this molecule: <SMILES> {new_smiles} </SMILES>"
    verification = generator.generate(verify_query)[0]['output'][0]
    print(f"Verification:\n{verification}")
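
To compare the verification text with the modified description without reading both by eye, a crude word-overlap score can serve as a first signal. The sketch below uses Jaccard similarity over lowercase word sets. It is a rough heuristic chosen for illustration, not a method used by ChemAgent, and `token_overlap` is a hypothetical name.

```python
import re


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two texts (0.0 to 1.0)."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0  # nothing to compare
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(token_overlap("The molecule is an alcohol.", "The molecule is an ether."))
```

A low score suggests the new molecule drifted from the intended description and deserves a closer look; a high score does not prove the chemistry matches.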

Advanced Patterns

Structure-Activity Relationships

Explore SAR by generating molecular variants:
base_description = "A molecule that inhibits HIV replication"

variants = [
    f"{base_description} with high water solubility",
    f"{base_description} with improved lipophilicity",
    f"{base_description} with reduced toxicity"
]

for i, variant in enumerate(variants, 1):
    result = generator.generate(f"Generate a molecule: {variant}")
    print(f"\nVariant {i}:")
    print(result[0]['output'][0])

Scaffold Hopping

Generate molecules with similar properties but different cores:
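# Get a description of the original molecule
original = "C1=CC=C(C=C1)O"  # Phenol
caption = generator.generate(f"Describe: <SMILES> {original} </SMILES>")
# Drop a trailing period so the combined prompt does not contain ".."
description = caption[0]['output'][0].rstrip(". ")

# Ask for an alternative scaffold with similar properties
query = f"{description}. Generate a molecule with a different core structure but similar properties."
result = generator.generate(query)
print(result[0]['output'][0])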

Use Cases

Drug Discovery

Generate lead compounds with desired properties

Chemical Space Exploration

Discover novel structures in unexplored regions

Property Optimization

Modify molecules to improve specific characteristics

Documentation

Auto-generate descriptions for chemical databases

Best Practices

For captioning

  • Provide valid, canonicalized SMILES
  • Use the <SMILES> tags consistently
  • Consider asking for specific aspects (e.g., “mechanism of action”, “structural features”)

For generation

  • Be specific in descriptions
  • Include both structural and functional requirements
  • Validate all generated molecules
  • Use multiple generations and select the best candidate

For validation

  • Always validate generated SMILES with validate_smiles_rdkit
  • Verify that generated molecules match the description
  • Check for unwanted properties (toxicity, instability)
  • Use property prediction to confirm characteristics
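
The advice to use multiple generations and select the best candidate can be sketched as a small loop: run the query several times, keep the outputs that pass validation, and rank them with a scoring function. Everything here is an assumption for illustration: `best_candidate`, `is_valid`, and `score` are hypothetical names, `fake_generate` is a stand-in for `generator.generate`, and whether repeated calls to the real model return different outputs depends on its sampling settings, which this page does not describe.

```python
import re
from typing import Callable, Optional


def best_candidate(
    generate: Callable[[str], list],
    query: str,
    is_valid: Callable[[str], bool],
    score: Callable[[str], float],
    n: int = 5,
) -> Optional[str]:
    """Run the query n times and return the highest-scoring valid SMILES, or None."""
    candidates = []
    for _ in range(n):
        text = generate(query)[0]["output"][0]
        match = re.search(r"<SMILES>\s*(.+?)\s*</SMILES>", text)
        if match and is_valid(match.group(1)):
            candidates.append(match.group(1))
    return max(candidates, key=score) if candidates else None


# Stand-in for generator.generate that yields three canned outputs.
_outputs = iter(["<SMILES> C </SMILES>", "no tags here", "<SMILES> CCO </SMILES>"])


def fake_generate(query: str) -> list:
    return [{"output": [next(_outputs)]}]


# Toy validity and scoring functions; a real run would validate with
# validate_smiles_rdkit and score with a property predictor.
pick = best_candidate(fake_generate, "Generate an alcohol", is_valid=lambda s: True, score=len, n=3)
print(pick)
```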
