ChemAgent converts between chemical representations: IUPAC names and SMILES notation in both directions, and either of these into a molecular formula.

Overview

The name conversion tasks handle:
  • IUPAC ↔ SMILES conversions
  • SMILES → Molecular Formula conversions
  • IUPAC → Molecular Formula conversions
All SMILES strings are automatically canonicalized to ensure consistency.

IUPAC to SMILES

Convert IUPAC chemical names to SMILES notation.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> .
Always wrap IUPAC names in <IUPAC> ... </IUPAC> tags for proper processing.
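
Because the tags are easy to forget or mistype, it can help to build queries through a small helper. The sketch below is illustrative only: `wrap_tag` and `iupac_to_smiles_query` are hypothetical names, not part of LLM4Chem, and the prompt wording simply mirrors the example above.

```python
def wrap_tag(value: str, tag: str) -> str:
    """Wrap a value as '<TAG> value </TAG>', padded with single spaces as in the examples."""
    return f"<{tag}> {value.strip()} </{tag}>"


def iupac_to_smiles_query(name: str) -> str:
    """Build an IUPAC-to-SMILES prompt (hypothetical helper, wording taken from the example)."""
    return f"Could you provide the SMILES for {wrap_tag(name, 'IUPAC')}?"


# The resulting string can be passed to generator.generate() as shown above.
print(iupac_to_smiles_query("4-ethyl-4-methyloxolan-2-one"))
```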

SMILES to IUPAC

Translate SMILES notation into systematic IUPAC names.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Translate the given SMILES formula of a molecule <SMILES> CCC(C)C1CNCCCNC1 </SMILES> into its IUPAC name."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <IUPAC> 3-butan-2-yl-1,5-diazocane </IUPAC>

SMILES to Molecular Formula

Determine the molecular formula from a SMILES string.
query = "Given the SMILES representation <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>, what would be its molecular formula?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: It is <MOLFORMULA> C7H15Cl2N2OPS </MOLFORMULA> .

IUPAC to Molecular Formula

Extract molecular formulas directly from IUPAC names.
query = "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <MOLFORMULA> C15H11NO </MOLFORMULA>
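
When comparing a returned formula against a reference, a plain string comparison can fail on element ordering alone. The following is a minimal sketch, using only the standard library, that parses simple formulas into element counts. `parse_formula` and `same_formula` are hypothetical helpers, and the parser assumes flat formulas with no parentheses, charges, or hydrate notation.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter) and an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C15H11NO' into element counts."""
    counts = Counter()
    for symbol, number in _ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def same_formula(a: str, b: str) -> bool:
    """Compare two formulas by composition, ignoring element order."""
    return parse_formula(a) == parse_formula(b)


print(parse_formula("C7H15Cl2N2OPS"))
print(same_formula("C15H11NO", "C15H11ON"))
```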

Automatic Canonicalization

ChemAgent automatically canonicalizes SMILES strings to ensure consistent representations.

How It Works

The canonicalization process (LLM4Chem/utils/smiles_canonicalization.py:64):
  1. Parses the SMILES string using RDKit
  2. Removes atom mapping numbers
  3. Standardizes stereochemistry
  4. Applies Kekulization (optional)
  5. Generates canonical SMILES
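
The steps above correspond roughly to the RDKit calls below. This is a minimal sketch that assumes RDKit is installed. `canonicalize_smiles_sketch` is a hypothetical name, not the function defined in LLM4Chem/utils/smiles_canonicalization.py, and the library's actual behavior and defaults may differ. In particular, how the library standardizes stereochemistry is not shown; the sketch only exposes RDKit's `isomericSmiles` switch.

```python
from typing import Optional

from rdkit import Chem


def canonicalize_smiles_sketch(
    smiles: str, kekulize: bool = False, isomeric: bool = True
) -> Optional[str]:
    """Return a canonical SMILES string, or None if the input cannot be parsed."""
    # Step 1: parse the SMILES string; RDKit returns None on failure.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Step 2: remove atom mapping numbers (e.g. the ':1' in '[CH3:1]').
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)

    # Step 4 (optional): convert aromatic bonds to explicit single/double bonds.
    if kekulize:
        Chem.Kekulize(mol, clearAromaticFlags=True)

    # Steps 3 and 5: write canonical SMILES, keeping or dropping stereo information.
    return Chem.MolToSmiles(
        mol, isomericSmiles=isomeric, kekuleSmiles=kekulize, canonical=True
    )


print(canonicalize_smiles_sketch("OCC"))
print(canonicalize_smiles_sketch("[CH3:1][OH:2]"))
```

Returning None for unparseable input, rather than raising, is a design choice of this sketch that keeps batch processing simple.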
from LLM4Chem.generation import LlaSMolGeneration

# Input SMILES (any valid form is accepted; it need not be canonical)
query = "What is the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"

# The SMILES is automatically canonicalized before processing
generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
result = generator.generate(query, canonicalize_smiles=True)
Canonicalization can be disabled by setting canonicalize_smiles=False in the generate() method, but this is not recommended for most use cases.

Tag Format

Input Tags

  • <SMILES> ... </SMILES>
  • <IUPAC> ... </IUPAC>

Output Tags

  • <MOLFORMULA> ... </MOLFORMULA>
  • <SMILES> ... </SMILES>
  • <IUPAC> ... </IUPAC>
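
Since answers arrive as free text with the value wrapped in tags, a small regular-expression helper can pull the values out. This is a sketch using only the standard library. `extract_tagged` is a hypothetical helper, not a ChemAgent function.

```python
import re
from typing import List


def extract_tagged(text: str, tag: str) -> List[str]:
    """Return every value wrapped in <TAG> ... </TAG>, with surrounding whitespace removed."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>\s*(.*?)\s*</{name}>", re.DOTALL)
    return pattern.findall(text)


reply = "Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> ."
print(extract_tagged(reply, "SMILES"))
```

An empty list signals that the expected tag was absent, which is easier to check than catching an IndexError from chained `split()` calls.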

Auto-Processing

  • SMILES canonicalization
  • Tag extraction
  • Validation

Common Patterns

Multiple Conversions

queries = [
    "Convert <IUPAC> ethanol </IUPAC> to SMILES",
    "What is the molecular formula of <SMILES> CCO </SMILES>?",
    "Give me the IUPAC name for <SMILES> C1=CC=CC=C1 </SMILES>"
]

for query in queries:
    result = generator.generate(query)
    print(f"Query: {query}")
    print(f"Result: {result[0]['output'][0]}\n")

With Validation

from plan_execute_agent.chem_tools import validate_smiles_rdkit

# Generate SMILES
query = "Convert <IUPAC> benzene </IUPAC> to SMILES"
result = generator.generate(query)

# Extract and validate
smiles = result[0]['output'][0].split('<SMILES>')[1].split('</SMILES>')[0].strip()
validation = validate_smiles_rdkit.invoke({"smiles_string": smiles})
print(f"Valid: {validation['valid']}")

Error Handling

If the input is invalid or the conversion fails, the model still attempts to process the query and may return an error or an empty result:
query = "What is the SMILES for <IUPAC> invalidchemicalname123 </IUPAC>?"
result = generator.generate(query)
# The model will attempt to process but may return an error or empty result
For best results, ensure chemical names are spelled correctly and use standard IUPAC nomenclature.
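
Because a failed conversion may surface as an empty or untagged reply, callers may want to check the output before using it. The sketch below wraps any callable that returns the same shape as `generator.generate` in the examples above (a list of dicts with an 'output' list). `safe_convert` and `fake_generate` are hypothetical names used only for illustration.

```python
import re
from typing import Callable, Optional


def safe_convert(generate_fn: Callable, query: str, tag: str) -> Optional[str]:
    """Run a query and return the tagged value, or None if nothing usable came back."""
    try:
        text = generate_fn(query)[0]["output"][0]
    except (IndexError, KeyError, TypeError):
        return None
    if not isinstance(text, str):
        return None
    name = re.escape(tag)
    match = re.search(rf"<{name}>\s*(.*?)\s*</{name}>", text, re.DOTALL)
    if match is None or not match.group(1):
        return None
    return match.group(1)


def fake_generate(query: str):
    """Stand-in for generator.generate so the sketch runs without a model."""
    if "ethanol" in query:
        return [{"output": ["<SMILES> CCO </SMILES>"]}]
    return [{"output": [""]}]


print(safe_convert(fake_generate, "Convert <IUPAC> ethanol </IUPAC> to SMILES", "SMILES"))
print(safe_convert(fake_generate, "Convert <IUPAC> invalidchemicalname123 </IUPAC> to SMILES", "SMILES"))
```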
