from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
query = "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> .
Always wrap IUPAC names in <IUPAC> ... </IUPAC> tags for proper processing.
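One way to avoid forgetting the tags is to build queries through a small helper. The sketch below is illustrative only: `wrap_iupac` and `smiles_query` are hypothetical names and are not part of LLM4Chem. The prompt wording is copied from the example above.

```python
def wrap_iupac(name: str) -> str:
    """Wrap a chemical name in the <IUPAC> ... </IUPAC> tags the model expects."""
    return f"<IUPAC> {name.strip()} </IUPAC>"


def smiles_query(name: str) -> str:
    """Build a name-to-SMILES prompt (hypothetical helper, not an LLM4Chem API)."""
    return f"Could you provide the SMILES for {wrap_iupac(name)}?"


print(smiles_query("4-ethyl-4-methyloxolan-2-one"))
```

The resulting string can be passed to `generator.generate()` unchanged.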
Translate SMILES notation into systematic IUPAC names.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
query = "Translate the given SMILES formula of a molecule <SMILES> CCC(C)C1CNCCCNC1 </SMILES> into its IUPAC name."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <IUPAC> 3-butan-2-yl-1,5-diazocane </IUPAC>
Determine the molecular formula from a SMILES string.
query = "Given the SMILES representation <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>, what would be its molecular formula?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: It is <MOLFORMULA> C7H15Cl2N2OPS </MOLFORMULA> .
Extract molecular formulas directly from IUPAC names.
query = "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <MOLFORMULA> C15H11NO </MOLFORMULA>
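The outputs shown in these examples embed the answer in tags, sometimes with surrounding conversational text. A minimal sketch for pulling the tagged value out of such a string follows. `extract_tagged` is a hypothetical helper, not an LLM4Chem function, and it assumes the output contains at most one tag pair of interest.

```python
import re
from typing import Optional


def extract_tagged(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <TAG> ... </TAG> pair, or None if absent."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>\s*(.*?)\s*</{name}>", text, flags=re.DOTALL)
    return match.group(1) if match else None


# Sample strings taken from the example outputs in this section.
print(extract_tagged("Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> .", "SMILES"))
print(extract_tagged("It is <MOLFORMULA> C7H15Cl2N2OPS </MOLFORMULA> .", "MOLFORMULA"))
```

Returning `None` when no tag is present makes it easy to detect responses that did not contain a usable answer.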
The canonicalization process (LLM4Chem/utils/smiles_canonicalization.py:64):
1. Parses the SMILES string using RDKit
2. Removes atom mapping numbers
3. Standardizes stereochemistry
4. Applies Kekulization (optional)
5. Generates canonical SMILES
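The steps above can be approximated with plain RDKit calls. This is a rough sketch under stated assumptions, not the code in `smiles_canonicalization.py`: the function name `canonicalize_smiles_sketch` and its `kekulize` parameter are hypothetical, and the stereochemistry step is represented here by `Chem.AssignStereochemistry`, which may differ from what the library actually does. The guarded import only lets the snippet load when RDKit is not installed.

```python
try:
    from rdkit import Chem  # third-party dependency
except ImportError:  # let the module load without RDKit installed
    Chem = None


def canonicalize_smiles_sketch(smiles, kekulize=False):
    """Illustrative approximation of the listed steps (hypothetical helper)."""
    if Chem is None:
        raise RuntimeError("RDKit is required for canonicalization")
    mol = Chem.MolFromSmiles(smiles)  # step 1: parse; None on invalid input
    if mol is None:
        return None
    for atom in mol.GetAtoms():  # step 2: remove atom mapping numbers
        atom.SetAtomMapNum(0)
    # Step 3 (assumed): recompute and clean stereochemistry labels.
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    if kekulize:  # step 4 (optional): switch to explicit single/double bonds
        Chem.Kekulize(mol, clearAromaticFlags=True)
    # Step 5: write canonical SMILES.
    return Chem.MolToSmiles(mol, canonical=True, kekuleSmiles=kekulize)
```

With RDKit available, two different spellings of the same molecule, such as `"OCC"` and `"CCO"`, should come back as the same string, and an unparseable input returns `None`.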
from LLM4Chem.generation import LlaSMolGeneration

# Input SMILES; it does not need to be in canonical form
query = "What is the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"

# The SMILES is automatically canonicalized before processing
generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
result = generator.generate(query, canonicalize_smiles=True)
Canonicalization can be disabled by setting canonicalize_smiles=False in the generate() method, but this is not recommended for most use cases.
queries = [
    "Convert <IUPAC> ethanol </IUPAC> to SMILES",
    "What is the molecular formula of <SMILES> CCO </SMILES>?",
    "Give me the IUPAC name for <SMILES> C1=CC=CC=C1 </SMILES>"
]

for query in queries:
    result = generator.generate(query)
    print(f"Query: {query}")
    print(f"Result: {result[0]['output'][0]}\n")
If the conversion fails or the input is invalid, the model may return an error or an empty result instead of a valid answer:
query = "What is the SMILES for <IUPAC> invalidchemicalname123 </IUPAC>?"
result = generator.generate(query)
# The model will attempt to process but may return an error or empty result
For best results, ensure chemical names are spelled correctly and use standard IUPAC nomenclature.
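Because a failed conversion may surface as an exception or as an empty result, calling code may want a defensive wrapper. The sketch below is one possible approach: `try_generate` is a hypothetical helper, the set of caught exceptions is an assumption (the source does not say what `generate()` raises), and the `result[0]['output'][0]` access pattern is taken from the examples above.

```python
from typing import Callable, Optional


def try_generate(generate: Callable[[str], list], query: str) -> Optional[str]:
    """Return the first output string, or None if generation fails or is empty."""
    try:
        result = generate(query)
        text = result[0]["output"][0]
    except (IndexError, KeyError, TypeError, RuntimeError, ValueError):
        # Assumed failure modes; adjust to the exceptions actually observed.
        return None
    # Treat blank or non-string outputs as a failed conversion.
    return text if isinstance(text, str) and text.strip() else None
```

It would be called as `try_generate(generator.generate, query)`, with a `None` return signalling that the query should be checked or retried.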