Documentation Index
Fetch the complete documentation index at: https://mintlify.com/pranavkrishnasuresh/chemAgent/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The structure_chem_prompt tool formats chemistry queries by adding appropriate XML tags around SMILES and IUPAC chemical identifiers. This preprocessing step ensures that the answer_chemistry_query tool can correctly parse and process chemical information.
IMPORTANT: Always pass the output of this tool directly to the answer_chemistry_query tool.
Function Signature
@tool
def structure_chem_prompt(original_prompt: str) -> dict:
"""Structure and tag IUPAC or SMILES chemical information for preprocessing the input query.
PASS THE OUTPUT OF THIS TOOL DIRECTLY TO THE 'answer_chemistry_query' Tool!!
"""
Parameters
The unstructured chemistry query containing SMILES representations or IUPAC names that need to be tagged.
Response
The formatted query with chemical information wrapped in <SMILES> and <IUPAC> tags.
Error message if the structuring process fails. Only present when an error occurs.
How It Works
The tool uses GPT-4o with structured outputs to identify and tag chemical information based on the following system prompt:
SYSTEM_TAG_PROMPT = """
You are an EXPERT chemical information tagger. I will give you an INPUT QUERY and your task is to format it based on the information below.
You MUST return ONLY the formatted input query!!
When processing chemical information, use only two tags in the input query: <SMILES> for SMILES representations and <IUPAC> for IUPAC names. The input should include only <SMILES> and <IUPAC> tags ONLY when needed to mark chemical information.
Tag Definitions:
SMILES: <SMILES> ... </SMILES> for chemical structure in SMILES notation.
IUPAC: <IUPAC> ... </IUPAC> for the IUPAC name of the compound.
Instructions:
1. In the input query, use only the <SMILES> and <IUPAC> tags to wrap the SMILES representation or IUPAC name.
2. Ensure no extra characters or spaces are present within the tags.
"""
Tag Definitions
SMILES Tag
Wraps chemical structures in SMILES notation. Ensures no extra characters or spaces within the tags.
IUPAC Tag
Wraps IUPAC names of compounds. Maintains exact naming without modifications.
Examples
Input:
What is the molecular formula of 2,5-diphenyl-1,3-oxazole?
Output:
{
"new_prompt": "What is the molecular formula of <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
}
Example 2: IUPAC to SMILES
Input:
Please provide the SMILES representation for 4-ethyl-4-methyloxolan-2-one.
Output:
{
"new_prompt": "Please provide the SMILES representation for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>."
}
Input:
What is the molecular formula for S=P1(N(CCCl)CCCl)NCCCO1?
Output:
{
"new_prompt": "What is the molecular formula for <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>?"
}
Example 4: SMILES to IUPAC
Input:
Translate CCC(C)C1CNCCCNC1 to its IUPAC name.
Output:
{
"new_prompt": "Translate <SMILES> CCC(C)C1CNCCCNC1 </SMILES> to its IUPAC name."
}
Example 5: Property Prediction (ESOL Solubility)
Input:
Output:
{
"new_prompt": "How soluble is <SMILES> CC(C)Cl </SMILES>?"
}
Example 6: Multiple Chemical Identifiers
Input:
What is the molecular formula of the compound with this IUPAC name 2,5-diphenyl-1,3-oxazole and what is the name of C1CCOC1?
Output:
{
"new_prompt": "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC> and what is the name of <SMILES> C1CCOC1 </SMILES>?"
}
Implementation Details
- Model: GPT-4o with structured outputs
- Temperature: 0 (deterministic)
- Max Tokens: 1024
- Response Format: Uses Pydantic
StructuredPrompt model
Error Handling
If an error occurs during the structuring process, the tool returns:
{
"error": "Error generating structured prompt: [error details]"
}
Workflow Integration
Typical workflow:
- User provides unstructured chemistry query
structure_chem_prompt tags chemical identifiers
- Tagged output passed to
answer_chemistry_query
- LlaSMol processes the structured query
# Step 1: Structure the prompt
structured = structure_chem_prompt("What is the IUPAC name of C1CCOC1?")
# Step 2: Pass to chemistry query tool
result = answer_chemistry_query(structured["new_prompt"])