Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/pranavkrishnasuresh/chemAgent/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The structure_chem_prompt tool formats chemistry queries by adding appropriate XML tags around SMILES and IUPAC chemical identifiers. This preprocessing step ensures that the answer_chemistry_query tool can correctly parse and process chemical information. IMPORTANT: Always pass the output of this tool directly to the answer_chemistry_query tool.

Function Signature

@tool
def structure_chem_prompt(original_prompt: str) -> dict:
    """Structure and tag IUPAC or SMILES chemical information for preprocessing the input query.
    PASS THE OUTPUT OF THIS TOOL DIRECTLY TO THE 'answer_chemistry_query' Tool!!
    """

Parameters

original_prompt
str
required
The unstructured chemistry query containing SMILES representations or IUPAC names that need to be tagged.

Response

new_prompt
str
The formatted query with chemical information wrapped in <SMILES> and <IUPAC> tags.
error
str
Error message if the structuring process fails. Only present when an error occurs.

How It Works

The tool uses GPT-4o with structured outputs to identify and tag chemical information based on the following system prompt:
SYSTEM_TAG_PROMPT = """
You are an EXPERT chemical information tagger. I will give you an INPUT QUERY and your task is to format it based on the information below.
You MUST return ONLY the formatted input query!!
When processing chemical information, use only two tags in the input query: <SMILES> for SMILES representations and <IUPAC> for IUPAC names. The input should include only <SMILES> and <IUPAC> tags ONLY when needed to mark chemical information.

Tag Definitions:
SMILES: <SMILES> ... </SMILES> for chemical structure in SMILES notation.
IUPAC: <IUPAC> ... </IUPAC> for the IUPAC name of the compound.

Instructions:
1. In the input query, use only the <SMILES> and <IUPAC> tags to wrap the SMILES representation or IUPAC name.
2. Ensure no extra characters or spaces are present within the tags.
"""

Tag Definitions

SMILES Tag

<SMILES> ... </SMILES>
Wraps chemical structures in SMILES notation. Ensures no extra characters or spaces within the tags.

IUPAC Tag

<IUPAC> ... </IUPAC>
Wraps IUPAC names of compounds. Maintains exact naming without modifications.

Examples

Example 1: IUPAC to Molecular Formula

Input:
What is the molecular formula of 2,5-diphenyl-1,3-oxazole?
Output:
{
  "new_prompt": "What is the molecular formula of <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
}

Example 2: IUPAC to SMILES

Input:
Please provide the SMILES representation for 4-ethyl-4-methyloxolan-2-one.
Output:
{
  "new_prompt": "Please provide the SMILES representation for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>."
}

Example 3: SMILES to Molecular Formula

Input:
What is the molecular formula for S=P1(N(CCCl)CCCl)NCCCO1?
Output:
{
  "new_prompt": "What is the molecular formula for <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>?"
}

Example 4: SMILES to IUPAC

Input:
Translate CCC(C)C1CNCCCNC1 to its IUPAC name.
Output:
{
  "new_prompt": "Translate <SMILES> CCC(C)C1CNCCCNC1 </SMILES> to its IUPAC name."
}

Example 5: Property Prediction (ESOL Solubility)

Input:
How soluble is CC(C)Cl?
Output:
{
  "new_prompt": "How soluble is <SMILES> CC(C)Cl </SMILES>?"
}

Example 6: Multiple Chemical Identifiers

Input:
What is the molecular formula of the compound with this IUPAC name 2,5-diphenyl-1,3-oxazole and what is the name of C1CCOC1?
Output:
{
  "new_prompt": "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC> and what is the name of <SMILES> C1CCOC1 </SMILES>?"
}

Implementation Details

  • Model: GPT-4o with structured outputs
  • Temperature: 0 (deterministic)
  • Max Tokens: 1024
  • Response Format: Uses Pydantic StructuredPrompt model

Error Handling

If an error occurs during the structuring process, the tool returns:
{
  "error": "Error generating structured prompt: [error details]"
}

Workflow Integration

Typical workflow:
  1. User provides unstructured chemistry query
  2. structure_chem_prompt tags chemical identifiers
  3. Tagged output passed to answer_chemistry_query
  4. LlaSMol processes the structured query
# Step 1: Structure the prompt
structured = structure_chem_prompt("What is the IUPAC name of C1CCOC1?")

# Step 2: Pass to chemistry query tool
result = answer_chemistry_query(structured["new_prompt"])

Build docs developers (and LLMs) love