ChemAgent can predict various molecular properties from SMILES representations, including solubility, lipophilicity, blood-brain barrier permeability, toxicity, and more.

Supported Properties

  • ESOL - Water solubility (log S)
  • LIPO - Lipophilicity (logD at pH 7.4)
  • BBBP - Blood-brain barrier permeability
  • Clintox - Clinical toxicity
  • HIV - HIV replication inhibition
  • SIDER - Side effects by organ system

Output Formats

Property predictions return values in two formats:
  • <NUMBER> - Continuous numerical values (ESOL, LIPO)
  • <BOOLEAN> - Binary yes/no predictions (BBBP, Clintox, HIV, SIDER)

ESOL - Water Solubility

Predict the water solubility of a compound as log S, the base-10 logarithm of its molar solubility in mol/L.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "How soluble is <SMILES> CC(C)Cl </SMILES>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: Its log solubility is <NUMBER> -1.41 </NUMBER> mol/L.
ESOL values are in log S units. More negative values indicate lower solubility.
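Because log S is a base-10 logarithm of molar solubility, a predicted value can be converted back to mol/L and, given a molecular weight, to g/L. The helpers below are illustrative and not part of ChemAgent; the molecular weight is assumed to come from elsewhere (for example, a cheminformatics toolkit).

```python
def log_s_to_molar(log_s: float) -> float:
    """Convert log S (log10 of mol/L) to solubility in mol/L."""
    return 10 ** log_s


def log_s_to_grams_per_liter(log_s: float, mol_weight: float) -> float:
    """Convert log S to g/L using the compound's molecular weight in g/mol."""
    return log_s_to_molar(log_s) * mol_weight


# 2-chloropropane (CC(C)Cl): predicted log S = -1.41, molecular weight ~78.54 g/mol
molar = log_s_to_molar(-1.41)
grams = log_s_to_grams_per_liter(-1.41, 78.54)
print(f"{molar:.4f} mol/L, {grams:.2f} g/L")  # → 0.0389 mol/L, 3.06 g/L
```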

LIPO - Lipophilicity

Predict the octanol/water distribution coefficient (logD) at pH 7.4.
query = "Predict the octanol/water distribution coefficient logD under the circumstance of pH 7.4 for <SMILES> NC(=O)C1=CC=CC=C1O </SMILES>."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <NUMBER> 1.090 </NUMBER>
LogD at pH 7.4 is particularly relevant for predicting drug absorption and distribution under physiological conditions.
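LogD differs from logP because it accounts for every ionization state present at the given pH. For a monoprotic acid or base, under the common simplifying assumption that only the neutral species partitions into octanol, the two are linked by a Henderson-Hasselbalch correction. This is a textbook approximation for interpreting a logD value, not how ChemAgent computes its prediction; the logP and pKa inputs are assumed to be known.

```python
import math


def log_d_from_log_p(log_p: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Estimate logD for a monoprotic compound, assuming only the neutral form partitions.

    Acid: logD = logP - log10(1 + 10**(pH - pKa))
    Base: logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1 + 10 ** exponent)


# A hypothetical carboxylic acid (logP 2.5, pKa 4.4) is mostly ionized at pH 7.4,
# so its logD is roughly 3 log units below its logP.
print(round(log_d_from_log_p(2.5, 4.4), 2))  # → -0.5
```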

BBBP - Blood-Brain Barrier Permeability

Determine whether a compound can cross the blood-brain barrier.
query = "Is blood-brain barrier permeability (BBBP) a property of <SMILES> CCNC(=O)/C=C/C1=CC=CC(Br)=C1 </SMILES>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <BOOLEAN> Yes </BOOLEAN>

Use Cases

  • CNS Drug Design - Identify compounds that can reach the brain
  • Safety Assessment - Evaluate potential neurotoxicity risks
  • Formulation - Optimize delivery for neurological conditions
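For CNS-focused screening, the BBBP prompt above can be applied to a list of candidates and the Yes/No tag parsed from each answer. This sketch accepts any callable that returns the same shape as `generator.generate` (a list whose first item holds an `'output'` list), so it can be exercised with a stub; `screen_bbbp` and `parse_yes_no` are hypothetical helpers, not ChemAgent functions.

```python
import re
from typing import Callable, Dict, List, Optional

# Prompt wording copied from the BBBP example above
BBBP_TEMPLATE = "Is blood-brain barrier permeability (BBBP) a property of <SMILES> {smiles} </SMILES>?"


def parse_yes_no(output: str) -> Optional[bool]:
    """Return True/False from a <BOOLEAN> tag, or None if no tag is found."""
    match = re.search(r"<BOOLEAN>\s*(Yes|No)\s*</BOOLEAN>", output, re.IGNORECASE)
    return match.group(1).lower() == "yes" if match else None


def screen_bbbp(smiles_list: List[str], generate: Callable) -> Dict[str, Optional[bool]]:
    """Ask the BBBP question for each SMILES and collect the parsed answers."""
    results = {}
    for smiles in smiles_list:
        response = generate(BBBP_TEMPLATE.format(smiles=smiles))
        results[smiles] = parse_yes_no(response[0]["output"][0])
    return results


# With the real model: screen_bbbp(candidates, generator.generate)
```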

Clintox - Clinical Toxicity

Predict whether a compound is likely to show toxicity in clinical trials.
query = "Is <SMILES> COC[C@@H](NC(C)=O)C(=O)NCC1=CC=CC=C1 </SMILES> toxic?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <BOOLEAN> No </BOOLEAN>

HIV - Replication Inhibition

Predict whether a compound can inhibit HIV replication.
query = "Can <SMILES> CC1=CN(C2C=CCCC2O)C(=O)NC1=O </SMILES> serve as an inhibitor of HIV replication?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <BOOLEAN> No </BOOLEAN>
This prediction is based on the HIV dataset from MoleculeNet, which contains experimental inhibition data.

SIDER - Side Effects

Predict drug side effects by organ system (cardiovascular, digestive, nervous system, etc.).
query = "Are there any known side effects of <SMILES> CC1=CC(C)=C(NC(=O)CN(CC(=O)O)CC(=O)O)C(C)=C1Br </SMILES> affecting the heart?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <BOOLEAN> No </BOOLEAN>

Available Organ Systems

query = "Does <SMILES> ... </SMILES> have cardiovascular side effects?"
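To ask about several organ systems for one compound, the SIDER question can be templated. The organ-system wordings below are the ones named at the top of this section (cardiovascular, digestive, nervous system); whether the model handles each phrasing equally well is an assumption worth checking. `sider_queries` is a hypothetical helper.

```python
from typing import Dict, List

# Question pattern taken from the SIDER example above
SIDER_TEMPLATE = "Does <SMILES> {smiles} </SMILES> have {organ} side effects?"


def sider_queries(smiles: str, organs: List[str]) -> Dict[str, str]:
    """Build one SIDER-style question per organ system for a single compound."""
    return {organ: SIDER_TEMPLATE.format(smiles=smiles, organ=organ) for organ in organs}


queries = sider_queries("CCO", ["cardiovascular", "digestive", "nervous system"])
for organ, query in queries.items():
    # Each query can then be passed to generator.generate(query)
    print(organ, "->", query)
```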

Batch Predictions

Predict multiple properties for the same compound:
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
smiles = "CC(C)Cl"

properties = {
    "solubility": f"How soluble is <SMILES> {smiles} </SMILES>?",
    "toxicity": f"Is <SMILES> {smiles} </SMILES> toxic?",
    "bbbp": f"Can <SMILES> {smiles} </SMILES> cross the blood-brain barrier?",
    "lipo": f"What is the lipophilicity of <SMILES> {smiles} </SMILES>?"
}

results = {}
for prop_name, query in properties.items():
    result = generator.generate(query)
    results[prop_name] = result[0]['output'][0]
    print(f"{prop_name}: {results[prop_name]}")
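The loop above handles one compound. To batch several compounds and keep the raw answers for later parsing, one option is to write them to CSV with the standard library. This is a sketch: `batch_to_csv` and `TEMPLATES` are hypothetical names, the prompt wordings are copied from the per-property examples on this page, and `generate` stands for `generator.generate`.

```python
import csv
import io
from typing import Callable, Dict, List

# Prompt wordings copied from the per-property examples on this page
TEMPLATES: Dict[str, str] = {
    "esol": "How soluble is <SMILES> {s} </SMILES>?",
    "bbbp": "Is blood-brain barrier permeability (BBBP) a property of <SMILES> {s} </SMILES>?",
    "clintox": "Is <SMILES> {s} </SMILES> toxic?",
}


def batch_to_csv(smiles_list: List[str], generate: Callable) -> str:
    """Query every property for every compound and return the raw answers as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["smiles", *TEMPLATES])
    writer.writeheader()
    for smiles in smiles_list:
        row = {"smiles": smiles}
        for name, template in TEMPLATES.items():
            row[name] = generate(template.format(s=smiles))[0]["output"][0]
        writer.writerow(row)
    return buffer.getvalue()


# With the real model: print(batch_to_csv(["CC(C)Cl", "CCO"], generator.generate))
```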

With Agent Integration

Use the agent for automatic validation and error handling:
import asyncio
from plan_execute_agent.rdkit_agent import process_input

query = "Is <SMILES> CCO </SMILES> toxic?"

result, completed, attempts, llasmol_response, errors, formatted_input = \
    asyncio.run(process_input(query))

if completed:
    print(f"Result: {result}")
    print(f"Completed in {attempts} attempts")
else:
    print(f"Failed: {errors}")

Parsing Output Values

Extract numerical and boolean values from predictions:
import re

def extract_number(output):
    """Return the value inside a <NUMBER> tag as a float, or None if absent."""
    # Require at least one digit so a stray "-" or "." cannot reach float()
    match = re.search(r'<NUMBER>\s*([-+]?\d*\.?\d+)\s*</NUMBER>', output)
    return float(match.group(1)) if match else None

def extract_boolean(output):
    """Return True for Yes, False for No, or None if no <BOOLEAN> tag is found."""
    match = re.search(r'<BOOLEAN>\s*(Yes|No)\s*</BOOLEAN>', output, re.IGNORECASE)
    return match.group(1).lower() == 'yes' if match else None

# Example usage (assumes `generator` was created as in the ESOL example)
query = "How soluble is <SMILES> CCO </SMILES>?"
result = generator.generate(query)
solubility = extract_number(result[0]['output'][0])
print(f"Solubility: {solubility} log S")

query = "Is <SMILES> CCO </SMILES> toxic?"
result = generator.generate(query)
is_toxic = extract_boolean(result[0]['output'][0])
print(f"Toxic: {is_toxic}")
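Since ESOL and LIPO answers carry a <NUMBER> tag and the classification tasks carry a <BOOLEAN> tag, the two extractors can be folded into one dispatcher. This is a hypothetical convenience function, not part of ChemAgent; it requires at least one digit inside the <NUMBER> tag and returns None when neither tag is present.

```python
import re
from typing import Optional, Union


def parse_value(output: str) -> Optional[Union[float, bool]]:
    """Return a float for a <NUMBER> tag, a bool for a <BOOLEAN> tag, else None."""
    number = re.search(r"<NUMBER>\s*([-+]?\d*\.?\d+)\s*</NUMBER>", output)
    if number:
        return float(number.group(1))
    flag = re.search(r"<BOOLEAN>\s*(Yes|No)\s*</BOOLEAN>", output, re.IGNORECASE)
    if flag:
        return flag.group(1).lower() == "yes"
    return None


# Sample strings taken from the example outputs on this page
print(parse_value("Its log solubility is <NUMBER> -1.41 </NUMBER> mol/L."))  # → -1.41
print(parse_value("<BOOLEAN> Yes </BOOLEAN>"))  # → True
print(parse_value("no tags here"))  # → None
```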

Understanding Predictions

ESOL (log S)
  • > 0: Highly soluble
  • -2 to 0: Moderately soluble
  • -4 to -2: Slightly soluble
  • < -4: Poorly soluble

LIPO (logD at pH 7.4)
  • < 0: Hydrophilic
  • 0 to 3: Moderate lipophilicity (drug-like)
  • > 3: Highly lipophilic
  • > 5: May have poor absorption

BBBP, Clintox, HIV, and SIDER return binary Yes/No predictions. These are model predictions inferred from structural features, not experimental measurements.
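The solubility and lipophilicity bands listed above can be applied programmatically to predicted values. This is a sketch with hypothetical function names; the ESOL bands share the endpoint -2, and this version counts exactly -2 as moderately soluble, which is an assumption.

```python
def describe_esol(log_s: float) -> str:
    """Map a predicted log S to the solubility bands listed above."""
    if log_s > 0:
        return "highly soluble"
    if log_s >= -2:  # assumption: -2 itself counts as moderately soluble
        return "moderately soluble"
    if log_s >= -4:
        return "slightly soluble"
    return "poorly soluble"


def describe_logd(log_d: float) -> str:
    """Map a predicted logD (pH 7.4) to the lipophilicity bands listed above."""
    if log_d < 0:
        label = "hydrophilic"
    elif log_d <= 3:
        label = "moderate lipophilicity (drug-like)"
    else:
        label = "highly lipophilic"
    # The > 5 band overlaps > 3, so it is reported as an extra flag
    if log_d > 5:
        label += "; may have poor absorption"
    return label


# Values from the ESOL and LIPO example outputs on this page
print(describe_esol(-1.41))  # → moderately soluble
print(describe_logd(1.090))  # → moderate lipophilicity (drug-like)
```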
