Overview
The validate_smiles_rdkit tool validates SMILES (Simplified Molecular Input Line Entry System) strings using RDKit and a custom chemistry parser. It detects both syntax and semantic issues and returns validity vectors that pinpoint the exact location of each error.
Function Signature
Parameters
The SMILES string to validate. It may contain syntax errors or semantic issues, or it may be completely valid.
Response
True if the SMILES string is valid, False if any errors are detected.
Contains the validity vector information. For valid SMILES, this is a string of 1s. For invalid SMILES, it contains detailed error descriptions with validity vectors.
Validity Vector Format
The validity vector is a binary string where:
- 1 = valid character at this position
- 0 = invalid character at this position
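Reading a validity vector programmatically takes only a few lines. The sketch below assumes that positions are 0-based and that the vector has one digit per character of the input SMILES. Neither detail is confirmed here. error_positions and mark_errors are hypothetical helpers and are not part of the tool.

```python
def error_positions(vector: str) -> list[int]:
    """Return the indices of every '0' (invalid position) in a validity vector."""
    return [i for i, bit in enumerate(vector) if bit == "0"]


def mark_errors(smiles: str, vector: str) -> str:
    """Return the SMILES with a caret line underneath pointing at invalid characters.

    Assumes (hypothetically) one vector digit per SMILES character.
    """
    if len(vector) != len(smiles):
        raise ValueError("expected one vector digit per SMILES character")
    carets = "".join("^" if bit == "0" else " " for bit in vector)
    return f"{smiles}\n{carets.rstrip()}"
```

For example, error_positions("1110011") returns [3, 4], and print(mark_errors("CCQC", "1101")) places a caret under the Q.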
Format Structure
For invalid SMILES, the error message follows this pattern:
Error Categories
The parser detects four main categories of errors:
1. Unclosed Ring
Detects rings that are opened but never closed. Example: the 0 at position 0 indicates the unclosed ring marker 1.
2. Invalid Character
Detects unrecognized characters and other syntax errors in SMILES notation. Example: the 0s at positions 4-5 indicate the invalid characters Q.
3. Invalid Parentheses
Detects mismatched or unclosed parentheses. Example: the 0 at position 8 indicates the extra closing parenthesis.
4. Semantic Issues
Detects chemistry problems flagged by RDKit’s DetectChemistryProblems, such as:
- Explicit valence errors
- Kekulization failures
- Aromaticity issues
- Radical electrons
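RDKit can report these problems without raising an exception. To do this, parse the molecule unsanitized and pass it to Chem.DetectChemistryProblems. The sketch below shows that pattern. It also covers molecule creation, because Chem.MolFromSmiles returns None when parsing fails.

The sketch assumes RDKit is installed. semantic_problems is a hypothetical helper. Silencing RDKit's log with RDLogger.DisableLog is one common configuration and is not necessarily what the tool does.

```python
from rdkit import Chem, RDLogger

# One common way to stop RDKit from printing parse warnings (assumed, not the tool's confirmed setup).
RDLogger.DisableLog("rdApp.*")


def semantic_problems(smiles: str) -> list[str]:
    """Return RDKit's chemistry problems as 'Type: message' strings (empty list if none)."""
    # Skip sanitization so problems are collected by DetectChemistryProblems instead of raised.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return ["SMILES could not be parsed"]  # molecule creation failed
    return [f"{p.GetType()}: {p.Message()}" for p in Chem.DetectChemistryProblems(mol)]
```

With this sketch, semantic_problems("CCO") returns an empty list. For a five-coordinate carbon such as "C(C)(C)(C)(C)C", it yields an AtomValenceException entry.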
Validation Process
The tool performs validation in three stages:
Stage 1: Syntax Validation
Uses the PartialSMILES parser to detect:
- Invalid characters, using a SMILES tokenizer pattern
- Unclosed or mismatched parentheses
- Unclosed ring markers (including %(N) notation)
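The parenthesis and ring-marker checks rely on simple bookkeeping, illustrated here with a hypothetical, simplified checker. It is not PartialSMILES and not the tool's implementation. It only tracks open parentheses and ring labels (single digits, %NN, and %(N)), skips bracket atoms, and marks unmatched tokens with 0. structural_validity is an illustrative name.

```python
import re

# Tokens relevant to this check: bracket atoms (skipped), %(N), %NN, single-digit
# ring labels, and parentheses. Every other character is left marked valid.
TOKEN = re.compile(r"\[[^\]]*\]|%\(\d+\)|%\d{2}|\d|[()]")


def structural_validity(smiles: str) -> str:
    """Return a validity vector marking unclosed rings and unmatched parentheses with 0."""
    vector = ["1"] * len(smiles)
    open_rings: dict[str, tuple[int, int]] = {}  # ring label -> span of the opening token
    open_parens: list[int] = []  # positions of '(' still waiting for ')'

    for match in TOKEN.finditer(smiles):
        token, start, end = match.group(), match.start(), match.end()
        if token.startswith("["):
            continue  # digits inside [13CH4] or [NH3+] are not ring labels
        if token == "(":
            open_parens.append(start)
        elif token == ")":
            if open_parens:
                open_parens.pop()
            else:
                vector[start] = "0"  # closing parenthesis without a matching '('
        else:
            label = token.strip("%()")  # '1' -> '1', '%12' -> '12', '%(123)' -> '123'
            if label in open_rings:
                del open_rings[label]  # second occurrence closes the ring
            else:
                open_rings[label] = (start, end)

    for start, end in open_rings.values():
        vector[start:end] = ["0"] * (end - start)  # ring opened but never closed
    for start in open_parens:
        vector[start] = "0"  # '(' never closed
    return "".join(vector)
```

For example, structural_validity("C1CCCCC") returns "1011111", which flags the ring label 1 that is never closed. structural_validity("CC(C)C)C") returns "11111101", which flags the extra closing parenthesis.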
Stage 2: Molecule Creation
Stage 3: Semantic Analysis
For successfully created molecules:
Usage Examples
Example 1: Valid SMILES
Input:
Example 2: Invalid Character
Input:
The dot (.) at position 6 is invalid in this context.
Example 3: Unclosed Ring
Input:
Example 4: Multiple Errors
Input:
Example 5: Semantic Error (Valence)
Input:
Example 6: Ring Notation with %(N)
Input (Valid):
Integration with Chemistry Parser
The tool uses the custom parse_smiles() function from chemistry_parser.py:
Error Logging
When invalid SMILES are detected, errors are logged to llasmol_response.errors:
$ for parsing by other components.
SMILES Tokenizer Pattern
The validator uses the SMILES tokenizer from the Molecular Transformer, which recognizes:
- Bracketed atoms: [NH3+], [C@@H]
- Elements: B, Br, C, Cl, N, O, S, P, F, I
- Aromatic atoms: b, c, n, o, s, p
- Bonds: =, #, -, \, /
- Branches: (, )
- Rings: 1-9, %10-%99
- Special: ., +, @, etc.
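The invalid-character check can be sketched with the regular expression published with the Molecular Transformer tokenizer. This is an illustration under that assumption, not the tool's code, and char_validity is a hypothetical helper. This pattern does not cover the %(N) ring notation, so a %(N) label would be flagged here even though the validator accepts it.

```python
import re

# Tokenizer regex as published with the Molecular Transformer (Schwaller et al.).
SMI_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def char_validity(smiles: str) -> str:
    """Mark each character '1' if it belongs to a recognized token, else '0'."""
    bits = []
    pos = 0
    while pos < len(smiles):
        match = SMI_PATTERN.match(smiles, pos)
        if match:
            bits.append("1" * (match.end() - pos))  # the whole token is valid
            pos = match.end()
        else:
            bits.append("0")  # no token starts here: invalid character
            pos += 1
    return "".join(bits)
```

For example, char_validity("CCCCQC") returns "111101", flagging the unrecognized Q.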
Best Practices
- Validate before processing: Always validate SMILES before passing them to answer_chemistry_query
- Parse validity vectors: Extract error positions from the binary string
- Handle multiple errors: Split comma-separated error messages
- Log errors: Store validation failures for debugging and analysis
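The practices above can be combined into a small post-validation step. The error-message layout assumed here is illustrative only: entries of the form "description: vector", separated by commas. Adapt the regular expression to the tool's actual format. parse_errors, handle_validation, and the logger name are hypothetical.

```python
import logging
import re

logger = logging.getLogger("smiles_validation")  # hypothetical logger name

# ASSUMED entry layout: "<description>: <validity vector>"; entries separated by commas.
ENTRY = re.compile(r"^(?P<desc>.*?):\s*(?P<vector>[01]+)$")


def parse_errors(message: str) -> list[tuple[str, list[int]]]:
    """Split a multi-error message into (description, 0-based error positions) pairs."""
    parsed = []
    for entry in message.split(","):
        entry = entry.strip()
        match = ENTRY.match(entry)
        if match is None:
            parsed.append((entry, []))  # keep entries that do not fit the assumed layout
            continue
        positions = [i for i, bit in enumerate(match["vector"]) if bit == "0"]
        parsed.append((match["desc"], positions))
    return parsed


def handle_validation(smiles: str, is_valid: bool, message: str) -> bool:
    """Log every error and return True only if the SMILES can be processed further."""
    if is_valid:
        return True
    for desc, positions in parse_errors(message):
        logger.warning("Invalid SMILES %r: %s at positions %s", smiles, desc, positions)
    return False
```

Only when handle_validation returns True would the SMILES be forwarded to answer_chemistry_query.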
Workflow Integration
Typical validation workflow:
RDKit Configuration
The tool configures RDKit logging:
References
- RDKit Documentation: Chemistry Problems Detection
- Molecular Transformer: SMILES Tokenization
- PartialSMILES: Syntax error detection library