Overview

ChemAgent provides a LOW_VRAM configuration flag to control whether the LlaSMol model is loaded. The LlaSMol model requires at least 15GB of VRAM to run properly.

Configuration File

The VRAM setting is controlled in:
plan_execute_agent/config.py
# Flag to avoid running LLaSmol with <15GB VRAM (MIN REQUIREMENT)
LOW_VRAM = True
The LlaSMol model requires a minimum of 15GB VRAM. Systems with less VRAM should keep LOW_VRAM = True.

VRAM Modes

Low VRAM Mode (Default)

Configuration:
LOW_VRAM = True
Behavior:
  • LlaSMol model is NOT loaded (chem_tools.py:115-119)
  • answer_chemistry_query tool will raise a RuntimeError if called
  • Agent relies entirely on OpenAI GPT-4o for chemistry queries
  • Suitable for systems with less than 15GB VRAM
Error Handling: When LOW_VRAM = True, calling the chemistry query tool raises:
RuntimeError: answer_chemistry_query tool cannot be used with LOW_VRAM enabled.
Before raising, the tool sets the model response to:
"LlaSmol model unused. Low VRAM enabled."

High VRAM Mode (Cluster/GPU)

Configuration:
LOW_VRAM = False
Requirements:
  • Minimum 15GB VRAM
  • CUDA-enabled GPU
  • PyTorch with CUDA support
Behavior:
  • LlaSMol model is loaded into GPU memory
  • answer_chemistry_query tool becomes available
  • Model uses bfloat16 precision for memory efficiency
  • Automatic device mapping with device_map="auto"
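As a rough sanity check on the 15GB minimum, weight memory can be estimated from parameter count and precision. The sketch below is a back-of-the-envelope calculation, not a measurement: the parameter count is an assumption inferred from the "7B" in the model name, and `estimate_weight_memory_gb` is a hypothetical helper that does not exist in ChemAgent.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Assumption: about 7 billion parameters, inferred from the "7B" in the
# model name "osunlp/LlaSMol-Mistral-7B"; the exact count is not given here.
ASSUMED_NUM_PARAMS = 7e9


def estimate_weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    gb = estimate_weight_memory_gb(ASSUMED_NUM_PARAMS, dtype)
    print(f"{dtype}: ~{gb:.0f} GB for weights")
```

Under this assumption, bfloat16 weights come to about 14 GB, half of the float32 figure. Activations and CUDA overhead come on top of the weights, so actual usage will be higher than this estimate.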

Implementation Details

Conditional Loading

The VRAM flag controls model initialization in plan_execute_agent/chem_tools.py:
chem_tools.py:115-119
# Tool to use LlaSmol to answer prompts related to Chemistry
# Won't initialize with low VRAM
if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None

Runtime Checks

The answer_chemistry_query tool validates VRAM mode:
chem_tools.py:148-151
if LOW_VRAM:
    llasmol_response.model_response = "LlaSmol model unused. Low VRAM enabled."
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )
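To illustrate how calling code might react to this guard, the sketch below reproduces the set-then-raise order with stand-ins and falls back to another answer source. Everything here is hypothetical: `LlasmolResponse`, `guarded_query`, and `ask` are illustration-only names, and the real response type and agent wiring may differ.

```python
from dataclasses import dataclass


@dataclass
class LlasmolResponse:
    """Hypothetical container; the real response type may differ."""
    model_response: str = ""


def guarded_query(response: LlasmolResponse, low_vram: bool) -> None:
    """Mimic the documented guard: set the message first, then raise."""
    if low_vram:
        response.model_response = "LlaSmol model unused. Low VRAM enabled."
        raise RuntimeError(
            "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
        )
    response.model_response = "(model output would go here)"


def ask(query: str, low_vram: bool) -> str:
    """Try the guarded tool; on RuntimeError, use a placeholder fallback."""
    response = LlasmolResponse()
    try:
        guarded_query(response, low_vram)
    except RuntimeError:
        # The message set before the raise is still readable here.
        return f"fallback for {query!r} ({response.model_response})"
    return response.model_response


print(ask("SMILES for ethanol?", low_vram=True))
```

Because the message is assigned before the exception is raised, a caller that holds the response object can still read it after catching the error.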

Model Memory Usage

When loaded, the LlaSMol model needs at least 15GB of VRAM. The loader applies the optimizations listed below.

Memory Optimizations

  1. bfloat16 Precision (model.py:38, 45)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    
  2. PEFT/LoRA Loading (model.py:42-46)
    model = PeftModelForCausalLM.from_pretrained(
        model,
        model_name,
        torch_dtype=torch.bfloat16,
    )
    
  3. Model Merging (model.py:50)
    model = model.merge_and_unload()
    
  4. Torch Compilation (model.py:58-59)
    if torch.__version__ >= "2" and sys.platform != "win32":
        model = torch.compile(model)
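The version check in step 4 compares strings, which orders them character by character rather than numerically. The sketch below shows a numeric variant as one possible alternative; `major_version` and `should_compile` are hypothetical helpers, not part of model.py, and for PyTorch 1.x and 2.x both approaches give the same answer.

```python
import sys


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.1.0+cu118'."""
    return int(version.split(".", 1)[0])


def should_compile(version: str, platform: str = sys.platform) -> bool:
    """Numeric variant of the check: major version >= 2 and not Windows."""
    return major_version(version) >= 2 and platform != "win32"


# A hypothetical two-digit major version exposes the difference.
print("10.0.0" >= "2")                    # False: '1' sorts before '2'
print(should_compile("10.0.0", "linux"))  # True: 10 >= 2
```

In real code the version string would come from `torch.__version__`; it is passed in as an argument here so the sketch runs without PyTorch installed.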
    

Device Selection

The loader selects CUDA when it is available and otherwise reports CPU:
model.py:10-16
def get_device():
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    return device
Currently, CPU-only inference is not implemented. The model loader raises NotImplementedError for CPU devices (model.py:48).
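Since a CPU device leads to NotImplementedError at load time, a preflight check can surface the problem earlier with a clearer message. The following is a minimal sketch under stated assumptions: `get_device_safe` and `check_can_load` are hypothetical helpers that are not part of ChemAgent, and the error text is invented for illustration.

```python
def get_device_safe() -> str:
    """Like get_device, but reports 'cpu' when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def check_can_load(low_vram: bool, device: str) -> None:
    """Raise a descriptive error before any model load is attempted."""
    if not low_vram and device != "cuda":
        raise RuntimeError(
            f"LOW_VRAM = False requires a CUDA device, but found {device!r}. "
            "Set LOW_VRAM = True or run on a CUDA-enabled GPU."
        )


# With LOW_VRAM = True the model is never loaded, so any device passes.
check_can_load(low_vram=True, device=get_device_safe())
print("preflight check passed")
```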

Configuration for Different Environments

Local Development (Low VRAM)

plan_execute_agent/config.py
LOW_VRAM = True
Suitable for:
  • Laptops with consumer GPUs (less than 15GB VRAM)
  • Development machines with limited GPU memory
  • Testing agent logic without model inference

Cluster/Production (High VRAM)

plan_execute_agent/config.py
LOW_VRAM = False
Suitable for:
  • NVIDIA A100 (40GB/80GB)
  • NVIDIA V100 (16GB/32GB)
  • NVIDIA RTX 3090 (24GB)
  • Cloud GPU instances with ≥15GB VRAM

Troubleshooting

Out of Memory Errors

If you encounter CUDA OOM errors:
  1. Verify VRAM availability:
    nvidia-smi
    
  2. Check free and total memory from Python:
    import torch
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"Free VRAM: {free_bytes / 1e9:.2f} GB of {total_bytes / 1e9:.2f} GB")
    
  3. Set LOW_VRAM = True if VRAM < 15GB
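The steps above can be folded into a small helper that suggests a value for the flag. This is a sketch only: `recommend_low_vram` and `detect_total_vram_bytes` are hypothetical names, the 15 GB threshold is the minimum stated in this guide, and the helper only prints a suggestion rather than editing config.py.

```python
MIN_VRAM_GB = 15  # minimum stated in this guide


def recommend_low_vram(total_vram_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Return True when LOW_VRAM should stay enabled."""
    return total_vram_bytes / 1e9 < min_gb


def detect_total_vram_bytes() -> int:
    """Total memory of GPU 0 in bytes, or 0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(0).total_memory


print(f"Suggested LOW_VRAM = {recommend_low_vram(detect_total_vram_bytes())}")
```

The check uses total rather than free memory, so other processes already occupying the GPU can still cause OOM errors even when the suggestion is False.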

Model Not Loading

If the model fails to load, first confirm that PyTorch can see a CUDA device:
# Check CUDA availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
Note that a LOW_VRAM flag is defined in two files:
  • plan_execute_agent/config.py:2 (active flag)
  • LLM4Chem/config.py:2 (legacy, not actively used)
Only modify the flag in plan_execute_agent/config.py. The flag in LLM4Chem/config.py is not referenced by the agent.

Next Steps

Model Selection

Choose which LlaSMol model to use

Environment Setup

Configure API keys and environment variables
