VRAM Settings

Overview

ChemAgent provides a LOW_VRAM configuration flag to control whether the LlaSMol model is loaded. The LlaSMol model requires at least 15GB of VRAM to run properly.

Configuration File

The VRAM setting is controlled in:

plan_execute_agent/config.py

# Flag to avoid running LLaSmol with <15GB VRAM (MIN REQUIREMENT)
LOW_VRAM = True

Systems with less than 15GB of VRAM should keep LOW_VRAM = True, which is the default.

VRAM Modes

Low VRAM Mode (Default)

Configuration:

LOW_VRAM = True

Behavior:

LlaSMol model is NOT loaded (chem_tools.py:115-119)
answer_chemistry_query tool will raise a RuntimeError if called
Agent relies entirely on OpenAI GPT-4o for chemistry queries
Suitable for systems with less than 15GB VRAM

Error Handling: When LOW_VRAM = True, attempting to use the chemistry query tool will produce:

RuntimeError: answer_chemistry_query tool cannot be used with LOW_VRAM enabled.

Before the error is raised, the tool sets the model response to:

"LlaSmol model unused. Low VRAM enabled."

High VRAM Mode (Cluster/GPU)

Configuration:

LOW_VRAM = False

Requirements:

Minimum 15GB VRAM
CUDA-enabled GPU
PyTorch with CUDA support

Behavior:

LlaSMol model is loaded into GPU memory
answer_chemistry_query tool becomes available
Model uses bfloat16 precision for memory efficiency
Automatic device mapping with device_map="auto"

Implementation Details

Conditional Loading

The VRAM flag controls model initialization in plan_execute_agent/chem_tools.py:

chem_tools.py:115-119

# Tool to use LlaSmol to answer prompts related to Chemistry
# Won't initialize with low VRAM
if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None

Runtime Checks

The answer_chemistry_query tool validates VRAM mode:

chem_tools.py:148-151

if LOW_VRAM:
    llasmol_response.model_response = "LlaSmol model unused. Low VRAM enabled."
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )
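
Because the guard raises a RuntimeError rather than returning a value, code that calls the tool directly may want to catch it. The following is a minimal sketch of that pattern, not the agent's actual control flow: answer_chemistry_query here is a hypothetical stand-in that reproduces only the guard, and ask_with_fallback is an illustrative helper name that does not exist in the repository.

```python
LOW_VRAM = True  # mirrors the default in plan_execute_agent/config.py


def answer_chemistry_query(query: str) -> str:
    """Hypothetical stand-in for the real tool; only the LOW_VRAM guard is reproduced."""
    if LOW_VRAM:
        raise RuntimeError(
            "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
        )
    return f"LlaSMol answer for: {query}"


def ask_with_fallback(query: str) -> str:
    """Try the LlaSMol tool first; return a marked fallback message if it is disabled."""
    try:
        return answer_chemistry_query(query)
    except RuntimeError as exc:
        # In the real agent the GPT-4o path would handle the query at this point.
        return f"[fallback] {exc}"


print(ask_with_fallback("What is the SMILES string for ethanol?"))
```

Catching RuntimeError specifically keeps unrelated failures (for example, a CUDA error in high VRAM mode) visible instead of silently routing them to the fallback.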

Model Memory Usage

When loaded, the LlaSMol model needs at least 15GB of VRAM. The loader keeps memory use down with the optimizations listed below.

Memory Optimizations

bfloat16 Precision (model.py:38, 45)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
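
As a rough sanity check on the 15GB figure, the weight footprint can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch, not a measurement: it assumes roughly 7 billion parameters (inferred from "7B" in the model name) and counts weights only, ignoring activations, the KV cache, and CUDA context overhead.

```python
def estimate_weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 7e9  # assumption: about 7 billion parameters, from "7B" in the model name

bf16_gb = estimate_weight_memory_gb(N_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter
fp32_gb = estimate_weight_memory_gb(N_PARAMS, 4)  # float32 stores 4 bytes per parameter

print(f"bfloat16: {bf16_gb:.1f} GB, float32: {fp32_gb:.1f} GB")
# prints: bfloat16: 14.0 GB, float32: 28.0 GB
```

Under these assumptions, bfloat16 halves the weight footprint compared with float32, which is consistent with the 15GB minimum stated above.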

PEFT/LoRA Loading (model.py:42-46)

model = PeftModelForCausalLM.from_pretrained(
    model,
    model_name,
    torch_dtype=torch.bfloat16,
)

Model Merging (model.py:50)
```
model = model.merge_and_unload()
```

Torch Compilation (model.py:58-59)

if torch.__version__ >= "2" and sys.platform != "win32":
    model = torch.compile(model)
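
The version check in the excerpt above compares strings, which works for 1.x and 2.x releases but would misorder a hypothetical "10.0" because string comparison is lexicographic. The sketch below shows one alternative that parses the major version as an integer. It is an illustration only, not the repository's code, and major_version and should_compile are hypothetical helper names.

```python
import sys


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.1.0+cu121'."""
    return int(version.split(".")[0])


def should_compile(version: str, platform: str = sys.platform) -> bool:
    """Mirror the excerpt's condition, using a numeric major-version comparison."""
    return major_version(version) >= 2 and platform != "win32"


print(should_compile("2.1.0+cu121", "linux"))  # True
print(should_compile("1.13.1", "linux"))       # False
print("10.0.0" >= "2")                         # False: lexicographic string comparison
print(should_compile("10.0.0", "linux"))       # True
```

In real use the first argument would be torch.__version__.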

Device Selection

The model automatically detects available devices:

model.py:10-16

def get_device():
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    return device

Currently, CPU-only inference is not implemented. The model loader raises NotImplementedError for CPU devices (model.py:48).
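
Since a "cpu" result only fails later inside the model loader, a caller might prefer to fail fast with a message that points at the LOW_VRAM flag. The sketch below illustrates that idea under the behavior described above. resolve_device is a hypothetical helper, not part of model.py, and it takes the CUDA check as a plain boolean so that it runs without PyTorch installed.

```python
from typing import Optional


def resolve_device(cuda_available: bool, low_vram: bool) -> Optional[str]:
    """Return the device to load LlaSMol on, or None when loading is skipped."""
    if low_vram:
        # In low VRAM mode the model is never loaded, so no device is needed.
        return None
    if not cuda_available:
        raise NotImplementedError(
            "CPU-only inference is not implemented; "
            "set LOW_VRAM = True or run on a CUDA-enabled GPU."
        )
    return "cuda"


print(resolve_device(cuda_available=True, low_vram=False))  # cuda
print(resolve_device(cuda_available=False, low_vram=True))  # None
```

In real use the first argument would be torch.cuda.is_available() and the second the LOW_VRAM value from plan_execute_agent/config.py.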

Configuration for Different Environments

Local Development (Low VRAM)

plan_execute_agent/config.py

LOW_VRAM = True

Suitable for:

Laptops with consumer GPUs (less than 15GB VRAM)
Development machines with limited GPU memory
Testing agent logic without model inference

Cluster/Production (High VRAM)

plan_execute_agent/config.py

LOW_VRAM = False

Suitable for:

NVIDIA A100 (40GB/80GB)
NVIDIA V100 (16GB/32GB)
NVIDIA RTX 3090 (24GB)
Cloud GPU instances with ≥15GB VRAM

Troubleshooting

Out of Memory Errors

If you encounter CUDA OOM errors:

Verify VRAM availability:
```
nvidia-smi
```

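Check total and currently free memory from Python:

import torch
free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes for the current device
print(f"Total VRAM: {total_bytes / 1e9:.2f} GB")
print(f"Free VRAM: {free_bytes / 1e9:.2f} GB")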

Set LOW_VRAM = True if VRAM < 15GB
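
The steps above can be folded into a small pre-flight check that suggests a value for LOW_VRAM. This is a sketch under stated assumptions: recommend_low_vram and detect_total_vram_bytes are hypothetical helpers, the 15GB threshold is the minimum quoted in this guide, and the check looks at total memory on GPU 0 rather than memory that is currently free.

```python
MIN_VRAM_GB = 15  # minimum stated in this guide


def recommend_low_vram(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Return True when the GPU has less than min_gb (decimal GB) of total memory."""
    return total_bytes / 1e9 < min_gb


def detect_total_vram_bytes() -> int:
    """Total memory of GPU 0 in bytes, or 0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(0).total_memory


print("Recommended LOW_VRAM =", recommend_low_vram(detect_total_vram_bytes()))
```

Because other processes may already hold GPU memory, a card that passes this check can still hit OOM errors; nvidia-smi shows the current usage.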

Model Not Loading

If the model fails to load:

# Check CUDA availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

Related Configuration

The same LOW_VRAM flag exists in:

plan_execute_agent/config.py:2 (active flag)
LLM4Chem/config.py:2 (legacy, not actively used)

Only modify the flag in plan_execute_agent/config.py. The flag in LLM4Chem/config.py is not referenced by the agent.

Next Steps

Model Selection

Environment Setup
