
This project uses a locally-hosted language model instead of a cloud API. Running the LLM locally means resume data and job descriptions never leave your machine — important for hiring workflows that handle personally identifiable information. It also removes per-token API costs, which matters when running long multi-agent pipelines that invoke the LLM once per agent per run. LM Studio provides a lightweight desktop application that loads open-weight models and exposes them on an OpenAI-compatible HTTP server, so the existing ChatOpenAI client in llm.py works without modification.

Set up LM Studio and start the server

1. Download LM Studio

Go to lmstudio.ai and download the installer for your operating system. LM Studio supports macOS, Windows, and Linux. Install and launch it.

2. Load a model

In the LM Studio interface, search for and download an instruction-tuned model. The workflow is configured for gemma-3-4b-it by default, which runs comfortably on machines with 8 GB of RAM. For better JSON output quality — especially from the supervisor — a 7B or larger model is recommended.

Instruction-tuned models (those with -it, -instruct, or -chat in their name) follow prompts more reliably than base models. Avoid base models for this workflow.
3. Start the local server

In LM Studio, open the Local Server tab (the <-> icon in the left sidebar). Select the model you loaded, then click Start Server. By default the server listens on http://localhost:1234. Leave the default port unless you have a conflict.

4. Verify the endpoint is live

With the server running, confirm it responds correctly:

```shell
curl http://localhost:1234/v1/models
```

You should see a JSON response listing the loaded model. If you get a connection refused error, check that the LM Studio server is started and that the port matches.
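For reference, a successful response looks roughly like this — the shape follows the OpenAI models-list format, though the exact `id` and any extra fields depend on your LM Studio version and loaded model:

```json
{
  "object": "list",
  "data": [
    { "id": "gemma-3-4b-it", "object": "model" }
  ]
}
```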

5. Update llm.py if needed

If you use a different model name or port, open llm.py and update the relevant fields:
```python
llm = ChatOpenAI(
    model="your-model-name-here",         # must match the name shown in LM Studio
    base_url="http://localhost:1234/v1",  # update port if you changed it
    api_key="lm-studio",
    temperature=0.3,
    streaming=True
)
```
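Because the model name must match exactly, it can help to query the server for the ids it actually reports rather than retyping them. A minimal sketch using only the standard library (`list_local_models` is a hypothetical helper, not part of the project):

```python
import json
from urllib.request import urlopen

def list_local_models(base_url="http://localhost:1234/v1"):
    """Return the model ids the local server reports, or [] if unreachable."""
    try:
        with urlopen(f"{base_url}/models", timeout=5) as resp:
            payload = json.load(resp)
    except OSError:  # covers connection refused, timeouts, bad URLs
        return []
    # OpenAI-compatible servers list models under the "data" key
    return [m["id"] for m in payload.get("data", [])]
```

Copy one of the returned ids verbatim into the `model` field of `llm.py`.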

Understanding the llm.py configuration

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gemma-3-4b-it",
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    temperature=0.3,
    streaming=True
)
```
| Field | Value | Purpose |
| --- | --- | --- |
| `model` | `"gemma-3-4b-it"` | The model identifier sent in each API request. Must match exactly what LM Studio reports. |
| `base_url` | `"http://localhost:1234/v1"` | Points the client at the local server instead of OpenAI’s cloud. |
| `api_key` | `"lm-studio"` | LM Studio requires a non-empty string; the value is ignored. |
| `temperature` | `0.3` | Controls output randomness. Lower values produce more consistent JSON. |
| `streaming` | `True` | Streams tokens as they are generated. Required for LangGraph’s streaming mode to work correctly. |

Using a different provider

Because the client uses the standard OpenAI API interface, you can swap base_url to point at any compatible server.
```python
from langchain_openai import ChatOpenAI

# OpenAI
llm = ChatOpenAI(model="gpt-4o", api_key="sk-...")
```
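One way to switch providers without editing code each time is to select the settings from an environment variable. This is a sketch, not part of the project; the variable names (`LLM_PROVIDER`, `LLM_MODEL`) and the `llm_settings` helper are hypothetical:

```python
import os

def llm_settings():
    """Pick ChatOpenAI settings based on a hypothetical LLM_PROVIDER env var."""
    if os.environ.get("LLM_PROVIDER") == "openai":
        return {"model": "gpt-4o", "api_key": os.environ.get("OPENAI_API_KEY", "")}
    # Default: the local LM Studio server described above
    return {
        "model": os.environ.get("LLM_MODEL", "gemma-3-4b-it"),
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",
    }

# llm = ChatOpenAI(temperature=0.3, streaming=True, **llm_settings())
```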

Model recommendations

| Model size | Suitability |
| --- | --- |
| 3B–4B | Adequate for single-output agents (resume parser, email). Struggles with complex routing logic in the supervisor. |
| 7B | Good balance of speed and quality. Handles JSON output reliably on most agents. |
| 13B+ | Best JSON consistency and supervisor reliability. Requires more RAM (16 GB+). |
Instruction-tuned variants (those ending in -it, -instruct, or -chat) are required. Base models do not reliably follow the structured prompt format used by each agent.

The workflow relies on the LLM returning valid JSON. If responses are malformed, the supervisor will catch parse errors and route to `finished`. Use a model with strong instruction-following. If you see the supervisor terminating early with a parse error, try switching to a larger or more capable model before changing the prompt.
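To see why smaller models trip the parse-error path: they often wrap the JSON in markdown fences or surround it with prose, which breaks a plain `json.loads`. A hypothetical helper (not part of the project) illustrating the kind of tolerant extraction a supervisor might attempt before giving up:

```python
import json
import re

def parse_agent_json(raw: str):
    """Return the first JSON object found in raw text, or None on failure."""
    # Grab the outermost {...} span, ignoring fences or prose around it
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

Even with tolerant parsing, a model that emits structurally invalid JSON will still fail, which is why switching to a more capable model is the first fix to try.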
