Serving External Models via the LeanDojo v2 API

The LeanDojo v2 external API is a lightweight FastAPI server that bridges Lean 4 editor extensions with large language models. When a user invokes a tactic suggestion in the Lean editor via LeanCopilot, the extension sends the current proof state to this server over HTTP, receives ranked tactic candidates, and inserts the best suggestion into the editor. The server is entirely separate from the in-process Python prover stack — it is designed to be deployed on a GPU node and queried remotely.

Setting Up the Server

Create a dedicated conda environment

Isolate the server’s dependencies from your main LeanDojo environment to avoid version conflicts with vllm and other heavy packages.

conda create --name lean-copilot python=3.10 numpy
conda activate lean-copilot

Install Python dependencies

Install PyTorch first (select the wheel that matches your CUDA version), then the remaining packages:

pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install fastapi uvicorn loguru transformers openai anthropic google.generativeai vllm

vllm requires a CUDA-capable GPU. Omit it if you are running on CPU or will only use the HuggingFace Inference API backend.

Start the server

Change into the server directory and launch uvicorn on port 23337 (the port LeanCopilot defaults to):

cd lean_dojo_v2/external_api/python/
uvicorn server:app --port 23337

The server logs startup messages via loguru. Once you see Application startup complete, the endpoints are live.

Available Endpoints

The server (server.py) exposes two POST routes:

`POST /generate`

Accepts a tactic generation request and returns ranked tactic candidates. Optionally integrates a LeanProgress scorer to attach reward signals to each candidate. Request body:

{
  "name": "deepseek",
  "input": "n : ℕ\n⊢ gcd n n = n",
  "prefix": null,
  "use_reward": false
}

Field	Type	Description
`name`	`string`	Model bundle key registered in `server.py` (e.g., `"deepseek"`).
`input`	`string`	The current Lean proof state (goal string).
`prefix`	`string \| null`	Optional tactic prefix to constrain generation.
`use_reward`	`boolean`	If `true`, runs each candidate through the LeanProgress scorer and returns `steps_remaining` and `reward` fields. Requires `LEANPROGRESS_MODEL` to be set.

Response body:

{
  "outputs": [
    {
      "output": "exact Nat.gcd_self n",
      "score": 0.91,
      "model_score": 0.91,
      "steps_remaining": null,
      "reward": null
    }
  ]
}

Each Generation object in outputs has:

Field	Type	Description
`output`	`string`	The generated tactic text.
`score`	`float`	Composite score (equals `reward` when `use_reward=true`, otherwise the model log-probability score).
`model_score`	`float \| null`	Raw log-probability score from the generation model. Only populated when `use_reward=true`.
`steps_remaining`	`float \| null`	Predicted steps remaining from LeanProgress (only populated when `use_reward=true`).
`reward`	`float \| null`	Negative of `steps_remaining` — higher is better (only when `use_reward=true`).

If use_reward: true is sent but no LEANPROGRESS_MODEL environment variable is configured, the server returns HTTP 400 with "LeanProgress scoring requested but no model configured." Set the variable before starting the server.

`POST /encode`

Returns a dense embedding vector for the given input string. Only available for model bundles whose generator implements the Encoder mixin. Request body:

{
  "name": "deepseek",
  "input": "n : ℕ\n⊢ gcd n n = n"
}

Response body:

{
  "outputs": [0.021, -0.134, 0.879, "..."]
}

If the requested model does not support encoding, the server returns HTTP 400.

Model Bundles

The server registers models as ModelBundle objects that pair a Generator with an optional LeanProgressScorer:

from external_models import HFTacticGenerator
from leanprogress import LeanProgressScorer
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelBundle:
    generator: Generator
    progress_scorer: Optional[LeanProgressScorer] = None

The default bundle registered in server.py wraps deepseek-ai/DeepSeek-Prover-V2-671B:novita via the HuggingFace Inference router:

models = {
    "deepseek": ModelBundle(
        generator=HFTacticGenerator(
            model="deepseek-ai/DeepSeek-Prover-V2-671B:novita",
            temperature=0.6,
            max_new_tokens=256,
            top_p=0.9,
            length_penalty=0,
            num_return_sequences=4,
            do_sample=True,
            output_scores=True,
            output_logits=False,
            return_dict_in_generate=True,
            device="auto",
        ),
        progress_scorer=_build_progress_scorer(),
    ),
}

Add additional bundles by extending the models dictionary with new keys before starting the server.

`HFTacticGenerator` — HuggingFace Inference API Backend

The HFTacticGenerator class (external_models/hf_runner.py) calls the HuggingFace Inference Router (https://router.huggingface.co/v1/chat/completions) and requires the HF_TOKEN environment variable:

from lean_dojo_v2.external_api.python.external_models.hf_runner import HFTacticGenerator

generator = HFTacticGenerator(
    model_name="deepseek-ai/DeepSeek-Prover-V2-671B:novita"
)

# Generate a tactic candidate for a proof state
tactic = generator.generate("n : ℕ\n⊢ gcd n n = n")
# Returns a tactic string, or "sorry" on failure

# Generate a complete proof for a theorem statement
proof = generator.generate_whole_proof(
    "theorem gcd_self (n : ℕ) : Nat.gcd n n = n := by"
)

HFTacticGenerator also deduplicates across calls: once a (input, tactic) pair has been returned, it skips that pair and tries the next candidate, reducing redundant suggestions during beam search.

`ExternalProver` and `ExternalAgent`

The Python-side ExternalAgent / ExternalProver classes use HFTacticGenerator directly — they do not go through the HTTP server. They call the HuggingFace Inference API in-process:

from lean_dojo_v2.agent import ExternalAgent

agent = ExternalAgent(model_name="deepseek-ai/DeepSeek-Prover-V2-671B:novita")
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2",
)
agent.prove(whole_proof=True)

The HTTP server is for in-editor use via LeanCopilot’s Lean 4 frontend. The Python ExternalAgent and ExternalProver classes bypass the server entirely and call the HuggingFace Inference API directly from Python. You only need to run the server if you are using the Lean 4 editor extension.

Lean 4 Editor Frontend

The Lean 4 side of LeanCopilot communicates with the running server using the ExternalAPI.lean module, which provides a #suggest_tactics command and related tactics. After the server is running:

Open a Lean 4 file that imports LeanCopilot.
Invoke the suggestion tactic — LeanCopilot sends the goal over HTTP and displays the returned candidates inline.

Testing the Server

After the server is up, run the integration tests in LeanCopilotTests/ModelAPIs.lean to verify the connection:

# From the repo root, with the server already running on port 23337
lake test ModelAPIs

The test file sends sample proof states to each registered model and asserts that valid tactic strings are returned.

For LeanProgress-enhanced scoring, set LEANPROGRESS_MODEL to the path of a trained checkpoint and LEANPROGRESS_DEVICE to "cuda" (or "cpu") before starting the server. See the LeanProgress guide for training instructions.

Get Started

Concepts

Guides

Reference

Serving External Models via the LeanDojo v2 API

Setting Up the Server

Available Endpoints

`POST /generate`

`POST /encode`

Model Bundles

`HFTacticGenerator` — HuggingFace Inference API Backend

`ExternalProver` and `ExternalAgent`

Lean 4 Editor Frontend

Testing the Server

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

​Setting Up the Server

​Available Endpoints

​POST /generate

​POST /encode

​Model Bundles

​HFTacticGenerator — HuggingFace Inference API Backend

​ExternalProver and ExternalAgent

​Lean 4 Editor Frontend

​Testing the Server

Build docs developers (and LLMs) love

Setting Up the Server

Available Endpoints

`POST /generate`

`POST /encode`

Model Bundles

`HFTacticGenerator` — HuggingFace Inference API Backend

`ExternalProver` and `ExternalAgent`

Lean 4 Editor Frontend

Testing the Server