LFM models are designed to run efficiently across a wide range of environments, from mobile devices and laptops to cloud infrastructure. This guide provides an overview of where you can deploy LFM models and how to get started with each option.

Mobile deployment

Deploy LFM models natively on iOS and Android using the LEAP Edge SDK.

Android

Use case: On-device chat apps, audio processing, vision models, and structured output generation

The LEAP Edge SDK for Android (Kotlin) makes small language model deployment as easy as calling a cloud LLM API endpoint. Perfect for building privacy-focused apps with real-time streaming and persistent conversation history.

Example apps:
  • LeapChat — Full-featured chat with streaming
  • LeapAudioDemo — Audio input/output
  • VLM Example — Vision-language model integration
  • Recipe Generator — Structured output

iOS

Use case: SwiftUI chat applications, on-device inference, audio demos, and structured data generation

The LEAP Edge SDK for iOS (Swift) provides native integration with SwiftUI for building modern, responsive apps. Run LFM models entirely on-device for maximum privacy and offline capability.

Example apps:
  • LeapChat — SwiftUI chat interface
  • Audio Demo — On-device audio processing
  • Recipe Generator — JSON structured output
  • LeapSloganExample — Basic text generation
The LEAP Edge SDK is optimized for small language models, making deployment simple and efficient. Check out the mobile deployment examples for complete implementation guides.

Desktop deployment

Run LFM models locally on your computer using popular inference frameworks.

Mac

Use case: Local development, testing, and privacy-focused applications

Run LFM models natively on Mac using Metal acceleration for fast inference on Apple Silicon.

Tools:
  • llama.cpp with Metal support
  • Ollama for easy model management
  • LM Studio for GUI-based interaction

llama.cpp

Use case: Command-line inference, custom integrations, maximum control

The most flexible option for running LFM models locally. Supports CPU and GPU acceleration across platforms.

Features:
  • Cross-platform support (Windows, Mac, Linux)
  • Quantization for reduced memory usage
  • Server mode for API-like access
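As a sketch of the server mode mentioned above: once a local `llama-server` instance is running (the model file and port here are illustrative, e.g. `llama-server -m lfm2.gguf --port 8080`), any HTTP client can call its native `/completion` endpoint. A minimal Python client, assuming that setup:

```python
import json
import urllib.request

def build_completion_request(prompt: str, n_predict: int = 64) -> dict:
    """Request body for llama.cpp's native /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.7}

def complete(prompt: str, host: str = "http://localhost:8080") -> str:
    """POST a prompt to a running llama-server and return the generated text."""
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With the server running, `complete("Explain quantization in one sentence.")` returns the model's reply; `llama-server` also exposes an OpenAI-compatible `/v1/chat/completions` route if you prefer that protocol.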

Ollama

Use case: Quick local deployment, model management, REST API access

Simplifies running LFM models locally with automatic model downloading and management.

Benefits:
  • One-command model installation
  • Built-in REST API
  • Easy switching between models
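To illustrate the built-in REST API: Ollama listens on port 11434 by default and exposes `/api/generate`. The sketch below assumes you have already pulled an LFM model locally; the model tag `lfm2` is illustrative, not a guaranteed name.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate; stream=False returns one JSON object
    instead of a stream of partial responses."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama server and return the reply text."""
    data = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Switching models is then a one-argument change, e.g. `generate("lfm2", "Hello")` versus `generate("some-other-model", "Hello")`.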

LM Studio

Use case: GUI-based model testing, local development, non-technical users

User-friendly desktop application for running LFM models without command-line tools.

Features:
  • Visual model browser
  • Interactive chat interface
  • Model performance monitoring

Cloud deployment

Deploy LFM models on cloud infrastructure for scalable, production workloads.

vLLM

Use case: High-throughput production serving, batch processing

Production-grade inference server optimized for serving LLMs at scale with continuous batching and efficient memory management.

Key features:
  • PagedAttention for memory efficiency
  • Continuous batching for high throughput
  • OpenAI-compatible API
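Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client works against it unchanged. A minimal standard-library sketch (the model id and default port are assumptions; match them to whatever you passed to `vllm serve`):

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> dict:
    """OpenAI-style chat.completions body accepted by vLLM's server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }

def chat(model: str, user_message: str,
         base_url: str = "http://localhost:8000/v1") -> str:
    """Call a running vLLM server's OpenAI-compatible endpoint."""
    data = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    return out["choices"][0]["message"]["content"]
```

The same function works against any OpenAI-compatible server by changing `base_url`, which is what makes this protocol convenient for moving between local and cloud deployments.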

Modal

Use case: Serverless deployment, auto-scaling workloads, development to production

Deploy LFM models as serverless functions that scale automatically based on demand.

Benefits:
  • Zero infrastructure management
  • Pay per execution
  • GPU access on demand

Baseten

Use case: Production ML deployment, model monitoring, team collaboration

ML infrastructure platform for deploying and scaling LFM models with built-in observability.

Features:
  • One-click deployment
  • Built-in monitoring
  • Version management

Fal

Use case: Fast inference, real-time applications, edge cloud deployment

Platform optimized for fast model serving with global edge infrastructure.

Advantages:
  • Low latency worldwide
  • Simple API integration
  • Automatic scaling

Choosing a deployment option

Select the right deployment approach based on your requirements:
Privacy and offline use

Best options: Mobile (iOS/Android), Desktop (llama.cpp, Ollama)

Deploy models directly on user devices for:
  • Complete data privacy (no data leaves the device)
  • Offline functionality
  • Zero latency from network calls
  • No API costs

Scalable production workloads

Best options: Cloud (vLLM, Baseten, Modal)

Use cloud deployment when you need:
  • Auto-scaling based on demand
  • High concurrent request handling
  • Centralized model updates
  • Team collaboration features

Rapid prototyping

Best options: Desktop (LM Studio, Ollama), Cloud (Modal)

For rapid prototyping:
  • Quick model switching and testing
  • GUI tools for experimentation
  • Local development before production
  • Easy debugging and monitoring

Cost optimization

Best options: Mobile, Desktop, Serverless (Modal, Fal)

Minimize costs by:
  • Running on user devices (zero server costs)
  • Pay-per-use serverless options
  • Efficient inference engines (vLLM)
  • Smaller models (LFM2 350M-3B)

Getting started

Ready to deploy? Here are your next steps:
1. Choose your platform: select a deployment option based on your use case and requirements.
2. Follow the quickstart: use our platform-specific guides.
3. Explore examples: check out working examples in the cookbook repository.
4. Join the community: get help and share your deployment on Discord.

Additional resources

Community projects

See how others have deployed LFM models in production

Technical deep dives

Watch detailed deployment tutorials and best practices

LEAP SDK docs

Complete documentation for the LEAP Edge SDK

Model hub

Download LFM models from Hugging Face
