LFM models are designed to run efficiently across a wide range of environments, from mobile devices and laptops to cloud infrastructure. This guide provides an overview of where you can deploy LFM models and how to get started with each option.

Mobile deployment

Deploy LFM models natively on iOS and Android using the LEAP Edge SDK.

Android

Use case: On-device chat apps, audio processing, vision models, and structured output generation

The LEAP Edge SDK for Android (Kotlin) makes small language model deployment as easy as calling a cloud LLM API endpoint. Perfect for building privacy-focused apps with real-time streaming and persistent conversation history.

Example apps:
  • LeapChat — Full-featured chat with streaming
  • LeapAudioDemo — Audio input/output
  • VLM Example — Vision-language model integration
  • Recipe Generator — Structured output

iOS

Use case: SwiftUI chat applications, on-device inference, audio demos, and structured data generation

The LEAP Edge SDK for iOS (Swift) provides native integration with SwiftUI for building modern, responsive apps. Run LFM models entirely on-device for maximum privacy and offline capability.

Example apps:
  • LeapChat — SwiftUI chat interface
  • Audio Demo — On-device audio processing
  • Recipe Generator — JSON structured output
  • LeapSloganExample — Basic text generation
The LEAP Edge SDK is optimized for small language models, making deployment simple and efficient. Check out the mobile deployment examples for complete implementation guides.

Desktop deployment

Run LFM models locally on your computer using popular inference frameworks.

Mac

Use case: Local development, testing, and privacy-focused applications

Run LFM models natively on Mac using Metal acceleration for fast inference on Apple Silicon.

Tools:
  • llama.cpp with Metal support
  • Ollama for easy model management
  • LM Studio for GUI-based interaction

llama.cpp

Use case: Command-line inference, custom integrations, maximum control

The most flexible option for running LFM models locally. Supports CPU and GPU acceleration across platforms.

Features:
  • Cross-platform support (Windows, Mac, Linux)
  • Quantization for reduced memory usage
  • Server mode for API-like access
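As a sketch of the server mode mentioned above: once a local `llama-server` instance is running (the model file and port here are illustrative, e.g. `llama-server -m lfm2.gguf --port 8080`), any HTTP client can call its native `/completion` endpoint. A minimal Python client, assuming that setup:

```python
import json
import urllib.request

def build_completion_request(prompt: str, n_predict: int = 64) -> dict:
    """Request body for llama.cpp's native /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.7}

def complete(prompt: str, host: str = "http://localhost:8080") -> str:
    """POST a prompt to a running llama-server and return the generated text."""
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With the server running, `complete("Explain quantization in one sentence.")` returns the model's reply; `llama-server` also exposes an OpenAI-compatible `/v1/chat/completions` route if you prefer that protocol.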

Ollama

Use case: Quick local deployment, model management, REST API access

Simplifies running LFM models locally with automatic model downloading and management.

Benefits:
  • One-command model installation
  • Built-in REST API
  • Easy switching between models
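To illustrate the built-in REST API: Ollama listens on port 11434 by default and exposes `/api/generate`. The sketch below assumes you have already pulled an LFM model locally; the model tag `lfm2` is illustrative, not a guaranteed name.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate; stream=False returns one JSON object
    instead of a stream of partial responses."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama server and return the reply text."""
    data = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Switching models is then a one-argument change, e.g. `generate("lfm2", "Hello")` versus `generate("some-other-model", "Hello")`.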

LM Studio

Use case: GUI-based model testing, local development, non-technical users

User-friendly desktop application for running LFM models without command-line tools.

Features:
  • Visual model browser
  • Interactive chat interface
  • Model performance monitoring

Cloud deployment

Deploy LFM models on cloud infrastructure for scalable, production workloads.

vLLM

Use case: High-throughput production serving, batch processing

Production-grade inference server optimized for serving LLMs at scale with continuous batching and efficient memory management.

Key features:
  • PagedAttention for memory efficiency
  • Continuous batching for high throughput
  • OpenAI-compatible API
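Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client works against it unchanged. A minimal standard-library sketch (the model id and default port are assumptions; match them to whatever you passed to `vllm serve`):

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> dict:
    """OpenAI-style chat.completions body accepted by vLLM's server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }

def chat(model: str, user_message: str,
         base_url: str = "http://localhost:8000/v1") -> str:
    """Call a running vLLM server's OpenAI-compatible endpoint."""
    data = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    return out["choices"][0]["message"]["content"]
```

The same function works against any OpenAI-compatible server by changing `base_url`, which is what makes this protocol convenient for moving between local and cloud deployments.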

Modal

Use case: Serverless deployment, auto-scaling workloads, development to production

Deploy LFM models as serverless functions that scale automatically based on demand.

Benefits:
  • Zero infrastructure management
  • Pay per execution
  • GPU access on demand

Baseten

Use case: Production ML deployment, model monitoring, team collaboration

ML infrastructure platform for deploying and scaling LFM models with built-in observability.

Features:
  • One-click deployment
  • Built-in monitoring
  • Version management

Fal

Use case: Fast inference, real-time applications, edge cloud deployment

Platform optimized for fast model serving with global edge infrastructure.

Advantages:
  • Low latency worldwide
  • Simple API integration
  • Automatic scaling

Choosing a deployment option

Select the right deployment approach based on your requirements:
Privacy and offline use

Best options: Mobile (iOS/Android), Desktop (llama.cpp, Ollama)

Deploy models directly on user devices for:
  • Complete data privacy (no data leaves the device)
  • Offline functionality
  • Zero latency from network calls
  • No API costs

Scalable production workloads

Best options: Cloud (vLLM, Baseten, Modal)

Use cloud deployment when you need:
  • Auto-scaling based on demand
  • High concurrent request handling
  • Centralized model updates
  • Team collaboration features

Rapid prototyping

Best options: Desktop (LM Studio, Ollama), Cloud (Modal)

For rapid prototyping:
  • Quick model switching and testing
  • GUI tools for experimentation
  • Local development before production
  • Easy debugging and monitoring

Cost optimization

Best options: Mobile, Desktop, Serverless (Modal, Fal)

Minimize costs by:
  • Running on user devices (zero server costs)
  • Pay-per-use serverless options
  • Efficient inference engines (vLLM)
  • Smaller models (LFM2 350M-3B)

Getting started

Ready to deploy? Here are your next steps:
1. Choose your platform: select a deployment option based on your use case and requirements.
2. Follow the quickstart: use our platform-specific guides.
3. Explore examples: check out working examples in the cookbook repository.
4. Join the community: get help and share your deployment on Discord.

Additional resources

Community projects

See how others have deployed LFM models in production

Technical deep dives

Watch detailed deployment tutorials and best practices

LEAP SDK docs

Complete documentation for the LEAP Edge SDK

Model hub

Download LFM models from Hugging Face
