Mobile deployment
Deploy LFM models natively on iOS and Android using the LEAP Edge SDK.
Android
Use case: On-device chat apps, audio processing, vision models, and structured output generation
The LEAP Edge SDK for Android (Kotlin) makes small language model deployment as easy as calling a cloud LLM API endpoint. Perfect for building privacy-focused apps with real-time streaming and persistent conversation history.
Example apps:
- LeapChat — Full-featured chat with streaming
- LeapAudioDemo — Audio input/output
- VLM Example — Vision-language model integration
- Recipe Generator — Structured output
iOS
Use case: SwiftUI chat applications, on-device inference, audio demos, and structured data generation
The LEAP Edge SDK for iOS (Swift) provides native integration with SwiftUI for building modern, responsive apps. Run LFM models entirely on-device for maximum privacy and offline capability.
Example apps:
- LeapChat — SwiftUI chat interface
- Audio Demo — On-device audio processing
- Recipe Generator — JSON structured output
- LeapSloganExample — Basic text generation
Desktop deployment
Run LFM models locally on your computer using popular inference frameworks.
Mac
Use case: Local development, testing, and privacy-focused applications
Run LFM models natively on Mac using Metal acceleration for fast inference on Apple Silicon.
Tools:
- llama.cpp with Metal support
- Ollama for easy model management
- LM Studio for GUI-based interaction
llama.cpp
Use case: Command-line inference, custom integrations, maximum control
The most flexible option for running LFM models locally. Supports CPU and GPU acceleration across platforms.
Features:
- Cross-platform support (Windows, Mac, Linux)
- Quantization for reduced memory usage
- Server mode for API-like access
Ollama
Use case: Quick local deployment, model management, REST API access
Simplifies running LFM models locally with automatic model downloading and management.
Benefits:
- One-command model installation
- Built-in REST API
- Easy switching between models
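Once a model is pulled, Ollama serves its REST API on localhost:11434. A sketch of a non-streaming call to the `/api/generate` endpoint; the model tag `lfm2:1.2b` is a placeholder, so substitute whatever `ollama list` reports on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port
MODEL = "lfm2:1.2b"  # placeholder tag; use a tag shown by `ollama list`

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    # stream=False returns one JSON object instead of newline-delimited chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return its response."""
    data = json.dumps(build_generate_payload(MODEL, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    try:
        print(generate("In one sentence, what is a small language model?"))
    except OSError:
        print("Ollama is not running on localhost:11434")
```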
LM Studio
Use case: GUI-based model testing, local development, non-technical users
User-friendly desktop application for running LFM models without command-line tools.
Features:
- Visual model browser
- Interactive chat interface
- Model performance monitoring
Cloud deployment
Deploy LFM models on cloud infrastructure for scalable, production workloads.
vLLM
Use case: High-throughput production serving, batch processing
Production-grade inference server optimized for serving LLMs at scale with continuous batching and efficient memory management.
Key features:
- PagedAttention for memory efficiency
- Continuous batching for high throughput
- OpenAI-compatible API
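vLLM's server speaks the OpenAI completions protocol, and continuous batching pays off when many requests arrive at once. A sketch that fires several prompts concurrently, assuming a server was started locally (e.g. with `vllm serve`) on port 8000; the model ID is a placeholder for the Hugging Face ID of the LFM model you deploy:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint; start the server first, e.g.
#   vllm serve LiquidAI/LFM2-1.2B --port 8000
# (the model ID is a placeholder; use the Hugging Face ID you actually serve)
VLLM_URL = "http://localhost:8000/v1/completions"

def build_request(prompt: str, model: str, max_tokens: int = 64) -> dict:
    """OpenAI-style completion payload accepted by vLLM's server."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt: str, model: str = "LiquidAI/LFM2-1.2B") -> str:
    """Request a completion and return the generated text."""
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

if __name__ == "__main__":
    prompts = [f"Write a tagline for product #{i}." for i in range(8)]
    try:
        # Concurrent requests let vLLM's continuous batching fill the GPU.
        with ThreadPoolExecutor(max_workers=8) as pool:
            for text in pool.map(complete, prompts):
                print(text.strip())
    except OSError:
        print("vLLM server is not reachable on localhost:8000")
```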
Modal
Use case: Serverless deployment, auto-scaling workloads, moving from development to production
Deploy LFM models as serverless functions that scale automatically with demand.
Benefits:
- Zero infrastructure management
- Pay per execution
- GPU access on demand
Baseten
Use case: Production ML deployment, model monitoring, team collaboration
ML infrastructure platform for deploying and scaling LFM models with built-in observability.
Features:
- One-click deployment
- Built-in monitoring
- Version management
Fal
Use case: Fast inference, real-time applications, edge cloud deployment
Platform optimized for fast model serving with global edge infrastructure.
Advantages:
- Low latency worldwide
- Simple API integration
- Automatic scaling
Choosing a deployment option
Select the right deployment approach based on your requirements.
Privacy and offline capability
Best options: Mobile (iOS/Android), Desktop (llama.cpp, Ollama)
Deploy models directly on user devices for:
- Complete data privacy (no data leaves the device)
- Offline functionality
- Zero latency from network calls
- No API costs
Scalability and high throughput
Best options: Cloud (vLLM, Baseten, Modal)
Use cloud deployment when you need:
- Auto-scaling based on demand
- High concurrent request handling
- Centralized model updates
- Team collaboration features
Development and testing
Best options: Desktop (LM Studio, Ollama), Cloud (Modal)
For rapid prototyping:
- Quick model switching and testing
- GUI tools for experimentation
- Local development before production
- Easy debugging and monitoring
Cost optimization
Best options: Mobile, Desktop, Serverless (Modal, Fal)
Minimize costs by:
- Running on user devices (zero server costs)
- Pay-per-use serverless options
- Efficient inference engines (vLLM)
- Smaller models (LFM2 350M-3B)
Getting started
Ready to deploy? Here are your next steps.
Follow the quickstart
Use our platform-specific guides.
Explore examples
Check out working examples in the cookbook repository
Join the community
Get help and share your deployment on Discord
Additional resources
Community projects
See how others have deployed LFM models in production
Technical deep dives
Watch detailed deployment tutorials and best practices
LEAP SDK docs
Complete documentation for the LEAP Edge SDK
Model hub
Download LFM models from Hugging Face