Browser requirements
Minimum requirements
| Requirement | Specification |
|---|---|
| Browser version | Chrome 57+, Firefox 52+, Safari 11+, Edge 79+ |
| JavaScript | ES6 support required |
| WebGL | WebGL 2.0 recommended (WebGL 1.0 minimum) |
| Memory | 2 GB RAM minimum, 4 GB recommended |
| Network | Broadband connection for initial model download |
Recommended browsers
Chrome / Edge
Best performance with WebGL 2.0 and optimized TensorFlow.js backend
Firefox
Good performance with full WebGL support and SIMD acceleration
Safari
Compatible on macOS and iOS with WebGL support
Mobile browsers
Supported but slower on resource-constrained devices
Model loading performance
Initial load time
The model requires a one-time download when first accessed:
- Model size: 99.3 MB (model.json + 25 weight shards)
- Network speed impact:
  - Broadband (10 Mbps): ~80 seconds
  - Fast connection (50 Mbps): ~16 seconds
  - Very fast (100 Mbps): ~8 seconds
Browsers automatically cache the model files, so subsequent page visits load almost instantly from cache.
Loading optimization
The model uses parallel shard loading to maximize download efficiency:
1. Download model.json (architecture definition)
2. Parse layer configuration and weight manifest
3. Download 25 weight shards in parallel
4. Reconstruct model weights from shards
5. Initialize the TensorFlow.js computation graph
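In code, this entire sequence is driven by a single call; a minimal sketch, assuming the files are served from an illustrative /models/ path:

```js
import * as tf from '@tensorflow/tfjs';

// Downloads model.json, fetches the weight shards in parallel,
// and initializes the computation graph. The path is illustrative.
const model = await tf.loadLayersModel('/models/model.json', {
  onProgress: (fraction) => console.log(`Loading: ${Math.round(fraction * 100)}%`),
});
```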
Caching strategy
Browser caching significantly improves load times for returning users:
- First visit: Full download (~99.3 MB)
- Subsequent visits: Cache validation only (~1-2 seconds)
- Cache duration: Controlled by HTTP cache headers
- Storage: Model files stored in browser HTTP cache
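Set long-lived Cache-Control headers when serving model files; an illustrative value (safe when shard file names are versioned, since unchanged files can then be cached indefinitely):

```
Cache-Control: public, max-age=31536000, immutable
```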
Inference performance
Prediction speed
Once loaded, the model performs real-time inference:

| Hardware | Backend | Inference time |
|---|---|---|
| Desktop (GPU) | WebGL 2.0 | 50-150 ms |
| Desktop (CPU) | WASM/CPU | 200-500 ms |
| Mobile (high-end) | WebGL | 150-400 ms |
| Mobile (mid-range) | WebGL/CPU | 400-1000 ms |
TensorFlow.js automatically selects the fastest available backend (WebGL, WASM, or CPU).
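You can inspect or override this choice through the public TF.js API; a small sketch:

```js
import * as tf from '@tensorflow/tfjs';

await tf.ready();                                 // wait for backend initialization
console.log('Active backend:', tf.getBackend());  // e.g. 'webgl', 'wasm', or 'cpu'

// Optionally force a backend before loading the model
// (the WASM backend requires the @tensorflow/tfjs-backend-wasm package):
await tf.setBackend('wasm');
```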
Inference workflow
The prediction process consists of several stages:
- Preprocessing: 5-10 ms (resize and tensor conversion)
- Model forward pass: 50-500 ms (varies by hardware)
- Post-processing: 1-2 ms (argmax and formatting)
- Total: 56-512 ms typical end-to-end time
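A sketch of that pipeline, assuming a 224×224 RGB input (the real input shape depends on the model architecture) and the `model` loaded earlier:

```js
import * as tf from '@tensorflow/tfjs';

function classify(model, imageElement) {
  return tf.tidy(() => {
    // Preprocessing: pixels -> normalized float tensor of shape [1, 224, 224, 3]
    const input = tf.browser.fromPixels(imageElement)
      .resizeBilinear([224, 224])   // assumed input size
      .toFloat()
      .div(255)
      .expandDims(0);

    // Forward pass, then post-processing: index of the highest-scoring class
    return model.predict(input).argMax(-1);
  });
}

const prediction = classify(model, document.getElementById('input-image'));
const classIndex = (await prediction.data())[0];
prediction.dispose();
```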
Memory usage
Runtime memory footprint
- Model weights: ~99 MB in memory
- Activation tensors: ~15-25 MB during inference
- Input buffer: ~0.2 MB per image
- Total peak: ~125-140 MB
Memory optimization
Automatic memory management
TensorFlow.js handles tensor lifecycle automatically:
- Tensor disposal: Intermediate tensors freed after computation
- Garbage collection: WebGL textures released when out of scope
- Memory reuse: Buffers recycled across predictions
Handling memory constraints
On low-memory devices, consider these strategies:
- Limit concurrent predictions: Process one image at a time
- Manual tensor cleanup: Use `tf.dispose()` if needed
- Monitor memory: Use `tf.memory()` to track usage
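A sketch combining these, assuming `model` and a preprocessed `inputTensor` are already in scope:

```js
import * as tf from '@tensorflow/tfjs';

// tf.tidy() releases every intermediate tensor created inside the callback.
const output = tf.tidy(() => model.predict(inputTensor));

const result = await output.data();
tf.dispose([output, inputTensor]);  // manual cleanup of tensors we still hold

// tf.memory() reports live tensor counts and bytes allocated.
console.log(tf.memory());           // { numTensors, numDataBuffers, numBytes, ... }
```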
Optimization strategies
WebGL acceleration
The model leverages GPU acceleration when available:
- Parallel computation of convolution operations
- Efficient matrix multiplications in dense layers
- Hardware-accelerated activation functions
- Reduced memory transfers between CPU and GPU

The WebGL backend provides a 3-10x speedup compared to CPU-only execution.
Model quantization
The current model uses float32 precision:
- Accuracy: Full precision for medical-grade predictions
- Trade-off: Larger file size vs. potential int8 quantization
- Future optimization: Could reduce to ~25 MB with 4x quantization
Batch inference
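While the current implementation processes single images, a TensorFlow.js model also accepts a batched input tensor, which amortizes per-call overhead when several images are queued. A minimal sketch, assuming `imageTensors` is an array of preprocessed [224, 224, 3] tensors (shape carried over from the earlier example):

```js
import * as tf from '@tensorflow/tfjs';

// Stack N preprocessed images into one [N, 224, 224, 3] batch.
const batch = tf.stack(imageTensors);

// Single forward pass over the whole batch; output shape is [N, numClasses].
const logits = model.predict(batch);
const classIndices = await logits.argMax(-1).data();

tf.dispose([batch, logits]);
```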
Network considerations
Bandwidth optimization
Serving strategies
Optimize model delivery with proper server configuration:

Compression:
- Enable gzip/brotli compression for .bin files
- Typical compression ratio: 2-3x smaller transfer size
- Example: 99 MB → 33-50 MB over network

CDN delivery:
- Serve model files from a CDN for global distribution
- Reduce latency with edge caching
- Handle traffic spikes without server load

HTTP/2:
- Multiplexed downloads of the 25 shards
- Reduced connection overhead
- Better parallel loading performance
Offline support
Progressive Web App integration
The model can be cached for offline use.
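A minimal service-worker sketch using the standard Cache API (cache name and file paths are illustrative):

```js
// service-worker.js
const MODEL_CACHE = 'model-cache-v1';
const MODEL_FILES = [
  '/models/model.json',
  // ...plus the 25 weight shard .bin files
];

self.addEventListener('install', (event) => {
  // Pre-cache the model files when the service worker installs.
  event.waitUntil(
    caches.open(MODEL_CACHE).then((cache) => cache.addAll(MODEL_FILES))
  );
});

self.addEventListener('fetch', (event) => {
  // Serve cached files first, falling back to the network.
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});
```

Benefits: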
- Zero network latency on repeat visits
- Offline functionality
- Instant predictions without internet
Performance monitoring
Tracking inference time
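A simple timing sketch; because `model.predict()` only schedules GPU work, await the output's data before stopping the clock:

```js
const t0 = performance.now();
const output = model.predict(inputTensor);
await output.data();  // forces pending GPU work to complete
console.log(`Inference took ${(performance.now() - t0).toFixed(1)} ms`);
output.dispose();
```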
Performance metrics to track
- Model load time: Time from page load to model ready
- Inference latency: Time per prediction
- Memory usage: Peak memory during inference
- Backend type: Which TensorFlow.js backend is active
- Frame rate: For real-time video inference scenarios
Use browser DevTools Performance tab to profile TensorFlow.js operations and identify bottlenecks.
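For finer-grained numbers, `tf.profile()` reports memory and kernel statistics for a single run; a sketch:

```js
const info = await tf.profile(() => model.predict(inputTensor));
console.log('New bytes allocated:', info.newBytes);
console.log('Peak bytes:', info.peakBytes);
console.log('Kernels executed:', info.kernels.length);
```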
Scalability
Concurrent users
Client-side inference scales horizontally:
- No server bottleneck: Each user runs inference locally
- Zero backend load: Model computation happens in the browser
- Cost efficiency: No GPU server infrastructure required