The vector database is the storage and retrieval system for face embeddings. FaceNet Android uses ObjectBox with HNSW indexing to enable fast similarity search. This page explains how embeddings are stored, indexed, and queried.
What is a vector database?
A vector database specializes in storing and searching high-dimensional vectors (embeddings). Unlike traditional databases that search for exact matches, vector databases find similar vectors using distance metrics. For face recognition:
- Each face is represented as a 512-dimensional embedding
- Query: “Which stored embedding is most similar to this new face?”
- Result: Nearest neighbor(s) with similarity scores
ObjectBox overview
ObjectBox is a high-performance NoSQL database for mobile devices with built-in vector search capabilities.
Key features
- HNSW indexing: Fast approximate nearest-neighbor search
- Native code: Written in C++ for speed
- On-device: No network latency, and face data stays on the device
- Typed entities: Compile-time safety with Kotlin data classes
- Automatic indexing: Vectors indexed on insert
Database initialization
The database is initialized once at app startup, in `MainApplication.onCreate()`.
Data model
The `FaceImageRecord` entity stores each embedding together with its metadata.
Field descriptions
- `recordID`: Auto-generated unique identifier for each face record
- `personID`: Foreign key linking to `PersonRecord` (one person can have multiple face images)
- `personName`: Denormalized name for quick access without joins
- `faceEmbedding`: The 512-dimensional embedding vector with HNSW index
HNSW index configuration
The `@HnswIndex` annotation configures vector search. Available distance types include:
- `COSINE`: Measures angular similarity (used in the app)
- `EUCLIDEAN`: Measures straight-line distance
- `DOT_PRODUCT`: Measures alignment
Cosine distance is ideal for face embeddings because it’s invariant to vector magnitude, focusing purely on directional similarity.
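
To make the magnitude-invariance claim concrete, here is a minimal plain-Kotlin sketch of cosine similarity and the corresponding distance. The function names are illustrative and not taken from the app, and returning 0 for a zero vector is an assumption made here for safety.

```kotlin
import kotlin.math.sqrt

/** Cosine similarity of two equal-length vectors, in the range [-1, 1]. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same length" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // Assumption: a zero vector has no direction, so report "no similarity".
    if (normA == 0f || normB == 0f) return 0f
    return dot / (sqrt(normA) * sqrt(normB))
}

/** Cosine distance as commonly defined: 1 - cosine similarity. */
fun cosineDistance(a: FloatArray, b: FloatArray): Float = 1f - cosineSimilarity(a, b)
```

Scaling either input by a positive constant leaves the result unchanged up to floating-point rounding: `cosineSimilarity(v, w)` and `cosineSimilarity(v, scaledW)` agree when `scaledW` is `w` multiplied by 3.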
Database operations
The `ImagesVectorDB` class wraps ObjectBox operations.
Adding embeddings
Inserting a new face embedding:
- Generates a unique `recordID`
- Stores the record in the database
- Updates the HNSW index with the new embedding
Searching embeddings
Find the nearest neighbor to a query embedding:
1. HNSW search (default, fast)
- `nearestNeighbors(embedding, 10)`: Find up to 10 nearest neighbors
- `findWithScores()`: Return results together with their distance scores
- `.firstOrNull()`: Take the closest match
A `maxResultCount` of 10 is used to improve HNSW search quality. The top-10 candidates are retrieved, but only the best match is returned.
2. Flat search (precise, slower)
- Retrieves all records from database
- Splits into 4 batches for parallel processing
- Computes exact cosine similarity for each embedding
- Returns the true nearest neighbor
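
The flat search steps above can be sketched in plain Kotlin. This is a minimal sketch, not the app's implementation: `StoredFace` and `flatSearch` are hypothetical names, the records are held in memory rather than read from ObjectBox, and a fixed thread pool stands in for the app's coroutines so the example needs only the standard library. The `similarity` parameter accepts any exact measure where higher means more similar, such as cosine similarity.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

/** Hypothetical in-memory stand-in for a stored face record. */
class StoredFace(val personName: String, val embedding: FloatArray)

/** Exact search: scores every record in parallel batches and returns the best one. */
fun flatSearch(
    query: FloatArray,
    records: List<StoredFace>,
    batches: Int = 4,
    similarity: (FloatArray, FloatArray) -> Float,
): Pair<StoredFace, Float>? {
    require(batches > 0) { "batches must be positive" }
    if (records.isEmpty()) return null
    // Split the records into at most `batches` chunks of near-equal size.
    val chunkSize = (records.size + batches - 1) / batches
    val pool = Executors.newFixedThreadPool(batches)
    try {
        val tasks = records.chunked(chunkSize).map { chunk ->
            Callable {
                // Best (record, score) pair within this chunk.
                chunk.map { it to similarity(query, it.embedding) }
                    .maxByOrNull { it.second }
            }
        }
        // invokeAll blocks until every chunk has been scored.
        return pool.invokeAll(tasks)
            .mapNotNull { it.get() }
            .maxByOrNull { it.second }
    } finally {
        pool.shutdown()
    }
}
```

Because every record is scored, the result is the true nearest neighbor under the chosen measure, at the cost of work that grows linearly with the number of records.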
Deleting embeddings
Deleting a person removes all of their face records.
Similarity calculation
Both search methods use cosine similarity.
Interpretation
Cosine similarity ranges from -1 to 1:

| Range | Meaning |
|---|---|
| 0.8 - 1.0 | Very similar (likely same person) |
| 0.5 - 0.8 | Similar (possibly same person) |
| 0.3 - 0.5 | Moderately similar (threshold region) |
| 0.0 - 0.3 | Different people |
| < 0.0 | Opposite directions (very different) |
HNSW algorithm
HNSW (Hierarchical Navigable Small World) is an approximate nearest-neighbor algorithm.
How HNSW works
- Build phase (during insertion):
  - Construct a multi-layer graph
  - Each layer is a navigable small-world network
  - Higher layers are sparser, lower layers are denser
  - New vector is connected to nearby neighbors
- Search phase (during query):
  - Start at top layer
  - Greedily navigate towards the query vector
  - Descend to lower layers for refinement
  - Return k-nearest neighbors from bottom layer
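
The search phase can be illustrated with a deliberately simplified sketch. It is not ObjectBox's implementation: the graph is passed in as plain maps, the function names are hypothetical, and it returns a single closest node, whereas a real HNSW search keeps a list of candidates per layer and returns k neighbors.

```kotlin
/** Greedy walk within one layer: keep moving to a neighbor that is closer to the query. */
fun greedySearchLayer(
    layer: Map<Int, List<Int>>,
    vectors: List<FloatArray>,
    query: FloatArray,
    start: Int,
    distance: (FloatArray, FloatArray) -> Float,
): Int {
    var current = start
    var currentDist = distance(vectors[current], query)
    var improved = true
    while (improved) {
        improved = false
        // Neighbors of the node we stood on when this pass began.
        for (neighbor in layer[current].orEmpty()) {
            val d = distance(vectors[neighbor], query)
            if (d < currentDist) {
                current = neighbor
                currentDist = d
                improved = true
            }
        }
    }
    return current // local minimum: no neighbor is closer
}

/** Descends from the top (sparsest) layer to the bottom (densest) layer. */
fun layeredSearch(
    layers: List<Map<Int, List<Int>>>,
    vectors: List<FloatArray>,
    query: FloatArray,
    entryPoint: Int,
    distance: (FloatArray, FloatArray) -> Float,
): Int = layers.fold(entryPoint) { entry, layer ->
    // The result of each layer becomes the entry point of the next one.
    greedySearchLayer(layer, vectors, query, entry, distance)
}
```

The sparse top layers let the walk cover long distances in few hops, and the dense bottom layer refines the position. Because each step is greedy, the walk can stop at a local minimum, which is why the result is approximate.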
Complexity
- Insertion: O(log N) with N being the number of vectors
- Search: O(log N) on average
- Space: O(N × M) where M is average connections per node
Tradeoffs
Advantages:
- Sublinear search time (much faster than linear)
- Good recall (finds true neighbors most of the time)
- Scalable to millions of vectors
Disadvantages:
- Approximate (may miss true nearest neighbor)
- Uses extra memory for graph structure
- Slower insertion than flat storage
Search strategy comparison
HNSW search (default)
| Aspect | Performance |
|---|---|
| Search time | 5-20ms for 1000 faces |
| Accuracy | ~95% recall (may miss true neighbor) |
| Scalability | Excellent (handles 100k+ vectors) |
| Best for | Real-time recognition, large databases |
Flat search (precise)
| Aspect | Performance |
|---|---|
| Search time | 50-200ms for 1000 faces (parallelized) |
| Accuracy | 100% recall (always finds true neighbor) |
| Scalability | Poor (linear with database size) |
| Best for | High-accuracy requirements, small databases |
Configuration
Flat search is enabled in `FaceDetectionOverlay.kt`.
Parallelization
Flat search parallelizes computation across 4 coroutines, one per batch. For example, with 1000 records:
- Batch 1: Records 0-249
- Batch 2: Records 250-499
- Batch 3: Records 500-749
- Batch 4: Records 750-999
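
The index ranges above follow from a ceiling division. The sketch below assumes contiguous, near-equal batches; `batchRanges` is a hypothetical helper, and the app may split its records differently.

```kotlin
/** Splits indices 0 until total into at most `batches` contiguous ranges. */
fun batchRanges(total: Int, batches: Int = 4): List<IntRange> {
    require(total >= 0 && batches > 0) { "total must be >= 0 and batches > 0" }
    if (total == 0) return emptyList()
    val size = (total + batches - 1) / batches // ceiling division
    return (0 until total step size).map { start ->
        // The last range is clipped so it never runs past the final index.
        start..minOf(start + size - 1, total - 1)
    }
}
```

For 1000 records and 4 batches the batch size is 250, which gives the ranges 0-249, 250-499, 500-749, and 750-999. When the total is not divisible by the batch count, the last range is shorter.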
Precision refinement
After HNSW search, the app re-computes exact similarity for the returned candidate:
- ObjectBox uses lossy compression on embeddings
- Stored embeddings are not exact
- Distance returned by HNSW is approximate
- Re-computing ensures threshold comparison is accurate
This two-stage approach (fast HNSW search + precise similarity check) balances speed and accuracy.
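
A minimal sketch of the second stage, under stated assumptions: `Candidate` and `confirmMatch` are hypothetical names, the candidate is whatever the approximate search returned (or `null`), and `exactSimilarity` is any exact measure such as cosine similarity. The threshold is passed in by the caller because this page does not state the value the app uses.

```kotlin
/** Hypothetical candidate returned by the approximate (HNSW) stage. */
class Candidate(val personName: String, val embedding: FloatArray)

/**
 * Second stage: re-score the candidate exactly and apply the threshold.
 * Returns the person's name, or null when there is no confident match.
 */
fun confirmMatch(
    query: FloatArray,
    candidate: Candidate?,
    threshold: Float,
    exactSimilarity: (FloatArray, FloatArray) -> Float,
): String? {
    if (candidate == null) return null
    // Ignore the approximate score from the index and recompute it exactly.
    val score = exactSimilarity(query, candidate.embedding)
    return if (score >= threshold) candidate.personName else null
}
```

Only one exact computation is needed per query, so the refinement adds little cost on top of the HNSW lookup.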
Person database
A separate `PersonRecord` entity tracks person metadata. The two entities are related as follows:
- One `PersonRecord` has many `FaceImageRecord`s
- `personID` links the entities
- Deleting a person cascades to delete all their face embeddings
Database management
Storage location
ObjectBox stores data in the app’s private directory.
File structure
- `data.mdb`: Main database file
- `lock.mdb`: Lock file for concurrent access
- `objectbox.db`: Metadata and schema
Database size
Storage requirements:
- Each `FaceImageRecord`: ~2.5 KB (512 floats + metadata + index overhead)
- 100 faces: ~250 KB
- 1000 faces: ~2.5 MB
- 10000 faces: ~25 MB
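
The figures above can be reproduced with simple arithmetic. The sketch assumes 4-byte floats and takes the ~2.5 KB per-record figure from this page as given; the helper names are hypothetical.

```kotlin
/** Raw size of one embedding: 512 floats x 4 bytes = 2048 bytes. */
fun rawEmbeddingBytes(dimensions: Int = 512, bytesPerFloat: Int = 4): Int =
    dimensions * bytesPerFloat

/** Estimated total size in KB, assuming ~2.5 KB per record as quoted above. */
fun estimatedStorageKb(faceCount: Int, kbPerRecord: Double = 2.5): Double =
    faceCount * kbPerRecord
```

The raw embedding accounts for about 2 KB of each record, so roughly 0.5 KB is left for metadata and index overhead. `estimatedStorageKb(1000)` gives 2500 KB, which matches the ~2.5 MB listed for 1000 faces.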
Clearing database
To reset the database (clears all enrolled faces):
- Uninstall and reinstall the app, or
- Clear app data in Android settings
Performance optimization
Indexing strategy
ObjectBox automatically maintains indices:
- `@Index` on `personID`: Fast lookups by person
- `@HnswIndex` on `faceEmbedding`: Fast similarity search
- `@Id` on `recordID`: Fast direct access
Query optimization
Use a `maxResultCount` larger than the number of results you need (the app uses 10) for better HNSW quality.
Batch operations
When enrolling multiple images:
Error handling
The database layer doesn’t throw exceptions but returns nullable results. Check for `null` to handle “no match found” scenarios.
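
A small sketch of that null check, using Kotlin's safe-call and Elvis operators. `FaceMatch`, `labelFor`, and the fallback text are hypothetical and only illustrate the pattern; the app's own result type may differ.

```kotlin
/** Hypothetical result type; the app's own return type may differ. */
class FaceMatch(val personName: String, val similarity: Float)

/** Maps a nullable search result to a display label without throwing. */
fun labelFor(match: FaceMatch?, fallback: String = "Not recognized"): String =
    match?.personName ?: fallback
```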