Before the UK Travel Recommendation app can suggest any destinations, the database must be populated with attraction records, including names, geographic coordinates, categories, and pre-computed vector embeddings. These embeddings power the personalised recommendation engine: when a user swipes on attractions, their preferences are compared against stored vectors to surface similar places they are likely to enjoy. The load_attractions Django management command handles the entire import process: it joins two source files, computes a final normalised embedding for each record, and bulk-inserts everything into the pgvector-enabled PostgreSQL database.
Overview
Attraction data is not generated at runtime; it is sourced from two pre-prepared files that must be present in the backend/ directory before you run the import command. Once loaded, the Attraction table contains every record the recommendation engine needs to operate. Without this data, API calls to recommendation endpoints will return empty results.
The import workflow is:
- Ensure the database is running and migrations have been applied
- Place the two data files in backend/
- Execute python manage.py load_attractions inside the backend container
- Confirm the “DB Load Complete!” message in the terminal output
Required Data Files
Both files must be present in the backend/ directory (the directory mounted as /app inside the container) before you run the command.
uk_attractions_main_valid_locations.csv
Format: CSV, semicolon-delimited (;), UTF-8 encoded
Contains the human-readable metadata for each attraction. Key columns:

| Column | Description |
|---|---|
| id | Unique identifier, used as the join key with the pickle file |
| name | Attraction name |
| parentTypeLabel | Broad category (e.g. Museum) |
| typeLabel | Specific sub-type label |
| coordinates | WKT POINT(longitude latitude) string |
| wikipedia | Wikipedia page URL |
| summary | Text description used for embedding |
| image | Raw image field; dropped before the join |
| image_path | Path or URL to the attraction image (this copy is kept after the join) |
| county | UK county |
| region | UK region |
| country | Country within the UK (England, Scotland, Wales, etc.) |
uk_attractions_vectors.pkl
Format: pandas DataFrame serialised as a Python pickle (.pkl)
Contains pre-computed vector embeddings indexed by the same id column. Columns:

| Column | Dimensions | Description |
|---|---|---|
| labelMHE | 9-dim | Multi-hot encoded category label vector |
| labelVectors | 384-dim | Semantic embedding of the attraction’s type labels |
| summaryVectors | 384-dim | Semantic embedding of the attraction’s text summary |
| name | n/a | Attraction name (dropped before join to avoid duplicates) |
| image_path | n/a | Image path (dropped before join; taken from CSV instead) |
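Before running the import, it can help to confirm that both files exist and that the CSV exposes the columns listed above. The following is a minimal, standard-library-only sketch. The function name `check_data_files` and the decision to treat every listed column as required are assumptions made for illustration; neither is part of the project.

```python
import csv
from pathlib import Path

CSV_NAME = "uk_attractions_main_valid_locations.csv"
PKL_NAME = "uk_attractions_vectors.pkl"
# Columns taken from the table above; treating all of them as required is an assumption.
EXPECTED_CSV_COLUMNS = {
    "id", "name", "parentTypeLabel", "typeLabel", "coordinates", "wikipedia",
    "summary", "image", "image_path", "county", "region", "country",
}


def check_data_files(backend_dir):
    """Return a list of problems found; an empty list means every check passed."""
    backend = Path(backend_dir)
    problems = []
    for name in (CSV_NAME, PKL_NAME):
        if not (backend / name).is_file():
            problems.append(f"missing file: {name}")

    csv_path = backend / CSV_NAME
    if csv_path.is_file():
        # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order mark.
        with csv_path.open(encoding="utf-8-sig", newline="") as handle:
            header = next(csv.reader(handle, delimiter=";"), [])
        missing = EXPECTED_CSV_COLUMNS - set(header)
        if missing:
            problems.append(f"CSV is missing columns: {sorted(missing)}")
    return problems
```

Calling `check_data_files("backend")` from the project root returns an empty list when both files are in place. The pickle is only checked for existence here, because inspecting its columns requires pandas.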
Running the Command
Copy the data files to the backend directory
Place both data files in the backend/ folder of the project, then verify that the files are present.
Ensure the backend container is running
If the stack is not already up, start it, then confirm that the backend container is healthy. The STATUS column in the container listing should show Up (not Exited or Restarting).
Run the load_attractions command
Execute the management command, python manage.py load_attractions, inside the running container. The command prints a progress dot (.) for each batch of 1 000 records inserted, and a successful run ends with the “DB Load Complete!” message.
What the Command Does
Understanding the internals of load_attractions helps you diagnose data issues and adapt the command if your source files change.
Step 1 — Load and join source files
The command reads both files into pandas DataFrames and joins them on the id column. Before the join, the image column is dropped from the CSV DataFrame, and the name and image_path columns are dropped from the vectors DataFrame to avoid duplicates.
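The load-and-join step might look roughly like the following pandas sketch. The helper names `load_and_join` and `join_frames`, the inner join, and the assumption that `id` is an ordinary column in both frames are illustrative guesses; the actual command may differ (for instance, the pickle may carry `id` as its index).

```python
import pandas as pd


def join_frames(main_df, vectors_df):
    """Join CSV metadata with the embedding frame on `id` (a sketch, not the project's code)."""
    # Drop the raw image column from the CSV side.
    main_df = main_df.drop(columns=["image"])
    # Drop columns the CSV already provides, so the merge creates no name_x / name_y pairs.
    vectors_df = vectors_df.drop(columns=["name", "image_path"])
    # Assumption: an inner join, so rows without a matching embedding are skipped.
    return main_df.merge(vectors_df, on="id", how="inner")


def load_and_join(csv_path, pkl_path):
    """Read both source files from disk and join them."""
    main_df = pd.read_csv(csv_path, sep=";", encoding="utf-8")
    vectors_df = pd.read_pickle(pkl_path)
    return join_frames(main_df, vectors_df)
```

Because `pd.read_pickle` can execute arbitrary code while loading, only unpickle files that come from a source you trust.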
Step 2 — Parse WKT coordinates
Each row’s coordinates field contains a Well-Known Text (WKT) string in the format POINT(longitude latitude). The command parses this with the shapely library, and the resulting longitude and latitude are then stored as separate numeric fields on the Attraction model.
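To make the coordinate format concrete, here is a dependency-free sketch that extracts longitude and latitude from a WKT point with a regular expression. It is a stand-in for illustration only: the command itself uses shapely, and the function name `parse_wkt_point` is hypothetical.

```python
import re

# A signed decimal number, optionally with an exponent.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Matches "POINT(lon lat)" or "POINT (lon lat)"; WKT puts longitude (x) first.
_POINT_RE = re.compile(rf"^\s*POINT\s*\(\s*({_NUM})\s+({_NUM})\s*\)\s*$", re.IGNORECASE)


def parse_wkt_point(wkt):
    """Return (longitude, latitude) as floats, or raise ValueError on malformed input."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


# Note the order: the first number is the longitude, the second the latitude.
lon, lat = parse_wkt_point("POINT(-0.1269 51.5194)")
print(lat, lon)
```

With shapely, the equivalent is `shapely.wkt.loads(text)`, which returns a Point whose `.x` attribute is the longitude and whose `.y` attribute is the latitude.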
Step 3 — Compute the final embedding vector
The three pre-computed vectors from the pickle file are concatenated and normalised into a single finalVector that the recommendation engine uses for cosine-similarity lookups. The normalize function concatenates the three arrays (9 + 384 + 384 = 777 dimensions) and applies L2 normalisation so that all stored vectors sit on the unit sphere, making cosine similarity equivalent to a dot product.
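The concatenate-and-normalise step can be sketched with NumPy as follows. The function below is written from the description above and is not the project's own `normalize` implementation; the zero-vector guard in particular is an assumption, and the random inputs are placeholders for real embeddings.

```python
import numpy as np


def normalize(label_mhe, label_vec, summary_vec):
    """Concatenate the three arrays and scale the result to unit L2 norm."""
    combined = np.concatenate([
        np.asarray(label_mhe, dtype=float),
        np.asarray(label_vec, dtype=float),
        np.asarray(summary_vec, dtype=float),
    ])
    norm = np.linalg.norm(combined)
    # Assumption: leave an all-zero vector unchanged rather than dividing by zero.
    return combined if norm == 0 else combined / norm


# Placeholder inputs with the documented sizes: 9 + 384 + 384 = 777.
rng = np.random.default_rng(0)
a = normalize(rng.integers(0, 2, 9), rng.normal(size=384), rng.normal(size=384))
b = normalize(rng.integers(0, 2, 9), rng.normal(size=384), rng.normal(size=384))

# For unit vectors the cosine-similarity denominator is 1, so it equals the dot product.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(a.shape, np.isclose(a @ b, cosine))
```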
Step 4 — Bulk insert in batches of 1 000
Attraction objects are accumulated in a list and written to the database with Attraction.objects.bulk_create() in batches of 1 000 rows. This is significantly faster than inserting one row at a time and avoids holding the entire dataset in memory or writing it as a single oversized insert.
Re-Loading Data
The load_attractions command is designed to be idempotent: every time it runs, it deletes all existing attraction records before inserting the new ones, so re-running it replaces the data rather than duplicating it.
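The batching and delete-first behaviour can be illustrated without Django. In the sketch below, `FakeTable` is a hypothetical in-memory stand-in for the attraction table and `reload_rows` is an invented helper; whether the real command prints a dot for the final partial batch is also an assumption.

```python
BATCH_SIZE = 1000


class FakeTable:
    """Hypothetical in-memory stand-in for a database table."""

    def __init__(self):
        self.rows = []
        self.insert_calls = 0

    def delete_all(self):
        self.rows.clear()

    def bulk_create(self, batch):
        self.rows.extend(batch)
        self.insert_calls += 1


def reload_rows(table, records, batch_size=BATCH_SIZE):
    """Clear the table, then insert `records` in fixed-size batches."""
    table.delete_all()  # deleting first makes repeated runs end in the same state
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            table.bulk_create(batch)
            print(".", end="", flush=True)  # one progress dot per full batch
            batch = []
    if batch:  # flush the final, partially filled batch (no dot: an assumption)
        table.bulk_create(batch)
    print("\nDB Load Complete!")


table = FakeTable()
reload_rows(table, range(2500))
reload_rows(table, range(2500))  # the second run replaces the rows instead of duplicating them
print(len(table.rows), table.insert_calls)
```

In real Django code the corresponding calls would be `Attraction.objects.all().delete()` and `Attraction.objects.bulk_create(batch)`; `bulk_create` also accepts a `batch_size` argument as an alternative to batching by hand.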
Prerequisites Reminder
Database migrations must be applied before running load_attractions. The command inserts directly into the recommendations_attraction table, which is created by migration 0002_initial. If you have not yet run migrations, apply them first with Django’s migrate command (python manage.py migrate) inside the backend container. See the Docker Setup page for the full migration walkthrough.