Before the UK Travel Recommendation app can suggest any destinations, the database must be populated with attraction records — including names, geographic coordinates, categories, and pre-computed vector embeddings. These embeddings are what power the personalised recommendation engine: when a user swipes on attractions, their preferences are compared against stored vectors to surface similar places they are likely to enjoy. The load_attractions Django management command handles the entire import process, joining two source files, computing a final normalised embedding for each record, and bulk-inserting everything into the pgvector-enabled PostgreSQL database.

Overview

Attraction data is not generated at runtime — it is sourced from two pre-prepared files that must be present in the backend/ directory before you run the import command. Once loaded, the Attraction table contains every record needed for the recommendation engine to operate. Without this data, API calls to recommendation endpoints will return empty results. The import workflow is:
  1. Ensure the database is running and migrations have been applied
  2. Place the two data files in backend/
  3. Execute python manage.py load_attractions inside the backend container
  4. Confirm the “DB Load Complete!” message in the terminal output

Required Data Files

Two files must both be present in the backend/ directory (the directory that is mounted as /app inside the container) before running the command.

uk_attractions_main_valid_locations.csv

Format: CSV, semicolon-delimited (;), UTF-8 encoded.
Contains the human-readable metadata for each attraction. Key columns:

| Column | Description |
| --- | --- |
| id | Unique identifier; used as the join key with the pickle file |
| name | Attraction name |
| parentTypeLabel | Broad category (e.g. Museum) |
| typeLabel | Specific sub-type label |
| coordinates | WKT POINT(longitude latitude) string |
| wikipedia | Wikipedia page URL |
| summary | Text description used for embedding |
| image | Raw image field; dropped before the join |
| image_path | Path or URL to the attraction image (kept from the CSV; the pickle's copy is dropped before the join) |
| county | UK county |
| region | UK region |
| country | Country within the UK (England, Scotland, Wales, etc.) |

uk_attractions_vectors.pkl

Format: Pandas DataFrame serialised as a Python pickle (.pkl).
Contains pre-computed vector embeddings indexed by the same id column. Columns:

| Column | Dimensions | Description |
| --- | --- | --- |
| labelMHE | 9-dim | Multi-hot encoded category label vector |
| labelVectors | 384-dim | Semantic embedding of the attraction’s type labels |
| summaryVectors | 384-dim | Semantic embedding of the attraction’s text summary |
| name | n/a | Attraction name (dropped before join to avoid duplicates) |
| image_path | n/a | Image path (dropped before join; taken from CSV instead) |

Both files must be placed inside the backend/ directory on your host machine. Because backend/ is mounted as a volume into the container at /app, files you add there become immediately visible to the running container without a rebuild.
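
Before running the import, it can help to confirm that both files are in place and that the CSV header carries the columns the command relies on. The following is a minimal sketch, not part of the project: the function names `preflight` and `missing_csv_columns` and the `REQUIRED_COLUMNS` set are hypothetical, and the column names are taken from the column list above. It deliberately does not open the pickle file, since unpickling is best left to the command itself.

```python
import csv
from pathlib import Path

# Assumption: the columns the import needs, taken from the column list above.
REQUIRED_COLUMNS = {"id", "name", "parentTypeLabel", "typeLabel", "coordinates", "summary"}


def missing_csv_columns(csv_path):
    """Return the required columns that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # The source CSV is semicolon-delimited, so only the first row is read.
        header = next(csv.reader(handle, delimiter=";"), [])
    return REQUIRED_COLUMNS - set(header)


def preflight(backend_dir="backend"):
    """Collect readable problems; an empty list means the files look usable."""
    backend = Path(backend_dir)
    csv_path = backend / "uk_attractions_main_valid_locations.csv"
    pkl_path = backend / "uk_attractions_vectors.pkl"

    problems = [f"missing file: {path}" for path in (csv_path, pkl_path) if not path.is_file()]
    if csv_path.is_file():
        missing = missing_csv_columns(csv_path)
        if missing:
            problems.append(f"CSV lacks columns: {sorted(missing)}")
    return problems
```

Calling `preflight()` from the project root returns an empty list when both files exist and the header is complete; otherwise each entry describes one problem to fix before running load_attractions.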

Running the Command

1. Copy the data files to the backend directory

Place both data files in the backend/ folder of the project:
cp /path/to/uk_attractions_main_valid_locations.csv backend/
cp /path/to/uk_attractions_vectors.pkl backend/
Verify the files are present:
ls backend/uk_attractions_*
# backend/uk_attractions_main_valid_locations.csv
# backend/uk_attractions_vectors.pkl

2. Ensure the backend container is running

If the stack is not already up, start it:
docker compose up -d
Confirm the backend container is healthy:
docker ps --filter name=backend_csproj
The STATUS column should show Up (not Exited or Restarting).

3. Run the load_attractions command

Execute the management command inside the running container:
docker exec backend_csproj python manage.py load_attractions
The command prints a progress dot (.) for each batch of 1 000 records inserted. A typical run for a large dataset might look like:
..........
DB Load Complete!

4. Confirm the import succeeded

You can verify the data was loaded by checking the record count directly in the database:
docker exec postgres_vector psql -U postgres -d uktravel \
  -c "SELECT COUNT(*) FROM recommendations_attraction;"
Or via the Django shell inside the backend container:
docker exec -it backend_csproj python manage.py shell -c \
  "from recommendations.models import Attraction; print(Attraction.objects.count())"

What the Command Does

Understanding the internals of load_attractions helps you diagnose data issues and adapt the command if your source files change.
The command reads both files into Pandas DataFrames:
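import pandas as pd

# Attraction metadata: semicolon-delimited, UTF-8 encoded CSV
df = pd.read_csv("uk_attractions_main_valid_locations.csv", sep=";", encoding="utf-8")
# Pre-computed embeddings: pickled DataFrame indexed by id
vectors_df = pd.read_pickle("uk_attractions_vectors.pkl")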
The two DataFrames are joined on the id column. The image column is dropped from the CSV DataFrame before the join, and the name and image_path columns are dropped from the vectors DataFrame to avoid duplicates:
df2 = df.drop(columns="image").join(vectors_df.drop(columns=['name', 'image_path']), on="id")
Each row’s coordinates field contains a Well-Known Text (WKT) string in the format POINT(longitude latitude). The command parses this with the shapely library:
from shapely import wkt

location = wkt.loads(row['coordinates'])
latitude = location.y   # WKT stores y as latitude
longitude = location.x  # WKT stores x as longitude
These are then stored as separate numeric fields on the Attraction model.
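
The axis order is easy to get wrong, because WKT puts longitude first. The sketch below is a dependency-free illustration of the same extraction using a regular expression; it is not how the command works (the command uses shapely), and the helper name `parse_point` is hypothetical. It only handles plain decimal numbers, not exponent notation.

```python
import re

# Matches "POINT(<x> <y>)"; x is the longitude and y is the latitude.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(wkt_text):
    """Return (latitude, longitude) from a WKT POINT string."""
    match = _POINT_RE.match(wkt_text)
    if match is None:
        raise ValueError(f"not a WKT POINT: {wkt_text!r}")
    # WKT order is x then y, i.e. longitude then latitude.
    longitude, latitude = (float(group) for group in match.groups())
    return latitude, longitude


# Example coordinates (illustrative values, not taken from the dataset).
lat, lon = parse_point("POINT(-0.1276 51.5072)")
print(lat, lon)  # 51.5072 -0.1276
```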
The three pre-computed vectors from the pickle file are concatenated and normalised into a single finalVector that the recommendation engine uses for cosine-similarity lookups:
finalVector = normalize(row['labelMHE'], row['labelVectors'], row['summaryVectors']).tolist()
The normalize function concatenates the three arrays (9 + 384 + 384 = 777 dimensions) and applies L2 normalisation so that all stored vectors sit on the unit sphere, making cosine similarity equivalent to a dot product.
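
To make the concatenate-then-normalise step concrete, here is a minimal sketch in plain Python. It is an assumption about what normalize does based on the description above, not the project's implementation (which presumably operates on NumPy arrays); the names `normalize_sketch` and `dot` and the toy input values are hypothetical.

```python
import math


def normalize_sketch(label_mhe, label_vectors, summary_vectors):
    """Concatenate three vectors and scale the result to unit L2 length."""
    combined = [float(v) for v in (*label_mhe, *label_vectors, *summary_vectors)]
    norm = math.sqrt(math.fsum(v * v for v in combined))
    if norm == 0.0:
        return combined  # all-zero input: nothing to scale
    return [v / norm for v in combined]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return math.fsum(x * y for x, y in zip(a, b))


# Toy inputs with the dimensions described above (9 + 384 + 384).
a = normalize_sketch([1] + [0] * 8, [0.5] * 384, [0.25] * 384)
b = normalize_sketch([0, 1] + [0] * 7, [0.5] * 384, [0.1] * 384)

print(len(a))               # 777
print(round(dot(a, a), 6))  # 1.0
print(dot(a, b))            # equals the cosine similarity, since both have unit length
```

Because both vectors have length 1, the denominator of the cosine formula is 1, which is why a plain dot product is enough at query time.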
Attraction objects are accumulated in a list and written to the database with Attraction.objects.bulk_create() in batches of 1 000 rows. This is significantly faster than inserting one row at a time, and it avoids accumulating every Attraction object in memory before a single, very large write.
Attraction.objects.bulk_create(objects_to_create, ignore_conflicts=True)
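
The batching pattern itself can be sketched without Django. In the example below, `insert_batch` is a hypothetical stand-in for the bulk_create call, and `batched` and `load_in_batches` are illustrative names rather than functions from the project; the real command may structure its loop differently.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, insert_batch, batch_size=1000):
    """Call insert_batch once per chunk and return the number of chunks."""
    count = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)  # stand-in for the bulk_create call
        print(".", end="", flush=True)  # one progress dot per batch
        count += 1
    print()
    return count


sizes = []
load_in_batches(range(2500), lambda batch: sizes.append(len(batch)))
print(sizes)  # [1000, 1000, 500]
```

With 2 500 input rows the sketch prints three dots, matching the one-dot-per-batch progress output described earlier.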

Re-Loading Data

The load_attractions command is designed to be idempotent. Every time it runs, it deletes all existing attraction records before inserting the new ones:
Attraction.objects.all().delete()
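This means you can safely re-run the command as many times as needed (for example, after updating the source CSV or regenerating embeddings) without ending up with duplicate records. Each successful run replaces the full dataset. Because the delete happens before the import, a run that fails partway through is not guaranteed to leave the previous data in place; see the warning under Prerequisites Reminder.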
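If you want to test a small subset of attractions locally, you can truncate the source CSV to a few hundred rows before running the command. DataFrame.join performs a left join by default, so id values that appear only in the pickle file are dropped silently. The reverse does not hold: id values that appear only in the CSV are kept with missing vector values, so truncating the pickle alone is likely to cause errors unless the command filters those rows out.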
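
When working with truncated files, it is useful to know in advance which id values the two sources share. The helper below is a hypothetical sketch using plain sets; in practice the inputs would presumably be the CSV's id column and the index of the vectors DataFrame, which is an assumption based on the join shown earlier.

```python
def compare_ids(csv_ids, vector_ids):
    """Report which ids are shared and which appear in only one source."""
    csv_ids, vector_ids = set(csv_ids), set(vector_ids)
    return {
        "matched": sorted(csv_ids & vector_ids),
        "csv_only": sorted(csv_ids - vector_ids),
        "vectors_only": sorted(vector_ids - csv_ids),
    }


# Toy ids for illustration only.
report = compare_ids([1, 2, 3, 4], [3, 4, 5])
print(report["matched"])       # [3, 4]
print(report["csv_only"])      # [1, 2]
print(report["vectors_only"])  # [5]
```

A non-empty `csv_only` list is the case to watch for, because those rows have no embeddings to normalise.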

Prerequisites Reminder

Database migrations must be applied before running load_attractions. The command inserts directly into the recommendations_attraction table, which is created by migration 0002_initial. If you have not yet run migrations, execute:
docker exec backend_csproj python manage.py migrate
See the Docker Setup page for the full migration walkthrough.
Running load_attractions permanently deletes all existing attraction records before importing from the source files. Do not run this command with incomplete or corrupted data files — doing so will leave the database empty and break the recommendation endpoints until you re-run the command with valid files.
