The hotel bookings dataset contains reservation records from two property types, a City Hotel and a Resort Hotel, spanning multiple years. Each row represents one booking and captures information about the guest, the reservation parameters, the assigned room, financial details, and the final reservation status. The dataset is publicly available and widely used in hospitality analytics research.
Dataset files
Two versions of the dataset are used in this project:

| File | Description |
|---|---|
| data/hotel_bookings.csv | Raw source file loaded directly into R at the start of the analysis. Contains all original records including duplicates and missing values. |
| data/hotel_bookings_cleaned.csv | Preprocessed version saved after cleaning steps. Used for validation and as the basis for all visualizations. |

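Since the raw file keeps exact duplicate rows, it can help to count them before cleaning. The following is a minimal sketch on a small invented data frame standing in for data/hotel_bookings.csv; the object names `raw`, `n_dupes`, and `deduped` are hypothetical and the values are made up for illustration.

```r
# Toy stand-in for the raw data (invented values, not real records).
raw <- data.frame(
  hotel     = c("City Hotel", "City Hotel", "Resort Hotel", "City Hotel"),
  lead_time = c(10, 10, 45, 3),
  adr       = c(80, 80, 120.5, 95),
  stringsAsFactors = FALSE
)

# duplicated() flags each row that repeats an earlier row exactly.
n_dupes <- sum(duplicated(raw))

# unique() keeps the first occurrence of every distinct row.
deduped <- unique(raw)

cat("duplicate rows:", n_dupes, "\n")
cat("rows before:", nrow(raw), "rows after:", nrow(deduped), "\n")
```

On the real file, the same two calls would be applied to the data frame returned by `read.csv()`.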
Three preprocessing steps transform the raw file into the cleaned version: deduplication (removing exact duplicate rows with unique()), median imputation for missing values in the children column, and Winsorization of the adr variable at the 95th percentile to cap extreme outlier values without discarding records.
Variable reference
The dataset contains 32 columns. The tables below list each variable and its meaning, grouped by topic.
Booking information
These variables describe the reservation itself — when it was made, how it was modified, and its final outcome.

| Variable | Description |
|---|---|
| is_canceled | Whether the booking was canceled (1) or honored (0). Cast to factor during analysis. |
| lead_time | Number of days between the booking date and the arrival date. |
| booking_changes | Number of changes made to the reservation before arrival or cancellation. |
| days_in_waiting_list | Days the booking spent on a waiting list before confirmation. |
| deposit_type | Type of deposit made: No Deposit, Non Refund, or Refundable. |
| reservation_status | Final status: Check-Out, Canceled, or No-Show. |
| reservation_status_date | Date of the last reservation status update. Cast to Date during analysis. |
| agent | ID of the travel agency that made the booking (NULL if direct). |
| company | ID of the company that made or guaranteed the booking (NULL if individual). |

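The two type casts mentioned in this table can be sketched as below. This is an illustration on invented rows, not the project's actual code: the factor labels "honored" and "canceled" and the assumption that dates are stored as YYYY-MM-DD strings are both hypothetical.

```r
# Invented rows for illustration only.
bookings <- data.frame(
  is_canceled             = c(0L, 1L, 1L),
  reservation_status      = c("Check-Out", "Canceled", "No-Show"),
  reservation_status_date = c("2015-07-03", "2015-05-06", "2016-01-15"),
  stringsAsFactors = FALSE
)

# Cast the 0/1 flag to a factor; the labels are an assumption.
bookings$is_canceled <- factor(
  bookings$is_canceled,
  levels = c(0, 1),
  labels = c("honored", "canceled")
)

# Cast the status date to Date, assuming a YYYY-MM-DD text format.
bookings$reservation_status_date <- as.Date(
  bookings$reservation_status_date,
  format = "%Y-%m-%d"
)

str(bookings)
```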
Hotel and arrival information
These variables identify the property and describe when the guest was expected to arrive.

| Variable | Description |
|---|---|
| hotel | Property type: Resort Hotel or City Hotel. Cast to factor during analysis. |
| arrival_date_year | Year of the scheduled arrival date. |
| arrival_date_month | Month of the scheduled arrival date (character, cast to factor). |
| arrival_date_week_number | ISO week number of the scheduled arrival. |
| arrival_date_day_of_month | Day of the month of the scheduled arrival. |
| stays_in_weekend_nights | Number of weekend nights (Saturday and Sunday) included in the stay. |
| stays_in_week_nights | Number of weekday nights (Monday through Friday) included in the stay. |
| reserved_room_type | Room type code originally reserved by the guest. |
| assigned_room_type | Room type code actually assigned at check-in (may differ from reserved). |
| market_segment | Market segment through which the booking arrived (e.g., Online TA, Direct). |
| distribution_channel | Booking distribution channel (e.g., Direct, Corporate, TA/TO). |

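When arrival_date_month is cast to a factor, the default level order is alphabetical, which puts April before January in tables and charts. One possible way to keep calendar order is sketched below, assuming the column holds full English month names; that assumption, the invented rows, and the object names `arrivals` and `sorted_arrivals` are not taken from the project.

```r
# Invented rows for illustration only.
arrivals <- data.frame(
  hotel              = c("Resort Hotel", "City Hotel", "City Hotel"),
  arrival_date_month = c("July", "January", "December"),
  stringsAsFactors = FALSE
)

# Plain cast for the property type.
arrivals$hotel <- factor(arrivals$hotel)

# month.name is R's built-in vector of English month names in calendar order.
arrivals$arrival_date_month <- factor(
  arrivals$arrival_date_month,
  levels = month.name
)

# Ordering by the factor now follows the calendar, not the alphabet.
sorted_arrivals <- arrivals[order(arrivals$arrival_date_month), ]
print(sorted_arrivals)
```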
Guest information
These variables describe the guests and their history with the property.

| Variable | Description |
|---|---|
| adults | Number of adults in the reservation. |
| children | Number of children. Contains missing values; imputed with the column median during preprocessing. |
| babies | Number of babies in the reservation. |
| meal | Meal plan booked: BB (Bed & Breakfast), HB (Half Board), FB (Full Board), SC (Self-Catering). Cast to factor. |
| country | Country of origin of the guest (ISO 3166-1 alpha-3 code). |
| customer_type | Guest category: Transient, Contract, Group, or Transient-Party. |
| is_repeated_guest | Whether the guest has previously stayed at the property (1 = yes, 0 = no). |
| previous_cancellations | Number of prior reservations the guest canceled before the current booking. |
| previous_bookings_not_canceled | Number of prior reservations the guest completed without canceling. |
| required_car_parking_spaces | Number of parking spaces requested by the guest. |
| total_of_special_requests | Total number of special requests made (e.g., high floor, twin beds). |

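The median imputation noted for children can be sketched as follows. This is a minimal illustration on an invented data frame rather than the project's own code; the names `guests` and `child_median` are hypothetical.

```r
# Invented rows with two missing values in children.
guests <- data.frame(
  adults   = c(2, 1, 2, 2, 3),
  children = c(0, NA, 2, 0, NA)
)

# Median of the observed values only (NA entries are ignored).
child_median <- median(guests$children, na.rm = TRUE)

# Replace each missing entry with that median.
guests$children[is.na(guests$children)] <- child_median

print(guests)
```

The median is computed before any value is replaced, so the imputed entries do not influence it.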
Financial information
These variables capture pricing and revenue-related data.

| Variable | Description |
|---|---|
| adr | Average Daily Rate: mean cost per night for the reservation, in euros. Winsorized at the 95th percentile to handle outliers. |

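Capping adr at the 95th percentile could look like the sketch below. It assumes a one-sided cap, where only values above the 95th percentile are pulled down, and uses R's default quantile method; the project may differ on either point. The values and the names `cap` and `adr_winsorized` are invented for illustration.

```r
# Invented nightly rates with one extreme outlier.
adr <- c(60, 75, 80, 90, 95, 100, 110, 120, 150, 5400)

# 95th percentile of the observed rates (default quantile type).
cap <- quantile(adr, probs = 0.95, names = FALSE)

# pmin() replaces every value above the cap with the cap itself,
# so no record is dropped.
adr_winsorized <- pmin(adr, cap)

cat("cap:", cap, "\n")
cat("values changed:", sum(adr_winsorized != adr), "\n")
```

On a sample this small the cap still lands far above the typical rate, because the outlier itself influences the percentile; with many rows the cap sits much closer to the bulk of the data.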
Column header (raw CSV)
The raw CSV file opens with a header row that names all 32 columns.
Analysis workflow
See how the dataset is loaded, inspected, and transformed step by step in R.
Visualizations
Explore the eight ggplot2 charts derived from these variables.