The hotel bookings dataset contains reservation records from two property types — a City Hotel and a Resort Hotel — spanning multiple years. Each row represents one booking and captures information about the guest, the reservation parameters, the assigned room, financial details, and the final reservation status. The dataset is publicly available and widely used in hospitality analytics research.

Dataset files

Two versions of the dataset are used in this project:

| File | Description |
| --- | --- |
| data/hotel_bookings.csv | Raw source file loaded directly into R at the start of the analysis. Contains all original records, including duplicates and missing values. |
| data/hotel_bookings_cleaned.csv | Preprocessed version saved after the cleaning steps. Used for validation and as the basis for all visualizations. |

Three preprocessing steps transform the raw file into the cleaned version: deduplication (removing exact duplicate rows with unique()), median imputation for missing values in the children column, and Winsorization of the adr variable at the 95th percentile to cap extreme outlier values without discarding records.
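
The three steps can be sketched in base R as follows. This is a minimal illustration on a made-up data frame, not the project's actual script: the toy values, the names raw, clean, and cap, and the choice to cap only the upper tail of adr are all assumptions.

```r
# Toy data standing in for the raw file; all values are invented.
raw <- data.frame(
  children = c(0, 1, NA, 0, 0, 2),
  adr      = c(80, 95, 110, 70, 5400, 120)
)
raw <- rbind(raw, raw[1, ])  # append one exact duplicate of the first row

# Step 1: deduplication. unique() on a data frame drops exact duplicate rows.
clean <- unique(raw)

# Step 2: median imputation for the missing values in children.
clean$children[is.na(clean$children)] <- median(clean$children, na.rm = TRUE)

# Step 3: Winsorization of adr (assumed one-sided): values above the
# 95th percentile are replaced by that percentile, so no rows are dropped.
cap <- quantile(clean$adr, probs = 0.95, na.rm = TRUE)
clean$adr <- pmin(clean$adr, cap)

summary(clean)
```

Capping with pmin() keeps every record in the data frame, which matches the stated goal of limiting extreme values without discarding bookings.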

Variable reference

The dataset contains 32 columns. The tables below group the variables by theme and give the meaning of each one.

These variables describe the reservation itself: when it was made, how it was modified, and its final outcome.

| Variable | Description |
| --- | --- |
| is_canceled | Whether the booking was canceled (1) or honored (0). Cast to factor during analysis. |
| lead_time | Number of days between the booking date and the arrival date. |
| booking_changes | Number of changes made to the reservation before arrival or cancellation. |
| days_in_waiting_list | Days the booking spent on a waiting list before confirmation. |
| deposit_type | Type of deposit made: No Deposit, Non Refund, or Refundable. |
| reservation_status | Final status: Check-Out, Canceled, or No-Show. |
| reservation_status_date | Date of the last reservation status update. Cast to Date during analysis. |
| agent | ID of the travel agency that made the booking (NULL if direct). |
| company | ID of the company that made or guaranteed the booking (NULL if individual). |

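
The type casts mentioned for these variables could look like the sketch below. The toy rows, the factor labels "Honored" and "Canceled", the YYYY-MM-DD date format, and the assumption that missing agents are stored as the literal string "NULL" are illustrative guesses, not details confirmed by the project script.

```r
# Toy rows standing in for the reservation columns; values are invented.
bookings <- data.frame(
  is_canceled             = c(0L, 1L, 0L),
  reservation_status_date = c("2015-07-01", "2015-05-06", "2015-07-03"),
  agent                   = c("9", "NULL", "240"),
  stringsAsFactors        = FALSE
)

# Cast the 0/1 flag to a factor with readable (hypothetical) labels.
bookings$is_canceled <- factor(bookings$is_canceled,
                               levels = c(0, 1),
                               labels = c("Honored", "Canceled"))

# Cast the status date from character to Date (format is an assumption).
bookings$reservation_status_date <- as.Date(bookings$reservation_status_date,
                                            format = "%Y-%m-%d")

# Treat the literal string "NULL" as a missing agent ID.
bookings$agent[bookings$agent == "NULL"] <- NA

str(bookings)
```
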
These variables identify the property and describe when the guest was expected to arrive.

| Variable | Description |
| --- | --- |
| hotel | Property type: Resort Hotel or City Hotel. Cast to factor during analysis. |
| arrival_date_year | Year of the scheduled arrival date. |
| arrival_date_month | Month of the scheduled arrival date (character, cast to factor). |
| arrival_date_week_number | ISO week number of the scheduled arrival. |
| arrival_date_day_of_month | Day of the month of the scheduled arrival. |
| stays_in_weekend_nights | Number of weekend nights (Saturday and Sunday) included in the stay. |
| stays_in_week_nights | Number of weekday nights (Monday through Friday) included in the stay. |
| reserved_room_type | Room type code originally reserved by the guest. |
| assigned_room_type | Room type code actually assigned at check-in (may differ from reserved). |
| market_segment | Market segment through which the booking arrived (e.g., Online TA, Direct). |
| distribution_channel | Booking distribution channel (e.g., Direct, Corporate, TA/TO). |

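
As a small illustration of how the stay and room columns relate, the sketch below derives the total length of stay and flags bookings where the assigned room differs from the reserved one. The columns total_nights and room_changed are hypothetical helpers that do not exist in the dataset, and the rows are invented.

```r
# Toy rows standing in for the stay and room columns; values are invented.
stays <- data.frame(
  stays_in_weekend_nights = c(0, 2, 1),
  stays_in_week_nights    = c(2, 5, 0),
  reserved_room_type      = c("A", "A", "D"),
  assigned_room_type      = c("A", "C", "D"),
  stringsAsFactors        = FALSE
)

# Total nights is the sum of weekend and weekday nights (hypothetical column).
stays$total_nights <- stays$stays_in_weekend_nights + stays$stays_in_week_nights

# TRUE where the room assigned at check-in differs from the one reserved.
stays$room_changed <- stays$reserved_room_type != stays$assigned_room_type

stays
```
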
These variables describe the guests and their history with the property.

| Variable | Description |
| --- | --- |
| adults | Number of adults in the reservation. |
| children | Number of children. Contains missing values; imputed with the column median during preprocessing. |
| babies | Number of babies in the reservation. |
| meal | Meal plan booked: BB (Bed & Breakfast), HB (Half Board), FB (Full Board), SC (Self-Catering). Cast to factor. |
| country | Country of origin of the guest (ISO 3166-1 alpha-3 code). |
| customer_type | Guest category: Transient, Contract, Group, or Transient-Party. |
| is_repeated_guest | Whether the guest has previously stayed at the property (1 = yes, 0 = no). |
| previous_cancellations | Number of prior reservations the guest canceled before the current booking. |
| previous_bookings_not_canceled | Number of prior reservations the guest completed without canceling. |
| required_car_parking_spaces | Number of parking spaces requested by the guest. |
| total_of_special_requests | Total number of special requests made (e.g., high floor, twin beds). |

These variables capture pricing and revenue-related data.

| Variable | Description |
| --- | --- |
| adr | Average Daily Rate: mean cost per night for the reservation, in euros. Winsorized at the 95th percentile to handle outliers. |

Column header (raw CSV)

The raw CSV file opens with the following header row, confirming the 32-column structure:
hotel, is_canceled, lead_time, arrival_date_year, arrival_date_month,
arrival_date_week_number, arrival_date_day_of_month, stays_in_weekend_nights,
stays_in_week_nights, adults, children, babies, meal, country,
market_segment, distribution_channel, is_repeated_guest,
previous_cancellations, previous_bookings_not_canceled, reserved_room_type,
assigned_room_type, booking_changes, deposit_type, agent, company,
days_in_waiting_list, customer_type, adr, required_car_parking_spaces,
total_of_special_requests, reservation_status, reservation_status_date
To load the dataset in R and inspect its structure right away, use the two-line pattern from upc-grupo5-tb1.R. The relative path resolves against the R working directory, so it assumes the CSV file sits there:
df <- read.csv("hotel_bookings.csv", header = TRUE, stringsAsFactors = FALSE)
str(df)
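
After loading, a few quick checks help confirm the column count and reveal the missing values and duplicates that the cleaning steps address. The sketch below writes a tiny made-up CSV to a temporary file so that it runs without the real data; the four-column layout and the values are assumptions for illustration only. With the real file, ncol(df) should report 32.

```r
# Write a tiny stand-in CSV to a temporary file (invented values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("hotel,is_canceled,children,adr",
             "Resort Hotel,0,0,75.5",
             "City Hotel,1,NA,120",
             "City Hotel,1,NA,120"), tmp)

# Same loading pattern as above, pointed at the temporary file.
df <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)

ncol(df)               # number of columns
colSums(is.na(df))     # missing values per column
sum(duplicated(df))    # rows that exactly repeat an earlier row
```
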

Analysis workflow

See how the dataset is loaded, inspected, and transformed step by step in R.

Visualizations

Explore the eight ggplot2 charts derived from these variables.
