This documentation covers the TB1 data analysis project developed by Grupo 5 for course 1ACC0216 at UPC. The project applies R and ggplot2 to explore a hotel bookings dataset, uncovering patterns in cancellations, seasonal demand, guest composition, and pricing outliers.
Introduction
Project overview, objectives, and team context for the hotel bookings analysis.
Dataset
Explore the hotel_bookings dataset — its structure, variables, and source.
Analysis Workflow
Step-by-step walkthrough of data loading, cleaning, transformation, and preprocessing.
Visualizations
Eight ggplot2 charts revealing booking trends, cancellations, and guest behavior.
Key Findings
Main insights from the analysis: seasonality, cancellation drivers, and ADR patterns.
Conclusions
Summary of findings and implications for hotel revenue and operations management.
How the analysis works
Load and inspect the data
Read hotel_bookings.csv into R, inspect structure with str() and summary(), and identify duplicates.
Clean and transform
Remove duplicates, cast variables to factors and dates, and impute missing values in children using the median.
Treat outliers
Apply Winsorization at the 95th percentile to the Average Daily Rate (adr) variable to reduce the impact of extreme values.
All analysis code is in code/upc-grupo5-tb1.R. The cleaned dataset is saved as data/hotel_bookings_cleaned.csv.
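The three steps above can be sketched in base R as follows. This is a minimal illustration, not the contents of code/upc-grupo5-tb1.R. The helper names (load_bookings, clean_bookings, winsorize_upper) are hypothetical, and the columns hotel and reservation_status_date are assumed examples of factor and date variables. Only children and adr are named on this page. The sketch caps only the upper tail at the 95th percentile, which is one possible reading of the outlier step, and the small demo data frame is made up so the code runs without the CSV.

```r
# Step 1: load and inspect (hypothetical helper; the CSV path is an assumption).
load_bookings <- function(path = "hotel_bookings.csv") {
  bookings <- read.csv(path, stringsAsFactors = FALSE)
  str(bookings)                 # column types and first values
  print(summary(bookings))      # per-column summaries, including NA counts
  message("Duplicate rows: ", sum(duplicated(bookings)))
  bookings
}

# Step 2: remove duplicates, cast types, impute children with the median.
# factor_cols and date_cols are assumed column names, passed in as arguments.
clean_bookings <- function(bookings,
                           factor_cols = "hotel",
                           date_cols = "reservation_status_date") {
  bookings <- bookings[!duplicated(bookings), , drop = FALSE]
  for (col in factor_cols) bookings[[col]] <- as.factor(bookings[[col]])
  for (col in date_cols) bookings[[col]] <- as.Date(bookings[[col]])
  children_median <- median(bookings$children, na.rm = TRUE)
  bookings$children[is.na(bookings$children)] <- children_median
  bookings
}

# Step 3: cap values above the chosen percentile (upper tail only, by assumption).
winsorize_upper <- function(x, prob = 0.95) {
  cap <- quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  pmin(x, cap)                  # NA values stay NA
}

# Made-up demo rows (not real bookings): rows 1 and 2 are exact duplicates,
# one children value is missing, and one adr value is extreme.
demo <- data.frame(
  hotel = c("City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  reservation_status_date = c("2016-07-01", "2016-07-01", "2016-08-15",
                              "2016-09-02", "2017-01-10"),
  children = c(0, 0, NA, 2, 1),
  adr = c(80, 80, 95, 120, 5400),
  stringsAsFactors = FALSE
)

cleaned <- clean_bookings(demo)
cleaned$adr <- winsorize_upper(cleaned$adr)

# With the real file, the same calls would apply, for example:
# cleaned <- clean_bookings(load_bookings("hotel_bookings.csv"))
# cleaned$adr <- winsorize_upper(cleaned$adr)
# write.csv(cleaned, "data/hotel_bookings_cleaned.csv", row.names = FALSE)
```

Passing the factor and date columns as arguments keeps the sketch independent of the exact schema, so it can be adapted once the real column names are confirmed against the dataset page.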