This page introduces the TB1 data analysis project developed by Grupo 5 for course 1ACC0216 at UPC (Universidad Peruana de Ciencias Aplicadas). The project explores a real-world hotel bookings dataset using R and ggplot2, following a structured data science workflow from raw ingestion through preprocessing and visual analysis.
Project objectives
Load and inspect the hotel bookings data
Read the raw CSV into R, examine variable types and distributions with str() and summary(), and count duplicate records before any transformation.
Clean and transform variables
Remove duplicate rows, cast categorical columns (hotel, meal, is_canceled, arrival_date_month) to factors, and convert reservation_status_date to a proper Date type.
Treat missing values and outliers
Impute missing values in children using the column median, then apply Winsorization at the 95th percentile to the Average Daily Rate (adr) to reduce the distortion caused by extreme values.
Tools used
The analysis relies entirely on R for data manipulation and statistical operations, and on ggplot2 for all visualizations. No external Python or SQL tooling is used; all processing runs inside a single R session.
R scripts
The project contains two R scripts in the code/ directory:
| Script | Purpose |
|---|---|
| upc-grupo5-tb1.R | Main analysis script. Handles data loading, cleaning, transformation, outlier treatment, and all eight visualizations. |
| upc-grupo5-validacion.R | Supplementary validation script included in the project submission. |
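The preprocessing steps listed under Project objectives can be sketched in base R roughly as follows. This is not the project's actual code: the toy data frame and its values are invented for illustration, the helper names n_dup, factor_cols, children_median, and adr_cap are hypothetical, and the sketch assumes that Winsorization caps only the upper tail of adr at the 95th percentile.

```r
# Toy stand-in for the hotel bookings data; every value here is invented.
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel",
            "City Hotel", "Resort Hotel", "City Hotel"),
  meal = c("BB", "HB", "BB", "BB", "SC", "BB"),
  is_canceled = c(0, 1, 0, 0, 1, 0),
  arrival_date_month = c("July", "August", "July", "July", "May", "July"),
  reservation_status_date = c("2016-07-01", "2016-08-15", "2016-07-01",
                              "2016-07-03", "2016-05-20", "2016-07-01"),
  children = c(0, NA, 0, 2, 1, 0),
  adr = c(80, 120, 80, 95, 5400, 80),
  stringsAsFactors = FALSE
)

# 1. Inspect types and distributions, and count duplicates before any change.
str(bookings)
summary(bookings)
n_dup <- sum(duplicated(bookings))

# 2. Drop duplicate rows, then cast categorical columns and the date column.
bookings <- bookings[!duplicated(bookings), ]
factor_cols <- c("hotel", "meal", "is_canceled", "arrival_date_month")
bookings[factor_cols] <- lapply(bookings[factor_cols], factor)
bookings$reservation_status_date <- as.Date(
  bookings$reservation_status_date, format = "%Y-%m-%d"
)

# 3. Impute missing children values with the column median.
children_median <- median(bookings$children, na.rm = TRUE)
bookings$children[is.na(bookings$children)] <- children_median

# 4. Winsorize adr: cap values above the 95th percentile (upper tail only,
#    which is an assumption about how the cap is applied).
adr_cap <- quantile(bookings$adr, probs = 0.95, na.rm = TRUE, names = FALSE)
bookings$adr <- pmin(bookings$adr, adr_cap)
```

Counting duplicates before removing them keeps the raw figure available for reporting, and computing the median after deduplication avoids letting repeated rows weight the imputed value.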
All analysis steps in upc-grupo5-tb1.R are numbered to match the TB1 report sections (3.1 through 3.5), making it straightforward to cross-reference code with the written deliverable.
Explore the project
Dataset structure
Review the 32 variables in the hotel bookings dataset, grouped by category, with notes on raw and cleaned file versions.
Analysis workflow
Walk through each step of the analysis: data loading, inspection, transformation, and preprocessing.
Visualizations
See the eight ggplot2 charts produced by the analysis and the patterns each one highlights.
Key findings
Read the main insights on seasonality, cancellation drivers, and ADR distribution across hotel types.