This page introduces the TB1 data analysis project developed by Grupo 5 for course 1ACC0216 at UPC (Universidad Peruana de Ciencias Aplicadas). The project explores a real-world hotel bookings dataset using R and ggplot2, following a structured data science workflow from raw ingestion through preprocessing and visual analysis.

Project objectives

1. Load and inspect the hotel bookings data

Read the raw CSV into R, examine variable types and distributions with str() and summary(), and count duplicate records before any transformation.

2. Clean and transform variables

Remove duplicate rows, cast categorical columns (hotel, meal, is_canceled, arrival_date_month) to factors, and convert reservation_status_date to a proper Date type.

3. Treat missing values and outliers

Impute missing values in children using the column median, then apply Winsorization at the 95th percentile to the Average Daily Rate (adr) to reduce the distortion caused by extreme values.

4. Produce and interpret visualizations

Generate eight ggplot2 charts that reveal booking volumes by hotel type, seasonal demand trends, cancellation rates, guest composition, parking demand, and the relationship between lead time and cancellations.
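The first three objectives can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the toy data frame `bookings` and all of its values are invented so the block runs without the CSV, and only the column names come from the steps above. With the real data, the first line would instead be a `read.csv()` call on the raw file. The sketch also assumes that Winsorization here means capping only the upper tail at the 95th percentile.

```r
# Toy stand-in for the raw hotel bookings data (all values are invented).
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel",
            "City Hotel", "Resort Hotel", "City Hotel"),
  is_canceled = c(0, 1, 0, 0, 1, 0),
  meal = c("BB", "HB", "BB", "SC", "BB", "BB"),
  arrival_date_month = c("July", "August", "July", "March", "January", "May"),
  children = c(0, NA, 0, 0, 2, 0),
  adr = c(80, 120, 80, 60, 95, 5000),
  reservation_status_date = c("2016-07-03", "2016-08-10", "2016-07-03",
                              "2017-03-02", "2017-01-15", "2017-05-20"),
  stringsAsFactors = FALSE
)

# Step 1: inspect types and distributions, then count duplicate rows.
str(bookings)
summary(bookings)
n_duplicates <- sum(duplicated(bookings))

# Step 2: drop duplicates, cast categorical columns, parse the date column.
bookings <- bookings[!duplicated(bookings), ]
factor_cols <- c("hotel", "meal", "is_canceled", "arrival_date_month")
bookings[factor_cols] <- lapply(bookings[factor_cols], factor)
bookings$reservation_status_date <- as.Date(bookings$reservation_status_date,
                                            format = "%Y-%m-%d")

# Step 3a: impute missing children values with the column median.
children_median <- median(bookings$children, na.rm = TRUE)
bookings$children[is.na(bookings$children)] <- children_median

# Step 3b: cap adr at its 95th percentile (assumption: upper tail only).
adr_cap <- quantile(bookings$adr, probs = 0.95, na.rm = TRUE, names = FALSE)
bookings$adr <- pmin(bookings$adr, adr_cap)
```

Counting duplicates before removing them, as in step 1, keeps a record of how many rows the cleaning step discards.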

Tools used

The analysis relies entirely on R for data manipulation and statistical operations, and ggplot2 for all visualizations. No external Python or SQL tooling is used; all processing runs inside a single R session.
library(ggplot2)
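As a rough idea of what a ggplot2 chart in this workflow might look like, the sketch below draws booking counts by hotel type, split by cancellation status. It is illustrative only and is not one of the project's eight charts: the toy data frame, the object name `p_volume`, the labels, and the theme are all assumptions.

```r
library(ggplot2)

# Toy data (invented values); the real analysis would use the cleaned dataset.
bookings <- data.frame(
  hotel = factor(c("City Hotel", "City Hotel", "Resort Hotel",
                   "City Hotel", "Resort Hotel")),
  is_canceled = factor(c(0, 1, 0, 0, 1))
)

# Side-by-side bars: one bar per hotel type and cancellation status.
p_volume <- ggplot(bookings, aes(x = hotel, fill = is_canceled)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Bookings by hotel type",
    x = "Hotel type",
    y = "Number of bookings",
    fill = "Canceled (1 = yes)"
  ) +
  theme_minimal()
```

Calling `print(p_volume)` renders the chart in an interactive session, and `ggsave()` can write it to a file.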

R scripts

The project contains two R scripts in the code/ directory:
| Script | Purpose |
| --- | --- |
| upc-grupo5-tb1.R | Main analysis script. Handles data loading, cleaning, transformation, outlier treatment, and all eight visualizations. |
| upc-grupo5-validacion.R | Supplementary validation script included in the project submission. |

All analysis steps in upc-grupo5-tb1.R are numbered to match the TB1 report sections (3.1 through 3.5), making it straightforward to cross-reference code with the written deliverable.

Explore the project

Dataset structure

Review the 32 variables in the hotel bookings dataset, grouped by category, with notes on raw and cleaned file versions.

Analysis workflow

Walk through each step of the analysis: data loading, inspection, transformation, and preprocessing.

Visualizations

See the eight ggplot2 charts produced by the analysis and the patterns each one highlights.

Key findings

Read the main insights on seasonality, cancellation drivers, and ADR distribution across hotel types.
