The final two charts in section 3.5 focus on cancellation behavior. Chart 7 compares the proportion of canceled bookings between Resort Hotel and City Hotel using a stacked proportional bar chart. Chart 8 uses a boxplot of lead time grouped by cancellation status to examine whether guests who cancel tend to have booked further in advance. Together, these charts show which hotel type faces greater cancellation risk and whether early booking signals eventual cancellation.

Lead time is the number of days between the date a reservation is made and the scheduled arrival date. A high lead time means the guest booked far in advance; a low lead time means the booking was made close to arrival.
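As a minimal sketch of the lead time definition, the base R snippet below subtracts a booking date from an arrival date. The `bookings` data frame and its `booking_date` and `arrival_date` columns are hypothetical and exist only for illustration; the script itself reads `lead_time` directly from the dataset.

```r
# Hypothetical booking and arrival dates, for illustration only
bookings <- data.frame(
  booking_date = as.Date(c("2026-01-10", "2026-03-01", "2026-06-28")),
  arrival_date = as.Date(c("2026-07-01", "2026-03-03", "2026-07-01"))
)

# Lead time in whole days: arrival date minus booking date
bookings$lead_time <- as.integer(bookings$arrival_date - bookings$booking_date)

print(bookings)
```

The first row is a booking made far in advance (high lead time), while the other two were made only a few days before arrival.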

Chart 7 — Proporción de Cancelaciones por Hotel

Chart 7 uses position = "fill" inside geom_bar(), which rescales all bars to a common height of 1 (100%). Each bar is split by is_canceled (0 = kept, 1 = canceled), so the chart shows the proportion of bookings in each cancellation state rather than raw counts. This makes it straightforward to compare cancellation rates between Resort Hotel and City Hotel even if the two hotel types have very different total booking volumes.
upc-grupo5-tb1.R
# 7. Cancellations by hotel type
ggplot(df, aes(x = hotel, fill = is_canceled)) +
  geom_bar(position = "fill") +
  labs(title = "Proporción de Cancelaciones por Hotel", y = "Proporción")
is_canceled was converted to a factor earlier in the script (df$is_canceled <- as.factor(df$is_canceled)). This ensures ggplot2 maps it to a discrete fill palette with a legend showing 0 and 1, rather than treating it as a continuous numeric variable.
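The proportions that position = "fill" draws can also be checked numerically. The sketch below uses a small made-up data frame (`toy` is not the real dataset, and its values are invented) to show the factor conversion and the row-wise proportions that correspond to each bar's segments.

```r
# Toy data, for illustration only (not the real bookings dataset)
toy <- data.frame(
  hotel = rep(c("City Hotel", "Resort Hotel"), times = c(4, 5)),
  is_canceled = c(1, 1, 0, 0, 1, 0, 0, 0, 0)
)

# Same conversion as in the script: treat the 0/1 flag as a discrete category
toy$is_canceled <- as.factor(toy$is_canceled)

# Raw counts per hotel and cancellation state
counts <- table(toy$hotel, toy$is_canceled)

# Row-wise proportions: each hotel's row sums to 1, like a "fill" bar
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

Because each row is normalized separately, the comparison is unaffected by one hotel type having more bookings than the other.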
Chart 8 maps is_canceled to the x-axis and lead_time to the y-axis, with fill = is_canceled coloring each box. A boxplot per cancellation group shows the median lead time, interquartile range, and outliers, making it easy to see whether canceled bookings were made substantially earlier than kept ones.
The original script misspells this call as geom_boxplots(), a function that does not exist in ggplot2, so the plot fails with an error. The correct name is geom_boxplot() (no trailing s); the corrected version is shown below.
upc-grupo5-tb1.R
# 8. TEAM QUESTION: Lead Time vs Cancellations
ggplot(df, aes(x = is_canceled, y = lead_time, fill = is_canceled)) +
  geom_boxplot() +
  labs(title = "Antelación de Reserva vs Cancelación", x = "¿Cancelado?", y = "Días de Antelación")
If the boxplot is difficult to read because of many high-lead-time outliers, add outlier.alpha = 0.2 inside geom_boxplot() to reduce the visual weight of individual outlier points.
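When outliers crowd the plot, it may also help to look at the numbers behind each box. The sketch below computes the median, interquartile range, and outlying points per cancellation group in base R. The `toy` data frame is invented for illustration, and boxplot.stats() uses base R's hinge definition, which can differ slightly from the quartiles ggplot2 draws.

```r
# Toy data, for illustration only (not the real bookings dataset)
toy <- data.frame(
  is_canceled = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)),
  lead_time   = c(2, 10, 15, 30, 200, 20, 60, 90, 120, 400)
)

# Split lead times by cancellation status
groups <- split(toy$lead_time, toy$is_canceled)

# Median and interquartile range per group (the box's center line and height)
medians <- sapply(groups, median)
iqrs <- sapply(groups, IQR)

# Points more than 1.5 * IQR beyond the hinges, which a boxplot draws as outliers
outliers <- lapply(groups, function(x) boxplot.stats(x)$out)

print(medians)
print(iqrs)
print(outliers)
```

A clearly higher median for group 1 than for group 0 would support the idea that canceled bookings tend to be made further in advance.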
