When a CSV file is loaded into R, each column is typically inferred as either numeric or character. In the hotel bookings dataset, several columns carry discrete categorical meaning and one carries a calendar date. Leaving them with the wrong type causes functions like table(), ggplot2 aesthetics, and date arithmetic to behave incorrectly or fail. This section removes duplicate rows from the data frame and re-types those columns before any visualisation or modelling takes place.
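
To make the failure modes concrete, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are illustrative assumptions, not rows from the real dataset; only the column names come from this section.

```r
# Toy data frame standing in for a freshly read CSV (illustrative values)
toy <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel"),
  reservation_status_date = c("2015-07-01", "2015-07-03", "2015-07-10"),
  stringsAsFactors = FALSE
)

# Column classes straight after loading: both are character
sapply(toy, class)

# A character vector has no levels, so levels() returns NULL
levels(toy$hotel)

# Subtracting character "dates" is an error; tryCatch captures the message
res <- tryCatch(
  toy$reservation_status_date[2] - toy$reservation_status_date[1],
  error = function(e) conditionMessage(e)
)
res

# After conversion the same subtraction yields a difftime in days
d <- as.Date(toy$reservation_status_date)
d[2] - d[1]
```

The character version fails because R has no subtraction for strings, while the Date version returns a time difference.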

Full transformation code

upc-grupo5-tb1.R
# Remove duplicate records
df <- unique(df)

# Convert variables to factors and dates
df$hotel                   <- as.factor(df$hotel)
df$arrival_date_month      <- as.factor(df$arrival_date_month)
df$meal                    <- as.factor(df$meal)
df$is_canceled             <- as.factor(df$is_canceled)
df$reservation_status_date <- as.Date(df$reservation_status_date)
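
One way to check how many rows the deduplication step removes is to count duplicates before calling unique(). The sketch below uses a hypothetical toy data frame rather than the real dataset; duplicated() is the base R function that flags rows identical to an earlier row.

```r
# Toy data frame with one exact duplicate row (illustrative values)
toy <- data.frame(
  hotel = c("City Hotel", "City Hotel", "Resort Hotel"),
  meal  = c("BB", "BB", "HB"),
  stringsAsFactors = FALSE
)

n_before <- nrow(toy)
n_dupes  <- sum(duplicated(toy))  # rows identical to an earlier row
toy      <- unique(toy)           # same call as in the script above
n_after  <- nrow(toy)

# The row count should drop by exactly the number of flagged duplicates
stopifnot(n_before - n_after == n_dupes)
```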

Why each conversion is necessary

unique(df): Returns a copy of the data frame with duplicate rows removed. This comes first so that the type conversions operate on the deduplicated row set. The result is written back to df.

as.factor(df$hotel): hotel holds two values ("City Hotel" and "Resort Hotel"). Converting it to a factor lets bar charts and grouped summaries treat it as two discrete groups with a fixed set of levels rather than free-form character strings.

as.factor(df$arrival_date_month): Month names like "January" and "July" are text. as.factor() alone sorts the levels alphabetically, so plot axes and frequency tables follow calendar order only once the levels are set explicitly. As a factor, the column also reports every level in cross-tabulations, including months with a count of zero.

as.factor(df$meal): Meal-plan codes (BB, HB, FB, SC) are a closed set of categories. Factor encoding makes modelling functions treat them as unordered categories and makes group comparisons straightforward.

as.factor(df$is_canceled): Although this column stores 0 and 1, it encodes a binary outcome, not a quantity. Once it is a factor, numeric functions no longer treat cancellation as a number (mean() on a factor returns NA with a warning), and classification functions that expect a factor response receive the right input type.

as.Date(df$reservation_status_date): Dates stored as character strings ("2015-07-01") cannot be used in date arithmetic, time-series plots, or range filters. as.Date() parses the ISO 8601 strings into R's Date class, enabling operations like computing the number of days between events.
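
Two of the points above are easy to trip over, so here is a short sketch of each on made-up vectors (the values are illustrative, not taken from the dataset). It assumes the month names are spelled as in R's built-in month.name constant; the original script uses as.factor() only, so the explicit level ordering shown here is an optional extra step, not part of that script.

```r
# Illustrative month values
months_chr <- c("July", "January", "March", "July")

# as.factor() sorts levels alphabetically
alpha <- as.factor(months_chr)
levels(alpha)  # "January" "July" "March"

# factor() with levels = month.name puts the levels in calendar order
cal <- factor(months_chr, levels = month.name)
table(cal)     # twelve columns, January through December

# Illustrative 0/1 cancellation flags
canceled <- as.factor(c(0, 1, 1, 0))

# mean() on a factor warns and returns NA
na_result <- suppressWarnings(mean(canceled))

# as.numeric() on a factor returns the internal level codes (1 and 2),
# so go through as.character() to recover the original 0/1 values
rate <- mean(as.numeric(as.character(canceled)))
rate           # 0.5
```

If a cancellation rate is needed after the conversion, the as.character() round trip above is one way to get it without keeping a second numeric copy of the column.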

Order of operations

1. Deduplicate

Call unique(df) and reassign the result to df. All subsequent steps work on the deduplicated row set.

2. Convert categorical columns to factors

Apply as.factor() to hotel, arrival_date_month, meal, and is_canceled in any order. Each call is independent.

3. Parse the date column

Apply as.Date() to reservation_status_date. This call is also independent of the factor conversions; the script places it last to keep the one conversion that uses a different function visually separate.

4. Verify types

Re-run str(df) to confirm that the four factor columns show Factor and the date column shows Date in the output.
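
The steps above can be sketched end to end on a toy data frame, with the final check done programmatically instead of by reading the str() output. The data frame `toy` and the helper vector `factor_cols` are hypothetical names introduced for this illustration; the lapply() form is an equivalent shorthand for the four separate as.factor() calls, not the form used in the original script.

```r
# Toy data frame mimicking the five re-typed columns (illustrative values,
# including one exact duplicate row)
toy <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel"),
  arrival_date_month = c("July", "August", "July"),
  meal = c("BB", "HB", "BB"),
  is_canceled = c(0, 1, 0),
  reservation_status_date = c("2015-07-01", "2015-08-15", "2015-07-01"),
  stringsAsFactors = FALSE
)

# Step 1: deduplicate
toy <- unique(toy)

# Step 2: convert the categorical columns to factors
factor_cols <- c("hotel", "arrival_date_month", "meal", "is_canceled")
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

# Step 3: parse the date column
toy$reservation_status_date <- as.Date(toy$reservation_status_date)

# Step 4: verify types; stopifnot() raises an error if any check fails
stopifnot(
  all(sapply(toy[factor_cols], is.factor)),
  inherits(toy$reservation_status_date, "Date")
)
str(toy)
```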
