Raw CSV data arrives with every column typed as either numeric or character. In the hotel bookings dataset, several columns carry discrete categorical meaning and one carries a calendar date. Storing them with the wrong type causes table(), ggplot2 aesthetics, and date arithmetic to behave incorrectly or fail. This section deduplicates the data frame and re-types those columns before any visualisation or modelling takes place.
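To illustrate the starting point, here is a small sketch that assumes the file is loaded with base R's read.csv(). It reads an inline CSV string with made-up values. Before any conversion, the text columns (including the date) arrive as character, and the 0/1 column arrives as integer, which R counts as numeric.

```r
# Hypothetical three-row CSV embedded as text. The column names follow the
# bookings dataset; the values are invented for illustration.
csv_text <- "hotel,is_canceled,reservation_status_date
City Hotel,0,2015-07-01
Resort Hotel,1,2015-07-02
City Hotel,0,2015-07-03"

# stringsAsFactors = FALSE has been the default since R 4.0; it is set
# explicitly so older R versions produce the same column types.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the type R assigned to each column.
sapply(raw, class)
```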
Full transformation code
upc-grupo5-tb1.R
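The following is a minimal sketch of the transformation. It is reconstructed from the conversions explained in this section, not copied from upc-grupo5-tb1.R. The toy data frame and its values are hypothetical stand-ins for the bookings CSV.

```r
# Hypothetical toy data standing in for the raw bookings data frame.
df <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  arrival_date_month = c("July", "January", "January", "August"),
  meal = c("BB", "HB", "HB", "SC"),
  is_canceled = c(0, 1, 1, 0),
  reservation_status_date = c("2015-07-01", "2015-01-15", "2015-01-15", "2015-08-03"),
  stringsAsFactors = FALSE
)

# 1. Drop exact duplicate rows (rows 2 and 3 are identical) and write back.
df <- unique(df)

# 2. Re-type the categorical columns as factors.
df$hotel              <- as.factor(df$hotel)
df$arrival_date_month <- as.factor(df$arrival_date_month)
df$meal               <- as.factor(df$meal)
df$is_canceled        <- as.factor(df$is_canceled)

# 3. Parse the ISO 8601 date strings into R's Date class.
df$reservation_status_date <- as.Date(df$reservation_status_date)

# Show the resulting column types.
str(df)
```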
Why each conversion is necessary
unique(df): Returns a copy of the data frame with duplicate rows removed. This runs first so that the type conversions operate on the cleaned row set. The result is written back to df.
as.factor(df$hotel): hotel holds two values ("City Hotel" and "Resort Hotel"). Converting it to a factor lets bar charts and grouped summaries treat it as two discrete groups with a fixed set of levels rather than as free-form character strings.
as.factor(df$arrival_date_month): Month names like "January" and "July" are text. as.factor() sorts the levels alphabetically ("April", "August", "December", ...), so plot axes and frequency tables follow calendar order only after the levels are reordered explicitly, for example with factor(df$arrival_date_month, levels = month.name). As a factor, the column also makes table() and other cross-tabulation functions report every level, including levels with zero bookings in a filtered subset.
as.factor(df$meal): Meal-plan codes (BB, HB, FB, SC) are a closed set of categories. as.factor() produces an unordered factor, so regression models treat the codes as separate categories encoded as indicator (dummy) variables rather than as continuous or ordinal values, and group comparisons are straightforward.
as.factor(df$is_canceled): Although this column stores 0 and 1, it encodes a binary outcome, not a quantity. Converting it to a factor (levels "0" and "1") stops numeric functions such as mean() from treating cancellation as a number and ensures classifier models receive a categorical response.
as.Date(df$reservation_status_date): Dates stored as character strings ("2015-07-01") cannot be used in date arithmetic, time-series plots, or range filters. as.Date() parses the ISO 8601 strings into R's Date class, enabling operations such as computing the number of days between events.
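The conversion rationale can be checked on small vectors. The sketch below uses hypothetical values rather than the dataset. It shows that as.factor() orders months alphabetically, that base R's month.name constant restores calendar order and keeps zero-count months in table(), that a 0/1 factor is summarised as class counts, and that Date values support day arithmetic.

```r
# Hypothetical month values for illustration.
months <- c("July", "January", "August", "July")

# as.factor() sorts levels alphabetically, not by calendar.
alpha <- as.factor(months)
levels(alpha)                                  # "August" "January" "July"

# Using month.name as the level set gives calendar order and keeps
# months with no bookings (here February) in the frequency table.
cal <- factor(months, levels = month.name)
table(cal)[c("January", "February", "July")]   # 1 0 2

# As a factor, a 0/1 outcome is summarised as counts per class.
canceled <- as.factor(c(0, 1, 1, 0))
table(canceled)

# Date values support arithmetic such as day differences.
d <- as.Date(c("2015-07-01", "2015-07-15"))
as.numeric(diff(d))                            # 14
```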
Order of operations
Convert categorical columns to factors
After deduplicating with unique(df), apply as.factor() to hotel, arrival_date_month, meal, and is_canceled in any order. Each call is independent.
Parse the date column
Apply as.Date() to reservation_status_date. This step is placed last as a reminder that it uses a different conversion function from the factor calls above.
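Because the four factor conversions are independent, they can also be written as a single pass over a vector of column names. This is a minimal sketch of that alternative, not necessarily how the project script is written. The toy data frame and the factor_cols name are hypothetical.

```r
# Hypothetical toy data standing in for the deduplicated bookings data.
df <- data.frame(
  hotel = c("City Hotel", "Resort Hotel"),
  arrival_date_month = c("July", "January"),
  meal = c("BB", "HB"),
  is_canceled = c(0, 1),
  reservation_status_date = c("2015-07-01", "2015-01-15"),
  stringsAsFactors = FALSE
)

# Step 1: apply as.factor() to each categorical column in one pass.
factor_cols <- c("hotel", "arrival_date_month", "meal", "is_canceled")
df[factor_cols] <- lapply(df[factor_cols], as.factor)

# Step 2: the date column needs a different parser, so convert it separately.
df$reservation_status_date <- as.Date(df$reservation_status_date)

sapply(df, class)
```

Assigning the list returned by lapply() back to df[factor_cols] replaces those columns in place and leaves the rest of the data frame untouched.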