Before modifying the dataset, it is important to understand what you are working with. This step uses three standard R functions to profile the data frame’s structure, statistical properties, and record uniqueness. Running these checks early surfaces problems — wrong column types, unexpected ranges, missing values, and duplicates — that would otherwise corrupt later analysis.

Inspection code

upc-grupo5-tb1.R
str(df)             # Structure of the dataset
summary(df)         # Initial statistical summary
sum(duplicated(df)) # Count of duplicated rows

What each function reveals

str(df): Prints a compact structural overview: the number of observations and variables, the name of each column, its storage type (chr, int, num, logi), and the first few values. This is the fastest way to confirm that columns were imported with the correct types.

summary(df): For numeric columns, prints the minimum, first quartile, median, mean, third quartile, and maximum. For character columns it prints only the length, class, and mode. Missing values appear as NA's: n under the affected column, giving an immediate count without extra code.

sum(duplicated(df)): duplicated() returns a logical vector that is TRUE for every row that is an exact copy of an earlier row. Wrapping it in sum() counts each TRUE as 1 and gives the total number of duplicate rows. A non-zero result means the duplicates should be removed, for example with unique(df), before modelling.

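To see the three checks on something small, the sketch below runs them on a toy data frame. The name toy and all of its values are invented for illustration; only the column names hotel and meal are borrowed from this page, and adr is a hypothetical numeric column. It is not the project's dataset.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  meal  = c("BB", "HB", "BB", "BB"),
  adr   = c(80.5, NA, 80.5, 120.0),   # hypothetical numeric column with one NA
  stringsAsFactors = FALSE
)

str(toy)       # hotel and meal are reported as chr, adr as num
summary(toy)   # the adr column includes an NA's count

# Row 3 is an exact copy of row 1, so one duplicate is counted.
n_dup <- sum(duplicated(toy))
print(n_dup)

# Per-column count of missing values, as a complement to summary().
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# unique() keeps the first occurrence of each row and drops later copies.
toy_unique <- unique(toy)
print(nrow(toy_unique))
```

colSums(is.na(...)) is not part of the original script; it is added here only as a convenient way to read NA counts for character columns, which summary() does not report.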
When reading str() output, pay particular attention to columns that should be categorical (such as hotel, meal, and arrival_date_month) but arrive as chr. These need to be converted to factors (see Transformations) so that statistical functions treat them as discrete groups rather than arbitrary strings.
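As a rough sketch of what that conversion can look like, the example below finds the chr columns in a small invented data frame and turns them into factors. The name bookings and its values are hypothetical, and the Transformations page may handle this step differently.

```r
# Hypothetical sample: values are invented for illustration only.
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel"),
  arrival_date_month = c("July", "August", "July"),
  stringsAsFactors = FALSE
)

# List the columns that were imported as character vectors.
is_chr <- vapply(bookings, is.character, logical(1))
print(names(bookings)[is_chr])

# Convert to factors; month.name is R's built-in vector of English month
# names, used here so the levels follow calendar order.
bookings$hotel <- as.factor(bookings$hotel)
bookings$arrival_date_month <- factor(bookings$arrival_date_month,
                                      levels = month.name)

str(bookings)                        # both columns now show as Factor
hotel_counts <- summary(bookings$hotel)  # counts per level instead of length/class
print(hotel_counts)
```

Once a column is a factor, summary() reports a count per level, which is usually more informative at this stage than the length and class shown for character columns.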
