
Helper Functions

These three functions are defined in sumo.Rmd (and sumo.read is also in pred_sumo.R). They handle data loading, train/test splitting, and feature normalization.

sumo.read(csv)

Reads the sumo match CSV file and returns a clean tibble ready for modelling. Every column is coerced to numeric first, then the result column is cast to logical so that TRUE represents a win for wrestler 1 and FALSE represents a win for wrestler 2.
Signature
sumo.read <- function(csv)
Parameters
csv (character, required): Path to the CSV file containing match data (e.g. "sumo.csv").
Returns
A tibble where all columns are numeric except result, which is logical (TRUE/FALSE). Rows with a blank result field in the raw CSV are imported as NA and represent undecided matches.
Implementation
sumo.read <- function(csv) {
  data <- tibble(read.csv(csv)) %>%
    mutate_all(as.numeric) %>%
    mutate(result = as.logical(result))
  return(data)
}
Example
data <- sumo.read("sumo.csv")

# Separate decided matches from undecided ones
history   <- na.omit(data)            # rows where result is TRUE or FALSE
undecided <- filter(data, is.na(result))  # rows where result is NA
mutate_all(as.numeric) is applied before as.logical, so the result column passes through the numeric coercion step (becoming 1 or 0 or NA) and is then converted to TRUE/FALSE/NA by as.logical.
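The coercion chain is easy to see on a toy vector: blanks come in as NA under as.numeric(), and as.logical() then maps 1/0/NA to TRUE/FALSE/NA. A minimal sketch:

```r
raw <- c("1", "0", NA)    # a result column as read.csv() imports it
num <- as.numeric(raw)    # 1, 0, NA
as.logical(num)           # TRUE, FALSE, NA
```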

data.split(data, ratio)

Randomly partitions a tibble into a training set and an evaluation set. Row indices for the training set are drawn with sample(), and the remaining indices are found with setdiff().
Signature
data.split <- function(data, ratio)
Parameters
data (tibble, required): The full dataset to split, typically history (decided matches only).
ratio (numeric, required): Proportion of rows to allocate to the training set. Must be strictly between 0 and 1. For example, 0.85 reserves 85% of rows for training and 15% for evaluation.
Returns
A list of two tibbles (the list is unnamed, so index it by position):
[[1]]: Training set, round(nrow(data) * ratio) rows sampled at random
[[2]]: Test / evaluation set, the remaining rows
Implementation
data.split <- function(data, ratio) {
  n     <- round(nrow(data) * ratio)
  train <- sample(1:nrow(data), n)
  test  <- setdiff(1:nrow(data), train)
  return(list(data[train,], data[test,]))
}
Example
data <- sumo.read("sumo.csv")
history <- na.omit(data)

spl <- data.split(history, 0.85)
tr  <- spl[[1]]   # training set   (~85% of rows)
ev  <- spl[[2]]   # evaluation set (~15% of rows)
sample() is called without a fixed set.seed(), so results differ on every run. Set a seed before calling data.split() if you need reproducible splits.
set.seed(42)
spl <- data.split(history, 0.85)
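As a self-contained sanity check (data.split() redefined here, with a toy data frame standing in for the real match data), the two pieces always partition the input:

```r
data.split <- function(data, ratio) {
  n     <- round(nrow(data) * ratio)
  train <- sample(1:nrow(data), n)
  test  <- setdiff(1:nrow(data), train)
  return(list(data[train, ], data[test, ]))
}

toy <- data.frame(x = 1:10)   # stand-in for history
set.seed(42)
spl <- data.split(toy, 0.7)

nrow(spl[[1]])                    # 7 training rows
nrow(spl[[2]])                    # 3 evaluation rows
nrow(spl[[1]]) + nrow(spl[[2]])   # 10: together they cover every row
```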

normalize(x)

Applies min-max normalization to a numeric vector, scaling all values to the interval [0, 1]. This is required before training a neuralnet model because neural networks are sensitive to feature scale.
Signature
normalize <- function(x)
Parameters
x (numeric vector, required): A numeric vector (or column) to be scaled. In this project the entire training-set tibble is passed to normalize(), in which case min() and max() range over every value in the tibble rather than operating column by column.
Returns
A numeric vector of the same length as x with all values rescaled to [0, 1] using the formula:
(x - min(x)) / (max(x) - min(x))
Implementation
normalize <- function(x) {
  return( (x - min(x)) / (max(x) - min(x)) )
}
Example
# Apply normalize() to the training tibble before fitting the neural network
nn <- neuralnet(result ~ ., normalize(tr), hidden = c(4, 2),
                linear.output = FALSE, act.fct = "logistic")
Apply normalize() only to the training data. When you call predict() on the evaluation set you should pass the raw (un-normalized) ev tibble, as shown in the testing workflow in sumo.Rmd.
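On a plain numeric vector the effect is easy to verify: the minimum maps to 0, the maximum to 1, and intermediate values scale linearly:

```r
normalize <- function(x) {
  return( (x - min(x)) / (max(x) - min(x)) )
}

normalize(c(2, 4, 6, 10))   # 0.00 0.25 0.50 1.00
```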

Model-Fitting Functions

These standard R and package functions are used directly in pred_sumo.R and sumo.Rmd.

glm() — Binomial Logistic Regression

Fits a generalized linear model with a binomial family (logit link) to predict match outcomes. Provided by base R's stats package.
Usage in this project
# Full-data prediction (pred_sumo.R)
bin <- glm(result ~ ., history, family = "binomial")

# Train-set evaluation (sumo.Rmd)
bin <- glm(result ~ ., tr, family = "binomial")
Arguments
formula: result ~ . (predict result from all other columns)
data: history or tr (decided-match rows only)
family: "binomial" (logistic regression)

neuralnet() — Neural Network

Fits a feed-forward neural network. Provided by the neuralnet package.
Usage in this project
# Single hidden layer, 8 units (pred_sumo.R)
nn <- neuralnet(result ~ ., history, hidden = 8)

# Two hidden layers, normalized training data (sumo.Rmd)
nn <- neuralnet(result ~ ., normalize(tr), hidden = c(4, 2),
                linear.output = FALSE, act.fct = "logistic")
Arguments
formula: result ~ . (predict result from all other columns)
data: history or normalize(tr) (training data; normalize before passing)
hidden: 8 or c(4, 2) (hidden-layer sizes)
linear.output: FALSE (apply the activation function to the output node)
act.fct: "logistic" (logistic / sigmoid activation)

Inference Functions

predict.glm() — GLM Predictions

Generates log-odds predictions from a fitted glm object. Positive values indicate a predicted win for wrestler 1; negative values a win for wrestler 2.
Usage in this project
bin.ans <- predict.glm(bin, undecided)

if (bin.ans > 0) {
  print("GLM says left.")
} else {
  print("GLM says right.")
}
predict.glm() returns log-odds by default (i.e. the linear predictor). A value greater than 0 corresponds to a predicted probability above 0.5 for wrestler 1 winning. Note that if() expects a length-one condition, so this snippet assumes undecided contains a single match; use ifelse() or a vector comparison when scoring several matches at once.
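This equivalence comes from the logistic link: plogis(), base R's logistic CDF, converts log-odds to win probabilities, and plogis(0) is exactly 0.5:

```r
plogis(0)      # 0.5: zero log-odds is a coin-flip
plogis(1.2)    # about 0.77, wrestler 1 favored
plogis(-1.2)   # about 0.23, wrestler 2 favored
```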

predict() — Neural Network Predictions

Generates output-layer predictions from a fitted neuralnet object. Values lie in [0, 1]; a value ≥ 0.5 is interpreted as a win for wrestler 1.
Usage in this project
nn.ans <- predict(nn, undecided)

if (nn.ans >= 0.5) {
  print("NN says left.")
} else {
  print("NN says right.")
}

Evaluation Functions

confusionMatrix() — Classification Metrics

Computes a confusion matrix plus accuracy, sensitivity, specificity, and other metrics for predicted vs. actual class labels. Provided by the caret package.
Usage in this project
# Convert numeric predictions to logical class labels first
results$bin.ans <- sapply(results$bin.ans, function(x) { x > 0 })
results$nn.ans  <- sapply(results$nn.ans,  function(x) { x >= 0.5 })
colnames(results) <- c("glm", "nn", "truth")

confusionMatrix(as.factor(results$glm), as.factor(results$truth))
confusionMatrix(as.factor(results$nn),  as.factor(results$truth))
Both predicted and reference vectors must be converted to factor before passing to confusionMatrix().
Full evaluation workflow (from sumo.Rmd)
data    <- sumo.read("sumo.csv")
history <- na.omit(data)
spl     <- data.split(history, 0.85)
tr   <- spl[[1]]
ev   <- spl[[2]]

bin <- glm(result ~ ., tr, family = "binomial")
nn  <- neuralnet(result ~ ., normalize(tr), hidden = c(4, 2),
                 linear.output = FALSE, act.fct = "logistic")

bin.ans <- predict.glm(bin, ev)
nn.ans  <- predict(nn, ev)

results <- tibble(bin.ans) %>% cbind(nn.ans) %>% cbind(ev$result)

results$bin.ans <- sapply(results$bin.ans,
                          function(x) { if (x > 0) TRUE else FALSE })
results$nn.ans  <- sapply(results$nn.ans,
                          function(x) { if (x >= 0.5) TRUE else FALSE })

colnames(results) <- c("glm", "nn", "truth")

confusionMatrix(as.factor(results$glm), as.factor(results$truth))
confusionMatrix(as.factor(results$nn),  as.factor(results$truth))
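confusionMatrix() requires caret, but the core of what it reports can be reproduced with base R alone; here pred and truth are made-up label vectors for illustration:

```r
pred  <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
truth <- c(TRUE, FALSE, FALSE, FALSE, TRUE)

table(pred, truth)    # the 2x2 confusion matrix
mean(pred == truth)   # accuracy: 4 of 5 labels match
```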
