This guide walks through the full training workflow: formatting your data, splitting it into train/test sets, fitting both a GLM and a neural network, and evaluating accuracy with a confusion matrix.
Sumo Oracle expects a CSV file where each row represents one historical match. The columns must be numeric. The result column encodes the outcome as 1 (left wrestler wins) or 0 (right wrestler wins). Leave result blank only for matches you want to predict — those rows are treated as undecided and excluded from training automatically.
weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
401,379,6,5,35,31,190,184,1
344,370,8,11,36,34,189,192,1
351,245,3,2,22,27,189,169,0
335,291,3,12,32,30,177,176,0
The column names are flexible — you can add or remove features freely. The model formula result ~ . automatically uses every column except result as a predictor, so no formula changes are needed when you add columns.
All columns are coerced to numeric by sumo.read(). Categorical values will be converted by R’s as.numeric(), which may not produce meaningful encodings. Encode any categorical features manually before adding them to the CSV.
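For instance, a hypothetical `stance1` column (the column name is illustrative, not part of the sample schema) could be encoded by hand with `match()`, which assigns a stable integer per level:

```r
# Hypothetical example: encode a categorical stance column manually before
# writing it to the CSV. match() maps each value to a fixed integer code.
stances <- c('yotsu', 'oshi', 'yotsu', 'tsuki')
levels  <- c('yotsu', 'oshi', 'tsuki')
codes   <- match(stances, levels)
codes   # 1 2 1 3
```

Integer codes impose an artificial ordering on the levels; for genuinely unordered categories, separate 0/1 indicator columns (one per level) are a safer encoding.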
Loading data
Use sumo.read() to load the CSV. It reads the file, converts every column to numeric, and casts result to a logical vector so the GLM family works correctly.
sumo.read <- function(csv) {
  data <- tibble(read.csv(csv)) %>%
    mutate_all(as.numeric) %>%
    mutate(result = as.logical(result))
  return(data)
}
data <- sumo.read('sumo.csv')
After loading, separate decided matches (the training history) from undecided ones:
history <- na.omit(data)
undecided <- filter(data, is.na(result))
Splitting into train and test sets
data.split() takes a data frame and a ratio, then randomly partitions the rows into a training set and an evaluation set.
data.split <- function(data, ratio) {
  n <- round(nrow(data) * ratio)
  train <- sample(1:nrow(data), n)
  test <- setdiff(1:nrow(data), train)
  return(list(data[train, ], data[test, ]))
}
A ratio of 0.85 allocates 85% of rows to training and 15% to evaluation:
spl <- data.split(history, 0.85)
tr <- spl[[1]]
ev <- spl[[2]]
With small datasets (under ~100 rows), use a higher ratio like 0.9 to ensure the model sees enough examples. With larger datasets you can lower the ratio to 0.7–0.8 to get a more reliable evaluation set.
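Because `data.split()` samples at random, each run produces a different partition. A sketch of making the split reproducible by seeding the RNG first (the row count here is made up for illustration; it uses the same sampling logic as `data.split()`):

```r
set.seed(42)                       # fix the RNG so the split is repeatable
n.rows <- 100                      # illustrative history size
train  <- sample(1:n.rows, round(n.rows * 0.85))
test   <- setdiff(1:n.rows, train)
c(length(train), length(test))     # 85 15
```

Seeding is especially useful when comparing the GLM and the neural network, so both models are evaluated on exactly the same held-out rows.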
Normalization
The normalize() function scales values to the [0, 1] range using min-max normalization. This is required before training the neural network — neuralnet’s gradient-based optimization is sensitive to feature scale.
normalize <- function(x) {
  return( (x - min(x)) / (max(x) - min(x)) )
}
Apply it to each column of the training split when passing data to neuralnet(); the GLM does not require normalization. The scaling must be consistent at prediction time: transform the evaluation features with the min and max computed on the training split. The network was fitted on the normalized scale and cannot map raw values back to it, so unnormalized evaluation rows would fall far outside the range the network saw during training.
Training the GLM
Fit a binomial GLM (logistic regression) on the training split:
bin <- glm(result ~ ., data = tr, family = 'binomial')
result ~ . means: predict result using all other columns.
family = 'binomial' specifies logistic regression, which is appropriate for a binary outcome.
The GLM is fast to train, interpretable, and works well even with limited data. It is a good baseline before adding a neural network.
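One payoff of that interpretability: coefficient signs read directly as directions of effect on the log-odds of a left win. A sketch on synthetic data (the real model is fitted on `tr`, not on this toy frame):

```r
# Synthetic data where the left wrestler tends to win when weight1 > weight2.
set.seed(1)
toy <- data.frame(weight1 = rnorm(200), weight2 = rnorm(200))
toy$result <- toy$weight1 - toy$weight2 + rnorm(200) > 0
toy.fit <- glm(result ~ ., data = toy, family = 'binomial')
coef(toy.fit)   # weight1 positive, weight2 negative, as constructed
```

On the real data, `summary(bin)` additionally reports a z-test per coefficient, which helps identify features that contribute little and could be dropped.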
Training the neural network
tr.norm <- mutate_all(tr, normalize)   # min-max scale each column separately
nn <- neuralnet(result ~ ., tr.norm, hidden = c(4, 2),
                linear.output = FALSE, act.fct = 'logistic')
hidden = c(4, 2) defines two hidden layers with 4 and 2 neurons respectively.
linear.output = FALSE tells the network to apply the activation function to the output layer, which is necessary for classification.
act.fct = 'logistic' uses the sigmoid activation throughout, producing outputs in (0, 1).
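A rough way to gauge overfitting risk is to count the network's free parameters. With the eight features in the sample CSV, hidden = c(4, 2), and one output unit, each layer contributes weights plus biases (assuming fully connected layers, which is what neuralnet builds):

```r
# (inputs * neurons + biases) per layer: 8 -> 4 -> 2 -> 1
params <- (8 * 4 + 4) + (4 * 2 + 2) + (2 * 1 + 1)
params   # 49 free parameters
```

Forty-nine parameters against a few dozen training rows is a recipe for memorization; shrink the hidden layers when data is scarce.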
Evaluating accuracy
After training, generate predictions on the evaluation set and convert them to logical values. predict() on the GLM returns log-odds by default (type = 'link'), so values above 0 mean a predicted left win; the network outputs probabilities in (0, 1), so it is thresholded at 0.5:
bin.ans <- predict(bin, ev)   # log-odds for each evaluation row
# scale the evaluation features with the *training* split's min and max
ev.norm <- as_tibble(Map(function(x, r) (x - min(r)) / (max(r) - min(r)), ev, tr))
nn.ans <- predict(nn, ev.norm)
results <- tibble(glm   = as.vector(bin.ans) > 0,     # log-odds: positive means a left win
                  nn    = as.vector(nn.ans) >= 0.5,   # probability threshold
                  truth = ev$result)
confusionMatrix(as.factor(results$glm), as.factor(results$truth))
confusionMatrix(as.factor(results$nn), as.factor(results$truth))
confusionMatrix() from the caret package reports accuracy, sensitivity, specificity, and other metrics for each model.
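The headline Accuracy figure is simply the fraction of predictions that match the truth column, which you can verify by hand (the vectors below are hypothetical, for illustration only):

```r
glm.pred <- c(TRUE, TRUE, FALSE, TRUE)   # hypothetical model output
truth    <- c(TRUE, FALSE, FALSE, TRUE)  # hypothetical ground truth
mean(glm.pred == truth)                  # 0.75 -> 3 of 4 correct
```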
Complete training script
The full training and evaluation workflow from sumo.Rmd:
library(dplyr)
library(MASS)
library(ggplot2)
library(neuralnet)
library(caret)
sumo.read <- function(csv) {
  data <- tibble(read.csv(csv)) %>%
    mutate_all(as.numeric) %>%
    mutate(result = as.logical(result))
  return(data)
}
data.split <- function(data, ratio) {
  n <- round(nrow(data) * ratio)
  train <- sample(1:nrow(data), n)
  test <- setdiff(1:nrow(data), train)
  return(list(data[train, ], data[test, ]))
}
normalize <- function(x) {
  return( (x - min(x)) / (max(x) - min(x)) )
}
data <- sumo.read('sumo.csv')
history <- na.omit(data)
spl <- data.split(history, 0.85)
tr <- spl[[1]]
ev <- spl[[2]]
bin <- glm(result ~ ., data = tr, family = 'binomial')
tr.norm <- mutate_all(tr, normalize)   # min-max scale each column separately
nn <- neuralnet(result ~ ., tr.norm, hidden = c(4, 2),
                linear.output = FALSE, act.fct = 'logistic')
bin.ans <- predict(bin, ev)   # log-odds for each evaluation row
# scale the evaluation features with the *training* split's min and max
ev.norm <- as_tibble(Map(function(x, r) (x - min(r)) / (max(r) - min(r)), ev, tr))
nn.ans <- predict(nn, ev.norm)
results <- tibble(glm   = as.vector(bin.ans) > 0,     # log-odds threshold
                  nn    = as.vector(nn.ans) >= 0.5,   # probability threshold
                  truth = ev$result)
confusionMatrix(as.factor(results$glm), as.factor(results$truth))
confusionMatrix(as.factor(results$nn), as.factor(results$truth))
plot(bin)
plot(nn)
GLM vs neural network
| Consideration | GLM | Neural network |
|---|---|---|
| Dataset size | Works well with small datasets | Benefits from more data |
| Interpretability | Coefficients are directly readable | Black box |
| Training speed | Near-instant | Slower (gradient descent) |
| Non-linear patterns | Cannot capture them | Can capture them |
| Overfitting risk | Low | Higher — tune hidden layers carefully |
For most sumo datasets with under a few hundred rows, the GLM is the more reliable choice. The neural network may outperform it when you have several hundred or more historical matches and the relationships between features are non-linear.