This guide covers the end-to-end workflow for predicting the outcome of upcoming matches: marking them in the CSV, loading the data, training on history, and interpreting the output from both models.

Marking matches as undecided

To predict a match, add a row to your CSV with all feature columns filled in but result left blank:
weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
401,379,6,5,35,31,190,184,1
344,370,8,11,36,34,189,192,1
375,295,2,4,28,30,183,176,
The last row has no result value. When sumo.read() loads the file and coerces columns to numeric, a blank becomes NA. The workflow uses this to separate undecided matches from the training history.
You can have multiple undecided rows in the CSV. Each one will receive a prediction. However, pred_sumo.R as written assumes a single undecided match — the if statements operate on a scalar. For multiple predictions, iterate over the rows of nn.ans and bin.ans.
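One possible way to extend the scalar if statements to several undecided rows is a simple loop. The `bin.ans` and `nn.ans` values below are dummy placeholders; in practice they come from `predict.glm()` and `predict()` as shown later in this guide:

```r
# Dummy prediction vectors standing in for real model output —
# one entry per undecided row in the CSV.
bin.ans <- c(1.2, -0.4)   # GLM log-odds per match
nn.ans  <- c(0.81, 0.35)  # NN output per match

for (i in seq_along(bin.ans)) {
  glm_pick <- if (bin.ans[i] > 0) 'left' else 'right'
  nn_pick  <- if (nn.ans[i] >= 0.5) 'left' else 'right'
  cat(sprintf('Match %d: GLM says %s, NN says %s\n', i, glm_pick, nn_pick))
}
```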

The prediction workflow

1

Load the CSV

sumo.read() reads the file, converts all columns to numeric, and casts result to logical. Blank result values become NA.
data <- sumo.read('sumo.csv')
2

Separate history from undecided matches

na.omit() drops any row with an NA, leaving only completed matches for training. filter() keeps only rows where result is NA — those are your predictions.
history <- na.omit(data)
undecided <- filter(data, is.na(result))
3

Train both models on full history

Train on all historical data (not a split) so the models use every available match before predicting:
nn <- neuralnet(result ~ ., history, hidden = 8)
bin <- glm(result ~ ., history, family = 'binomial')
The prediction script (pred_sumo.R) uses hidden = 8 (a single hidden layer of 8 neurons) and does not normalize or set act.fct. The training guide’s more complete setup (hidden = c(4, 2), linear.output = FALSE, act.fct = 'logistic', normalization) is recommended for better-calibrated outputs.
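A sketch of that more complete setup, assuming the `history` data frame from step 2. The min-max normalization helper is illustrative, not necessarily the training guide's exact code:

```r
library(neuralnet)

# Illustrative min-max scaling: map each feature column into [0, 1]
# so no single feature (e.g. weight vs. wins) dominates training.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

features <- setdiff(names(history), 'result')
scaled <- history
scaled[features] <- lapply(history[features], normalize)

nn <- neuralnet(result ~ ., scaled,
                hidden = c(4, 2),       # two hidden layers: 4 then 2 neurons
                act.fct = 'logistic',   # sigmoid activation
                linear.output = FALSE)  # keep outputs in (0, 1)
```

Remember to apply the same normalization (using the training data's min and max) to the undecided rows before predicting, or the network's inputs will be on the wrong scale.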
4

Generate predictions

bin.ans <- predict.glm(bin, undecided)
nn.ans <- predict(nn, undecided)
5

Interpret the output

if (bin.ans > 0) {
  print('GLM says left.')
} else {print('GLM says right.')}
bin.ans

if (nn.ans >= 0.5) {
  print('NN says left.')
} else {print('NN says right.')}
nn.ans

Interpreting GLM output

predict.glm() returns a log-odds value (the linear predictor, not a probability):
  • Positive value → the model favors the left wrestler. The console prints "GLM says left."
  • Negative value → the model favors the right wrestler. The console prints "GLM says right."
  • A value close to zero means the model considers the match a near toss-up.
To convert to a probability: p = 1 / (1 + exp(-bin.ans)).
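In R, base `plogis()` computes exactly this logistic transformation:

```r
# plogis(x) is 1 / (1 + exp(-x)): log-odds -> probability.
bin.ans <- 1.2            # example log-odds value from predict.glm()
p <- plogis(bin.ans)      # probability the left wrestler wins
round(p, 3)               # 0.769
```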

Interpreting neural network output

predict() on a neuralnet object returns the network’s output node value, which is in the range (0, 1) when act.fct = 'logistic' and linear.output = FALSE are set:
  • >= 0.5 → the model favors the left wrestler. The console prints "NN says left."
  • < 0.5 → the model favors the right wrestler. The console prints "NN says right."
  • Values near 0.5 indicate low confidence.

Complete prediction script

The full pred_sumo.R with inline explanation:
library(dplyr)
library(MASS)
library(neuralnet)

# Load and type-coerce the CSV. Blank result values become NA.
sumo.read <- function(csv) {
  data <- tibble(read.csv(csv)) %>%
    mutate_all(as.numeric) %>%
    mutate(result = as.logical(result))
  return(data)
}

data <- sumo.read('sumo.csv')

# Rows with a known result are used to train the models.
history <- na.omit(data)

# Rows with no result are the matches to predict.
undecided <- filter(data, is.na(result))

# Train the neural network on all historical data.
nn <- neuralnet(result ~ ., history, hidden = 8)

# Train the logistic regression model on all historical data.
bin <- glm(result ~ ., history, family = 'binomial')

# GLM prediction: returns log-odds. Positive = left wins.
bin.ans <- predict.glm(bin, undecided)
nn.ans <- predict(nn, undecided)

if (bin.ans > 0) {
  print('GLM says left.')
} else {print('GLM says right.')}
bin.ans

# NN prediction: returns a value in (0, 1). >= 0.5 = left wins.
if (nn.ans >= 0.5) {
  print('NN says left.')
} else {print('NN says right.')}
nn.ans

When the two models disagree

The GLM and neural network are trained on the same data but make different assumptions about the relationships between features. Disagreement is normal, especially when the match is closely contested. Some approaches when the models give conflicting predictions:
  • Trust the GLM when you have limited training data (under ~100 matches). The GLM is less likely to overfit and tends to generalize better in low-data regimes.
  • Trust the neural network when you have a larger dataset and have verified that the NN accuracy exceeded GLM accuracy on the evaluation set (see the Training guide).
  • Treat the match as a toss-up when both models produce values close to their respective thresholds (near 0 for GLM, near 0.5 for NN). In that case, neither model has strong signal.
  • Re-evaluate features if the models consistently disagree — it may indicate that the available features are not sufficient to distinguish the outcome.
Run the full training and evaluation workflow from sumo.Rmd before predicting. Checking confusionMatrix output on a held-out evaluation set tells you which model has been more accurate on your specific dataset, giving you a principled basis for choosing between them when they disagree.
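A hedged sketch of that comparison, assuming caret's `confusionMatrix()` and a hypothetical held-out subset `eval_data` with known results (the split itself is covered in the Training guide):

```r
library(caret)

# `eval_data` is a hypothetical evaluation set: completed matches
# withheld from training. `bin` and `nn` are the models from step 3.
glm_pred <- factor(predict.glm(bin, eval_data) > 0)
nn_pred  <- factor(predict(nn, eval_data) >= 0.5)
actual   <- factor(eval_data$result)

confusionMatrix(glm_pred, actual)  # GLM accuracy on held-out matches
confusionMatrix(nn_pred, actual)   # NN accuracy on held-out matches
```

Whichever model shows higher held-out accuracy is the more defensible tiebreaker when the two disagree on an undecided match.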
