Sumo Oracle trains two independent classifiers on the same history dataset and compares their outputs. Both treat the problem as binary classification: predict whether wrestler 1 (left) or wrestler 2 (right) wins.
Comparison at a glance
| Property | GLM (Binomial) | Neural Network |
|---|---|---|
| Package | stats (base R) | neuralnet |
| Algorithm | Logistic regression | Feedforward neural network |
| Hidden layers | None | 1 layer (pred_sumo.R) or 2 (sumo.Rmd) |
| Output type | Log-odds score | Probability [0, 1] |
| “Left wins” rule | bin.ans > 0 | nn.ans >= 0.5 |
| Accuracy | ~70% | Lower than GLM |
| Interpretability | High (coefficients) | Low (black box) |
| Training speed | Fast | Slower |
GLM — Binomial logistic regression
The GLM is fit with glm() using family = 'binomial', which applies a logit link and models the log-odds of wrestler 1 winning. All eight feature columns are included via the formula shorthand result ~ . (the dot stands for every column other than result).
bin <- glm(result ~ ., history, family = 'binomial')
bin.ans <- predict.glm(bin, undecided)
if (bin.ans > 0) {
  print('GLM says left.')
} else {
  print('GLM says right.')
}
predict.glm() returns the log-odds of wrestler 1 winning (the linear predictor on the link scale, not a probability):
- Positive score → log-odds favour wrestler 1 → predict left wins.
- Negative score → log-odds favour wrestler 2 → predict right wins.
- Zero is the decision boundary; the further the value from zero, the more confident the model.
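The log-odds score maps to a win probability through the logistic function; a minimal sketch (the bin.ans value here is illustrative, not taken from the project's data):

```r
# Convert a GLM log-odds score (link scale) into a probability for wrestler 1.
bin.ans <- 1.2                # illustrative log-odds score
prob_left <- plogis(bin.ans)  # logistic transform: 1 / (1 + exp(-x))
prob_left                     # ~0.77, comfortably above the 0.5 boundary

# Equivalently, ask predict for the response scale directly:
# predict.glm(bin, undecided, type = 'response')
```

The threshold bin.ans > 0 on the link scale is exactly the threshold prob_left > 0.5 on the probability scale, since plogis(0) == 0.5.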
The GLM achieves around 70% accuracy on held-out data — higher than the neural network. For a lightweight, interpretable baseline it is the recommended choice.
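The held-out check can be sketched as follows. The real history data frame is stood in for here by synthetic data (two features instead of the project's eight) so the snippet runs on its own; swap in the actual data in practice:

```r
# Sketch of a held-out accuracy check for the GLM.
# `history` below is synthetic stand-in data, not the project's dataset.
set.seed(1)
n <- 500
history <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
history$result <- as.integer(plogis(1.5 * history$x1 - history$x2) > runif(n))

idx <- sample(n, 0.8 * n)             # 80/20 train/test split
tr <- history[idx, ]
te <- history[-idx, ]

bin <- glm(result ~ ., tr, family = 'binomial')
scores <- predict.glm(bin, te)                 # log-odds on the link scale
acc <- mean((scores > 0) == (te$result == 1))  # same decision rule as above
acc
```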
Neural network
The neural network is built with the neuralnet package. Two configurations appear across the source files:
nn <- neuralnet(result ~ ., history, hidden = 8)
One hidden layer with 8 nodes.
nn <- neuralnet(result ~ ., normalize(tr), hidden = c(4, 2),
                linear.output = FALSE, act.fct = 'logistic')
Two hidden layers: 4 nodes then 2 nodes. Used during evaluation with a normalised training split.
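normalize() is not part of the neuralnet package. Assuming it is a min-max scaler, a common preprocessing step before neural network training, a sketch might look like this (the project's actual normalize() may differ):

```r
# Hypothetical min-max scaler: rescales every numeric column to [0, 1].
# (Assumption - the project's real normalize() is not shown in the source.)
normalize <- function(df) {
  as.data.frame(lapply(df, function(col) {
    rng <- range(col)
    if (diff(rng) == 0) return(col * 0)  # constant column: map to all zeros
    (col - rng[1]) / diff(rng)
  }))
}

normalize(data.frame(a = c(2, 4, 6)))  # a = 0.0, 0.5, 1.0
```

Scaling features to a common range keeps the logistic activations away from their flat saturated regions, which generally helps neuralnet converge.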
Key options used in sumo.Rmd:
| Option | Value | Effect |
|---|---|---|
| hidden | c(4, 2) | Two hidden layers with 4 and 2 nodes respectively |
| linear.output | FALSE | Applies the activation function to the output node too, giving a [0, 1] probability |
| act.fct | 'logistic' | Logistic (sigmoid) activation for all hidden and output nodes |
Prediction
nn.ans <- predict(nn, undecided)
if (nn.ans >= 0.5) {
  print('NN says left.')
} else {
  print('NN says right.')
}
Because linear.output = FALSE is combined with a logistic activation, predict() returns a value in [0, 1] that can be read directly as the probability that wrestler 1 wins:
- >= 0.5 → wrestler 1 favoured → left wins.
- < 0.5 → wrestler 2 favoured → right wins.
In pred_sumo.R the hidden = 8 model does not set linear.output = FALSE, so the output may not be bounded to [0, 1]. The threshold rule nn.ans >= 0.5 still applies, but interpret the raw score with care if it falls outside that range.
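One way to handle that caveat is a small defensive wrapper that warns when the raw score falls outside [0, 1] before applying the threshold; a sketch (interpret_nn is a hypothetical helper, not part of the project):

```r
# Defensive read of a raw NN score that may be unbounded when
# linear.output = FALSE is not set (as in the hidden = 8 model).
interpret_nn <- function(score) {
  if (score < 0 || score > 1) {
    warning('Score outside [0, 1]; not a calibrated probability.')
  }
  if (score >= 0.5) 'NN says left.' else 'NN says right.'
}

interpret_nn(0.73)  # "NN says left."
```

The decision is unchanged either way; the warning only flags that the score should not be quoted as a probability.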