Marking matches as undecided
To predict a match, add a row to your CSV with all feature columns filled in but the result value left blank. When sumo.read() loads the file and coerces columns to numeric, the blank becomes NA. The workflow uses this NA to separate undecided matches from the training history.
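As a small illustration of the blank-to-NA behavior, the sketch below uses base R's read.csv() as a stand-in for sumo.read(), whose internals are not shown here. The feature column names rank_diff and weight_diff are hypothetical.

```r
# Hypothetical CSV: two completed matches and one undecided match.
# The feature columns rank_diff and weight_diff are made up for illustration.
csv_text <- "rank_diff,weight_diff,result
2,-15,1
-1,8,0
3,4,
"

# read.csv() stands in for sumo.read(); a blank field in a numeric column is read as NA.
matches <- read.csv(text = csv_text)

# Cast result to logical, as the text describes; NA stays NA.
matches$result <- as.logical(matches$result)

print(matches)
print(is.na(matches$result))  # flags the undecided row
```

Only the last row is flagged, because it is the only one with an empty result field.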
You can have multiple undecided rows in the CSV, and each one receives a prediction from the models. However, pred_sumo.R as written assumes a single undecided match: its if statements operate on a scalar. To report multiple predictions, iterate over the rows of nn.ans and bin.ans.
The prediction workflow
Load the CSV
sumo.read() reads the file, converts all columns to numeric, and casts result to logical. Blank result values become NA.
Separate history from undecided matches
na.omit() drops any row with an NA, leaving only completed matches for training. filter() keeps only rows where result is NA; these are the matches to predict.
Train both models on full history
Train on all historical data (not a split) so the models use every available match before predicting:
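A minimal sketch of the split and the two training calls, run on synthetic data. It is not the original pred_sumo.R listing: the column names, the binomial family, the formula, and the names bin.fit, nn.fit, and nn.data are all assumptions, and the split uses base R subsetting in place of filter().

```r
set.seed(42)  # reproducible synthetic data

# Synthetic stand-in for the data frame that sumo.read() would return.
# rank_diff and weight_diff are hypothetical feature names.
n <- 60
sumo <- data.frame(rank_diff = rnorm(n), weight_diff = rnorm(n))
sumo$result <- (sumo$rank_diff + rnorm(n)) > 0  # logical outcome (assumed TRUE = left wins)
sumo$result[n] <- NA                            # the last row is the undecided match

history   <- na.omit(sumo)                 # completed matches only
undecided <- sumo[is.na(sumo$result), ]    # base-R form of filter(sumo, is.na(result))

# GLM on the full history; the binomial family is an assumption.
bin.fit <- glm(result ~ rank_diff + weight_diff, data = history, family = binomial)

# Neural network with one hidden layer of 8 neurons, fitted only if neuralnet is installed.
# The response is converted to 0/1 here as a precaution; convergence is not guaranteed.
if (requireNamespace("neuralnet", quietly = TRUE)) {
  nn.data <- transform(history, result = as.numeric(result))
  nn.fit  <- neuralnet::neuralnet(result ~ rank_diff + weight_diff,
                                  data = nn.data, hidden = 8)
}
```

Both models see every completed row, and the undecided row is held aside for predict().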
The prediction script (pred_sumo.R) uses hidden = 8 (a single hidden layer of 8 neurons) and does not normalize the inputs or set act.fct. The training guide’s more complete setup (hidden = c(4, 2), linear.output = FALSE, act.fct = 'logistic', normalization) is recommended for better-calibrated outputs.
Interpreting GLM output
With its default type = "link", predict.glm() returns a log-odds value (the linear predictor, not a probability):
- Positive value → the model favors the left wrestler. The console prints "GLM says left."
- Negative value → the model favors the right wrestler. The console prints "GLM says right."
- A value close to zero means the model considers the match a near toss-up.
To convert the log-odds to a probability, use p = 1 / (1 + exp(-bin.ans)).
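The conversion can be checked on a few made-up log-odds values. The numbers in bin.ans below are illustrative only; plogis() is base R's logistic distribution function and computes the same quantity.

```r
# Made-up log-odds values standing in for predict.glm() output.
bin.ans <- c(-2, 0, 0.8)

# Logistic transform from the text.
p <- 1 / (1 + exp(-bin.ans))

# plogis() is the built-in equivalent.
print(p)
print(all.equal(p, plogis(bin.ans)))
```

A log-odds of 0 maps to a probability of exactly 0.5, which is why values near zero read as a toss-up. Calling predict() on the fitted GLM with type = "response" returns the probability directly.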
Interpreting neural network output
predict() on a neuralnet object returns the network’s output node value, which is in the range (0, 1) when act.fct = 'logistic' and linear.output = FALSE are set:
- >= 0.5 → the model favors the left wrestler. The console prints "NN says left."
- < 0.5 → the model favors the right wrestler. The console prints "NN says right."
- Values near 0.5 indicate low confidence.
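When the CSV contains several undecided rows, the two decision rules above can be applied per row instead of through a scalar if. The sketch below uses made-up values for bin.ans and nn.ans. It assumes nn.ans is a one-column matrix and treats a GLM value of exactly 0 as "right", a case the rules above leave unspecified.

```r
# Made-up predictions for three undecided matches.
bin.ans <- c(1.3, -0.4, 0.05)                      # GLM log-odds
nn.ans  <- matrix(c(0.81, 0.35, 0.52), ncol = 1)   # NN outputs (assumed one-column matrix)

# Vectorized form of the two threshold rules.
glm.side <- ifelse(bin.ans > 0, "left", "right")       # exactly 0 falls to "right" (assumption)
nn.side  <- ifelse(nn.ans[, 1] >= 0.5, "left", "right")

# One console line per undecided match.
for (i in seq_along(bin.ans)) {
  cat(sprintf("Match %d: GLM says %s. NN says %s.\n", i, glm.side[i], nn.side[i]))
}
```

The third match shows both models leaning left with values close to their thresholds, which is the low-confidence case.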
Complete prediction script
The full pred_sumo.R with inline explanation:
When the two models disagree
The GLM and neural network are trained on the same data but make different assumptions about the relationships between features. Disagreement is normal, especially when the match is closely contested. Some approaches when the models give conflicting predictions:
- Trust the GLM when you have limited training data (under ~100 matches). The GLM is less likely to overfit and tends to generalize better in low-data regimes.
- Trust the neural network when you have a larger dataset and have verified that the NN accuracy exceeded GLM accuracy on the evaluation set (see the Training guide).
- Treat the match as a toss-up when both models produce values close to their respective thresholds (near 0 for GLM, near 0.5 for NN). In that case, neither model has strong signal.
- Re-evaluate features if the models consistently disagree — it may indicate that the available features are not sufficient to distinguish the outcome.
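One way to turn the first three approaches into code is sketched below. The function reconcile() is hypothetical and not part of pred_sumo.R. The margins glm.margin and nn.margin are arbitrary placeholders for "close to the threshold", and the flag nn.beat.glm is assumed to come from your own evaluation run.

```r
# Hypothetical helper: combine one GLM log-odds value and one NN output into a call.
# glm.margin and nn.margin are arbitrary choices, not values from the guide.
reconcile <- function(bin.ans, nn.ans, n.history, nn.beat.glm = FALSE,
                      glm.margin = 0.2, nn.margin = 0.05) {
  glm.side <- if (bin.ans > 0) "left" else "right"
  nn.side  <- if (nn.ans >= 0.5) "left" else "right"

  if (glm.side == nn.side) return(glm.side)          # no conflict to resolve

  # Both models sit near their thresholds: neither has a strong signal.
  if (abs(bin.ans) < glm.margin && abs(nn.ans - 0.5) < nn.margin) return("toss-up")

  if (n.history < 100) return(glm.side)              # limited data: prefer the GLM
  if (nn.beat.glm) return(nn.side)                   # NN verified better on evaluation
  "unresolved"                                       # the guide gives no rule for this case
}

print(reconcile(0.1, 0.48, n.history = 50))                        # near both thresholds
print(reconcile(1.5, 0.30, n.history = 50))                        # small history
print(reconcile(-1.5, 0.90, n.history = 500, nn.beat.glm = TRUE))  # large history, NN verified
```

If the function returns "unresolved" or "toss-up" for many matches, that points back to the last item in the list: the features may not carry enough signal.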