What problem it solves
Sumo match outcomes depend on a complex interplay of physical attributes and experience. Sumo Oracle formalises this as a binary classification problem. You supply a CSV of past matches with known outcomes, add one or more rows with a blank result field for the matches you want to forecast, and the tool does the rest.
The two models
Sumo Oracle trains and runs two models in parallel.
Bayesian GLM
The generalized linear model uses logistic regression via R's glm() with family = 'binomial'. It is fit on the full history of labeled matches and produces a real-valued log-odds score for each prediction.
- Accuracy: approximately 70% on held-out data
- Output threshold: scores above 0 → "GLM says left.", scores at or below 0 → "GLM says right."
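As a rough illustration of the GLM step, the sketch below fits a logistic regression with glm() on randomly generated stand-in data (not rows from sumo.csv) and applies the zero threshold to the log-odds score. The data frame, the outcome rule, and the names train, pending, and verdict are assumptions made for the example; the actual pred_sumo.R may be organised differently.

```r
# Illustrative sketch only: the data is randomly generated, not taken from sumo.csv.
set.seed(1)
n <- 60
matches <- data.frame(
  weight1 = rnorm(n, 330, 40), weight2 = rnorm(n, 330, 40),
  wins1   = rpois(n, 20),      wins2   = rpois(n, 20),
  age1    = rnorm(n, 28, 4),   age2    = rnorm(n, 28, 4),
  height1 = rnorm(n, 185, 6),  height2 = rnorm(n, 185, 6)
)
# Hypothetical outcome rule so the model has some signal to learn.
matches$result <- (matches$wins1 - matches$wins2 + rnorm(n, 0, 5)) > 0

# Hold back the last row and treat it as an undecided match.
train   <- matches[-n, ]
pending <- matches[n, setdiff(names(matches), "result")]

# Logistic regression on all eight features.
fit <- glm(result ~ ., data = train, family = "binomial")

# type = "link" returns the log-odds; a score of 0 corresponds to a probability of 0.5.
score   <- predict(fit, newdata = pending, type = "link")
verdict <- if (score > 0) "GLM says left." else "GLM says right."
print(verdict)
```

Because the score is on the log-odds scale, thresholding it at 0 is equivalent to thresholding the predicted probability at 0.5, except that a score of exactly 0 falls on the "right" side.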
Neural network
The neural network is fit using R's neuralnet package with 8 hidden units. It is trained on the same historical data as the GLM and produces a probability between 0 and 1.
- Output threshold: probability ≥ 0.5 → "NN says left.", probability < 0.5 → "NN says right."
- Useful as a secondary signal and for comparing model agreement
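To make the two thresholds and the idea of model agreement concrete, here is a small sketch that maps made-up model outputs to calls and flags where the models disagree. The helper names glm_call and nn_call and the numbers are hypothetical; they are not part of Sumo Oracle.

```r
# Hypothetical helpers; the names are illustrative and not part of Sumo Oracle.
glm_call <- function(log_odds) ifelse(log_odds > 0, "left", "right")
nn_call  <- function(prob)     ifelse(prob >= 0.5, "left", "right")

# Made-up model outputs for three undecided matches.
glm_scores <- c(1.3, -0.4, 0.2)
nn_probs   <- c(0.81, 0.35, 0.46)

calls <- data.frame(
  glm = glm_call(glm_scores),
  nn  = nn_call(nn_probs),
  stringsAsFactors = FALSE
)

# TRUE where both models pick the same wrestler.
calls$agree <- calls$glm == calls$nn
print(calls)
```

Note that the documented thresholds treat the boundary differently: a GLM score of exactly 0 is called "right", while a network probability of exactly 0.5 is called "left".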
Input features
Each row in sumo.csv describes a single match between two wrestlers. The eight input columns are:
| Column | Description |
|---|---|
| weight1 | Weight of the left wrestler (lbs) |
| weight2 | Weight of the right wrestler (lbs) |
| wins1 | Number of wins for the left wrestler |
| wins2 | Number of wins for the right wrestler |
| age1 | Age of the left wrestler (years) |
| age2 | Age of the right wrestler (years) |
| height1 | Height of the left wrestler (cm) |
| height2 | Height of the right wrestler (cm) |
The result column is cast to logical (TRUE/FALSE) internally.
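The following sketch shows one way to load data in this layout, check that the documented columns are present, and cast result to logical. The two rows and the column order are invented for illustration, and the check is not necessarily what sumo.read() does.

```r
# Two made-up rows in the documented layout; real data would come from sumo.csv.
csv_text <- "weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
340,310,25,18,29,31,186,183,1
295,360,12,30,24,33,180,190,0"

matches <- read.csv(text = csv_text)

# The eight documented input columns plus the outcome column.
feature_cols <- c("weight1", "weight2", "wins1", "wins2",
                  "age1", "age2", "height1", "height2")
missing_cols <- setdiff(c(feature_cols, "result"), names(matches))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Cast the 0/1 outcome to logical: 1 becomes TRUE, 0 becomes FALSE.
matches$result <- as.logical(matches$result)
str(matches)
```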
Prediction output
The result column encodes the match outcome as a binary integer:
- 1: the left wrestler wins
- 0: the right wrestler wins
Rows with a known result are used as training data. Rows where result is blank (NA) are treated as undecided matches and passed to both models for prediction.
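A minimal sketch of that split, assuming the CSV is read with read.csv() so that a blank result field becomes NA. The rows are invented, and the names train and pending are illustrative rather than taken from the Sumo Oracle source.

```r
# Made-up rows: two decided matches and one undecided match (blank result).
csv_text <- "weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
340,310,25,18,29,31,186,183,1
295,360,12,30,24,33,180,190,0
320,325,20,21,27,28,184,185,"

matches <- read.csv(text = csv_text)

# read.csv() turns blank fields in numeric columns into NA.
undecided <- is.na(matches$result)

train   <- matches[!undecided, ]                         # labeled history
pending <- matches[undecided, names(matches) != "result"] # features only

cat(nrow(train), "training rows,", nrow(pending), "row(s) to predict\n")
```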
Running pred_sumo.R prints one line per model, for example "GLM says left." and "NN says right."
The dataset
sumo.csv is a plain CSV file with 135 historical matches. A sample of the first few rows:
To forecast a match, append a row that leaves result empty.
License
Sumo Oracle is released under the GNU General Public License v3.0 (GPL-3.0). You are free to use, modify, and distribute the code under the terms of that license.
Where to go next
Quickstart
Install the required R packages and run your first prediction end to end.
Data format
Learn the full CSV schema, how training and prediction rows differ, and how to prepare your own data.
Models
Understand the GLM and neural network architectures, training process, and accuracy evaluation.
Functions
Reference documentation for sumo.read(), data.split(), and normalize().