Prerequisites
- R 4.0 or later — download from cran.r-project.org
- Git (optional) — only needed if you want to clone the repository
RStudio is not required, but it makes running .R and .Rmd files easier. You can run everything from the R console or a terminal.
Install the required packages
Open an R console and install all five dependencies:
    install.packages(c("dplyr", "MASS", "neuralnet", "ggplot2", "caret"))
| Package | Purpose |
|---|---|
| dplyr | Data wrangling and tibble construction |
| MASS | Bayesian GLM support via glm() with binomial |
| neuralnet | Neural network training and prediction |
| ggplot2 | Plotting model diagnostics (used in sumo.Rmd) |
| caret | Confusion matrix and accuracy evaluation |
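If some of these packages are already installed, you may prefer to install only the ones that are missing. The following is a minimal sketch of that check; the helper name `missing_packages` is hypothetical and is not part of the project.

```r
# The five packages listed in the table above.
required <- c("dplyr", "MASS", "neuralnet", "ggplot2", "caret")

# Hypothetical helper: returns the names in `pkgs` that are not found
# in the local R library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means all five dependencies are already available.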
Download the project
Clone the repository or download the source directly.
The project contains three key files:
- pred_sumo.R: the main prediction script
- sumo.Rmd: the R Markdown notebook with testing and evaluation
- sumo.csv: historical match data (135 matches)
Prepare sumo.csv
Open sumo.csv in any text editor. It contains one row per match with nine columns: eight wrestler attributes and result. Rows with a numeric result (0 or 1) are used as training data. To predict a new match, append a row with the eight wrestler attributes and leave result blank.
- result = 1 means the left wrestler won
- result = 0 means the right wrestler won
- result blank means this row will be predicted
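To see how a blank result behaves once the file is loaded, here is a small sketch that reads CSV text from a string. The column names a1 to a8 and all values are made up for illustration; the real header and data in sumo.csv will differ.

```r
# Hypothetical header and made-up values; only the shape matters:
# eight attribute columns plus result, with result blank on the last row.
csv_text <- paste(
  "a1,a2,a3,a4,a5,a6,a7,a8,result",
  "1,2,3,4,5,6,7,8,1",
  "2,3,4,5,6,7,8,9,0",
  "3,4,5,6,7,8,9,10,",
  sep = "\n"
)

matches <- read.csv(text = csv_text)

# A blank field in a numeric column is read as NA.
print(matches$result)

# Rows with a result are training data; rows with NA are to be predicted.
history   <- matches[!is.na(matches$result), ]
undecided <- matches[is.na(matches$result), ]
print(nrow(history))
print(nrow(undecided))
```

Here two rows end up as training data and the row with the blank result is the one left to predict.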
Run pred_sumo.R
From your terminal, run the script with Rscript:
    Rscript pred_sumo.R
Or source it from an R console:
    source("pred_sumo.R")
Make sure your working directory is set to the folder containing both pred_sumo.R and sumo.csv. The script loads the CSV with a relative path ('sumo.csv').
Interpret the output
The script prints one prediction line per model, followed by the raw numeric score. Each model predicts either the left or the right wrestler as the winner.
How to read the scores:
- GLM score: a log-odds value. Positive → left wins. Negative → right wins. Larger absolute values indicate higher confidence.
- NN score: a probability between 0 and 1. ≥ 0.5 → left wins. < 0.5 → right wins. Values close to 0.5 indicate low confidence.
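The two scores are on different scales, but a log-odds value can be mapped to a probability with the logistic function, which makes them easier to compare. The sketch below uses made-up scores, not output from the script, and applies the two decision rules described above.

```r
# Made-up GLM scores (log-odds) for three imaginary matches.
glm_score <- c(2.0, 0.1, -1.5)

# plogis() is the logistic function 1 / (1 + exp(-x)):
# it maps log-odds to a probability between 0 and 1.
glm_prob <- plogis(glm_score)

# Sign check on the log-odds: positive means left wins.
glm_call <- ifelse(glm_score > 0, "left", "right")

# Made-up NN scores, already probabilities.
nn_score <- c(0.91, 0.52, 0.08)

# 0.5 threshold on the probability.
nn_call <- ifelse(nn_score >= 0.5, "left", "right")

print(data.frame(glm_score, glm_prob = round(glm_prob, 3), glm_call,
                 nn_score, nn_call))
```

A log-odds value above 0 always corresponds to a probability above 0.5, so the sign check and the 0.5 threshold are the same decision rule on two scales. Scores such as 0.1 (GLM) or 0.52 (NN) sit close to the boundary and indicate low confidence.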
Complete working example
The full source of pred_sumo.R is in the repository. The script has these parts:
- sumo.read(): reads the CSV, coerces all columns to numeric, then casts result to logical so GLM receives a proper binary response variable.
- history: all rows where result is not NA; used to train both models.
- undecided: all rows where result is NA; the matches to predict.
- neuralnet(..., hidden = 8): trains a single hidden-layer network with 8 units using the full feature set (result ~ .).
- glm(..., family = 'binomial'): fits a logistic regression model (Bayesian GLM) on the same data.
- Prediction and thresholding: GLM uses a sign check on the log-odds; the neural network uses a 0.5 probability threshold.
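The split-and-predict flow described above can be sketched with base R alone. This is not the source of pred_sumo.R: it uses synthetic data with two hypothetical attributes (x1, x2) in place of sumo.csv, and it leaves out the neural network so that it runs without extra packages.

```r
set.seed(1)
n <- 40

# Synthetic stand-in for the match data; x1 and x2 are hypothetical attributes.
matches <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Logical response (TRUE = left wrestler won), with some noise.
matches$result <- (matches$x1 + rnorm(n)) > 0

# Mark the last five rows as matches without a result yet.
matches$result[(n - 4):n] <- NA

# Same split as described above: known results train, NA rows get predicted.
history   <- matches[!is.na(matches$result), ]
undecided <- matches[is.na(matches$result), ]

# Logistic regression on every remaining column.
fit <- glm(result ~ ., data = history, family = "binomial")

# predict() on a glm returns the linear predictor (log-odds) by default.
log_odds <- predict(fit, newdata = undecided)

# Sign check: positive log-odds means the left wrestler is predicted to win.
winner <- ifelse(log_odds > 0, "left", "right")
print(data.frame(log_odds, winner))
```

The neural network step would follow the same pattern: train on history, predict on undecided, then apply the 0.5 threshold to the output.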
Next steps
- Data format: full CSV schema reference, NA handling rules, and tips for building your own dataset.
- Models: how the GLM and neural network are configured, trained, and evaluated with confusion matrices.