Sumo Oracle is an R-based machine learning tool that forecasts the outcome of sumo wrestling matches. It trains on historical match data (wrestler weight, age, height, and win history) and applies two complementary models to make predictions: a Bayesian generalized linear model (GLM) and a neural network. The GLM achieves approximately 70% prediction accuracy, making it the stronger of the two. Both models can be run side by side for comparison.

Introduction

Learn what Sumo Oracle does and how the two models work together.

Quickstart

Set up your R environment and run your first prediction in minutes.

Concepts

Understand the data format, model architecture, and evaluation approach.

Reference

Explore the full function reference for all helper utilities.

How it works

1

Prepare your data

Provide a CSV file with wrestler attributes for each match: weight, age, height, and number of wins. Mark completed matches with a result (0 or 1), and leave result blank for matches you want to predict.
2

Train the models

Sumo Oracle reads your CSV, separates historical matches from undecided ones, and fits both a logistic regression (GLM) and a neural network to the historical data.
3

Generate predictions

Run predict.glm() on the fitted GLM and predict() on the neural network against the undecided matches. Each model outputs a directional prediction: either the wrestler on the left wins or the wrestler on the right wins.
4

Evaluate accuracy

Use the built-in train/test split and confusion matrix evaluation to measure how well each model performs on held-out data.
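
Steps 1 through 3 above can be sketched in R roughly as follows. This is a minimal illustration, not Sumo Oracle's own code: the column names (left_weight, right_wins, and so on), the simulated data, and the convention that result = 1 means the left wrestler wins are all assumptions. A plain glm() logistic regression stands in for the Bayesian GLM, and the nnet package is assumed for the neural network because the source does not name one.

```r
# Sketch only: column names, simulated data, and result encoding are assumptions.
set.seed(42)

# Step 1: a small simulated match table; a real run would use read.csv() instead.
n <- 200
matches <- data.frame(
  left_weight  = rnorm(n, 150, 15),
  right_weight = rnorm(n, 150, 15),
  left_age     = sample(20:35, n, replace = TRUE),
  right_age    = sample(20:35, n, replace = TRUE),
  left_height  = rnorm(n, 185, 5),
  right_height = rnorm(n, 185, 5),
  left_wins    = rpois(n, 30),
  right_wins   = rpois(n, 30)
)
# Assumed encoding: result = 1 means the left wrestler won.
p_left <- plogis(0.05 * (matches$left_wins - matches$right_wins))
matches$result <- rbinom(n, 1, p_left)
# Blank out the last 10 results to mimic matches that still need a prediction.
matches$result[(n - 9):n] <- NA

# Step 2: separate historical matches from undecided ones, then fit.
historical <- matches[!is.na(matches$result), ]
undecided  <- matches[is.na(matches$result), ]

# Plain (non-Bayesian) logistic regression as a stand-in for the GLM.
glm_fit <- glm(result ~ ., data = historical, family = binomial())

# Step 3: predict() dispatches to predict.glm() for a glm object.
glm_prob <- predict(glm_fit, newdata = undecided, type = "response")
glm_pick <- ifelse(glm_prob > 0.5, "left", "right")

# Neural network via nnet (an assumed package choice); skipped if not installed.
# Inputs are left unscaled for brevity; scaling them usually helps training.
if (requireNamespace("nnet", quietly = TRUE)) {
  nn_fit <- nnet::nnet(result ~ ., data = historical, size = 3,
                       entropy = TRUE, maxit = 200, trace = FALSE)
  nn_prob <- predict(nn_fit, newdata = undecided, type = "raw")[, 1]
  nn_pick <- ifelse(nn_prob > 0.5, "left", "right")
}

print(glm_pick)
```

Thresholding the predicted probability at 0.5 is one simple way to turn a model's output into a left or right pick; the cutoff Sumo Oracle actually uses is not stated here.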
The GLM consistently outperforms the neural network on this dataset. Start with the GLM predictions and use the neural network as a secondary signal.
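
To make the evaluation step concrete, here is a rough sketch of a train/test split and confusion matrix in base R. It does not use Sumo Oracle's built-in helpers; the column names, the simulated data, the 70/30 split, and the result = 1 (left wins) encoding are assumptions chosen for illustration, and glm() again stands in for the Bayesian GLM.

```r
# Sketch only: simulated data and a hypothetical 70/30 split.
set.seed(1)

n <- 300
decided <- data.frame(
  left_weight  = rnorm(n, 150, 15),
  right_weight = rnorm(n, 150, 15),
  left_wins    = rpois(n, 30),
  right_wins   = rpois(n, 30)
)
# Assumed encoding: result = 1 means the left wrestler won.
decided$result <- rbinom(
  n, 1, plogis(0.1 * (decided$left_wins - decided$right_wins))
)

# Hold out 30% of the decided matches as a test set.
test_idx <- sample(n, size = round(0.3 * n))
train <- decided[-test_idx, ]
test  <- decided[test_idx, ]

# Fit on the training rows only, then predict the held-out rows.
fit  <- glm(result ~ ., data = train, family = binomial())
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Confusion matrix: rows are actual outcomes, columns are predicted outcomes.
# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted.
conf_mat <- table(
  actual    = factor(test$result, levels = 0:1),
  predicted = factor(pred, levels = 0:1)
)
# Accuracy is the share of matches on the diagonal (correct predictions).
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```

Running the same split through the neural network and comparing the two accuracy values is one way to check which model performs better on a given dataset.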
