Dataset Format

Overview

All data is stored in a single flat CSV file (conventionally named sumo.csv). Each row represents one match between two wrestlers. Rows with a known outcome are used as training and evaluation data; rows with a blank result column are the undecided matches the models are asked to predict.

Column Schema

Column	Type	Description	Example
`weight1`	numeric	Weight of wrestler 1 (lbs)	`401`
`weight2`	numeric	Weight of wrestler 2 (lbs)	`379`
`wins1`	numeric	Number of wins for wrestler 1	`6`
`wins2`	numeric	Number of wins for wrestler 2	`5`
`age1`	numeric	Age of wrestler 1 (years)	`35`
`age2`	numeric	Age of wrestler 2 (years)	`31`
`height1`	numeric	Height of wrestler 1 (cm)	`190`
`height2`	numeric	Height of wrestler 2 (cm)	`184`
`result`	logical/numeric	Match outcome: `1` = wrestler 1 wins, `0` = wrestler 2 wins, blank = undecided	`1`

There are no additional columns. The result column is always the last (rightmost) column.

Example Data

The following is a representative excerpt from sumo.csv:

weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
401,379,6,5,35,31,190,184,1
344,370,8,11,36,34,189,192,1
351,245,3,2,22,27,189,169,0
335,291,3,12,32,30,177,176,0
375,306,0,0,35,28,186,185,0
326,295,1,2,29,30,173,174,0
355,375,4,7,25,27,176,185,1
311,328,2,2,22,31,185,183,0
386,333,2,5,37,28,187,187,1
289,366,2,1,27,29,180,184,1

To represent an undecided match (one you want to predict), leave the result field blank:

weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
370,355,4,3,28,26,183,180,

Encoding Rules

Decided matches

result must be either 1 or 0:

Value	Meaning
`1`	Wrestler 1 wins
`0`	Wrestler 2 wins

Undecided matches

Leave the result field empty (trailing comma, no value). read.csv() will import this as NA. After sumo.read() processes the file, these rows have result = NA and are selected with:

undecided <- filter(data, is.na(result))

A blank result field and a missing result field (a row that ends before the last column) are handled the same way: read.csv() pads short rows by default (fill = TRUE), so both are imported as NA.
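One way to confirm this behaviour is to parse a small inline sample. The sketch below passes text to read.csv() instead of a file name; the three rows are made up for illustration, and the last one omits the result field entirely.

```r
# Inline sample (made-up rows): data row 2 has a blank result, data row 3
# stops before the result column altogether.
csv_text <- "weight1,weight2,wins1,wins2,age1,age2,height1,height2,result
401,379,6,5,35,31,190,184,1
370,355,4,3,28,26,183,180,
362,340,5,5,30,29,181,186"

parsed <- read.csv(text = csv_text)

# TRUE for every row whose result was imported as NA
blank_or_missing <- is.na(parsed$result)
print(blank_or_missing)  # FALSE TRUE TRUE
```

Writing the trailing comma explicitly is still the safer habit, since it keeps every row the same width.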

Validation Rules

Before loading the file with sumo.read(), confirm the following:

All eight feature columns (weight1, weight2, wins1, wins2, age1, age2, height1, height2) must contain numeric values: no text and no empty cells.
result must be 1, 0, or blank. Text such as "win" is coerced to NA by as.numeric() (with a warning), so the row is treated as undecided. A non-zero number such as 2 is not rejected: as.logical() converts it to TRUE, so the row is counted as a win for wrestler 1.
The header row must be present and column names must match exactly (case-sensitive).
The file must use standard CSV encoding (comma-delimited, UTF-8 or ASCII).

If a feature column contains a non-numeric value, mutate_all(as.numeric) coerces it to NA. R reports this only as an "NAs introduced by coercion" warning, not an error, so it is easy to miss. The NA then causes glm() and neuralnet() to drop that row or fail. Inspect your data with summary(data) after loading to catch unexpected NA values.
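The checks above can also be run before any coercion happens. The following is a minimal sketch, not part of the project: validate_sumo() is a hypothetical helper that reads every field as text and returns a character vector describing the problems it finds (empty when none are found).

```r
feature_cols <- c("weight1", "weight2", "wins1", "wins2",
                  "age1", "age2", "height1", "height2")

# Hypothetical helper (an assumption, not part of the project).
validate_sumo <- function(path) {
  # Read every field as text so nothing is coerced before it is inspected
  raw <- read.csv(path, colClasses = "character", check.names = FALSE)

  # Header: names and order must match exactly (case-sensitive)
  if (!identical(names(raw), c(feature_cols, "result"))) {
    return("header does not match the expected column names")
  }

  problems <- character(0)

  # Feature columns: every cell must parse as a number
  for (col in feature_cols) {
    bad <- is.na(suppressWarnings(as.numeric(raw[[col]])))
    if (any(bad)) {
      problems <- c(problems, sprintf(
        "%s: non-numeric or empty value in data row(s) %s",
        col, paste(which(bad), collapse = ", ")))
    }
  }

  # result: blank is allowed, anything else must be 0 or 1
  res   <- raw[["result"]]
  blank <- is.na(res) | trimws(res) == ""
  bad   <- !blank & !(suppressWarnings(as.numeric(res)) %in% c(0, 1))
  if (any(bad)) {
    problems <- c(problems, sprintf(
      "result: value other than 1, 0, or blank in data row(s) %s",
      paste(which(bad), collapse = ", ")))
  }

  problems
}
```

A call such as validate_sumo("sumo.csv") returning character(0) means none of these checks failed; it does not check the file's character encoding.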

How `sumo.read()` Processes the File

sumo.read() performs three sequential transformations:

sumo.read <- function(csv) {
  data <- tibble(read.csv(csv)) %>%   # 1. Read CSV into a tibble
    mutate_all(as.numeric) %>%         # 2. Coerce every column to numeric
    mutate(result = as.logical(result)) # 3. Cast result: 1→TRUE, 0→FALSE, NA→NA
  return(data)
}

Step	Operation	Effect on `result`
1	`read.csv()`	`1` → `1L`, blank → `NA` (integer column, for a valid file)
2	`mutate_all(as.numeric)`	`1L` → `1` (double), `NA` stays `NA`
3	`mutate(result = as.logical(result))`	`1` → `TRUE`, `0` → `FALSE`, `NA` → `NA`
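The three steps can be traced on the result column alone using base R. This is an illustrative sketch on a made-up two-column sample, not the project's code; it applies as.numeric() and as.logical() directly instead of going through mutate_all() and mutate().

```r
# Made-up sample; the last row has a blank result.
raw <- read.csv(text = "weight1,result\n401,1\n344,0\n370,")

step1 <- raw$result          # as imported by read.csv()
step2 <- as.numeric(step1)   # what mutate_all(as.numeric) does to this column
step3 <- as.logical(step2)   # what mutate(result = as.logical(result)) does

class(step1)   # "integer"
class(step2)   # "numeric"
step3          # TRUE FALSE NA

# Values other than 0 and 1 are not rejected: any non-zero number becomes TRUE
as.logical(2)  # TRUE
```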

After loading, the caller is expected to split the tibble into decided and undecided subsets:

data      <- sumo.read("sumo.csv")
history   <- na.omit(data)                  # decided matches  → train/evaluate models
undecided <- filter(data, is.na(result))    # undecided matches → run inference
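# Note: na.omit() removes rows with NA in any column, not only result, so a
# decided match with a missing feature value ends up in neither subset.
# The sketch below counts such rows. It uses a made-up data frame standing in
# for the output of sumo.read() and a base-R subset in place of filter().

```r
# Made-up data: row 4 is a decided match with a missing feature value.
data <- data.frame(
  weight1 = c(401, 344, 370, NA),
  weight2 = c(379, 370, 355, 366),
  result  = c(TRUE, FALSE, NA, TRUE)
)

history   <- na.omit(data)                # rows with no NA in any column
undecided <- data[is.na(data$result), ]   # same rows as filter(data, is.na(result))

# Rows in neither subset: decided matches lost to a missing feature
dropped <- nrow(data) - nrow(history) - nrow(undecided)
print(dropped)  # 1
```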

Keep decided and undecided matches in the same file. The split is done in code, so you only need to maintain one CSV. Append a new row with a blank result whenever you want a new prediction.
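Appending a row can be scripted. The sketch below is an illustration only: append_undecided() is a hypothetical helper, not part of the project, and it assumes the file already ends with a newline.

```r
# Hypothetical helper (an assumption, not part of the project):
# appends one undecided match to an existing CSV.
append_undecided <- function(path, weight1, weight2, wins1, wins2,
                             age1, age2, height1, height2) {
  features <- c(weight1, weight2, wins1, wins2, age1, age2, height1, height2)
  # Refuse anything that would violate the validation rules
  stopifnot(is.numeric(features), length(features) == 8, !anyNA(features))
  # The trailing comma leaves the result field blank
  line <- paste0(paste(features, collapse = ","), ",")
  cat(line, "\n", sep = "", file = path, append = TRUE)
}
```

For example, append_undecided("sumo.csv", 370, 355, 4, 3, 28, 26, 183, 180) adds the undecided row shown under Example Data.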
