
Introduction

Every March, millions of Americans enter March Madness squares pools with nothing but hope, and that’s entirely by design. Unlike bracket challenges, which are a race to pick the perfect bracket, squares are assigned randomly: there is no strategy, no edge, and no skill that can tip the odds in your favor. You get what you get.

And while you have no say in which square you receive, you can find out whether the square you’ve been handed has historically been a winner or a longshot.

This report analyzes every Men’s NCAA Tournament final score from 1985 to 2025 (excluding 2020, when the tournament was canceled), identifies which final-digit combinations have appeared most frequently, and explores statistically why certain squares are anything but a Cinderella story.

The Data

Source

The dataset used for this analysis was found via KStreet13’s public GitHub repository, which is freely accessible and well documented. Sports reference data of this nature is increasingly easy to find in the public domain, a testament to the growing culture of open-source data sharing.

Sidenote on Data Collection: This dataset was compiled through web scraping, a technique used to programmatically extract data from websites. Readers interested in the methodology behind web scraping can explore KStreet13’s repository directly, where the scraping engine is fully visible (under the ‘code’ folder) and well worth a look.

Set-Up

This analysis was conducted with R code embedded in the markdown document you are reading, using the tidyverse collection of packages for data manipulation and visualization. The dataset was loaded from a local copy of the CSV file in the repository mentioned above.

library(tidyverse)  # dplyr, tidyr, ggplot2 and friends
# Local copy of the CSV from the repository mentioned above
games <- read.csv("Game_Results.csv")

Structure

To get a feel for the data, let’s look at its structure by examining the first and last few rows with the head() and tail() functions.

##      Seed1      Team1 Score1 Seed2         Team2 Score2 Round Year
## 1        1 Georgetown     68    16        Lehigh     43    64 1985
## 2        8     Temple     60     9 Virginia Tech     57    64 1985
## 3        5        SMU     85    12  Old Dominion     68    64 1985
## ...    ...        ...    ...   ...           ...    ...   ...  ...
## 2584     1    Florida     79     1        Auburn     73     4 2025
## 2585     1    Houston     70     1          Duke     67     4 2025
## 2586     1    Florida     65     1       Houston     63     2 2025

One aspect of this dataset greatly simplified the analysis: the winning team is consistently recorded under Team1, Seed1, and Score1, so the data is already in a winner-first format. There was no need to sort or identify winners manually, which kept the analysis clean and efficient.
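If you would rather confirm the winner-first layout than take it on faith, a one-line check is enough. The sketch below uses base R and a hypothetical toy data frame, `sample_games`, built from the six rows shown in the head()/tail() output above; on the real data you would run the same comparison against `games`.

```r
# Hypothetical toy data: the six rows shown in the head()/tail() output above
sample_games <- data.frame(
  Team1  = c("Georgetown", "Temple", "SMU", "Florida", "Houston", "Florida"),
  Score1 = c(68, 60, 85, 79, 70, 65),
  Team2  = c("Lehigh", "Virginia Tech", "Old Dominion", "Auburn", "Duke", "Houston"),
  Score2 = c(43, 57, 68, 73, 67, 63)
)

# TRUE only if every listed winner (Score1) outscored the listed loser (Score2);
# a strict ">" is safe because basketball games cannot end in a tie
winner_first <- all(sample_games$Score1 > sample_games$Score2)
print(winner_first)
```

If the check ever returned FALSE, `which(sample_games$Score1 <= sample_games$Score2)` would point at the offending rows.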

That being said, let’s cover what each column indicates:

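Seed1, Team1, Score1: Winner’s seed, school name, and final score
Seed2, Team2, Score2: Loser’s seed, school name, and final score (womp, womp, womp)
Round: Tournament round, coded by the number of teams remaining (64 → Round 1, …, 4 → Final Four, 2 → Championship)
Year: Season year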

Takeaway: We have the final buzzer results of all 2,586 contests played in the Men’s NCAA Tournament since the field expanded to 64 teams in 1985.

At a Glance

Let’s get our feet wet with a classic box-and-whisker plot comparing the winning teams’ final scores to those of their opponents (the losing teams).

Fig. 1: Distribution of Winning vs. Losing Scores (1985 - 2025)

At a macro level, this makes perfect sense: winning teams score higher than their opponents across the board. But if you look carefully, there is an edge case.

Turn your attention to the bottom of Fig. 1 and you will see two plotted points near 0. These are not missed free throws but a 2021 First Round matchup in which 10-seed VCU was forced to withdraw from its game against 7-seed Oregon for COVID-related reasons. The game was ruled a no contest and recorded in the dataset as a 1-0 result.
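Those two points stand alone because they fall outside the whiskers. Assuming Fig. 1 uses the common rule (the default in both ggplot2’s geom_boxplot() and base R’s boxplot()), whiskers reach at most 1.5 times the interquartile range beyond the box, and anything further out is drawn as an individual point. Base R’s boxplot.stats() reports those points in its `out` element. The sketch applies it to hypothetical toy vectors: the six winning and losing scores from the head()/tail() output plus the 1-0 forfeit.

```r
# Hypothetical toy data: six games from the output above, plus the 1-0 forfeit
winning_scores <- c(68, 60, 85, 79, 70, 65, 1)
losing_scores  <- c(43, 57, 68, 73, 67, 63, 0)

# $out holds values beyond the whiskers (coef = 1.5 is the default)
win_outliers  <- boxplot.stats(winning_scores)$out
lose_outliers <- boxplot.stats(losing_scores)$out

print(win_outliers)   # the forfeit's recorded winning score
print(lose_outliers)  # the forfeit's recorded losing score
```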

Because a no contest does not reflect a real game result, this observation was removed before analysis:

# Drop the 2021 VCU-Oregon no contest, recorded as a 1-0 result
games <- games %>% filter(!(Score1 == 1 & Score2 == 0))
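Before dropping rows by a score pattern, it is worth confirming that the pattern matches only the game you mean to remove. The sketch below does the check and the removal in base R on a hypothetical toy data frame, `toy`: two real results from the output shown earlier plus a stand-in row for the 1-0 forfeit (its team order assumes the winner-first layout).

```r
# Hypothetical toy data: two real results from the output above plus the forfeit
toy <- data.frame(
  Team1  = c("Georgetown", "Oregon", "Florida"),
  Score1 = c(68, 1, 65),
  Team2  = c("Lehigh", "VCU", "Houston"),
  Score2 = c(43, 0, 63),
  Year   = c(1985, 2021, 2025)
)

# Flag rows recorded as a 1-0 result
is_forfeit <- toy$Score1 == 1 & toy$Score2 == 0
print(toy[is_forfeit, ])         # inspect exactly what will be dropped
stopifnot(sum(is_forfeit) == 1)  # guard: expect a single no-contest row

# Keep every other row (base R counterpart of the dplyr filter() call)
toy_clean <- toy[!is_forfeit, ]
nrow(toy_clean)
```

On the full data, the same guard would catch any other game that happened to be recorded as 1-0.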


Extracting the Last Digit

In a squares pool, winners are determined by the last digit of each team’s final score, not the score itself. To isolate that digit, we use R’s modulo operator (%%): Score %% 10 returns the remainder after dividing the score by 10.

A final score of 83 becomes 3, a final score of 76 becomes 6, and so on.

We’ll add each team’s last digit to the games data frame as a new column:

games <- games %>%
  # Last digit of the winning (Digit1) and losing (Digit2) scores
  mutate(Digit1 = Score1 %% 10, Digit2 = Score2 %% 10)
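A quick way to see the modulo step on its own is to apply it to a plain vector. The scores below are hypothetical examples, chosen to include a triple-digit score and a single-digit one.

```r
# Hypothetical final scores to reduce to their last digit
scores <- c(83, 76, 100, 7, 68)

# %% returns the remainder after division, so %% 10 keeps only the ones digit
last_digits <- scores %% 10
print(last_digits)
```

A 100-point game lands in the same 0 column as a 70 or an 80, and a single-digit score is already its own last digit.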


To see the digit columns in action, here are two hand-picked games:

##                 Team1 Score1    Team2 Score2 Digit1 Digit2 Year
## 1             Indiana     73 Kentucky     67      3      7 2016
## 2 Fairleigh Dickinson     63   Purdue     58      3      8 2023


Now that we know what we have and have removed a pesky edge case, let’s go dancing.

The Squares

Frequency Table

The code below creates an “invisible” frequency table (it is never printed); this is the engine running quietly under the hood. Each row represents a unique pairing of the winning team’s last digit (Digit1) and the losing team’s last digit (Digit2), alongside the number of times that exact combination has occurred. The complete() step adds any pairing that never occurred with a count of 0, so all 100 squares are present.

While the table is hard to digest on its own at 100 rows (not shown), it lays the groundwork for the visualization that follows, where the patterns become immediately clear.

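squares <- games %>%
  # Count how many games produced each (winner digit, loser digit) pair
  count(Digit1, Digit2) %>%
  # Add any of the 100 pairs that never occurred, with n = 0
  complete(Digit1 = 0:9, Digit2 = 0:9, fill = list(n = 0))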
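For readers who prefer base R, table() can build the same 10×10 grid in one step. Converting each digit column to a factor with levels 0:9 plays the role of complete(), so digit pairs that never occur still show up as zero counts. The sketch uses a hypothetical toy data frame, `toy_digits`, holding the digit pairs of the six games from the head()/tail() output rather than the full dataset.

```r
# Hypothetical toy data: last digits of the six games from the head()/tail() output
toy_digits <- data.frame(
  Digit1 = c(8, 0, 5, 9, 0, 5),  # winners' last digits
  Digit2 = c(3, 7, 8, 3, 7, 3)   # losers' last digits
)

# Factor levels 0:9 force all 100 cells to exist, even unseen combinations
grid <- table(
  Winner = factor(toy_digits$Digit1, levels = 0:9),
  Loser  = factor(toy_digits$Digit2, levels = 0:9)
)

dim(grid)       # 10 rows (winner digit) by 10 columns (loser digit)
grid["0", "7"]  # Temple 60-57 and Houston 70-67 share square (0,7)

# Long format with one row per square, similar in shape to the `squares` table
squares_long <- as.data.frame(grid, responseName = "n")
nrow(squares_long)
```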

The Heatmap

With our digit combinations counted behind the scenes, we can now visualize all 100 possible squares at once. Each cell represents a unique (Winner, Loser) digit pairing — the darker the red, the more often that square has appeared as a final score combination in the Men’s NCAA Tournament.

Fig. 2: Most Common March Madness Final Score Squares (1985 - 2025)


The results are in — and they are anything but random.

The two hottest squares across forty years of the Big Dance are (2,0) and (4,1), each appearing 45 times in 2,585 games. Close behind sits (2,8) at 43. If you are holding one of these squares, history is on your side.

On the other end of the spectrum, (7,7) has appeared just 12 times — the coldest square on the board. The bottom-right corner of the grid tells a similar story, with (9,9) sitting at a bleak 15. If that is your square, may the odds be ever in your favor (or ask for a refund).



The Math Behind the Squares

The heatmap told us what — but let’s see if the math can explain why.

Remember

With the forfeit removed, we are working with 2,585 games, not 2,586 (VCU forgot to wear their masks on the plane back in 2021).

What Scores Should We Expect?

We begin by estimating the expected final score for winning and losing teams as the average across all 2,585 games.



\[E(\text{Winning Score}) = \frac{1}{n} \sum_{i=1}^{n} \text{Score1}_i\]

## [1] 76.94507

\[E(\text{Losing Score}) = \frac{1}{n} \sum_{i=1}^{n} \text{Score2}_i\]

## [1] 65.13153

Because expectation is linear, the expected margin (the average point difference between winning and losing teams) is simply the difference between the two expected scores:

\[E(\text{Margin}) = E(\text{Winning Score}) - E(\text{Losing Score})\]

## [1] 11.81354

On average, then, the winning team of an NCAA Tournament game wins by roughly 11.8 points. Rounding to 12, a typical margin leaves a last digit of 2, and returning to Fig. 2, many of the hottest squares, including the top square (2,0), have a winning digit exactly 2 higher than the losing digit. That pattern is consistent with the average margin playing out in the final score.
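The margin formula above works because averages are linear: the mean of the per-game differences equals the difference of the two means, so it does not matter whether you subtract first or average first. The sketch below checks this on the six games from the head()/tail() output, used as hypothetical toy data. It will not reproduce the full-data value of 11.81354, only the identity.

```r
# Hypothetical toy data: the six games shown in the head()/tail() output
winning <- c(68, 60, 85, 79, 70, 65)
losing  <- c(43, 57, 68, 73, 67, 63)

mean_win    <- mean(winning)            # sample estimate of E(Winning Score)
mean_lose   <- mean(losing)             # sample estimate of E(Losing Score)
mean_margin <- mean(winning - losing)   # average of the per-game margins

# Linearity: averaging the margins equals differencing the averages
all.equal(mean_margin, mean_win - mean_lose)
```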

Where Do Scores Actually Land?

With well over 2,000 games to measure, these averages are solid estimates, but an average hides how the margins are spread out. In a squares pool, only the last digit of the margin matters, so we reduce each winning margin to its last digit with the same modulo operation (Margin %% 10).

Every margin collapses to a single digit, whether the game was decided by 10+, 20+, or even 30+ points. After all, the last digit of the scores is what we are measuring.
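The margin reduction can be sketched in base R as follows. The toy vectors are hypothetical: the six games from the head()/tail() output plus the two hand-picked games, not the full dataset. Because games cannot end in a tie, every margin is at least 1, so a margin digit of 0 requires a margin of exactly 10, 20, 30, and so on.

```r
# Hypothetical toy data: eight games shown earlier in this report
winning <- c(68, 60, 85, 79, 70, 65, 73, 63)
losing  <- c(43, 57, 68, 73, 67, 63, 67, 58)

margin       <- winning - losing  # always positive: no ties in basketball
margin_digit <- margin %% 10      # margins of 3, 13 and 23 all collapse to 3

# Count every margin digit 0-9, keeping digits that never appear
margin_counts <- table(factor(margin_digit, levels = 0:9))
print(margin_counts)
```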


Fig. 3: Distribution of Winning Margins (1985 - 2025)

Figure 3 drives the point home. The last digit of the winning margin, not the margin itself, determines which diagonal of the board the final score lands on. Winning by 9 and winning by 19 land on the same diagonal, and winning by 10, 20, or 30 all collapse into margin digit 0.

With that in mind, the chart tells a clear story. Margin digits 2 and 3 are the most common, meaning the winner’s last digit most often sits 2 or 3 above the loser’s (wrapping past 9, so a 71-69 game at (1,9) counts as margin digit 2), which is exactly what heats up squares like (2,0) and (4,1) in Fig. 2. Meanwhile, margin digit 0 is the least common: because games cannot end in a tie, it requires a margin of exactly 10, 20, or 30 points, and it covers the matching-digit squares such as (7,7) and (9,9) at the cold end of the board.
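The link between Fig. 3 and the heatmap is an identity about remainders: for any pair of scores, the margin’s last digit equals (Digit1 - Digit2) %% 10. Each margin digit therefore picks out one wrapped diagonal of ten squares, and margin digit 0 is exactly the diagonal where both teams share a last digit. The base R sketch below checks the identity exhaustively over a hypothetical grid of score pairs (winners scoring 40 to 120, losers scoring less) rather than on the real data.

```r
# Hypothetical grid: every winner score 40-120 paired with every lower loser score
pairs <- expand.grid(Score1 = 40:120, Score2 = 30:119)
pairs <- pairs[pairs$Score1 > pairs$Score2, ]

digit1 <- pairs$Score1 %% 10
digit2 <- pairs$Score2 %% 10

# R's %% returns 0-9 for a divisor of 10, even when digit1 - digit2 is negative
same_digit <- (pairs$Score1 - pairs$Score2) %% 10 == (digit1 - digit2) %% 10
all(same_digit)

# Margin digit 0 happens exactly when both last digits match, e.g. 77-67 -> (7,7)
zero_margin <- (pairs$Score1 - pairs$Score2) %% 10 == 0
all(zero_margin == (digit1 == digit2))

# The ten squares on the margin-digit-2 diagonal: (2,0), (3,1), ..., (1,9)
loser_digits <- 0:9
diagonal_2 <- data.frame(Winner = (loser_digits + 2) %% 10, Loser = loser_digits)
```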

The math and the heatmap are telling the same story. Now you have the receipts.


How Much Better is the Best?

If we’ve concluded that some squares are ‘better’, to what degree is that so?

To keep this simple, we’ll compute max_square / min_square: the ratio of the most frequent square’s count to the least frequent square’s count.

## [1] 3.75

The best square (2,0), tied with (4,1), has appeared 3.75 times as often as the worst square (7,7), or nearly 4 to 1.
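The ratio needs only the 100 counts. Assuming they sit in the `n` column that count() creates in the `squares` table, something like `max(squares$n) / min(squares$n)` would do. The self-contained sketch below instead uses a hypothetical vector holding just the five counts quoted in this report, which is enough to reproduce the 3.75. One caveat: if any square had never occurred, min() would return 0 and the ratio would be infinite; that is not the case here, since the coldest square appeared 12 times.

```r
# Hypothetical subset: the five square counts quoted in this report
square_counts <- c(
  "(2,0)" = 45, "(4,1)" = 45, "(2,8)" = 43, "(9,9)" = 15, "(7,7)" = 12
)

max_square <- max(square_counts)  # hottest square's count
min_square <- min(square_counts)  # coldest square's count
max_square / min_square

# which.max() returns only the first maximum, so the (4,1) tie is not reported
names(which.max(square_counts))
names(which.min(square_counts))
```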

Holding a (2,0), you can expect to see your square hit roughly once every tournament. Holding a (7,7)? You might be waiting three to four years just to see it show up once (just like IU fans).
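Those rules of thumb come from dividing each square’s count by the number of tournaments in the data. Assuming 40 tournaments (1985 through 2025, minus the canceled 2020 event), the arithmetic is:

```r
tournaments <- 40  # 1985-2025, excluding the canceled 2020 tournament

hits_20 <- 45  # appearances of square (2,0)
hits_77 <- 12  # appearances of square (7,7)

hits_20 / tournaments  # appearances of (2,0) per tournament
tournaments / hits_77  # tournaments per appearance of (7,7)
```

That works out to about 1.1 appearances of (2,0) per tournament, and one appearance of (7,7) every 3.3 tournaments or so.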

Key Takeaways

  • Squares are not random. Your assignment is, but the outcomes are not evenly spread: forty years of data show that certain digit combinations appear far more often than others.

  • Basketball scoring is the reason. The nature of the game (scores clustering in the 60s to 80s, margins typically between 5 and 20 points) naturally favors certain last-digit combinations over others.

  • Margins compound the effect. Winning by 2 and winning by 12 land on the same diagonal of the board. This means the distribution of margin digits, not raw margins, is what truly drives the board, and digits 2 and 3 dominate.

  • The best square is nearly 4x better than the worst. (2,0) has appeared 3.75 times as often as (7,7) across 2,585 games. That is not luck. That is structure.

  • You still can’t choose your square. But now you know exactly what you are holding — and that is more than everyone else at the table.


Full project code and data available on GitHub
