Every March, millions of Americans enter March Madness squares pools with nothing but hope — and that’s entirely by design. Unlike challenges that are a race to formulate the perfect bracket, squares are assigned randomly, meaning there is no strategy, no edge, and no skill that can tip the odds in your favor. You get what you get.
And while you have no say in which square you receive, you can find out whether the square you’ve been handed has been a historical winner or a longshot.
This report analyzes every Men’s NCAA Tournament final score from 1985 to 2025 (less 2020), identifying which final digit combinations have appeared most frequently and statistically exploring why certain squares are anything but a Cinderella Story.
The dataset used for this analysis was found via KStreet13’s public GitHub Repository, which is freely accessible and well-documented. Sports reference data of this nature is increasingly easy to find in the public domain — a testament to the growing culture of open-source data sharing.
Sidenote on Data Collection: This dataset was compiled through web scraping, a technique used to programmatically extract data from websites. Readers interested in the methodology behind web scraping can explore KStreet13’s repository directly, where the scraping engine is fully visible (under the ‘code’ folder) and well worth a look.
This analysis was conducted through the use of R scripts embedded
into this markdown file you are currently reading, leveraging the
tidyverse collection of packages for data manipulation and
visualization. The dataset was loaded directly from a local CSV file
(mentioned above).
library(tidyverse)
games <- read.csv("Game_Results.csv")
To get a feel for this data, let’s look at its structure by
examining the first and last few rows of the dataset with the
head() and tail() functions.
##      Seed1      Team1 Score1 Seed2         Team2 Score2 Round Year
## 1        1 Georgetown     68    16        Lehigh     43    64 1985
## 2        8     Temple     60     9 Virginia Tech     57    64 1985
## 3        5        SMU     85    12  Old Dominion     68    64 1985
## 4      ...        ...    ...   ...           ...    ...   ...  ...
## 5      ...        ...    ...   ...           ...    ...   ...  ...
## 6      ...        ...    ...   ...           ...    ...   ...  ...
## 2584     1    Florida     79     1        Auburn     73     4 2025
## 2585     1    Houston     70     1          Duke     67     4 2025
## 2586     1    Florida     65     1       Houston     63     2 2025
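A preview like the one above can be built with a couple of base R calls. The report's own preview code is not shown, so the following is only a minimal sketch under that assumption; `toy_games` and `preview` are hypothetical names, and the six rows are copied from the output above.

```r
# Hypothetical stand-in for the real `games` data frame,
# using the six rows visible in the preview above.
toy_games <- data.frame(
  Seed1  = c(1, 8, 5, 1, 1, 1),
  Team1  = c("Georgetown", "Temple", "SMU", "Florida", "Houston", "Florida"),
  Score1 = c(68, 60, 85, 79, 70, 65),
  Seed2  = c(16, 9, 12, 1, 1, 1),
  Team2  = c("Lehigh", "Virginia Tech", "Old Dominion", "Auburn", "Duke", "Houston"),
  Score2 = c(43, 57, 68, 73, 67, 63),
  Round  = c(64, 64, 64, 4, 4, 2),
  Year   = c(1985, 1985, 1985, 2025, 2025, 2025)
)

# Stack the first two and last two rows into a single preview.
preview <- rbind(head(toy_games, 2), tail(toy_games, 2))
print(preview)

# str() lists each column's type alongside its first few values.
str(toy_games)
```

On the full dataset, `head(games, 3)` and `tail(games, 3)` would return the first and last three games in the same way.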
One aspect of this dataset that greatly simplified the analysis is
that the winning team is consistently recorded under Team1,
Seed1, and Score1 — meaning the data was
already structured in a winner-first format. This removed the need to
manually sort or identify winners, allowing the analysis to move forward
both cleanly and efficiently.
That being said, let’s cover what each column indicates:
- Seed1, Team1, Score1: Winner’s seed, school name, and final score
- Seed2, Team2, Score2: Loser’s seed, school name, and final score (womp, womp, womp)
- Round: Tournament round (64 -> Round 1, … , 4 -> Final 4)
- Year: Season year
Takeaway: We have the final buzzer results of all 2,586 contests played in the Men’s NCAA Tournament since the field expanded to 64 teams in 1985.
Let’s get our feet wet with a classic box-and-whisker plot comparing the final scores of winning teams with those of their opponents (the losing teams).
Fig. 1: Distribution of Winning vs. Losing Scores (1985 - 2025)
At a macro level, this makes perfect sense; winning teams score higher across the board than their opponents. But, if you look carefully, we have an edge case.
Turn your attention to the bottom of Fig. 1 above and you will see two plotted points near 0. These are not missed free throws, but rather a 2021 First Round matchup in which 10-seed VCU had to withdraw from its contest against 7-seed Oregon because of COVID-19 protocols. The game was ruled a no contest and recorded as a 1-0 result in the dataset.
Because this was ruled a no contest and does not reflect a true game result, the observation was removed prior to analysis, as shown below:
games <- games %>% filter(!(Score1 == 1 & Score2 == 0))
In a squares pool, winners are determined by the last
digit of each team’s final score, not the score itself. To
isolate this, we use the modulo operator (%%), which
returns the remainder of a division; taking a score modulo 10
leaves only its last digit.
A final score of 83 becomes 3, a final score of 76 becomes 6, and so on.
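As a quick illustration of the operator, here is a minimal sketch on a few standalone scores. The vector `scores` is made up for the example; the first two values are the ones mentioned above.

```r
# Example final scores (83 and 76 come from the text; 100 and 9 are extra edge cases).
scores <- c(83, 76, 100, 9)

# %% is vectorized: it returns the remainder of each score divided by 10,
# which is that score's last digit.
last_digits <- scores %% 10
print(last_digits)
```

Note that a three-digit score such as 100 collapses to 0 just like 70 or 80 would, which is why only the last digit matters on the board.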
We’ll add each team’s digits as columns to the games
dataframe with the code below:
# Last digit of each team's final score (remainder after dividing by 10)
games <- games %>%
  mutate(Digit1 = Score1 %% 10, Digit2 = Score2 %% 10)
To see the digit columns in action, here are two hand-picked
games:
##                 Team1 Score1    Team2 Score2 Digit1 Digit2 Year
## 1             Indiana     73 Kentucky     67      3      7 2016
## 2 Fairleigh Dickinson     63   Purdue     58      3      8 2023
Now that we know what we have and have gotten rid of a pesky edge case,
let’s go dancing…
The code below creates an “invisible” frequency table — this is the
engine running quietly under the hood. Each row
represents a unique pairing of the winning team’s last digit
(Digit1) and the losing team’s last digit
(Digit2), alongside the number of times that exact
combination has occurred.
While the table alone is difficult to digest at 100 rows (not shown),
it lays the groundwork for the visualization that follows — where the
patterns become immediately clear.
# Count each (Digit1, Digit2) pair, then add any pair that never occurred with n = 0
squares <- games %>%
  count(Digit1, Digit2) %>%
  complete(Digit1 = 0:9, Digit2 = 0:9, fill = list(n = 0))
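One way to sanity-check a frequency table like this is to rebuild it with base R's `table()`. This is a cross-check sketch, not the report's method; `toy` and `grid` are hypothetical names and the five digit pairs are made up.

```r
# Hypothetical digit pairs for five games (winner's digit, loser's digit).
toy <- data.frame(Digit1 = c(2, 2, 4, 3, 2),
                  Digit2 = c(0, 0, 1, 7, 8))

# Fixing the factor levels to 0:9 keeps pairs that never occurred in the
# table as zeros, which is the role complete() plays in the tidyverse version.
grid <- table(
  Winner = factor(toy$Digit1, levels = 0:9),
  Loser  = factor(toy$Digit2, levels = 0:9)
)

print(dim(grid))         # always a 10 x 10 grid
print(grid["2", "0"])    # count for the (2,0) square
print(sum(grid == 0))    # squares that never appeared in the toy data
```

Because the result is a 10 x 10 matrix rather than a 100-row data frame, it is easy to eyeball, while the long format produced by `count()` and `complete()` is the more convenient input for plotting.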
With our digit combinations counted behind the scenes, we can now
visualize all 100 possible squares at once. Each cell represents a
unique (Winner, Loser) digit pairing — the darker the
red, the more often that square has appeared as a final score
combination in the Men’s NCAA Tournament.
Fig. 2: Most Common March Madness Final Score Squares (1985 - 2025)
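The plotting code behind Fig. 2 is not shown in the report. A heatmap of this kind could be drawn roughly as follows; this is a sketch under stated assumptions: `toy_squares` is a hypothetical stand-in for the `squares` table filled with random placeholder counts (not real results), and the colors, labels, and axis orientation are guesses rather than the author's choices.

```r
library(ggplot2)

# Placeholder counts only: one row per (Digit1, Digit2) pair with a count `n`,
# mimicking the shape of the real `squares` table.
set.seed(1)
toy_squares <- expand.grid(Digit1 = 0:9, Digit2 = 0:9)
toy_squares$n <- rpois(nrow(toy_squares), lambda = 26)

# One tile per square; darker red means a higher count.
heatmap_plot <- ggplot(toy_squares,
                       aes(x = factor(Digit1), y = factor(Digit2), fill = n)) +
  geom_tile(color = "white") +
  geom_text(aes(label = n), size = 3) +
  scale_fill_gradient(low = "white", high = "darkred") +
  labs(x = "Winner's last digit", y = "Loser's last digit", fill = "Games")

print(heatmap_plot)
```

Swapping `toy_squares` for the real `squares` table would plot the actual counts, since both share the `Digit1`, `Digit2`, and `n` columns.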
The results are in — and they are anything but random.
The two hottest squares across forty years of The Big Dance are (2,0) and (4,1), each appearing 45 times in 2,585 games. Close behind sits (2,8) at 43. If you are sitting on one of these squares, history is on your side.
On the other end of the spectrum, (7,7) has appeared just 12 times — the coldest square on the board. The bottom-right corner of the grid tells a similar story, with (9,9) sitting at a bleak 15. If that is your square, may the odds be ever in your favor (or ask for a refund).
The heatmap told us what — but let’s see if the math can explain why.
Remember…
We are only working with 2,585 games in this section, not 2,586 —
VCU forgot to wear their masks on the plane back in 2021.
We begin by calculating the expected final score for both winning and losing teams across all 2,585 games.
\[E(\text{Winning Score}) = \frac{1}{n} \sum_{i=1}^{n} \text{Score1}_i\]
## [1] 76.94507
\[E(\text{Losing Score}) = \frac{1}{n} \sum_{i=1}^{n} \text{Score2}_i\]
## [1] 65.13153
The Expected Value of the Margin states that the average point difference between winning and losing teams can be expressed as:
\[E(\text{Margin}) = E(\text{Winning Score}) - E(\text{Losing Score})\]
## [1] 11.81354
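The three numbers above are sample means, and the margin formula works because the mean of a difference equals the difference of the means. A minimal sketch on six of the games shown earlier (the data frame `toy` and the variable names are hypothetical, and the values printed here will differ from the full-dataset figures):

```r
# Six winner/loser score pairs taken from the preview table earlier in the report.
toy <- data.frame(Score1 = c(68, 60, 85, 79, 70, 65),
                  Score2 = c(43, 57, 68, 73, 67, 63))

mean_win    <- mean(toy$Score1)               # average winning score
mean_lose   <- mean(toy$Score2)               # average losing score
mean_margin <- mean(toy$Score1 - toy$Score2)  # average of per-game margins

# Both routes to the average margin agree (up to floating-point tolerance).
print(c(mean_win, mean_lose, mean_margin))
print(all.equal(mean_margin, mean_win - mean_lose))
```

On the full `games` data frame, the same three `mean()` calls would reproduce the values reported above.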
This reveals that, on average, we can expect the winning team of an NCAA Tournament game to win by approximately 11.8 points. Rounding to 12 and returning to Fig. 2, we notice that several of the hottest squares, such as (2,0), have a winning digit that sits 2 above the losing digit, which is consistent with that average margin playing out in the final score.
With well over 2,000 contests behind them, the expected value
calculations rest on a solid sample, but we can go one step further
and reduce each winning margin to its last digit, because in a
squares pool only the last digit matters. Thus, we perform the same
modulo operation (Margin %% 10).
No matter if the margin was 10+, 20+, or even 30+, the last digit is what we are measuring. This also ties the margin back to the board: the last digit of the margin equals the difference between the winner’s and loser’s last digits, taken modulo 10.
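To make the link between margins and squares concrete, the sketch below checks that the margin's last digit matches the wrapped difference of the two score digits. The data frame `toy` is hypothetical: its first three score pairs appear in the tables earlier in the report, and the last two are made up to cover larger margins.

```r
# Winner/loser scores: three real pairs from the report, two made-up blowouts.
toy <- data.frame(Score1 = c(73, 63, 79, 92, 81),
                  Score2 = c(67, 58, 73, 70, 52))

# Last digit of the winning margin.
margin_digit <- (toy$Score1 - toy$Score2) %% 10

# Difference of the two last digits, wrapped back into 0..9.
# In R, %% takes the sign of the divisor, so (3 - 7) %% 10 is 6, not -4.
digit_diff <- (toy$Score1 %% 10 - toy$Score2 %% 10) %% 10

print(margin_digit)
print(all(margin_digit == digit_diff))

# Tally of margin digits, keeping digits that never occur as zeros.
print(table(factor(margin_digit, levels = 0:9)))
```

This identity is why each margin digit maps onto one diagonal band of ten squares on the board: margin digit 0 corresponds to the main diagonal, (0,0) through (9,9).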
Fig. 3: Distribution of Winning Margins (1985 - 2025)
Figure 3 drives the point home. The last digit of the winning margin — not the margin itself — is what determines your square. Winning by 9 and winning by 19 are identical in a squares pool. Winning by 10, 20, or 30 all collapse into the same 0 digit.
With that in mind, the chart tells a clear story. Margin digits of 2 and 3 are the most common, meaning the winner’s last digit most frequently sits 2 or 3 above the loser’s (wrapping around past 9), which is exactly what we see heating up the heatmap in Fig. 2. Meanwhile, margin digit 0 is the least common: a game cannot end in a tie, so a 0 digit requires a win by exactly 10, 20, 30, and so on, and those margins are comparatively rare.
The math and the heatmap are telling the same story. Now you have the receipts.
If we’ve concluded that some squares are ‘better’, to what degree is that so?
To keep this simple, we’ll take max_square / min_square,
the ratio between the counts of the most frequent and least frequent squares.
## [1] 3.75
The best square (2,0) has appeared nearly 4 times as often as the worst square (7,7).
Holding a (2,0), you can expect to see your square hit roughly once every tournament. Holding a (7,7)? You might be waiting three to four years just to see it show up once (just like IU fans).
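The arithmetic behind these claims can be laid out explicitly. The sketch below uses only counts stated in the text (45, 12, and 2,585 games); the 40-tournament figure is derived from 1985 to 2025 less 2020, and the variable names other than `max_square` and `min_square` are hypothetical.

```r
max_square    <- 45    # appearances of (2,0), from the text
min_square    <- 12    # appearances of (7,7), from the text
n_games       <- 2585  # games analyzed after removing the no contest
n_tournaments <- 40    # 1985 through 2025, less 2020

# Ratio between the hottest and coldest square.
ratio <- max_square / min_square

# How often each square hits, per tournament and in years per hit.
best_hits_per_tournament <- max_square / n_tournaments
worst_years_per_hit      <- n_tournaments / min_square

# For reference: the count each square would have if all 100 were equally likely.
even_split <- n_games / 100

print(c(ratio, best_hits_per_tournament, worst_years_per_hit, even_split))
```

The even-split figure is included only as a yardstick; it is not a number the report itself computes.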
Squares are not random. Your assignment is, but the outcomes are not evenly spread. Forty years of data show that certain digit combinations appear far more frequently than others.
Basketball scoring is the reason. The nature of the game (scores clustering in the 60s-80s, margins typically between 5 and 20 points) naturally favors certain last-digit combinations over others.
Margins compound the effect. Winning by 2 and winning by 12 produce the same square. This means the distribution of margin digits, not raw margins, is what truly drives the board — and digits 2 and 3 dominate.
The best square is nearly 4x better than the worst. (2,0) has appeared 3.75 times more often than (7,7) across 2,585 games. That is not luck. That is structure.
You still can’t choose your square. But now you know exactly what you are holding — and that is more than everyone else at the table.
Full project code and data available on GitHub