How Hard is This Puzzle Anyway?
by Roy Leban
Puzzle solvers love to talk about puzzles. We’re solving puzzles to challenge ourselves, so, of course, a frequent topic is how hard puzzles are and how well we do in solving them. In the case of the New York Times crossword, which features an escalating level of difficulty each week, it’s fairly common to hear people say things like “This puzzle is easy for a Monday” or “This puzzle is hard for a Thursday.” I wondered if it was possible to quantify those feelings, to actually know if a puzzle was easy or hard relative to expectations. So we set out to do just that.
As a result, we’ve added a Difficulty Index to the Puzzazz leaderboard page for every New York Times crossword going back to July 1st, 2015 and every future NYT crossword will have the Difficulty Index added to its leaderboard page automatically as soon as we have collected enough data to calculate one. Click here to see the Puzzazz leaderboard page for a recent New York Times crossword.
In this document, we use the term solve to refer to when a person works on a puzzle, whether or not they finish it and get the correct solution. We use the term complete to refer to when a person solves a puzzle, finishes it, and gets the correct solution.
Many people consider the New York Times crossword puzzle, which started in 1942 (see right), to be the “gold standard” of crosswords. “Best” is always subjective, but the Gray Lady’s puzzle’s longevity (74 years with only four editors) as well as its consistency has made it the standard-bearer of American-style crosswords for a long time. When the NYT Crossword leads, others follow.
One way the Times leads is with an escalating difficulty level of the puzzles throughout the week, with Monday puzzles being the easiest and Saturday puzzles being the hardest. Sunday puzzles are special: they are about twice as large as a typical daily puzzle and aimed at a mid-week level. The practice dates back to the Times’s first editor, Margaret Farrar. According to Will Shortz, the current puzzle editor, Farrar “made Saturday’s crosswords a little harder than those for the other days. She figured, as many people didn’t have to work on Saturday, they’d have more time to solve. She referred to the Saturday Times crossword as a ‘two cups of coffee’ puzzle.”
Somewhere along the line, according to Shortz, the difficulty of the puzzles began to increase throughout the week. When Shortz became editor in 1993, he decided to steepen the slope of difficulty: “The Monday puzzles I edit,” he says, “are probably easier than they’d ever been before, and the Friday/Saturday puzzles tend to be harder.” Shortz’s Monday crossword became one that “anyone in America” could solve, though not necessarily quite as quickly as the best solvers.
Today, the vernacular of puzzle solvers includes phrases like “Monday puzzle” and “Sunday puzzle” to describe a certain level of difficulty, and, in the case of “Sunday puzzle,” a certain size.
The Analysis (and some interesting things we learned)
Of course, puzzle creation isn’t a hard science, and different people know different things, so it will never be the case that every “Monday puzzle” is the same level of difficulty for every person. We collected a data set representing hundreds of thousands of puzzle solves for a wide range of NYT crosswords. We have even more data on non-NYT puzzles, but, for now, we decided to limit our analysis to just the NYT corpus.
With that in mind, let’s look at this chart:
For each day of the week, there is an overall average (the black line). Rather than using times, we’ve calibrated the data with the difficulty index of an average Times “Monday puzzle” fixed at 1.0. A puzzle with a difficulty index of 2.0 would be expected to take twice as long to complete as one with an index of 1.0. This allows each person to think about the overall data relative to their own solving speed, and our analysis shows that this is pretty accurate for most people. So, for an average solver, a typical Sunday puzzle is a little more than four times as hard as a typical Monday puzzle. Better solvers generally have a shallower curve, while weaker solvers have a steeper curve. Averaging them all together, we get the curve shown. We’ve also calculated ranges for “Typical” difficulty puzzles (purple), plus “Easy” and “Hard” puzzles (pink), based on clusterings of completion times. The gray area represents puzzles which are “Very Easy” or “Very Hard”.
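As a rough illustration of the calibration described above, here is a minimal sketch in Python. It assumes the index is simply a puzzle's average completion time divided by the average completion time of Monday puzzles; the function name and the sample times are hypothetical, and the actual Puzzazz calculation may differ.

```python
from statistics import mean


def difficulty_index(puzzle_times, monday_times):
    """Ratio of a puzzle's mean completion time to the mean Monday time.

    Assumption: a pooled average over all completions, not a per-solver ratio.
    """
    if not puzzle_times or not monday_times:
        raise ValueError("need at least one completion time on each side")
    return mean(puzzle_times) / mean(monday_times)


# Hypothetical completion times in seconds (not real leaderboard data).
monday_baseline = [300, 420, 480]    # averages 400 seconds
thursday_puzzle = [600, 900, 1200]   # averages 900 seconds

index = difficulty_index(thursday_puzzle, monday_baseline)
print(f"Difficulty index: {index:.2f}")
```

With these made-up numbers, the Thursday puzzle takes 2.25 times as long as the Monday baseline, so a solver who usually finishes a Monday in 6 minutes might expect to need about 13.5 minutes.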
Fridays are easier than Thursdays. As expected, the difficulty does increase from Monday through Saturday. What we didn’t expect is that the average difficulty actually goes down slightly from Thursday to Friday. But there’s a logical explanation. Thursday puzzles are typically the tricky ones, which can have anything from a multi-letter rebus in the grid to backwards words. The fastest solvers aren’t thrown very much by these tricks (and Friday puzzles are, in fact, slightly harder for those people), but there can be a big jump in completion time for less experienced solvers. We can see this in the huge peak for Thursday puzzles in the chart, which represents more puzzles which are “Hard” or “Very Hard” for a Thursday puzzle (more than any other day of the week). These puzzles push the average up. Notice that the increased range for Thursday puzzles is only on the up side, which is related to, but not exactly the same thing as, more solvers having difficulty with the tricks.
Sunday is between a Wednesday and Thursday in difficulty. Will Shortz aims for a mid-week difficulty level for a Sunday puzzle, and he’s getting it, even though the graph makes it look like that’s not happening. The Sunday puzzle is twice as big as a daily puzzle in terms of both letters to enter and number of clues. When this is taken into account, the difficulty level of a Sunday puzzle fits right between Wednesday and Thursday.
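One simple way to picture the size adjustment is to divide the raw index by the size ratio. The sketch below assumes that approach and uses the "twice as big" figure from the text; the function name and the raw index of 4.2 are hypothetical stand-ins, not values taken from the chart.

```python
def size_adjusted_index(raw_index, size_ratio=2.0):
    """Scale a difficulty index down by how much larger the puzzle is.

    Assumption: difficulty scales linearly with puzzle size.
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return raw_index / size_ratio


# Hypothetical raw Sunday index, "a little more than four".
sunday_raw = 4.2
sunday_adjusted = size_adjusted_index(sunday_raw)
print(f"Size-adjusted Sunday index: {sunday_adjusted:.2f}")
```

The adjusted value can then be compared directly with the daily averages to see where Sunday lands among the weekdays.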
There aren’t many “Very Easy” puzzles. Except for Sunday, there’s a very narrow slice of gray below the “Easy” range. I think this is explained by consistent editing combined with a small group of test solvers for the puzzles. The fewer the test solvers, the more likely it is that some unforeseen knowledge gap or other issue among the general public will cause a puzzle to be harder than expected. In contrast, it would be rare for extra knowledge to make a puzzle easier than expected. I think the greater number of “Very Easy” Sunday puzzles may be partially explained by the solvers, which is discussed more below.
There are more “Very Hard” puzzles earlier in the week. This seems counterintuitive. After all, the Monday puzzle is supposed to be solvable by “anyone in America”. But the Tuesday isn’t, and there’s a pretty steep curve going up to Wednesday. This is also related to who’s solving the puzzles and is discussed below.
Sometimes, there’s a puzzle that throws the curve. In the process of analyzing the data, we discovered one puzzle that, all by itself, threw the curve. It’s Patrick Berry’s Sunday masterpiece of September 6th, 2015. Without giving anything away, it has a complex trick that took many solvers a long time to figure out. On average, it took solvers more than 14 times as long to complete that puzzle as it did for them to complete a typical Monday puzzle, a data point that would be twice as high as the top of this chart. In comparison, the runner-up hardest Sunday puzzles (and there was a cluster of them) took solvers just under five times as long as a Monday puzzle, so Patrick’s puzzle was roughly three times as hard as the next closest Sunday puzzle.
Consistent with other puzzles, there were plenty of people who completed Patrick’s puzzle with a time in their expected range. But the slower solvers always push the average up, and, for this puzzle, they pushed it up to an extreme range. This single puzzle skewed the stats enough that we removed it from our analysis, and it is the only puzzle we currently rate as “Extremely Hard.”
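To see how much a single extreme puzzle can move a day's average, consider this small sketch. The list of Sunday indices is invented for illustration, with one value of 14.0 standing in for the outlier described above; it is not the real data set.

```python
from statistics import mean

# Hypothetical Sunday difficulty indices; 14.0 plays the role of the outlier.
sunday_indices = [3.8, 4.0, 4.2, 4.4, 14.0]

mean_with_outlier = mean(sunday_indices)
mean_without_outlier = mean(x for x in sunday_indices if x < 10.0)

print(f"With outlier:    {mean_with_outlier:.2f}")
print(f"Without outlier: {mean_without_outlier:.2f}")
```

In this made-up sample, one puzzle lifts the Sunday average from 4.10 to 6.08, which is the kind of distortion that motivates leaving such a puzzle out of the averages.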
Many people solve the Times puzzle every day, but some people solve a subset of days. We can learn some interesting things if we delve into this. As this chart shows, the number of New York Times crossword solvers drops off after the Monday puzzle.
The solvers. The top (blue) line shows the solve rate for each day of the week. These are the people who try to solve the puzzle, whether or not they complete it. We start with a baseline of 100% for Monday, because that’s the most solved puzzle. Fewer people solve on Tuesday than Monday, fewer still on Wednesday. There’s a slight pickup on Thursday and another dropoff until Sunday, which has almost as many solvers as Monday (97%).
It’s worth noting that the chart simplifies things slightly. Each day (except Sunday), there is a dropoff of solvers from the day before, but there are also new solvers added each day, which makes up for some of the people lost. The greatest churn is on Thursday, where the new solvers more than make up for the people lost from Wednesday. That’s a surprising find, but it’s not completely a shock. Thursday is known for being the first “hard” day of the week (and we’ve confirmed that above), so some more serious solvers start solving on Thursday.
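The day-to-day churn can be pictured with simple set arithmetic. This is only a sketch: the solver IDs are hypothetical, and it assumes each day's solvers are available as a set of identifiers.

```python
def churn(previous_day, current_day):
    """Return (lost, gained) solver counts between two days."""
    lost = previous_day - current_day      # solved yesterday, not today
    gained = current_day - previous_day    # solved today, not yesterday
    return len(lost), len(gained)


# Hypothetical solver IDs, not real accounts.
wednesday_solvers = {"ann", "bob", "cy", "dee"}
thursday_solvers = {"bob", "cy", "dee", "eve", "fay"}

lost, gained = churn(wednesday_solvers, thursday_solvers)
net_change = gained - lost
print(f"Lost {lost}, gained {gained}, net change {net_change:+d}")
```

A positive net change, as in this toy Thursday, corresponds to new solvers more than making up for the people lost.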
When we refer to a “Monday puzzle”, we are referring to a puzzle that the Times labels as a Monday puzzle, which is actually available in Puzzazz on Sunday afternoon and could potentially be solved on any day of the week. We only care about how the Times labels the puzzle, not when it’s solved. One of the benefits of solving in Puzzazz is that it’s easy to solve puzzles when it’s convenient to the solver; we see many solvers solving puzzles out of order, such as solving a bunch of early week puzzles in succession.
Most people who solve puzzles complete them successfully. The bottom (green) line shows the completed puzzles with the same baseline of 100% for Monday solves. Looking at the two lines together with what we’ve learned above yields some more interesting finds.
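Both lines share one baseline, which a short sketch makes concrete. The counts below are hypothetical, chosen only so that Sunday lands at 97% of Monday solves with completions below 80%; the function name is also an assumption.

```python
def relative_rates(solves, completions, baseline_day="Mon"):
    """Express solves and completions as percentages of baseline-day solves."""
    base = solves[baseline_day]
    return {
        day: (100 * solves[day] / base, 100 * completions[day] / base)
        for day in solves
    }


# Hypothetical counts, not real Puzzazz data.
solves = {"Mon": 1000, "Sun": 970}
completions = {"Mon": 900, "Sun": 780}

rates = relative_rates(solves, completions)
for day, (solve_pct, complete_pct) in rates.items():
    print(f"{day}: solved {solve_pct:.0f}%, completed {complete_pct:.0f}%")
```

Because completions are divided by Monday solves rather than by the same day's solves, the gap between the two lines on any given day shows how many people started that puzzle without finishing it.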
Fewer people complete harder puzzles. This makes perfect sense. Fewer people complete the puzzle on each succeeding day of the week (except Sunday, again), even though there’s an increase of people who solve the Thursday puzzle. The most completed puzzles of the week are Tuesday and Wednesday, where almost everybody who tries to solve the puzzle completes it. While initially surprising, this also makes sense when we think about it. Monday is the puzzle for “anyone in America” and many of the people who struggle with it or can’t complete it drop out. The people who are left are solid solvers and almost all of them complete the puzzles. Relatively speaking, we do see a big dropoff in completed puzzles on Thursday, but the dropoff is smaller than the increase in difficulty. The result is a remarkably straight line down from Monday to Friday. I don’t have an explanation for that.
Relative completion rates pick up on Friday. Friday puzzles get pretty hard, and, unlike Thursday, they are consistently hard, so more people drop out. Friday and Saturday are also typically themeless puzzles, and it may be that some solvers don’t like them as much (we don’t have any evidence for or against this hypothesis). However, we can tell that the solvers who are left, like the Tuesday and Wednesday solvers, are better solvers, because of the relative increase of completed puzzles, despite the puzzles being harder.
People really like the Sunday puzzle. The total number of Sunday solvers almost reaches the Monday peak, but the completed puzzles are much lower, below 80%. That’s some evidence for the thesis that people enjoy solving puzzles even when they don’t complete them successfully.
Unlike the paper world, the digital world doesn’t have very many Monday-only or Sunday-only solvers. It’s one thing to solve only one puzzle a week when it comes in the newspaper you’re already getting, but it’s quite another thing to pay for a subscription and only use one seventh of it.
The larger solving pool may also partially explain why more Sunday puzzles are considered “Very Easy.”
Validity of results
It’s important to note that our results are relatively valid, not absolutely valid. We are not saying how difficult any particular puzzle, or puzzles on any particular day of the week are. Rather, we are comparing relative difficulties between puzzles and groups of puzzles.
To verify that our results are valid, we analyzed different date ranges and sliced the raw data in different ways. The results were very consistent, as seen in this chart in which each line is a different month. This graph uses a single baseline of the lowest average puzzle in the last 12 months, but the graph looks pretty much the same if we use a separate baseline for each month. The variability between the lines is attributable to the fact that there are only 4-5 puzzles per day of the week in each month and some months naturally have easier or harder puzzles than other months. This analysis yields a 4% margin of error, which would not affect any conclusions.
The change in solvers from day to day, discussed above, also affects our results very little, at least in part because the dropoffs aren’t very large. The relationship between the days, including the relationship between Thursdays and Fridays, holds, as well as the measurement of Sunday as being between Wednesday and Thursday in difficulty.
Tracking individual solvers separately is more complicated, harder to validate, and harder to explain. The main thing that changes with such an analysis is that we would have a slightly steeper Monday-to-Saturday difficulty curve. In addition, a few more Monday and Tuesday puzzles would be considered “Very Easy”, plus a few more Friday to Sunday puzzles would be considered “Very Hard,” increasing slightly the gray areas on the first chart. The conclusions are the same.
If we could factor in incomplete puzzles, we’d probably also see a steeper Monday-to-Saturday curve, but there’s no reasonable way to incorporate that data. When somebody doesn’t complete a puzzle, we don’t know why, and we can’t extrapolate to get a time estimate that would match up with actual time values. It’s better to look at that data separately, as we’ve done here.
Some people use hinting, and we don’t take that into account. Since those people who use hinting generally use it in a consistent manner (typically more on puzzles later in the week), this actually has no effect on the perceived difficulty levels of the puzzles for those solvers.
Finally, it’s worth asking if these results are equally valid for those people solving on paper. Some people solve faster on digital devices while others solve faster on paper, and sometimes it’s affected by the difficulty of the puzzle: for harder puzzles, more time is spent thinking about the puzzle than in writing or entering text. We have done some comparisons between paper solving and Puzzazz solving, and we believe that the relative relationships would hold, producing very similar charts. But it would be a significant project to prove this empirically.
We analyzed a year’s worth of solving data for New York Times crossword puzzles, and drew some interesting conclusions, some surprising.
To view the Puzzazz leaderboard for the New York Times crossword, tap the trophy icon from the green bar after you've completed the puzzle in the Puzzazz app, or visit our web site to view the leaderboard and learn how to solve the New York Times crossword in Puzzazz.