The Forecast Scorecard: Stop Being Wrong the Same Way Twice
Every owner I’ve ever worked with believes their forecasts are roughly right. None of them have ever sat down and checked. They remember the months they nailed and forget the months they missed. They tell themselves “last August was an anomaly,” and then it’s anomalous again the next August, and they tell themselves the same thing.
The fix is the most boring spreadsheet in the world. You write down every forecast as you make it, and when reality arrives, you write that down too. After twelve months of that bookkeeping, you can answer questions you’ve never been able to answer: Am I systematically optimistic about new menu items? Do I always under-forecast December? Is my “80% confidence” really 80%, or is it really 50%? Once you can answer those questions, you stop being wrong the same way twice. That’s the whole game.
This post is the recipe. No software. No data team. A Google Sheet, fifteen minutes a week, and the discipline to actually log forecasts before you find out the answer.
Why most owners never check
Three reasons it doesn’t happen:
- It’s embarrassing. Logging your forecasts and then checking them is publicly admitting you’re going to be wrong. Most of us would rather feel right than be right.
- The feedback loop is slow. A monthly forecast takes a month to verify. By the time the answer comes in, you’ve forgotten what your reasoning was when you made it.
- There’s no template. Calibration is a research-paper word; scorecard is a spreadsheet word. Without the spreadsheet, owners don’t know where to start.
The first two are character problems. The third is what this post solves.
What goes in the scorecard
Nine columns, no exceptions:
| Column | What goes here |
|---|---|
| Made on | Date you made the forecast |
| Forecast for | Date / week / month the forecast is about |
| Question | "April revenue", "Saturday catering orders", etc. |
| Low (p10) | Number you'd be surprised to come in below |
| Best (p50) | Single best guess |
| High (p90) | Number you'd be surprised to come in above |
| Actual | Filled in when reality arrives |
| In range? | Y / N (formula) |
| Notes | What you knew when forecasting; what surprised you |
That’s the whole thing. The hardest column is the last one. Most people skip it. Skipping it means you can’t reconstruct your reasoning when you review six months later, and the whole exercise loses most of its value.
The first column, the date you made the forecast, matters more than people think. Without it, you can’t tell whether your June forecast for August was better than your January forecast for August (normal: you get better as more information arrives) or just as wrong (your model has a stable seasonal blind spot). Time-stamping every prediction lets you separate “forecast quality” from “information availability” later.
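One cheap way to exploit that later, if you want it, is a spare lead-time column. Assuming the dates sit in the first two columns (A and B in the starter template at the end of this post), date subtraction in Google Sheets returns the number of days you were forecasting ahead:

=B2-A2

At review time you can then ask whether your 90-days-out forecasts miss by more than your 7-days-out ones, or by about the same amount.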
Three rules for filling it out
Rule 1: Log the forecast before you find out the answer
This sounds obvious. It is not obvious. Most owners forecast in their head while looking at the dashboard, then react to what they see. The forecast and the realization happen at the same moment, so there’s nothing to compare. Without a hard line between “before-the-fact estimate” and “after-the-fact actual,” you can’t score anything.
Discipline trick: pick a fixed time each week to update the scorecard with new forecasts (we did Monday morning, before the week’s data showed up). At that point, you don’t know yet how the week will go. Whatever you write, you wrote it honestly. After that, the scorecard is locked — you can add the actual when it lands, but you cannot edit your earlier forecast.
Rule 2: Always include the range, never just the point
If you only log the p50 (best guess), you can’t score calibration — you can only score whether you happened to be close. The p10/p90 range is what makes the scorecard meaningful. The simple test — did the actual fall inside the p10-to-p90 range? — gives you a calibration check that improves with every forecast you log.
Over many forecasts, your “in range?” column should be Y about 80% of the time (since p10-to-p90 is an 80% confidence interval). If your hit rate is below 80%, your ranges are too narrow — you’re overconfident. Widen them. If your hit rate is above 90%, your ranges are too wide — the forecast is meaningless because it covers everything. Tighten them.
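If you don’t want to count the Y’s by hand, one Google Sheets formula keeps a running hit rate, assuming the “In range?” column is H as in the starter template at the end of this post:

=COUNTIF(H2:H, "Y") / (COUNTIF(H2:H, "Y") + COUNTIF(H2:H, "N"))

Dividing by Y plus N, rather than by the number of rows, keeps forecasts that are still waiting on an actual from dragging the number down. Format the cell as a percentage and compare it to 80%.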
This is the whole calibration loop. There’s nothing more to it. The bookkeeping is what makes the loop close.
Rule 3: Write the why, not just the number
Three months in, the scorecard’s value comes from its notes column. When you go back and ask “why was I 12% high in March?”, the notes are how you reconstruct your assumptions. Without them you have a number and no story, and you’ll repeat the same mistake in a different month under a different label.
Useful notes are short and specific:
- “Forecasted assuming the weekend market would happen. It rained, market was canceled.”
- “Used last March as the base. Forgot we ran a $5-off promo last March that we’re not running this March.”
- “Catering desk was confident on three booked events. One of them was a tentative inquiry, not booked. Adjusted my mental model of what counts as ‘booked.’”
Useless notes:
- “Felt right.”
- “Optimistic month.”
- (blank)
The monthly review (15 minutes, last Friday of the month)
The scorecard is worthless if you never look at it. Block 15 minutes on the last Friday of the month. Open the sheet. Filter to forecasts that have completed (the “forecast for” date is in the past and the “actual” column is filled in). Compute three numbers (sheet formulas for each are sketched after the list):
- Hit rate. What fraction of forecasts had the actual inside the p10-to-p90 range? Should be ~80%.
- Bias. Average of (actual − p50) across all completed forecasts. Should be near zero. If it’s consistently positive, you’re under-forecasting (sandbagging). If it’s consistently negative, you’re over-forecasting (wishful thinking).
- Worst miss. Find the forecast where the actual was furthest from your p50. Read the note. Decide whether it’s a one-time event you couldn’t have predicted, or a structural blind spot you should fix.
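None of these needs anything fancier than one-line formulas. The hit rate is the COUNTIF formula from Rule 2. For bias and worst miss, one option, assuming the starter template’s layout (p50 in column E, Actual in column G), is a spare “Miss” column, say J, filled down the sheet:

=IF(G2="", "", G2-E2)

Then =AVERAGE(J2:J) is your bias (AVERAGE skips the blank rows), and =MAX(J2:J) and =MIN(J2:J) point you at the biggest miss in each direction so you can go read that row’s note.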
That’s the whole review. Fifteen minutes. The bias number alone has changed how I plan more than any forecasting tool I’ve ever used. You think you’re unbiased. You aren’t. After three months, the sign of your bias will be obvious and stable, and you’ll start adjusting your gut on the way in instead of getting surprised on the way out.
What the scorecard reveals
After a year of doing this on a small kitchen, three patterns emerged that no amount of staring at the P&L would have surfaced:
- August was always under-forecasted. We thought of August as a slow month because of the State Fair pulling foot traffic, but the data showed catering revenue spiked because corporate clients ran end-of-summer events. Two effects pulling in opposite directions; net was a 12% positive bias every August. We hadn’t noticed because the two effects looked roughly similar in size in our heads.
- New menu items were over-forecasted by ~30%. Every launch we projected 50 orders/week and got 35. Every single time. Once we had the bias number, we just adjusted the launch projection down 25-30% on day one and started planning realistically.
- Weather sensitivity was bigger than we thought. Cold snaps below 10°F dropped catering 35% on average; we’d been mentally adjusting by maybe 15%. The scorecard’s notes column was full of “cold day” entries on under-forecasted weeks. The data was right there, but it took the explicit tally to see it.
None of these are dramatic insights individually. The compounding effect over a year is the dramatic part. Each pattern is a few percent of revenue saved or planned-for. Stack three of them and you’ve materially changed your forecast quality without touching a model.
The trap of the “sophisticated” alternative
Software vendors will sell you a forecasting tool. Some are good; many bury you in a dashboard with no feedback loop. The dashboard tells you the forecast; it doesn’t tell you whether the forecast was right last time. The scorecard tells you whether the forecast was right last time. The scorecard is the part that makes you better. The forecast itself is the part that’s easy to buy.
If you’re going to invest in tools, invest in the bookkeeping side first. A tool that gives you a number every month with no scoring loop is just a more confident version of guessing.
The connection to bigger systems
The scorecard idea isn’t restaurant-specific. It’s how every serious forecasting operation works, from weather services to financial markets to automated prediction systems. They all do the same thing: log the forecast, log the outcome, compute a calibration metric, look at the bias, fix the model. The math gets fancier at scale; the loop is the same.
The point of doing it on a spreadsheet is that the loop is more important than the fanciness. A small business with a calibrated owner who knows her bias outperforms a big-company forecasting team with three engineers and no scoring discipline. We’ve seen it both ways.
The starter template
If you want to skip the design step, start here. Copy this into a sheet and fill it out for a week:
| Made on | Forecast for | Question | p10 | p50 | p90 | Actual | In range? | Notes |
|---|---|---|---|---|---|---|---|---|
| 2026-04-28 | 2026-05-04 wk | Catering revenue | 4200 | 5400 | 6800 | | | Mother's Day weekend |
| 2026-04-28 | 2026-05-04 wk | Walk-in covers | 240 | 310 | 380 | | | New patio opens Sat |
| 2026-04-28 | 2026-05-31 | May revenue | 25000 | 31000 | 37000 | | | Includes Mother's Day, no holidays |
The “In range?” column formula, with a guard so rows still waiting on an actual stay blank instead of showing up as misses:

=IF(G2="", "", IF(AND(G2>=D2, G2<=F2), "Y", "N"))
That’s the entire system. Nine columns, a handful of one-line formulas, fifteen minutes a week. Run it for a year and you’ll know more about your business than most owners ever do.
Related posts
- Point Estimates Lie to Small Business Owners — the why behind ranges; this post is the how-to-keep-score follow-up.
- Demand Forecasting for a 200-Order Kitchen — the model that produces the forecasts the scorecard verifies.
- Inventory Is a Prediction Problem — the operational counterpart: par levels as scored forecasts.
- ZenHodl — the same calibrated-forecast discipline applied to live sports markets at scale.