The single most-cited reason interview scorecards fail is not that the template is wrong. It is that the team uses the template as a souvenir of the interview rather than as the instrument that drives the decision. The scorecard gets filled in from memory, five minutes after the conversation ends, with the candidate already mentally filed as “yes” or “no”. The scores are reverse-engineered to match the gut-feel verdict. The document gets filed. Nobody is any the wiser.
This guide gives you a free interview scorecard template you can copy into Google Sheets, Notion, or a sheet of paper, and explains the discipline that makes it actually work. Both halves matter. The template without the discipline is theater.
Why a scorecard at all
Decades of organizational psychology research show structured interviews predict on-the-job performance substantially better than unstructured ones. The McDaniel et al. (1994) meta-analysis [1] reported criterion-related validity nearly three times higher for structured interviews (.63 vs .20). Schmidt and Hunter’s landmark 1998 synthesis of 85 years of selection research [2] reported a smaller but still substantial advantage (.51 vs .38), and Wingate et al. (2025) re-validated the same direction with modern data [3]. The cost of getting this wrong is not academic. SHRM puts the total cost of replacing an employee at 0.5x to 2x the position’s annual salary [4], with the widely cited “30% of annual salary” figure (commonly attributed to the U.S. Department of Labor) sitting just below the floor of that range [5].
Translated: for an $80,000-a-year role, a single bad hire costs the company somewhere between $40,000 and $160,000 (the SHRM 0.5x-2x range), or even more for senior and revenue-touching positions. The scorecard is the single cheapest defense against that cost.
The template
Copy this directly into a spreadsheet, a Notion page, or a doc. One scorecard per candidate per role.
Header block
| Field | Value |
|---|---|
| Candidate | (name) |
| Role | (role title) |
| Interviewer | (name) |
| Date | (date) |
| Interview round | (1, 2, panel, final) |
Criteria block
This is the heart of the scorecard. Define the criteria before you see the first candidate, not while you are filling in the form. Add or remove rows to fit the role; 4 to 6 criteria is the sweet spot.
| # | Criterion | Weight (1–5) | Evidence (quote what the candidate said) | Evidence quality (Surface / Specific / Tested) | Score (1–5) |
|---|---|---|---|---|---|
| 1 | (e.g. “Has run a B2B negotiation through to close on a deal of $50K+ in the last 24 months”) | | | | |
| 2 | (e.g. “Can explain a technical decision to a non-technical stakeholder without losing precision”) | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
Decision block
| Field | Value |
|---|---|
| Weighted score | (sum of Score × Weight) |
| Strengths | (one or two sentences, citing evidence) |
| Concerns | (one or two sentences, citing evidence) |
| Open questions for the next round | (gaps where evidence was thin) |
| Recommendation | Strong yes / Yes / Mixed / No / Strong no |
| Rationale (one sentence) | (frame as evidence, not verdict) |
That is the entire template. Sixty seconds to set up. Useless without what comes next.
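The weighted score in the Decision block is a straight sum of Score × Weight over the criteria rows. In a spreadsheet that is a single SUMPRODUCT over the two columns (for example `=SUMPRODUCT(C2:C7, F2:F7)` if Weight lands in column C and Score in column F). Here is the same arithmetic as a minimal Python sketch; the criteria and numbers are hypothetical, for illustration only:

```python
# Minimal sketch of the weighted-score calculation from the Decision block.
# Criterion names, weights, and scores are hypothetical examples.
criteria = [
    # (criterion, weight 1-5, score 1-5)
    ("Closed a $50K+ B2B deal in the last 24 months", 5, 4),
    ("Explains technical decisions to non-technical stakeholders", 4, 3),
    ("Pipeline discipline", 3, 5),
]

weighted_score = sum(weight * score for _, weight, score in criteria)
max_possible = sum(weight * 5 for _, weight, _ in criteria)

print(f"Weighted score: {weighted_score} / {max_possible}")  # 47 / 60
```

Reporting the score against the maximum possible (rather than as a bare number) keeps scorecards comparable when different roles use different weights.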
How to actually use it: the discipline
1. Define criteria first, never during the interview
Open the scorecard before the first candidate is on the call. Decide, with the hiring manager, what the 4 to 6 criteria are, what weight each carries, and what evidence-strong looks like for each one. Write a one-line rubric for each: evidence-strong = the candidate cites a specific deal with named context (industry, size, objection), the action they took, and a measurable outcome. Evidence-weak = generic claim, no example, no numbers.
If you skip this step, the scorecard becomes a post-hoc rationalization. The whole point is to lock the standard before any candidate-specific bias enters the picture.
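One way to make the lock-in concrete is to write the criteria down as a frozen structure before the first interview. A minimal sketch, assuming a sales role; the criterion names, weights, and rubric lines are invented examples, not a prescribed set:

```python
# Sketch of a locked criteria definition, written before any candidate
# is seen. All names, weights, and rubric text below are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the standard is locked before interviews
class Criterion:
    name: str
    weight: int           # 1-5, agreed with the hiring manager up front
    evidence_strong: str  # what a 4-5 answer must contain
    evidence_weak: str    # what a 1-2 answer looks like

ROLE_CRITERIA = [
    Criterion(
        name="Has closed a $50K+ B2B deal in the last 24 months",
        weight=5,
        evidence_strong="Named deal context (industry, size, objection), "
                        "the action they took, a measurable outcome",
        evidence_weak="Generic claim, no example, no numbers",
    ),
    Criterion(
        name="Explains technical decisions to non-technical stakeholders",
        weight=3,
        evidence_strong="Re-explains a real decision in plain language "
                        "without dropping the key trade-off",
        evidence_weak="Jargon, or precision lost in the simplification",
    ),
]
```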
2. Write evidence during the interview, not from memory after
The most common failure mode is filling in the scorecard from memory five minutes after the call. By that point, what survives is impression, not evidence. The candidate who spoke fluently feels stronger; the candidate who was nervous feels weaker. This is the halo effect documented in decades of cognitive psychology [6]: a single positive attribute (verbal fluency, confident posture) contaminates evaluation of every other dimension. The validity numbers earlier in this article are not theoretical: Schmidt and Hunter found unstructured interviews explain less than 15% of the variance in on-the-job performance, meaning a memory-based score is closer to a coin flip than to a measurement.
Instead: quote the candidate’s actual words in the Evidence column, while the interview is happening. Even short fragments are fine. “Said: led migration of 12-person team from monolith to microservices over 9 months, KPI was deploy frequency, went from weekly to daily.” That is evidence. “Strong technical leadership” is impression.
3. Use the evidence-quality classification
Every quote you capture goes in one of three buckets:
- Surface — the candidate made a generic claim with no specifics. (“I’m a strong leader.”)
- Specific — the candidate gave a concrete example with names, numbers, or detail. (“I led the migration from X to Y over Z months and we hit metric M.”)
- Tested — you challenged the claim and the candidate held up under pressure. (“When I asked what would have been different if the budget had been half, they walked through the trade-off they would have made and named the metric they’d have sacrificed.”)
A candidate with three Tested-quality quotes and two Surface-quality is in a different league from one with five Surface-quality quotes, even if their nominal scores look similar.
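To see why the buckets outrank nominal scores, here is a small sketch of the classification; the two candidate profiles are invented to mirror the example above:

```python
# Sketch of the three evidence-quality buckets from this section.
# The candidate quote profiles below are invented for illustration.
from collections import Counter
from enum import Enum

class EvidenceQuality(Enum):
    SURFACE = "generic claim, no specifics"
    SPECIFIC = "concrete example with names, numbers, or detail"
    TESTED = "claim challenged and held up under pressure"

# Same number of captured quotes, very different leagues.
candidate_a = [EvidenceQuality.TESTED] * 3 + [EvidenceQuality.SURFACE] * 2
candidate_b = [EvidenceQuality.SURFACE] * 5

for name, quotes in [("A", candidate_a), ("B", candidate_b)]:
    counts = Counter(q.name for q in quotes)
    print(f"Candidate {name}: {dict(counts)}")
# Candidate A: {'TESTED': 3, 'SURFACE': 2}
# Candidate B: {'SURFACE': 5}
```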
4. Score the criterion, not the candidate
When you assign a 1 to 5, you are scoring that specific criterion based on the evidence you captured, not your overall feeling about the person. This is harder than it sounds. The simplest forcing function: write the score in the Score column before you read your own Evidence column, then read the evidence and ask yourself “would a stranger reading just this evidence give the same score?”. If not, adjust the score, not the evidence.
5. Compare scorecards side by side, never one at a time
The decision is not “is this candidate good?”. The decision is “which of these candidates is best for this role?”. Open all the scorecards in one view. Compare row by row. The candidate who scored a 4 on the most-weighted criterion beats the candidate who scored a 5 on the least-weighted one, even if the second one was more charming in the room.
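A sketch of what “one view” can look like if the scorecards live in a sheet export or in code. The candidates, weights, and scores are invented, arranged so the 4 on the most-weighted criterion beats the 5 on the least-weighted one, as in the example above:

```python
# Side-by-side comparison sketch: one row per criterion, one column per
# candidate, weighted totals at the bottom. All numbers are invented.
weights = {"Deal closing": 5, "Stakeholder comms": 3, "Pipeline discipline": 2}

scores = {
    "Candidate A": {"Deal closing": 4, "Stakeholder comms": 3, "Pipeline discipline": 3},
    "Candidate B": {"Deal closing": 2, "Stakeholder comms": 3, "Pipeline discipline": 5},
}

print(f"{'Criterion':<26}" + "".join(f"{name:>14}" for name in scores))
for criterion, w in weights.items():
    label = f"{criterion} (w={w})"
    print(f"{label:<26}" + "".join(f"{s[criterion]:>14}" for s in scores.values()))

totals = {name: sum(weights[c] * v for c, v in s.items())
          for name, s in scores.items()}
print(f"{'Weighted total':<26}" + "".join(f"{totals[name]:>14}" for name in scores))
# A's 4 on the most-weighted criterion (20 points) outweighs B's 5 on the
# least-weighted one (10 points): totals are A = 35, B = 29.
```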
6. Use the Open Questions column to drive the next round
The scorecard is not just a verdict-making instrument. It is also the briefing document for the next interviewer. The Open Questions column tells the next round exactly which gaps to probe. If criterion #3 was thin in the first interview, the second interviewer goes deep there. This is how a multi-round process compounds evidence instead of duplicating it.
7. Frame the recommendation as evidence, not as verdict
The Rationale field is one sentence. It should sound like “Strong on technical depth (criteria #1, #4) with specific evidence; concerns on stakeholder communication (#2) where evidence was Surface across two probes.” It should not sound like “Smart and personable, would be a good fit.”
The first form lets a different reader (your co-founder, a future manager, a compliance audit) see exactly why the decision was made and decide whether they agree. The second form is gut feel dressed up as a decision.
The seven mistakes that make scorecards useless
Watch teams use scorecards for years and the same failure modes repeat. If your scorecard is not improving your hires, it is almost certainly one of these seven:
- Filling it in from memory. See above. The single biggest source of failure.
- Inventing criteria during the interview. Criteria need to be locked before the candidate is on the call. Otherwise you are scoring against a moving target.
- Using the same generic criteria for every role. A scorecard that does not change between a sales rep and a CTO is not actually evaluating anything specific.
- Letting one charismatic interviewer dominate the panel decision. The whole point of multiple scorecards is that they get compared. If one person’s vote always wins, you do not have a scorecard, you have a rubber stamp.
- Treating the score as the answer. The score summarizes the evidence. The evidence is what you decide on. If two candidates have the same weighted score but very different evidence quality, the evidence wins.
- Burying gaps. If criterion #4 was thin in the interview and you score it a generous 3 because you “got a good feeling”, you are making the decision impossible to defend later.
- Not keeping them. Scorecards filed away after the decision are wasted. Reviewed six months in (when the new hire is either crushing it or struggling), they are the highest-leverage learning artifact a hiring team has. Patterns emerge: which criteria predicted success, which criteria you misweighted, which kinds of evidence were misleading.
When the scorecard needs to live in the interview, not after it
The template above works. The hard part is not building it. The hard part is keeping it open and actively scored during the interview itself, when the conversation is moving fast and the candidate is in front of you.
The most common failure modes are predictable:
- The scorecard is open but the interviewer scores from memory at the end of the day, so vibes win over evidence.
- Different interviewers use different mental versions of “strong” / “mixed” / “weak”, and the comparison across candidates becomes noise.
- Evidence quotes get paraphrased instead of captured verbatim, which is the same as not capturing them; the defensibility disappears.
- The scorecard exists for hire #1 and gets reinvented from scratch for hire #2.
Each of those failures cancels the gain, and the cost of each failure is exactly what this discipline exists to prevent (SHRM puts the total cost of a single bad hire at 0.5x to 2x annual salary).
Recrutador is a Hiring Intelligence Platform that runs the scorecard discipline end to end as software. A chat-first Strategist defines the Role Blueprint (criteria + weights + rubric + probe library) and persists it across interviews. Resumes are ranked by the Blueprint. During the live interview, a desktop HUD listens, transcribes in real time, and surfaces the next probe one action at a time. At the end, the Post-Interview Memo is generated automatically with quoted evidence. Same engine for any role, any seniority.
If you want to see the methodology end to end, read What is Recrutador. If you want to try it, get started or talk to the team.
References

1. McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The Validity of Employment Interviews: A Comprehensive Review and Meta-Analysis. Journal of Applied Psychology, 79(4), 599-616.
2. Schmidt, F. L., & Hunter, J. E. (1998). The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings. Psychological Bulletin, 124(2), 262-274.
3. Wingate, T. G., et al. (2025). Evaluating interview criterion-related validity. International Journal of Selection and Assessment.
4. Society for Human Resource Management and aggregated industry estimates put the total cost of a bad hire between $17,000 and $150,000+, with management roles often costing 1 to 2 times annual salary. Updated synthesis: The Real Cost of a Bad Hire (2026).
5. The 30%-of-annual-salary figure is widely attributed to the U.S. Department of Labor and replicated across consultancy and market reviews. Accessible overviews: The Hidden Costs of Bad Hiring and The Cost of a Bad Hire.
6. Nisbett, R. E., & Wilson, T. D. (1977). The Halo Effect: Evidence for Unconscious Alteration of Judgments. Journal of Personality and Social Psychology, 35(4), 250-256. Applied syntheses: Halo Effect in Job Interviews and 7 Cognitive Biases That Distort Hiring.