
How to Run a Structured Interview: A Practical Step-by-Step Guide

The difference between a structured interview and a regular one is not the tone of the conversation. It is who decides what gets asked, when, and how the answer gets scored. In a regular interview, those three decisions happen inside the interviewer’s head, live, under the pull of the moment’s impression. In a structured interview, all three decisions were already made before the candidate sat down.

The consequence is measurable. Schmidt and Hunter’s 1998 synthesis of 85 years of selection research reported substantially higher predictive validity for structured interviews (.51 vs. .38 for unstructured) [1]. McDaniel et al. (1994) found even larger gaps under specific conditions: panel interviews with consensus ratings reached validity coefficients near .63, against .20 in the least structured conditions, with study-wide means of roughly .44 vs. .33 [2]. Wingate et al. (2025) re-validated the same direction with modern data [3]. The direction has been stable for more than six decades: structure predicts real-world performance substantially better than free conversation.

The U.S. Office of Personnel Management, the federal HR authority, explicitly identifies structured interviews as a high-validity and legally defensible selection method, alongside cognitive ability tests and work-sample evaluations [4]. And yet the majority of interviews actually conducted in U.S. small and mid-size businesses remain unstructured, scored from impression after the fact. This guide is the practical fix for that gap.

What follows is the full process from scratch: what to decide before the first interview, what to write down, how to run the conversation, how to score answers, and how to make a final call by reading evidence rather than recalling vibes. Works for any role. No software required.

What counts as a structured interview

Before the step-by-step, it helps to be clear on what qualifies. The literature defines a structured interview as one that has, at minimum, two elements:

  • The same opening questions for every candidate for the same role, each tied to pre-defined criteria.
  • Fixed evaluation criteria and a rubric defined in advance. What counts as a strong, mixed, or weak answer was decided before the conversation, not after.

When both elements are present, the interview is structured. When only one is, it is semi-structured (the most common format in practice, and the one we recommend for most teams). When neither is, it is unstructured, even if it feels organized in the room.

The practical difference is that in an unstructured interview, two candidates answer different questions, get evaluated against different implicit criteria, and the final comparison is made by reconstructing from memory which one left the stronger impression. In a structured interview, two candidates are compared line by line against the same criteria, with evidence captured live.

Step 1: Define the role’s evaluation criteria

A structured interview does not start when the candidate enters the room. It starts when you sit down and list what this specific role actually requires. Without that, any structure layered on top is arbitrary.

The practical rule is to define between 4 and 6 objective criteria. Fewer than 4 and you are measuring too little of the role. More than 6 and the interview turns into a shallow questionnaire that cannot go deep on anything.

For each criterion, write four things:

  1. What the criterion is, in concrete language. Not “communication.” Something like “ability to explain a technical decision to a non-technical stakeholder without losing precision.” Or “track record of taking a B2B negotiation through to close on a deal of $50K+ without giving away margin.”
  2. What weight it carries. Disqualifying, important, or nice-to-have. Honest weights stop the first impression from dominating.
  3. What strong evidence looks like. For example: “strong evidence = candidate cites a specific deal with industry/segment, deal size, what the objection was, how they responded, and the measurable outcome.”
  4. What weak evidence looks like. Generic answer, no concrete example, no numbers, no context.

This set is the Role Blueprint. It does not need to be elaborate. It needs to be written down before the first candidate walks through the door. Without it, every step that follows is compromised.
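If it helps to picture the Blueprint as a concrete artifact, here is a minimal sketch in Python. The field names and the example criterion are illustrative, not a prescribed format; a sheet of paper holds exactly the same structure.

```python
from dataclasses import dataclass
from enum import Enum

class Weight(Enum):
    DISQUALIFYING = "disqualifying"
    IMPORTANT = "important"
    NICE_TO_HAVE = "nice-to-have"

@dataclass
class Criterion:
    name: str             # concrete language, not a buzzword
    weight: Weight        # honest weight, fixed before any screening
    strong_evidence: str  # what a strong answer contains
    weak_evidence: str    # what a weak answer looks like

# A Role Blueprint is simply 4-6 criteria written down in advance.
blueprint = [
    Criterion(
        name="B2B negotiation: takes $50K+ deals to close without giving away margin",
        weight=Weight.DISQUALIFYING,
        strong_evidence=("cites a specific deal with industry/segment, deal size, "
                         "the objection raised, the response given, and a measurable outcome"),
        weak_evidence="generic answer, no concrete example, no numbers, no context",
    ),
    # ...3 to 5 more criteria for the same role
]
```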

The reference review by Campion, Palmer, and Campion (1997) identifies pre-defined criteria as one of the most consistent contributors to predictive validity among the 15 components of structure they analyzed [5]. The validity gain does not come from structure alone; it comes from structure backed by clear criteria.

For more on building the Blueprint into a usable scoring tool, see the interview scorecard template.

Step 2: Write the opening questions

For each criterion, write at least one opening question. This is the question every candidate will receive, in the same form, at the start of that section of the interview.

The literature distinguishes two main formats of structured questions, both with consistently high inter-rater reliability per the Huffcutt, Culbertson and Weyhrauch (2013) meta-analysis [6]:

Behavioral questions (“behavior description”): ask about a real past situation. Example: “Tell me about the last time you had to renegotiate a deadline with an unhappy client. Who were they, what was the situation, what did you do, and what was the outcome?” The premise, validated empirically, is that past behavior is the best predictor of future behavior in analogous situations.

Situational questions: present a hypothetical scenario related to the role and ask what the candidate would do. Example: “Imagine you inherit a team where two senior members have been in open conflict for weeks and output is down 20%. What is your first move, and why?” The candidate has to demonstrate applicable reasoning, not memory of past events.

For most roles, we recommend a hybrid format: the opening question is behavioral (it pulls real evidence) and, when needed, a situational follow-up is used to probe areas where the candidate has no direct experience but would still need to operate.

The critical point: the opening question is the same for everyone. Depth adapts. If the answer was specific and substantive, you go deeper (ask for more detail, contrast with another situation, test the claim). If it was vague, you ask for a concrete example before moving on. But the starting point does not change.
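A question bank for the hybrid format can be as simple as one fixed behavioral opener per criterion, with situational follow-ups held in reserve. A minimal sketch, with illustrative names and questions:

```python
# Each criterion gets one fixed behavioral opener; situational follow-ups
# are reserved for areas the candidate has not directly experienced.
question_bank = {
    "B2B negotiation on high-ticket deals": {
        "opening": ("Tell me about the last $50K+ deal you closed. "
                    "What was the main objection, and how did you respond?"),
        "situational_followups": [
            "Imagine the buyer demands a 20% discount on the final call. What do you do?",
        ],
    },
    # ...one entry per criterion in the Blueprint
}

# The opener never changes between candidates; only the depth adapts.
for criterion, questions in question_bank.items():
    print(f"{criterion}: {questions['opening']}")
```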

Step 3: Define the scoring rubric

The rubric is the part that gets skipped most often, and the part that has the biggest impact on the quality of the final decision. Without a rubric, “good answer” and “bad answer” become subjective end-of-week assessments.

A simple, functional rubric uses three levels per criterion:

  • Strong: concrete example with context, measurable numbers, a clear candidate role, and a verifiable outcome
  • Mixed: an example is present but with gaps (no numbers, no clear outcome, or the candidate’s role diluted in a group effort)
  • Weak: a generic answer, no example, or an example where the candidate’s role is unclear

For each criterion, describe in one sentence what specifically counts as strong for that criterion. For example, for “B2B negotiation on high-ticket deals”:

  • Strong: cites at least one closed negotiation in the last 24 months on a deal above $50K, with a specific objection, the response given, and the outcome in terms of margin
  • Mixed: cites B2B negotiations in general, but without high-ticket detail or without clarity on the objection and margin
  • Weak: speaks of “sales experience” without bringing a concrete case

This description has to be written before the first interview. Then it gets used to score every candidate against the same standard. Without a rubric defined in advance, the interviewer scores in the moment based on feeling, and that feeling is heavily driven by verbal fluency and likability, not substance.
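A minimal sketch of the rubric as data, using the negotiation criterion above; the level descriptions are the ones written in advance, and the only thing recorded at interview time is which level the evidence matched:

```python
# Rubric for one criterion, written before the first interview.
rubric = {
    "B2B negotiation on high-ticket deals": {
        "strong": ("at least one closed negotiation in the last 24 months on a deal "
                   "above $50K, with a specific objection, response, and margin outcome"),
        "mixed": ("B2B negotiations in general, without high-ticket detail or clarity "
                  "on the objection and margin"),
        "weak": "speaks of 'sales experience' without a concrete case",
    },
}

VALID_LEVELS = {"strong", "mixed", "weak"}

def score(criterion: str, level: str) -> str:
    """Record a score only against a pre-defined level; anything else is opinion."""
    if level not in VALID_LEVELS:
        raise ValueError(f"{level!r} is not a rubric level; re-read the rubric")
    return f"{criterion}: {level} (rubric: {rubric[criterion][level]})"
```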

Step 4: Run the interview

With criteria, opening questions, and rubric in hand, the interview itself has a predictable shape.

Standardized opening. Briefly introduce the company and the role. Explain how the interview will work (the same opening questions for everyone, focused follow-ups, candidate questions at the end). This takes 3 to 5 minutes. Do it the same way for every candidate.

Block of opening questions. For each criterion, ask the opening question in the same form. Listen to the answer without interrupting. Take notes on the literal answer, not your impression of it.

Adaptive depth. After the initial answer, decide whether to go deeper or move on. Go deeper when: (a) the answer was specific and substantive and you want to test it; (b) the answer was vague and you need a concrete example before scoring; (c) the candidate showed an unexpected red flag or strength worth probing.

Typical follow-ups: “Can you say more about how you measured the outcome?”, “What would you do differently today?”, “Did anyone on the team disagree with that approach? How did you handle it?”. The goal is to gather evidence that lets you score against the rubric, not to lead the candidate to an expected answer.

Candidate questions block. The last 10 to 15 minutes belong to the candidate. Best practice: note the type of question they ask. A culture question signals different interest from a comp question, which signals different interest from a scope question. None are right or wrong; they are additional information about how the candidate is thinking about the role.

Step 5: Capture evidence live, not impression after

This is the step that protects most against the classic cognitive biases of the interview.

Cognitive psychology research has shown that decisions made from reconstructed memory are heavily affected by the halo effect, documented by Nisbett and Wilson (1977): a single positive attribute (verbal fluency, confident posture) unconsciously contaminates the evaluation of every other dimension of the person [7]. In an interview context, the “thin slices” literature (Ambady and Rosenthal, 1992) further shows that evaluators form persistent impressions from very brief exposures, before any substantive evidence appears [8].

The practical defense is simple: during the interview, write down what the candidate actually said, not your assessment of it. Not “answered well,” but: “said: led the migration of a 12-person team from monolith to microservices over 9 months, KPI was deploy frequency, went from weekly to daily.” Real quotes, tied to specific criteria.

If the interview is on video, consider a transparent transcription tool (with disclosure to the candidate; never record without notice). If it is in person, keep a notebook open.

The rule of thumb: nothing your memory will reconstruct three days later is as precise as what you wrote down in the moment. And no decision made from reconstructed memory is as defensible as one made by reading captured evidence.
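One way to keep notes tied to criteria rather than impressions is a flat log of near-verbatim quotes, each tagged with the criterion it speaks to. A minimal sketch, with illustrative field names; a notebook column per criterion does the same job:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class EvidenceNote:
    criterion: str   # which Blueprint criterion this quote speaks to
    quote: str       # what the candidate actually said, near-verbatim
    taken_at: datetime = field(default_factory=datetime.now)

notes = [
    EvidenceNote(
        criterion="technical leadership",
        quote=("led the migration of a 12-person team from monolith to microservices "
               "over 9 months; KPI was deploy frequency, went from weekly to daily"),
    ),
]
# Note what was said, not "answered well": a quote is scoreable later; an impression is not.
```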

Step 6: Decide by reading evidence, not by remembering

Before any “I liked them,” produce a short document, even if only for yourself. One paragraph per criterion, with cited evidence. One line of scoring: strong / mixed / weak for each. At the end, one sentence: “recommend / do not recommend, and why.”

That document does three things:

  1. Forces you to look at evidence before deciding, instead of impression.
  2. Lets you compare candidates honestly, side by side.
  3. Defends the decision later, in compliance reviews or simply if the hire does not work out and your co-founder or board wants to understand what happened.

When more than one interviewer is involved, the procedure gains validity. Each writes their memo before discussing with the others (this order matters: reading someone else’s memo before writing yours contaminates yours). Discussion then happens by comparing cited evidence, not impressions.
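A minimal sketch of the memo as a structured record, with illustrative field names; the point it encodes is that every score travels with its quoted evidence:

```python
from dataclasses import dataclass

@dataclass
class CriterionFinding:
    criterion: str
    level: str       # strong / mixed / weak, per the pre-defined rubric
    evidence: str    # quoted from the live notes, not reconstructed

@dataclass
class InterviewMemo:
    candidate: str
    findings: list[CriterionFinding]
    recommend: bool
    reason: str      # one sentence

    def render(self) -> str:
        """Flatten the memo to the one-page document read at decision time."""
        lines = [f"Candidate: {self.candidate}"]
        for f in self.findings:
            lines.append(f"- {f.criterion}: {f.level} | evidence: \"{f.evidence}\"")
        verdict = "recommend" if self.recommend else "do not recommend"
        lines.append(f"Decision: {verdict} ({self.reason})")
        return "\n".join(lines)
```

Whatever the format, each interviewer produces their memo in full before seeing anyone else’s.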

Common failure modes that invalidate the structure

Structured interviews fail in predictable ways. The most common:

Criteria defined after seeing resumes. If criteria are designed around the candidates already in the funnel, they become justification for a decision that was already being made. Criteria have to be written before any screening.

Opening questions that turn into a checklist. If every question lasts 90 seconds and the interviewer moves on, the interview becomes a quiz. The opening question is the starting point; the depth is where the real evidence appears.

Generic rubric. “Strong = good answer” is not a rubric. It is opinion in disguise. The rubric has to describe, for that specific criterion, what counts as strong evidence on that dimension.

End-of-day scoring. Scoring 5 candidates at the end of the afternoon, from memory, kills the gain of structure. Scoring happens at the end of each interview, with the notebook open.

Discussion between interviewers before memos. When interviewers discuss before each writes their memo, the second memo is contaminated by the first, and the gain of having more than one head disappears. Memos first, discussion second.

When the spreadsheet stops holding the discipline

The method above works at any company size. But it requires six steps executed for every role, by every interviewer, on every candidate. In a spreadsheet, each step is cheap. In real execution, each step is exactly where the discipline slips.

The most common failure modes are predictable:

  • The written criteria become free text and start drifting between interviewers within a week
  • The rubric is written once and then never consulted during the interview, when it should be the live comparison point
  • Live evidence capture is abandoned after the first 10 minutes, when the conversation gets intense and the interviewer goes back to “noting impressions”
  • The per-criterion memo turns into “I’ll write it later” and in most cases never gets written

Each of those failures cancels the gain from structure, and the cost of each failure is exactly what this discipline exists to prevent (SHRM puts the total cost of a single bad hire at 0.5x to 2x annual salary).

Recrutador is a Hiring Intelligence Platform that automates each step of the method. A chat-first Strategist defines the Role Blueprint with you before any candidate enters the funnel. The Job Description is generated from the Blueprint. Resumes are ranked by it. During the live interview, the desktop HUD listens, transcribes in real time, and surfaces the next right probe. At the end, the Post-Interview Memo is generated automatically with quoted evidence and a structured decision. Every step you would have to discipline by hand becomes automatic.

Works for any role, any company size.

To understand more, read What is Recrutador. To get started, try it now or talk to the team.

References


  1. Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.

  2. McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79(4), 599-616.

  3. Wingate, T. G., et al. (2025). Evaluating interview criterion-related validity. International Journal of Selection and Assessment.

  4. U.S. Office of Personnel Management. Structured interviews. The federal HR authority’s resource identifying structured interviews as a high-validity, legally defensible selection method.

  5. Campion, M. A., Palmer, D. K., & Campion, J. E. (1997). A review of structure in the selection interview. Personnel Psychology, 50(3), 655-702. The reference review on the 15 components of structure in interviews.

  6. Huffcutt, A. I., Culbertson, S. S., & Weyhrauch, W. S. (2013). Employment interview reliability: New meta-analytic estimates by structure and format. International Journal of Selection and Assessment, 21(3), 264-276.

  7. Nisbett, R. E., & Wilson, T. D. (1977). The halo effect: Evidence for unconscious alteration of judgments. Journal of Personality and Social Psychology, 35(4), 250-256.

  8. Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256-274. Classic meta-analysis on how evaluators form persistent impressions from very brief exposures.