Data Scientist Interview Questions & Answers (2026)

Data science interviews test breadth: statistics, machine learning, coding, SQL, and the judgment to apply them to a messy business problem. Here are the questions that separate strong candidates from the rest.

Statistics & probability

Explain p-value to a non-technical stakeholder. (If you can't, you'll struggle in the role.)
What is the difference between Type I and Type II error? Which is worse — and why does it depend?
How would you detect and handle multicollinearity?
What's the Central Limit Theorem, and why does it matter for A/B testing?

Machine learning

How do you handle an imbalanced dataset? Walk through three approaches.
Bias-variance trade-off — explain it with a real model you've trained.
When would you choose a random forest over gradient boosting?
Your model has 99% accuracy but is useless. What happened? (Hint: base rates.)
How do you know your model is overfitting, and what do you do about it?

SQL & coding

Write a query to find the second-highest salary per department.
Find users who logged in three days in a row.
Given event data, compute 7-day rolling retention.
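
The second-highest-salary prompt can be sketched with a window function. The example below runs the query through Python's built-in sqlite3 module so it is easy to try; the employees table, its columns, and the sample rows are all made up for illustration, and it assumes a SQLite build recent enough to support window functions (3.25 or later). DENSE_RANK is one reasonable choice here because it treats tied top salaries as a single rank; ask the interviewer how ties should be handled before committing to it.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ana', 'Eng',   150),
  ('Ben', 'Eng',   150),
  ('Cy',  'Eng',   120),
  ('Di',  'Eng',   100),
  ('Eve', 'Sales',  90),
  ('Fay', 'Sales',  80);
""")

# Rank salaries within each department, highest first.
# DENSE_RANK gives tied salaries the same rank without leaving gaps,
# so rank 2 is the second-highest distinct salary.
QUERY = """
SELECT department, name, salary
FROM (
  SELECT department, name, salary,
         DENSE_RANK() OVER (
           PARTITION BY department ORDER BY salary DESC
         ) AS salary_rank
  FROM employees
)
WHERE salary_rank = 2
ORDER BY department, name;
"""

second_highest = conn.execute(QUERY).fetchall()
print(second_highest)
```

With this sample data, Ana and Ben tie for the top Eng salary, so Cy is returned for Eng and Fay for Sales. Swapping in ROW_NUMBER would instead return one of the tied top earners, which is usually not what the question intends.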

Case study & behavioral

A product metric dropped 20% overnight. How do you investigate?
How would you design an experiment to test a new recommendation algorithm?
Tell me about a model you shipped that didn't work in production.

How to approach: for case studies, state the hypothesis, the data you'd pull, the metric you'd move, and how you'd validate. Structure beats cleverness.

What each interview round is really testing

A typical data science loop has four flavors: a stats/probability screen (do you understand uncertainty?), a coding/SQL round (can you get the data?), an ML depth round (can you reason about models in production, not just notebooks?), and a case/product round (can you tie analysis to a business decision?). Candidates who fail usually nail the math but can't connect it to a metric a stakeholder cares about.

How to structure a case answer

Restate the goal and the metric you'd move.
State your hypothesis and the data you'd pull to test it.
Pick the method — and justify why it fits the data and the decision timeline.
Name the risks: confounders, sample size, seasonality, leakage.
Close with the action you'd recommend and how you'd validate it.

Mistakes that sink data science candidates

Reaching for a complex model when a simple baseline answers the question.
Forgetting to mention data leakage when discussing model evaluation.
Quoting accuracy on an imbalanced dataset without flagging base rates.
Giving a textbook definition of p-value instead of explaining it plainly.

Bring one project you can discuss end to end — the messy data, the trade-offs, what shipped, and what you'd do differently. Depth on a real project beats breadth across buzzwords.

Worked answers to the questions that decide the loop

"Explain p-value to a non-technical stakeholder." A strong answer avoids jargon: "It's the chance we'd see a result at least this extreme if the change actually did nothing. A small p-value means a result like this would be rare if nothing had changed, so we have more confidence the change had a real effect. It does not tell us the probability that the change worked, how big the effect is, or whether the result matters for the business." That last sentence, which names the limitations, is what separates a data scientist from someone who memorized a definition.
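
If it helps to make the plain-language definition concrete, a permutation test mirrors it almost word for word: shuffle the group labels many times (simulating "the change did nothing") and count how often chance alone produces a gap at least as large as the one observed. The sketch below uses made-up conversion counts and a hypothetical function name; it is an illustration of the idea, not a recommendation for how to analyze a real experiment.

```python
import random

def permutation_p_value(control, treatment, n_shuffles=10_000, seed=0):
    """Share of label shuffles whose mean gap is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(sum(treatment) / len(treatment) - sum(control) / len(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the pooled outcomes simulates a world where group labels are meaningless.
        rng.shuffle(pooled)
        fake_control = pooled[:n_control]
        fake_treatment = pooled[n_control:]
        gap = abs(
            sum(fake_treatment) / len(fake_treatment)
            - sum(fake_control) / len(fake_control)
        )
        # Tiny tolerance so exact ties are not lost to floating-point rounding.
        if gap >= observed - 1e-12:
            extreme += 1
    return extreme / n_shuffles

# Made-up data: 1 = converted, 0 = did not convert.
control = [1] * 100 + [0] * 900      # 10% conversion
treatment = [1] * 130 + [0] * 870    # 13% conversion

p = permutation_p_value(control, treatment, n_shuffles=2000)
print(f"estimated p-value: {p:.3f}")
```

The returned number is exactly the stakeholder sentence: the fraction of "nothing happened" worlds that look at least as extreme as the real data. It still says nothing about effect size or business value.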

"How do you handle an imbalanced dataset?" Walk through three levers and their trade-offs. First, change the evaluation metric: accuracy is misleading when 99% of examples are negative, because a model that always predicts the majority class scores 99%. Use precision, recall, F1, or AUC (ROC or precision-recall) depending on the cost of each error type. Second, resample the training data: oversample the minority class (for example with SMOTE) or undersample the majority, noting that each distorts the class distribution and should be applied to the training split only. Third, adjust class weights in the model. Close by tying it back to the business: "If a false negative is a missed fraud case, I'd optimize recall and accept more false positives."
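
The first lever is easy to demonstrate with arithmetic. The sketch below compares a do-nothing baseline with a hypothetical fraud detector on made-up labels at a 1% base rate; the function names and all the counts are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    """Accuracy, precision, recall, and F1, with 0.0 where a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up data: 1,000 transactions, 10 of them fraud (1% base rate).
y_true = [1] * 10 + [0] * 990

# Baseline that never flags anything.
always_negative = [0] * 1000

# Hypothetical detector: catches 8 of 10 frauds and raises 30 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

baseline = summarize(y_true, always_negative)
model = summarize(y_true, detector)
print("baseline:", baseline)
print("detector:", model)
```

The baseline wins on accuracy (0.99 versus 0.968) while catching zero fraud, and the detector reaches 0.8 recall at roughly 0.21 precision. Whether that precision is acceptable is the business question, which is the point of the answer.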

"A product metric dropped 20% overnight. How do you investigate?" Resist guessing. Say: "First I'd confirm it's real and not a logging or pipeline bug, since those are a common cause of overnight cliffs. Then I'd segment by platform, region, app version, and new versus returning users to localize it. A drop concentrated in one app version points to a release; a broad drop points to an external factor. I'd form a hypothesis and pull the specific cut of data that confirms or kills it."
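
The segmentation step can be sketched in a few lines. The example below uses made-up before-and-after totals that add up to a 20% overall drop; the row layout, the field names, and the change_by_segment helper are assumptions for illustration, not a prescribed schema.

```python
from collections import defaultdict

def change_by_segment(rows, dimension):
    """Fractional change from 'before' to 'after' for each value of one dimension."""
    totals = defaultdict(lambda: {"before": 0, "after": 0})
    for row in rows:
        totals[row[dimension]][row["period"]] += row["value"]
    return {
        segment: (t["after"] - t["before"]) / t["before"]
        for segment, t in totals.items()
        if t["before"] > 0  # skip segments with no baseline to compare against
    }

# Made-up daily totals: 1,000 before, 800 after (a 20% overall drop).
rows = [
    {"platform": "ios",     "app_version": "v5.1", "period": "before", "value": 300},
    {"platform": "ios",     "app_version": "v5.1", "period": "after",  "value": 300},
    {"platform": "ios",     "app_version": "v5.2", "period": "before", "value": 200},
    {"platform": "ios",     "app_version": "v5.2", "period": "after",  "value": 100},
    {"platform": "android", "app_version": "v5.1", "period": "before", "value": 300},
    {"platform": "android", "app_version": "v5.1", "period": "after",  "value": 300},
    {"platform": "android", "app_version": "v5.2", "period": "before", "value": 200},
    {"platform": "android", "app_version": "v5.2", "period": "after",  "value": 100},
]

by_platform = change_by_segment(rows, "platform")
by_version = change_by_segment(rows, "app_version")
print("by platform:", by_platform)
print("by version: ", by_version)
```

In this toy data the platform cut shows the same 20% drop everywhere, which localizes nothing, while the version cut shows v5.1 flat and v5.2 down 50%. That is the pattern that would point toward a release rather than an external factor.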

Round-by-round: the data science loop

A typical loop has a recruiter screen, a technical screen (SQL plus stats or a coding exercise), and an onsite. The onsite usually covers a SQL/coding round, a machine learning depth round, a stats/experimentation round, a product or case round, and a behavioral round. Some companies add a take-home, which rewards clean, well-documented analysis over fancy models.

The split tells you what to emphasize. If the role is analytics-leaning, weight SQL and product cases. If it's ML-engineering-leaning, weight modeling, productionization, and coding. Read the job description for the signal — "experimentation" means A/B testing depth; "production models" means MLOps and monitoring.

What a strong answer looks like

Strong candidates connect every method to a decision. They state assumptions, name the metric they'd move, and acknowledge what could go wrong — leakage, confounders, seasonality, sample size. They prefer a simple baseline they can explain over a complex model they can't. And they always close a case with "so here's what I'd recommend, and here's how I'd validate it."

Weak answers stay in the textbook. They define terms correctly but can't apply them to a messy, ambiguous business problem — which is the actual job.

How expectations differ by company

Big tech companies weight experimentation and causal inference heavily because they run A/B tests at very large scale. Startups care more about breadth and pragmatism: can you own data end to end, from pipeline to dashboard to model? Research-heavy orgs probe ML depth and may ask you to derive things from first principles. Tailor your prep to the company's data culture, which you can usually infer from the job description and the team's blog.

Frequently asked questions

How much SQL do I really need? A lot. Weak SQL is a common reason otherwise strong modelers get rejected. Practice window functions, multi-table joins, and rolling aggregations until they're automatic.
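
As one example of a window-function pattern worth drilling, here is a sketch of the "users who logged in three days in a row" prompt. It runs through Python's sqlite3 module; the logins table, its columns, and the rows are made up, and it assumes a SQLite build with window-function support (3.25 or later). The idea is to de-duplicate to one row per user per day, then check whether the date two rows back is exactly two days earlier.

```python
import sqlite3

# Hypothetical login events, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (user_id INTEGER, login_date TEXT);
INSERT INTO logins VALUES
  (1, '2026-01-01'), (1, '2026-01-02'), (1, '2026-01-03'),
  (2, '2026-01-01'), (2, '2026-01-02'), (2, '2026-01-04'),
  (3, '2026-01-30'), (3, '2026-01-31'), (3, '2026-01-31'), (3, '2026-02-01');
""")

QUERY = """
WITH days AS (
  -- One row per user per calendar day, so repeat logins do not break the streak logic.
  SELECT DISTINCT user_id, login_date FROM logins
),
flagged AS (
  SELECT user_id,
         login_date,
         LAG(login_date, 2) OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS two_rows_back
  FROM days
)
-- With distinct dates, "two rows back is exactly two days earlier"
-- means the three rows are consecutive days.
SELECT DISTINCT user_id
FROM flagged
WHERE login_date = date(two_rows_back, '+2 days')
ORDER BY user_id;
"""

streak_users = [row[0] for row in conn.execute(QUERY)]
print(streak_users)
```

User 2 skips a day and is excluded; user 3 has a duplicate login and a streak that crosses a month boundary, and is still found. Calling out those two edge cases unprompted is a cheap way to show care.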

Should I do the take-home even if it's unpaid? If you want the role and the scope is reasonable (a few hours), yes — it's often the strongest signal you can send. Document your reasoning, not just your code.

Do I need deep learning? Only if the role calls for it. For most analytics and product DS roles, strong fundamentals in regression, tree models, and experimentation matter far more than neural networks.

How do I talk about a model that failed in production? Own it plainly: what you expected, what actually happened (drift, leakage, a bad assumption), how you detected it, and what you changed. Honesty about failure signals seniority.
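
If your failure story involves drift, it helps to be able to say how you would have measured it. One option among several is the population stability index, which compares the binned distribution of a feature at training time with its live distribution. The sketch below is a minimal version with equal-width bins and made-up data; the function name, the bin count, and the small floor for empty bins are all assumptions, not a standard implementation.

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """PSI between two samples, using equal-width bins built from the expected sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width) if width else 0
            # Values outside the expected range are clamped into the end bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty (an illustrative choice).
        return [max(c / len(values), 1e-6) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))

# Made-up feature values: training sample, then a live sample shifted upward.
training = [i / 100 for i in range(1000)]
live_same = list(training)
live_shifted = [v + 3 for v in training]

psi_same = population_stability_index(training, live_same)
psi_shifted = population_stability_index(training, live_shifted)
print(f"no drift: {psi_same:.3f}, shifted: {psi_shifted:.3f}")
```

Identical distributions score zero and the score grows as the live data moves away from the training data. What counts as "too much" drift is a judgment call that depends on the feature and the model, so be ready to explain how you would set an alert threshold.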

An expanded question bank by theme

Broaden your reps with these commonly asked prompts. Practice explaining each plainly, as if to a smart non-specialist.

Statistics and probability: Explain the bias-variance trade-off. What is a confidence interval, really? When would you use a t-test versus a chi-square test? What is the difference between Bayesian and frequentist thinking? Explain the law of large numbers. How do you correct for multiple comparisons?
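
For the multiple-comparisons question, it can help to show two corrections side by side. The sketch below implements Bonferroni (controls the chance of any false positive) and Benjamini-Hochberg (controls the expected share of false discoveries) on made-up p-values; the function names are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Made-up p-values from five metrics tested in the same experiment.
p_values = [0.039, 0.001, 0.20, 0.012, 0.025]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("uncorrected:", [p <= 0.05 for p in p_values])
print("bonferroni: ", bonf)
print("bh:         ", bh)
```

On these numbers, four of five results pass an uncorrected 0.05 threshold, Bonferroni keeps only one, and Benjamini-Hochberg keeps four. Being able to say which error rate each method controls, and which one fits the decision, matters more than reciting the formulas.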

Experimentation: How do you size an A/B test? What is statistical power, and what reduces it? What are guardrail metrics, and why do you need them? How do you handle novelty and primacy effects? What is a network effect, and how does it break standard A/B assumptions?
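
For the sizing question, a common back-of-the-envelope approach is the normal approximation for comparing two proportions: the required users per group grow with the variance of the two rates and shrink with the square of the difference you want to detect. The sketch below assumes a two-sided test and equal group sizes; the function name and the example rates are made up.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_baseline vs p_variant.

    Normal approximation, two-sided test, equal allocation:
    n = (z_{1 - alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / effect ** 2)

# Made-up scenario: 10% baseline conversion, hoping to detect a lift to 12%.
n_small_lift = sample_size_per_group(0.10, 0.12)
# Same baseline, but a larger lift to 15% is much cheaper to detect.
n_large_lift = sample_size_per_group(0.10, 0.15)
print(n_small_lift, n_large_lift)
```

The useful interview point is the shape of the formula: halving the detectable effect roughly quadruples the required sample, and raising power or lowering alpha both increase it. That is also the answer to "what reduces power": small effects, noisy metrics, and small samples.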

Machine learning: Explain regularization and the difference between L1 and L2. How does a random forest reduce variance? What is gradient boosting doing differently? How do you choose a loss function? What is cross-validation, and when does it leak? How would you explain a model's predictions to a stakeholder?
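
The L1 versus L2 difference is easiest to see in a simplified one-coefficient setting. If the penalized objective for a single coefficient b is 0.5 * (b - b_ols)^2 plus a penalty, then an L1 penalty of lam * |b| gives a soft-thresholded solution, while an L2 penalty of 0.5 * lam * b^2 gives a proportionally shrunk one. The sketch below applies both closed forms to made-up coefficients; it illustrates this special case only and is not how a library fits a full model.

```python
def l1_shrink(b_ols, lam):
    """Minimizer of 0.5 * (b - b_ols)^2 + lam * |b|: soft thresholding."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # small coefficients are set exactly to zero

def l2_shrink(b_ols, lam):
    """Minimizer of 0.5 * (b - b_ols)^2 + 0.5 * lam * b^2: proportional shrinkage."""
    return b_ols / (1 + lam)

# Made-up unpenalized coefficients and penalty strength.
ols_coefficients = [3.0, 0.4, -0.2, -2.0]
lam = 0.5

l1 = [l1_shrink(b, lam) for b in ols_coefficients]
l2 = [l2_shrink(b, lam) for b in ols_coefficients]
print("L1:", l1)
print("L2:", l2)
```

L1 zeroes out the two small coefficients and subtracts a fixed amount from the large ones, which is why it acts as feature selection. L2 scales every coefficient toward zero but never reaches it.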

SQL and data manipulation: Compute month-over-month growth. Find the top N per group. Calculate a funnel conversion rate across steps. De-duplicate while keeping the latest record. Pivot long data to wide.
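
Of these, "de-duplicate while keeping the latest record" is a good one to have memorized. A sketch using ROW_NUMBER follows, run through Python's sqlite3 module; the user_records table, its columns, and the rows are made up, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical table with repeated rows per user, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_records (user_id INTEGER, email TEXT, updated_at TEXT);
INSERT INTO user_records VALUES
  (1, 'old@example.test', '2026-01-01'),
  (1, 'new@example.test', '2026-02-01'),
  (2, 'b@example.test',   '2026-01-15');
""")

# Number each user's rows from newest to oldest, then keep only row 1.
QUERY = """
SELECT user_id, email, updated_at
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY user_id ORDER BY updated_at DESC
         ) AS row_num
  FROM user_records
)
WHERE row_num = 1
ORDER BY user_id;
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
```

The same shape answers "top N per group" by changing the filter to row_num <= N. If two rows can share the same timestamp, add a tiebreaker column to the ORDER BY so the result is deterministic.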

Follow-up questions interviewers love

After your first answer, expect the squeeze: "Why that model and not a simpler one?" "What would you do if the data were ten times larger?" "How would you know this is working in production?" "What's your plan when the model degrades?" "How do you explain this to a skeptical executive?" The follow-ups test whether you understand the consequences of your choices, not just the choices.

A realistic two-week study plan

Days 1–3: Statistics and probability fundamentals, framed for plain-language explanation. Practice the p-value, confidence interval, and hypothesis-testing answers out loud.
Days 4–6: SQL drills — window functions, multi-table joins, funnels, and rolling metrics until they're fast and correct.
Days 7–9: Machine learning depth — model selection, evaluation, regularization, and handling imbalance, each tied to a business example.
Days 10–11: Experimentation and product cases. Practice the "metric dropped" and "design an experiment" prompts with a clear structure.
Days 12–13: Mock interviews and, if there's a take-home, a timed practice run focusing on clean documentation.
Day 14: Review your project stories and your weakest topic. Prepare to discuss one project end to end.

The day before and the day of

The night before, review your plain-language explanations and your one end-to-end project story — the messy data, the decision, what shipped, what you'd change. On the day, for every question, name the metric or decision at stake, state your assumptions, and acknowledge the limitations of your approach. Data scientists are hired for judgment under ambiguity, so let your structure and honesty about trade-offs show in every answer.

How to turn this question list into real readiness

A list of questions is raw material, not preparation. Candidates who turn interviews into offers practice deliberately. The method is the same regardless of role; for data science, aim it at plain-language explanation, SQL speed, and business judgment.

Start by answering out loud, never silently. Comprehension and recall under pressure are different skills, and only spoken practice builds the second. Record yourself so you can hear the filler words, the hedging, and the moments where your structure falls apart — things you never notice while speaking.

Then score yourself against a simple rubric: was the answer structured, specific, and relevant to what was asked? Did it land on a concrete result or trade-off? Rebuild the weakest answers and run them again. A useful daily rep is to explain a concept to an imaginary non-technical stakeholder, then check yourself for jargon.

Use spaced repetition rather than a single cram. Three short sessions across a week beat one long session the night before, because the goal is durable recall under stress, not short-term familiarity. Finally, simulate pressure with at least two timed mock interviews before the real thing — pressure changes how you think, and you want to have felt it before it counts.

A final pre-interview checklist

Run through this the day before:

Can you explain p-value, confidence intervals, and the bias-variance trade-off in plain words?
Are your window-function and multi-join SQL queries fast and correct under light pressure?
Can you walk one project end to end, including what failed?
Do you always tie a method back to the metric or decision it serves?
Have you researched the company, the team, and the specific role enough to tailor your answers?
Have you prepared two or three genuine questions to ask the interviewer that show you understand the role?

If you can answer yes to each, you're ready. Get a good night's sleep — being rested will do more for your performance than one more hour of practice.
