Why You’re Failing the Telephone Interview

I’m not trying to grill you. The bar’s not high on the telephone interview. The next step, the face-to-face, means meeting a few members of the team. They’ll grill you. My job is to screen out the uncomfortably bad experiences.
I’m just trying to protect them and not waste your time. But be prepared: the question is a technical one. It isn’t meant to test the breadth or depth of your technical knowledge, though, but to lay bare your soul to me and reveal who you are as a data scientist.
I’ve interviewed hundreds of experienced and aspiring data scientists — and hired several dozen. I’m not looking for one type of person. I build teams, and there are many ways that very different people can fit together to form a great team. I also don’t expect to find the exact piece I need. I’m looking for someone to invest in. But I have high standards.

The Standard

It’s a big responsibility. I treat my teams as a family, and I’m introducing a potential new family member. I want two things for each family member:
  1. To feel safe to be themselves and express themselves honestly, without fear of retribution.
  2. To feel that the work we do is important and requires a certain strength of character. For a data scientist, that strength is intellectual honesty.
If I let someone in who isn’t prepared, even as a candidate, to be brutally honest with themselves and others about the quality of their work, it’s an insult to the team. They see the caliber of the candidates as a reflection of how I see them.

The Set-up

I don’t just blurt this question out. I want it to be unassuming. I want to gauge how important you think the question is. I don’t want to bias you, so I’ll sneak it in as a follow-up.
So first, I’ll ask you to tell me about your relevant experience. Perhaps I say, “Which of your projects are you most proud of? Tell me about it.”
Here, I’m looking for two things:
  1. Can you explain to me, a technical expert, what you’ve done? I’m not looking for eloquence, just clarity.
  2. What do you value? When you talk about what you’re proud of, the language you use (and what you focus on) betrays your values.
Some people fail the interview on this question, but few. Some cannot explain their work, to the point that you’d wonder whether they’d made it up on the spot. Others simply present themselves poorly, speaking extremely rudely of previous co-workers or saying things that are flat-out wrong. One or two have droned on for the remainder of the interview, refusing to be interrupted.
But almost all pass and are now set up for the big one.

The Unassuming Follow-up

Now, finally, I can gauge the degree of your intellectual honesty. You’ve just told me so much about the data science project you’re most proud of. At some point, you’ll have at least mentioned the statistical or machine learning model central to your project. So, I casually ask you how well your model worked.
Since I asked casually, I’m not surprised or dissuaded if I get a superficial response, but ideally, you jump at the chance to go into great detail about how you evaluated and tested your model. Take this opportunity.
If you don’t, I’ll underline my interest with skepticism: “How do you know it worked?” The first time I asked, I implied that it worked and worked well. It was a question of degree; now I am asking for a defense.

The Wrong Answer

There is no single right answer, but there are several wrong responses. Brushing the question off a second time is perhaps the worst. I’ll suppress my ire and make a third and final attempt, but that’s effectively the end of the interview. Many flounder and reveal that they have no idea how to evaluate a model in even the most cursory ways.

Accuracy

One common poor answer is something like, “I had a test set and a training set; the accuracy of the model on the test set was 90%”. Why is this poor? Well, for starters, I’d need more information to know whether 90% accuracy is a good result or a terrible one. If 95% of the test set is the same class, 90% accuracy is terrible: a trivial model that always predicts the majority class would score 95% without learning anything at all.
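To make that concrete, here’s a minimal sketch (scikit-learn assumed; the dataset is synthetic and the numbers are stand-ins) comparing a model’s accuracy against a majority-class baseline:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A synthetic, heavily imbalanced dataset: ~95% of samples in one class.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# A trivial baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# If the baseline alone scores ~95%, a model at 90% is worse than no model.
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
print("model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
```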
Also, that answer doesn’t address several other important questions including:
  1. How did you avoid bias when selecting the test set?
  2. What makes you think the test set is sufficient?
  3. How well does the test set reflect the data you’ll see in production?
  4. Why did you choose accuracy as your scoring metric? What other metrics did you consider?
  5. How many different models did you try with this split, and how do you know you haven’t overfitted?
I’ll ask as many of these as time permits. I need to know that you at least care whether your model is correct, and that you’re not just turning the crank until you get a favorable result.
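I don’t expect you to recite code over the phone, but here is a minimal sketch of the discipline those questions probe (scikit-learn assumed; the candidate models and data are hypothetical): hold out a stratified test set once, select models on a separate validation split, and score with a metric suited to the class balance.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, weights=[0.9], random_state=0)

# Hold out a stratified test set first and touch it exactly once at the end;
# repeatedly scoring candidates against it is how a split gets overfitted.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.25, stratify=y_dev, random_state=0)

# Model selection happens on the validation set only.
candidates = [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0)]
best = max(
    (m.fit(X_train, y_train) for m in candidates),
    key=lambda m: f1_score(y_val, m.predict(X_val)),
)

# One final, honest number on data the selection process never saw.
print("test F1:", f1_score(y_test, best.predict(X_test)))
```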

Cross-validation

The most common poor answer tries, somewhat, to address the issue of overfitting: “I used cross-validation”. There seems to be a pervasive misconception that cross-validation solves several fundamental and complex problems in data science. It doesn’t.
Cross-validation is a great tool, but it guarantees nothing. You can still easily cherry-pick results or features, bleed information from one fold to another, or use information in a way not possible in production. Nor can it address more epistemological issues in data science such as black swans, dragon kings, reproducibility, and the effect of predictions on the outcomes they are meant to predict.
Don’t worry. I’m not looking for a philosophical treatise. I just need to know you don’t think that machine learning is magic.
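To make the leakage point concrete, here is a minimal sketch (numpy and scikit-learn assumed) of a classic mistake: selecting features on the full dataset before cross-validating lets label information bleed into every training fold, so pure noise looks predictive. Keep the selection inside the pipeline, where it is refit on each fold, and the illusion disappears.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise: no feature has any real relationship to the labels,
# so an honest accuracy estimate should hover around 50%.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))
y = rng.integers(0, 2, size=200)

# Wrong: feature selection on ALL the data, before cross-validation.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Right: selection inside the pipeline, refit on each fold's training data.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5)

print("leaky CV accuracy: ", leaky.mean())   # optimistically high
print("honest CV accuracy:", honest.mean())  # roughly chance, as it should be
```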

Think Like a Scientist

If you want to succeed where the vast majority of the other candidates fail, think like a scientist. Your job is to find the truth, not to prove or disprove any particular theory. Don’t just find patterns; find meaningful patterns.
Take some time. Ponder the implications of what you are doing. Explore data science with more skepticism. Maybe even read more philosophy of science. And dig into the various metrics and model-evaluation packages of the tools you already use.
And most importantly, go back over your old work and see where you can poke holes in what you’ve done. Because if you don’t, I will.