So, you want to be a data scientist. It’s a smart move: it’s a job that’s in high demand, can command a healthy salary, and can also be richly rewarding and engaging. But to get the job, you’re going to have to pass a data science interview – something that’s notoriously tough.

One of the reasons for this is that data science is a field that is incredibly diverse.

I mean that in two different ways: on the one hand it’s a role that demands a variety of different skills (being a good data scientist is about much more than just being good at math). But it’s also diverse in the sense that data science will be done differently at every company. That means that every data science interview is going to be different. If you specialize too much in one area, you might well be severely limiting your opportunities.

There are plenty of articles out there that pretend to have all the answers to your next data science interview. And while these can be useful, they also treat job interviews like they’re just exams you need to pass. They’re not – you need to have a wide range of knowledge, but you also need to present yourself as a curious and critical thinker, and someone who is very good at communicating.

You won’t get a data science job by knowing all the answers. But you might get it by asking the right questions and talking in the right way.

So, with all that in mind, here is what you need to do to ace your data science interview.

Know the basics of data science

This is obvious but it’s impossible to overstate. If you don’t know the basics, there’s no way you’ll get the job – indeed, it’s probably better for your sake that you don’t get it!

But what are these basics?

Basic data science interview questions

  • “What is data science?” This seems straightforward, but proving you’ve done some thinking about what the role actually involves demonstrates that you’re thoughtful and self-aware – a sign of any good employee.
  • “What’s the difference between supervised and unsupervised learning?” Again, this is straightforward, but it will give the interviewer confidence that you understand the basics of machine learning algorithms.
  • “What is the bias-variance tradeoff? What are overfitting and underfitting?” Being able to explain these concepts in a clear and concise manner demonstrates your clarity of thought. It also shows that you have a strong awareness of the challenges of using machine learning and statistical systems.

If you’re applying for a job as a data scientist you’ll probably already know the answers to all of these. Just make sure you have a clear answer and that you can explain each in a concise manner.
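If you want something concrete to point to when explaining overfitting and underfitting, a toy experiment can help. The sketch below is only an illustration, not a standard benchmark: the sine-wave data, the fixed ±0.3 wobble standing in for noise, and the helper names `knn_predict` and `mse` are all assumptions made for this example. It fits a k-nearest-neighbor regressor with three values of k and compares the error on the training points with the error on held-out points.

```python
import math


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of k-NN predictions on the (x, y) pairs in data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


# Toy data (an assumption for illustration): a sine wave plus a fixed
# +/-0.3 wobble that stands in for measurement noise.
train = [(i * 0.1, math.sin(i * 0.1) + (0.3 if i % 2 == 0 else -0.3))
         for i in range(60)]
# Held-out points sit between the training points and carry the clean signal.
test = [(i * 0.1 + 0.05, math.sin(i * 0.1 + 0.05)) for i in range(59)]

# k=1 memorizes the training set, k=60 averages everything, k=4 sits in between.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 4, 60)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:>2}  train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

With k=1 the model reproduces every training point perfectly but carries the noise over to new points (overfitting, high variance). With k=60 it predicts the same average everywhere and misses the curve entirely (underfitting, high bias). The middle setting does best on the held-out points, which is the tradeoff in miniature.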

Know your algorithms

Knowing your algorithms is a really important part of any data science interview. However, it’s important not to get hung up on the details. Trying to learn everything there is to know about every algorithm isn’t only impossible, it’s also not going to get you the job.

What’s important instead is demonstrating that you understand the differences between algorithms, and when to use one over another.

Data science interview questions about algorithms you might be asked

  • “When would you use a supervised machine learning algorithm?”
  • “Can you name some supervised machine learning algorithms and the differences between them?” (supervised machine learning algorithms include Support Vector Machines, Naive Bayes, K-nearest Neighbor Algorithm, Regression, Decision Trees)
  • “When would you use an unsupervised machine learning algorithm?”
  • “Can you name some unsupervised machine learning algorithms and how they’re different from one another?” (unsupervised machine learning algorithms include K-Means, autoencoders, Generative Adversarial Networks, and Deep Belief Nets)
  • “What are classification algorithms?”

There are others, but try to focus on these as core areas. Remember, it’s also important to always talk about your experience – that’s just as useful as, if not more useful than, listing the differences between machine learning algorithms.
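To make the supervised versus unsupervised distinction tangible, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the spend figures are made up, and `nearest_neighbor_label` and `two_means` are stripped-down, one-dimensional stand-ins for K-nearest Neighbor and K-Means rather than production implementations.

```python
def nearest_neighbor_label(labeled, point):
    """Supervised: predict using labels supplied with the training data."""
    closest = min(labeled, key=lambda item: abs(item[0] - point))
    return closest[1]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    centers = [min(values), max(values)]  # simple deterministic starting centers
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its closest center.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend figures; labels exist only in the supervised case.
labeled_spend = [(12, "low"), (15, "low"), (18, "low"), (80, "high"), (95, "high")]
unlabeled_spend = [12, 15, 18, 80, 95, 20, 88]

predicted = nearest_neighbor_label(labeled_spend, 22)
centers, clusters = two_means(unlabeled_spend)
print(predicted)
print(centers, clusters)
```

The supervised function can only answer because someone labeled the examples first. The unsupervised one finds two groups on its own, but it cannot tell you what those groups mean. That difference is usually what the interviewer wants to hear you explain.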

Some of the questions you face in a data science interview might even be about how you use algorithms:

  • “Tell me about a time you used an algorithm. Why did you decide to use it? Were there any other options?”
  • “Tell me about a time you used an algorithm and it didn’t work how you expected it to. What did you do?”

When talking about algorithms in a data science interview it’s useful to present them as tools for solving business problems. It can be tempting to talk about them as mathematical concepts, and although it’s good to show off your understanding, showing how algorithms help solve real-world business problems will be a big plus for your interviewer.

Be confident talking about data sources and infrastructure challenges

One of the biggest challenges for data scientists is dealing with incomplete or poor quality data. If that’s something you’ve faced – or even if it’s something you think you might face in the future – then make sure you talk about that.

Data scientists aren’t always responsible for managing a data infrastructure (that will vary from company to company), but even if that isn’t in the job description, it’s likely that you’ll have to work with a data architect to make sure data is available and accurate enough to carry out data science projects.

This means that understanding topics like data streaming, data lakes and data warehouses is very important in a data science interview. Again, remember that it’s important that you don’t get stuck on the details. You don’t need to recite everything you know, but instead talk about your experience or how you might approach problems in different ways.

Data science interview questions you might get asked about using different data sources

  • “How do you work with data from different sources?”
  • “How have you tackled dirty or unreliable data in the past?”
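When you answer questions like these, it helps to have a concrete cleaning routine in mind. The following is a minimal sketch, assuming two hypothetical sources (a CRM export and a billing system) that disagree on formatting. The field names, the sample rows, and the rule of keeping the first valid revenue per customer are all assumptions chosen for illustration, not a recommended policy.

```python
def clean_record(raw):
    """Normalize one raw row; return None if it has no usable customer id."""
    customer_id = (raw.get("customer_id") or "").strip().upper()
    if not customer_id:
        return None
    revenue_text = (raw.get("revenue") or "").replace(",", "").strip()
    try:
        revenue = float(revenue_text)
    except ValueError:
        revenue = None  # keep the row, but mark the value as missing
    return {"customer_id": customer_id, "revenue": revenue}


def combine_sources(*sources):
    """Merge rows from several sources, keeping the first valid revenue per customer."""
    combined, rejected = {}, 0
    for source in sources:
        for raw in source:
            record = clean_record(raw)
            if record is None:
                rejected += 1  # count unusable rows instead of silently dropping them
                continue
            existing = combined.get(record["customer_id"])
            if existing is None or existing["revenue"] is None:
                combined[record["customer_id"]] = record
    return combined, rejected


# Hypothetical rows showing typical problems: stray whitespace, mixed case,
# thousands separators, placeholder text, and missing values.
crm_export = [
    {"customer_id": " a17 ", "revenue": "1,200.50"},
    {"customer_id": "B02", "revenue": "n/a"},
    {"customer_id": "", "revenue": "300"},
]
billing_system = [
    {"customer_id": "b02", "revenue": "450"},
    {"customer_id": "A17", "revenue": "9999"},
    {"customer_id": "C33", "revenue": None},
]

combined, rejected = combine_sources(crm_export, billing_system)
print(combined)
print("rejected rows:", rejected)
```

The design choice worth talking through in an interview is that missing values are flagged rather than filled in, and rejected rows are counted. Both make data quality problems visible to the people who own the source systems.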

Data science interview questions you might get asked about infrastructure

  • “Talk me through a data infrastructure challenge you’ve faced in the past”
  • “What’s the difference between a data lake and data warehouse? How would you approach each one differently?”

Show that you have a robust understanding of data science tools

You can’t get through a data science interview without demonstrating that you have knowledge and experience of data science tools. It’s likely that the job you’re applying for will mention a number of different skill requirements in the job description, so make sure you have a good knowledge of them all.

Obviously, the best case scenario is that you know all the tools mentioned in the job description inside out – but this is unlikely. If you don’t know one – or more – make sure you understand what they’re for and how they work.

The hiring manager probably won’t expect candidates to know everything, but they will expect them to be ready and willing to learn. If you can talk about a time you learned a new tool, that will give the interviewer a lot of confidence that you’re someone who can pick up knowledge and skills quickly.

Show you can evaluate different tools and programming languages

Another element here is to be able to talk about the advantages and disadvantages of different tools. Why might you use R over Python? Which Python libraries should you use to solve a specific problem? And when should you just use Excel?

Sometimes the interviewer might ask for your own personal preferences. Don’t be scared about giving your opinion – as long as you’ve got a considered explanation for why you hold the opinion that you do, you’re fine!

Data science interview questions about tools that you might be asked

  • “What tools have you used – or could you use – for data processing and cleaning? What are their benefits and disadvantages?” (These include tools such as Hadoop, Pentaho, Flink, Storm, Kafka.)
  • “What tools do you think are best for data visualization and why?” (This includes tools like Tableau, PowerBI, D3.js, Infogram, Chartblocks – there are so many different products in this space that it’s important that you are able to talk about what you value most about data visualization tools.)
  • “Do you prefer using Python or R? Are there times when you’d use one over another?”
  • “Talk me through machine learning libraries. How do they compare to one another?” (This includes tools like TensorFlow, Keras, and PyTorch. If you don’t have any experience with them, make sure you’re aware of the differences, and talk about which you are most curious about learning.)

Always focus on business goals and results

This sounds obvious, but it’s so easy to forget. This is especially true if you’re a data geek that loves to talk about statistical models and machine learning.

To combat this, make sure you’re very clear on how your experience was tied to business goals. Take some time to think about why you were doing what you were doing. What were you trying to find out? What metrics were you trying to drive?

Interpersonal and communication skills

Another element to this is talking about your interpersonal skills and your ability to work with a range of different stakeholders. Think carefully about how you worked alongside other teams, how you went about capturing requirements and building solutions for them.

Think also about how you managed – or would manage – expectations. It’s well known that business leaders can expect data to be a silver bullet when it comes to results, so how do you make sure that people are realistic?

Show off your data science portfolio

A good way of showing your business acumen as a data scientist is to build a portfolio of work. Portfolios are typically viewed as something for creative professionals, but they’re becoming increasingly popular in the tech industry as competition for roles gets tougher.

This post explains everything you need to build a great data science portfolio. Broadly, the most important thing is that it demonstrates how you have added value to an organization.

This could be:

  • Insights you’ve shared in reports with management
  • Building customer-facing applications that rely on data
  • Building internal dashboards and applications

Bringing a portfolio to an interview can give you a solid foundation on which you can answer questions. But remember – you might be asked questions about your work, so make sure you have an answer prepared!

Data science interview questions about business performance

  • “Talk about a time you have worked across different teams.”
  • “How do you manage stakeholder expectations?”
  • “What do you think are the most important elements in communicating data insights to management?”

If you can talk fluently about how your work impacts business performance and how you worked alongside others in non-technical positions, you will give yourself a good chance of landing the job!

Show that you understand ethical and privacy issues in data science

This might seem like a superfluous point but given the events of recent years – like the Cambridge Analytica scandal – ethics has become a big topic of conversation. Employers will expect prospective data scientists to have an awareness of some of these problems and how they can go about mitigating them.

To some extent, this is an extension of the previous point. Showing you are aware of ethical issues, such as privacy and discrimination, proves that you are fully engaged with the needs and risks a business might face. It also underlines that you are aware of the consequences and potential impact of data science activities on customers – what your work does in the real world.

Data science interview questions about ethics and privacy

Ethics is a topic that’s easy to overlook but it’s essential for every data scientist. To get a good grasp of the issues it’s worth investigating more technical content on things like machine learning interpretability, as well as following news and commentary around emergent issues in artificial intelligence.

Conclusion: Don’t treat a data science interview like an exam

Data science is a complex and multi-faceted field. That can make data science interviews feel like a serious test of your knowledge – and it can be tempting to revise like you would for an exam.

But, as we’ve seen, that’s foolish. To ace a data science interview you can’t just recite information and facts. You need to talk clearly and confidently about your experience and demonstrate your drive and curiosity.

That doesn’t mean you shouldn’t make sure you know the basics. But rather than getting too hung up on definitions and statistical details, it’s a better use of your time to consider how you have performed your roles in the past, and what you might do in the future.

A thoughtful, curious data scientist is immensely valuable. Show your interviewer that you are one.

Co-editor of the Packt Hub. Interested in politics, tech culture, and how software and business are changing each other.