Data Scientist qualifications to look for
Top Data Scientist, Manager and Analyst candidates excel at gaining actionable insights from organization-generated data. They have a sixth sense about the right data to collect, and a solid process for carrying out effective data analyses and building predictive models.
Candidates need a strong foundation in statistics, operations research and machine learning, as well as programming and database skills such as Python and SQL. This helps them retrieve, clean and process data from a variety of sources.
Your best candidates will have a background and degree in mathematics or statistics, engineering, or computer science.
A typical Data Scientist will program in a scripting language such as R, Python or MATLAB, and is able to present the findings of their analysis.
Keep an eye out for candidates who:
- Have knowledge of Tableau or D3.js (or related tools)
- Are strong communicators
Remember to adapt some questions into more quantitative, statistical-analysis interview questions.
Top tip: Hire candidates willing to grow by making sure their personal career goals align with your company's mission.
Problem-solving interview questions
- Walk me through your step-by-step process to design a data-driven model that solves a business problem. For example, an automated process to segment customer support questions, predict hiring patterns or reduce churn rates.
- What pre-processing steps are carried out on data prior to training a model? State the conditions under which each might be applied.
- Describe the difference between a simple and a complex model. Give me a few examples.
- How would you combine models to form model ensembles? When would this be advantageous?
- Explain dimensionality reduction and ways to perform it.
- In what situation would you choose a more complex model over a simpler one? When would this not be to your advantage?
Role-specific interview questions
- In which environment(s) do you usually run your analyses?
- Are you familiar with SQL? When have you used it?
- What visualization tools have you used? What are your favorite features?
- We’d love to see any presentation you’ve prepared.
- Describe your experience presenting reports and findings directly to senior management.
- How do you feel about public speaking? Have you presented a technical topic to an audience before? If so, how do you explain things to a non-technical audience?
- How do you judge whether you’ve collected enough data to train a model?
- What is the reason for training, test and validation data sets? How are they used effectively?
- Explain a confidence interval and in what circumstance you would use it.
- Explain the difference between statistical independence and correlation.
- Define conditional probability and Bayes’ Theorem. When would you put them into practice?
- We are training a model using stochastic gradient descent. How do we know if we are converging to a solution? If a training procedure converges, will it always result in the best possible solution?
- Explain clustering and describe an algorithm that performs it. What measure do you use to judge whether you have obtained decent clusters? How do you estimate a good number of clusters to use with our data?
- Explain why correlation does not imply causation.
- Describe the key differences between unsupervised and supervised learning.
- Describe the key differences between regression and classification.
- What is the bias-variance trade-off in statistical models?
- Explain over-fitting and how it relates to the bias-variance trade-off.
- Define regularization and give examples of regularization in models.
- We are training a binary classifier and one class is very rare. Give an example of this kind of problem. How should we train this model? What performance metrics should we use?
- How many unique subsets of n different objects can we make?
- Explain how to build a data-driven recommender system. Are there limitations to this approach?
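As a reference point for the question above on unique subsets of n different objects, here is a minimal Python sketch. It assumes the empty subset and the full set both count, which gives 2^n subsets in total. The function names `all_subsets` and `count_subsets` are illustrative only, and a candidate may reasonably frame the answer differently (for example, by excluding the empty subset).

```python
from itertools import combinations


def all_subsets(items):
    """Enumerate every subset of items, including the empty subset."""
    items = list(items)
    subsets = []
    # Collect the subsets of each possible size, from 0 up to len(items).
    for size in range(len(items) + 1):
        subsets.extend(combinations(items, size))
    return subsets


def count_subsets(n):
    """Each of the n objects is either in or out of a subset: 2**n choices."""
    return 2 ** n


if __name__ == "__main__":
    objects = ["a", "b", "c"]
    print(len(all_subsets(objects)), count_subsets(len(objects)))
```

Enumerating the subsets by brute force and comparing the count with the closed form is a quick way to check a candidate's reasoning for small n.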