Okay, you’re probably here because you’ve got just a few months to graduate and the projects section of your resume is blank. Or you’re just an inquisitive little nerd scraping the WWW for ways to crack that dream job. Either way, you’re not alone and there are ten thousand others trying to build a great Data Science portfolio to land them a good job. Look no further, we’ll try our best to help you on how to make a portfolio that catches the recruiter’s eye!
Companies are on the lookout for employees who can add value to the business. To showcase this on your resume effectively, the first step is to understand the different ways in which you can add value.
4 things you need to show in a data science portfolio
Data science can be broken down into 4 broad areas:
- Obtaining insights from data and presenting them to the business leaders
- Designing an application that directly benefits the customer
- Designing an application or system that directly benefits other teams in the organisation
- Sharing expertise on data science with other teams
You’ll need to ensure that your portfolio portrays all or at least most of the above, in order to easily make it through a job selection. So let’s see what we can do to make a great portfolio.
Demonstrate that you know what you’re doing
So the idea is to show the recruiter that you’re capable of performing the critical aspects of Data Science, i.e. import a data set, clean the data, extract useful information from the data using various techniques, and finally visualise the findings and communicate them. Apart from the technical skills, there are a few soft skills that are expected as well. For instance, the ability to communicate and collaborate with others, the ability to reason and take the initiative when required. If your project is actually able to communicate these things, you’re in!
Stay focused and be specific
You might know a lot, but rather than throwing all your skills, projects and knowledge in the employer’s face, it’s always better to be focused on doing something and doing it right. Just as you’d do in your resume, keeping things short and sweet, you can implement this while building your portfolio too. Always remember, the interviewer is looking for specific skills.
Research the data science job market
Find 5-6 jobs, probably from Linkedin or Indeed, that interest you and go through their descriptions thoroughly. Understand what kind of skills the employer is looking for. For example, it could be classification, machine learning, statistical modeling or regression. Pick up the tools that are required for the job – for example, Python, R, TensorFlow, Hadoop, or whatever might get the job done. If you don’t know how to use that tool, you’ll want to skill-up as you work your way through the projects. Also, identify the kind of data that they would like you to be working on, like text or numerical, etc. Now, once you have this information at hand, start building your project around these skills and tools.
Be a problem solver
Working on projects that are not actual ‘problems’ that you’re solving, won’t stand out in your portfolio. The closer your projects are to the real-world, the easier it will be for the recruiter to make their decision to choose you. This will also showcase your analytical skills and how you’ve applied data science to solve a prevailing problem.
Put at least 3 diverse projects in your data science portfolio
A nice way to create a portfolio is to list 3 good projects that are diverse in nature. Here are some interesting projects to get you started on your portfolio:
Data Cleaning and wrangling
Data Cleaning is one of the most critical tasks that a data scientist performs. By taking a group of diverse data sets, consolidating and making sense of them, you’re giving the recruiter confidence that you know how to prep them for analysis. For example, you can take Twitter or Whatsapp data and clean it for analysis. The process is pretty simple; you first find a “dirty” data set, then spot an interesting angle to approach the data from, clean it up and perform analysis on it, and finally present your findings.
Storytelling showcases not only your ability to draw insight from raw data, but it also reveals how well you’re able to convey the insights to others and persuade them. For example, you can use data from the bus system in your country and gather insights to identify which stops incur the most delays. This could be fixed by changing their route. Make sure your analysis is descriptive and your code and logic can be followed.
Here’s what you do; first you find a good dataset, then you explore the data and spot correlations in the data. Then you visualize it before you start writing up your narrative. Tackle the data from various angles and pick up the most interesting one. If it’s interesting to you, it will most probably be interesting to anyone else who’s reviewing it. Break down and explain each step in detail, each code snippet, as if you were describing it to a friend. The idea is to teach the reviewer something new as you run through the analysis.
End to end data science
If you’re more into Machine Learning, or algorithm writing, you should do an end-to-end data science project. The project should be capable of taking in data, processing it and finally learning from it, every step of the way. For example, you can pick up fuel pricing data for your city or maybe stock market data. The data needs to be dynamic and updated regularly. The trick for this one is to keep the code simple so that it’s easy to set up and run.
You first need to identify a good topic. Understand here that we will not be working with a single dataset, rather you will need to import and parse all the data and bring it under a single dataset yourself. Next, get the training and test data ready to make predictions. Document your code and other findings and you’re good to go.
Prove you have the data science skill set
If you want to get that job, you’ve got to have the appropriate tools to get the job done. Here’s a list of some of the most popular tools with a link to the right material for you to skill-up:
Data science languages
There’s a number of key languages in data science that are essential. It might seem obvious, but making sure they’re on your resume and demonstrated in your portfolio is incredibly important. Include things like:
Big Data tools
If you’re applying for big data roles, demonstrating your experience with the key technologies is a must. It not only proves you have the skills, but also shows that you have an awareness of what tools can be used to build a big data solution or project. You’ll need:
Machine learning frameworks
With machine learning so in demand, if you can prove you’ve used a number of machine learning frameworks, you’ve already done a lot to impress. Remember, many organizations won’t actually know as much about machine learning as you think. In fact, they might even be hiring you with a view to building out this capability. Remember to include:
Data visualisation tools
Data visualization is a crucial component of any data science project. If you can visualize and communicate data effectively, you’re immediately demonstrating you’re able to collaborate with others and make your insights accessible and useful to the wider business. Include tools like these in your resume and portfolio:
So there you have it. You know what to do to build a decent data science portfolio. It’s really worth attending competitions and challenges. It will not only help you keep up to data and well oiled with your skills, but also give you a broader picture of what people are actually working on and with what tools they’re able to solve problems.