As a category manager, I manage the data science portfolio of product ideas for Packt Publishing, a leading tech publisher. In simple terms, I place informed bets on where to invest, which topics to publish on, and so on. While I have a decent idea of where the industry is heading and what data professionals want to learn and why, it is high time I walked in their shoes, for a couple of reasons.
Basically, I want to understand why data science has been called the ‘Sexiest job of the 21st century’, and whether the role is really worth all the fame and fortune. In the process, I also want to explore the difficulties, challenges and obstacles that every data scientist has had to endure at some point in their journey, and perhaps still does. The cherry on top is that I get to use the skills I develop to supercharge my success in my current role, which is primarily insight-driven.
This is the first of a series of posts on how I got started with Data Science. Today, I’m sharing my experience with devising a learning path and then gathering appropriate learning resources.
Devising a learning path
To understand the concepts of data science, I had to do a lot of research. There are tons of resources out there, many of which are very good. Even once you separate the good from the rest, it can be quite intimidating to pick the options that suit you best.
Some of the primary questions that clouded my mind were:
- What should be my programming language of choice? R or Python? Or something else?
- What tools and frameworks do I need to learn?
- What about the statistics and mathematical aspects of machine learning? How essential are they?
Two videos really helped me find the answers to the questions above:
- If you don’t want to spend a lot of your time mastering the art of data science, there’s a beautiful video on how to become a data scientist in six months.
- What are the questions asked in a data science interview? What are the in-demand skills that you need to master in order to get a data science job? This video on 5 Tips For Getting a Data Science Job really is helpful.
After a lot of research that included reading countless articles and blogs and discussions with experts, here is my learning plan:
Per the recently conducted Stack Overflow Developer Survey 2018, Python stood out as the most-wanted programming language, meaning that developers who do not yet use it want to learn it the most. As one of the most widely used general-purpose programming languages, Python is applied extensively in data science. Naturally, you get attracted to the best option available, and Python was the one for me.
The major reasons why I chose to learn Python over the other programming languages:
- Very easy to learn: Python is one of the easiest programming languages to learn. Not only is the syntax clean and easy to understand, but many common data science tasks can be done in a few lines of Python code.
- Efficient libraries for Data Science: Python has a vast array of libraries suited for various data science tasks, from scraping data to visualizing and manipulating it. NumPy, SciPy, pandas, matplotlib and Seaborn are some of the libraries worth mentioning here.
- Python has terrific libraries for machine learning: Learning a framework or a library that makes machine learning easier to perform is very important. Python has libraries such as scikit-learn and TensorFlow that make machine learning easier and more fun. To make the most of these libraries, it is important to understand the fundamentals of Python.
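To give a feel for the "few lines of code" point, here is a minimal sketch that groups some records and ranks the groups by average, using only the standard library. The sales figures and category names are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (category, units sold) pairs invented for illustration.
sales = [
    ("python", 120),
    ("r", 45),
    ("python", 80),
    ("sql", 60),
    ("r", 55),
]

# Group the unit counts by category.
units_by_category = defaultdict(list)
for category, units in sales:
    units_by_category[category].append(units)

# Average units per category, sorted from highest to lowest average.
averages = {category: mean(units) for category, units in units_by_category.items()}
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)

print(ranking)
```

Libraries such as pandas shorten this kind of grouping even further, but the plain-Python version shows how readable the basics already are.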
My colleague and good friend Aaron has put out a list of the top 7 Python programming books, which served as a brilliant starting point for understanding the different resources out there for learning Python. The one book that stood out for me was Learn Python Programming – Second Edition, a very good book for starting Python programming from scratch.
There is also a neat skill-map on Mapt, where you can progressively build up your knowledge of Python, right from the absolute basics to the most complex concepts.
Another handy resource for learning the A-Z of Python is Complete Python Masterclass. This is a fairly long course, but it will take you from the absolute fundamentals to the most advanced aspects of Python programming.
Task Status: Ongoing
Learn the fundamentals of data manipulation
After learning the fundamentals of Python programming, the plan is to head straight to the Python-based libraries for data manipulation, analysis and visualization. The major ones are those already mentioned above, and I plan to learn them in the following order:
- NumPy – Used primarily for numerical computing
- pandas – One of the most popular Python packages for data manipulation and analysis
- matplotlib – The go-to Python library for data visualization, rivaling the likes of R’s ggplot2
- Seaborn – A data visualization library that runs on top of matplotlib, used for creating visually appealing charts, plots and histograms
Some very good resources to learn about all these libraries:
- Python Data Analysis
- Python for Data Science and Machine Learning – This is a very good course with detailed coverage of machine learning concepts. Something to learn later.
The aim is to learn these libraries up to a fairly intermediate level, and to be able to manipulate, analyze and visualize any kind of data, including missing data, unstructured data and time-series data.
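As a rough idea of what handling missing and time-series data looks like in practice, here is a small sketch with pandas and NumPy (both need to be installed). The daily readings are invented for illustration, and linear interpolation is just one of several reasonable ways to fill gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two gaps (NaN); the values are made up.
dates = pd.date_range("2018-01-01", periods=6, freq="D")
readings = pd.Series([10.0, np.nan, 14.0, 16.0, np.nan, 20.0], index=dates)

# Count the missing values, then fill them by linear interpolation.
missing_count = int(readings.isna().sum())
filled = readings.interpolate(method="linear")

# Downsample the daily series to 3-day averages.
three_day_mean = filled.resample("3D").mean()

print(missing_count)
print(filled)
print(three_day_mean)
```

Depending on the data, dropping the rows with `dropna()` or filling with a constant via `fillna()` may be a better choice than interpolating.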
Understand the fundamentals of statistics, linear algebra and probability
In order to take a step further and make a foray into machine learning, the general consensus is that you should first understand the maths and statistics behind the concepts of machine learning. Implementing them in Python is relatively easy once you get the math right, and that is what I plan to do.
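To illustrate how directly the math can translate into Python, here is a minimal sketch of fitting a straight line by least squares, written from the textbook formulas with only the standard library. The function name `fit_line` and the sample points are my own, chosen for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up points that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Each line of the function mirrors one step of the formula, which is why getting the math right first makes the code feel straightforward.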
I shortlisted some very good resources for this as well:
Task Status: Ongoing
Learn Machine Learning (sounds odd, I know)
After understanding the math behind machine learning, the next step is to learn how to perform predictive modeling using popular machine learning algorithms such as linear regression, logistic regression, clustering, and more. Using real-world datasets, the plan is to learn the art of building state-of-the-art machine learning models using Python’s very own scikit-learn library, as well as the popular TensorFlow package.
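For a sense of what predictive modeling with scikit-learn involves, here is a minimal sketch of training and evaluating a linear regression model (scikit-learn and NumPy need to be installed). It uses synthetic data generated on the spot rather than a real-world dataset, so treat it as an outline of the workflow, not a realistic project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, generated for illustration: y = 3x + 2 plus a little noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

# Hold out a quarter of the data to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out data; values close to 1 indicate a good fit.
r2 = model.score(X_test, y_test)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, R^2={r2:.3f}")
```

The same split, fit and score pattern carries over to other scikit-learn estimators, such as `LogisticRegression` for classification.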
To learn how to do this, the courses I mentioned above should come in handy:
- Stanford University – Machine Learning Course at Coursera
- Python for Data Science and Machine Learning
- Python Machine Learning, Second Edition
Task Status: To be started
As I start this journey, I plan to share my experiences and knowledge with you all. Do you think the learning path looks good? Is there anything else that I should include in my learning path? I would really love to hear your comments, suggestions and experiences.
Stay tuned for the next post, where I seek answers to questions such as ‘How much of Python should I learn in order to be comfortable with Data Science?’, ‘How much time should I devote per day or week to learn the concepts in Data Science?’ and much more.