Are you a self-made data scientist? How did you do it?
Original Answer
My original answer to the Quora question "Are you a self-made data scientist? How did you do it?" has been deleted due to Quora moderation policies. But I come across this question quite often from people trying to get into data science, so I am copy-pasting the original answer, written in 2016, here.
Please bear in mind that this is an old answer and some aspects of it may no longer be relevant.
I am a Mechanical Engineering graduate who had no prior knowledge of data science, or for that matter even coding, when I left college six years ago. Now I am working as a Lead Data Scientist in a reputed firm and am also one of the top 25 Kaggle Data Scientists in the world.
Though I do not have a formal background in CS, Statistics or Maths, I have had a passion for crunching numbers and finding patterns since my school days. I think anyone with a real passion for patterns and numbers, coupled with the right amount of hard work, can become a self-made data scientist. Here is my path:
MOOC Courses
MOOCs played a major role and were the first step in my learning path. The courses that helped me understand the basic concepts are:
- Introduction to Statistics by edX - This is a very good introductory course in Statistics which taught me the basic concepts
- Machine Learning course on Coursera - A very famous course by Andrew Ng that most people are aware of
- Analytics Edge course on edX - This is again a very good course with a lot of practical examples
- Statistical Learning by Stanford Online - This is again a very good course which teaches the concepts of predictive modeling in detail with R code. The curriculum of the course closely follows the book An Introduction to Statistical Learning
Some other good online courses I came across are:
- Data Science by Harvard Extension - This is a very good course for people wanting to learn the concepts using Python
- Data Science and Engineering using Apache Spark by edX - This is a very useful course for people starting with big data analytics
- Learning from Data by Caltech - This covers the basic concepts of machine learning
- Neural Networks for Machine Learning by Coursera - Interested in learning about the new kid in town (Deep Learning)? This course, taught by none other than Geoff Hinton himself, is the perfect place for that.
Once I got a fair understanding of DS concepts from these courses, I was itching to use them somewhere and was looking for ways to test these theoretical skills. That is when I came across DS / ML competitions.
DS / ML Competitions
I came to know about Kaggle when I was searching for datasets to apply my learnings. I thought I could ace the competitions easily since I had a good understanding of the basic concepts. Poor me was not aware that hands-on practice is a different ball game from theory.
I started doing competitions on Kaggle but ended up in the bottom half of the leaderboard in spite of all the hard work. So once the competitions were over, I started looking at how others had solved the problems, through the Kaggle forums and blog. This is where most of my learning took place, and still takes place.
It also helped me develop a structured way of approaching DS problems, and it let me work on real-world datasets from different domains, each challenging in its own way. Every time I dug deeper into these problems I learned something new, which helped me improve further.
Jumping straight into Kaggle competitions might be daunting these days since the level of competition is quite high. So one can first work on data science problems on other platforms like Analytics Vidhya Hackathons, CrowdANALYTIX, DrivenData etc. to gain some confidence before trying Kaggle.
Other Sources
Apart from MOOCs and DS competitions, two important sources that helped me with my learning and understanding of this space are:
I follow these two blogs to update my knowledge and to keep myself up to date with advancements in the field. Other resources I found helpful are:
- Data Science Central
- WildML blog
- Analytics India Magazine (To understand the happenings in India)
- MLWave blog
- FastML blog
Hope this helps other budding self-made data scientists!
Update in 2021:
Several new courses, hackathon platforms and blogs have come up since the original answer. I am listing some of them that have been helpful for my learning.
MOOC Courses
Some more good MOOCs for beginners are:
- ML Course AI - an open machine learning course by ods.ai which has a good balance between theory and practice.
- Practical Deep Learning for Coders - fast.ai - a good course on deep learning based on a top-down learning approach.
- Natural Language Processing with Deep Learning - a good course on natural language processing that covers the latest advancements in the field.
- MLOps course - Made with ML - a course that covers the end-to-end machine learning pipeline. If one wants to learn the whole machine learning project workflow, this is one of the best courses out there. It is also continuously updated.
DS / ML Competition platforms
Some of the well-known platforms for data science hackathons are:
- Kaggle
- Driven Data
- Analytics Vidhya DataHack Platform
- Zindi
- CodaLab
- BienData
- MachineHack
- TechGig Hackathons
Blogs
Some additional good blogs are:
- Towards Data Science - a Medium publication sharing concepts, ideas and code about data science
- Blog by Jay Alammar - a good blog that explains machine learning concepts through visualizations
- Machine Learning Mastery - a very good blog by Jason Brownlee which helps readers understand the concepts with Python code
- Applying ML - a blog focusing on applying machine learning in organizations. A good place to learn more about end-to-end machine learning
- Blog by Eugene Yan - a good blog on how to design, develop, build and operate machine learning systems at scale