So, what’s all the buzz about Data Science and Machine Learning? It seems like every other developer is fascinated by these two terms. If you ask an undergrad who just finished a 4-year college degree, the first thing that comes out of their mouth is “Data”.
So, we thought we would break it down for you and also explain how to start your Data Science and Machine Learning career.
Let’s first understand the terminology:
What is Data Science?
“Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.”
What is Machine Learning?
“Machine Learning is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, to make predictions or decisions without being explicitly programmed to do so.”
Example of Data Science and Machine Learning
Let’s break it down with an example from the credit card industry.
Major bank ABC.com has collected a large volume of customer transactions over the years. The data is messy and not easy to understand. In comes a data analyst, Mark, who looks at this huge semi-structured data set. Mark cleans the data, applies some models, and figures out that 5% of the transactions are fraudulent. These fraudulent transactions are causing not only reputational damage for ABC.com but also a monetary loss of over USD 50 million annually. The ABC.com risk management team hires Samantha as a Data Scientist to develop a machine learning model based on Mark’s findings. Samantha develops a model that takes in Mark’s data and makes predictions to prevent fraud before it even happens. What’s more, Samantha builds the logic so that the model retrains itself on new transactions as they happen in real time.
In the above example, Mark’s work is that of a Data Analyst, who preps the data and develops actionable insights from it – this is the data science part of it. Samantha, on the other hand, works as a Data Scientist who develops machine learning models to prevent fraud from happening.
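To make the "model that keeps learning from new transactions" idea a bit more concrete, here is a minimal sketch in Python. It is not how a real bank would do it: the class name, the threshold, the warm-up length, and the transaction amounts are all made up for illustration, and a simple running average stands in for a real machine learning model. It only shows the pattern of scoring each transaction as it arrives and then updating what the model has learned.

```python
import math


class RunningFraudDetector:
    """Toy detector (hypothetical): flags amounts far above the running average.

    Uses Welford's online algorithm to update the mean and variance one
    transaction at a time, so no transaction history has to be stored.
    """

    def __init__(self, z_threshold=3.0, warmup=5):
        self.z_threshold = z_threshold  # assumed cut-off, not a tuned value
        self.warmup = warmup            # transactions to observe before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations

    def is_suspicious(self, amount):
        """Return True if the amount is far above what has been seen so far."""
        if self.count < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0:
            return amount > self.mean
        return (amount - self.mean) / std > self.z_threshold

    def learn(self, amount):
        """Update the running statistics with one new transaction."""
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def process(self, amount):
        """Score first, then learn, the way a real-time feed would."""
        flagged = self.is_suspicious(amount)
        if not flagged:
            # Only learn from transactions that look legitimate.
            self.learn(amount)
        return flagged


detector = RunningFraudDetector()
stream = [42.0, 38.5, 51.0, 45.2, 40.1, 47.3, 5000.0, 44.0]  # made-up amounts
flags = [detector.process(amount) for amount in stream]
print(flags)
```

In this made-up stream only the 5000.0 transaction is flagged. A production system would use many more features than the amount alone, but the score-then-update loop is the part that mirrors Samantha’s self-training model.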
Now, this brings us to three roles that are often thrown around in the data realm and job postings.
Difference between Data Analyst, Data Scientist, and Data Engineer
Does this sound familiar?
| |Data Analyst|Data Scientist|Data Engineer|
|---|---|---|---|
|Average Entry Level Salary (USD)*| | | |
|Role Example|Look at the company or industry data and use it to answer business questions. Also, communicate the answers to other teams in the company to be acted upon.|Everything in the data analyst role, plus building machine learning models that use past data to make accurate near-real-time and future predictions.|Manages a company’s data infrastructure.|
|Technical Skillset Needed|Python or R programming language, including the use of popular packages; SQL queries; data cleaning; data visualization; probability and statistics|Supervised and unsupervised machine learning methods; working knowledge of statistical models; advanced Python or R, and potentially familiarity with other tools like Apache Spark|Building data pipelines; advanced SQL skills; working knowledge of relational and non-relational databases|

*As per PayScale
Now that you understand the roles and the buzzwords, let’s clear up some misconceptions.
Here is what we hear more often than not –
Knowledge of programming, machine learning, statistics, etc. is enough to get a data science job – FALSE
Individuals often think that knowing R, Python, and statistics, plus having theoretical knowledge of some commonly used machine learning and deep learning algorithms, is enough to start a data science career at a top technology company. While this is not impossible, candidates with that profile are far from few, so it takes more to stand out.
Being a data scientist is not an easy job. Employers are looking for individuals who are self-starters and who have at least taken a step in the right direction. In this competitive landscape, you need to develop a data science portfolio to showcase your analysis skills and technical skills. So, the next obvious question is HOW?
How to start your data science career?
Here are the top 10 ways to start your data science and machine learning career:
- Learning a programming language – Python and R are the most common languages used by all three roles described above. The level of expertise you need depends on the job role you want to apply for. For example, being an expert in R is more suited to a data scientist than a data engineer.
- Database skills – Knowledge of relational and NoSQL databases, along with real comfort writing SQL queries, is very important. After all, you will be dealing with data daily.
- Learn the components of the data science life cycle – Without going into many details, individuals should learn the different components of the data science life cycle:
- Business requirements
- Data discovery
- Data processing
- Data exploration
- Predictive modeling
- Testing model
- Communicating results
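The life cycle components above can be sketched end to end in a few lines of Python. Treat this as a toy walk-through under stated assumptions rather than a real project: the business requirement ("flag likely fraud") is borrowed from the bank example, the twelve transactions are made up, the table and column names are hypothetical, and the "model" is just a single threshold on the amount. It uses only Python’s standard library, including `sqlite3`, so the SQL part of the skill set shows up too.

```python
import sqlite3
from statistics import mean

# Business requirement (assumed): flag transactions that are likely fraud.
# Made-up toy records: (amount in USD, 1 if fraud else 0).
# None marks a missing amount that the processing step has to handle.
RAW_ROWS = [
    (25.0, 0), (40.0, 0), (950.0, 1), (31.5, 0), (None, 0),
    (1200.0, 1), (18.0, 0), (875.0, 1), (60.0, 0), (1010.0, 1),
    (45.0, 0), (990.0, 1),
]

# Data discovery and processing: load into SQLite, drop incomplete rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (amount REAL, is_fraud INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", RAW_ROWS)
clean = conn.execute(
    "SELECT amount, is_fraud FROM transactions "
    "WHERE amount IS NOT NULL ORDER BY rowid"
).fetchall()

# Data exploration: average amount per class, computed in SQL.
summary = dict(conn.execute(
    "SELECT is_fraud, AVG(amount) FROM transactions "
    "WHERE amount IS NOT NULL GROUP BY is_fraud"
).fetchall())
conn.close()

# Predictive modeling: hold out the last four rows, then learn a threshold
# halfway between the average legitimate and average fraudulent amount.
train, test = clean[:7], clean[7:]
legit_mean = mean(amount for amount, label in train if label == 0)
fraud_mean = mean(amount for amount, label in train if label == 1)
threshold = (legit_mean + fraud_mean) / 2


def predict(amount):
    """Predict 1 (fraud) when the amount is above the learned threshold."""
    return 1 if amount > threshold else 0


# Testing the model: accuracy on the held-out transactions.
correct = sum(predict(amount) == label for amount, label in test)
accuracy = correct / len(test)

# Communicating results: a short, readable summary for the business.
print(f"Average amount by class: {summary}")
print(f"Learned threshold: {threshold:.2f} USD")
print(f"Held-out accuracy: {accuracy:.0%} on {len(test)} transactions")
```

The toy data is deliberately easy to separate, so the accuracy figure means nothing on its own. The point is the order of the steps: requirement, discovery, processing, exploration, modeling, testing, and finally a result someone else can read.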
- Practical Applications – Now that you have the technical skills, you need to put them into action. Below are some of the ways you can build your data science portfolio:
- Kaggle competitions are an excellent way to practice your skills without coming up with the problem yourself. Kaggle has a huge inventory of open-sourced data sets that you can use as you see fit. Their data science competitions are involved and will help you sharpen your skills.
- If you aren’t on GitHub, create a profile today. Make the projects that you create open source, and ask folks to contribute and/or try out your applications. You can connect with a lot of like-minded people on GitHub. The best part is you can also contribute to many open-source projects if you so desire.
- Use open-sourced data sets to your advantage – Add projects that will demonstrate your skills in data cleaning. Data collection, cleaning, prep, and transformation are an important part of a data science job. There are multiple websites where you can get free datasets for analysis to enhance your GitHub portfolio. Some dataset examples to build your data science portfolio:
- Use open-sourced data platforms to your advantage – Below is a list of other important open data portals and platforms that permit users to access open data quite easily, study the impact and glean valuable insights.
- Soft skills – We wrote an entire blog on why soft skills are necessary for developers to succeed in an IT field. Check it out here (Why Soft Skills are essential to succeed).
- Become a good storyteller – Once you have derived some useful information from your data science projects, it is crucial to tell the right story, one that convinces the end user to act on it. An interesting analysis needs a compelling narrative. Furthermore, beautiful visualizations help deliver the message of the analysis performed.
- Start with an online course – If all of this sounds too complicated, following the learning path of a good online course is the way to go. Check out the courses below to get started:
- Video Course – Learn Python Programming from A-Z
- Video Course – Learn Data Science and Machine Learning with R from A-Z
- Learn Cloud – The advent of cloud (AWS, Azure, Google Cloud) has made it easy for anyone to pick up machine learning. Out-of-the-box solutions from AWS, GCP, and Azure Machine Learning flatten the learning curve drastically. A few courses that can help you get started:
While it can take years to create a strong portfolio and become a data science expert, it is important to be agile and make changes to your learning journey as you progress. Be persistent in your learning process; the best way to do that is by creating a habit. Good luck!
Closing this blog with a quote from Steve Jobs – “Stay Hungry, Stay Foolish!”
Author: Haman Sharma is a technology enthusiast. You can connect with him on LinkedIn.