According to Wikipedia, "data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics." In other words, data science combines different methods for analyzing large volumes of data and provides tools and artificial intelligence (AI) applications to facilitate and visualize that analysis. Since the early 2010s, data science has been considered one of the most promising and highest-paid career paths in IT.
Data science is closely related to machine learning, which, according to Arthur Samuel, is a field of computer science that gives computers the ability to learn without being explicitly programmed.
As such, the ultimate goal of machine learning is to teach computers and devices to solve complex tasks using algorithms that learn from data and make predictions on it. Rather than following static program instructions, these algorithms build a model from sample inputs to enable data-driven business intelligence and other outputs. Human face and object recognition, self-driving car technologies, IBM Watson, text understanding, voice recognition, sales prediction, and book and movie recommendations based on user behavior and preferences are just some of the real-life applications of machine learning.
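To make the idea of building a model from sample inputs more concrete, here is a minimal sketch in Python for the sales prediction case. It assumes a simple straight-line relationship and uses ordinary least squares; the data, the function names (`fit_line`, `predict`) and the choice of technique are illustrative assumptions, not a method described in this article.

```python
# Minimal sketch: learn a sales model from sample inputs instead of
# hard-coding rules. All data and names are hypothetical.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training samples: advertising spend (thousands) -> units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(ad_spend, units_sold)   # the "learning" step
forecast = predict(model, 6.0)           # prediction for an unseen input
print(forecast)
```

Nothing in the code states how many units a given spend produces; that relationship is estimated from the samples, so feeding in different data yields a different model without changing the program.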
Although machine learning algorithms enable intelligent data analysis and AI applications for data science, the use cases of data science and the formats of data it processes are much broader.
So, what do data scientists and machine learning engineers do?
Data scientist
If we look at most job openings today, we'll see a huge variety of tasks and responsibilities assigned to data scientists / Big Data specialists by different companies. Yet we can distinguish some common requirements for data science specialists.
Standard Responsibilities:
- Highlight, aggregate and synthesize data from various structured and unstructured sources
- Explore, develop and apply intelligent learning methods to real-world data, draw important conclusions and create relevant use cases based on them
- Analyze and present data collected by your organization through different sources
- Design, build and deploy new processes for data modeling and analysis
- Create prototypes, algorithms, and predictive modeling
- Fulfill data analysis requests and report outcomes to respective organizational departments
Domain / Industry Specific Responsibilities:
- Discrete mathematics, statistics and statistical analysis
- Machine learning algorithms
- Data warehousing skills (relational and non-relational databases), SQL and other query languages
- Data analysis and modeling tools:
  - R
  - Python (NumPy/SciPy)
  - MATLAB
  - SPSS/SAS
  - Hadoop and relevant technologies including Pig, Hive, etc.
  - Java
- Data discovery and visualization
- Domain knowledge and subject matter expertise (of critical importance!)
- Strong social and communication skills
Note that a data scientist isn't necessarily expected to program; knowledge of MATLAB, SPSS and SAS suffices in most cases. Therefore, this job is often sought by business analysts and data analysts rather than software developers. However, additional skills such as software programming, Python, Java, Hadoop and data warehousing are valued and add 5% to 14% to the average salary, according to Payscale.com.
As such, a data scientist's job can be interesting for both programmers and applied math / statistics specialists.
Salaries
Here are average data scientist salaries by country (3+ years of experience, gross, per annum):
Ukraine: $18,000 - $30,000
United States: $60,408 - $141,500
United States (Chicago): $55,000 - $125,000
United Kingdom: $40,000 - $60,000
Germany: $70,000 - $91,000
Norway: $64,000 - $80,000
Machine Learning (ML) Engineer
Compared to data science, machine learning is a more technical job that is close to classical software engineering; it has more in common with software development than data science does.
Required skills and responsibilities:
- Strong skills in one or several programming languages (e.g. Java, R and / or Python) and databases (SQL, Hadoop)
- Smaller focus on data analytics and larger focus on machine learning algorithms
- Data modeling (MATLAB, SPSS and SAS)
- Ability to use available libraries for different stacks, such as Mahout and Lucene for Java or NumPy/SciPy for Python
- Ability to build distributed applications using Hadoop and other solutions
Additionally, you may need:
- Skills in Natural Language Processing (NLP), Computational Linguistics and Sentiment Analysis for text processing, understanding and assessment
- Computer Vision for image and video recognition
- Digital Signal Processing for work with sounds, sensor data and other signals
- Skills in building Recommender Systems
Such requirements are rare when it comes to data scientists.
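To give a flavor of one of the specializations listed above, here is a minimal sketch of a recommender that suggests titles based on user behavior. It assumes a user-based approach in which similarity between two users is the cosine similarity of their ratings on the items both have rated; the users, titles, ratings and function names are all made up for illustration, and production recommender systems are considerably more involved.

```python
import math

# Hypothetical ratings (1-5) per user; users and titles are made up.
ratings = {
    "ann": {"Dune": 5, "Alien": 4, "Emma": 1},
    "bob": {"Dune": 4, "Alien": 5, "Solaris": 5},
    "cat": {"Emma": 5, "Persuasion": 4, "Dune": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over items both users rated (0.0 if none shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                # Ratings from more similar users count for more.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


suggestions = recommend("ann", ratings)
print(suggestions)
```

In this toy data set, ann's ratings resemble bob's far more than cat's, so the title that only bob has rated ranks ahead of the one that only cat has rated.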
As you can see, an ML engineer's job requires software engineering skills and knowledge and is thus well suited for experienced developers. Regular programmers often run into machine learning tasks in the course of project development, and that's how they migrate to the machine learning domain.
Pros and Cons
First and foremost, it's very exciting to build apps that go beyond applied programming. This job makes your brain work faster and more efficiently by having you conduct numerous experiments, read scientific journals, and seek non-trivial solutions when trying to reach your goal. And let's point out that the outcome isn't always positive.
"Normally, we, developers, have to write "if - then - else" business logic cases which help software products work much faster than a human brain. That's because a PC's computational power outperforms that of human beings. However, using this method, we can't build a program that would be smarter than its developer. In data science, we build systems that are by far smarter than us. We teach them to become teachable and make autonomous decisions based on data analysis results. So, building systems that are smarter than humans is a magic that attracts me most when it comes to data science," Nick Z., data scientist at Intersog.
Second, machine learning allows companies and startups to build smart products and services that give users more mature tools for solving their tasks, something standard software development methods cannot do. This explains why non-IT businesses are leaning towards data science and machine learning. Companies and their environments generate huge amounts of data, which can provide a significant competitive advantage. As such, machine learning specialists are in high demand today, especially in developed economies, and their current supply doesn't meet global demand.
Here is how the average salaries of software engineers, data scientists and machine learning specialists compare in the United States:
Software engineer: $102K per annum
Machine learning specialist: $112K per annum
Data scientist: $117K per annum
How to start a career in machine learning / data science?
If you're interested in pursuing a career in ML or data science and don't know where to start, here are some tips from Intersog:
- Consider taking an online course in ML and statistical learning on Coursera or Udacity (e.g., by Stanford, UoW, Harvard or Johns Hopkins)
- Learn R using these sources: swirlstats.com, tryr.codeschool.com, Computing for Data Analysis by Johns Hopkins, R Tutorial
- Learn statistics using the following resources: Statistical Learning by Stanford, Statistics by UoT, Statistics One by Princeton
In addition, there are plenty of free and paid books and resources on the above topics.
After you've gained skills in R and understood the building blocks of data analytics and ML, try to improve your skills by participating in Kaggle's data science competitions.
Also, consider attending local events and conferences on data science and machine learning to learn from subject matter experts.