So what's involved in becoming a good data scientist? One way to think about data scientists is that they are the curators and analysts of large data sets. Essentially, there are four fundamental skill areas to master if you want to start building a career as a data scientist: statistics, programming, critical thinking and people skills.
First, you will need a solid grounding in mathematics and statistics. This knowledge will help you understand how different types of machine learning (ML) algorithms work – for example, decision trees, random forests, regression and clustering algorithms. You’ll also need to be familiar with the vocabulary used in training machine learning algorithms. For example, you should be able to explain the difference between supervised and unsupervised learning as well as the difference between overfitting and underfitting a machine learning model. If you have experience with statistical analysis, you already have a head start.
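To make the overfitting and underfitting vocabulary concrete, here is a minimal Python sketch. It rests on assumptions that are not part of this article: the data is synthetic (a sine curve plus random noise), the model is a hand-written k-nearest-neighbors regressor, and the names make_data, knn_predict and mse are hypothetical helpers chosen for illustration. Because every example comes with a known target value, this is a supervised learning setup. With k=1 the model simply memorizes the training set, which is the overfitting extreme; with k equal to the size of the training set it predicts the same average for every input, which is the underfitting extreme.

```python
import math
import random


def make_data(n, rng):
    """Synthetic supervised data: y = sin(x) + Gaussian noise (an assumption)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        y = math.sin(x) + rng.gauss(0, 0.3)
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (fit on train) evaluated on data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(40, rng)
test = make_data(40, rng)

# k=1 memorizes the training set, k=40 always predicts the global average.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 40)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Comparing the error on the training data with the error on held-out test data is a common way to tell the two failure modes apart: an overfit model looks perfect on the data it has seen but not on new data, while an underfit model does poorly on both.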
Next, you’ll need to know something about the coding side of data science and get some experience working in programming languages like Python and R. You’ll also need to know about the back-end technology and applications that support the operational requirements of AI and ML applications. Whereas Apache Hadoop used to be the gold standard for working with big data sets, all sorts of alternatives have since emerged, from containerized workloads orchestrated with Kubernetes to data platforms like Pachyderm.
In addition to the technical requirements above, you’ll need strong critical thinking skills. This is important because it will help you choose the right algorithm to carry out a specific machine learning task. And of course, you should understand the logic behind how ML and artificial intelligence programs work. Without a formal understanding of logic, it will be far more difficult for you to understand the role of dimensionality when building models or the importance of testing and continuously verifying the accuracy of an AI model in order to avoid bias.
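One reason dimensionality matters can be shown with a small experiment. The Python sketch below is an illustration only, with assumptions that are not from this article: the points are drawn uniformly at random from a unit cube, and distance_contrast is a hypothetical helper name. It measures how close the nearest random point is to a query point relative to the farthest one. As the number of dimensions grows, that ratio tends toward 1, meaning "near" and "far" become hard to tell apart, which is one effect often described as the curse of dimensionality.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return the nearest/farthest distance ratio from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


# A ratio near 0 means clear neighbors; a ratio near 1 means all points look alike.
contrast = {dim: distance_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, ratio in contrast.items():
    print(f"dim={dim:4d}  nearest/farthest = {ratio:.2f}")
```

Models that rely on distances or neighborhoods tend to need more data or fewer features as this ratio climbs, which is part of why choosing features carefully is a critical thinking exercise rather than a mechanical one.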
The fourth skill you’ll need to master involves the people-facing side of the job. Good data scientists have what people refer to in medical doctors as a ‘good bedside manner.’ This simply means that if you want to be a good data scientist, you’ll need to know how to break down complex concepts and explain them in simpler terms for other people. When a data scientist has good people skills, they can communicate more effectively with the full-stack developers, DevOps engineers and line-of-business (LOB) folks they work with. In any aspect of data engineering, this skill is important – it may sound simple, but it is not easy to turn the mathematical analysis of large data sets into a narrative that can be understood by a wide audience.
All four of these skills help to create the consummate professional data scientist in a highly complex IT world where the next big thing is always around the corner. Good luck!