The Technical Skills Needed by a Data Scientist and How to Acquire Them
What are the technical skills of a data scientist? The answers vary. To take our cues from real-world practice rather than just from curricula, we’ll look at the 15 skills that a Kaggle survey identified as the most used in the field.
As you can see from the graph of the top 15 choices below, Python is by far the top skill, claimed by more than 76% of respondents. Second place goes to R at just under 60%. SQL is somewhat behind that, at just under 54%. Fourth place also makes a significant showing: Jupyter Notebooks, with just over 40%.
The shares then drop below 29% with TensorFlow, followed by Amazon Web Services, which is just ahead of Unix shell; both top 23%. The next three categories all hover under the 20% mark: Tableau, C/C++ and NoSQL. MATLAB/Octave and Java come next, close together at over 18% each. Likewise, Hadoop/Hive/Pig is barely ahead of Spark, with both just over 17%. One more skill just makes the cut: Microsoft’s Excel Data Mining, a tool used by a bit under 14%.
Python Is #1
Python’s pride of place in data science is established not just by Kaggle but by other surveys, and it is reflected in the media attention the language has been getting. Last year, for example, The Economist ran the headline “Python is becoming the world’s most popular coding language.” Even though C++ has had something of a resurgence, as reported by TechRepublic, Python remains the leading language for data science. In fact, a recent Dice article reported, “Python is on the precipice of becoming the programming language to know if you want a well-paid engineering job on Wall Street.”
Clearly, then, Python is very important. As we’ll see in the next section on what a Python course includes, learning it also covers some of the other skills that made the list.