12 Key Tips for Learning Data Science
Data scientists obviously need strong math and coding skills, but communication and other soft skills are also essential for success.
Data scientist ranks as the best job for 2019 in America on Glassdoor. With a median base salary of $108,000 and a job satisfaction rank of 4.3 out of 5, plus a fair number of openings predicted, that is not surprising. The question is: What does one have to do to get on track to qualify for this job?
To find out, we looked at the advice given to those who want to get on this career track. Much of it comes down to hard skills in coding and math. But strong computational skills alone don’t cut it. Successful data scientists also need to be able to speak to business people on their own terms, which calls for the capabilities associated with soft skills and leadership. (To learn more about the duties of a data scientist, see Job Role: Data Scientist.)
Building the Educational Foundation: Three Primary Tips
Drace Zhan, a data scientist at NYC Data Science Academy, stresses the need for an educational foundation that includes the essentials of coding and math ability:
- R/Python + SQL. If you don’t have the coding skills, you need a lot of networking power and other areas to beef up this deficit. I’ve seen data scientists with weak math and little domain experience but they’ve always been carried by a strong ability to code. Python is ideal but R is a great fall back tool. It’s best to have both in your arsenal. SQL is also extremely important for a Data Analyst.
- Strong math skills. Having a very good understanding of a few of the commonly used methods: generalized linear models, decision trees, K-means, and statistical tests is better than having a broad picture of various models or specialization such as RNN.
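As a rough illustration of how the coding and math skills Zhan lists fit together, the sketch below uses SQL to aggregate a small table and Python to run a basic statistical test (Welch's t-statistic) on it. Everything in it is an assumption made for illustration: the `timings` table, the two variants and the sample values are hypothetical, and only the Python standard library is used.

```python
import math
import sqlite3
from statistics import mean, variance

# Hypothetical sample data: page load times in seconds for two site variants.
ROWS = [
    ("A", 1.0), ("A", 2.0), ("A", 3.0), ("A", 4.0),
    ("B", 3.0), ("B", 4.0), ("B", 5.0), ("B", 6.0),
]


def load_table(rows):
    """Create an in-memory SQLite table and fill it with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timings (variant TEXT, seconds REAL)")
    conn.executemany("INSERT INTO timings VALUES (?, ?)", rows)
    return conn


def group_means(conn):
    """Use SQL to compute the average load time per variant."""
    cur = conn.execute(
        "SELECT variant, AVG(seconds) FROM timings "
        "GROUP BY variant ORDER BY variant"
    )
    return dict(cur.fetchall())


def fetch_sample(conn, variant):
    """Pull the raw observations for one variant back into Python."""
    cur = conn.execute(
        "SELECT seconds FROM timings WHERE variant = ?", (variant,)
    )
    return [row[0] for row in cur]


def welch_t(sample_a, sample_b):
    """Welch's t-statistic: difference in means over its standard error."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


conn = load_table(ROWS)
means = group_means(conn)
t_stat = welch_t(fetch_sample(conn, "A"), fetch_sample(conn, "B"))
print(means, round(t_stat, 3))
```

The split shown here is a common one: SQL does the filtering and aggregation, while Python handles the statistics. In practice a library such as SciPy would usually replace the hand-written test.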
Those are central skills to build on, though some experts add to them. For example, a KDnuggets list includes the coding components Zhan mentioned and adds some other useful things to know on the technical side, including the Hadoop platform, Apache Spark, data visualization, unstructured data, machine learning and AI.
But if we take our cues from a Kaggle survey of the tools most commonly used in real-world work, we get somewhat different results. As you can see from the graph of the top 15 choices below, Python, R and SQL easily make the top three, but the fourth is Jupyter notebooks, followed by TensorFlow, Amazon Web Services, Unix shell, Tableau, C/C++, NoSQL, MATLAB/Octave and Java, all ahead of Hadoop and Spark. One more addition that may surprise people is Microsoft’s Excel Data Mining.
The KDnuggets list also includes a tip regarding formal education. Most data scientists possess advanced degrees: 46 percent have PhDs, and 88 percent hold at least a master’s-level degree. The undergraduate degrees they possess are generally split among related areas. About a third are in math and statistics, which is the most popular for this career track. The next most popular is a computer science degree, held by 19 percent, and engineering, the choice of 16 percent. Of course, the technical tools particular to data science are often not studied in the degree programs but at specialized boot camps or through online courses.
More than Courses: Two More Tips
Hank Yun, a research assistant in the Pulmonary Department at Weill Cornell Medicine and student at NYC Data Science Academy, advises aspiring data scientists to plan around what they will work on and to find a mentor. He said:
Don’t make the mistake I made by telling yourself that you know data science because you took a course and received a certificate. That’s a great start, but when you start studying, go with a project in mind. Then find a mentor in the field and start a passion project right away! When you’re fresh, you don’t know what you don’t know so it helps when someone is there to guide you to what’s important to you and what’s not. You don’t want to be spending a lot of time studying with nothing to show for it!
Knowing Which Tool to Take Out of Your Toolbox: Tip to Stay Ahead of the Curve
Given the disparity in the ranking of data science tools, some may feel bewildered about what to focus on. Celeste Fralick, chief data scientist at security software company McAfee, addresses the issue in a CIO article that looks at the essential skills for a data scientist, declaring, “A data scientist needs to stay in front of the curve in research, as well as understand what technology to apply when.” That means not being lured by the “‘sexy’ and new, when the actual problem” requires something much more run-of-the-mill. “Being aware of the computational cost to the ecosystem, interpretability, latency, bandwidth, and other system boundary conditions – as well as the maturity of the customer – itself helps the data scientist understand what technology to apply.”
Essential Soft Skills: Another Six Tips
The point Fralick brings up relates to the nontechnical skills that the data scientist job requires. That’s why the KDnuggets list includes these four: intellectual curiosity, teamwork, communication skills and business acumen. Zhan also included key soft skills in his tips for data scientists, identifying “communication skills” like KDnuggets, but using “domain expertise” in place of “business acumen.” Whatever it is called, it refers to practical application of data science to the business. (To learn more about communication skills, see The Importance of Communication Skills for Technical Professionals.)
Olivia Parr-Rud offered her own spin on this, adding in two more soft skills, with an emphasis on the role of creativity, asserting, “I think of data science as an art as much as a science,” something that requires drawing on the strengths of both sides of the brain. “Many people talk about data science as a career that primarily uses the left-brain. I have found that to be successful, data scientists must use their whole brain.”
She explained that advancing in the field requires not just technical competency but creativity and the vision needed for leadership:
Most left-brain/linear tasks can be automated or out-sourced. To offer a competitive edge as data scientists, we must be able to recognize patterns and synthesize large quantities of information using both sides of our brain. And we must be innovative thinkers. Many of the best outcomes result from the integration of the left and right brain.
She also stressed why communicating a vision clearly is essential:
As data scientists, our goal is to use data to help our clients grow their profits. Most executives don’t understand what we do or how we do it. So we need to think like leaders and communicate our findings and recommendations in language that our stakeholders understand and trust.
The Data Dozen
The key tips cover a wide range of technical tools, skills and capabilities, as well as less quantifiable qualities like an aptitude for creativity and leadership. Ultimately, it’s not just a numbers game. Data science is not about creating models in a vacuum but about coming up with practical applications that solve real-life problems for businesses. Those who will succeed in the field need not just to master the technology but also to know their business domain and understand the needs of the various team members they work with.