Key Data Science Concepts All IT Pros Should Know

Learning the Languages and Skills

Learning Python

People who want to master Python have a number of options. Self-motivated individuals can learn on their own through books, YouTube tutorials, and self-directed practice. Those who want more instruction and direction can sign up for courses at colleges or at specialized coding schools. Both often offer an online option.

Some beginner-level courses are free. More advanced courses, as well as those that offer a certificate that can be added to a resume, typically charge a fee.

A thorough course will provide instruction not just in the language itself, but also in supplementary packages. That means that individuals who complete a Python data science course shouldn’t just learn the basics of Python coding, but also the following:

  • An in-depth understanding of data science processes, data wrangling, data exploration, data visualization, and hypothesis building and testing, including knowledge of how to install the Python environment and its auxiliary tools and libraries
  • Understanding and application of the concepts of Python and associated packages, including NumPy, SciPy, Pandas, Scikit-Learn and the matplotlib library
  • Expertise in machine learning and natural language processing with open-source Jupyter Notebooks
  • Knowledge of how to use web scraping to extract useful data from websites
  • Insight into how to integrate Python with Hadoop, Spark and MapReduce
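
To make "data wrangling" and "data exploration" from the list above a little more concrete, here is a minimal sketch. It uses only Python's standard library so that it runs without any installation; in a real course the same steps would more likely be done with Pandas. The data, the column names, and the helper functions load_clean_rows and mean_by_team are all hypothetical and invented for illustration.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw data: one row has stray whitespace, one has a missing score.
RAW_CSV = """team,score
alpha, 10
alpha,14
beta,
beta,20
beta,22
"""


def load_clean_rows(text):
    """Wrangling step: parse CSV text, strip whitespace, drop rows with no score."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        score = row["score"].strip()
        if not score:  # skip rows where the score field is empty
            continue
        rows.append({"team": row["team"].strip(), "score": float(score)})
    return rows


def mean_by_team(rows):
    """Exploration step: group the scores by team and average each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["team"]].append(row["score"])
    return {team: statistics.mean(scores) for team, scores in groups.items()}


rows = load_clean_rows(RAW_CSV)
summary = mean_by_team(rows)
print(summary)
```

Dropping incomplete rows is only one possible way to handle missing values; depending on the project, filling them in or flagging them may be the better choice.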

While not all of the above are listed explicitly among the top skills in the Kaggle survey, they are generally considered part of the data science toolkit. There is also some overlap in purpose between the skills identified on Kaggle and the ones obtained in Python courses. Tableau, for example, is a data visualization tool that some data scientists use, though thoroughly trained ones will also master other tools and choose among them according to the requirements of the particular project they are working on.

Learning R

A number of data science programs teach only Python because of its immense popularity. There are also some that offer a course of study based only on R. But for the person who aspires to be as well-grounded as possible, mastering both is the way to go. As Drace Zhan, a data scientist at NYC Data Science Academy, observed in 12 Key Tips for Learning Data Science, “Python is ideal but R is a great fall back tool. It’s best to have both in your arsenal.”

For those who are not enrolled in a university or data science course of study, there are additional options recommended in The 5 Most Effective Ways to Learn R. They include taking an online course, reading books, watching instructional videos and reading blogs. It particularly recommends the following:

Learning SQL

Zhan considers SQL to be “extremely important for a Data Analyst.”

There are a number of free or very low-cost online courses available on the subject. Javarevisited recommends five options.

The first is Udemy’s courses, particularly Complete SQL Bootcamp. The second is SQLZOO, which is described as “the most popular website for learning SQL online.” The third is a free SQL course provided by Stanford University. The fourth is Khan Academy’s “Intro to SQL: Querying and managing databases.” The fifth is SQL Bolt, which is presented as a very good bet even for those with no coding background. It offers “20 lessons starting from a basic SQL query to more advanced and confusing Join queries, aggregation, filtering and dealing with nulls.”
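
For readers who have not yet seen the topics those lessons cover, the following is a rough sketch of a join, an aggregation, a filter, and null handling. It is not taken from any of the courses above. It runs SQL through Python's built-in sqlite3 module against an in-memory database, and the customers and orders tables and their contents are hypothetical.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, NULL), (4, 2, 15.0);
""")

# Join + aggregation: the LEFT JOIN keeps customers with no orders,
# SUM skips NULL amounts, and COALESCE turns an all-NULL sum into 0.
totals_query = """
SELECT c.name,
       COUNT(o.id) AS order_count,
       COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.name
"""
totals = conn.execute(totals_query).fetchall()

# Filtering on nulls: NULL must be tested with IS NULL, not with "= NULL".
missing = conn.execute(
    "SELECT id FROM orders WHERE amount IS NULL"
).fetchall()

for row in totals:
    print(row)
print("orders with no amount:", missing)
conn.close()
```

An inner JOIN in place of the LEFT JOIN would silently drop the customer with no orders, which is one reason joins are often described as the confusing part of SQL.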

Rounding Out the Technical Skills

Zhan added that math skills enter into thorough comprehension of popular data science techniques, including “generalized linear models, decision tree, K-means, and statistical tests.”

Most of the remaining top-ranked Kaggle skills are either covered by the applications data scientists learn while mastering Python or R, or are languages in their own right that can be studied formally in school, online, or through the self-directed means discussed for R. The same goes for Excel, though it is not a language but a component of the Microsoft Office suite. While it isn’t regarded as a true data science tool, businesses likely use it because it is familiar to their employees and has some built-in visualization tools. Many people learn Excel in college or simply by working with it on the job and looking up tutorials on specific techniques.

But That’s Not All

There are still other skills entailed in being a data scientist, though. We’ll explore those in the last section.


Written by Ariella Brown

As a technology writer, Ariella Brown has covered 3-D printing, analytics, big data, bitcoin, cloud computing, green technology, marketing and social media. She holds a Ph.D. in English and taught college-level writing before becoming a full-time writer, editor, and social media consultant. Her social media outlet of choice is Google+. Links to her portfolio, blogs, favorite quotes, and photos can be found at writewaypro.weebly.com.
