The data scientist role is fast becoming one of the most sought-after careers in the technology world. Companies like Google, Facebook, Amazon and LinkedIn rely on data scientists to help them keep their innovative edge in the digital data era. Data and technology enthusiasts now aspire to become data scientists the way some musicians aspire to become rock stars, which is perhaps why some people call data scientists the new rock stars of the technology era.
Unfortunately, the role is so new that it's still somewhat obscure, which means many would-be data scientists are driving their tour buses down the wrong road. Do data scientists deserve their rock star reputations? We dive into the world of data science through an interview with Jake Porway, a data scientist in the R&D lab at The New York Times.
Data Scientists: Tech's Rock Stars?
So why are data scientists being referred to as the new rock stars of the technology world? The analogy goes deeper than data nerds' desire to sound ultracool. Just like a rock star's, a data scientist's career offers diversity, artistic freedom and adaptability. And like rock stars in the entertainment world, the best data scientists tend to gain quite a following across the data and technology industry.
What a data scientist does is very diverse. Just as musicians use different instruments, tools and techniques to play styles as disparate as jazz and death metal, each data scientist masters particular tools and fields. There's style involved, too. And there's no single right way to do the job; what matters is the impact the work has on other people.
When the Beatles wrote their songs, there wasn’t just one person dictating how every note on every instrument was to be played. They came together and jammed; through creative discovery they found songs that worked. It's the same for data scientists. They have to feel the rhythm, get into the groove and harmonize a solution. This is only possible with the right amount of artistic freedom to try whatever approaches, tools and techniques might come to mind in the moment – and the agility to make changes when something seems out of key.
Once a data scientist masters the core fundamentals, he or she becomes adaptable and gains the confidence to provide solutions in other fields. We talk more about these core fundamentals later. The point here is that once you master data science, you can take the role into whatever field you want, because data is everywhere.
A data scientist’s ultimate goal is to create massive amounts of value for the largest number of people possible. While a data scientist works behind the scenes, it's not unlike playing to a large audience: The better you do the job, the more people you reach – and the more rewards you see.
Data Scientists Do What?
So what do data scientists do exactly? Let's go through this with an example that we all might be able to relate to.
Let's say you realize one day that you don't have as much energy as you used to. So you set yourself a goal: to have more energy during the day. That's a pretty broad and ambiguous goal, so your first step as a data scientist is to remove some of that ambiguity and make the goal measurable. There are methods for this. We won't go into the details here, but let's say you theorize that you aren't getting enough sleep, so you give yourself the sub-goal of getting eight hours of sleep each night.
Even though this goal is a bit more measurable and less ambiguous, it has its own challenges. You can't really start a timer once you fall asleep, and even if you start a timer after you hop into bed, you may not fall asleep straight away. In addition, it's hard to account for the times you wake up in the middle of the night. Finally, there are different types of sleep, such as deep sleep and light sleep. The bottom line is that it's difficult to measure sleep accurately and therefore even more difficult to measure its impact on your energy levels.
So what can you do? As a data scientist, you'd look to the latest technology and discover sleep monitoring devices. By using such a device to measure and digitally record your sleep, you'd get more accurate data, and you could collect that data over time and plot it as a graph.
This alone can give you greater insight into what's going on. The visual representation will give you awareness, clarity and direction. You will be able to see if you are reaching your goal of eight hours of sleep a night and, more importantly, be able to take action if you are not.
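To make the measurement step concrete, here is a minimal Python sketch of turning raw device records into nightly totals and checking them against the eight-hour goal. It assumes a hypothetical export format of one row per segment of the night, each labeled "light", "deep" or "awake"; the field layout, stage labels, function names and sample values are all invented for illustration, and real sleep monitors export their own formats. Skipping "awake" segments is one simple way to handle the mid-night wake-ups mentioned earlier.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical export from a sleep monitor: one row per segment of the night.
# Columns: night (the date you went to bed), segment start, segment end, stage.
SEGMENTS = [
    ("2024-03-01", "2024-03-01 23:10", "2024-03-02 02:40", "light"),
    ("2024-03-01", "2024-03-02 02:40", "2024-03-02 03:05", "awake"),
    ("2024-03-01", "2024-03-02 03:05", "2024-03-02 07:00", "deep"),
    ("2024-03-02", "2024-03-02 22:30", "2024-03-03 06:45", "light"),
]

GOAL_HOURS = 8.0  # the eight-hour sub-goal from the example
TIME_FORMAT = "%Y-%m-%d %H:%M"


def hours_asleep_per_night(segments):
    """Sum the non-awake segments for each night and return hours asleep."""
    totals = defaultdict(float)
    for night, start, end, stage in segments:
        if stage == "awake":
            continue  # waking up mid-night should not count as sleep
        elapsed = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
        totals[night] += elapsed.total_seconds() / 3600
    return dict(totals)


def nights_below_goal(totals, goal=GOAL_HOURS):
    """Return the nights that fell short of the goal, in date order."""
    return sorted(night for night, hours in totals.items() if hours < goal)


if __name__ == "__main__":
    totals = hours_asleep_per_night(SEGMENTS)
    for night in sorted(totals):
        print(f"{night}: {totals[night]:.2f} h")
    print("Below goal:", nights_below_goal(totals))
```

With the sample data, the first night comes to about 7.42 hours once the 25 minutes awake are excluded, so it is flagged as below goal, while the second night's 8.25 hours meets it. The nightly totals are what you would hand to a plotting library to draw the graph.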
This is the basic job of the data scientist: to find new ways of measuring and displaying data that give the people looking at it more awareness, clarity and direction.
But a good data scientist doesn't stop there. Once the data is collected, it can be integrated with any other activity you measure throughout the day. Integrate it with your productivity, using data from your task management system. Integrate it with your moods, using tweets and status updates. Integrate it with your health, using gym visits or weight loss. With the amount of data already available and the ease with which it can be captured, the possibilities are endless.
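As a rough illustration of what "integrating" two data sources can mean, the Python sketch below lines up nightly sleep with tasks completed the next day by matching on date (an inner join) and then computes a Pearson correlation between the two. The two dictionaries stand in for hypothetical exports from a sleep monitor and a task management system; the names and numbers are made up, and correlation is just one simple choice of comparison.

```python
import math

# Hypothetical daily records from two separate sources, keyed by ISO date.
sleep_hours = {"2024-03-01": 6.0, "2024-03-02": 8.0, "2024-03-03": 7.0, "2024-03-04": 5.0}
tasks_completed = {"2024-03-01": 4, "2024-03-02": 9, "2024-03-03": 6, "2024-03-05": 7}


def join_on_date(left, right):
    """Inner join: keep only the dates that appear in both sources."""
    common = sorted(left.keys() & right.keys())
    return [(day, left[day], right[day]) for day in common]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (spread_x * spread_y)


rows = join_on_date(sleep_hours, tasks_completed)
r = pearson([hours for _, hours, _ in rows], [tasks for _, _, tasks in rows])
print(f"{len(rows)} matched days, r = {r:.2f}")
```

Only three dates appear in both sources, so 2024-03-04 and 2024-03-05 drop out of the join, and the script prints `3 matched days, r = 0.99`. A strong correlation over three days shows the two series moved together in this sample; it does not show that more sleep causes more productivity.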
How to Be a Data Scientist
Interested in a career in data science? Because the field is so new, we asked a top data scientist for his insight. Jake Porway is a data scientist at The New York Times and the founder of DataKind (originally known as Data Without Borders), which matches nonprofits in need of data science with freelance and pro-bono data scientists. Porway has a computer science background and a Ph.D. in statistics from UCLA. Here's what he had to say about how to get into data science, how to perform well, and how to avoid key mistakes in the field.
1. Get the Right Skills
According to Porway, getting into the field boils down to three key things:
- Practical computing skills
- Statistical skills
- A desire to learn
"You need to be able to write scripts to scrape data as well as code up the algorithms you come up with in your head," Porway says. "You should know your basic stats (and more, ideally) if you're going to really be able to assess whether the models you're building or algorithms you're writing are doing what you want."
2. Make Connections
Before joining The New York Times R&D lab, Porway worked in machine learning and computer vision, and spent a lot of time getting robots to identify landmines and fly planes (how cool is that?). It wasn't until he landed his job at The New York Times that he got to expand into broader data science work, such as Project Cascade, which tracks how links to Times stories spread across social media.
The most important thing for getting into the field, Porway says, is to start learning.
"Get on a data science project!" Porway says. "Download some data, pick up some R [a language and environment for statistical computing and graphics], and start playing … I'd say to focus on using something like R alongside a basic stats book to guide you through exploring some data. The machine learning and computing skills will come with that (of course this depends on your past experience – if you're already a statistician, pick up some Python!)"
Then it's time to make some connections. Porway recommends a local meetup group – because being part of the data science community is "the fastest way to know what you don't know." And in a field that's constantly evolving, that matters.
3. Get In the Game
Porway's own Ph.D. is in statistics, but he stresses that you don't need one to do good work.
"It might help, but don't think you have to go off and do another five years of school to be able to call yourself a 'data scientist,'" Porway says.
Because data science is still a relatively new field, anyone who wants to get into it needs to approach it with an open mind: the job can look very different from one company to the next.
"A data scientist at Foursquare is going to look a lot different from a data scientist at Goldman Sachs," Porway says.
4. Rock Your New Role
Data science is all about clarifying goals, examining assumptions, evaluating evidence and assessing conclusions. But there's one little piece of the puzzle many people overlook. Can you guess what it is? According to Porway, the secret ingredient is critical thinking.
"It really sets apart the hackers from the true scientists, for me," Porway says. "You'd be amazed at how many times I've seen someone build a model and report the results without realizing that they hadn't thought critically about where the data was coming from or if their experiment was designed correctly. You must must MUST be able to question every step of your process and every number that you come up with."
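Porway's point is about habits of mind rather than code, but one concrete way to question where your numbers come from is to audit a dataset before you model it. The Python sketch below checks a daily series for missing days and for values outside a plausible range. The function name, the sample sleep log and the 0 to 14 hour range are hypothetical choices for illustration; what counts as "plausible" depends entirely on the data.

```python
from datetime import date, timedelta


def audit_daily_series(series, start, end, low, high):
    """Flag gaps and implausible values in a {date: value} series.

    Returns (missing_days, out_of_range_days), both sorted by date.
    """
    missing = []
    day = start
    while day <= end:
        if day not in series:
            missing.append(day)
        day += timedelta(days=1)
    out_of_range = sorted(d for d, value in series.items() if not low <= value <= high)
    return missing, out_of_range


# Hypothetical nightly sleep log; 0-14 hours is an assumed plausible range.
sleep_log = {date(2024, 3, 1): 7.4, date(2024, 3, 2): 8.25, date(2024, 3, 4): 19.0}
missing, suspicious = audit_daily_series(sleep_log, date(2024, 3, 1), date(2024, 3, 4), 0, 14)
print("Missing:", missing)
print("Out of range:", suspicious)
```

Here the audit reports that March 3 has no record and that the 19-hour reading on March 4 falls outside the assumed range. Either could be a device glitch or a real event, and finding out which is exactly the kind of question to answer before reporting any result.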
The Road to Big Data
Porway says that when he realized vast amounts of data could be used to let machines teach themselves, it blew his mind. That passion, along with his education and skills, helped land him a top job in data science. If you want to rock big data, hunker down with some books, download some data and start playing around. You never know what a pile of raw data will turn up.
For a full transcript of the interview, go to DataScientists.Net.