Big data is the order of the day, but ever since advanced storage media made it possible to compile much larger amounts of information, we’ve been trying to figure out how to use all of that data effectively to produce insights and find the signal in the noise.
These six courses can help aspiring data scientists get familiar with cutting-edge methods and techniques in big data management.
- Introduction to Big Data — UC San Diego
- Big Data Specialization — UC San Diego
- Business Analytics Specialization — University of Pennsylvania
- Big Data Modeling and Management Systems — UC San Diego
- Exploring and Producing Data for Business Decision Making — University of Illinois
- Data Visualization with Tableau Specialization — UC Davis
This course takes students through the big data landscape and presents key terminology. It also traces how the field has progressed, for example how Apache Hadoop and cluster computing made big data more manageable and allowed data governance to become more sophisticated.
- Introduction to three key sources of big data: people, organizations and sensors
- Focus on the “V’s” of big data (volume, velocity, variety, veracity, valence and value) and the importance of each one in big data models
- Big data process models for analysis
- Identification of key big data problems and solutions
- Explanation of big data models and how they scale
- Hands-on work with Apache Hadoop for data science, including components such as YARN, HDFS and MapReduce
Duration: 16 hours (suggested: three weeks of study)
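For readers who have not yet met MapReduce, the sketch below illustrates the general programming model with a word count run in a single Python process. It is not taken from the course materials and it does not use Hadoop's APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this across machines;
    here it is simulated with an in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

The point of the model is that the map and reduce steps work on independent pieces of the input, so a framework such as Hadoop can spread them across many machines and handle the grouping in between.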
In this course, students will learn about decision-making with big data and how big data is manipulated to come up with insights.
The course leads into enterprise analysis for big data and the hands-on use of big data tools such as Hadoop.
Students will learn to collaborate with others on data science projects, take initiative as self-starters on big data decision-making processes, and deal with large and complex data sets. (For more on big data, see Big Data Silos: What They Are and How to Deal With Them.)
- Introduction to Hadoop, MapReduce, Spark, Pig and Hive
- Provided code modules for predictive modeling
- Final project for skills application
Duration: five months (at seven hours per week)
This course aims to build basic big data literacy and analytics insight. Students look at how data scientists work in the real world, with a focus on predictive analytics and industry-specific applications of big data. Course instructors and planners help students cultivate a “big data mindset” and work toward greater proficiency and competence with big data tools.
Get a look at how big data projects work in human resources, finance, and operations, as well as other key areas, with an emphasis on predictive analytics work.
- Transparency into real-world data science projects
- Focus on enterprise decision support
- Industry-specific tools and resources
- Final project for testing skills
Duration: three months (seven hours per week)
This course covers some peripheral issues in big data science as well as core big data tasks, such as collecting, storing and organizing data.
The course also covers various types of data sets and management tools, and resources for each one, while showing how big data management benefits from analytical tools and practices.
Tutorials show how data science work gets done and how to manipulate big data with specific tools. The course also covers the “industry track,” describing the proliferation of big data systems, so that students can speak knowledgeably about how big data work has evolved, who the key tools, vendors and players are, and how their offerings compete.
- Applied techniques for skill building
- Discussion of tools such as AsterixDB, HP Vertica, Impala, Neo4j, Redis, and SparkSQL
- Distinguishing among data management systems
- Hands-on design work in the gaming industry
Duration: 16 hours (six weeks of study at 2-3 hours per week)
Part of this course focuses on statistical models and summaries of big data, as well as sampling and other analysis techniques. It covers methodologies for using big data in various formats and on various platforms, which helps students apply statistical work in different settings.
In developing a unique instruction track, the course guides students toward an understanding of not only the “what” (tools, resources, patterns, methods) but also the “how” (how results are generated, and why that matters). It’s a thoughtful look at big data in general with a distinctly quantitative approach, which can prepare a student for working in a more technical area of big data handling and analysis.
- Formatted use of big data sets
- Insight into sampling and how sampling supports decision-making
- The evaluation of various settings for big data
- Key focus on statistical analysis
Duration: 22 hours (four weeks of study at 4 to 6 hours per week)
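To show how sampling can support a decision, the sketch below estimates the mean of a large data set from a simple random sample and reports a standard error alongside it. It is an illustration only, not material from the course; the function name `estimate_mean`, the fixed seed and the synthetic "order values" are all assumptions made for the example.

```python
import random
import statistics
from typing import Sequence, Tuple


def estimate_mean(population: Sequence[float],
                  sample_size: int,
                  seed: int = 0) -> Tuple[float, float]:
    """Estimate the population mean from a simple random sample.

    Returns (sample mean, estimated standard error of that mean).
    The seed is fixed only so that the example is reproducible.
    """
    rng = random.Random(seed)
    # Draw without replacement so no record is counted twice.
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation over sqrt(n).
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


if __name__ == "__main__":
    # Hypothetical data: 10,000 synthetic order values.
    orders = list(range(1, 10_001))
    mean, std_error = estimate_mean(orders, sample_size=500)
    print(f"estimated mean: {mean:.1f} (standard error {std_error:.1f})")
```

Reading the estimate together with its standard error is what makes a sample usable for decisions: it indicates how far the figure could plausibly sit from the value a full scan of the data would give.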
By focusing on one of the most popular key tools for business analysis, the Tableau platform, this course goes over data visualization and other elements of big data use for beginners. Looking at resources and libraries, students will evaluate real-world use cases to understand how to generate reports and use visual dashboards to work with big data.
The visualization aspect of this course is distinctive, and very much in demand as industry experts reveal the extent to which visual modeling is useful in big data analysis. (For more on data visualization, see The Joy of Data Viz: The Data You Weren’t Looking For.)
- Focus on Tableau platform
- Data visualization component
- Journalistic examples for evaluation
- Final project to test skills
Duration: 4 months (6 hours per week)