This five-day instructor-led course goes beyond basic Big Data concepts to give participants a head start with Hadoop. It also covers data analysis using the Hadoop ecosystem and is intended for data analysts, business intelligence specialists, developers, and system architects.
ASSOCIATED CERTIFICATION(S)
Upon completion of the course, participants can take the exam for Cloudera Certified Associate (CCA) Data Analyst, MapR Certified Data Analyst (MCDA), or Hortonworks HDP Certified Developer (HDPCD): Pig and Hive. These certifications help differentiate you in the field, providing employers and customers with tangible evidence of your skills and expertise.
Course Outline
Lesson 1: Basics of Big Data and Understanding Hadoop
- Why We Need Hadoop
- Why Hadoop Is in Demand in the Market Today
- Where Expensive SQL-Based Tools Are Failing
- Key Points: Why Hadoop Is a Leading Tool in the Current IT Industry
- Definition of Big Data
- Hadoop Nodes
- Introduction to Hadoop Release 1
- Hadoop Daemons in Hadoop Release 1
- Introduction to Hadoop Release 2
- Hadoop Daemons in Hadoop Release 2
- Hadoop Cluster and Racks
- Hadoop Cluster Demo
- New Projects on Hadoop
- How Open Source Tools Run Jobs in Less Time
- Hadoop Storage – HDFS (Hadoop Distributed File System)
- Hadoop Processing Framework (MapReduce/YARN)
- Alternatives to MapReduce
- Why NoSQL Is in High Demand Instead of SQL
- Distributed Warehouse for HDFS
- Hadoop Ecosystem and Its Uses
- Data Import/Export Tools
Lesson 2: Hadoop Distributed File System (HDFS) and Ingestion Tools
- Hadoop Installation
- Introduction to Hadoop FS and Processing Environment UIs
- How to Read and Write Files
- Basic Unix Commands for Hadoop
- Hadoop FS Shell
- Practical: Hadoop Releases
- Practical: Hadoop Daemons
Lesson 3: Pig Programming
- Pig UDFs
- Pig Use Cases
- Pig Assignment
- Complex Use Cases with Pig
- Real-Time Scenarios with Pig
- When to Use Pig
- When Not to Use Pig
Lesson 4: Hive Programming
- Introduction to Hive
- Meta Storage and Meta Store
- Introduction to Derby Database
- Hive Data Types
- HQL (Hive Query Language)
- DDL, DML, and Sub-languages of Hive
- Internal, External, and Temporary Tables in Hive
- Differentiation Between SQL-based Data Warehouses and Hive
Lesson 5: Advanced Hive Programming
- Hive Releases
- Why Hive Is Not the Best Solution for OLTP
- OLAP in Hive
- Partitioning
- Bucketing
- Hive Architecture
- Thrift Server
- Hue Interface for Hive
- How to Analyze Data Using Hive Scripts
- Differentiation Between Hive and Impala
- UDFs in Hive
- Complex Use Cases in Hive
- Hive Advanced Assignment
Lesson 6: Hadoop 2 and YARN
- How to Load Streaming Data Without a Fixed Schema
- How to Load Unstructured and Semi-Structured Data into Hadoop
- Introduction to Flume
- Hands-On with Flume
- How to Load Twitter Data into HDFS Using Hadoop
- Introduction to Oozie
- How to Schedule Jobs Using Oozie
- What Kinds of Jobs Can Be Scheduled Using Oozie
- How to Schedule Time-Based Jobs
- Hadoop Releases
- Where to Get Hadoop and Other Components to Install
- Introduction to YARN
- Significance of YARN
Lesson 7: HCatalog
- Introduction to NoSQL
- Why NoSQL When SQL Has Been on the Market for Years
- NoSQL-Based Databases on the Market
- CAP Theorem
- ACID vs. CAP
- OLTP Solutions with Different Capabilities
- Which NoSQL-Based Solution Can Handle Specific Requirements
- Examples of Companies That Use NoSQL-Based Databases
- HBase Architecture of Column Families
Lesson 8: Introduction to Spark Core
- Introduction to Spark
- Basic Features of Spark and Scala Available in Hue
- Why Demand for Spark Is Increasing in the Market
- How to Use Spark with the Hadoop Ecosystem
- Datasets for Practice
Lesson 9: Emerging Technologies in Big Data and Ecosystem
- YARN
- Emerging Technologies of Big Data
- Emerging Use Cases (e.g., IoT, Industrial Internet, New Applications)
- Certifications and Job Opportunities