20775 Performing Data Engineering on Microsoft HD Insight

Overview

This five-day course gives participants the skills to plan and implement big data engineering workflows on HDInsight.

Prerequisites

In addition to their professional experience, participants who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality
  • Working knowledge of relational databases

Who Should Attend?

This course is recommended for data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Course Outline

  • Big Data
  • Hadoop
  • MapReduce
  • HDInsight

Lab: Querying Big Data
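
The MapReduce topic above can be illustrated with the canonical word-count example. This is a plain-Python sketch of the three phases (map, shuffle, reduce) as Hadoop organizes them, not actual Hadoop code; the function and variable names are illustrative only:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data on HDInsight", "big data workflows"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
# counts["big"] == 2 and counts["data"] == 2
```

On a real cluster each phase runs in parallel across many nodes; the point of the sketch is only the data flow between the phases.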

  • HDInsight cluster types
  • Managing HDInsight Clusters
  • Managing HDInsight Clusters with PowerShell

Lab: Managing HDInsight clusters with the Azure Portal

  • Non-domain-joined HDInsight clusters
  • Configuring domain-joined HDInsight clusters
  • Managing domain-joined HDInsight clusters

Lab: Authorizing Users to Access Resources

  • HDInsight Storage
  • Data loading tools
  • Performance and reliability

Lab: Loading Data into HDInsight

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite

Lab: Troubleshooting HDInsight

  • Apache Hive storage
  • Querying with Hive and Pig
  • Operationalize HDInsight

Lab: Implement Batch Solutions

  • What is Spark?
  • ETL with Spark
  • Spark performance

Lab: Design Batch ETL solutions for big data with Spark
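
The batch ETL pattern covered in this module can be sketched in plain Python. The extract/transform/load stages below mirror the structure of a Spark ETL job (read raw data, clean and type it, write to a sink), but this is an illustration with hypothetical field names, not PySpark code:

```python
import csv
import io

def extract(raw_csv):
    # Extract: parse raw CSV text into row dictionaries.
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    # Transform: drop records with missing readings and cast the
    # value to a number -- the kind of cleanup a Spark job does at scale.
    cleaned = []
    for row in rows:
        if row["reading"]:
            cleaned.append({"sensor": row["sensor"],
                            "reading": float(row["reading"])})
    return cleaned

def load(rows, sink):
    # Load: append the cleaned records to the target store.
    sink.extend(rows)
    return sink

raw = "sensor,reading\ns1,20.5\ns2,\ns1,21.0\n"
sink = []
load(transform(extract(raw)), sink)
# sink now holds the two valid readings; the empty s2 record is dropped
```

In Spark the same pipeline would read from cluster storage, apply the transforms as distributed DataFrame operations, and write to a store such as Azure Blob storage.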

  • Implement interactive queries
  • Perform exploratory data analysis

Lab: Analyze data with Spark SQL

  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

Lab: Analyze data with Hive and Phoenix

  • Stream Analytics
  • Process streaming data from Stream Analytics
  • Managing Stream Analytics jobs

Lab: Implement Stream Analytics

  • DStream
  • Create Spark structured streaming applications
  • Persistence and visualization

Lab: Spark streaming applications using DStream API
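
The DStream model treats a stream as a sequence of micro-batches, with windowed operations aggregating over the last few batches. The sketch below simulates that model in plain Python (the windowing mirrors the behavior of Spark's `reduceByKeyAndWindow`, but no Spark API is used and the names are illustrative):

```python
from collections import Counter, deque

def micro_batches(events, batch_size):
    # A DStream is a sequence of micro-batches: slice the event
    # stream into fixed-size batches to mimic that model.
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]

def windowed_counts(events, batch_size, window_batches):
    # Keep a sliding window over the last N micro-batches and
    # report per-key counts after each batch arrives.
    window = deque(maxlen=window_batches)
    results = []
    for batch in micro_batches(events, batch_size):
        window.append(Counter(batch))
        results.append(sum(window, Counter()))
    return results

stream = ["a", "b", "a", "c", "a", "b"]
per_batch = windowed_counts(stream, batch_size=2, window_batches=2)
# After the final batch the window covers ["a", "c"] + ["a", "b"],
# so the last result counts a twice and b and c once each.
```

A real Spark Streaming application would receive the batches from a live source on a time interval rather than slicing a list, but the window semantics are the same.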

  • Persist long-term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Lab: Developing big data real-time processing solutions with Apache Storm
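
A Storm topology is a directed graph of spouts (stream sources) and bolts (stream transformations). The plain-Python sketch below chains generators to mirror that data flow; it is a conceptual illustration, not the Apache Storm API, and the sentences and names are made up:

```python
def sentence_spout():
    # Spout: the source of the stream; emits tuples into the topology.
    for sentence in ["storm processes streams", "storm topologies"]:
        yield sentence

def split_bolt(sentences):
    # Bolt: transforms each incoming tuple; here, split sentences into words.
    for sentence in sentences:
        yield from sentence.split()

def count_bolt(words):
    # Terminal bolt: aggregate the word stream into running counts.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wiring the generators end to end mirrors how a topology connects
# spouts and bolts into a processing graph.
counts = count_bolt(split_bolt(sentence_spout()))
# counts["storm"] == 2
```

In Storm proper, each spout and bolt runs as parallel tasks across the cluster and tuples are routed between them by stream groupings.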

