DP-203 Data Engineering on Microsoft Azure

In this course, the student will learn about data engineering as it pertains to working with batch and real-time analytical solutions using Azure data platform technologies.

Overview

In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.

Prerequisites

Successful students start this course with knowledge of cloud computing and core data concepts, as well as professional experience with data solutions.

Specifically completing:

Course Days

4 Days

Course Outline

  • What is data engineering
  • Important data engineering concepts
  • Data engineering in Microsoft Azure
  • Understand Azure Data Lake Storage Gen2
  • Enable Azure Data Lake Storage Gen2 in Azure Storage
  • Compare Azure Data Lake Store to Azure Blob storage
  • Understand the stages for processing big data
  • Use Azure Data Lake Storage Gen2 in data analytics workloads
  • What is Azure Synapse Analytics
  • How Azure Synapse Analytics works
  • When to use Azure Synapse Analytics
  • Exercise – Explore Azure Synapse Analytics
  • Understand Azure Synapse serverless SQL pool capabilities and use cases
  • Query files using a serverless SQL pool
  • Create external database objects
  • Transform data files with the CREATE EXTERNAL TABLE AS SELECT statement
  • Encapsulate data transformations in a stored procedure
  • Include a data transformation stored procedure in a pipeline
  • Exercise – Transform files using a serverless SQL pool
  • Understand lake database concepts
  • Explore database templates
  • Create a lake database
  • Use a lake database
  • Exercise – Analyze data in a lake database
  • Get to know Apache Spark
  • Use Spark in Azure Synapse Analytics
  • Analyze data with Spark
  • Visualize data with Spark
  • Exercise – Analyze data with Spark
  • Modify and save dataframes
  • Partition data files
  • Transform data with SQL
  • Exercise – Transform data with Spark in Azure Synapse Analytics
  • Understand Delta Lake
  • Create Delta Lake tables
  • Create catalog tables
  • Use Delta Lake with streaming data
  • Use Delta Lake in a SQL pool
  • Exercise – Use Delta Lake in Azure Synapse Analytics
  • Design a data warehouse schema
  • Create data warehouse tables
  • Load data warehouse tables
  • Query a data warehouse
  • Exercise – Explore a data warehouse
  • Load staging tables
  • Load dimension tables
  • Load time dimension tables
  • Load slowly changing dimensions
  • Load fact tables
  • Perform post-load optimization
  • Understand pipelines in Azure Synapse Analytics
  • Create a pipeline in Azure Synapse Studio
  • Define data flows
  • Run a pipeline
  • Exercise – Build a data pipeline in Azure Synapse Analytics
  • Understand Synapse Notebooks and Pipelines
  • Use a Synapse notebook activity in a pipeline
  • Use parameters in a notebook
  • Exercise – Use an Apache Spark notebook in a pipeline
  • Understand hybrid transactional and analytical processing patterns
  • Describe Azure Synapse Link
  • Enable Cosmos DB account to use Azure Synapse Link
  • Create an analytical store enabled container
  • Create a linked service for Cosmos DB
  • Query Cosmos DB data with Spark
  • Query Cosmos DB with Synapse SQL
  • Exercise – Implement Azure Synapse Link for Cosmos DB
  • What is Azure Synapse Link for SQL?
  • Configure Azure Synapse Link for Azure SQL Database
  • Configure Azure Synapse Link for SQL Server 2022
  • Exercise – Implement Azure Synapse Link for SQL
  • Understand data streams
  • Understand event processing
  • Understand window functions
  • Exercise – Get started with Azure Stream Analytics
  • Stream ingestion scenarios
  • Configure inputs and outputs
  • Define a query to select, filter, and aggregate data
  • Run a job to ingest data
  • Exercise – Ingest streaming data into Azure Synapse Analytics
  • Use a Power BI output in Azure Stream Analytics
  • Create a query for real-time visualization
  • Create real-time data visualizations in Power BI
  • Exercise – Create a real-time data visualization
  • What is Microsoft Purview?
  • How Microsoft Purview works
  • When to use Microsoft Purview
  • Catalog Azure Synapse Analytics data assets in Microsoft Purview
  • Connect Microsoft Purview to an Azure Synapse Analytics workspace
  • Search a Purview catalog in Synapse Studio
  • Track data lineage in pipelines
  • Exercise – Integrate Azure Synapse Analytics and Microsoft Purview
  • Get started with Azure Databricks
  • Identify Azure Databricks workloads
  • Understand key concepts
  • Exercise – Explore Azure Databricks
  • Get to know Spark
  • Create a Spark cluster
  • Use Spark in notebooks
  • Use Spark to work with data files
  • Visualize data
  • Understand Azure Databricks notebooks and pipelines
  • Create a linked service for Azure Databricks
  • Use a Notebook activity in a pipeline
  • Use parameters in a notebook
  • Exercise – Run an Azure Databricks Notebook with Azure Data Factory
