Explore how to use Apache Spark and high-performance clusters on the Azure Databricks platform to handle large-scale data engineering workloads in the cloud.
Course Outline
Lesson 1: Explore Azure Databricks
- Get started with Azure Databricks
- Identify Azure Databricks workloads
- Understand key concepts
Lesson 2: Use Apache Spark in Azure Databricks
- Get to know Spark
- Create a Spark cluster
- Use Spark in notebooks
- Use Spark to work with data files
- Visualize data
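
As a taste of the hands-on work in Lesson 2, here is a minimal PySpark sketch of loading a data file in a notebook, transforming it, and handing it to the built-in chart renderer. The file path, schema, and column names are hypothetical examples, not part of the course materials.

```python
from pyspark.sql.functions import col, sum as spark_sum

# In a Databricks notebook, `spark` (a SparkSession) and `display` are provided for you.
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/data/orders.csv")  # hypothetical mounted path
)

# Simple transformation: total revenue per product category
revenue_by_category = (
    orders
    .withColumn("revenue", col("quantity") * col("unit_price"))
    .groupBy("category")
    .agg(spark_sum("revenue").alias("total_revenue"))
)

# display() renders an interactive table/chart in Databricks notebooks
display(revenue_by_category)
```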
Lesson 3: Use Delta Lake in Azure Databricks
- Get started with Delta Lake
- Create Delta Lake tables
- Create and query catalog tables
- Use Delta Lake for streaming data
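
The following sketch illustrates the kinds of Delta Lake patterns Lesson 3 introduces: writing a DataFrame in Delta format, registering it as a catalog table, and reading the same table as a streaming source. Paths, table names, and the sample data are hypothetical.

```python
# Small example DataFrame (column names are hypothetical)
orders = spark.createDataFrame(
    [(1, "clothing", 2, 19.99), (2, "electronics", 1, 249.00)],
    ["order_id", "category", "quantity", "unit_price"],
)

# Write the DataFrame as a Delta table in a storage location
orders.write.format("delta").mode("overwrite").save("/mnt/delta/orders")

# Register the Delta files as a catalog table so it can be queried with SQL
spark.sql("CREATE TABLE IF NOT EXISTS orders_delta USING DELTA LOCATION '/mnt/delta/orders'")
spark.sql("SELECT COUNT(*) AS order_count FROM orders_delta").show()

# Use the same Delta table as a streaming source and write the stream elsewhere
stream_df = spark.readStream.format("delta").load("/mnt/delta/orders")
query = (
    stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/delta/checkpoints/orders_copy")
    .start("/mnt/delta/orders_copy")
)
```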
Lesson 4: Use SQL Warehouses in Azure Databricks
- Get started with SQL Warehouses
- Create databases and tables
- Create queries and dashboards
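
Lesson 4 works primarily in SQL. The sketch below shows the sort of database and table DDL, plus a query that could back a dashboard tile. In a SQL Warehouse you would run these statements directly in the SQL editor; they are wrapped in `spark.sql()` here only to keep the examples in one language. Database, table, and column names are hypothetical.

```python
spark.sql("CREATE DATABASE IF NOT EXISTS retail")

spark.sql("""
    CREATE TABLE IF NOT EXISTS retail.sales (
        sale_id INT,
        sale_date DATE,
        category STRING,
        amount DOUBLE
    ) USING DELTA
""")

# A query that could drive a dashboard visualization
monthly_sales = spark.sql("""
    SELECT date_trunc('month', sale_date) AS month,
           category,
           SUM(amount) AS total_sales
    FROM retail.sales
    GROUP BY date_trunc('month', sale_date), category
    ORDER BY month
""")
monthly_sales.show()
```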
Lesson 5: Run Azure Databricks Notebooks with Azure Data Factory
- Understand Azure Databricks notebooks and pipelines
- Create a linked service for Azure Databricks
- Use a Notebook activity in a pipeline
- Use parameters in a notebook
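
To illustrate the parameterization covered in Lesson 5, here is a minimal sketch of how a Databricks notebook can receive values that an Azure Data Factory Notebook activity passes in through its base parameters, and return a value to the pipeline. The widget name and default value are hypothetical.

```python
# Define a widget; parameters supplied by the pipeline arrive through widgets of the same name
dbutils.widgets.text("folder", "default_folder")

# Read the parameter value inside the notebook
folder = dbutils.widgets.get("folder")
print(f"Processing data from folder: {folder}")

# Optionally return a value to the calling pipeline, exposed in the activity's run output
dbutils.notebook.exit(folder)
```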