Mastering Azure Databricks Course

1) Introduction to Big Data

  • What is Data?
  • What is a Database?
  • What is Big Data?
  • What are the challenges of Big Data?
  • Why Traditional Databases Don’t Handle Big Data

2) Introduction to Hadoop

  • What is Hadoop?
  • How Hadoop overcomes Big Data challenges
  • Hadoop Architecture
  • Hadoop Daemons
  • HDFS
  • YARN
  • MapReduce

3) Introduction to Spark

  • Spark Architecture
  • Spark internals
  • Spark RDD
  • Spark DataFrame
  • Spark Streaming

4) Introduction to Databricks

  • What is Databricks?
  • Databricks Architecture
  • Working in the Databricks workspace
  • Working with Databricks notebooks

5) Working with Databricks File System – DBFS

  • What is DBFS?
  • DBFS commands – mkdirs, cp, mv, head, put, rm, rmdir
  • How to handle multiple files in DBFS
  • How to process the files in DBFS
  • How to archive the files in DBFS

6) Databricks – Spark Core

  • RDD Programming
  • Operations on RDD
  • Transformations – Narrow
  • Transformations – Wide
  • Actions
  • Loading Data and Saving Data
  • Key Value Pair RDD
  • Broadcast variables

7) Databricks – Spark SQL – DataFrames

  • Creating DataFrames
  • DataFrame internal execution
  • Transformations using DataFrame API
  • Actions using DataFrame API
  • User-defined functions in Spark SQL

8) Databricks – Handling multiple file formats

  • CSV Data
  • JSON Data
  • Parquet files
  • Excel files
  • ORC file format

9) Databricks utilities

  • Credentials utility
  • File system utility
  • Notebook utility
  • Secrets utility
  • Widgets utility

10) Databricks Cluster Management

  • Creating and configuring clusters
  • Managing Clusters
  • Displaying clusters
  • Starting a cluster
  • Terminating a cluster
  • Deleting a cluster
  • Cluster Information
  • Cluster logs
  • Types of Clusters
      • All-purpose clusters
      • Job clusters
  • Cluster Modes
      • Standard
      • High Concurrency
  • Autoscaling
  • Databricks runtime versions

11) Databricks – Batch Processing

  • Historical Data load
  • Incremental Data load
  • Date Transformations
  • Aggregations
  • Join Operations
  • Window Functions
  • Union Operations

12) Introduction to Azure

  • Azure Portal Walkthrough
  • What is Subscription?
  • What is a Resource Group?
  • What is a Resource?
  • Overview of Azure Resources / Services
  • Azure Databricks
  • Blob Storage, Data Lake Storage Gen2
  • Azure SQL Server, SQL Database
  • Key Vault

13) Databricks Integration with Azure Services

  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Azure SQL Database
  • Azure Synapse
  • Azure Key Vault

14) Databricks – Streaming API

  • What is streaming?
  • Processing streaming data using the PySpark API
  • Handling bad records
  • Streaming data into Data Lake Storage Gen2
  • Loading the data into tables

15) Databricks – Lakehouse (Delta Lake)

  • Difference between a data lake and Delta Lake
  • Introduction to Delta Lake
  • Features of Delta Lake
  • How to create a Delta table
  • How to perform DML operations on a Delta table
  • Merge statements
  • Handling SCD Type 1 and Type 2
  • Handling data deduplication in Delta tables
  • Handling streaming data in Delta Lake

16) Databricks – Unity Catalog

  • What is Unity Catalog?
  • Creating an access connector for Databricks
  • Creating a metastore in Unity Catalog
  • Unity Catalog object model
  • Roles in Unity Catalog
  • User and group management
  • Unity Catalog privileges
  • Managing external tables in Unity Catalog

17) Workflows in Databricks

  • Introduction to workflows
  • Create, run and manage Databricks jobs
  • Schedule Databricks jobs
  • Monitor Databricks jobs

18) Azure DevOps – Repos

  • What are Azure DevOps Repos?
  • Integrating Databricks notebooks with Repos
  • Committing and syncing notebooks to and from Repos

19) SDLC and Agile methodology

20) End-to-End Data Migration Project from On-Premises to Cloud

21) Interview Questions

22) Mock Interviews