(Azure Data Factory + Azure Databricks + Azure Synapse Analytics)
New Batch Details
Course | Date | Timings | Duration | Trainer | Training Options |
---|---|---|---|---|---|
Azure Data Engineering Course | 17th-Feb-2025 | 8:00 PM - 9:00 PM IST | 3 Months | Mr. Srinivas (13+ Yrs Exp) | Online/Offline |
Azure Databricks with PySpark | - | 10:00 AM - 12:00 PM IST | 1 Month | Mr. Srinivas (13+ Yrs Exp) | Online Training |
Overview of Cloud
1) Basics of Cloud computing
- What is Cloud?
- Types of Cloud deployment models
- Private Cloud
- Public Cloud
- Hybrid Cloud
- Types of Cloud Services
- IaaS – Infrastructure as a Service
- PaaS – Platform as a Service
- SaaS – Software as a Service
2) Cloud computing Platforms / Vendors
- Azure
- AWS – Amazon Web Services
- GCP – Google Cloud Platform, etc.
3) Introduction to Azure
4) Azure Portal Walkthrough
- What is Subscription?
- What is a Resource Group?
- What is a Resource?
5) Overview of Azure Resources / Services
- Data Factory
- Azure Databricks
- BLOB Storage, Data Lake Storage Gen1 and Gen2
- Azure SQL Server, SQL Database
- Key Vault
- Function App
- Logic Apps
6) Introduction to BigData
- What is Data?
- What is BigData?
- Data Sources of BigData
- Characteristics of BigData
- Variety, Velocity, Volume, Veracity, Value
- Types of Data
- Structured Data
- Semi-structured Data
- Unstructured Data
7) Python Basics
- Variables
- DataTypes
- Operators
- Collections
- Functions
- Packages and Modules
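The Python basics listed above can be previewed in one short sketch (the course names and numbers used here are just illustrative values):

```python
# Minimal preview of the Python basics covered above.
from math import sqrt  # importing a function from a module

name = "Azure"                                    # variable holding a str
years = 13                                        # int
topics = ["ADF", "Databricks", "Synapse"]         # list collection
levels = {"ADF": "core", "Synapse": "advanced"}   # dict collection

def describe(course, duration_months):
    """A simple function with parameters and a return value."""
    return f"{course} runs for {duration_months} months"

print(describe(name + " Data Engineering", years // 4))  # operators: +, //
print(sqrt(16), topics[0], levels["ADF"])
```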
8) Basics of SQL
- DQL Commands (select)
- DDL commands (create, alter, drop, truncate)
- DML Commands (insert, update, delete, merge)
- Joins
- Window functions
- Aggregate functions
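The SQL command families above can be exercised end to end with Python's built-in sqlite3 module; the tables and values below are hypothetical sample data:

```python
import sqlite3

# In-memory database to try the DDL / DML / DQL commands listed above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept_id INTEGER, salary REAL)")

# DML: insert
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Data"), (2, "Infra")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)",
                [(1, "Asha", 1, 90.0), (2, "Ravi", 1, 80.0), (3, "Meena", 2, 70.0)])

# DML: update
cur.execute("UPDATE emp SET salary = salary * 1.1 WHERE dept_id = 2")

# DQL: select with a join and aggregate functions
rows = cur.execute("""
    SELECT d.name, COUNT(*) AS headcount, ROUND(AVG(e.salary), 2) AS avg_salary
    FROM emp e JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name ORDER BY d.name
""").fetchall()
print(rows)   # [('Data', 2, 85.0), ('Infra', 1, 77.0)]
```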
9) Over View of Azure Storage Accounts
- Types of strorage accounts
- Blob storage
- Access Tiers
- Data Replication Policies
- Azure Data Lake Storage Gen2
10) Azure Key Vault
- Introduction to Key Vault
- Keys, Secrets, Certificates
- Creating and configuring Key Vault
Azure Data Factory
1) Azure Data Factory
- What is Azure Data Factory?
- Azure Data Factory Architecture
- Azure Data Factory Portal UI
- Top-level concepts
- Pipelines
- Activities
- Linked services
- Datasets
- Triggers
- Data Flows
- Integration Runtimes
2) Pipeline
- What is a Pipeline?
- Create a new pipeline
- Organize pipelines into folders
- Debug pipeline
- Publish pipeline
- Parameters / Pipeline Parameters
3) Linked Service
- What is a Linked Service?
- Create a Linked Service for –
- BLOB
- SQL Database
- SQL Server
- Data Lake Storage Gen1
- Azure Data Lake Storage Gen2, etc.
- Parameters / Linked Service Parameterization
4) DataSets
- What is a Data Set?
- Create a Data Set for –
- Avro, Binary, CSV, Excel, JSON, ORC, Parquet, XML in BLOB/ADLS Gen1/ADLS Gen2.
- Table in SQL Database, SQL Server, Oracle Database, etc.
- Parameters / Data Set Parameterization
5) Activities
- Wait
- Variables
- Create a variable
- Set variable
- Append variable
- Copy Data
- General
- Source
- Sink
- Mapping
- Settings
- User Properties
- Copy file(s) from one BLOB Container to another Container
- One file from a folder
- All files from a folder
- All files and folders recursively from a folder
- Copy data / file from BLOB to SQL Database / ADLS Gen2
- As CSV, TSV, Parquet, Avro, ORC etc.
- Databricks Notebook
- Azure Function
- Lookup, Stored Procedure
- Get Metadata, Delete
- Execute Pipeline
- Validation, Fail
- Iteration & Conditionals
- Filter
- ForEach
- If Condition
- Switch
- Until
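The three Copy Data patterns above (one file, all files in a folder, a full recursive copy) can be sketched as a local-filesystem analogue; real Copy activities work on linked-service paths, and the folder and file names here are made up:

```python
import shutil
import tempfile
from pathlib import Path

# Local analogue of the Copy-activity file patterns above.
src = Path(tempfile.mkdtemp()) / "container-in"    # stands in for the source container
dst = Path(tempfile.mkdtemp()) / "container-out"   # stands in for the sink container
(src / "logs").mkdir(parents=True)
(src / "a.csv").write_text("id,val\n1,x\n")
(src / "logs" / "b.csv").write_text("id,val\n2,y\n")
dst.mkdir(parents=True)

# 1) one file from a folder
shutil.copy2(src / "a.csv", dst / "a.csv")

# 2) all files directly in a folder (non-recursive)
for f in src.iterdir():
    if f.is_file():
        shutil.copy2(f, dst / f.name)

# 3) all files and folders recursively
shutil.copytree(src, dst / "full", dirs_exist_ok=True)

print(sorted(p.name for p in dst.rglob("*.csv")))   # ['a.csv', 'a.csv', 'b.csv']
```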
6) What is a Trigger?
- Types
- Schedule
- Tumbling window
- Storage Events
- Triggers with Parameters
7) Integration Runtime (IR)
- Azure AutoResolveIntegrationRuntime
- Azure Managed Virtual Network
- Self-Hosted
- Linked Self-Hosted
8) Source control
- Git configuration
- ARM Template
- Export / Import
- Azure DevOps Repos
9) Global parameters
10) Credentials
11) Monitoring ADF Jobs
12) Alerts
13) Send Failure Notifications using Logic Apps
14) Data Flows
- What is Data Flow?
- Mapping Data Flow
- Data Flow Debug
- Transformations
- Filter, Aggregate, Join
- Conditional Split, Derived Column
- Exists, Union, Lookup, Sort
- GroupBy, Pivot, Unpivot
- Flatten, Parse, Stringify
- Alter Row, Assert
- Flowlet
- Validate Schema, Schema Drift
- Remove Duplicate Rows using Mapping Data Flows in Azure Data Factory
15) Azure DevOps
- Repos
16) SDLC
17) Agile Methodology
18) ADF Interview Questions
19) ADF Resume Preparation
20) End-to-End ADF Project
21) ADF Exercises
- Create variables using set variable activity
- How to use the If Condition activity
- Iterating files using the ForEach activity
- Creating linked services, Data sets
- Copy activity – blob to blob
- Copy activity – blob to Azure SQL
- Copy activity – pattern matching files copy
- Copy activity – copy the filtered file formats
- Copy activity – copy multiple files from blob to another blob
- Copy activity – Delete source files after copy activity
- Copy activity – using parameterized data sets
- Copy activity – convert one file format to another file format
- Copy activity – add additional columns to the source columns
- Copy activity – filter files and copy from one blob to another
- Delete the files from blob with more than 100KB
- How to use the Get Metadata activity
- Bulk copy tables and files
- How to integrate Key Vault in ADF
- How to set up an integration runtime
- Copy data from on-premises to Azure cloud
- How to use the Databricks activity and pass parameters to it
- How to use the schedule trigger
- How to use the tumbling window trigger
- How to use the event-based trigger
- How to use the Wait Activity
- How to use Until Activity
- Dataflows – select the rows
- Dataflows – Filter the rows
- Dataflows – join Transformations
- Dataflows – union Transformations
- Dataflows – look up Transformations
- Dataflows – window functions transformations
- Dataflows – pivot, unpivot transformations
- Dataflows – Alter rows transformations
- Dataflows – Removing Duplicates transformations
- How to pass parameters to the pipeline
- How to create alerts and rules
- How to set global parameters
- How to import and export ARM templates
- How to integrate ADF with DevOps
- How to use Azure DevOps Repos
- How to send mail notifications using logic apps
- How to monitor the pipelines
- How to debug the pipelines
- How to schedule pipeline using triggers
- How to create trigger dependency
- How to execute one pipeline from another pipeline
Azure Databricks
1) Introduction to BigData
- What is Data?
- What is Database?
- What is BigData?
- What are the challenges of BigData?
- Why Traditional Databases Can't Handle BigData
2) Introduction to Hadoop
- What is Hadoop?
- How Hadoop overcomes BigData challenges
- Hadoop Architecture
- Hadoop Daemons
- HDFS
- YARN
- MapReduce
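The MapReduce model listed above can be illustrated with a plain-Python word count (the classic example); the input lines are made-up sample data, and real MapReduce distributes each phase across nodes:

```python
from collections import defaultdict

# Plain-Python sketch of the MapReduce word-count pattern:
# map -> shuffle (group by key) -> reduce.
lines = ["big data", "big compute", "data data"]

# Map: emit (word, 1) pairs from each input line
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the emitted values by key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: sum the values for each key
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # {'big': 2, 'data': 3, 'compute': 1}
```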
3) Introduction to Spark
- Spark Architecture
- Spark internals
- Spark RDD
- Spark DataFrame
- Spark Streaming
4) Introduction To Databricks
- What is Databricks?
- Databricks Architecture
- Working in Databricks workspace
- Working with Databricks notebooks
5) Working with Databricks FileSystem – DBFS
- What is DBFS?
- DBFS commands – mkdirs, cp, mv, head, put, rm, rmdir
- How to handle multiple files in DBFS
- How to process the files in DBFS
- How to archive the files in DBFS
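The process-then-archive workflow above can be sketched as a local-filesystem analogue; on Databricks the same steps use `dbutils.fs` against DBFS paths (e.g. `dbfs:/mnt/landing`), and the folder and file names here are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path

# Local analogue of the DBFS workflow above: process each file in a
# landing folder, then move it to an archive folder.
landing = Path(tempfile.mkdtemp()) / "landing"
archive = landing.parent / "archive"
landing.mkdir()
archive.mkdir()                                   # like dbutils.fs.mkdirs(...)

for i in range(3):                                # create sample input files
    (landing / f"batch_{i}.txt").write_text(f"row {i}\n")

processed = []
for f in sorted(landing.glob("*.txt")):
    processed.append(f.read_text().strip())       # "process" the file
    shutil.move(str(f), str(archive / f.name))    # like dbutils.fs.mv(...)

print(processed, sorted(p.name for p in archive.iterdir()))
```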
6) Databricks – Spark Core
- RDD Programming
- Operations on RDD
- Narrow Transformations
- Wide Transformations
- Actions
- Loading Data and Saving Data
- Key Value Pair RDD
- Broadcast variables
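The narrow-vs-wide distinction above can be mimicked in plain Python: map and filter (narrow) touch one record at a time, while reduceByKey (wide) first regroups records by key, which is what forces a shuffle in Spark. The key-value data here is made up:

```python
from collections import defaultdict
from functools import reduce

# Plain-Python analogue of key-value RDD operations.
rdd = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]

# Narrow transformations: each output record depends on one input record
doubled = [(k, v * 2) for k, v in rdd]            # like rdd.map(...)
kept = [(k, v) for k, v in doubled if v > 2]      # like rdd.filter(...)

# Wide transformation: reduceByKey must regroup records by key first
grouped = defaultdict(list)
for k, v in kept:
    grouped[k].append(v)
reduced = {k: reduce(lambda a, b: a + b, vs) for k, vs in grouped.items()}

print(reduced)   # {'b': 12, 'a': 6, 'c': 10}
```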
7) Databricks – Spark-SQL- DataFrames
- Creating Data Frames
- DataFrames internal execution
- Transformations using DataFrame API
- Actions using DataFrame API
- User-defined functions in Spark SQL
8) Databricks- Handle multiple file formats
- CSV Data
- JSON Data
- parquet files
- Excel files
- ORC file format
9) Databricks utilities
- credentials utility
- FileSystem utility
- Notebook utility
- secrets utility
- widgets utility
10) Databricks Cluster Management
- Creating and configuring clusters
- Managing Clusters
- Displaying clusters
- Starting a cluster
- Terminating a cluster
- Delete a cluster
- Cluster Information
- Cluster logs
- Types of Clusters
- All-purpose clusters
- Job clusters
- Cluster Modes
- Standard
- High Concurrency
- Autoscaling
- Databricks runtime versions
11) Databricks – Batch Processing
- Historical Data load
- Incremental Data load
- Date Transformations
- Aggregations
- Join Operations
- Window Functions
- Union Operations
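Of the batch operations above, window functions are the least obvious; a per-key running total, the same idea as `SUM(amount) OVER (PARTITION BY store ORDER BY day)` in Spark SQL, can be sketched in plain Python (the store names and amounts are hypothetical):

```python
from itertools import accumulate

# Per-store daily amounts, already in day order (the PARTITION BY key
# is the dict key; the ORDER BY is the list order).
sales = {"north": [10, 20, 5], "south": [7, 3]}

# Running total per key, like a windowed cumulative SUM.
running = {store: list(accumulate(amounts)) for store, amounts in sales.items()}
print(running)   # {'north': [10, 30, 35], 'south': [7, 10]}
```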
12) Introduction to Azure
- Azure Portal Walkthrough
- What is Subscription?
- What is a Resource Group?
- What is a Resource?
- Overview of Azure Resources / Services
- Azure Databricks
- BLOB Storage, Data Lake Storage Gen2
- Azure SQL Server, SQL Database
- Key Vault
13) Databricks Integration with
- Blob Storage
- Azure Datalake storage gen2
- Azure SQL Database
- Synapse
- Azure Key Vault
14) Databricks – Streaming API
- What is streaming?
- Process streaming using Pyspark API
- Handling bad records
- Stream data into ADLS Gen2
- Load the data into Tables
15) Databricks – Lakehouse (Delta Lake)
- Difference between Data lake and Delta Lake
- Introduction to Delta Lake
- Features of Delta Lake
- How to create a Delta table
- How to perform DML operations on a Delta table
- Merge statements
- Handling SCD Type1 and Type2
- Handling Data Deduplication in delta tables
- Handling streaming Data in Delta lake
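The SCD Type 2 handling above follows one core rule: when a tracked attribute changes, close the current dimension row and insert a new current version. A plain-Python sketch of that rule (in Delta Lake the same effect comes from a MERGE on the Delta table; the ids, cities, and dates here are made up):

```python
from datetime import date

# Existing dimension rows: one current version per business key.
dim = [
    {"id": 1, "city": "Pune", "start": date(2024, 1, 1), "end": None, "current": True},
]
incoming = [{"id": 1, "city": "Mumbai"}, {"id": 2, "city": "Delhi"}]
today = date(2025, 2, 17)

for row in incoming:
    match = next((d for d in dim if d["id"] == row["id"] and d["current"]), None)
    if match and match["city"] != row["city"]:
        match["end"], match["current"] = today, False   # close the old version
    if match is None or not match["current"]:
        dim.append({"id": row["id"], "city": row["city"],
                    "start": today, "end": None, "current": True})

print([(d["id"], d["city"], d["current"]) for d in dim])
# [(1, 'Pune', False), (1, 'Mumbai', True), (2, 'Delhi', True)]
```

SCD Type 1 is the simpler branch: instead of closing the old row, the matched row's attributes are overwritten in place.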
16) Databricks – Unity Catalog
- What is Unity Catalog?
- Creating an access connector for Databricks
- Creating a metastore in Unity Catalog
- Unity Catalog object model
- Roles in Unity Catalog
- User and group management
- Unity Catalog privileges
- Managed and external tables in Unity Catalog
17) Workflows in Databricks
- Introduction to workflows
- Create, run and manage Databricks jobs
- Schedule Databricks jobs
- Monitor Databricks Jobs
18) Azure DevOps – Repos
- What are DevOps Repos
- Integrate databricks notebooks with Repos
- Commit, Sync notebooks to and from Repos
19) SDLC and Agile methodology
20) End to End Data Migration Project from On Premises to Cloud.
21) Interview Questions
22) Mock Interviews
Azure Synapse
1) Introduction & Overview
- Azure Synapse Analytics Overview
- Azure Synapse Analytics Architecture
- Create Azure Free Account for Synapse
2) Overview of pools in Synapse Analytics
- Dedicated SQL pools
- Serverless SQL pool
- Apache Spark pools
- Data Explorer pools
3) Using Azure Synapse Analytics to Query Data Lake
- Creating Azure Synapse Analytics Workspace
- Uploading Sample Data into Data Lake Storage
- Exploring Azure Synapse Workspace and Studio
- Querying a Data Lake Store using serverless SQL pools in Azure Synapse Analytics
- Creating a View for CSV Data with a Serverless SQL Pool
4) Azure Storage Account Integration with Azure Synapse
- Copy multiple files from blob to blob using wildcard file options
- Copy multiple folders from blob to blob using dataset parameters
- Get File Names from Folder Dynamically and copy latest file from folder
5) Azure Synapse Triggers
- Schedule Trigger in Azure Synapse
- Event Based Trigger in Azure Synapse
6) Azure SQL Database integration with Azure Synapse
- Azure SQL Database – Introduction to Relational Databases in Azure
- Copy data from SQL Database to ADLS Gen2 using table, query and stored procedure
- Overwrite and Append Modes in Copy Activity in Azure Synapse
- Use Foreach loop activity to copy multiple Tables- Step by Step Explanation
7) Incremental Load in Azure Synapse
- Incremental Load or Delta load from SQL to blob Storage in Azure Synapse
- Multi-Table Incremental Load or Delta load from SQL to Azure Synapse
- Incrementally copy new and changed files based on Last Modified Date
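The "new and changed files based on Last Modified Date" pattern above boils down to keeping a watermark timestamp from the last successful run and picking up only files modified after it. A local sketch with hypothetical file names:

```python
import os
import tempfile
import time
from pathlib import Path

# Local analogue of incremental file pickup by last-modified date.
folder = Path(tempfile.mkdtemp())
old_file, new_file = folder / "old.csv", folder / "new.csv"
old_file.write_text("a\n")
new_file.write_text("b\n")

watermark = time.time() - 3600                        # last successful run, 1h ago
os.utime(old_file, (watermark - 60, watermark - 60))  # pretend old.csv predates it

# Only files modified after the watermark are copied on this run.
changed = [p.name for p in folder.iterdir() if p.stat().st_mtime > watermark]
print(changed)   # ['new.csv']
```

After the copy succeeds, the watermark is advanced to the current run's start time so the next run skips these files.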
8) Logging, Notifications, and Key Vault Integration with Azure Logic Apps
- Log Pipeline Executions to SQL Table using Azure Synapse
- Custom Email Notifications and keyvault integration with Linked Service
- Send Error notification with logic app
- Use Foreach loop activity to copy multiple Tables with pipeline logs logic and notifications
9) Deep dive into Copy Activity in Azure Synapse
- Load data from on-premises SQL Server to Azure Synapse
- Copy data from SQL Server to Azure Synapse with PolyBase & Bulk Insert
- Copy data from an on-premises file system to Azure Synapse
- Loop through a REST API and copy data to ADLS Gen2 with Linked Service parameters
10) Data Flows Introduction
- Azure Data Flows Introduction
- Setup Integration Runtime for Data Flows
- Basics of SQL Joins for Azure Data Flows – Serverless SQL Pool Demo
- Joins in Azure Data Flows – Dedicated SQL Pool Demo
- Difference Between Join vs. Lookup Transformation & Merge Functionality – Spark Pool Demo
- Dataflows – select the rows
- Dataflows – Filter the rows
- Dataflows – join Transformations
- Dataflows – union Transformations
- Dataflows – look up Transformations
- Dataflows – window functions transformations
- Dataflows – pivot, unpivot transformations
- Dataflows – Alter rows transformations
- Dataflows – Removing Duplicates transformations
11) Spark Pool Introduction in Azure Synapse
- Spark Introduction and components
- Spark Architecture
- Create a notebook, explore notebook options, and create notebooks in different languages
- MSSparkUtils for file system
- MSSparkUtils for creating notebook parameters
- Magic commands; calling one Synapse notebook from another and returning its output
- Configure Key Vault in an Azure Synapse notebook
- Different ways to connect to ADLSGen2 from synapse notebook
- Different ways to connect to Blob from synapse notebook
- Different ways to connect to Azure SQL Database from synapse notebook
- Different ways to connect to on-premises SQL Server from synapse notebook
- Optimization while Reading and writing CSV files from Azure Synapse
- Reading and writing parquet files from Azure Synapse
- Reading and writing JSON files from Azure Synapse
- Reading and writing avro and orc files from Azure Synapse
- Reading and writing EXCEL files from Azure Synapse
- Different ways to create RDD in synapse notebook
- Different ways to create dataframes in synapse notebook
- When to use repartition and coalesce
- Joins in Synapse Notebook
- Broadcast Joins in Synapse Notebook and configuration of spark for optimization
- What is the Catalyst optimizer and the skewness issue in Spark
- Optimization techniques in pyspark
- Implementing SCD1 in Synapse Notebook
- Implementing SCD2 in Synapse Notebook
- Executing synapse notebooks from synapse pipelines with input and output parameters
12) Project: End-to-End Data Migration using Synapse Analytics