1) Introduction to Big Data
- What is Data?
- What is a Database?
- What is Big Data?
- What are the challenges of Big Data?
- Why traditional databases can't handle Big Data
2) Introduction to Hadoop
- What is Hadoop?
- How Hadoop overcomes Big Data challenges
- Hadoop Architecture
- Hadoop Daemons
- HDFS
- YARN
- MapReduce
3) Introduction to Spark
- Spark Architecture
- Spark internals
- Spark RDD
- Spark DataFrame
- Spark Streaming
4) Introduction to Databricks
- What is Databricks?
- Databricks Architecture
- Working in the Databricks workspace
- Working with Databricks notebooks
5) Working with Databricks File System – DBFS
- What is DBFS?
- DBFS commands – mkdirs, cp, mv, head, put, rm, rmdir
- How to handle multiple files in DBFS
- How to process the files in DBFS
- How to archive the files in DBFS
6) Databricks – Spark Core
- RDD Programming
- Operations on RDD
- Transformations – Narrow
- Transformations – Wide
- Actions
- Loading Data and Saving Data
- Key Value Pair RDD
- Broadcast variables
7) Databricks – Spark SQL – DataFrames
- Creating DataFrames
- DataFrame internal execution
- Transformations using DataFrame API
- Actions using DataFrame API
- User-defined functions in Spark SQL
8) Databricks – Handling multiple file formats
- CSV Data
- JSON Data
- Parquet files
- Excel files
- ORC file format
9) Databricks utilities
- Credentials utility
- File system utility
- Notebook utility
- Secrets utility
- Widgets utility
10) Databricks Cluster Management
- Creating and configuring clusters
- Managing Clusters
- Displaying clusters
- Starting a cluster
- Terminating a cluster
- Deleting a cluster
- Cluster Information
- Cluster logs
- Types of Clusters
- All-purpose clusters
- Job clusters
- Cluster modes
- Standard
- High Concurrency
- Autoscaling
- Databricks runtime versions
11) Databricks – Batch Processing
- Historical Data load
- Incremental Data load
- Date Transformations
- Aggregations
- Join Operations
- Window functions
- Union operations
12) Introduction to Azure
- Azure Portal Walkthrough
- What is Subscription?
- What is a Resource Group?
- What is a Resource?
- Overview of Azure Resources / Services
- Azure Databricks
- Blob Storage, Data Lake Storage Gen2
- Azure SQL Server, SQL Database
- Key Vault
13) Databricks Integration with
- Blob Storage
- Azure Data Lake Storage Gen2
- Azure SQL Database
- Synapse
- Azure Key Vault
14) Databricks – Streaming API
- What is streaming?
- Processing streaming data using the PySpark API
- Handling bad records
- Streaming data into Data Lake Storage Gen2
- Loading the data into tables
15) Databricks – Lakehouse (Delta Lake)
- Difference between a data lake and Delta Lake
- Introduction to Delta Lake
- Features of Delta Lake
- How to create a Delta table
- How to perform DML operations on a Delta table
- Merge statements
- Handling SCD Type 1 and Type 2
- Handling data deduplication in Delta tables
- Handling streaming data in Delta Lake
16) Databricks – Unity Catalog
- What is Unity Catalog?
- Creating an access connector for Databricks
- Creating a metastore in Unity Catalog
- Unity Catalog object model
- Roles in Unity Catalog
- User and group management
- Unity Catalog privileges
- Managing external tables in Unity Catalog
17) Workflows in Databricks
- Introduction to workflows
- Create, run and manage Databricks jobs
- Schedule Databricks jobs
- Monitor Databricks Jobs
18) Azure DevOps – Repos
- What are Azure DevOps Repos?
- Integrating Databricks notebooks with Repos
- Committing and syncing notebooks to and from Repos
19) SDLC and Agile methodology
20) End-to-End Data Migration Project from On-Premises to Cloud
21) Interview Questions
22) Mock Interviews