Understanding Big Data Engineering Basics

Get a practical, hands-on introduction to Big Data and modern data platforms, covering distributed storage, batch and real-time processing, ETL pipelines, NoSQL, and data warehousing. Learn how technologies like Hadoop, Spark, Kafka, Hive, and MongoDB work together to build scalable data architectures and support faster, data-driven decisions.

Lab code: Enterprise-BD-B
Level: Beginner
Language: English

About The Lab

Prerequisites

Linux basics
Python basics
SQL basics
CLI basics

Audiences

Managers
Architects
Technical and non-technical profiles

Lab Architecture

This hands-on lab uses a distributed Big Data environment that reflects a real-world data platform. It includes Hadoop for storage and resource management, Spark and MapReduce for data processing, PySpark for ETL, Hive and MongoDB for data management, and Kafka with Spark Structured Streaming for real-time analytics. The lab combines batch and streaming workflows in a scalable architecture.
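
For illustration, here is a minimal PySpark sketch of the two workflows this architecture combines: a batch ETL job that reads raw data from HDFS and writes a Hive table, and a streaming job that consumes a Kafka topic with Spark Structured Streaming. The paths, broker address, topic, and table names below are placeholders, not the lab's actual configuration.

```python
# Minimal sketch of the lab's two workflow styles: batch ETL and streaming.
# All paths, hostnames, topic, and table names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("big-data-lab-sketch")
    .enableHiveSupport()          # lets Spark SQL read/write Hive tables
    .getOrCreate()
)

# --- Batch ETL: read raw CSV from HDFS, clean it, store it as a Hive table ---
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/events.csv")
clean = (
    raw.dropna(subset=["user_id"])                     # basic data-quality step
       .withColumn("event_ts", F.to_timestamp("event_ts"))
)
clean.write.mode("overwrite").saveAsTable("analytics.events_clean")  # placeholder database/table

# --- Streaming: consume a Kafka topic with Spark Structured Streaming ---
# (requires the spark-sql-kafka connector package to be available to Spark)
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)
counts = (
    stream.selectExpr("CAST(value AS STRING) AS value")
          .groupBy("value")
          .count()
)
query = (
    counts.writeStream.outputMode("complete")
          .format("console")                           # print running counts
          .start()
)
query.awaitTermination()                               # block while streaming
```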

Why This Lab?

The Big Data lab provides a comprehensive hands-on environment designed for managers, architects, and both technical and non-technical profiles to explore the fundamentals and advanced concepts of modern data engineering.

Participants will gain practical exposure to the entire Big Data ecosystem, from data ingestion and storage to distributed processing, analytics, and real-time streaming. The lab covers key technologies such as Hadoop, HDFS, MapReduce, Apache Spark, ETL pipelines, NoSQL databases, and streaming platforms like Kafka.
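
As a taste of the NoSQL side of that ecosystem, the sketch below stores and queries semi-structured documents in MongoDB using pymongo, the standard Python driver. The connection URI, database, and collection names are placeholders rather than the lab's settings.

```python
# Minimal sketch of working with MongoDB as a NoSQL document store.
# Connection URI, database, and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")      # placeholder URI
events = client["analytics"]["events"]                 # database / collection

# Insert a few semi-structured documents (fields can vary per document).
events.insert_many([
    {"user_id": 1, "action": "login", "device": "mobile"},
    {"user_id": 2, "action": "purchase", "amount": 29.99},
])

# Query without a fixed schema.
for doc in events.find({"action": "login"}):
    print(doc)

# Aggregate: count documents per action type.
pipeline = [{"$group": {"_id": "$action", "count": {"$sum": 1}}}]
for row in events.aggregate(pipeline):
    print(row)
```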

Through progressive challenges, learners will understand how large-scale data systems are designed, deployed, and optimized end to end.

Lab Objectives

  • Understand core Big Data concepts, including the 5Vs, data types, and distributed computing principles.
  • Learn to use Hadoop, HDFS, and YARN for scalable storage and cluster-based data management.
  • Process and transform large datasets with MapReduce, Apache Spark, PySpark, and Spark SQL (see the sketch after this list).
  • Build batch and real-time data pipelines using ETL practices, Kafka, and Spark Structured Streaming.
  • Apply Big Data tools such as Hive and MongoDB to improve data quality, analytics, scalability, and performance.
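
To make the processing objective concrete, here is a hedged word-count sketch that expresses the same aggregation twice: once in MapReduce style with Spark's RDD API, and once declaratively with Spark SQL / DataFrames. The input path is a placeholder.

```python
# Classic word count, in MapReduce style and in Spark SQL style.
# The HDFS input path is a placeholder.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

# MapReduce style: map each line to (word, 1) pairs, then reduce by key.
counts_rdd = (
    sc.textFile("hdfs:///data/raw/logs.txt")           # placeholder input path
      .flatMap(lambda line: line.split())              # "map" phase
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)                 # "reduce" phase
)
print(counts_rdd.take(10))

# The same aggregation expressed declaratively with Spark SQL / DataFrames.
lines = spark.read.text("hdfs:///data/raw/logs.txt")
counts_df = (
    lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
         .groupBy("word")
         .count()
)
counts_df.show(10)
```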
