Big Data Hadoop + Spark (Basic+Advanced)

Big Data Hadoop + Spark (Basic+Advanced) course content

Online Course Duration: 35 Hours

HADOOP BASIC COURSE STRUCTURE

SUMMARY

  1. Hadoop
  2. Hive
  3. Pig
  4. Sqoop
  5. Flume
  6. HBase

HADOOP ADVANCED COURSE STRUCTURE

  1. BASIC COURSE STRUCTURE
  2. SPARK
  3. SPARK SQL
  4. SPARK STREAMING
  5. MLLIB
  6. GRAPHX

HADOOP

  • A brief discussion
  • Why Hadoop over other systems.
  • Simple MapReduce introduction and case example.
  • Description of the Hadoop components and how they work.
  • First example: word count in Java, using the CLI and Eclipse.
  • Anatomy of the word count program.
  • YARN architecture.
  • YARN workflow.
  • Input splits.
  • MapReduce: Combiner and Partitioner.
  • Working with other formats.
  • Use cases.
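
To make the MapReduce topics above concrete (input splits, mapper, combiner, partitioner, reducer), here is a minimal sketch in plain Python. It is not the Java word count used in the course and does not call any Hadoop API; the function names, the toy hash in the partitioner, and the sample input are all assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Combiner: pre-sum the counts produced by a single mapper."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partition(word, num_reducers):
    """Partitioner: pick a reducer for a key (deterministic toy hash)."""
    return sum(word.encode("utf-8")) % num_reducers

def word_count(splits, num_reducers=2):
    # One bucket per reducer: key -> list of partial counts (the shuffle).
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each input split is handled by one mapper
        mapped = [pair for line in split for pair in map_phase(line)]
        for word, count in combine(mapped):
            buckets[partition(word, num_reducers)][word].append(count)
    # Reducer: sum the partial counts that arrived for each key.
    result = {}
    for bucket in buckets:
        for word, counts in bucket.items():
            result[word] = sum(counts)
    return result

# Hypothetical input: two splits, each a list of lines.
splits = [["big data hadoop", "hadoop spark"], ["spark spark hive"]]
counts = word_count(splits)
```

The combiner only reduces the amount of data shuffled; the final counts are the same with or without it, which is why it is safe for an associative operation such as a sum.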

HIVE

  • A brief introduction.
  • Why Hive over RDBMS.
  • Hive architecture.
  • Data types in Hive – simple and complex.
  • Working with DML (Data Manipulation Language).
  • Working with DDL (Data Definition Language).
  • Data retrieval techniques – introduction to SQL querying.
  • Advanced admin functionality in Hive.
  • Using window and analytic functions in Hive.
  • Use case 1.
  • Working with other file formats.
  • Use case 2.
  • Interview questions and answers.

PIG

  • Pig and its architecture.
  • MapReduce vs Pig.
  • Shell and Utility commands.
  • Programming structure in Pig.
  • Using built-in functions in Pig.
  • Standard working queries with Pig.
  • Using nested querying.
  • Using UDFs in Pig.
  • Piggy Bank: a repository of user-contributed functions for Pig.
  • Use cases.

SQOOP

  • Sqoop architecture.
  • How Sqoop works.
  • Database functionality and file formats.
  • Queries, query optimization, and options file usage.
  • Use Cases.

FLUME

  • Flume architecture and usage.
  • Configuring Flume for data ingestion.
  • Various file types and configuration variables.
  • Use Cases.

HBASE

  1. HBase architecture.
  2. Working with the HBase shell.
  3. Getting and inserting data.
  4. Filters in HBase.
  5. Data Loading techniques.
  6. Use cases.

HADOOP ADVANCED COURSE STRUCTURE

SPARK with Python

  • Spark full architecture.
  • Spark vs MR1: the advantages.
  • Transformations and actions.
  • Lineage graph and the Spark console UI.
  • Text file processing using Spark.
  • Persist and caching.
  • Understanding shuffle, partitioning, and advanced transformations and actions.
  • Querying with Spark.
  • DataFrame concept in Spark.

SPARK SQL

  • Analysing Hive and Spark SQL architecture.
  • SQLContext in Spark SQL.
  • Working with DataFrames: an in-depth study of functions, queries, and UDFs.
  • Using various file types: writing and saving.
  • Using the Dataset API with Scala.
  • Use cases.

SPARK STREAMING

  • Spark Streaming architecture.
  • Transformations in Spark Streaming.
  • Fault tolerance in Spark Streaming.
  • Checkpointing in Spark Streaming.
  • Level of parallelism.
  • Windowed operations in Spark Streaming.
  • Use cases.
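
As an illustration of the windowed operations topic above, the following plain-Python sketch shows how a window length and a slide interval select micro-batches. It deliberately avoids the Spark Streaming API; the function name `windowed_counts`, measuring both parameters in batches rather than seconds, and the sample batches are assumptions for illustration only.

```python
def windowed_counts(batches, window_length, slide_interval):
    """Return the total event count for each window of micro-batches."""
    results = []
    # A window is emitted every `slide_interval` batches and covers the
    # most recent `window_length` batches (fewer at the start of the stream).
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        results.append(sum(len(batch) for batch in batches[start:end]))
    return results

# Hypothetical stream: six micro-batches of events.
batches = [["a", "b"], ["c"], [], ["d", "e", "f"], ["g"], ["h", "i"]]
totals = windowed_counts(batches, window_length=3, slide_interval=2)
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap by one batch, so some events are counted in two windows.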

MLLIB

  • Machine learning with Spark
  • Data types
  • Algorithms – statistics
  • Classification and regression
  • Clustering
  • Collaborative filtering
  • Use Cases.

GRAPHX

  • Implementing data visualization with Spark.
  • The property graph.
  • Using Vertex and Edge RDDs.
  • Advanced Graph operators.
  • Aggregations using GraphX.
  • Using Graph Builders.
  • Collaborative filtering
  • The Pregel API.
  • Sample Graph Algorithms.
  • Use cases.
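
The Pregel API topic above is built around a vertex-centric loop of supersteps: active vertices send messages along their edges, messages to the same vertex are merged, and each vertex updates its state. The sketch below imitates that loop in plain Python for single-source shortest paths. It does not use GraphX (which is a Scala API); the function name `pregel_sssp`, the edge-tuple format, and the sample graph are assumptions for illustration, and edge weights are assumed to be non-negative.

```python
import math

def pregel_sssp(edges, source):
    """Single-source shortest paths using a Pregel-style superstep loop."""
    vertices = {v for src, dst, _ in edges for v in (src, dst)}
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = {source}
    while active:
        # Send phase: every active vertex messages its out-neighbours;
        # messages addressed to the same vertex are merged with min().
        inbox = {}
        for src, dst, weight in edges:
            if src in active:
                candidate = dist[src] + weight
                inbox[dst] = min(inbox.get(dst, math.inf), candidate)
        # Vertex program: keep the smaller distance; only vertices whose
        # distance improved remain active in the next superstep.
        active = set()
        for vertex, message in inbox.items():
            if message < dist[vertex]:
                dist[vertex] = message
                active.add(vertex)
    return dist

# Hypothetical directed graph as (source, destination, weight) tuples.
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "d", 1.0)]
distances = pregel_sssp(edges, "a")
```

The loop stops when no vertex receives a message that improves its distance, which mirrors how a Pregel computation halts once no messages remain.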