Course Outline

Introduction

  • Introduction to Cloud Computing and Big Data solutions
  • Overview of Apache Hadoop Features and Architecture

Setting up Hadoop

  • Planning a Hadoop cluster (on-premise, cloud, etc.)
  • Selecting the OS and Hadoop distribution
  • Provisioning resources (hardware, network, etc.)
  • Downloading and installing the software
  • Sizing the cluster for flexibility

Working with HDFS

  • Understanding the Hadoop Distributed File System (HDFS)
  • Overview of HDFS Command Reference
  • Accessing HDFS
  • Performing Basic File Operations on HDFS
  • Using S3 as a complement to HDFS

Overview of MapReduce

  • Understanding Data Flow in the MapReduce Framework
  • Map, Shuffle, Sort and Reduce
  • Demo: Computing Top Salaries

Working with YARN

  • Understanding resource management in Hadoop
  • Working with the ResourceManager, NodeManager and ApplicationMaster
  • Scheduling jobs under YARN
  • Scheduling for large numbers of nodes and clusters
  • Demo: Job scheduling

Integrating Hadoop with Spark

  • Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
  • Understanding Resilient Distributed Datasets (RDDs)
  • Creating an RDD
  • Implementing RDD Transformations
  • Demo: Implementing a Text Search Program for Movie Titles

Managing a Hadoop Cluster

  • Monitoring Hadoop
  • Securing a Hadoop cluster
  • Adding and removing nodes
  • Running a performance benchmark
  • Tuning a Hadoop cluster to optimize performance
  • Backup, recovery and business continuity planning
  • Ensuring high availability (HA)

Upgrading and Migrating a Hadoop Cluster

  • Assessing workload requirements
  • Upgrading Hadoop
  • Moving from on-premise to cloud and vice versa
  • Recovering from failures

Troubleshooting

Summary and Conclusion

Requirements

  • System administration experience
  • Experience with Linux command line
  • An understanding of big data concepts

Audience

  • System administrators
  • DBAs

Duration

  • 35 hours

Dates are subject to availability and take place between 09:30 and 16:30.
Open Training Courses require 5+ participants.
