2-Day Hadoop Fundamentals

Course Summary


ExpoNential Inc. (host of CloudCon Expo & Conference) is offering this extensive weekend class on Hadoop platforms. Our team of experienced instructors has worked extensively on Hadoop and Cassandra platforms and has deployed a range of clustering software packages for Fortune 500 clients internationally.

 

This is a fast-paced, vendor-agnostic technical overview of the Hadoop landscape. No prior knowledge of databases or programming is assumed. This survey course is aimed at both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Hadoop. For each sub-topic, the instructor will provide links and resource recommendations (for example, YouTube videos, books, and blog posts) for students who want to explore that area further. Students will receive a slide deck that can be used as reference material after the course.

 

Students will work with real Hadoop clusters and the latest Hadoop distributions. We will discuss vendor offerings for Hadoop, including Cloudera, Hortonworks, and MapR. The hands-on lab work will be conducted on Cloudera-based deployments.

 

Duration


June 8-9, 2013 (Sat-Sun)

8am - 6pm (breakfast and lunch will be provided)

 

Location


The Domain Hotel

1085 El Camino Real, Sunnyvale, CA 94087

 

Instructor


Salman Ahmed, Cloudera Certified Hadoop Instructor

 

Cost


 

One Day $699
Both Days $999

 

Save 15% (use discount code save15now on our secure checkout page)

 

Audience


Engineers, Programmers, Networking specialists, Managers, Executives

 

Software Covered


 HDFS, MapReduce, Pig, Hive, HBase (Please bring your laptop)

 

Objectives


 

- Introduce students to the core concepts of Hadoop

- Deep dive into the critical architecture paths of HDFS, MapReduce and HBase

- Teach the basics of writing effective Hive scripts

- Explain how to choose the correct use cases for Hadoop

- Give each student access to an individual 1-node Hadoop cluster on Rackspace to work through the hands-on labs

- Provide links to the best books, blog posts and videos for students to learn more about Hadoop on their own

 

 

 

 

Important: Please bring your laptop.

 

 

Course Outline


 

 

 

 

Day 1:    Big Data, HDFS and MapReduce Primer

 

                Hadoop

- Parallel Computing vs. Distributed Computing

- Brief history of Hadoop

- Scaling with Hadoop

- RDBMS/SQL vs. Hadoop

- Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker

- Intro to the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper

- Vendor Comparison - Hardware + Software recommendations for Hadoop

                LAB #1: Hadoop installation, cluster-specific operations, and sample job execution

                

 

                HDFS 

 

- Linux file system options

- Sample HDFS commands

- HDFS sample architecture at Yahoo!

- Data Locality

- Rack Awareness

- Write Pipeline

- Read Pipeline

- NameNode architecture (EditLog, FsImage, location of replicas, safe mode)

- Secondary NameNode architecture

- DataNode architecture

- Heartbeats

- Block Scanner

- Fsck Health Check + file breakdown

- Balancer

                LAB #2: Various HDFS-specific operations

 

                MapReduce 

 

- MapReduce Architecture

- JobTracker/TaskTracker

- Combiner

- Partitioner (shuffle)

- Counters

- Speculative Execution

- Distributed Cache

- Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)

                LAB #3: Understanding MapReduce jobs through execution
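
For readers new to the terms in the MapReduce list above, here is a minimal Python sketch of how map, combiner, partitioner, and reduce fit together in a word count. It is an illustration only and not part of the course lab material: it runs in a single process, does not use the Hadoop API, and every function name in it (map_phase, combine, partition, reduce_phase, run_job) is a hypothetical name chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts locally, per input split, before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key (simple deterministic hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reduce: sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run map -> combine -> partition/shuffle -> reduce over the input splits.

    Each split is a list of lines and stands in for one map task's input.
    """
    # One bucket per reducer: key -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(pairs):
            buckets[partition(key, num_reducers)][key].append(value)

    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # reducers see their keys in sorted order
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    print(run_job([["big data big"], ["data hadoop"]]))
```

In a real cluster the splits would be HDFS blocks processed by separate map tasks, and the buckets would be sent over the network to separate reduce tasks; the sketch keeps everything in memory so the data flow is easy to follow.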

 

 

Day 2:  Hadoop Ecosystem

 

                Real-time I/O with HBase 

 

- HBase versions and origins

- HBase architecture

- HBase core concepts

- HBase vs. RDBMS

- HBase Master and Region Servers

- Data Modeling

- Column Families and Regions

- HBase Internals: Bloom Filters and Block Indexes

- Write Pipeline / Read Pipeline

- Compactions

                LAB #4: Exploring HBase commands

 

                Hive

 

- Hive philosophy and architecture

- Hive vs. RDBMS

- HiveQL and Hive Shell

- Managing tables

- Data types and schemas

- Querying data

- HiveODBC

                LAB #5: Analyzing real-world data using Hive

 

                Sqoop

 

- Data processing through Sqoop

- Understanding the Sqoop connectivity model with RDBMS

- Using Sqoop with real-time data applications

                LAB #6: Using Sqoop with Excel and PowerPivot to Perform Data Analysis

 

                Next-gen Hadoop  

 

- HDFS improvements: HDFS Federation, NameNode HA, Snapshots

- MapReduce improvements: YARN, Performance

- HBase GeoRedundancy, DR, and Snapshots

- Brief introduction to Mahout, Oozie, Pig, Avro, and related projects

 

 

Reserve Your Space Today!