Course details

Hadoop in the Cloud

Amazon Web Services, also known as AWS, is a secure cloud-computing platform offered by Amazon. This course introduces AWS and its most prominent tools, such as IAM, S3, and EC2. Additionally, we will cover how to install, configure, and use a Hadoop cluster on AWS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Developers interested in expanding their knowledge of Hadoop from the operations perspective


Expected Duration (hours)

Lesson Objectives

Hadoop in the Cloud

  • start the course
  • describe how cloud computing can be used as a solution for Hadoop
  • recall some of the most common services of the EC2 service bundle
  • recall some of the most common services that Amazon offers
  • describe how the AWS credentials are used for authentication
  • create an AWS account
  • describe the use of AWS access keys
  • describe AWS Identity and Access Management (IAM)
  • set up AWS IAM
  • describe the use of SSH key pairs for remote access
  • set up S3 and import data
  • provision a micro instance of EC2
  • prepare to install and configure a Hadoop cluster on AWS
  • create an EC2 baseline server
  • create an Amazon machine image
  • create an Amazon cluster
  • describe what the command line interface is used for
  • use the command line interface
  • describe the various ways to move data into AWS
  • recall the advantages and limitations of using Hadoop in the cloud
  • recall the advantages and limitations of using AWS EMR
  • describe EMR end-user connections and EMR security levels
  • set up an EMR cluster
  • run an EMR job from the web console
  • run an EMR job with Hue
  • run an EMR job with the command line interface
  • write an Elastic MapReduce script for AWS

Course Number

Expertise Level