Course details

Data Lake: Architectures & Data Management Principles


Overview/Description
Expected Duration
Lesson Objectives
Course Number
Expertise Level



Overview/Description

A key component of wrangling data is the data lake framework. In this 9-video Skillsoft Aspire course, learners discover how to implement data lakes for real-time data management. Explore data ingestion, data processing, and data lifecycle management with Amazon Web Services (AWS) and other open-source ecosystem products. Begin by examining real-time big data architectures and how to implement Lambda and Kappa architectures to manage real-time big data. Review the benefits of adopting the Zaloni data lake reference architecture. Examine essential data ingestion approaches and the comparative benefits of the Avro and Parquet file formats. Explore data ingestion with Sqoop, and the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data in data lakes. Learn how to derive value from data lakes and describe the benefits of critical roles. Learners will explore the steps involved in the data lifecycle and the significance of archival policies. Finally, learn how to implement an archival policy to transition data between S3 and Glacier, depending on the adopted policies. Close the course with an exercise on data ingestion and archival policies.
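The S3-to-Glacier archival step covered in the course can be sketched as an S3 lifecycle configuration rule. This is a minimal illustration only; the `raw/` prefix and the 30-day threshold are assumptions for the example, not values taken from the course:

```json
{
  "Rules": [
    {
      "ID": "archive-raw-zone-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "raw/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

A configuration like this could be applied to a bucket with the `aws s3api put-bucket-lifecycle-configuration` command, after which objects under the chosen prefix transition to the Glacier storage class once they reach the specified age.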



Expected Duration (hours)
0.6

Lesson Objectives

Data Lake: Architectures & Data Management Principles

  • Course Overview
  • implement Lambda and Kappa architectures to manage real-time big data
  • identify the benefits of adopting Zaloni data lake reference architecture
  • describe data ingestion approaches and compare Avro and Parquet file format benefits
  • demonstrate how to ingest data using Sqoop
  • describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes
  • recognize how to derive value from data lakes and describe the benefits of critical roles
  • describe the steps involved in the data life cycle and the significance of archival policies
  • implement an archival policy to transition between S3 and Glacier, depending on adopted policies
  • ingest data using Sqoop and implement an archival policy to transition from S3 to Glacier, depending on adopted policies
Course Number
it_dsdlipdj_02_enus

Expertise Level
Intermediate