Course details

Streaming Data Architectures: Processing Streaming Data

Overview/Description
Process streaming data with Spark, an analytics engine that can run on Hadoop. In this course, you will discover how to develop Spark applications that work with streaming data and generate output. Topics include configuring a streaming data source with Netcat and writing applications to process the data stream; the effects of Update mode on a stream processing application's output; writing a monitoring application that listens for new files added to a directory; comparing Append mode output with Update mode; limiting the files processed in each trigger; using Spark's Complete mode for output; performing aggregation operations on streaming data with the DataFrame API; and processing streaming data with Spark SQL queries.
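
As a rough illustration of how Update mode and Complete mode differ, the sketch below simulates a running word count over micro-batches in plain Python. It does not use Spark and is not course material; the function name `run_batches` and the sample batches are hypothetical, and the behavior shown is a simplified model that assumes an aggregation query (Complete emits the whole result table after each batch, Update emits only the rows that changed).

```python
from collections import Counter


def run_batches(batches, mode):
    """Simulate a running word count over micro-batches (simplified model).

    mode="complete": emit the entire result table after every batch.
    mode="update":   emit only the rows whose count changed in this batch.
    """
    if mode not in ("complete", "update"):
        raise ValueError("mode must be 'complete' or 'update'")

    totals = Counter()   # running aggregate kept across batches
    outputs = []         # what the sink would receive, one dict per batch

    for batch in batches:
        # Count the words that arrived in this micro-batch only.
        changed = Counter(word for line in batch for word in line.split())
        totals.update(changed)

        if mode == "complete":
            emitted = dict(totals)
        else:
            emitted = {word: totals[word] for word in changed}
        outputs.append(emitted)

    return outputs


# Hypothetical input: two micro-batches of text lines.
batches = [["spark streaming"], ["spark sql"]]
complete_out = run_batches(batches, "complete")
update_out = run_batches(batches, "update")
```

In this model the second Complete batch repeats the unchanged `streaming` row, while the second Update batch contains only `spark` and `sql`.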

Expected Duration (hours)
0.9

Lesson Objectives

  • Course Overview
  • install the latest available version of PySpark
  • configure a streaming data source using Netcat and write an application to process the stream
  • describe the effects of using the Update mode for the output of your stream processing application
  • write an application to listen for new files being added to a directory and process them as soon as they come in
  • compare the Append output to the Update mode and distinguish between the two
  • develop applications that limit the files processed in each trigger and use Spark's Complete mode for the output
  • perform aggregation operations on streaming data using the DataFrame API
  • work with Spark SQL in order to process streaming data using SQL queries
  • define and apply standard, re-usable transformations for streaming data
  • recall the key ways to use Spark for streaming data and explore the ways to process streams and generate output

Course Number
it_dssdardj_02_enus

Expertise Level
Intermediate