Apache Flink® is an open-source platform for distributed stream and batch data processing. It offers expressive APIs for defining data flow programs and a robust, scalable engine for executing them.

Goals and scope of this training

This training teaches how to implement scalable data analysis programs using Apache Flink. The focus is on Flink’s DataSet API for batch processing.

Specifically, this training teaches:

  1. How to set up an environment to develop Flink programs
    • Set up all required software and tools and configure an IDE
    • Create a Maven project for Flink programs and import it into an IDE
    • Execute and debug a Flink program locally in an IDE
  2. How to implement Flink programs using the DataSet API
    • Exercises for Flink batch programs
  3. How to package, execute, and monitor Flink programs on running Flink systems

After the training, you should have a general understanding of how to use Flink and know where to ask questions and get answers.