This training provides programming exercises that teach you how to implement scalable data analysis programs with Apache Flink’s APIs.

Note: The programming exercises assume a working development environment and some basic knowledge of Flink’s programming primitives.

DataSet API Exercises

The DataSet API is a programming model for scalable batch processing. It offers a Java API and a Scala API, which are feature-equivalent and closely resemble each other.

The exercises are ordered by increasing difficulty.

Mail Count

Count the number of mails in the archive of Flink’s developer mailing list per email address and month.

Instructions: DataSet API: Mail Count
Data Set: Mail Data Set
API Features: Map, GroupBy, GroupReduce
Reference Solution: Java:
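
To make the Map, GroupBy, GroupReduce structure of this exercise concrete, here is a minimal sketch in plain Java. It is not Flink code and not the reference solution: Java streams stand in for the DataSet operators, and the class name MailCountSketch, the Mail class, the timestamp layout (yyyy-MM-dd-HH:mm:ss), and the sender layout (Name <address>) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MailCountSketch {

    // Hypothetical mail record; the real Mail Data Set layout may differ.
    static final class Mail {
        final String timestamp; // assumed format: yyyy-MM-dd-HH:mm:ss
        final String sender;    // assumed format: Name <address>

        Mail(String timestamp, String sender) {
            this.timestamp = timestamp;
            this.sender = sender;
        }
    }

    // Map step, part 1: keep only the "yyyy-MM" prefix of the timestamp.
    static String month(String timestamp) {
        return timestamp.substring(0, 7);
    }

    // Map step, part 2: pull the address out of "Name <address>".
    // Falls back to the trimmed input if no angle brackets are present.
    static String address(String sender) {
        int start = sender.lastIndexOf('<');
        int end = sender.lastIndexOf('>');
        return (start >= 0 && end > start)
                ? sender.substring(start + 1, end)
                : sender.trim();
    }

    // GroupBy + GroupReduce steps: group by (month, address) and count each group.
    // A TreeMap keeps the result sorted by key so the output is deterministic.
    static Map<String, Long> countMails(List<Mail> mails) {
        return mails.stream()
                .map(m -> month(m.timestamp) + " " + address(m.sender))
                .collect(Collectors.groupingBy(key -> key, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Mail> mails = List.of(
                new Mail("2014-09-10-18:10:03", "Alice <alice@example.org>"),
                new Mail("2014-09-12-09:00:00", "Alice <alice@example.org>"),
                new Mail("2014-10-01-12:30:00", "Bob <bob@example.org>"));
        System.out.println(countMails(mails));
    }
}
```

In a Flink program the same three steps would typically be expressed as a map transformation, a grouping on the month and address fields, and a group-reduce that counts the elements of each group.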

Reply Graph

Extract a graph of reply connections from the mails in the archive of Apache Flink’s developer mailing list. A reply connection is defined by two emails, one of which was sent as a reply to the other. By extracting the email addresses of both mails in a reply connection, we can construct a graph that can be used to analyze the Flink community.

Instructions: DataSet API: Reply Graph
Data Set: Mail Data Set
API Features: Map, Join, GroupBy, GroupReduce
Reference Solution: Java:
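
The join at the heart of this exercise can be sketched in plain Java as follows. Again, this is not Flink code and not the reference solution: a hash map plays the role of the join between replies and the mails they answer, and the class name ReplyGraphSketch, the Mail fields (messageId, sender, repliedTo), and the sender layout (Name <address>) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReplyGraphSketch {

    // Hypothetical mail record; the real Mail Data Set layout may differ.
    static final class Mail {
        final String messageId;
        final String sender;    // assumed format: Name <address>
        final String repliedTo; // message ID of the answered mail, or null

        Mail(String messageId, String sender, String repliedTo) {
            this.messageId = messageId;
            this.sender = sender;
            this.repliedTo = repliedTo;
        }
    }

    // Map step: pull the address out of "Name <address>".
    static String address(String sender) {
        int start = sender.lastIndexOf('<');
        int end = sender.lastIndexOf('>');
        return (start >= 0 && end > start)
                ? sender.substring(start + 1, end)
                : sender.trim();
    }

    // Join replies with the mails they answer, then count each
    // (replier -> original sender) edge.
    static Map<String, Integer> replyEdges(List<Mail> mails) {
        // Build side of the join: message ID -> sender address.
        Map<String, String> senderById = new HashMap<>();
        for (Mail m : mails) {
            senderById.put(m.messageId, address(m.sender));
        }

        // Probe side: look up each reply; group and count the resulting edges.
        Map<String, Integer> edges = new TreeMap<>();
        for (Mail m : mails) {
            if (m.repliedTo == null) {
                continue; // not a reply
            }
            String original = senderById.get(m.repliedTo);
            if (original == null) {
                continue; // answered mail is not in the archive (inner-join semantics)
            }
            edges.merge(address(m.sender) + " -> " + original, 1, Integer::sum);
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Mail> mails = List.of(
                new Mail("m1", "Alice <alice@example.org>", null),
                new Mail("m2", "Bob <bob@example.org>", "m1"),
                new Mail("m3", "Alice <alice@example.org>", "m2"),
                new Mail("m4", "Bob <bob@example.org>", "m1"));
        System.out.println(replyEdges(mails));
    }
}
```

Each key in the result is one directed edge of the reply graph, and its value is the number of replies along that edge. Replies to mails that are missing from the input are dropped, which corresponds to an inner join.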

You can find more exercises with solutions at the Flink Training page.