Programming Exercises
This training provides programming exercises that teach how to implement scalable data analysis programs with Apache Flink’s APIs.
Note: The programming exercises assume a working development environment and some basic knowledge of Flink’s programming primitives.
DataSet API Exercises
The DataSet API is a programming model for scalable batch processing. It offers Java and Scala APIs that are feature-equivalent and very similar.
The exercises are ordered by increasing difficulty.
Mail Count
Count the number of mails in the archive of Flink’s developer mailing list per email address and month.
Instructions | DataSet API: Mail Count |
Data Set | Mail Data Set |
API Features | Map, GroupBy, GroupReduce |
Reference Solution | Java: MailGraphExercise.java |
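The logic behind the Map, GroupBy, and GroupReduce steps of this exercise can be sketched in plain Java, without the Flink API. This is not the reference solution. The class name `MailCountSketch`, the `timestamp` and `sender` formats, and the `address|month` key layout are all assumptions made for illustration; the actual fields of the Mail Data Set are described on its own page.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MailCountSketch {

    // Map step: reduce a raw (timestamp, sender) record to an "address|month" key.
    // Assumed formats (hypothetical): timestamp "yyyy-MM-dd-HH:mm:ss",
    // sender either "Name <address>" or a bare address.
    static String toKey(String timestamp, String sender) {
        String month = timestamp.substring(0, 7); // keeps "yyyy-MM"
        int open = sender.lastIndexOf('<');
        int close = sender.lastIndexOf('>');
        String address = (open >= 0 && close > open)
                ? sender.substring(open + 1, close)
                : sender.trim();
        return address + "|" + month;
    }

    // GroupBy + GroupReduce steps: group the keys and count the size of each group.
    static Map<String, Long> countPerAddressAndMonth(List<String[]> mails) {
        return mails.stream()
                .map(m -> toKey(m[0], m[1]))
                .collect(Collectors.groupingBy(k -> k, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the real mail archive.
        List<String[]> mails = List.of(
                new String[] {"2014-09-10-18:39:40", "Alice <alice@example.org>"},
                new String[] {"2014-09-12-08:00:00", "Alice <alice@example.org>"},
                new String[] {"2014-10-01-09:15:00", "bob@example.org"});
        countPerAddressAndMonth(mails)
                .forEach((key, count) -> System.out.println(key + " -> " + count));
    }
}
```

In a Flink program the same three steps would be expressed with the DataSet operators listed under API Features, and the grouping would run in parallel across partitions rather than in a single in-memory map.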
Reply Graph
Extract a graph of reply connections from the mails of Apache Flink’s developer mailing list archives. A reply connection is defined by two emails where one was sent as a reply to the other. By extracting the sender email addresses of both mails in a reply connection, we can construct a graph that allows us to analyze the Flink community.
Instructions | DataSet API: Reply Graph |
Data Set | Mail Data Set |
API Features | Map, Join, GroupBy, GroupReduce |
Reference Solution | Java: ReplyGraphExercise.java |
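The join at the heart of this exercise can be sketched in plain Java, without the Flink API. This is not the reference solution. The class name `ReplyGraphSketch` and the record layout `{messageId, senderAddress, repliedToId}` are assumptions for illustration, and counting how often each edge occurs is an added assumption as well, since the description above only asks for the connections themselves.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplyGraphSketch {

    // Each mail is {messageId, senderAddress, repliedToId} (hypothetical layout);
    // repliedToId is "" when the mail is not a reply.
    static Map<String, Integer> replyEdges(List<String[]> mails) {
        // Build side of the join: message ID -> sender address.
        Map<String, String> senderById = new HashMap<>();
        for (String[] m : mails) {
            senderById.put(m[0], m[1]);
        }

        // Probe side: match each reply with the mail it answers, then group
        // identical "replier -> original sender" edges and count them.
        Map<String, Integer> edges = new TreeMap<>();
        for (String[] m : mails) {
            String originalSender = senderById.get(m[2]);
            if (originalSender != null) { // skip non-replies and unknown parents
                edges.merge(m[1] + " -> " + originalSender, 1, Integer::sum);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the real mail archive.
        List<String[]> mails = List.of(
                new String[] {"m1", "alice@example.org", ""},
                new String[] {"m2", "bob@example.org", "m1"},
                new String[] {"m3", "alice@example.org", "m2"});
        replyEdges(mails).forEach((edge, n) -> System.out.println(edge + " (" + n + ")"));
    }
}
```

The in-memory hash map stands in for a distributed join: in a Flink program the mail data set would be joined with itself on the replied-to ID and the message ID, and the resulting address pairs would then be grouped and reduced.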
You can find more exercises with solutions at the Flink Training page.