The power of Apache Flink

According to a recent report by IBM Marketing Cloud, “90 percent of the data in the world today has been created in the last two years alone, creating 2.5 quintillion bytes of data every day and with new devices, sensors and technologies emerging, the data growth rate will likely accelerate even more”. The amount of data has grown significantly over the past few years, and with it the need to process that data across many machines. This is where distributed data processing frameworks come into play. It all started back in 2011, when the 1.0 release of Apache Hadoop came out. Apache Flink, whose 1.0 release followed in March 2016, is a newer face in the field and has been described as the next revolution in distributed data processing.

Apache Flink is a top-level Apache project that unifies distributed stream and batch processing. It is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. At its core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.
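
The phrase "stateful computations over unbounded and bounded data streams" can be illustrated without Flink itself. The following is a minimal plain-Python sketch and not Flink's API: the names `running_count` and `sensor_readings` are hypothetical and chosen only for this example. It shows one operator that keeps per-key state and runs unchanged over a finite (bounded) input and an endless (unbounded) one, which is the intuition behind treating batch processing as a special case of stream processing.

```python
from collections import defaultdict
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple


def running_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful operator: emit (key, count so far) for every incoming event."""
    state: Dict[str, int] = defaultdict(int)  # per-key state kept between events
    for key in events:
        state[key] += 1
        yield key, state[key]


def sensor_readings() -> Iterator[str]:
    """Hypothetical unbounded source: alternates two sensor ids forever."""
    for i in count():
        yield "sensor-a" if i % 2 == 0 else "sensor-b"


# Bounded input ("batch"): the stream ends, so all results can be collected.
bounded_result = list(running_count(["a", "b", "a"]))

# Unbounded input ("streaming"): results are produced incrementally,
# so only a prefix of the output is taken here.
unbounded_prefix = list(islice(running_count(sensor_readings()), 4))
```

This sketch runs in a single process. A real engine such as Flink additionally partitions the state by key across machines and makes it fault tolerant, which the example does not attempt.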