Devoxx Belgium 2018
from Monday 12 November to Friday 16 November 2018.
Gerard Maas is a senior software engineer at Lightbend, where he contributes to the Fast Data Platform and focuses on the integration of stream processing technologies. Previously, he held leading roles at several startups and large enterprises, building data science governance, cloud-native IoT platforms, and scalable APIs. He is the coauthor of Stream Processing with Apache Spark from O’Reilly. Gerard is a frequent speaker and contributes to small and large open source projects. In his free time, he tinkers with drones and builds personal IoT projects.
Fast Data architectures answer the growing need for enterprises to process and analyze continuous streams of data, accelerating decision making and enabling faster responses to changing conditions in their markets. Apache Spark is a popular framework for data analytics. Its streaming capabilities are exposed through two APIs: the low-level Spark Streaming and the more declarative Structured Streaming, which builds on recent advances in Spark SQL query optimization and code generation.
After a quick introduction to both APIs, we will discuss their virtues, capabilities and key differences:
- How to get started: ease of development
- How to deal with time: both processing time and event time
- How to deal with state: local and distributed state, and its relation to time
- How to migrate: functional coding strategies
- How to do ML: machine learning capabilities
Using practical examples from real applications, we will offer guidance on how to choose between the two APIs, or even combine them, to implement functional and resilient streaming pipelines.