Spark Structured Streaming
Spark Structured Streaming is a scalable and fault-tolerant stream processing engine built on top of the Apache Spark framework. It allows you to process and analyze real-time data streams using the same APIs and programming constructs as batch processing in Spark. Structured Streaming provides a high-level, declarative API for processing data streams, making it easier for developers to work with streaming data.
With the vast growth of Spark Structured Streaming, Databricks, the tech unicorn behind Apache Spark and Spark Structured Streaming, announced Project LightSpeed in 2022. Project Lightspeed is an umbrella project aimed at improving a couple of key aspects of Spark Streaming:
Performance improvements. Including offset management, log purging, microbatch pipelining, state rebalancing, adaptive query execution, and many more. Enhanced functionalities. Some new functionalities include: multiple stateful operators, stateful processing in Python, dropping duplicates within watermark, and native support over Protobuf serialization. Improved observability. It is important to have metrics and tools for monitoring, debugging and alerting over streaming jobs. Project Lightspeed introduces Python query listener in PySpark to send streaming metrics to external systems. Expanding ecosystem. Project Lightspeed adds new connectors such as Amazon Kenesis and Google Pub/Sub to expand the ecosystem of Spark structured streaming. Project Lightspeed is a significant undertaking, but it has the potential to make Spark Structured Streaming a more powerful and versatile stream processing engine. I am excited to see how it develops in the future.
Reference List
- https://www.risingwave.com/blog/top-7-apache-flink-alternatives-a-deep-dive/
- https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#overview
- https://www.amazon.com/Scaling-Machine-Learning-Spark-Distributed/dp/1098106822/ref=asc_df_1098106822/?tag=hyprod-20&linkCode=df0&hvadid=598233665925&hvpos=&hvnetw=g&hvrand=12428709593327702844&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9009731&hvtargid=pla-1723190686221&psc=1&mcid=c4560dcb5aac36d3ada275b9fa49b8a9&gclid=CjwKCAiAmsurBhBvEiwA6e-WPP7TGSWVshnb0RkzFwSqGAgWubhJcHowzsYkeScjRNVtsg79uCccdxoCUjsQAvD_BwE