Micah Wylde
micahw.com
@micahw.com
Co-founder arroyo.dev, building next-gen streaming systems. Prev Splunk, Lyft, Sift, Quantcast.
Arroyo is sitting at 3,999 stars... who's going to put us over the top github.com/ArroyoSystem...
March 1, 2025 at 12:05 AM
14/ This can have a huge effect on the number of tasks in a large, complex pipeline. In a simple example pipeline, it reduces the number of operators from 10 to 3. Multiply that reduction by the parallelism to get the total number of tasks eliminated.
December 5, 2024 at 9:34 PM
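The arithmetic in post 14/ can be sketched in a few lines. This is only an illustration: it assumes every operator runs at the same parallelism and that each parallel subtask counts as one task. The parallelism of 32 and the function name `total_tasks` are hypothetical and do not come from the thread.

```python
def total_tasks(num_operators: int, parallelism: int) -> int:
    """Tasks scheduled, assuming one task per operator per parallel subtask."""
    return num_operators * parallelism


parallelism = 32  # hypothetical value, not from the thread

before = total_tasks(10, parallelism)  # 10 operators without chaining
after = total_tasks(3, parallelism)    # 3 operators after chaining
saved = before - after

print(f"before={before} after={after} saved={saved}")
```

Under these assumptions, the saving grows linearly with parallelism: every extra parallel subtask removes another 7 tasks.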
8/ For example, a group-by query will need to get all of the records for a particular key on the same subtask of an operator so that they can be processed together. We call this a shuffle edge.
December 5, 2024 at 9:33 PM
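The shuffle edge described in post 8/ can be illustrated with a small hash-partitioning sketch. It is not Arroyo's implementation: the CRC32 hash, the function names `subtask_for` and `shuffle`, and the sample records are all assumptions chosen to show the idea that records with the same key always reach the same subtask.

```python
import zlib
from collections import defaultdict


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a downstream subtask from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def shuffle(records, parallelism):
    """Route (key, value) records so that equal keys land on the same subtask."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[subtask_for(key, parallelism)].append((key, value))
    return partitions


# Hypothetical input: (user, amount) events arriving in arbitrary order.
records = [("alice", 3), ("bob", 5), ("alice", 7), ("carol", 1), ("bob", 2)]
partitions = shuffle(records, parallelism=4)

for subtask, items in sorted(partitions.items()):
    print(subtask, items)
```

Because the routing depends only on the key, a group-by operator on each subtask sees every record for the keys it owns and can aggregate them locally.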
2/ What is operator chaining? First we need to understand a bit about the physical structure of a stream processing pipeline. A user-supplied SQL query defining a pipeline goes through several stages of transformation, from SQL text, to a logical plan, and finally to a physical dataflow graph.
December 5, 2024 at 9:30 PM