Version: 0.2.x

Intro to Stream Processing

"Turning the database inside out" is an influential article in the data engineering space, building on the log-centric ideas behind Kafka. Since then, implementations like Redpanda and Redis Streams have emerged, spurring a real-time data processing ecosystem.

Vs event-based programming

Similar to event-based programming, stream processing is a programming paradigm that aims to handle events in near real-time, or as soon as they happen. One way to distinguish the two is frequency. A stream is a continuous sequence of events with high throughput: instead of opening many short-lived connections, you simply keep one connection open and wait for events to arrive.

Vs batch processing

Stream processing can be thought of as batch processing with extremely small batch sizes. The batch size does not necessarily have to be one. Messages can be micro-batched by the millisecond or by the second, according to your latency requirements and processing power. But gone are the days of "it may take up to 24 hours for this change to reflect"!
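To make the idea of micro-batching concrete, here is a minimal sketch using only the standard library. The `Message` type and the `micro_batch` function are hypothetical names chosen for illustration, not part of any library. The sketch assumes messages arrive ordered by timestamp, and closes a batch when either a time window has elapsed or a size limit is reached.

```rust
/// A message tagged with its arrival time in milliseconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub timestamp_ms: u64,
    pub payload: String,
}

/// Groups messages into micro-batches. A batch is closed when the next
/// message arrives `window_ms` or more after the batch's first message,
/// or when the batch already holds `max_size` messages.
/// Assumes `messages` is ordered by timestamp and `max_size >= 1`.
pub fn micro_batch(messages: &[Message], window_ms: u64, max_size: usize) -> Vec<Vec<Message>> {
    let mut batches = Vec::new();
    let mut current: Vec<Message> = Vec::new();

    for msg in messages {
        if let Some(first) = current.first() {
            let elapsed = msg.timestamp_ms.saturating_sub(first.timestamp_ms);
            if elapsed >= window_ms || current.len() >= max_size {
                // Close the current batch and start a fresh one.
                batches.push(std::mem::take(&mut current));
            }
        }
        current.push(msg.clone());
    }

    // Flush whatever is left once the input ends.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

A smaller `window_ms` lowers latency at the cost of more, smaller batches; a larger one trades latency for throughput. In a live pipeline the final flush would be driven by a timer rather than by the end of the input.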

Why Rust?

We want to construct the best stream processing platform where Rust's unique characteristics truly shine:

Multi-threaded async

Unlike languages whose async execution is confined to a single thread, Rust's async runtimes (such as Tokio and async-std) can schedule tasks across multiple threads. This lets you scale a single process across as many threads as needed to fully utilize the CPU for maximum concurrency. You do not need to set up an external queuing system!
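As a rough illustration of scaling within one process without an external queue, the sketch below fans work out to several workers over in-process channels. To stay dependency-free it uses plain OS threads and `std::sync::mpsc` rather than an async runtime; with a multi-threaded async runtime, spawned tasks would play the role of these workers. The function name `process_in_parallel` and the squaring workload are hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Squares every input using `workers` threads fed by an in-process channel.
/// Results are sorted before returning, since completion order is not deterministic.
pub fn process_in_parallel(inputs: Vec<u64>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let (out_tx, out_rx) = mpsc::channel::<u64>();
    // The standard receiver is single-consumer, so workers share it behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // so it is not held while the job is being processed.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(n) => out_tx.send(n * n).unwrap(),
                    Err(_) => break, // sender dropped: no more jobs
                }
            })
        })
        .collect();
    drop(out_tx); // only the workers hold output senders now

    for n in inputs {
        job_tx.send(n).unwrap();
    }
    drop(job_tx); // signal end of input

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<u64> = out_rx.iter().collect();
    results.sort_unstable();
    results
}
```

The channel here is the "queue": producers and consumers live in the same process, so adding capacity means raising `workers` rather than deploying another service.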

Predictable latency

Rust has no garbage collector, so there is no unpredictable point in time where a collector kicks in and causes jitter. In a long pipeline, such jitter tends to propagate and amplify downstream. Rust is not automatically low-latency, though: you still need to spend considerable effort on optimization. But you will have a good starting point.

Self-contained

Unlike many other languages, the recommended way of packaging Rust programs is to statically link everything into one executable, often only a few megabytes in size. No installation or warm-up is needed: the program spins up immediately, which is a bonus for stream processing.

Low resource usage

Like other compiled languages, Rust typically uses considerably less memory than a VM-based language. And without the need for JIT compilation, Rust also has less CPU overhead.

Long-running safe

Again, without GC, Rust programs are less susceptible to "slow memory bloat over a period of days" (technically, it's not a leak). There is less risk of out-of-memory crashes, so you don't have to "restart the process every week". That said, you still have to be careful about heap allocations.
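One common way to be careful about heap allocations in a long-running loop is to reuse a buffer instead of allocating a new one per message. The sketch below is only an illustration under that assumption; `encode_all` and the `id:body` format are hypothetical. It relies on `String::clear`, which empties the string but keeps its allocated capacity.

```rust
use std::fmt::Write;

/// Encodes each event as "id:body" into a caller-provided scratch buffer
/// and returns the total number of bytes produced.
/// The buffer is cleared, not reallocated, between events.
pub fn encode_all(events: &[(u64, &str)], scratch: &mut String) -> usize {
    let mut total = 0;
    for (id, body) in events {
        scratch.clear(); // drops the contents but keeps the capacity
        write!(scratch, "{}:{}", id, body).unwrap();
        // A real pipeline would hand `scratch` to a sink here.
        total += scratch.len();
    }
    total
}
```

Once the buffer has grown to fit the largest message seen so far, later iterations perform no further allocations for encoding.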

Ecosystem

Finally, Rust has a great ecosystem of async programming libraries: networking libraries built on async IO, lock-free channels, and other data structures that make async programming ergonomic and fun.

Without further ado, let's get started!