What is the difference between batch and stream processing?

Why Trust Techopedia

The typical answer when someone describes the difference between batch processing and stream processing is that batch data is collected, stored for a period of time, and processed and put to use at regular intervals (e.g. payroll, bank statements) while streaming data is processed and put to use as close to the instant it is generated (think of alerts from sensor data).

While accurate, this answer fails to capture why the difference is important and why companies are moving decisively toward stream processing architectures.

We experience the world as a constant stream of events. We make decisions by comparing this stream of information to our experiences and memories. We perceive and react to threats or recognize and seize opportunities. And often reacting in a timely fashion is rewarding – we avoid the snake bite or grab the best seat at the movie theater. Stream processing more closely reflects this very human mode of experience.

Enterprises ingest as many streams of information as they can handle, look for patterns in the data that represent threats or opportunities as it flows past, and when said patterns emerge, they act. The cost of not acting could be a data breach or a lost revenue opportunity.

Batch processing still works well when you need to process huge amounts of data and the results can be delivered at regular intervals. But if recent trends hold, more of these jobs will move to streaming because companies can't accept the hidden cost of batch any longer and remain competitive.

A great example is insider trading. The cost of detecting someone who is about to execute an insider trade is now much less than the cost of trying to unwind that trade later when batch processing picks it up. Even if the batch process runs every five minutes, that just means you'll find them sooner, not stop them. Ultimately stream vs. batch will show up in the balance sheet and the stock price.

The one potential argument against streaming is that it might not handle the amount of data as cost effectively as batch handles. However, with the advent of systems like Kafka, Flink, and their cloud analogues, such cases are getting rare.

Related Terms

Rob Malnati

Rob Malnati is the COO of thatDot and is a repeat entrepreneur focused on enterprise infrastructure as software. He has defined, evangelized and managed product and go to market strategies for a range of service, software and cloud-based SaaS providers, from startup to IPO, or Fortune 100/500 acquisition.