Buffers in a data pipeline

A buffer is a reserved memory area where data is stored temporarily until it is processed. Buffers are typically used when the rate at which data is received differs from the rate at which it can be processed. A buffer often smooths out this timing difference by implementing a queue algorithm in memory: data is written into the queue at one rate and read from it at another. A pipeline connects a producer of data to a consumer of data, and the buffer sits between them.
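Here is a minimal sketch of the rate mismatch, using Python's standard `queue.Queue` as the in-memory buffer. The item count, the sleep times that stand in for a fast producer and a slower consumer, and the `None` end-of-stream marker are all illustrative assumptions. The queue is bounded (`maxsize=5`), so `put()` blocks whenever the buffer is full. This throttles the producer to the consumer's pace instead of letting it overrun the consumer.

```python
import queue
import threading
import time

# A bounded in-memory buffer: put() blocks while it holds maxsize items,
# so a fast producer is slowed down instead of overwhelming the consumer.
buffer = queue.Queue(maxsize=5)
SENTINEL = None  # end-of-stream marker chosen for this sketch
processed = []

def producer(n_items):
    """Write items into the buffer at a fast rate."""
    for i in range(n_items):
        buffer.put(i)          # blocks while the buffer is full
        time.sleep(0.001)      # fast producer
    buffer.put(SENTINEL)       # tell the consumer there is nothing more

def consumer():
    """Read items from the buffer at a slower rate."""
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        time.sleep(0.005)      # slower consumer
        processed.append(item)

t_prod = threading.Thread(target=producer, args=(20,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()

print(processed)  # [0, 1, 2, ..., 19]: every item, in FIFO order
```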

The buffer is temporary storage. In practice, it is often a message queue or streaming system such as Apache Kafka, RabbitMQ, AWS Kinesis, Redis, or Google Pub/Sub (publish-subscribe pattern).

These systems are message queues: you put data in on one side and take it out on the other. The idea behind a buffer is to have an intermediate system that holds the incoming data.
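A toy in-memory broker can illustrate the publish-subscribe pattern mentioned above. `InMemoryBroker`, `subscribe`, `publish`, and `poll` are hypothetical names for this illustration only. They are not the API of Kafka, Pub/Sub, or any other system listed. The sketch shows one key idea: each subscriber has its own queue, so every subscriber receives every message and reads it at its own pace.

```python
from collections import defaultdict, deque

class InMemoryBroker:
    """Toy publish-subscribe broker (hypothetical, not a real library API).

    Each subscriber gets its own FIFO queue per topic, so every message
    published to a topic is delivered to every subscriber of that topic.
    """

    def __init__(self):
        # topic -> {subscriber name -> deque of pending messages}
        self._topics = defaultdict(dict)

    def subscribe(self, topic, subscriber):
        self._topics[topic][subscriber] = deque()

    def publish(self, topic, message):
        # Put the message in on one side: fan it out to every subscriber queue.
        for pending in self._topics[topic].values():
            pending.append(message)

    def poll(self, topic, subscriber, max_messages=10):
        # Take messages out on the other side, at the subscriber's own pace.
        pending = self._topics[topic][subscriber]
        batch = []
        while pending and len(batch) < max_messages:
            batch.append(pending.popleft())
        return batch

broker = InMemoryBroker()
broker.subscribe("clicks", "storage")
broker.subscribe("clicks", "analytics")
for event in ["a", "b", "c"]:
    broker.publish("clicks", event)

print(broker.poll("clicks", "storage"))                    # ['a', 'b', 'c']
print(broker.poll("clicks", "analytics", max_messages=2))  # ['a', 'b']
print(broker.poll("clicks", "analytics"))                  # ['c']
```

In this toy version, a subscriber only receives messages published after it subscribes. Real systems differ in how long they retain messages and in how late subscribers catch up.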

How does it work?

For example:

  • An API receives the incoming data.
  • The API publishes the data into the message queue, which acts as the buffer.
  • The data stays buffered there until the processing step picks it up.
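The three steps in this example can be sketched in Python. The sketch assumes a hypothetical `handle_api_request` function that stands in for the API endpoint, and a standard-library `queue.Queue` that stands in for the message queue. It shows that the API only publishes and returns. The processing step picks records up later, in batches of whatever size it chooses.

```python
import json
import queue

# The message queue that sits between the API and the processing step.
message_queue = queue.Queue()

def handle_api_request(body: str) -> dict:
    """Hypothetical API handler: parse, publish into the buffer, return."""
    record = json.loads(body)
    message_queue.put(record)        # publish into the message queue
    return {"status": "accepted"}    # the caller does not wait for processing

def process_pending(max_items: int) -> list:
    """Processing step: pick up at most max_items buffered records."""
    results = []
    while len(results) < max_items:
        try:
            record = message_queue.get_nowait()
        except queue.Empty:
            break
        results.append({**record, "processed": True})
    return results

responses = [handle_api_request(json.dumps({"id": i})) for i in range(3)]
print(responses)                     # three {'status': 'accepted'} responses
print(message_queue.qsize())         # 3 records waiting in the buffer
print(process_pending(max_items=2))  # [{'id': 0, 'processed': True}, {'id': 1, 'processed': True}]
print(message_queue.qsize())         # 1
```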

Without a buffer, you can run into problems when you write incoming data directly into a store or process it directly as it arrives. Incoming data often comes in peaks. A burst larger than the store or the processing step can handle can block the system or make processing take too long.
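A quick simulation shows what a buffer does during a peak. The traffic numbers and the store's capacity of 2,000 records per second are made up for illustration. At each step, arriving records land in the buffer and the consumer removes at most `capacity` of them. The function returns the buffer depth after each step.

```python
def simulate_backlog(arrivals, capacity):
    """Return the buffer depth after each time step.

    arrivals: records arriving per step (e.g. per second)
    capacity: records the store or processing step can handle per step
    """
    backlog = 0
    depths = []
    for arriving in arrivals:
        backlog += arriving                # everything lands in the buffer first
        backlog -= min(backlog, capacity)  # the consumer drains what it can
        depths.append(backlog)
    return depths

# Hypothetical traffic: 1,000 records/s with a 3-second peak of 5,000 records/s,
# against a store that can write 2,000 records/s.
traffic = [1000, 1000] + [5000] * 3 + [1000] * 9
print(simulate_backlog(traffic, capacity=2000))
# [0, 0, 3000, 6000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000, 1000, 0]
```

During the three peak seconds, 3,000 records per second exceed the store's capacity. Without a buffer, those records would have to be rejected or would block the writer. With a buffer, the backlog grows to 9,000 and is then drained at 1,000 records per second once traffic returns to normal. The buffer itself must be large enough to hold the peak backlog.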

With a buffer, the storage and analytics processes can take out only as much data as they can process at a time. Buffers are also useful building blocks for multi-stage pipelines. For example, one process can read data from Kafka, pre-process it, and write the result back into Kafka. Another process can then read the processed data and write it into a store such as HDFS, HBase, Amazon S3, or DynamoDB, which are well suited to storing big data.
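The Kafka example above can be sketched with two in-memory deques standing in for Kafka topics and a plain dict standing in for the final store. This is not Kafka client code. `take`, `preprocess_step`, and `storage_step` are hypothetical helpers, and the cleaning rule (trim and lowercase user names, parse amounts) is invented for illustration. Each stage pulls at most `batch_size` records per call, so a consumer takes out only as much data as it can process.

```python
from collections import deque

# Two in-memory "topics" stand in for Kafka topics in this sketch.
raw_topic = deque()
processed_topic = deque()
store = {}  # stands in for a store such as HDFS, HBase, S3, or DynamoDB

def take(topic, max_records):
    """Pull at most max_records from a topic: only as much as we can handle."""
    batch = []
    while topic and len(batch) < max_records:
        batch.append(topic.popleft())
    return batch

def preprocess_step(batch_size=2):
    # Stage 1: read raw data, clean it, and write it back to another topic.
    for record in take(raw_topic, batch_size):
        processed_topic.append({"user": record["user"].strip().lower(),
                                "amount": float(record["amount"])})

def storage_step(batch_size=2):
    # Stage 2: read processed data and aggregate it into the store.
    for record in take(processed_topic, batch_size):
        store[record["user"]] = store.get(record["user"], 0.0) + record["amount"]

raw_topic.extend([
    {"user": " Alice ", "amount": "10"},
    {"user": "BOB", "amount": "5.5"},
    {"user": "alice", "amount": "2"},
])

# Each stage runs at its own pace; here we simply alternate until both are empty.
while raw_topic or processed_topic:
    preprocess_step()
    storage_step()

print(store)  # {'alice': 12.0, 'bob': 5.5}
```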

By exposing the stored data through APIs, you can then enable other developers to build their own UIs on top of it.

