Wednesday 1 February 2023

AWS Kinesis Overview

Amazon Kinesis is a family of services provided by Amazon Web Services (AWS) for processing and analyzing real-time streaming data at scale. It is composed of four main services: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.
- Kinesis is a managed “data streaming” service
- Great for application logs, metrics, IoT, clickstreams
- Great for “real-time” big data
- Great for stream processing frameworks (Spark, NiFi, etc.)
- Data is automatically replicated synchronously across 3 Availability Zones (AZs).
- Kinesis Data Streams: low latency streaming ingest at scale.
- Kinesis Data Analytics: perform real-time analytics on streams using SQL.
- Kinesis Data Firehose: load streams into S3, Redshift, Elasticsearch & Splunk.

At a high level, Amazon Kinesis looks like this: sources such as clickstreams, IoT devices, and metrics and logs send data into Amazon Kinesis, which processes and analyzes the real-time stream at scale. The results can then be pushed to destinations such as Amazon S3 and Redshift.

Kinesis Streams Overview

- Streams are divided into ordered Shards / Partitions

Producers --> Shard 1/Shard 2/Shard 3 --> Consumers

- Data retention is 24 hours by default, can go up to 365 days
- Ability to reprocess / replay data
- Multiple applications can consume the same stream
- Real-time processing with scalable throughput
- Once data is inserted into Kinesis, it can’t be deleted (immutability)
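
To make the Producers --> Shards step more concrete: Kinesis Data Streams routes each record by MD5-hashing its partition key into a 128-bit hash key space, and each shard owns a contiguous range of that space. The sketch below imitates that routing locally. The helper names `even_shard_ranges` and `shard_for_key`, the even split of the hash space, and the sample keys are assumptions for illustration, not part of any AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, near-equal ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all ranges")


ranges = even_shard_ranges(3)
for key in ["device-1", "device-2", "device-1"]:
    # The same partition key always lands on the same shard,
    # which is what keeps records for one key in order.
    print(key, "-> shard", shard_for_key(key, ranges) + 1)
```

Because ordering is only guaranteed within a shard, choosing a partition key with many distinct values (for example a device ID) helps spread load across shards while keeping each device's records in sequence.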

Kinesis Producers & Consumers


KINESIS PRODUCERS
- AWS SDK: simple producer.
- Kinesis Producer Library (KPL): batching, compression, retries (C++, Java).
- Kinesis Agent: monitors log files and sends them to Kinesis directly; can write to Kinesis Data Streams and Kinesis Data Firehose.
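
As a rough illustration of the "AWS SDK: simple producer" option, the sketch below batches events for the boto3 Kinesis client's `put_records` call and resends only the entries that came back with an `ErrorCode`. The function names (`to_entries`, `put_batch`, `send_events`), the `device_id` field used as the partition key, and the stream name in the usage comment are hypothetical. The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import json

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def to_entries(events, key_field="device_id"):
    """Convert dict events into the entry shape PutRecords expects."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def put_batch(client, stream_name, entries, max_attempts=3):
    """Send one batch, resending only the entries Kinesis rejected.

    Returns the entries that still failed after max_attempts.
    """
    pending = list(entries)
    for _ in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        # Results come back in request order; failed ones carry an ErrorCode.
        pending = [
            entry
            for entry, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
    return pending


def send_events(client, stream_name, events, max_attempts=3):
    """Chunk events into batches of MAX_BATCH and return any that failed."""
    entries = to_entries(events)
    failed = []
    for i in range(0, len(entries), MAX_BATCH):
        batch = entries[i:i + MAX_BATCH]
        failed.extend(put_batch(client, stream_name, batch, max_attempts))
    return failed


# Usage against a real stream (assumes boto3 is installed and configured):
#   import boto3
#   client = boto3.client("kinesis")
#   send_events(client, "my-stream", [{"device_id": "d1", "temp": 21.5}])
```

This sketch retries immediately for brevity; a production producer would normally wait with backoff between attempts, which is part of what the KPL handles for you.
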
KINESIS CONSUMERS
- AWS SDK: simple consumer.
- Lambda: through an event source mapping.
- KCL: checkpointing, coordinated reads.
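
For the Lambda consumer option, an event source mapping invokes the function with a batch of records, and each record's payload arrives base64-encoded under `record["kinesis"]["data"]`. The handler below is a minimal sketch that decodes such a batch. It assumes producers wrote JSON payloads, and it returns the decoded list only so the result can be inspected in a local test.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record delivered by the event source mapping."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Lambda delivers the record payload base64-encoded.
        payload = base64.b64decode(kinesis["data"])
        decoded.append(
            {
                "partition_key": kinesis["partitionKey"],
                "sequence_number": kinesis["sequenceNumber"],
                # Assumption: the producer serialized the payload as JSON.
                "body": json.loads(payload),
            }
        )
    return decoded
```

Lambda tracks the position in each shard on the function's behalf, whereas a KCL application records its own checkpoints.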