Amazon Kinesis is a family of services provided by Amazon Web Services (AWS) for processing and analyzing real-time streaming data at a large scale. Amazon Kinesis is composed of four main services: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.
Kinesis Overview
- Kinesis is a managed “data streaming” service.
- Typical architecture: sources such as clickstreams, IoT devices, and metrics & logs send data to Amazon Kinesis, which processes and analyzes the streaming data in real time at large scale; the results can then be pushed to destinations such as Amazon S3 and Amazon Redshift.
- Great for application logs, metrics, IoT, clickstreams
- Great for “real-time” big data
- Great for stream-processing frameworks (Spark, NiFi, etc.)
- Data is automatically and synchronously replicated across 3 Availability Zones (AZs).
- Kinesis Data Streams: low latency streaming ingest at scale.
- Kinesis Data Analytics: perform real-time analytics on streams using SQL.
- Kinesis Data Firehose: load streams into S3, Redshift, Elasticsearch & Splunk.
Kinesis Streams Overview
- Streams are divided into ordered shards (partitions)
Producers --> Shard 1/Shard 2/Shard 3 --> Consumers
- Data retention is 24 hours by default and can be extended up to 365 days
- Ability to reprocess / replay data
- Multiple applications can consume the same stream
- Real-time processing, with throughput that scales by adding shards
- Once data is inserted in Kinesis, it can’t be deleted (immutability); records only expire when the retention period ends
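To make shards, ordering, and replay more concrete, here is a minimal in-memory sketch in Python. It assumes the documented Kinesis behavior that a record's partition key is MD5-hashed into a 128-bit hash key and that each shard owns a contiguous range of hash keys. The even split of that range and the names `even_shard_ranges`, `shard_for_key`, and `ToyStream` are hypothetical illustrations, not the AWS SDK; sequence numbers and retention expiry are not modeled.

```python
import hashlib
from collections import defaultdict

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit integers


def even_shard_ranges(shard_count):
    """Split the hash-key space into equal, contiguous ranges (assumed even split)."""
    step = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """MD5-hash the partition key to a 128-bit integer and return the owning shard index."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for shard_id, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return shard_id
    raise ValueError("hash key outside all shard ranges")


class ToyStream:
    """Hypothetical in-memory stand-in: each shard is an append-only, ordered list."""

    def __init__(self, shard_count):
        self.ranges = even_shard_ranges(shard_count)
        self.shards = defaultdict(list)

    def put_record(self, partition_key, data):
        shard_id = shard_for_key(partition_key, self.ranges)
        self.shards[shard_id].append(data)  # appended only, never modified or deleted
        return shard_id

    def read(self, shard_id, start=0):
        # Reading does not remove records, so any consumer can re-read (replay) from any position.
        return list(self.shards[shard_id][start:])


if __name__ == "__main__":
    stream = ToyStream(shard_count=3)
    for i in range(5):
        stream.put_record("user-42", f"click-{i}")
    shard = stream.put_record("user-42", "click-5")
    print(stream.read(shard))  # ['click-0', 'click-1', 'click-2', 'click-3', 'click-4', 'click-5']
```

Because every record with the same partition key hashes to the same shard, their relative order is preserved; and because reads never remove data, a second application (or the same one, later) can replay the shard from the beginning.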
Kinesis Producers & Consumers
KINESIS PRODUCERS
- AWS SDK: simple producer.
- Kinesis Producer Library (KPL): batching, compression, retries (C++, Java).
- Kinesis Agent:
  - Monitors log files and sends them to Kinesis directly.
  - Can write to Kinesis Data Streams and Kinesis Data Firehose.
KINESIS CONSUMERS
- AWS SDK: simple consumer.
- Lambda: through event source mapping.
- Kinesis Client Library (KCL): checkpointing, coordinated reads.
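The idea behind KCL checkpointing can be pictured with a small in-memory sketch. Here a checkpoint is simply the index of the next record to read in a shard; the real KCL checkpoints sequence numbers and keeps them in a DynamoDB lease table per application. `CheckpointStore` and `consume` are hypothetical names for illustration, not KCL APIs.

```python
class CheckpointStore:
    """Hypothetical stand-in for durable checkpoint storage (KCL uses a DynamoDB table)."""

    def __init__(self):
        self._checkpoints = {}

    def get(self, app_name, shard_id):
        # Position of the next record to process; 0 if this app has never checkpointed.
        return self._checkpoints.get((app_name, shard_id), 0)

    def set(self, app_name, shard_id, position):
        self._checkpoints[(app_name, shard_id)] = position


def consume(shard_records, shard_id, app_name, store, handler, max_records=None):
    """Process records after the last checkpoint, checkpointing after each one.

    max_records limits how many records are handled in this call, which lets us
    simulate a consumer that stops (e.g. crashes) part-way through a shard.
    Returns the number of records processed.
    """
    start = store.get(app_name, shard_id)
    end = len(shard_records)
    if max_records is not None:
        end = min(end, start + max_records)
    for position in range(start, end):
        handler(shard_records[position])
        store.set(app_name, shard_id, position + 1)  # next position to read
    return end - start


if __name__ == "__main__":
    records = [f"r{i}" for i in range(5)]
    store = CheckpointStore()
    seen = []
    consume(records, 0, "billing", store, seen.append, max_records=2)  # stops after 2
    consume(records, 0, "billing", store, seen.append)  # resumes from the checkpoint
    print(seen)  # ['r0', 'r1', 'r2', 'r3', 'r4']
```

Each application name keeps its own checkpoint, which is also what lets several applications consume the same stream independently. Checkpointing after every record is the simplest choice; checkpointing less often reduces writes to the store but means more records may be reprocessed after a restart.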