Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.
Use cases : Business intelligence / analytics / reporting, and analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc.
- Serverless query service to analyse data stored in Amazon S3.
- Uses standard SQL to query files (built on Presto).
- Supports CSV, JSON, ORC, Avro and Parquet.
- Pricing: $5.00 per TB of data scanned.
- Commonly used with Amazon QuickSight for reporting/dashboards.
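Because pricing is tied to data scanned, a rough cost estimate is simple arithmetic. The sketch below is only illustrative: the function name is hypothetical, it assumes 1 TB = 1024^4 bytes, and it ignores any per-query minimums or rounding that the service may apply.

```python
# Rough cost estimate for a query, based on the $5.00 per TB figure above.
# Assumption: 1 TB is counted as 1024**4 bytes; minimum charges are ignored.
TB = 1024 ** 4
PRICE_PER_TB_USD = 5.00


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the approximate cost in USD for scanning `bytes_scanned` bytes."""
    return bytes_scanned / TB * price_per_tb


if __name__ == "__main__":
    # Scanning a full 2 TB dataset.
    print(f"full scan: ${estimate_query_cost(2 * TB):.2f}")
    # Scanning only a tenth of it, e.g. when a query touches fewer bytes.
    print(f"partial scan: ${estimate_query_cost(2 * TB // 10):.2f}")
```

Anything that reduces the bytes scanned reduces the bill in the same proportion.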
Amazon Athena : Performance Improvement
- Use columnar data for cost savings (less data is scanned per query).
- Apache Parquet or ORC is recommended.
- Huge performance improvement.
- Use Glue to convert your data to Parquet or ORC format.
- Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib...)
- Partition datasets in S3 for easy querying on virtual columns.
- s3://yourbucketname/pathtotable/[PARTITION_COLUMN_NAME]=[VALUE]/[PARTITION_COLUMN_NAME]=[VALUE]/etc..
- Example s3://athenabucket/flight/parquet/year=1999/month=1/day=1/
- Use larger files (>128MB) to minimize overhead.
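The partition layout above is just a naming convention on S3 prefixes, so it can be built and parsed with plain string handling. This is a minimal sketch with hypothetical helper names (`partition_prefix`, `parse_partitions`); it does not talk to S3 or Athena.

```python
def partition_prefix(table_root: str, partitions: dict) -> str:
    """Build a prefix of the form root/name=value/name=value/ (in dict order)."""
    root = table_root.rstrip("/")
    if not partitions:
        return f"{root}/"
    parts = "/".join(f"{name}={value}" for name, value in partitions.items())
    return f"{root}/{parts}/"


def parse_partitions(path: str) -> dict:
    """Extract name=value segments from a path; values come back as strings."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


if __name__ == "__main__":
    prefix = partition_prefix(
        "s3://athenabucket/flight/parquet",
        {"year": 1999, "month": 1, "day": 1},
    )
    print(prefix)
    print(parse_partitions(prefix))
```

A query that filters on year, month, or day only needs to read the objects under the matching prefixes, which is where the scan savings come from.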
Amazon Athena : Federated Query
- Allows you to run SQL queries across data stored in relational, non-relational, object or custom data sources (AWS or on-premises).
- Uses Data Source Connectors that run on AWS Lambda to run federated queries (for example on CloudWatch Logs, DynamoDB, RDS databases, ElastiCache, etc.)
- Stores results back in Amazon S3.
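As a rough illustration of how a federated query might be submitted, the sketch below only assembles the request parameters; it makes no AWS call. The helper name and the catalog, database, table, and bucket names are all hypothetical. The dictionary keys follow the parameters of the Athena `StartQueryExecution` API.

```python
def build_federated_query_request(catalog: str, database: str, sql: str, output_location: str) -> dict:
    """Assemble parameters for Athena's StartQueryExecution API (no call is made)."""
    return {
        "QueryString": sql,
        # The catalog is the registered data source backed by a Lambda connector.
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        # Query results are written back to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_federated_query_request(
        catalog="my_dynamodb_catalog",            # hypothetical data source name
        database="default",                       # hypothetical database
        sql="SELECT * FROM orders LIMIT 10",      # hypothetical table
        output_location="s3://my-athena-results/federated/",  # hypothetical bucket
    )
    print(request)
```

With boto3 installed and credentials configured, the request could then be passed along as `boto3.client("athena").start_query_execution(**request)`.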