Data Lakes and Analytics on AWS

Fastest way to get answers from all your data to all your users
Easiest to build data lakes and analytics
Setting up and managing data lakes involves many manual, time-consuming tasks such as loading, transforming, securing, and auditing access to data. AWS Lake Formation automates many of those steps and reduces the time required to build a successful data lake from months to days.
Scalable and cost effective
Data volumes are growing exponentially, but the cost of storing and analyzing that data can’t grow at the same rate. AWS provides comprehensive tooling to help control the cost of storing and analyzing all of your data at scale, including S3 Intelligent-Tiering for data storage and compute features such as auto scaling and integration with Amazon EC2 Spot Instances.
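As a rough illustration of the storage side of that cost control, the sketch below builds an S3 lifecycle rule that transitions objects under a prefix to the INTELLIGENT_TIERING storage class. The prefix, the 30-day delay, the rule ID scheme, and the helper names are assumptions made for this example; only the rule builder runs without AWS access, and `apply_lifecycle` assumes boto3 and credentials are available.

```python
from typing import Any, Dict, List


def intelligent_tiering_rule(prefix: str, after_days: int) -> Dict[str, Any]:
    """Build one S3 lifecycle rule that moves objects under `prefix`
    to the INTELLIGENT_TIERING storage class after `after_days` days."""
    if after_days < 0:
        raise ValueError("after_days must be non-negative")
    # Hypothetical ID scheme: derive a readable rule ID from the prefix.
    rule_id = "tier-" + (prefix.strip("/").replace("/", "-") or "all")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
        ],
    }


def apply_lifecycle(bucket: str, rules: List[Dict[str, Any]]) -> None:
    """Send the rules to S3. Requires boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the rule builder works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Example prefix and delay are placeholders, not recommendations.
rule = intelligent_tiering_rule("raw/events/", after_days=30)
print(rule["ID"])
```

Keeping the rule builder separate from the API call makes the configuration easy to review and unit test before it touches a real bucket.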
Comprehensive and open
We provide the broadest and deepest portfolio of purpose-built analytics tools so you can quickly get insights from your data using the most appropriate tool for the job. All of our analytics services support open file formats like Apache Parquet so you don’t need to move and transform your data in order to analyze it, but can instead store it once in a standard format and analyze it using whatever tool or technique is most appropriate.
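To make the "store it once, analyze it in place" idea concrete, here is a minimal sketch that generates an Amazon Athena table definition over Parquet files already in S3 and the request parameters for running it. The database, table, column names, and S3 locations are invented for illustration, and the helper functions are hypothetical; `run_query` assumes boto3 and AWS credentials and is not executed here.

```python
from typing import Any, Dict, List, Tuple


def parquet_table_ddl(
    database: str, table: str, columns: List[Tuple[str, str]], location: str
) -> str:
    """Build a CREATE EXTERNAL TABLE statement that points Athena at
    Parquet files in S3 without copying or converting them."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def athena_query_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query. Requires boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the builders work without it

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


# All names and S3 paths below are placeholders.
ddl = parquet_table_ddl(
    "lake",
    "page_views",
    [("user_id", "bigint"), ("page", "string"), ("viewed_at", "timestamp")],
    "s3://example-data-lake/page_views/",
)
request = athena_query_request(ddl, "lake", "s3://example-athena-results/")
print(ddl)
```

Because the table is external, dropping it later removes only the metadata; the Parquet files stay in S3 for other tools to read.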
Secure infrastructure for analytics
Securing vast volumes of data is one of the biggest challenges facing most organizations. Beyond all of the certifications and best practices you would expect from AWS, we also have security features designed to help you stay compliant with your best practices and industry regulations. For example, Amazon Macie helps find sensitive data that was accidentally stored in the wrong place and Amazon Inspector helps spot configuration errors that might lead to data breaches.

AWS Analytics services

Use cases
AWS service
Amazon Athena

Query data in S3 using SQL.

Big data processing

Amazon EMR

Hosted Hadoop framework.

Data warehousing

Amazon Redshift

Fast, simple, cost-effective data warehousing.

Real-time analytics

Amazon Kinesis

Analyze real-time video and data streams.

Operational analytics

Amazon Elasticsearch Service

Run and scale Elasticsearch clusters.

Dashboards and visualizations

Amazon QuickSight

Fast business analytics service.

Data movement
Real-time data movement

Amazon Kinesis Video Streams

Capture, process, and store video streams for analytics and machine learning.

Amazon Kinesis Data Firehose

Amazon Kinesis Data Streams

Collect streaming data, at scale, for real-time analytics.

Amazon Kinesis Data Analytics

Get actionable insights from streaming data in real time.

Data lake
Object storage

Amazon S3

Object storage built to store and retrieve any amount of data from anywhere.

AWS Lake Formation

Build a secure data lake in days.

Backup and archive

Amazon S3 Glacier

AWS Backup

Centralized backup across AWS services.

Data catalog

AWS Glue

Prepare and load data.

AWS Lake Formation

Build a secure data lake in days.

Third-party data

AWS Data Exchange

Find and subscribe to third-party data in the cloud.

Predictive analytics and machine learning
Frameworks and interfaces

AWS Deep Learning AMIs

Deep learning on Amazon EC2.

Platform services

Amazon SageMaker

Build, train, and deploy machine learning models at scale.

Use cases

Data warehousing

Run SQL and complex analytic queries against structured and unstructured data without unnecessary data movement.

Big data processing

Quickly and easily process vast amounts of data for data engineering, data science development, and collaboration.

Real-time analytics

Collect, process, and analyze streaming data as it arrives in your data lake, and respond in real time.

Operational analytics

Search, explore, filter, aggregate, and visualize your data in near real time for application monitoring, log analytics, and clickstream analytics.
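For the real-time analytics use case above, the "collect" step might look like the following sketch, which packages an application event as a record for Amazon Kinesis Data Streams. The stream name, event fields, and helper names are assumptions made for this example; `send` assumes boto3 and AWS credentials and is not executed here.

```python
import json
from typing import Any, Dict


def kinesis_record(stream_name: str, event: Dict[str, Any], key_field: str) -> Dict[str, Any]:
    """Build the keyword arguments for a Kinesis put_record call.

    The event is serialized as compact JSON, and the value of `key_field`
    becomes the partition key, so events that share that value are routed
    to the same shard."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def send(record: Dict[str, Any]) -> Dict[str, Any]:
    """Write the record to the stream. Requires boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the builder works without it

    return boto3.client("kinesis").put_record(**record)


# Stream name and event contents are placeholders.
record = kinesis_record("clickstream", {"user_id": 42, "page": "/home"}, "user_id")
print(record["PartitionKey"])
```

Choosing a partition key with many distinct values, such as a user or device ID, helps spread writes across shards.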

"We built a 120TB data lake in Amazon S3, with 1500 different schemes and use AWS analytics services like Glue, Redshift, and Athena extensively. We couldn’t get these insights from a bunch of siloed databases and warehouses - we needed an S3 scale data lake."

- Bernardo Rodriguez
Chief Digital Officer, J.D. Power

Additional resources

AWS Data Lab

AWS Data Lab is a four-day intensive engagement between a team of customer builders and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives.

