Scientists, developers, and many other technologists from many different industries are taking advantage of Amazon Web Services to perform big data analytics and to meet the challenges of the increasing volume, variety, and velocity of digital information. Amazon Web Services offers a comprehensive, end-to-end portfolio of cloud computing services to help you manage big data by reducing costs, scaling to meet demand, and increasing the speed of innovation.


See the AWS Big Data solutions for every stage of the big data lifecycle:

Collect > Stream > Store > RDBMS | Data Warehouse | NoSQL > Analytics > Archive

Getting up to speed with big data in the cloud has never been easier. Explore the Getting Started section, with tutorials and resources, to help start your first project.


This short video describes how Amazon Web Services can help you deliver greater business insight using big data technology. You'll learn how you can use the big data technologies you are most familiar with, including Pig, Hive, Spark, Hadoop, and many others, so you can finish your big data project faster.


[Video: Big Data on Amazon Web Services, 2:48]



Download a copy of the new Big Data Analytics Options on AWS whitepaper

This new whitepaper provides an overview of the different big data options available in the AWS Cloud for architects, data scientists, and developers. For each big data analytics option, the paper describes its ideal usage patterns, performance, durability and availability, cost model, scalability, elasticity, interfaces, and anti-patterns. It also walks through two scenarios showcasing the analytics options in use and provides additional resources to get started with big data analytics on AWS.

Download the whitepaper »

The AWS Big Data Blog helps solutions architects, data scientists, and developers learn big data best practices, discover which managed AWS big data services best fit their use case, and both get started with and go deep on AWS big data services. The goal of the blog is to be a hub where anyone can discover new ways to collect, store, process, analyze, and visualize data at any scale. Readers will find short tutorials with code samples, case studies that demonstrate the unique benefits of working with big data on AWS, new feature announcements, partner- and customer-generated demos and tutorials, and tips and best practices for using AWS big data services.

View the AWS Big Data Blog »


It feels like everything generates data today, from your customers on social networks to the instances running your web applications. AWS makes it easy to provision the storage, computation, and database services you need to turn that data into information for your business. AWS also offers data transfer services, such as AWS Direct Connect and AWS Import/Export, that can move big data into and out of the cloud quickly. Furthermore, all inbound data traffic into AWS is free.

Learn how to send hard drives to the cloud with AWS Import/Export »

Learn how to get your private fiber line to the cloud with AWS Direct Connect »
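As a back-of-the-envelope illustration of why these transfer services matter, you can estimate how long a large dataset would take to move over a network link. This is a hedged sketch: the dataset size and link speeds below are illustrative assumptions, and it ignores protocol overhead.

```python
def transfer_days(data_tb, link_mbps):
    """Estimate days to move data_tb terabytes over a link_mbps link,
    assuming the link runs flat out with no overhead (best case)."""
    bits = data_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)  # ideal transfer time
    return seconds / 86400

# 50 TB over a 100 Mbps office line vs. a 1 Gbps Direct Connect port
print(round(transfer_days(50, 100), 1))   # ~46.3 days
print(round(transfer_days(50, 1000), 1))  # ~4.6 days
```

At multi-week transfer times, shipping drives via Import/Export often beats the wire.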


Amazon RDS is available for MySQL, PostgreSQL, Oracle, and SQL Server.

[Video: Introduction to Amazon Kinesis, 2:08]

Amazon Kinesis is a managed service for real-time processing of streaming big data. Amazon Kinesis supports data throughput from megabytes to gigabytes per second and can scale seamlessly to handle streams from hundreds of thousands of different sources. Designed for high availability and durability in a cost-effective manner, it lets you focus on making sense of your data so you can make better decisions faster and at lower cost.

Learn more about Amazon Kinesis »
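To size a stream, you can estimate the shard count from your expected throughput. A minimal sketch, assuming the per-shard write limits Kinesis documented at launch (1 MB/s and 1,000 records/s per shard; verify the current limits before relying on them):

```python
import math

def shards_needed(write_mb_per_sec, records_per_sec):
    """Estimate Kinesis shards from assumed per-shard write limits:
    1 MB/s and 1,000 records/s. The binding constraint wins."""
    return max(math.ceil(write_mb_per_sec / 1.0),
               math.ceil(records_per_sec / 1000.0))

# 10 MB/s of small records: the record-rate limit dominates here.
print(shards_needed(10, 25000))  # 25
```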

Whether you’re storing pharmaceutical data for analysis, financial data for computation and pricing, or multimedia files such as photos and videos, Amazon Simple Storage Service (S3) is the ideal big data cloud storage solution to store original content durably. Designed for eleven 9's of durability, with no single point of failure, Amazon S3 is your fundamental big data object store.

Learn more about Amazon S3 »
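The eleven 9's figure can be made concrete with a little arithmetic. A hedged sketch, treating durability as an independent per-object annual survival probability (a simplification of the real failure model):

```python
def expected_losses_per_year(objects, durability=0.99999999999):
    """Expected annual object losses, assuming independent per-object
    annual durability of eleven 9's (a simplified model)."""
    return objects * (1 - durability)

# With 10,000,000 objects, the expected loss rate works out to
# roughly one object every 10,000 years on average.
losses = expected_losses_per_year(10_000_000)
print(round(1 / losses))  # ~10000 years per expected loss
```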

Amazon Elastic Block Store (EBS) provides persistent block storage volumes for virtual machines. Amazon EBS volumes offer the consistent, low-latency performance needed to run big data workloads such as your own relational or NoSQL databases, enterprise applications, and high-performance distributed network file systems.

Learn more about Amazon EBS »

[Video: AWS Report with Deepak Singh, Amazon EC2, 3:01]

NoSQL data stores benefit greatly from the speed of solid state drives (SSDs). Amazon DynamoDB uses them by default, but if you are running alternatives from the AWS Marketplace, such as Cassandra or MongoDB, you can accelerate your access with on-demand terabytes of solid state storage using the High I/O instance class.

Learn more about the options with EC2 instance types »

When you need a NoSQL database without the operational burden to run it, look no further than Amazon DynamoDB.  It is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. 

Amazon DynamoDB's guaranteed provisioned throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile, and many other big data applications.

Learn more about Amazon DynamoDB »
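Provisioned throughput translates into read and write capacity units. A rough sketch of the arithmetic, assuming the classic unit definitions (one strongly consistent 4 KB read/s per read unit, one 1 KB write/s per write unit; check the current DynamoDB documentation before sizing a real table):

```python
import math

def provisioned_capacity(reads_per_sec, writes_per_sec, item_kb,
                         strongly_consistent=True):
    """Estimate (read units, write units) under the assumed definitions:
    1 RCU = one strongly consistent 4 KB read/s (two eventually
    consistent reads/s), 1 WCU = one 1 KB write/s."""
    rcu = reads_per_sec * math.ceil(item_kb / 4)
    if not strongly_consistent:
        rcu = math.ceil(rcu / 2)  # eventually consistent reads cost half
    wcu = writes_per_sec * math.ceil(item_kb / 1)
    return rcu, wcu

# 100 reads/s and 50 writes/s of 3 KB items:
print(provisioned_capacity(100, 50, 3))  # (100, 150)
```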

[Video: Dropcam re:Invent Keynote, 5:22]



Big data innovation goes beyond NoSQL; it is about applying the appropriate technology to your data based on your business needs. Relational databases deliver fast, predictable, and consistent performance, and they are optimized for transactional workloads such as point of sale or financial history. Relational databases play a complementary role to NoSQL databases in many comprehensive big data architectures.

Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you to focus on your applications and business.

Learn more about Amazon RDS »

Amazon Redshift provides a fast, fully-managed, petabyte-scale data warehouse for less than $1,000 per terabyte per year. Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes. In just a few minutes, you can provision a fully managed data warehouse with automated backups and built-in encryption, and plug it in easily to your existing business intelligence tools.

Learn more about Amazon Redshift »
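The benefit of columnar storage is easy to see in miniature. A toy sketch, with plain Python lists standing in for storage blocks (this is the idea, not Redshift's actual on-disk format): summing one field over a row layout touches every row, while a column layout reads just the one column.

```python
# Row-oriented layout: an aggregate must walk every whole row.
rows = [{"user": "a", "revenue": 10, "country": "US"},
        {"user": "b", "revenue": 20, "country": "DE"},
        {"user": "c", "revenue": 30, "country": "US"}]

# Column-oriented layout: each column's values are stored together,
# so an aggregate reads only the blocks for that one column.
columns = {"user": ["a", "b", "c"],
           "revenue": [10, 20, 30],
           "country": ["US", "DE", "US"]}

print(sum(r["revenue"] for r in rows))  # 60, after scanning whole rows
print(sum(columns["revenue"]))          # 60, from a single column
```

On disk, the columnar version also compresses far better, since each block holds values of one type.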

[Video: Introduction to Amazon Redshift, 3:22]

[Video: Introduction to Amazon Elastic MapReduce]

Amazon Elastic MapReduce (EMR) provides the powerful Apache Hadoop framework on Amazon EC2 as an easy-to-use managed service. With Amazon EMR, you can focus on your map/reduce queries and take advantage of the broad ecosystem of Hadoop tools, while deploying to a high-scale, secure infrastructure platform. Run big data analytics jobs in the cloud with ease, and let Amazon EMR do the work of managing your Hadoop clusters.

Learn more about Amazon EMR »
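The map/reduce programming model itself is simple enough to sketch in a few lines of plain Python. This shows only the model (mappers emit key/value pairs, reducers aggregate them per key), not how EMR or Hadoop actually shuffles and executes it across a cluster:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def reducer(pairs):
    """Reduce phase: sum the counts for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data on aws", "big data analytics"]
pairs = (pair for line in lines for pair in mapper(line))
print(reducer(pairs))
# {'big': 2, 'data': 2, 'on': 1, 'aws': 1, 'analytics': 1}
```

On EMR, the same mapper/reducer logic would be distributed over many nodes, with Hadoop handling partitioning, shuffling, and fault tolerance.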

How fast could your project go with another 1,000 virtual machines? How about 10,000? The Amazon Spot Market, integrated into Amazon Elastic MapReduce, lets you name your own price for the computing resources you need to do analytics with cloud computing. That means you can choose your own balance of cost and performance, overclocking your analytics when you need to, or reducing costs significantly.

Get started with Spot instances »
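The cost/performance trade-off reduces to simple arithmetic. A hedged sketch with made-up hourly prices (real Spot prices fluctuate with supply and demand, so treat every number below as an illustrative assumption):

```python
def spot_savings(on_demand_price, spot_price, hours, instances):
    """Compare hypothetical hourly prices for the same fleet.
    Returns (dollars saved, percent saved)."""
    on_demand = on_demand_price * hours * instances
    spot = spot_price * hours * instances
    return on_demand - spot, 100 * (1 - spot / on_demand)

# Illustrative only: 1,000 instances for a 4-hour analytics job.
saved, pct = spot_savings(on_demand_price=0.50, spot_price=0.10,
                          hours=4, instances=1000)
print(saved, round(pct))  # 1600.0 80
```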

[Video: Getting Started with Spot Instances, 4:13]

[Video: Getting Started with Amazon Glacier, 3:17]

Amazon Glacier is an extremely low-cost cold storage service starting at $0.01 per GB per month. It allows you to offload the administrative burdens of operating and scaling archival storage to AWS, and makes retaining data for long periods, whether measured in years or decades, especially simple. There are no upfront capital commitments, and all ongoing operational expenses are included in the price.

Learn more about Amazon Glacier »
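At the quoted rate, archival costs are easy to project. A minimal sketch, working in integer cents to keep the arithmetic exact; it uses only the $0.01/GB-month storage price quoted above and does not model retrieval or request fees:

```python
def glacier_monthly_cost_cents(gb, cents_per_gb=1):
    """Monthly storage cost in cents at the quoted $0.01/GB-month.
    Retrieval and request fees are not included."""
    return gb * cents_per_gb

# Archiving 10 TB (10,240 GB) for a decade:
monthly_cents = glacier_monthly_cost_cents(10 * 1024)
print(monthly_cents / 100)            # 102.4 dollars per month
print(monthly_cents * 12 * 10 / 100)  # 12288.0 dollars over ten years
```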