AWS Blog

AWS Webinars – September 2016

by Jeff Barr | in Webinars

At the beginning of the month I blogged about the value of continuing education and shared an infographic that illustrated the link between continued education and increased pay, higher effectiveness, and decreased proclivity to seek other employment. The pace of AWS innovation means that there’s always something new to learn. One way to do this is to attend some of our webinars. We design these webinars with a focus on training and education, and strongly believe that you can walk away from them ready, willing, and able to use a new AWS service or to try a new aspect of an existing one.

To that end, we have another great selection of webinars on the schedule for September. As always, they are free, but they do fill up, so I strongly suggest that you register ahead of time. All times are PT, and each webinar runs for one hour:

September 20

September 21

September 22

September 26

September 27

September 28

September 29

Jeff;

PS – Check out the AWS Webinar Archive for more great content!

Amazon RDS for PostgreSQL – New Minor Versions, Logical Replication, DMS, and More

by Jeff Barr | in Amazon Relational Database Service

Amazon Relational Database Service (RDS) simplifies the process of setting up, operating, and scaling a relational database in the cloud. With support for six database engines (Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL, and MariaDB), RDS has become a foundational component of many cloud-based applications.

We launched support for PostgreSQL in late 2013 and followed up that launch with support for more features and additional versions of PostgreSQL:

Today we are launching several enhancements to Amazon RDS for PostgreSQL. Here’s a summary:

  • New Minor Versions – Existing RDS for PostgreSQL database instances can be upgraded to new minor versions.
  • Logical Replication – RDS for PostgreSQL now supports logical replication and the associated logical decoding.
  • DMS Support – The new logical replication feature allows an RDS for PostgreSQL database instance to be used as the source for AWS Database Migration Service.
  • Event Triggers – Newer versions of PostgreSQL support event triggers at the database instance level.
  • RAM Disk Size – RDS for PostgreSQL now supports control of the size of the RAM disk.

Let’s take a closer look!

New Minor Versions
We are adding support for versions 9.3.14, 9.4.9, and 9.5.4 of PostgreSQL. Each of these versions includes fixes and enhancements as documented in the linked release notes. You can also upgrade your database instances using the RDS Console or the AWS Command Line Interface (CLI). Here’s how to upgrade from 9.5.2 to 9.5.4 using the console:

Be sure to check Apply immediately if you don’t want to wait until the next maintenance window.

Here’s how you can initiate the upgrade operation from the command line (I decided to give the command line some extra attention in this post in order to make sure that my skills were still current):

$ aws rds modify-db-instance --region us-west-2  \
  --db-instance-identifier "pg95" \
  --engine-version "9.5.4" \
  --apply-immediately

You can check on the progress of the upgrade like this:

$ aws rds describe-events --region us-west-2 \
  --source-type db-instance --source-identifier "pg95" \
  --duration 10 --output table

The following part of the output will let you know that the instance has been upgraded:

||+-------------------------------------------------+||
||                      Events                       ||
|+--------------------+------------------------------+|
||  Date              |  2016-09-13T00:07:54.547Z    ||
||  Message           |  Database instance patched   ||
||  SourceIdentifier  |  pg95                        ||
||  SourceType        |  db-instance                 ||
|+--------------------+------------------------------+|
|||                 EventCategories                 |||
||+-------------------------------------------------+||
|||  maintenance                                    |||
||+-------------------------------------------------+||

If you take a look at the entire series of events for the database instance, you’ll also see that RDS performs backups before and after the patch. You can find these backups in the console or via the command line:

$ aws rds describe-db-snapshots --region us-west-2 \
  --db-instance-identifier "pg95" \
  --snapshot-type automated --output table

The output will look like this:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                                                                                               DescribeDBSnapshots
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
||                                                                                                                                                                  DBSnapshots
|+------------------+-------------------+-----------------------+----------------------------+------------+-----------+----------------+---------------------------+-------+---------------------+-----------------+-----------------------+-
|| AllocatedStorage | AvailabilityZone  | DBInstanceIdentifier  |   DBSnapshotIdentifier     | Encrypted  |  Engine   | EngineVersion  |    InstanceCreateTime     | Iops  |    LicenseModel     | MasterUsername  |    OptionGroupName    |
|+------------------+-------------------+-----------------------+----------------------------+------------+-----------+----------------+---------------------------+-------+---------------------+-----------------+-----------------------+-
||  100             |  us-west-2b       |  pg95                 |  rds:pg95-2016-09-12-23-22 |  False     |  postgres |  9.5.2         |  2016-09-12T23:15:07.999Z |  1000 |  postgresql-license |  root           |  default:postgres-9-5 |
||  100             |  us-west-2b       |  pg95                 |  rds:pg95-2016-09-13-00-01 |  False     |  postgres |  9.5.2         |  2016-09-12T23:15:07.999Z |  1000 |  postgresql-license |  root           |  default:postgres-9-5 |
||  100             |  us-west-2b       |  pg95                 |  rds:pg95-2016-09-13-00-07 |  False     |  postgres |  9.5.4         |  2016-09-12T23:15:07.999Z |  1000 |  postgresql-license |  root           |  default:postgres-9-5 |
|+------------------+-------------------+-----------------------+----------------------------+------------+-----------+----------------+---------------------------+-------+---------------------+-----------------+-----------------------+-

Logical Replication
Amazon RDS for PostgreSQL now supports logical replication. You can now efficiently create database replicas by streaming high-level database changes from an Amazon RDS for PostgreSQL database instance to a non-RDS database that supports the complementary logical decoding feature (PostgreSQL also supports Physical Streaming Replication, an earlier and less efficient byte/block-based mechanism for creating and maintaining replicas). Replication takes place via logical slots; each slot contains a stream of changes that can be replayed exactly once (you can read about Logical Decoding Slots in the PostgreSQL documentation to learn more).

In order to implement logical replication to a non-RDS database, you will need to ensure that the user account for the PostgreSQL database has the rds_superuser and rds_replication roles. You also need to set the rds.logical_replication parameter to 1 in the DB parameter group for your database instance and then reboot the instance. When this parameter is applied, several PostgreSQL parameters will be configured to allow replication.
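For example, here is how that parameter change might look from the command line, in the same style as the upgrade example above (the parameter group name is hypothetical; rds.logical_replication is a static parameter, so the change takes effect after a reboot):

$ aws rds modify-db-parameter-group --region us-west-2 \
  --db-parameter-group-name "pg95-params" \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

$ aws rds reboot-db-instance --region us-west-2 \
  --db-instance-identifier "pg95"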

With the roles in place and the database instance configured, you can create a logical slot and then instruct the non-RDS database (or other client) to read and process records from the slot. For example, the pg_recvlogical command connects to the database instance and streams data from a replication slot into a local file.
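Here’s a minimal sketch of that flow (the endpoint, credentials, and slot name are illustrative; test_decoding is the sample output plugin that ships with PostgreSQL):

$ psql -h pg95.abc123.us-west-2.rds.amazonaws.com -U root -d postgres \
  -c "SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');"

$ pg_recvlogical -h pg95.abc123.us-west-2.rds.amazonaws.com -U root -d postgres \
  --slot demo_slot --start -f changes.log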

To learn more, read Logical Replication for PostgreSQL in the Amazon RDS for PostgreSQL User Guide.

DMS Support
AWS Database Migration Service helps you to migrate databases to AWS.  In conjunction with the new support for Logical Replication, you can now migrate your data from a PostgreSQL database (running on RDS or on a self-managed host) to another open source or commercial database. To do this, you will need to create a logical replication slot as described above.

Event Triggers
Newer versions of PostgreSQL (9.4.9+ and 9.5.4+) support event triggers at the database level. Because these triggers exist outside of any particular database table, they are able to capture a wider range of events, including DDL-level events that create, modify, and delete tables (here’s a full list of events that fire triggers). To learn more and to see a sample implementation of a trigger, take a look at Event Triggers for PostgreSQL in the User Guide.
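As a sketch of what a simple DDL-logging event trigger might look like (the names are illustrative; on RDS you would run this as the master user):

$ psql -h pg95.abc123.us-west-2.rds.amazonaws.com -U root -d postgres <<'SQL'
-- Raise a notice whenever a DDL command completes
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'DDL executed: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_ddl_trigger ON ddl_command_end
  EXECUTE PROCEDURE log_ddl();
SQL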

RAM Disk Size
You can now use the rds.pg_stat_ramdisk_size parameter to control the amount of memory used for PostgreSQL’s stats_temp_directory. This directory is used to temporarily store statistics on run-time performance and behavior; making more memory available can reduce I/O requirements and improve performance.
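This, too, is a parameter group change. Here’s a sketch using the same hypothetical parameter group as before (the value shown is illustrative; see the User Guide for valid ranges and units):

$ aws rds modify-db-parameter-group --region us-west-2 \
  --db-parameter-group-name "pg95-params" \
  --parameters "ParameterName=rds.pg_stat_ramdisk_size,ParameterValue=256,ApplyMethod=pending-reboot"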

Available Now
The new versions and features described above are available now and you can start using them today.

Jeff;

AWS Week in Review – September 5, 2016

by Jeff Barr | in Week in Review

This is the third community-driven edition of the AWS Week in Review. Special thanks are due to the 15 internal and external contributors who helped to make this happen. If you would like to contribute, please take a look at the AWS Week in Review on GitHub.

Monday

September 5

Tuesday

September 6

Wednesday

September 7

Thursday

September 8

Friday

September 9

Saturday

September 10

Sunday

September 11

New & Notable Open Source

  • s3logs-cloudwatch is a Lambda function parsing S3 server access log files and putting extra bucket metrics in CloudWatch.
  • README.md is a curated list of AWS resources used to prepare for AWS certifications.
  • RedEye is a utility to monitor Redshift performance.
  • Dockerfile will build a Docker image, push it to the EC2 Container Registry, and deploy it to Elastic Beanstalk.
  • lambda-contact-form supports contact form posts from static websites hosted on S3/CloudFront.
  • dust is an SSH cluster shell for EC2.
  • aws-ssh-scp-connector is a utility to help connect to EC2 instances.
  • lambda-comments is a blog commenting system built with Lambda.

New SlideShare Presentations

New YouTube Videos

New Customer Stories

  • MYOB uses AWS to scale its infrastructure to support demand for new services and saves up to 30 percent by shutting down unused capacity and using Reserved Amazon EC2 Instances. MYOB provides business management software to about 1.2 million organizations in Australia and New Zealand. MYOB uses a wide range of AWS services, including Amazon Machine Learning to build smart applications incorporating predictive analytics and AWS CloudFormation scripts to create new AWS environments in the event of a disaster.
  • PATI Games needed IT solutions that would guarantee the stability and scalability of their game services for global market penetration, and AWS provided them with the most safe and cost-efficient solution. PATI Games is a Korean company primarily engaged in the development of games based on SNS platforms. AWS services including Amazon EC2, Amazon RDS (Aurora), and Amazon CloudFront enable PATI Games to maintain high reliability, decrease latency, and eventually boost customer satisfaction.
  • Rabbi Interactive scales to support a live-broadcast, second-screen app and voting system for hundreds of thousands of users, gives home television viewers real-time interactive capabilities, and reduces monthly operating costs by 60 percent by using AWS. Based in Israel, the company provides digital experiences such as second-screen apps used to interact with popular television shows such as “Rising Star” and “Big Brother.” Rabbi Interactive worked with AWS partner CloudZone to develop an interactive second-screen platform.

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

Streaming Real-time Data into an S3 Data Lake at MeetMe

by Jeff Barr | in Amazon Kinesis, Amazon S3, Guest Post

In today’s guest post, Anton Slutsky of MeetMe describes the implementation process for their Data Lake.

Jeff;

Anton Slutsky is an experienced information technologist with nearly two decades of experience in the field. He has an MS in Computer Science from Villanova University and a PhD in Information Science from Drexel University.

Modern Big Data systems often include structures called Data Lakes. In the industry vernacular, a Data Lake is a massive storage and processing subsystem capable of absorbing large volumes of structured and unstructured data and processing a multitude of concurrent analysis jobs. Amazon Simple Storage Service (S3) is a popular choice nowadays for Data Lake infrastructure as it provides a highly scalable, reliable, and low-latency storage solution with little operational overhead. However, while S3 solves a number of problems associated with setting up, configuring and maintaining petabyte-scale storage, data ingestion into S3 is often a challenge as types, volumes, and velocities of source data differ greatly from one organization to another.

In this post, I will discuss our solution, which uses Amazon Kinesis Firehose to optimize and streamline large-scale data ingestion at MeetMe, a popular social discovery platform that caters to more than a million active daily users. The Data Science team at MeetMe needed to collect and store approximately 0.5 TB per day of various types of data in a way that would expose it to data mining tasks, business-facing reporting, and advanced analytics. The team selected Amazon S3 as the target storage facility and faced the challenge of collecting large volumes of live data in a robust, reliable, scalable, and operationally affordable way.

The overall aim of the effort was to set up a process to push large volumes of streaming data into AWS data infrastructure with as little operational overhead as possible. While many data ingestion tools, such as Flume, Sqoop and others are currently available, we chose Amazon Kinesis Firehose because of its automatic scalability and elasticity, ease of configuration and maintenance, and out-of-the-box integration with other Amazon services, including S3, Amazon Redshift, and Amazon Elasticsearch Service.

Business Value / Justification
As is common for many successful startups, MeetMe focuses on delivering the most business value at the lowest possible cost. With that in mind, the Data Lake effort had the following goals:

  • Empowering business users with high-level business intelligence for effective decision making.
  • Enabling the Data Science team with the data needed for revenue-generating insight discovery.

When considering commonly used data ingestion tools such as Sqoop and Flume, we estimated that the Data Science team would need to add an additional full-time big data engineer in order to set up, configure, tune, and maintain the data ingestion process, with additional engineering time required for support redundancy. Such operational overhead would increase the cost of the Data Science efforts at MeetMe and would introduce unnecessary scope to the team, affecting overall velocity.

The Amazon Kinesis Firehose service alleviated many of these operational concerns and therefore reduced costs. While we still needed to develop some amount of in-house integration, the scaling, maintenance, upgrading, and troubleshooting of the data consumers would be handled by Amazon, significantly reducing the size and scope required of the Data Science team.

Configuring an Amazon Kinesis Firehose Stream
Kinesis Firehose offers the ability to create multiple Firehose streams, each of which can be aimed separately at different S3 locations, Redshift tables, or Amazon Elasticsearch Service indices. In our case, the primary goal was to store data in S3 with an eye toward the other services mentioned above in the future.

Firehose delivery stream setup is a 3-step process. In Step 1, it is necessary to choose the destination type, which lets you define whether you want your data to end up in an S3 bucket, a Redshift table or an Elasticsearch index. Since we wanted the data in S3, we chose “Amazon S3” as the destination option. If S3 is chosen as the destination, Firehose prompts for other S3 options, such as the S3 bucket name. As described in the Firehose documentation, Firehose will automatically organize the data by date/time and the “S3 prefix” setting serves as the global prefix that will be prepended to all S3 keys for a given Firehose stream object. It is possible to change the prefix at a later date even on a live stream that is in the process of consuming data, so there is little need to overthink the naming convention early on.

The next step of the process is the stream configuration. Here, it is possible to override various defaults and specify other meaningful values. For example, selecting GZIP compression instead of the uncompressed default will greatly reduce the S3 storage footprint and, consequently, the S3 costs. Enabling Data encryption will encrypt the data at rest, which is important for sensitive data.
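For readers who prefer to script this setup instead of using the console, here’s a sketch of the equivalent CLI call (the role and bucket ARNs are hypothetical):

$ aws firehose create-delivery-stream \
  --delivery-stream-name "MEETME_STREAM" \
  --s3-destination-configuration "RoleARN=arn:aws:iam::123456789012:role/firehose-delivery-role,BucketARN=arn:aws:s3:::my-data-lake,Prefix=events/,CompressionFormat=GZIP"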

One important note is that the choice of the compression algorithm affects the resulting filenames (S3 keys) for the stream objects. Therefore, while it is possible to change these settings later on a live stream, it may be prudent to settle on the compression/encryption approach early to avoid possible issues with processing scripts.

As mentioned in Amazon Kinesis Firehose Limits, Kinesis Firehose has a set of default throughput quotas. Once those quotas are exceeded, Firehose responds with the error message “ServiceUnavailableException: Slow down.” and drops data. Therefore, in order to avoid data loss, it is important to estimate individual throughput requirements. If those requirements are likely to exceed the default quotas, it is possible to request additional throughput by submitting a limit increase request as described in the limits documentation.

The final step (not shown) is to review the desired configuration and create the stream.

Setting up the Upload Process
At MeetMe, RabbitMQ serves as the service bus for most of the data that flows through the system. Therefore, the task of data collection largely amounts to consuming large volumes of RabbitMQ messages and uploading them to S3 with the help of Firehose streams. To accomplish this, we developed lightweight RabbitMQ consumers. While RabbitMQ consumers exist elsewhere (such as in Flume), we opted to develop our own version to enable integration with the Firehose API.

Firehose provides two ways to upload data: single record and bulk. With the single-record approach, each individual record is packaged into a Firehose API framework object and serialized to a Firehose endpoint via HTTP/REST. While this approach may be appropriate for some applications, we achieved better performance by using the bulk API methods. The bulk methods allow up to 500 records to be uploaded to Firehose with a single request.

To upload a batch of messages, the lightweight RabbitMQ consumer maintains a small internal buffer, which gets serialized to Firehose as often as possible by a predefined set of processor threads. Here’s the code:

new Thread(new Runnable()
{
  public void run()
  {
    logger.info("Kinesis writer thread started.  Waiting for records to process...");
    while(true)
    {
      try
      {
        if(!recordsQueue.isEmpty())
        {
           if(logger.isDebugEnabled())
             logger.debug("Uploading current batch to AWS: "+recordsQueue.size());
        
           List<MMMessage> records = recordsQueue;
           recordsQueue = new CopyOnWriteArrayList<MMMessage>();
        
           final int uploadThreshold = 499;
        
           List<Record> buffer = new ArrayList<Record>(uploadThreshold);
        
           for(int i = 0; i < records.size(); i++)
           {
             // Get a proprietary message object from the internal queue
             MMMessage mmmessage = records.get(i);

             // Decode the message body as a UTF-8 string
             String message = new String(mmmessage.body, "UTF-8");

             // Escape newline and tab characters in the data to avoid
             // issues with line-based Hadoop/Spark processing later on
             message = CharMatcher.anyOf("\n").replaceFrom(message, "\\n");
             message = CharMatcher.anyOf("\t").replaceFrom(message, "\\t");
 
             // Wrap the message bytes with Amazon Firehose API wrapper    
             Record record = new Record().withData(ByteBuffer.wrap(message.getBytes()));
 
             buffer.add(record);
                 
             // If the current buffer is large enough,
             if(buffer.size() == uploadThreshold)
             {
               // send it to Firehose
               uploadBuffer(buffer);
               // and instantiate a new buffer
               buffer = new ArrayList<Record>(uploadThreshold);
             }
           }
           // don't forget to upload last buffer!
           uploadBuffer(buffer);                                
        }
      }
      catch(Exception e)
      {
        logger.error("Error in sending to Kinesis:"+e.getMessage(), e);
      }
    }
  }
}).start();

The uploadBuffer method is a simple wrapper over the bulk upload Firehose API:

private void uploadBuffer(final List<Record> buffer)
{
  // Make a new request object
  PutRecordBatchRequest request = new PutRecordBatchRequest();
  // Specify the stream name
  request.setDeliveryStreamName("MEETME_STREAM");
        
  // Set the data buffer
  request.setRecords(buffer);
 
  // Attempt to send to Firehose
  PutRecordBatchResult result = getAmazonClient().putRecordBatch(request);
        
  // Always check for failures!
  Integer failed = result.getFailedPutCount();
  if (failed != null && failed.intValue() > 0)
  {
    // If there are failures, find out what caused them
    logger.warn("AWS upload of [" + buffer.size() + "] resulted in " + failed + " failures");
                 
    // Dig into the responses to see if there are various kinds of failures
    List<PutRecordBatchResponseEntry> res = result.getRequestResponses();
    if (res != null)
    {
      for (PutRecordBatchResponseEntry r : res)
      {
        if (r != null)
        {
          logger.warn("Report " + r.getRecordId() + ", " + r.getErrorCode() + ", " + r.getErrorMessage()
                      + ", " + r.toString());
        }
        else
        {
          logger.warn("Report NULL");
        }
      }
    }
    else
    {
      logger.warn("BatchReport NULL");
    }
  }
}

Monitoring Firehose Streams
Once the Firehose streams are set up and internal consumer processes begin to send data, a common task is to monitor the data flow. Reasons for paying attention to data flow include data volume considerations, potential error conditions, and capturing failures, among many others. With Amazon Kinesis Firehose, monitoring is accomplished with the help of Amazon CloudWatch. Common delivery stream metrics are available under the Monitoring tab in each Firehose stream configuration, with additional metrics available through the CloudWatch Console.
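The same metrics can also be pulled from the command line. Here’s a sketch that retrieves one of the delivery metrics from the AWS/Firehose namespace (the time range is illustrative):

$ aws cloudwatch get-metric-statistics --namespace "AWS/Firehose" \
  --metric-name "DeliveryToS3.Records" \
  --dimensions Name=DeliveryStreamName,Value=MEETME_STREAM \
  --start-time 2016-09-12T00:00:00Z --end-time 2016-09-13T00:00:00Z \
  --period 3600 --statistics Sum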

While AWS provides an extensive set of monitoring facilities, in our experience it turned out to be important to carefully monitor internal data producer logs for errors. Such close monitoring using the syslog facility, Splunk, and other log monitoring tools allowed us to capture and fix specific errors and reduce the number of individual record failures to tolerable levels. Further, internal log monitoring allowed us to recognize early that our volumes were quickly exceeding default Firehose throughput quotas (see above).

Anton Slutsky, Director of Data Science, MeetMe

New Reader Endpoint for Amazon Aurora – Load Balancing & Higher Availability

by Jeff Barr | in Amazon Aurora, Launch

Feature-by-feature, Amazon Aurora has become more powerful and easier to use. Over the past months we have given you the ability to create a cluster from a MySQL backup, create cross-region read replicas, share snapshots across accounts, exercise additional control over failover, and migrate from other in-cloud or on-premises databases to Aurora.

Today, as an extension of Aurora’s existing read replica model, we are introducing a new cluster-level read endpoint. Your application can continue to direct read queries to the individual replicas as before. However, you can also update it to make use of the new endpoint. Doing so will give you two important advantages: load balancing and higher availability.

Load Balancing – Connecting to the reader endpoint allows Aurora to load-balance connections across the replicas in the DB cluster. This helps to spread the read workload around and can lead to better performance and more equitable use of the resources available to each replica. In the event of a failover, if the replica that you are connected to is promoted to the primary instance, the connection will be dropped. You can then reconnect to the reader endpoint in order to send your read queries to the other replicas in the cluster.

Higher Availability – You can place multiple Aurora replicas in distinct Availability Zones and connect to them via the new endpoint. In the unlikely event that an Availability Zone fails, applications that make use of the new endpoint will continue to send read traffic to the other replicas with minimal disruption.

Find the Endpoint
You can find the new Reader Endpoint in the Aurora Console:
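Once you have it, connecting is no different from connecting to any other MySQL-compatible endpoint. Here’s a sketch with a hypothetical cluster; note the cluster-ro- portion of the host name, which distinguishes the reader endpoint from the cluster’s primary endpoint:

$ mysql -h mycluster.cluster-ro-1a2b3c4d5e6f.us-east-1.rds.amazonaws.com \
  -u admin -p mydatabase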

This handy new feature is available now and you can start using it today!

Jeff;

New – HTTP/2 Support for Amazon CloudFront

by Jeff Barr | in Amazon CloudFront, Launch

When I interview a candidate for a technical position, I often ask them to explain what happens when they see an interesting link and decide to click on it. I encourage them to go into as much detail as they would like. The answers let me know how well they understand and can explain a complex concept. Some candidates will sum up the entire process in a sentence or two. Others have filled an entire whiteboard with detailed diagrams. At a very simple level, here are the steps that I like to see (If you know much about HTTP, you know that I have failed to mention the SSL handshake, cookies, response codes, caches, content distribution networks, and all sorts of other details):

  1. The domain name is converted to an IP address by way of a DNS lookup.
  2. A TCP connection is made to the remote server.
  3. A GET request is issued.
  4. The remote server locates (or generates) the desired content and returns it to fulfill the request.
  5. The TCP connection is closed.
  6. The client processes and displays the result.

A complex web page might contain references to scripts, style sheets, images, and so forth. If this is the case, the entire sequence must be repeated for each reference. On mobile devices, each connection request wakes up the radio, adding hundreds or thousands of milliseconds of overhead (read about Application Network Latency Overhead to learn more).

Amazon CloudFront is a global content distribution network, or CDN. Several of CloudFront’s features help to make the process above more efficient. For example, it caches frequently used content in dozens of edge locations scattered across the planet. This allows CloudFront to respond to requests more quickly and over a shorter distance, thereby improving application performance. With no minimum usage commitments, CloudFront can help you to deliver web sites, videos, and other types of content to your end users in an economical and expeditious way.

New HTTP/2 Support
The retrieval process that I described above contains a lot of room for improvement. Repeated DNS lookups are avoidable, as are TCP connection requests. HTTP/2, a new version of the HTTP protocol, streamlines the process by reusing the TCP connection if possible. This core feature, combined with many other changes to the existing HTTP model, has the potential to reduce latency and to improve the performance of all types of web applications.

Today we are launching HTTP/2 support for CloudFront. You can enable it on a per-distribution basis today and your HTTP/2-aware clients and applications will start to make use of it right away. While HTTP/2 does not mandate the use of encryption, it turns out that all of the common web browsers require the use of HTTPS connections in conjunction with HTTP/2. Therefore, you may need to make some changes to your site or application in order to take full advantage of HTTP/2. Due to the (fairly clever) way in which HTTP/2 works, older clients remain compatible with web endpoints that do support it.

The connection from CloudFront back to your origin server is still made using HTTP/1. You don’t need to make any server-side changes in order to make your static or dynamic content accessible via HTTP/2.

Several AWS customers have already been testing CloudFront’s HTTP/2 support and have seen clear performance improvements. Marfeel is an ad tech platform that helps publishers to create, optimize, and monetize mobile web sites. They told us that CloudFront’s HTTP/2 support has reduced their first-render time by 17%. This allows the sites that they create to consistently load within 0.8 seconds, making them more accessible to mobile readers.

Enabling HTTP/2
To enable HTTP/2 for an existing CloudFront distribution, simply open up the CloudFront Console, locate the distribution, and click on Edit. Then change the Supported HTTP Versions to include HTTP/2:
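If you prefer to script the change, the same edit amounts to updating the distribution’s HttpVersion setting. Here’s a sketch (the distribution ID and ETag are hypothetical; update-distribution expects the full distribution configuration along with the current ETag):

$ aws cloudfront get-distribution-config --id "E1A2B3C4D5E6F7" > dist.json
# Extract the DistributionConfig portion of dist.json into dist-config.json,
# set "HttpVersion": "http2", and then push the change back:
$ aws cloudfront update-distribution --id "E1A2B3C4D5E6F7" \
  --if-match "ETAGVALUE" --distribution-config file://dist-config.json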

The change will be effective within minutes and your users should start to see the benefits shortly thereafter. As I noted earlier, HTTP/2 must be used in conjunction with HTTPS. You can use your browser’s developer tools to verify that HTTP/2 is being used. Here’s what I see when I use the Network tool in Firefox:

You can also use curl (built with HTTP/2 support) to test from the command line:

$ curl --http2 -I https://d25c7x5dprwhn6.cloudfront.net/images/amazon_fulfilment_center_phoenix.jpg
HTTP/2.0 200
content-type:image/jpeg
content-length:650136
date:Sun, 04 Sep 2016 23:32:39 GMT
last-modified:Sat, 03 Sep 2016 15:21:01 GMT
etag:"b95a82b8df7373895a44a01a3c1e6f8d"
x-amz-version-id:fgWz_QaWo_.4GF7_VOl0gkBwnOmOALz6
accept-ranges:bytes
server:AmazonS3
age:644
x-cache:Hit from cloudfront
via:1.1 91e54ea7c5cc54f4a3500c72b19a2a23.cloudfront.net (CloudFront)
x-amz-cf-id:Dr_A3emW7OdxWfs3O0lDZfiBFL7loKMFCP9XC0_FYmCkeRuyXcu5BQ==

Learning More
Here are some resources that you can use to learn more about HTTP/2 and how it can benefit your application:

Available Now
This feature is available now and you can start using it today at no extra charge.

Jeff;

AWS SDK for C++ – Now Ready for Production Use

by Jeff Barr | in Developer Tools

After almost a year of developer feedback and contributions, version 1.0 of the AWS SDK for C++ is now available and recommended for production use. The SDK follows semantic versioning, so starting at version 1.0, you can depend on any of the C++ SDKs at version 1.x, and upgrades will not break your build.

Based on the feedback that we received for the developer preview of the SDK, we have made several important changes and improvements:

  • Semantic Versioning – The SDK now follows semantic versioning. Starting with version 1.0, you can be confident that upgrades within the 1.x series will not break your build.
  • Transfer Manager – The original TransferClient has evolved into the new and improved TransferManager interface.
  • Build Process – The CMake build chain has been improved in order to make it easier to override platform defaults.
  • Simplified Configuration – It is now easier to set SDK-wide configuration options at runtime.
  • Encryption – The SDK now includes symmetric cryptography support on all supported platforms.
  • NuGet – The SDK is now available via NuGet (read AWS SDK for C++ Now Available via NuGet to learn more).
  • Fixes – The 1.0 codebase includes numerous bug fixes and build improvements.

In addition, we have more high-level APIs that we will be releasing soon to make C++ development on AWS even easier and more secure.

Here’s a code sample using the new and improved TransferManager API:

#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>
#include <iostream>

static const char* ALLOC_TAG = "main";

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);

    auto s3Client = Aws::MakeShared<Aws::S3::S3Client>(ALLOC_TAG);
    Aws::Transfer::TransferManagerConfiguration transferConfig;
    transferConfig.s3Client = s3Client;

    transferConfig.transferStatusUpdatedCallback =
        [](const Aws::Transfer::TransferManager*, const Aws::Transfer::TransferHandle& handle)
        { std::cout << "Transfer Status = " << static_cast<int>(handle.GetStatus()) << "\n"; };

    transferConfig.uploadProgressCallback =
        [](const Aws::Transfer::TransferManager*, const Aws::Transfer::TransferHandle& handle)
        { std::cout << "Upload Progress: " << handle.GetBytesTransferred() << " of " << handle.GetBytesTotalSize() << " bytes\n"; };

    transferConfig.downloadProgressCallback =
        [](const Aws::Transfer::TransferManager*, const Aws::Transfer::TransferHandle& handle)
        { std::cout << "Download Progress: " << handle.GetBytesTransferred() << " of " << handle.GetBytesTotalSize() << " bytes\n"; };
    
    Aws::Transfer::TransferManager transferManager(transferConfig);
    auto transferHandle = transferManager.UploadFile("/user/aws/giantFile", "aws_cpp_ga", "giantFile", 
                                                     "text/plain", Aws::Map<Aws::String, Aws::String>());
    transferHandle.WaitUntilFinished();
     
    Aws::ShutdownAPI(options);
    return 0;
}

Visit the AWS SDK for C++ home page and read the AWS Developer Blog (C++) to learn more.

Keep the Feedback Coming
Now that the AWS SDK for C++ is production-ready, we’d like to know what you think, how you are using it, and how we can make it even better. Please feel free to file issues or to submit pull requests as you find opportunities for improvement.

Jeff;

AWS Week in Review – August 29, 2016

by Jeff Barr | in Week in Review

This is the second community-driven edition of the AWS Week in Review. Special thanks are due to the 13 external contributors who helped to make this happen. If you would like to contribute, please take a look at the AWS Week in Review on GitHub. Adding relevant content is fast and easy and can be done from the comfort of your web browser! Just to be clear, it is perfectly fine for you to add content written by someone else. The goal is to catch it all, as they say!

Monday

August 29

Tuesday

August 30

Wednesday

August 31

Thursday

September 1

Friday

September 2

New & Notable Open Source

  • apilogs is a command-line utility to help aggregate, stream, and filter CloudWatch Log events produced by API Gateway and Lambda serverless APIs.
  • MoonMail is a fully Lambda / SES powered email marketing tool.

New SlideShare Presentations

New Customer Success Stories

  • Bustle uses AWS Lambda to process high volumes of data generated by the website in real-time, allowing the team to make faster, data-driven decisions. Bustle.com is a news, entertainment, lifestyle, and fashion website catering to women.
  • Graze continually improves its customers’ experience by staying agile—including in its infrastructure. The company sells healthy snacks through its website and via U.K. retailers. It runs all its infrastructure on AWS, including its customer-facing websites and all its internal systems from the factory floor to business intelligence.
  • Made.com migrated to AWS to support a record-breaking sales period with no downtime. The company provides a website that links home-furnishings designers directly to consumers. It now runs its e-commerce platform, website, and customer-facing applications on AWS, using services such as Amazon EC2, Amazon RDS, and Auto Scaling groups.
  • Sony DADC New Media Solutions (NMS) distributes hundreds of thousands of hours of video content monthly, spins up data analytics, renders solutions in days instead of months, and saves millions of dollars in hardware refresh costs by going all in on AWS. The organization distributes and delivers content to film studios, television broadcasters, and other providers across the globe. NMS runs its content distribution platform, broadcast playout services, and video archive on the AWS Cloud.
  • Upserve quickly develops and trains more than 100 learning models, streams restaurant sales and menu item data in real time, and gives restaurateurs the ability to predict their nightly business using Amazon Machine Learning. The company provides online payment and analytical software to thousands of restaurant owners throughout the U.S. Upserve uses Amazon Machine Learning to provide predictive analysis through its Shift Prep application.

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter, subscribe to the RSS feed, and contribute some content!

Jeff;

Yemeksepeti: Our Shift to Serverless Architecture

by Jeff Barr | in AWS Lambda, Customer Success, Guest Post

AWS Community Hero Onur Salk wrote the guest post below in order to tell you how he helped his employer to move to a serverless architecture.

Jeff;

I’m Onur Salk, AWS Community Hero, AWS Certified Solutions Architect – Professional, and organizer of the AWS user group in Turkey. As a Hero, I like to share my AWS experience and knowledge with the community on my personal blog and through meetups with the community. Today I want to share the story behind Yemeksepeti and our shift to serverless architecture.

The Story Behind Yemeksepeti
Yemeksepeti is the biggest online food ordering company in Turkey. It lets users place food orders from affiliated network restaurants without charging any extra fees. At Yemeksepeti, we needed to set up a globally distributed service that is scalable, high-performing, and cost-effective. Our belief is that by designing a serverless architecture, we won’t have to worry about managing our servers and can remove a lot of operational burdens from our team. This means we can focus on running our code at scale.

At Yemeksepeti.com, we developed a real-time discount system called Joker about four years ago. The purpose of this system is to offer customers restaurant discounts that they normally cannot find elsewhere. The original Joker platform was developed in .NET and then integrated with the website and mobile devices using its REST API. We were asked to open the platform’s API to our sister companies operating in 34 countries, so that they can also provide real-time Joker discounts to their customers.

Initially, we thought we would share our code and let them integrate their applications. However, most of the other countries were using different technology stacks (programming languages, databases, and so on). Although using our code might accelerate their development at first, they would have to maintain an unfamiliar system. We needed to find an integration method that was easier to implement and cheaper to maintain.

Our Requirements
This was a global project, and these were our five focus areas:

  • Ease of management
  • High availability
  • Scalability
  • Use in several regions
  • Cost advantage

We evaluated these focus areas against several different processing models and came up with the following matrix:

IaaS – We could spin up some EC2 instances running IIS on top of Microsoft Windows Server, connected to an RDS DB instance.

  • Ease of management: No. We need to take care of our servers.
  • High availability: Yes. We can distribute our servers to different AZs.
  • Scalability: Yes. We can use Auto Scaling.
  • Use in several regions: Yes. We can use AMIs and copy them between regions.
  • Cost advantage: Partially. There will be license fees and costs for running EC2 instances.

PaaS – We could use AWS Elastic Beanstalk.

  • Ease of management: Partially. We need to take care of our servers.
  • High availability: Yes. We can distribute our servers to different AZs.
  • Scalability: Yes. We can use Auto Scaling.
  • Use in several regions: Yes. We can use environment configurations, AMIs, etc.
  • Cost advantage: Partially. There will be license fees and costs for running EC2 instances.

FaaS – We could use AWS Lambda.

  • Ease of management: Yes. AWS takes care of the services.
  • High availability: Yes. It is already highly available.
  • Scalability: Yes. It performs at any scale.
  • Use in several regions: Yes. We can export/import/upload our configurations easily.
  • Cost advantage: Yes. There are no licenses and we pay only for what we use.

We decided to use FaaS (Functions as a Service). We started our project in the Europe (Ireland) region using the following services:

Architecture
Our architecture looks like this:

Amazon VPC: We use Amazon VPC to launch our resources in our private network.

Amazon API Gateway: During the development phase, we started to develop the service in the Europe (Ireland) region. At that time, AWS Lambda was not available in Europe (Frankfurt). We created two APIs: one for web integration and the other for the admin interface. We used custom authorizers with JSON Web Tokens (JWT) to enable token-based authorization for our APIs. We used mapping templates to pass our variables to our Lambda functions.

In the development phase, there was only a test stage for each API.

During the production phase, AWS Lambda became available in Frankfurt. We decided to move the service there to benefit from low latency access from Turkey. We used the API Gateway Export API feature to export our configuration in Swagger format, and then imported it into Frankfurt. (Before the import, we changed the region definitions in the exported file to eu-central-1.) After that, we created a production stage and used stage variables to parameterize our database definitions of the Amazon RDS instances (like host, username, and so on). We also wanted to use our custom domain name. After we bought an SSL certificate for our domain, we created a custom domain name in the Amazon API Gateway console and created an alias for our CloudFront distribution name (Amazon API Gateway uses Amazon CloudFront in the background). Finally, we created an IAM role to enable Amazon CloudWatch logging for API calls, latency, and more.
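For reference, the export step looks something like this from the command line (the REST API id is hypothetical):

$ aws apigateway get-export --rest-api-id "a1b2c3d4e5" \
  --stage-name "test" --export-type "swagger" \
  --accepts "application/json" joker-api-swagger.json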

Metrics for Get_Joker_offer resource:

AWS Lambda: During the development phase, we used Python to develop our service and created 65 functions for integrating our API methods and scheduled tasks using CloudWatch Events Lambda triggers. Lambda VPC integration became available during the production phase, so we uploaded our functions to the Frankfurt region and integrated them with VPC.

Invocation count of Get_joker_offer Lambda function (The peaks correspond to lunch and dinner times (when people are hungry)):

Amazon RDS: During the development phase, we chose to use Amazon RDS for PostgreSQL. We created a single-AZ RDS instance to test our service. During the production phase, we needed to move our database because we migrated our APIs and functions to Frankfurt. We created a snapshot of our instance and using the Copy snapshot feature of RDS, we successfully moved our database. We launched two instances in our VPC: a multi-AZ instance for production and a single-AZ instance for test purposes. In our API stage variables, we defined the endpoint names of our RDS instances to map the staging to the appropriate instance. We also enabled automated backups for both instances.
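A cross-region snapshot copy like this one can also be scripted. Here’s a sketch with hypothetical identifiers, run against the destination region and referencing the source snapshot by its ARN:

$ aws rds copy-db-snapshot --region eu-central-1 \
  --source-db-snapshot-identifier "arn:aws:rds:eu-west-1:123456789012:snapshot:joker-db-final" \
  --target-db-snapshot-identifier "joker-db-final-copy"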

Amazon S3: The Joker platform has an admin panel that’s used for managing and reporting Joker offers. To host this administration interface, which is basically a Single Page Application (SPA) with AngularJS, we used the static website hosting feature of Amazon S3. All of the logic and functionality is provided by methods running on Lambda, so we didn’t need a server for the admin interface:

Amazon CloudWatch: We use the service to monitor the usage of our important assets and to get alerts if something goes wrong. We created a custom dashboard to monitor the CPU of our production database, connection count, critical API latencies and function counts and durations.

In our Python code, we log the durations of each inner method in CloudWatch to track performance and find any bottlenecks:

Here’s our CloudWatch dashboard:

Amazon Elasticsearch Service: During the development phase, CloudWatch Logs streaming to Amazon ES became available in the Ireland region. Using this feature, we created a Kibana dashboard to monitor some other metrics from the logs we generate from our code. As soon as Amazon ES is available in the Frankfurt region, we will use it again.

Initial Results
The Joker system is in production now, as a pilot for a small region of a country. As you can see from the following chart, the growth of the number of orders is promising. By leveraging serverless architecture, we didn’t have to install and manage an operating system and dependencies. Using Amazon API Gateway, AWS Lambda, Amazon S3, and Amazon RDS, our architecture runs in a highly available environment. We don’t need to learn and manage any master-slave replication features or third-party tools. As our service gets more requests, AWS Lambda adds more Lambda instances, so it runs at any scale. We were able to copy our service to another region using the features of AWS services, as we did before going into production. Finally, we don’t run any servers, so we benefit from the cost advantage of serverless architecture.

Here is a representation of the number of orders placed through Joker:

What’s Next
We hope this service will spread to all 34 of the sister companies within Delivery Hero Holding. As the service is rolled out globally, we will deploy to other AWS regions, choosing the region nearest to each company. To optimize our costs, we will purchase reserved instances for our RDS instances. Also, as we monitor the durations of our inner methods, we can refactor and optimize our code to decrease our Lambda functions’ execution times.

We believe the future of the cloud is FaaS. We would like to experiment more as other features, services,  and functions become available.

As an AWS Community Hero, I look forward to sharing the Yemeksepeti story with the AWS user group in Turkey. I’d like to help people explore and leverage serverless architecture.

— Onur Salk

Amazon Aurora Update – Parallel Read Ahead, Faster Indexing, NUMA Awareness

by Jeff Barr | in Amazon Aurora, Launch

Amazon Aurora is currently the fastest-growing AWS service!

As a relational database designed for the cloud (read Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon RDS to learn more), Aurora offers great performance, effortless storage scaling all the way up to 64 TB, durability, and high availability. Because Aurora was designed to be compatible with MySQL, our customers have been able to move existing applications and to build new ones with ease.

With MySQL compatibility “on top” and the unique, cloud-native Aurora architecture underneath, we have a lot of room to innovate. We can continue to make Aurora more efficient while still remaining compatible with all of those applications.

Today we are making three such performance improvements to Aurora, each one aimed at making Aurora more performant on a wide range of workloads commonly run by AWS customers. Here’s an overview:

Parallel Read Ahead – Range selects, full table scans, table alterations, and index generation are now up to 5x faster.

Faster Index Build – Generation of indexes is now about 75% faster.

NUMA-Aware Scheduling – When run on instances with more than one CPU chip, reads from the query cache and the buffer cache are faster, improving overall throughput by up to 10%.

Let’s dive in…

Parallel Read Ahead
The InnoDB storage engine used by MySQL organizes table rows and the underlying storage (disk pages) using the index keys. This makes sequential scans over full tables fast and efficient for freshly created tables. However, as rows are updated, inserted, and deleted over time, the storage becomes fragmented, the pages are no longer physically sequential, and scans can slow down dramatically. InnoDB’s Linear Read Ahead feature attempts to deal with this fragmentation by bringing up to 64 pages into memory before they are actually needed. While well-intentioned, this feature does not provide a meaningful performance improvement on enterprise-scale workloads.

With today’s update, Aurora is now a lot smarter about handling this very common situation. When Aurora scans a table, it logically (as opposed to physically) identifies and then performs a parallel prefetch of the additional pages. The parallel prefetch takes advantage of Aurora’s replicated storage architecture (two copies in each of three Availability Zones) and helps to ensure that the pages in the database cache are relevant to the scan operation.

As a result of this change, range selects, full table scans, the ALTER TABLE operation, and index generation are up to 5x faster than before.

You will see the improved performance as soon as you upgrade to Aurora 1.7 (see below for more information).

Faster Index Build
When you create a primary or secondary index on a table, the storage engine creates a tree structure that contains the new keys. This process entails a lot of top-down tree searching and plenty of page-splitting as the tree is restructured to accommodate more and more keys.

Aurora now builds the trees in a bottom-up fashion, building the leaves first and then adding parent pages as needed. This reduces the amount of back-and-forth to storage, and also obviates the need to split pages since each page is filled once.

With this change, adding indexes and rebuilding tables is now up to 4x faster than before, depending on the table schema. For example, the Aurora team created a table with the following schema, and added 100 million rows, resulting in a 5 GB table:

create table test01 (id int not null auto_increment primary key, i int, j int, k int);

Then they added four additional indexes:

alter table test01 add index (i), add index (j), add index (k), add index comp_idx(i, j, k);

On a db.r3.large instance, the time to run this query dropped from 67 minutes to 25 minutes. On a db.r3.8xlarge instance, the time dropped from 29 minutes to 11.5 minutes.

This is a brand new feature and we would like you to try it out on your non-production workloads.  You’ll need to upgrade to Aurora 1.7 and then set aurora_lab_mode to 1 in the DB Instance Parameter group (see DB Cluster and DB Instance Parameters to learn more):
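If you manage parameters from the command line, the change might look like this (the parameter group name is hypothetical; aurora_lab_mode is a static parameter, so it is applied at reboot):

$ aws rds modify-db-parameter-group \
  --db-parameter-group-name "my-aurora-params" \
  --parameters "ParameterName=aurora_lab_mode,ParameterValue=1,ApplyMethod=pending-reboot"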

The team is very interested in your feedback on this performance enhancement. Please feel free to post your observations in the Amazon RDS Forum.

NUMA-Aware Scheduling
The largest DB Instance (db.r3.8xlarge) has two CPU chips and a feature commonly known as NUMA, short for Non-Uniform Memory Access. On systems of this type, an equal fraction of main memory is directly and efficiently accessible to each CPU. The remaining memory is accessible via a somewhat less efficient cross-CPU access path.

Aurora now does a better job of scheduling threads across the CPUs in order to take advantage of this disparity in access times. The threads no longer need to fight against each other for access to the less-efficient memory attached to the other CPUs. As a result, CPU-bound operations that make heavy use of the query cache and the buffer cache now run up to 10% faster. The performance improvement will be most apparent when you are making hundreds or thousands of connections to the same database instance. As an example, performance on the Sysbench oltp.lua benchmark grew from 570,000 reads/second to 625,000 reads/second. The test was run on a db.r3.8xlarge DB Instance with the following parameters (a sketch of the corresponding sysbench invocation appears after the list):

  • oltp_table_count=25
  • oltp_table_size=10000
  • num-threads=1500
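For reference, a run with those parameters might be invoked like this (a sketch assuming sysbench 0.5 with the bundled oltp.lua test script; the endpoint, credentials, and script path are hypothetical):

$ sysbench --test=/usr/share/sysbench/tests/db/oltp.lua \
  --oltp-tables-count=25 --oltp-table-size=10000 --num-threads=1500 \
  --mysql-host=mycluster.cluster-1a2b3c4d5e6f.us-east-1.rds.amazonaws.com \
  --mysql-user=admin --mysql-password=secret run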

You will see the improved performance as soon as you upgrade to Aurora 1.7.

Upgrading to Aurora 1.7
Newly created DB Instances will run Aurora 1.7 automatically. For existing DB Instances, you can choose to install the update immediately or during your next maintenance window.

You can confirm that you are running Aurora 1.7 by running the following query:

mysql> show global variables like "aurora_version";
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| aurora_version | 1.7   |
+----------------+-------+

Jeff;