AWS Partner Network (APN) Blog

  • Introducing the Amazon RDS Migration Tool

    by Kate Miller | on | in AWS Product Launch, Database |

    Migrating databases can be a challenging task, often requiring application downtime while data is moved from the source database to the target database. To help you accomplish migrations effectively and with minimal downtime, we’re excited to tell you about the Amazon RDS Migration Tool. This powerful utility can be used to help you and/or your customers move data from on-premises and Amazon EC2-based databases to Amazon RDS, Amazon Redshift, and Amazon Aurora databases.

    The RDS Migration Tool supports not only like-to-like migrations, e.g. Oracle-to-Oracle, but also migrations between different database platforms, e.g. SQL Server-to-Amazon Aurora. It runs as an EC2 instance and leverages the scaling power of AWS to match the needs of your migration task.

    What Value Does the RDS Migration Tool Provide My Firm and End Customers?

    If you’re an APN Partner helping customers migrate their workloads to AWS, the RDS Migration Tool can help you minimize application downtime by capturing database changes on the source database while the source still receives transactions from the application. Because the RDS Migration Tool can capture and replicate data heterogeneously, it can minimize application downtime even in complicated use cases, such as when migrating an Oracle database to Amazon Aurora.

    Specifically, the RDS Migration Tool provides the following features and benefits:

    • Support for transactional change data capture (CDC) and change application, with low performance impact on the source
    • Support for heterogeneous migration (e.g. SQL Server-to-Aurora)
    • Support for homogeneous migration (e.g. Oracle-to-Oracle)
    • Data transfer optimizations for migrating entire database tables
    • Light-weight column mapping & transformations
    • Ability to select individual tables and columns and filter data rows for migration
    • Reliable delivery and recovery
    • Intuitive user experience that simplifies the steps required to migrate to AWS
    • Monitoring and control functions with dashboard, metrics and alerts
    • No need to deploy agents on the source to capture changes
    • Scheduling of migration tasks

    The RDS Migration Tool software is free to use; however, it requires Amazon EC2, Amazon EBS, and other AWS services, and customers will be charged normal AWS fees for the migration instances they create.

    How Do I Access the Tool?

    Reach out to your Partner Development Manager (PDM) if you are interested in signing up to use the tool.

  • Performance Testing in Continuous Delivery Using AWS CodePipeline and BlazeMeter

    by Kate Miller | on | in APN Technology Partners, Continuous Delivery |

    This is a guest post from our friends at BlazeMeter, an APN Technology Partner. 

    By now, most software delivery teams have heard about and are either practicing or planning to practice some flavor of continuous delivery. Its popularity has exploded in recent years largely because it has proven to have immense benefits for the rapid release of high-quality software. After each commit, the software is built and tested, and a deployable artifact is the result. How or when that artifact is deployed to either a staging area or to production depends on the team, their process, and their infrastructure.

    While unit and functional tests have become standard practices of good software delivery, load and performance tests have been a bit neglected in many workflows, reserved for specially scheduled events and generally conducted manually by a group. In part, this is because load and performance tests have tended to involve complex and brittle scripts that require dedicated, vendor-specific environments and are difficult to automate or run quickly enough for fast feedback.

    Since AWS CodePipeline is such a powerful automation framework for managing the continuous delivery process from start to finish, let’s take a look at how we can more easily inject automated load tests at the right places in the delivery workflow with BlazeMeter’s native AWS CodePipeline integration.

    Who is BlazeMeter?

    BlazeMeter, based in Mountain View, CA, provides an easy-to-use, cloud-based performance testing platform that can be accessed directly from any stage of AWS CodePipeline (as a Test action) at any point where load, stress, or performance tests need to run. BlazeMeter extends Apache JMeter technology by providing some important pieces, like automatic scaling and professional results reporting. If your team hasn’t already adopted JMeter, it’s a very powerful and flexible open source tool capable of orchestrating any type of performance test, from the simplest to the most sophisticated. If you are already using JMeter, you can begin working with BlazeMeter right away.

    What Kinds of Performance Tests Should We Run?

    When it comes to performance tests in the delivery pipeline, different architectures and objectives call for different strategies.

    For example, if you’re deploying an API server that handles a lot of incoming requests from mobile devices or other applications, tests might focus mostly on throughput: the hits or requests per second that various endpoints can handle within given response-time expectations. Those tests can use straight URL requests without regard for the complexities of think time or extraneous logic that synthetically shape traffic.

    To perform this type of test in AWS CodePipeline, edit the pipeline, add an action to the target stage where the API test should run, choose Test as the action category, and choose BlazeMeter as the test provider.

    After choosing Connect, you’ll be taken to BlazeMeter’s sign-in page. If you’re not already a BlazeMeter user, you can create a free account right there and have instant access.

    Next you can choose New API Test from the different types of tests BlazeMeter offers.

    BlazeMeter provides an easy-to-use utility where you can simply enter your endpoint URLs and required payload data. You can add the URL, specify the HTTP verb (GET, POST, PUT, DELETE), and even add custom headers. In this example, I’m providing the necessary Content-Type header as well as a JSON payload for my POST request that will test selecting cities in a flight reservation app.
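    To make that configuration concrete, here’s roughly what such a request would look like if you issued it by hand with curl; the endpoint and payload below are hypothetical stand-ins for whatever your API actually exposes:

    $ curl -X POST "https://api.example-flights.com/v1/cities/select" \
        -H "Content-Type: application/json" \
        -d '{"fromCity": "Boston", "toCity": "Seattle"}'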

    Two things to take special note of in the test configuration:

    • Amazon CloudWatch integration: Here you can have BlazeMeter include Amazon CloudWatch metrics for your Amazon EC2 instances involved in the test.
    • Thresholds: Use this feature to define what will constitute test failure, such as average response time or percentage of errors being above selected values.

    API-oriented test scenarios like these could run immediately after an AWS CodePipeline action that uses AWS CodeDeploy or AWS Elastic Beanstalk to configure a staging environment, and they can run quite speedily.

    Simulating Realistic Traffic in Automated Load Tests

    A more thorough and real-world performance test will take a little more time to set up and will require JMeter scripts. Rather than just hitting the app with a barrage of HTTP requests, we want to be more strategic in how we shape the overall load profile. (Getting started with creating JMeter scripts is a bit beyond the scope of this blog post, but we’ll provide some useful tips below. BlazeMeter provides lots of great JMeter tutorials at https://docs.blazemeter.com/.)
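    If you’re building or debugging a JMeter script for this purpose, it can help to run it locally in non-GUI mode to validate it before wiring it into BlazeMeter; the file names here are placeholders:

    $ jmeter -n -t flight_reservation.jmx -l results.jtl -j run.log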

    For these more realistic tests, we once again add an Action to the desired stage in AWS CodePipeline, and choose BlazeMeter as our Test provider, but this time we’ll select New JMeter Test.

    Business considerations enter the picture at this point. How many users do we expect? What will they be doing with the app, and how frequently? It’s often useful to include business analysts and product marketing teams in these discussions as they can bring useful metrics about user activity.

    For these scenarios, we should create JMeter scripts that represent different types of expected interactions. For example, if we’re testing a flight reservation website, we should have some users browsing and looking at flight prices, while others are making reservations, and still others are canceling flights or choosing hotels. And since humans stop to read pages or fill in web forms, we should make use of scripted timers, such as JMeter’s Uniform Random Timer, to introduce those natural delays into the test.

    Ultimately we want to understand what we sometimes call “business throughput”: How many successful actions customers can perform, how many search results are returned, or how many total flights are reserved. Choke points and constraints around these items have a direct impact on the business so they tend to be the important elements to focus on during the test. Also, since we know the underlying components of the stack involved in these transactions, this data gives us ideas about where to start our investigations.

    Using JMeter’s Transaction Controllers and naming them clearly will help you identify these business transactions after the test run.

    In the example below, I’ve labeled different actions in a flight reservation system and the BlazeMeter report tells me about response times and number of transaction calls.

    Let AWS CodePipeline Do The Work

    Now that we can automate any kind of performance and load test using AWS CodePipeline and BlazeMeter, we hope to help teams focus on the more critical tasks of fixing defects and optimizing and tuning the bottlenecks that these automated tests discover. Since tests run so frequently, baselines start to develop and we can observe trends that provide a sense of familiarity with how our apps behave. Tuning gets easier, and aberrations become more evident.

    Before you know it, you’ll be confidently releasing to production without thinking twice, knowing that your users are seeing high-class performance.

  • Getting the Most out of the Amazon S3 CLI

    by Scott Ward, Partner Solutions Architect | on | in AWS Partner Solutions Architect (SA) Guest Post |

    Editor’s note: this is a co-authored guest post from Scott Ward and Michael Ruiz, Solutions Architects with the APN. 

    Amazon Simple Storage Service (Amazon S3) makes it possible to store unlimited numbers of objects, each up to 5 TB in size. Managing resources at this scale requires quality tooling. When it comes time to upload many objects, a few large objects or a mix of both, you’ll want to find the right tool for the job. Today we will take a look at one option that is sometimes overlooked: the AWS Command Line Interface (AWS CLI) for Amazon S3.

    Note: Some of the examples in this post take advantage of more advanced features of the Linux/UNIX command-line environment and the bash shell. We included all of these steps for completeness, but won’t spend much time detailing the mechanics of the examples in order to keep the post at a reasonable length.

    What is Amazon S3?

    Amazon S3 is a global online object store that has been a core AWS service offering since 2006. Amazon S3 was designed for scale: it currently stores trillions of objects with peak loads measured in millions of requests per second. The service is designed to be cost-effective—you pay only for what you use—durable, and highly available. See the Amazon S3 product page for more information about these and other features.

    Data uploaded to Amazon S3 is stored as objects in containers called buckets and identified by keys. Buckets are associated with an AWS region and each bucket is identified with a globally unique name. See the S3 Getting Started guide for a typical Amazon S3 workflow.

    Amazon S3 supports workloads as diverse as static website hosting, online backup, online content repositories, and big data processing, but integrating Amazon S3 into an existing on-premises or cloud environment can be challenging. While there is a rich landscape of tooling available from AWS partners and open-source communities, a great place to start your search is the AWS CLI for Amazon S3.

    The AWS Command Line Interface (AWS CLI)

    The AWS CLI is an open source, fully supported, unified tool that provides a consistent interface for interacting with all parts of AWS, including Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Virtual Private Cloud (Amazon VPC), and other services.  General information about the AWS CLI can be found in the AWS CLI User Guide.

    In this post we focus on the aws s3 command set in the AWS CLI. This command set is similar to standard network copy tools you might already be familiar with, like scp or rsync, and is used to copy, list, and delete Amazon S3 buckets and objects. This tool supports the key features required for scaled operations with Amazon S3, including multipart parallelized uploads, automatic pagination for queries that return large lists of objects, and tight integration with AWS Identity and Access Management (IAM) and Amazon S3 metadata.
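    For a sense of the command set, here are a few representative operations; the bucket name is a placeholder:

    $ aws s3 mb s3://my-example-bucket                  # create a bucket
    $ aws s3 cp local_file.txt s3://my-example-bucket/  # upload a single file
    $ aws s3 ls s3://my-example-bucket/                 # list objects under the bucket
    $ aws s3 rm s3://my-example-bucket/local_file.txt   # delete an object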

    The AWS CLI also provides the aws s3api command set, which exposes more of the unique features of Amazon S3 and provides access to bucket metadata, like lifecycle policies designed to migrate or delete data automatically.
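    As a hedged sketch of what that looks like, here is one way a lifecycle policy might be applied and then read back with aws s3api; the bucket name, rule ID, and day counts are placeholders, and the JSON follows the Amazon S3 lifecycle configuration schema:

    $ cat lifecycle.json
    {
      "Rules": [
        {
          "ID": "archive-old-logs",
          "Filter": { "Prefix": "logs/" },
          "Status": "Enabled",
          "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
          "Expiration": { "Days": 365 }
        }
      ]
    }
    $ aws s3api put-bucket-lifecycle-configuration --bucket my-example-bucket --lifecycle-configuration file://lifecycle.json
    $ aws s3api get-bucket-lifecycle-configuration --bucket my-example-bucket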

    There are two pieces of functionality built into the AWS CLI for Amazon S3 tool that help make large transfers (many files and large files) into Amazon S3 go as quickly as possible:

    First, if the files are over a certain size, the AWS CLI automatically breaks the files into smaller parts and uploads them in parallel. This is done to improve performance and to minimize impact due to network errors.  Once all the parts are uploaded, Amazon S3 assembles them into a single object. See the Multipart Upload Overview for much more data on this process, including information on managing incomplete or unfinished multipart uploads.

    Second, the AWS CLI automatically uses up to 10 threads to upload files or parts to Amazon S3, which can dramatically speed up the upload.

    These two pieces of functionality can support the majority of your data transfer requirements, eliminating the need to explore other tools or solutions.
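    If you do need to adjust this behavior, more recent versions of the AWS CLI expose these transfer settings as configuration values; the values below are illustrative, not recommendations:

    $ aws configure set default.s3.max_concurrent_requests 20
    $ aws configure set default.s3.multipart_threshold 64MB
    $ aws configure set default.s3.multipart_chunksize 16MB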

    For more information on installation, configuration, and usage of the AWS CLI and the s3 commands, see the AWS CLI documentation.

    AWS S3 Data Transfer Scenarios

    Let’s take a look at using the AWS CLI for Amazon S3 in the following scenarios and dive into some details of the Amazon S3 mechanisms in play, including parallel copies and multipart uploads.

    • Example 1: Uploading a large number of very small files to Amazon S3
    • Example 2: Uploading a small number of very large files to Amazon S3
    • Example 3: Periodically synchronizing a directory that contains a large number of small and large files that change over time
    • Example 4: Improving data transfer performance with the AWS CLI

    Environment Setup

    The source server for these examples is an Amazon EC2 m3.xlarge instance located in the US West (Oregon) region. This server is well equipped with 4 vCPUs and 15 GB RAM, and we can expect a sustained throughput of about 1 Gb/sec over the network interface to Amazon S3. This instance is running the latest Amazon Linux AMI (Amazon Linux AMI 2015.03, HVM).

    The example data will reside in an Amazon EBS 100 GB General Purpose (SSD) volume, which is an SSD-based, network-attached block storage device attached to the instance as the root volume.

    The target bucket is located in the US East (N. Virginia) region. This is the region you will specify for buckets created using default settings or when specifying us-standard as the bucket location. Buckets have no maximum size and no object-count limit.

    All commands in this post are run from the bash command line, and each command-line instruction is shown with a leading $ as the starting point for the command.

    We will be using the aws s3 command set throughout the examples. Here is an explanation for several common commands and options used in these examples:

    • The cp command initiates a copy operation to or from Amazon S3.
    • The --recursive option instructs the AWS CLI for Amazon S3 to descend into subdirectories on the source.
    • The --quiet option instructs the AWS CLI for Amazon S3 to print only errors rather than a line for each file copied.
    • The sync command instructs the AWS CLI for Amazon S3 to copy new and changed files from the source to the destination.
    • The Linux time command is used with each AWS CLI call in order to get statistics on how long the command took.
    • The Linux xargs command is used to invoke other commands based on standard output or output piped to it from other commands.

    Example 1 – Uploading a large number of small files

    In this example we are going to simulate a fairly difficult use case: moving thousands of little files distributed across many directories to Amazon S3 for backup or redistribution. The AWS CLI can perform this task with a single command, aws s3 cp --recursive, but we will show the entire example protocol for clarity. This example will utilize the multithreaded upload functionality of the aws s3 commands.

    1. Create 26 directories, one for each letter of the alphabet, then create 2,048 files of 32 KB of pseudo-random content in each:
    $ for i in {a..z}; do
        mkdir $i
        seq -w 1 2048 | xargs -n1 -P 256 -I % dd if=/dev/urandom of=$i/% bs=32k count=1
    done
    2. Confirm the number of files we created for later verification:
    $ find . -type f | wc -l
    53248
    
    3. Copy the files to Amazon S3 by using aws s3 cp, and time the result with the time command:
    $ time aws s3 cp --recursive --quiet . s3://test_bucket/test_smallfiles/
    
    real    19m59.551s
    user    7m6.772s
    sys     1m31.336s
    

     

    The time command returns the ‘real’ or ‘wall clock’ time the aws s3 cp took to complete. Based on the real output value from the time command, the example took 20 minutes to complete the copy of all directories and the files in those directories.

    Notes:

    • Our source is the current working directory (.) and the destination is s3://test_bucket/test_smallfiles.
    • The destination bucket is s3://test_bucket.
    • The destination prefix is test_smallfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.

    TIP:

    In many real-world scenarios, the naming convention you use for your Amazon S3 objects will have performance implications.  See this blog post and this document for details about object key naming strategies that will ensure high performance as you scale to hundreds or thousands of requests per second.
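    As a rough sketch of that idea, one approach at the time of writing was to prepend a short hash-derived prefix to each key so that objects spread across the key space; the bucket and file names below are hypothetical:

    $ for f in *.log; do
        prefix=$(echo "$f" | md5sum | cut -c1-4)   # first 4 hex characters of the file name's hash
        aws s3 cp "$f" "s3://my-example-bucket/${prefix}-${f}"
    done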

    4. We used the Linux lsof command to capture the number of open connections on port 443 while the above copy (cp) command was running:
    $ lsof -i tcp:443
    COMMAND   PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    aws     22223 ec2-user    5u  IPv4 119954      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:48036->s3-1-w.amazonaws.com:https (ESTABLISHED)
    aws     22223 ec2-user    7u  IPv4 119955      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:48038->s3-1-w.amazonaws.com:https (ESTABLISHED)
    <SNIP>
    aws     22223 ec2-user   23u  IPv4 118926      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:46508->s3-1-w.amazonaws.com:https (ESTABLISHED)
    ...10 open connections
    

     

    You may be surprised to see there are 10 open connections to Amazon S3 even though we are only running a single instance of the copy command (we truncated the output for clarity, but there were ten connections established to the Amazon S3 endpoint  ‘s3-1-w.amazonaws.com’). This demonstrates the native parallelism built into the AWS CLI.

    Here is an example of a similar command that gives us the count of open threads directly:

    $ lsof -i tcp:443 | tail -n +2 | wc -l
    
    10
    

     

    5. Let’s also peek at the CPU load during the copy operation:
    $ mpstat -P ALL 10
    Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/04/2015  _x86_64_    (4 CPU)
    
    <SNIP>
    09:43:18 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    09:43:19 PM  all    6.33    0.00    1.27    0.00    0.00    0.00    0.51    0.00   91.90
    09:43:19 PM    0   14.14    0.00    3.03    0.00    0.00    0.00    0.00    0.00   82.83
    09:43:19 PM    1    6.06    0.00    2.02    0.00    0.00    0.00    0.00    0.00   91.92
    09:43:19 PM    2    2.04    0.00    0.00    0.00    0.00    0.00    1.02    0.00   96.94
    09:43:19 PM    3    2.02    0.00    0.00    0.00    0.00    0.00    1.01    0.00   96.97
    

     

    The system is not seriously stressed given the small file sizes involved. Overall, the CPU is 91.90% idle. We see very little %iowait and only modest %usr and %sys activity, so we can assume that the small amount of CPU time being used goes to running the AWS CLI commands and handling file metadata.

    6. Finally, let’s use the aws s3 ls command to list the files we moved to Amazon S3 and get a count to confirm that the copy was successful:

    $ aws s3 ls --recursive s3://test_bucket/test_smallfiles/ | wc -l
    53248
    

    This is the expected result: 53,248 files were uploaded, which matches the local count in step 2.

    Summary:

    Example 1 took 20 minutes to move 53,248 files at a rate of 44 files/sec (53,248 files / 1,200 seconds to upload) using 10 parallel streams.

    Example 2 – Uploading a small number of large files

    In this example we will create five 2-GB files and upload them to Amazon S3. While the previous example stressed operations per second (both on the local system and in operating the aws s3 upload API), this example will stress throughput. Note that while Amazon S3 could store each of these files in a single part, the AWS CLI for Amazon S3 will automatically take advantage of the S3 multipart upload feature.  This feature breaks each file into a set of multiple parts and parallelizes the upload of the parts to improve performance.

    1. Create five files filled with 2 GB of pseudo-random content:
    $ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=bigfile.% bs=1024k count=2048
    
    

    Since we are writing 10 GB to disk, this command will take some time to run.

    2. List the files to verify size and number:
    $ du -sk .
    10485804
    
    $ find . -type f | wc -l
    5
    

    This is showing that we have 10 GB (10,485,804 KB) of data in 5 files, which matches our goal of creating five files of 2 GB each.

    3. Copy the files to Amazon S3:
    $ time aws s3 cp --recursive --quiet . s3://test_bucket/test_bigfiles/
    
    real    1m48.286s
    user    1m7.692s
    sys     0m26.860s
    
    

    Notes:

    • Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_bigfiles.
    • The destination bucket is s3://test_bucket.
    • The destination prefix is test_bigfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
    4. We again capture the number of open connections on port 443 while the copy command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
    $ lsof -i tcp:443 | tail -n +2 | wc -l
    10
    
    

    Looks like we still have 10 connections open. Even though we only have 5 files, we are breaking each file into multiple parts and uploading them in 10 individual streams.

    5. Capture the CPU load:
    $ mpstat -P ALL 10
    Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/04/2015  _x86_64_    (4 CPU)
    
    <SNIP>
    10:35:47 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    10:35:57 PM  all    6.30    0.00    3.57   76.51    0.00    0.17    0.75    0.00   12.69
    10:35:57 PM    0    8.15    0.00    4.37   75.21    0.00    0.71    1.65    0.00    9.92
    10:35:57 PM    1    5.14    0.00    3.20   75.89    0.00    0.00    0.46    0.00   15.31
    10:35:57 PM    2    4.56    0.00    2.85   75.17    0.00    0.00    0.46    0.00   16.97
    10:35:57 PM    3    7.53    0.00    3.99   79.36    0.00    0.00    0.57    0.00    8.55
     
    
    

    This is a much more serious piece of work for our instance: we see around 70-80% iowait (where the CPU is sitting idle, waiting for disk I/O) on every core. This hints that we are reaching the limits of our I/O subsystem, but it also demonstrates a point to consider: when working with large files, the AWS CLI for Amazon S3 is, by default, a powerful tool that can really stress a moderately powered system.

    6. Check our count of the number of files moved to Amazon S3 to confirm that the copy was successful:

    $ aws s3 ls --recursive s3://test_bucket/test_bigfiles/ | wc -l
    5
    
    

    7. Finally, let’s use the aws s3api command to examine the object head metadata on one of the files we uploaded.

    $ aws s3api head-object --bucket test_bucket --key test_bigfiles/bigfile.1
    bytes   2147483648      binary/octet-stream     "9d071264694b3a028a22f20ecb1ec851-256"    Thu, 07 May 2015 01:54:19 GMT

    • The fourth field in the command output is the ETag (opaque identifier), which contains an optional ‘-’ if the object was uploaded in multiple parts. In this case we see that the ETag ends with ‘-256’, indicating that the s3 cp command split the upload into 256 parts. Since all the parts but the last are of the same size, a little math tells us that each part is 8 MB in size (see the quick check after this list).
    • The AWS CLI for Amazon S3 is built to optimize upload and download operations while respecting Amazon S3 part sizing rules. The Amazon S3 minimum part size (5 MB, except for the last part, which can be smaller), the maximum part size (5 GB), and the maximum number of parts (10,000) are described in the Amazon S3 Quick Facts documentation.
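    As a quick check of that arithmetic, dividing the object size reported by head-object above by the 256 parts gives the part size:

    $ echo $(( 2147483648 / 256 ))      # bytes per part
    8388608
    $ echo $(( 8388608 / 1024 / 1024 )) # converted to MB
    8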

    Summary:

    In example 2, we moved five 2-GB files to Amazon S3 in 10 parallel streams. The operation took 1 minute and 48 seconds. This represents an aggregate data rate of ~758 Mb/s (85,899,706,368 bits in 108 seconds) – about 80% of the maximum bandwidth available on our host.

    Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

    In this example, we will keep the contents of a local directory synchronized with an Amazon S3 bucket using the aws s3 sync command. The rules aws s3 sync will follow when deciding when to copy a file are as follows: “A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.” See the command reference for more information about these rules and additional arguments available to modify these behaviors.
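    If you want to preview which files those rules would select before transferring anything, the --dryrun flag prints the operations the sync would perform without executing them:

    $ aws s3 sync . s3://test_bucket/test_randfiles/ --dryrun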

    This example will use multipart upload and parallel upload threads.

    1. Let’s make our example files a bit more complicated and use a mix of file sizes (warning: inelegant hackery imminent):
    $ i=1;
    while [[ $i -le 132000 ]]; do
        num=$((8192*4/$i))
        [[ $num -ge 1 ]] || num=1
        mkdir -p randfiles/$i
        seq -w 1 $num | xargs -n1 -P 256 -I % dd if=/dev/urandom of=randfiles/$i/file_$i.% bs=16k count=$i;
        i=$(($i*2))
    done
     
    
    

     

    2. Check our work by getting file sizes and file counts:
    $ du -sh randfiles/
    12G     randfiles/
    $ find ./randfiles/ -type f | wc -l
    65537
    
    

    So we have 65,537 files, totaling 12 GB, to sync.

    3. Upload to Amazon S3 using the aws s3 sync command:
    $ time aws s3 sync --quiet . s3://test_bucket/test_randfiles/
    real    26m41.194s
    user    10m7.688s
    sys     2m17.592s
     
    
    

    Notes:

    • Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_randfiles/.
    • The destination bucket is s3://test_bucket.
    • The destination prefix is test_randfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
    4. We again capture the number of open connections while the sync command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
    $ lsof -i tcp:443 | tail -n +2 | wc -l
    10
    
    
    5. Let’s check the CPU load. We are only showing one sample interval, but the load varies much more than in the other runs as the AWS CLI for Amazon S3 deals with files of varying sizes:
    $ mpstat -P ALL 10
    Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/07/2015  _x86_64_    (4 CPU)
    
    03:08:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    03:09:00 AM  all    6.23    0.00    1.70    1.93    0.00    0.08    0.31    0.00   89.75
    03:09:00 AM    0   14.62    0.00    3.12    2.62    0.00    0.30    0.30    0.00   79.03
    03:09:00 AM    1    3.15    0.00    1.22    0.41    0.00    0.00    0.31    0.00   94.91
    03:09:00 AM    2    3.06    0.00    1.02    0.31    0.00    0.00    0.20    0.00   95.41
    03:09:00 AM    3    4.00    0.00    1.54    4.41    0.00    0.00    0.31    0.00   89.74
     
    
    
    6. Let’s run a quick count to verify that the synchronization is complete:
    $ aws s3 ls --recursive s3://test_bucket/test_randfiles/  | wc -l
    65537
     
    
    

    Looks like all the files have been copied!

    7. Now we’ll make some changes to our source directory:

    With this command we are touching eight existing files to update the modification time (mtime) and creating a directory containing five new files.

    $ touch 4096/*
    $ mkdir 5_more
    $ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=5_more/5_more% bs=1024k count=5
    
    $ find . -type f -mmin -10
    .
    ./4096/file_4096.8
    ./4096/file_4096.5
    ./4096/file_4096.3
    ./4096/file_4096.6
    ./4096/file_4096.4
    ./4096/file_4096.1
    ./4096/file_4096.7
    ./4096/file_4096.2
    ./5_more/5_more1
    ./5_more/5_more4
    ./5_more/5_more2
    ./5_more/5_more3
    ./5_more/5_more5
     
    
    
    8. Rerun the sync command. This will compare the source and destination files and upload any changed files to Amazon S3:
    $ time aws s3 sync . s3://test_bucket/test_randfiles/
    upload: 4096/file_4096.1 to s3://test_bucket/test_randfiles/4096/file_4096.1
    upload: 4096/file_4096.2 to s3://test_bucket/test_randfiles/4096/file_4096.2
    upload: 4096/file_4096.3 to s3://test_bucket/test_randfiles/4096/file_4096.3
    upload: 4096/file_4096.4 to s3://test_bucket/test_randfiles/4096/file_4096.4
    upload: 4096/file_4096.5 to s3://test_bucket/test_randfiles/4096/file_4096.5
    upload: 4096/file_4096.6 to s3://test_bucket/test_randfiles/4096/file_4096.6
    upload: 4096/file_4096.7 to s3://test_bucket/test_randfiles/4096/file_4096.7
    upload: 5_more/5_more3 to s3://test_bucket/test_randfiles/5_more/5_more3
    upload: 5_more/5_more5 to s3://test_bucket/test_randfiles/5_more/5_more5
    upload: 5_more/5_more4 to s3://test_bucket/test_randfiles/5_more/5_more4
    upload: 5_more/5_more2 to s3://test_bucket/test_randfiles/5_more/5_more2
    upload: 5_more/5_more1 to s3://test_bucket/test_randfiles/5_more/5_more1
    upload: 4096/file_4096.8 to s3://test_bucket/test_randfiles/4096/file_4096.8
    
    real    1m3.449s
    user    0m31.156s
    sys     0m3.620s
     
    
    

    Notice that only the touched and new files were transferred to Amazon S3.

    Summary:

    This example shows the result of running the sync command to keep local and remote Amazon S3 locations synchronized over time. Synchronizing can be much faster than creating a new copy of the data in many cases.

    Example 4 – Maximizing throughput

    When you’re transferring data to Amazon S3, you might want to do more or go faster than we’ve shown in the three previous examples.  However, there’s no need to look for another tool—there is a lot more you can do with the AWS CLI to achieve maximum data transfer rates.  In our final example, we will demonstrate running multiple commands in parallel to maximize throughput.

    In the first example we uploaded a large number of small files and achieved a rate of 44 files/sec.  Let’s see if we can do better.  What we are going to do is string together a few additional Linux commands to help influence how the aws s3 cp command runs.

    1. Launch 26 copies of the aws s3 cp command, one per directory:
    $ time ( find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/ )
    real    2m27.878s
    user    8m58.352s
    sys     0m44.572s
     
    
    
    Note how much faster this completed compared with our original example, which took 20 minutes to run.

    Notes:

    • The find part of the above command passes a null-terminated list of the subdirectories of the ‘smallfiles’ directory to xargs.
    • xargs launches up to 30 parallel (‘-P30’) invocations of aws s3 cp. Only 26 are actually launched based on the output of the find.
    • xargs replaces the ‘{}’ argument in the aws s3 cp command with the file name passed from the output of the find command.
    • The destination here is s3://test_bucket/smallfiles/, which is slightly different from example 1.
    2. Note the number of open connections:
    $ lsof -i tcp:443 | tail -n +2 | wc -l
    260
     
    
    

    We see 10 connections for each of the 26 invocations of the s3 cp command.

    3. Let’s check system load:
    $ mpstat -P ALL 10
    Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/07/2015  _x86_64_    (4 CPU)
    
    07:02:49 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    07:02:59 PM  all   91.18    0.00    5.67    0.00    0.00    1.85    0.00    0.00    1.30
    07:02:59 PM    0   85.30    0.00    6.50    0.00    0.00    7.30    0.00    0.00    0.90
    07:02:59 PM    1   92.61    0.00    5.79    0.00    0.00    0.00    0.00    0.00    1.60
    07:02:59 PM    2   93.60    0.00    5.10    0.00    0.00    0.00    0.00    0.00    1.30
    07:02:59 PM    3   93.49    0.00    5.21    0.00    0.00    0.00    0.00    0.00    1.30
      
    
    

    The server is finally doing some useful work! Since almost all the time is spent in %user with very little %idle or %iowait, we know that the CPU is working hard on application logic without much constraint from the storage or network subsystems. It’s likely that moving to a larger host with more CPU power would speed this process up even more.

    4. Verify the file count:
    $ aws s3 ls --recursive s3://test_bucket/smallfiles | wc -l
    53248
     
    
    

    Summary:

    Using 26 invocations of the command improved the execution time by a factor of 8: 2 minutes 27 seconds for 53,248 files vs. the original run time of 20 minutes. The file upload rate improved from 44 files/sec to 362 files/sec.

    The application of similar logic to further parallelize our large file scenario in example 2 would easily saturate the network bandwidth on the host. Be careful when executing these examples! A well-connected host can easily overwhelm the Internet links at your source site!
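    For the curious, here is a sketch of what that might look like; it reuses the find/xargs pattern from this example against the five large files from example 2 (the destination prefix is made up, and each aws s3 cp still opens its own set of parallel connections, so treat this as an experiment to run only where you can afford the bandwidth):

    $ find . -maxdepth 1 -type f -name 'bigfile.*' -print0 | xargs -n1 -0 -P5 -I {} aws s3 cp --quiet {} s3://test_bucket/test_bigfiles_parallel/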

    Conclusion

    In this post we demonstrated the use of the AWS CLI for common Amazon S3 workflows. We saw that the AWS CLI for Amazon S3 scaled to 10 parallel streams and enabled multipart uploads automatically. We also demonstrated how to accelerate the tasks with further parallelization by using common Linux CLI tools and techniques.

    When using the AWS CLI for Amazon S3 to upload files to Amazon S3 from a single instance, your limiting factors are generally going to be end-to-end bandwidth to the AWS S3 endpoint for large file transfers and host CPU when sending many small files. Depending on your particular environment, your results might be different from our example results.  As demonstrated in example 4, there may be an opportunity to go faster if you have the resources to support it. AWS also provides a variety of Amazon EC2 instance types, some of which might provide better results than the m3.xlarge instance type we used in our examples.  Finally, networking bandwidth to the public Amazon S3 endpoint is a key consideration for overall performance.

    We hope that this post helps illustrate how powerful the AWS CLI can be when working with Amazon S3, but this is just a small part of the story: the AWS CLI can launch Amazon EC2 instances, create new Amazon VPCs, and enable many of the other features of the AWS platform with just as much power and flexibility as it can for Amazon S3. Have fun exploring!

  • Learn More Through APN Webcast!

    by Kate Miller | on | in APN Webcast |

    Over the past six weeks, new webcasts have been added to the APN Portal, covering topics including various AWS 101 material, AWS Quick Start deep dives, and APN Marketing content.

    What’s Been Recently Added? 

    Below is a list of the on-demand videos that are now available for you to enjoy on your own time:

    • SAP on AWS
    • AWS for Developers
    • Big Data on AWS
    • Trend Micro Deep Security on the AWS Cloud
    • AWS 101 Application Services
    • AWS 101 Compute
    • AWS 101 Database
    • AWS 101 Deployment and Management
    • AWS 101 Enterprise Applications
    • AWS 101 Mobile Services
    • AWS 101 Networking
    • AWS 101 Storage and Content Delivery
    • AWS 101 Administration and Security

    Log in to the APN Portal to view our short, on-demand, up-to-date videos that will help you familiarize yourself with APN Benefits and AWS services.

  • Have You Signed up for the AWS re:Invent Live Stream?

    by Kate Miller | on | in AWS Events, re:Invent 2015 |

    AWS re:Invent 2015 is fast approaching, and while re:Invent is sold out, you can still get the latest news and announcements from the week by viewing our live stream of the keynotes and select technical breakout sessions. We encourage your team members who are not able to join us in Las Vegas to sign up for the live stream now.

    What is the AWS re:Invent Live Stream Agenda?

     

    Wednesday, October 7


    9:00am – 10:30am PT: Andy Jassy, Sr. Vice President, AWS

    11:00am – 5:15pm PT: Four of the most popular breakout sessions (to be announced)


    Thursday, October 8


    9:00am – 10:30am PT: Dr. Werner Vogels, CTO, Amazon

    11:00am – 6:30pm PT: Five of the most popular breakout sessions (to be announced)

     

    Is There Any Opportunity to Attend re:Invent If I Don’t Already Have a Pass? 

    APN Partners, there are a few remaining sponsorship opportunities available, and packages include full conference passes. We encourage you to read our recent blog post about the value of AWS Sponsorship, to learn more about how you can benefit from sponsoring re:Invent and the success other APN Partners have experienced.

    To learn more about re:Invent 2015, click here.

    Stay Connected to the event by following event activities on Twitter @awsreinvent (#reinvent), or liking us on Facebook.

     

  • Available on AWS: Amazon EC2 for Microsoft Windows Server with SQL Server Enterprise Edition

    by Bonnie Donovan | on | in AWS for Windows |

    Below is a guest post from Bonnie Donovan, Microsoft Channel Manager at AWS, regarding some recent updates regarding SQL Server Enterprise Edition on AWS. 

    AWS recently announced Amazon EC2 for Microsoft Windows Server with SQL Server Enterprise Edition. SQL Server enables you to build mission-critical applications and Big Data solutions using high-performance, in-memory technology across OLTP, data warehousing, business intelligence, and analytics workloads. There are pre-configured Amazon Machine Images (AMIs) available for launch on R3.2xlarge, R3.4xlarge, and R3.8xlarge instance types, in the US East (N. Virginia), US West (Oregon), and EU (Ireland) regions.

    Many customers have workloads that require SQL Server Enterprise Edition.  Microsoft SQL Server Enterprise Edition offers a number of new features, including:

    • “AlwaysOn” high availability: You can configure up to four active, readable secondaries
    • Self-service business intelligence: You can use Power View to conduct interactive data exploration and visualization
    • Data Quality Services: You can use organizational and 3rd party reference data to profile, cleanse and match data

    Customers can continue to bring their existing SQL Server Enterprise Licenses to AWS leveraging Microsoft’s License Mobility Program, but some customers prefer to purchase directly from AWS, as this offers you the flexibility to run a database server for as much or as little time as you need. All SQL Server versions that we offer integrate with Amazon EBS, enabling you to take advantage of the persistence, performance, and reliability of Amazon EBS for all your databases.

    You can visit our website to learn more about our AMIs with SQL Server Enterprise Edition pre-installed, the R3 instance type, and pricing. To get started, you can select and launch an EC2 for Windows AMI with SQL Server Enterprise Edition from within the AWS Management Console, or you can purchase directly from the AWS Marketplace (Microsoft SQL Server Enterprise Edition 2012 and 2014).

    In addition to having the ability to run Amazon EC2 for Windows with SQL Server Enterprise Edition, you can now run SQL Server Enterprise Edition as a License Included offering on Amazon RDS. In the License Included service model, customers do not need separately purchased Microsoft SQL Server licenses. License Included pricing is inclusive of the software license, underlying hardware resources, and Amazon RDS management capabilities. This allows customers to pay only for the hours they use and change their instance type as needed to fit their workloads.  The new License Included offering is in addition to the Bring Your Own License offering for SQL Server Enterprise Edition already available on Amazon RDS.

    Amazon RDS for SQL Server Enterprise Edition offers a number of features, including high availability using Multi-AZ, database storage sizes up to 4 TB, and storage performance of up to 20,000 input/output operations per second (IOPS). To create a new Amazon RDS for SQL Server Enterprise Edition instance with just a few clicks, use the “Launch DB Instance” wizard in the AWS Management Console and select the SQL Server Enterprise Edition and License Included options. Learn more by visiting the Amazon RDS Pricing page.
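    For readers who prefer the command line, a minimal sketch of the equivalent AWS CLI call is shown below; the instance identifier, instance class, storage values, and credentials are placeholders, and you should check the Amazon RDS documentation for the parameters your environment requires:

    $ aws rds create-db-instance \
        --db-instance-identifier my-sqlserver-ee \
        --db-instance-class db.r3.2xlarge \
        --engine sqlserver-ee \
        --license-model license-included \
        --allocated-storage 1000 \
        --storage-type io1 --iops 10000 \
        --master-username admin \
        --master-user-password 'ReplaceWithAStrongPassword'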

     

  • Newest APN Competency – Mobile Solutions

    by Kate Miller | on | in APN Competencies, APN Launches |

    AWS offers a robust set of services for mobile developers, such as AWS Lambda, Amazon Cognito, Amazon SNS, Amazon Mobile Analytics, Amazon Kinesis, Amazon API Gateway, and the recently announced AWS Device Farm. As the number of services supporting mobile development grows, APN Partners have an increasing opportunity to work with customers and help them as they develop innovative mobile applications on AWS. We strive to help customers identify APN Partners with expertise and solutions in the mobile space on AWS; to support this effort, I’m excited to announce the launch of the APN Mobile Competency.

    What is the APN Mobile Competency?

    The APN Competency Program is designed to highlight APN Partners who have demonstrated technical proficiency and proven customer success in specialized solution areas, including mobile. APN Mobile Competency Partners have deep experience working with developers and mobile-first businesses to help them build, test, analyze, and monitor their mobile apps.

    Partner solutions pre-qualified by the APN Partner Competency Program are highlighted to AWS customers as solutions that can help them with comprehensive mobile application development, faster deployment, and easy management on the AWS Cloud.

    Mobile Use Cases and Launch APN Partners 

    By becoming an APN Mobile Competency Partner, firms can distinguish themselves by their use case expertise on AWS, making it easier for customers to identify and connect with them to address their mobile development needs.

    Below are the use cases that our inaugural APN Mobile Competency Partners have demonstrated expertise in on AWS, along with a list of our launch Competency Partners associated with each:

    Developer Tools: Accelerate project creation with tools and components to assist with each lifecycle stage of software development.

    Launch Partners: Kony Solutions, Twilio, SecureAuth, Auth0, Xamarin

    Testing & Performance Monitoring: Facilitate application testing and monitoring, and get insights into architecture stability and integrity.

    Launch Partner: Crittercism

    Analytics & User Engagement: Understand user activity, anticipate future behaviors, and increase user engagement.

    Launch Partners: Taplytics, Tableau, Looker

    App Development & Consulting: Get assistance with application development, validate best practices, and conduct analysis on architecture and implementation decisions.

    Launch Partners: Slalom Consulting, Classmethod, Concrete Solutions, Mobiquity, NorthBay Solutions, and Accenture

    Why Attain the APN Mobile Competency?

    The Mobile Competency provides the means by which customers can discover your solutions via the AWS Partner Mobile Solution pages, and gives customers confidence that your solution has been vetted by AWS.

    Participation in the Competency Program includes the opportunity for Competency Partners to participate in use case specific go-to-market activities that further allow us to feature your subject-matter expertise and customer success. Activities and artifacts may include joint events, webinars, co-authored whitepapers, reference architectures, and targeted marketing campaigns designed to drive awareness.

    How do I Apply?

    If you’re an APN Partner interested in the Competency Program, click here to learn more.

    To apply for the Mobile Competency, log in to the APN Portal, update your Partner Scorecard, click on ‘Apply for APN Competencies’ (under My APN Account), and submit your application.

    Learn more about Mobile on AWS

    We have a number of resources available to you to learn more about developing mobile applications on AWS. Visit the Mobile page to see some of the customers developing mobile apps on AWS, along with information on AWS services available to you to support mobile app development. Take a gander at the AWS Mobile blog for up-to-date news and advice surrounding mobile application development on AWS.

  • Interested in Publishing a New AWS Test Drive Before re:Invent?

    by Kate Miller | on | in AWS Test Drive, re:Invent 2015 |

    There’s never been a better time to explore how the AWS Test Drive Program can help you reach your target customer base on AWS. As I’ve said before, the AWS Test Drive program is a key program for APN Partners. It provides you with a very effective opportunity to showcase your enterprise solutions on AWS to prospective customers, and to educate them on how a particular solution may address their business needs. If you’re an APN Competency Partner, you can develop a Test Drive that demonstrates your areas of expertise on AWS. Do you offer products on AWS Marketplace? You may consider having a Test Drive that links to your offerings. If you develop SaaS solutions on AWS, I encourage you to think about how you may be able to use Test Drive to illustrate the benefits of your offerings to end customers. The Test Drive framework allows you to publish multiple Test Drives with the same Orbitera account, making it easy to reach different target audiences with audience-specific Test Drives.

    For a limited time, we’ve increased the AWS Test Drive start-up bonus to $3,000 in AWS usage credits. This is a great opportunity for your firm if you’ve been considering developing and publishing a new or additional AWS Test Drive prior to AWS re:Invent. To receive the increased start-up bonus, you must commit to publishing a new Test Drive before September 15th, 2015.

    To learn more about the Test Drive program, click here or take a look at the video below:

    If you have any questions about the program, don’t hesitate to reach out to the Test Drive team at awstestdrive@amazon.com.

  • AWS Professional Services Delivery Best Practices Bootcamp in NYC – Sept. 14th – Sept. 17th

    by Kate Miller | on | in APN Consulting Partners, AWS Professional Services |

    Throughout 2015 we’ve hosted AWS Professional Services Delivery Best Practices bootcamps around the world for our Premier APN Consulting Partners and qualified Advanced APN Consulting Partners. We’re hosting another bootcamp in NYC from Sept. 14th – 17th and would like to tell you more about the bootcamp, along with the qualifications to attend.

    What is the AWS Professional Services Delivery Best Practices Bootcamp?

    The AWS Professional Services Delivery Best Practices Bootcamp is an intensive four-day bootcamp that mixes strategy and ‘delivery how-to’ derived from our own customer implementation and migration projects. We’ll expose participants to an array of topics, from our delivery framework and methodology to deep examination of architectural patterns, Total Cost of Ownership (TCO) comparison, time to return on investment (ROI), foundational network design and connectivity, security optimization, application discovery and migration assessments, integration, planning and estimation, hybrid concepts, migration tools, logging, monitoring and automation, and much more. This course is designed to teach you how to:

    • Understand the AWS Cloud Adoption Framework
    • Understand how to perform an effective TCO analysis
    • Identify use cases, reference architectures, and best practices for enterprise deployments
    • Understand migration strategies, methodologies, and tools
    • Architect a foundational design for an enterprise deployment with security controls and risk mitigations
    • Establish cloud quality standards using a disciplined checklist and pointers to resources
    • Use migration functionality for enterprise integration, validation, security, and access controls
    • Run hybrid architecture using metering, chargeback, and showback
    • Perform application discovery, prioritization, and migration execution
    • Design for application optimization using automation tools

    It’s also a great opportunity to informally network with others like yourself from a variety of firms, and hear how they view the market and what their customers are doing. We’ve received very good feedback from those who have participated in the bootcamps delivered throughout 2015, and we look forward to delivering another in NYC.

    Who Should Attend the NYC AWS Professional Services Bootcamp?

    The bootcamp is invite-only, and is intended for technical or professional services practice leaders who focus on helping enterprise customers migrate applications to AWS at our top APN Consulting Partner firms. The following prerequisites for participants are required:

    • Hold at least one Associate level AWS Certification (AWS Certified Solutions Architect, AWS Certified Developer, AWS Certified SysOps Administrator, or AWS Certified DevOps Engineer)
    • Have at least six months of AWS implementation and delivery experience
    • Have completed the AWS TCO and Cloud Economics Accreditation

    Where Can I Learn More about Attending the NYC Bootcamp?

    We encourage you to check with your AWS Partner Manager to understand the prerequisites and to learn if this bootcamp is right for you or someone on your team.

     

    To keep up-to-date on all of the latest from the APN Blog, follow us on Twitter and subscribe to the RSS feed.

  • HIPAA Compliance on AWS: Amazon DynamoDB, Amazon RDS, and Amazon EMR Now Covered Under the AWS Business Associate Agreement

    by Kate Miller | on | in Healthcare |

    Many of our APN Partners work closely with customers in the healthcare industry, and develop services and solutions that address business needs in the healthcare space. One of the biggest considerations for healthcare customers and healthcare-focused APN Partners is building applications that are compliant with the US Health Insurance Portability and Accountability Act (“HIPAA”).

    You can use AWS to build applications that are compliant with HIPAA, using services that are covered under the AWS Business Associate Agreement (BAA). This includes popular services like Amazon EC2, Amazon S3, Amazon Glacier, and Amazon Redshift. We’re happy to announce that the AWS BAA now covers three new services: Amazon RDS (MySQL and Oracle engines only), Amazon DynamoDB (NoSQL database), and Amazon EMR (big data processing). A full list of our HIPAA-eligible services can be found here.

    APN Partners play an increasingly important role throughout the healthcare ecosystem. For example, Orion Health worked with Logicworks, a Premier APN Consulting Partner, to build the Cal INDEX Health Information Exchange on top of AWS. You can learn more about Orion Health’s story here. With the addition of the new HIPAA-eligible services, AWS partners can build HIPAA-compliant applications that cover the entire healthcare analytics pipeline, from data ingestion; to analysis using popular big data processing tools; through output to object storage, to a relational or non-relational database, to a data warehouse, or to a long-term archive. The most recent information on configuring these services for HIPAA applications can be found in our whitepaper. To learn about a few more of the innovative HIPAA-compliant projects that AWS customers and partners have built on AWS, visit our ‘HIPAA and AWS’ page here.

    If you already have an executed BAA with AWS, no action is necessary to begin using these services. If you have any questions about building HIPAA-compliant applications on AWS, please contact us and we will put you in touch with a representative from our team.