AWS allows you to simplify and securely scale genomic analysis. AWS provides you with inherent scalability and an ecosystem of partners for tools and datasets that are prepared for your sensitive data and workloads. AWS customers can accelerate their genomics insights and build a bridge from their existing on-premises infrastructure to the cloud.
With AWS, you can efficiently and dynamically store and compute your data, collaborate with peers, and integrate your findings into clinical practice. You can also address security and compliance concerns in many ways, such as encrypting your data at rest and in transit or de-identifying patient information.
Genomics organizations deal with some of the biggest and most complex data sets in the world – with the goal of providing personalized medicine. AWS adds value by enabling genomics customers to derive actionable insights from data. AWS gives you fast access to flexible and low cost IT resources. With AWS, you don’t need to make large upfront investments in time and money to build and maintain infrastructure. You can access as many resources as you need, almost instantly, and only pay for what you use.
Many projects in genomics, particularly in early discovery phases, are very spiky. You build a pipeline, run an experiment, and then put it on the shelf for later use. Running this pipeline on AWS means you can efficiently scale to meet your demand, then scale back down again when the demand is gone. AWS also provides alternative pricing and computing options for completing genomic testing.
From building your genomics pipeline to integrating genomic findings into diagnostic treatment patterns, AWS has a broad ecosystem of partners that you can work with. This ecosystem gives you a variety of flexible options for building your solutions. AWS partners that join the AWS Life Sciences Competency Program have demonstrated technical proficiency and proven customer success in life sciences and genomics.
"With AWS, DNAnexus enables enterprises worldwide to perform genomic analysis and clinical studies in a secure and compliant environment at a scale not previously possible."
Richard Daly, CEO, DNAnexus
“For us to maintain a real-world data platform would be prohibitively expensive. [AWS allows us] to scale up our experiments and try out our new software on realistic configurations of hundreds or even thousands of computers.”
Michael Franklin, Professor, Computer Science and Director, AMP Lab, UC Berkeley
“The whole ecosystem of the tools that are developed around AWS APIs, like the cookbooks that we use to launch infrastructure... helped us a great deal.”
Ravi Madduri, Research Fellow and Project Manager, University of Chicago
Whitepaper | AWS Genomics Guide: Common Strategies & Best Practices
Blog Post | How DNAnexus and Edico Genome are Powering Precision Medicine on AWS
Blog Post | Building High-Throughput Genomics Batch Workflows on AWS: Introduction
Blog Post | Building High-Throughput Genomics Batch Workflows on AWS: Job Layer
Blog Post | Building High-Throughput Genomic Batch Workflows on AWS: Batch Layer
Blog Post | Building High-Throughput Genomics Batch Workflows on AWS: Workflow Layer
To make your genomics pipeline easier to distribute and execute, you can run containers in the AWS Cloud with Amazon EC2 Container Service (ECS) or run Docker on AWS. Containers let you solve a large genomics problem as smaller parts, simplify the use of libraries with complicated setups, make data output reproducible, and make the data easier to share.
Human Longevity, Inc. explains how they process up to 12 TB per day of raw data in Amazon S3 with custom analytics tools running in Docker containers.
View the architecture diagram here.
AWS Containers for Genomics Science Solutions
Running containers in the AWS Cloud allows you to build robust, scalable applications and services by leveraging the benefits of the AWS Cloud such as elasticity, availability, security, and economies of scale. You also pay only for the resources you use.
Amazon EC2 Container Service (ECS) is a highly scalable, high-performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.
Running Docker on AWS provides a highly reliable, low-cost way to quickly build, run, test, and deploy distributed applications at any scale. AWS provides support for Docker open-source and commercial solutions within AWS services.
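The idea of solving a large genomics problem as smaller parts can be sketched as follows. This is a minimal illustration, not a prescribed AWS pattern: it splits a set of contigs into fixed-size regions so that each container task could process one region. The function names and the toy contig lengths are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def split_into_regions(contig_lengths: Dict[str, int], chunk_size: int) -> List[Region]:
    """Split each contig into consecutive regions of at most chunk_size bases."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            regions.append((contig, start, end))
            start = end + 1
    return regions


def region_for_task(regions: List[Region], task_index: int) -> str:
    """Format the region assigned to one container task as 'contig:start-end'."""
    contig, start, end = regions[task_index]
    return f"{contig}:{start}-{end}"


# Toy contig lengths, made up for illustration (not real chromosome sizes).
toy_contigs = {"chrA": 250, "chrB": 100}
regions = split_into_regions(toy_contigs, chunk_size=100)
```

Each container could then be started with a task index (for example through an environment variable) and call `region_for_task` to find the slice of the genome it is responsible for.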
In order to help make your genomics pipeline more efficient to manage, you can design workflow management rules in the AWS Cloud. This will help you specifically compose and execute a series of computational or data manipulation steps.
AWS Workflow Management Solutions
Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud.
AWS Batch dynamically provisions the optimal quantity and type of compute resources based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.
AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly. Step Functions is a reliable way to coordinate components and step through the functions of your application.
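As an illustration of coordinating sequential pipeline steps, the sketch below builds a simple state machine definition in the Amazon States Language, the JSON format that Step Functions uses. The step names and Lambda ARNs are hypothetical placeholders, and a real pipeline would likely add retries, error handling, or parallel branches.

```python
import json
from typing import Dict, List, Tuple


def sequential_state_machine(steps: List[Tuple[str, str]], comment: str = "") -> Dict:
    """Build a definition that runs Task states one after another.

    steps is a list of (state_name, resource_arn) pairs in execution order.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states: Dict[str, Dict] = {}
    for i, (name, resource_arn) in enumerate(steps):
        state: Dict = {"Type": "Task", "Resource": resource_arn}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # hand off to the following step
        else:
            state["End"] = True  # last step terminates the execution
        states[name] = state
    return {"Comment": comment, "StartAt": steps[0][0], "States": states}


# Hypothetical step names and ARNs, for illustration only.
pipeline_steps = [
    ("AlignReads", "arn:aws:lambda:us-east-1:123456789012:function:align-reads"),
    ("CallVariants", "arn:aws:lambda:us-east-1:123456789012:function:call-variants"),
    ("AnnotateVariants", "arn:aws:lambda:us-east-1:123456789012:function:annotate-variants"),
]
definition = sequential_state_machine(pipeline_steps, comment="Toy genomics pipeline")
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of document you would supply as the state machine definition when creating a state machine, for example through the console or an SDK.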
Genomics organizations face a tsunami of data generated by their genomics pipelines. To make this data more actionable, you can deploy AWS components that support your entire analytical pipeline, from data ingestion and analysis through to visualization, storage, warehousing, and archiving.
AWS Big Data Solutions
Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, scientific simulation, and bioinformatics.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Start small for $0.25 per hour with no commitments and scale to petabytes for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
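To make the Athena description concrete, here is a minimal sketch of starting a SQL query over data in Amazon S3. It assumes a client created with `boto3.client("athena")` is passed in, and that a table named `variants` with a `chromosome` column already exists; the table, column, database, and bucket names are hypothetical.

```python
# Hypothetical table and column names; adjust to your own schema.
VARIANT_COUNT_QUERY = (
    "SELECT chromosome, COUNT(*) AS variant_count "
    "FROM variants "
    "GROUP BY chromosome "
    "ORDER BY variant_count DESC"
)


def start_variant_count_query(athena_client, database: str, output_location: str) -> str:
    """Start the query and return its execution ID.

    athena_client is expected to behave like boto3.client("athena").
    output_location is an S3 URI where Athena writes the result files.
    """
    response = athena_client.start_query_execution(
        QueryString=VARIANT_COUNT_QUERY,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A call might look like `start_variant_count_query(boto3.client("athena"), "genomics_db", "s3://example-results-bucket/athena/")`. The query runs asynchronously, so the returned ID is what you would use to check the status and fetch the results later.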
AWS Big Data Partner Solutions
The ConvergeHEALTH suite of software solutions is built with the understanding that, in today’s market, truly actionable insights are derived from a combination of real-world information, evidence, and experience, not just data. Powered by Deloitte’s unparalleled industry experience, our solutions are designed specifically to solve the biggest business and operational challenges that the health care and life science industries face.
Edico Genome has created a patented end-to-end platform solution for analysis of next-generation sequencing data, DRAGEN™, which speeds whole genome data analysis from hours to minutes while maintaining high accuracy and reducing costs. Top clinicians and researchers are utilizing the platform to achieve faster diagnoses for critically ill newborns, cancer patients and expecting parents waiting on prenatal tests, and faster results for scientists and drug developers.
With AWS, you can access your own private data sets or controlled repositories such as the NIH Database of Genotypes and Phenotypes (dbGaP). You can also use the toolset of your choice (like GATK or Galaxy) to analyze your data. AWS has the tools you need to address the security and compliance requirements for working with these sensitive data sets, including built-in features to encrypt your data at rest or in transit.
AWS has published a whitepaper that describes how to work with controlled data sets using AWS. Download the AWS dbGaP whitepaper »
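As one example of encrypting data at rest, the sketch below uploads an object to Amazon S3 and requests server-side encryption. It assumes a client created with `boto3.client("s3")` is passed in; the helper name, bucket, key, and KMS key ID are hypothetical. It is an illustration only and does not by itself satisfy dbGaP or other compliance requirements.

```python
from typing import Optional


def upload_encrypted(s3_client, bucket: str, key: str, data: bytes,
                     kms_key_id: Optional[str] = None):
    """Upload data to S3 and ask S3 to encrypt it at rest.

    s3_client is expected to behave like boto3.client("s3").
    With kms_key_id, S3 encrypts using that AWS KMS key (SSE-KMS);
    otherwise it uses S3-managed keys (SSE-S3, AES-256).
    """
    params = {"Bucket": bucket, "Key": key, "Body": data}
    if kms_key_id:
        params["ServerSideEncryption"] = "aws:kms"
        params["SSEKMSKeyId"] = kms_key_id
    else:
        params["ServerSideEncryption"] = "AES256"
    return s3_client.put_object(**params)
```

A call might look like `upload_encrypted(boto3.client("s3"), "example-genomics-bucket", "samples/sample1.vcf.gz", payload)`. Clients created by boto3 talk to S3 over HTTPS by default, which covers encryption in transit for the upload itself.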
AWS Solutions for Data Sets
Amazon Simple Storage Service (Amazon S3) is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. It is designed to deliver 99.999999999% durability, and scale past trillions of objects worldwide.
Apache Hadoop is an open source software project that can be used to efficiently process large datasets. Instead of using one large computer to process and store the data, Hadoop allows clustering commodity hardware together to analyze massive data sets in parallel. Amazon EMR makes it easy to create and manage fully configured, elastic clusters of Amazon EC2 instances running Hadoop and other applications in the Hadoop ecosystem.
Running Docker on AWS provides a highly reliable, low-cost way to quickly build, run, test, and deploy distributed applications at any scale. AWS provides support for Docker open-source and commercial solutions within AWS services.
CfnCluster is a tool used to build and manage High Performance Computing (HPC) clusters on AWS. Once created, you can log into your cluster via the master node where you will have access to standard HPC tools such as schedulers, shared storage, and an MPI environment.
AWS Partner Solutions for Data Sets
Seven Bridges Genomics provides a scalable and secure cloud-platform for NGS data analysis. The platform is aimed at researchers, labs, core facilities, and pharmaceutical companies to manage large amounts of NGS data, design and run scalable analysis pipelines, and efficiently collaborate on projects.
DNAnexus provides a global network for sharing and management of genomic data and tools to accelerate genomics. The DNAnexus cloud-based platform is optimized to address the challenges of security, scalability, and collaboration, for organizations that are pursuing genomic-based approaches to health, in the clinic and in the research lab.
BaseSpace Sequence Hub offers a wide variety of NGS (next-generation sequencing) data analysis applications (apps) that are developed or optimized by Illumina, or from a growing ecosystem of third-party app providers. Together, these cover all the common analysis methods used with Illumina NGS data, including RNA-Seq, exome/enrichment, amplicon, whole-genome sequencing (WGS), de novo assembly, and 16S metagenomics data analysis.
Share your data with your collaborators whether they are down the hall or on the other side of the globe. AWS can provide a central, shared workspace where you and your colleagues can access data sets, write algorithms, and create tools, without having to physically move the data back and forth or worry about intellectual property infringement.
AWS Collaboration Solutions
Amazon Simple Storage Service (Amazon S3) is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. It is designed to deliver 99.999999999% durability, and scale past trillions of objects worldwide.
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Start small for $0.25 per hour with no commitments and scale to petabytes for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.
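The Redshift figures quoted above can be turned into a rough yearly estimate. The sketch below uses only the two numbers from the text ($0.25 per hour to start, and $1,000 per terabyte per year at scale); the function names and the 500 TB example are hypothetical, and actual pricing depends on node type, Region, and commitment.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def starter_yearly_cost(hourly_rate: float = 0.25, hours: int = HOURS_PER_YEAR) -> float:
    """Yearly cost of running the entry configuration around the clock."""
    return hourly_rate * hours


def at_scale_yearly_cost(terabytes: float, rate_per_tb_year: float = 1000.0) -> float:
    """Yearly cost at the quoted per-terabyte rate."""
    return terabytes * rate_per_tb_year


starter = starter_yearly_cost()          # 0.25 * 8,760
warehouse = at_scale_yearly_cost(500.0)  # hypothetical 500 TB warehouse
```

Under these assumptions, the entry configuration comes to $2,190 per year if it runs continuously, and a 500 TB warehouse at the quoted rate comes to $500,000 per year.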
AWS Partner Solutions for Collaboration
Thermo Fisher is an American biotechnology company that creates genetic testing and laboratory equipment. The company used AWS to build the Thermo Fisher Cloud, a platform that helps medical researchers and scientists securely store, analyze, and share data globally. By using AWS, Thermo Fisher can provide its customers with a scalable and secure platform on which to conduct research, collaborate, and improve medical treatments for patients.
REAN Cloud, pronounced “rain”, is your full-service AWS Premier Consulting Partner and Managed Service Partner. Put our cloud-native and DevOps expertise to work for you; delivering end-to-end enterprise IT solutions. REAN Cloud implements secure, compliant architectures for the most highly regulated industries, such as Financial Services, Healthcare/Life Sciences, Education, and Public Sector.
When you are ready to bring genomics into your clinical practice, AWS has the tools and an expansive ecosystem of partners that you can leverage in order to build HIPAA-compliant applications for genomics.
AWS Clinical Integration of Genomic Sequencing Solutions
AWS OpsWorks is a configuration management service that uses Chef, an automation platform that treats server configurations as code. OpsWorks uses Chef to automate how servers are configured, deployed, and managed across your Amazon Elastic Compute Cloud (Amazon EC2) instances or on-premises compute environments. OpsWorks has two offerings: AWS OpsWorks for Chef Automate and AWS OpsWorks Stacks.
Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud.
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management. Amazon API Gateway has no minimum fees or startup costs.
AWS Partner Solutions for Clinical Integration of Genomic Sequencing
Syapse enables health systems to improve clinical outcomes, streamline operations, and shift to new payment models. As a comprehensive software suite used by leading health systems to implement precision oncology programs, this category-defining platform enables clinical and genomic data integration, decision support, care coordination, and quality improvement at the point of care.
AWS Marketplace is an online software store that enables genomics companies to find, buy, and immediately start using software that runs on the AWS Cloud.
Learn more about AWS Marketplace »
The AWS Partner Network (APN) includes partners who have achieved the AWS Life Sciences Competency by demonstrating AWS technical competence with supporting customer references. Working with these Competency Partners gives you access to innovative, cloud-based solutions that have a proven track record of success in life sciences and genomics.
Learn more about Life Sciences Competency partner solutions »
We can help you get started with a consultation from our sales and architecture organization, or you can begin your own pilot today.