Quickstart

This page shows you how to set up and start using Google Genomics.

Before you begin

  1. If you don't already have one, sign up for a Google Account.
  2. Sign in to your Google account.

  3. In the Cloud Platform Console, go to the Projects page and create a new project.

  4. Enable billing for your project.

  5. Enable the Genomics, BigQuery, and Cloud Storage APIs.

Launch Cloud Shell to use the command line

You can use Cloud Shell to access the Google Cloud SDK, which includes tools and libraries that you need to create and manage resources on Google Cloud Platform, including Google Genomics, Google Compute Engine, Google Cloud Storage, and BigQuery.

To launch Cloud Shell:

  1. Navigate to the project you want to use in the Cloud Platform Console.

  2. Click the Activate Google Cloud Shell button at the top of the console window.

    A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt.
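
Once the prompt is available, you may want to confirm that the SDK command-line tools are reachable before running any queries. The following is a minimal sketch in Python, assuming a Python 3 interpreter is available in your shell. The helper name `find_sdk_tools` is hypothetical and not part of the Cloud SDK; it only checks that the `gcloud`, `gsutil`, and `bq` executables are on your PATH.

```python
import shutil

# Command-line tools that ship with the Google Cloud SDK.
SDK_TOOLS = ("gcloud", "gsutil", "bq")


def find_sdk_tools(tools=SDK_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    Hypothetical helper for illustration; it does not call the tools.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_sdk_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If any tool is reported as not found, the session is probably not a Cloud Shell session, or the Cloud SDK is not installed in that environment.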

Run a query

Query a dataset from the 1000 Genomes Project using the Genomics tools.

  1. Search a variant set for variants at a specific location:

    gcloud alpha genomics variants list --variant-set-id "10473108253681171589" --reference-name "22" --start 51003835 --end 51003836
    

    This query returns the following variant:

    VARIANT_SET_ID        REFERENCE_NAME  START     END       REFERENCE_BASES  ALTERNATE_BASES
    10473108253681171589  22              51003835  51003836  A                [u'G']
    
  2. List the call sets in the same variant set. Each call set represents an individual with calls (including reference calls) in that variant set:

    gcloud alpha genomics callsets list "10473108253681171589" --limit 10
    

    This query returns the following individuals:

    ID                      NAME     VARIANT_SET_IDS
    10473108253681171589-0  HG00261  [u'10473108253681171589']
    10473108253681171589-1  HG00593  [u'10473108253681171589']
    10473108253681171589-2  NA12749  [u'10473108253681171589']
    10473108253681171589-3  HG00150  [u'10473108253681171589']
    10473108253681171589-4  NA19675  [u'10473108253681171589']
    10473108253681171589-5  NA19651  [u'10473108253681171589']
    10473108253681171589-6  NA19393  [u'10473108253681171589']
    10473108253681171589-7  NA19207  [u'10473108253681171589']
    10473108253681171589-8  HG00342  [u'10473108253681171589']
    10473108253681171589-9  NA12546  [u'10473108253681171589']
    
  3. Get the details of the calls at this location by requesting JSON output:

    gcloud alpha genomics variants list --variant-set-id "10473108253681171589" --reference-name "22" --start 51003835 --end 51003836 --format json
    

    This query returns the following details:

    [
      {
        "alternateBases": [
          "G"
        ],
        "calls": [
          {
            "callSetId": "10473108253681171589-0",
            "callSetName": "HG00261",
            "genotype": [
              1,
              1
            ],
            "genotypeLikelihood": [
              -5.0,
              -1.2,
              -0.03
             ],
            ...
    
  4. You can also search read group sets:

    gcloud alpha genomics readgroupsets list 10473108253681171589 --limit 10
    

    This query returns the following 10 read group sets:

    ID                      NAME     REFERENCE_SET_ID
    CMvnhpKTFhDq9e2Yy9G-Bg  HG02573  EOSt9JOVhp3jkwE
    CMvnhpKTFhCEmf_d_o_JCQ  HG03894  EOSt9JOVhp3jkwE
    CMvnhpKTFhCjz9_25e_lCw  HG01440  EOSt9JOVhp3jkwE
    CMvnhpKTFhCKwY-lmoOYDQ  NA20790  EOSt9JOVhp3jkwE
    CMvnhpKTFhCYnZrehsCAFg  HG01455  EOSt9JOVhp3jkwE
    CMvnhpKTFhDv55POyoCXIw  NA19074  EOSt9JOVhp3jkwE
    CMvnhpKTFhDVt4WpxI2KJw  HG01794  EOSt9JOVhp3jkwE
    CMvnhpKTFhDt6uGJ6YSOLQ  HG02166  EOSt9JOVhp3jkwE
    CMvnhpKTFhDC-oi_l9fLMw  HG03268  EOSt9JOVhp3jkwE
    CMvnhpKTFhCPqeCAnfPLNA  HG04014  EOSt9JOVhp3jkwE
    
  5. Get the description of a read group set:

    gcloud alpha genomics readgroupsets describe CMvnhpKTFhDq9e2Yy9G-Bg
    

    This query returns the following description:

    datasetId: '10473108253681171589'
    filename: HG02573.mapped.ILLUMINA.bwa.GWD.low_coverage.20130415.bam
    id: CMvnhpKTFhDq9e2Yy9G-Bg
    info:
     SAM:@SQ:
     - "SN:1\tLN:249250621\tAS:NCBI37\tM5:1b22b98cdeb4a9304cb5d48026a85128\tSP:Human\tUR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz"
      ...
    
  6. List the reads in a read group set that fall within a genomic range:

    gcloud alpha genomics reads list "CJ_ppJ-WCxDxrtDr5fGIhBA" --reference-name "chr20" --start 68198 --end 69000 --limit 5
    

    This query returns the following reads:

    REFERENCE_NAME  POSITION  REVERSE_STRAND  FRAGMENT_NAME                 SEQUENCE
    chr20           68099     True            H7F3RADXX_1:1:1207:2208623:0  CTTACAGTTCTACGGGATAATAGCTTATCTCATAAGGCCTCAGCTTTCTTTAATAATTTCTAGAAGCAGACGTTATTGTGTCATGCACACTAAGTGTTGC
    chr20           68106     True            H7F3RADXX_1:1:2202:1444084:0  AGTTCTATTGGATAATAGCTTATCTCATAAGGCCTCAGCTTTCTTTAATAATTTCTAGAAGCAGACGTTATTGTGTCATGCACACTCAGTGTTGCAAATT
    chr20           68109     True            H7F3RADXX_1:1:2210:379591:0   TACGGGATAATAGCTTATCTCATAAGGCCTCAGCTTTCTTTAATAATTTCTAGAAGCAGACGTTATTGTGTCATGCACACTCAGTGTTGCAAATTAATGG
    chr20           68110     False           H7F3RADXX_1:1:2206:641983:0   ACGGGATAATAGCTTATCTCATAAGGCCTCAGCTTTCTTTAATAATTTCTAGAAGCAGACGTTATTGTGTCATGCACACTCAGTGTTGCAAATTAATGGT
    chr20           68112     True            H7F4RADXX_0:2:2111:1926421:0  GGGATAATAGCTTATCTCATAAGGCCTCAGCTTTCTTTAATAATTTCTAGAAGCAGACGTTATTGTGTCATGCACACTCAGTGTTGCAAATTAATGGT CT
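
The JSON returned in step 3 encodes each genotype as a list of allele indices rather than bases, so it can help to decode it. The following is a minimal sketch in Python, not an official client. It rests on several assumptions: that index 0 refers to the reference bases and index n (n >= 1) refers to entry n - 1 of `alternateBases`; that a negative index marks a no-call; that the truncated part of the output carries a `referenceBases` field (the value `A` used here comes from the table in step 1); and that `genotypeLikelihood` values are log-scaled, so the largest value marks the most likely genotype. The sample is a trimmed copy of the output in step 3, closed off so that it parses, and the function names are hypothetical.

```python
import json

# Trimmed copy of the step 3 output, closed off so that it parses.
# "referenceBases" is an assumed field; its value comes from the step 1 table.
SAMPLE = """
[
  {
    "referenceBases": "A",
    "alternateBases": ["G"],
    "calls": [
      {
        "callSetId": "10473108253681171589-0",
        "callSetName": "HG00261",
        "genotype": [1, 1],
        "genotypeLikelihood": [-5.0, -1.2, -0.03]
      }
    ]
  }
]
"""


def decode_genotype(variant, call):
    """Translate allele indices into bases (0 = reference, n >= 1 = alternate n - 1)."""
    alleles = [variant["referenceBases"]] + list(variant["alternateBases"])
    # Assumption: a negative index is a no-call, reported here as ".".
    return [alleles[i] if i >= 0 else "." for i in call["genotype"]]


def most_likely_index(call):
    """Return the position of the largest (least negative) likelihood value."""
    likelihoods = call["genotypeLikelihood"]
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)


def summarize(variants):
    """Build (call set name, decoded genotype, most likely index) rows."""
    rows = []
    for variant in variants:
        for call in variant["calls"]:
            bases = "/".join(decode_genotype(variant, call))
            rows.append((call["callSetName"], bases, most_likely_index(call)))
    return rows


if __name__ == "__main__":
    for name, bases, index in summarize(json.loads(SAMPLE)):
        print(name, bases, index)  # prints: HG00261 G/G 2
```

For this call, both allele indices are 1, so the individual HG00261 is decoded as homozygous for the alternate base G, and the largest likelihood value sits in the third position of the list. To run the sketch on real output, you could save the result of the step 3 command to a file and pass its contents to `json.loads` in place of `SAMPLE`.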
    

What's next