NLM Curation at Scale Workshop: Recorded webinars now available!

NLM’s virtual Curation at Scale Workshop, held March 28–30, 2022, brought together biocurators, developers of automated curation systems, and others to learn about the current status of biomedical data curation, to share research and challenges, and to discuss the implementation of advanced computational techniques in scientific data curation. In addition to curation, participants discussed key related aspects such as data standards and data sharing.

Speakers from academia, government, publishing, and industry presented in sessions focused on Curation in the Ecosystem of Science and Scholarship, Putting Automation into Curation Workflow, Gene/Variant Curation in Precision Medicine, and Adapting Data Curation @NLM. In addition, there was a panel discussion on Curation at Scale in the times of Pandemics. And more!

RefSeq Release 212 is available!

RefSeq Release 212 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
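For readers who want to try the E-utilities route, here is a minimal sketch of how a request URL for a single RefSeq record might be built. The helper name `efetch_url` and the accession `NM_000546` are illustrative assumptions and do not come from the release announcement. The sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL for one record (hypothetical helper for illustration)."""
    params = {
        "db": db,             # Entrez database to query
        "id": accession,      # accession or UID of the record
        "rettype": rettype,   # requested record format
        "retmode": "text",    # plain-text response
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# NM_000546 is only an example accession chosen for this sketch.
url = efetch_url("NM_000546")
print(url)
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlopen(url)`. NCBI's usage guidelines on request rates apply to anything beyond occasional use.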

This full release incorporates genomic, transcript, and protein data available as of May 2, 2022, and contains 314,915,153 records, including 229,417,182 proteins, 44,805,833 RNAs, and sequences from 119,373 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.

Human genome Annotation Release 110

Annotation Release 110 is the first new annotation of human in four years, including all latest curated RefSeqs, and recalculation of models using over 80M long reads and 9B Illumina RNA-seq reads. AR 110 includes annotation of two human assemblies.

NCBI Posters at the Biology of Genomes Meeting

May 10-14, 2022

We are looking forward to the Biology of Genomes meeting, which will focus on “DNA sequence variation and its role in molecular evolution, population genetics and complex diseases, comparative genomics, large-scale studies of gene and protein expression, and genomic approaches to ecological systems.”

NCBI will present three posters to highlight our Comparative Genomics Resource (CGR) and the Allele Frequency Aggregator (ALFA):

  1. The NIH Comparative Genomics Resource: Amplifying the biology of genomes, presented by Valerie Schneider, PhD

On behalf of NIH, NLM is developing the NIH Comparative Genomics Resource (CGR) at NCBI to facilitate organism-spanning data connections and promote new research discoveries. This initiative aims to connect NCBI genomics-associated data types and tools with resources external to NCBI to provide a foundation for reliable comparative analysis for all eukaryotic research organisms.

New ClusteredNR database: faster searches and more informative BLAST results

Reduced redundancy. Faster searches. More diverse proteins and organisms in your BLAST results. Check out our new ClusteredNR database – derived from the default BLAST protein nr database by clustering sequences at 90% identity / 90% length (details below). Get quicker results and access to information about the distribution of your hits across a wider range of organisms and evolutionary distances.

Searching ClusteredNR

You can choose the ClusteredNR database in the Choose Search Set section of the BLAST submission form, where you normally pick the BLAST database. Simply select the Experimental databases radio button. You can also select the checkbox to search both ClusteredNR and the standard nr at the same time, allowing you to compare results (Figure 1).

Figure 1. The ‘Choose Search Set’ section of the BLAST submission form. Selecting the Experimental databases radio button chooses ClusteredNR. You can also perform simultaneous searches against the clustered and the standard nr by checking ‘Select to compare standard and experimental database.’

Logging into your My NCBI account is now easier, faster and more secure!

As of June 2022, you will be required to log in using a 3rd-party option.

As we announced previously, beginning June 2022, you will no longer be able to log into your My NCBI account with your My NCBI username and password.

We launched our Password Retirement Wizard last summer to help you link a 3rd-party login to your My NCBI account and retire your password. Currently, the wizard is opt-in, which means you can retire your password at your convenience. In June, retiring your password will become mandatory to access any features that require a My NCBI login.

Announcing GenBank Release 249.0

GenBank release 249.0 (4/19/2022) is now available on the NCBI FTP site. This release has 17.85 trillion bases and 2.66 billion records.

The current release has 237,520,318 traditional records containing 1,266,154,890,918 base pairs of sequence data. There are also 1,781,374,217 WGS records containing 16,071,520,702,170 base pairs of sequence data, 534,770,586 bulk-oriented TSA records containing 474,421,076,448 base pairs of sequence data, and 109,820,387 bulk-oriented TLS records containing 41,324,192,343 base pairs of sequence data.
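The headline totals follow from the per-division figures, which a short script can confirm. This is only an arithmetic check on the numbers quoted in the announcement. The dictionary layout and the function name `release_totals` are choices made for this sketch.

```python
# (records, base pairs) per division, copied from the release 249.0 announcement.
RELEASE_249 = {
    "traditional": (237_520_318, 1_266_154_890_918),
    "WGS": (1_781_374_217, 16_071_520_702_170),
    "TSA": (534_770_586, 474_421_076_448),
    "TLS": (109_820_387, 41_324_192_343),
}


def release_totals(divisions):
    """Return (total_records, total_bases) summed over all divisions."""
    records = sum(r for r, _ in divisions.values())
    bases = sum(b for _, b in divisions.values())
    return records, bases


total_records, total_bases = release_totals(RELEASE_249)
print(f"{total_records:,} records = {total_records / 1e9:.2f} billion")
print(f"{total_bases:,} bases = {total_bases / 1e12:.2f} trillion")
```

The sums come to 2,663,485,508 records and 17,853,420,861,879 bases, matching the rounded figures of 2.66 billion records and 17.85 trillion bases.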

NCBI ALFA Project at Bio-IT World 2022 Hackathon

Announcing the Allele Frequency Aggregator (ALFA) Project as part of the Bio-IT World 2022 Hackathon: Visualization of NCBI ALFA Variants

Join NCBI at the Bio-IT World 2022 Hackathon on May 4-5, 2022 to learn about and work with data from our ALFA project! The primary goal of this hackathon project is to develop a novel tool, app, or approach to explore and visualize NCBI ALFA variants and allele frequency for 12 different human populations. We aspire to create a new helpful variant interpretation resource for the clinical and research communities.

We hope to see you there! More information and registration here.

Introducing SARS-CoV-2 Variants Overview, NLM’s latest tool in the fight against COVID-19 

The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has released a new resource, called the SARS-CoV-2 Variants Overview, that aggregates data related to SARS-CoV-2 variants from sequences available in NCBI’s GenBank and Sequence Read Archive (SRA) databases.

SARS-CoV-2 Variants Overview, a freely available online dashboard, was developed with guidance from the TRACE Working Group as part of NLM’s participation in the National Institutes of Health (NIH) Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.

One impetus for development of the dashboard is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are only represented in SRA. The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. Thus, we developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results. This means that submission groups do not have to go through the effort of creating assemblies.

New RefSeq Annotations!

In February and March, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-seven new annotations in RefSeq for the following organisms:

  • Belonocnema kinseyi (wasp)
  • Daphnia pulex (common water flea)
  • Daphnia pulicaria (crustacean)
  • Dermatophagoides farinae (American house dust mite)
  • Diprion similis (hymenopteran)
  • Drosophila willistoni (fly)
  • Equus quagga burchellii (Burchell’s zebra)
  • Gallus gallus (chicken)
  • Haliotis rubra (blacklip abalone)
  • Haliotis rufescens (red abalone)
  • Helicoverpa zea (corn earworm)
  • Homalodisca vitripennis (glassy-winged sharpshooter)
  • Hydra vulgaris (swiftwater hydra)
  • Hypomesus transpacificus (delta smelt)
  • Ictalurus punctatus (channel catfish)
  • Ischnura elegans (damselfly)
  • Lolium rigidum (monocot)
  • Lucilia cuprina (Australian sheep blowfly)
  • Lynx rufus (bobcat)
  • Marmota monax (woodchuck)
  • Meles meles (Eurasian badger)
  • Micropterus dolomieu (smallmouth bass)
  • Neodiprion fabricii (hymenopteran)
  • Neodiprion lecontei (redheaded pine sawfly)
  • Neodiprion pinetum (white pine sawfly)
  • Neodiprion virginiana (hymenopteran)
  • Oncorhynchus gorbuscha (pink salmon)
  • Osmia bicornis bicornis (red mason bee)
  • Scatophagus argus (bony fish)
  • Schistocerca americana (American grasshopper)
  • Schistocerca piceifrons (Central American locust)
  • Silurus meridionalis (bony fish)
  • Ursus americanus (American black bear)
  • Vanessa cardui (painted lady)
  • Vespa crabro (European hornet)
  • Vigna umbellata (eudicot)
  • Xenia sp. Carnegie-2017 (soft coral)

View the full list of annotated eukaryotes available in the Genome Data Viewer (GDV) browser.

New feature in the MSA viewer: Search for a short sequence

We’re reading and incorporating your feedback! As requested, you can now search for sequences in our Multiple Sequence Alignment (MSA) Viewer. You can search the anchor or consensus sequence of a multiple alignment for short sequence strings. This new feature allows you to: