Now Available: GenBank Release 262.0!

GenBank release 262.0 (8/22/2024) is now available on the NCBI FTP site. This release has 34.10 trillion bases and 4.76 billion records.

The current release has: 

  • 251,998,350 traditional records containing 3,675,462,701,077 base pairs of sequence data
  • 3,569,715,357 WGS records containing 29,643,594,176,326 base pairs of sequence data
  • 755,907,377 bulk-oriented TSA records containing 706,085,554,263 base pairs of sequence data
  • 187,321,998 bulk-oriented TLS records containing 77,026,446,552 base pairs of sequence data 
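
As a quick sanity check, the headline totals can be recomputed from the four per-division counts listed above. This is only an illustrative sketch; the variable names are our own and are not part of any NCBI tool.

```python
# Per-division counts copied from the release 262.0 summary above:
# division -> (records, base pairs)
divisions = {
    "traditional": (251_998_350, 3_675_462_701_077),
    "WGS": (3_569_715_357, 29_643_594_176_326),
    "TSA": (755_907_377, 706_085_554_263),
    "TLS": (187_321_998, 77_026_446_552),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records / 1e9:.2f} billion records")   # 4.76 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")     # 34.10 trillion bases
```

The recomputed sums agree with the rounded figures quoted in the announcement.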

NCBI Hidden Markov Models (HMM) Release 16.0 Now Available!

Download release 16.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

What’s New?

Release 16.0 contains:

  • 17,078 HMMs maintained by NCBI
  • 406 new HMMs since release 15.0
  • GO terms for NCBI HMMs were compared with those of the corresponding InterPro entries across a substantial number of HMMs and updated accordingly (added: 307; deleted: 39; updated: 1,482).

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.

NCBI’s First-Ever BioEd Summit Was a Success!

NCBI hosted its first-ever BioEd Summit: Crafting Student-Centric Curricula with NCBI resources. This week-long, in-person event for science educators across the U.S. was held on the National Institutes of Health (NIH) campus in Bethesda, MD, from August 5-9, 2024. 

Event Details 

During the week, educators participated in morning sessions including interactive workshops on NCBI educational curricular design, the use of various NCBI resources in teaching, and detailed hands-on discussions and practice with NCBI tools. A panel discussion on employing novel, data-driven, active learning exercises in science classes with leaders from several institutions including:

NCBI’s PopSet Database to Retire Effective January 2025

Beginning in January 2025, NCBI’s PopSet database will no longer be available.

While PopSet web pages (example) will no longer be accessible, individual sequences of PopSet will still be searchable and accessible in Nucleotide as independent records (example). A link under ‘Related information’ on a GenBank record page will also let users access other sequences of the same set.

Access and Download Sequence Data and Metadata Using NCBI Datasets

Goodbye Assembly and Genome, hello NCBI Datasets!

Exciting news! NCBI has streamlined and modernized how you access and download genome, taxonomy, and gene information with NCBI Datasets. As previously announced, NCBI Datasets is replacing the legacy Genome and Assembly resources, providing you with a single entry point to genome datasets. Effective today, the legacy pages are retired and no longer available.

Please note there will be no changes to how you programmatically access the databases using E-Utilities or EDirect.
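
For readers who script against E-Utilities, the kind of request that remains unchanged might look like the sketch below. It only builds an ESearch URL and makes no network call. The database name `assembly` and the search term are illustrative assumptions, not taken from the announcement, and `esearch_url` is a hypothetical helper; consult the E-Utilities documentation for the full parameter list.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-Utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL (hypothetical helper; no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (assumed values): assemblies for a given organism.
url = esearch_url("assembly", "Homo sapiens[Organism]")
print(url)
```

Keeping URL construction separate from the actual request makes it easy to log or inspect a query before sending it.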

Coming Soon! Rapid Access to Influenza Data

Improved Influenza GenBank submission process

Do you submit flu sequences to GenBank? Thanks to community feedback, NCBI is excited to announce that we are improving the influenza GenBank submission process. We continue to play a key role in providing the biomedical community free and easy access to genome sequences from viruses. To further advance public health research, in the coming weeks we will begin to expedite the release of influenza data. This means you will see the rapid assignment of accession numbers and data becoming publicly accessible within hours. In addition, we will automatically process all Influenza genomes to produce standardized, consistent annotation which saves you time and benefits the researchers who find your data valuable.

Now Available: Assembled Genomes for Influenza Viruses and Improved Functionality of NCBI Virus

NCBI Virus now offers genomes for viruses such as Influenza A by using an automated process to group segments from the same samples. We group these segments into genomes based on metadata for the sample including species, isolate name, host organism, collection date, and location. Newly released GenBank records are added daily. 
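
The grouping idea described above can be sketched as follows. This is a simplified illustration, not NCBI's actual pipeline: the record layout, field names, accessions, and the `group_segments` helper are all hypothetical, and real metadata would need normalization before exact matching like this could work.

```python
from collections import defaultdict

# Metadata fields named in the post; the dictionary keys are our own choice.
GROUP_FIELDS = ("species", "isolate", "host", "collection_date", "location")

# Hypothetical segment records (made-up accessions and metadata).
segments = [
    {"accession": "SEG0001", "species": "Influenza A virus", "isolate": "A/example/1/2024",
     "host": "Homo sapiens", "collection_date": "2024-01-15", "location": "USA"},
    {"accession": "SEG0002", "species": "Influenza A virus", "isolate": "A/example/1/2024",
     "host": "Homo sapiens", "collection_date": "2024-01-15", "location": "USA"},
    {"accession": "SEG0003", "species": "Influenza A virus", "isolate": "A/example/2/2024",
     "host": "Homo sapiens", "collection_date": "2024-02-01", "location": "USA"},
]


def group_segments(records):
    """Group segment accessions that share identical sample metadata."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in GROUP_FIELDS)
        groups[key].append(rec["accession"])
    return dict(groups)


genomes = group_segments(segments)
for key, accessions in genomes.items():
    print(key[1], accessions)  # isolate name and its grouped segments
```

Here the first two records share every metadata field and end up in one group, while the third forms its own.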

Access these genome assemblies through NCBI Virus using the new NCBI Virus “Assembly” tab above the Results Table.

NCBI Taxonomy Updates to Birds (Aves)

Recent molecular comparisons have better defined relationships among high-level taxonomic groups in birds. To reflect new data and classification changes, NCBI is improving our Taxonomy resource. As previously announced, we updated the higher-level classification of birds (Aves) with the introduction of a new major taxonomic group (clade).

What’s new?

The new clade, Neoaves, comprises about 95% of all birds. The Neoaves contain six major superordinal clades: 

Open Access! Million Veteran Program Genome-Wide PheWAS Results Now Available in dbGaP!

The Million Veteran Program (MVP) is a research program from the U.S. Department of Veterans Affairs (VA) that has collected and analyzed health information from over one million veteran volunteers. The data include genes, lifestyles, military experiences, and exposures that may impact health and wellness.  

The results of the MVP phenome-wide association study (PheWAS) analysis are now available in NLM-NCBI’s database of Genotypes and Phenotypes (dbGaP). The PheWAS summary data is based on information from approximately 600,000 veterans from four broad ancestry groups, with hundreds of phenotypic traits recorded in medical records. This is one of the largest publicly available PheWAS datasets to date and does not require an application to access the data.