Thursday, June 02, 2011

Sequence/short read archive (SRA) back from the dead

Well, in February it seemed as though the Sequence/Short Read Archive was dead (see, for example, "End of Sequence Read Archive (SRA) - some quick notes" and "Though I generally love NCBI, the Sequence/Short Read Archive (SRA) seems to have issues; what do others think?").

But now it seems as though there has been a resurrection of sorts (I guess I should have paid attention in March). See, for example, the SRA Home page, which states:
"Recently, NCBI announced that due to budget constraints, it would be discontinuing its Sequence Read Archive (SRA) and Trace Archive repositories for high-throughput sequence data. However, NIH has since committed interim funding for SRA in its current form until October 1, 2011. In addition, NCBI has been working with staff from other NIH Institutes and NIH grantees to develop an approach to continue archiving a widely used subset of next generation sequencing data after October 1, 2011.

We now plan to continue handling sequencing data associated with:

RNA-Seq, ChIP-Seq, and epigenomic data that are submitted to GEO
Genomic and Transcriptomic assemblies that are submitted to GenBank
Genomic assemblies to GenBank/WGS
16S ribosomal RNA data associated with metagenomics that are submitted to GenBank
In addition, NCBI will continue to provide access to existing SRA and Trace Archive data for the foreseeable future. NCBI is also continuing to discuss with NIH Institutes approaches for handling other next-generation sequencing data associated with specific large-scale studies."

Well, I'll be. I guess it is back, at least for now.


  1. That is good news, for the time being. But with Illumina announcing a new order of magnitude in throughput every 6 months, and compressed FASTQ files now approaching 100Gb per lane, the problem will certainly come up again. I am not sure it will actually be possible to publish the raw data from every one of our sequencing experiments to a centralized database.
    The other point is that back in February, your post started (and took part in) intense complaints about many aspects of the SRA system. It would be good to take these complaints into account and change the way users are treated at both ends of the database (submitting and retrieving data)...

  2. I tend to agree that this is just delaying the inevitable. For files that large, I think the best-case scenario for the likes of NCBI, UniProt, GenBank, etc. is to act as trusted sources of bio-specific torrents and checksums. Hopefully higher-education institutions and a helpful public could step in to propagate the data itself.

  3. Any tips on retrieving SRA-type datasets from reluctant authors...?

