But now it seems as though there has been a resurrection of sorts (I guess I should have paid attention in March). See for example: SRA Home where it is stated that
"Recently, NCBI announced that due to budget constraints, it would be discontinuing its Sequence Read Archive (SRA) and Trace Archive repositories for high-throughput sequence data. However, NIH has since committed interim funding for SRA in its current form until October 1, 2011. In addition, NCBI has been working with staff from other NIH Institutes and NIH grantees to develop an approach to continue archiving a widely used subset of next generation sequencing data after October 1, 2011.
We now plan to continue handling sequencing data associated with:
- RNA-Seq, ChIP-Seq, and epigenomic data that are submitted to GEO
- Genomic and Transcriptomic assemblies that are submitted to GenBank
- Genomic assemblies to GenBank/WGS
- 16S ribosomal RNA data associated with metagenomics that are submitted to GenBank
In addition, NCBI will continue to provide access to existing SRA and Trace Archive data for the foreseeable future. NCBI is also continuing to discuss with NIH Institutes approaches for handling other next-generation sequencing data associated with specific large-scale studies."
Well I'll be. I guess it is back at least for now.
That is good news, for the time being. But with Illumina announcing new orders of magnitude every 6 months, and with compressed FASTQ files now approaching 100Gb per lane, the problem will certainly occur again in the future. I am not sure it will actually be possible to publish raw data to a centralized database for each of our sequencing experiments.
The other point is that back in February, your post started (and participated in) intense complaints about many aspects of the SRA system. It would be good to take these complaints into account and transform the way users are treated on both ends of the database (submitting and retrieving data)...
Agreed, Nicolas.
I tend to agree that this is delaying the inevitable. For files that large, I think the best-case scenario for the likes of NCBI, UniProt, GenBank, etc. is to be trusted sources of bio-specific torrents and checksums. Hopefully higher education and a helpful public could step in to propagate the data itself.
Any tips on retrieving SRA-type datasets from reluctant authors...?