DNA sequencing continues to go crazy in terms of lower cost, higher speed, and spread of technology. Alas, some aspects of doing a genome project are not necessarily keeping up. So I am posting here to ask a simple question about one of these steps: what do people out there think about the process of getting genome / metagenome data into Genbank? Without wanting to bias answers too much - we are having some challenges in this area. Storify of Twitter responses below the fold.
Tuesday, March 26, 2013
Question - anyone having issues w/ delays/difficulty in the process of getting genomes / metagenomes into Genbank?
As much as it saddens me to say this (since a colleague refers to me as the dumpster diver of genomic data), I think that scientists need to have a frank conversation about the costs and benefits of saving every piece of genomic data and curating it.
Absolutely - but given that everyone is wasting months to years on just getting data into Genbank - we need to either agree that people won't do that or find another way to share.
You don't have to submit to GenBank ... the European Nucleotide Archive is much easier to work with than GenBank or the SRA. There are a few publications where people use SEED, RAST, microbes online, IMG, etc etc etc to announce bacterial genomes.
If GenBank is an archive of your annotations, why do they make you use PGAAP to annotate? If you don't need to use PGAAP, why don't they accept annotations directly from third-party tools (RAST, microbes online, etc.)?
The days of monolithic databases holding all sequence data known are nearing an end. The question is, how can scientists still get access to all the sequence data they are interested in?
The problem with github/figshare/etc/etc/ is generating a common dataset that holds all (or most) of the sequence data for new comparisons.
There is no technical reason we have to submit to GenBank; we should be able to use whatever database has the best access. Provided they are listed in a common aggregator of web services (e.g. http://www.biocatalogue.org/) and they provide an API for programmatic access, everyone can access the data from anywhere. [In principle we could use RDF but that does not work in practice.]
Free the data, make smaller, open databases, but make sure they are linked and accessible to all.
I can see the point(s) about ease of submission, but the major issue with lots of smaller DBs is sustainability: who will look after a database longer term?
There is also the point that being in a single format allows for direct comparisons to be made.
And finally, from an end user's point of view, wouldn't you rather be able to go to one place (or just a few) to find all the genomes of interest?
Now, I've not tried to submit data to NCBI so I don't know your pain, but having worked in the ENA at EBI, I know it really isn't so hard to submit data there.
Regarding @caseybergman's comment about @GigaScience, we can provide an option for those submitting papers to the GigaScience journal, but we're still encouraging submission of raw data to the SRA.