Saturday, February 09, 2008

Marco Island sequencing frenzy - are we getting lost in all the data?

So - the Marco Island meeting could be summed up as a sequencing frenzy. Everyone and their mother presented something about how the next generation sequencers are revolutionizing their work. People are using these systems (well, people are using mostly Illumina and Roche sequencers and some are using ABI Solid) to do all sorts of new things - RNAi, gene expression, methylation, mutations, population genetics, comparative genomics, metagenomics, infectious disease, etc etc etc.

Certainly, much of this new work is truly revolutionary. Sequencing has gotten so cheap, and so easy, relative to even 3-4 years ago, that it can be used in all sorts of new ways. And new developments in sequencing seem poised to happen too, to make sequencing even cheaper and even better. Sequencing will get even better for a few reasons.

First, the current players in the market (Roche, Illumina, and possibly soon ABI Solid) are improving both their systems and their informatics such that they are getting more and more robust and easy to use and producing more data (Roche for example presented details on how they can extend their sequence reads to ~350 base pairs mostly by software and reagent changes). Lots of companies can say "We have some sequencing system about ready to be introduced." And lots of them are saying exactly that at this meeting. But there is nothing like having sequencers in the hands of scientists to really test how well they work and to really help push the development of the technology.

Second, it does seem like some competitors for Illumina and Roche are coming. ABI presented multiple results from ABI Solid technology that make it seem like these systems are ready for prime time. Whether other systems are ready for prime time is unclear. Helicos presented what could be seen as data. It was disappointingly minimal on detail and, though I am rooting for them, it was far from convincing. But the discussions in the hallway seemed to suggest that Helicos is getting close. And there are 5+ other players itching to get into the market (some of which are apparently presenting later today). Some will fail. Some will succeed. And as long as there are a couple of good systems, the competition will push further development and reductions in costs. Thus everyone at the meeting I talked to said basically the same thing --- this is an exciting time in sequencing.

So - sequencing is getting better and cheaper. That certainly will be good in many ways. But there are some negative aspects to this frenzy. I see two in particular. The first, which was discussed extensively at the meeting, is that nobody is really prepared to deal with the sheer volume of data coming out of these new systems. Data storage, transfer and analysis will unquestionably be the rate-limiting steps in turning the new sequence data into knowledge.

And this is the other negative aspect of the new frenzy. Right now there seems to be a mad rush to apply the new sequencing methods to everything under the sun. And the data piles up. And piles up. And the biology seems to have taken a back seat in some cases. Perhaps the best example of this is something Neil Hall pointed out to me yesterday. There has been almost no mention at this whole meeting of things related to function of genes. For example, I have not heard "gene ontology" once. I do not think I have even heard "annotation" once. Function and process have been replaced by terms like "systems biology" and "SNPs" and "networks" and "massively parallel." We have in a way regressed in terms of treating organisms (or communities) as a black box. Fine scale detail has been lost in a sea of data. In a way, we have all become born again geneticists. And I do not mean to disparage genetics. But I mean the part of genetics that treats organisms as a bit of a black box and focuses just on transmission of traits. We need to find a way to not get lost in all the data. I am not sure how to do that, but when we do, then the full potential of the new sequencing methods will be realized.


  1. As I was always a geneticist I don't need to be born again. However I think the problem is that we are so impressed with our ability to generate data that people are forgetting to look at what they are generating. A lot of work seems to end with "we found 100 break points" or "we found 1000000 SNPs" or "there are 100 genes under selection". But few people look at the functions. When genome sequencing started we were guilty of this sort of thing but I thought it got better once the data generation got boringly simple. But with all these new machines on the market we are focusing on the toys again and maybe the biology is taking a back seat.

  2. At the risk of sounding long in the tooth, all this sure sounds familiar. Every generation has their data burst, and then the commentators say, 'but what about the biology?' and then in five years it's standard and everyone forgets about their arguments about swimming in too much data. I don't think it's something to really worry about too much. There are lots of labs still focusing on function - perhaps they were too busy with their cool biology to attend the Marco Island meeting - and they'll figure out ways (with the help of informatics types) to make the data biologically relevant. But enough of that. What I was confused about is, I thought 454 and Solexa were the main players - but there's no mention of them in your post - have they been subsumed by Roche and Illumina?

  3. Solexa are part of Illumina

    454 are part of Roche

  4. There has been almost no mention at this whole meeting of things related to function of genes. For example, I have not heard "gene ontology" once. I do not think I have even heard "annotation" once.

    Are our standards so low that taking GO classes into account means that we're getting at the function of genes? I read a lot of papers in which the authors did some high throughput analysis of a large dataset, then pointed out which GO classes were over-represented at one tail of the distribution of some statistic. I never really considered that an adequate analysis of function or biologically meaningful analysis. But if people aren't even doing that anymore . . . well, that's pretty lame.

  5. What I meant was the analyses were not very meaningful in terms of biological functions

    My examples were admittedly poor as I agree with your assessment of genome analyses

    It is just that in the last few years genome sequencing talks I have seen are usually accompanied by more functional studies of many kinds and this year the sequencing took center stage again. Mind you I love these new toys ... I was just trying to say we are focused on the toys and the data right now

  6. matt

    that type of thing in that paper is not really what I meant, as we and many others have also worked on phylogenetic/phylogenomic methods for extracting information from many genomes at once

    the problem now is that the data itself is not per se high enough quality to use for such methods and everyone is getting caught up in the raw data (e.g., how to accurately map Solexa reads or assemble them)

  7. There has been almost no mention at this whole meeting of things related to function of genes. For example, I have not heard "gene ontology" once. I do not think I have even heard "annotation" once.

    I think it's easier to point out the exciting bits and novel ways of doing experiments than to mention how they are not very sure of how to proceed with data analysis.

    I think this is just a natural progression of things: when you have a new toy, you try everything, then later take a breath to think about how to deal with it.
    at this point in time everyone is interested to see if they can use NGS on their work
    the data analysis together with the biology takes a back seat... which is so wrong..

