Thursday, August 13, 2009

Overselling genomics award #6: Quake/Helicos & the "democratization" of sequencing

For those interested in so-called "third generation" DNA sequencing systems, this week has had some buzz with the release of a publication in Nature Biotechnology reporting the sequencing and analysis of a human genome using a Helicos Heliscope sequencer. In this paper Stephen Quake and colleagues generated short read sequences from Quake's DNA using this machine and then analyzed them by comparing them to reference human genomes.

Certainly, what they did was cool. And the use of the Helicos equipment is a good thing for that company and its development of single molecule sequencing. And given the "race", if you want to call it that, for the $1000 genome, it is not surprising that this paper received a lot of coverage from all sorts of angles, because they claim it involved the cheapest sequencing of a human genome yet achieved.

So first I want to commend Quake and Helicos for an important step in third generation sequencing. Quake, mind you, is one guy who is constantly inventing cool new techniques of great use in genomics and biology, and he is always worth checking out.

But in this case, there are some aspects of what they claim they achieved here that are very off-putting. In particular, I am concerned with the supposed "democratization of sequencing" that they think this project embodies (e.g., see some of the quotes in this). The basis for their concluding that democratization has happened here is that they believe this sequencing (of Quake's genome) was done at lower cost and with less effort than previous human genome sequencing efforts. To back this up they make a table (Supplemental Table 1) that is meant to summarize estimates of these values for 8 human genome papers (the original Lander et al and Venter et al ones, as well as Watson's genome, etc).

In essence Quake et al are doing the following math (my formula, not theirs, but their discussions imply basically this)

D = B/(E*C)
Democratization factor (D) = # of bases sequenced (B) / (amount of effort (E) * cost (C))

That is, with more sequence, less effort, or less cost, the more democratized sequencing is. Sounds fine in some ways. Except when you look at the details.
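To make the implied formula concrete, here is a minimal Python sketch. The function name and all input numbers are hypothetical placeholders, not figures from Quake et al. or any other genome paper. The only point is that D doubles whenever cost or effort is halved, so the result depends entirely on what gets counted in C and E.

```python
def democratization_factor(bases: float, effort: float, cost: float) -> float:
    """Implied formula from the post: D = B / (E * C)."""
    if effort <= 0 or cost <= 0:
        raise ValueError("effort and cost must be positive")
    return bases / (effort * cost)


# Hypothetical placeholder inputs in arbitrary units (not real data).
baseline = democratization_factor(bases=1.0, effort=4.0, cost=2.0)

# Same project, but with half the reported cost or half the reported effort.
half_cost = democratization_factor(bases=1.0, effort=4.0, cost=1.0)
half_effort = democratization_factor(bases=1.0, effort=2.0, cost=2.0)

print(baseline, half_cost, half_effort)
```

Leaving an expense out of C, or undercounting E, inflates D by the same proportion, without anything about the sequencing itself having changed.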

For example, consider the cost (C) of the sequencing. They report that the cost for the sequencing was < $50,000. But this number is misleading since, for example, they do not include any aspect of the cost of actually buying and setting up the machine. (For more detail on the flaws in the cost calculation and on the whole story, see Times Online, Dan MacArthur at Genetic Future, and GenomeWeb.)

However, more disconcerting to me is what they do with the rest of the implied calculation.

For example, they treat all the projects in essence as though they are equal in terms of total number of bases sequenced (B) because, I guess, after all, all were sequencing human genomes. But this is not fair. The depth and quality of sequencing vary between the projects, and more recent projects, such as theirs, make use of the data from prior projects, which allows them to gather less data. For example, in their paper here they assemble the genome by tiling the reads against reference genomes, which lets them get by with lower coverage than would be required for de novo assembly.

But even worse - the way they calculate effort required (E) is flabbergasting.

They seem to infer this in two ways. First, they make use of the number of runs of the machine that are required. They apparently used four runs while they claim that the use of second generation sequencing methods required many more runs. And many have been questioning this claim (e.g., see Chad Nusbaum's quotes in the GenomeWeb article).

It is the second way that they infer effort that is perhaps the most annoying. They infer this from the number of authors on the papers describing the sequencing of these human genomes (e.g., in Supplemental Table 1 they say "number of authors" is "an estimate of labor."). And the big thing for Quake et al is that there are only three authors on their paper and dozens to hundreds on other human genome papers. Based on this lower number of authors they conclude that their work required less effort and discuss this as evidence for further democratization of sequencing.

Now suppose we gloss over that there is no way to infer amount of effort by number of authors (e.g., letters to the editor, which usually do not require a lot of effort, can sometimes have hundreds of authors while Origin of Species had but one author and was, shall we say, a lot of work). Even worse to me is that they are trying to compare their paper, which is focused almost entirely on the technical aspects of the sequencing, with other papers that spend much more effort on studying and discussing what the genomes might mean. For example, the Venter/Celera and the public human genome papers are complex, detailed volumes with analysis of everything you could think of. To compare the effort required to do this with the effort required to do what they did in the Quake paper, which was pretty much assembly and analysis of SNPs, is inappropriate and actually offensive.

Given the number of areas that they have oversold how their project has reduced effort and cost for sequencing a human genome and how this implies democratization, I am giving Quake and Helicos my coveted "Overselling genomics award". Again, not that what they did was not cool or interesting, but by overselling it, it detracts from everything they achieved.

9 comments:

  1. Another irritating aspect of this hype is calling this "third generation" sequencing. That's an interesting marketing ploy for a company that is about to make a very late entry into the second generation sequencing market with an instrument that is more expensive and generates lower quality data than those already being produced by their competitors. If they are lucky, they will make a small dent in Illumina and ABI's market share before they are overrun by the true third-generation instruments.

  2. I could be totally wrong, but I'm guessing there are some technicians and/or data analyzers who are hacked at being left off the author list of this "three-author project."

    And, as you might imagine, I completely agree with you that this paper is nothing like the human genome papers, older and newer (like Solexa)! Or the chimpanzee, or the mouse, or.... Guess they didn't have some crazy editor repeating the mantra "novel biological insight" incessantly. But to compare this to those papers is like apples and oranges. Can't we just appreciate this paper for what it is, in the proper context, without hyping it up?

  3. The initial investment in the equipment and installation is tiny when compared to the overall cost. It is ridiculous just to compare the cost on the whole new technology.
    I don't know who you are but this blog has only one purpose: to help your partners (that is, Helicos' competitors).
    I really don't know who you are but I say at this moment you should appreciate one's work rather than make your stupid comments.
    If you are capable of doing it, do it in a different way and show the world. Stupidity here.

  4. Hmm. I am torn. Normally I delete comments that are offensive. But anonymous, you are so onto the truth that I probably should keep it. It is true that I work for many of Helicos' competitors. Strangely, though, none of them have ever paid me for all the work I do for them. So maybe my work arrangement is a bit off. And it is also clearly true that when we are discussing the possible democratization of sequencing that the cost of equipment and set up should not be included because after all, those costs can just be billed to taxpayers in creative ways. So thanks for pointing me to the truth with your pleasant vibe.

  5. The sequencer can be useful for things that do not require too much accuracy, like checking the direction of an insert into a plasmid vector or counting the number of tandem repeats.

  6. Besides bashing accuracy with no evidence in your pocket, you guys are totally missing the boat. "Cost" for a university / non-profit is relative and may focus on capital investments.

    "Cost" for a commercial venture is much more to the bone. You've totally ignored the enormous Helicos advantage compared to legacy system sample preparation complexity, time, and cost. A commercial venture dealing with rabid burn rates needs to get it done faster and cheaper. They can save months in development and have far fewer employees using a Helicos system. The up front cost is not the issue in this situation.

    Simple and fast. Why you ignore that I don't understand. But you completely miss what's going on in the business world.

  7. Umm. Docsparks, perhaps you missed all the press coverage and the quotes from Quake and Helicos who were saying this was evidence for the "democratization" of sequencing. They were saying this meant anyone and everyone could do sequencing for cheaper. Your argument relates to corporations or large entities with significant $$$ to start up. Fine, I am not sure I agree with your #s but I accept it is a possibility. But that is far far far from "democratization."

  8. John, didn't miss it at all. I think you need to open up a bit on this. A little imagination has some entity buying the machines and renting time on them.

    Eliminating the lengthy, complex, and highly skilled preparation process of the older technology is an incredible advantage and opportunity.

    The business case here is just not that challenging - even if one chose to ignore further advancements and drops in cost per sequencing due to the intense nature of the competition.

    The proof [of course] will be found in the marketplace

  9. Docsparks - I think you are missing many of the points here.

    1) They use # of authors on a paper they wrote as proof that the project was easy. Yet they compare their project, which was a simple proof of principle of doing comparative sequencing to big scale projects to sequence the first human genome. It was RIDICULOUS on its face, and actually a bit offensive. As I said, I like Quake and think the technology is cool. But this # of authors on papers trick was idiotic.

    2) The preparation of DNA for 454 or Illumina or ABI SOLiD sequencing, which are the main competitors of Helicos, is simply not that complicated. It may be easier to do Helicos sequencing but I think that is unlikely.

    3) The main challenge with the new sequencing technologies is informatics not prepping samples. They present no evidence that the informatics here was easier than any other methods.

    I am not against Helicos, I even visited them many years ago pondering the possibility of switching from sequencing genomes to developing technology. But I stand by my statements about the press release and coverage associated with this paper. It was off the wall. If they want to stand on their strengths, all the power to them. If they want to throw out red herrings about # of authors, and democratization, that is lame.
