Tuesday, December 19, 2006

HYG101 - A Short story about Neanderthal Genomics, written in 1998

All these recent stories on the Neanderthal sequencing projects reminded me of a short story I wrote while in graduate school. I wrote this story for a class taught by Carl Djerassi, a professor at Stanford and the author of a variety of what he calls "Science in Fiction" novels. In these novels he writes about how science actually works, hence the name of the genre.

For this class all the students were supposed to write short stories that were then shared among the students and with Djerassi and each week we would read and discuss two of the stories. The stories written by the other students were great overall - moving and interesting and fun to read and discuss. The class digressed at the end due to Djerassi's desire for the group to write a fiction "Renga." Renga is a form of poetry where one person writes a line and then another writes the next line and together a group is supposed to come up with a poem of interest.

So we wrote a story - with each student in the class writing a paragraph and then sending it on to the next author. In the end, the story was, in my opinion, much worse than any of the individual stories. But amazingly, Djerassi convinced Nature to publish the story and thus I am a co-author on what was the first (and probably last) short story published in Nature. The worst part of it was - Nature made us reduce the length of the story to fit some page restrictions. That would have been fine, I guess, if we had known this in advance. But in the end, this forced us to edit the story and remove any hints of its "Renga" nature. I was embarrassed by the story and wanted to use a pen name so people would not know it was me. I chose Joan Eisen but for some very particular reasons I had to change my author line back to Jonathan (long story).

OK, back to the main point. I wrote a story about sequencing Neanderthal DNA. And though I am no longer sure what I think of the story ... here it is.

-------------------------------------------------------------------------------
HYG101

David Cohn was unaware that, while he was camping in Death Valley, every five minutes his computer would contact U. C. Berkeley's mainframe system and download any email messages David had received since the last check. David did not find out that he had left his email program open until, fumbling for his keys outside his office, he heard his computer play two randomly selected segments of Mozart's clarinet concerto, announcing that two new messages had just arrived. David had been gone for three weeks and had come in early in the morning so he could check his email in peace before other people would arrive and start asking about his trip or reading his email over his shoulder.

When David finally got his office door open, he put his backpack down, sat down on top of the phone messages he did not see on his chair, and turned on the computer screen. "Well, at least I remembered to turn something off," he thought. As the screen hummed to life, David dug the phone messages out from under his bike shorts and put them aside. When the screen display finished forming, David found out he had seven hundred and fifty-one new messages. While David felt a sense of pride that his computer had made it three weeks and had collected so many messages without crashing, he was no longer looking forward to reading his email.

David was conflicted about email. He loved getting letters, and when he first started using email, he liked to pretend that each message was a real letter. He would print each message on the laser printer down the hall, and read it carefully on the walk back to his office. Unfortunately, this soon became impractical and wasteful. Now, in his fourth year of graduate school, David felt he could no longer take the time to pretend email was real mail.

David took a deep breath, grabbed the mouse with his right hand, and started working away. Each message was condensed into one summary line, but with so many messages, the summary list itself was thirty pages long. After sorting the messages by subject and scanning the subject list to get a feel for the overall pattern of his email, David felt a little relieved. Most of the messages were clearly junk. With each click of the mouse button, messages with subjects like "frisbee" or "opera" were banished to the trash. Messages that he did not want to deal with now, but might need later were stored in an archive file. After some thirty minutes the list had been whittled down extensively.

David knew that it was time to start reading some of the messages. He looked over his shoulder and opened a message with the subject "Free Sex":
Date: Thu, 9 January, 1997 12:00:10 (PST)
From: data@trek.com
To: davidc@berkeley.edu
Subject: Free Sex

Hah. Made you look you Loser.
Star Trek rules. You suck.

David had no idea who this was from or what it was about, but he soon found out. He had dozens of messages from people who thought he had sent an email to an international Star Trek discussion group saying that Star Trek was the worst TV show ever and that Star Trek fans were "Losers with a capital L." David figured one of his office mates had taken advantage of the fact that he had left his email account open and had sent the message to which these strangers were responding. "Very funny," David thought, and all messages about Star Trek joined the junk mail in the trash.

David went back to scanning the summary list. Messages with the words "Nature" or "Nature Paper" in the subject line caught his eye. David figured these might be related to the paper, with him as first author, that was under review at the prestigious scientific journal Nature, so he started reading some of them. The first six messages he looked at said things like "Congrats" or "Way to go," but were not very informative. "Way to go for what?" David wondered. His question was answered with the next message he opened:
Date: Sun, 12 January, 1997 9:45:22 (PST)
From: jchui@nytimes.com
To: davidc@berkeley.edu
Subject: Nature paper

Dear Dr. Cohn:

I am writing a piece for the New York Times about your article in the latest issue of Nature. I would like to interview you for it but I have been unable to reach you by phone. If you could email me times that would be good to talk on the phone I would appreciate it.

Joel Chui, New York Times
212-556-1245

It always made David laugh to be referred to as Dr. since he had only started graduate school a few years before. David was somewhat thrilled by the invitation from the Times, but he was confused too. How, he wondered, had his paper been published when Nature had not even given it final acceptance?

"The Nature Paper," as David had come to refer to it in his head, was the result of the combined effort of people at three separate institutions: Professor Dmitri Bohr and his graduate student Chris from the University of Pennsylvania; Kevin Gogan, a research scientist from Monkeygene Corporation; and David, in the Genetics Department at Berkeley. "The Team" had its beginnings two years earlier ...

David had just finished his first scientific paper. David's advisor John Gold was listed as a co-author on the paper, but the work was a completely independent project by David. David had ended up in Professor Gold's lab mostly by chance, and had spent his first year in the lab drifting, unable to find a project that interested him. Professor Gold studied mouse developmental genetics and early on David realized he was not interested in this. Professor Gold was not thrilled with David's lack of focus, but he told David that as long as he worked on something related to mouse development or mouse genetics, he could stay in the lab. After attending a seminar that caught his attention, David decided to work on mouse population genetics. Since David was a little worried about working in such a crowded and competitive field, he narrowed his focus to population genetics of the mouse Y chromosome.

"Why the Y?" people would ask David. Although his choice had been based mostly on a whim, his answer went something like "Well, since the Y is passed down only from father to son, population genetic studies that focus on the Y are essentially studies of the relationships among males." Despite his lack of a profound interest in the subject, David did do some good work. Specifically, David used a technique that allowed him to determine the relationships among male mice by comparing the chemical structure of the DNA (also known as the DNA sequence) of the Y chromosome from many different males. David determined the DNA sequence of a part of the Y chromosome called MYG101 (for mouse Y chromosome gene number 101) from twenty mouse strains. In his paper, he used these sequences to infer the relationships among the males of these twenty strains. To do the required analysis, David had designed a novel neural-network-based computer program that was optimized for analyzing Y chromosome data. He also showed that, for a variety of reasons, MYG101 was an ideal choice for studies of mouse Y chromosome relationships. Just before the paper was published, David received an email from Professor Bohr:
Date: Wed, 12 April, 1995 6:45:22 (PST)
From: dbohr2@upenn.edu
To: davidc@berkeley.edu
Subject: Y chromosome

Dear David:

I hear you have a paper in press concerning a new method to study Y chromosome relationships. We have been trying to use studies of the Y to infer the genetic background of some ancient human specimens (e.g., bog men). We have some preliminary results and would like your advice on whether you think your analysis would work on our data. If you have time to look at our data, let me know and I will send it to you. Alternatively, you could send us your program and we could run the analysis. Also, can you send a preprint of your paper?

Thanks in advance, Dmitri Bohr

David responded immediately. He told Professor Bohr he would be happy to look at their results, but he was unable to send his computer program since he was thinking of trying to patent it. David offered instead to run the program for Professor Bohr. David also suggested that Professor Bohr might want to determine if there was a gene similar to MYG101 on the human Y chromosome. Such a gene would likely be useful for population genetic studies of the human Y chromosome since it had worked so well in mice. Professor Bohr's response came quickly:
Date: Fri, 14 April, 1995 9:55:12 (PST)
From: dbohr2@upenn.edu
To: davidc@berkeley.edu
Subject: Re: Y chromosome

David. I appreciate your suggestion but at this time I am unable to commit anyone in my lab to working on any new projects with these ancient samples. However, I will have my graduate student Chris send you some samples to play around with. I am also attaching our limited data. Let me know if you get any interesting results.

Dmitri

David was not thrilled by the tepid response. He had been hoping Professor Bohr would want to look for a MYG101-like gene in humans. He ran some simple analyses on Professor Bohr's limited data and emailed Professor Bohr to let him know that he would need much more data to get any useful results. A few days later, David received a package of samples from Chris; he put them in the freezer and forgot about them for a while.

David plodded on. On a typical day, he would come to lab early, check his email, start a computer analysis of some new data, do a few experiments, and every so often he would check his email again and start a new computer analysis. Through email David communicated with many people about the computer program he had created for the mouse paper. Since he still did not want to distribute the software, he set up a web site at which people could use his program but would not have access to its code. David was a little worried about people using the program on their own since the analysis required multiple steps and seemingly trivial things could lead to large errors. This led him to put a disclaimer on the web site saying that he, David Cohn, was not responsible for errors or mistakes of any sort. He had his computer keep a log file of everyone who used the site and he liked to look at it occasionally to find out who was using his program. Over time, David grew tired of answering the same emailed questions about his program, so he set up an automatic response system that would send a generic email message to anyone who asked about it.

Eventually David realized that his advisor did not pay attention to what he was working on, whether it was mice, bacteria or video games. He was bored with mouse research and wanted something new. Just when David was becoming aware of his boredom, he learned from a literature search service he subscribed to that a gene similar to MYG101 had been cloned from the human Y chromosome. The authors of the paper called it HYG101. David felt this could be the break he was waiting for. That afternoon he took the samples Chris had sent him out of the freezer and started to play around with them. He used a few tricks that he knew for cloning and sequencing DNA from poorly preserved samples and, much to his surprise, after only a few weeks of work, he got positive results: he had determined the DNA sequence of HYG101 from a bog man. He emailed Professor Bohr with the good news and Professor Bohr responded:
Date: Sun, 13 August, 1995 13:25:22 (PST)
From: dbohr2@upenn.edu
To: davidc@berkeley.edu
Subject: Bogged down

Great news. Although I am in Australia on sabbatical, I will get Chris to start sending you more samples including some more bog men and a few Neanderthals. Yes Neanderthals!

D

After a few back-and-forth email messages, they developed a plan: David would clone and sequence HYG101 from various ancient human or humanoid samples, Chris would do some control experiments, and they would keep each other informed. Since few people knew of Professor Bohr's Neanderthal samples, they decided not to tell anyone about their project. David knew he would need some help with all the samples so he emailed Kevin Gogan, who had been a graduate student in the Genetics Department a few years before, and who was now at Monkeygene Corporation, to ask if he could help. Kevin responded:
Date: Fri, 15 Sep 1995 11:10:21 (PST)
From: kgogan@monkeygene.com
To: davidc@berkeley.edu
Subject: Sequencing

Davey boy. Send me the samples. As I said before all I have to do is put them in with my samples on the automated DNA sequencer. Nobody will know the difference. Yes, I remember that IOU, but I might have done it anyway even if you hadn't reminded me. And yes it would be nice to be an author on the paper. BTW - Why can't you tell me what these samples are?

K

Slowly the project advanced. Every once in a while, David would send Kevin a tube containing the HYG101 preparation from a particular sample and after a few weeks Kevin would send David an email with the sequencing results. In his spare time, David refined his computer program, correcting a few bugs and adding some new features that would help handle large numbers of samples. In addition, David found out that a forensics project funded by the F.B.I. was going to determine the DNA sequence of HYG101 from hundreds of people of diverse genetic backgrounds. David realized that these sequences would be very helpful in placing the ancient samples in a genetic context. In exchange for helping the forensics project get off the ground, David was given access to a private web site at which the forensics project was going to put the sequences as they were determined.

Finally, after over a year of work on the ancient samples, and suggestive preliminary analyses, David and the others decided it was time to finish up their project. They felt a need to rush since the sequences from the forensics project were about to be released to the public and other people would likely start using these sequences for population genetic studies. Since David's computer was quite slow, and since they had a large number of samples to analyze, Professor Bohr suggested that they perform the sequence analysis using a government supercomputer on which he had an old account. David copied all of his files and his program to this computer and started the analysis. After what seemed like many months (but which was actually only two weeks), the program finished.

The results were, to put it simply, astonishing. The ancient human samples like the bog men could be placed into major human genetic groups by comparison to the modern day human samples from the forensics project. They even found that two men from the same bog were likely brothers. However, the most interesting result was that modern human Y chromosomes could be divided into two main lineages that had separated from each other a long time ago and that one of the lineages was closely related to the Y chromosome from Neanderthals. The only explanation that made sense was that there had been interbreeding between a Neanderthal male and a human female sometime in the past. This would be a huge story.

So after all the email, all the sending of samples back and forth, and all the computer time, they finally got around to writing a paper. It took many messages for them to agree on a specific plan, and even then David was not completely comfortable with it. Chris would write the Introduction, Discussion, and Conclusions since David knew little about human populations, and David would write the Methods and Results. First authorship, which was a big deal, was decided by a random draw, and David won. Chris and Professor Bohr agreed that Kevin should be a co-author too, which was good for David since he had already promised this to Kevin. They sent text and figures back and forth and the paper was finished in four weeks. When they were done, they submitted it, by email of course, to Nature.

Weeks went by with no response from Nature. David spent much of this time anxiously checking his email for any word about "The Nature Paper." Finally, after almost two months, they could no longer wait, and they wrote to the editor to find out what was going on. The editor wrote back:
Date: Mon, 23 December, 1996 12:11:32 (PST)
From: editor@nature.com
To: dbohr2@upenn.edu, davidc@berkeley.edu
Subject: Re: Manuscript

Drs. Cohn et al.:
I must apologize for the delay in review. We had one reviewer who got ill and a replacement reviewer still has not responded. Given the amount of time it has taken, and the novelty of the data, I think we will just go with the first reviewer's comments. The comments are mostly positive, but the reviewer suggested some things you should do before the paper can be accepted. I will fax the comments to you right away.

Robin Ralston, editor

Professor Bohr received the fax and faxed a copy to David. Chris was on vacation and Professor Bohr had a grant due in a few days, so Professor Bohr told David to deal with the reviewer's comments himself. David did not know much about the review process; nobody had really explained it to him. His only previous paper had been accepted with no recommended revisions, so David had never had to deal with reviewer's comments. David still had not told his advisor about "The Nature Paper" and thus did not want to ask him for advice. He asked Kevin for suggestions and Kevin told him to make a list detailing how he was responding to each of the reviewer's comments, and to send this list along with the revised manuscript to the editor. David was glad he was in charge of the revisions since he had noticed that one of the tables identifying which modern human Y chromosome lineages were related to Neanderthals had some inadvertent mix-ups. Kevin told David not to mention this correction or any other minor changes he wanted to make; that would just slow things down.

David went about addressing the reviewer's comments. There were seven things that the reviewer said "Cohn et al." should do to allow the paper to be accepted and published. One particular concern was the computer program used to perform the analysis. The reviewer wrote, "I used the program on the web site listed in the Methods section with a subset of the sequences you analyzed and I got an unusual result. Is there any explanation for the discrepancy?"

David had only used his web site once, and that was with a small data set. He realized that he had not put his refined version of the program on the web site. He installed the new version of the program on the web site and assumed that something in the old program had caused the unusual results observed by the reviewer.

The reviewer also said they should add the Seqbase reference numbers that would allow other people easy access to the HYG101 sequences from the forensics project. Seqbase was a public DNA sequence database maintained by the National Institutes of Health. David logged on to the Seqbase web site and ran a search for sequences named HYG101. The search program returned a summary list of eighty-five sequences. What David saw in the "Comments" column elicited an emotion he rarely felt: panic. He could feel his heart rate increase, and from the pounding of his pulse he felt that his blood pressure must have doubled. He even uttered a loud gasping noise, but fortunately nobody was around at the time. All this simply because the words "Neanderthal Lineage" came up over and over in the Seqbase entries.

Upon reading those words, David realized that they had been scooped. Some other group must have used the sequences from the forensics project, just as they had feared, and had also gotten access to Neanderthal samples. This was the worst scientific moment of his life. He opened up some of the sequence files to find out who had done the dirty deed, who had scooped them. All he could find in the files was a line that said either "In the Y chromosome lineage related to Neanderthals" or "In the Y chromosome lineage related to ancient humans," depending on which sequence it was. There was no reference to who had done the work. David examined the files more carefully and noticed that the Seqbase entries contained the same error that he had found in his original table. This could mean only one thing. Someone had somehow put results from their paper into Seqbase. They had scooped themselves.

David's panic was replaced by anger. He uttered a profanity and pounded his fist against the only clear space on his desk. He had not told anyone about this work. He had not even told Kevin exactly what the samples were that Kevin was sequencing. Now everyone in the world could see their results, yet their paper might still get rejected if the editor or the reviewer did not like how he responded to the reviewer's comments. For all he knew, their results had been in Seqbase for months. Even with the errors from the jumbled table, other researchers might get ideas from their results. To David, getting scooped by someone inadvertently stealing their results via Seqbase would be much worse than getting scooped by someone who came up with the idea on their own.

David wrote to Professor Bohr telling him about the Seqbase files and asking if he knew anything about how Seqbase might have gotten the information. Professor Bohr said that he and Chris had not sent the paper to anyone and he told David to write a letter to Nature's editor to make a formal complaint since the leak must have come through Nature or the reviewers. Just as David was about to send the letter, he got an idea. He had a way to find out who the reviewer was, and this might provide clues about the source of the leak. Since the reviewer claimed to have used David's program on his web site, the reviewer's email address should be in the log file. David's search of the log file convinced him that the reviewer was the leak: one of the log entries came from a computer at Seqbase.

David did not tell Professor Bohr about his new discovery. Instead he wrote a letter that he sent to both Nature and the head of Seqbase. In the letter, he played up the fact that he was just a graduate student and that he did not understand the system very well but that it seemed wrong that their results were in Seqbase without their permission when their paper had not even been accepted. He did not mention that he knew someone at Seqbase was responsible. The email response was very quick:
Date: Wed, 25 December, 1996 11:12:44 (PST)
From: ablue@seqbase.org
To: davidc@berkeley.edu, editor@nature.com
Subject: I am responsible ...

Dear Dr. Cohn,

I am fully responsible for what happened. I reviewed (favorably) your manuscript. When I was attempting to reproduce your sequence analysis, I noticed that the HYG101 sequences from the forensics project had been released to the FBI web site. I made Seqbase entries for each of these since they were now publicly available. So far so good. Then I created new files for each of these that included your results. These were not supposed to be released until your paper was accepted. But I forgot to tell the person in charge of our web server this and your results unfortunately became part of the monthly update of Seqbase. Since all updates are automatically sent to other sequence databases, your results are now probably in all major sequence databases around the world. The best I can do is correct the annotation in the next release.

I am very sorry about this incident and I hope that you accept my apologies. Let me know if there is any way I can make sure that you do not get hurt scientifically by this.

Dr. Arthur Blue,
Director, Seqbase

David emailed Dr. Blue back, saying he accepted his apologies, but was still a little concerned about their results being available to others before their paper was accepted. David added that it would probably be OK, as long as their paper got accepted and published as soon as possible. He also emailed the Nature editor to let her know that the apologies were accepted. Feeling that the editor and the reviewer owed him one, David sent in the revisions with a few changes that were not asked for but which David thought made the paper more interesting. He even added a section that Professor Bohr and Chris had taken out before the paper was submitted since they thought it was too controversial. Once the revisions were sent, he took off for his camping trip ...

Now, David was back from his trip, sitting in front of his computer, finding out "The Nature Paper" was somehow accepted and published in less than three weeks. What was going on? To find out, David decided to call Chris. Chris was supposed to be back from vacation and David did not feel like talking to Professor Bohr. David was surprised to find out Chris was a woman, but he tried not to let it show. After the introductions, and the expected comments about how they had never talked on the phone, he asked her about the paper being published so quickly.

Chris told David that, while David was gone, the editor of Nature had contacted Professor Bohr and told him that the revisions were approved and that the paper would be pushed out quickly. Professor Bohr and Chris had checked and corrected the page proofs, and that was that. Chris hinted that she and Professor Bohr were a little surprised by some of David's revisions but David did not care. They had dumped the task on him so he felt he was allowed some leeway. David and Chris chatted for a while about the New York Times. Chris had done her interview a few days earlier and she told David to relax and have fun. David found out Chris was from near where he grew up and they discussed this for a little while too. After they said their good-byes, David decided to call the reporter from the Times. When looking for the email from the Times reporter, David noticed a recent email from Kevin and he opened it:
Date: Sun, 12 January, 1997 00:01:11 (PST)
From: kgogan@monkeygene.com
To: davidc@berkeley.edu
Subject: Thanks monkey brain

Hey Jerkwad. Thanks for letting me know our paper was out. Anyway, youre forgiven.

Get this. Monkeygene, in all of its brilliance, decided to try to patent all of its sequences. Guess what??! Your HYG101 sequences got patented along with all the others because they just sent out the reads from the automated sequencer. It gets better. They patented them as monkey sequences. All sorts of different types of monkeys since I put your samples in with different monkey species samples. And more amazing, they are now in Seqbase, listed as monkey sequences. Pretty funny huh? So someone might think monkeys interbred with humans too, just like Neanderthals.

"Well," David thought, "that about does it for me." It was now seven in the evening and David decided to deal with the Times tomorrow. As he was closing the door to his office, a second of Mozart chirped from his computer. "At least there will be something to do tomorrow," David thought, and he let the door close.

3 comments:

  1. But amazingly, Djerassi convinced Nature to publish the story and thus I am a co-author on what was the first (and probably last) short story published in Nature.

    You may well have been the first, but for the last few years _Nature_ has been publishing a short science-related SF story at the end under the rubric "Futures". Maybe your story helped influence that decision.

  2. Ironically, I just read today's _Nature_ and they say that they are planning on ending their "Futures" series there. Hmm, famed Open Access supporting blogger posts a story that he once co-wrote for _Nature_, and _Nature_ brings their series of stories to an end. Coincidence?

  3. I completely missed the recent fiction in Nature. But I guess that is not surprising since I cancelled my subscription a while ago and do not really look at it too often anymore (not, I confess, due to any issues with Open Access - just due to being overwhelmed with moving, new kid, etc.).

    But I do buy the conspiracy theory again --- I already believed in conspiracies before this, but now I can say I am part of one.
