Tuesday, May 29, 2012

Diversity (of speakers, participants) at meetings: do something about it

Some unformed thoughts here but here goes.

Every so often I see a conference announcement and am annoyed by the XY/XX excess for the speakers.  Some recent examples
And more.

Now - I complain about this here and there on Twitter and the like.

But I felt that this needed a blog post to not get lost in the Twitter stream.  So here it is.

I note - I have posted about this issue previously: A conference where the speakers are all women? | The Tree of Life.  And for conferences in which I am involved I have been trying very hard to work on the speaker diversity (not just XX vs XY, but age, career status, ethnicity, etc).  And it certainly can be difficult to make sure that diversity is there.  But the meetings I list above are pretty egregious.  The Genome Canada one features seven major speakers - all white males.  Yes, they are all big names.  But in biology, where women are reasonably well represented, it suggests a bias to me if a meeting can somehow only manage to invite and/or attract all senior, white, XYs to be major speakers.  Not sure what that bias is and it could be different in each case - could be who is invited - could be the field itself - could be timing/nature of the meeting - could be something to do with families (e.g., perhaps women are invited but are more likely to feel like limiting travel due to roles in child care).

Also I note - biases are not necessarily affecting any one gender or ethnic group.  For example, I have generally stopped going to meetings/conferences that are on weekends and I have also stopped going to meetings/dinners after 6 PM because I do not want to skip out on time with my family.

So here is a plea.  Next time you are involved in organizing a meeting - make some effort to have a strong representation of diversity of speakers and participants.  For example, if you invite lots of women and all say no - try to figure out why and see if you can fix the issue.  Offer travel fellowships for students.  Offer child care or child activity options (even if you cannot pay for it - at least make it easy for people).  Make sure to advertise/promote the meeting to groups/institutions with a high representation of underrepresented groups.  Don't give up if your first efforts don't work.  Sometimes it can be difficult to make sure diversity levels are high.  But keep trying ... it will help make the conference better and also will help the field in general ...

For other posts on this topic see

Sunday, May 27, 2012

Dubious Press Release from Cedars-Sinai linking Irritable Bowel Syndrome (IBS) and Bacteria in Gut

Quick one here.

Not impressed with this press release from Cedars-Sinai: Dr. Pimentel links IBS and gut bacteria - Cedars-Sinai (see other variants of it here: Daily Disruption – Cedars-Sinai Study Links Irritable Bowel Syndrome (IBS) and Bacteria in Gut and here: Irritable bowel syndrome clearly linked to gut bacteria).

Among the things that bug me here:
  • They don't include a link to the paper or even provide a citation
  • They claim that culturing microbes is the "gold standard" for connecting bacteria to the cause of this disease.  AND they imply this is the first method to use culturing to study the disease.  Both notions are wrongheaded.  
  • They confuse cause of IBS and symptoms.  They say that b/c antibiotics help reduce symptoms, therefore, bacteria cause the disease.  Really?  So then fevers must cause things like malaria and flu because ibuprofen helps reduce symptoms right?
  • At some point it might be nice to mention that the MD behind the new study has also been pushing the idea that IBS is caused by bacterial overgrowth for many years both in a book and via a testing company though it is unclear what his association with the company is.  I note - ads for the book claim "In addition, Dr. Pimentel presents a simple treatment protocol that will not only help you resolve your IBS symptoms, but will also prevent their recurrence."  So - apparently he already had a cure BEFORE the new study was even done.  In general I am skeptical of papers that show evidence for something coming from someone who apparently already "knew" the answer.
Of course, I am not saying IBS is NOT caused by bacterial overgrowth as they claim.  But I can say this - PRs like this make me skeptical that anything new was done in this current publication.

Friday, May 25, 2012

Nice Collection from Diane Dawson: Open Science and Crowd Science: Selected Sites and Resources

Quick post here - already posted to Twitter and wanted to make sure this one was seen by people who read this blog but don't follow me on Twitter.

There is a nice compilation/commentary/review from Diane Dawson titled Open Science and Crowd Science: Selected Sites and Resources.  It is in the journal "Issues in Science and Technology Librarianship" (which I note - is a new one to me).  It has a lot of useful resources and comments about various open science activities on the web.  Definitely worth checking out.

Yum - Carbon monoxide, worms, bacteria - all together - what could be better

Just a quick one here pointing people to a paper and some stories relating to work by Nicole Dubilier on the worm Olavius algarvensis and its chemosynthetic symbionts.

Guest post: Story Behind the Paper by Joshua Weitz on Neutral Theory of Genome Evolution

I am very pleased to have another in my "Story behind the paper" series of guest posts.  This one is from my friend and colleague Josh Weitz from Georgia Tech regarding a recent paper of his in BMC Genomics.  As I have said before - if you have published an open access paper on a topic related to this blog and want to do a similar type of guest post let me know ...

A guest blog by Joshua Weitz, School of Biology and Physics, Georgia Institute of Technology

Summary

This is a short, well sort-of-short, story of the making of our paper: “A neutral theory of genome evolution and the frequency distribution of genes” recently published in BMC Genomics. I like the story-behind-the-paper concept because it helps to shed light on what really happens as papers move from ideas to completion. It's something we talk about in group meetings but it's nice to contribute an entry in this type of forum.  I am also reminded in writing this blog entry just how long science can take, even when, at least in this case, it was relatively fast.

The pre-history

The story behind this paper began when my former PhD student, Andrey Kislyuk (who is now a Software Engineer at DNAnexus) approached me in October 2009 with a paper by Herve Tettelin and colleagues.  He had read the paper in a class organized by Nicholas Bergman (now at NBACC). The Tettelin paper is a classic, and deservedly so.  It unified discussions of gene variation between genomes of highly similar isolates by estimating the total size of the pan and core genome within multiple sequenced isolates of the pathogen Streptococcus agalactiae.
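For readers new to the terminology, here is a toy sketch of what "pan" and "core" mean when each genome is treated as a set of gene families. This is only the counting step on fully observed genomes, not the estimation approach of Tettelin and colleagues; the isolate names and gene labels below are made up for illustration.

```python
# Toy illustration of pan and core genomes.
# Isolate names and gene-family labels are hypothetical.

def core_genome(genomes):
    """Gene families present in every genome (set intersection)."""
    gene_sets = list(genomes.values())
    return set.intersection(*gene_sets) if gene_sets else set()

def pan_genome(genomes):
    """Gene families present in at least one genome (set union)."""
    return set().union(*genomes.values())

genomes = {
    "isolate_A": {"g1", "g2", "g3", "g4"},
    "isolate_B": {"g1", "g2", "g3", "g5"},
    "isolate_C": {"g1", "g2", "g6"},
}

core = core_genome(genomes)
pan = pan_genome(genomes)
print("core genome:", sorted(core))
print("pan genome:", sorted(pan))
```

As more isolates are added, the core can only shrink or stay the same, while the pan can only grow or stay the same.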

Thursday, May 24, 2012

Nathan Wolfe talk at #UCDavis Wrap Up #Storify #Viruses

Nathan Wolfe talked at UC Davis yesterday.  I met with him for 30 minutes just before his talk.  Many times I feel that 30 minutes is more than enough when meeting with outside seminar speakers.  I definitely would have enjoyed more time with Wolfe - he does some pretty fascinating stuff.

Anyway - I escorted him to his talk and then I took notes for it on Twitter as did Pam Ronald (who was sitting next to me).  I then made a "Storification" of the talk (using the Storify.Com system).

This is below:

Wow - ALVIN submarine has potential to be vector for species movement cc: @deepseanews

Well, it is (relatively) common knowledge that surface ships can serve as unintentional vectors for the movement of organisms via things like ballast water (see for example this recent post on Deep Sea News which discusses this in part).  And the ecological havoc wreaked by such ship-based-transport can be immense.

A new paper, and news story, call attention to an analogous process that might occur with deep-sea submarines (see news story here: U.S. News - Deep-sea aliens hitched ride by submarine to pristine area).  The basic summary is - researchers using the deep sea sub ALVIN have discovered that, contrary to expectations, some organisms from the deep were able to survive the sub surfacing, being brought on board the mother ship, and then being sent back down to another site.  Some limpets apparently hung out in some tubing for a day and were then "sampled" by the sub at another site.  Apparently, nobody had thought this might be an issue because they had assumed that the surfacing and bringing on deck and cleaning of ALVIN would kill any organisms from one site before traversing to the next place.  Apparently not.

I note one comment - it seems reasonable to think that microbes might be hitching a ride on ALVIN and other submersibles too ... which brings me back to the recent post on Deep Sea News I linked to above.  It is by Holly Bik, a post doc in my lab, and in it she discussed the possibility that microbes might be getting moved around by surface ships.  Well, it seems that submersibles should be looked at too ..

Wednesday, May 23, 2012

Skeptical of this: Invitation to Participate in the East African Universities' Lecture Series and a Safari

Hmm ... this smells off. Must be a SCAM somewhere in here.
Updated 12/13/13. I do not think this is a SCAM.  But I am not quite sure what it is.

---------- Forwarded message ----------
From: Tours of Purpose (TOP) <info@toursofpurpose.com>
Date: Wed, May 23, 2012 at 9:29 PM
Subject: Invitation to Participate in the East African Universities'
Lecture Series and a Safari

Dear University of California Faculty and Staff:

Tours of Purpose, TOP, a professor exchange agency dedicated to the
development and improvement of economic, academic and general welfare
in East Africa (i.e., Uganda, Kenya, Tanzania, Rwanda and Burundi) is
hereby extending a humble invitation to you to participate in our
ongoing program of lecture series taking place in our local
universities, colleges, high schools, primary schools and other
academic institutions.  Additionally, TOP would like to avail to you a
one in a lifetime opportunity to take a safari where you will see and
photograph lions, elephants, giraffes, leopards, zebras, hippos,
rhinos, among an array of wildlife, in addition to a rare opportunity
to visit the mountain gorillas and man's closest relative, the chimp,
at a TOP' scholars' give away price. Winston Churchill took this trip
and immortally dabbed Uganda "The Pearl of Africa", and Queen
Elizabeth was on such a safari when she learned that she had become
queen of England. The average visit lasts for about two weeks--with a
couple of days or so dedicated to visiting the said incredible African
wildlife reserves--although you may wish for your particular visit to
be shorter or longer. TOP would like to partner with a specific
professor, or any academician, in pursuing a possibility of coming to
East Africa to deliver lectures in any given dispositive academic
discipline.  TOP will cooperate with you in arranging and customizing
your travel details to Africa, including picking you up at the airport
in TOP state of the art SUVs, booking fair accommodation, arranging
your meals, setting and managing your speaking schedule, taking you on
a safari trip and other tours, and ultimately delivering you to the
airport for your flight home. TOP invariably offers the option of one
being paired with another educator from North America or Europe during
this trip, although traveling alone in East Africa is not complicated
nor precarious at all.

Kimberly-Clark's deceptive self serving PR regarding germs in the workplace #BadReportingToo

First I saw of this story was here: Study: Bacteria fills office break rooms - Local News - Houston, TX - msnbc.com

Something sounded off with this.  I think it was the fact that it involved "Cleaning products company Kimberly-Clark" that raised some alarm bells.  The involvement of Charles Gerba also left me a bit queasy as I have seen his name associated with a few recent "studies" which are basically germaphobia funded by cleaning product companies.

After looking around a bit I got discouraged at the whole thing and put it out of my head for a few hours.  And then David Coil, a post doc in my lab, sent me a link to the press release behind this story.  And boy is it a doozy.

The PR basically makes the following dubious statements or implications
* All bacteria are bad.  The whole PR references a study that they imply is about detecting bacteria in various locations.  And when they detect high levels they conclude this is bad.  For example in the title "Where the Germs Are: New Study Finds Office Kitchens and Break Rooms are Crawling with Bacteria".  Or in the text: "If you thought the restroom was the epicenter of workplace germs you don't want to know about office break rooms and kitchens" "office germ "hot-spots,"" "Office workers are potentially being exposed to illness-causing bacteria right in their own lunchrooms" and much more.  Uggh.  Not all bacteria are bad.  Gerba and Kimberly-Clark must know this yet they purposefully mislead.

* Presence of ATP means presence of bacteria (and see above - this must imply presence of bad bacteria).  Wow.  Not sure what to say here.  But they use a test for ATP which they say  "ATP is present in all animal, vegetable, bacteria, yeast and mold cells. Detection of ATP indicates the presence of contamination by any of these sources. Everyday objects with an ATP reading of 300 or higher are considered to have a high risk for illness transmission."  No citation given. And sounds highly dubious to go from ATP - > risk for illness.  Sounds completely dubious actually.

* That it is OK to make claims in Press Releases without presenting evidence behind the claims.  The PR tries to make this all seem very scientific.  Well, where is the paper behind this?  They claim "The findings are from a study carried out by Kimberly-Clark Professional* and is believed to be one of the most detailed and comprehensive studies ever conducted on identifying workplace hotspots where germs can lurk."  Where is the actual data?  Where are the methods described?  Yuck.

Alas - despite the fact that the Press Release is at best a self serving piece of dubious scientific quality - the press has run with the story sucking up everything Gerba and Kimberly-Clark are saying.  Ugg.  Here are some examples, many of which really do a poor job on the science and the conflicts of interest inherent in an unpublished study from a cleaning products company

I am getting sick and tired of crap like this.  Kimberly-Clark may make some useful products.  I don't really know.  But deceptive press releases like this suggest that their dedication to science is, well, low.  They need to clean up their act.

Tuesday, May 22, 2012

What to do - what to do - cool microbial art w/ a #badomics word --- must resist purchasing -- must resist ...

OK - thanks to Dan Smith for pointing me to: Phonome original watercolor painting bacteria by artologica

This was inspired in part by phone sampling I helped Dan and Jack Gilbert do at the AAAS meeting.  And Michelle Banks (i.e., @artologica) has not only made microbial art out of it but has coined a new OME word.  I think she is aiming directly at me here ... must resist.  Must resist.

Friday, May 18, 2012

Story behind the paper guest post on "Resolving the ortholog conjecture"

This is another in my ongoing "Story behind the paper" series. This one is from Christophe Dessimoz on a new paper he is an author on in PLoS Computational Biology that is near and dear to my heart.

See below for more. I am trying to post this from Yosemite National Park without full computer access so I hope the images come through. If not I will fix in a few days.


I'd like to thank Jonathan for the opportunity to tell the story behind our paper, which was just published in PLoS Computational Biology. In this work, we corroborated the "ortholog conjecture"—the widespread but little tested notion that orthologs tend to be functionally more conserved than paralogs.

I'd also like to explore more general issues, including the pitfalls of statistical analyses on highly heterogeneous data such as the Gene Ontology, and the pivotal role of peer-reviewers.

Like many other projects in computational biology, this one started as a quick analysis that was meant to take "just a few hours" but ended up keeping us busy for several years...

The ortholog conjecture and alternative hypotheses

The ortholog conjecture states that on average and for similar levels of sequence divergence, genes that started diverging through speciation ("orthologs") are more similar in function than genes that started diverging through duplication ("paralogs"). This is based on the idea that gene duplication is a driving force behind function innovation. Intuitively, this makes sense because the extra copy arising through duplication should provide the freedom to evolve new function. This is the conventional dogma.

Alternatively, for similar levels of sequence divergence, there might not be any particular difference between orthologs and paralogs. It is the simplest explanation (per Ockham's razor), and it also makes sense if the function of a gene is mainly determined by its protein sequence (let's just consider one product per gene). Following this hypothesis, we might expect considerable correlation between sequence and function similarity.

Thursday, May 17, 2012

Interesting report from White House: National Bioeconomy Blueprint

Been reading the "National Bioeconomy Blueprint" from the White House.  It is definitely worth checking out (for some background information see this blog post from the White House:  National Bioeconomy Blueprint Released | The White House and this NY Times article White House Promotes a Bioeconomy - NYTimes.com from last month).  Also check out the main page describing this document: National Bioeconomy Blueprint | The White House.

The blueprint outlines five main objectives:
  1. Support R&D investments that will provide the foundation for the future U.S. bioeconomy.
  2. Facilitate the transition of bioinventions from research lab to market, including an increased focus on translational and regulatory sciences.
  3. Develop and reform regulations to reduce barriers, increase the speed and predictability of regulatory processes, and reduce costs while protecting human and environmental health.
  4. Update training programs and align academic institution incentives with student training for national workforce needs.
  5.  Identify and support opportunities for the development of public-private partnerships and precompetitive collaborations—where competitors pool resources, knowledge, and expertise to learn from successes and failures.
It then goes through some background and recommendations to help achieve these objectives.

Other discussion of this includes:

Something fishy with this story: bacteria in fish pedicures

Well, the title drew me in, without a doubt: Fish Pedicures: Bacteria in Your Foot Soak.

To start with - I guess I have been out of touch as I have never heard of fish pedicures before.  Sounds lovely I must say.

Though if you are considering doing this you might be dissuaded by some of the revelations in the article including that "fish are living creatures that deposit their waste products in the very water in which people are soaking" and "the impossibility of disinfecting or sanitizing live fish."

Amazingly, fish pedicures are in fact apparently quite popular.  So popular that there are multiple investigations relating to this practice including that "British authorities investigated a reported bacterial outbreak among 6,000 Garra rufa fish" and "Last spring, British fish inspectors went to London's Heathrow Airport and intercepted Indonesian shipments of the silver, inch-long freshwater carp destined for British 'fish spas.'"

And now - the reason for this article - there is a new report in the journal Emerging Infectious Diseases on "Zoonotic Disease Pathogens in Fish Used for Pedicure."  The article is actually somewhat fascinating and thanks to the CDC it is freely available.

Fun reading for the day ...

Tuesday, May 15, 2012

Sign of the apocalypse? Science conference SPAM hybridizes w/ Nigerian advanced fee SPAM.

Normally I do not share SPAM emails.  But I have posted occasionally about Journal SPAM and Conference SPAM.  So what do I do here?  I just received an email that appears to be a hybrid between Conference SPAM and Nigerian advanced fee SPAM.  OMG.  The merging of two SPAM systems.  Too bad the Conference is not about Viagra - though since it is about Metagenomics perhaps it somehow got flagged due to studies of the penis or vagina microbiome?   In this case I just had to post ...

Dear Honored Sir or Madam,

I am Prof. Mohammed Kaoje Abubakar minister of Science for the Republic of Nigeria under former President Alhaji Umaru Musa Yar’Adua. In this role I became in control of large sums of money dedicated to scientific research and the exchange of ideas with researchers from across the globe. However, since the time of the unfortunate death of President Yar'Adua, I have been under intense scrutiny of the new director of the ministry and have been unable to complete my mission. Fortunately,  I remain in charge of most of the 200 million dollars US, but the current government will only release the funds in conjunction with scientific activities involving prominent foreign national scientists like yourself. 

Therefore, on behalf of the 3rd Nigerian Congress of Metagenomics I am pleased to welcome you to propose a speech on your recent discovery about the genomic basis for the origin and evolution of new functions at the congress by submitting your speech title and CV to us. Meanwhile, we hope you can share your stimulating data, valuable scientific information and influential experiences with other industrial leaders, professionals and research pioneers. You are encouraged to network and explore partnering opportunities. 
As a branded Conference of Nigeria Congresses LLC, "Your Think Tank", NCM continues to expand with magnificent scientific and social programs to maximize your network in a free communication meeting environment. 

  • Keynote Forum – Presentations from Nobel Prize Laureate and Senior Leaders of Renowned Company
  • Parallel Forum – 200+ Sessions and Symposiums provide 1000+ speech opportunities for experts from all of the world
  • Welcome Banquet – All the participants enjoy the formal buffet dinner with wonderful performance show
  • Project Matching Activity – Develop effective platform by free booths supply
  • Keymakers Summit – Special Forum for Enterprisers to discuss hot issues face to face
  • Exhibition and Poster Zone

3rd Nigerian Congress of Metagenomics  is initiated for filling the gap between Eastern and West World for metagenomic professionals of free information exchange. In the past decade, NCM has attracted more than 5,000 enthusiastic speakers to communicate on the R & D advances in different therapeutic fields, which have generated great impact on the Chinese Bio/pharmaceutical development, enhanced Research and Development outsourcing, helped regional liaison of big pharma seeking partnership and searching talents, created a lot of opportunities for face-to-face network for multilateral collaboration by sharing both scientific and technological breakthroughs and speed up the process of many challenging drug discovery projects.

For more information PS: http:www.ncm.ng

Warri is a major oil city in Delta State, Nigeria, with a population of over 300,000 people. We look forward to seeing you in Warri for a stimulating and enjoyable conference.     Kindest regards, 

Prof. Mohammed Kaoje Abubakar8 for the organizing committee. 

Useful comparative analysis of sequence classification systems w/ a few questionable bits

There is a useful new publication just out: BMC Bioinformatics | Abstract | A comparative evaluation of sequence classification programs by Adam L Bazinet and Michael P Cummings.  In the paper the authors attempt to do a systematic comparison of tools for classifying DNA sequences according to the taxonomy of the organism from which they come.

I have been interested in such activities since, well, since 1989 when I started working in Colleen Cavanaugh's lab at Harvard sequencing rRNA genes to do classification.  And I have known one of the authors, Michael Cummings, for almost as long.

Their abstract does a decent job of summing up what they did:

A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. 
We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. 
We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

The three main categories of methods they identified are

  • Programs that primarily utilize sequence similarity search
  • Programs that primarily utilize sequence composition models (like CompostBin from my lab)
  • Programs that primarily utilize phylogenetic methods (like AMPHORA & STAP from my lab)
The paper has some detailed discussion and comparison of some of the methods in each category.  They even made a tree of the methods

Figure 1. Program clustering. A neighbor-joining tree that clusters the classification programs based on their similar attributes. From here.
In some ways - I love this figure.  Since, well, I love trees.  But in other ways I really really really do not like it.  I don't like it because they use an explicitly phylogenetic method (neighbor joining, which is designed to infer phylogenetic trees and not to simply cluster entities by their similarity) to cluster entities that do not have a phylogenetic history.  Why use neighbor-joining here?  What is the basis for using this method to cluster methods?  It is cute, sure.  But I don't get it.  What do deep branches represent in this case?  It drives me a bit crazy when people throw a method designed to represent branching history at a situation where clustering by similarity is needed.  Similarly it drives me crazy when similarity based clustering methods are used when history is needed.
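To show what I mean by "clustering by similarity", here is a minimal sketch of average-linkage (UPGMA-style) agglomerative clustering on program attributes. It is not what the authors did, and the program names and 0/1 attribute vectors are hypothetical. The point is only that the merge heights here are plain average dissimilarities and make no claim about branching history.

```python
# Average-linkage clustering of programs by shared attributes.
# Program names and attribute vectors are hypothetical.
from itertools import combinations

def hamming(a, b):
    """Fraction of attributes on which two programs differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(items):
    """Repeatedly merge the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of member names.
    """
    def dist(c1, c2):
        # Mean pairwise dissimilarity between members of the two clusters.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(hamming(items[p], items[q]) for p, q in pairs) / len(pairs)

    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

programs = {
    "prog_similarity_1": (1, 0, 0, 1),
    "prog_similarity_2": (1, 0, 0, 0),
    "prog_composition": (0, 1, 0, 1),
    "prog_phylogenetic": (0, 0, 1, 1),
}

merges = average_linkage(programs)
for a, b, d in merges:
    print(a, "+", b, "at dissimilarity", d)
```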

Not to take away from the paper too much since this is definitely worth a read for those working on methods to classify sequences as well as for those using such methods.  They even go so far as to test various web servers (e.g., MG-RAST) and discuss time to get results.  They also test the methods for their precision and sensitivity.  Very useful bits of information here.
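For those who want a reminder of what precision and sensitivity measure, here is a small sketch using the usual textbook definitions for a single taxon. The paper's exact definitions may differ, so treat this as an assumption. The read labels are invented, and None stands for a read the program left unclassified.

```python
# Precision and sensitivity for one target taxon.
# Labels are hypothetical; None means the read was left unclassified.

def precision_and_sensitivity(true_labels, predicted_labels, target):
    """Return (precision, sensitivity) for reads assigned to `target`."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == target and truth == target:
            tp += 1   # correctly assigned to the target
        elif pred == target:
            fp += 1   # assigned to the target but belongs elsewhere
        elif truth == target:
            fn += 1   # belongs to the target but was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity

truth = ["Bacteroides", "Bacteroides", "Escherichia", "Escherichia", "Bacteroides"]
predicted = ["Bacteroides", "Escherichia", "Escherichia", None, "Bacteroides"]

for taxon in ("Bacteroides", "Escherichia"):
    p, s = precision_and_sensitivity(truth, predicted, taxon)
    print(taxon, "precision:", p, "sensitivity:", s)
```

A program that refuses to classify most reads can score high on precision and low on sensitivity, which is why it helps to report both.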

So - overall I like the paper.  But one other thing in here sticks in my craw.  The discussion of "marker genes."  Below is some of the introductory text on the topic.  I have labelled some bits I do not like too much:

It is important to note that some supervised learning methods will only classify sequences that contain “marker genes”. Marker genes are ideally present in all organisms, and have a relatively high mutation rate that produces significant variation between species. The use of marker genes to classify organisms is commonly known as DNA barcoding. The 16S rRNA gene has been used to greatest effect for this purpose in the microbial world (green genes [6], RDP [7]). For animals, the mitochondrial COI gene is popular [8], and for plants the chloroplast genes rbcL and matK have been used [9]. Other strategies have been proposed, such as the use of protein-coding genes that are universal, occur only once per genome (as opposed to 16S rRNA genes that can vary in copy number), and are rarely horizontally transferred [10]. Marker gene databases and their constitutive multiple alignments and phylogenies are usually carefully curated, so taxonomic and functional assignments based on marker genes are likely to show gains in both accuracy and speed over methods that analyze input sequences less discriminately. However, if the sequencing was not specially targeted [11], reads that contain marker genes may only account for a small percentage of a metagenomic sample.  
I think I will just leave these highlighted sections uncommented upon and leave it to people to imagine what I don't like about them .. for now.

Anyway - again - the paper is worth checking out.  And if you want to know more about methods used for classifying sequences see this Mendeley collection which focuses on metagenomic analysis but has many additional papers on top of the ones discussed in this paper.

Monday, May 14, 2012

Interesting new paper: "Proving universal common ancestry with similar sequences"

Just discovered an interesting paper by Leonardo de Oliveira Martins and David Posada.  It is titled "Proving universal common ancestry with similar sequences."  It relates to a paper by Douglas Theobald: "A formal test of the theory of universal common ancestry. Nature 2010; 465:219-22." Although the latter paper is not openly available the more recent one is.  

The new paper is worth a look.  Not sure about the Theobald one as I do not have access from home.

Am hoping Leonardo writes more about this in his blog: Bayesian Procedures in Biology ....

Saturday, May 12, 2012

Mini post: Microbial forensics

A few months old here but there is a very interesting post from the Science Media Centre in New Zealand: Science Media Centre: Microbes in soil could help fight crime.  The post describes attempts to use microbes in soil as part of forensic activities.  This relates in many ways to my call for a "Field Guide to Microbes".

I have been interested in microbial forensics for many years since I worked at TIGR on part of the project to study anthrax genomes.  For those interested in microbe-related forensic activities I have created a Mendeley collection of references on the topic.

Oh the irony - new #OpenAccess #PLoSOne paper on Research Blogs doesn't share data behind analyses.

Interesting new paper: PLoS ONE: Research Blogs and the Discussion of Scholarly Information. All about the new world of science blogging.  Much of the context here relates to openness.  Yet as far as I can tell, the data collected that make up the meat of the analyses in the paper are not shared.  Uggh.

Is there something I am missing here? Shouldn't a prerequisite of publishing this kind of paper be sharing the information / data used in the analyses?  Shouldn't that be released with the paper?

Definitely time to start "Open Data Watch" where people have a place to complain about lack of open availability of data behind papers (I came up with the name as a mimic of Ivan Oransky's diverse watch sites like Retraction Watch).  Originally in thinking about doing this I had been thinking about genomic data.  But I am sure this is a problem in other areas.  Consider paleontology, where openness to fossils and other samples is, well, not as common as it should be.  It is not that hard anymore to find a place to share one's data.  With places like Data Dryad and Biotorrents and FigShare and Merritt and 100s of others it is really inexcusable not to share the data behind a paper in most cases.  Certainly, in some cases there may be privacy issues but that is not the case here (I think) and not an issue in most cases.

Come on people.  If scientific papers are to be reproducible and testable, you need to give people access to the data you used. ResearchBlogging.org Shema, H., Bar-Ilan, J., & Thelwall, M. (2012). Research Blogs and the Discussion of Scholarly Information PLoS ONE, 7 (5) DOI: 10.1371/journal.pone.0035869

Friday, May 11, 2012

'Danger and Evolution in the Twilight Zone': Guest post by Randen Patterson and Gaurav Bhardwaj

Figure 1. PHYRN concept and work flow.
'Danger and Evolution in the twilight zone'

I have been communicating with Randen Patterson on and off over the last five years or so about his efforts to try and study the evolution of gene families when the sequence similarity in the gene family is so low that making multiple sequence alignments is very difficult.  Recently, Randen moved to UC Davis so I have been talking / emailing with him more and more about this issue.  Of note, Randen has a new paper in PLoS One about this topic: Bhardwaj G, Ko KD, Hong Y, Zhang Z, Ho NL, et al. (2012) PHYRN: A Robust Method for Phylogenetic Analysis of Highly Divergent Sequences. PLoS ONE 7(4): e34261. doi:10.1371/journal.pone.0034261.

Figure 8. Model for the Evolution of the DANGER Superfamily.

I invited Randen and the first author Gaurav Bhardwaj to do a guest post here providing some of the story behind their paper for my ongoing series on this topic.  I note - if you have published an open access paper on some topic related to this blog I would love to have a guest post from you too.   I note - I personally love the fact that they used the "DANGER" family as an example to test their method.

Here is their guest post:

A fundamental problem for phylogenetic inference in the “twilight zone” (<25% pairwise identity), let alone the “midnight zone” (<12% pairwise identity), is the inability to accurately assign evolutionary relationships at these levels of divergence with statistical confidence. This lack of resolution arises from difficulties in separating the phylogenetic signal from the random noise at these levels of divergence. This obviously and ultimately stymies all attempts to truly resolve the Tree of Life. Since most attempts at phylogenetic inference in the twilight/midnight zone have relied on MSA, and with no clear answer on the best phylogenetic methods to resolve protein families in the twilight/midnight zone, we have presented the rest of this blog post as two questions representative of these problems.
Question 1: Is MSA required for accurate phylogenetic inference?
Our Opinion: MSA is an excellent tool for inference from conserved data sets, but it has been shown, by others and by us, that the quality of MSA degrades rapidly in the twilight zone. Further, the quest for an optimal MSA becomes increasingly difficult as the number of taxa under study increases. Although the quality of MSA methods has improved in the last two decades, we have not made significant improvements towards overcoming these problems. Multiple groups have also designed alignment-free methods (see Hohl and Ragan, Syst. Biol. 2007), but so far none of these methods has been able to provide better phylogenetic accuracy than MSA+ML methods. We recently published a manuscript in PLoS One entitled “PHYRN: A Robust Method for Phylogenetic Analysis of Highly Divergent Sequences” introducing a hybrid profile-based method. Our approach focuses on measuring phylogenetic signal from homologous biological patterns (functional domains, structural folds, etc.), and their subsequent amplification and encoding as phylogenetic profiles. Further, we adopt a distance estimation algorithm that is alignment-free, and thus bypasses the need for an optimal MSA. Our benchmarking studies with synthetic (from ROSE and Seqgen) and biological datasets show that PHYRN outperforms other traditional methods (distance, parsimony and Maximum Likelihood), and provides significantly accurate phylogenies even in data sets exhibiting ~8% average pairwise identity. While this still needs to be evaluated in other simulations (varying tree shapes, rates, models), we are convinced that these types of methods do work and deserve further exploration.
Question 2: How can we as a field critically and fairly evaluate phylogenetic methods? 
Our Opinion: A similar problem plagued the field of structural biology whereby there were multiple methods for structural predictions, but no clear way of standardizing or evaluating their performance.  An additional problem that applies to phylogenetic inference is that, unlike crystal structures of proteins, phylogenies do not have a corresponding “answer” that can be obtained.  Synthetic data sets have tried to answer this question to a certain extent by simulating protein evolution and providing true evolutionary histories that can be used for benchmarking.  However, these simulations cannot truly replicate biological evolution (e.g. indel distribution, translocations, biologically relevant birth-death models, etc.). In our opinion, we need a CASP-like model (the solution adopted by our friends in computational structural biology), where the same data sets (with true evolutionary history known only to organizers) are analyzed by all the research groups, and the inferred phylogenies are then submitted to the organizers for a critical evaluation. To convert this thought to reality, we hereby announce CAPE (Critical Assessment of Protein Evolution) for Summer 20132. We are still in pre-production stages, and we welcome any suggestions, comments and inputs about data sets, scoring and evaluating methods.
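
For readers less familiar with the "twilight zone" and "midnight zone" thresholds quoted at the start of the guest post, here is a minimal Python sketch of how one might compute percent identity for a pair of already-aligned sequences and label the result. This is not part of PHYRN. The function names are made up for illustration, and the choice of denominator (only columns where neither sequence has a gap) is an assumption, since several definitions of percent identity are in use.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the two strings are rows from the same alignment, so they
    have equal length. Other definitions use a different denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def divergence_zone(identity):
    """Label a percent identity using the thresholds quoted in the post."""
    if identity < 12.0:
        return "midnight zone"
    if identity < 25.0:
        return "twilight zone"
    return "above twilight zone"


if __name__ == "__main__":
    # Toy aligned fragments, invented for this example.
    pid = percent_identity("ACDEFGHIKL", "ACDQ-GHLKV")
    print(round(pid, 1), divergence_zone(pid))
```

Averaging this value over all pairs in a data set gives the kind of "average pairwise identity" figure mentioned above, under the same caveat about how identity is defined.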

ResearchBlogging.org Bhardwaj, G., Ko, K., Hong, Y., Zhang, Z., Ho, N., Chintapalli, S., Kline, L., Gotlin, M., Hartranft, D., Patterson, M., Dave, F., Smith, E., Holmes, E., Patterson, R., & van Rossum, D. (2012). PHYRN: A Robust Method for Phylogenetic Analysis of Highly Divergent Sequences PLoS ONE, 7 (4) DOI: 10.1371/journal.pone.0034261

Thursday, May 10, 2012

Quick post - new paper of interest on "The Infinitely Many Genes Model ..."

This paper seems of potential interest: The Infinitely Many Genes Model for the Distributed Genome of Bacteria by Franz Baumdicker, Wolfgang R. Hess, and Peter Pfaffelhuber

The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage, why such genomic diversity is maintained, whether this variation is largely adaptive or neutral, and why these distinct individuals can coexist, remains poorly understood. Here, we present the infinitely many genes (IMG) model, which is a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss, for example, by pseudogenization and deletion of genes, during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 10^11 and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas that of Synechococcus and the combined sample shows a clear deviation from neutrality.

Wish they had gone beyond these two cyanobacteria ... but still seems of possible interest. ResearchBlogging.org Baumdicker, F., Hess, W., & Pfaffelhuber, P. (2012). The Infinitely Many Genes Model for the Distributed Genome of Bacteria Genome Biology and Evolution, 4 (4), 443-456 DOI: 10.1093/gbe/evs016

Wednesday, May 09, 2012

Any method allowed for presentations at ASM meeting, as long as you use Powerpoint on a PC.

Just got this email from ASM linking to a message about my presentation at the upcoming ASM meeting in San Francisco.

Here is the message.


I highlight some parts that I find disappointing at best.  Basically - they say "You can do your presentation in any way.  As long as you convert it to PowerPoint for a PC."  Never mind other tools to do presentations.  Like Keynote.  Or Prezi.  Or, well, anything else.  Never mind people who use Macs.  Or Linux computers.  Or iPads.

I have NEVER had a problem doing a presentation off of my Mac or iPad.  I have had MANY problems when I have converted my Keynote or PDF files or other material to Powerpoint for a PC.

Oh, and forget about modifying your presentation in response to anything going on in the session (which I do frequently).  I try to tune my slides to the actual crowd.  No longer possible.

Maybe I should use no slides, like I did at TEDMED.  Or maybe I should do a Ross Perot and have charts.  Maybe I will bring my own projector and set it up just before my talk ...  who knows ... but I hate it when meetings say "Trust us - you won't have any problems with our system".

May 9, 2012

Dear Jonathan Eisen;
Thank you for participating as a speaker at asm2012, ASM's 112th General Meeting in San Francisco, June 16-19, 2012. As a speaker, we kindly request that you consider the following guidelines as you finalize your PowerPoint presentation in the session listed below and also take note of some of the new requirements and changes for asm2012.

Session Details
Session Date/Time: 6/17/2012 3:00:00 PM - 6/17/2012 5:30:00 PM
Session Title: The Great Indoors: Recent Advances in the Ecology of Built Environments
Presentation Title: microBEnet: The Microbiology of the Built Environment Network (If your presentation title is not listed or incorrect, please provide this information to xxxxx immediately.)
Length of your Personal Presentation: You are allotted 30 minutes for your presentation or lecture unless otherwise notified by the convener of this session.

Scholarly Kitchen - getting more and more rotten as the days go by

In February I wrote about how something smelled funny with the connection between "The Scholarly Kitchen" blog and the Heartland Institute: The Tree of Life: Something rotten in the Scholarly Kitchen? (Climate Change Denialism is Everywhere)

Well, though I thought the Heartland Institute was a bit extreme based on their previous "work", it seems they have gone even more off the deep end recently with their ad campaign featuring Charles Manson and Ted Kaczynski.

This latest effort has led to an even further reduction in support for the folks at Heartland. See for example:
And more.  

So - amazingly - as Heartland dips more and more into extremism - I have seen no sign from anyone at the Scholarly Kitchen of any concern that one of their co-bloggers - David Wojick - also happens to work for the Heartland Institute:

What does this say about TSK?  Not sure.  But it continues to smell funny to me.  Wojick is using his position at TSK to make him seem like an academic.  Heartland is using his seeming academic status to promote their ideas, which get more and more extreme by the day apparently.  As even very conservative groups disavow any affiliation with Heartland, pulling money and other kinds of support, I have yet to see any public comments from TSK folks about whether they think Wojick is using his role in their blog to indirectly promote extreme ideas ...

Monday, May 07, 2012

Fun day at Capay Organic Farm (site of Farm Fresh to You) with DCCNS

My son goes to DCCNS preschool. They had a fundraiser yesterday at Capay Organic Farm in Capay (this is the place where the Farm Fresh to You folks have their farm). It was a nice time. The party started at 2 pm, and they had tractor rides, kite flying, ladybug releasing, strawberry picking, and more. And then there was a raffle (the fundraiser part), tomato planting to take home, and a piñata. The farm was beautiful, the people there were great, and I am glad we get food from there (we are subscribers to their CSA). It was good for all the kids to see where some food comes from ... Here are some pics.

Friday, May 04, 2012

How complete do microbial genomes have to be for metabolic predictions (to be useful)?

OK.  Got a question for the blogo-twitto-webosphere.

In this day and age of rapid shotgun sequencing of genomes, many people are moving away from finishing the genomes.  As some may know, I was a co-author many years ago on a paper arguing for the need to finish (rather than just shotgun sequence) microbial genomes for many scientific questions.  But as sequencing costs continued to plummet, the relative cost of shotgun sequencing genomes kept going down while the cost of finishing genomes did not change much.  So, two years ago I posted to my blog a question regarding this: "Wanted: Feedback on Importance of Finishing (Microbial) Genomes" and got a lot of useful feedback.  And eventually, I came around to the argument that finishing was unnecessary for many purposes (and even got some props for admitting I was wrong).

Well, I am back.  Now I am arguing to some colleagues that if we want to make metabolic pathway predictions and metabolic models for genomes, we probably don't need finished genomes.  But alas, I have no evidence to back that up.  And in fact, I am not really sure anyway.  So I am asking everyone and anyone out there ... does anyone have data/evidence/opinions about whether there will be much difference in metabolic predictions one would make for an organism based upon a complete genome vs. a shotgun assembly of a genome generated by Illumina sequencing?
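
To make the question a bit more concrete, here is a minimal sketch in Python of one way to frame the comparison: take the set of enzyme (EC number) predictions made from the finished genome and the set made from the draft assembly, and see what is shared and what goes missing. Everything here is hypothetical - the function name is made up, the two example sets are invented for illustration (not real annotation results), and a real comparison would need the same annotation pipeline run on both versions of the genome.

```python
def compare_predictions(finished, draft):
    """Compare two sets of predicted EC numbers (finished vs. draft genome)."""
    shared = finished & draft
    union = finished | draft
    return {
        "shared": len(shared),
        "missing_in_draft": sorted(finished - draft),
        "only_in_draft": sorted(draft - finished),
        # Jaccard index: 1.0 means the two prediction sets are identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Invented example: four glycolysis enzymes predicted from a finished
# genome, one of which is not recovered from the draft assembly.
finished_ecs = {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"}
draft_ecs = {"2.7.1.1", "5.3.1.9", "4.1.2.13"}

result = compare_predictions(finished_ecs, draft_ecs)
print(result)
```

A summary like this only counts individual enzymes. Whether a missing enzyme actually breaks a predicted pathway or changes a metabolic model is the part I do not have evidence for.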


Thursday, May 03, 2012

Draft of a Proposal for a UC #OpenAccess policy - comments wanted

Just got sent this email
Dear Colleagues, 
On behalf of the Academic Senate Library Committee (ASLC), I am asking for your comments on the attached proposed Open Access Publishing Policy for the University of California. All faculty, including Academic Federation members, are invited to post their comments on the Academic Senate web-forum site at http://academicsenate.ucdavis.edu/Forums/index.cfm?Forum_ID=67. Please go to this site to submit your feedback. 
Briefly, the issue is this: the faculty of the University of California, in conjunction with the University Committee on Libraries and Scholarly Communication (UCOLASC), is proposing a new OPEN ACCESS PUBLISHING POLICY that will apply to the dissemination of all scholarly work. UCOLASC is seeking feedback from all campuses on this issue in order to inform a final version of the policy which will be presented to the Universitywide Academic Senate sometime this calendar year. 
The ASLC would appreciate your comments by Wednesday, May 9, 2012. Your ideas will then be shared with UCOLASC in time for its May 25th meeting. The web-forum will remain open substantially past May 9, and we will endeavor to include as many comments up to May 25 as possible. 
Brian H. Kolner 
Academic Senate Library Committee

This relates to a draft of a proposal for a new Open Access Publishing Policy being circulated at the University of California. The draft of the proposal can be found here.

UC Davis (and I presume other UCs) is now soliciting comments on the proposal. I would love to hear / read comments from anyone. Personally, I think the policy is way too weak as it allows exceptions to be granted ...

Social Networks and Scientists: Chronicle of Higher Education Article

Quick post here.

There is a new article in the Chronicle of Higher Education in which I am quoted: Social Networks for Academics Proliferate, Despite Some Scholars' Doubts

The article discusses many connected topics relating to the use of social media by scientists - though it does not make clear how everything is connected perhaps.  Anyway the author talked to me about Mendeley and various uses of Mendeley and I told her about an effort to create a Mendeley collection of my father's papers.  The article also discussed LinkedIn, Academia.Edu, Twitter and other social media systems.

Some quotes

Jonathan A. Eisen, a professor of medical microbiology and immunology at the University of California at Davis, used Mendeley to distribute the research papers that his father, Howard J. Eisen, a researcher at the National Institutes of Health, published before he died, in 1987. After struggling to free papers locked behind pay walls, Jonathan Eisen compiled the articles and posted nearly all of them on a Mendeley page he had created for his father. 
Mr. Eisen, a self-described "obsessed open-access advocate," described the impact in a blog post last year: "Thanks to the social features of Mendeley, more and more people will see and have access to those papers, thus ensuring that they do not wallow in never-never land but continue to have some potential impact on science and society."

Perhaps most important from my point of view - I love the picture of me taken by Max Whittaker.

Summary of responses to question about metrics for density in phylogenetic trees

I posted a question to Twitter and Facebook about metrics for assessing density in a phylogenetic tree. Here is a "Storification" of the responses. Thanks for the help all.
Any other suggestions welcome in comments ...
