Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, et al. (2004) Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements. PLoS Biol 2(3): e69 doi:10.1371/journal.pbio.0020069
Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements
Martin Wu¹, Ling V. Sun², Jessica Vamathevan¹, Markus Riegler³, Robert Deboy¹, Jeremy C. Brownlie³, Elizabeth A. McGraw³, William Martin⁴, Christian Esser⁴, Nahal Ahmadinejad⁴, Christian Wiegand⁴, Ramana Madupu¹, Maureen J. Beanan¹, Lauren M. Brinkac¹, Sean C. Daugherty¹, A. Scott Durkin¹, James F. Kolonay¹, William C. Nelson¹, Yasmin Mohamoud¹, Perris Lee¹, Kristi Berry¹, M. Brook Young¹, Teresa Utterback¹, Janice Weidman¹, William C. Nierman¹, Ian T. Paulsen¹, Karen E. Nelson¹, Hervé Tettelin¹, Scott L. O'Neill²,³, Jonathan A. Eisen¹*
1 The Institute for Genomic Research, Rockville, Maryland, United States of America, 2 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, United States of America, 3 Department of Zoology and Entomology, School of Life Sciences, The University of Queensland, St Lucia, Queensland, Australia, 4 Institut für Botanik III, Heinrich-Heine Universität, Düsseldorf, Germany
The complete sequence of the 1,267,782 bp genome of Wolbachia pipientis wMel, an obligate intracellular bacterium of Drosophila melanogaster, has been determined. Wolbachia, which are found in a variety of invertebrate species, are of great interest because of their diverse interactions with different hosts, which range from many forms of reproductive parasitism to mutualistic symbioses. Analysis of the wMel genome, in particular phylogenomic comparisons with other intracellular bacteria, has revealed many insights into the biology and evolution of wMel and of Wolbachia in general. For example, the wMel genome is unique among sequenced obligate intracellular species in being both highly streamlined and rich in repetitive DNA and mobile DNA elements. This observation, coupled with multiple evolutionary reconstructions, suggests that natural selection is somewhat inefficient in wMel, most likely owing to repeated population bottlenecks. Genome analysis predicts many metabolic differences from the closely related Rickettsia species, including the presence of intact glycolysis and purine synthesis, which may compensate for an inability to obtain ATP directly from its host, as Rickettsia can. Other discoveries include the apparent inability of wMel to synthesize lipopolysaccharide and the presence of more genes encoding proteins with ankyrin repeat domains than in any other prokaryotic genome yet sequenced. Despite the ability of wMel to infect the germline of its host, we find no evidence for either recent lateral gene transfer between wMel and D. melanogaster or older transfers between Wolbachia and any host. Evolutionary analysis further supports the hypothesis that mitochondria share a common ancestor with the α-Proteobacteria, but shows little support for grouping mitochondria with species in the order Rickettsiales. With the complete genomes of both species available and excellent genetic tools for the host, the wMel–D. melanogaster symbiosis is now an ideal system for studying the biology and evolution of Wolbachia infections.
Received: November 19, 2003; Accepted: January 6, 2004; Published: March 16, 2004
Copyright: © 2004 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abbreviations: CDS, coding sequence; ENc, effective number of codons; IS, insertion sequence; LPS, lipopolysaccharide; RT, reverse transcription; TIGR, The Institute for Genomic Research
* To whom correspondence should be addressed. E-mail: email@example.com
Introduction
Wolbachia are intracellular gram-negative bacteria found in association with a variety of invertebrate species, including insects, mites, spiders, terrestrial crustaceans, and nematodes. Wolbachia are transmitted transovarially from females to their offspring and are extremely widespread, having been found to infect 20%–75% of invertebrate species sampled (Jeyaprakash and Hoy 2000; Werren and Windsor 2000). Wolbachia are members of the order Rickettsiales of the α-subdivision of the phylum Proteobacteria and belong to the family Anaplasmataceae, together with members of the genera Anaplasma, Ehrlichia, Cowdria, and Neorickettsia (Dumler et al. 2001). Six major clades (A–F) of Wolbachia have been identified to date (Lo et al. 2002): A, B, E, and F have been reported from insects, arachnids, and crustaceans, and C and D from filarial nematodes.
Figure 1. Circular Map of the Genome and Genome Features
Circles correspond to the following: (1) forward strand genes; (2) reverse strand genes; (3) in red, genes with likely orthologs in both R. conorii and R. prowazekii; in blue, genes with likely orthologs in R. prowazekii but absent from R. conorii; in green, genes with likely orthologs in R. conorii but absent from R. prowazekii; in yellow, genes without orthologs in either Rickettsia (Table S3); (4) χ² analysis of nucleotide composition, with phage regions in pink; (5) GC skew, (G−C)/(G+C); (6) repeats over 200 bp in length, colored by category; (7) in green, transfer RNAs; (8) in blue, ribosomal RNAs; in red, structural RNA.
Wolbachia have been hypothesized to play a role in host speciation through the reproductive isolation they generate in infected hosts (Werren 1998). They also provide an intriguing array of evolutionary solutions to the genetic conflict that arises from their uniparental inheritance. These solutions represent alternatives to classical mutualism and often benefit the symbiont more than the infected host (Werren and O'Neill 1997). From an applied perspective, it has been proposed that Wolbachia could be used either to suppress pest insect populations or to sweep desirable traits into pest populations (e.g., the inability to transmit disease-causing pathogens) (Sinkins and O'Neill 2000). Moreover, they may provide a new approach to the control of human and animal filariasis. Since the nematode worms that cause filariasis have an obligate symbiosis with mutualistic Wolbachia, treating filariasis with simple antibiotics that target Wolbachia has been shown to eliminate microfilaria production and ultimately to kill the adult worms (Taylor et al. 2000; Taylor and Hoerauf 2001).
Despite their common occurrence and major effects on host biology, little is currently known about the molecular mechanisms that mediate the interactions between Wolbachia and their invertebrate hosts. This is partly due to the difficulty of working with an obligate intracellular organism that is hard to culture and to obtain in quantity. Here we report the completion and analysis of the genome sequence of Wolbachia pipientis wMel, a strain from the A supergroup that naturally infects Drosophila melanogaster (Zhou et al. 1998).
Table 1. wMel Genome Features
Genome Properties
The wMel genome is a single circular molecule of 1,267,782 bp with a G+C content of 35.2%. This assembly is very similar to the genetic and physical map of the closely related strain wMelPop (Sun et al. 2003). Many prokaryotic genomes show a GC skew pattern with two major shifts, one near the origin and one near the terminus of replication, but the wMel genome does not (Figure 1). Therefore, the putative origin of replication and basepair 1 were assigned based on the location of the dnaA gene. Major features of the genome and of the annotation are summarized in Table 1 and Figure 1.
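As a minimal sketch of the GC skew statistic plotted in Figure 1, the Python below computes (G−C)/(G+C) over fixed windows and the cumulative skew. The window size, step, and function names are assumptions rather than the settings used for Figure 1, and the scan treats the sequence as linear rather than circular. In genomes with the classic two-shift pattern, the minimum and maximum of the cumulative skew approximate the origin and terminus of replication; the lack of such a clean signal in wMel is why the dnaA location was used instead.

```python
from typing import List, Optional, Tuple


def gc_skew_windows(seq: str, window: int = 10_000,
                    step: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return (window start, (G-C)/(G+C)) for each window of a linear scan."""
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows with no G or C are reported as 0 rather than dividing by zero.
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


def cumulative_skew(seq: str) -> List[int]:
    """Running total of +1 for each G and -1 for each C."""
    total, trace = 0, []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        trace.append(total)
    return trace


def skew_extremes(seq: str) -> Tuple[int, int]:
    """First positions of the cumulative-skew minimum and maximum
    (candidate origin and terminus when the two-shift pattern is present)."""
    trace = cumulative_skew(seq)
    return trace.index(min(trace)), trace.index(max(trace))


if __name__ == "__main__":
    toy = "CCCCAT" * 5 + "GGGGAT" * 5  # hypothetical toy sequence, not wMel data
    print(gc_skew_windows(toy, window=30))
    print(skew_extremes(toy))
```

On a real genome the cumulative trace is usually smoothed or inspected as a plot; a jagged trace with several local extremes, as expected for wMel, gives no reliable origin call.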
Repetitive and Mobile DNA
The most striking feature of the wMel genome is the presence of very large amounts of repetitive DNA and of DNA corresponding to mobile genetic elements, a feature unique among intracellular species. In total, 714 repeats greater than 50 bp in length, which fall into 158 distinct families, were identified (Table S1). Most of the repeats are present in only two copies in the genome, although 39 are present in three or more copies, and the most abundant repeat is found in 89 copies. We focused our analysis on the 138 repeats greater than 200 bp (Table 2). These were divided into 19 families based on their sequence similarity to each other and together make up 14.2% of the wMel genome. Of these repeat families, 15 correspond to likely mobile elements, including seven types of insertion sequence (IS) elements, four likely retrotransposons, and four families without detectable similarity to known elements but with many hallmarks of mobile elements (flanked by inverted repeats, present in multiple copies) (Table 2). One of these new elements (repeat family 8) is present in 45 copies in the genome. Many of these elements are likely unable to transpose autonomously, since many of the transposase genes are apparently inactivated by mutations or by the insertion of other transposons (Table S2). However, some appear to have been active recently, since transposons are inserted into at least nine genes (Table S2) and the copy number of some repeats appears to vary between Wolbachia strains (M. Riegler et al., personal communication). Thus, many of these repetitive elements may be useful markers for strain discrimination. In addition, the mobile elements likely contribute to generating the diversity of phenotypically distinct Wolbachia strains (e.g., mod− strains [McGraw et al. 2001]) by altering or disrupting gene function (Table S2).
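The repeat counts and the 14.2% figure come from the authors' repeat analysis, whose tool is not named here. As a hedged sketch of the underlying idea only, the Python below marks every base covered by an exact k-mer that occurs more than once on either strand and reports the covered fraction; any exact repeat of length k or longer is fully covered this way. It finds exact matches only, whereas real repeat finders (for example suffix-array-based tools) also report diverged copies, and storing every k-mer as a string is memory-hungry at genome scale. The function names and the default k = 50 (echoing the 50 bp cutoff above) are assumptions.

```python
from collections import defaultdict
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def repeat_mask(seq: str, k: int = 50) -> List[bool]:
    """Mark bases covered by any exact k-mer occurring more than once (either strand)."""
    seq = seq.upper()
    starts_by_kmer = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts_by_kmer[canonical(seq[i:i + k])].append(i)
    # Difference array: +1 where a repeated k-mer starts, -1 just past its end.
    delta = [0] * (len(seq) + 1)
    for starts in starts_by_kmer.values():
        if len(starts) > 1:
            for i in starts:
                delta[i] += 1
                delta[i + k] -= 1
    mask, depth = [], 0
    for d in delta[:-1]:
        depth += d
        mask.append(depth > 0)
    return mask


def repeat_fraction(seq: str, k: int = 50) -> float:
    """Fraction of the sequence covered by exact repeats of length >= k."""
    return sum(repeat_mask(seq, k)) / len(seq) if seq else 0.0
```

Because k-mers are canonicalized, an inverted copy of an element (such as the inverted repeats flanking many mobile elements) is counted as a repeat as well.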
Table 2. wMel DNA Repeats of Greater than 200 bp
Genome Structure: Rearrangements, Duplications, and Deletions
The irregular pattern of GC skew in wMel is likely due in part to intragenomic rearrangements associated with the many DNA repeat elements. A comparison with a large contig from the Wolbachia that infects Brugia malayi (Ware et al. 2002) is consistent with this (Figure 3). While only translocations are seen in this plot, genetic comparisons reveal that inversions also occur between strains (Sun et al. 2003). This is consistent with previous studies of prokaryotic genomes, which found that the most common large-scale rearrangements are inversions that are symmetric around the origin of DNA replication (Eisen et al. 2000). Frequent rearrangement during Wolbachia evolution is further supported by the absence of any large-scale conserved gene order with the Rickettsia genomes. The rearrangements in Wolbachia likely coincided with the introduction and massive expansion of the repeat element families, which could serve as sites for intragenomic recombination, as has been shown for some other bacterial species (Parkhill et al. 2003). The rearrangements in wMel may have fitness consequences, since several classes of genes that are often found in clusters are generally scattered throughout the wMel genome (e.g., ABC transporter subunits, Sec secretion genes, rRNA genes, F-type ATPase genes).
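One simple way to quantify the absence of conserved gene order is to count ortholog pairs that are neighbors in both genomes. The sketch below assumes ortholog assignments are already available as shared group IDs (the IDs are hypothetical), ignores gene orientation, and treats both chromosomes as circular. An inversion preserves the adjacencies inside the inverted block and breaks only those at its two ends, so a low count points to many independent rearrangements rather than a few large ones.

```python
from typing import FrozenSet, List, Set

Adjacency = FrozenSet[str]


def adjacencies(order: List[str], circular: bool = True) -> Set[Adjacency]:
    """Unordered neighbor pairs in a gene order (orientation ignored)."""
    pairs = {frozenset(pair) for pair in zip(order, order[1:])}
    if circular and len(order) > 2:
        pairs.add(frozenset((order[-1], order[0])))
    return pairs


def conserved_adjacencies(order_a: List[str], order_b: List[str]) -> Set[Adjacency]:
    """Neighbor pairs shared by two genomes, after dropping genes without orthologs."""
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    return adjacencies(a) & adjacencies(b)


if __name__ == "__main__":
    # Hypothetical ortholog-group IDs listed in chromosomal order;
    # genome_b carries an inversion of g3..g5 plus an unshared gene x9.
    genome_a = ["g1", "g2", "g3", "g4", "g5"]
    genome_b = ["g1", "g2", "g5", "g4", "g3", "x9"]
    print(sorted(sorted(p) for p in conserved_adjacencies(genome_a, genome_b)))
```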
Figure 2. Phage Alignments and Neighboring Genes
Conserved gene order between the WO phage in Wolbachia sp. wKue and prophage regions of wMel. Putative proteins in wKue (Masui et al. 2001) were searched using TBLASTN against the wMel genome. Matches with an E-value of less than 1e−15 are linked by connecting lines. CDSs are colored as follows: brown, phage structural or replication genes; light blue, conserved hypotheticals; red, hypotheticals; magenta, transposases or reverse transcriptases; blue, ankyrin repeat genes; light gray, radC; light green, paralogous genes; gold, others. The regions surrounding the phage are shown because they have some unusual features relative to the rest of the genome. For example, WO-A and WO-B are each flanked on one side by clusters of genes in two paralogous families that are distantly related to phage repressors. In each of these clusters, a homolog of the radC gene is found. A third radC homolog (WD1093) in the genome is also flanked by a member of one of these gene families (WD1095). While the connection between radC and the phage is unclear, the multiple copies of the radC gene and the members of these paralogous families may have contributed to the phage rearrangements described above.
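The E-value cutoff in Figure 2 can be applied to BLAST+ tabular output, assuming TBLASTN was run with -outfmt 6 and its default 12 columns. The sketch below is illustrative only: the demo rows and function name are hypothetical, and the authors' exact BLAST settings are not given here.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(lines: Iterable[str],
                     max_evalue: float = 1e-15) -> Iterator[Dict[str, str]]:
    """Yield hits whose E-value is strictly below max_evalue."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):  # skip blanks and -outfmt 7 comments
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) < max_evalue:
            yield hit


if __name__ == "__main__":
    # Two made-up hits of phage ORFs against a genome; only the first passes.
    demo = io.StringIO(
        "orf1\tchrom\t45.2\t120\t60\t2\t1\t120\t1000\t1360\t3e-20\t85.1\n"
        "orf2\tchrom\t30.0\t50\t35\t1\t1\t50\t5000\t5150\t0.002\t30.0\n"
    )
    for hit in significant_hits(demo):
        print(hit["qseqid"], hit["sstart"], hit["send"], hit["evalue"])
```

For TBLASTN, sstart and send are nucleotide coordinates on the genome, so the surviving hits can be drawn directly as connecting lines between the query ORFs and genome positions.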
Figure 3. Alignment of wMel with a 60 kbp Region of the Wolbachia from B. malayi
The figure shows BLASTN matches (green) and whole-proteome alignments (red) that were generated using the “promer” option of the MUMmer software (Delcher et al. 1999). The B. malayi region is from a BAC clone (Ware et al. 2002). Note the regions of alignment broken up by many rearrangements and the presence of repetitive sequences at the regions of the breaks.
One duplication of particular interest is that of wsp, which is a standard gene for strain identification and phylogenetic reconstruction in Wolbachia (Zhou et al. 1998). In addition to the previously described wsp (WD0159), wMel encodes two wsp paralogs (WD0009 and WD0489), which we designate wspB and wspC, respectively. Although these paralogs are highly divergent from wsp (protein identities of 19.7% and 23.5%, respectively) and are not amplified by the standard wsp PCR primers (Braig et al. 1998; Zhou et al. 1998), their presence could cause confusion in the classification and identification of Wolbachia strains. This has apparently occurred in one study of Wolbachia strain wKueYO, for which the reported wsp gene (gbAB045235) is actually an ortholog of wspB (99.8% sequence identity and located at the end of the virB operon [Masui et al. 2000]) and not of wsp. Because the wsp gene has been extremely informative for discriminating between Wolbachia strains, we assessed the potential utility of wspB for the same purpose: we designed PCR primers to the wMel wspB gene and used them to amplify and sequence its orthologs from the related Wolbachia strains wRi (from Drosophila simulans) and wAlbB (from Aedes albopictus), as well as from the Wolbachia strain that infects the filarial nematode Dirofilaria immitis. A comparison of genetic distances between the wsp and wspB genes for these taxa indicates that wspB appears to be evolving faster overall than wsp and, as such, may be a useful additional marker for discriminating between closely related Wolbachia strains (Table S5).
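The text does not specify the distance measure behind Table S5. As a hedged illustration of how such a comparison can be computed, the sketch below gives uncorrected p-distances and Jukes–Cantor (JC69) corrected distances for pre-aligned nucleotide sequences; the choice of JC69 and all function names are our assumptions.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over aligned positions where both have A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]  # skips gaps and ambiguity codes
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """JC69-corrected substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC69 distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For a given pair of strains, dividing the wspB distance by the wsp distance gives a crude relative-rate ratio; values consistently above 1 across strain pairs would fit the observation that wspB evolves faster.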
Inefficiency of Selection in wMel
The fraction of the genome that is repetitive DNA and the fraction that corresponds to mobile genetic elements are among the highest for any prokaryotic genome. This is particularly striking compared with the genomes of other obligate intracellular species such as Buchnera, Rickettsia, Chlamydia, and Wigglesworthia, all of which have very low levels of repetitive DNA and mobile elements. The recently sequenced genome of the intracellular pathogen Coxiella burnetii (Seshadri et al. 2003) is streamlined but contains moderate amounts of repetitive DNA, although much less than wMel. The paucity of repetitive DNA in these and other intracellular species is thought to be due to a combination of genome streamlining and lack of exposure to other species, which limits the introduction of mobile elements (Mira et al. 2001; Moran and Mira 2001; Frank et al. 2002). We examined the wMel genome to try to understand the origin of its repetitive and mobile DNA and to explain why such DNA is present in wMel but not in other streamlined intracellular species.
We propose that the mobile DNA in wMel was acquired some time after the separation of the Wolbachia and Rickettsia lineages but before the radiation of the Wolbachia group. Acquisition after the Wolbachia–Rickettsia split is suggested by the fact that most of these elements have no obvious homologous sequences in the genomes of other α-Proteobacteria, including the closely related Rickettsia spp. Additional evidence for some acquisition of foreign DNA after the split comes from phylogenetic analysis of the genes present in wMel but not in the two sequenced rickettsial genomes (see Table S3; unpublished data). Acquisition prior to the radiation of Wolbachia is suggested by two lines of evidence. First, many of the elements are found in the genome of the distantly related Wolbachia of the nematode B. malayi (see Figure 3; unpublished data). Second, these elements do not have significantly anomalous nucleotide composition or codon usage compared with the rest of the genome. In fact, there are only four regions of the genome with significantly anomalous composition, comprising in total only approximately 17 kbp of DNA (Table 3). The lack of anomalous composition suggests either that any foreign DNA in wMel was acquired long enough ago to “ameliorate” and become compositionally similar to endogenous Wolbachia DNA (Lawrence and Ochman 1997, 1998), or that it was acquired from organisms with a composition similar to that of endogenous wMel genes. Owing to their potential effects on genome evolution (insertional mutagenesis, catalyzing genome rearrangements), we propose that the acquisition and maintenance of these repetitive and mobile elements by wMel have played a key role in shaping the evolution of Wolbachia.
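Figure 1 (circle 4) and Table 3 rely on a χ² analysis of nucleotide composition. A minimal sketch of one way to do this, assuming fixed sliding windows compared against genome-wide base frequencies, is below; the window size, step, and threshold are our assumptions rather than the authors' parameters, and a fuller analysis would also examine codon usage.

```python
from collections import Counter
from typing import List, Tuple

BASES = "ACGT"


def composition_chi2(seq: str, window: int = 2_000,
                     step: int = 1_000) -> List[Tuple[int, float]]:
    """Chi-square of each window's base counts against genome-wide base frequencies."""
    seq = seq.upper()
    totals = Counter(b for b in seq if b in BASES)
    n = sum(totals.values())
    freqs = {b: totals[b] / n for b in BASES}
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(b for b in seq[start:start + window] if b in BASES)
        m = sum(counts.values())
        if m == 0:  # window made entirely of ambiguous bases
            continue
        chi2 = sum((counts[b] - m * freqs[b]) ** 2 / (m * freqs[b])
                   for b in BASES if freqs[b] > 0)
        scores.append((start, chi2))
    return scores


def anomalous_windows(seq: str, window: int = 2_000, step: int = 1_000,
                      critical: float = 16.27) -> List[int]:
    """Window starts whose chi-square exceeds the critical value.

    16.27 is roughly the 0.001 upper quantile of chi-square with 3 degrees
    of freedom; no correction for testing many overlapping windows is applied.
    """
    return [start for start, chi2 in composition_chi2(seq, window, step)
            if chi2 > critical]
```

Adjacent flagged windows would then be merged into regions; in a toy genome with a 1 kbp GC-rich insert, every window overlapping the insert is flagged while background windows are not.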
Table 3. Regions of Anomalous Nucleotide Composition in the wMel Genome
Figure 4. Long Evolutionary Branches in wMel
Maximum-likelihood phylogenetic tree constructed from concatenated protein sequences of 285 orthologs shared among wMel, R. prowazekii, R. conorii, C. crescentus, and E. coli. The location of the most recent common ancestor of the α-Proteobacteria (Caulobacter, Rickettsia, Wolbachia) is defined by the outgroup E. coli. Branch lengths are in units of changes per amino acid. Overall, the amino acid substitution rate in the wMel lineage is about 63% higher than that of C. crescentus, a free-living α-Proteobacterium. wMel has evolved at a slightly higher rate than the Rickettsia spp., close relatives that are also obligate intracellular bacteria and have themselves undergone accelerated evolution. This higher rate is likely due in part to an increase in the rate of slightly deleterious mutations, although we have not ruled out the possibility that G+C content affects the branch lengths.
Another possible explanation for inefficient selection is high mutation rates. It has been suggested that the higher evolutionary rates in intracellular bacteria are the result of high mutation rates that are in turn due to the loss of genes for DNA repair processes (e.g., Itoh et al. 2002). This is likely not the case in wMel since its genome encodes proteins corresponding to a broad suite of DNA repair pathways including mismatch repair, nucleotide excision repair, base excision repair, and homologous recombination (Table S6). The only noteworthy DNA repair gene absent from wMel and present in the more slowly evolving Rickettsia is mfd, which is involved in targeting DNA repair to the transcribed strand of actively transcribing genes in other species (Selby et al. 1991). However, this absence is unlikely to contribute significantly to the increased evolutionary rate in wMel, since defects in mfd do not lead to large increases in mutation rates in other species (Witkin 1994). The presence of mismatch repair genes (homologs of mutS and mutL) in wMel is particularly relevant since this pathway is one of the key steps in regulating mutation rates in other species. In fact, wMel is the first bacterial species to be found with two mutL homologs. Overall, examination of the predicted DNA repair capabilities of bacteria (Eisen and Hanawalt 1999) suggests that the connection between evolutionary rates in intracellular species and the loss of DNA repair processes is spurious. While many intracellular species have lost DNA repair genes in their recent evolution, different species have lost different genes and some, such as wMel and Buchnera spp., have kept the genes that likely regulate mutation rates. 
In addition, some free-living species without high evolutionary rates have lost some of the same pathways as intracellular species, whereas many free-living species have lost key pathways and consequently have high mutation rates (e.g., Helicobacter pylori has apparently lost mismatch repair [Eisen 1997, 1998b; Bjorkholm et al. 2001]). Given that intracellular species tend to have small genomes and have lost genes from every type of biological process, it is not surprising that many of them have lost DNA repair genes as well.
We believe that the most likely explanations for the inefficiency of selection in wMel involve population-size-related factors, such as genetic drift and population bottlenecks. Such factors likely also explain the high evolutionary rates in other intracellular species (Moran 1996; Moran and Mira 2001; van Ham et al. 2003). Wolbachia likely experience frequent population bottlenecks, both during transovarial transmission (Boyle et al. 1993) and during cytoplasmic incompatibility-mediated sweeps through host populations. These bottlenecks may be more severe than in other intracellular bacteria, which would explain why wMel has both more repetitive and mobile DNA than other such species and a higher evolutionary rate than even the related Rickettsia spp. Additional genome sequences from other Wolbachia will reveal whether this is a feature of all Wolbachia or only of certain strains.
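The link between population size and inefficient selection can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below uses the haploid form, u = (1 − e^(−2s)) / (1 − e^(−2Ns)), which is our choice of illustration rather than a calculation from this study; the N and s values are arbitrary. A neutral mutation fixes with probability 1/N, so the ratio u·N measures how strongly selection acts: values near 1 mean drift dominates.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Probability that a single new mutant eventually fixes in a haploid
    population of effective size n (Kimura's diffusion approximation)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if s == 0:
        return 1.0 / n  # neutral case
    x = 2 * n * s
    if x < -700:
        # Denominator 1 - exp(-x) is about -exp(-x); avoid overflow in expm1.
        return math.expm1(-2 * s) * math.exp(x)
    # (1 - e^{-2s}) / (1 - e^{-2ns}), written with expm1 for accuracy at small |s|
    return math.expm1(-2 * s) / math.expm1(-x)


if __name__ == "__main__":
    s = -0.001  # hypothetical, slightly deleterious
    for n in (100, 10_000, 1_000_000):
        # u * n close to 1 means the mutation behaves almost neutrally.
        print(n, fixation_probability(n, s) * n)
```

For s = −0.001, u·N is about 0.9 when N = 100 but vanishingly small when N = 100,000, so a slightly deleterious insertion that selection would purge from a large population behaves almost neutrally in a repeatedly bottlenecked one.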
Mitochondrial Evolution
There is a general consensus in the evolutionary biology literature that mitochondria evolved from bacteria in the α-subgroup of the phylum Proteobacteria (e.g., Lang et al. 1999). Analyses of complete mitochondrial and bacterial genomes have very strongly supported this hypothesis (Andersson et al. 1998, 2003; Muller and Martin 1999; Ogata et al. 2001). However, the exact position of the mitochondria within the α-Proteobacteria is still debated. Many studies have placed them in or near the order Rickettsiales (Viale and Arakaki 1994; Gupta 1995; Sicheritz-Ponten et al. 1998; Lang et al. 1999; Bazinet and Rollins 2003). Some studies have further suggested that mitochondria are a sister taxon to the genus Rickettsia within the family Rickettsiaceae, and thus more closely related to Rickettsia spp. than to species in the family Anaplasmataceae such as Wolbachia (Karlin and Brocchieri 2000; Emelyanov 2001a, 2001b, 2003a, 2003b).
In our analysis of complete genomes, including that of wMel, the first non-Rickettsia member of the Rickettsiales order to have its genome completed, we find support for a grouping of Wolbachia and Rickettsia to the exclusion of the mitochondria, but not for placing the mitochondria within the Rickettsiales order (Figure 5A and 5B; Table S7; Table S8). Specifically, phylogenetic trees of a concatenated alignment of 32 proteins show strong support with all methods (see Table S7) for common branching of: (i) mitochondria, (ii) Rickettsia with Wolbachia, (iii) the free-living α-Proteobacteria, and (iv) mitochondria within α-Proteobacteria. Since amino acid content bias was very severe in these datasets, protein LogDet analyses, which can correct for the bias, were also performed. In LogDet analyses of the concatenated protein alignment, both including and excluding highly biased positions, mitochondria usually branched basal to the Wolbachia–Rickettsia clade, but never specifically with Rickettsia (see Table S7). In addition, in phylogenetic studies of individual genes, there was no consistent phylogenetic position of mitochondrial proteins with any particular species or group within the α-Proteobacteria (see Table S8), although support for a specific branch uniting the two Rickettsia species with Wolbachia was quite strong. Eight of the proteins from mitochondrial genomes (YejW, SecY, Rps8, Rps2, Rps10, RpoA, Rpl15, Rpl32) do not even branch within the α-Proteobacteria, although these genes almost certainly were encoded in the ancestral mitochondrial genome (Lang et al. 1997).
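For readers unfamiliar with LogDet distances, the sketch below implements one common formulation for a pair of aligned sequences over a k-letter alphabet: d = −(1/k)[ln det J − ½(ln det Πx + ln det Πy)], where J is the k × k matrix of joint residue frequencies at aligned sites and Πx, Πy are diagonal matrices of each sequence's residue frequencies. Because it is built from the joint frequency matrix rather than an assumed shared composition, it is designed to be robust to compositional differences between sequences, which is why it suits biased datasets such as these. This is an illustrative implementation, not the software used in this study; with 20 amino acids the joint matrix is often singular for short alignments, in which case the code raises an error.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _determinant(matrix: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(x: str, y: str, alphabet: str = AMINO_ACIDS) -> float:
    """LogDet/paralinear distance between two aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    index = {ch: i for i, ch in enumerate(alphabet)}
    k = len(alphabet)
    # Use only sites where both sequences carry a letter of the alphabet.
    sites = [(p, q) for p, q in zip(x.upper(), y.upper()) if p in index and q in index]
    if not sites:
        raise ValueError("no comparable sites")
    joint = [[0.0] * k for _ in range(k)]
    for p, q in sites:
        joint[index[p]][index[q]] += 1.0 / len(sites)
    freq_x = [sum(row) for row in joint]
    freq_y = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    det_joint = _determinant(joint)
    if det_joint <= 0.0 or min(freq_x) == 0.0 or min(freq_y) == 0.0:
        raise ValueError("LogDet undefined: singular joint matrix or unobserved states")
    log_marginals = sum(map(math.log, freq_x)) + sum(map(math.log, freq_y))
    return -(math.log(det_joint) - 0.5 * log_marginals) / k
```

Identical sequences give a distance of 0, and the distance grows as the joint frequency matrix departs from diagonal; a full analysis computes this for every pair of taxa and passes the matrix to a tree or network method such as Neighbor-Net.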
This analysis of mitochondrial and α-Proteobacterial genes reinforces the view that ancient protein phylogenies are inherently prone to error, most likely because current models of phylogenetic inference do not accurately reflect the true evolutionary processes underlying the differences observed in contemporary amino acid sequences (Penny et al. 2001). These conflicting results regarding the precise position of mitochondria within the α-Proteobacteria are reflected in the extensive networking in the Neighbor-Net graph of the concatenated alignment shown in Figure 5. An important complication in studies of mitochondrial evolution lies in identifying “α-Proteobacterial” genes for comparison (Martin 1999). For example, in our analyses, proteins from Magnetococcus branched with other α-Proteobacterial homologs in only 17 of the 49 proteins studied, and in five cases they branched basal to the α-, β-, and γ-Proteobacterial homologs.
Host–Symbiont Gene Transfers
Many genes that were once encoded in mitochondrial genomes have been transferred to the host nuclear genome. Searching for such genes has been complicated by the fact that many of the transfer events happened early in eukaryotic evolution and that mitochondrial genomes frequently show extreme amino acid and nucleotide composition biases (see above). We used the wMel genome to search for additional possible mitochondrion-derived genes in eukaryotic nuclear genomes. Specifically, we constructed phylogenetic trees for wMel genes that are absent from both Rickettsia genomes. Five new eukaryotic genes of possible mitochondrial origin were identified: three genes involved in de novo nucleotide biosynthesis (purD, purM, pyrD) and two conserved hypothetical proteins (WD1005, WD0724). The α-Proteobacterial origin of these genes suggests that at least some of the genes of the de novo nucleotide synthesis pathway in eukaryotes might have been laterally acquired from bacteria via the mitochondria. The presence of such genes in other Proteobacteria suggests that their absence from Rickettsia is due to gene loss (Gray et al. 2001). This finding highlights the need for additional α-Proteobacterial genomes to identify mitochondrion-derived genes in eukaryotes.
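The screening step described above, selecting wMel genes absent from both Rickettsia genomes, amounts to a set classification that also yields the four ortholog categories colored in Figure 1. In the sketch below, ortholog calls are assumed to be precomputed (for example by reciprocal BLAST) and passed in as sets of wMel gene IDs; the gene IDs and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def ortholog_categories(genes: Iterable[str], conorii: Set[str],
                        prowazekii: Set[str]) -> Dict[str, str]:
    """Assign each wMel gene to one of the four Figure 1 (circle 3) categories."""
    categories = {}
    for gene in genes:
        in_c, in_p = gene in conorii, gene in prowazekii
        if in_c and in_p:
            categories[gene] = "both"
        elif in_p:
            categories[gene] = "prowazekii_only"
        elif in_c:
            categories[gene] = "conorii_only"
        else:
            categories[gene] = "neither"
    return categories


def candidates_for_tree_building(categories: Dict[str, str]) -> List[str]:
    """wMel genes with no ortholog in either Rickettsia genome."""
    return sorted(g for g, cat in categories.items() if cat == "neither")
```

Each candidate would then be aligned with its homologs and placed in a phylogenetic tree; only genes whose eukaryotic homologs branch with α-Proteobacterial sequences are kept as possible mitochondrion-derived genes.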
Figure 5. Mitochondrial Evolution Using Concatenated Alignments
Networks of protein LogDet distances for an alignment of 32 proteins constructed with Neighbor-Net (Bryant and Moulton 2003). The scale bar indicates 0.1 substitutions per site. Enlargements at lower right show the component of shared similarity between mitochondrial-encoded proteins and (i) their homologs from intracellular endosymbionts (red) as well as (ii) their homologs from free-living α-Proteobacteria (blue). (A) Result using 6,776 gap-free sites per genome (heavily biased in amino acid composition). (B) Result using 3,100 sites after exclusion of highly variable positions (data not biased in amino acid composition at p = 0.95). All data and alignments are available upon request. Results of phylogenetic analyses are summarized in Table S7. Since amino acid content bias was very severe in these datasets, protein LogDet analyses were also performed. In neighbor-joining, parsimony, and maximum-likelihood trees generated from alignments both including and excluding highly biased positions (6,776 and 3,100 gap-free amino acid sites per genome, respectively), mitochondria usually branched basal to the Wolbachia–Rickettsia clade, but never specifically with Rickettsia (Table S7).
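The site counts in Figure 5 (6,776 gap-free sites, and 3,100 after excluding highly variable positions) come from column filtering of the concatenated alignment. The sketch below shows both filters on a toy alignment; the characters treated as missing and the "at most max_states distinct residues" rule for variability are our assumptions, since the caption does not give the exact criterion used.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def _filter_columns(alignment: Alignment, keep: List[int]) -> Alignment:
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def gap_free(alignment: Alignment, missing: str = "-?X") -> Alignment:
    """Keep only columns with no gap or unknown residue in any taxon."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    keep = [i for i in range(length)
            if all(seq[i] not in missing for seq in alignment.values())]
    return _filter_columns(alignment, keep)


def drop_variable(alignment: Alignment, max_states: int) -> Alignment:
    """Keep columns with at most max_states distinct residues (hypothetical criterion)."""
    length = len(next(iter(alignment.values())))
    keep = [i for i in range(length)
            if len({seq[i] for seq in alignment.values()}) <= max_states]
    return _filter_columns(alignment, keep)
```

Applying gap_free first and then drop_variable mirrors the two datasets in panels A and B; the retained column counts play the role of the 6,776 and 3,100 sites.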
Metabolism and Transport
wMel is predicted to have very limited capabilities for membrane transport, substrate utilization, and the biosynthesis of metabolic intermediates (Figure S3), similar to what has been seen in other intracellular symbionts and pathogens (Paulsen et al. 2000). Almost all of the identifiable uptake systems for organic nutrients in wMel are for amino acids, including predicted transporters for proline, aspartate/glutamate, and alanine. This pattern of transporters, together with the presence of pathways for the metabolism of the amino acids cysteine, glutamate, glutamine, proline, serine, and threonine, suggests that wMel may obtain much of its energy from amino acids. These amino acids could also serve as precursors for the production of other amino acids. In contrast, carbohydrate metabolism in wMel appears to be limited. The only pathways that appear to be complete are the tricarboxylic acid cycle, the nonoxidative pentose phosphate pathway, and glycolysis starting from fructose-1,6-bisphosphate. The limited carbohydrate metabolism is consistent with the presence of only one sugar phosphate transporter. wMel can also apparently transport a range of inorganic ions, although two of these systems, for potassium uptake and sodium ion/proton exchange, are frameshifted. In the latter case, two other sodium ion/proton exchangers may be able to compensate for this defect.
Many of the predicted metabolic properties of wMel, such as the focus on amino acid transport and the limited carbohydrate metabolism, are similar to those found in Rickettsia. A major difference from the Rickettsia spp. is the absence of the ADP–ATP exchanger protein in wMel. In Rickettsia this protein is used to import ATP from the host, allowing these species to be direct energy scavengers (Andersson et al. 1998). This likely explains the presence of glycolysis in wMel but not Rickettsia. An inability to obtain ATP from its host also helps explain the presence of pathways for the synthesis of the purines AMP, IMP, XMP, and GMP in wMel but not Rickettsia. Other pathways present in wMel but not Rickettsia include threonine degradation (described above), riboflavin biosynthesis, pyrimidine metabolism (i.e., from PRPP to UMP), and chelated iron uptake (using a single ABC transporter). The two Rickettsia species have a relatively large complement of predicted transporters for osmoprotectants, such as proline and glycine betaine, whereas wMel possesses only two such systems.
Regulatory Responses
The wMel genome is predicted to encode few proteins for regulatory responses. Three genes encoding two-component system subunits are present: two sensor histidine kinases (WD1216 and WD1284) and one response regulator (WD0221). Only six strong candidates for transcription regulators were identified: a homolog of arginine repressors (WD0453), two members of the TenA family of transcription activator proteins (WD0139 and WD0140), a homolog of ctrA (WD0732), which encodes a two-component system transcription regulator in other α-Proteobacteria, and two σ factors (RpoH/WD1064 and RpoD/WD1298). There are also seven members of one paralogous family of proteins that are distantly related to phage repressors (see above), although if they have any role in transcription, it is likely only for phage genes. Such a limited repertoire of regulatory systems has also been reported in other endosymbionts and has been attributed to the apparently highly predictable and stable environment in which these species live (Andersson et al. 1998; Read et al. 2000; Shigenobu et al. 2000; Moran and Mira 2001; Akman et al. 2002; Seshadri et al. 2003).
Host–Symbiont Interactions
The mechanisms by which Wolbachia infect host cells and cause diverse phenotypic effects on host reproduction and fitness are poorly understood, and the wMel genome helps identify potential contributing factors. A complete Type IV secretion system, portions of which were reported in earlier studies, is present. The complete genome sequence shows that, in addition to the five vir genes previously described from Wolbachia wKueYO (Masui et al. 2001), four more are present in wMel. Of the nine wMel vir ORFs, eight are arranged in two separate operons. Similar to the single operon identified in wTai and wKueYO, the wMel virB8, virB9, virB10, virB11, and virD4 CDSs are adjacent to wspB, forming a 7 kb operon (WD0004–WD0009). The second operon contains virB3, virB4, and virB6 as well as four additional non-vir CDSs, including three putative membrane-spanning proteins, that form part of a 15.7 kb operon (WD0859–WD0853). The Rickettsia conorii genome shows a similar organization (Figure 6A). The conserved gene order of these genes between the two genomes suggests that the putative membrane-spanning proteins could form a novel, and possibly integral, part of a functioning Type IV secretion system in these bacteria. Moreover, reverse transcription (RT)-PCR experiments have confirmed that wspB and WD0853–WD0856 are each expressed as part of the two vir operons, further indicating that these additional encoded proteins are novel components of the Wolbachia Type IV secretion system (Figure 6B).
In addition to the two major vir clusters, a paralog of virB8 (WD0817) is also present in the wMel genome. WD0817 is quite divergent from virB8 and therefore does not appear to have resulted from a recent gene duplication. RT-PCR experiments failed to detect expression of this CDS in wMel-infected Drosophila (data not shown). PCR primers were designed for all CDSs of the wMel Type IV secretion system and used to amplify orthologs from the divergent Wolbachia strains wRi and wAlbB (data not shown). We detected orthologs of all wMel Type IV secretion system components, as well as of most adjacent non-vir CDSs, suggesting that this system is conserved across a range of A- and B-group Wolbachia. A growing body of evidence highlights the importance of Type IV secretion systems for the successful infection, invasion, and persistence of intracellular bacteria within their hosts (Christie 2001; Sexton and Vogel 2002). The Type IV system in Wolbachia likely plays a role in the establishment and maintenance of infection and possibly in the generation of reproductive phenotypes.
Genes involved in bacterial pathogenicity are frequently associated with regions of anomalous nucleotide composition, possibly because they were transferred from other species or inserted into the genome from plasmids or phage. The four such regions in wMel (see above; see Table 3) contain some additional candidates for pathogenicity-related activities, including a putative penicillin-binding protein (WD0719), genes predicted to be involved in cell wall synthesis (WD0095–WD0098, including D-alanine-D-alanine ligase, a putative FtsQ, and D-alanyl-D-alanine carboxypeptidase), and a multidrug resistance protein (WD0099). In addition, we have identified a cluster of genes in one of the phage regions that may also have a role in host–symbiont interactions. This cluster (WD0611–WD0621) is embedded within the WO-B phage region of the genome (see Figure 2) and contains many genes encoding proteins with putative roles in the synthesis and degradation of surface polysaccharides, including a UDP-glucose 6-dehydrogenase (WD0620). Because the genes in this cluster look typical of the genome (they have normal wMel nucleotide composition and branch with genes from other α-Proteobacteria in phylogenetic trees), the cluster was probably not acquired from other species. However, these genes could be transferred among Wolbachia strains via the phage, which in turn could produce variation in host–symbiont interactions between strains.
Figure 6. Genomic Organization and Expression of Type IV Secretion Operons in wMel
(A) Organization of the nine vir-like CDSs (white arrows) and five adjacent non-vir CDSs, which encode either putative membrane-spanning proteins (black arrows) or other proteins (gray arrows), in wMel, R. conorii, and A. tumefaciens. Solid horizontal lines denote RT-PCR experiments confirming that adjacent CDSs are expressed as part of a polycistronic transcript. (B) Results of these RT-PCR experiments. Lane 1, virB3-virB4; lane 2, RT control; lane 3, virB6-WD0856; lane 4, RT control; lane 5, WD0856-WD0855; lane 6, RT control; lane 7, WD0854-WD0853; lane 8, RT control; lane 9, virB8-virB9; lane 10, RT control; lane 11, virB9-virB11; lane 12, RT control; lane 13, virB11-virD4; lane 14, RT control; lane 15, virD4-wspB; lane 16, RT control; lane 17, virB4-virB6; lane 18, RT control; lane 19, WD0855-WD0854; lane 20, RT control. Only PCRs containing reverse transcriptase amplified the desired products. PCR primer sequences are listed in Table S9.
Conclusions
Analysis of the wMel genome reveals that it is unique among sequenced genomes of intracellular organisms in being both streamlined and massively infected with mobile genetic elements. The persistence of these elements in the genome for apparently long periods suggests that wMel is inefficient at eliminating them, most likely because it experiences severe population bottlenecks during every cycle of transovarial transmission as well as during sweeps through host populations. Integrating evolutionary reconstructions with genome analysis (phylogenomics) has provided insights into the biology of Wolbachia, helped identify genes that likely contribute to the unusual effects Wolbachia have on their hosts, and revealed many new details about the evolution of Wolbachia and mitochondria. Perhaps most importantly, future studies of Wolbachia will benefit both from this genome sequence and from the ability to study host–symbiont interactions in a host (D. melanogaster) that is well suited to experimental work.
Materials and Methods
Purification/source of DNA wMel DNA was obtained from D. melanogaster yw67c23 flies that naturally carry the wMel infection. wMel was purified from young adult flies on pulsed-field gels as described previously (Sun et al. 2001). Plugs were digested with the restriction enzyme AscI (GG^CGCGCC), which cuts the bacterial chromosome twice (Sun et al. 2001), aiding entry of the DNA into agarose gels. After electrophoresis, the two resulting bands were recovered from the gel and stored in 0.5 M EDTA (pH 8.0). DNA was extracted from the gel slices by first washing them six times for 30 min each in TE (Tris–HCl and EDTA) buffer to dilute the EDTA, followed by two 1-h washes in β-agarase buffer (New England Biolabs, Beverly, Massachusetts, United States). The buffer was then removed and the blocks were melted at 70°C for 7 min. The molten agarose was cooled to 40°C and incubated with β-agarase (1 U/100 μl of molten agarose) for 1 h. The digest was cooled to 4°C for 1 h and then centrifuged at 4,100 × gmax for 30 min at 4°C to remove undigested agarose. The supernatant was concentrated on a Centricon YM-100 microconcentrator (Millipore, Bedford, Massachusetts, United States) that had been prerinsed with 70% ethanol followed by TE buffer; after concentration, the retentate was rinsed with TE. The retentate was incubated with proteinase K at 56°C for 2 h and then stored at 4°C. wMel DNA for gap closure was prepared from approximately 1,000 Drosophila adults using the Holmes–Bonner urea/phenol:chloroform protocol (Holmes and Bonner 1973) to obtain total fly DNA.
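On a circular molecule, n ≥ 1 cuts yield exactly n linear fragments (a single cut only linearizes it), which is why the two AscI sites give the two bands recovered above. The short sketch below is illustrative and not part of the original protocol: it scans a circular sequence for the palindromic AscI site GGCGCGCC (so one strand suffices), places each cut after the second base as in GG^CGCGCC, includes sites that span the sequence origin, and reports fragment lengths. The function names are hypothetical.

```python
ASCI_SITE = "GGCGCGCC"  # AscI recognition sequence (palindromic)
ASCI_CUT = 2            # GG^CGCGCC: the cut falls after the 2nd base of the site


def cut_positions(genome, site=ASCI_SITE, offset=ASCI_CUT):
    """Sorted 0-based cut positions of `site` on a circular sequence."""
    seq = genome.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = []
    start = wrapped.find(site)
    while start != -1 and start < n:
        cuts.append((start + offset) % n)
        start = wrapped.find(site, start + 1)
    return sorted(cuts)


def circular_fragments(genome, site=ASCI_SITE, offset=ASCI_CUT):
    """Sorted fragment lengths from complete digestion of a circular molecule."""
    n = len(genome)
    cuts = cut_positions(genome, site, offset)
    if not cuts:
        return [n]  # no site: the molecule stays an intact circle
    # Each fragment runs from one cut to the next, wrapping past the origin;
    # with a single cut the difference is 0, i.e. one full-length linear piece.
    return sorted((cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n
                  for i in range(len(cuts)))
```

Applied to an assembled chromosome, the reported fragment sizes could be compared with the pulsed-field bands as a simple consistency check on the assembly.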
Library construction/sequencing/closure The complete genome sequence was determined using the whole-genome shotgun method (Venter et al. 1996). For the random shotgun-sequencing phase, libraries of average size 1.5–2.0 kb and 4.0–8.0 kb were used. After assembly using the TIGR Assembler (Sutton et al. 1995), there were 78 contigs greater than 5,000 bp, 186 contigs greater than 3,000 bp, and 373 contigs greater than 1,500 bp. This number of contigs was unusually high for a 1.27 Mb genome. An initial screen using BLASTN searches against the nonredundant database in GenBank and the Berkeley Drosophila Genome Project site (http:/
Since it has been suggested that Wolbachia and their hosts may undergo lateral gene transfer (Kondo et al. 2002), genome assemblies were rerun using all of the shotgun and closure reads, without excluding any sequences that appeared to be of host origin. Only five assemblies matched both the D. melanogaster genome and the wMel assembly. Primers were designed to match these assemblies, and PCR was attempted on total DNA from wMel-infected D. melanogaster. In each case PCR was unsuccessful, so we presume that these assemblies are chimeric cloning artifacts. The complete sequence has been given GenBank accession ID AE017196 and is available at http:/
Repeats Repeats were identified using RepeatFinder (Volfovsky et al. 2001), which uses the REPuter algorithm (Kurtz and Schleiermacher 1999) to find maximal-length repeats. Repeat families were then divided into classes using manual curation together with BLASTN and BLASTX searches.
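RepeatFinder and REPuter use suffix-tree-based algorithms and also report inverted (reverse-complement) repeats; the sketch below does not reproduce them. As a much simpler illustration of what a "maximal" repeat is, this hypothetical seed-and-extend function finds only exact direct repeats of at least `min_len` bases and extends each matching pair until it cannot be lengthened on either side. It is quadratic in the number of seed occurrences, so it suits small test sequences rather than a 1.27 Mb genome.

```python
from collections import defaultdict


def maximal_direct_repeats(seq, min_len):
    """Return sorted (start1, start2, length) maximal exact direct repeats >= min_len.

    Any exact repeat of length >= min_len contains a shared seed of exactly
    min_len bases, so seeding with min_len-mers finds every such repeat.
    """
    seq = seq.upper()
    n, k = len(seq), min_len
    seeds = defaultdict(list)
    for i in range(n - k + 1):
        seeds[seq[i:i + k]].append(i)

    found = set()
    for positions in seeds.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]  # i < j
                # Extend to the left while the preceding bases still match.
                while i > 0 and seq[i - 1] == seq[j - 1]:
                    i, j = i - 1, j - 1
                length = k + (positions[a] - i)
                # Extend to the right while the following bases still match.
                while j + length < n and seq[i + length] == seq[j + length]:
                    length += 1
                # The same maximal pair is reached from every seed it contains.
                found.add((i, j, length))
    return sorted(found)
```

The set deduplicates pairs that are discovered from several overlapping seeds, so each maximal repeat pair is reported once.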
Annotation Identification of putative protein-encoding genes and annotation of the genome were done as described previously (Eisen et al. 2002). An initial set of ORFs likely to encode proteins (CDSs) was identified with GLIMMER (Salzberg et al. 1998). The putative proteins encoded by these CDSs were compared with those of other species to identify frameshifts or premature stop codons. The sequence traces for each such CDS were reexamined and, for some, new sequences were generated. CDSs whose frameshifts or premature stops were supported by high-quality sequence were annotated as “authentic” mutations. Functional assignment, identification of membrane-spanning domains, determination of paralogous gene families, and identification of regions of unusual nucleotide composition were performed as described previously (Tettelin et al. 2001). Phylogenomic analysis (Eisen 1998a; Eisen and Fraser 2003) was used to aid functional predictions. Alignments and phylogenetic trees were generated as described (Salzberg et al. 2001).
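GLIMMER scores candidate genes with interpolated Markov models trained on the genome itself, and the sketch below does not attempt that scoring. It only shows the simpler step that gene finders build on: enumerating open reading frames (an ATG followed by an in-frame stop) in the three forward frames. The reverse strand can be handled by scanning the reverse complement, and the `min_codons` default is an arbitrary, hypothetical threshold.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return sorted (start, end, frame) ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; the length compared with
    `min_codons` also counts the stop. Only the first ATG after the previous
    stop in each frame is used, giving the longest ORF ending at each stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOPS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return sorted(orfs)
```

Candidates from a scan like this would still need a content-based score (as GLIMMER provides) to separate real genes from chance ORFs.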
Comparative genomics All putative wMel proteins were searched using BLASTP against the predicted proteomes of published complete organismal genomes and a set of complete plastid, mitochondrial, plasmid, and viral genomes. The results of these searches were used (i) to analyze the phylogenetic profile (Pellegrini et al. 1999; Eisen and Wu 2002), (ii) to identify putative lineage-specific duplications (proteins whose top hit by E-value is another wMel protein), and (iii) to determine the presence of homologs in different species. Orthologs between the wMel genome and those of the two Rickettsia species were identified by requiring mutual best-hit relationships among all possible pairwise BLASTP comparisons, with some manual correction. Genes present in both Rickettsia genomes and in other bacterial species, but not in wMel, were considered to have been lost in the wMel lineage (see Table S3). Genes present in only one or two of the three species were considered candidates for gene loss or lateral transfer and were also used to identify possible biological differences among these species (see Table S3). For wMel genes absent from the Rickettsia genomes, proteins were searched with BLASTP against the TIGR NRAA database. Protein sequences of their homologs were aligned with CLUSTALW and manually curated. Neighbor-joining trees were constructed using the PHYLIP package.
Phylogenetic analysis of mitochondrial proteins For phylogenetic analysis, the set of all 38 proteins encoded in both the Marchantia polymorpha and Reclinomonas americana (Lang et al. 1997) mitochondrial genomes was collected. Acanthamoeba castellanii was excluded because of its high divergence and extremely long evolutionary branches. Six genes (nad7, rps10, sdh3, sdh4, tatC, and yejV) were excluded from further analysis because they were too poorly conserved for alignment and phylogenetic analysis, leaving 32 genes for investigation: atp6, atp9, atpA, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, rpl16, rpl2, rpl5, rpl6, rps1, rps11, rps12, rps13, rps14, rps19, rps2, rps3, rps4, rps7, rps8, yejR, and yejU. Using FASTA with the mitochondrial proteins as queries, homologs were identified in the genomes of seven α-Proteobacteria: two intracellular symbionts (W. pipientis wMel and Rickettsia prowazekii) and five free-living forms (Sinorhizobium meliloti, Agrobacterium tumefaciens, Brucella melitensis, Mesorhizobium loti, and Rhodopseudomonas sp.). Escherichia coli and Neisseria meningitidis were used as outgroups. Caulobacter crescentus was excluded from the analysis because homologs of some of the 32 genes were not found in its current annotation. When more than one homolog was identified in a genome, the one with the greatest sequence identity to the mitochondrial query was retrieved. Proteins were aligned using CLUSTALW (Thompson et al. 1994) and concatenated. To reduce the influence of poorly aligned regions, all sites containing a gap in any sequence were excluded, leaving 6,776 positions per genome for analysis. The data contained extreme amino acid bias: all sequences failed the χ2 test at p = 0.95 for deviation from the amino acid frequency distribution assumed under either the JTT or mtREV24 model, as determined with PUZZLE (Strimmer and von Haeseler 1996).
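Concatenating the per-gene alignments and then discarding every column that contains a gap in any taxon takes only a few lines. The sketch assumes each gene alignment is a dict mapping taxon name to an aligned sequence, with "-" as the gap character; these are illustrative conventions, not the file format used in the original analysis.

```python
def concatenate(alignments):
    """Concatenate per-gene alignments (dicts: taxon -> aligned sequence) in order."""
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every gene alignment must contain the same taxa")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def drop_gapped_columns(alignment, gap="-"):
    """Keep only the columns in which no taxon has a gap."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if any(len(s) != length for s in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    keep = [c for c in range(length)
            if all(alignment[t][c] != gap for t in taxa)]
    return {t: "".join(alignment[t][c] for c in keep) for t in taxa}
```

Because a single gapped taxon removes the whole column, the number of retained sites shrinks as more taxa are added, which is one reason divergent taxa with many indels are often left out.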
When the data were iteratively purged of highly variable sites using the method described by Hansmann and Martin (2000), amino acid composition gradually came into better agreement with the amino acid frequency distribution assumed by the model. The longest dataset in which all sequences passed the χ2 test at p = 0.95 consisted of the 3,100 least polymorphic sites. PROTML (Adachi and Hasegawa 1996) analysis of the 3,100-site data using the JTT model placed mitochondria as sisters of the five free-living α-Proteobacteria with low (72%) support, whereas PUZZLE, using the same data, placed mitochondria as sisters of the two intracellular symbionts, also with low (85%) support. This suggested the presence of conflicting signal in the less-biased subset of the data. Therefore, protein log determinants (LogDet) were used to infer distances from the 6,776-site data, because this method can correct for amino acid bias (Lockhart et al. 1994), and Neighbor-Net (Bryant and Moulton 2003) was used to display the resulting matrix, because it can detect and display conflicting signal. The result (see Figure 5A) shows both signals. No analysis detected a sister relationship between Rickettsia and mitochondria.
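LogDet distances are computed from the k × k divergence matrix F, whose entry F_ij is the fraction of aligned sites with state i in one sequence and state j in the other (k = 20 for proteins). The sketch below uses one common form of the LogDet/paralinear distance, d = -(1/k)[ln det F - (1/2)(ln det Π_X + ln det Π_Y)], where Π_X and Π_Y are diagonal matrices of each sequence's state frequencies. It is an illustrative implementation with hypothetical names, not the software used for Figure 5A, and it requires every state to occur in both sequences so that F can be nonsingular.

```python
import math


def _log_det(matrix):
    """Natural log of det(matrix) by Gaussian elimination with partial pivoting.

    Raises ValueError if the determinant is zero or negative, in which case
    the LogDet distance is undefined.
    """
    m = [list(row) for row in matrix]
    n = len(m)
    log_det, sign = 0.0, 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular divergence matrix")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        if m[col][col] < 0:
            sign = -sign
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    if sign < 0:
        raise ValueError("negative determinant")
    return log_det


def logdet_distance(x, y, alphabet):
    """LogDet/paralinear distance between two aligned, gap-free sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    k = len(alphabet)
    index = {state: i for i, state in enumerate(alphabet)}
    # Divergence matrix F: joint frequencies of (state in x, state in y).
    f = [[0.0] * k for _ in range(k)]
    for a, b in zip(x, y):
        f[index[a]][index[b]] += 1.0 / len(x)
    freq_x = [sum(f[i]) for i in range(k)]                       # diagonal of Pi_X
    freq_y = [sum(f[i][j] for i in range(k)) for j in range(k)]  # diagonal of Pi_Y
    if min(freq_x) == 0.0 or min(freq_y) == 0.0:
        raise ValueError("every state must occur in both sequences")
    log_det_pi = sum(map(math.log, freq_x)) + sum(map(math.log, freq_y))
    return -(_log_det(f) - 0.5 * log_det_pi) / k
```

Because the correction uses each sequence's own state frequencies rather than a single model-wide frequency vector, the distance remains well defined when compositions differ between sequences, which is the property relevant to the biased data above.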
For analyses of individual genes, the 63 proteins encoded in the Reclinomonas mitochondrial genome were compared using FASTA with the proteins from 49 sequenced eubacterial genomes, which included the α-Proteobacteria shown in Figure 5, R. conorii, and Magnetococcus MC1, one of the more divergent α-Proteobacteria. Of these proteins, 50 had homologs conserved well enough for phylogenetic analysis. Homologs were aligned and analyzed phylogenetically with PROTML (Adachi and Hasegawa 1996).
Analysis of wspB sequences To compare wspB sequences from different Wolbachia strains, PCR was performed on total DNA extracted from the following sources: wRi from infected adult D. simulans (Riverside strain), wAlbB from the infected Aa23 cell line (O'Neill et al. 1997b), and D. immitis Wolbachia from adult worm tissue. DNA extraction and PCR were done as previously described (Zhou et al. 1998) with wspB-specific primers (wspB-F, 5′-TTTGCAAGTGAAACAGAAGG and wspB-R, 5′-GCTTTGCTGGCAAAATGG). PCR products were cloned into the pGem-T vector (Promega, Madison, Wisconsin, United States) as previously described (Zhou et al. 1998) and sequenced (GenBank accession numbers AJ580921–AJ580923). These sequences were compared with previously sequenced wsp genes from the same Wolbachia strains (GenBank accession numbers AF020070, AF020059, and AJ252062). The four partial wsp sequences were aligned using CLUSTALV (Higgins et al. 1992) based on the amino acid translation of each gene, and the wspB sequences were aligned in the same way. Genetic distances were calculated using the Kimura two-parameter method and are reported in Table S5.
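The Kimura two-parameter distance treats transitions (A↔G, C↔T) and transversions separately. With P and Q the proportions of transitional and transversional differences among compared sites, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The minimal sketch below assumes that sites with a gap or ambiguous base in either sequence are simply skipped; how such sites were handled in the original analysis is not stated.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    math.log raises ValueError if the sequences are too divergent (saturated).
    """
    valid = PURINES | PYRIMIDINES
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    # Transitions stay within purines or within pyrimidines.
    transitions = sum(1 for a, b in pairs if a != b and
                      ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Computing this pairwise for the wsp alignment and, separately, for the wspB alignment gives two distance sets that can be compared strain by strain.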
Type IV secretion system To determine whether the vir-like CDSs and adjacent ORFs are actively expressed in wMel as two polycistronic operons, RT-PCR was used. Total RNA was isolated from infected D. melanogaster yw67c23 adults using Trizol reagent (Invitrogen, Carlsbad, California, United States), and cDNA was synthesized with SuperScript III RT (Invitrogen) using primers wspBR, WD0817R, WD0853R, and WD0852R. RNA isolation and RT were performed according to the manufacturer's protocols, except that the suggested initial incubation of RNA template and primers at 65°C for 5 min and the final heat denaturation of the RT enzyme at 70°C for 15 min were omitted. PCR was done using rTaq (Takara, Kyoto, Japan), and several primer sets were used to amplify regions spanning adjacent CDSs across most of the two operons. For operon virB3-WD0853, the following primers were used: (virB3-virB4)F, (virB3-virB4)R, (virB6-WD0856)F, (virB6-WD0856)R, (WD0856-WD0855)F, (WD0856-WD0855)R, (WD0854-WD0853)F, (WD0854-WD0853)R. For operon virB8-wspB, the following primers were used: (virB8-virB9)F, (virB8-virB9)R, (virB9-virB11)F, (virB9-virB11)R, (virB11-virD4)F, (virB11-virD4)R, (virD4-wspB)F, and (virD4-wspB)R. The coexpression of virB4 and virB6, as well as of WD0855 and WD0854, within the putative virB3-WD0853 operon was confirmed by nested PCR with the following primers: (virB4-virB6)F1, (virB4-virB6)R1, (virB4-virB6)F2, (virB4-virB6)R2, (WD0855-WD0854)F1, (WD0855-WD0854)R1, (WD0855-WD0854)F2, and (WD0855-WD0854)R2. All ORFs within the putative virB8-wspB operon were shown to be coexpressed, so this cluster is considered a genuine operon. All products were amplified only from RT-positive reactions (see Figure 6). Primer sequences are given in Table S9.
Figure S1. Phage Trees
Phylogenetic tree showing the relationships among the WO-A and WO-B phages from wMel and the reported phages from wKue and wTai. The tree was generated from a CLUSTALW multiple sequence alignment (Thompson et al. 1994) using the PROTDIST and NEIGHBOR programs of PHYLIP (Felsenstein 1989).
(60 KB PDF).
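PHYLIP's NEIGHBOR program implements the neighbor-joining method of Saitou and Nei. The following is a textbook sketch of that method, not PHYLIP's code: at each step it joins the pair minimizing Q(i, j) = (n - 2)d_ij - r_i - r_j, where r_i is the sum of row i of the current distance matrix, and it finishes by attaching the last three nodes to a central node. The input is assumed to be a symmetric distance matrix such as PROTDIST produces; ties are broken by input order, and the function name is hypothetical.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining (Saitou and Nei); returns an unrooted Newick string.

    names: taxon labels; dist: symmetric matrix (list of lists) of distances.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(v) for v in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) * d_ij - r_i - r_j.
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        # Branch lengths from i and j to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distance from the new internal node to every node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [new_row[a]] for a in keep]
        d.append([new_row[k] for k in keep] + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes remain: attach them to one central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

When the input distances are exactly additive on a tree, neighbor joining recovers that tree and its branch lengths; with real PROTDIST distances it gives an approximation whose reliability is usually assessed by bootstrapping.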
Figure S2. Plot of the Effective Number of Codons against GC Content at the Third Codon Position (GC3)
Proteins with fewer than 100 residues are excluded from this analysis because their effective number of codons (ENc) values are unreliable. The curve shows the expected ENc values if codon usage bias is caused by GC variation alone. Colors: yellow, hypothetical; purple, mobile element; blue, others. Most of the variation in codon bias can be traced to variation in GC, indicating that mutational forces dominate wMel codon usage. Multivariate analysis of codon usage was performed using the CODONW package (available from http:/
(289 KB PDF).
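The legend does not name the expected curve; assuming it is the standard one from Wright (1990), the expected ENc when GC3 alone shapes codon usage is ENc = 2 + s + 29/(s² + (1 - s)²), where s is GC3. The sketch computes GC3 naively from all third codon positions of an in-frame CDS (some implementations exclude Met, Trp, and stop codons) and evaluates the expected curve. Function names are hypothetical.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    thirds = cds.upper()[2::3]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_enc(s):
    """Expected ENc if codon bias reflects GC3 alone (assumed Wright 1990 curve)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)
```

The curve peaks near 61 at GC3 = 0.5 and falls toward 31 to 32 at the extremes; genes lying well below it show more codon bias than GC3 alone would predict.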
Figure S3. Predicted Metabolism and Transport in wMel
Overview of the predicted metabolism (energy production and organic compounds) and transport in wMel. Transporters are grouped by predicted substrate specificity: inorganic cations (green), inorganic anions (pink), carbohydrates (yellow), and amino acids/peptides/amines/purines and pyrimidines (red). Transporters in the drug-efflux family (labeled as “drugs”) and those of unknown specificity are colored black. Arrows indicate the direction of transport. Energy-coupling mechanisms are also shown: solutes transported by channel proteins (double-headed arrow); secondary transporters (two-arrowed lines, indicating both the solute and the coupling ion); ATP-driven transporters (ATP hydrolysis reaction); unknown energy-coupling mechanism (single arrow). Transporter predictions are based upon a phylogenetic classification of transporter proteins (Paulsen et al. 1998).
(167 KB PDF).
Table S1. Repeats of Greater Than 50 bp in the wMel Genome (with Coordinates) (649 KB DOC).
Table S2. Inactivated Genes in the wMel Genome (147 KB DOC).
Table S3. Ortholog Comparison with Rickettsia spp. (718 KB XLS).
Table S4. Putative Lineage-Specific Gene Duplications in wMel (116 KB DOC).
Table S5. Genetic Distances as Calculated for Alignments of wsp and wspB Gene Sequences from the Same Wolbachia Strains (24 KB DOC).
Table S6. Putative DNA Repair and Recombination Genes in the wMel Genome (26 KB DOC).
Table S7. Phylogenetic Results for Concatenated Data of 32 Mitochondrial Proteins (34 KB DOC).
Table S8. Individual Phylogenetic Results for Reclinomonas Mitochondrial DNA-Encoded Proteins (117 KB DOC).
Table S9. PCR Primers (47 KB DOC).
Accession Numbers
The complete sequence for wMel has been given GenBank (http:/
The GenBank accession numbers for other sequences discussed in this paper are AF020059 (Wolbachia sp. wAlbB outer surface protein precursor wsp gene), AF020070 (Wolbachia sp. wRi outer surface protein precursor wsp gene), AJ252062 (Wolbachia endosymbiont of D. immitis sp. gene for surface protein), AJ580921 (Wolbachia endosymbiont of D. immitis partial wspB gene for Wolbachia surface protein B), AJ580922 (Wolbachia endosymbiont of A. albopictus partial wspB gene for Wolbachia surface protein B), and AJ580923 (Wolbachia endosymbiont of D. simulans partial wspB gene for Wolbachia surface protein B).
Acknowledgments
We acknowledge Barton Slatko, Jeremy Foster, New England Biolabs, and Mark Blaxter for helping inspire this project; Rekha Seshadri for help in examining pathogenicity factors and reading the manuscript; Derek Fouts for examination of group II introns; Susan Lo, Michael Heaney, Vadim Sapiro, and Billy Lee for IT support; Maria-Ines Benito, Naomi Ward, Michael Eisen, Howard Ochman, and Vincent Daubin for helpful discussions; Steven Salzberg and Mihai Pop for help in comparing wMel with the D. melanogaster genome; Elodie Ghedin for access to the B. malayi Wolbachia sequence data; Maria Ermolaeva for assistance with analysis of operons; Dan Haft for designing protein family hidden Markov models for annotation; Owen White for general bioinformatics support; four anonymous reviewers for very helpful comments and suggestions; and Claire M. Fraser for continuing support of TIGR's scientific research. This project was supported by grant UO1-AI47409–01 to Scott O'Neill and Jonathan A. Eisen from the National Institute of Allergy and Infectious Diseases.
Conflicts of interest. The authors have declared that no conflicts of interest exist.
Author contributions. M. Wu contributed ideas and analysis in all aspects of the work. L. Sun performed purification of wMel DNA for initial libraries and closure. J. Vamathevan was the closure team leader, performed sequence assembly and analysis, and screened contigs against the Drosophila genome. M. Riegler performed validation of assembly against the physical map and confirmation of rearrangements by long PCR and analysis of repeat regions. R. Deboy was the annotation leader and managed the annotation, ORF management, and frameshifts. J. C. Brownlie performed analysis of Type IV secretion systems. E. A. McGraw performed validation of assembly against physical map and confirmation of rearrangements by long PCR and analysis of wsp paralogs. W. Martin, C. Esser, N. Ahmadinejad, and C. Wiegand performed the mitochondrial evolution analysis. R. Madupu, M. J. Beanan, L. M. Brinkac, S. C. Daugherty, A. S. Durkin, J. F. Kolonay, and W. C. Nelson performed genome annotation. Y. Mohamoud, P. Lee, and K. Berry performed the closure experiments (closed sequencing gaps, multiplex PCR, resolution of small repeats, coverage reactions, contig editing, resolution of large repeats by transposon and primer walking). M. B. Young was the shotgun sequencing leader. T. Utterback and J. Weidman performed shotgun sequencing and frameshift checking; Utterback also worked on the assembly. W. C. Nierman handled the library construction. I. T. Paulsen performed transporter analysis. K. E. Nelson performed metabolism analysis. H. Tettelin analyzed genome properties, repeats, and membrane proteins. S. L. O'Neill and J. A. Eisen supplied ideas, coordination, and analysis; Eisen is the corresponding author.
- Adachi J, Hasegawa M (1996) Model of amino acid substitution in proteins encoded by mitochondrial DNA. J Mol Evol 42:459–468.
- Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, et al. (2002) Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet 32:402–407.
- Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-Ponten T, Alsmark UC, et al. (1998) The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396:133–140.
- Andersson SG, Karlberg O, Canback B, Kurland CG (2003) On the origin of mitochondria: A genomics perspective. Philos Trans R Soc Lond B Biol Sci 358:165–167.
- Bazinet C, Rollins JE (2003) Rickettsia-like mitochondrial motility in Drosophila spermiogenesis. Evol Dev 5:379–385.
- Bjorkholm B, Sjolund M, Falk PG, Berg OG, Engstrand L, et al. (2001) Mutation frequency and biological cost of antibiotic resistance in Helicobacter pylori. Proc Natl Acad Sci U S A 98:14607–14612.
- Boyle L, O'Neill SL, Robertson HM, Karr TL (1993) Interspecific and intraspecific horizontal transfer of Wolbachia in Drosophila. Science 260:1796–1799.
- Braig HR, Zhou W, Dobson SL, O'Neill SL (1998) Cloning and characterization of a gene encoding the major surface protein of the bacterial endosymbiont Wolbachia pipientis. J Bacteriol 180:2373–2378.
- Bryant D, Moulton V (2003) Neighbor-Net: An agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 20 Dec 5 [Epub ahead of print].
- Caturegli P, Asanovich KM, Walls JJ, Bakken JS, Madigan JE, et al. (2000) ankA: An Ehrlichia phagocytophila group gene encoding a cytoplasmic protein antigen with ankyrin repeats. Infect Immun 68:5277–5283.
- Christie PJ (2001) Type IV secretion: Intercellular transfer of macromolecules by systems ancestrally related to conjugation machines. Mol Microbiol 40:294–305.
- Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, et al. (1999) Alignment of whole genomes. Nucleic Acids Res 27:2369–2376.
- Dumler SJ, Barbet AF, Bekker CPJ, Dasch GA, Palmer GH, et al. (2001) Reorganization of genera in the families Rickettsiaceae and Anaplasmataceae in the order Rickettsiales: Unification of some species of Ehrlichia with Anaplasma, Cowdria with Ehrlichia and Ehrlichia with Neorickettsia—Descriptions of six new species combinations and designation of Ehrlichia equi and “HGE agent” as subjective synonyms of Ehrlichia phagocytophila. Int J Syst Evol Microbiol 51:2145–2165.
- Eiglmeier K, Parkhill J, Honore N, Garnier T, Tekaia F, et al. (2001) The decaying genome of Mycobacterium leprae. Lepr Rev 72:387–398.
- Eisen JA (1997) Gastrogenomic delights: A movable feast. Nat Med 3:1076–1078.
- Eisen JA (1998a) A phylogenomic study of the MutS family of proteins. Nucleic Acids Res 26:4291–4300.
- Eisen JA (1998b) Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res 8:163–167.
- Eisen JA, Fraser CM (2003) Phylogenomics: Intersection of evolution and genomics. Science 300:1706–1707.
- Eisen JA, Hanawalt PC (1999) A phylogenomic study of DNA repair genes, proteins, and processes. Mutat Res 435:171–213.
- Eisen JA, Wu M (2002) Phylogenetic analysis and gene functional predictions: Phylogenomics in action. Theor Popul Biol 61:481–487.
- Eisen JA, Heidelberg JF, White O, Salzberg SL (2000) Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol 1:1–9 RESEARCH0011.
- Eisen JA, Nelson KE, Paulsen IT, Heidelberg JF, Wu M, et al. (2002) The complete genome sequence of Chlorobium tepidum TLS, a photosynthetic, anaerobic, green-sulfur bacterium. Proc Natl Acad Sci U S A 99:9509–9514.
- Elfring LK, Axton JM, Fenger DD, Page AW, Carminati JL, et al. (1997) Drosophila PLUTONIUM protein is a specialized cell cycle regulator required at the onset of embryogenesis. Mol Biol Cell 8:583–593.
- Emelyanov VV (2001a) Evolutionary relationship of Rickettsiae and mitochondria. FEBS Lett 501:11–18.
- Emelyanov VV (2001b) Rickettsiaceae, Rickettsia-like endosymbionts, and the origin of mitochondria. Biosci Rep 21:1–17.
- Emelyanov VV (2003a) Mitochondrial connection to the origin of the eukaryotic cell. Eur J Biochem 270:1599–1618.
- Emelyanov VV (2003b) Phylogenetic affinity of a Giardia lamblia cysteine desulfurase conforms to canonical pattern of mitochondrial ancestry. FEMS Microbiol Lett 226:257–266.
- Felsenstein J (1989) PHYLIP—Phylogeny inference package (version 3.2). Cladistics 5:164–166.
- Frank AC, Amiri H, Andersson SG (2002) Genome deterioration: Loss of repeated sequences and accumulation of junk DNA. Genetica 115:1–12.
- Gray MW, Burger G, Lang BF (2001) The origin and early evolution of mitochondria. Genome Biol 2:REVIEWS1018.
- Gupta RS (1995) Evolution of the chaperonin families (Hsp60, Hsp10 and Tcp-1) of proteins and the origin of eukaryotic cells. Mol Microbiol 15:1–11.
- Hansmann S, Martin W (2000) Phylogeny of 33 ribosomal and six other proteins encoded in an ancient gene cluster that is conserved across prokaryotic genomes: Influence of excluding poorly alignable sites from analysis. Int J Syst Evol Microbiol 50:1655–1663.
- Higgins D, Bleasby A, Fuchs R (1992) ClustalV: Improved software for multiple sequence alignment. Comput Appl Biosci 8:189–191.
- Holmes DS, Bonner J (1973) Preparation, molecular weight, base composition, and secondary structure of giant nuclear ribonucleic acid. Biochemistry 12:2330–2338.
- Hryniewicz-Jankowska A, Czogalla A, Bok E, Sikorski AF (2002) Ankyrins, multifunctional proteins involved in many cellular pathways. Folia Histochem Cytobiol 40:239–249.
- Itoh T, Martin W, Nei M (2002) Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc Natl Acad Sci U S A 99:12944–12948.
- Jamnongluk W, Kittayapong P, Baimai V, O'Neill SL (2002) Wolbachia infections of tephritid fruit flies: Molecular evidence for five distinct strains in a single host species. Curr Microbiol 45:255–260.
- Jeyaprakash A, Hoy MA (2000) Long PCR improves Wolbachia DNA amplification: wsp sequences found in 76% of sixty-three arthropod species. Insect Mol Biol 9:393–405.
- Karlin S, Brocchieri L (2000) Heat shock protein 60 sequence comparisons: Duplications, lateral transfer, and mitochondrial evolution. Proc Natl Acad Sci U S A 97:11348–11353.
- Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T (2002) Genome fragment of Wolbachia endosymbiont transferred to X chromosome of host insect. Proc Natl Acad Sci U S A 99:14280–14285.
- Kurtz S, Schleiermacher C (1999) REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics 15:426–427.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921.
- Lang BF, Burger G, O'Kelly CJ, Cedergren R, Golding GB, et al. (1997) An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387:493–497.
- Lang BF, Seif E, Gray MW, O'Kelly CJ, Burger G (1999) A comparative genomics approach to the evolution of eukaryotes and their mitochondria. J Eukaryot Microbiol 46:320–326.
- Lawrence JG (2001) Catalyzing bacterial speciation: Correlating lateral transfer with genetic headroom. Syst Biol 50:479–496.
- Lawrence JG, Ochman H (1997) Amelioration of bacterial genomes: Rates of change and exchange. J Mol Evol 44:383–397.
- Lawrence JG, Ochman H (1998) Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A 95:9413–9417.
- Lin M, Rikihisa Y (2003) Ehrlichia chaffeensis and Anaplasma phagocytophilum lack genes for lipid A biosynthesis and incorporate cholesterol for their survival. Infect Immun 71:5324–5331.
- Lo N, Casiraghi M, Salati E, Bazzocchi C, Bandi C (2002) How many Wolbachia supergroups exist? Mol Biol Evol 19:341–346.
- Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic evolutionary model. Mol Biol Evol 11:605–612.
- Martin W (1999) Mosaic bacterial chromosomes: A challenge en route to a tree of genomes. Bioessays 21:99–104.
- Masui S, Sasaki T, Ishikawa H (2000) Genes for the type IV secretion system in an intracellular symbiont, Wolbachia, a causative agent of various sexual alterations in arthropods. J Bacteriol 182:6529–6531.
- Masui S, Kuroiwa H, Sasaki T, Inui M, Kuroiwa T, et al. (2001) Bacteriophage WO and virus-like particles in Wolbachia, an endosymbiont of arthropods. Biochem Biophys Res Commun 283:1099–1104.
- McGraw EA, Merritt DJ, Droller JN, O'Neill SL (2001) Wolbachia-mediated sperm modification is dependent on the host genotype in Drosophila. Proc R Soc Lond B Biol Sci 268:2565–2570.
- Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends Genet 17:589–596.
- Moran NA (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc Natl Acad Sci U S A 93:2873–2878.
- Moran NA, Mira A (2001) The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol 2:RESEARCH0054.
- Muller M, Martin W (1999) The genome of Rickettsia prowazekii and some thoughts on the origin of mitochondria and hydrogenosomes. Bioessays 21:377–381.
- O'Neill SL, Hoffmann AA, Werren JH, editors (1997a) Influential passengers: Inherited microorganisms and arthropod reproduction. Oxford: Oxford University Press. 228 p.
- O'Neill SL, Pettigrew MM, Sinkins SP, Braig HR, Andreadis TG, et al. (1997b) In vitro cultivation of Wolbachia pipientis in an Aedes albopictus cell line. Insect Mol Biol 6:33–39.
- Ogata H, Audic S, Renesto-Audiffren P, Fournier PE, Barbe V, et al. (2001) Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293:2093–2098.
- Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, et al. (2001) Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413:523–527.
- Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 35:32–40.
- Paulsen IT, Sliwinski MK, Saier MH Jr (1998) Microbial genome analyses: Global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol 277:573–592.
- Paulsen IT, Nguyen L, Sliwinski MK, Rabus R, Saier MH Jr (2000) Microbial genome analyses: Comparative transport capabilities in eighteen prokaryotes. J Mol Biol 301:75–100.
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288.
- Penny D, McComish BJ, Charleston MA, Hendy MD (2001) Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J Mol Evol 53:711–723.
- Read TD, Brunham RC, Shen C, Gill SR, Heidelberg JF, et al. (2000) Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39. Nucleic Acids Res 28:1397–1406.
- Roelofs J, Van Haastert PJ (2001) Genes lost during evolution. Nature 411:1013–1014.
- Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:544–548.
- Salzberg SL, White O, Peterson J, Eisen JA (2001) Microbial genes in the human genome: Lateral transfer or gene loss? Science 292:1903–1906.
- Selby CP, Witkin EM, Sancar A (1991) Escherichia coli mfd mutant deficient in “mutation frequency decline” lacks strand-specific repair: In vitro complementation with purified coupling factor. Proc Natl Acad Sci U S A 88:11574–11578.
- Seshadri R, Paulsen IT, Eisen JA, Read TD, Nelson KE, et al. (2003) Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci U S A 100:5455–5460.
- Sexton JA, Vogel JP (2002) Type IVB secretion by intracellular pathogens. Traffic 3:178–185.
- Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86.
- Sicheritz-Ponten T, Kurland CG, Andersson SG (1998) A phylogenetic analysis of the cytochrome b and cytochrome c oxidase I genes supports an origin of mitochondria from within the Rickettsiaceae. Biochim Biophys Acta 1365:545–551.
- Sinkins SP, O'Neill SL (2000) Wolbachia as a vehicle to modify insect populations. In: James AA, editor. Insect transgenesis: Methods and applications. Boca Raton, Florida: CRC Press. 271–288.
- Stanhope MJ, Lupas A, Italia MJ, Koretke KK, Volker C, et al. (2001) Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates. Nature 411:940–944.
- Strimmer K, von Haeseler A (1996) Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964–969.
- Sun LV, Foster JM, Tzertzinis G, Ono M, Bandi C, et al. (2001) Determination of Wolbachia genome size by pulsed-field gel electrophoresis. J Bacteriol 183:2219–2225.
- Sun LV, Riegler M, O'Neill SL (2003) Development of a physical and genetic map of the virulent Wolbachia strain wMelPop. J Bacteriol 185:7077–7084.
- Sutton G, White O, Adams M, Kerlavage A (1995) TIGR assembler: A new tool for assembling large shotgun sequencing projects. Genome Sci Tech 1:9–19.
- Tamas I, Klasson L, Canback B, Naslund AK, Eriksson AS, et al. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.
- Taylor MJ (2002) A new insight into the pathogenesis of filarial disease. Curr Mol Med 2:299–302.
- Taylor MJ, Hoerauf A (2001) A new approach to the treatment of filariasis. Curr Opin Infect Dis 14:727–731.
- Taylor MJ, Bandi C, Hoerauf AM, Lazdins J (2000) Wolbachia bacteria of filarial nematodes: A target for control? Parasitol Today 16:179–180.
- Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL (1999) Optimized multiplex PCR: Efficiently closing a whole-genome shotgun sequencing project. Genomics 62:500–507.
- Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, et al. (2001) Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293:498–506.
- Thompson JD, Higgins DG, Gibson TJ (1994) ClustalW: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680.
- Tram U, Sullivan W (2002) Role of delayed nuclear envelope breakdown and mitosis in Wolbachia-induced cytoplasmic incompatibility. Science 296:1124–1126.
- van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, et al. (2003) Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci U S A 100:581–586.
- Venter JC, Smith HO, Hood L (1996) A new strategy for genome sequencing. Nature 381:364–366.
- Viale AM, Arakaki AK (1994) The chaperone connection to the origins of the eukaryotic organelles. FEBS Lett 341:146–151.
- Volfovsky N, Haas BJ, Salzberg SL (2001) A clustering method for repeat analysis in DNA sequences. Genome Biol 2:RESEARCH0027.
- Ware J, Moran L, Foster J, Posfai J, Vincze T, et al. (2002) Sequencing and analysis of a 63 kb bacterial artificial chromosome insert from the Wolbachia endosymbiont of the human filarial parasite Brugia malayi. Int J Parasitol 32:159–166.
- Wernegreen J, Moran NA (1999) Evidence for genetic drift in endosymbionts (Buchnera): Analyses of protein-coding genes. Mol Biol Evol 16:83–97.
- Werren JH (1998) Wolbachia and speciation. In: Berlocher SH, editor. Endless forms: Species and speciation. New York: Oxford University Press. 245–260.
- Werren JH, O'Neill SL (1997) The evolution of heritable symbionts. In: O'Neill SL, Hoffmann AA, Werren JH, editors. Influential passengers: Inherited microorganisms and arthropod reproduction. Oxford: Oxford University Press. 1–41.
- Werren JH, Windsor DM (2000) Wolbachia infection frequencies in insects: Evidence of a global equilibrium? Proc R Soc Lond B Biol Sci 267:1277–1285.
- Witkin EM (1994) Mutation frequency decline revisited. Bioessays 16:437–444.
- Zhou W, Rousset F, O'Neill SL (1998) Phylogeny and PCR-based classification of Wolbachia strains using wsp gene sequences. Proc R Soc Lond B Biol Sci 265:509–515.