As Tables 1 and 2 show, genome centers around the world are making rapid
progress sequencing the genomes of various organisms.
| Organism | Genome size (Mbp) | % complete | Genes | Date | Sequencing center; reference |
|---|---|---|---|---|---|
| Haemophilus influenzae |
|
|
|
TIGR,
Science 269(5223):496-512 |
|
| Mycoplasma genitalium |
|
|
|
TIGR,
Science 270(5235):397-404 |
|
| Methanococcus jannaschii |
|
|
|
TIGR;
Science 273(5278):1058-1073 |
|
| Synechocystis sp. PCC6803 |
|
|
|
KDRI (Japan) | |
| Aquifex aeolicus |
1.5
|
|
|
(pub) |
Diversa,
Nature 392:353-358 |
| Archaeoglobus fulgidus |
|
|
|
TIGR
Nature 390:364-370 |
|
| Helicobacter pylori |
|
|
|
TIGR,
GTC
Nature 390:539-547 |
|
| Streptococcus pneumoniae |
|
|
|
TIGR, Science 293:498-506
(20 Jul 2001) [Supp]
GTC(?), Alabama |
|
| Escherichia coli |
|
|
|
U. Wisconsin, Japan
Science 277(5331):1453-1462 |
|
| Methanobacterium thermoautotrophicum |
|
|
|
GTC; Ohio State University; Jnl of Bacteriology 179:7135-55 (1997) | |
| Bacillus subtilis |
|
|
|
Europe, Japan
Nature 390:249-256 |
|
| Mycoplasma pneumoniae |
|
|
|
U. Heidelberg | |
| Neisseria meningitidis
(Serogroup A strain Z2491) (Serogroup B Strain MC58) |
100% |
2,158 |
|
Sanger, Nature404:502-506 TIGR, Oxford; Science 287(5459):1809-1815 (10 Mar 2000) |
|
| Pyrobaculum aerophilum |
2.22
|
|
|
|
CalTech, UCLA |
| Ureaplasma urealyticum |
0.8
|
|
|
U. Alabama, ABI | |
| Pyrococcus horikoshii |
|
|
|
MITI , Univ.
Tokyo;
DNA Research 5:55-76 (30 Apr 1998) |
|
| Clostridium acetobutylicum |
4.03
|
|
|
GTC | |
| Mycobacterium leprae |
|
|
Sanger, GTC | ||
| Neisseria gonorrhoeae |
|
|
U. Oklahoma | ||
| Streptococcus pyogenes |
|
1,752 |
|
U. Oklahoma
Sanger; PNAS 98(8):4658-4663 (2001) |
|
| Mycobacterium tuberculosis |
|
|
|
Sanger,
Nature
393(6685):537-544 TIGR, GTC |
|
| Deinococcus radiodurans |
|
|
23/11/99 |
TIGR;
Science |
|
| Rhodobacter capsulatus |
4.00
|
|
|
Univ. Chicago | |
| Thermotoga maritima |
1.8
|
|
|
TIGR
Nature 399:323-329 (1999) |
|
| Enterococcus faecalis | 100% |
|
TIGR, Genome Therapeutics | ||
| Treponema pallidum |
|
|
|
TIGR,
Univ.
Texas
Science 281(5375):375-388 (1998) |
|
| Borrelia burgdorferi |
|
|
TIGR, BNL
Nature 390:580-586 |
||
| Vibrio cholerae |
4.03
|
|
|
|
TIGR; Nature 406(6795):477-483 (2000) |
| Rickettsia prowazekii |
1.12
|
|
|
|
Univ. of
Uppsala
Nature 396(6707):133-140 (1998) |
| Chlorobium tepidum | TIGR | ||||
| Caulobacter crescentus |
4.02
|
|
3,767 |
|
TIGR; PNAS 98(7):3460-3465 (2001) |
| Legionella pneumophila |
4.1
|
TIGR | |||
| Mycobacterium avium |
4.7
|
|
TIGR | ||
| Porphyromonas gingivalis |
|
|
TIGR; Science Daily article | ||
| Shewanella putrefaciens |
4.5
|
TIGR | |||
| Mycoplasma capricolum |
1.2
|
GMU | |||
| Pyrococcus furiosus |
|
|
|
U. Maryland, U. Utah | |
| Pyrococcus abyssi |
1.8
|
|
|
|
France; @Genoscope |
| Sulfolobus solfataricus |
|
|
IMB
(Canada)
LBMGE (Orsay, France) PNAS 98(14):7835-7840 |
||
| Thiobacillus ferrooxidans |
2.6
|
|
|
|
PNAS 97(7):3509-3514 |
| Salmonella typhimurium |
4.85
|
|
|
|
WashU
(in assembly), SGSCC
Infect. Immun. 66:4305-4312 (1998) FEMS Microbiology Letters 173:411-423 (1999) TIGR Nature 413:852 - 856 (25 Oct 2001) |
| Salmonella typhi |
4.5
|
|
|
|
Sanger
Nature 413:848 - 852 (25 Oct 2001) |
| Salmonella paratyphi |
|
|
WashU (2x shotgun coverage) | ||
| Klebsiella pneumoniae |
|
|
WashU (3x shotgun coverage) | ||
| Aeropyrum pernix |
1.67
|
|
|
|
NITE
DNA Research 6:83-101 (1999) |
| Streptomyces coelicolor |
8+
|
|
|
Sanger | |
| Chlamydia trachomatis
Serovar D MoPn/Strain Nigg
|
1.04 1.07
|
100%
|
924
|
|
UC Berkeley, Stanford Science 282(5389):754-759 (23 Oct 1998) TIGR Nuc. Acids Res. 28:1397-1406 (2000) |
| Chlamydia pneumoniae |
1.23
|
|
|
|
UC Berkeley, Stanford, Incyte, Nature Genetics 21(4):385-389 (1999) |
| Giardia lamblia | |||||
| Clostridium difficile |
4.4
|
|
|
Sanger (in assembly 11/01) | |
| Campylobacter jejuni |
|
|
|
Sanger; Nature 403:665-668 (10 Feb 2000) | |
| Yersinia pestis |
|
|
|
|
Sanger; |
| Bordetella pertussis |
3.88
|
|
|
Sanger (in assembly 08/00) | |
| Bordetella bronchiseptica |
4.9
|
96.4% |
|
Sanger (in assembly 08/00) | |
| Bordetella parapertussis |
|
|
Sanger | ||
| Mycobacterium bovis |
4.4
|
|
|
Sanger (in assembly 08/00) | |
| Pseudomonas aeruginosa |
|
|
Pseudomonas Genome Project
|
||
| Thermotoga maritima |
|
|
|
TIGR;
|
|
| Deinococcus radiodurans | 3.28 | 100% | 3,245 | 11/19/99 | Science 286(5444):1571-7 (19 Nov 1999) |
| Bacillus stearothermophilus |
3.13
|
|
|
U. Oklahoma (gap closure) | |
| Actinobacillus actinomycetemcomitans |
|
|
U. Oklahoma (gap closure) | ||
| Staphylococcus aureus
NCTC 8325 MRSA strain EMRSA-16 MSSA strain |
~99% 0% |
U. Oklahoma, Sanger |
|||
| Streptococcus mutans | U. Oklahoma | ||||
| Xylella fastidiosa |
2.7
|
|
|
|
UNICAMP; Nature 406:151-157 (13 Jul 2000) |
| Thermoplasma acidophilum |
1.6
|
|
|
|
Max-Planck-Institute for Biochemistry;
Nature 407:508-513 (28 Sep 2000) |
| Thermoplasma volcanium |
|
|
|
AIST | |
| Halobacterium salinarium |
4.0
|
Max-Planck-Institute for Biochemistry | |||
| Methanosarcina mazei |
4.10
|
|
|
|
Göttingen Genomics lab |
| Methanogenium frigidum |
?
|
UNSW, Sydney | |||
| Methanococcus maripaludis |
1.66
|
|
|
|
University of Washington |
| Sulfolobus tokodaii |
2.69
|
|
|
|
None? |
| Ferroplasma acidarmanus |
2.0
|
JGI | |||
| Methanosarcina barkeri fusaro |
2.8
|
JGI | |||
| Halobacterium NRC-1 |
2.6
|
|
|
|
PNAS 97(22):12176-81 (24 Oct 2000) |
| Cenarchaeum symbiosum |
~2.5
|
Diversa | |||
| Sinorhizobium meliloti |
6.7
|
|
|
|
Science 293(5530):668-672 |
| Pasteurella multocida |
2.26
|
|
|
|
PNAS 98(6):3460-3465 |
| Nitrosomonas europaea |
|
JGI | |||
| Prochlorococcus marinus |
|
JGI | |||
| Rhodopseudomonas palustris |
|
JGI | |||
| Nostoc punctiforme |
|
JGI | |||
| Marine synechococcus |
|
JGI | |||
| Cytophaga hutchinsonii | JGI | ||||
| Ralstonia metallidurans |
|
~12/??/00 | JGI | ||
| Pyrolobus fumarii |
1.85
|
|
|
|
Diversa & Celera; BBC article |
| Streptomyces coelicolor |
8.66
|
|
|
|
ABCNews article |
| Fusobacterium nucleatum (ATCC 25586) |
2.17
|
|
|
|
J. Bacteriol 184(7):2005-18 (Apr 2002) |
| Brucella melitensis |
3.29
|
|
|
|
PNAS 99(1):443-448 (2002) |
| Brucella suis | 3.31 | 100% |
|
|
PNAS 99(20):13148-13153 (2002) |
| Shewanella oneidensis |
4.97
|
|
|
|
Nat Biotechnol. 20(11):1118-23 (Nov 2002) |
| Bacteroides thetaiotaomicron |
?
|
|
|
|
Science, March 28,
2003
Science Daily article |
| Porphyromonas gingivalis (W83) |
2.34
|
|
|
|
J. Bacteriol 186(2):593 (Jan 2004) |
| Wolbachia pipientis wMel |
|
|
|
|
PLoS Biol. 2(3):E69
(16 Mar 2004)
Science Daily article The Japan Times article (25 Mar 2004) |
| Treponema denticola |
2.84
|
|
|
|
PNAS 101(15):5646-51 (13 Apr 2004) |
| Desulfovibrio vulgaris |
3.57
|
|
|
|
Science Daily article
(14 Apr 2004)
Nature Biotechnology 22(5):554-9 (May 2004). |
| Geobacter sulfurreducens |
|
|
Science 302(5652):1967-9
(12 Dec 2003)
U. Mass. Press Release (11 Dec 2003) |
||
| Silicibacter pomeroyi |
|
|
|
Nature 432(7019):910-3 (16 Dec 2004) | |
| Dehalococcoides ethenogenes | ~1.4 |
|
|
|
Science 307(5706):105-8
(7 Jan 2005);
Science Daily article (1/20/05) |
| Cryptococcus neoformans |
|
|
|
|
Science Epub
(13 Jan 2005);
Science Daily article (1/24/05) |

| Organism | Genome size (Mbp) | % complete | Genes | Date | Sequencing center; reference |
|---|---|---|---|---|---|
| Saccharomyces cerevisiae (yeast) |
|
|
|
Stanford,
WashU,
Sanger,
Europe
Science 274(5287):546-567 (25 Oct 1996) |
|
| C. elegans (nematode) |
~100
|
|
|
|
Sanger, WashU
Science 282:2012-2018(11 Dec 1998); WormPep 18 (11 Oct 1999) |
| Schizosaccharomyces pombe (yeast) |
~14
|
|
|
Sanger | |
| Arabidopsis thaliana |
~125
|
|
|
|
Stanford;
A
thaliana Genome Center; TAIR
Nature 408:791;796-815 (14 Dec 2000) |
| Plasmodium falciparum (malaria) |
23
|
|
|
|
Sanger,
TIGR,
Stanford
Nature 419(6906):498-511 (3 Oct 2002). |
| Leishmania major |
33.6
|
|
|
Sanger | |
| Candida albicans (yeast) |
~16
|
|
|
|
Stanford, Sanger, U. Minnesota, Incyte |
| Trypanosoma brucei |
14.2
|
|
|
TIGR, Sanger, Cambridge | |
| Pneumocystis carinii (pneumonia) |
7.7
|
Sanger | |||
| Aspergillus fumigatus |
~33
|
|
Sanger; Aspergillus web site | ||
| Aspergillus nidulans |
31
|
|
Aspergillus Genomics; U. Oklahoma | ||
| Dictyostelium discoideum |
34
|
|
|
Sanger | |
| Babesia bovis |
9.4
|
Sanger | |||
| Plasmodium chabaudi |
~30
|
Sanger | |||
| Drosophila melanogaster (fruit fly) |
~120
|
|
|
|
Celera Genomics;
Science 287:2185-95(24 Mar 2000) |
| Homo sapiens (man) |
3213
|
|
~42,000 |
|
NCBI,
Celera
Genomics
Sequencing ~ complete June 2000 Papers released 11 Feb 2001 |
| Oryza sativa ssp. japonica (rice) |
430
|
|
|
Monsanto, University of Washington,
IRGSP |
|
| Mus musculus (mouse) |
2600
|
|
|
Whitehead/MIT,
Celera
Genomics;
Science now 427:2 (2001) |
|
|
|
Public sequence, unassembled
Nature 411:121 (2001) |
||||
| Apis mellifera (honey bee) |
|
Baylor; NHGRIHoney Bee Genome Assembled (6x coverage); S.D. Honey Bee Genome Assembled | |||
| Rattus norvegicus (rat) | ~2750 |
|
28,51622 |
|
Baylor, Celera
Genomics, Genome Therapeutics
Science 305:3 (5 Mar 2001) Nature 428(6982):493-521 (1 Apr 2004). |
| Fugu rubripes (pufferfish) |
|
|
|
|
Fugu Project; Fugu
Genome Page
Science 297:1301-10 (23 Aug 2002). |
| Anopheles gambiae (mosquito) |
|
|
|
|
Science 306:3
(6 Mar 2001); R. A. Holt et al.; Science298:129-149
(4 Oct 2002);
The Scientist article (3 Oct 2002). |
| Cryptosporidium parvum |
~9
|
|
|
Science Daily article
(31 Mar 2004);
Science304:441-5 (16 Apr 2004) |
|
| Cryptosporidium hominis |
9.2
|
|
|
Science Daily article
(2 Nov 2004);
Nature431:1107-12 (28 Oct 2004) |
|
| Brachydanio rerio (Zebrafish) | ~1500 |
|
Sanger, Max-Planck-Institut für Entwicklungsbiologie, Harvard; U. Oregon; Zebrafish Genome Initiative | ||
| Phanerochaete chrysosporium
(white rot fungus) |
|
|
|
|
JGI (4 May 2004). Science Daily article; Press release; Nat Biotechnol. Epub 2004 May 02 |
| Gallus gallus (chicken) |
|
|
|
Sanger, Iowa State; Washington University; Press Release; White Paper; Chicken Genome Assembled; Nature article (8 Dec 2004); | |
| Bos taurus (cow) |
|
Cow Databases | |||
| Oreochromis niloticus (Tilapia [fish]) | Tilapia Genome Project | ||||
| Musa acuminata calcutta 4
(Banana) |
~600
|
|
Global Musa Genomics Consortium
INIBAP PROMUSA; BBC article Future Harvest article Science article |
||
| Neurospora crassa
(bread mold) |
~40
|
|
|
|
Nature 422(6934):859-68
(24 Apr 2003);
MIT Press Release (26 Sep 2000). |
| Pan troglodytes (chimpanzee) | ~3100 |
draft |
|
RIKEN's Genomic Sciences Center with centers from Germany, China, Taiwan,
and Korea (Science note)
Chimp Genome Assembled by Sequencing Centers (4x coverage); Science Daily article BCM Chimpanzee Genomic Analysis |
|
| Canis familiaris (dog) | ~2500 |
|
|
New Scientist ns99994682
(16 Feb 2004);
Dog Genome Assembled (7x coverage); Science Daily article. |
|
| Phytophthora ramorum
(sudden oak death) |
65
|
|
|
|
New Scientist ns99995102 (11 Jun 2004). |
| Phytophthora sojae | |||||
| Phanerochaete chrysosporium
(white rot fungus) |
~30
|
||||
| Xenopus tropicalis | ~1700 |
|
White Paper | ||
| Macropus eugenii
(tammar wallaby) |
~3000 |
|
Kangaroo Hops in Line for Genome Sequencing (2x coverage); White Paper | ||
| Monodelphis domestica
(short-tailed opossum) |
|
White Paper | |||
| Coffee |
|
draft(?) |
|
|
Non-public sequence reported by São Paulo Research Assistance Foundation (Fapesp) and the Genetic Resources and Biotechnology Center (Cenargen) of the Brazilian Agricultural Research Company (Embrapa) in Brazil. |
| Trichomonas vaginalis |
|
~100%
(draft) |
? | ? |
Science Daily article |
| Entamoeba histolytica |
|
~100%
(draft) |
~10,000 | Nature 433:865-868
(24 Feb 2005)
Pathema Database; Science Daily article |
|
| Zea mays (corn) |
|
|
|
|
http://www.maizeseq.org/ |
| Fusarium graminearum
(Wheat Scab Fungus) |
|
|
|
|
@ Broad
Institute
@USDA |
| Magnaporthe grisea
(Rice Blast Fungus) |
|
|
|
|
N.C.S.U., Broad Institute
Science Daily article |
| Mycosphaerella graminicola
(Wheat leaf spotting fungus) |
|
|
|
|
Science Daily article
USDA ARS announcement |
Progress on the Human Genome is accelerating. At the current time, 24.7%
of the Human Genome has been sequenced in finished form, with a further
66.2% in draft form (as of Sept. 18, 2000). These data are available
in public databases. An additional 85,000+ gene sequence clusters
from cDNAs, representing ~100 Mbases (3% of the human genome), are present in
public and private databases (Unigene, and comments made by Incyte and HGS
at public conferences). In 1996, only ~20 Mbases/year of large-scale
sequencing capacity was directed towards the HGP. This led the
Wellcome Trust to commit funding to increase the capacity of
the Sanger Center so that it could
sequence 1/6th of the Human Genome (500 Mbases), which would require a minimum
capacity of 150 Mbases/year (assuming a coverage of 2). The entry
of Celera Genomics in late 1998 caused
the NIH to significantly accelerate genome sequencing in 1999, so estimates
for the completion of the genome are now in the 2001-2002 time period.
In September of 1999, Incyte announced
that it predicted the number of human genes would be between
129,769 and 142,634.
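
The capacity figures above can be checked with a little arithmetic. The following is a minimal sketch, not a model of how any center actually planned its work: the function name is hypothetical, and the 3,000 Mbase whole-genome size is simply six times the 500 Mbases quoted for the Sanger Center's one-sixth share.

```python
def years_to_sequence(target_mbases: float, coverage: float,
                      capacity_mbases_per_year: float) -> float:
    """Years needed to produce `coverage`-fold raw sequence for a target region."""
    raw_mbases = target_mbases * coverage  # total raw sequence that must be read
    return raw_mbases / capacity_mbases_per_year


# Sanger Center share: 500 Mbases at a coverage of 2, with 150 Mbases/year capacity.
sanger_years = years_to_sequence(500, 2, 150)

# Whole genome (assumed 6 x 500 = 3000 Mbases) at the ~20 Mbases/year
# that was directed towards the HGP in 1996.
whole_genome_years_1996 = years_to_sequence(3000, 2, 20)

print(f"Sanger share at 150 Mbases/year: {sanger_years:.1f} years")
print(f"Whole genome at 20 Mbases/year: {whole_genome_years_1996:.0f} years")
```

Under these assumptions the Sanger share takes several years even at the stated minimum capacity, which is consistent with the text describing 150 Mbases/year as a floor rather than a comfortable target.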
The sequencing capacity required for the HGP understates overall sequencing needs. There are projects, initial efforts, plans or good scientific reasons for sequencing the genomes of the 50 major human disease-causing organisms (500+ Mbp), Drosophila (~120 Mbp of coding sequence), Mouse (3 Gbp) and Zebrafish (1.7 Gbp).
In agriculture, mapping or sequencing projects are in progress for Cattle (3 Gbp), Pig (2.7 Gbp), Sheep, Chicken (1.2 Gbp), Arabidopsis (100 Mbp), Maize (5 Gbp), Rice (400 Mbp), Barley, Bean, Cotton, Lettuce, Mushroom, Pine, Rye, Sorghum, Sugarcane, Wheat, Apple, Brassicas, Cabbage, Clover, Cucumber, Grass, Lilium, Pea, Peach, Snapdragon, Spruce, Tobacco, Tomato, Alfalfa, Almond, Asparagus, Berry, Carrot, Celery, Chrysanthemum, Citrus, Cocoa, Cuphea, Lentil, Oat, Onion, Papaya, Peanut, Pear, Pepper, Plum, Poplar, Potato, Rose, Soybean, Squash, Sunflower and Turf Grass.
The sequencing requirements for these organisms should exceed 200 billion base pairs!
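
As a rough check on that total, the genome sizes that are actually quoted above can be summed. This is only a sketch: organisms listed without a size are left out, and the coverage values are assumed parameters rather than figures stated for these projects.

```python
# Genome sizes in Mbp, exactly as quoted in the two paragraphs above.
GENOME_SIZES_MBP = {
    "50 major human pathogens": 500,
    "Drosophila": 120,
    "Mouse": 3000,
    "Zebrafish": 1700,
    "Cattle": 3000,
    "Pig": 2700,
    "Chicken": 1200,
    "Arabidopsis": 100,
    "Maize": 5000,
    "Rice": 400,
}


def raw_requirement_gbp(sizes_mbp: dict, coverage: float) -> float:
    """Raw sequence needed (in Gbp) to cover every listed genome `coverage` times."""
    return sum(sizes_mbp.values()) * coverage / 1000.0


total_mbp = sum(GENOME_SIZES_MBP.values())
print(f"Listed genomes total {total_mbp} Mbp")
for coverage in (5, 10):  # assumed redundancy levels
    print(f"{coverage}x coverage: {raw_requirement_gbp(GENOME_SIZES_MBP, coverage):.1f} Gbp raw")
```

The sized genomes alone come to well under 200 Gbp of raw sequence at these assumed coverages, so the larger figure in the text presumably also counts the many crops listed without a size, such as wheat, pine and barley.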
The Stanford Genome Center, SW Medical Center, and the Whitehead Institute have devoted significant resources to developing highly automated, high-throughput methods for genome mapping and sequencing. The Stanford Genome Center intends to make available, in October of 1997, the parts lists, plans and diagrams for the automated machines that have been developed for genome sequencing (www-sghc.stanford.edu). Estimates from Stanford place the cost of finished DNA sequence at around $0.25 per base by the year 2000, implying that genome centers sequencing 10 Mbp/year have budgets in excess of $2.5 million per year.
NIH grants in 1997 were targeting a cost of $0.05 per base for genome sequencing. These targets may not have taken into account the throughput increases made possible by newer capillary sequencing machines.
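
The budget implication is a direct multiplication of cost per base by annual output. The sketch below uses only the per-base costs and the 10 Mbp/year throughput quoted above; the function name is hypothetical.

```python
def annual_budget_usd(cost_per_base_usd: float, mbp_per_year: float) -> float:
    """Annual sequencing cost for a center finishing `mbp_per_year` megabases."""
    return cost_per_base_usd * mbp_per_year * 1_000_000


stanford_estimate = annual_budget_usd(0.25, 10)  # Stanford's per-base estimate
nih_target = annual_budget_usd(0.05, 10)         # 1997 NIH grant target

print(f"At $0.25/base: ${stanford_estimate:,.0f} per year")
print(f"At $0.05/base: ${nih_target:,.0f} per year")
```

At the same throughput, meeting the NIH target would cut the sequencing budget to one fifth of the Stanford estimate.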
To understand these sequencing requirements, it is necessary to recognize that the bacterial and yeast genome sequencing done thus far has emphasized shotgun sequencing methods. This approach requires that the genome be randomly fragmented, sequenced to high redundancy, and assembled by computer matching of overlapping fragments. Genomes sequenced in this way require raw sequence data totaling 5 to 10 times the actual genome size. The lowest redundancy level reached thus far is around 2.5x genome size, achieved by the University of Heidelberg on M. pneumoniae, primarily using primer walking sequencing methods. Current estimates from Stanford would place human genome sequencing requirements at 3-4x the genome size.
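
The link between redundancy and raw sequence can be sketched as follows. The raw requirement is simply coverage times genome size. The coverage-to-completeness formula is the standard Lander-Waterman idealization for purely random reads, which the text above does not cite; it is introduced here as an assumption to illustrate why random shotgun sequencing needs high redundancy, and it does not apply to directed methods such as primer walking. The 3 Gbp human genome size is inferred from the 500 Mbases quoted for one sixth of the genome.

```python
import math


def raw_bases_needed(genome_size_bp: int, coverage: float) -> float:
    """Raw sequence required to read a genome `coverage` times over."""
    return genome_size_bp * coverage


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome hit by at least one read, assuming
    uniformly random reads (Lander-Waterman idealization)."""
    return 1.0 - math.exp(-coverage)


HUMAN_GENOME_BP = 3_000_000_000  # assumed: 6 x 500 Mbases

for coverage in (2.5, 3, 4, 5, 10):
    raw_gbp = raw_bases_needed(HUMAN_GENOME_BP, coverage) / 1e9
    covered = expected_fraction_covered(coverage)
    print(f"{coverage:>4}x: {raw_gbp:5.1f} Gbp raw, "
          f"{covered:.4%} expected covered by random reads")
```

Under this idealized model, each additional unit of coverage shrinks the expected uncovered fraction by a factor of e, which is why purely random strategies sit at the 5-10x end of the range while directed strategies can finish with less raw sequence.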
Table 3 lists the existing large scale genome
centers and their current or planned participation in large scale sequencing
efforts and the Human Genome Project.
| Center | Sequencing Capacity | Projects |
|---|---|---|
| Sanger Center, U.K. | 37 ABI 373's, 35 ABI 377's = ~4 Mbp/day (raw) (1997); 1/6th of Human Genome (planned) | Yeast, C. elegans, Human (Chr. 22, X, 6, 1, 20, 13) |
| Washington University, St. Louis, MO | 8/97 capacity ~3.5 Mbp/month (finished); 1 Chromosome+ | C. elegans, ESTs, Human (Chr. 2, 7 + 3, 5, 8, 12, 16, 22, 14, X partial) |
| DOE Genome Center (LLNL, LBL, LANL), Foster City, CA | 3 Chromosomes; center currently under construction | Human (Chr. 21, 19, 16, 5), Drosophila |
| TIGR, MD | 21 ABI 373's, 6 ABI 377's (11/1995); 6 ABI 373's, 6 ABI 377's devoted to Chr. 16p in 1996 | ESTs, Microbial genomes, Human (Chr. 16p) |
| Stanford Human Genome Center, Stanford University, CA | 1 Chromosome; 5-10 Mbp/year in 1997, 17.5 Mbp 1997-1999 | |
| SW Medical Center, Univ. Texas, TX | 1 Chromosome; 10 Mbp anticipated 1/8/97 to 1/8/98 | Human (Chr. 11, 15) |
| Whitehead Institute, MIT, MA | 158 capillary based sequencers (9/26/2000), capacity 135,000 sequencing reactions (~70+ Mbp)/day | Mouse |
| University of Oklahoma, OK | 1 Chromosome | Microbial genomes, Human (Chr. 22, 7) |
| Baylor College, TX | 2 Chromosomes | Human (Chr. X, 8) |
| University of Washington, WA | 1 Chromosome | Human (Chr. 7) |
| University of Texas, San Antonio, TX | 2 Chromosomes | Human (Chr. 3 and 8) |
| Columbia University, NY | 2 Chromosomes | Human (Chr. 13 and 1) |
| Albert Einstein/Yale, CT | 1 Chromosome | Human (Chr. 12) |
| University of Pennsylvania, PA | 1 Chromosome | Human (Chr. 22); NCHGR Chr22 |
| University College London | 2 Chromosomes | Human (Chr. 9 and Y) |
| JSC Corp. | Portions of 3 Chromosomes | Human (Chr. 3, 6, 21) |

Several companies have large amounts of sequencing capacity but are not
actively involved in the Human Genome Project, as shown in Table 5.

| Company | Sequencing Capacity | Interest Areas |
|---|---|---|
| Celera Genomics | 1999: 200+ ABI 3700's | Drosophila Genome, Human Genome, Rice Genome |
| Human Genome Sciences | Similar to Incyte | ESTs |
| Incyte | 1997: 68 ABI 377's. | ESTs |
| Genome Therapeutics | Much less than Incyte | Human (Chr. 10) |
| Millennium | Much less than Incyte | |
| Sequana | Much less than Incyte | asthma, obesity, osteoporosis, diabetes, schizophrenia, manic depression |

At the current time these companies are focused primarily on human diseases,
with some emphasis on agricultural genomes. With the exception of Genome
Therapeutics, they are not expected to play a major role in genome
sequencing at this time.