Protein Based Assembly of Nanoscale Parts

Version 0.99
October 11, 2004
Robert J. Bradbury
© July 2001-2004, All Rights Reserved

Abstract

Multiple disciplines, including retrosynthetic chemical analysis, microfluidics, whole genome engineering, and protein and enzyme design, are reaching the stage of development where the design and production of assembly lines for nanoscale parts seems conceivable.  This paper discusses the development of these disciplines and how their convergence may be applied to accelerate the production of nanoscale parts.  Utilizing existing and near-term technologies, the cost of producing a single assembly line for manufacturing a currently designed nanoscale machine part, containing ~2600 atoms, is estimated at ~$6 million.  The developments necessary over the next decade to produce the nanopart assembly lines for the parts in a programmable nanoassembly system, containing 6-8 million atoms, for an estimated $200,000 are examined.  Continued developments suggest that within 20 years the design of assembly lines for the parts required for manufacturing nanorobots containing billions of atoms will be feasible for less than $1 million.

Introduction

One of the largest barriers facing the full vision of molecular nanotechnology is the manufacture of a programmable nano-assembler (PNA) as described by Eric Drexler [Dre92] or Ralph Merkle [Mer97, Mer99a].  To avoid possible misunderstandings, it will be clearly stated that a PNA is a nanoscale machine designed to reliably position and bond atoms or small molecules.  Its primary use is in the assembly of stiff nanoscale parts that cannot easily be manufactured by other means.  It is in no way, shape or form a self-replicating system.  Its most productive use would be in the highly parallel assembly-lines as envisioned by Drexler and Hall [Dre92, Chap 14; Hal99].

A critical feature a PNA must possess to precisely assemble atoms or small molecules is stiffness -- "In designing machinery for molecular manufacturing, stiffness is a central concern." [Dre92, pg 448].  Unfortunately, the paths from wet (generally not-stiff) solution-phase organic chemistry and/or biotechnology to the dry (stiff) mechanosynthetic chemistry envisioned by Drexler & Merkle are not particularly clear to most observers.  Drexler [Dre92, pg 471] provided an outline of a 4-stage development path for increasingly stiff assemblers, but it appears to the author that there has been little progress along that path to date.  As pointed out by Merkle [Mer99b], one of the stumbling blocks is that the complex molecules that we can currently manufacture (DNA, RNA and proteins) are polymers of subunits joined to each other at only two points.  To get the structural stiffness in the parts required to build a PNA we will most likely want to be able to position and strongly bond atoms, or larger molecular groups, in 3 dimensions (footnote 1).  Do we know of any ways to do this?  Yes.  As pointed out by Drexler [Dre81], protein engineering may provide the path to the "second generation of molecular machinery whose components would not be coiled hydrated polypeptide chains but compact structures having three-dimensional covalent bonding".  Subsequently he suggested [Dre94] that the simultaneous use of multiple stabilization techniques in "designer" proteins, including chemical cross-linking, could produce molecular parts of intermediate stiffness that would move us from Stage 1 to Stage 2 assemblers.  Methods for stiffening protein-derived nanoparts were also discussed in [Dre92, Sect. 15.3].  A review of the progress made during the 1990s [Fre99, Chap. 2] shows that, for the most part, the stiff molecules which we can synthesize, such as pagodane or calixarene, are still at a very primitive level (a few dozen atoms).  The largest stiff molecules that we can assemble are buckytubes, but we are using bulk manufacturing processes, not directed molecular assembly, to do this.  Scientists are actively engineering proteins, particularly enzymes, but this is typically done for industrial biocatalysis applications [e.g. Cha98], not for the assembly of the parts needed for a PNA.  Zyvex is pursuing the development of PNA nanoparts via the dry mechanosynthetic path based on atomic force microscopes, but this is generally acknowledged to be a very difficult approach.  In 1998, Freitas described a novel path to first generation molecular assemblers [Fre98] and extended this by proposing that mechanoenzymes might provide the means of implementing this path [Fre00a].  This paper significantly extends these ideas by showing that small molecule and enzyme design and the bioengineering of microorganisms have advanced sufficiently that we may envision combining small "designer" molecules with larger "designer" enzymes to provide a clear path for the manufacture of PNA parts and their assembly.

One of the difficulties of developing realistic strategies to build a PNA is the lack of a widespread realistic comprehension of the size scale of the problem.  As Figure 1 documents, a PNA is significantly larger than the biological assembler, the ribosome.  PNAs have been estimated to require 3-4,000,000 atoms.  It has been estimated that a functional PNA with support structures and interfaces to the external systems providing it with power and instructions could require ~8,000,000 atoms [Fre99, pg 65] (footnote 2).  An example of a part that could be used as a small subcomponent in a PNA is a fine-motion controller with ~2600 atoms, weighing ~30,000 Daltons, approximately the same mass as an average cellular protein.  Ribosomes, the biological assemblers of proteins, have masses ranging from ~2,700,000 Daltons in prokaryotes to ~4,200,000 Daltons in eukaryotes [EMB94].  This would equate to ~200,000 to ~300,000 atoms.  Because the biological assembler was developed with virtually no intelligence or sophisticated computer-aided design tools, we may suppose that the design of a PNA is not beyond the grasp of the combined capabilities of nanoengineers and software assistants.  The fact that single individuals can design nanoparts, such as the fine-motion controller, with 1/1000th - 1/1600th of the atoms of a PNA, in a few person-months [Dre01], suggests that ~150-260 person-years would be required to design the parts contained in a PNA.  But because each PNA contains many identical or highly similar parts (e.g. toroidal worm drives, intersegment bearings, drive gears and shafts) that only need to be designed once, its design time requirements may be reduced by 5-10×.  Thus, teams of 20-50 design engineers could design one assembly of parts as complex as a PNA each year (using only the software tools and computing power available in the early 1990s(!)).  One may expect that computer-aided-nanopart-design, or evolved and selected designs [see Nano@Home], would reduce the required amount of human thought time -- the expensive resource -- still further.  Arguments along these lines lead to the conclusion that the "design" of PNAs containing millions of atoms, and eventually nanobots containing billions of atoms, is potentially within our grasp.
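The design-effort estimate above can be checked with a few lines of arithmetic.  The sketch below is illustrative only: it assumes that "a few person-months" means 2 person-months per part, a figure the text does not state, and the function names are hypothetical.  Under that assumption it gives roughly 167-267 person-years before reuse and roughly 17-53 person-years after the 5-10× reduction, in line with the ~150-260 person-years and the teams of 20-50 engineers quoted above.

```python
# Back-of-the-envelope design effort for a PNA, following the paragraph's reasoning.
# ASSUMPTION: "a few person-months" is read as 2 person-months per part;
# the text does not give an exact figure.

PERSON_MONTHS_PER_PART = 2.0        # assumed reading of "a few person-months"
PART_TO_PNA_RATIOS = (1000, 1600)   # a PNA has 1000-1600x the atoms of the part
REUSE_FACTORS = (5, 10)             # savings from identical or highly similar parts


def design_person_years(ratio, months_per_part=PERSON_MONTHS_PER_PART):
    """Person-years needed to design `ratio` part-equivalents with no reuse."""
    return ratio * months_per_part / 12.0


def effort_range():
    """Return (naive_low, naive_high, reduced_low, reduced_high) in person-years."""
    naive_low = design_person_years(min(PART_TO_PNA_RATIOS))
    naive_high = design_person_years(max(PART_TO_PNA_RATIOS))
    # Best case: smallest effort with the largest reuse factor, and vice versa.
    reduced_low = naive_low / max(REUSE_FACTORS)
    reduced_high = naive_high / min(REUSE_FACTORS)
    return naive_low, naive_high, reduced_low, reduced_high


if __name__ == "__main__":
    low, high, r_low, r_high = effort_range()
    print(f"Without reuse: {low:.0f}-{high:.0f} person-years")
    print(f"With reuse:    {r_low:.0f}-{r_high:.0f} person-years")
```

Changing PERSON_MONTHS_PER_PART shows how sensitive the conclusion is to the per-part design time, which is the least certain input.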
 
 

Figure 1. Nanoscale Part Size Comparison
Image of Drexler's Nanoassembler arm shown at relative size with (a) the Ribosome [the biological assembler] and (b) the Fine-motion controller [a small subset of the components that would be required for full nanoassembly capabilities].

Nanoassembler arm image is derived from that found in Nanosystems © Copyright 1992 John Wiley & Sons, Inc. New York, 1992.  Fine-motion controller image is © Copyright Institute for Molecular Manufacturing.  Ribosome image may be © PDB, December, 2000.

So if you can design nanoparts and assemblies of nanoparts -- can you construct them?  Maybe.  The first problem to be solved is, "How does one assemble complex parts such as the fine-motion controller?".  Solution-based self-assembly, used by multi-component cellular complexes such as the ribosome or the proteasome, seems difficult because you not only have to design the entire part and its subcomponents, but you also have to design it to put itself together.  That additional complexity makes the problem much harder.  And it doesn't solve the problem that we really need a large number of covalent bonds in the assembled part.  So we have to be more clever.  To bootstrap the assembly of nanoparts we may have to do precisely what nature does -- build tools that assist in the creation of complex covalently bonded structures.  There are two possible paths for this that we may consider.

One path would require that chemists prepare molecules that would react and bond as desired if their positions and orientations are precisely controlled (e.g. controlled position based assembly).  Another path would require that enzymologists design enzymes that can grip molecules, prepare them for the desired reactions and approximately position them such that they have reasonably high probabilities of reacting in the desired fashion (enzymatic assisted assembly).  So there may be a tradeoff in the positional accuracy requirements vs. the functional complexity requirements in the design of  tools to assemble nanoparts.

Now it becomes appropriate to ask how wide is the divide between where we are and where we would like to be.

State of the Art in Nanoscale Parts

Some of the most complex nanoscale parts currently manufactured (by humans) are those used in molecular electronic devices such as the ring-on-a-stick structure of rotaxane [Bro01] and the chain-mail structure of catenane [Col00].  Another molecular switch, pioneered by CALMEC, is chiropticene, which requires the movement of electronic charges and a shift in bonds instead of movements by molecules to change the state of the memory element.  A cursory examination of these nanoscale parts reveals that our current capabilities using organic chemistry synthesis methods are 1-2 orders of magnitude less capable than we need to assemble nanoparts like the fine-motion controller (i.e. they contain tens to hundreds of atoms rather than thousands).  However, as discussed in Appendix A, organic chemists can synthesize non-polymeric molecules weighing up to several thousand Daltons when sufficiently motivated.
 
Figure 2. Molecular Electronics Parts
[2] rotaxane
catenane
chiropticene

One point of interest regarding rotaxane and catenane is that they involve threading one molecule through another molecule!  This form of complex chemical synthesis is very difficult to accomplish using solution chemistry where thermally induced motion causes the molecules to assume random orientations with regard to one another.  How can we solve this problem?  Chemists utilize tricks like adding hydrophobic rings to the molecules which will be attracted to each other in a polar solution like water.  Another possible solution is to utilize molecular "grippers" engineered into proteins to "catch" the precursors of the final molecules in the solution and then bring them into correct orientation for assembly into the final part.  Merkle [Mer99] suggested that the positional control of the building blocks would allow the extension of the normal reactions found in biological systems to include free-radical chemistry. It is becoming clearer that natural enzymes make use of these processes [Hol00].  Thus one can imagine a multi-functional protein being engineered with the following properties:

  1. Unfolded, it grabs onto molecular building blocks, carefully keeping them separate from each other to avoid nonspecific reactions;
  2. using specific enzyme catalytic sites near the bound building blocks, it "activates" the molecules (perhaps producing one or more free radicals [Gui99, Mar00, Fre00, Ehr01, Fir01]);
  3. induced folding brings the building blocks into relatively precise alignment allowing the desired chemical reaction(s) to occur;
  4. the protein is induced to unfold, releasing the final product.
Thus one has a way of threading one molecule through another molecule producing interlinked structures such as those outlined above without the need of adding the hydrophobic ring structures.  So we may envision designing proteins that would assemble very small molecular nanoscale parts (with a molecular weight < 1000 Daltons).  It is useful to note that scientists have demonstrated that DNA may be engineered to accomplish this type of mechanically directed assembly as well [Mao99, Yan02, Gar02].

An alternative, suggested by Rafal Smigrodzki, would be to engineer gripper molecules, such as proteins, which are designed to self-assemble in a precise way.  This behavior is similar to the assembly of multimeric enzyme complexes or the assembly of viral capsids.  The nanopart subcomponents would be positioned at precise locations on these molecules, most likely through designing small pockets or grabbing moieties that precisely hold and orient the subcomponent.  The self-assembly properties of the gripper molecules would precisely position the subcomponents with respect to each other, allowing them to bond into a larger nanopart subcomponent.  Heating the multi-gripper assembled-subcomponent complex would result in its disassembly, freeing the assembled-subcomponent nanopart.  Initial steps towards designed self-assembling proteins have already been demonstrated [Pad01].

Protein-Directed Part Assembly

Proteins are long chains of amino acids that fold up to form complex three dimensional structures.  Enzymes are a subset of proteins that are capable of grabbing small molecular "parts" and performing chemical reactions between them.  Generally, in natural enzymes, as the folding process occurs, atoms located on different amino acids are brought into positions with respect to one another to allow the protein to accomplish the "grabbing" and "reacting" actions.  Scientists are rapidly determining the 3-D structures of many proteins to understand completely how they accomplish this "magic" [PDBCG01].

What is the range of capabilities that enzymes possess?  The simplest capability may be that of adding a single atom or a small group of atoms (e.g. hydrogenases (EC 1.12.-.-), oxygenases (EC 1.13.-.-), methyltransferases (EC 2.1.1.-), phosphotransferases (EC 2.7.-.-) and sulfurtransferases (EC 2.8.1.-)), while more complex enzymes connect larger atomic assemblies (e.g. nucleotidyltransferases (EC 2.7.7.-) and ligases (EC 6.-.-.-)).  Enzymes are known that mediate the transfer of atoms ranging from hydrogen (MW: 1 Dalton (D); e.g. hydrogen dehydrogenase from Rhodococcus opacus) to tungsten (MW: 183D; e.g. aldehyde ferredoxin oxidoreductase from Pyrococcus furiosus) (footnote 3).  The most frequently manipulated atoms range from carbon (MW: 12D) to zinc (MW: 65D).  More complex molecular groups are also manipulated by enzymes, including amino acids (MW: 75-204D) and DNA and RNA bases (MW: 111-151D).  Most enzymes in cells are involved in manipulating small atoms and molecules, MW < ~250D.  There are enzymes, however, that are involved in manufacturing complex covalently bound molecules, such as vitamins, enzyme cofactors, antibiotics, toxins, etc.  These are discussed further in Appendix A.  These molecules may have masses up to several thousand Daltons.  Molecules larger than this, such as the proteins themselves, messenger RNA and DNA, are manipulated by complex machinery utilizing multiple proteins and other cofactors such as transfer RNA.  This machinery includes the spliceosome, the ribosome, the proteasome and the DNA replication complex.

Natural proteins are built from only 20 amino acids.  But these are a small subset of the amino acids that are known.  Various chemical suppliers sell dozens of synthetic amino acids and variants [Bio01, Chi01, Pre99, Syn01].  The incorporation of artificial amino acids into proteins is reviewed in [Dem98], and [And99] discusses the construction of protein helices using mixtures of natural and artificial amino acids.  From our knowledge of the 3-D structure required to "grab" specific molecules or catalyze reactions, we can envision designing and manufacturing synthetic amino acids with "gripping" or "reacting" capabilities similar to those found in natural enzymes.  These "grips" may be used to hold onto single atoms, small molecules like nitrogen (N2) or methane (CH4), or even more complex molecules like amino acids, DNA bases, etc.  The flexible amino acid backbone of proteins may then be envisioned as a robotic arm, except that it has more degrees of freedom than commonly manufactured robotic arms and its size is measured in nanometers.  By designing synthetic enzymes consisting of synthetic amino acids, we can envision grabbing molecular parts in a solution and then, as the enzyme folds, bringing them into proper alignment and causing them to react.  This is Protein-Directed Part Assembly.  Proteins often use metal ions such as Zn++, Ca++ or Mg++ to bring together different parts of the protein.  Proteins can be designed such that they fold differently when a succession of metal ions is made available.  This could be considered Ion-Directed Part Assembly.  If the proteins were in microfluidic devices [Des97, Bok98], the ions could be pumped into the chambers where the proteins were located.  If the proteins were within microorganisms, various external signaling molecules in the solution could activate pumps built into the cell membrane that pump specific ions in or out of the cell.

Having the flexible architecture of proteins allows you to take advantage of assembly strategies that are not used by biological systems.  For example, one can design an artificial enzyme such that when heated, it unfolds making the grippers available to the solution.  The grippers then grab simple subcomponents that have been synthesized via normal chemical synthesis methods.  Cooling the enzyme will allow it to refold in a precise manner bringing the parts into correct orientation with each other such that natural reactions occur between the pre-prepared parts or catalyzed reactions enabled by precise positioning of catalytic "reactors".  Heating the enzyme a second time will allow it to disgorge the newly synthesized nanopart.  Proteins such as this that manage chemical reactions by positioning parts and not by directly influencing the reaction may be termed "robozymes".  Some people have objected to biologically based nanotechnology because they consider the flexibility of enzymes to be a handicap.  Here we can see how it is possible to turn that feature into an advantage.
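The heat, grab, cool, heat cycle described above can be written down as a small state machine.  The following is only a toy sketch of the sequence of steps, not a chemical model: the class name Robozyme, its methods and the "ring"/"axle" part names are all hypothetical, and "reaction" is represented simply by joining the names of the held parts.

```python
# Toy state machine for the thermal cycle of a "robozyme".
# All names here are illustrative assumptions; no chemistry is simulated.

class Robozyme:
    def __init__(self, n_grippers):
        self.n_grippers = n_grippers
        self.folded = True
        self.cargo = []  # parts held by the grippers, or the finished product

    def heat(self):
        """Unfold the enzyme, exposing the grippers and releasing any cargo."""
        self.folded = False
        released, self.cargo = self.cargo, []
        return released

    def bind(self, parts):
        """Grab pre-synthesized subcomponents from solution (unfolded only)."""
        if self.folded:
            raise RuntimeError("grippers are not exposed while folded")
        if len(parts) > self.n_grippers:
            raise ValueError("more parts than grippers")
        self.cargo = list(parts)

    def cool(self):
        """Refold, aligning the held parts so that they bond into one product."""
        self.folded = True
        if len(self.cargo) > 1:
            self.cargo = ["+".join(self.cargo)]


def assemble(enzyme, parts):
    """Run one full cycle: heat, bind, cool, heat; return the released product."""
    enzyme.heat()          # first heating exposes the grippers
    enzyme.bind(parts)
    enzyme.cool()          # refolding positions the parts and they react
    return enzyme.heat()   # second heating disgorges the new nanopart


if __name__ == "__main__":
    print(assemble(Robozyme(n_grippers=2), ["ring", "axle"]))
```

The sketch makes one design point explicit: the enzyme contributes positioning only, so the same cycle can be reused for any set of parts the grippers are able to hold.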

We are also not limited to using enzymes to form the covalent bonds required in nanoparts.  The development of RNA-based ribozymes and even DNA-zymes is ongoing [Hag96, War00, Bre97] and there may be reactions for which they are better suited than proteins.

Extending the Genetic Code

One might argue that there will be nanoparts for which no grippers exist, or that producing grippers for specific nanoparts may be too expensive.  This could constrain Protein-Directed Part Assembly to a small fraction of the operations space to which it might be applied.  In this section we will argue that academic and commercial trends in biotechnology are providing a robust foundation for the rearchitecting of microorganisms to produce first the enzymes and later the assembly lines required to assemble complex nanoparts, even if such assembly requires heretofore nonexistent capabilities.

Over the next decade we can expect the development of "whole genome engineering" [Bra00].  How may this be used?  One application that springs to mind is "designer" bacteria to improve human health.  Currently humans obtain only a fraction of their vitamin requirements, primarily B-12 and K, from bacteria in their intestines.  Significant benefits could be derived from bacteria engineered to produce greater quantities and an expanded variety of vitamins.  These include reductions in birth defects [Moy01] and cancer [Ame01] as well as in the long-term expenses associated with the multi-vitamin supplements that are currently taken to avoid such problems.  If one were to develop such bacteria, it would be highly attractive to include a means to promote their retention in the body when antibiotics are used to treat infections of pathogenic bacteria.  So slight modifications to the structures of antibiotic targets (the ribosome, the cell wall manufacturing system, etc.) could be engineered into these "designer" bacteria to make them tolerant of current antibiotic therapies.  But it is necessary to ensure that the instructions in the genes of these organisms that allow them to resist the antibiotics not be allowed to escape into the pathogenic organisms.  Further, it is desirable to ensure that any antibiotic resistance that might develop in the pathogenic microorganisms, something that is a growing problem [Ell99, Ric01, Ryb01, Clo01], be untransferable to the engineered microorganisms.  One of the most straightforward ways of creating such a "firewall" between the natural bacterial world and the "designer" bacteria is to change the genetic code of the "designer" bacteria.  An organism using such a genetic code would be able to manufacture the same proteins found in natural bacteria, but its genes would contain no useful coding information if they jumped the firewall into another organism.  Similarly, code jumping into the engineered microorganism would be non-functional.  The net result of this approach would be to slowly back us away from the "green goo" situation that currently exists in Nature.

As Table 1 (A) shows, the genetic code is highly redundant.  Only two of the amino acids, Methionine and Tryptophan, are each encoded by a single messenger RNA codon triplet.  Once whole genome engineering is feasible, there is no reason that the redundancy must be retained.  Table 1 (B) shows one example of an alternate genetic code.  This code is designed to swap acidic and basic amino acids and to exchange large amino acids with small amino acids.  A side effect of this approach is to change many hydrophilic side-chains that one expects to find on the exterior surfaces of proteins into hydrophobic residues that one expects to find on the interior of proteins.  As a result, a protein assembled from naturally coded DNA will either not fold at all or, if it folds, will be non-functional.  Alternate genetic codes could certainly be designed by computer simulations of the substitutions that would cause the greatest disruption to known protein structures.  The alternate code makes available 43 empty slots that can be utilized to extend the genetic code with codes for synthetic amino acids.  Some amino acids such as Cysteine and Proline contribute important properties to protein structure that would make their removal from the genetic code difficult.  Others such as Leucine and Isoleucine have similar properties that might allow one to substitute for another in the genetic code.  If this proves feasible, then several additional slots could be made available.
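The redundancy figures quoted above can be reproduced directly from the standard genetic code.  The sketch below builds the standard codon table from the conventional 64-letter summary string (bases ordered U, C, A, G; one-letter amino acid codes; "*" for STOP), then counts how many codons map to each amino acid and how many slots become free if every amino acid and the STOP signal keep exactly one codon.  The helper names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codon_counts(code):
    """Number of codons assigned to each amino acid (and to STOP, '*')."""
    return Counter(code.values())


def singly_coded(code):
    """Amino acids that are encoded by exactly one codon."""
    return sorted(aa for aa, n in codon_counts(code).items() if n == 1)


def free_slots(code):
    """Codons freed if each distinct meaning (amino acid or STOP) keeps one codon."""
    return len(code) - len(set(code.values()))


if __name__ == "__main__":
    print("Singly coded:", singly_coded(STANDARD_CODE))
    print("Free slots after compression:", free_slots(STANDARD_CODE))
```

With 20 amino acids plus STOP occupying one codon each, 64 - 21 = 43 codons remain unassigned, which is the number of empty slots cited in the text.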
 

Table 1: Compression of the Genetic Code

A: Old Genetic Code

1st          2nd nucleotide              3rd
nucleotide   A      C      G      U      nucleotide
A            Lys    Thr    Arg    Ile    A
             Asn    Thr    Ser    Ile    C
             Lys    Thr    Arg    Met    G
             Asn    Thr    Ser    Ile    U
C            Gln    Pro    Arg    Leu    A
             His    Pro    Arg    Leu    C
             Gln    Pro    Arg    Leu    G
             His    Pro    Arg    Leu    U
G            Glu    Ala    Gly    Val    A
             Asp    Ala    Gly    Val    C
             Glu    Ala    Gly    Val    G
             Asp    Ala    Gly    Val    U
U            STOP   Ser    STOP   Leu    A
             Tyr    Ser    Cys    Phe    C
             STOP   Ser    Trp    Leu    G
             Tyr    Ser    Cys    Phe    U

B: New Genetic Code

1st          2nd nucleotide              3rd
nucleotide   A      C      G      U      nucleotide
A            Val    Gln    Asp    Cys    A
             Leu    ---    ---    ---    C
             ---    ---    ---    Pro    G
             ---    ---    ---    ---    U
C            Thr    Met    ---    Asn    A
             Glu    ---    ---    ---    C
             ---    ---    ---    ---    G
             ---    ---    ---    ---    U
G            His    Tyr    Trp    Lys    A
             Arg    ---    ---    ---    C
             ---    ---    ---    ---    G
             ---    ---    ---    ---    U
U            ---    Phe    ---    ---    A
             Ala    ---    Ile    Ser    C
             ---    ---    Gly    ---    G
             ---    ---    ---    STOP   U

A = Adenine, C = Cytosine, G = Guanine, U = Uracil; --- = unassigned codon (empty slot).

Why would one want to extend the genetic code for synthetic amino acids?  It may be difficult to design grippers for some nanoscale parts using the natural code.  One could imagine multiple genetic codes designed with artificial amino acids precisely tailored towards gripping or manipulating parts with specific features or atoms of specific sizes.  One might have one genetic code optimized for manipulating molecules containing silicon and germanium (for the semiconductor industry) and a completely different code optimized for manipulating molecules containing gallium and arsenic (for the optoelectronics industry).  Furthermore, if one desires to make proteins with higher structural strength that may allow us to approach the 'holy grail' of mechanosynthetic nanoassembly [Dre92, Chap. 8], then the addition of other synthetic amino acids to the code grants humans (and computers) designing stronger-than-natural proteins [Dre94] many greater opportunities for assembly path design and optimization.

It is useful to note that food crops engineered on the basis of an alternate genetic code would eliminate a primary complaint of the neo-luddite "naturalists" who are afraid that cross-pollination by genetically modified crops will eventually fatally pollute the "natural" crops provided by nature (and so carefully bred and selected by humans).  Crops based on a different genetic code cannot swap genes with their "natural" progenitors.  They can however produce the same proteins, cellular structures and additional molecules (flavors, etc.) that are found in our food supply.

What is required to extend the genetic code?

To expand the genetic code it is necessary to develop a transfer RNA (tRNA) that matches a new codon and an aminoacyl-tRNA synthetase (AtS) enzyme that can bind the novel amino acid to the tRNA.  The ribosome can then insert the novel amino acid into the protein.  The synthetases, as a class, are among the most studied enzymes in biology [Beu99].  Crystal structures for 14 of the 20 synthetases are known [Bra01].  Groups have altered the specificity of a synthetase [Ago98] and have a detailed understanding of how the synthetase recognizes both the amino acid [Ser01] and the anti-codon on the tRNA [Sek01].  They are determining which synthetic amino acids are acceptable to ribosomes [Kii01].  Several groups are even developing organisms with altered genetic codes [Liu99, Sen99, Wan01, Dör01, Wan03, Meh03, And04].  Current methods utilize genetics and directed evolution to create the required tRNAs and enzymes, but because so much is known about the structure of the tRNAs, amino acids and synthetases, the utilization of computer-aided-design methods for AtS enzymes for novel amino acids seems feasible.

In addition, one must engineer the organism so it can import the novel amino acid from the environment, or  manufacture the amino acid itself.  As the chemical synthesis of amino acids is a $500+ million/yr industry [BCC98], there will certainly be incentives and interest on the part of manufacturers to fund the development of bacteria that contain the enzyme pathways to directly synthesize artificial amino acids and thereby lower manufacturing costs.  Manufacturers would expose shallow pools of such bacteria to sunlight and simply harvest the manufactured products.

Can we go beyond the limit of a code that maps triplets of 4 bases into 63 amino acids (4^3 = 64 codons, one of which must signal STOP)?  As Freitas points out [Fre99, pg 44, based on Fay92], we could expand the code by adding additional nucleotide pairs to the normal pairs (A=T(U) and C≡G) used in DNA (RNA).  That would allow the expansion of the genetic code to 6^3 = 216 codons for a 6-base alphabet and 8^3 = 512 codons for an 8-base alphabet.  However, the reengineering of DNA polymerase, RNA polymerase, the DNA repair enzymes, the ribosome and the new nucleotide synthesis and degradation pathways required to accomplish this would not be insignificant (footnote 13).  A less difficult approach would be to simply increase the number of bases which must be paired between the messenger and the transfer RNAs by the ribosome.  Increasing the 3-base codon to a 4-base codon would allow 4^4 = 256 mappings (allowing 255 amino acids), while increasing it to a 5-base codon would allow 4^5 = 1024 mappings (1023 amino acids).  Presumably this might be accomplished with only a small amount of engineering to the ribosome and transfer RNAs.  Admittedly this does decrease the coding density in DNA, but this is likely to be more than offset by a significant increase in the phase space of proteins that can be manufactured.  An expanded discussion of the size of the synthesis phase space for engineered genetic codes is in [Bra01d].
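The codon counts in this paragraph all follow from one formula: an alphabet of b bases read in codons of length n gives b^n codons, and reserving one codon for STOP leaves b^n - 1 for amino acids.  The short sketch below, with hypothetical function names, tabulates the cases mentioned in the text so that the two expansion routes (more bases versus longer codons) can be compared side by side.

```python
def codon_space(n_bases, codon_length):
    """Number of distinct codons for a given alphabet size and codon length."""
    return n_bases ** codon_length


def max_amino_acids(n_bases, codon_length, stop_codons=1):
    """Amino acids that can be encoded if `stop_codons` codons are reserved."""
    return codon_space(n_bases, codon_length) - stop_codons


# The cases discussed in the text: (alphabet size, codon length).
CASES = [(4, 3), (6, 3), (8, 3), (4, 4), (4, 5)]

if __name__ == "__main__":
    for n_bases, length in CASES:
        print(
            f"{n_bases} bases, {length}-base codons: "
            f"{codon_space(n_bases, length)} codons, "
            f"up to {max_amino_acids(n_bases, length)} amino acids"
        )
```

Lengthening the codon grows the code space faster per engineering change than adding base pairs does, at the price of more nucleotides per amino acid, which is the coding-density tradeoff noted above.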

It is worth noting that one can find at least two natural examples of "extended" genetic codes.  The first example is the use of the UGA (opal) stop codon to code for the insertion of the modified amino acid selenocysteine (the 21st amino acid) in both prokaryotes [Zin87] and eukaryotes [Sta87, Lei88].  Interestingly, the mechanism for this extension in eukaryotes relies on a SECIS element in the 3' untranslated region of the mRNA (downstream of the coding sequence) and affects all UGA codons in the mRNA, while in prokaryotes the SECIS elements are immediately downstream from specific UGA codons and affect only individual codon mappings [Tuj00].  This suggests two possible methods for extending the genetic code without a need for expanding the code from DNA base triplets to quadruplets.  The first would be to add elongation factors to the ribosome that significantly change the mapping (of triplets into a specific amino acid) for an entire mRNA (the eukaryotic approach), and the second would be to modify the translational behavior of a ribosome complex based on the local molecular mRNA environment (the prokaryotic approach).

The second example of an "extended" genetic code is found in Methanosarcina barkeri which uses the UAG (amber) stop codon to code for the novel (22nd) amino acid pyrrolysine [Sri02].
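The effect of reassigning a stop codon, as in the selenocysteine and pyrrolysine examples, can be illustrated with a minimal translation routine.  This is a simplified sketch: it treats the reassignment as global for the whole message, whereas natural selenocysteine insertion depends on SECIS context as described above.  The function names and the example mRNA are hypothetical; the one-letter symbols U (selenocysteine) and O (pyrrolysine) are the standard ones.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G; '*' marks STOP.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def extended_code(reassignments):
    """Copy of the standard code with some codons reassigned (simplified:
    the reassignment applies everywhere, with no context dependence)."""
    code = dict(STANDARD_CODE)
    code.update(reassignments)
    return code


def translate(mrna, code):
    """Translate an mRNA string codon by codon until the first STOP."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUGGCUUAGAAAUAA"  # hypothetical message containing an amber codon
    pyl_code = extended_code({"UAG": "O"})  # amber -> pyrrolysine
    print(translate(mrna, STANDARD_CODE))   # stops at UAG
    print(translate(mrna, pyl_code))        # reads through UAG
```

Under the standard code the example message terminates at the amber codon, while under the extended code the same message yields a longer product containing the 22nd amino acid.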

Protein and Enzyme Engineering

Over the last decade there has been a significant growth in the number of DNA sequences in databases known to produce proteins [Gen01].  We now have the code for more than 100,000 protein sequences.  Protein 3D structural analysis and enzyme functional analysis are following this growth curve.  There are over 14,000 protein structures currently available [PDBCG01].  If historic growth rates were to continue, the PDB should contain 100,000 structures by 2009.  But an increasing number of X-ray sources could allow crystal structure analysis to reach ~30,000/yr if the fraction of crystallography stations increases from 40% of the beams available in Y2000 to 50% of those expected to be available by 2003 [Pat01].  Efforts to determine protein structure and function have reached the level of commercial activity [Tho01, Ste01, Syr01].
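The projection of 100,000 structures by 2009 implies a particular compound growth rate, which can be backed out as follows.  This is a rough sketch under one stated assumption: the ~14,000 figure is taken to describe the year 2001 (the year of the [PDBCG01] citation), which the text does not say explicitly.  The function names are illustrative.

```python
import math

# ASSUMPTION: the ~14,000 structure count refers to 2001 (the citation year).
START_YEAR, START_COUNT = 2001, 14_000
TARGET_YEAR, TARGET_COUNT = 2009, 100_000


def implied_annual_growth(start_count, end_count, years):
    """Constant yearly growth rate that takes start_count to end_count."""
    return (end_count / start_count) ** (1.0 / years) - 1.0


def years_to_reach(start_count, target_count, annual_growth):
    """Years of compound growth needed to reach target_count."""
    return math.log(target_count / start_count) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    rate = implied_annual_growth(
        START_COUNT, TARGET_COUNT, TARGET_YEAR - START_YEAR
    )
    print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the implied rate is a little under 28% per year, so the projection requires the historic growth to be sustained for eight consecutive years.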

The first demonstrations that protein design was feasible were led by William DeGrado of DuPont in the late 1980s [Reg88].  Progress since then may be reviewed in [DeG89, Bry95, DeG99].  To test their understanding of protein structure and function, scientists have developed protein and enzyme engineering [Hah90, Han92, Get92, Cho98, Bry98, Woo00, Pas01, Bak03].  There are a number of laboratories that have these disciplines as their primary focus [PDL01].  The problem of predicting the structure of folded proteins is yielding increasingly accurate results [Pil01], and projects such as Folding@Home [Shi00, Pan01] and IBM's Blue Gene Project [All01] are likely to finally solve this problem.  Though early efforts at designing enzymes based on peptide mimetics were unsuccessful, methods involving catalytic antibodies and reengineered enzymes have been successful [Cor96].  Catalytic antibodies have been developed that can perform reactions that are unachievable by normal chemical methods [Hil00].  Scientists can take pre-existing enzymes and engineer both where the enzymes act [Wen94, Pom98] and what they act on [Sor97, Rot01]; they can also engineer how the enzymes fold [Nau01] and control their stability [Wak94, Col97, Pet99, Leh00, Che01].  Textbooks have been written on enzyme catalysis and structure [Bra99, Fer99, Sil99, Cop00, Les01], and frequent conferences focus on protein and enzyme engineering [EEC, Jar98].  In part because of recent demonstrations of the feasibility of automated protein design [May96, Dah97, Jia97, Coo00, Bol01], scientists now consider the de novo design of enzymes feasible [Far01].  In situations where enzyme engineering alone is insufficient, chemical alterations may be used to modify enzyme properties [DeS99].  The discipline of protein and enzyme engineering has become sufficiently robust that commercial firms can obtain funding for abilities such as the design of novel Zinc Finger proteins for gene regulation [Sangamo BioSciences (SGMO)] and the evolution of enzymes with superior catalytic properties [Maxygen (MAXY)].

It is worth noting the fraction of genes identified in the sequenced genomes that are involved in metabolism: Yeast: 1062 (~17% of the total), Drosophila: ~1900 (~13%), C. elegans: ~2000 (10%), Human: ~3200 (10%), Mustard weed (Arabidopsis): ~4600 (18%) [Bra01c].  These genes have been organized into families by the PRINTS [Att00], PFAM [Bat00] and PROSITE [Hof99] databases which in turn are being further integrated by databases such as InterPro [Apw00] and MetaFam [Sil01].  The New Folds data from the Protein Data Bank show a declining number of novel folds being entered into the database.  Novel enzyme functions will continue to be discovered as we sequence novel microorganisms, or organisms with unusual capabilities such as the manufacture of siliceous shells in diatoms or the protein-CaCO3 composite shell of abalones, but our discovery of protein structural complexity seems to be approaching a significant fraction of the phase space that has been explored by natural evolution.  This is to be expected because evolution has primarily been using a cut-and-paste approach to increase the complexity of higher level organisms.  While these numbers may increase somewhat, as unknown genes are characterized, it is likely that we currently have in databases the code for much of the enzyme functionality that nature has evolved.  We are even starting to discover cases where evolution produced two separate paths for the same enzymatic function [Gal98].

Where does this lead?  First, robust databases of structural classifications such as SCOP, CATH and FFSP have been developed.  Second, similarity matching methods, as predicted by Holm & Sander [Hol94, Hol96], have been realized in programs such as Dali [Hol96], CE [Shi98], ProSup [Lac00], PartsList [Qia01] and others.  The implication is that continued refinements in these methods will increase the accuracy of protein modeling without the requirement for X-ray crystallography or NMR studies.  Thus one can predict that fewer researchers will be involved in reverse engineering the 3D structures of natural organisms and more can become available for the engineering of new structures required for nanopart assembly.

Process Integration

Now the questions become: Let us propose the following analytical framework: Now we can utilize what we know about the nature of the parts that a PNA is composed of to provide estimates of the amount of work required to build one and how much it will cost.  We assume that normal solution phase organic chemistry or slightly more complex combinatorial chemistry [Cza98, Mie99, Sen00] is utilized to build the lowest level building blocks for nanoparts.  For example, Merkle cites adamantane and its many derivatives as a small 10-carbon atom molecule that he envisioned could be used to manufacture larger components [Mer99].  These molecules have molecular weights around 150 Daltons, which is similar to that of the building blocks for proteins (amino acids) and RNA (purine and pyrimidine nucleotide bases).  From the perspective of assembling nanoscale parts, one can view enzyme engineering as being equivalent to the design of metal or ceramic "dies" used to cast the parts used in the automobile or aerospace industries.  Given the small molecular sub-components, you "pour" them into the die (the enzyme) and what pops out is a finished "part".  Just as in macroscale industry, nanoscale industry may require thousands of "dies" to build something as complex as an automobile.

We can frame the assembly problem for the fine-motion controller (MW: ~30,000 Daltons), one of Drexler's nanoparts, by looking at the tradeoff between the complexity of synthesizing the sub-component building blocks (SBB), with molecular weights from ~50-5000 Daltons, and the complexity of designing and synthesizing the enzymes (with MW from 10,000-100,000 Daltons) to put SBBs together.  Table 2 shows the number of building blocks that must be combined to equal in size a 30,000-Dalton fine-motion controller device and the minimum number of assembly enzymes required in a one-pass linear assembly process.
 

Table 2. Enzyme requirements vs. substrate (subcomponent building block) size

Molecular Building Block   MW           Sub-component Building   Assembly Enzymes
(comparative size)         (Daltons)    Blocks Required (#)      Required (minimum number)
Amino Acid                 ~150 (avg)   200                      199
Cholesterol                386.73       77                       76
Taxol                      839.96       35                       34
Hypothetical SBB           1000         30                       29
Vitamin B-12               1355.55      22                       21

From this table the tradeoff between the building block size and the number of assembly enzymes required can be seen.  One approach could use small common building blocks, but this requires the design of many novel enzymes to assemble them.  Each of these enzymes would have to be designed to grip the growing assembly, direct the next small building block to the proper location of the assembly and mediate its catalytic addition.  Another approach would utilize large building blocks, manufactured through complex (and low-yielding) chemical synthesis paths, that would be assembled by many fewer newly designed enzymes.  As there are many more organic chemists capable of working on the synthesis of complex molecules than individuals with experience in enzyme design, the fastest path would be to use the highest molecular weight building blocks we can imagine assembling and the fewest newly designed enzymes.  Because complex synthesis paths will generally have low yields, one could gradually replace the chemical reactions with the lowest yields with enzymatic reactions that increase the production efficiency of the large building blocks.  For the calculations below, we will assume organic chemists can readily produce SBBs with a MW of ~1000, meaning the number of novel enzymes that must be designed is 29.  In practice, the SBB size chosen will depend on the rate of advancement of computer-aided retrosynthesis analysis (discussed below).  As automated retrosynthesis capabilities advance, the size of the building blocks can increase, limited primarily by the yield of the synthesis steps4.
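The counts in Table 2 can be reproduced with a few lines of arithmetic.  The sketch below is illustrative only: the function name is ours, and it assumes the block counts were obtained by truncating the ratio of the target mass to the building block mass, which is the rule that matches every row of the table.  A one-pass linear assembly of n blocks needs n - 1 joining steps, hence n - 1 enzymes.

```python
# Illustrative reconstruction of Table 2 (function and variable names are hypothetical).
TARGET_MW = 30_000  # fine-motion controller, in Daltons (from the text)

# Building block molecular weights as listed in Table 2.
BUILDING_BLOCKS = {
    "Amino Acid": 150.0,
    "Cholesterol": 386.73,
    "Taxol": 839.96,
    "Hypothetical SBB": 1000.0,
    "Vitamin B-12": 1355.55,
}


def blocks_and_enzymes(block_mw, target_mw=TARGET_MW):
    """Return (building blocks required, minimum assembly enzymes).

    Assumption: the block count is the truncated mass ratio, which
    reproduces the table; a linear chain of n blocks has n - 1 joints.
    """
    blocks = int(target_mw // block_mw)
    return blocks, blocks - 1


table2 = {name: blocks_and_enzymes(mw) for name, mw in BUILDING_BLOCKS.items()}

for name, (blocks, enzymes) in table2.items():
    print(f"{name:18s} {blocks:4d} blocks  {enzymes:4d} enzymes")
```

Changing TARGET_MW or adding a candidate building block shows directly how the enzyme design burden falls as the building blocks grow.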

Computer-aided design of chemical synthesis paths has a long history, dating back to the development of LHASA (Logic and Heuristic Applied Synthesis Analysis) in 1964 [Cor64, Cor69, Jud85, Jud90, Jud92]. Other efforts have included SECS [Wip77], SYNCHEM [Gel77], PASCOP [Kau81], CAMEO [Jor90], SynTree [Fig91], HOLOWin and functions in WODCA. OSET is an open-source effort to develop computer-aided organic synthesis.  These programs and others have been analyzed in detail by Fick [Fic96].  One of the more interesting members of the group is Hendrickson's SynGen [Hen01, Hen95] which is based on a careful analysis of reaction descriptions [Hen92, Hen90].  The approach used by SynGen -- backward chaining from the product to the reactants using known reactions -- has proven useful in completely different fields such as the study of nucleosynthesis and the abundance of elements in stars [Koc98].  So it may be considered a general method that could be applied to the development of tools for the retroassembly of nanoparts.

SynGen's relevance to this discussion is that it requires only a couple of minutes of computer time on a Macintosh to analyze several thousand synthesis paths and select the least expensive.  Other work in the field is focused on selecting synthesis paths that utilize, or produce as byproducts, the least toxic chemicals.  Given these robust software tools, we will assert that the design of complex chemical synthesis sequences for the production of SBBs with molecular weights of ~1000 Daltons requires a small fraction of the time required to design enzymes, where the software tools are much less robust.  We would also assert that the progress that has been made in computer-aided design of molecular synthesis paths will be repeated for enzyme design.  It should, however, have a shorter development time (<< 20-30 years) due to the greater availability of fast, inexpensive computer systems and the existence of a growing number of individuals with knowledge bases including programming methodologies, organic chemistry and protein structure.
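To make the idea of backward chaining concrete, the following is a minimal sketch of a cheapest-route search that works from a product back to purchasable starting materials.  It is not SynGen's algorithm or data model: the reaction table, molecule names and costs are all hypothetical, and a real system would work from reaction descriptions rather than a hand-written dictionary.

```python
# Toy backward-chaining route search (all names and costs are hypothetical).

# product -> list of (cost of performing the step, reactants consumed)
REACTIONS = {
    "target": [(5.0, ["A", "B"]), (1.0, ["C"])],
    "C": [(4.0, ["A", "D"])],
    "B": [(1.0, ["D"])],
}

# Purchasable starting materials and their prices.
STARTING_MATERIALS = {"A": 1.0, "D": 2.0}


def cheapest_route(molecule, _visiting=frozenset()):
    """Return (total cost, plan) for the cheapest route to `molecule`.

    A plan is either a starting-material name or a tuple
    (product, [plans for each reactant]).  Returns (inf, None) if the
    molecule cannot be reached from the starting materials.
    """
    if molecule in STARTING_MATERIALS:
        return STARTING_MATERIALS[molecule], molecule
    if molecule in _visiting:  # guard against cyclic reaction definitions
        return float("inf"), None

    best = (float("inf"), None)
    for step_cost, reactants in REACTIONS.get(molecule, []):
        total = step_cost
        subplans = []
        for reactant in reactants:
            cost, plan = cheapest_route(reactant, _visiting | {molecule})
            total += cost
            subplans.append(plan)
        if total < best[0]:
            best = (total, (molecule, subplans))
    return best


cost, plan = cheapest_route("target")
print(cost, plan)
```

The same recursion applies unchanged if "reaction" is read as "enzymatic assembly step" and "starting material" as "SBB", which is the sense in which the method could carry over to the retroassembly of nanoparts.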

We will conservatively propose that the design of each assembly line enzyme or robozyme requires ~2 person-years to design and test.  The reasons for this are as follows:

We also assume that the design is the most expensive part of the process (i.e. that the generation of the genes and production of the enzyme are relatively automated and therefore insignificant costs compared with the cost of a human engineer who has to think about the design of the enzyme).  Using these figures we conclude ~58 person-years are required for the effort to design the enzymes needed to assemble a fine-motion controller.  Assuming a salary, materials production and overhead cost of ~$100,000/yr/person, this implies a cost in the vicinity of $5.8 million.  A complete PNA system (the Big Kahoona) has ~3000 times the number of atoms of the fine-motion controller (based on [Fre99]) and would therefore at first glance seem to require design and "die" manufacturing costs of ~$17.8 billion using approximately 90,000 large SBB5.

However, even if the funds were available to fund the design of a PNA system, a problem arises because there are probably only a few hundred people in the world today who are skilled in the discipline of protein design.6   This creates a problem because the time required to design the enzymes to produce the parts for a PNA could be as high as 29 enzymes × 2 yrs/enzyme × 3000 times as many atoms = 174,000 person-years.7  Even if all of the people with the necessary education were working on this project (a very unrealistic assumption) it would still take decades to complete.  Clearly support for the education of such individuals needs to be increased.  Fortunately, there are many chemists who could move into enzyme design with relatively little reeducation8.  The high cost of human labor for protein design and the scarcity of protein designers suggest there is a huge incentive to develop more automated methods for protein design.
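The estimates above follow from four inputs, which can be laid out explicitly.  The sketch below simply multiplies the rounded figures given in the text; the variable names are ours.  With the atom ratio rounded to 3000, the PNA cost comes out at $17.4 billion, slightly below the ~$17.8 billion quoted above, presumably because the ratio is only approximately 3000.

```python
# Cost arithmetic using the rounded inputs from the text (names are hypothetical).
ENZYMES_PER_PART = 29            # novel enzymes for one fine-motion controller
PERSON_YEARS_PER_ENZYME = 2      # design and test effort per enzyme
COST_PER_PERSON_YEAR = 100_000   # salary, materials and overhead, in dollars
PNA_ATOM_RATIO = 3000            # PNA atoms / fine-motion controller atoms (approx.)

fmc_person_years = ENZYMES_PER_PART * PERSON_YEARS_PER_ENZYME
fmc_cost = fmc_person_years * COST_PER_PERSON_YEAR

pna_person_years = fmc_person_years * PNA_ATOM_RATIO
pna_cost = pna_person_years * COST_PER_PERSON_YEAR

print(f"Fine-motion controller: {fmc_person_years} person-years, ${fmc_cost:,}")
print(f"PNA system:             {pna_person_years:,} person-years, ${pna_cost:,}")
```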

The number of enzymes or robozymes required for the production of intermediate sized nanoparts, such as the fine-motion controller, would seem to range from dozens to hundreds.  The genes and regulatory sequences required to manufacture these proteins can easily be added to a bacterial genome using whole genome engineering [Bra00].  Thus we will produce microorganisms, where each organism has as its raison d'être the assembly of a complete nanopart!  These nanoparts could be harvested and subsequently assembled into complete nanosystems such as a PNA using MEMS or NEMS methods (see Figure 6 for more details).

Cost Reduction Analysis

It can be seen from the discussion above that making sophisticated nanoscale designs affordable will require more efficient methods of producing the enzymes, robozymes and assembly lines.  Cost savings can be realized by exploiting the following: system parts redundancy, improvements in computer aided enzyme or robozyme design, Moore's Law driven increases in computational speed, advances in algorithmic efficiency, the development of optimized computational architectures and the use of offshore labor, as discussed below.

Nanopart Molecular Redundancy

Complex nanosystems may have redundancies at two levels.  The first is within subcomponents at the molecular level when structures are composed of repetitive subunits.  The second is at the subcomponent level where multiple subcomponents are utilized to provide greater strength or enhance robustness or reliability.  The 2808 atom strained-shell sleeve bearing designed by Drexler and Merkle [Dre92, pgs 268 & 296] has molecular level redundancy because the shaft has 34-fold rotational symmetry and the sleeve has 46-fold rotational symmetry.  An optimal approach for the synthesis of such a part could involve 2 possible strategies:
  1. Two polymerizing enzymes that assemble the majority of the circular shaft and sleeve structures by adding subunits with identical symmetry, an enzyme to complete (seal) the shaft ring structure, a robozyme to insert the completed shaft structure into the incomplete sleeve structure and a final enzyme to complete (seal) the sleeve (5 assembly line components).
  2. Two similar polymerizing enzymes that each assemble 1/2 of the shaft and sleeve structures, an enzyme that bonds shaft halves together and an enzyme that bonds sleeve halves together around a preassembled shaft (4 assembly line components).
Well known enzymes with polymerization capabilities include DNA and RNA polymerase and enzymes that seal molecular structures include DNA ligase and DNA gyrase.  Other novel polymerizing enzymes include Uroporphyrinogen I synthase involved in heme biosynthesis and bacterial proteins such as HAP2(FlgD) that assemble the bacterial flagella by carefully positioning flagellin subunits in a rotary brick-stacking fashion [Yon00].  These examples strongly suggest that such assembly methods are feasible.

The fine-motion controller has only a moderate amount of redundancy.  It has sub-components that range from non-redundant (the tip and the shaft which holds the positioning rings in place), to two-fold redundancy (the shaft end-plates and the 2 large positioning rings and attached positioning arms) to 4-fold redundancy (the central positioning rings).  A PNA in contrast is a highly redundant structure consisting of many molecular components that may be nothing more than simple repeats of the same atomic pattern or patterns that may be scaled up or down in size (for example as the positioning components develop increasingly finer resolution).  The circular outer casings that provide the strong structural stiffness are examples of repeats.  Their diameter is ~30 nm, implying a ~94 nm circumference, which would translate to a ring of ~610 carbon atoms.  State of the art protein design might require 5-6 years to design a protein such as a functional replacement for the photosynthetic reaction center, but 2nd generation designs may be produced in an order of magnitude less time [Far01].  We extrapolate this experience to suggest that while the initial design of an enzyme to polymerize molecular subunits into curved ring structures or a robozyme to put them together may require a large amount of work, subsequent enzymes performing similar functions are likely to require significantly less design time.
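The ring-size estimate for the outer casing can be checked with a short calculation.  The sketch below assumes that atoms are spaced one carbon-carbon single bond length (0.154 nm) apart along the circumference; that spacing is our assumption for illustration, since the text does not state how the ~610 figure was derived.

```python
import math

# Assumption: one atom per C-C single bond length along the circumference.
CC_BOND_NM = 0.154


def ring_estimate(diameter_nm, spacing_nm=CC_BOND_NM):
    """Return (circumference in nm, approximate number of atoms in the ring)."""
    circumference = math.pi * diameter_nm
    return circumference, circumference / spacing_nm


circumference, atoms = ring_estimate(30.0)
print(f"circumference ~{circumference:.0f} nm, ~{atoms:.0f} atoms")
```

Under this assumption a 30 nm ring comes out within a few atoms of the ~610 quoted above.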

As specified by Freitas [Fre96], a nanorobot designed to supplement the functions of red blood cells (O2 delivery, CO2 removal), known as a respirocyte, contains ~18 billion structural atoms.  Even though its mass is ~2250× greater than a PNA, because it has a 12-fold structural redundancy its design cost is only ~200× greater after allowing for some non-redundant components.  It is useful to note that the assembly lines for many of the nanoparts that nanorobots require such as nanocomputers, fuel cells, molecular sensors, communications receivers and transmitters, etc. (for more detail see [Fre99]) need only be designed once and can be reused in a variety of nanorobots.

Improvements in Computer Aided Enzyme and Robozyme Design

As was seen with the Human Genome Project, an effort up front to develop the correct tools (e.g. high-speed DNA sequencers) resulted in the project being done for less than the projected budget and being largely complete ahead of schedule.  For example, in 1990 DNA sequencing costs were estimated to be ~$10/base [NHGRI98].  By 1997, costs had fallen by 20× to $0.50/base and cost reductions of an additional order of magnitude were being sought to decrease costs to $0.05/base [NHGRI97].  The "correct tools" for biotechnology assisted molecular nanotechnology development are robust software for computer aided enzyme design and inexpensive whole genome engineering.  Using the Human Genome Project as an example, these developments should decrease development time and reduce development costs by 10-100× by shifting increasing amounts of the "human thought" and "process management" required onto the computer.  In particular, the critical component involving the time to design useful proteins drops from years to months to days.

Moore's Law Driven Increases in Computational Speed

The advantage of the automated protein design approach is that it can take advantage of the increasing speed and decreasing cost of computing capacity (Moore's Law [Moo65, Sch96, Int01]).  In agreement with Moore's Law, computing capacity has increased 100× every decade since 1970 [Sza01].  We will assume industries interested in maximizing productivity will replace their human protein designers with automated protein design systems as soon as it is cost effective to do so.  In practical terms this probably means the computers get the easy low-level designs while the humans get stuck with the designs that require a non-trivial amount of creativity or higher level systems integration tasks such as macroscale nanofactory production systems or nanorobot designs.  Thus the costs of automated design of nanopart assembly proteins should decrease by 10× by 2006 and 100× by 2011.  There are concerns that the semiconductor development path gets very rough after ~2014 [Nor01]; however, as pointed out by Meindl et al. [Mei01], even at the projected level of silicon semiconductor device capabilities in Y2011 there is still a 5 order of magnitude difference (~25 years of continued progress in Moore's Law) before the limits dictated by physical laws are reached.  Achieving the full potential offered by silicon may require a change from clocked circuits to unclocked circuits, a process that is well underway [Tri01].  In addition there are other technologies such as Si:Ge chips [Mey00], Ge MOSFET transistors [McK01], GaAs on Si [Wal01], InP [Bol01] and even molecular electronics [CALMEC] that seem capable of stepping up to the plate should progress in silicon based chips stumble.
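The figures in this paragraph are all consequences of a single growth rate, 100× per decade.  The sketch below (function names are ours) converts between elapsed years and improvement factor under the assumption of smooth exponential growth at that rate.

```python
import math

GROWTH_PER_DECADE = 100.0  # from the text: computing capacity grows 100x per decade


def growth_factor(years, per_decade=GROWTH_PER_DECADE):
    """Improvement factor after `years` of smooth exponential growth."""
    return per_decade ** (years / 10.0)


def years_for_factor(factor, per_decade=GROWTH_PER_DECADE):
    """Years of growth needed to reach a given improvement factor."""
    return 10.0 * math.log(factor) / math.log(per_decade)


doubling_time = years_for_factor(2.0)

print(f"5 years   -> {growth_factor(5):.0f}x")
print(f"10 years  -> {growth_factor(10):.0f}x")
print(f"10^5x     -> {years_for_factor(1e5):.0f} years")
print(f"doubling  -> every {doubling_time:.2f} years")
```

This recovers the 10× and 100× cost reductions over five and ten years and the ~25 years corresponding to five orders of magnitude, and it implies a doubling time of about a year and a half.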

Advances in Algorithmic Efficiency

Early in a technological development process, neither the methods nor the tools are generally optimal.  Based on previous experience in such areas as sorting [Sza01], DNA and protein sequence homology matching, and molecular dynamics  [Lea02] and protein folding simulations, we can expect the design of the algorithms utilized to design proteins to become increasingly efficient, probably matching the speedups achieved in Moore's Law driven hardware advances.  For example, the use of a hard-sphere bump calculation instead of a 10-iteration minimization of van der Waals energy to determine optimal protein structures is ~105× less expensive computationally [Jai00].  We will conservatively assume that algorithmic efficiency will improve an order of magnitude each decade (in contrast to the 2 orders of magnitude per decade demonstrated through Moore's Law).

Development of Optimized Computational Hardware

History shows that when general purpose computer hardware is insufficient for the task, special purpose hardware will be developed.  Examples of this include the Connection Machine (esp. running UHGROMOS); the nCUBE; Compugen's BioXL, Timelogic's DeCypher and the SAMBA Project's DNA & Protein search accelerators; the GRAPE Project (esp. [Hig94, Ito94, Mak02]); GROMACS with HAMM; etc.  So one can expect the machine architectures themselves to be optimized for computer-aided enzyme design, protein folding and molecular modeling.  This process has already begun with IBM's Blue Gene Project [All01].  It is focused on optimizing the CPU-memory and processor communications architecture to allow a 100× speedup in applications such as protein folding simulations by 2005 [All01].10  Protein folding is an integral part of enzyme and robozyme design since after one creates a design, one must determine whether or not it folds correctly.  We will conservatively assume that computer architectural improvements will enable an order of magnitude improvement per decade.

Use of Offshore Labor

The use of off-shore labor such as protein designers in Russia, India or China would reduce direct labor costs by 7-10×  [Man96, Sch00, Win00, Cos00, Wid01, Kri01].  These costs must be adjusted by overhead costs such as training requirements, travel by U.S. or European based executives, dealing with arcane government regulations and corrupt bureaucracies and long term trends that may raise foreign salary levels closer to those of more developed countries.  These may of course be offset to some degree by the emergence of alternate labor sources in underdeveloped countries with large populations such as Indonesia or China.  We will conservatively assume for the current decade, offshore labor provides a 5× cost savings, decreasing to 3× in the subsequent decade.

Table 3 summarizes the cost savings in protein design.
 

Table 3. Cost Savings in Protein-Based Nanoassembly Design

Source                                         Decrease in    Time Frame   Comments
                                               Design Cost    (est.)
Nanopart Molecular Redundancy                  1-50×          2001+        part design dependent
Improvements in Computer Aided Enzyme Design   10-100×        2001-2020    (if funded!)
Moore's Law driven increases in                10×            2001-2006
computational speed                            100×           2006-2011
                                               1000×          2011-2016
Advances in Algorithmic Efficiency             10×            2001-2010    (requires research)
                                               100×           2010-2020
Development of Optimized Computational         10×            2001-2010
Hardware                                       100×           2010-2020
Use of Offshore Labor                                         2001-2010    may require training
                                                              2010-2020
Maximum Combined Savings                                      2001
(w/o allowing for redundancy)                  1250×          2005
                                               50,000×        2010
                                               3×10^9×        2020
 

Table 4 uses combined savings that are quite conservative compared with those suggested by Table 3 to provide an estimate of how the combined cost savings will affect the manufacturing costs for the assembly lines to build nanosystems at various levels of complexity.
 

Table 4. Progress in Nanosystem Design Costs

                                       2001            2005             2010          2020
Estimated Cumulative Cost Reductions   1               100              10,000        1,000,000
(w/o Redundancy)

Nanosystem Level         Effective     Estimated Costs for Assembly Line Design
                         Redundancy    2001            2005             2010          2020
Fine Motion Controller   3             $5.8 million    $20,000          $200          < $2
PNA System               10            $17.8 billion   $17,800,000      $178,000      $1,780
Nanorobot                12            $3.6 trillion   $3,600,000,000   $36,000,000   $360,000
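The entries for the fine-motion controller and the PNA system can be reproduced by a simple rule, sketched below.  The rule is our reading of the table rather than a formula stated in the text: the 2001 column is the baseline cost without redundancy, and each later column divides that baseline by the effective redundancy and by the cumulative cost reduction for that year.  The Nanorobot row is left out of the sketch because its 2001 figure already allows for redundancy, as discussed above.

```python
# Inferred rule behind Table 4 (function name and structure are hypothetical).
CUMULATIVE_REDUCTION = {2001: 1, 2005: 100, 2010: 10_000, 2020: 1_000_000}


def design_cost(baseline_cost, redundancy, year):
    """Estimated assembly line design cost in dollars for a given year.

    Assumption: the 2001 column is the baseline without redundancy; later
    years divide by both the effective redundancy and the cumulative
    cost reduction.
    """
    if year == 2001:
        return float(baseline_cost)
    return baseline_cost / (redundancy * CUMULATIVE_REDUCTION[year])


SYSTEMS = {
    "Fine Motion Controller": (5.8e6, 3),
    "PNA System": (17.8e9, 10),
}

for name, (baseline, redundancy) in SYSTEMS.items():
    costs = [design_cost(baseline, redundancy, y) for y in (2001, 2005, 2010, 2020)]
    print(name, [f"${c:,.0f}" for c in costs])
```

For the fine-motion controller the rule gives roughly $19,300, $193 and $1.93, which the table rounds to $20,000, $200 and < $2.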

We can see from the table that the design and manufacture of an assembly line for nanoparts such as the fine-motion controller starts to become a research effort that could be undertaken by a university lab or small corporation sometime between now and 2005.  Large projects typically carry multi-million dollar price tags.  For example, IBM is spending ~$100 million on the development of Blue Gene, the development of the fat substitute Olestra cost Procter & Gamble ~$200 million and the typical figure cited for developing a drug and taking it successfully to market is $400 million.  So while the development of the PNA assembly lines is beyond the budget of all but the richest governments today, between 2005 and 2010 we can see the cost falling to a level that may be considered by large corporations and eventually by startups that nanoliterate VC firms would fund.  It is worth noting that spread over a 5 year period, the estimated cost of the PNA in 2005 is less than 1% of the current annual funding available for the U.S. National Nanotechnology Initiative!

Given the benefits that one would predict can be provided by molecular nanotechnology and molecular assemblers, the cost of producing the infrastructure required to manufacture PNAs falls from ~$60/person now to 6 cents per person in 2005 (based on the current U.S. population).  That price seems to be a rather small cost for the benefits that would result.  One of the key things to remember is that once the first PNA is produced, the subsequent manufacture of nanoparts by direct atomic assembly becomes much easier.  Does that immediately negate the investment in the development of the nanobiotechnology based assembly path?  Probably not.  Until there is a very robust suite of nanoscale parts that have been designed and assembly paths have been developed using the PNA, it is likely that the approach discussed here will remain useful.  It will make much more sense to devote time and resources to developing assembly paths for parts that can only be assembled by PNAs rather than attempt to compete with the nano-biotechnological manufacturing process for a PNA.  In addition, postponing the development of full self-replicating capabilities for molecular nanotechnology (PNAs that can assemble PNAs) seems to be a good strategy to minimize the near-term development of the negative uses of such abilities [Dre86, Fre00b].  So the use of nanobiotechnology to assemble the nanoparts required for PNAs creates a firewall that prolongs the period during which defensive methods, such as PNAs that can disassemble PNAs, may be developed that minimize the risks posed by the development and spread of molecular nanotechnology.

This analysis leads to a number of recommendations if we are to follow the path outlined above.  Government funding agencies should promote education and training in protein and enzyme design.  Foundations and far-sighted VCs should invest in the development of software directed towards the task of entirely automating the design, manufacture and testing of the enzymes needed for nanopart assembly-lines.  Finally, businesses should be attempting to understand the potential impact that inexpensive nanopart manufacture will have on their industry and be preparing themselves to take advantage of that.

Conclusions

From the discussion above and in the Appendices, we can see in retrospect why we do not already have robust molecular nanotechnology even though the path to get there was roughly outlined more than 20 years ago [Dre81].  Designing the enzymes required for the nanoscale assembly of nanoparts requires the software to design the proteins and the computer horsepower to run moderately large molecular simulations of the enzymes acting on the molecular subcomponents.  That software and computing horsepower have only become available to a moderately large number of research groups since ~1995.  These groups are only now beginning to approach a critical mass of sufficiently educated people and a relatively well-developed set of software tools that could allow an industrial approach to designing all of the enzymes required for the assembly of a nanopart.  This is discussed in more detail in Appendix B.  For an "industrial approach" to work effectively, circa 2001, it would have to employ virtually all of the people in the world trained in enzyme design, perhaps even in protein structure analysis!  Performing such an "act-of-god" would require either a group of farsighted venture capitalists or a large-scale government "Manhattan Project" approach at this time.  However, as the foundation of free software for molecular-scale design and molecular modeling increases, and the average home computer begins to possess the computing horsepower currently found only in major supercomputer centers, the design and simulation of nanopart assembly-lines will evolve from an academic tour-de-force into a cottage industry.  One of the reasons this path may be the most likely to develop is that it leverages the significant investments that have been made by governments and foundations in projects like the Human Genome Project and by industry in such areas as computer-aided drug design and biotechnology.

Appendix A: Discussion of Enzyme Assembly Capabilities

Cholesterol

Cholesterol (C27H46O, MW: 386.73) is a well known complex molecule manufactured by animals.  Its primary feature is a 4-ring molecular structure with methyl and hydroxyl groups attached at specific locations.  Cholesterol synthesis requires at least 20 chemical steps involving a dozen or more enzymes.
 
Figure 3. Cholesterol

Further Information: Biosynthesis of Cholesterol.

Porphyrins

Porphyrins are ringed molecular structures that are able to bind a metal ion.  They have been adapted to a variety of uses by biological organisms.  Examples of molecules containing porphyrins include chlorophyll (with Mg), heme (Fe), vitamin B-12 (Co), turacin (Cu) and coenzyme F430 (Ni).  Porphyrins may be synthesized through either natural or synthetic pathways.

There are several interesting aspects of the synthesis of porphyrins.  The first involves the joining of 2 linear δ-aminolevulinic acid molecules to form the C4N ring structure of porphobilinogen by the enzyme ALA dehydratase.  Then Uroporphyrinogen I synthase joins 6 porphobilinogen molecules as a linear string, using what can only be considered a limited polymerase activity (enzymes can count!), and releases 4 of them in the form of hydroxymethylbilane [Dev92].  Finally, uroporphyrinogen III cosynthase closes the heme ring structure forming the spiro intermediate form and then breaks the ring structure, flips one of the porphobilinogen subunits and relinks the ring producing uroporphyrinogen III.
 

Figure 4. Heme Molecule.  Panels: Protoporphyrin IX (heme precursor); heme molecule (C34H32FeN4O4, MW: 616.55).

Further Information: Biosynthesis of Heme.

Taxol

Taxol (C46H49O14N, MW: 839.96) is a complex molecule whose anti-cancer properties were discovered in 1962.  It wasn't until 1994 however that scientists managed to synthesize it in the lab due to its high level of complexity [Edw96].  Taxol has one of the most complex 3-D molecular structures found in natural compounds to date.  Only the first 3 steps of its biological synthesis path have been determined [Hez97].
 
Figure 5. Taxol Molecule

Molecules like these show that enzymes are fully capable of constructing complex molecules with covalent bonds.  Other molecules with complex 3-D structures include the antibiotic Erythromycin A (C37H67NO13, MW: 734.05) and the neurotoxin responsible for 'red tide' Brevetoxin B (C50H70O14, MW: 895.2) [Nic95].  Perhaps the largest common molecule synthesized is Vitamin B-12 (C63H88CoN14O14P, MW: 1355.55) [Woo79].  Even higher molecular weight natural molecules are known and have been synthesized.  These include vancomycin (C66H77N9O24Cl2, MW: 1485.73) [Nag88, Bog01], palytoxin (C129H234N3O54, MW: 2691.3) [Suh94] and maitotoxin (C164H256O68S2Na2, MW: 3425.8) [Tac96, Kis98, Mur00].

Dave Woodcock, a professor at Okanagan University College, maintains a page listing the features of many of the most complex natural molecules known.  The National Cancer Institute's Molecular Targets Drug Discovery Program maintains a catalog of natural compounds with useful properties.  Many of these are non-polymeric compounds with 50-100 carbon atoms.  Indiana University's Molecular Structure Center has links to similar pages under their Simple, Common, and Interesting Molecules page.

Background Information: Natural Toxins.


Appendix B: Detailed Refinement of Assembly Path Process

The analysis below identifies the information and software components that may be valuable in reducing the design and assembly problems for nanomachinery to "manageable" proportions:
  1. One needs to construct a complete list of the chemical reactions available in nature (see for example Enzyme Nomenclature and the Enzyme Database: ExPASy or Prowl).  The crystal structures of these enzymes need to be determined (see PDB) and a database of the chemical reaction mechanisms and the 3-D structure of the essential amino acids involved (e.g. "catalytic triads" [Wal97]) needs to be built.  This information serves as the core information resource, the "Reaction-Structure Library", for designer enzymes.  Where "gaps" exist in this library, i.e. an enzyme has not been discovered that mediates a reaction that chemists believe can occur between two molecules, efforts should be made to design catalytic structures that can do this, preferably ones that can function as a single artificial amino acid.  In particular, it may be necessary in nanopart assembly to create multiple atomic bonds simultaneously.  Natural enzymes do not usually work this way10.  So it may be necessary to engineer multi-bond capable "Reaction-Structure Library" modules that are collections of simpler single-bond reactions.
  2. One needs robust computer-aided, preferably semi- or fully automated, design of ribozymes and eventually enzymes [Jai00, COR00].  The development times for ERNA (7 person-years for 150,000 lines of code) indicate that this need not be an overly time-consuming or expensive process.
  3. One needs automated testing of the enzymes with nanopart subcomponents, gradually building up to enzyme "assembly-line" systems.  These enzyme "assembly-lines" can be engineered either into microorganisms, perhaps with enhanced genetic codes, or into microfluidic microchannel based devices that accept molecular inputs as "feedstock" and produce nanopart outputs.
  4. One needs computer programs that simplify the design of "stiff" nanoparts in the 1000-100,000 atom size scale (i.e. the size of small proteins or Drexler's nanoparts).11
  5. One needs computer programs that can "virtually" disassemble the "stiff" nanopart designs into off-the-shelf molecular (sub-component) parts (i.e. those "parts" that can be produced using chemical synthesis methods available today).  This can potentially produce a very large number of possible assembly paths.  The path tree must then be pruned, retaining those paths that (a) rely on inexpensive molecular parts; and (b) preferentially utilize assembly reactions that preexist in the "Reaction-Structure Library".  This results in the minimum number of enzymes and/or reactions that need to be developed de novo to perform a complete set of nanopart assembly steps.  This software can be extended by integrating it into the computer-aided design of nanoparts, so that the design tool is aware of the available reactions and can assist in "Design-for-Assembly".  For example, if one has a camshaft [Smi01], one may not be able to assemble the camshaft and housing separately and then slide the camshaft into the housing.  Instead, one may have to assemble the housing around the camshaft, which could require a much different assembly path.
  6. Finally, one needs to extend this approach up to larger scale levels, e.g. those required for nanobots with billions of atoms.
This should allow full realization of the complete vision of what is feasible using molecular nanotechnology.
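The pruning described in step 5 can be illustrated with a small sketch.  It is only a toy model under stated assumptions: the reaction names, building-block names, costs and the cost ceiling are all hypothetical, and a real tool would generate candidate paths by retroassembly rather than list them by hand.  Each candidate path is scored by (number of distinct reactions missing from the "Reaction-Structure Library", total building-block cost), so that paths needing the fewest de novo enzymes are preferred and cost breaks ties.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    """One assembly step: attach a building block using a named reaction."""
    reaction: str  # hypothetical reaction label
    block: str     # hypothetical building-block name


Path = List[Step]


def score_path(path: Path, library: Set[str],
               block_cost: Dict[str, float]) -> Tuple[int, float]:
    """Return (distinct reactions absent from the library, total block cost)."""
    de_novo = {step.reaction for step in path if step.reaction not in library}
    total_cost = sum(block_cost[step.block] for step in path)
    return (len(de_novo), total_cost)


def prune_paths(paths: List[Path], library: Set[str],
                block_cost: Dict[str, float],
                max_block_cost: float, keep: int = 1) -> List[Path]:
    """Drop paths that use any block above the cost ceiling, then keep the
    `keep` best paths: fewest de novo reactions first, cheapest second."""
    affordable = [
        p for p in paths
        if all(block_cost[s.block] <= max_block_cost for s in p)
    ]
    return sorted(affordable,
                  key=lambda p: score_path(p, library, block_cost))[:keep]


# Illustrative data only; none of these names or costs come from the paper.
LIBRARY = {"amide_coupling", "aldol_addition"}
BLOCK_COST = {"block_A": 5.0, "block_B": 12.0, "block_C": 400.0}
CANDIDATE_PATHS = [
    [Step("amide_coupling", "block_A"), Step("diels_alder", "block_B")],
    [Step("amide_coupling", "block_A"), Step("aldol_addition", "block_B")],
    [Step("amide_coupling", "block_C")],
]

best = prune_paths(CANDIDATE_PATHS, LIBRARY, BLOCK_COST, max_block_cost=100.0)
print(score_path(best[0], LIBRARY, BLOCK_COST))
```

In this toy data set the third path is rejected because it relies on an expensive block, and the second path is preferred over the first because every reaction it uses already exists in the library.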
 
 
 
Figure 6. Nanorobot Development Process Flow Diagram
Diagram of the complete process flow for the development of nanobots using nanobiotechnology.  Using software designed to (a) evolve nanopart designs, e.g. Nano@Home, or (b) Nanopart Design Tools (NanoCAD), Nanoscale Part Designs are produced.  These are then retroassembled into basic building blocks and specifications for the enzymes that must put those building blocks together.  The building blocks are further disassembled using currently existing retrosynthesis programs, producing the building block chemical synthesis pathways (not shown).  As the "Computer Aided Nanopart Design" and "Computer Aided Enzyme Design" systems develop, they must be integrated with each other such that "Design for Assembly" is achieved.  Further development would replace "Computer Aided Enzyme Design" with "Automated Enzyme Design" to lower costs.  Once a set of enzymes is designed, they are produced in Engineered Microorganisms.  The resulting proteins may be harvested and used in Microfluidic Flow-Through Channels through which the previously synthesized complex building blocks flow such that assembled nanoparts result.  Alternatively, a complete set of assembly enzymes plus any necessary synthesis enzymes or building-block import transporters can be engineered into a microorganism that manufactures the subcomponents and assembles the nanopart.12  Nanoparts are assembled into more complex structures, perhaps using NEMS/MEMS based robotics for assistance.  Finally, programmable nanoassemblers are produced that can lead to the production of real nanorobots [Fre96].

Respirocyte (nanorobot) image is © Copyright 1999 Interworld Productions, LLC, P.O. Box 30121, Seattle, WA 98103, derived from detailed description in [Fre96]. Fine-motion controller image is © Copyright Institute for Molecular Manufacturing.


Notes

  1. Robert Freitas pointed out in his review of this paper that we do not know for certain that relatively floppy assemblers cannot produce stiff parts.  But as Drexler points out, the stiffness drives the precision of the assembly.  Floppy assemblers will be much more error prone and therefore operate more slowly and make poorer use of the feedstock.  To get stiff parts, one must position and bond individual atoms (or functional groups) to atoms that bind in a tetrahedral fashion, such as carbon and silicon, or position planar elements, such as the carbon rings composing buckyballs and buckytubes or the heterocyclic purine and pyrimidine rings that compose the bases in DNA, in ways that form interconnected 3D structures.  The essential features are (a) atoms that precisely specify 3D structures through covalent bonds to 4 other atoms, or (b) atoms that are able to covalently bond to 3 other atoms, defining planar structures that must be further linked on a larger scale.  Examples of (b) would be GaN and BN nanotubes [Han97, Sue97, Zha98].  It only seems feasible to create very stiff structures out of polymers linked by 2 bonds if one ties the long chains in knots or otherwise constrains their flexibility with surrounding atoms.
  2. This paper assumes that one must have large numbers of PNAs operating in parallel to assemble macroscale parts in reasonable times.  In the strategy being outlined, the expense is all at the front end; once you have the first PNA system, the manufacturing costs for additional PNA systems are very low.  To be explicit, the strategy is to have self-replicating engineered biological systems manufacture the parts for PNA systems and potentially assemble subassemblies.  The strategy is not to have self-replicating PNA systems.
  3. Strictly speaking, tungsten itself is not manipulated; an ion of tungstate, WO4(2-), is manipulated in such a way as to replace two of the oxygen bonds with sulfur bonds binding the tungsten atom to a pterin molecule.  See the Promise database here and a discussion of molybdopterin synthesis.  There are a host of metalloenzymes documented in the Promise database, and a number of metallochelatases are known that manage the distribution of copper and iron ions in cells.
  4. It could be argued that the design of enzymes that can grip large molecular building blocks would be more difficult than designing enzymes that could grip smaller natural building blocks for which the structure of enzyme grippers is already known.  This may be true but this seems offset by the much smaller enzyme count required for assembly paths based on larger building blocks.  One does not have to design an enzyme that precisely forces the partially assembled nanopart and the small building block being added into extremely precise alignment.  So long as the natural flexing of the enzyme brings the sub-components together in approximately the correct orientation some fraction of the time, the desired reaction will occur.  Separation methods such as HPLC or gel separation could then be used to isolate properly assembled intermediate stage components which could be fed back into the assembly line.  This is not dissimilar from biological assembly systems that disassemble or otherwise reject misassembled parts.  Further discussion of the enzyme requirements may be found in [Bra01e].
  5. If built using small building blocks (amino acid sized), the FMC would require ~400 person years and cost $40 million while the PNA would require 1.2 million person years and cost ~$120 billion.  Because the PNA contains significant redundancy compared with the FMC one would expect costs to be significantly less, perhaps $10-100 billion.  Even at $120 billion the cost seems justified.  It amortizes to < $20 per person over the global population or ~$420 per person over the U.S. population.  For comparison purposes the Apollo Program of the 1960's cost ~$20-24 billion which allowing for inflation would be $150 billion in 1992 dollars [Lau99, Jon95].  [N.B. These estimates should be taken with a grain of salt, as estimates using the Johnson Space Center GDP Calculator, or Economagic's GDP deflator chart, suggest the inflated cost should be ~$88 billion in Y2000 dollars!]  In either case, even a very difficult route to a PNA is comparable in size to the Apollo Project.  The benefits that would result from having programmable nanoassemblers [Dre92] would be much greater than those that resulted from the Apollo Project [Tif00].
  6. A list of people involved in the study of protein folding, a field of which enzyme design may be considered a small subset, contains fewer than 100 groups [Sau00].  A Google search for "protein folding" turns up > 50,000 pages while a Google search for "enzyme design" returns only ~250 pages.
  7. For comparison purposes, Rockwell International's direct labor requirements for the U.S. Space Shuttle Program were 95,300 person-years [Sch81, RISD1974].
  8. The American Chemical Society has over 150,000 members.  The ideal candidates for reeducation as protein designers would be organic chemists with experience in chemical synthesis and molecular modeling.
  9. It is worth noting that the 1st generation Blue Gene computers will not be built using state-of-the-art semiconductor fabrication processes, nor are their CPUs highly optimized for molecular dynamics calculations (unlike the GRAPE computers that are optimized for the gravity calculation), so there is still significant room for improvement in the performance of these machines.
  10. There are exceptions to this generalization, however.  Oxidosqualene-lanosterol cyclase must alter at least 7 bonds to convert 2,3-oxidosqualene into lanosterol.  It is not strictly necessary to enzymatically mediate all of the bonds required for each synthesis step.  If an assembly step requires the formation of a dozen bonds, it may only be necessary to mediate a two- or three-bond connection between subcomponents to get proper assembly.  This would constrain the motion of the sub-components sufficiently that it would significantly decrease the probability of undesirable reactions occurring.  Thus the assembly could be generated via a combination of enzymatically mediated constraints on reactant orientations followed by standard organic chemical synthesis methods to produce the full set of inter-atomic bonds.  This is not unlike 2 or 3 hinges of a door constraining the orientation of the door to fit precisely into the door frame.
  11. There may well be programs capable of doing this.  See Google's Molecular_Modeling page,  LLNL's Science & Technology Education Program's Molecular Modeling, Viewing and Drawing page, the UCSF MidasPlus or Amber page or Art Robert's Biotech-Resource "Free Software" page, VCU's Molecular Modeling Related Sites page, or WUSTL's Tinker page.
  12. Microorganisms allow faster scale-up to macroscale production quantities and cheaper manufacturing costs than the microfluidics based approach.  The short replication times for bacteria (~20 minutes) allow the manufacture of a large number of nanopart assembly lines in a very short period.  Microorganisms may be replicated in large-scale fermentation installations (e.g. those utilized by beer and wine producers), which are simpler technologies to work with than those required for microfluidics chip manufacture (e.g. clean rooms).  While initial laboratory efforts might be based on microfluidics, the manufacture of macroscale production quantities (kg of nanoparts) would seem to be better handled by microorganism based assembly lines.  Many precedents exist for the engineering of molecular assembly lines in microorganisms, e.g. [Mur93, Wan99, Kak01, San01, Mar01, Pfe01, Tao01, Epp01].
  13. It is worth noting that DNA constructed with additional base pairs other than the standard A, C, G and T [Pic90] and the engineering of DNA polymerase enzymes capable of copying such extended DNA have been produced [Sis04] by a single lab in less than 15 years.  See also "Evolving Artificial DNA" for Astrobiology Magazine (Feb 2004).
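
The cost arithmetic in Note 5 can be checked with a short sketch.  The person-year and dollar figures are taken from the note; the cost per person-year is the rate implied by the FMC figures, and the two population values are assumptions (rough circa-2001 figures) chosen here for illustration, not numbers stated in the paper.

```python
# Figures from Note 5.
FMC_PERSON_YEARS = 400
FMC_COST_USD = 40e6
PNA_PERSON_YEARS = 1.2e6

# Rate implied by the FMC figures ($ per person-year).
COST_PER_PERSON_YEAR = FMC_COST_USD / FMC_PERSON_YEARS

# Assumed populations (not given in the paper).
WORLD_POPULATION = 6.2e9
US_POPULATION = 285e6


def project_cost(person_years: float) -> float:
    """Scale a labor estimate to dollars at the implied FMC rate."""
    return person_years * COST_PER_PERSON_YEAR


pna_cost = project_cost(PNA_PERSON_YEARS)
per_person_world = pna_cost / WORLD_POPULATION
per_person_us = pna_cost / US_POPULATION

print(f"Implied rate:       ${COST_PER_PERSON_YEAR:,.0f} per person-year")
print(f"PNA cost:           ${pna_cost / 1e9:,.0f} billion")
print(f"Per person (world): ${per_person_world:,.2f}")
print(f"Per person (U.S.):  ${per_person_us:,.2f}")
```

Under these assumptions the implied rate is $100,000 per person-year, which reproduces the $120 billion PNA figure, a global share just under $20 per person, and a U.S. share close to the ~$420 quoted in the note.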

Acknowledgements

References

Protein and Enzyme Design Laboratories (2001)

Unreferenced at present

Chemistry Index Pages

Chemistry Software Links



Created: July 2001
Last Modified: December 17, 2004