PATENT GRANT DATE | 02.04.2002 |
PATENT TITLE |
Synthetic genes for enhanced expression |
PATENT ABSTRACT |
A method of making a synthetic nucleic acid sequence comprises providing a starting nucleic acid sequence, which optionally encodes an amino acid sequence, and determining the predicted ΔG_folding of the sequence. The starting nucleic acid sequence can be a naturally occurring sequence or a non-naturally occurring sequence. The starting nucleic acid sequence is modified by replacing at least one codon from the starting nucleic acid sequence with a different corresponding codon to provide a modified nucleic acid sequence. As used herein, a "different corresponding codon" refers to a codon which does not have the identical nucleotide sequence, but which encodes the identical amino acid. The predicted ΔG_folding of the modified nucleic acid sequence is determined and compared with the ΔG_folding of the starting nucleic acid sequence. In accordance with the invention, the predicted ΔG_folding of the starting nucleic acid sequence can be determined before or after the modified starting nucleic acid is provided. |
PATENT FILE DATE | January 31, 2000 |
PATENT REFERENCES CITED |
Zahn, K., Overexpression of an mRNA Dependent on Rare Codons Inhibits Protein Synthesis and Cell Growth, Journal of Bacteriology, vol. 178, May 1996, p 2926-2933. Kane, J. F., Effects of Rare Codon Clusters on High-Level Expression of Heterologous Proteins in Escherichia coli, Current Opinion in Biotechnology, vol. 6, 1995, p 494-500. Nakayama, T. et al., Purification of Bacterial L-Methionine γ-Lyase, Analytical Biochemistry, vol. 138, p 421-424. |
PATENT GOVERNMENT INTERESTS |
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT This invention was made with government support under Grant # 1R43DK55951-01 awarded by the National Institutes of Health. The government has certain rights in the invention. |
PATENT CLAIMS |
What is claimed is:
1. A method of making a synthetic nucleic acid sequence, the method comprising: (a) providing a starting nucleic acid sequence; (b) determining the predicted ΔG_folding of the starting nucleic acid sequence; (c) modifying the starting nucleic acid sequence by replacing at least one codon from the starting nucleic acid sequence with a different corresponding codon to provide a modified nucleic acid sequence; (d) determining the predicted ΔG_folding of the modified nucleic acid sequence; (e) comparing the ΔG_folding of the modified nucleic acid sequence with the ΔG_folding of the starting nucleic acid sequence; (f) determining whether the ΔG_folding of the modified nucleic acid sequence is increased relative to the ΔG_folding of the starting nucleic acid sequence by a desired amount; (g) if the ΔG_folding of the modified nucleic acid sequence is not increased by the desired amount, further modifying the modified nucleic acid sequence by replacing at least one codon from the modified nucleic acid sequence with a different corresponding codon to provide a different modified nucleic acid sequence; and repeating steps (f) and (g) until the ΔG_folding of the modified nucleic acid sequence is increased by the desired amount to ultimately provide a final nucleic acid sequence.
2. The method of claim 1 wherein the starting nucleic acid sequence encodes an amino acid sequence.
3. The method of claim 1 further comprising physically creating the final nucleic acid sequence.
4. The method of claim 1 wherein the codon replacement is in a region of the starting nucleic acid sequence containing secondary structure.
5. The method of claim 1 further comprising selecting a host for expressing the modified nucleic acid sequence and transforming the host with the modified sequence, wherein the host expresses the modified nucleic acid sequence better than the host would express the starting sequence if the host were transformed with the starting sequence.
6. The method of claim 5 wherein the selected host is E. coli.
7. The method of claim 1 wherein the different corresponding codon is a codon that occurs with higher frequency in the selected host.
8. The method of claim 1 wherein the desired amount is at least about 2%.
9. The method of claim 1 wherein the desired amount is at least about 10%.
10. The method of claim 1 wherein the desired amount is at least about 20%.
11. The method of claim 1 wherein the desired amount is at least about 30%.
12. The method of claim 1 wherein the different corresponding codon has fewer guanine or cytosine residues than the replaced codon.
13. The method of claim 1 further comprising selecting a host for expressing the modified nucleic acid sequence, wherein the starting nucleic acid sequence is derived from an amino acid sequence native to a bacterium different from the host selected for expression.
14. The method of claim 13 wherein the amino acid sequence is native to a bacterium of the genus Pseudomonas.
15. The method of claim 1 wherein the starting nucleic acid sequence is a naturally occurring sequence.
16. The method of claim 1 wherein the starting nucleic acid sequence is a non-naturally occurring sequence.
17. The method of claim 1 wherein the ΔG_folding of the modified nucleic acid sequence is more positive than about -0.2 kcal/(mol)(base).
18. The method of claim 1 further comprising selecting a desired amino acid sequence, wherein the modified nucleic acid sequence encodes the desired amino acid sequence.
19. The method of claim 18 further comprising selecting a host for expressing the desired amino acid sequence, wherein the host is an enteric bacterium, the amino acid sequence is native to a bacterium different from the host selected for expression, and the different corresponding codon is one which occurs with higher frequency in the enteric bacterium than does the replaced codon.
20. The method of claim 19 wherein the enteric bacterium is Escherichia coli.
21. The method of claim 19 wherein the amino acid sequence is native to a bacterium of the genus Pseudomonas.
22. The method of claim 1 wherein the modified, final sequence is more amplifiable than the starting sequence.
23. The method of claim 1 wherein the different codon is selected from the most frequently used by a selected host. |
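The iterative procedure of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function `toy_delta_g` is a hypothetical placeholder that only penalizes G and C bases and does not predict folding; a real workflow would substitute the output of a secondary-structure prediction program. The greedy choice of which codon to replace, and the reading of "increased by a desired amount" as a fractional rise relative to the magnitude of the starting value, are also assumptions.

```python
import itertools

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(seq):
    """Translate a DNA sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonyms(codon):
    """Different corresponding codons: same amino acid, different triplet."""
    aa = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items() if a == aa and c != codon]


def toy_delta_g(seq):
    """PLACEHOLDER energy (assumption): -3 per G/C and -2 per A/T.

    This is not a folding prediction; swap in a real predictor here.
    """
    return -sum(3.0 if base in "GC" else 2.0 for base in seq)


def relative_increase(dg_start, dg_modified):
    """Fractional rise of dg_modified over dg_start (assumes dg_start < 0)."""
    return (dg_modified - dg_start) / abs(dg_start)


def make_synthetic(start, desired=0.10, predict_dg=toy_delta_g):
    dg_start = predict_dg(start)                       # step (b)
    current = start
    # Steps (d)-(f): re-evaluate and compare after every modification.
    while relative_increase(dg_start, predict_dg(current)) < desired:
        best, best_dg = None, predict_dg(current)
        for i in range(0, len(current) - 2, 3):
            for alt in synonyms(current[i:i + 3]):
                candidate = current[:i] + alt + current[i + 3:]
                dg = predict_dg(candidate)
                if dg > best_dg:                       # keep the largest gain
                    best, best_dg = candidate, dg
        if best is None:
            raise ValueError("no synonymous replacement raises the energy")
        current = best                                 # steps (c) and (g)
    return current


if __name__ == "__main__":
    start = "ATGGCCGGCCGCCTGGGCTAA"   # made-up example sequence
    final = make_synthetic(start, desired=0.10)
    print(start, toy_delta_g(start))
    print(final, toy_delta_g(final))
```

Passing the predictor as the `predict_dg` argument keeps the loop independent of any particular energy model, so the placeholder can be replaced without touching the codon-replacement logic.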
PATENT DESCRIPTION |
BACKGROUND OF THE INVENTION The field of the invention is synthetic nucleic acid sequences for improved amplification and expression in a host organism, and methods of creating them. It has been a goal of biotechnology to promote the expression of cloned genes for analysis of gene structure and function and also for commercial-scale synthesis of desirable gene products. DNA cloning methods have enabled the genetic modification of bacteria and unicellular eukaryotes to produce heterologous gene products. In principle, the genes may originate from almost any source, including other bacteria, animal cells or plant cells. Although this expression of heterologous genes is a function of a variety of complex factors, maximizing the expression of cloned sequences has been under intense and rapid development. Plasmid and viral vectors have been developed in both prokaryotes and eukaryotes that enhance the level of expression of cloned genes. In some cases the vector itself contains the regulatory elements controlling the expression of genes which are not normally expressed in the host cell so that a high level of expression of heterologous genes can be obtained. Several problems exist, however, in the expression of many proteins across phyla and even across species. Post-translational handling and modification of expressed proteins by the host cell often does not mimic that of the heterologous gene's own cell type. Frequently, even if the protein is expressed in a useful form, heterologous genes are poorly expressed. Low yields of expressed protein may make manufacture of commercially useful quantities impossible or prohibitively costly. Vectors designed to enhance expression are not able to overcome some expression problems if the regulatory elements of the vector are not the constraint on robust expression. Other cellular or translational constraints are at issue. Genes encoding poorly expressed proteins are often themselves difficult to clone and amplify as well. 
This is due to secondary structure inherent in the gene, for example caused by high G-C content. Some methods have been used to reduce these difficulties, such as the use of DMSO or betaine to bring G-C and A-T melting behaviors more into alignment, or the use of ammonium sulfate (hydrogen binding cations) to destabilize G-C bonding during PCR. The problem with these methods is that the effects of the additives are concentration dependent, so variations in template size and G-C content mean lengthy optimization procedures. Additionally, these steps do nothing to facilitate subsequent expression of the nucleic acid once it has been cloned. The frequency of particular codon usage in Escherichia coli and other enteric bacteria has long been known, and it has been hypothesized that replacement of certain rare codons encoding a particular amino acid in a heterologous gene with a codon that is more commonly used by such bacteria would enhance expression (see, e.g., Kane, Curr Opin Biotechnol 6:494-500 (1995) and Zahn, J. Bacteriol., 178:2926-2933 (1996)). This is based on the theory that rare codons have only a few tRNAs per cell and that translation of heterologous sequences having numerous occurrences of these rare codons is limited by too few available tRNAs for those codons. However, simple replacement of rare codons does not reliably improve expression of heterologous genes, and no broadly applicable method exists to select which codon changes are best to increase expression of heterologous sequences. Further, it is not known in detail how codon usage is related to expression level. Bacterial gene products are commonly used as research and assay reagents, and various microbial enzymes increasingly are finding applications as industrial catalysts (see, for example, Rozzell, J. D., "Commercial Scale Biocatalysis: Myths and Realities," Bioorganic and Medicinal Chemistry, 7:2253-2261 (1999), herein incorporated by reference). Some have substantial commercial value. 
Examples include heat-stable Taq polymerase from Thermus aquaticus, restriction enzymes such as Eco RI from E. coli, lipase from Pseudomonas cepacia, β-amylase from Bacillus sp., penicillin amidase from E. coli and Bacillus sp., glucose isomerase from the genus Streptomyces, and dehalogenase from Pseudomonas putida. Genes from bacteria may express easily in commercially useful host strains, but many do not. In particular, genes from bacteria having significantly different codon preferences from enteric bacteria, including but not limited to filamentous bacteria such as streptomycetes and various strains of the genus Bacillus, Pseudomonas, and the like can be difficult to express abundantly in enteric bacteria such as E. coli. An example of a Pseudomonas gene that is difficult to express in E. coli is the enzyme methionine gamma-lyase, useful for the assay of L-homocysteine and/or L-methionine as described in U.S. Pat. No. 5,885,767 (herein incorporated by reference). This assay is particularly useful in the diagnosis and treatment of homocystinuria, a serious genetic disorder characterized by an accumulation of elevated levels of L-homocysteine, L-methionine and metabolites of L-homocysteine in the blood and urine. Homocystinuria is more fully described in Mudd et al., "Disorders of transsulfuration," In: Scriver et al., eds., The Metabolic and Molecular Basis of Inherited Disease, McGraw-Hill Co., New York, 7th Edition, 1995, pp. 1279-1327 (herein incorporated by reference). In developing an assay for the accurate quantitation of L-homocysteine and L-methionine according to the methods described in U.S. Pat. No. 5,885,767, obtaining large amounts of methionine gamma-lyase is necessary. However, this Pseudomonas gene contains a number of codons that are less commonly found in genes of desirable bacterial hosts for expression such as E. coli. 
Because plasmid vectors designed to enhance expression with a variety of promoters or other regulatory elements often do not resolve the difficulty in expressing certain genes, and because no systematic approach exists for codon replacement to aid amplification of nucleic acids or their expression, there is clearly a need for an improved method for amplification and expression of genes, including genes from various bacteria such as streptomycetes, Bacillus, Pseudomonas and the like introduced into enteric bacterial hosts such as E. coli. SUMMARY OF THE INVENTION In one embodiment, the invention is directed to a method of making a synthetic nucleic acid sequence. The method comprises providing a starting nucleic acid sequence, which optionally encodes an amino acid sequence, and determining the predicted ΔG_folding of the sequence. The starting nucleic acid sequence can be a naturally occurring sequence or a non-naturally occurring sequence. The starting nucleic acid sequence is modified by replacing at least one codon from the starting nucleic acid sequence with a different corresponding codon to provide a modified nucleic acid sequence. As used herein, "codon" generally refers to a nucleotide triplet which codes for an amino acid or translational signal (e.g., a stop codon), but can also mean a nucleotide triplet which does not encode an amino acid, as would be the case if the synthetic or modified nucleic acid sequence does not encode a protein (e.g., upstream regulatory elements, signaling sequences such as promoters, etc.). As used herein, a "different corresponding codon" refers to a codon which does not have the identical nucleotide sequence, but which encodes the identical amino acid. The predicted ΔG_folding of the modified nucleic acid sequence is determined and compared with the ΔG_folding of the starting nucleic acid sequence. 
In accordance with the invention, the predicted ΔG_folding of the starting nucleic acid sequence can be determined before or after the modified starting nucleic acid is provided. Thereafter, it is determined whether the ΔG_folding of the modified nucleic acid sequence is increased relative to the ΔG_folding of the starting nucleic acid sequence by a desired amount, such as at least about 2%, at least about 10%, at least about 20%, or at least about 30%. If the ΔG_folding of the modified nucleic acid sequence is not increased by the desired amount, the modified nucleic acid sequence is further modified by replacing at least one codon from the modified nucleic acid sequence with a different corresponding codon to provide a different modified nucleic acid sequence. These steps are repeated until the ΔG_folding of the modified nucleic acid sequence is increased by the desired amount to ultimately provide a final nucleic acid sequence, which is the desired nucleic acid sequence. The modified and/or final nucleic acid sequence can then be physically created. By the present invention, a desired nucleic acid sequence can be created that is more highly expressed in a selected host, such as E. coli, than the starting sequence. By "more highly expressed" is meant more protein product is produced by the same host than would be with the starting sequence, preferably at least 5% more, more preferably at least 10% more, and most preferably at least 20% more. Preferably the codon replacement is in a region of the starting nucleic acid sequence or modified nucleic acid sequence containing secondary structure. It is also preferred that the different corresponding codon is one that occurs with higher frequency in the selected host. 
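The preference for replacing codons that lie in a region containing secondary structure can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: the hypothetical helper `stem_positions` flags only perfect inverted repeats (candidate hairpin stems) of a fixed length in a DNA string, which is a crude stand-in for the output of a real structure prediction, and `codons_in_structure` simply maps the flagged bases to zero-based codon indices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_positions(seq, stem=6, min_loop=3):
    """Indices of bases lying in a perfect inverted repeat of length `stem`.

    Assumption: a stretch and its reverse complement further downstream,
    separated by at least `min_loop` bases, could pair as a hairpin stem.
    """
    paired = set()
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        j = seq.find(target, i + stem + min_loop)
        while j != -1:
            paired.update(range(i, i + stem))
            paired.update(range(j, j + stem))
            j = seq.find(target, j + 1)
    return paired


def codons_in_structure(seq, stem=6, min_loop=3):
    """Zero-based codon indices that overlap a candidate stem."""
    return sorted({p // 3 for p in stem_positions(seq, stem, min_loop)})


if __name__ == "__main__":
    # Made-up sequence: GCCGGA (bases 3-8) can pair with TCCGGC (bases 14-19).
    seq = "ATGGCCGGAAAAACTCCGGCTTAA"
    print(codons_in_structure(seq))   # prints [1, 2, 4, 5, 6]
```

The returned codon indices are the candidates to try first when choosing which codon to replace; codons outside any flagged stem would be left alone in this simplified picture.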
In a particularly preferred embodiment, the desired amino acid sequence is expressed in Escherichia coli, and the amino acid sequence is from a bacterium of the genus Pseudomonas, and the different corresponding codon is selected to be one that occurs with higher frequency in Escherichia coli than does the replaced codon. Alternatively, or in addition, the different corresponding codon is selected as one that has fewer guanine or cytosine residues than the replaced codon. In a particularly preferred embodiment, the starting nucleic acid sequence is derived, e.g., converted, from an amino acid sequence native to an organism different from the desired host for expression, for example Pseudomonas. The method of the invention also provides a modified, final sequence that is more amplifiable than the starting sequence. In other words, the final sequence is amplified more readily in a full length form, more rapidly or in greater quantity. In another embodiment, the invention is directed to a synthetic nucleic acid sequence having a plurality of codons and encoding a methionine gamma-lyase protein from Pseudomonas putida. As used herein, the phrase "nucleic acid sequence encoding a protein" means that the nucleic acid sequence encodes at least the functional domain of the protein. The sequence has no more than about 95% homology, preferably no more than about 90% homology, more preferably no more than about 85% homology, still more preferably no more than about 80% homology, to a naturally occurring methionine gamma-lyase gene from Pseudomonas putida. At least about 5%, preferably at least about 10%, more preferably at least about 20%, still more preferably at least about 30%, even more preferably at least about 40%, of the codons in the synthetic nucleic acid sequence are different from codons found in the naturally occurring gene. 
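The two percentages above can be computed for a pair of aligned sequences with a short Python sketch. It assumes that "homology" can be read as simple position-by-position nucleotide identity between equal-length sequences, which is an interpretation and not something the text specifies. The function names and the two short sequences are made up for illustration and are not the methionine gamma-lyase gene.

```python
def split_codons(seq):
    """Split a DNA sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def percent_codons_changed(native, synthetic):
    """Percentage of codon positions whose triplet differs."""
    a, b = split_codons(native), split_codons(synthetic)
    if len(a) != len(b):
        raise ValueError("sequences must contain the same number of codons")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def percent_identity(native, synthetic):
    """Percentage of nucleotide positions that are identical (assumed
    stand-in for the 'homology' figure)."""
    if len(native) != len(synthetic):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(native, synthetic)) / len(native)


if __name__ == "__main__":
    native = "ATGGCCCGCCTGTAA"      # ATG GCC CGC CTG TAA (toy example)
    synthetic = "ATGGCTCGTCTGTAA"   # ATG GCT CGT CTG TAA (same amino acids)
    print(percent_codons_changed(native, synthetic))      # prints 40.0
    print(round(percent_identity(native, synthetic), 1))  # prints 86.7
```

In this toy case two of five codons differ while only two of fifteen nucleotides change, which shows why the codon-level and nucleotide-level thresholds are stated separately.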
In one aspect, the codons in the synthetic nucleic acid sequence encode the same amino acids as the codons in the naturally occurring gene. In another aspect, at least one of the codons in the synthetic nucleic acid sequence encodes an amino acid different from the numerically corresponding amino acid found in the naturally occurring sequence. In yet another aspect, at least one of the different codons in the synthetic nucleic acid sequence is in an area of secondary structure in the naturally occurring gene. In another embodiment, the invention is directed to a method of creating a synthetic nucleic acid. The method comprises providing a sense nucleic acid sequence having a 5' end and a 3' end and providing an antisense nucleic acid sequence having a 5' end and a 3' end. Preferably the sense and antisense nucleic acid sequences are between about 10 and about 200 bases, more preferably between about 80 and about 120 bases. The 3' end of the sense sequence has a plurality of bases complementary to a plurality of bases of the 3' end of the antisense sequence, thereby forming an area of overlap. Preferably the area of overlap is at least 6 bases, more preferably at least 10 bases, still more preferably at least 15 bases. The 5' end of the sense sequence extends beyond the 3' end of the antisense sequence, and the 5' end of the antisense sequence extends beyond the 3' end of the sense sequence. The method further comprises annealing the sense and antisense sequences at the area of overlap. A polymerase and free nucleotides are added to the sequences. Said nucleotides may be naturally occurring, i.e., A, T, C, G, or U, or they may be non-natural, e.g., iso-cytosine, iso-guanine, xanthine, and the like. The sequences can be annealed before or after addition of the polymerase and free nucleotides. 
The sequences are extended, wherein the area of overlap serves to prime the extension of the sense and antisense sequences in the 3' direction, forming a double stranded product. The extended sequence can then be amplified. Further, a second step to the method can be added where the double stranded first extension product is separated into an extended sense strand and an extended antisense strand and a second set of sense and antisense nucleic acid sequences are provided having a 5' end and a 3' end. Each has a plurality of bases on its 3' end complementary to a plurality of bases on the 3' end of the extended sense or antisense strand respectively, thereby forming second and third areas of overlap. A polymerase and free nucleotides are added to the sequences and separated strands, wherein the second and third areas of overlap serve to prime a second extension of the sequences and strands that encompasses the sequence of the first sense and antisense nucleic acid sequences and the second sense and antisense nucleic acid sequences. |
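The first extension step described above, in which two oligonucleotides anneal at their complementary 3' ends and each primes synthesis across the other, can be modeled in Python as plain string operations. This is a simplified sketch under stated assumptions: the polymerase is treated as a perfect fill-in, only natural DNA bases are handled, the longest exact 3' overlap is used, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(sense, antisense, min_overlap=6):
    """Longest 3' overlap between the two oligos, both written 5'->3'.

    The overlap must be shorter than either oligo, so that each 5' end
    extends beyond the 3' end of the other strand.
    """
    template = reverse_complement(antisense)   # antisense in sense orientation
    longest = min(len(sense), len(antisense)) - 1
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if sense[-k:] == template[:k]:
            return k
    raise ValueError("no 3' overlap of the required length")


def extend(sense, antisense, min_overlap=6):
    """Return (top, bottom) strands of the filled-in product, each 5'->3'."""
    k = overlap_length(sense, antisense, min_overlap)
    top = sense + reverse_complement(antisense)[k:]
    return top, reverse_complement(top)


if __name__ == "__main__":
    target = "ATGGCTCGTCTGAAAGGTATCTAA"           # made-up 24-base target
    sense = target[:15]                           # first 15 bases
    antisense = reverse_complement(target[9:])    # covers the last 15 bases
    print(overlap_length(sense, antisense))       # 6-base overlap, CTGAAA
    top, bottom = extend(sense, antisense)
    print(top == target)                          # prints True
```

The second round described in the text would repeat the same operation with the extended strands as templates and a further pair of oligonucleotides overlapping their 3' ends.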