
PATENT GRANT DATE October 26, 1999
PATENT TITLE Genome anthologies for harvesting gene variants

PATENT ABSTRACT The present invention relates to the development of collections of a single gene locus from a collection of individuals or organisms, called genome anthologies. The invention describes several novel methods for producing collections of a gene or gene family from multiple individuals or organisms. One method is targeted in vivo cloning. Another is a locus-specific primer extension and exonuclease degradation method.
PATENT FILE DATE December 10, 1997
PATENT REFERENCES CITED Degryse et al. In Vivo Cloning by Homologous Recombination in Yeast Using a Two-Plasmid-Based System. Yeast. 11: 629-640, 1995.
Burke et al. Cloning of Large Segments of Exogenous DNA into Yeast by Means of Artificial Chromosome Vectors. Science. 236: 806-812, May 1987.
Bradshaw et al., (1993) 1st European Science Foundation Conference on Developmental Biology, Kartause Ittingen, Switzerland, Jun. 14-17.
Bradshaw et al., (1995) Nucl. Acids Res. 23:4850-4856.
Bradshaw et al., (1996) Proc. Natl. Acad. Sci. USA 93:2426-2430.
Burgers and Percival, (1987) Analyt. Biochem., 163:391-397.
Campbell et al., (1991) Proc. Natl. Acad. Sci. USA 88:5744-5748.
Campbell et al., (1995) Nucl. Acids Res. 23:3691-3695.
Erickson and Johnston (1993) Genetics 134:151-157.
Frengen et al., (1997) Genetic Analysis: Biomolecular Engineering 14:55-59.
Ketner et al. (1994) Proc. Natl. Acad. Sci. USA 91:6186-6190.
Kogoma, (1997) Microbiol. Mol. Biol. Reviews 61:212-238.
Kouprina et al., (1994) Genomics 21:7-17.
Larionov et al., (1994) Nucl. Acids. Res. 22:4154-4162.
Larionov et al., (1996a) Proc. Natl. Acad. Sci. USA 93:491-496.
Larionov et al., (1996b) Proc. Natl. Acad. Sci. USA 93:13925-13930.
Larionov et al., (1997) Proc. Natl. Acad. Sci. USA 94:7384-7387.
McGonigal et al., (1995) Gene 155:267-271.
Miller et al., (1993) Genes & Dev. 7:933-947.
Orr-Weaver, et al., (1983a) Proc. Natl. Acad. Sci., USA 78:6354-6358.
Orr-Weaver, et al., (1983b) Meth. Enzymol. 101:228-245.
Pavan et al., (1990) Proc. Natl. Acad. Sci. USA 87:1300-1304.
Sikorski and Hieter (1989) Genetics 122:19-27.
Simpson and Huxley (1996) Nucl. Acids. Res. 24:4693-4699.
Spencer et al., (1993) Methods: A Companion to Methods in Enzymology 5:161-175.
van Dijk et al., (1997) Cancer Res 57:3478-3485.
PATENT CLAIMS We claim:

1. A method of generating genome anthologies consisting of the steps of:

a. obtaining a plurality of native genomic DNA samples from different sources;

b. targeting a DNA locus within said DNA samples; and

c. isolating said DNA locus from said DNA samples, thereby producing a genome anthology.

2. The method of claim 1 wherein the plurality of DNA samples is obtained by pooling DNA samples from different sources.

3. The method of claim 1 wherein targeting and isolation steps are carried out simultaneously using targeted in vivo cloning.

4. A method for generating a genome anthology using targeted in vivo cloning comprising the steps of:

a. linearizing a vector, said vector comprising a bacterial replication origin, a bacterial marker gene, a yeast replication origin, a yeast centromere sequence, a yeast marker gene, a unique cloning site and recombinogenic ends homologous to specific regions of a target DNA locus;

b. introducing said linearized vector and a target sequence into yeast cells, said target sequence consisting of native genomic DNA, such that homologous recombination occurs in the yeast cells, said homologous recombination resulting in formation of circular hemizygous clones;

c. shuttling said circular hemizygous clones into bacterial cells; and

d. amplifying hemizygous clones, thereby producing a genome anthology.
--------------------------------------------------------------------------------

PATENT DESCRIPTION FIELD OF THE INVENTION

The present invention relates to the field of genomics and the development of genome anthologies. The present invention also relates to the use of such genome anthologies to screen patients for sensitivity to specific drugs. The present invention also relates to the isolation of naturally occurring gene targets for pharmaceutical development. The present invention also relates to the correlation of genotype to phenotype in specific individual genomes.

BACKGROUND OF THE INVENTION

Analysis of the genetic population structure of any organism at the molecular level requires a thorough understanding of the nature and distribution of DNA sequences among its component individuals and populations. Techniques used in such studies have included analysis of allele frequency data (Bowcock, 1987), restriction fragment length polymorphisms (Botstein et al., 1980) and the discovery of associations among loci for mapping both simple (Collins, 1995) and complex diseases (Lander et al., 1989). Similarly, a wealth of data has emerged from studies on the maintenance and evolution of DNA sequences in specific areas of the Drosophila genome such as Adh (Kreitman, 1983), Xdh (Riley et al., 1989) and amylase (Aquadro et al., 1991). Clearly, a detailed analysis of genomic regions and the genealogy of these genomes across populations, species and genera, using a variety of highly innovative techniques toward the construction of high density haplotypes, together with sequence analysis of specific regions, would yield information not only on genome diversity among populations, their aggregates and species but would also reveal the significance of diversity at the phenotypic level.

The study of haplotype (haploid genotype) diversity has been recognized as an important tool for studying evolutionary lineages among populations (Templeton et al., 1987) as well as for establishing associations and linkage or gametic disequilibria among loci. Since 1989, several methodologies for haplotyping individual genetic markers at specified loci have been investigated using strictly molecular means. The concept of linkage disequilibrium, defined as non-random association of alleles among loci, plays an important role in mapping genes that are valuable in population, anthropological and medical research. In addition, polymorphic short tandem repeat (STR) markers have been employed to obtain informative haplotypes for linkage analysis (Weber & May, 1989; Dubovsky et al., 1995). However, the use of such haplotype systems is compromised by frequent instances of ambiguous linkage phase in a population sample. Although genotyping of pedigrees often allows determination of linkage phase, for many populations of medical and anthropological interest material on informative families is unavailable or inadequate. Furthermore, robust statistical methods to estimate haplotype frequencies (Excoffier & Slatkin, 1995) often mis-identify rare haplotypes and occasionally generate spurious haplotypes (Tishkoff et al., 1996b). Hence, new and accurate molecular methods for generation of haplotypes are urgently needed.
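
The definition of linkage disequilibrium given above can be made concrete with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed methods. It assumes that phased (hemizygous) two-locus haplotypes are already in hand, which is exactly what unphased genotype data cannot guarantee; the function name and the sample data are hypothetical.

```python
from collections import Counter

def linkage_disequilibrium(haplotypes, allele_a="A", allele_b="B"):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one tuple per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)
    # Frequency of the haplotype carrying both reference alleles.
    p_ab = counts[(allele_a, allele_b)] / n
    # Marginal allele frequencies at each locus.
    p_a = sum(c for (a, _), c in counts.items() if a == allele_a) / n
    p_b = sum(c for (_, b), c in counts.items() if b == allele_b) / n
    # D is the departure from random association of alleles.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2

# Hypothetical sample of ten phased chromosomes.
sample = [("A", "B")] * 4 + [("a", "b")] * 4 + [("A", "b")] + [("a", "B")]
d, r2 = linkage_disequilibrium(sample)
```

A value of D near zero indicates random association. When linkage phase is ambiguous, the haplotype counts that feed this calculation must themselves be estimated, which is the source of the errors noted above.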

Methods to isolate genes and specific loci can be grouped into two broad categories: construction of genomic DNA libraries and the polymerase chain reaction (PCR). First, production and maintenance of genomic libraries is not only labor intensive, but also requires at least three-fold over-sampling to meet the odds of recovering the specific locus. Occasional under-representation of specific regions, due to variations in the method of library construction, cloning vectors and the stochasticity associated with biological systems, further increases the uncertainty of recovering a well-defined region of interest. Thus, library construction and screening to study the molecular genetic diversity of a large number of individuals across several populations and species becomes a formidable task.
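
The over-sampling requirement can be illustrated with the standard sampling estimate for random clone libraries (the Clarke-Carbon formula), under which the probability of recovering a given locus at c-fold coverage is approximately 1 - exp(-c). The Python sketch below is an illustration under the assumption of random, unbiased cloning; the function names and the example genome and insert sizes are hypothetical and are not taken from this disclosure.

```python
import math

def recovery_probability(coverage):
    """Probability that a given locus is present at c-fold coverage (Poisson)."""
    return 1.0 - math.exp(-coverage)

def clones_needed(genome_bp, insert_bp, probability):
    """Clones required to include a locus with the given probability.

    Uses N = ln(1 - P) / ln(1 - f), where f is the fraction of the
    genome carried by a single clone.
    """
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))

# Three-fold coverage recovers a given locus roughly 95% of the time.
p3 = recovery_probability(3.0)

# Hypothetical example: 3 Gb genome, 100 kb inserts, 95% confidence.
n = clones_needed(3_000_000_000, 100_000, 0.95)
fold_coverage = n * 100_000 / 3_000_000_000
```

Under these assumptions a 95% chance of recovery corresponds to about three genome equivalents, and any cloning bias of the kind described above raises the requirement further.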

Alternatively, while PCR methods offer rapid and efficient analysis of specific loci, there is a limitation on the size of the sampled region. Current methods can accommodate up to 35 kb using cloned DNA as template (Barnes, 1994), but only 25 kb from complex genomic DNA (Cheng et al., 1994). Additionally, optimizations of conditions for long range PCR of genomic DNA, coupled with the introduction of sequence errors during amplification, pose serious problems in the comparative analysis of DNA sequence variation of a specific region among individuals and populations. Directly cloning the desired region from native genomic DNA would provide an effective alternative to library construction and PCR.

One object of the present invention relates to methods for generating collections of a single genetic locus from various sources.

Another object of the present invention relates to genome anthologies, that is, collections of a specific locus, including, for example, a gene or group of genes from multiple sources.

Another object of the present invention relates to the generation of genome anthologies from all members of a gene family from one source or from multiple sources.

Yet another object of the present invention relates to the use of such genome anthologies in a method for identifying specific haplotyping targets.

A further object of the present invention relates to novel methods for haplotyping individual genetic markers at specified loci.

Yet another object of the present invention relates to harvesting human DNA variants to generate targets for drug discovery.

A further object of the present invention relates to the use of haplotyping to screen individuals for sensitivities to specific drugs or treatment regimes.

Another object of the invention is the development of molecular haplotyping kits for several loci distributed throughout the human genome.

Another object of this invention is to collect multiple variants of a single complete gene from different members of a population, in a manner that not only is efficient, but results in a permanent, replicatable, expressible, fully manipulatable, individually identifiable collection of hemizygous entities.

Another object of this invention is to use genetic variation 1) to enhance the efficacy of therapeutics by customizing such genetic variation for specific population groups, 2) to reduce the costs of developing new drugs, and 3) to increase the chances that a new drug will be successful in clinical trials and, therefore, gain FDA approval.

SUMMARY OF THE INVENTION

The present invention relates to collections of a single genetic locus or loci comprising a gene family isolated from various sources, and methods of generating such genomic collections. These collections, known as genome anthologies, are useful in identifying haplotypes, identifying drug targets and determining sensitivities to reagents based upon genetic determinants and/or genetic variability.

One method for generating genome anthologies employs targeted in vivo cloning ("TIVC"), providing simultaneous genome targeting and locus isolation. This method allows generation of specific haplotyping targets from any number of individuals in a population. In this embodiment, up to 300 kb of DNA can be cloned, permitting ready analysis of linkage disequilibrium as a function of distance, by sampling markers across large hemizygous regions and for recovering entire genes, including regulatory regions many kilobases away from the coding region.

Yet another embodiment of the present invention is to harvest all members of a gene family by means of TIVC. This is accomplished by employing specific "signature" or conserved sequences, common to all gene family members, to retrieve all members of the gene family regardless of their chromosomal location.

Yet another method for generating genome anthologies may be employed. Long range PCR can be used to accommodate as much of a locus as possible. In particular, this method is useful in cases where a very specific DNA region is targeted. This method comprises (i) polymerase chain reaction from a plurality of DNA templates; (ii) cloning of the PCR products into a vector; and (iii) analysis of the variants in isolation for genotyping, expression and sequence variation. Genome anthologies may also be generated by employing a method which combines long range primer extension and exonuclease treatment in the targeting step.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Comparison of a genomic library with a genome anthology. Genomic libraries are collections of all genetic loci present in the genome of an individual.

FIG. 2. Creation of a genome anthology by TIVC. An in vivo cloning vector, pClasper, equipped with sequences that target locations Y and Z in genomic DNA, is constructed and mixed with DNA pooled from individuals of a defined population. The mixture of genomic DNA and vector is used to co-transform yeast. By homologous recombination, the target DNA defined by sites Y and Z is rescued in the vector as hemizygous clones that can be shuttled to bacteria for amplification.

FIG. 3. Molecular haplotyping using a primer extension and exonuclease degradation method. This method generates selected loci from whole genome samples by selective protection from exonuclease degradation of the desired locus. Allele-specific strand extension using a thermostable DNA polymerase generates a double-stranded allele specific product that is not degraded in the presence of exonuclease that digests the remaining unprotected single-stranded genomic DNA. All polymorphic sites within the region extended from the allele-specific primer are also protected.

FIGS. 4A-4C. (A) Monitor cassette for optimizing yeast transformation. The URA3 gene is the target for pClasper equipped with recombinogenic ends complementing the vector backbone. The recombinogenic sequences correspond to sequences of 321 and 627 base pairs flanking the unique XmnI site in the ampicillin resistance gene of pBluescript. Linearization of the plasmid with XmnI exposes the recombinogenic sites for targeting by Clasper. (B) Construction of Clasper recovery vector. The Clasper recovery vector for the URA3 monitor cassette was constructed by cloning PCR products amplified from the pBluescript vector into pClasper. Linearization of Clasper at a unique NruI site between the recombinogenic ends prepares Clasper for recovery of the monitor. (C) Recombination between monitor cassette and Clasper. Linearized Clasper (pClSX) and monitor (pBSU) are mixed and presented to the yeast spheroplasts during the co-transformation event. Recombination takes place between the homologous DNA sequences of pBSU and pClSX. Only recombinants are able to grow on selection medium lacking leucine and uracil.

FIG. 5. Effect of vector:target molar ratio on co-transformation efficiency. Yeast cells were co-transformed in parallel experiments with pClSX (vector) and pBSU (target monitor) using molar ratios as given, starting with 1 μg of pBSU. All transformations were performed in triplicate. Optimal vector:target molar ratios for co-transformations were in the range of 1:8 to 1:16.
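
The molar ratios quoted for FIG. 5 relate DNA mass to molecule number through fragment length. The Python sketch below shows that conversion only as an illustration; it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, and the plasmid lengths used in the example are hypothetical placeholders, since the sizes of pClSX and pBSU are not stated here.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation for dsDNA

def picomoles(mass_ug, length_bp):
    """Convert a mass of double-stranded DNA (micrograms) to picomoles."""
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS)

def vector_mass_ug(target_ug, target_bp, vector_bp, vector_per_target):
    """Mass of vector that gives the requested vector:target molar ratio.

    vector_per_target is the ratio as a fraction, e.g. 1/8 for 1:8.
    """
    target_pmol = picomoles(target_ug, target_bp)
    vector_pmol = target_pmol * vector_per_target
    return vector_pmol * vector_bp * AVG_BP_MASS / 1e6

# Hypothetical lengths: 4,500 bp target plasmid, 11,000 bp vector.
mass_1_to_8 = vector_mass_ug(1.0, 4500, 11000, 1 / 8)
mass_1_to_16 = vector_mass_ug(1.0, 4500, 11000, 1 / 16)
```

Because the per-base-pair mass cancels, the vector mass depends only on the length ratio of the two molecules and the chosen molar ratio.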

FIG. 6. XbaI restriction enzyme digestion of pClSXURA3 recombinants shuttled from yeast to bacteria. DNAs from nine pClSXURA3 yeast recombinants were shuttled to bacteria by electroporation. Plasmid DNAs from bacterial transformants, each representing one of the nine original yeast recombinants, were digested with XbaI to release the 1.5 kb URA3 gene from the remaining 12.6 kb pClSXURA3 recombinant. "-" means non-recombinants from control plates; "M" indicates size marker.

FIG. 7. Co-transformation of yeast strain Y724 with pClSX, pBSU, and complex mixtures of DNA. pClSX and pBSU were mixed at the molar ratios indicated and used to co-transform yeast spheroplasts in the presence and absence of 1 μg total native genomic DNA. The genomic DNA was added to illustrate rescue from a complex pool of DNA. In the experiment using a molar ratio of 1:0.0075 in the presence of genomic DNA, the target (pBSU) was present as approximately 0.1% of the total DNA used to transform the yeast spheroplasts. The result of these experiments was the successful recovery of a low copy target from a complex mixture of vector and total native genomic DNA.

FIG. 8. Construction of a P1/PAC rescue vector (P1RQ). The P1 rescue Clasper, P1RQ, was constructed with recombinogenic sequences that target the P1 or PAC vectors adjacent to the cloned insert. The pCYPAC vector was digested with BglII and BssHII (*), releasing a 7 kb fragment containing the BamHI cloning site, T7 and Sp6 promoter recognition sites and the pUC19 stuffer fragment. The 7 kb BglII/BssHII fragment was cloned into pClasper at the BamHI and AscI sites. To prepare the P1RQ vector for co-transformation, the vector is digested with BamHI (**), releasing the pUC19 stuffer fragment and exposing recombinogenic ends of 1.1 and 2.9 kb in length. P1RQ can be used to rescue the entire cloned insert from any P1 or PAC library clone.

FIG. 9. Amplification products from P1 rescue Clasper (P1RQ) recombinants. Yeast spheroplasts were co-transformed with P1RQ and 4 different P1 clones containing the mouse Cdx-2 gene. Recombinants were selected on leu⁻ plates. Ten colonies from each transformation were picked and analyzed by colony PCR using primers that amplify a 254 bp region of the Cdx-2 gene. Between 50% and 80% of the colonies scored as positives, indicating recombination between P1 and P1RQ. "M" means size marker; "C+" indicates PCR positive control; "C-" is a PCR negative control.

FIG. 10. EcoRI restriction enzyme digestion of P1RQ recombinants shuttled from yeast to bacteria. Seven PCR-positive recombinants were picked for transfer to bacteria. Total yeast DNA was prepared from each colony and directly used to transform E. coli strain DH10B by electroporation. Bacterial transformants were selected on LB plates containing chloramphenicol. Plasmid preps were made and the DNA was digested with EcoRI. Bacterial transformants from the PCR-positive yeast recombinants all gave restriction patterns similar to that of the original Cdx-2 P1 (P1). The similarity in restriction patterns indicates that initial rescue in yeast followed by bacterial transformation and growth resulted in no gross rearrangements and that stable replication was accomplished. "M" is a 1 kb marker.

FIG. 11. Construction of a Clasper vector to generate a genome anthology for the Endothelin-1 gene. Recombinogenic ends of 372 base pairs from the 5' flanking region and 472 base pairs from the 3' flanking regions of the Endothelin-1 gene were cloned into Clasper.

FIG. 12. Colony hybridization of pClEDN transformants. A probe was prepared by amplifying and radiolabeling a ~500 bp fragment from exon 2 of the Endothelin-1 gene using total human genomic DNA as a template. Filters were hybridized at 65°C and washed to a final stringency of 0.2× SSC, 0.1% SDS at 55°C.

FIGS. 13A-13B. (A) Co-transformation with pClEDN and individual or pooled DNAs. Yeast were co-transformed with pClEDN and native genomic DNA from individuals and pooled members of Family 13294. Each bar represents a co-transformation experiment performed in triplicate. The bars shown are for the DNAs giving the highest (individual #131) and lowest (individual #72) number of transformants when used alone, compared with the number of transformants obtained using pooled DNA from individuals. (B) Co-transformation with Hoxc8 Claspers and genomic DNA from transgenic mice. Each bar represents triplicate iterations of co-transformations with either pClC8-18 kb or pClC8-21 kb and genomic DNA from transgenic mice carrying the lacZ-URA3 reporter construct.

FIG. 14. Construction of Endothelin A receptor Clasper. pClENDRA was constructed using recombinogenic ends of 520 and 568 bp, from the 5' and 3' flanking regions, respectively, of the Endothelin A receptor gene.

FIG. 15. Construction of Estrogen receptor Clasper. pClESR was constructed using recombinogenic ends of 518 and 462 bp, from the 5' and 3' flanking regions, respectively, of the Estrogen receptor gene.

FIG. 16. Construction of Hoxc8 Claspers, pClC8-18 kb and pClC8-21 kb. Two Claspers were constructed to recover regions of 18 and 21 kb from native genomic DNA derived from transgenic mice carrying the lacZ-URA3 reporter in the Hoxc8 gene. The Claspers were constructed with identical 650 bp arms anchored at the Hoxc9 gene. pClC8-18 kb had a 565 bp recombinogenic arm from the 5' flanking region of Hoxc6. pClC8-21 kb had a 530 bp recombinogenic arm from a sequence approximately 3 kb downstream from the 565 bp site. Arrowheads indicate the 400 bp probe from the early enhancer region used to analyze recombinants.

FIG. 17. PCR analysis of Hoxc8 lacZ-URA3 recombinants. Transformants were analyzed by colony PCR using primers that amplified a 400 bp region from the Hoxc8 early enhancer region.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to genomic collections of a single locus, called genome anthologies, and to new methods for generating such genomic collections. The present invention provides methods for isolating templates from DNA samples of individuals. These collections of a single locus provide a new resource of information, useful in diagnosing an individual's condition or disease state. The present invention also relates to a new method for molecular haplotyping. Further, the genome anthologies, when contained in expression cell lines, allow correlation between a haplotype and an alteration in the gene's expression levels, or structural changes in the gene product that result in altered function.

One use of a genome anthology is to determine the exact composition and frequency of haplotypes of a given gene in a population. For example, targeted in vivo cloning ("TIVC") is used to recover the gene for the Endothelin A receptor (ENDRA) from a statistical cross-section of the North American population. The haplotype of each member of the ENDRA genome anthology is then determined. The resulting data are valuable for studies on population diversity, anthropological lineage, the significance of diversity and lineage at the phenotypic level, and for establishing control haplotypes and frequencies of "normal" populations for comparison with disease states.

Another use of a genome anthology is in establishing associations between candidate genes and disease states. A genome anthology prepared for a disease candidate gene X, in either an extended family or a population collected for the presence of the disease, can be analyzed to determine the frequency of association of the disease with a specific haplotype of gene X, and in the case of multifactorial disease, the contribution of gene X to the disease.

Another use of genome anthologies is to make risk assessments for disease in populations. Comparisons can be made among anthologies generated from normal and disease population or families to determine the frequency with which the disease haplotype occurs in each population.

Another use of a genome anthology is to determine the specific contributions of haplotypes to disease etiology. The haplotypes of a gene, whose association with a disease has been established, will consist of variation residing in coding or regulatory regions of the gene. Variation in these regions may be critical to the structure and regulation of the gene product and contribute to the aberrant phenotype. Such causal or contributory relationships can be established by 1) analyzing the genome anthology for variation that could potentially affect phenotype, and 2) by developing and expressing the genome anthology in a model system of the disease.

Another use of a genome anthology is to diagnose disease in an affected individual. By recovering disease genes from a patient by TIVC, the patient's haplotypes can be determined and compared to a database of haplotype information prepared from pre-established genome anthologies created for the disease genes.

Another use of a genome anthology is to generate targets for developing therapeutics. Genome anthology targets can be used to prioritize lead compounds or to find lead compounds that have the most pervasive activity in all targets of relevance for a given disease. For example, the constituents of a genome anthology created for the ENDRA gene could be expressed in a mammalian cell line to develop a simple screening assay for drug compounds that bind to the receptor and elicit a specific response. The variation that occurs among the different members of the ENDRA genome anthology may affect the binding affinity of the compound, and/or affect the cellular response to the signal triggered by binding. Correlations can then be made between a specific haplotype for ENDRA and the structure activity relationship (SAR) of a compound or group of compounds.

Another use of a genome anthology is to determine patient sensitivities to specific drugs or treatment regimes. Genetic variability is a determinant of patient response to therapy in terms of both efficacy and side effects. The genome anthology can be used in this manner for both new and existing drugs. For example, by using genome anthologies to correlate a specific haplotype with a disease, and by using the genome anthology as targets for drug screening and development, it is possible to create a prognostic test for customizing therapy based on the patient's genotype. Alternatively, during clinical trials for a drug, a genome anthology can be created for the target gene from all members of the trial group and used to make correlations between genetic variation and treatment response. This information can then be used to determine the proper therapeutic regime for a patient with a given genetic makeup.

"Primers," as the term is used herein, refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH. The primers are preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare amplification products. Preferably, the primers are oligodeoxyribonucleotides at their 3' end. Primers can be entirely composed of DNA nucleotides or may contain PNAs or other nucleotide analogs at their 5' end. The primers must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and use of the method. For diagnostic methods, the primers typically contain at least 10 or more nucleotides. It is also possible to use a primer which has been isolated from a biological source. One such example may be a restriction endonuclease digest of a large nucleic acid molecule.

"Genome anthologies" as the term is used herein, means a collection of hemizygous samples, from a single locus or members of a gene family, from multiple individuals of a defined population (FIG. 1). Genome anthologies are collections of defined loci from multiple individuals of a population selected by a common criterion. The term "locus" as it is used herein refers to any specific region of DNA, a gene, a group of genes or any group of nucleotides defining a DNA region of interest.

Haplotype systems involve multiple markers sufficiently closely linked that their inheritance is correlated and the effects of random genetic drift on them are not independent. Haplotypes may include, but are not limited to, RFLPs (restriction fragment length polymorphisms), microsatellites, short tandem 2-5 bp repeats (Weber & May, 1989), bi-allelic markers, single base polymorphisms and insertions/deletions.

The genome anthologies of the present invention can be generated by the method comprising the steps of: obtaining a plurality of DNA samples; targeting a locus suspected of being in the DNA samples; and isolating the locus which is targeted. The DNA samples may be obtained by pooling DNA samples. Such a pooling step may combine two or more DNA samples from different sources. The sources may, for example, be different in that they are derived from different organs of a single individual, different individuals, different ethnic or nationality groups, different sexes, different populations or different organisms. Generally, any sources of DNA may be pooled for the creation of a genome anthology. "Pooling" comprises the combining of or mixing of the samples.

The targeting step of the instant invention may be carried out by one of several technologies. Many such targeting technologies are sequence determined. For example, long range primer extension may be used. This technique requires knowledge of the sequence flanking the targeted region. Alternatively nuclease digestion may be used to specifically cleave a genomic sequence at points flanking the target region. Other alternatives for the targeting step use other features of the genomic DNA, including but not limited to topology, protein-binding sites and membrane attachment sites. These features may be used to identify and target the sequence to be collected in the genome anthology.

Additionally, targeting may require allele specificity. One method of generating allele specificity uses an allele specific oligonucleotide in combination with a primer extension reaction and exonuclease degradation to generate hemizygous DNA targets.

Further, targeting of the locus may be accomplished by functional means. A locus that conveys a selective advantage when transformed into a cell deficient for that locus will be propagated advantageously. Collections of this locus recovered from the transfected variants can also constitute an anthology.

Generation of genome anthologies by TIVC requires specific targeting of the desired locus and high frequencies of recombination, particularly when pooled DNAs from a number of individual genomes are used. Co-transformation protocols to obtain high levels of recombination can be optimized in various ways.

One method of enhancing the recombination frequency is to supplement the DNA with proteins involved in the recombination event. One such protocol uses Rad51, the yeast homolog of bacterial RecA (Shinohara et al. 1993; Kogoma, 1997). During the homologous recombination process, Rad51 catalyzes homologous pairing and strand exchange between single-stranded DNA and double-stranded DNA. pClasper targeting vectors may be constructed such that the cut ends of the recombinogenic arms are rendered single-stranded for a portion of their length by exonuclease digestion. Prior to co-transformation of yeast, the pClasper with single-stranded recombinogenic ends is mixed with the Rad51 protein. Rad51 protein binds to the single-stranded portions of the recombinogenic arms and upon co-transformation of yeast, facilitates recombination with the target DNA sequence.

Another example method for increasing the recombination frequency pre-targets the desired locus and unravels and stabilizes that area of DNA to promote access by pClasper. Jankowsky et al. (1997) have shown that the binding of short PNAs or oligonucleotides to regions flanking the hammerhead catalytic sequence in RNA molecules enhances the activity of ribozymes. Presumably, binding of the PNA or oligomer alters higher order structures in the RNA that hinder the alignment and annealing of the ribozyme to the substrate. In a similar fashion for TIVC, short oligonucleotides or PNAs (9-12 mers) are synthesized complementary to sequences of the target locus that flank the recombinogenic sites. The annealed oligonucleotides or PNAs can exert a localized effect to constrain the genomic DNA in a more open, untangled manner. Such a configuration can make the recombinogenic sites more accessible for pClasper.

Yet another method involves modifications to the autonomous replication sequences (ARS) in pClasper. Human genomic DNA has ARS-like sequences that may be capable of providing the same function in yeast (Stinchcomb et al., 1980). Deleting this sequence from the vector may improve recombination frequencies by reducing the background caused by recircularization and replication of the vector in the absence of recombination. Genomic DNA targets that contain an ARS-like sequence can provide complementation for the deleted ARS sequence in pClasper. By targeted mutagenesis, a version of pClasper has been produced with a non-functional ARS sequence to take advantage of this possibility. In addition, by optimizing the co-transformation procedures as described previously (Example 1), a very low background has been achieved.

The isolation step of the present method may be accomplished by several methods known in the art, including, but not limited to, standard cloning and amplification techniques. For example, amplification techniques include PCR (Mullis U.S. Pat. Nos. 4,683,202 and 4,683,795), LCR (EP Pat. No. 320,308), SDA (U.S. Pat. No. 5,455,166), and NASBA (U.S. Pat. Nos. 5,130,238; 5,480,784; and 5,399,491). Cloning, as employed in the isolation step of the present invention, is described in detail in Sambrook et al. (1989).

Optionally, these isolated collections of a specific locus may be separated and individual variants may be analyzed.

A particularly preferred embodiment combines the targeting and isolating steps of the method. In this embodiment of the present invention, a method for generating genome anthologies comprises targeted in vivo cloning in yeast using a yeast bacteria shuttle vector (FIG. 2). Up to 300 kb of DNA can be cloned by the targeted in vivo cloning method of the present invention, permitting ready analysis of linkage disequilibrium as a function of distance by sampling markers across large hemizygous regions. Genome anthologies produced by the TIVC method are cloned hemizygous genomic DNA samples. The cloning aspect of the present invention facilitates the preservation of genome anthologies as permanent collections, as well as the production of significant quantities of DNA from each sample. The yeast bacteria shuttle vector has recombinogenic ends homologous to specific regions in uncloned genomic DNA that flank the segment to be haplotyped and/or targeted. Yeast cells are co-transformed with the linearized vector and uncloned genomic DNA from one or more individual samples to be haplotyped. Preferably, pools of DNA from several individuals are used as the target template. By homologous recombination in yeast, the target region is rescued in the yeast bacteria shuttle vector as hemizygous clones that can be shuttled to bacteria for amplification. Significant quantities of hemizygous DNA from both haplotypes of each individual are thus made available for genotyping of all polymorphisms present in the haplotype.

One yeast bacteria shuttle vector used in TIVC comprises a yeast replication origin, a yeast selection marker gene, a bacterial replication origin, a bacterial selection marker gene and at least one unique cloning site. The yeast replication origin is used for replication and propagation in yeast. Examples of such elements are any of the known yeast autonomous replicating sequences (ARS) or centromeres (CEN). The yeast selection marker gene may be any known gene capable of being selected for; such genes include, but are not limited to, genes encoding auxotrophic markers, such as LEU2, HIS3, TRP1, URA3, ADE2 and LYS2. Alternatively, genes encoding a protein conferring drug resistance on a host cell can be used as a yeast selection marker. Such genes include, but are not limited to, CAN1 and CYH2. The bacterial replication origin is preferably selected from those replication origins used for stable bacterial replication of large DNA inserts, including the F factor and the P1 replicon. Other bacterial replicons may be used for smaller DNA inserts. Any of the many known bacterial selection marker genes may be used; examples of bacterial selection marker genes include genes conferring bacterial resistance to antibiotics, such as kanamycin, ampicillin, tetracycline, Zeocin, neomycin, hygromycin and chloramphenicol. Other antibiotic resistance genes are encompassed herein and are known to those skilled in the art.

One such yeast bacteria shuttle vector useful in the TIVC embodiment of the present invention combines several vector features: 1) site-specific targeting, 2) yeast to bacteria (and bacteria to yeast) shuttle capability, 3) interchangeable recombinogenic ends, 4) large insert capacity, and 5) near universal compatibility with large insert cloning systems in bacteria and yeast. One such vector is known as pCLASPER, which was first described by Ruddle (Bradshaw et al., 1995). The present invention may use a yeast bacteria shuttle vector, such as pCLASPER, to shuttle uncloned genomic DNA into bacteria for further analysis, such as haplotyping.

DNA transformed into yeast integrates into the yeast genome almost exclusively by homologous recombination. Free ends of DNA are highly recombinogenic, and recombination frequencies increase by several orders of magnitude when double-stranded breaks are made within homologous sequences carried on a plasmid (Orr-Weaver et al., 1981; 1983). When plasmids are linearized with a partial loss of homologous sequence information, the sequence gap is repaired by recombination with chromosomal sequences. In cases where plasmids contain more than one possible recombination site, gap-repair can be site-directed by linearizing the plasmid within a specific sequence. Selectable markers located on the plasmid permit high-frequency rescue of the gap-repaired plasmid. When a target gene in genomic DNA and the TIVC vector equipped with target recognition sequences are used to co-transform yeast, recombination preferentially takes place between the target and vector.

As part of a general experimental scheme (FIG. 2), the general procedures of Bradshaw et al. (1995) from recombinogenic end cloning to transfer of circularized recombinants into E. coli are followed. Standard PCR and molecular cloning techniques are used for preparing recombinogenic ends in pCLASPER. The pCLASPER construct is linearized at a unique site between the end sequences and used to co-transform yeast with the target DNA.
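The outcome of the co-transformation step can be pictured with a toy string model, offered here only as a hedged sketch: two recombinogenic ends are located in a genomic sequence, and the segment they bracket is what would be rescued into the linearized vector by gap repair. The function name `rescue_by_gap_repair`, the exact-match search, and the made-up sequences are assumptions for illustration. Real recombination tolerates mismatches and depends on arm length, neither of which is modeled.

```python
from typing import Optional


def rescue_by_gap_repair(genomic: str, left_arm: str, right_arm: str) -> Optional[str]:
    """Toy model: return the genomic segment spanning both recombinogenic
    ends (arms included), or None if either arm is absent."""
    left = genomic.find(left_arm)
    if left < 0:
        return None  # no homology to the first arm: no recombinant
    right = genomic.find(right_arm, left + len(left_arm))
    if right < 0:
        return None  # second arm not found downstream of the first
    return genomic[left:right + len(right_arm)]


# Pooled target DNA: two made-up alleles of one locus plus an unrelated fragment.
pool = [
    "TTTTACGTACGTGGGCCCTTGGAACCAAAA",  # allele 1
    "TTTTACGTACGTGGGTCCTTGGAACCAAAA",  # allele 2 (one base differs)
    "CCCCCCCCCCCCCCCCCCCC",            # unrelated genomic DNA
]
clones = [rescue_by_gap_repair(g, "ACGTACGT", "TTGGAACC") for g in pool]
anthology = [c for c in clones if c is not None]
print(anthology)
```

In this picture each rescued string stands for one hemizygous clone, so a pool of genomes yields a collection of single-allele inserts for the same locus.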

The TIVC method for creating genome anthologies relies on the high frequency of homologous recombination in yeast, for which an extensive body of knowledge exists. While the frequency of homologous recombination is much lower in mammalian systems, there is a great deal of ongoing research to determine the factors that control this activity. Methods to enhance and control the event frequency in mammalian cells are becoming available. Current knowledge, based on research on recombination in stem cells (Mansour et al., 1988) and the development of mammalian artificial chromosomes (Huxley, 1997; Rosenfield, 1997; Harrington et al., 1997), makes it possible to create genome anthologies directly in human cells, thus eliminating the steps of isolating the human genomic DNA, mixing with a pClasper rescue vector, and co-transforming yeast. The method for TIVC in human cells (or, with small modifications, any mammalian cells) would include a pClasper-like vector capable of stable, selectable, and autonomous replication in mammalian cells. The vector would include 1) human centromeric and telomeric sequences, 2) a selectable marker gene such as the neomycin resistance gene, and 3) recombinogenic ends for the target gene. Replication can be satisfied by sequences present in the genomic DNA to be rescued (i.e., ARS-like sequences). Optional capability of replication and selection in a bacterial host may be employed, primarily for ease in rescue and manipulation of the recombinant. Alternatively, it may be possible to modify vector types to be used for expression of genes in mammalian cells.

To perform TIVC in human cells, lymphocytes collected from a blood sample can be transformed with the human pClasper and recombination can take place between the human pClasper and the lymphocyte chromosomal DNA at the target site. The lymphocytes containing the recombinants are selected in tissue culture medium supplemented with neomycin.

In another embodiment of the present invention, a method for generating genome anthologies comprises a long-range primer extension step and an exonuclease degradation of single stranded DNA step, in multiply heterozygous genomic DNA. The primer extension step of the method may be primed with oligonucleotides. The oligonucleotide serves as an extension primer for DNA synthesis by a thermostable DNA polymerase starting from a polymorphic locus and extending to other polymorphic loci downstream or upstream. The resulting double stranded product is protected from exonuclease hydrolysis.

This exonuclease-based method generates selected loci from whole genome samples by selective protection of the desired locus from exonuclease hydrolysis. See FIG. 3 for a detailed schematic diagram of this method. A primer is designed to flank the locus. The genomic DNA is denatured and the primer allowed to anneal to its target by lowering the temperature. After annealing, the primer is extended at least once with a thermostable DNA polymerase at 72° C., generating a double-stranded allele-specific extension product. DNA strands without the primer annealing sequence remain single stranded. Treatment with the exonuclease degrades single-stranded DNA while double-stranded DNA remains intact. If the extension product reaches a distal variable site (the M locus in FIG. 3), the variable site (the M allele in FIG. 3) is protected. The protected extension products are cloned into a vector to create an anthology. Genotyping of individual clones may then be performed by any of several known methods for the distal variable site with the expectation that only a single allele will be found. The result of the assay is thus a genotype at the B and M loci as well as the unequivocal haplotypes for the combination.
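The selection logic of this method can be sketched with a deliberately simplified string model. The following Python is an illustration under stated assumptions, not the disclosed procedure: each template is written in the sense of the newly synthesized strand, so the primer appears as a substring and extension proceeds to the right; `max_extension` is a hypothetical cap on product length; and all sequences are invented.

```python
def protected_products(templates, primer, max_extension):
    """Toy model of primer extension followed by exonuclease treatment.

    Templates lacking the primer site stay single stranded and are treated
    as degraded; the others yield a protected product that starts at the
    primer and covers at most `max_extension` bases."""
    products = []
    for template in templates:
        start = template.find(primer)
        if start < 0:
            continue  # no annealing: strand remains single stranded, degraded
        products.append(template[start:start + max_extension])
    return products


# Two made-up homologs differing at a proximal site (inside the primer)
# and at a distal variable site located 9 bases after the primer start.
homolog_1 = "CCGATATTTTTCGG"  # proximal allele A, distal allele C
homolog_2 = "CCGATGTTTTTTGG"  # proximal allele G, distal allele T

long_products = protected_products([homolog_1, homolog_2], "GATA", 10)
short_products = protected_products([homolog_1, homolog_2], "GATA", 5)
print(long_products)   # product spans the distal site of homolog_1 only
print(short_products)  # extension too short to reach the distal site
```

Because only the homolog carrying the primer site is protected, any distal allele present in the product is in phase with the proximal allele matched by the primer.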

In an alternative method for generating genome anthologies, one may use affinity capture techniques. In this embodiment, genomic DNA from an individual is digested by a restriction enzyme (for example, an infrequently cutting enzyme such as an 8 bp cutter or larger, or a methylation-sensitive enzyme) and applied to an affinity matrix at a specific temperature and annealing buffer composition. The affinity matrix may consist of a specific DNA oligonucleotide that anneals to the target site in the genomic DNA. Non-binding genomic DNA is washed away, leaving the target sequence attached to the matrix. The target sequence is then eluted from the matrix and collected. This material represents both strands of the target sequence. The collected material may be cloned for preservation purposes, not necessarily for accumulation of quantities or isolation of molecules.
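The digest-then-capture idea can be illustrated in silico. The sketch below is an assumption-laden toy, not a laboratory protocol: it uses the NotI recognition sequence GCGGCCGC (cut after the second base on the top strand) as one example of an 8 bp cutter, models only the top strand, and treats "capture" as an exact substring match to the probe's target site. The function names and sequences are hypothetical.

```python
def digest(seq: str, site: str = "GCGGCCGC", cut_offset: int = 2):
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the
    site, and return the resulting top-strand fragments in order."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def capture(fragments, probe_target: str):
    """Keep fragments containing the probe's target site; all other
    fragments are treated as washed away."""
    return [f for f in fragments if probe_target in f]


genome = "AAAA" + "GCGGCCGC" + "TTTACGTTT" + "GCGGCCGC" + "CCCC"
pieces = digest(genome)
print(pieces)
print(capture(pieces, "TACG"))
```

A rarer cutter gives longer fragments, so a single probe can retain a larger stretch of the locus around its target site.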

In a similar method, the oligonucleotide is labelled with a biotin moiety, annealed to the genomic DNA target, and applied to a streptavidin affinity matrix. Biotin/streptavidin is an obvious example; other combinations should be possible.

Instead of DNA oligonucleotides, the targeting agent may consist of Peptide Nucleic Acids (PNA). PNAs are DNA analogs with the sugar/phosphate backbone replaced by a peptide-like backbone. PNA/DNA duplexes have a higher dissociation temperature than DNA/DNA or DNA/RNA duplexes, allowing greater stringency for specificity in annealing and purification conditions.

In one embodiment of the present invention, normal genotypic variation is correlated with subtle phenotypic changes, and the variation found is utilized for diagnosis and therapeutics (i.e., pharmacogenomics). By providing clones which can be efficiently manipulated in a number of ways and with different techniques, the present invention provides sufficient material for a variety of tasks over long periods of time. Inherent variation along with unambiguous phase information at any site within the locus is discoverable and identifiable, without contamination or interaction with other regions.
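The value of unambiguous phase information can be shown with a small worked example. The Python sketch below, which is illustrative only, enumerates the haplotype pairs consistent with an unphased diploid genotype and then shows how observing single-allele clones selects one pair. Function names are hypothetical, and alleles are assumed to be single characters.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """genotypes: one (allele, allele) tuple per site, phase unknown.
    Return every distinct pair of haplotypes consistent with them."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


def resolve_with_clones(pairs, clone_haplotypes):
    """Keep only the pairs whose haplotypes match those seen in clones."""
    observed = set(clone_haplotypes)
    return [p for p in pairs if set(p) == observed]


unphased = [("A", "G"), ("C", "T")]          # two heterozygous sites
candidates = possible_haplotype_pairs(unphased)
print(candidates)                            # two candidate pairs
print(resolve_with_clones(candidates, ["AT", "GC"]))
```

With n heterozygous sites there are 2^(n-1) candidate pairs from genotype data alone, whereas hemizygous clones read out each haplotype directly.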

In another embodiment of the present invention, natural gene variants are harvested to generate a genome anthology. DNA from individuals is extracted and collected in a genome anthology for a given locus which can be utilized in the pharmaceutical development process. Each of the gene variants is a naturally occurring product that can then be manipulated as a target for pharmaceutical development.

Another embodiment of the present invention relates to the expression of the gene variants in an expression system to obtain the protein products that correspond to the genes. The variants of the gene, now manifested as isoforms, have different properties, whether in catalysis, agonist binding, or target adhesion, as well as general structural properties, which render them specific targets for pharmaceutical development.

In yet another embodiment, the gene variants are cloned in reporter systems. The reporter is then utilized to monitor the expression of the gene variant. Different gene variants may be correlated with variable expression levels, and not with the structure of the protein product itself. Therefore, the gene variants in the anthology may also comprise collections of variable regulatory sequences, enhancers, and promoters that may be of importance in pharmaceutical development. For instance, different gene variants may respond to a given suppressor compound in different ways. Some cell lines containing gene variants that are hypersensitive to the suppressor may be down-regulated, whereas lines with less sensitive variants will maintain expression levels close to normal.

Any gene or genomic locus may be targeted for a genome anthology. Typically, the locus selected may be up to 300 kb in size. However, loci of greater than 300 kb are also possible. The criterion for selecting a locus will vary from one anthology to the next. Below, several example target regions are set forth. These examples are not intended to limit the scope of the present invention but rather serve as illustrative examples of regions which can be selected. The skilled artisan will recognize that any locus may be targeted for a genome anthology and hence is encompassed by the present invention.

All publications, patents and articles referred to within the specification are herewith incorporated in toto, by reference into the application. The following examples are presented to illustrate the present invention but are in no way to be construed as limitations on the scope of the invention. One skilled in the art will readily recognize other permutations within the purview of the invention.

PATENT EXAMPLES Available on request
PATENT PHOTOCOPY Available on request
