ORIGIN OF REPLICATION COMPLEX GENES - COLD SPRING HARBOR LAB

Title:

ORIGIN OF REPLICATION COMPLEX GENES

Document Type and Number:

WIPO Patent Application WO/1996/040977

Abstract:

Origin of Replication Complex (ORC) genes, nucleic acids which encode ORC proteins, hybridization reagents, probes and primers capable of hybridizing with ORC genes, and methods for screening chemical libraries for lead compounds for pharmacological agents useful in the diagnosis or treatment of disease associated with undesirable cell growth are provided. An exemplary screen involves forming a mixture comprising a recombinant ORC protein, a natural intracellular ORC protein binding target, and a candidate pharmacological agent; incubating the mixture under conditions whereby, but for the presence of said candidate pharmacological agent, said ORC protein selectively binds said binding target; and detecting the presence or absence of specific binding of said ORC protein to said binding target.

Inventors:

STILLMAN BRUCE W
BELL STEPHEN P
KOBAYASHI RYUJI
RINE JASPER
FOSS MARGIT
MCNALLY FRANCIS J
LAURENSON PATRICIA
HERSKOWITZ IRA
LI JOACHIM J

Application Number:

PCT/US1996/009403

Publication Date:

December 19, 1996

Filing Date:

June 07, 1996

Assignee:

COLD SPRING HARBOR LAB (US)
UNIV CALIFORNIA (US)

International Classes:

C12N15/09; A61K38/00; A61K48/00; A61P31/12; A61P35/00; C07H21/04; C07K14/39; C07K14/395; C07K14/47; C12Q1/68; G01N33/94; (IPC1-7): C12Q1/00; C07H21/00; C07H21/04; C12N15/00; C12P21/06

Other References:

DATABASE EMBL, INTELLIGENETICS, INC., August 1994, Accession #S48333, BOWMAN et al., "Hypothetical Protein - Yeast (Saccharomyces Cerevisiae)".
MOL. BIOL. CELL., June 1995, Vol. 6, LOO et al., "The Origin Recognition Complex in Silencing, Cell Cycle Progression and DNA Replication", pages 741-756.
SCIENCE, 17 December 1993, Vol. 262, BELL et al., "Yeast Origin Recognition Complex Functions in Transcription Silencing and DNA Replication", pages 1844-1849.
NATURE, 14 May 1992, Vol. 357, BELL et al., "ATP-Dependent Recognition of Eukaryotic Origins of DNA Replication by a Multiprotein Complex", pages 128-134.
BIOL. ABSTR./RRM, Vol. 46, No. 8, May 1994, Abstract No. 124959 (MT-239), WEEKS et al., "Differential Effects of Heavy Metal Ions on Mammalian DNA Replication".
CELL, 02 June 1995, Vol. 81, LIANG et al., "ORC and Cdc6p Interact and Determine the Frequency of Initiation of DNA Replication in the Genome", pages 667-676.
CELL, 17 November 1995, Vol. 83, BELL et al., "The Multidomain Structure of Orc1p Reveals Similarity to Regulators of DNA Replication and Transcriptional Silencing", pages 563-568.
PROC. NATL. ACAD. SCI. U.S.A., March 1995, Vol. 92, RAO et al., "The Origin Recognition Complex Interacts with a Bipartite DNA Binding Site within Yeast Replicators", pages 2224-2228.

Claims:

WHAT IS CLAIMED IS:

1.	An isolated nucleic acid encoding an origin of replication (ORC) protein, said ORC protein selected from the group consisting of K. lactis, S. pombe and human ORC1 and A. thaliana, C. elegans and human ORC2.

2.	An isolated nucleic acid according to claim 1, wherein said ORC protein is a human ORC.

3.	An isolated recombinant nucleic acid consisting of a natural ORC gene, gene transcript, or complement thereof, selected from the group consisting of K. lactis, S. pombe and human ORC1 and A. thaliana, C. elegans and human ORC2, directly joined to a nucleotide sequence not naturally directly joined to said gene, gene transcript or complement thereof.

4.	An isolated nucleic acid according to claim 1, wherein said ORC gene, gene transcript, or complement thereof, is a human ORC gene, gene transcript or complement thereof.

5.	An isolated origin of replication (ORC) hybridization probe comprising an ORC gene, gene transcript, or complement thereof, sequence contained in a natural ORC gene, gene transcript, or complement thereof, selected from the group consisting of K. lactis, S. pombe and human ORC1 and A. thaliana, C. elegans and human ORC2, capable of specifically hybridizing with the corresponding said natural ORC gene in the presence of the corresponding gene, gene transcript, or complement thereof, from S. cerevisiae.

6.	An isolated hybridization probe according to claim 3, wherein said ORC gene, gene transcript, or complement thereof, is a human ORC gene, gene transcript, or complement thereof.

7.	An isolated origin of replication (ORC) protein selected from the group consisting of K. lactis, S. pombe and human ORC1 and A. thaliana, C. elegans and human ORC2.

8.	An isolated protein according to claim 5, wherein said protein is a human ORC protein.

9.	A method of screening for lead pharmaceuticals, said method comprising the steps of: forming a mixture comprising a recombinant origin of replication (ORC) protein expressed from an isolated nucleic acid according to claim 1, 2, 3 or 4, a natural intracellular ORC protein binding target capable of specifically binding said ORC protein, and a candidate pharmacological agent; incubating said mixture under conditions whereby, but for the presence of said candidate pharmacological agent, said ORC protein selectively binds said binding target at a first binding affinity; and detecting a second binding affinity of said ORC protein to said binding target in the presence of said agent, wherein a difference between said first and second binding affinity indicates that said agent is a lead pharmaceutical agent.

10.	A method of modulating ORC gene expression in a cell, said method comprising contacting said cell with a hybridization probe according to claim 3 or 4.

11.	A method according to claim 9, wherein said cell is a human cell and said ORC protein is expressed from an isolated nucleic acid according to claim 2 or 4.

Description:

Origin of Replication Complex Genes

INTRODUCTION

Field of the Invention

The field of this invention is proteins and genes involved in replication and their use in drug screening.

Background

The identification of new pharmaceuticals is a multibillion dollar industry. The goal of therapeutic intervention is frequently to control cell growth, whether the cell be a host cell (e.g. a cancer cell) or a foreign cell (e.g. an infectious pathogen). Cellular components involved in the initiation of DNA synthesis have provided proven targets for therapeutic intervention to control cell growth. Such targets find immediate industrial application in the screening of chemical libraries for inhibitors of cellular replication. Study of the control and regulation of DNA synthesis in the yeast Saccharomyces cerevisiae has identified a multiprotein complex, the origin recognition complex (ORC), which is essential for DNA replication (Bell and Stillman, 1992). Disclosed herein are ORC genes and proteins from a number of representative animal species.

Relevant Literature

A multi-protein complex that recognizes cellular origins of DNA replication was reported in Bell and Stillman (1992) Nature 357, 128-134. ORC genes have been reported in Micklem et al. (1993) Nature 366, 87-89, Foss et al. (1993) Science 262, 1838-1844, Li and Herskowitz (1993) Science 262, 1870-1874, Bell et al. (1993) Science 262, 1844-1849 and Liang, Weinreich and Stillman (1995) Cell 81, 667-676.

SUMMARY OF THE INVENTION

The invention provides methods and compositions relating to Origin of Replication Complex (ORC) genes. The compositions include nucleic acids which encode ORC proteins and hybridization reagents, probes and primers capable of hybridizing with ORC genes. The invention includes methods for screening chemical libraries for lead compounds for pharmacological agents useful in the diagnosis or treatment of disease associated with undesirable cell growth. In one embodiment, the methods involve (1) forming a mixture comprising a recombinant ORC protein, a natural intracellular ORC protein binding target, and a candidate pharmacological agent; (2) incubating the mixture under conditions whereby, but for the presence of said candidate pharmacological agent, said ORC protein selectively binds said binding target; and (3) detecting the presence or absence of specific binding of said ORC protein to said binding target, wherein the absence of said selective binding indicates that said candidate pharmacological agent is a lead compound for a pharmacological agent capable of disrupting ORC protein function and inhibiting cell growth.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides methods and compositions relating to the eukaryotic origin of replication complex. The complex comprises six proteins which are highly conserved across eukaryotes. The nucleotide sequences of cDNAs of natural transcripts encoding K. lactis, S. pombe and human ORC1 are shown as SEQ ID NOS:1, 3 and 5, respectively; and the full corresponding conceptual translates of these cDNAs are shown as SEQ ID NOS:2, 4 and 6. The nucleotide sequences of cDNAs of natural transcripts encoding A. thaliana, C. elegans and human ORC2 are shown as SEQ ID NOS:7, 9 and 11, respectively; and the full corresponding conceptual translates of these cDNAs are shown as SEQ ID NOS:8, 10 and 12.

The subject ORC proteins of the invention may be incomplete translates of the cDNA sequences or deletion mutants of the corresponding conceptual translates, which translates or deletion mutants have the ORC binding activity and specificity described herein. The subject ORC proteins are isolated, partially pure or pure and are typically recombinantly produced. An "isolated" protein, for example, is unaccompanied by at least some of the material with which it is associated in its natural state and constitutes at least about 0.5%, preferably at least about 2%, and more preferably at least about 5% by weight of the total protein in a given sample; a partially pure protein constitutes at least about 10%, preferably at least about 30%, and more preferably at least about 60% by weight of the total protein in a given sample; and a pure protein constitutes at least about 70%, preferably at least about 90%, and more preferably at least about 95% by weight of the total protein in a given sample. A wide variety of molecular and biochemical methods are available for generating and expressing the subject compositions; see e.g. Molecular Cloning, A Laboratory Manual (Sambrook, et al., Cold Spring Harbor Laboratory), Current Protocols in Molecular Biology (Eds. Ausubel, et al., Greene Publ. Assoc., Wiley-Interscience, NY) or methods that are otherwise known in the art.

The invention provides ORC-specific binding agents including natural intracellular binding targets such as ori sites, other ORC proteins, etc. and methods of identifying and making such agents, and their use in diagnosis, therapy and pharmaceutical development. For example, ORC-specific agents, especially agents which modulate ORC function, are useful in a variety of diagnostic and therapeutic applications, especially where disease is associated with excessive cell growth. Novel ORC-specific binding agents include ORC-specific antibodies and other natural intracellular binding agents identified with assays such as one- and two-hybrid screens, non-natural intracellular binding agents identified in screens of chemical libraries, etc.
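The weight-percent floors stated above for isolated, partially pure and pure protein can be sketched as a simple lookup. This is only an illustration: the function name and the "below the isolated floor" label are hypothetical, the text says "at least about" so the cut-offs are approximate, and the sketch ignores the further requirement that an isolated protein be unaccompanied by some of its natural material. It returns the highest grade whose broadest floor is met.

```python
# Broadest weight-percent floors for protein, as stated in the description.
# Nucleic acids use a different "pure" floor and are not covered here.
PROTEIN_GRADE_FLOORS = [
    ("pure", 70.0),
    ("partially pure", 10.0),
    ("isolated", 0.5),
]


def protein_grade(weight_percent: float) -> str:
    """Return the highest grade whose floor is met (hypothetical helper)."""
    if not 0.0 <= weight_percent <= 100.0:
        raise ValueError("weight_percent must be between 0 and 100")
    for grade, floor in PROTEIN_GRADE_FLOORS:
        if weight_percent >= floor:
            return grade
    return "below the isolated floor"


if __name__ == "__main__":
    for value in (0.2, 3.0, 45.0, 92.0):
        print(f"{value:5.1f}% of total protein: {protein_grade(value)}")
```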

Generally, ORC-specificity of the binding agent is shown by binding equilibrium constants. Such agents are capable of selectively binding an ORC, i.e. with an equilibrium constant of at least about 10⁷ M⁻¹, preferably at least about 10⁸ M⁻¹, more preferably at least about 10⁹ M⁻¹. A wide variety of cell-based and cell-free assays may be used to demonstrate ORC-specific binding; preferred are rapid in vitro, cell-free assays such as mediating or inhibiting ORC-protein (e.g. ORC-ORC) binding, gel shift assays, immunoassays, etc.
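Because the equilibrium constants above carry units of M⁻¹, they read as association constants. A minimal sketch, assuming simple 1:1 binding so that the dissociation constant is the reciprocal of the association constant, converts the three stated floors into the more familiar nanomolar scale. The function name is hypothetical.

```python
def dissociation_constant_molar(association_constant_per_molar: float) -> float:
    """Kd = 1 / Ka, assuming a simple 1:1 binding equilibrium."""
    if association_constant_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / association_constant_per_molar


if __name__ == "__main__":
    # The three floors named in the text, in M^-1.
    for ka in (1e7, 1e8, 1e9):
        kd_nanomolar = dissociation_constant_molar(ka) * 1e9
        print(f"Ka = {ka:.0e} M^-1 corresponds to Kd of about {kd_nanomolar:g} nM")
```

Under this assumption a higher association constant means a lower dissociation constant, so the "more preferably" floor describes the tightest binding of the three.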

The invention also provides nucleic acids encoding the subject proteins, which nucleic acids may be part of ORC-expression vectors and may be incorporated into recombinant cells for expression and screening, transgenic animals for functional studies (e.g. the efficacy of candidate drugs for disease associated with expression of an ORC), etc., and ORC-specific hybridization probes comprising an ORC-specific sequence, including replication/amplification primers. The hybridization probes contain a sequence common or complementary to the corresponding ORC gene sufficient to make the probe capable of specifically hybridizing to the corresponding ORC. Hybridization probes having in excess of 50 continuous bases of ORC sequence are generally capable of hybridizing to the corresponding ORC cDNA under stringency conditions characterized by a hybridization buffer comprising 0.9 M saline/0.09 M sodium citrate (SSC) buffer at a temperature of 37 °C and remaining bound when subject to washing with the SSC buffer at 37 °C; and preferably in a hybridization buffer comprising 20% formamide in 0.9 M saline/0.09 M sodium citrate (SSC) buffer at a temperature of 42 °C and remaining bound when subject to washing with 0.2X SSC buffer at 42 °C.
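The hybridization buffer above is given in molar terms while the preferred wash is given as a fold-strength of SSC. A minimal sketch, assuming the conventional definition of 1X SSC as 0.15 M sodium chloride with 0.015 M sodium citrate (a convention the text itself does not state), relates the two. The function names are hypothetical.

```python
# Conventional 1X SSC composition (assumption; not stated in the description).
ONE_X_NACL_MOLAR = 0.15
ONE_X_CITRATE_MOLAR = 0.015


def ssc_fold(nacl_molar: float, citrate_molar: float, tolerance: float = 1e-6) -> float:
    """Return the X-strength of an SSC buffer, checking both salts agree."""
    fold_from_nacl = nacl_molar / ONE_X_NACL_MOLAR
    fold_from_citrate = citrate_molar / ONE_X_CITRATE_MOLAR
    if abs(fold_from_nacl - fold_from_citrate) > tolerance:
        raise ValueError("NaCl and citrate concentrations imply different strengths")
    return fold_from_nacl


def ssc_molarities(fold: float) -> tuple:
    """Return (NaCl M, sodium citrate M) for a given X-strength."""
    return fold * ONE_X_NACL_MOLAR, fold * ONE_X_CITRATE_MOLAR


if __name__ == "__main__":
    print(f"Hybridization buffer strength: {ssc_fold(0.9, 0.09):g}X SSC")
    nacl, citrate = ssc_molarities(0.2)
    print(f"0.2X SSC wash: {nacl:g} M NaCl, {citrate:g} M sodium citrate")
```

On that convention the hybridization buffer corresponds to 6X SSC, so the preferred 0.2X wash is a thirty-fold reduction in salt.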

The subject nucleic acids are isolated, meaning they comprise a sequence joined to a nucleotide other than that which it is joined to on a natural chromosome, and usually constitute at least about 0.5%, preferably at least about 2%, and more preferably at least about 5% by weight of total nucleic acid present in a given fraction. A partially pure nucleic acid constitutes at least about 10%, preferably at least about 30%, and more preferably at least about 60% by weight of total nucleic acid present in a given fraction. A pure nucleic acid constitutes at least about 80%, preferably at least about 90%, and more preferably at least about 95% by weight of total nucleic acid present in a given fraction. The subject nucleic acids find a wide variety of applications including use as translatable transcripts, hybridization probes, PCR primers, therapeutic nucleic acids, etc.; use in detecting the presence of ORC genes and gene transcripts, in detecting or amplifying nucleic acids encoding additional ORC homologs and structural analogs, and in gene therapy applications, e.g. antisense oligonucleotides capable of inhibiting the intracellular expression of a targeted ORC transcript.

The invention provides efficient methods of identifying pharmacological agents or lead compounds for agents active at the level of an ORC modulatable cellular function, particularly DNA replication. Generally, these screening methods involve assaying for compounds which interfere with an ORC binding activity. The methods are amenable to automated, cost-effective high throughput screening of chemical libraries for lead compounds. Identified reagents find use in the pharmaceutical industries for animal and human trials; for example, the reagents may be derivatized and rescreened in in vitro and in vivo assays to optimize activity and minimize toxicity for pharmaceutical development. Target therapeutic indications are limited only in that the target cellular function be subject to modulation, usually inhibition, by disruption of the formation of a complex comprising ORC and one or more natural ORC intracellular binding targets. Target indications may include infection, cell growth and regulatory dysfunction, such as neoplasia, inflammation, hypersensitivity, etc.

A wide variety of assays for binding agents are provided including labeled in vitro kinase assays, protein-protein binding assays, immunoassays, cell based assays, etc. The ORC compositions used in the methods are usually added in an isolated, partially pure or pure form and are typically recombinantly produced. The ORC may be part of a fusion product with another peptide or polypeptide, e.g. a polypeptide that is capable of providing or enhancing protein-protein binding, stability under assay conditions (e.g. a tag for detection or anchoring), etc. The assay mixtures comprise a natural intracellular ORC binding target. While native binding targets may be used, it is frequently preferred to use portions (e.g. peptides, nucleic acid fragments) thereof so long as the portion provides binding affinity and avidity to the subject ORC conveniently measurable in the assay. The assay mixture also comprises a candidate pharmacological agent. Generally a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a differential response to the various concentrations. Typically, one of these concentrations serves as a negative control, i.e. at zero concentration or below the limits of assay detection. Candidate agents encompass numerous chemical classes, though typically they are organic compounds, preferably small organic compounds, and are obtained from a wide variety of sources including libraries of synthetic or natural compounds. A variety of other reagents may also be included in the mixture. These include reagents like salts, buffers, neutral proteins, e.g. albumin, detergents, etc. which may be used to facilitate optimal binding and/or reduce non-specific or background interactions, etc. Also, reagents that otherwise improve the efficiency of the assay, such as protease inhibitors, nuclease inhibitors, antimicrobial agents, etc. may be used.
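The parallel assay mixtures described above, run at different agent concentrations with one serving as a zero-concentration negative control, can be laid out as a dilution series. The sketch below is illustrative only: the top concentration, the dilution factor and the number of points are hypothetical choices, since the description does not specify any of them.

```python
from typing import List


def agent_concentration_series(
    top_molar: float, dilution_factor: float, n_points: int
) -> List[float]:
    """Serial dilution from top_molar, followed by a zero-agent negative control."""
    if top_molar <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need top_molar > 0, dilution_factor > 1, n_points >= 1")
    series = [top_molar / dilution_factor ** i for i in range(n_points)]
    series.append(0.0)  # negative control: no candidate agent
    return series


if __name__ == "__main__":
    # Hypothetical layout: 10 uM top concentration, ten-fold steps, four points.
    for well, conc in enumerate(agent_concentration_series(1e-5, 10.0, 4), start=1):
        label = "negative control" if conc == 0.0 else f"{conc:.1e} M"
        print(f"mixture {well}: {label}")
```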

Frequently, the assay mixtures comprise at least a portion of a nucleic acid comprising a sequence which shares sufficient sequence similarity with a gene or gene regulatory region to which the targeted ORC protein naturally binds (e.g. an ori sequence) to provide sequence-specific binding. Such a nucleic acid may further comprise one or more sequences which facilitate the binding of one or more additional ORC proteins which cooperatively bind the nucleic acid. Where used, the nucleic acid portion bound by the ORC may be continuous or segmented and is usually linear and double-stranded DNA, though circular plasmids or other nucleic acids or structural analogs may be substituted so long as ORC sequence-specific binding is retained. In some applications, supercoiled DNA provides optimal sequence-specific binding and is preferred. The nucleic acid may be of any length amenable to the assay conditions and requirements.

The resultant mixture is incubated under conditions whereby, but for the presence of the candidate pharmacological agent, the ORC specifically binds the cellular binding target, portion or analog. The mixture components can be added in any order that provides for the requisite bindings. Incubations may be performed at any temperature which facilitates optimal binding, typically between 4 and 40 °C, more commonly between 15 and 40 °C. Incubation periods are likewise selected for optimal binding but also minimized to facilitate rapid, high-throughput screening, and are typically between 0.1 and 10 hours, preferably less than 5 hours, more preferably less than 2 hours.

After incubation, the presence or absence of specific binding between the ORC fragment and one or more binding targets is detected by any convenient way. For cell-free binding type assays, a separation step is often used to separate bound from unbound components. Separation may be effected by precipitation (e.g. immunoprecipitation), immobilization (e.g. on a solid substrate such as a microtiter plate), etc., followed by washing.

Detection may be effected in any convenient way. For cell-free binding assays, one of the components usually comprises or is coupled to a label. A wide variety of labels may be employed - essentially any label that provides for detection of bound protein. The label may provide for direct detection as radioactivity, luminescence, optical or electron density, etc. or indirect detection such as an epitope tag, an enzyme, etc. The label may be appended to the protein, e.g. a phosphate group comprising a radioactive isotope of phosphorus, or incorporated into the protein structure, e.g. a methionine residue comprising a radioactive isotope of sulfur. A variety of methods may be used to detect the label depending on the nature of the label and other assay components. For example, the label may be detected bound to the solid substrate or a portion of the bound complex containing the label may be separated from the solid substrate, and thereafter the label detected. Labels may be directly detected through optical or electron density, radiative emissions, nonradiative energy transfers, etc. or indirectly detected with antibody conjugates, etc. For example, in the case of radioactive labels, emissions may be detected directly, e.g. with particle counters or indirectly, e.g. with scintillation cocktails and counters.

The following experiments and examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

1. Isolation and cloning of ORCs.

The S. cerevisiae ORC1 gene encodes a protein that is the largest subunit of ORC. The ORC1 protein has two regions of homology with other known proteins; at the amino terminus there is homology with SIR3, a S. cerevisiae gene involved in transcriptional repression, and in the carboxyl region there is homology with a class of nucleotide binding proteins. To identify genes related to ORC1 in closely related yeast species, we took a PCR approach with primers based on amino acids conserved between ORC1 and SIR3 and identified a gene highly related to ORC1 in the yeast Kluyveromyces lactis, a budding yeast closely related to S. cerevisiae and the pathogenic yeast Candida albicans. SEQ ID NOS:1 and 2 show the cDNA and conceptual translate of ORC1 from K. lactis; coding is from nucleotides 395-3056. Another ORC1 gene was identified in the fission yeast Schizosaccharomyces pombe by low stringency DNA hybridizations. SEQ ID NOS:3 and 4 show the cDNA and conceptual translate of ORC1 from S. pombe; coding is from nucleotides 86-2209.

An alignment of the three yeast species of ORC1 revealed areas of the protein that were highly conserved. To identify an ORC1-related gene in human cells, we designed degenerate PCR primers to domains conserved between three related yeast ORC1 genes. These primers were used in pairwise combinations on human cDNA to identify a human ORC1 gene. PCR products that were found to be related to ORC1 were then used to isolate a full-length cDNA.

cDNA Synthesis: Reverse transcription of total RNA isolated from human 293 cells was carried out in 30 μl reactions containing 10 μg total RNA, 10 pmole of primer, 6 μl of 5X Superscript II reaction buffer, 1 mM DTT, 1 mM dNTPs, 25 units of RNasin (Promega), and 200 units of Superscript II reverse transcriptase (GIBCO-BRL). The RNA and primers were heated at 70 °C for 5 minutes and then cooled on ice. The remaining reaction components were added and the reactions were carried out at 37 °C for 1 hour. The reverse transcriptase was inactivated at 70 °C for 15 minutes and the reactions were phenol-extracted and ethanol precipitated. The products were resuspended in 250 μl of DEPC-treated water and used in PCR reactions.

PCR: PCR reactions were carried out in 50 μl reactions containing 5 μl of template cDNA synthesized with primer PO1PCR5, 100 pmole of each primer, 10% DMSO, 1.5 mM dNTPs, 5 μl 10X reaction buffer [166 mM ammonium sulfate, 670 mM Tris-HCl (pH 8.8), 20 mM MgCl₂, 100 mM β-mercaptoethanol, 67 μM EDTA], 4-6 mM MgCl₂, and 1.5 units of Taq DNA polymerase (Boehringer-Mannheim). The reactions were overlaid with mineral oil and cycled in a Perkin-Elmer Thermal cycler 480 with the first cycle consisting of denaturation for 2 minutes at 94 °C, annealing for 1 minute at 42 °C, and extension for 1 minute at 72 °C, followed by 27 cycles of 40 sec at 94 °C, 1 minute at 42 °C, 1 minute at 72 °C, with a final extension of 5 minutes at 72 °C. The reactions were phenol-extracted, precipitated, and analyzed on an 8% TBE polyacrylamide gel. Products of the correct predicted size were extracted from the gel, cloned and analyzed by sequencing. Sequence analysis of several clones revealed homology to S. cerevisiae ORC1 between the primer binding sites. An internal, exact primer was designed and used in conjunction with 3' RACE to identify a larger fragment.
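The buffer contribution to the PCR above can be checked with a simple dilution calculation. The sketch assumes that the bracketed composition describes the 10X stock, so that 5 μl in a 50 μl reaction gives one tenth of each listed concentration. It does not attempt to resolve whether the separately listed 4-6 mM MgCl₂ is an addition to, or the total including, the buffer's own magnesium. The names are hypothetical.

```python
from typing import Dict

# Bracketed buffer composition from the text, assumed to be the 10X stock.
TEN_X_BUFFER: Dict[str, float] = {
    "ammonium sulfate (mM)": 166.0,
    "Tris-HCl pH 8.8 (mM)": 670.0,
    "MgCl2 (mM)": 20.0,
    "beta-mercaptoethanol (mM)": 100.0,
    "EDTA (uM)": 67.0,
}


def diluted_concentrations(
    stock: Dict[str, float], stock_volume_ul: float, reaction_volume_ul: float
) -> Dict[str, float]:
    """Scale every stock concentration by stock volume over reaction volume."""
    if not 0 < stock_volume_ul <= reaction_volume_ul:
        raise ValueError("stock volume must be positive and fit in the reaction")
    fraction = stock_volume_ul / reaction_volume_ul
    return {name: conc * fraction for name, conc in stock.items()}


if __name__ == "__main__":
    final = diluted_concentrations(TEN_X_BUFFER, stock_volume_ul=5.0, reaction_volume_ul=50.0)
    for name, conc in final.items():
        print(f"{name}: {conc:g}")
```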

3' RACE: cDNA Synthesis: Reverse transcription of 10 μg of total 293 RNA was carried out in a 30 μl reaction containing 10 μM 3' anchor primer, as described above, except that the reaction was carried out for 30 minutes at 37 °C, 30 minutes at 42 °C, with a final incubation for 15 minutes at 50 °C. The reverse transcriptase was inactivated by heat treatment at 70 °C for 15 minutes. The reaction was phenol-extracted, ethanol precipitated, and the products were resuspended in 300 μl of DEPC-treated water and used as template for RACE reactions.

RACE: First-round 3' RACE PCR reactions were performed in a 50 μl reaction containing 100 pmole of each primer, 5 μl of cDNA, 1.5 mM dNTPs, 10% DMSO, 6 mM MgCl₂, and 2.5 units of Taq DNA polymerase. Thermal cycling was performed with the first cycle consisting of denaturation at 94 °C for 3 minutes, annealing at 55 °C for 1 minute, and extension at 72 °C for 20 minutes for one cycle, followed by 28 cycles of 94 °C for 1 minute, 55 °C for 1 minute, and 72 °C for 4 minutes with a final extension at 72 °C for 10 minutes. Second-round PCR was performed as described for the first round except that the template was 1 μl from the first round PCR reaction, and the 3' anchor primer was replaced with the 3' adapter primer. The reaction was cycled for 29 cycles of 94 °C for 1 minute, 55 °C for 1 minute, and 72 °C for 4 minutes, with a final extension at 72 °C for 10 minutes. The reactions were phenol-extracted, ethanol-precipitated and analyzed by electrophoresis on 1% agarose gel and visualized with ethidium bromide. Amplified products were gel purified, cloned and sequenced. Sequence analysis revealed clones with high homology to S. cerevisiae ORC1.
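The cycling programs described for the degenerate-primer PCR and the first-round 3' RACE can be written down as data and their programmed hold times summed. This is a sketch for planning purposes only: the Stage class and function name are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """A block of identical cycles; hold times are in seconds."""
    repeats: int
    hold_seconds: Tuple[int, ...]


def total_hold_minutes(program: List[Stage]) -> float:
    """Sum programmed hold times only (ramping is not modeled)."""
    return sum(s.repeats * sum(s.hold_seconds) for s in program) / 60.0


# Degenerate-primer PCR: 2/1/1 min first cycle, 27 x (40 s, 1 min, 1 min), 5 min final.
DEGENERATE_PCR = [
    Stage(1, (120, 60, 60)),
    Stage(27, (40, 60, 60)),
    Stage(1, (300,)),
]

# First-round 3' RACE: 3/1/20 min first cycle, 28 x (1, 1, 4 min), 10 min final.
FIRST_ROUND_RACE = [
    Stage(1, (180, 60, 1200)),
    Stage(28, (60, 60, 240)),
    Stage(1, (600,)),
]

if __name__ == "__main__":
    print(f"Degenerate PCR holds: {total_hold_minutes(DEGENERATE_PCR):g} min")
    print(f"First-round RACE holds: {total_hold_minutes(FIRST_ROUND_RACE):g} min")
```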

To isolate a full-length cDNA, we screened a phage lambda gt10 cDNA library constructed from NTERD21, an embryonic carcinoma human cell line, with a RACE product as a probe. A total of 950,000 plaques were screened by hybridization at 65 °C in 7% SDS/0.25 M NaPO₄, pH 7.0. The filters were washed with successively stringent washes, with the final wash of 0.2X SSC, 0.1% SDS at 65 °C. Positive plaques were purified and phage DNA was isolated, cloned into pKS+ and sequenced on both strands using an automated sequencer (Applied Biosystems). SEQ ID NOS:5 and 6 show the cDNA and conceptual translate of human ORC1; the coding region is from 220 to 2805. An alignment of the 4 ORC1-related genes reveals a high degree of sequence identity and similarity. For example, the S. cerevisiae and K. lactis amino acid sequences are 50% identical whereas the more distantly related S. cerevisiae and human amino acid sequences are 27% identical with each other. This demonstrates that the ORC proteins are conserved from yeast to human.

Partial cDNA sequences from A. thaliana and C. elegans, whose translated amino acid sequences show sequence similarity to the S. cerevisiae ORC2 protein sequence shown herein, were identified in the NCBI dbEST database by computer based sequence searching. Those DNA fragments were isolated by a PCR based method using DNA isolated from lambda cDNA libraries as a template. Entire cDNAs were then isolated using the partial cDNAs to design primers for PCR or as probes to screen the cDNA library. The amino acid sequences predicted from these cDNAs were aligned and conserved regions were used to design degenerate oligonucleotide primers to isolate a partial cDNA from human. This partial cDNA was amplified by RT-PCR using the degenerate primers and cloned into a plasmid vector. Full length cDNAs were then isolated from the cDNA library by using the PCR generated DNA fragment as a probe. Each DNA and protein sequence and the result of the alignment among four species are shown below.

Isolation of A. thaliana ORC2: Four DNA subfragments were isolated to cover the full length of the cDNA. First, a partial cDNA sequence (344 bp), the translated amino acid sequence from which is similar to a region from the ORC2 protein from S. cerevisiae, was identified in the NCBI dbEST database (#1443). A probe was obtained to screen a cDNA library using standard PCR reactions with a lambda phage cDNA library as a template and oligonucleotide primers based on the DNA sequence in the dbEST database. The resulting PCR fragment was cloned into a BlueScript plasmid vector and sequenced. Next, to extend this isolated DNA sequence in both directions, two primers (20 mer) complementary to each end of the isolated DNA were designed for nested PCR. PCR reactions were performed using one of these specific primers and a primer from the vector (ZAPII). The 5'-end and 3'-end (containing the polyA tail) DNA fragments were amplified by nested PCR using a second (internal) primer and the products cloned and sequenced. Finally, the 5'-end of the cDNA fragment was isolated by the 5'-RACE procedure using two oligonucleotides complementary to the most 5' end of the isolated cDNAs and the CLONTECH RACE procedure. The combined clones covered the entire A. thaliana cDNA. SEQ ID NOS:7 and 8 show the cDNA and conceptual translate of ORC2 from A. thaliana; the coding region is from 277 to 1368.

Isolation of C. elegans ORC2: First, a partial cDNA sequence (446 bp) homologous to the S. cerevisiae ORC2 gene and a genomic DNA sequence containing this sequence were identified in the NCBI dbEST (#16625) and EMBL (#Z36949) databases, respectively. The partial cDNA fragment was amplified by nested PCR using DNA from a ZAP cDNA library and oligonucleotides complementary to the dbEST cDNA sequence. The PCR product was cloned and used as a probe to screen the C. elegans cDNA lambda library. 5 x 10⁵ plaques were screened and the full length of the cDNA was isolated. SEQ ID NOS:9 and 10 show the cDNA and conceptual translate of ORC2 from C. elegans; the coding region is from 13 to 1305.

Isolation of a human ORC2: Based on the computer assisted alignment of the amino acid sequences of ORC2 from S. cerevisiae, A. thaliana and C. elegans, degenerate oligonucleotide probes were designed to isolate a partial cDNA from human cells by reverse transcriptase assisted PCR. A 340-bp partial cDNA homologous to the ORC2 gene in S. cerevisiae was isolated by RT-PCR reaction against human HeLa cell mRNA. First strand cDNA was synthesized using an oligo(dT) primer against 2 mg of HeLa mRNA at 42 °C for 1 hour. One hundredth volume of this cDNA pool was used as a template for the PCR reaction.

This PCR also amplified DNA from K. lactis that was related to the S. cerevisiae ORC2 gene. The PCR reaction conditions were 94 °C for 45 seconds/46 °C for 45 seconds/72 °C for 2 minutes for 70 cycles. The PCR product was cloned and sequenced and found to be related to the three ORC2 sequences.

Next, using this DNA fragment as a probe, cDNA clones covering a complete ORF from the gene were isolated from a human lambda phage cDNA library derived from human embryonic carcinoma cells. 5 x 10⁵ plaques were screened and 6 positive clones were isolated. Both strands of these cDNAs were sequenced without any gaps. SEQ ID NOS:11 and 12 show the cDNA and conceptual translate of human ORC2; the coding region is from 187 to 1920.

A multiple alignment of the cDNA sequences from S. cerevisiae, A. thaliana, C. elegans and human reveals that all four sequences are highly related to each other. For example, the percent identities between the S. cerevisiae ORC2 amino acid sequence and the A. thaliana, C. elegans and human sequences are 31%, 23% and 24% respectively.
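Percent identity figures such as those above are computed from a pairwise alignment. The description does not state which alignment program or which denominator was used, so the sketch below makes one common choice explicit: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned peptides, for illustration only.
    print(f"{percent_identity('MKT-LV', 'MRTALV'):.1f}% identity")
```

Other denominators, such as the length of the shorter sequence or the full alignment length, give different percentages, so figures from different sources are comparable only when the convention matches.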

The foregoing sequence data and methods for isolating origin recognition complex proteins enable one of ordinary skill in this art to isolate ORC-encoding cDNA sequences from any eukaryotic species. These data from fungi (yeasts), plant and animal (invertebrate and human) show evolutionary sequence and function conservation. Using these data, we have also characterized an ORC5 sequence from Drosophila melanogaster (Genbank accession number L39626).

EXAMPLES

1. Protocol for high-throughput in vitro ORC complex binding assay.

A. Reagents:

- Neutralite Avidin: 20 μg/ml in PBS.

- Blocking buffer: 5% BSA, 0.5% Tween 20 in PBS; 1 hour at room temperature.

- Assay Buffer: 100 mM KCl, 20 mM HEPES pH 7.6, 0.25 mM EDTA, 1% glycerol, 0.5% NP-40, 50 mM BME, 1 mg/ml BSA, cocktail of protease inhibitors.

- ³³P recombinant ORC protein 10x stock: 10^-6 - 10^-8 M equimolar "cold" mixture of recombinant ORC 1-6 proteins (baculovirus expression system) supplemented with 200,000-250,000 cpm of labeled ORC2 protein (Beckman counter). Place in the 4°C microfridge during screening.

- Protease inhibitor cocktail (1000X): 10 mg Trypsin Inhibitor (BMB # 109894), 10 mg Aprotinin (BMB # 236624), 25 mg Benzamidine (Sigma # B-6506), 25 mg Leupeptin (BMB # 1017128), 10 mg APMSF (BMB # 917575), and 2 mM NaVO₃ (Sigma # S-6508) in 10 ml of PBS.

- Oligonucleotide stock (specific, biotinylated): biotinylated oligo at 17 pmole/μl containing the ARS1 ori sequence ORC complex binding site.
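The oligonucleotide stock (17 pmole/μl) must be diluted to the working mixture used in the assay (1.0 pmole per 40 μl per well). A minimal sketch of that arithmetic follows. The function name, the 96-well plate size and the optional pipetting excess are assumptions for illustration, and the ss-DNA component of the mixture is not modeled.

```python
STOCK_PMOL_PER_UL = 17.0     # biotinylated oligo stock concentration
OLIGO_PMOL_PER_WELL = 1.0    # oligo delivered to each well
MIX_UL_PER_WELL = 40.0       # volume of oligo mixture added per well


def oligo_mix(n_wells, excess=0.0):
    """Return (stock_ul, buffer_ul) needed to prepare oligo mixture for n_wells.

    `excess` is a fractional overage (e.g. 0.1 for 10% extra) and is an
    assumption of this sketch, not part of the protocol.
    """
    scale = n_wells * (1.0 + excess)
    stock_ul = OLIGO_PMOL_PER_WELL * scale / STOCK_PMOL_PER_UL
    total_ul = MIX_UL_PER_WELL * scale
    return stock_ul, total_ul - stock_ul


stock_ul, buffer_ul = oligo_mix(96)
print(f"{stock_ul:.2f} ul stock + {buffer_ul:.2f} ul assay buffer")
```

The working mixture is 0.025 pmole/μl, a 680-fold dilution of the stock.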

B. Preparation of assay plates:

- Coat with 120 μl of stock N-Avidin per well overnight at 4°C.

- Wash 2 times with 200 μl PBS.

- Block with 150 μl of blocking buffer.

- Wash 2 times with 200 μl PBS.

C. Assay:

- Add 40 μl assay buffer/well.

- Add 10 μl compound or extract.

- Add 10 μl ³³P-ORC protein mixture (20,000-25,000 cpm/0.1-10 pmoles/well = 10^-9 - 10^-7 M final concentration).

- Shake at 25 °C for 15 minutes.

- Incubate additional 45 minutes at 25 °C.

- Add 40 μl oligo mixture (1.0 pmoles/40 μl in assay buffer with 1 ng of ss-DNA).

- Incubate 1 hour at room temperature.

- Stop the reaction by washing 4 times with 200 μl PBS.

- Add 150 μl scintillation cocktail.

- Count in Topcount.
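The stated final concentration range can be checked from the volumes added in section C. This sketch assumes the final well volume is simply the sum of the four additions (40 + 10 + 10 + 40 = 100 μl); the dictionary and function names are illustrative.

```python
# Volumes added per well in the assay, in microlitres
ADDITIONS_UL = {
    "assay buffer": 40.0,
    "compound or extract": 10.0,
    "33P-ORC protein mixture": 10.0,
    "oligo mixture": 40.0,
}


def final_molar(pmol, volume_ul):
    """Convert an amount in picomoles in a volume in microlitres to mol/L."""
    return (pmol * 1e-12) / (volume_ul * 1e-6)


total_ul = sum(ADDITIONS_UL.values())
for pmol in (0.1, 10.0):
    print(f"{pmol} pmol in {total_ul:.0f} ul = {final_molar(pmol, total_ul):.1e} M")
```

With 0.1 to 10 pmoles per well in 100 μl, the result is about 10^-9 to 10^-7 M, in agreement with the range given for the ORC protein mixture.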

D. Controls for all assays (located on each plate):

a. Non-specific binding (no oligo added).

b. Specific soluble oligo at 80% inhibition.
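The plate controls allow each well's counts to be expressed as percent inhibition of specific binding. The following is a hedged sketch: it assumes an uninhibited reference well (no compound) in addition to the non-specific binding control, and all cpm values are hypothetical.

```python
def percent_inhibition(sample_cpm, uninhibited_cpm, nonspecific_cpm):
    """Percent inhibition of specific binding for one well.

    Specific binding is the signal above the non-specific control (no oligo
    added). The uninhibited reference well is an assumption of this sketch.
    """
    window = uninhibited_cpm - nonspecific_cpm
    if window <= 0:
        raise ValueError("uninhibited signal must exceed non-specific binding")
    specific = sample_cpm - nonspecific_cpm
    return 100.0 * (1.0 - specific / window)


# Hypothetical counts: 20,000 cpm uninhibited, 1,000 cpm non-specific
print(percent_inhibition(4800, 20000, 1000))  # approximately 80
```

A well treated with soluble competitor oligo at the control concentration would be expected to read near 80% on this scale.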

2. Protocol for high-throughput in vitro ORC protein - protein binding assay.

A. Reagents:

- Neutralite Avidin: 20 μg/ml in PBS.

- Blocking buffer: 5% BSA, 0.5% Tween 20 in PBS; 1 hour at room temperature.

- Assay Buffer: 100 mM KCl, 20 mM HEPES pH 7.6, 0.25 mM EDTA, 1% glycerol, 0.5% NP-40, 50 mM BME, 1 mg/ml BSA, cocktail of protease inhibitors.

- ³³P recombinant ORC protein 10x stock: 10^-6 - 10^-8 M equimolar "cold" mixture of recombinant ORC 1-6 proteins (baculovirus expression system) supplemented with 200,000-250,000 cpm of labeled ORC2 protein (Beckman counter). Place in the 4°C microfridge during screening.

- Protease inhibitor cocktail (1000X): 10 mg Trypsin Inhibitor (BMB # 109894), 10 mg Aprotinin (BMB # 236624), 25 mg Benzamidine (Sigma # B-6506), 25 mg Leupeptin (BMB # 1017128), 10 mg APMSF (BMB # 917575), and 2 mM NaVO₃ (Sigma # S-6508) in 10 ml of PBS.

- Recombinant ORC5 protein 10x stock: 10^-8 - 10^-5 M biotinylated ORC5 protein in PBS.
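The "10x" designation of the labeled ORC stock can be checked against the assay volumes: 10 μl of stock is added to a well whose final volume, assuming it equals the sum of the additions listed in section C (40 + 10 + 10 + 40 μl), is 100 μl. The helper below is an illustrative sketch of that dilution, not part of the protocol.

```python
def diluted_molar(stock_molar, added_ul, final_ul):
    """Concentration after adding `added_ul` of stock to a final volume `final_ul`."""
    if final_ul <= 0 or added_ul < 0 or added_ul > final_ul:
        raise ValueError("volumes must satisfy 0 <= added_ul <= final_ul, final_ul > 0")
    return stock_molar * added_ul / final_ul


# Labeled ORC stock range (10^-6 to 10^-8 M), 10 ul into a 100 ul well
for stock in (1e-6, 1e-8):
    print(f"{stock:.0e} M stock gives {diluted_molar(stock, 10.0, 100.0):.0e} M in the well")
```

A tenfold dilution of the 10^-6 to 10^-8 M stock gives 10^-7 to 10^-9 M, matching the final concentration range stated for the ORC protein mixture.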

B. Preparation of assay plates:

- Coat with 120 μl of stock N-Avidin per well overnight at 4°C.

- Wash 2 times with 200 μl PBS.

- Block with 150 μl of blocking buffer.

- Wash 2 times with 200 μl PBS.

C. Assay:

- Add 40 μl assay buffer/well.

- Add 10 μl compound or extract.

- Add 10 μl ³³P-ORC protein mixture (20,000-25,000 cpm/0.1-10 pmoles/well = 10^-9 - 10^-7 M final concentration).

- Shake at 25 °C for 15 minutes.

- Incubate additional 45 minutes at 25 °C.

- Add 40 μl biotinylated ORC5 protein (0.1-10 pmoles/40 μl in assay buffer).

- Incubate 1 hour at room temperature.

- Stop the reaction by washing 4 times with 200 μl PBS.

- Add 150 μl scintillation cocktail.

- Count in Topcount.

D. Controls for all assays (located on each plate):

a. Non-specific binding (no ORC5 protein).

b. Soluble (non-biotinylated) ORC5 protein at 80% inhibition.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: COLD SPRING HARBOR LABORATORY

THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

(ii) TITLE OF INVENTION: ORIGIN OF REPLICATION COMPLEX GENES

(iii) NUMBER OF SEQUENCES: 12

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: SCIENCE & TECHNOLOGY LAW GROUP

(B) STREET: 268 Bush Street, Suite 3200

(C) CITY: San Francisco

(D) STATE: California

(E) COUNTRY: USA

(F) ZIP: 94104

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Osman Ph.D., Richard Aron

(B) REGISTRATION NUMBER: 36,627

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (415) 343-4341

(B) TELEFAX: (415) 343-4342

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3278 base pairs

(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAAGGAATGG TGCATGCAAG GAGATGGCGC CCAACAGTCC CCGCCACGGG CCTGCCACCA 60

TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT TCCCCATCGG 120

TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG CCGGCCACGA 180

TGCGTCCGGC GTAGAGGATC TTAATTCAGT AAACAGAGGA ACCGTGTAAC AACCAATATG 240

CTATGAGATA AAAGAATGCT ACGGAAACAG GTAGCTGTCA TTTCAACATA CTTGGCCAGC 300

AAGTAACTMC NACTAGTTTA GGAAGGNNTT ACTGCATTTT AACGGTTATC TGATTATTTT 360

TCCTTTTTAT TCCGTGGTAG CGAGTTTATT AGGCATGGCG TCAACGTTAG CTGAGTTTGA 420

AGTTCAATGG GAAATACAGA AGACAGACTT GAAGGGGAAT CTCATTGCTG AAACTCCTAG 480

GCGAAGAAGA AGAGGAGATG CTACAGAACA TGAAGTGATT AATTTGGTAC GATACGATGG 540

AGTCAGACTT TATCCTGGTG TTACGATTGT GTGCAAGGTA GAGGGTGCAG ACGAGTTATC 600

AGCGTATATG ATCCATGAGG TGCGATTGAA TACAAGCAAT TACGTAGAAC TCTGGTGTTT 660

GAACTATTTG AGTTGGTACG AGATCAATGC TGCGGAAAGA TATAAACAGC TTGATGGAGA 720

GTTTTATGAG ACTAATAAGG AAAAAGGTGA CAAATTTTTT GAGGAAACCT TCGCGTCACA 780

ATCGATAAAG AACGAATTGT ATTTGACAGC TGAGCTTTCA GAGATTTATC TACGGGACTT 840

GCAATTTGTA GCTAATATTA AAAATGAAAA GGAGTATTTA GACTCTGTCA ATGAAGGGAA 900

AATGGATTCT AATATGTTTT TATGTCGATC TGCATGCTTG CCTTCAGGAA CTAATCTGGC 960

GGATTTAGAT ATACATTTCT TTGAAGAAAA AATACGTTCC TCGAATCCTA AGGTGTCTCT 1020

GGAGTATTTG CGTGATATTA CTTTACCCAA GCTTCCAAAA CCTTTAAATA AATCCAAGGT 1080

CCACGCACGA GAGAAGGTAG TGGCGACGAA ATTGCAGTCC GACAACACAC CAAGCAAAAA 1140

AAGCTTTCAA CAAACAGTGA GCAAAACCAA CGCTGAAGTC CAACGCATTG CATCTACTAT 1200

TGTTAACGAA AAGGAAGCTA TATCAGATAA TGAATCGGAT TTATCTGAAT ATCACGAAAG 1260

TAAAGAAGAG TTTGCAAACG CATCCTCTTC GGACAGTGAT GAAGAGTTTG AAGATTACCA 1320

GTCTGCAGAA GAGCTTGCAA TTGTAGAACC TGCCAAGAAA AAGGTGAGAT CTATTAAACC 1380

AGATATACCC ATTTCACCAG TAAAATCACA GACTCCATTG CAGCCATCAG CAGTTCATTC 1440

ATCTCCTAGA AAGTTCTTTA AGAATAATAT AGTGCGCGCT AAAAAGGCAT ATACTCCATT 1500

TTCCAAACGG TATAAGAATC CGAAGATTCC TGACTTGAAC GATATTTTCC AAAGGCATAA 1560

TAATGATTTG GATATAGCTG CATTAGAGGA GAGATTCAGA ACAGTTTCTG CTAAAGGCAA 1620

AATGGAGACT ATTTTTTCTA AGGTGAAGAA GCAATTGAAC TCAAGGAATA GCAAAGAAGA 1680

AATTGTCAAA GCTGCTGATT TCGACAATTA TCTTCCGGCA AGAGAAAATG AATTTGCAAG 1740

TATATACCTC TCACTTTACA GTGCAATTGA AGCAGGCACT AGCACCAGTA TTTACATTGC 1800

CGGGACGCCA GGCGTTGGTA AAACTTTGAC GGTTCGAGAG GTAGTTAAGG ATTTAATGAC 1860

ATCTGCAGAC CAAAAAGAAC TTCCAAGATT CCAATACATT GAAATCAATG GTTTAAAGAT 1920

TGTCAAAGCA AGTGATAGTT ATGAAGTCTT TTGGCAAAAA ATATCTGGAG AAAAGCTTAC 1980

ATCTGGAGCT GCCATGGAAT CTCTGGAGTT TTATTTTAAC AAAGTTCCAG CTACGAAAAA 2040

ACGTCCTATC GTTGTGTTAT TGGATGAGCT TGATGCATTA GTTAGCAAGA GCCAAGATGT 2100

AATGTACAAC TTCTTTAACT GGGCTACCTA TTCAAATGCG AAACTTATTG TTGTAGCTGT 2160

CGCAAACACC TTAGATCTCC CCGAACGCCA TCTTGGTAAC AAGATTTCGT CCAGAATTGG 2220

TTTTACTAGA ATTATGTTCA CTGGTTACAC GCATGAAGAG CTTAGAACAA TCATCAATTT 2280

GAGACTTAAA TATTTGAACG AATCTAGTTT CTATGTCGAC CCGGAGACAG GGAGTTCGTA 2340

CATGATCTCT CCGGATAGTA GTACTATAGA AACTGATGAA GAAGAAAAGC GAAAAGACTT 2400

CTCTAACTAT AAACGACTAA AACTTAGGAT TAATCCTGAT GCCATTGAGA TTGCATCAAG 2460

AAAAATTGCT AGTGTCAGTG GTGATGTGCG GAGAGCTTTA AAGGTGGTCA AAAGAGCGGT 2520

AGAATATGCG GAAAATGATT ACTTAAAGAG GCTTAGATAT GAGCGACTAG TCAATTCCAA 2580

AAAAGATACT AGTGGCAATG GTACAGGAAA TGAAGAATTA CAGAGTGTAG AAATTAAGCA 2640

TATTACCAAG GCATTAAACG AAAGTTCGAC CTCTCCGGAA CAACAATTCA TATCTGGTCT 2700

GTCATTTAGC GGAAAACTTT TCCTATACGC ATTAATCAAT TTAATTAAGA AGAAGCAAAC 2760

TGACGTACAA CTTGGTGATA TCGTAGAAGA AATGAGGCTC CTCATTGATG TCAATGGGAA 2820

TAACAAATAC ATTTTAGAGT TGAAACGGAT TTTATTCCAA AATGATTCTG TTGATACAAA 2880

GGAACAGTTA AGGGCCGTGT CTTGGGACTA TATTTTATTG CAATTATTGG ATGCAGGTGT 2940

TGTAGTAAGG CAATATTTCA AGAATGAGAG GCTCTCGACG ATCAAATTAA ATATTTCCAT 3000

GGAAGATGCG GACGAATGCT TGCATGAAGA TGAAATGTTG AAGACATTTT AGTATATGCC 3060

TTCAAGACGC CTTTGCTGCT ATTATAATTG CTACTTAGGT TGTCATGTAG CGTACGTTAA 3120

GTAGAATATG AAACTGCTTT TTNCAACTAT TTAATTATAA GATAGAAAGA TATAATAAAG 3180

GATGCATTTT TTTTAACTAC TATTTTACCG TGTTTATTCA TTCTTTACCC TCCGCTTCGG 3240

CAAGATGAAC GTGATCACGT AATAGGAGGT AGGTGATT 3278

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 885 amino acids

(B) TYPE: amino acid

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Ser Thr Leu Ala Glu Phe Glu Val Gin Trp Glu lie Gin Lys 1 5 10 15

Thr Asp Leu Lys Gly Asn Leu lie Ala Glu Thr Pro Arg Arg Arg Arg

20 25 30

Arg Gly Asp Ala Thr Glu His Glu Val lie Asn Leu Val Arg Tyr Asp 35 40 45

Gly Val Arg Leu Tyr Pro Gly Val Thr lie Val Cys Lys Val Glu Gly 50 55 60

Ala Asp Glu Leu Ser Ala Tyr Met lie His Glu Val Arg Leu Asn Thr 65 70 75 80

Ser Asn Tyr Val Glu Leu Trp Cys Leu Asn Tyr Leu Ser Trp Tyr Glu 85 90 95

lie Asn Ala Ala Glu Arg Tyr Lys Gin Leu Asp Gly Glu Phe Tyr Glu 100 105 110

Thr Asn Lys Glu Lys Gly Asp Lys Phe Phe Glu Glu Thr Phe Ala Ser 115 120 125

Gin Ser lie Lys Asn Glu Leu Tyr Leu Thr Ala Glu Leu Ser Glu lie 130 135 140

Tyr Leu Arg Asp Leu Gin Phe Val Ala Asn lie Lys Asn Glu Lys Glu 145 150 155 160

Tyr Leu Asp Ser Val Asn Glu Gly Lys Met Asp Ser Asn Met Phe Leu 165 170 175

Cys Arg Ser Ala Cys Leu Pro Ser Gly Thr Asn Leu Ala Asp Leu Asp 180 185 190

lie His Phe Phe Glu Glu Lys lie Arg Ser Ser Asn Pro Lys Val Ser 195 200 205

Leu Glu Tyr Leu Arg Asp lie Thr Leu Pro Lys Leu Pro Lys Pro Leu 210 215 220

Asn Lys Ser Lys Val His Ala Arg Glu Lys Val Val Ala Thr Lys Leu 225 230 235 240

Gin Ser Asp Asn Thr Pro Ser Lys Lys Ser Phe Gin Gin Thr Val Ser 245 250 255

Lys Thr Asn Ala Glu Val Gin Arg lie Ala Ser Thr lie Val Asn Glu

260 265 270

Lys Glu Ala lie Ser Asp Asn Glu Ser Asp Leu Ser Glu Tyr His Glu 275 280 285

Ser Lys Glu Glu Phe Ala Asn Ala Ser Ser Ser Asp Ser Asp Glu Glu 290 295 300

Phe Glu Asp Tyr Gin Ser Ala Glu Glu Leu Ala lie Val Glu Pro Ala 305 310 315 320

Lys Lys Lys Val Arg Ser lie Lys Pro Asp lie Pro lie Ser Pro Val 325 330 335

Lys Ser Gin Thr Pro Leu Gin Pro Ser Ala Val His Ser Ser Pro Arg 340 345 350

Lys Phe Phe Lys Asn Asn lie Val Arg Ala Lys Lys Ala Tyr Thr Pro 355 360 365

Phe Ser Lys Arg Tyr Lys Asn Pro Lys lie Pro Asp Leu Asn Asp lie 370 375 380

Phe Gin Arg His Asn Asn Asp Leu Asp lie Ala Ala Leu Glu Glu Arg 385 390 395 400

Phe Arg Thr Val Ser Ala Lys Gly Lys Met Glu Thr lie Phe Ser Lys 405 410 415

Val Lys Lys Gin Leu Asn Ser Arg Asn Ser Lys Glu Glu lie Val Lys 420 425 430

Ala Ala Asp Phe Asp Asn Tyr Leu Pro Ala Arg Glu Asn Glu Phe Ala 435 440 445

Ser lie Tyr Leu Ser Leu Tyr Ser Ala lie Glu Ala Gly Thr Ser Thr

450 455 460

Ser lie Tyr lie Ala Gly Thr Pro Gly Val Gly Lys Thr Leu Thr Val 465 470 475 480

Arg Glu Val Val Lys Asp Leu Met Thr Ser Ala Asp Gin Lys Glu Leu 485 490 495

Pro Arg Phe Gin Tyr lie Glu lie Asn Gly Leu Lys lie Val Lys Ala

500 505 510

Ser Asp Ser Tyr Glu Val Phe Trp Gin Lys lie Ser Gly Glu Lys Leu 515 520 525

Thr Ser Gly Ala Ala Met Glu Ser Leu Glu Phe Tyr Phe Asn Lys Val 530 535 540

Pro Ala Thr Lys Lys Arg Pro lie Val Val Leu Leu Asp Glu Leu Asp 545 550 555 560

Ala Leu Val Ser Lys Ser Gin Asp Val Met Tyr Asn Phe Phe Asn Trp 565 570 575

Ala Thr Tyr Ser Asn Ala Lys Leu lie Val Val Ala Val Ala Asn Thr 580 585 590

Leu Asp Leu Pro Glu Arg His Leu Gly Asn Lys lie Ser Ser Arg lie 595 600 605

Gly Phe Thr Arg lie Met Phe Thr Gly Tyr Thr His Glu Glu Leu Arg 610 615 620

Thr lie lie Asn Leu Arg Leu Lys Tyr Leu Asn Glu Ser Ser Phe Tyr 625 630 635 640

Val Asp Pro Glu Thr Gly Ser Ser Tyr Met lie Ser Pro Asp Ser Ser 645 650 655

Thr lie Glu Thr Asp Glu Glu Glu Lys Arg Lys Asp Phe Ser Asn Tyr 660 665 670

Lys Arg Leu Lys Leu Arg lie Asn Pro Asp Ala lie Glu lie Ala Ser 675 680 685

Arg Lys He Ala Ser Val Ser Gly Asp Val Arg Arg Ala Leu Lys Val 690 695 700

Val Lys Arg Ala Val Glu Tyr Ala Glu Asn Asp Tyr Leu Lys Arg Leu 705 710 715 720

Arg Tyr Glu Arg Leu Val Asn Ser Lys Lys Asp Thr Ser Gly Asn Gly 725 730 735

Thr Gly Asn Glu Glu Leu Gin Ser Val Glu He Lys His He Thr Lys

740 745 750

Ala Leu Asn Glu Ser Ser Thr Ser Pro Glu Gin Gin Phe He Ser Gly 755 760 765

Leu Ser Phe Ser Gly Lys Leu Phe Leu Tyr Ala Leu He Asn Leu He 770 775 780

Lys Lys Lys Gin Thr Asp Val Gin Leu Gly Asp He Val Glu Glu Met 785 790 795 800

Arg Leu Leu He Asp Val Asn Gly Asn Asn Lys Tyr He Leu Glu Leu 805 810 815

Lys Arg He Leu Phe Gin Asn Asp Ser Val Asp Thr Lys Glu Gin Leu 820 825 830

Arg Ala Val Ser Trp Asp Tyr He Leu Leu Gin Leu Leu Asp Ala Gly 835 840 845

Val Val Val Arg Gin Tyr Phe Lys Asn Glu Arg Leu Ser Thr He Lys 850 855 860

Leu Asn He Ser Met Glu Asp Ala Asp Glu Cys Leu His Glu Asp Glu 865 870 875 880

Met Leu Lys Thr Phe 885

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2504 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TACGAGTCTT GTTAGTCCAG CACTACAACT CAGGATAACT TTGACCATTG CAATGTTGAT 60

AAACTAGTGT TGAACTTCTC TTAATATGCC TAGAAGAAAG TCATTGAGGA GTCAACTATT 120

AATTAACGGC ATTGATAAAA GTCTGCTATC TGATGACAGC GCTGACAGTT CTGATATTGA 180

CGAAGAGGAA GTTTACGGTG TTTGGACTGA AGAGCCCTTT CAAAAAGAGG CTGGACGTTC 240

TTATTACAGA TCTTTAAAGA AAAACGATGT AATATATCGC GTTGGAGATG ATATTACTGT 300

ACATGATGGA GACTCAAGCT TTTATCTGGG GGTAATTTGT AAATTGTACG AAAAAGCAAT 360

TGATAAGCAT TCTGGAAAGA AATATGTTGA AGCAATTTGG TATAGTCGAG CTTATGCTAA 420

GAGAATGGAA ATTAAACCTG AATATTTGTT GCCAGACCGG CATATAAATG AGGTGTACGT 480

TTCTTGTGGC CGGGATGAAA ACCTGACTTC ATGTATAATA GAGCATTGTA ATGTCTACTC 540

TGAAGCAGAG TTTTTTTCAA AATTTCCCGC TGGAATTCCT ACAAAACGAA AAGATTTGTT 600

TCCTTGTAAC TTCTTTATCC GACGCGGTGT ACACTTGAAA GTGAACAAAT ACACAGAACC 660

TCTCGATTGG TCTTATTATG CTCATAATCT TGAAAGGATA GAAGATCTTT TGGTTGAGAT 720

GGAAGAAAAT TTGCGACCAA CTAAAAAGAA ATCTGGTTCT AGAGGTCGTG GTCGCCCTCG 780

TAAATATCCT TTACCAAATG TCGAAAGCAA AGAAAGCAGT TCCAAAGTTA ACTCTAAGGA 840

TGAAAATTTT GATTTACAAG ATGATAGTGA ATCTTCAGAA GATAATTTGA CTATACAACC 900

TCAGACACCA AGGCGCCGTC ATAAAAGATC AAGACACAAT TCATCAAATT TGGCTTCTAC 960

TCCAAAAAGA AATGGCTACA AACAACCATT ACAAATTACT CCGCTACCTA TTCGTATGCT 1020

GTCCCTTGAG GAGTTTCAGG GTTCTCCTCA TAGAAAAGCT AGGGCTATGC TTCATGTTGC 1080

TTCAGTTCCA AGCACATTAC AATGTCGCGA TAACGAATTT TCTACCATAT TTTCGAACTT 1140

AGAAAGTGCC ATTGAAGAAG AGACAGGGGC TTGTCTCTAT ATATCTGGTA CGCCGGGAAC 1200

AGGAAAAACT GCTACTGTTC ACGAAGTAAT TTGGAATCTT CAGGAATTAT CTCGAGAAGG 1260

ACAACTTCCT GAATTTTCAT TCTGCGAAAT TAATGGAATG CGTGTAACCA GTGCAAACCA 1320

GGCATATTCT ATTCTCTGGG AATCTTTGAC GGGTGAAAGA GTTACTCCAA TCCATGCAAT 1380

GGACCTTCTT GATAACCGAT TTACTCATGC TTCTCCAAAC CGCAGTAGTT GTGTTGTTCT 1440

TATGGATGAG CTCGATCAAC TAGTCACCCA TAATCAAAAA GTTTTATACA ATTTTTTCAA 1500

TTGGCCGTCT CTACCACATT CACGGTTAAT CGTTGTTGCA GTTGCTAATA CGATGGACTT 1560

ACCTGAACGT ATTTTATCAA ATCGCATTTC ATCACGTTTA GGTTTGTCCA GAGTTCCGTT 1620

TGAGCCTTAT ACGCATACTC AGCTAGAAAT AATAATCGCT GCCCGTTTGG AGGCTGTTCG 1680

GGATGACGAT GTTTTTTCTT CAGATGCAAT TCGGTTTGCA GCTCGAAAAG TAGCTGCGGT 1740

TAGCGGTGAT GCTAGAAGAG CCCTTGATAT ATGTCGTCGT GCGTCAGAGC TTGCTGAAAA 1800

CAAAAACGGC AAAGTTACAC CTGGATTAAT TCATCAAGCA ATTTCCGAAA TGACAGCTTC 1860

ACCGCTTCAA AAAGTATTAC GAAATCTCTC ATTCATGCAG AAAGTATTTT TATGTGCTAT 1920

AGTCAATCGT ATGCGCCGGT CTGGATTTGC AGAGTCGTAT GTTTATGAAG TACTTGAAGA 1980

AGCTGAACGG TTGTTGCGAG TCATGACTAC TCCTGATGCT GAAGCAAAAT TTGGCGAGTT 2040

AATATTGAGA AGACCAGAGT TTGGATATGT TTTATCAAGT CTAAGCGAGA ATGGTGTTCT 2100

CTACCTTGAA AATAAAAGTA GTAGGAATGC AAGAGTACGG CTAGCAATTG CAGATGATGA 2160

GATTAAATTG GCATTTCGTG GAGATTCGGA ACTTGCTGGG ATAGCATAAA AGCTATACTT 2220

TTTGGATGAA ATAGGCAATT TACCGATTGA ACAAAGTATA AAAACTTTCC TTACCTTACC 2280

TCTTGAATTT TAAAATGTTT ACTTCTAATT ATAAATTACG ACTTAAATTA TCTTTTAATT 2340

TGCCCATGAW AAMRAARMWR WAAAMRMRWR WWWWAWWMMG ATACTACTAC TTCTATTATT 2400

ACTACCTATA GAGAACCGGG TGACGATACT TATTGTGTTA TCTAGTAAAG TAAAAGAGAA 2460

GTAATAGCTA CTGATTAACC TTAGTTGTAA AATTTCAAAA ATTC 2504

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 706 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Pro Arg Arg Lys Ser Leu Arg Ser Gin Leu Leu He Asn Gly He 1 5 10 15

Asp Lys Ser Leu Leu Ser Asp Asp Ser Ala Asp Ser Ser Asp He Asp 20 25 30

Glu Glu Glu Val Tyr Gly Val Trp Thr Glu Glu Pro Phe Gin Lys Glu 35 40 45

Ala Gly Arg Ser Tyr Tyr Arg Ser Leu Lys Lys Asn Asp Val He Tyr 50 55 60

Arg Val Gly Asp Asp He Thr Val His Asp Gly Asp Ser Ser Phe Tyr 65 70 75 80

Leu Gly Val He Cys Lys Leu Tyr Glu Lys Ala He Asp Lys His Ser 85 90 95

Gly Lys Lys Tyr Val Glu Ala He Trp Tyr Ser Arg Ala Tyr Ala Lys

100 105 110

Arg Met Glu He Lys Pro Glu Tyr Leu Leu Pro Asp Arg His He Asn 115 120 125

Glu Val Tyr Val Ser Cys Gly Arg Asp Glu Asn Leu Thr Ser Cys He 130 135 140

He Glu His Cys Asn Val Tyr Ser Glu Ala Glu Phe Phe Ser Lys Phe 145 150 155 160

Pro Ala Gly He Pro Thr Lys Arg Lys Asp Leu Phe Pro Cys Asn Phe 165 170 175

Phe He Arg Arg Gly Val His Leu Lys Val Asn Lys Tyr Thr Glu Pro

180 185 190

Leu Asp Trp Ser Tyr Tyr Ala His Asn Leu Glu Arg Ile Glu Asp Leu 195 200 205

Leu Val Glu Met Glu Glu Asn Leu Arg Pro Thr Lys Lys Lys Ser Gly 210 215 220

Ser Arg Gly Arg Gly Arg Pro Arg Lys Tyr Pro Leu Pro Asn Val Glu

225 230 235 240

Ser Lys Glu Ser Ser Ser Lys Val Asn Ser Lys Asp Glu Asn Phe Asp 245 250 255

Leu Gin Asp Asp Ser Glu Ser Ser Glu Asp Asn Leu Thr He Gin Pro 260 265 270

Gin Thr Pro Arg Arg Arg His Lys Arg Ser Arg His Asn Ser Ser Asn 275 280 285

Leu Ala Ser Thr Pro Lys Arg Asn Gly Tyr Lys Gin Pro Leu Gin He 290 295 300

Thr Pro Leu Pro He Arg Met Leu Ser Leu Glu Glu Phe Gin Gly Ser 305 310 315 320

Pro His Arg Lys Ala Arg Ala Met Leu His Val Ala Ser Val Pro Ser 325 330 335

Thr Leu Gin Cys Arg Asp Asn Glu Phe Ser Thr He Phe Ser Asn Leu

340 345 350

Glu Ser Ala He Glu Glu Glu Thr Gly Ala Cys Leu Tyr He Ser Gly 355 360 365

Thr Pro Gly Thr Gly Lys Thr Ala Thr Val His Glu Val He Trp Asn 370 375 380

Leu Gin Glu Leu Ser Arg Glu Gly Gin Leu Pro Glu Phe Ser Phe Cys 385 390 395 400

Glu He Asn Gly Met Arg Val Thr Ser Ala Asn Gin Ala Tyr Ser He 405 410 415

Leu Trp Glu Ser Leu Thr Gly Glu Arg Val Thr Pro He His Ala Met

420 425 430

Asp Leu Leu Asp Asn Arg Phe Thr His Ala Ser Pro Asn Arg Ser Ser 435 440 445

Cys Val Val Leu Met Asp Glu Leu Asp Gin Leu Val Thr His Asn Gin 450 455 460

Lys Val Leu Tyr Asn Phe Phe Asn Trp Pro Ser Leu Pro His Ser Arg

465 470 475 480

Leu He Val Val Ala Val Ala Asn Thr Met Asp Leu Pro Glu Arg He 485 490 495

Leu Ser Asn Arg He Ser Ser Arg Leu Gly Leu Ser Arg Val Pro Phe 500 505 510

Glu Pro Tyr Thr His Thr Gin Leu Glu He He He Ala Ala Arg Leu 515 520 525

Glu Ala Val Arg Asp Asp Asp Val Phe Ser Ser Asp Ala He Arg Phe 530 535 540

Ala Ala Arg Lys Val Ala Ala Val Ser Gly Asp Ala Arg Arg Ala Leu 545 550 555 560

Asp He Cys Arg Arg Ala Ser Glu Leu Ala Glu Asn Lys Asn Gly Lys 565 570 575

Val Thr Pro Gly Leu He His Gin Ala He Ser Glu Met Thr Ala Ser

580 585 590

Pro Leu Gin Lys Val Leu Arg Asn Leu Ser Phe Met Gin Lys Val Phe 595 600 605

Leu Cys Ala He Val Asn Arg Met Arg Arg Ser Gly Phe Ala Glu Ser 610 615 620

Tyr Val Tyr Glu Val Leu Glu Glu Ala Glu Arg Leu Leu Arg Val Met 625 630 635 640

Thr Thr Pro Asp Ala Glu Ala Lys Phe Gly Glu Leu He Leu Arg Arg 645 650 655

Pro Glu Phe Gly Tyr Val Leu Ser Ser Leu Ser Glu Asn Gly Val Leu

660 665 670

Tyr Leu Glu Asn Lys Ser Ser Arg Asn Ala Arg Val Arg Leu Ala He 675 680 685

Ala Asp Asp Glu He Lys Leu Ala Phe Arg Gly Asp Ser Glu Leu Ala 690 695 700

Gly He

705

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3214 base pairs

(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CCGGGGCCAC GCGATTGGCG CGAAGTTTTC TTTTCTCCTT CCACCTTCTT TTCATTTCTA 60

GTGAGACACA CGCTTTGGTC CTGGCTTTCG GCCCGTAGTT GTAGAAGGAG CCCTGCTGGT 120

GCAGGTTAGA GGTGCCGCAT CCCCCGGAGC TCTCGAAGTG GAGGCGGTAG GAAACGGAGG 180

GCTTGCGGCT AGCCGGAGGA AGCTTTGGAG CCGGAAGCCA TGGCACACTA CCCCACAAGG 240

CTGAAGACCA GAAAAACTTA TTCATGGGTT GGCAGGCCCT TGTTGGATCG AAAACTGCAC 300

TACCAAACCT ATAGAGAAAT GTGTGTGAAA ACAGAAGGTT GTTCCACCGA GATTCACATC 360

CAGATTGGAC AGTTTGTGTT GATTGAAGGG GATGATGATG AAAACCCGTA TGTTGCTAAA 420

TTGCTTGAGT TGTTCGAAGA TGACTCTGAT CCTCCTCCTA AGAAACGTGC TCGAGTACAG 480

TGGTTTGTCC GATTCTGTGA AGTCCCTGCC TGTAAACGGC ATTTGTTGGG CCGGAAGCCT 540

GGTGCACAGG AAATATTCTG GTATGATTAC CCGGCCTGTG ACAGCAACAT TAATGCGGAG 600

ACCATCATTG GCCTTGTTCG GGTGATACCT TTAGCCCCAA AGGATGTGGT ACCGACGAAT 660

CTGAAAAATG AGAAGACACT CTTTGTGAAA CTATCCTGGA ATGAGAAGAA ATTCAGGCCA 720

CTTTCCTCAG AACTATTTGC GGAGTTGAAT AAACCACAAG AGAGTGCAGC CAAGTGCCAG 780

AAACCCGTGA GAGCCAAGAG TAAGAGTGCA GAGAGCCCTT CTTGGACCCC AGCAGAACAT 840

GTGGCCAAAA GGATTGAATC AAGGCACTCC GCCTCCAAAT CTCGCCAAAC TCCTACCCAT 900

CCTCTTACCC CAAGAGCCAG AAAGAGGCTG GAGCTTGGCA ACTTAGGTAA CCCTCAGATG 960

TCCCAGCAGA CTTCATGTGC CTCCTTGGAT TCTCCAGGAA GAATAAAACG GAAAGTGGCC 1020

TTCTCGGAGA TCACCTCACC TTCTAAGAGA TCTCAGCCTG ATAAACTTCA AACCTTGTCT 1080

CCAGCTCTGA AAGCCCCAGA GAAAACCAGA GAGACTGGAC TCTCTTATAC TGAGGATGAC 1140

AAGAAGGCTT CACCTGAACA TCGCATAATC CTGAGAACCC GAATTGCAGC TTCGAAAACC 1200

ATAGACATTA GAGAGGAGAG AACACTTACC CCTATCAGTG GGGGACAGAG ATCTTCAGTG 1260

GTGCCATCCG TGATTCTGAA ACCAGAAAAC ATCAAAAAGA GGGATGCAAA AGAAGCAAAA 1320

GCCCAGAATG AAGCGACCTC TACTCCCCAT CGTATCCGCA GAAAGAGTTC TGTCTTGACT 1380

ATGAATCGGA TTAGGCAGCA GCTTCGGTTT CTAGGTAATA GTAAAAGTGA CCAAGAAGAG 1440

AAAGAGATTC TGCCAGCAGC AGAGATTTCA GACTCTAGCA GTGACGAAGA AGAGGCTTCC 1500

ACACCGCCCC TTCCAAGGAG AGCACCCAGA ACTGTGTCCA GGAACCTGCG ATCTTCCTTG 1560

AAGTCATCCT TACATACCCT CACGAAGGTG CCAAAGAAGA GTCTCAAGCC TAGAACGCCA 1620

CGTTGTGCCG CTCCTCAGAT CCGTAGTCGA AGCCTGGCTG CCCAGGAGCC AGCCAGTGTG 1680

CTGGAGGAAG CCCGACTGAG GCTGCATGTT TCTGCTGTAC CTGAGTCTCT TCCCTGTCGG 1740

GAACAGGAAT TCCAAGACAT CTACAATTTT GTGGAAAGCA AACTCCTTGA CCATACCGGA 1800

GGGTGCATGT ACATCTCCGG TGTCCCTGGG ACAGGGAAGA CTGCCACTGT TCATGAAGTG 1860

ATACGCTGCC TGCAGCAGGC AGCCCAAGCC AATGATGTTC CTCCCTTTCA ATACATTGAG 1920

GTCAATGGCA TGAAGCTGAC GGAGCCCCAC CAAGTCTATG TGCACATCTT GCAGAAGCTA 1980

ACAGGCCAAA AAGCAACAGC CAACCATGCG GCAGAACTGC TGGCAAAGCA ATTCTGCACC 2040

CGAGGGTCAC CTCAGGAAAC CACCGTCCTG CTTGTGGATG AGCTCGACCT TCTGTGGACT 2100

CACAAACAAG ACATAATGTA CAATCTCTTT GACTGGCCCA CTCATAAGGA GGCCCGGCTT 2160

GTGGTCCTGG CAATTGCCAA CACAATGGAC CTGCCAGAGC GAATCATGAT GAACCGGGTG 2220

TCCAGCCGAC TGGGTCTTAC CAGGATGTGC TTCCAGCCCT ATACATATAG CCAGCTGCAG 2280

CAGATCCTAA GGTCCCGGCT CAAGCATCTA AAGGCCTTTG AAGATGATGC CATCCAGCTG 2340

GTAGCCAGGA AGGTAGCAGC ACTGTCTGGA GATGCACGAC GGTGCCTGGA CATCTGCAGG 2400

CGTGCCACAG AGATCTGTGA GTTCTCCCAG CAGAAGCCTG ACTCCCCTGG CCTGGTCACC 2460

ATAGCCCACT CAATGGAAGC TGTGGATGAG ATGTTTTCAT CATCATACAT CACGGCCATC 2520

AAAAATTCCT CTGTTCTGGA ACAGAGCTTC CTGAGAGCCA TCCTCGCAGA GTTCCGTCGA 2580

TCAGGACTGG AGGAAGCCAC GTTTCAACAG ATATATAGTC AACATGTGGC ACTGTGCAGA 2640

ATGGAGGGAC TGCCGTACCC CACCATGTCA GAGACCATGG CCGTGTGTTC TCACCTGGGC 2700

TCCTGTCGCC TCCTGCTTGT GGAGCCCAGC AGGAACGATC TGCTCCTTCG GGTGCGGCTC 2760

AACGTCAGCC AGGATGATGT GCTGTATGCG CTGAAAGACG AGTAAAGGGG CTTCACAAGT 2820

TAAAAGACTG GGGTCTTGCT GGGTTTTGTT TTTTGAGACA GGGTCTTGCT CTGTCGCCCA 2880

GGCTGGAGTG CAGTGGCACG ATCATGGCTC ACTGCAGCCT TGACTTCTCA GGCTTAGGTG 2940

ACCCCCCAAC CTCATCCTCC CAGGTGGCTG AAACTACAGG CACATGCCAC CATGCCCAGC 3000

TGATTTTTTG TAGAGACAGG GCTTCACCAT GTTGCCAAGC TAGTCTACAA AGCATCTGAT 3060

TTTGGAAGTA CATGGAATTG TTGTAACAAA GTATATTGAA TGGAAATGGC TCTCATGTAT 3120

TTTGGAATTT TCCATTAAAT AATTTGCTTT TTAAAAAAAA AAAAAAAAAA AAAAAAAAAA 3180

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 3214

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 861 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ala His Tyr Pro Thr Arg Leu Lys Thr Arg Lys Thr Tyr Ser Trp 1 5 10 15

Val Gly Arg Pro Leu Leu Asp Arg Lys Leu His Tyr Gin Thr Tyr Arg 20 25 30

Glu Met Cys Val Lys Thr Glu Gly Cys Ser Thr Glu He His He Gin 35 40 45

He Gly Gin Phe Val Leu He Glu Gly Asp Asp Asp Glu Asn Pro Tyr 50 55 60

Val Ala Lys Leu Leu Glu Leu Phe Glu Asp Asp Ser Asp Pro Pro Pro 65 70 75 80

Lys Lys Arg Ala Arg Val Gin Trp Phe Val Arg Phe Cys Glu Val Pro 85 90 95

Ala Cys Lys Arg His Leu Leu Gly Arg Lys Pro Gly Ala Gin Glu He

100 105 110

Phe Trp Tyr Asp Tyr Pro Ala Cys Asp Ser Asn He Asn Ala Glu Thr 115 120 125

He He Gly Leu Val Arg Val He Pro Leu Ala Pro Lys Asp Val Val 130 135 140

Pro Thr Asn Leu Lys Asn Glu Lys Thr Leu Phe Val Lys Leu Ser Trp 145 150 155 160

Asn Glu Lys Lys Phe Arg Pro Leu Ser Ser Glu Leu Phe Ala Glu Leu 165 170 175

Asn Lys Pro Gin Glu Ser Ala Ala Lys Cys Gin Lys Pro Val Arg Ala

180 185 190

Lys Ser Lys Ser Ala Glu Ser Pro Ser Trp Thr Pro Ala Glu His Val 195 200 205

Ala Lys Arg He Glu Ser Arg His Ser Ala Ser Lys Ser Arg Gin Thr 210 215 220

Pro Thr His Pro Leu Thr Pro Arg Ala Arg Lys Arg Leu Glu Leu Gly 225 230 235 240

Asn Leu Gly Asn Pro Gln Met Ser Gln Gln Thr Ser Cys Ala Ser Leu

245 250 255

Asp Ser Pro Gly Arg Ile Lys Arg Lys Val Ala Phe Ser Glu Ile Thr 260 265 270

Ser Pro Ser Lys Arg Ser Gin Pro Asp Lys Leu Gin Thr Leu Ser Pro 275 280 285

Ala Leu Lys Ala Pro Glu Lys Thr Arg Glu Thr Gly Leu Ser Tyr Thr 290 295 300

Glu Asp Asp Lys Lys Ala Ser Pro Glu His Arg He He Leu Arg Thr 305 310 315 320

Arg He Ala Ala Ser Lys Thr He Asp He Arg Glu Glu Arg Thr Leu 325 330 335

Thr Pro He Ser Gly Gly Gin Arg Ser Ser Val Val Pro Ser Val He

340 345 350

Leu Lys Pro Glu Asn He Lys Lys Arg Asp Ala Lys Glu Ala Lys Ala 355 360 365

Gin Asn Glu Ala Thr Ser Thr Pro His Arg He Arg Arg Lys Ser Ser 370 375 380

Val Leu Thr Met Asn Arg He Arg Gin Gin Leu Arg Phe Leu Gly Asn 385 390 395 400

Ser Lys Ser Asp Gin Glu Glu Lys Glu He Leu Pro Ala Ala Glu He 405 410 415

Ser Asp Ser Ser Ser Asp Glu Glu Glu Ala Ser Thr Pro Pro Leu Pro

420 425 430

Arg Arg Ala Pro Arg Thr Val Ser Arg Asn Leu Arg Ser Ser Leu Lys 435 440 445

Ser Ser Leu His Thr Leu Thr Lys Val Pro Lys Lys Ser Leu Lys Pro 450 455 460

Arg Thr Pro Arg Cys Ala Ala Pro Gin He Arg Ser Arg Ser Leu Ala 465 470 475 480

Ala Gin Glu Pro Ala Ser Val Leu Glu Glu Ala Arg Leu Arg Leu His 485 490 495

Val Ser Ala Val Pro Glu Ser Leu Pro Cys Arg Glu Gin Glu Phe Gin 500 505 510

Asp Ile Tyr Asn Phe Val Glu Ser Lys Leu Leu Asp His Thr Gly Gly 515 520 525

Cys Met Tyr He Ser Gly Val Pro Gly Thr Gly Lys Thr Ala Thr Val 530 535 540

His Glu Val He Arg Cys Leu Gin Gin Ala Ala Gin Ala Asn Asp Val 545 550 555 560

Pro Pro Phe Gin Tyr He Glu Val Asn Gly Met Lys Leu Thr Glu Pro 565 570 575

His Gin Val Tyr Val His He Leu Gin Lys Leu Thr Gly Gin Lys Ala

580 585 590

Thr Ala Asn His Ala Ala Glu Leu Leu Ala Lys Gin Phe Cys Thr Arg 595 600 605

Gly Ser Pro Gin Glu Thr Thr Val Leu Leu Val Asp Glu Leu Asp Leu 610 615 620

Leu Trp Thr His Lys Gin Asp He Met Tyr Asn Leu Phe Asp Trp Pro 625 630 635 640

Thr His Lys Glu Ala Arg Leu Val Val Leu Ala He Ala Asn Thr Met 645 650 655

Asp Leu Pro Glu Arg He Met Met Asn Arg Val Ser Ser Arg Leu Gly

660 665 670

Leu Thr Arg Met Cys Phe Gin Pro Tyr Thr Tyr Ser Gin Leu Gin Gin 675 680 685

He Leu Arg Ser Arg Leu Lys His Leu Lys Ala Phe Glu Asp Asp Ala 690 695 700

He Gin Leu Val Ala Arg Lys Val Ala Ala Leu Ser Gly Asp Ala Arg 705 710 715 720

Arg Cys Leu Asp He Cys Arg Arg Ala Thr Glu He Cys Glu Phe Ser 725 730 735

Gln Gln Lys Pro Asp Ser Pro Gly Leu Val Thr Ile Ala His Ser Met 740 745 750

Glu Ala Val Asp Glu Met Phe Ser Ser Ser Tyr He Thr Ala He Lys 755 760 765

Asn Ser Ser Val Leu Glu Gin Ser Phe Leu Arg Ala He Leu Ala Glu 770 775 780

Phe Arg Arg Ser Gly Leu Glu Glu Ala Thr Phe Gin Gin He Tyr Ser 785 790 795 800

Gin His Val Ala Leu Cys Arg Met Glu Gly Leu Pro Tyr Pro Thr Met 805 810 815

Ser Glu Thr Met Ala Val Cys Ser His Leu Gly Ser Cys Arg Leu Leu

820 825 830

Leu Val Glu Pro Ser Arg Asn Asp Leu Leu Leu Arg Val Arg Leu Asn 835 840 845

Val Ser Gin Asp Asp Val Leu Tyr Ala Leu Lys Asp Glu 850 855 860

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1480 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 277..1365

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

TGAATCGGGA ATCTGATTCA TATGTTTGGG GTTTAATAGT CTCAGCTCAA ATAAATCTAG 60

GTTAAACTGT GTGGATCGAT TCATATATCC TCCGTCAAAA CCAAAACCAA ACCGATTTGT 120

CATAATTTTT TCTTATCATC CACTTTCATT GGCTAGAGGG ACATTGTAAC GGTGTCGTCG 180

TCGCCAAACG ATTTGCCTCT TCCTAAAGGA GATTCTTTCC TACATAGGAA TTGAGTTTAA 240

GGTGGAATTC TTCTGTTATT TTGTTGTTGC ACGAAA ATG GAG GAC ATT GAG AAC 294 Met Glu Asp He Glu Asn

865

ATA GAA GAA GAT GAG TAT GGG TTT TCA AGA AAC TAC TTC TTG GCA AAA 342 He Glu Glu Asp Glu Tyr Gly Phe Ser Arg Asn Tyr Phe Leu Ala Lys 870 875 880

GAA TTG GGT GGG GCG AGT AAG CGT TCT GCC CAC AAG CTC TCT GAT ATA 390 Glu Leu Gly Gly Ala Ser Lys Arg Ser Ala His Lys Leu Ser Asp He 885 890 895

CAT ATT GTT GAT GAG CAG GAG CTT AGA GAA ACG GCT TCT ACA ATT GAA 438 His He Val Asp Glu Gin Glu Leu Arg Glu Thr Ala Ser Thr He Glu 900 905 910 915

ATG AAG CAC TCG AAA GAG ATA TCT GAG CTT ATG AGT GAT TAC AAG ACT 486 Met Lys His Ser Lys Glu He Ser Glu Leu Met Ser Asp Tyr Lys Thr 920 925 930

ATG TAC TCA AAG TGG GTC TTT GAG CTC AGG TGT GGC TTT GGC CTT CTA 534 Met Tyr Ser Lys Trp Val Phe Glu Leu Arg Cys Gly Phe Gly Leu Leu

935 940 945

ATG TAT GGC TTT GGA TCT AAG AAA GCT TTA GTT GAA GAT TTT GCT TCT 582 Met Tyr Gly Phe Gly Ser Lys Lys Ala Leu Val Glu Asp Phe Ala Ser 950 955 960

GCT TCT TTG ACT GAC TAT TCT GTT GTG GTC ATC AAT GGC TAC CTC CCT 630 Ala Ser Leu Thr Asp Tyr Ser Val Val Val He Asn Gly Tyr Leu Pro 965 970 975

TCC GTA AAT CTA AAG CAG GTT CTT TTG GCA TTA GCT GAA CTT CTA TCC 678 Ser Val Asn Leu Lys Gin Val Leu Leu Ala Leu Ala Glu Leu Leu Ser 980 985 990 995

GAG CTT TTG AAA TGT AAA AGA AAG AGT TCC GGG AGT TTG TCT AAA GGT 726 Glu Leu Leu Lys Cys Lys Arg Lys Ser Ser Gly Ser Leu Ser Lys Gly 1000 1005 1010

CAA GAA ACA TTT CCT TCA CGC TCC ATG GAT GAT ATT CTT TCC TTT CTA 774

Gln Glu Thr Phe Pro Ser Arg Ser Met Asp Asp Ile Leu Ser Phe Leu 1015 1020 1025

CAT GGT CCA CAG TCT GGA GAT AAA GAC TGC TTC ATA TGC GTT GTT GTT 822 His Gly Pro Gin Ser Gly Asp Lys Asp Cys Phe He Cys Val Val Val 1030 1035 1040

CAT AAC ATT GAC GGC CCT GCT CTA AGA GAT CCC GAA TCA CAA CAA ACT 870 His Asn He Asp Gly Pro Ala Leu Arg Asp Pro Glu Ser Gin Gin Thr 1045 1050 1055

CTT GCC CGG CTT TCT TCT TGT TCA CAC ATA CGC TTG GTT GCC TCT ATT 918 Leu Ala Arg Leu Ser Ser Cys Ser His He Arg Leu Val Ala Ser He 1060 1065 1070 1075

GAC CAT GTC AAC GCT CCA TTA TTG TGG GAC AAG AAA ATG GTG CAC AAA 966 Asp His Val Asn Ala Pro Leu Leu Trp Asp Lys Lys Met Val His Lys 1080 1085 1090

CAG TTT AAC TGG CTA TGG CAC CAT GTT CCA ACA TTT GCA CCA TAC AAT 1014 Gln Phe Asn Trp Leu Trp His His Val Pro Thr Phe Ala Pro Tyr Asn 1095 1100 1105

GTC GAA GGT GTA TTC TTC CCG TTG GTT CTT GCA CAG GGA AGC ACA GCC 1062 Val Glu Gly Val Phe Phe Pro Leu Val Leu Ala Gln Gly Ser Thr Ala 1110 1115 1120

CAA ACC GCC AAA ACA GCA GCC ATT GTT TTA CAG AGT TTA ACA CCA AAC 1110 Gln Thr Ala Lys Thr Ala Ala Ile Val Leu Gln Ser Leu Thr Pro Asn 1125 1130 1135

GGT CAG AAT GTC TTC AAG ATT CTT GCT GAG TAC CAA CTT TCA CAC CCA 1158 Gly Gln Asn Val Phe Lys Ile Leu Ala Glu Tyr Gln Leu Ser His Pro 1140 1145 1150 1155

GAT GAA GAT GGG ATG CCC ACT GAT GAT CTG TAT TCA GCG TCT CGG GAA 1206 Asp Glu Asp Gly Met Pro Thr Asp Asp Leu Tyr Ser Ala Ser Arg Glu 1160 1165 1170

CGC TTC TTT GTG AGC AGT CAA GTG ACT TTA AAC TCT CAT CTC ACG GAA 1254 Arg Phe Phe Val Ser Ser Gln Val Thr Leu Asn Ser His Leu Thr Glu 1175 1180 1185

TTT AAA GAC CAC GAA CTG GTT AAG ACC AAG AGA AAC TCC GAT GGT CAA 1302 Phe Lys Asp His Glu Leu Val Lys Thr Lys Arg Asn Ser Asp Gly Gln 1190 1195 1200

GAG TGT TTG AAT ATA CCG CTC ACT TCG GAT GCA ATT CGA CAG CTT TTG 1350 Glu Cys Leu Asn Ile Pro Leu Thr Ser Asp Ala Ile Arg Gln Leu Leu 1205 1210 1215

CTT GAT CTC AAT CAG TAGCCTGAAA TTGTATTTCT GATATGATTC ATTTTTATTG 1405 Leu Asp Leu Asn Gln 1220

CTTGAACGAG TTATTATAGT TCACACAGTT TACATGTTTA ATTGAATGTT ATAGTCAGCA 1465

CTCACAGCTC TTATT 1480

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 363 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Glu Asp Ile Glu Asn Ile Glu Glu Asp Glu Tyr Gly Phe Ser Arg 1 5 10 15

Asn Tyr Phe Leu Ala Lys Glu Leu Gly Gly Ala Ser Lys Arg Ser Ala 20 25 30

His Lys Leu Ser Asp Ile His Ile Val Asp Glu Gln Glu Leu Arg Glu 35 40 45

Thr Ala Ser Thr Ile Glu Met Lys His Ser Lys Glu Ile Ser Glu Leu 50 55 60

Met Ser Asp Tyr Lys Thr Met Tyr Ser Lys Trp Val Phe Glu Leu Arg 65 70 75 80

Cys Gly Phe Gly Leu Leu Met Tyr Gly Phe Gly Ser Lys Lys Ala Leu 85 90 95

Val Glu Asp Phe Ala Ser Ala Ser Leu Thr Asp Tyr Ser Val Val Val 100 105 110

Ile Asn Gly Tyr Leu Pro Ser Val Asn Leu Lys Gln Val Leu Leu Ala 115 120 125

Leu Ala Glu Leu Leu Ser Glu Leu Leu Lys Cys Lys Arg Lys Ser Ser 130 135 140

Gly Ser Leu Ser Lys Gly Gln Glu Thr Phe Pro Ser Arg Ser Met Asp 145 150 155 160

Asp Ile Leu Ser Phe Leu His Gly Pro Gln Ser Gly Asp Lys Asp Cys 165 170 175

Phe Ile Cys Val Val Val His Asn Ile Asp Gly Pro Ala Leu Arg Asp 180 185 190

Pro Glu Ser Gln Gln Thr Leu Ala Arg Leu Ser Ser Cys Ser His Ile 195 200 205

Arg Leu Val Ala Ser Ile Asp His Val Asn Ala Pro Leu Leu Trp Asp 210 215 220

Lys Lys Met Val His Lys Gln Phe Asn Trp Leu Trp His His Val Pro 225 230 235 240

Thr Phe Ala Pro Tyr Asn Val Glu Gly Val Phe Phe Pro Leu Val Leu 245 250 255

Ala Gln Gly Ser Thr Ala Gln Thr Ala Lys Thr Ala Ala Ile Val Leu 260 265 270

Gln Ser Leu Thr Pro Asn Gly Gln Asn Val Phe Lys Ile Leu Ala Glu 275 280 285

Tyr Gln Leu Ser His Pro Asp Glu Asp Gly Met Pro Thr Asp Asp Leu 290 295 300

Tyr Ser Ala Ser Arg Glu Arg Phe Phe Val Ser Ser Gln Val Thr Leu 305 310 315 320

Asn Ser His Leu Thr Glu Phe Lys Asp His Glu Leu Val Lys Thr Lys 325 330 335

Arg Asn Ser Asp Gly Gln Glu Cys Leu Asn Ile Pro Leu Thr Ser Asp 340 345 350

Ala Ile Arg Gln Leu Leu Leu Asp Leu Asn Gln 355 360

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1676 base pairs

(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 13..1302

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AAGTTTGAGA AA ATG CCA CGG CCA AAA ATT TTG AAA CGA GCA ACT GTC 48 Met Pro Arg Pro Lys Ile Leu Lys Arg Ala Thr Val 365 370 375

CAG CCC AGT GCC GCC GTT CCT GTG AAA AAA TCG ACT CCA GAA AAA GAA 96 Gln Pro Ser Ala Ala Val Pro Val Lys Lys Ser Thr Pro Glu Lys Glu 380 385 390

GGA TCC AGA CAG AAA AAG ACG AAT GGA AAA GAG AAT GCT TCT AGA AAT 144 Gly Ser Arg Gln Lys Lys Thr Asn Gly Lys Glu Asn Ala Ser Arg Asn 395 400 405

TTG CAA TCA AAT TTA GAA GAA GAT TTG GAA CAA CTG GGC TTC GAG GAT 192 Leu Gln Ser Asn Leu Glu Glu Asp Leu Glu Gln Leu Gly Phe Glu Asp 410 415 420

GAA ACT GTA TCA ATG GCT CAA TCA GCA ATC GAA AAT TAC TTT ATG CAA 240 Glu Thr Val Ser Met Ala Gln Ser Ala Ile Glu Asn Tyr Phe Met Gln 425 430 435

GGA AAA TCG GCG TCA GAA CGA ATG AAT AAT GCG AAA TCC CGT CGT GGA 288 Gly Lys Ser Ala Ser Glu Arg Met Asn Asn Ala Lys Ser Arg Arg Gly 440 445 450 455

AGA CGT GCT GGA AAT GGA AAT ACT GAA GAA ATT GAG GAA GAC GAT GAG 336 Arg Arg Ala Gly Asn Gly Asn Thr Glu Glu Ile Glu Glu Asp Asp Glu 460 465 470

ATC AGT AAT GCT ATC ACT GAT TTC ACA AAA TGT GAT CTC CCT GGA CTT 384 Ile Ser Asn Ala Ile Thr Asp Phe Thr Lys Cys Asp Leu Pro Gly Leu 475 480 485

CGA AAT TAT ATT ACC AAA AAA GAT AAC ACG GAA TTC GAA AAA CGA TTG 432 Arg Asn Tyr Ile Thr Lys Lys Asp Asn Thr Glu Phe Glu Lys Arg Leu 490 495 500

GAG CAT CTC GCG GAT AAT GAT TTC GGA AAA TGG AAG CTT TAC CTA GCA 480 Glu His Leu Ala Asp Asn Asp Phe Gly Lys Trp Lys Leu Tyr Leu Ala 505 510 515

GCT GGA TTT AAT ATT CTT TTG CAC GGT GTC GGT TCG AAG CGT GAT GTT 528 Ala Gly Phe Asn Ile Leu Leu His Gly Val Gly Ser Lys Arg Asp Val 520 525 530 535

CTC ACA GAA TTT GAG AAT GAG CTA TCC GAT TAT ACA TAT ATG AGA GTG 576 Leu Thr Glu Phe Glu Asn Glu Leu Ser Asp Tyr Thr Tyr Met Arg Val 540 545 550

GAT GCA CGG AAA GAT GGG CTC AAT GTA AAA GTT CTT CTT GGA GCT ATC 624 Asp Ala Arg Lys Asp Gly Leu Asn Val Lys Val Leu Leu Gly Ala Ile 555 560 565

AAT GAG AAT ATG AAG CTG AAT TGT AAT GTG AAG AGA GGC CAA TCT ACG 672 Asn Glu Asn Met Lys Leu Asn Cys Asn Val Lys Arg Gly Gln Ser Thr 570 575 580

ATT AGT TGG GCT CGA TCT ATT CGC AGA AAA ATG AAT AGC CAA CAG TTG 720 Ile Ser Trp Ala Arg Ser Ile Arg Arg Lys Met Asn Ser Gln Gln Leu 585 590 595

ATT CTT ATC ATT GAT AAT ATT GAA GCT CCT GAT TGG AGA AGT GAT CAA 768 Ile Leu Ile Ile Asp Asn Ile Glu Ala Pro Asp Trp Arg Ser Asp Gln 600 605 610 615

GAA GCA TTT TGC GAA CTT CTT GAG AAT CGG GAT TCG GTG AAA TTG ATT 816 Glu Ala Phe Cys Glu Leu Leu Glu Asn Arg Asp Ser Val Lys Leu Ile 620 625 630

GCT ACA GTT GAT CAC ATT TAC TCG ACG TTC ATC TGG AAT TCG CGT CAA 864 Ala Thr Val Asp His Ile Tyr Ser Thr Phe Ile Trp Asn Ser Arg Gln 635 640 645

CTA TCA TCA CTC TCA TTC GTT CAC ATC ACA ATC AAC ACC TTC GAA ATT 912 Leu Ser Ser Leu Ser Phe Val His Ile Thr Ile Asn Thr Phe Glu Ile 650 655 660

CCA CTT CAA GAA TTA ATG ACT GGA GAT TCT CGT CTT CTT GGT CTT GAT 960 Pro Leu Gln Glu Leu Met Thr Gly Asp Ser Arg Leu Leu Gly Leu Asp 665 670 675

GCT CGT TCG AAT CAA TCC TCT CAT ACA ATG TCA TCG CTT GAT GTG TTC 1008 Ala Arg Ser Asn Gln Ser Ser His Thr Met Ser Ser Leu Asp Val Phe 680 685 690 695

TGG AAA TCT CTT GCC GTC AAT TCA CAA AAA TTA TTC CGT CTC TTT TTC 1056 Trp Lys Ser Leu Ala Val Asn Ser Gln Lys Leu Phe Arg Leu Phe Phe 700 705 710

CAA ATG TAC TTT GAC ACC AAG AAG CCT GTC AAA TTC TGG GAT TTG TTC 1104 Gln Met Tyr Phe Asp Thr Lys Lys Pro Val Lys Phe Trp Asp Leu Phe 715 720 725

AAT GCG GCA AAA GAT GAT TTC ATT GCT TCA ACT GAC GCT GCT CTT CGA 1152 Asn Ala Ala Lys Asp Asp Phe Ile Ala Ser Thr Asp Ala Ala Leu Arg 730 735 740

ACC CAA CTT GTC GAA TTC AAG GAT CAT CGG GTT TTG AAG TGG ACC CGT 1200 Thr Gln Leu Val Glu Phe Lys Asp His Arg Val Leu Lys Trp Thr Arg 745 750 755

GGT GAT GAC GGA AAC GAT CAG CTG TCG GGC ATT GTC GAA TTA CGA TTA 1248 Gly Asp Asp Gly Asn Asp Gln Leu Ser Gly Ile Val Glu Leu Arg Leu 760 765 770 775

GTG ACC GAA TTT CTC GAA TCG AAG AAC ATG CCG TTA GAC GAA AAG AAA 1296 Val Thr Glu Phe Leu Glu Ser Lys Asn Met Pro Leu Asp Glu Lys Lys 780 785 790

GAC GAG TAGCTGCTGC TACTGCTGGA GGACCTCAAA AATGAACACA CTCTGCCTCC 1352 Asp Glu

TTTTGACTCA ATGTATTTAC CTTCAATTGT TTTATTTGTT GACTCTGCGC CCCCCGTCCG 1412

TCCGTCGATG CTTCTTCATC CCATTTTTTT TTACTTCAAT TGAAACCTCA ATCTTCACTT 1472

ACTCTCATCT GAACGCTCAT ATTTAAGGCA ATAATTTTCA TTTTCAAATA TATCAATTGA 1532

AACCTTTATC TACCGTAATA CCAATTTTGT GTACCTTTTC AAAAATCTCA TTTCCCCCTC 1592

GGTTTTTTCT TCACGATTTC TCAATTATTT TCAGTTTCTC ACTATCAGTT TCACATTCCC 1652

ATATTTGAAT GAATCTCATT TTCC 1676

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 430 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Met Pro Arg Pro Lys Ile Leu Lys Arg Ala Thr Val Gln Pro Ser Ala 1 5 10 15

Ala Val Pro Val Lys Lys Ser Thr Pro Glu Lys Glu Gly Ser Arg Gln 20 25 30

Lys Lys Thr Asn Gly Lys Glu Asn Ala Ser Arg Asn Leu Gln Ser Asn 35 40 45

Leu Glu Glu Asp Leu Glu Gln Leu Gly Phe Glu Asp Glu Thr Val Ser 50 55 60

Met Ala Gln Ser Ala Ile Glu Asn Tyr Phe Met Gln Gly Lys Ser Ala 65 70 75 80

Ser Glu Arg Met Asn Asn Ala Lys Ser Arg Arg Gly Arg Arg Ala Gly 85 90 95

Asn Gly Asn Thr Glu Glu Ile Glu Glu Asp Asp Glu Ile Ser Asn Ala 100 105 110

Ile Thr Asp Phe Thr Lys Cys Asp Leu Pro Gly Leu Arg Asn Tyr Ile 115 120 125

Thr Lys Lys Asp Asn Thr Glu Phe Glu Lys Arg Leu Glu His Leu Ala 130 135 140

Asp Asn Asp Phe Gly Lys Trp Lys Leu Tyr Leu Ala Ala Gly Phe Asn 145 150 155 160

Ile Leu Leu His Gly Val Gly Ser Lys Arg Asp Val Leu Thr Glu Phe 165 170 175

Glu Asn Glu Leu Ser Asp Tyr Thr Tyr Met Arg Val Asp Ala Arg Lys 180 185 190

Asp Gly Leu Asn Val Lys Val Leu Leu Gly Ala Ile Asn Glu Asn Met 195 200 205

Lys Leu Asn Cys Asn Val Lys Arg Gly Gln Ser Thr Ile Ser Trp Ala 210 215 220

Arg Ser Ile Arg Arg Lys Met Asn Ser Gln Gln Leu Ile Leu Ile Ile 225 230 235 240

Asp Asn Ile Glu Ala Pro Asp Trp Arg Ser Asp Gln Glu Ala Phe Cys 245 250 255

Glu Leu Leu Glu Asn Arg Asp Ser Val Lys Leu Ile Ala Thr Val Asp 260 265 270

His Ile Tyr Ser Thr Phe Ile Trp Asn Ser Arg Gln Leu Ser Ser Leu 275 280 285

Ser Phe Val His Ile Thr Ile Asn Thr Phe Glu Ile Pro Leu Gln Glu 290 295 300

Leu Met Thr Gly Asp Ser Arg Leu Leu Gly Leu Asp Ala Arg Ser Asn 305 310 315 320

Gln Ser Ser His Thr Met Ser Ser Leu Asp Val Phe Trp Lys Ser Leu 325 330 335

Ala Val Asn Ser Gln Lys Leu Phe Arg Leu Phe Phe Gln Met Tyr Phe 340 345 350

Asp Thr Lys Lys Pro Val Lys Phe Trp Asp Leu Phe Asn Ala Ala Lys 355 360 365

Asp Asp Phe Ile Ala Ser Thr Asp Ala Ala Leu Arg Thr Gln Leu Val 370 375 380

Glu Phe Lys Asp His Arg Val Leu Lys Trp Thr Arg Gly Asp Asp Gly 385 390 395 400

Asn Asp Gln Leu Ser Gly Ile Val Glu Leu Arg Leu Val Thr Glu Phe 405 410 415

Leu Glu Ser Lys Asn Met Pro Leu Asp Glu Lys Lys Asp Glu 420 425 430

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2729 base pairs

(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 187..1917

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GGCGCGAATT ACTGGAAATT GGCTTTTCCC GTTGGGGCCG AAGGTACCTT CCCTGCGGCG 60

GCGACTCAGC GGGGTGTCGT TCGGCCGGCG TGACGCAGCC GGATCGGCGC CAGACGGAAA 120

CCTAGCGGTG ACTGTATCTG AATTTTGCAG CTGCAGAATG TGTAGTACCT TAAAAGGTTG 180

GCAACA ATG AGT AAA CCA GAA TTA AAG GAA GAC AAG ATG CTG GAG GTT 228 Met Ser Lys Pro Glu Leu Lys Glu Asp Lys Met Leu Glu Val 435 440

CAC TTT GTG GGA GAT GAT GAT GTT CTT AAT CAC ATT CTA GAT AGA GAA 276 His Phe Val Gly Asp Asp Asp Val Leu Asn His Ile Leu Asp Arg Glu 445 450 455 460

GGA GGA GCT AAA TTG AAG AAG GAG CGA GCG CAC GTT TTG GTC AAC CCC 324 Gly Gly Ala Lys Leu Lys Lys Glu Arg Ala His Val Leu Val Asn Pro 465 470 475

AAA AAA ATA ATA AAG AAG CCA GAA TAT GAT TTG GAG GAA GAT GAC CAG 372 Lys Lys Ile Ile Lys Lys Pro Glu Tyr Asp Leu Glu Glu Asp Asp Gln 480 485 490

GAG GTC TTA AAA GAT CAG AAC TAT GTG GAA ATT ATG GGA AGA GAT GTT 420 Glu Val Leu Lys Asp Gln Asn Tyr Val Glu Ile Met Gly Arg Asp Val 495 500 505

CAA GAA TCA TTG AAA AAT GGC TCT GCT ACA GGT GGT GGA AAT AAA GTT 468 Gln Glu Ser Leu Lys Asn Gly Ser Ala Thr Gly Gly Gly Asn Lys Val 510 515 520

TAT TCT TTT CAG AAT AGA AAA CAC TCT GAA AAG ATG GCT AAA TTA GCT 516 Tyr Ser Phe Gln Asn Arg Lys His Ser Glu Lys Met Ala Lys Leu Ala 525 530 535 540

TCA GAA CTA GCA AAA ACA CCA CAA AAA AGT GTT TCA TTC AGT TTG AAG 564 Ser Glu Leu Ala Lys Thr Pro Gln Lys Ser Val Ser Phe Ser Leu Lys 545 550 555

AAT GAT CCT GAG ATT ACG ATA AAC GTT CCT CAA AGT AGC AAG GGC CAT 612 Asn Asp Pro Glu Ile Thr Ile Asn Val Pro Gln Ser Ser Lys Gly His 560 565 570

TCT GCT TCA GAC AAG GTT CAA CCG AAG AAC AAT GAC AAA AGT GAA TTT 660 Ser Ala Ser Asp Lys Val Gln Pro Lys Asn Asn Asp Lys Ser Glu Phe 575 580 585

CTG TCA ACA GCA CCT CGT AGT CTA AGA AAA AGA TTA ATA GTT CCA AGG 708 Leu Ser Thr Ala Pro Arg Ser Leu Arg Lys Arg Leu Ile Val Pro Arg 590 595 600

TCT CAT TCT GAC AGT GAA AGC GAA TAT TCT GCT TCC AAC TCA GAG GAT 756 Ser His Ser Asp Ser Glu Ser Glu Tyr Ser Ala Ser Asn Ser Glu Asp 605 610 615 620

GAT GAA GGG GTT GCA CAG GAA CAT GAA GAG GAC ACT AAT GCA GTC ATA 804 Asp Glu Gly Val Ala Gln Glu His Glu Glu Asp Thr Asn Ala Val Ile 625 630 635

TTC AGC CAA AAG ATT CAA GCT CAG AAT AGA GTA GTT TCA GCT CCT GTT 852 Phe Ser Gln Lys Ile Gln Ala Gln Asn Arg Val Val Ser Ala Pro Val 640 645 650

GGC AAA GAA ACA CCT TCT AAG AGA ATG AAA AGA GAT AAA ACA AGT GAC 900 Gly Lys Glu Thr Pro Ser Lys Arg Met Lys Arg Asp Lys Thr Ser Asp 655 660 665

TTA GTA GAA GAA TAT TTT GAA GCT CAC AGC AGT TCA AAA GTT TTA ACC 948 Leu Val Glu Glu Tyr Phe Glu Ala His Ser Ser Ser Lys Val Leu Thr 670 675 680

TCT GAT AGA ACA CTG CAG AAG CTA AAG AGA GCT AAA CTG GAT CAG CAA 996 Ser Asp Arg Thr Leu Gln Lys Leu Lys Arg Ala Lys Leu Asp Gln Gln 685 690 695 700

ACT TTG CGT AAC TTA TTG AGC AAG GTT TCC CCT TCC TTT TCT GCC GAA 1044 Thr Leu Arg Asn Leu Leu Ser Lys Val Ser Pro Ser Phe Ser Ala Glu 705 710 715

CTT AAA CAA CTA AAT CAA CAG TAT GAA AAA TTA TTT CAT AAA TGG ATG 1092 Leu Lys Gln Leu Asn Gln Gln Tyr Glu Lys Leu Phe His Lys Trp Met 720 725 730

CTG CAA TTA CAC CTT GGG TTC AAC ATT GTG CTT TAT GGT TTG GGT TCT 1140 Leu Gln Leu His Leu Gly Phe Asn Ile Val Leu Tyr Gly Leu Gly Ser 735 740 745

AAG AGA GAT TTA CTA GAA AGG TTT CGA ACC ACT ATG CTG CAA GAT TCC 1188 Lys Arg Asp Leu Leu Glu Arg Phe Arg Thr Thr Met Leu Gln Asp Ser 750 755 760

ATT CAC GTT GTC ATC AAT GGC TTC TTT CCT GGA ATC AGT GTG AAA TCA 1236 Ile His Val Val Ile Asn Gly Phe Phe Pro Gly Ile Ser Val Lys Ser 765 770 775 780

GTC CTG AAT TCT ATA ACA GAA GAA GTC CTC GAT CAT ATG GGT ACT TTC 1284 Val Leu Asn Ser Ile Thr Glu Glu Val Leu Asp His Met Gly Thr Phe 785 790 795

CGC AGT ATA CTG GAT CAG CTA GAC TGG ATA GTA AAC AAA TTT AAA GAA 1332 Arg Ser Ile Leu Asp Gln Leu Asp Trp Ile Val Asn Lys Phe Lys Glu 800 805 810

GAT TCT TCT TTA GAA CTC TTC CTT CTC ATC CAC AAT TTG GAT AGC CAG 1380 Asp Ser Ser Leu Glu Leu Phe Leu Leu Ile His Asn Leu Asp Ser Gln 815 820 825

ATG TTG AGA GGA GAG AAG AGC CAG CAA ATC ATT GGT CAG TTG TCA TCT 1428 Met Leu Arg Gly Glu Lys Ser Gln Gln Ile Ile Gly Gln Leu Ser Ser 830 835 840

TTG CAT AAC ATT TAC CTT ATA GCA TCC ATT GAC CAC CTC AAT GCT CCT 1476 Leu His Asn Ile Tyr Leu Ile Ala Ser Ile Asp His Leu Asn Ala Pro 845 850 855 860

CTC ATG TGG GAT CAT GCA AAG CAG AGT CTT TTT AAC TGG CTC TGG TAT 1524 Leu Met Trp Asp His Ala Lys Gln Ser Leu Phe Asn Trp Leu Trp Tyr 865 870 875

GAA ACT ACT ACA TAC AGT CCT TAT ACT GAA GAA ACC TCC TAT GAG AAC 1572 Glu Thr Thr Thr Tyr Ser Pro Tyr Thr Glu Glu Thr Ser Tyr Glu Asn 880 885 890

TCT CTT CTG GTA AAG CAG TCT GGA TCC CTG CCA CTT AGC TCC CTT ACT 1620 Ser Leu Leu Val Lys Gln Ser Gly Ser Leu Pro Leu Ser Ser Leu Thr 895 900 905

CAT GTC TTA CGA AGC CTT ACC CCT AAT GCA AGG GGA ATT TTC AGG CTA 1668 His Val Leu Arg Ser Leu Thr Pro Asn Ala Arg Gly Ile Phe Arg Leu 910 915 920

CTA ATA AAA TAC CAG CTG GAC AAC CAG GAT AAC CCT TCT TAC ATT GGC 1716 Leu Ile Lys Tyr Gln Leu Asp Asn Gln Asp Asn Pro Ser Tyr Ile Gly 925 930 935 940

CTT TCT TTT CAA GAT TTT TAC CAG CAG TGT CGG GAG GCA TTC CTC GTC 1764 Leu Ser Phe Gln Asp Phe Tyr Gln Gln Cys Arg Glu Ala Phe Leu Val 945 950 955

AAT AGT GAT CTG ACA CTC CGG GCC CAG TTA ACT GAA TTT AGG GAC CAC 1812 Asn Ser Asp Leu Thr Leu Arg Ala Gln Leu Thr Glu Phe Arg Asp His 960 965 970

AAG CTT ATA AGA ACA AAG AAG GGA ACT GAT GGA GTA GAG TAT TTA TTA 1860 Lys Leu Ile Arg Thr Lys Lys Gly Thr Asp Gly Val Glu Tyr Leu Leu 975 980 985

ATT CCT GTT GAT AAT GGA ACA TTG ACT GAT TTC TTG GAA AAG GAA GAA 1908 Ile Pro Val Asp Asn Gly Thr Leu Thr Asp Phe Leu Glu Lys Glu Glu 990 995 1000

GAG GAG GCT TGAAGCTTTC CTTTATTCTT GAATCTCCCA TGGAAGGGTT 1957 Glu Glu Ala 1005

GTACCCCAGC TGCCACTCCT CTAGTTGAAA GTGTTGTGTT TACATCTGAC ATTAAATTAT 2017

TTTTCCAGCA TACAAGATTT AAATTTGGGA AGGGGGGGAT GTCCTCAATT AGAACTTTTT 2077

GATCAGCCTG GCTGGTACCG TCTAGTACTA TGCAGCGGTC CTCAAGTTGG AGAAAATGTG 2137

CCTTTCATTC ATTACCTCTC TGGAGACTTC TTGCTGGAAT GAACAGTGTG CTCAGGGACT 2197

ATTTGGAACT GGATGTTTTT GAATTATTTT ATACTTAGAG ATATTCTGAA TTTTTTGAGG 2257

GCCTTTTAAC ACTCCCCGAG CTGATTGTTT GCAAGTGTGT TTGTTCCAGA GTGTGGAAGT 2317

ATAAAGACAT GGGCATCACG TAAATTGGTT TTGTTTGCTA TTCTGTGTGT CAGAACCAAC 2377

GAGTGTAATG GAGAGGGCAG GTCATCTCTT ATTGTTTCTA AAACAACTTA AAAGGTGTAG 2437

ATTGGGAAGA GGTGAGTGAT CCAGCTTTCT CCTTTTGGAT TGAGGCTATG TACTTGGTGG 2497

GGGCAGGGGA GGGAATATAT TATAATACTA TTCAGTTGGG ATAATGGGAA AAACAGAGTA 2557

TATAGGGTAT CTACCCAGCC TAGAAAGCAC AGGAACAATA CGTCATATAT TTGGAACAGT 2617

TATTGTCTGT GCCATGACCT TCATGATACC AGTGAGAAGC CAGGCTAGAG AAATAAAATC 2677

CTGAATTACA TTTTAGTAAT TGTTTTCAAG ACAACAAAAA ATAAAACATT TC 2729

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 577 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Met Ser Lys Pro Glu Leu Lys Glu Asp Lys Met Leu Glu Val His Phe 1 5 10 15

Val Gly Asp Asp Asp Val Leu Asn His Ile Leu Asp Arg Glu Gly Gly 20 25 30

Ala Lys Leu Lys Lys Glu Arg Ala His Val Leu Val Asn Pro Lys Lys 35 40 45

Ile Ile Lys Lys Pro Glu Tyr Asp Leu Glu Glu Asp Asp Gln Glu Val 50 55 60

Leu Lys Asp Gln Asn Tyr Val Glu Ile Met Gly Arg Asp Val Gln Glu 65 70 75 80

Ser Leu Lys Asn Gly Ser Ala Thr Gly Gly Gly Asn Lys Val Tyr Ser 85 90 95

Phe Gln Asn Arg Lys His Ser Glu Lys Met Ala Lys Leu Ala Ser Glu 100 105 110

Leu Ala Lys Thr Pro Gln Lys Ser Val Ser Phe Ser Leu Lys Asn Asp 115 120 125

Pro Glu Ile Thr Ile Asn Val Pro Gln Ser Ser Lys Gly His Ser Ala 130 135 140

Ser Asp Lys Val Gln Pro Lys Asn Asn Asp Lys Ser Glu Phe Leu Ser 145 150 155 160

Thr Ala Pro Arg Ser Leu Arg Lys Arg Leu Ile Val Pro Arg Ser His 165 170 175

Ser Asp Ser Glu Ser Glu Tyr Ser Ala Ser Asn Ser Glu Asp Asp Glu 180 185 190

Gly Val Ala Gln Glu His Glu Glu Asp Thr Asn Ala Val Ile Phe Ser 195 200 205

Gln Lys Ile Gln Ala Gln Asn Arg Val Val Ser Ala Pro Val Gly Lys 210 215 220

Glu Thr Pro Ser Lys Arg Met Lys Arg Asp Lys Thr Ser Asp Leu Val 225 230 235 240

Glu Glu Tyr Phe Glu Ala His Ser Ser Ser Lys Val Leu Thr Ser Asp 245 250 255

Arg Thr Leu Gln Lys Leu Lys Arg Ala Lys Leu Asp Gln Gln Thr Leu 260 265 270

Arg Asn Leu Leu Ser Lys Val Ser Pro Ser Phe Ser Ala Glu Leu Lys 275 280 285

Gln Leu Asn Gln Gln Tyr Glu Lys Leu Phe His Lys Trp Met Leu Gln 290 295 300

Leu His Leu Gly Phe Asn Ile Val Leu Tyr Gly Leu Gly Ser Lys Arg 305 310 315 320

Asp Leu Leu Glu Arg Phe Arg Thr Thr Met Leu Gln Asp Ser Ile His 325 330 335

Val Val Ile Asn Gly Phe Phe Pro Gly Ile Ser Val Lys Ser Val Leu 340 345 350

Asn Ser Ile Thr Glu Glu Val Leu Asp His Met Gly Thr Phe Arg Ser 355 360 365

Ile Leu Asp Gln Leu Asp Trp Ile Val Asn Lys Phe Lys Glu Asp Ser 370 375 380

Ser Leu Glu Leu Phe Leu Leu Ile His Asn Leu Asp Ser Gln Met Leu 385 390 395 400

Arg Gly Glu Lys Ser Gln Gln Ile Ile Gly Gln Leu Ser Ser Leu His 405 410 415

Asn Ile Tyr Leu Ile Ala Ser Ile Asp His Leu Asn Ala Pro Leu Met 420 425 430

Trp Asp His Ala Lys Gln Ser Leu Phe Asn Trp Leu Trp Tyr Glu Thr 435 440 445

Thr Thr Tyr Ser Pro Tyr Thr Glu Glu Thr Ser Tyr Glu Asn Ser Leu 450 455 460

Leu Val Lys Gln Ser Gly Ser Leu Pro Leu Ser Ser Leu Thr His Val 465 470 475 480

Leu Arg Ser Leu Thr Pro Asn Ala Arg Gly Ile Phe Arg Leu Leu Ile 485 490 495

Lys Tyr Gln Leu Asp Asn Gln Asp Asn Pro Ser Tyr Ile Gly Leu Ser 500 505 510

Phe Gln Asp Phe Tyr Gln Gln Cys Arg Glu Ala Phe Leu Val Asn Ser 515 520 525

Asp Leu Thr Leu Arg Ala Gln Leu Thr Glu Phe Arg Asp His Lys Leu 530 535 540

Ile Arg Thr Lys Lys Gly Thr Asp Gly Val Glu Tyr Leu Leu Ile Pro 545 550 555 560

Val Asp Asn Gly Thr Leu Thr Asp Phe Leu Glu Lys Glu Glu Glu Glu 565 570 575

Ala
