

Title:
ENDONUCLEASE FOR GENOME EDITING
Document Type and Number:
WIPO Patent Application WO/2014/121222
Kind Code:
A1
Abstract:
A chimeric endonuclease is provided comprising the GIY-YIG nuclease domain which is linked to a DNA-targeting domain by a linking domain. The endonuclease is useful in gene editing.

Inventors:
EDGELL DAVID R (CA)
KLEINSTIVER BENJAMIN P (CA)
WANG LI (US)
BOGDANOVE ADAM J (US)
Application Number:
PCT/US2014/014491
Publication Date:
August 07, 2014
Filing Date:
February 03, 2014
Assignee:
UNIV WESTERN ONTARIO (CA)
EDGELL DAVID R (CA)
KLEINSTIVER BENJAMIN P (CA)
WANG LI (US)
BOGDANOVE ADAM J (US)
International Classes:
C12N9/10; C12N15/54; C12N15/63
Domestic Patent References:
WO2011064751A1	2011-06-03
Foreign References:
US20110158957A1	2011-06-30
US20130210151A1	2013-08-15
Other References:
KLEINSTIVER, BENJAMIN P. ET AL.: "Monomeric site-specific nucleases for genome editing", P. N. A. S., vol. 109, no. 21, 22 May 2012 (2012-05-22), pages 8061 - 8066
Attorney, Agent or Firm:
CARROLL, Lawrence J. (P.O. Box 7037, 1201 West Peachtree Street, Atlanta, Georgia, US)
Claims:
CLAIMS

We Claim:

1. A chimeric endonuclease comprising a nuclease domain and a DNA-targeting domain, wherein the chimeric endonuclease is capable of cleaving double-stranded DNA as a monomer.

2. A chimeric endonuclease according to claim 1, wherein the nuclease domain comprises all or a portion of I-TevI.

3. A chimeric endonuclease according to claim 1, further comprising a linking domain.

4. A chimeric endonuclease according to any one of claims 1-3, wherein the DNA-targeting domain comprises all or a portion of a TAL domain.

5. A chimeric endonuclease comprising all or a portion of I-TevI nuclease domain and all or a portion of a TAL DNA-targeting domain.

6. A chimeric endonuclease according to claim 5, further comprising a linker between the nuclease domain and the TAL domain.

7. A chimeric endonuclease according to claim 5 or 6, wherein the I-TevI nuclease is N-terminal to the TAL domain.

8. A chimeric nuclease according to any one of claims 5-7, wherein the I-TevI nuclease domain begins at the N-terminal of I-TevI and comprises S206, N201, D184, N169, N140, D127, or S114.

9. A chimeric endonuclease according to any one of claims 5-8, wherein the TAL domain begins at T221, T120, V152, G187, I214, P218, D1, E181, V184, A191, A195, T209, or Q211.

10. A nucleic acid molecule encoding a chimeric endonuclease according to any one of claims 1-9.

11. A method of inactivating a gene, comprising:

introducing a nucleic acid molecule encoding a chimeric endonuclease according to any one of claims 1-9 into a cell comprising the gene under conditions causing the expression of the chimeric endonuclease, wherein the chimeric endonuclease binds DNA and cleaves the gene.

12. A method according to claim 11, wherein the expression of the chimeric endonuclease is transient.

13. A method according to claim 11, wherein the cell is a plant cell.

14. A method according to claim 11, wherein the nucleic acid molecule is an mRNA.

15. A method of altering a gene in a cell, comprising:

introducing a first nucleic acid molecule encoding a chimeric endonuclease according to any one of claims 1-9 into a cell comprising the gene under conditions causing the expression of the chimeric endonuclease and cleavage of the gene;

introducing a second nucleic acid molecule into the cell wherein the second nucleic acid molecule comprises a region having a nucleotide sequence that has a high degree of sequence identity to all or a portion of the gene in the region of the cleavage site under conditions causing homologous recombination to occur between the second nucleic acid molecule and the gene.

16. A method according to claim 15, wherein the region comprises 500 basepairs that are homologous to the gene.

17. A method according to claim 16, wherein the region comprises an altered sequence when compared to the gene of interest.

18. A method according to claim 17, wherein the region comprises one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene.

19. A method according to claim 15, wherein the chimeric endonuclease is transiently expressed in the cell.

20. A method according to claim 19, wherein the first nucleic acid molecule is mRNA.

21. A method according to claim 15, wherein the second nucleic acid molecule is a linear DNA molecule.

22. A method according to claim 1 , wherein the cell is a plant cell.

23. A method for deleting all or a portion of a gene, comprising:

introducing a first nucleic acid molecule encoding a chimeric endonuclease according to any one of claims 1-9 into a cell comprising the gene under conditions causing expression of the chimeric endonuclease and cleavage of the gene;

introducing into the cell a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site under conditions causing homologous recombination to occur between the second nucleic acid molecule and the gene, wherein the nucleotide sequence lacks the sequence of the gene adjacent to the cleavage site.

24. A method according to claim 23, wherein the region comprises 500 basepairs that are homologous to the gene.

25. A method according to claim 24, wherein the region comprises an altered sequence when compared to the gene of interest.

26. A method according to claim 25, wherein the region comprises one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene.

27. A method according to claim 23, wherein the chimeric endonuclease is transiently expressed in the cell.

28. A method according to claim 23, wherein the first nucleic acid molecule is mRNA.

29. A method according to claim 23, wherein the second nucleic acid molecule is a linear DNA molecule.

30. A method according to claim 23, wherein the cell is a plant cell.

31. A method for making a cell having an altered genome, comprising:

introducing into the cell a first nucleic acid molecule encoding a chimeric endonuclease according to any one of claims 1-9 under conditions causing expression of the chimeric endonuclease and cleavage of the gene.

32. A method according to claim 31, wherein the altered genome comprises an inactivated gene.

33. A method according to claim 31, comprising: introducing into the cell a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site under conditions causing homologous recombination between the gene and the second nucleic acid, wherein the homologous region comprises an altered sequence when compared to the gene.

34. A method according to claim 33, wherein the region comprises 500 basepairs that are homologous to the gene.

35. A method according to claim 34, wherein the region comprises one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene.

36. A method according to claim 33, wherein the nucleotide sequence of the region lacks the sequence of the gene adjacent to the cleavage site.

37. A method according to claim 33, wherein the chimeric endonuclease is transiently expressed in the cell.

38. A method according to claim 33, wherein the first nucleic acid molecule is mRNA.

39. A method according to claim 34, wherein the second nucleic acid molecule is a linear DNA molecule.

40. A method according to claim 33, wherein the cell is a plant cell.

41. A nucleic acid substrate for the endonuclease as defined in any one of claims 1-9, said substrate comprising a cleavage motif of the nuclease domain, a spacer that correlates with the linking domain and a binding site for the DNA-targeting domain.

42. A cell incorporating the substrate as defined in claim 41.

43. A kit comprising the nucleic acid molecule of claim 10 and the substrate of claim

Description:
ENDONUCLEASE FOR GENOME EDITING

Statement of Government Rights

[0001] This invention was made with government support under 0820831 awarded by the National Science Foundation. The government has certain rights in the invention.

Field of the Invention

[0002] The present application relates generally to endonucleases useful for gene editing.

Background of the Invention

[0003] Precise genome editing is enhanced by the introduction of a double-strand break (DSB) at defined positions, and two distinct site-specific DNA endonuclease architectures have been developed towards this goal. One of these architectures relies on reprogramming the DNA-binding specificity of naturally occurring LAGLIDADG (SEQ ID NO:1) homing endonucleases (LHEs) to target desired sequences. The other architecture utilizes the reprogrammable DNA-binding specificity of zinc-finger proteins or the DNA-binding domains of transcription activator-like effectors (TAL-effectors) that are fused to the non-specific nuclease domain of the type IIS restriction enzyme FokI to create chimeric zinc-finger nucleases (ZFNs) or TAL-effector nucleases (TALENs). Regardless of the architecture, the underlying biology of the component proteins imposes design challenges, and the relative merits of the LHE and the ZFN/TALEN architectures are the subject of much debate in the literature. One notable constraint imposed by the FokI nuclease domain is the requirement to function as a dimer to efficiently cleave DNA. For any given DNA target, this necessitates the design of two distinct ZFNs (or two TALENs), such that each zinc finger or TAL-effector domain is oriented to promote FokI dimerization and DNA cleavage. Off-target DSBs have been observed with ZFNs, likely promoted by binding at degenerate sites and by DNA-bound ZFNs recruiting ZFNs in solution to promote DNA hydrolysis. Many engineering strategies have been employed with varying degrees of success to reduce off-target effects, including creating sets of complementary heterodimeric nuclease domains, addition of zinc-finger modules, optimization of the FokI-zinc finger linker, and in vitro and in vivo selections to increase zinc-finger binding specificity.

[0004] Expanding the repertoire of DNA nuclease domains with distinctive properties is necessary to facilitate the development of new genome editing reagents. Indeed, a number of recent studies have explored the potential of alternative dimeric sequence-specific nuclease domains for genome editing applications. These dimeric nuclease domains, however, still require the design of two nuclease fusions for precise targeting. The GIY-YIG nuclease domain is associated with a variety of proteins with diverse cellular functions. The small (~100 aa) globular GIY-YIG domain is characterized by a structurally conserved central three-stranded antiparallel β sheet, with catalytic residues positioned to utilize a single metal ion to promote DNA hydrolysis. Intriguingly, the GIY-YIG homing endonucleases, typified by the isoschizomers I-TevI (a double-strand DNA endonuclease encoded by the mobile td intron of phage T4), I-BmoI and I-TulaI, bind DNA as monomers. It is unknown, however, if GIY-YIG homing endonucleases function as monomers in all steps of the reaction, as it is possible that dimerization between GIY-YIG nuclease domains is necessary for efficient DNA hydrolysis, as is the case with FokI. Notably, GIY-YIG homing endonucleases require a specific DNA sequence to generate a DSB. For I-TevI, the bottom (↑) and top (↓) strand nicking sites lie within a 5'-CN↑NN↓G-3' motif (referred to as CNNNG or CXXXG), with the critical G optimally positioned ~28 bp from where the H-T-H module of the I-TevI DNA-binding domain interacts with substrate.
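
As a rough illustration of the CNNNG motif described in paragraph [0004], the following Python sketch lists every position at which a 5'-CNNNG-3' motif occurs on one strand of a sequence and reports how far each critical G lies from a caller-chosen reference position. The function names, the `reference_index` argument, and the example input are assumptions made for illustration only and are not part of the application; the sketch does not model binding, linker length, or cleavage efficiency.

```python
import re

# Zero-width lookahead so that overlapping CNNNG occurrences are all reported.
_CNNNG = re.compile(r"(?=C[ACGT]{3}G)")


def find_cnnng(seq):
    """Return 0-based start indices of each 5'-CNNNG-3' motif in seq (one strand only)."""
    return [m.start() for m in _CNNNG.finditer(seq.upper())]


def critical_g_offsets(seq, reference_index):
    """Pair each motif's G index with its distance (in bp) to reference_index.

    reference_index is a hypothetical stand-in for the position at which the
    DNA-binding module contacts the substrate; the caller must supply it.
    """
    return [(start + 4, reference_index - (start + 4)) for start in find_cnnng(seq)]


if __name__ == "__main__":
    example = "CAACGCTCAGTAGATGTTTT"  # illustrative input only
    print(find_cnnng(example))
    print(critical_g_offsets(example, reference_index=32))
```

A caller could filter the returned pairs for offsets near 28 bp to shortlist candidate sites, although any such cutoff would be a design choice rather than something stated in the text.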

[0005] It would be desirable to develop novel endonucleases for use in genome editing that overcome one or more disadvantages of existing endonucleases.

Summary of the Invention

[0006] The present invention provides chimeric endonucleases and methods of making and using such chimeric endonucleases. In one embodiment, the present invention provides a chimeric endonuclease comprising at least a nuclease domain and a DNA-targeting domain. Typically, the nuclease domain has the ability to cleave double-stranded DNA, typically at a specific DNA sequence. In some embodiments, the nuclease is capable of cleaving double-stranded DNA as a monomer. The nuclease domain may be derived from a homing endonuclease. Suitable examples of homing endonucleases include, but are not limited to, homing endonucleases of the LAGLIDADG, HNH, His-Cys box, and GIY-YIG families. In one embodiment of the invention, a chimeric endonuclease of the invention comprises a nuclease domain derived from a homing endonuclease of the GIY-YIG family. Suitable examples of homing endonucleases of the GIY-YIG family include, but are not limited to, I-TevI and I-BmoI. In some embodiments, a chimeric endonuclease of the invention comprises the nuclease domain of I-TevI. Chimeric endonucleases of the invention may be provided as part of a composition, for example, a pharmaceutical composition. The present invention also provides cells, cell lines and transgenic organisms (e.g., plants, fungi, animals) comprising one or more chimeric endonucleases of the invention. Suitable cells include, but are not limited to, mammalian cells (e.g., mouse cells, human cells, rat cells, etc.) which may be stem cells, avian cells, plant cells, bacterial cells, fungal cells (e.g., yeast cells), and any other type of cell known to those skilled in the art.

[0007] Any specific DNA-binding domain known to those skilled in the art may be used as a DNA-targeting domain in the practice of the present invention. Examples include, but are not limited to, the DNA-binding domains of TAL-effector proteins (which will be referred to herein as TAL domains), such as PthXo1 and AvrBs3 (from Xanthomonas campestris); zinc finger domains, e.g., the ryA zinc finger binding domain and the ryB zinc finger binding domain; and other distinct DNA-binding domains, such as the binding domain in LAGLIDADG homing endonucleases, for example I-OnuI. In some embodiments, the entire LAGLIDADG homing endonuclease, not just the binding domain, may be used as a DNA-targeting domain in the practice of the present invention. In some embodiments, the nuclease activity of the LAGLIDADG endonuclease may be disrupted, for example, with a point mutation, such that it acts as a DNA-binding platform only.

[0008] In some embodiments, a chimeric endonuclease of the invention may comprise one or more additional domains. Examples of additional domains include, but are not limited to, linking domains and functional domains. Typically, linking domains may be disposed between two functional domains, for example, between a nuclease domain and a DNA-targeting domain. Other functional domains include domains comprising nuclear localization signals, transcription activating domains, dimerization domains, and other functional domains known to those skilled in the art.

[0009] The present invention also provides nucleic acid molecules encoding the chimeric endonucleases of the invention. Such molecules may be DNA or RNA. Typically, DNA molecules will comprise one or more promoter regions operably linked to a nucleic acid sequence encoding all or a portion of a chimeric endonuclease of the invention. Nucleic acid molecules of the invention may be provided as part of a larger nucleic acid molecule, for example, an expression vector. Suitable expression vectors include, but are not limited to, plasmid vectors, viral vectors, and retroviral vectors. Nucleic acid molecules of the invention may be provided as part of a composition, for example, a pharmaceutical composition. The present invention also provides cells, cell lines and transgenic organisms (e.g., plants, fungi, animals) comprising one or more nucleic acid molecules of the invention. Suitable cells include, but are not limited to, mammalian cells (e.g., mouse cells, human cells, rat cells, etc.) which may be stem cells, avian cells, plant cells, insect cells, bacterial cells, fungal cells (e.g., yeast cells), and any other type of cell known to those skilled in the art.

[0010] In a further embodiment of the invention, a method of cleaving a target nucleic acid is provided comprising the step of exposing target nucleic acid to a chimeric endonuclease as defined above, wherein the DNA-targeting domain of the endonuclease binds to the target nucleic acid and the nuclease domain cleaves the target nucleic acid. In some embodiments, the target nucleic acid may be a gene of interest in a cell. Thus, methods of the invention may be used in genomic editing applications. Typically a method of this type will comprise introducing, into the cell, one or more chimeric endonucleases of the invention that bind to a target nucleic acid sequence in the gene (or nucleic acid molecules encoding such chimeric endonucleases under conditions resulting in expression of the chimeric endonucleases), wherein the DNA-targeting domain of the endonuclease binds to the target nucleic acid sequence and the nuclease domain cleaves the target nucleic acid. In some embodiments, cleavage of the gene results in disrupting the function of the gene, as repair of the double-stranded break introduced by the chimeric endonuclease of the invention may result in one or more insertions and/or deletions of nucleotides at the site of the break.

[0011] In another embodiment, the present invention provides a method for introducing an exogenous nucleotide sequence into the genome of a cell. Such methods typically comprise introducing, into the cell, one or more chimeric endonucleases of the invention (or nucleic acid molecules encoding such chimeric endonucleases under conditions resulting in expression of the chimeric endonucleases), wherein the DNA-targeting domain of the endonuclease binds to the target nucleic acid and the nuclease domain cleaves the target nucleic acid, and contacting the cell with an exogenous polynucleotide under conditions such that the exogenous polynucleotide is integrated into the genome by homologous recombination. In some embodiments, the exogenous polynucleotide may comprise a nucleic acid sequence that is capable of interacting with a protein. Suitable examples of such sequences include, but are not limited to, recognition sites (e.g., endonuclease recognition sites, recombinase recognition sites), promoter sequences, and protein binding sites.

[0012] In some embodiments, the present invention provides a chimeric endonuclease. Such a chimeric endonuclease typically comprises a nuclease domain and a DNA-targeting domain. In some embodiments, the chimeric endonuclease is capable of cleaving double-stranded DNA as a monomer. In some embodiments, the nuclease domain is a site-specific nuclease domain, which may be from a homing endonuclease. A suitable example of a homing endonuclease is a GIY-YIG homing endonuclease, for example I-TevI. A chimeric endonuclease of the invention may further comprise a linking domain. In some embodiments, the DNA-targeting domain is a TAL domain. In one embodiment, the chimeric endonuclease comprises an I-TevI nuclease domain and a TAL DNA-targeting domain. In some embodiments, the I-TevI nuclease is N-terminal to the TAL domain. The present invention also provides nucleic acid molecules encoding chimeric endonucleases as described above.

[0013] In some embodiments, the present invention provides a method of inactivating a gene. Such methods typically comprise introducing into a cell comprising the gene a nucleic acid molecule encoding a chimeric endonuclease as described above under conditions causing the expression of the chimeric endonuclease. Typically the chimeric endonuclease comprises a DNA-targeting domain that binds the gene and cleaves it. In some embodiments, the expression of the chimeric endonuclease is transient. In some embodiments, the cell is a plant cell. In some embodiments, the nucleic acid molecule is an mRNA.

[0014] In some embodiments, the present invention provides a method of altering a gene in a cell. Such methods typically comprise introducing a first nucleic acid molecule encoding a chimeric endonuclease as described above into a cell comprising the gene under conditions causing the expression of the chimeric endonuclease and cleavage of the gene. Such methods may further comprise introducing a second nucleic acid molecule into the cell. Typically, the second nucleic acid molecule comprises a region having a nucleotide sequence that has a high degree of sequence identity to all or a portion of the gene in the region of the cleavage site. The second nucleic acid molecule is introduced under conditions causing homologous recombination to occur between the second nucleic acid molecule and the gene. In some embodiments, the region of high sequence identity comprises a sequence that is highly identical to all or a portion of the sequence of the gene. In some embodiments, the region of high sequence identity of the second nucleic acid molecule is not 100% identical to the corresponding region of the gene. Instead the region comprises an altered sequence when compared to the gene of interest. Typically, the region may comprise one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene. In some embodiments, the chimeric endonuclease is transiently expressed in the cell. In some embodiments, the first nucleic acid molecule is mRNA. In some embodiments, the second nucleic acid molecule is a linear DNA molecule. In some embodiments, the cell is a plant cell.
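
Paragraph [0014] refers to a region having "a high degree of sequence identity" to the gene that may nonetheless be "not 100% identical" to it. The application gives no numerical threshold and no alignment method, so the following Python sketch only shows one simple way such a comparison might be made, under the assumption that the two sequences are already aligned and of equal length. The function names and example sequences are hypothetical.

```python
def percent_identity(region, gene_segment):
    """Percent of positions that match between two pre-aligned, equal-length sequences.

    Assumes no gaps; a real comparison would normally start from an alignment.
    """
    if len(region) != len(gene_segment):
        raise ValueError("sequences must be the same length (pre-aligned, no gaps)")
    if not region:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(region.upper(), gene_segment.upper()))
    return 100.0 * matches / len(region)


def mismatch_positions(region, gene_segment):
    """0-based positions at which the region differs from the gene segment."""
    if len(region) != len(gene_segment):
        raise ValueError("sequences must be the same length (pre-aligned, no gaps)")
    pairs = zip(region.upper(), gene_segment.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    gene = "ACGTACGTAC"    # hypothetical gene segment
    donor = "ACGAACGTAC"   # hypothetical region carrying one altered base
    print(percent_identity(donor, gene), mismatch_positions(donor, gene))
```

The mismatch positions correspond to the "altered sequence" of the region; whether a given change alters an encoded amino acid would depend on the reading frame, which this sketch does not consider.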

[0015] The present invention provides a method for deleting all or a portion of a gene in a cell. Such methods typically comprise introducing a first nucleic acid molecule encoding a chimeric endonuclease as described above into a cell comprising the gene under conditions causing expression of the chimeric endonuclease and cleavage of the gene. A second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site is introduced into the cell under conditions causing homologous recombination to occur between the second nucleic acid molecule and the gene. Typically, the region of high sequence identity lacks the sequence of the gene adjacent to the cleavage site. In some embodiments, the region of high sequence identity comprises a sequence that is highly identical to all or a portion of the sequence of the gene. In some embodiments, the region of high sequence identity of the second nucleic acid molecule is not 100% identical to the corresponding region of the gene. Instead the region comprises an altered sequence when compared to the gene of interest. In some embodiments, the region comprises one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene. In some embodiments, the chimeric endonuclease is transiently expressed in the cell. In some embodiments, the first nucleic acid molecule is mRNA. In some embodiments, the second nucleic acid molecule is a linear DNA molecule. In some embodiments, the cell is a plant cell.

[0016] The present invention provides a method for making a cell having an altered genome. Such methods typically comprise introducing into the cell a first nucleic acid molecule encoding a chimeric endonuclease as described above under conditions causing expression of the chimeric endonuclease and cleavage of the gene. In some embodiments, the altered genome comprises an inactivated gene. Methods of making a cell having an altered genome may also comprise introducing into the cell a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site. The second nucleic acid molecule is introduced into the cell under conditions causing homologous recombination between the gene and the second nucleic acid, wherein the region of high sequence identity comprises an altered sequence when compared to the gene. In some embodiments, the region of high sequence identity comprises a sequence that is highly identical to all or a portion of the sequence of the gene. In some embodiments, the region comprises one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene. In some embodiments, the nucleotide sequence of the region lacks the sequence of the gene adjacent to the cleavage site. In some embodiments, the chimeric endonuclease is transiently expressed in the cell. In some embodiments, the first nucleic acid molecule is mRNA. In some embodiments, the second nucleic acid molecule is a linear DNA molecule. In some embodiments, the cell is a plant cell.

[0017] The present invention provides a nucleic acid substrate for the chimeric endonuclease as described above. Such a substrate will typically comprise a cleavage motif of the nuclease domain, a spacer that correlates with the linking domain and a binding site for the DNA-targeting domain. The present invention also provides cells, for example plant cells, incorporating the substrate.
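
The three-part substrate of paragraph [0017] (cleavage motif, spacer, binding site) can be pictured as a simple record. The Python sketch below is a minimal illustration, assuming the parts are written 5' to 3' on one strand in that order; the class name, the validation rule, and the 15-nt placeholder spacer are hypothetical and are not taken from the application. The binding site used in the example is the Zif268 sequence listed in this application as SEQ ID NO:3, and CAACG is simply one instance of the CNNNG motif.

```python
from dataclasses import dataclass

_ALLOWED = set("ACGTN")


@dataclass(frozen=True)
class Substrate:
    """Three-part substrate written 5' to 3' on one strand (assumed ordering)."""
    cleavage_motif: str
    spacer: str
    binding_site: str

    def __post_init__(self):
        # Reject empty parts and any character outside A, C, G, T, N.
        for name in ("cleavage_motif", "spacer", "binding_site"):
            part = getattr(self, name)
            if not part or set(part.upper()) - _ALLOWED:
                raise ValueError(f"{name} must be a non-empty string over A, C, G, T, N")

    def sequence(self):
        """Concatenate motif, spacer and binding site into one upper-case string."""
        return (self.cleavage_motif + self.spacer + self.binding_site).upper()

    def binding_site_start(self):
        """0-based index at which the binding site begins in sequence()."""
        return len(self.cleavage_motif) + len(self.spacer)


if __name__ == "__main__":
    # Spacer of 15 unspecified bases is a placeholder, not a recommended length.
    s = Substrate(cleavage_motif="CAACG", spacer="N" * 15, binding_site="GCGTGGGCG")
    print(s.sequence(), s.binding_site_start())
```

Keeping the spacer as its own field makes it easy to vary its length or composition independently of the motif and the binding site when comparing candidate substrates.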

[0018] The present invention provides kits comprising nucleic acid molecules encoding the chimeric endonucleases described above and a substrate for the chimeric endonuclease. In another embodiment, the invention provides kits comprising the chimeric endonucleases of the invention. Kits of the invention can be used for genomic editing using the methods described above.

[0019] These and other aspects of the invention will become apparent from the detailed description by reference to the following figures.

Brief Description of the Figures

[0020] Figure 1 illustrates that I-BmoI functions as a monomer. Figure 1A provides graphs of progress curves of initial reaction velocity for eight I-BmoI concentrations with a fixed amount (10 nM) of pBmolHS target site plasmid (left) and a plot of initial velocity versus I-BmoI protein concentration (right). Figure 1B provides graphs showing results of time course assays showing cleavage of 1- or 2-site target plasmids by I-BmoI;

[0021] Figure 2 schematically illustrates the design and functionality of chimeric GIY-YIG endonucleases of the invention. Figure 2a provides a schematic modeling of a Tev-zinc finger fusion with DNA substrate using structures of the I-TevI catalytic domain (PDB 1MK0), the I-TevI DNA-binding domain co-crystal (PDB 1I3J), and the Zif268 co-crystal (PDB 1AAY). Figure 2b (upper) provides a schematic of a chimeric I-TevI endonuclease-ryA construct showing the fusion point as the last I-TevI amino acid, with an optional 2xGlycine or 4xGlycine linker and 6xHis tag at the C-terminal end, and (lower) a Tev-ryA substrate including 33 nts of the top strand of the I-TevI td homing site substrate (TZ1.33), fused to the 5' end of the ryA-binding site. The substrate is numbered from the first base of the td homing site sequence (note that this numbering scheme is the reverse of that used for the native td homing site). The different substrates tested differ by one or two T residues inserted at the junction of the td/ryA sites. Figure 2c (upper) provides a schematic of a chimeric I-BmoI endonuclease-ryA construct showing the fusion point as the last I-BmoI amino acid, with an optional 2xGlycine or 4xGlycine linker and 6xHis tag at the C-terminal end, and (lower) an I-BmoI-ryA substrate including 33 nts of the top strand of the I-BmoI homing site substrate (BZ1.33), fused to the 5' end of the ryA-binding site. Figure 2d provides a schematic representation of the two plasmids used in the genetic selection system, where the fusion protein is expressed from pExp and the hybrid target sites are cloned onto the pTox plasmid harboring the ccdB gyrase toxin;

[0022] Figure 3 shows chimeric GIY-YIG endonuclease target specificity. Figure 3a is an SDS-PAGE gel that shows purification of TevN201-zinc finger endonuclease (ZFE). Figure 3b is an SDS-PAGE gel that shows purification of a BmoN221-ZFE. Lanes are marked as follows: M, marker with molecular weights in kDa indicated on the left; UN, uninduced culture; IND, induced culture; C, crude lysate; FT, flow-through from metal-affinity column; W, wash; E, elution. Figure 3c is a sequencing gel that shows mapping of TevN201-ZFE cleavage sites on the TZ1.33 substrate, with top and bottom cleavage sites indicated below on the Tev-ryA substrate by open and closed triangles, respectively. Figure 3d is a sequencing gel that shows mapping of BmoN221-ZFE cleavage sites on the BZ1.33 substrate, with top and bottom cleavage sites indicated below on the Bmo-ryA substrate. Figure 3e (left) shows the sequences of the wild-type TZ1.33, the TZ1.33 G5A, and the TZ1.33 C1A/G5A mutant substrates and (right) is a bar graph that shows the EC0.5max determinations for each substrate, with EC0.5max values in nM with standard deviations from three experimental trials;

[0023] Figure 4A provides the amino acid sequences of chimeric GIY-YIG I-TevI endonucleases of the invention. Figure 4B provides the amino acid sequences of chimeric I-BmoI endonucleases of the invention;

[0024] Figure 5 illustrates that TevN201-ZFE functions as a monomer. Figure 5a (left) is a graph of initial reaction progress for seven TevN201-ZFE concentrations expressed as percent linear product. Protein concentrations from highest to lowest are 47 nM, 32.5 nM, 23 nM, 11 nM, 6 nM, 3 nM, and 0.7 nM. Figure 5a (right) is a graph of initial reaction velocity (nM s⁻¹) versus TevN201-ZFE concentration (nM). Figure 5b provides graphs of the results of cleavage assays with 90 nM TevN201-ZFE and 10 nM one-site pTZ1.31 plasmid (left), or two-site pTZ1.31 plasmids with the same orientation of sites (center) and two-site pTZ1.31 plasmids with the opposite orientation of sites (right);

[0025] Figure 6 provides a schematic comparison of GIY-YIG ZFEs and ZFNs. (Upper) The GIY-YIG nuclease fusion is to the ryA zinc finger, and (lower) the two ZFNs are fusions of the FokI nuclease domain to ryA and ryB zinc fingers. The central portion of the GIY-YIG ZFE substrate is shown as random sequence (N);

[0026] Figure 7 shows various GIY-YIG TAL domain chimeric endonuclease constructs of the invention. Figure 7A (upper) is a schematic of the chimeric endonuclease I-TevI/PthXo1 fusion proteins including amino acid sequences of I-TevI/PthXo1 fusion proteins, and (lower) shows the sequences of various hybrid I-TevI/PthXo1 substrates. Figure 7B provides the amino acid sequence of various I-TevI/PthXo1 chimeric endonucleases of the invention. Figure 7C provides the sequences of various I-TevI/PthXo1 hybrid target sites. Figure 7D shows the amino acid sequences of various I-BmoI/PthXo1 chimeric endonucleases of the invention. Figure 7E shows the sequences of various I-BmoI/PthXo1 target sites;

[0027] Figure 8 is a photograph of an ethidium bromide gel showing the double-stranded cleavage of various sized substrates;

[0028] Figure 9 (upper) is a schematic of the assay used to individually demonstrate cleavage of top and bottom strands, and (lower) is a gel showing the results of the assay with variously sized substrates;

[0029] Figure 10A is a schematic of an in vitro endonuclease selection protocol. Figure 10B is a graph illustrating the frequency of each nucleotide at various positions in a substrate space as determined by the assay of Figure 10A. A positive value means an increase in nucleotide frequency, while a negative value means a decrease in nucleotide frequency. Note that position 15 can be mutated without effect on activity. Figure 10C is a schematic showing a correlation of the sequence of the DNA spacer binding motif with the I-TevI binding domain. The figure shows a correlation between the preferred DNA bases in the DNA spacer region of the substrate and the conserved DNA bases of the native I-TevI target site in thymidylate synthase genes. Homing endonucleases, such as I-TevI, target genes that encode conserved proteins, which maximizes their opportunity to spread between related genomes. Further, homing endonucleases target DNA sequences that correspond to conserved amino acids of the target gene; using these DNA sequences as recognition determinants again maximizes their potential to spread. The figure uses this correlation to explain why those positions in the DNA spacer are important;

[0030] Figure 11 graphically illustrates the frequency of the I-TevI cleavage motif in human cDNAs;

[0031] Figure 12A provides the sequences of the target substrates isolated from a bacterial two-plasmid genetic selection assay, and Figure 12B is a bar graph showing percent survival based on substrate spacers as determined by the assay;

[0032] Figure 13 graphically illustrates the results of a yeast assay for a TevN169 endonuclease using substrates shown in Fig. 12. Substrate TO20 has the following sequence 5'-CAACGCTCAGTAGATGTTTTGGTCCACATATΊΓTAACCTTTTG-3' (SEQ ID NO:2). Substrate Zif268 has the following sequence 5'-GCGTGGGCG-3' (SEQ ID NO:3);

[0033] Figure 14 graphically illustrates the results of a yeast assay for a TulaK169 endonuclease using substrates shown in Fig. 12(A);

[0034] Figure 15A provides the amino acid sequence of endonuclease I-BmoI. Figure 15B provides the amino acid sequence of endonuclease I-TevI. Figure 15C provides the amino acid sequence of endonuclease I-TulaI. Figure 15D provides an amino acid alignment of the linker regions of I-TulaI, I-TevI, and I-BmoI;

[0035] Figure 16A provides the amino acid sequences of DNA binding proteins PthXo1, AvrBs3, ryA, ryB and I-OnuI. Figure 16B provides the sequences of the binding sites of each;

[0036] Figure 17A provides the amino acid sequences of various I-TevI-zinc finger chimeric endonucleases. Figure 17B provides the amino acid sequences of various I-BmoI-zinc finger chimeric endonucleases;

[0037] Figure 18 provides the amino acid sequences of I-TevI-I-OnuI chimeric endonucleases;

[0038] Figure 19 provides the amino acid sequences of I-TevI-TAL chimeric endonucleases;

[0039] Figure 20 provides the amino acid sequence of an I-TulaI-ONU chimeric endonuclease;

[0040] Figure 21 provides a sequence alignment of two TAL-effector proteins, Avrb6 from Xanthomonas citri subsp. malvacearum (GenBank accession number AAB00675.1) and PthN from Xanthomonas campestris (GenBank accession number AAB69865.1).

[0041] Figure 22A provides a general schematic of the preparation of I-TevI-TAL fusions and Figure 22B provides the nucleic acid sequence of the DNA substrates that were used to test the activity of the fusions.

[0042] Figure 23 shows the results of activity assays of the TEV-TAL fusions against substrates with DNA spacers of various lengths.

[0043] Figure 24 shows the activity of TEV-TAL12 fusion on various DNA substrates derived from phage thymidylate synthase genes tested in a yeast-based assay system.

[0044] Figure 25 shows the activity of the Tev-TAL11 and Tev-TAL12 fusions on different DNA substrates in HEK 293 cells using a GFP assay. Upper panel shows bright field images (left side) and fluorescent images (right side) showing that each construct was active in HEK 293 cells as judged by GFP+ cells in fluorescent images. Bottom panel shows Western blot analyses of whole cell extracts for full-length GFP.

[0045] Figure 26 shows the results of assays optimizing mTALEN architecture in yeast. (A) Boxplots of β-galactosidase activity on substrates with DNA spacers of different lengths, normalized to a homodimeric ZFN control. Experiments were carried out using the constructs depicted in Figure 22. The fusion points of the I-TevI S206 fragment to the PthXo1 N-terminal residue are indicated above each set of plots. The lower and upper limits of the boxes indicate the 25th and 75th percentiles of the data, the solid bar indicates the median of the data, and the ends of the whiskers represent 1.5 times the interquartile range. Data points beyond the whiskers (outliers) are shown as black points. (B) Boxplots showing activity of shorter I-TevI fragments fused to the T120 or V152 residues of PthXo1.

[0046] Figure 27 shows mapping of mTALEN cleavage sites. (A) Schematic of double-strand oligonucleotide substrate labeled on top and bottom strands. The top-strand nicking product is indicated by an open triangle, and the bottom-strand nicking product by a filled triangle. Representative denaturing polyacrylamide gel of cleavage reactions with the S206-T120 mTALEN. Top- and bottom-strand products are represented by open and filled triangles, respectively. (B) In vitro mapping of N169-T120 mTALEN cleavage sites on supercoiled plasmid substrates containing a 15-bp spacer. Run-off sequencing reactions (representative ABI traces shown) allow the determination of cleavage sites, where the complement of the sequence shown in the trace is read (taking into account that an extra "A" is added during the sequencing reactions). (C) In vitro cleavage mapping on « ?/-PthXo1 plasmid substrates that contain four CNNNG motifs. The open and filled red triangles indicate secondary cleavage sites inferred from run-off sequencing. The electropherograms shown are derived from the nptH substrate. (D) In vivo activity of mTALENs on npt and nptACS substrates in the yeast-based assay, where activity of the N169-V152 or D184-V152 mTALENs is normalized to a homodimeric ZFN control and shown relative to the wild-type I-Tev target with 15-bp spacer.

[0047] Figure 28 shows the nucleotide preferences between the C and G bases at the CNNNG cleavage site. (A) Schematic of the substrate used, with the randomized positions indicated and wild-type sequence shown. (B) Effect of single, double, or triple substitutions in the NNN motif on cleavage efficiency relative to the wild-type AAC sequence. Boxplots are as in Figure 26, with outliers shown as dots. (C) Heatmap indicating N169-T120 mTALEN activity on individual NNN sequences, grouped according to the number of changes from the wild-type sequence. Axes are labeled by the first, second, and third nucleotide in the NNN sequence. The color of each motif represents the median value plotted on a log10 scale for N169-T120 mTALEN activity on each sequence. (D) Boxplots showing the effect of mutations on cleavage activity in all different contexts relative to the wild-type motif for each position of the NNN triplet.
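The grouping of NNN triplets described for Figure 28 can be sketched in a few lines. The following is a minimal illustration only, not the analysis used to produce the figure: the function names are hypothetical, and the only values taken from the text are the CNNNG motif and the wild-type AAC triplet.

```python
import re
from itertools import product

WILD_TYPE_NNN = "AAC"  # wild-type triplet named in the Figure 28 description


def find_cnnng(seq):
    """Return 0-based start positions of every CNNNG motif, overlaps included."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() for m in re.finditer(r"(?=C[ACGT]{3}G)", seq.upper())]


def group_triplets_by_changes(wild_type=WILD_TYPE_NNN):
    """Group all 64 NNN triplets by the number of positions differing from wild type."""
    groups = {0: [], 1: [], 2: [], 3: []}
    for bases in product("ACGT", repeat=3):
        triplet = "".join(bases)
        changes = sum(a != b for a, b in zip(triplet, wild_type))
        groups[changes].append(triplet)
    return groups


groups = group_triplets_by_changes()
print({k: len(v) for k, v in groups.items()})  # {0: 1, 1: 9, 2: 27, 3: 27}
print(find_cnnng("TTCAACGTT"))  # [2]
```

The counts follow from simple combinatorics: one wild-type triplet, 9 single, 27 double, and 27 triple substitutions, for 64 triplets in total.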

[0048] Figure 29 shows mTALEN accommodation of nucleotide variation in the DNA spacer region. (A) Boxplot of activity for 45 single nucleotide substitutions in the TP15 DNA spacer, normalized to mTALEN activity on the TP15 wild-type substrate. Plotted are the mean activity values of three biological replicates, with each biological replicate averaged from three technical replicates. The wild-type nucleotide at each position in the spacer is indicated at the top of the plot. (B) On the left are substrates derived from phage-encoded td genes, highlighting differences in the DNA spacer and cleavage motif relative to the wild-type td sequence from phage T4 (lower case red letters). On the right are boxplots showing β-galactosidase activity in the yeast-based assay for the N169-T120 mTALEN against the different td substrates. Boxplots are labeled as in Figure 26.
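The figure of 45 single nucleotide substitutions is consistent with a 15-position spacer in which each position is changed to each of the three alternative bases (15 x 3 = 45). A minimal sketch of that enumeration follows; the spacer string is a hypothetical placeholder, not the TP15 sequence, and the function name is an assumption made for illustration.

```python
def single_substitutions(spacer):
    """List every single-nucleotide variant of a spacer.

    Each entry is (1-based position, wild-type base, new base, variant sequence).
    """
    variants = []
    for i, wild_type_base in enumerate(spacer):
        for alt in "ACGT":
            if alt != wild_type_base:
                variant = spacer[:i] + alt + spacer[i + 1:]
                variants.append((i + 1, wild_type_base, alt, variant))
    return variants


# Hypothetical 15-nt placeholder; the real TP15 spacer is given in the figures.
placeholder_spacer = "ACGTACGTACGTACG"
variants = single_substitutions(placeholder_spacer)
print(len(variants))  # 45
```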

[0049] Figure 30 shows the results of screening of a randomized DNA spacer library in yeast. (A) Schematic of the TP_1 N library as compared to the wild-type TP15 sequence, with positions in the DNA spacer numbered from 1 to 15. Shown below is a representative example of a 96-well microtitre plate assay, where the individual wells are colored according to β-galactosidase activity (in Miller units). The red rectangles at the top right indicate the positive and negative controls. Yellow rectangles indicate active clones whose activity was greater than or within 2 standard deviations of the wild-type control, averaged over three technical replicates. (B) Plot of nucleotide enrichment per position based on sequencing data for the active and inactive clones. The height of each letter (A, G, C, or T) indicates the enrichment value on a log2 scale, calculated as the difference in nucleotide proportion per position between the active and inactive clones. The dashed lines indicate 2 standard deviations from the mean of the dataset.

[0050] Figure 31 shows activity of mTALENs in HEK293T cells on episomal targets. (A) Schematic of the vectors used for co-transfection experiments. For the expression vector, the mTALEN gene is separated from the mCherry translation reporter by a T2A peptide. (B) Example of mTALEN expression vector transfection efficiency and expression in HEK293T cells, with bright field image on the left and the epifluorescent image (1 sec exposure) of the same field of view on the right. (C) Schematic of the TP15 target, with the DdeI restriction site indicated and sizes of DdeI digestion products indicated. (D) Agarose gel of a representative assay where the target region has been amplified by PCR from total DNA isolated 48 hrs post-transfection. Products were digested with DdeI (+) or incubated in buffer without DdeI (-); fragments resistant to cleavage by DdeI due to mutagenic repair are indicated (CR, cleavage resistant). (E) Examples of mutations induced by N169-T120 or D184-V152 mTALENs on episomal target plasmids. The DdeI-resistant fragments from panel (D) were cloned and sequenced. Sequences of CR fragments are shown, with dashes indicating the length of deletions relative to the wild-type sequence.

[0051] Figure 32 provides the amino acid sequences of some mTALEN constructs of the present invention.

Detailed Description of the Invention

[0052] The present invention provides novel chimeric polypeptides. As used herein, a "chimeric" polypeptide typically comprises two or more regions of amino acid sequence (also referred to as domains) that were derived from different proteins. As used herein, a chimeric polypeptide may also be referred to as a "fusion," "fusion protein" or "fusion polypeptide." Typically, a chimeric polypeptide of the invention may comprise a first region derived from a first protein and a second region derived from a second protein. Typically the first and the second protein will be different protein molecules, however, the present invention encompasses situations where the first and second regions are portions of one larger protein. Regions of a chimeric polypeptide of the invention may be fused together. As used herein, regions are "fused" when the regions are part of one contiguous string of amino acids.

[0053] One example of a chimeric polypeptide of the invention is a chimeric polypeptide comprising a first functional activity region and a second functional activity region. As used herein, "functional activity" encompasses all types of activities known to those skilled in the art. Examples of functional activities include, but are not limited to, enzymatic activities (e.g., nuclease activity, methylase activity, protease activity, etc), transcriptional regulatory activities (e.g., activation or repression of transcription), cellular localization activities (e.g., nuclear localization signals, cellular compartment localization signals (e.g., chloroplasts)), and binding activities (e.g., DNA binding activities, protein binding activities, etc). In some embodiments, the DNA binding activity may be specific to all or a portion of a DNA target sequence and the DNA binding activity may be referred to as DNA-targeting activity.

[0054] In general, chimeric polypeptides of the invention comprise at least one functional activity region fused to a region of DNA-targeting activity. The regions may be oriented in any order, for example, a region of functional activity may be located N-terminal to the region of DNA-targeting activity or the region of functional activity may be located C-terminal to the region of DNA-targeting activity. Chimeric polypeptides of the invention may comprise a plurality of functional activity regions, at least one of which is a DNA-targeting region. In embodiments of this type, the functional activity regions may be arranged in any order with respect to each other and with respect to the DNA-targeting region. Thus, for a chimeric polypeptide of the invention having two functional activity regions and a DNA-targeting region, the functional activity regions may be located 1) both N-terminal to the DNA-targeting region, 2) both C-terminal to the DNA-targeting region, or 3) one N-terminal and one C-terminal to the DNA-targeting region.

Functional Activity Regions

[0055] In general, any functional activity region that can be fused to a DNA-targeting region and retain activity can be used in the practice of the present invention.

[0056] In some embodiments, a functional activity region may comprise a nuclease activity. Any nuclease activity known to those skilled in the art may be used in the practice of the present invention. Any protein having nuclease activity may be used as a source of the functional activity regions in the chimeric polypeptides of the invention.

[0057] In general, any nuclease, or domain of a nuclease that has nuclease activity, that can be fused to a DNA-targeting region and retain the nuclease activity can be used in the practice of the present invention. In some embodiments, the nuclease may function as a monomer. Thus, any site-specific nuclease that is functional as a monomer can be used as the source of the nuclease domain for use in the present invention. In one embodiment, the nuclease domain is derived from a homing endonuclease, for example, a homing endonuclease of the GIY-YIG family of homing endonucleases. Other examples of site-specific nucleases that cleave double-stranded DNA as monomers include, but are not limited to, MspI, HinP1I, MvaI and BcnI.

[0058] The present chimeric GIY-YIG endonuclease may comprise a GIY-YIG nuclease domain from any GIY-YIG homing endonuclease. As used herein, the GIY-YIG nuclease domain is an α/β structure comprising at least about 90-100 amino acids, in which the amino acid sequence -GIY- is spaced from the amino acid sequence -YIG- by 10-11 amino acids and forms part of a three-stranded antiparallel β-sheet. Residues that may be important for nuclease activity include a glycine residue within the GIY-YIG motif, an arginine residue about 8-10 residues downstream of the -GIY- sequence (e.g. arginine 27 of I-TevI), a metal-binding glutamic acid residue such as the glutamic acid at position 75 of I-TevI, and a conserved asparagine about 14-16 residues downstream of the metal-binding glutamic acid residue (asparagine 90 of I-TevI) in the nuclease domain. Examples of suitable GIY-YIG nuclease domains include, but are not limited to, the nuclease portion of I-BmoI (for example, residues 1-92), the full-length amino acid sequence of which is illustrated in Fig. 15A, I-TevI (for example, at least residues 1-114), the full-length sequence of which is illustrated in Fig. 15B, and I-TulaI (for example, residues 1-114), the full-length sequence of which is illustrated in Fig. 15C.
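The spacing rule stated above (-GIY- separated from -YIG- by 10-11 residues) can be expressed as a simple pattern search. This is a rough sketch under stated assumptions: it matches the literal letters GIY and YIG, whereas individual family members may deviate from the literal motif, and the toy sequence below is a hypothetical construct rather than any sequence from the figures.

```python
import re

# Literal -GIY-, then a spacer of 10 or 11 residues, then literal -YIG-.
GIY_YIG_PATTERN = re.compile(r"GIY(.{10,11})YIG")


def find_giy_yig(protein_seq):
    """Return the 1-based GIY position and spacer details, or None if absent."""
    match = GIY_YIG_PATTERN.search(protein_seq.upper())
    if match is None:
        return None
    return {
        "giy_start": match.start() + 1,  # 1-based residue number of the G in GIY
        "spacer": match.group(1),
        "spacer_length": len(match.group(1)),
    }


# Hypothetical toy sequence used only to exercise the pattern.
toy_sequence = "MKS" + "GIY" + "A" * 10 + "YIG" + "SAK"
print(find_giy_yig(toy_sequence))
```

A search of this kind only flags candidate motifs; whether a region folds as a functional nuclease domain has to be established experimentally.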

[0059] As one of skill in the art will appreciate, functionally equivalent variant GIY-YIG nuclease domains may also be utilized within the present chimeric endonuclease. The term "functionally equivalent" refers to variant nuclease domains which vary from a wild-type or endogenous sequence but which retain nuclease function, even though it may be to a lesser degree. Accordingly, variant GIY-YIG nuclease domains may include one or more amino acid substitutions, deletions or insertions at positions which do not eliminate nuclease activity. Variant nuclease domains may comprise at least about 50% sequence similarity with a native nuclease sequence, at least about 60-70%, or at least about 80%-90% or greater sequence similarity with a native nuclease sequence, to retain sufficient nuclease activity. Examples of variant GIY-YIG nuclease domains include N- or C-terminal truncated GIY-YIG nuclease domains, for example, N-terminal truncations of up to about 20 amino acid residues and C-terminal truncations of up to about 15 amino acid residues, and one or more amino acid substitutions, insertions or deletions which do not adversely affect nuclease activity, for example within the N-terminus up to about the amino acid at position 20 or within the C-terminus from about the amino acid at position 75, and amino acid substitutions within the 10-11 amino acid spacer between -GIY- and -YIG-. In this regard, suitable amino acid substitutions include conservative amino acid substitutions, for example, substitution of an amino acid with a hydrophobic side chain with a like amino acid, e.g. alanine, valine, leucine, isoleucine, phenylalanine and tyrosine; substitution of an amino acid with an uncharged polar sidechain with a like amino acid, e.g. serine, threonine, asparagine and glutamine; substitution of an amino acid having a positively charged sidechain with a like amino acid, e.g. arginine, histidine and lysine; or substitution of an amino acid having a negatively charged sidechain with a like amino acid, e.g. aspartic and glutamic acid. Variant GIY-YIG nuclease domains may also include one or more modified amino acids, for example, amino acids including modified sidechain entities which do not adversely affect nuclease activity.
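The conservative substitution classes listed above can be captured as a small lookup. The sketch below uses only the example residues named in the paragraph (one-letter codes); the group labels and the function name are illustrative assumptions, and residues not listed in the text are simply treated as belonging to no group.

```python
# Example residues for each class, exactly as listed in the paragraph above.
CONSERVATIVE_GROUPS = {
    "hydrophobic": set("AVLIFY"),         # Ala, Val, Leu, Ile, Phe, Tyr
    "uncharged polar": set("STNQ"),       # Ser, Thr, Asn, Gln
    "positively charged": set("RHK"),     # Arg, His, Lys
    "negatively charged": set("DE"),      # Asp, Glu
}


def substitution_class(original, replacement):
    """Return the shared class name if the substitution is conservative, else None."""
    if original == replacement:
        return None  # identical residues are not a substitution
    for name, members in CONSERVATIVE_GROUPS.items():
        if original in members and replacement in members:
            return name
    return None


print(substitution_class("L", "I"))  # hydrophobic
print(substitution_class("K", "D"))  # None
```

Whether a given conservative substitution actually preserves nuclease activity still has to be tested; the lookup only reflects the side chain classes named in the text.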

[0060] In some embodiments, nuclease domains derived from other homing endonucleases, for example, the HNH family of homing endonucleases; restriction enzymes (including other Type IIS enzymes with properties distinct from FokI); or DNA repair nucleases may be used in the practice of the present invention.

[0061] Nucleases or nuclease domains for use in the present invention typically function as a monomer. In some embodiments, the nuclease or nuclease domain will make a double-strand break in DNA. In other embodiments, a nuclease or nuclease domain may only cleave one strand, i.e., may nick one strand of DNA. Such nickases have been shown to induce recombination and gene knockouts in mammalian cells with reduced levels of toxicity relative to double-strand nucleases. In some embodiments, a nuclease or nuclease domain for use in the present invention will have at least a minimal amount of site-specificity. This will help reduce cleavage at locations other than the target location. In other embodiments, a nuclease or nuclease domain will have no site-specificity at all. This will allow the greatest flexibility in the application of the chimeric polypeptides of the invention.

[0062] In some embodiments, a nuclease or nuclease domain may be derived from an HNH family endonuclease. HNH endonucleases have a two-domain structure similar to GIY-YIG homing endonucleases. The catalytic nuclease domain is located in the N-terminal region of the polypeptide and comprises a catalytic domain defined by the amino acid motif HNH. The C-terminal portion of the polypeptide comprises the DNA-binding domain. HNH endonucleases usually function as nickases (i.e., nick one strand of DNA). In one embodiment, the nuclease or nuclease domain for use in the present invention may be derived from the HNH enzyme I-HmuI (Accession: P34081.1, GI:465641), the sequence of which is specifically incorporated herein by reference. I-HmuI is structurally very similar to GIY-YIG enzymes. The HNH domain from I-HmuI can be fused to a DNA-targeting domain to create a targeted nickase enzyme. I-HmuI has sequence specificity at the nicking site. In some embodiments, the specificity of cleavage may be engineered out by altering one or more of the amino acids involved in DNA contact. Such amino acids are known based on a co-crystal of I-HmuI with DNA (Shen et al. JMB 342:43-56, 2004). Other HNH family enzymes may be used as a source of a nuclease domain in the practice of the present invention. Other examples of suitable sources of nuclease activity include colicins that degrade DNA.

[0063] In some embodiments, a nuclease region may be derived from other Type IIS enzymes. One suitable example is Eco31I (Accession: AAM09638.2, GI:56788324). This is a Type IIS enzyme, similar to FokI. Like all Type IIS restriction enzymes, Eco31I binds a DNA site, but cleaves at a distance from the binding site. Eco31I has a similar domain structure to FokI, but the C-terminal cleavage domain contains an HNH motif. Interestingly, the enzyme functions as a monomer and makes a double-strand break (Jakubauskas et al. Biochemistry 47:8546, 2008). Other Type IIS enzymes are known to those skilled in the art and may be used in the practice of the present invention.

[0064] Non-specific nucleases may also be used in the practice of the invention. Examples include Staphylococcal nuclease. Zinc finger fusions with Staph nuclease have been prepared that are active (Mineta et al. Biochemistry 47:12257, 2008).

[0065] DNA-repair nucleases may be used in the practice of the invention. One skilled in the art is aware of many examples of DNA repair enzymes that nick or make double-stranded breaks (DSBs) in DNA, usually at a site of DNA damage (e.g., an abasic site), or at a specific DNA structure (e.g., cruciform DNA). The nuclease domain from these types of nucleases may be used in the practice of the present invention. Suitable examples include, but are not limited to, UvrC (Accession: ZP_03002418.1, GI: 18849 148), which has a GIY-YIG domain that nicks one strand, MutL (Accession: P23367.2, GI:127552), a bacterial enzyme, and its human homolog PMS2 (Accession: NP_000526.1, GI:4505913). There are many examples of DNA repair enzymes that recognize DNA mismatches, the so-called AP endonucleases. The endonuclease activity of such enzymes can be used as a source of nuclease activity for the practice of the present invention.

[0066] Minimal nuclease domains may be used as the nuclease activity in the practice of the present invention. Using crystal structures and domain structure studies as a guide, the boundaries of the region of the enzyme having nuclease activity can be identified. Once identified, the region can be cloned using standard techniques and can serve as a nuclease activity.

[0067] Nucleases for use in the present invention may be from any source, for example, archaebacteria, bacteria, viruses, eukaryotes, organelles (e.g., mitochondria and chloroplast) of eukaryotes, plants, algae, fungi, or protozoa. Any source may be used so long as the nuclease activity is functional in the desired target organism, for example, in a eukaryotic cell (e.g., a mammalian cell, a plant cell, etc.).

[0068] In some embodiments, a functional activity region may comprise a DNA-modifying activity, for example, a DNA methylase activity or a cytosine deaminase activity.

[0069] One example of a functional activity region that may be used in the practice of the present invention is a functional activity region having DNA methylase activity. Any DNA methylase activity known to those skilled in the art may be used in the present invention. Examples include, but are not limited to, methylase activity that generates N6-methyladenine (EC 2.1.1.72), methylase activity that generates N4-methylcytosine (EC 2.1.1.113), and methylase activity that generates C5-methylcytosine (EC 2.1.1.37). Any protein having DNA methylase activity may be used as a source of the functional activity regions in the chimeric polypeptides of the invention. Examples include DNA (cytosine-5)-methyltransferase 1 (Accession: NP_001124295.1, GI:195927037); DNA (cytosine-5)-methyltransferase 3A isoform a (Accession: NP_072046.2, GI:12751473); DNA (cytosine-5)-methyltransferase 3A isoform b (Accession: NP_715640.2, GI:77176455); and DNA (cytosine-5)-methyltransferase 3 beta (Accession: AAD53062.1, GI:5823166), the sequences of which are specifically incorporated herein by reference.

[0070] In some embodiments, the DNA-modifying activity that may be used in the practice of the invention is cytosine deaminase activity (E.C. 3.5.4.1). These could be useful for making C:G -> T:A transitions in DNA at specific sites. Any protein having cytosine deaminase activity may be used as a source of the functional activity regions in the chimeric polypeptides of the invention. Examples include E. coli cytosine deaminase (Accession: BAE76119.1, GI:85674479) and yeast cytosine deaminase (Accession: AAB67713.1, GI:2343114), the sequences of which are specifically incorporated herein by reference.
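The C:G -> T:A transition mentioned above can be illustrated at the sequence level. The sketch below models only the end result on both strands (a C on one strand becomes T, and the paired G on the other strand becomes A); it makes no claim about the enzymology or the repair steps in between, and the function names and example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_c_to_t_transition(top_strand, position):
    """Model a C:G -> T:A transition at a 0-based position on the top strand.

    Returns (edited top strand, edited bottom strand), both written 5' to 3'.
    """
    if top_strand[position] != "C":
        raise ValueError("target position is not a cytosine")
    edited_top = top_strand[:position] + "T" + top_strand[position + 1:]
    return edited_top, reverse_complement(edited_top)


top, bottom = apply_c_to_t_transition("GACGT", 2)
print(top, bottom)  # GATGT ACATC
```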

[0071] In some embodiments, a functional activity region may comprise a transcription regulatory activity. In some embodiments, the activity may be a transcription activation activity. Any transcription activation activity known to those skilled in the art may be used in the practice of the present invention. Any protein having transcription activation activity may be used as a source of the functional activity regions in the chimeric polypeptides of the invention. Examples include HSV VP16 (Accession: AAA45864.1, GI:330320), the p65 subunit of human transcription factor NF-kB (Accession: Q04206.2, GI:62906901), Nicotiana tabacum ERF2 (Accession: Q40479.1, GI:57012759), Nicotiana tabacum ERF4 (Accession: Q40477.1, GI:57012757), Arabidopsis thaliana ERF1 (Accession: P93835.2, GI:47605622), Arabidopsis thaliana ERF2 (Accession: O80338.1, GI:7531108), and Arabidopsis thaliana ERF5 (Accession: BAA97157.1, GI:8809606), the sequences of which are specifically incorporated herein by reference.

[0072] In some embodiments, a functional activity region may comprise a transcription repression activity. Any transcription repression activity known to those skilled in the art may be used in the practice of the present invention. Any protein having transcription repression activity may be used as a source of the functional activity regions in the chimeric polypeptides of the invention. Examples include tobacco (Nicotiana tabacum) ERF3 (Accession: Q9SXS8.1, GI:57012880), Arabidopsis thaliana ERF3 (Accession: O80339.1, GI:7531109), and Arabidopsis thaliana ERF4 (Accession: Q9FJ93.1, GI:47605744), the sequences of which are specifically incorporated herein by reference.

Linking Domains

[0073] Functional activity regions may be linked by linking domains. In some embodiments, a nuclease domain may be linked to a DNA-targeting domain via a linking domain. Other functional domains, e.g., methylase domains, transcriptional regulatory domains, etc., may be linked to a DNA-targeting domain via a linking domain. For example, the GIY-YIG nuclease domain may be linked to a DNA-targeting domain via a linking domain.

[0074] The linking domain will generally be a polypeptide of a length sufficient to permit the nuclease domain to retain nuclease function when linked to the DNA-targeting domain, and sufficient to permit the DNA-binding domain to bind the endonuclease to a target substrate. The linking domain may be from 1 amino acid residue to about 100 amino acid residues, from about 1 amino acid residue to about 90 amino acid residues, from about 1 amino acid residue to about 80 amino acid residues, from about 1 amino acid residue to about 70 amino acid residues, from about 1 to about 60 amino acid residues, from about 1 to about 50 amino acid residues, from about 1 to about 40 amino acid residues, from about 1 to about 30 amino acid residues, or from about 1 amino acid residue to about 25 amino acid residues. The linking domain may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 amino acid residues in length.

[0075] The length of the linker domain may be adjusted depending on the distance between the binding and cleavage sites on a target nucleic acid molecule. By including an appropriately sized linker, chimeric endonucleases of the invention can cleave nucleic acid molecules where the binding and cleavage sites are separated by varying numbers of basepairs.

[0076] The linking domain may be a random sequence, for example, may be one or more glycine residues. The linking domain may be a simple repeat of amino acids, for example, GS, which may be repeated multiple times. As used herein, such a repeat will be indicated by placing the amino acids in parentheses and using a subscript to indicate the number of times repeated. Thus (GS)4 indicates a linking domain of four repeats of the amino acids glycine and serine. Similarly, (G4S)3 indicates three repeats of the sequence G-G-G-G-S. In some embodiments, the linker domain may comprise one or more glycine residues in addition to one or more amino acid residues. The linking domain may be from about 10% to about 100%, from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, from about 90% to about 100%, or may be 100% glycine. The linking domain may be flexible or may comprise one or more regions of secondary structure that impart rigidity, for example, alpha helix forming sequences. The linking domain may be the endogenous linker associated with the GIY-YIG nuclease, e.g. the linker region of I-TevI including amino acid residues 93-169, the linker region of I-BmoI including amino acids 90-149, or the linker region of I-TulaI including amino acids 93-169. Alternatively, the linking domain may be unrelated to the nuclease domain, i.e. the I-TevI linker or portion thereof may be utilized with the I-BmoI or I-TulaI nuclease regions, or the I-BmoI or I-TulaI linker or portion thereof may be used with the I-TevI nuclease domain. Various lengths of the nuclease-linker portion of an endonuclease may be utilized, such as the I-TevI nuclease domain and its linker region from about amino acid residue 1 to about amino acid residue 114, from about amino acid residue 1 to about amino acid residue 128, from about amino acid residue 1 to about amino acid residue 141, from about amino acid residue 1 to about amino acid residue 169, from about amino acid residue 1 to about amino acid residue 170, from about amino acid residue 1 to about amino acid residue 201, from about amino acid residue 1 to about amino acid residue 203, from about amino acid residue 1 to about amino acid residue 206; the I-BmoI nuclease domain and linker from about amino acid residue 1 to about amino acid residue 96, from about amino acid residue 1 to about amino acid residue 1 5, from about amino acid residue 1 to about amino acid residue 125, from about amino acid residue 1 to about amino acid residue 139, from about amino acid residue 1 to about amino acid residue 159, from about amino acid residue 1 to about amino acid residue 221, from about amino acid residue 1 to about amino acid residue 223, from about amino acid residue 1 to about amino acid residue 226; and the I-TulaI nuclease domain and linker from about amino acid residue 1 to about amino acid residue 114, and from about amino acid residue 1 to about amino acid residue 169.
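The repeat notation defined above, such as (GS)4 and (G4S)3, can be expanded mechanically, which also makes it easy to check the glycine content of a candidate linker. The following is a minimal sketch; the function names and the accepted input syntax (a parenthesized unit followed by a repeat count, with the subscripts written inline) are assumptions made for illustration.

```python
import re


def expand_linker(notation):
    """Expand repeat notation such as '(GS)4' or '(G4S)3' into a one-letter sequence."""
    compact = notation.replace(" ", "")
    match = re.fullmatch(r"\(([A-Z][A-Z0-9]*)\)(\d+)", compact)
    if match is None:
        raise ValueError(f"unrecognized linker notation: {notation!r}")
    unit_spec, repeats = match.group(1), int(match.group(2))
    # Inside the unit, a number after a residue repeats that residue (G4S -> GGGGS).
    unit = "".join(
        residue * int(count or 1)
        for residue, count in re.findall(r"([A-Z])(\d*)", unit_spec)
    )
    return unit * repeats


def glycine_fraction(linker):
    """Fraction of residues in the linker that are glycine."""
    return linker.count("G") / len(linker)


print(expand_linker("(GS)4"))   # GSGSGSGS
print(expand_linker("(G4S)3"))  # GGGGSGGGGSGGGGS
print(glycine_fraction(expand_linker("(G4S)3")))  # 0.8
```

Under this reading, (G4S)3 is a 15-residue linker that is 80% glycine, which falls within the length and composition ranges recited above.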

[0077] As one skilled in the art will appreciate, the linking domain may be modified from a wild-type or native linking domain sequence. Suitable modifications include one or more amino acid substitutions, deletions or insertions that do not impact the function of the endonuclease, i.e. do not eliminate binding of the DNA-targeting domain to its substrate, nor eliminate nuclease activity. The native I-TevI linker has some DNA sequence preference. Accordingly, the present invention provides modified I-TevI linkers wherein the sequence of the native protein linker has been modified to change its DNA binding specificity, without affecting nuclease activity, to broaden or reduce targeting potential based on a specific target DNA sequence. Variant linking domains may comprise at least about 50% sequence similarity with a native linking domain sequence, at least about 60-70%, or at least about 80%-90% or greater sequence similarity with a native linking domain, to function effectively as a linking domain. Suitable modifications include truncation of a native linking domain as set out above, and conservative amino acid substitutions as set out with respect to the nuclease domain.

[0078] As one skilled in the art will also appreciate, various linkers can be designed as needed for particular circumstances in accordance with known techniques (see, for example, Fan Xue et al., "LINKER: a web server to generate peptide sequences with extended conformation", Nucl. Acids Res. 32 (suppl 2): W562-W565; Ryoichi Arai et al., "Design of the linkers which effectively separate domains of a bifunctional fusion protein", Protein Eng., 14(8): 529-532; and Richard George et al., "An analysis of protein domain linkers: their classification and role in protein folding", Protein Eng., 15(11): 871-879). For example, suitable linkers can be designed based on specific target sequences, i.e., linkers are designed to have DNA binding activity with a particular sequence(s). Suitable non-specific linkers without DNA binding activity can also be used. Examples of such non-specific linkers include, but are not limited to, linkers that interact non-specifically with the minor groove of DNA. One example of a non-specific linker is a linker having the sequence TG SI RPRAIGGS KPRVAT. This linker interacts with the sugar and phosphate of DNA, i.e., interacts with DNA in a non-specific manner. A person of ordinary skill in the art will recognize that modifications of non-specific linker sequences can be required based on a particular context of use for the linker.

DNA-Targeting Domain

[0079] The DNA-targeting domain may be any suitable domain that binds DNA in a site-specific manner. Examples of suitable DNA-targeting domains include, but are not limited to, the DNA binding domains of TAL-effector proteins, such as PthXo1 and AvrBs3 (from Xanthomonas campestris); zinc finger domains, e.g. the ryA zinc finger binding domain and the ryB zinc finger binding domain; and other distinct DNA-binding platforms, such as the binding domain in LAGLIDADG homing endonucleases, e.g. I-OnuI, which have reprogrammable DNA-binding specificity similar to zinc fingers or TAL domains. A functionally equivalent variant binding domain based on a native binding domain, i.e. a binding domain which incorporates sequence modifications but which retains DNA binding activity, may also be utilized in the present chimeric endonuclease. Variant binding domains may comprise at least about 50% sequence similarity with a native binding domain sequence, at least about 60-70%, and at least about 80%-90% or greater sequence similarity with a native binding domain to retain sufficient binding activity. Such a variant binding domain may include one or more of: an N- or C-terminal truncation, one or more amino acid substitutions, deletions or insertions, or modification of an amino acid, for example, modification of an amino acid sidechain entity. The DNA-targeting domain is typically bound at its N-terminal end to the linking domain or to the nuclease domain. In some embodiments, the DNA-targeting domain may be bound at its C-terminal end to the linking domain or to the nuclease domain. One of ordinary skill in the art is capable of using standard techniques to fuse the DNA-targeting domain at either end to a linking domain and/or a nuclease or other functional domain.

[0080] The targeting specificity of the present chimeric GIY-YIG endonuclease is a function of the DNA-targeting domain and may be modified or enhanced by modifying the specificity of the DNA-targeting domain as set out above. Additionally, for example, the specificity of the 3-zinc finger DNA-targeting domain of ryA or ryB may be enhanced by addition of zinc fingers to generate a 4-, 5-, or 6-zinc finger fusion protein.

[0081] In one embodiment, the DNA-targeting domain of a chimeric endonuclease is a TAL domain, or a modified TAL domain. Examples of suitable TAL domains are known in the art; for example, US 2011/0301073 discloses Novel DNA-Binding Proteins and Uses Thereof and is specifically incorporated herein for its teaching of the structure of the DNA binding domain of TAL-effectors (i.e., TAL domain). A TAL domain generally comprises a plurality of repeat units that are typically segments 33 to 35 amino acid residues long, and the repeats are typically 90-100% homologous to each other. Suitable repeats include, but are not limited to, those from Xanthomonas, for example,

LTPEQVVAIASNIGG QALETVQALLPVLCQAHG (SEQ ID NO:4), LTPDQ V VAI A SEGGG QALET VQRLLPV LC QAHG (SEQ ID NO:5), and LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG (SEQ ID NO:6), those from Ralstonia solanacearum, for example,

LTPQQWAIASNTGGKRALEAVCVQLPVLRAAPYR (SEQ ID NO:7), LSTEQWAIASN GG QALEAVKAHLLDLLGAPYV (SEQ ID NO:8) and LDTEQVVAIASHNGG QALEAV ADLLDLRGAPYA (SEQ ID NO:9).

[0082] One suitable repeat sequence is

L(T P)(P/Q)(E/A/D/V)QVVAIASHDGGKQAL(E/A)T(V/M)QRLLPVLCQ(A/D)HG (SEQ ID NO: 10). The amino acid residues at positions 12 and 13 are referred to as a Repeat Variable Diresidue (RVD, residues HD in the sequence above) and determine the nucleic acid residue to which the repeat unit will bind. Thus, by selecting the sequence of RVDs and sequentially connecting repeat units comprising the RVDs, a TAL domain can be constructed that will bind to any desired sequence in the target DNA substrate, e.g. the binding site of the DNA-targeting domain. For example, amino acid residues NI correspond to adenine, amino acid residues HD correspond to cytosine, amino acid residues NG correspond to thymine, amino acid residues NN correspond to guanine (and to a lesser degree adenine), amino acid residues NS correspond to A, C, T or G, amino acid residues N* (where * indicates no amino acid residue) correspond to C or T, and amino acid residues HG correspond to T. Other RVDs are disclosed in US 2011/0301073 and are specifically incorporated herein by reference. Using the known DNA sequence of a gene, a chimeric endonuclease of the invention may be constructed specific to any gene locus. Examples of suitable gene loci include, but are not limited to, NTF3, VEGF, CCR5, IL2Ry, BAX, BA , FUT8, GR, DHFR, CXCR4, GS, Rosa26, AAVS1 (PPP1R12C), MHC genes, PITX3, ben-1, Pou5F1 (OCT4), CI, RPD1, and any other genes known to those skilled in the art.
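The RVD-to-nucleotide correspondences listed above can be illustrated with a short sketch. The following Python is a minimal, non-authoritative illustration only; the function names (`rvds_for_target`, `rvds_can_bind`) are hypothetical and not part of the disclosure, and the choice of a single default RVD per base (NI, HD, NG, NN) is an assumption made for simplicity.

```python
# Illustrative sketch only: encodes the RVD correspondences stated in the text.
# Function and variable names are hypothetical, not taken from the disclosure.

# One default RVD per base (assumption: NN is used for G, although the text
# notes that NN also recognizes A to a lesser degree).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}

# Bases each RVD is described as corresponding to in the text.
BASES_FOR_RVD = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NN": {"G", "A"},
    "NS": {"A", "C", "T", "G"},
    "N*": {"C", "T"},
    "HG": {"T"},
}


def rvds_for_target(target):
    """Return one RVD per nucleotide of the target DNA sequence."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported nucleotide(s): {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def rvds_can_bind(rvds, site):
    """Check whether each RVD corresponds to the base at the same position."""
    site = site.upper()
    if len(rvds) != len(site):
        return False
    return all(base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, site))
```

For instance, `rvds_for_target("ACGT")` yields one RVD per base in target order, which would then be used to choose the order of repeat units in the TAL domain.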

[0083] A TAL domain may be constructed by fusing a plurality of repeat units. Any number of repeat units may be fused to create a TAL domain, for example, from about 5 repeat units to about 30 repeat units, from about 5 repeat units to about 25 repeat units, from about 5 repeat units to about 20 repeat units, from about 5 repeat units to about 15 repeat units, or from about 5 repeat units to about 10 repeat units, from about 7.5 repeat units to about 30 repeat units, from about 7.5 repeat units to about 25 repeat units, from about 7.5 repeat units to about 20 repeat units, from about 7.5 repeat units to about 15 repeat units, or from about 7.5 repeat units to about 10 repeat units.

[0084] In some embodiments, a TAL domain of the invention may comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeat units. In a given TAL domain, the repeat units typically share a high degree of homology. Thus, any two repeat units in a given TAL domain may be from about 75% to about 100%, from about 80% to about 100%, from about 85% to about 100%, from about 90% to about 100%, from about 91% to about 100%, from about 92% to about 100%, from about 93% to about 100%, from about 94% to about 100%, from about 95% to about 100%, from about 96% to about 100%, from about 97% to about 100%, from about 98% to about 100%, or from about 99% to about 100%, from about 75% to about 95%, from about 80% to about 95%, from about 91% to about 95%, from about 92% to about 95%, from about 93% to about 95%, from about 75% to about 90%, from about 80% to about 90%, from about 82% to about 90%, from about 84% to about 90%, from about 86% to about 90%, or from about 88% to about 90%, identical with each other.
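As a hedged illustration of how the identity between two repeat units might be computed, the sketch below compares two equal-length repeat units position by position. It assumes ungapped repeats of the same length (a simplification; a real comparison may require alignment), and the function names are hypothetical. The first repeat is SEQ ID NO:6 from the text; the second is a hypothetical variant of it in which only the RVD at positions 12 and 13 is changed from NI to HD.

```python
# Illustrative sketch only: position-by-position identity of two repeat units.
# Assumes equal-length, ungapped sequences; names are hypothetical.

def percent_identity(repeat_a, repeat_b):
    """Percentage of positions at which two equal-length repeats are identical."""
    if len(repeat_a) != len(repeat_b):
        raise ValueError("repeat units must have the same length")
    matches = sum(a == b for a, b in zip(repeat_a, repeat_b))
    return 100.0 * matches / len(repeat_a)


def rvd_of(repeat):
    """Return the residues at positions 12 and 13 (1-based) of a repeat unit."""
    return repeat[11:13]


# SEQ ID NO:6 as given in the text (34 residues, RVD = NI).
REPEAT_NI = "LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG"
# Hypothetical variant differing only at the RVD (NI -> HD).
REPEAT_HD = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"

identity = percent_identity(REPEAT_NI, REPEAT_HD)  # 32 of 34 positions match
```

Here the two repeats differ at 2 of 34 positions, so they are about 94% identical, which falls within several of the ranges recited above.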

[0085] TAL domains of the invention may also comprise one or more half repeats that are typically on either the N-terminal, the C-terminal, or on both the N- and C-terminals of the TAL domain. In other embodiments, at least one repeat unit is modified at some or all of the amino acids at positions 4, 11, 12, 13 or 32 within the repeat unit. In some embodiments, at least one repeat unit is modified at 1 or more of the amino acids at positions 2, 3, 4, 11, 12, 13, 21, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, or 35 within one repeat unit.

[0086] TAL domains can be constructed to bind to a predetermined nucleotide sequence, for example via engineering (altering one or more amino acids) of the hypervariable diresidue region, for example positions 12 and/or 13 of a repeat unit within a TAL protein. The amino acids at positions 4, 11, and 32 can also be engineered. Atypical RVDs can also be selected for use in an engineered TAL protein, enabling specification of a wider range of non-natural target sites. For example, an NK RVD can be selected for use in recognizing a G nucleotide in the target sequence. Amino acids in the repeat unit can be altered to change the characteristics (i.e., stability or secondary structure) of the repeat unit. Engineered TAL proteins can be proteins that are non-naturally occurring. The genes encoding TAL repeat domains can be engineered at the DNA level such that the codons specifying the TAL repeat amino acids are altered, but the specified amino acids are not (e.g., via known techniques of codon optimization). Examples of engineered TAL proteins include, but are not limited to, those obtained by design and/or selection. A designed TAL protein can be a protein not occurring in nature whose design/composition results principally from rational criteria. Rational criteria for design can include application of substitution rules and computerized algorithms for processing information in a database storing information of existing TAL designs and binding data. A selected TAL domain can be a non-naturally occurring or atypical domain whose production results primarily from an empirical process such as phage display, interaction trap, or hybrid selection.

[0087] TAL domains can be derived from any suitable TAL protein. Examples of TAL proteins include, but are not limited to, TAL proteins derived from Ralstonia spp. or Xanthomonas spp. The DNA-targeting domain can comprise one or more naturally occurring and/or engineered TAL domain units derived from the plant pathogen Xanthomonas. The DNA-targeting domain can comprise one or more naturally occurring and/or engineered TAL domain units derived from the plant pathogen Ralstonia solanacearum, or other TAL DNA binding domain from the TAL protein family. The TAL DNA binding domains as described herein (comprising at least one TAL repeat unit) can include (i) one or more TAL repeat units not found in nature; (ii) one or more naturally occurring TAL repeat units; (iii) one or more TAL repeat units with atypical RVDs; and combinations of (i), (ii) and/or (iii). A TAL DNA binding domain as described herein can consist of completely non-naturally occurring or atypical repeat units. Furthermore, in polypeptides as described herein comprising two or more TAL domain units, the TAL domain units (naturally occurring or engineered) may be derived from the same species or alternatively, may be derived from different species.

[0088] Useful target sites can be subject to evaluation by other criteria or can be used directly for design or selection (if needed) and production of a TAL-fusion protein specific for such a site. A further criterion for evaluating potential target sites can be their proximity to particular regions within a gene. Target sites can be selected that do not necessarily include or overlap segments of demonstrable biological significance within target genes, such as regulatory sequences. Additional criteria for further evaluating target segments can include prior availability of TAL-fusion proteins binding to such segments or related segments, and/or ease of designing new TAL-fusion proteins to bind a given target segment.

[0089] After a target segment has been selected, a TAL-fusion protein that binds to the segment can be provided by a variety of approaches. Once a TAL-fusion protein has been selected, designed, or otherwise provided to a given target segment, the TAL-fusion protein or the DNA encoding it can be synthesized. The TAL-fusion protein or a polynucleotide encoding it can then be used for modulation of expression, or analysis of the target gene containing the target site to which the TAL-fusion protein binds.

[0090] Suitable modified TAL domains may include one or more amino acid deletions, insertions or substitutions which do not eliminate the DNA binding activity thereof, for example, modifications at one or more amino acid residues other than amino acid residues at position 12 and 13, such as those indicated with multiple amino acid residues in parentheses in the above sequence. Other proteins having TAL domains can be used to identify suitable repeats that can be used to construct a DNA-targeting domain. Examples include, but are not limited to, Avrb6 from Xanthomonas citri subsp. Malvacearum GenBank accession number AAB00675.1, PthN from Xanthomonas campestris GenBank accession number AAB69865.1, PthA from Xanthomonas citri GenBank accession number AAC43587.1, avirulence protein from Xanthomonas oryzae pv. Oryzae GenBank accession number AAF98343.1, AvrXa7 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAG02079.2, AvrXa3 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAN01357.1, AvrXa5 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAQ79773.2, PthXo3 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAS46027.1, and PthXo4 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAS58127.2. Additional examples include, but are not limited to, PthB from Xanthomonas axonopodis pv. Manihotis GenBank accession number AAD01494.1, PthB from Xanthomonas citri GenBank accession number AAO72098, avirulence protein AvrXa7-#38 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAS58128.2, avirulence protein from Xanthomonas oryzae pv. Oryzae GenBank accession number AAS58129.3, avirulence protein AvrXa7-sacB50 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAS58130.3, avirulence protein AvrXa7-2M from Xanthomonas oryzae pv. Oryzae GenBank accession number AAT46123.1, avirulence protein AvrXa7-3M from Xanthomonas oryzae pv. Oryzae GenBank accession number AAT46124.1, Avr/Pth13 from Xanthomonas oryzae pv. Oryzicola GenBank accession number AAW59491.1, Avr/Pth3 from Xanthomonas oryzae pv. Oryzicola GenBank accession number AAW59492.1, Avr/Pth14 from Xanthomonas oryzae pv. Oryzicola GenBank accession number AAW59493.1, avirulence protein from Xanthomonas oryzae pv. Oryzae KACC 10331 GenBank accession number AAW77510.1, Hax2 from Xanthomonas campestris pv. Armoraciae GenBank accession number AAY43358, Hax3 from Xanthomonas campestris pv. Armoraciae GenBank accession number AAY43359.1, Hax4 from Xanthomonas campestris pv. Armoraciae GenBank accession number AAY43360.1, R19.5 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAY54166.1, AvrXa27 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAY54168.1, R13.5 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAY54169.1, R23.5 from Xanthomonas oryzae pv. Oryzae GenBank accession number AAY54170.1, PthXo7 from Xanthomonas oryzae pv. Oryzae GenBank accession number ABB70129.1, PthXo6 from Xanthomonas oryzae pv. Oryzae GenBank accession number ABB70183.1, and PthAW from Xanthomonas citri pv. Citri GenBank accession number AB077779.1. The sequence of each of these proteins is specifically incorporated herein by reference.

[0091] Proteins from Ralstonia solanacearum having TAL domains can be used to identify suitable repeats that can be used to construct a DNA-targeting domain. Examples include, but are not limited to, AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27067.1, AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27068.1, AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27069.1, AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27070.1, AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27071.1, and AvrBs3-like effector from Ralstonia solanacearum GenBank accession number ABO27072.1.

[0092] The DNA-targeting domain can be constructed to target non-nuclear DNA. TAL-effectors can include a nuclear localization sequence (NLS) for localization to a eukaryotic nucleus. For example, when the DNA-targeting domain is a TAL domain, a TAL-effector protein can be constructed to include a localization sequence for targeting cellular components other than the nucleus. For example, a TAL-effector can be fused to a targeting sequence that targets mitochondria and/or chloroplasts. Use of this targeting domain can allow modification of a mitochondrial genome or a plastid genome.

TAL Flanking Sequences

[0093] In addition to the repeat units described above, a TAL domain of the invention may also comprise flanking sequences at the N- and/or C-terminal of the TAL domain. The flanking sequences may be of any length that does not interfere with the DNA-binding of the TAL domain. Flanking sequences may be from about 1 amino acid residue to about 300 amino acid residues, from about 1 amino acid residue to about 250 amino acid residues, from about 1 amino acid residue to about 200 amino acid residues, from about 1 amino acid residue to about 150 amino acid residues, from about 1 amino acid residue to about 125 amino acid residues, from about 1 amino acid residue to about 100 amino acid residues, from about 1 amino acid residue to about 75 amino acid residues, from about 1 amino acid residue to about 50 amino acid residues, from about 1 amino acid residue to about 40 amino acid residues, from about 1 amino acid residue to about 30 amino acid residues, from about 1 amino acid residue to about 20 amino acid residues, or from about 1 amino acid residue to about 10 amino acid residues. The flanking sequences may be of any amino acid sequence. In some embodiments, the flanking sequences may be derived from the naturally occurring sequence of a TAL-effector protein, which may be the same or different TAL-effector protein from which the repeat units are derived. Thus, the present invention encompasses TAL domains comprising repeat units having an amino acid sequence found in a first TAL-effector protein and one or more flanking sequences found in a second TAL-effector protein. One suitable source for flanking sequences is amino acid residues 130 to 416 of SEQ ID NO: 101 which is the N-terminal flanking region of PthXo1 (Figure 7A). In some embodiments, a flanking sequence may comprise all or a part of amino acid residues 130 to 416 of SEQ ID NO: 101. 
For example, a flanking sequence may comprise from about amino acid residue 150 to about amino acid residue 416, from about amino acid residue 175 to about amino acid residue 416, from about amino acid residue 200 to about amino acid residue 416, from about amino acid residue 225 to about amino acid residue 416, from about amino acid residue 250 to about amino acid residue 416, from about amino acid residue 275 to about amino acid residue 416, from about amino acid residue 300 to about amino acid residue 416, from about amino acid residue 325 to about amino acid residue 416, from about amino acid residue 350 to about amino acid residue 416, from about amino acid residue 375 to about amino acid residue 416, or from about amino acid residue 400 to about amino acid residue 416. For example, the fusion protein described herein can begin at the T120 N-terminal fusion point of PthXo1 and can end at residue P1118 at the C-terminus. In some embodiments, a flanking sequence may have sequence identity with one or more of the flanking sequences above. For example, a flanking sequence may comprise a sequence that is from about 80% to about 100% identical to the sequence of from about amino acid residue 350 to about amino acid residue 416, from about 85% to about 100% identical, from about 90% to about 100% identical, from about 95% to about 100% identical, from about 80% to about 95% identical, from about 80% to about 90% identical, or from about 80% identical to about 85% identical. A flanking sequence may comprise a sequence that is from about 80% to about 100% identical to the sequence of from about amino acid residue 300 to about amino acid residue 416, from about 85% to about 100% identical, from about 90% to about 100% identical, from about 95% to about 100% identical, from about 80% to about 95% identical, from about 80% to about 90% identical, or from about 80% identical to about 85% identical. 
A flanking sequence may comprise a sequence that is from about 80% to about 100% identical to the sequence of from about amino acid residue 250 to about amino acid residue 416, from about 85% to about 100% identical, from about 90% to about 100% identical, from about 95% to about 100% identical, from about 80% to about 95% identical, from about 80% to about 90% identical, or from about 80% identical to about 85% identical.

Additional Functional Domains

[0094] Chimeric endonucleases of the invention may optionally comprise one or more functional domains. Suitable functional domains include, but are not limited to, transcription factor domains (activators, repressors, co-activators, co-repressors), additional nuclease domains, silencer domains, oncogene domains (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, mos family members etc.); DNA repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes and their associated factors and modifiers; chromatin associated proteins and their modifiers (e.g. kinases, acetylases and deacetylases); DNA modifying enzymes (e.g., methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, polymerases, endonucleases); DNA targeting enzymes such as transposons, integrases, recombinases and resolvases and their associated factors and modifiers; nuclear hormone receptors; and ligand binding domains.

[0095] Chimeric polypeptides of the invention can include further modifications to control activity in various cell types. For example, a chimeric polypeptide can be fused to a sequence that promotes degradation. Examples of such sequences include, but are not limited to, a DEATH domain sequence, a sequence known to be ubiquitinated to promote degradation, and a protein kinase sequence that when phosphorylated promotes degradation. A further example is to construct a chimeric polypeptide having a self-splicing intein in the middle. An intein is a protein segment that can excise itself and rejoin the remaining protein portions with a peptide bond during protein splicing. Inteins can be engineered to be redox sensitive (as described in, for example, Callahan BP et al., "Structure of catalytically competent intein caught in a redox trap with functional and evolutionary implications", Nat. Struct. Mol. Biol., 18(5) 630-3). Thus, redox conditions in the cell can control splicing of the intein and ligation of the N- and C-terminal domains of the chimeric polypeptide. A similar example is to insert an intron into the TAL coding region, where the intron can be spliced under certain cellular conditions and/or in specific cell types.

[0096] A chimeric polypeptide of the invention may comprise one or more nuclear localization signals or other amino acid sequences that direct the distribution of the chimeric polypeptide in a cell. Suitable nuclear localization sequences are known in the art. Examples include, but are not limited to, the nucleoplasmin NLS RX, 0 KK L (SEQ ID NO: 11) (Moore JD, J Cell Biol. 1999 Jan 25; 144, 213-24), the SV40 Large T antigen NLS P R V (SEQ ID NO: 12) (Kalderon D., Cell, 1984, 39, 499-509), the BRCA1 NLS PKKNRLRRP (SEQ ID NO: 13) (Chen CF, J. Biol. Chem. 1996, 271, 32863-32868) and the c-myb NLS PLL KI Q (SEQ ID NO: 14) (Dang and Lee, J Biol Chem, 1989, 264, 18019).

[0097] A chimeric polypeptide of the invention may comprise one or more cellular localization domains. In one embodiment, a cellular localization domain may be a transit peptide, e.g., a chloroplast transit peptide. Suitable chloroplast transit peptides are known in the art. In one embodiment, the transit peptide may be an algal chloroplast transit peptide. Suitable examples include, but are not limited to, those from Chlamydomonas reinhardtii disclosed in Franzen FEBS 260: 165-168, 1990, the sequences of which are specifically incorporated herein by reference.

Chimeric Polypeptides of the invention

[0098] In one embodiment, the present invention provides novel chimeric endonucleases that can be engineered to cleave virtually any nucleic acid molecule at a desired site. This is accomplished by selecting the desired binding and cleaving domains and using recombinant DNA techniques to construct a fusion protein comprising the selected domains. Thus, chimeric endonucleases of the invention are capable of creating double-stranded breaks in a DNA molecule, for example, in the genome of an organism. Double-stranded breaks thus created may be used, for example, to induce targeted mutagenesis, induce targeted deletions of cellular DNA sequences, and facilitate targeted recombination at a predetermined chromosomal locus. See, for example, United States Patent Publications 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 20060063231; 20070218528; 20070134796; 20080015164 and International Publication Nos. WO 07/014,275 and WO 2007/139982, the disclosures of which are specifically incorporated herein by reference in their entireties.

[0099] As an example, a novel chimeric endonuclease has now been developed comprising a GIY-YIG nuclease domain which is linked to a DNA-targeting domain by a linking domain. Unlike chimeric endonucleases of the prior art, for example, TALENs comprising the FokI nuclease domain, chimeric endonucleases of the present invention are capable of cleaving DNA as monomers. This allows greater flexibility in construction and ease in use as compared to the chimeric endonucleases of the prior art. Chimeric endonucleases of the invention will be particularly useful for in vivo applications as they do not require dimerization in situ to be effective.

[0100] Examples of chimeric endonucleases include, but are not limited to, I-TevI nuclease linked to PthXo1 TAL DNA targeting domain, I-TevI nuclease linked to ryA or ryB zinc finger DNA targeting domain, I-TevI nuclease linked to OnuI DNA targeting domain, I-BmoI nuclease linked to PthXo1 TAL DNA targeting domain, I-BmoI nuclease linked to ryA or ryB zinc finger DNA targeting domain, I-TulaI linked to ryA or ryB zinc finger DNA targeting domain, Tula linked to a PthXo1 TAL DNA-targeting domain, and Tula linked to the I-OnuI targeting domain. Nucleases may be linked via a linking domain as described above, either the linking domain native to the nuclease or derived from the linking domain native to the nuclease, or a linking domain of a different nuclease or derived from a different nuclease, or a linking domain comprising a random sequence.

[0101] Artificial TAL-effector proteins and TAL-effector fusion proteins can be produced to bind to a target sequence using natural or engineered TAL repeat units (see Boch et al., and Morbitzer et al. (2010) Proc. Natl. Acad. Sci. USA 107(50): 21617-21622). See also, e.g., WO 2010/079430. When this target sequence is inserted upstream of a reporter gene in plant cells, it is possible to demonstrate activation of the reporter gene. Artificial TAL-effector fusions comprising the FokI cleavage domain can also cleave DNA in living cells (see Christian et al., Li et al. (2011a) and (2011b), Cermak et al. (2011) Nucl. Acid. Res. epub doi: 10.1093/nar/gkr218).

[0102] An engineered TAL-effector protein and TAL-effector fusion protein can have a novel binding specificity, compared to a naturally-occurring TAL-effector protein. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising nucleotide sequences for modules for single or multiple TAL-effector repeats. Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in US Patents 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as WO 98/37186; WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237. In naturally occurring TAL-effector proteins, only a limited repertoire of potential dipeptide motifs is typically employed. Thus, as described herein, TALE related domains containing all possible mono- and dipeptide sequences have been constructed and assembled into candidate TAL-effector proteins. Thus, in certain embodiments, one or more TAL-effector repeat units of the DNA-binding protein comprise atypical RVDs.

[0103] Additionally, in naturally occurring TAL-effector proteins of the same species, the repeat units often show little variability within the framework sequence (i.e. the residue(s) not involved in direct DNA contact, or non-RVD residues). This lack of variability may be due to a number of factors including evolutionary relationships between individual TALE repeat units and protein folding requirements between adjacent repeats. Between differing phytopathogenic bacterial species, however, the framework sequences can vary. For example, the TAL-effector repeat sequences of the Xanthomonas campestris pv. vesicatoria protein AvrBs3 have less than 40% homology with brg11 and hpx17 repeat units from Ralstonia solanacearum (see Heuer et al. (2007) Appl Environ Micro 73 (13): 4379-4384). The TAL-effector repeat may be under stringent functional selection in each bacterium's natural environment, e.g., from the sequence of the genes in the host plant that the TAL-effector regulates. Variants in the TAL-effector framework (e.g., within the TAL-effector repeat unit or sequences outside the repeat units such as N-cap and C-cap sequences) can be introduced by targeted or random mutagenesis by various methods known in the art, and the resultant TAL-effector fusion proteins screened for optimal activity.

[0104] Selection of target sites and methods for design and construction of fusion proteins (and polynucleotides encoding same) are known to those of skill in the art and described in detail in U.S. Patent Application Publication Nos. 20050064474 and 20060188987, incorporated by reference in their entireties herein.

[0105] Artificial fusion proteins linking TAL-effector DNA binding domains to zinc finger DNA binding domains can also be produced. These fusions may also be further linked to a desired functional domain.

[0106] In addition, TAL-effector DNA binding domains and/or zinc finger domains may be linked together using any suitable linker sequences, including for example, linkers of 5 or more amino acids in length (e.g., TGEKP, TGGQRP, TGQKP, and/or TGSQKP), although it is likely that sequences that can function as flanking sequences (i.e., N-terminal and C-terminal of the TAL domain) would be required at the interface between the TAL-effector repeat domain and the linker. Thus, when linkers are used, linkers of five or more amino acids can be used in conjunction with the flanking sequences to join the TAL-effector DNA binding domains to a desired fusion partner domain. See, also, U.S. Patent Nos. 6,479,626; 6,903,185; and 7,153,949 for exemplary linker sequences 6 or more amino acids in length. In addition, linkers between the TAL-effector repeat domains and the fused functional protein domains can be constructed to be either flexible or positionally constrained to allow for the most efficient genomic modification. Linkers of varying lengths and compositions may be tested.

Preparation of the chimeric polypeptides of the invention

[0107] The present chimeric peptides may be made using well-established peptide synthetic techniques, for example, FMOC and t-BOC methodologies. In addition, polynucleotides disclosed herein, for example, DNA substrates and DNA encoding the present chimeric endonucleases may also be made based on the known sequence information using well-established techniques. Peptides and oligonucleotides are also commercially available.

[0108] Recombinant technology may also be used to prepare the chimeric endonuclease. In this regard, a DNA construct comprising DNA encoding the selected nuclease, linking domain (if present), DNA-targeting domain, and any functional domains if present may be inserted into a suitable expression vector which is subsequently introduced into an appropriate host cell (such as bacterial, yeast, algal, fungal, insect, plant and mammalian) for expression. Such transformed host cells are herein characterized as having the chimeric endonuclease DNA incorporated "expressibly" therein. Suitable expression vectors are those vectors which will drive expression of the inserted DNA in the selected host. Typically, expression vectors are prepared by site-directed insertion of a DNA construct therein. The DNA construct is prepared by replacing a coding region, or a portion thereof, within a gene native to the selected host, or in a gene originating from a virus infectious to the host, with the endonuclease construct. In this way, regions required to control expression of the endonuclease DNA, which are recognized by the host, including a promoter and a 3' region to terminate expression, are inherent in the DNA construct. To allow selection of host cells stably transformed with the expression vector, a selection marker is generally included in the vector which takes the form of a gene conferring some survival advantage on the transformants such as antibiotic resistance. Cells stably transformed with endonuclease DNA-containing vector are grown in culture media and under growth conditions that facilitate the growth of the particular host cell used. One of skill in the art would be familiar with the media and other growth conditions.

[0109] Chimeric endonucleases of the invention comprising a TAL domain may be constructed using techniques well known in the art. One suitable protocol is found in Sanjana Nature Protocols 7: 171-192 (2012) which is specifically incorporated herein by reference. To prepare a TAL domain, nucleic acid encoding each desired repeat unit may be amplified with ligation adapters that uniquely specify the position of the repeat unit in the TAL domain to create a library that can be reused. Appropriate amplification products may be ligated together into hexamers and then amplified by PCR. The hexamers may be assembled into a suitably prepared plasmid background, for example, using a Golden Gate digestion-ligation. The plasmid backbone may contain a negative selection gene, for example, ccdB, which selects against empty plasmid. The plasmid may be constructed to contain coding sequence for one or more flanking sequences such that insertion of the coding sequence for the TAL domain will be in frame with the flanking sequences resulting in TAL domain comprising flanking sequences. The TAL domain coding sequences, optionally with flanking sequences, can then be combined with the nuclease coding sequences and any other desired coding sequences, for example, nuclear localization sequences (NLS), using standard techniques.

Assays

[0110] The utility of a chimeric endonuclease in accordance with the invention may be confirmed using a DNA substrate designed for the endonuclease. The DNA substrate will include suitable counterpart regions to the nuclease, linking and DNA-targeting domains of the endonuclease. Thus, the substrate will include a cleavage motif of the nuclease domain, a DNA spacer that correlates with the linking domain and a binding site for the DNA-targeting domain. For example, for a chimeric endonuclease including the I-TevI nuclease domain, at least a portion of the I-TevI linker as the linking domain and the DNA-targeting domain of a zinc finger (e.g. of ryA or ryB), a suitable substrate will include a cleavage motif of I-TevI (5'-CNNNG-3'), a binding site for the selected zinc finger and a DNA spacer that connects the two and which is compatible with the I-TevI linker to permit interaction between the nuclease and the substrate. It will be appreciated that the substrate may incorporate a native cleavage motif or may incorporate a cleavage motif derived from the native cleavage motif, i.e. somewhat modified from the native cleavage motif while still recognized and cleaved by the nuclease. The binding site for the DNA-targeting domain may similarly be a native sequence, or may be modified without loss of function. Between the cleavage motif and the binding site for the DNA-targeting domain there may be a DNA spacer. The DNA spacer will be of a size that permits binding of the endonuclease DNA-targeting domain to the substrate binding site, and nuclease access to the cleavage motif. Generally the DNA spacer that links the cleavage motif to the binding site may comprise about 10 to about 30 base pairs, and typically comprises about 15-25 base pairs. The length of the DNA spacer may be adjusted depending on the length of the linker domain and any flanking sequences present in the chimeric endonuclease of the invention. For applications where a chimeric endonuclease of the invention is to target a DNA in a cell, it is not possible to adjust the DNA spacer length. Instead, the length of the linker may be adjusted such that, upon binding of the DNA-targeting domain to the DNA, the nuclease domain is brought into proximity with the cleavage site.
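By way of illustration only, the substrate layout described above (cleavage motif, DNA spacer, binding site) can be sketched as a simple sequence scan. The sketch assumes that the CNNNG motif lies 5' of the binding site on the same strand and uses the spacer range of about 10 to about 30 base pairs stated above; the function name, its parameters and the example binding-site sequence are hypothetical and are not taken from the specification.

```python
import re

# Assumed layout (5' to 3'): cleavage motif, DNA spacer, binding site.
# A lookahead is used so that overlapping CNNNG motifs are all reported.
CLEAVAGE_MOTIF = re.compile(r"(?=(C[ACGT]{3}G))")


def find_candidate_sites(seq, binding_site, min_spacer=10, max_spacer=30):
    """Return (motif_start, spacer_length, binding_start) tuples.

    A hit is reported when a CNNNG motif ends between min_spacer and
    max_spacer base pairs upstream of an occurrence of binding_site.
    """
    seq = seq.upper()
    binding_site = binding_site.upper()
    hits = []
    start = seq.find(binding_site)
    while start != -1:
        # Only motifs lying entirely upstream of the binding site count.
        for m in CLEAVAGE_MOTIF.finditer(seq[:start]):
            motif_end = m.start() + 5
            spacer = start - motif_end
            if min_spacer <= spacer <= max_spacer:
                hits.append((m.start(), spacer, start))
        start = seq.find(binding_site, start + 1)
    return hits


# Hypothetical 9-bp binding site used purely as a placeholder.
example_site = "GGGTTTCCC"
example_seq = "CAACG" + "A" * 15 + example_site
candidates = find_candidate_sites(example_seq, example_site)
```

A scan of this kind only proposes candidate sites; whether a given spacer is compatible with a particular linker would still need to be confirmed experimentally, for example with the assays described below.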

[0111] A given DNA substrate is useful in a method of determining the activity of its corresponding chimeric endonuclease. In this regard, the DNA substrate may be utilized as a pair of complementary oligonucleotides annealed together, which may be detectably labeled, e.g. radioactively labeled. To assay for the activity of a selected chimeric endonuclease, the endonuclease is incubated with its substrate under conditions suitable to permit binding of the endonuclease DNA-targeting domain to the binding site on the substrate, and subsequent nuclease cleavage at the cleavage site. Cleavage of the substrate can then be determined using well-established techniques, for example, polyacrylamide gel electrophoresis.

[0112] Alternatively, the DNA substrate may be incorporated within a vector for use in an assay to determine endonuclease activity. In one embodiment, a cell-based bacterial Escherichia coli two-plasmid genetic selection system may be utilized to determine whether or not the chimeric endonuclease can cleave the target cleavage site. The DNA encoding the chimeric endonuclease is incorporated into and expressed from one plasmid of the system, and the target DNA substrate is incorporated into the second plasmid. The target substrate plasmid also encodes a toxin, such as a DNA gyrase toxin. If the expressed endonuclease cleaves the target site, the toxin will not be expressed and the cells, e.g. bacterial cells such as E. coli cells, will survive when plated on selective solid media plates. If the endonuclease cannot cleave the target site, the toxin will be expressed and the cells will not survive on selective media plates. The percentage survival for each combination of fusion and target site is simply the ratio of survival on selective to non-selective plates.
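The survival calculation described above may be illustrated with a minimal sketch. It assumes that colony counts from selective and non-selective plates were obtained from equal plating volumes and dilutions; the function name is hypothetical.

```python
def percent_survival(selective_colonies, nonselective_colonies):
    """Percentage survival = colonies on selective plates divided by
    colonies on non-selective plates, times 100.

    Assumes both counts come from the same plating volume and dilution.
    """
    if nonselective_colonies <= 0:
        raise ValueError("non-selective plate count must be positive")
    return 100.0 * selective_colonies / nonselective_colonies


# Illustrative counts only, not experimental data.
example = percent_survival(45, 90)
```

If the two plates were prepared at different dilutions, the counts would first need to be normalized to a common dilution before taking the ratio.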

[0113] In another embodiment, a yeast-based assay is provided which utilizes detectable enzyme activity, e.g. beta-galactosidase activity, as a readout of endonuclease activity. The lacZ gene is disrupted and partially duplicated in a first plasmid. The DNA substrate is cloned in between the lacZ gene fragments. Cleavage of the substrate by the endonuclease (expressed from a second plasmid) initiates DNA repair and generation of a functional LacZ protein (and beta-galactosidase activity).

[0114] In another embodiment, a mammalian cell-based assay is provided which utilizes detectable activity, e.g. the fluorescence of green fluorescent protein (GFP), as a readout of endonuclease activity. The GFP gene is disrupted and partially duplicated in a first plasmid. The DNA substrate is cloned in between the GFP gene fragments. Cleavage of the substrate by the endonuclease (expressed from a second plasmid) initiates DNA repair and generation of a functional GFP, and fluorescence can be detected.

Methods

[0115] The present invention also provides methods for detection of the presence or absence of single nucleotide polymorphisms in a target DNA. In some embodiments, chimeric endonucleases of the invention comprise a nuclease domain that recognizes a 5'-CNNNG-3' cleavage motif and do not cleave, or cleave at a much reduced level, DNA sequences in which this motif has been altered. See Figure 3e. As shown in Figure 11, the motif is prevalent in human cDNA sequences. Where one allele of a SNP comprises a functional motif and other alleles have a non-functional motif, this difference in reactivity can be used to identify which allele is present in a given sample. This could be useful for high-throughput SNP screening for specific disease-causing alleles.
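As a hedged illustration of the allele discrimination described above, the following sketch checks whether the five-nucleotide window at the motif position of each allele still matches 5'-CNNNG-3'. It treats a match as "predicted cleavable" and a mismatch as "predicted not cleaved or poorly cleaved", which is a simplification; the function names and allele labels are hypothetical.

```python
import re

# The 5'-CNNNG-3' cleavage motif stated in the text.
MOTIF = re.compile(r"C[ACGT]{3}G")


def is_cleavable(window):
    """True if a 5-nt window matches the CNNNG motif exactly."""
    return MOTIF.fullmatch(window.upper()) is not None


def classify_alleles(alleles):
    """Map allele name -> predicted cleavability.

    alleles: dict of allele name -> 5-nt sequence at the motif position.
    """
    return {name: is_cleavable(window) for name, window in alleles.items()}


# Illustrative alleles: the 'alt' allele carries a G-to-A change at motif
# position 5, analogous to a G5A substitution.
calls = classify_alleles({"ref": "CAACG", "alt": "CAACA"})
```

In this example the reference allele is predicted to be cleaved and the alternative allele is not, so cleavage of a sample would indicate the presence of the reference allele.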

[0116] Thus, in a further embodiment of the invention, a kit comprising a chimeric endonuclease and a DNA substrate therefor is provided. Alternatively, a kit is provided that includes a chimeric endonuclease-encoding plasmid and a substrate-encoding plasmid that expresses a cleavage-dependent marker, or that results in cleavage-dependent cell survival. In some embodiments, kits of the invention may comprise a second plasmid with reporter gene and the DNA binding motif - optimized DNA spacer - and cleavage site. In combination with a chimeric endonuclease of the invention such a plasmid may be used to identify optimized endonuclease - linker - DNA binding domain constructs. In some embodiments, plasmids in kits of the invention may comprise one or more multicloning sites (MCS) that may be disposed in such a fashion as to permit rapid exchange of nuclease and/or DNA-targeting domains. For example, a plasmid may contain MCS-universal linker-MCS. In some embodiments, kits of the invention may comprise a plasmid encoding an I-TevI-TAL domain chimeric endonuclease. A chimeric endonuclease thus encoded may comprise a linker domain disposed between the nuclease and DNA-targeting domain as well as one or more other functional domains, for example, nuclear localization signals, disposed at either the N- or C-terminus or both.

[0117] The present chimeric GIY-YIG endonucleases are active in vivo and in vitro, function as monomers, and retain the cleavage specificity associated with the parental GIY-YIG nuclease domain. The GIY-YIG nuclease domain is shown to be a viable alternative to the FokI nuclease domain for genome editing applications.

[0118] The present invention provides materials and methods for manipulating the genome of a target organism, for example, by disabling one or more genes and/or by changing the nucleic acid sequence of the gene. As used herein, a gene includes a DNA region encoding a gene product (which may be a protein or an RNA), as well as all DNA regions which regulate the production of the gene product which may include, but are not limited to, one or more of promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.

[0119] Methods of the invention typically include introducing one or more chimeric endonucleases and/or nucleic acid molecules encoding such chimeric endonucleases, into one or more cells, which may be isolated or may be part of an organism. Any method of introduction known to those skilled in the art may be used. Examples include direct injection of DNA and/or RNA encoding chimeric endonucleases of the invention, transfection, electroporation, transduction, bombardment, lipofection and the like. Suitable cells include, but are not limited to, eukaryotic and prokaryotic cells. Cells may be cultured cell lines or primary cells. Primary cells will typically be used when it is desired to modify the cell and reintroduce it into the organism from which it was derived. Cells may be from any type of organism, for example, may be mammalian cells, plant cells, algal cells, insect cells, or fungal cells. Suitable types of cell include, but are not limited to, stem cells (e.g., embryonic stem cells, induced pluripotent stem cells, hematopoietic stem cells, neuronal stem cells, mesenchymal stem cells, muscle stem cells and skin stem cells). In some embodiments, the cells used in the methods of the invention may be plant cells. In addition to the methods of introducing nucleic acids into cells described above, DNA constructs encoding chimeric endonucleases of the invention may be introduced into plant cells using Agrobacterium tumefaciens-mediated transformation. Suitable plant cells include, but are not limited to, cells of monocotyledonous (monocots) or dicotyledonous (dicots) plants, plant organs, plant tissues, and seeds. Examples of plant species of interest include, but are not limited to, corn or maize (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), particularly those Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum, T. turgidum ssp. durum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers. In some embodiments, plants for use in methods of the present invention are crop plants (for example, sunflower, Brassica sp., cotton, sugar beet, soybean, peanut, alfalfa, safflower, tobacco, corn, rice, wheat, rye, barley, triticale, sorghum, millet, etc.). Plant cells may be from any part of the plant and/or from any stage of plant development. In some embodiments, suitable plant cells are those that may be regenerated into plants after being modified using the methods of the invention, for example, cells of a callus. Methods of the invention may also include introducing one or more chimeric endonucleases and/or nucleic acid molecules encoding such chimeric endonucleases, into one or more algal cells. Any species of algae may be used in the methods of the invention. Suitable examples include, but are not limited to, algae of the genus Skeletonema, Thalassiosira, Phaeodactylum, Chaetoceros, Cylindrotheca, Bellerochea, Actinocyclus, Nitzschia, Cyclotella, Isochrysis, Pseudoisochrysis, Dicrateria, Monochrysis (Pavlova), Tetraselmis (Platymonas), Pyramimonas, Micromonas, Chroomonas, Cryptomonas, Rhodomonas, Chlamydomonas, Chlorococcum, Olisthodiscus, Carteria, Dunaliella, or Spirulina. Other examples include Haematococcus pluvialis, Chlorella vulgaris, and the halophilic algae Dunaliella sp. Algal cells may be transformed using the techniques disclosed in United States patent no. 8,119,859 issued to Vick et al., which is specifically incorporated herein for its teaching of algal transformation.

[0120] The present invention provides methods of inactivating a gene. Such methods typically comprise introducing a nucleic acid molecule encoding a chimeric endonuclease of the invention into a cell under conditions causing the expression of the chimeric endonuclease. The chimeric endonuclease of the invention can comprise a DNA-targeting domain selected to bind to a gene of interest. The chimeric endonuclease of the invention can cleave the gene of interest leaving a double-stranded break. The normal repair functions in the cell will result in the production of some inserted or deleted bases, which may result in a frame shift thereby inactivating the gene. In some embodiments, the chimeric endonuclease may be transiently introduced into the cell. This may be accomplished by transfecting a plasmid with a promoter controlling the expression of the chimeric endonuclease that does not drive expression unless induced, for example, the Tet-On promoter. Alternatively, transient expression may be accomplished by introducing mRNA encoding the chimeric endonuclease of the invention into the cell. Normal housekeeping functions of the cell will degrade the mRNA over time thereby stopping expression of the chimeric endonuclease.
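The frame-shift reasoning above can be illustrated with a short sketch. It assumes that the break lies within a protein-coding region and considers only the net change in length; in-frame insertions or deletions may still impair function, which this check does not capture. The function name is hypothetical.

```python
def causes_frameshift(inserted_bp=0, deleted_bp=0):
    """True if the net length change is not a multiple of three.

    Only the reading frame is considered; an in-frame indel (net change
    divisible by 3) is reported as False even though it may still alter
    or disable the encoded protein.
    """
    if inserted_bp < 0 or deleted_bp < 0:
        raise ValueError("base-pair counts must be non-negative")
    net_change = inserted_bp - deleted_bp
    return net_change % 3 != 0


# Illustrative repair outcomes only.
outcomes = [
    causes_frameshift(deleted_bp=1),                 # 1-bp deletion
    causes_frameshift(inserted_bp=3),                # 3-bp insertion
    causes_frameshift(inserted_bp=2, deleted_bp=5),  # net -3 bp
]
```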

[0121] Methods of the invention also include methods of changing the nucleic acid sequence of a gene. Typically a nucleic acid molecule encoding a chimeric endonuclease of the invention is introduced into a target cell under conditions causing the expression of the chimeric endonuclease. The chimeric endonuclease of the invention is constructed so as to bind to and cleave a gene of interest. In addition, a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site is introduced into the cell. The region of high sequence identity may have a length of from about 10 basepairs to about 1000 basepairs, from about 25 basepairs to about 1000 basepairs, from about 50 basepairs to about 1000 basepairs, from about 75 basepairs to about 1000 basepairs, from about 100 basepairs to about 1000 basepairs, from about 200 basepairs to about 1000 basepairs, from about 300 basepairs to about 1000 basepairs, from about 400 basepairs to about 1000 basepairs, from about 500 basepairs to about 1000 basepairs, from about 750 basepairs to about 1000 basepairs, from about 10 basepairs to about 500 basepairs, from about 25 basepairs to about 500 basepairs, from about 50 basepairs to about 500 basepairs, from about 75 basepairs to about 500 basepairs, from about 100 basepairs to about 500 basepairs, from about 200 basepairs to about 500 basepairs, from about 300 basepairs to about 500 basepairs, from about 400 basepairs to about 500 basepairs, from about 10 basepairs to about 250 basepairs, from about 25 basepairs to about 250 basepairs, from about 50 basepairs to about 250 basepairs, from about 75 basepairs to about 250 basepairs, from about 100 basepairs to about 250 basepairs, from about 150 basepairs to about 250 basepairs, or from about 200 basepairs to about 250 basepairs, corresponding to regions in the gene located both 5' and 3' to the anticipated cleavage site.
High sequence identity means the region and the corresponding region in the gene have a sequence identity of from about 80% to about 100%, from about 82% to about 100%, from about 86% to about 100%, from about 88% to about 100%, from about 90% to about 100%, from about 92% to about 100%, from about 94% to about 100%, from about 96% to about 100%, from about 98% to about 100%, or from about 80% to about 95%, from about 82% to about 95%, from about 86% to about 95%, from about 88% to about 95%, from about 90% to about 95%, from about 92% to about 95%, or from about 80% to about 90%, from about 82% to about 90%, from about 86% to about 90%, from about 88% to about 90%. The region may comprise an altered sequence when compared to the gene of interest, for example, may have one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene. The double-stranded break introduced by the chimeric endonuclease of the invention may be repaired by homologous recombination with the region of high sequence identity of the second nucleic acid, effectively substituting all or a portion of the sequence of the homologous region in the second nucleic acid molecule for the original sequence of the gene. This results in a gene with modified nucleic acid sequence. In some embodiments, the chimeric endonuclease of the invention is transiently expressed in the cell. This may be accomplished by transfecting a plasmid with a promoter controlling the expression of the chimeric endonuclease that does not drive expression unless induced, for example, the Tet-On promoter. Alternatively, transient expression may be accomplished by introducing mRNA encoding the chimeric endonuclease of the invention into the cell. Normal housekeeping functions of the cell will degrade the mRNA over time thereby stopping expression of the chimeric endonuclease. In some embodiments, the second nucleic acid molecule may be a linear DNA molecule.
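As one possible way to check a candidate region against the identity ranges recited above, the sketch below computes percent identity between a donor region and the corresponding gene region. It assumes the two sequences are already aligned, of equal length and free of gaps, which is a simplification; the function names and the default 80% threshold (the lowest bound recited above) are illustrative choices.

```python
def percent_identity(donor_region, gene_region):
    """Percent identity of two pre-aligned, equal-length, ungapped sequences."""
    if not donor_region or len(donor_region) != len(gene_region):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        a == b for a, b in zip(donor_region.upper(), gene_region.upper())
    )
    return 100.0 * matches / len(donor_region)


def is_high_identity(donor_region, gene_region, threshold=80.0):
    """True if percent identity meets the chosen lower bound."""
    return percent_identity(donor_region, gene_region) >= threshold


# Illustrative 10-bp sequences differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

For regions containing insertions or deletions relative to the gene, a proper pairwise alignment would be needed before identity is computed.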

[0122] Methods of the invention also include methods of deleting all or a portion of the nucleic acid sequence of a gene. Typically a nucleic acid molecule encoding a chimeric endonuclease of the invention is introduced into a target cell under conditions causing the expression of the chimeric endonuclease. The chimeric endonuclease of the invention is constructed so as to bind to and cleave a gene of interest. In addition, a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site is introduced into the cell. The region of high sequence identity is as described above except that the region will lack sequence corresponding to the portions of the gene adjacent to the anticipated cleavage site. After homologous recombination between the gene and the second nucleic acid molecule, the lacking sequence will appear as a deletion of the sequence of the gene. Any number of basepairs may be lacking, from 1 to the entire sequence of the gene. The double-stranded break introduced by the chimeric endonuclease of the invention may be repaired by homologous recombination with the region of high sequence identity of the second nucleic acid, effectively substituting all or a portion of the sequence of the region of high sequence identity for the original sequence of the gene. Since this region contains a deletion at the cleavage site of the chimeric endonuclease of the invention, this results in a gene with a deletion in its nucleic acid sequence. In some embodiments, the chimeric endonuclease of the invention is transiently expressed in the cell. This may be accomplished by transfecting a plasmid with a promoter controlling the expression of the chimeric endonuclease that does not drive expression unless induced, for example, the Tet-On promoter.
Alternatively, transient expression may be accomplished by introducing mRNA encoding the chimeric endonuclease of the invention into the cell. Normal housekeeping functions of the cell will degrade the mRNA over time thereby stopping expression of the chimeric endonuclease. In some embodiments, the second nucleic acid molecule may be a linear DNA molecule.

[0123] Methods of the invention also include methods of making a cell having an altered genome. In some embodiments, the altered genome may comprise an inactivated gene. In some embodiments, the altered genome may comprise a gene having one or more mutations. In some embodiments the altered genome may lack all or a portion of a gene. Typically a nucleic acid molecule encoding a chimeric endonuclease of the invention is introduced into a target cell under conditions causing the expression of the chimeric endonuclease. The chimeric endonuclease of the invention is constructed so as to bind to and cleave a gene of interest. Cleavage of the target and DNA repair will result in an inactivated gene. In embodiments where the altered genome comprises a mutated gene, a nucleic acid molecule encoding a chimeric endonuclease of the invention is introduced into a target cell under conditions causing the expression of the chimeric endonuclease. In addition, a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site is introduced into the cell. The region is as described above. The region may comprise an altered sequence when compared to the gene of interest, for example, may have one or more mutations that will result in changes to one or more amino acids in a protein encoded by the gene.
The double-stranded break introduced by the chimeric endonuclease of the invention may be repaired by homologous recombination with the region of high sequence identity of the second nucleic acid, effectively substituting all or a portion of the sequence of the region of high sequence homology in the second nucleic acid molecule for the original sequence of the gene. This results in a cell with an altered genome. In embodiments wherein the altered genome lacks all or a portion of a gene, a nucleic acid molecule encoding a chimeric endonuclease of the invention is introduced into a target cell under conditions causing the expression of the chimeric endonuclease. The chimeric endonuclease of the invention is constructed so as to bind to and cleave a gene of interest. In addition, a second nucleic acid molecule comprising a region having a nucleotide sequence that has a high degree of sequence identity to the gene in the region of the cleavage site is introduced into the cell. The region typically lacks the sequence of the gene adjacent to the cleavage site, i.e. has a deletion that encompasses the anticipated cleavage site. The double-stranded break introduced by the chimeric endonuclease of the invention may be repaired by homologous recombination with the region of high sequence identity of the second nucleic acid, effectively substituting all or a portion of the sequence of the region for the original sequence of the gene. Since this region contains a deletion at the cleavage site of the chimeric endonuclease of the invention, this results in a gene with a deletion in its nucleic acid sequence. In some embodiments, the chimeric endonuclease of the invention is transiently expressed in the cell. This may be accomplished by transfecting a plasmid with a promoter controlling the expression of the chimeric endonuclease that does not drive expression unless induced, for example, the Tet-On promoter. 
Alternatively, transient expression may be accomplished by introducing mRNA encoding the chimeric endonuclease of the invention into the cell. Normal housekeeping functions of the cell will degrade the mRNA over time thereby stopping expression of the chimeric endonuclease. In some embodiments, the second nucleic acid molecule may be a linear DNA molecule.
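A second nucleic acid molecule of the deletion type described above can be pictured as two flanking regions joined together with the sequence around the anticipated cleavage site omitted. The following sketch builds such a sequence from a gene sequence; the coordinate convention (0-based cut position), the parameter names and the example sequence are assumptions made for illustration only.

```python
def build_deletion_donor(gene, cut_pos, del_left, del_right, arm_len):
    """Return left arm + right arm, omitting the bases around the cut.

    gene      : gene sequence (string)
    cut_pos   : 0-based index of the anticipated cleavage position
    del_left  : number of bases to omit 5' of the cut
    del_right : number of bases to omit 3' of the cut
    arm_len   : length of each flanking region of high sequence identity
    """
    if min(del_left, del_right) < 0 or arm_len <= 0:
        raise ValueError("deletion sizes must be >= 0 and arm_len > 0")
    del_start = cut_pos - del_left
    del_end = cut_pos + del_right
    if del_start - arm_len < 0 or del_end + arm_len > len(gene):
        raise ValueError("flanking regions extend past the gene sequence")
    left_arm = gene[del_start - arm_len:del_start]
    right_arm = gene[del_end:del_end + arm_len]
    return left_arm + right_arm


# Illustrative 20-bp sequence; a 5-bp deletion spanning the cut at index 10.
donor = build_deletion_donor("AAAAACCCCCGGGGGTTTTT", 10, 2, 3, 5)
```

Here the flanking regions are exact copies of the gene; regions of lower identity within the ranges discussed above could be substituted without changing the construction.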

Cells

[0124] The present invention includes cells produced using the methods of the invention. Cells of the invention will typically comprise one or more alterations in their genome. As used herein, "genome" encompasses the genetic material present in cellular compartments, such as chloroplasts, mitochondria and the like, as well as genetic material in the nucleus of the cell.

[0125] The present invention encompasses cells of any type, for example, cultured cells, which may be from an established cell line or may be a primary culture, cells from single cell organisms and cells from multicellular organisms. Cells from multicellular organisms may be isolated as individual cells, present in an organ, which may be isolated, or present in the whole organism.

[0126] Virtually any type of cells may be used in the practice of the invention. The cells may be prokaryotic or eukaryotic. Any type of mammalian cells, for example mouse, rat, primate (especially human primate), chicken, porcine, bovine, equine cells, may be used. Either primary cultured cells or an established cell line can be employed. The primary cultured cells may originate from any tissue, e.g. cartilage, bone, skin, nerve, oral alimentary canal, liver, pancreas, kidney, gland, heart, muscle, tendon, fat, connective, reproductive organ tissue, ocular, blood vessel, bone marrow and blood. Exemplary cell types include osteoblasts, keratinocytes, melanocytes, hepatocytes, gliacytes, pancreatic beta cells, pancreatic exocrine cells, neural stem cells, neural precursor cells, spinal cord precursor cells, nerve cells, mammary gland cells, salivary gland cells, renal glomerular endothelial cells, tubular epithelial cells, adrenocortical and adrenomedullary cells, cardiomyocytes, chondrocytes, skeletal and smooth muscle cells, fat and fat precursor cells, corneal and crystalline lens cells, embryonic retina cells, vascular cells, endothelial cells, bone marrow stromal cells and lymphocytes. For example, the methods of the invention may be employed to create muscle cells (smooth, skeletal, cardiac), connective tissue cells (fibroblasts, monocytes, mast cells, granulocytes, plasma cells, osteoclasts, osteoblasts, osteocytes, chondrocytes), epithelial cells (from skin, gastrointestinal, urinary tract or reproductive tract, or organ epithelial cells from the liver, pancreas or spleen), or nervous system cells (glial, neuronal, astrocytes), wherein the cells have an altered genome relative to the cells that were used as starting material in the methods of the invention.

[0127] In some embodiments, mammalian stem cells (embryonic, non-embryonic and hematopoietic) may be used in the practice of the invention. Exemplary stem cells include, but are not limited to, ectodermal, mesodermal, endodermal, mesenchymal, hematopoietic, neural, hepatic, muscle, pancreatic, cutaneous, retinal and follicular stem cells.

[0128] In some embodiments, non-mammalian cells from any non-mammalian organism may also be used in the practice of the invention. Numerous plant cell lines, animal cell lines, insect cell lines, plant virus cell lines and cell lines of microorganisms (such as Archaea, bacteria, plasmids, algae, phages, yeasts and fungi) exist and are available from repositories known to those of skill in the art. (DSMZ, the German National Resource Centre for Biological Material, is one; ATCC, the American Type Culture Collection, is another.) Cells from any of the known repositories may be advantageously used in the practice of the invention.

[0129] In one embodiment, cells of the invention may be algal cells comprising one or more alterations in their genome relative to a corresponding wild-type algal cell.

[0130] Chimeric endonucleases of the invention may be used in biological research by providing a mechanism to manipulate the genome of a cell or organism. Such genome editing allows the elucidation of the role of individual genes and portions of genes by allowing the controlled introduction of changes into the genome. This will allow the production of customized cells that are suitable for use in screening. The present invention also permits gene therapy, for example, by correcting a genetic defect using the materials and methods described herein. The present methods are particularly well suited for ex vivo methods of gene therapy where cells are removed from a patient, manipulated to achieve a desired outcome, and reintroduced into the patient. Materials and methods of the invention will find use in agriculture for creation of plants having improved growth rate, tolerance to stresses such as drought and pests, and taste. Materials and methods of the invention will find application in molecular biology and diagnostics by allowing the direct manipulation of any desired target DNA.

[0131] Embodiments of the invention are described by reference to the following specific examples.

Example 1

MATERIALS AND METHODS

Bacterial strains and plasmid construction

[0132] Escherichia coli strains DH5α and ER2566 (New England Biolabs) were used for plasmid manipulations and protein expression, respectively. E. coli strain BW25141(λDE3) was used for genetic selection assays. All plasmids used in this study are described in Table 1, and oligonucleotides are listed in Table 2.

Table 1. Strains and plasmids used in this study.

[0133] 1. Chen, Z. and Zhao, H. (2005) A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res. 33: e154.

[0134] 2. Kleinstiver, B.P., Fernandes, A.D., Gloor, G.B. and Edgell, D.R. (2010) A unified genetic, computational and experimental framework identifies functionally relevant residues of the homing endonuclease I-BmoI. Nucleic Acids Res. 38: 2411-2427.

Table 2. Oligonucleotides used in this study.

[0135] The ryA zinc-finger gene was synthesized by Integrated DNA Technologies with 5'-BamHI and 3'-XhoI sites and a C-terminal 6-histidine tag and cloned into pACYCDuet-1 to generate pACYCryAZf+H. A stop codon was introduced at the 3' end of the ryAZf gene using QuikChange (Stratagene) to generate pACYCryAZf. The I-TevI and I-BmoI GIY-YIG domains were PCR amplified from bacteriophage T4 gDNA and pACYCIBmoI, respectively, and cloned into pACYCryAZf+H and pACYCryAZf. The R27A mutants of Tev-ZFEs were generated using QuikChange mutagenesis (DE613/614). The sequences of all GIY-ZFEs constructed are listed in Fig. 4. The hybrid target sites (Fig. 1B and 2C) were cloned into the toxic reporter plasmid p11-lacY-wtx1 to generate pToxTZ1.35 and pToxBZ1.35. Identical Tev-ryA and Bmo-ryA target sites were generated in pSP72 for in vitro cleavage assays. The Tev-ryA hybrid homing site was also cloned into LITMUS28i using BamHI and XhoI to generate pTZHS1.35. The two-site Tev-ZF plasmids were created by sub-cloning the PvuII/HpaI fragment from pSP-TZHS1.35 into the SwaI site of pTZHS1.35 to generate pTZHS2.35 and pTZHS3.35 (with the second TZHS in either orientation). The G5A or C1A/G5A mutations were introduced into pToxTZ and pTZHS plasmids by QuikChange mutagenesis. All constructs were verified by sequencing.

Two-plasmid genetic selection

[0136] The two-plasmid genetic selection was performed as described with toxic (reporter) plasmids containing hybrid Tev- or Bmo-ryA target sites, mutant ryA target sites (with G5A or C1A/G5A substitutions), or no target site (p11-lacY-wtx1). Survival percentage was calculated by dividing the number of colonies observed on selective plates by the number observed on non-selective plates.
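As a minimal illustration of the survival calculation described above, the following Python sketch divides the colony count on selective plates by the count on non-selective plates and expresses the result as a percentage. The function name and the colony counts in the usage line are hypothetical and are not taken from the study.

```python
def survival_percentage(selective_colonies: int, nonselective_colonies: int) -> float:
    """Return survival as a percentage of colonies on non-selective plates."""
    if nonselective_colonies <= 0:
        raise ValueError("non-selective plate must have at least one colony")
    if selective_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return 100.0 * selective_colonies / nonselective_colonies


# Hypothetical counts, for illustration only.
print(f"{survival_percentage(434, 500):.1f}% survival")
```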

Protein purification

[0137] Cultures overexpressing either TevN201-ZFE or BmoN221-ZFE were grown at 37°C to an OD600 of ~0.5, and expression was induced with 0.5 mM IPTG (Bio Basic Inc.) overnight at 15°C. Cells were harvested by centrifugation at 8983 × g for 12 minutes, re-suspended in binding buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 10 mM imidazole, 5% glycerol, and 1 mM DTT), and lysed by French press. The cell lysate was clarified by centrifugation at 20400 × g, followed by sonication for 30 seconds, and centrifugation at 20400 × g for 15 minutes. The clarified lysate was loaded onto a 1 mL HisTrap-HP column (GE Healthcare), washed with 15 mL binding buffer and then 10 mL wash buffer (20 mM Tris-HCl (pH 8.0), 500 mM NaCl, 50 mM imidazole, 5% glycerol, and 1 mM DTT). Bound proteins were eluted in 1.5 mL fractions in four 5 mL step elutions with increasing concentrations of imidazole. Fractions containing GIY-ZFEs were dialyzed twice against 1 L dialysis buffer (20 mM Tris-HCl (pH 8.0), 500 mM NaCl, 5% glycerol, and 1 mM DTT) prior to storage at -80°C. I-BmoI was purified as previously described (Kleinstiver et al. (2010) Nucleic Acids Res 38:2411-2427).

Cleavage assays

[0138] Single time-point cleavage assays to determine the EC0.5max of TevN201-ZFE were performed in buffer containing 20 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM MgCl2, 5% glycerol, 1 mM DTT and 10 nM pTZHS1.33. Reactions were incubated for 3 minutes at 37°C, stopped with 5 μL stop solution (100 mM EDTA, 40% glycerol, and bromophenol blue), and electrophoresed on a 1% agarose gel prior to staining with ethidium bromide and analysis on an AlphaImager™ 3400 (Alpha Innotech). The EC0.5max was determined by fitting the data to the equation

$$f([\mathrm{endo}]) = \frac{f_{\max}\,[\mathrm{endo}]^{H}}{EC_{0.5\max}^{H} + [\mathrm{endo}]^{H}}$$

where f([endo]) is the fraction of substrate cleaved at TevN201-ZFE concentration [endo], f_max is the maximal fraction cleaved, with 1 being the highest value, and H is the Hill constant, which was set to 1. The initial reaction velocity was determined using supercoiled plasmid substrate with varying concentrations of TevN201-ZFE (0.7 nM to 47 nM) and buffer as above. Aliquots were removed at various times, stopped and analyzed as above. The data for product appearance were fitted to the equation
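The EC0.5max fit can be illustrated with a short Python sketch. It is not the procedure actually used in the study: it assumes an unweighted least-squares criterion, solves for f_max in closed form at each trial EC0.5max, and locates EC0.5max with a golden-section search on a log scale. The function names, search bounds, and any data passed in are hypothetical.

```python
import math


def hill_fraction(endo, f_max, ec50, hill=1.0):
    """Fraction of substrate cleaved at enzyme concentration `endo`."""
    return f_max * endo**hill / (ec50**hill + endo**hill)


def fit_ec50(concs, fractions, hill=1.0, lo=1e-3, hi=1e4, iters=200):
    """Least-squares estimate of (f_max, EC0.5max); bounds are assumed."""

    def profile(ec50):
        # For a fixed EC0.5max the model is linear in f_max.
        g = [hill_fraction(c, 1.0, ec50, hill) for c in concs]
        f_max = sum(y * gi for y, gi in zip(fractions, g)) / sum(gi * gi for gi in g)
        sse = sum((y - f_max * gi) ** 2 for y, gi in zip(fractions, g))
        return f_max, sse

    a, b = math.log(lo), math.log(hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if profile(math.exp(c))[1] < profile(math.exp(d))[1]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    ec50 = math.exp((a + b) / 2.0)
    return profile(ec50)[0], ec50
```

With H fixed at 1, as in the text, the model reduces to a simple hyperbolic saturation curve, so EC0.5max is the concentration at which half of f_max is reached.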

$$P = A\left(1 - e^{-k_1 t}\right) + k_2 t$$

where P is product (in nM), A is the magnitude of the initial burst, k_1 is the rate constant (s⁻¹) of the initial burst phase and k_2 is the steady-state rate constant (s⁻¹). The two-site plasmid cleavage assays were conducted as above, using 10 nM pTZHS2.33 or pTZHS3.33 as substrates, and ~90 nM purified TevN201-ZFE. The rate constants were calculated from the decay of supercoiled substrate by fitting to the equation
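The burst-phase equation can be evaluated directly, as in the hedged sketch below. The function names and parameter values are hypothetical. Taking the slope at t = 0 as the initial velocity is an assumption of this sketch, not a statement of how the initial velocity was estimated in the study. Because k2 is multiplied by t and added to a quantity in nM, the sketch treats k2 as having units of nM per second.

```python
import math


def product_formed(t, burst_amplitude, k1, k2):
    """Product (nM) at time t (s): exponential burst plus linear steady state."""
    return burst_amplitude * (1.0 - math.exp(-k1 * t)) + k2 * t


def initial_velocity(burst_amplitude, k1, k2):
    """Slope dP/dt at t = 0 (assumed definition of initial velocity)."""
    return burst_amplitude * k1 + k2


# Hypothetical parameters: 5 nM burst, k1 = 0.5 per s, k2 = 0.01 nM per s.
for t in (0.0, 5.0, 60.0):
    print(t, round(product_formed(t, 5.0, 0.5, 0.01), 3))
```

At long times the exponential term vanishes and product accumulates linearly with slope k2, which matches the slower phase described for the progress curves.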

$$[C] = [C_0]\,e^{-k_1 t}$$

where [C] is the concentration (nM) of supercoiled plasmid at time t, [C_0] is the initial concentration of supercoiled substrate (nM), and k_1 is the first-order rate constant (in s⁻¹).

At least 3 independent trials were conducted for each data set.
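A first-order rate constant of this kind can be estimated from a decay time course by linear regression of ln[C] against time. The sketch below is an illustration under that assumption (the study does not state which fitting method was used), and the function name and data are hypothetical.

```python
import math


def fit_first_order_decay(times, concs):
    """Return (k, c0) from a least-squares line through ln(conc) versus time."""
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(c <= 0 for c in concs):
        raise ValueError("concentrations must be positive to take logarithms")
    n = len(times)
    logs = [math.log(c) for c in concs]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    k = -slope  # decay constant, per unit of time
    c0 = math.exp(y_mean - slope * t_mean)  # intercept at t = 0
    return k, c0
```

The log transform weights low-concentration points more heavily than a direct nonlinear fit would, so the two approaches can differ slightly on noisy data.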

Cleavage mapping

[0139] Mapping of cleavage sites was performed as described (Mueller et al. (1995) EMBO J 14(22):5724-5735). Briefly, primers were individually end-labeled with [γ-32P]ATP, and used in PCR reactions with pTox or pSP72 plasmids carrying Tev-ryA or Bmo-ryA target sites to generate strand-specific substrates. The substrates were incubated with purified protein as above, and electrophoresed in 8% denaturing gels alongside sequencing ladders generated by cycle sequencing with the same end-labeled primers (USB Biologicals).

RESULTS

GIY-YIG homing endonucleases function as monomers

[0140] To probe the oligomeric state of GIY-YIG homing endonucleases, it was determined whether I-BmoI functions catalytically as a monomer by examining the relationship between protein concentration and initial reaction velocity. This relationship was determined by in vitro cleavage assays using a plasmid substrate with a single thyA target site. As shown in Fig. 1, plotting of the initial reaction velocity versus protein concentration revealed a linear relationship, suggesting that DNA hydrolysis is first order with respect to I-BmoI concentration. This observation was extended by performing cleavage assays with plasmids that contained either one or two copies of the I-BmoI thyA target site under conditions of protein excess (Fig. 1), and a kobs(1-site) of 0.105 ± 0.01 s⁻¹ and a kobs(2-site) of 0.096 ± 0.01 s⁻¹ were calculated. The small difference in the kobs rate constants indicated that I-BmoI does not require two target sites to promote DNA cleavage. In contrast, similar assays with FokI showed a significant rate enhancement for two-site plasmids relative to one-site plasmids, consistent with FokI functioning as a dimer. Cross-linking and gel-filtration studies were also consistent with I-BmoI existing as a monomer in solution or when bound to its cognate substrate (Fig. 1). The simplest interpretation of the above data is that the oligomeric status of I-BmoI is not influenced by protein concentration, that cleavage by I-BmoI is non-cooperative, and that I-BmoI functions as a monomer. Furthermore, when considered in the context of past studies showing that the closely related I-TevI binds DNA as a monomer, it is likely that both I-BmoI and I-TevI function as monomers in all steps of DSB formation.

Construction and validation of GIY-zinc finger endonucleases

[0141] Existing crystal structures were used to model GIY-YIG-zinc finger endonucleases (GIY-ZFEs). For the I-TevI-zinc finger fusions (Tev-ZFE), the Zif268 zinc finger was modeled in place of the H-T-H motif at the C-terminal end of I-TevI (Fig. 1A and 2B). Actual GIY-ZFEs utilized the ryA zinc finger that targets a sequence in the Drosophila yellow gene. One notable feature of these constructs is the polarity, as the GIY-YIG nuclease domain is fused to the N-terminal end of the ryA protein to mimic the native orientation of the GIY-YIG domain, whereas FokI fusions are to the C-terminal end of zinc-finger proteins. The DNA substrates consisted of 31 to 33 bps of the I-TevI td homing site that is contacted by the linker and nuclease domains, joined to the 9-bp ryA target site (Fig. 1B). In the shortest substrates, the critical G of the 5'-CXXXG-3' cleavage motif is positioned 28-bp distant from the ryA binding site, in analogy with the native spacing of the I-TevI td homing site. An analogous set of I-BmoI-ryA fusions was constructed (Bmo-ZFEs, Fig. 2C).

[0142] The activity of the GIY-ZFEs was determined using a well-described two-plasmid bacterial selection system (Fig. 2D), where survival is dependent on endonuclease activity, as described in Kleinstiver et al. ((2010) Nucleic Acids Res 38:2411-2427). Eight Tev-ZFEs were tested against three substrates that differed in the positioning of the preferred 5'-CXXXG-3' cleavage motif relative to the ryA binding site (Fig. 2B). All Tev-ZFEs exhibited significant survival, with the highest survival observed against plasmid substrates with the shortest distance between the cleavage motif and ryA-binding site (as shown in Table 3 below). In contrast, no survival was observed when the fusions were tested against the toxic plasmid without an appropriate target site (p11-lacY-wtx1), demonstrating that survival is dependent on a specific ryA-binding site. The catalytic arginine 27 of the I-TevI nuclease domain was also mutated to alanine in all of the Tev-ZFEs, creating TevR27A-ZFEs. None of the TevR27A-ZFEs survived the assay, showing that survival is dependent on the GIY-YIG nuclease activity. Addition of a C-terminal 6x-His tag to any of the Tev-ZFEs had no effect on activity, as all constructs displayed survival rates very similar to the untagged constructs. The Bmo-ZFEs were also tested in the genetic selection. As described below, enzymatic activity was detected in vitro using purified Bmo-ZFEs. Collectively, these results show that two different GIY-YIG nuclease domains and linkers could be fused to the ryA zinc finger to create chimeric site-specific nucleases.

Table 3. Survival of GIY-ZFEs in the two-plasmid genetic selection. Columns are grouped by toxic plasmid; values are survival percentages.

| GIY-ZFE | pToxTZ1.33 WT | pToxTZ1.33 G5A | pToxTZ1.33 C1A/G5A | pToxTZ1.34 WT | pToxTZ1.34 G5A | pToxTZ1.34 C1A/G5A | pToxTZ1.35 WT | pToxTZ1.35 G5A | pToxTZ1.35 C1A/G5A | p11-lacY-wtx1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TevN201 | 86.8 ± 5.9 (6) | 0 | 0 | 59.9 ± 9.5 (6) | 0 | 0.2 ± 0.1 (3) | 49.8 ± 9.8 (6) | 0 | 0 | 0 |
| TevN201G2 | 72.7 ± 10.7 (6) | 0 | 0 | 56.9 ± 11.2 (6) | 0 | 0 | 38.6 ± 10.3 (4) | 0 | 0 | 0 |
| TevN201G4 | 83.7 ± 15.2 (4) | 0 | 0 | 42.8 ± 12.6 (6) | 0 | 0 | 36.3 ± 7.1 (4) | 0 | 0 | 0 |
| TevN201R27A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TevK203 | 86.8 ± 7.1 (6) | 0 | 0 | 50.7 ± 9.5 (6) | 0 | 0 | 51.0 ± 6.6 (5) | 0 | 0 | 0 |
| TevK203G2 | 88 ± 13.9 (6) | 0 | 0 | 53.7 ± 10.4 (6) | 0 | 0 | 46.5 ± 10.9 (5) | 0 | 0 | 0 |
| TevK203G4 | 80.7 ± 7.9 (4) | 0.2 ± 0.2 (3) | 0.4 ± 0.3 (3) | 43.6 ± 13.0 (6) | 0 | 0 | 48.0 ± 6.1 (4) | 0 | 0 | 0 |
| TevK203R27A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TevS206 | 86.6 ± 6.9 (6) | 0 | 0 | 47.1 ± 8.6 (6) | 0 | 0 | 62.3 ± 12.4 (4) | 0 | 0 | 0 |
| TevS206G2 | 70.7 ± 8.7 (4) | 0 | 0 | 27.8 ± 7.4 (6) | 0 | 0 | 44.2 ± 16.4 (4) | 0 | 0 | 0 |
| TevS206R27A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

[0143] 1 Fusions are named according to the residue number of I-TevI fused to the N-terminus of the ryA zinc finger (i.e., N201 refers to asparagine 201 of I-TevI). G2 and G4 refer to a 2- and 4-residue spacer linker, respectively, between the I-TevI and ryA domains. R27A refers to an arginine 27 to alanine mutation.

[0144] 2 Toxic substrate plasmids are designated as described in Materials and Methods.

[0145] 3 Survival percentages are reported as the mean with standard deviation, with the number of replicates in brackets. Selections with zero survival were confirmed by three independent trials.
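The reporting convention used for the survival data (mean, standard deviation, and number of replicates in brackets) can be reproduced with the standard library, as in this small sketch. The replicate values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def summarize_survival(replicates):
    """Format replicate survival percentages as 'mean ± SD (n)'."""
    n = len(replicates)
    if n == 0:
        raise ValueError("at least one replicate is required")
    # Sample standard deviation (n - 1 denominator); assumed, not stated in the text.
    sd = stdev(replicates) if n > 1 else 0.0
    return f"{mean(replicates):.1f} ± {sd:.1f} ({n})"


# Hypothetical replicate values, for illustration only.
print(summarize_survival([80.0, 90.0]))
```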

GIY-ZFEs require specific sequences for efficient cleavage

[0146] Both I-TevI and I-BmoI are DNA endonucleases that cleave specific sequences at a defined distance from their primary binding sites. To determine if the chimeric GIY-ZFEs also cleaved substrate in a sequence-specific manner, the TevN201-ZFE and BmoN221-ZFE fusions were purified for in vitro mapping studies (Fig. 3A and 3B). Using strand-specific end-labeled substrates, the bottom- and top-strand nicking sites of TevN201-ZFE were mapped to lie within the 5'-CXXXG-3' motif, with ↑ and ↓ representing the bottom- and top-strand nicking sites, respectively (Fig. 3C). The bottom- and top-strand nicking sites of BmoN221-ZFE were mapped to a 5'-XX↑XX↓G-3' motif, mimicking the native I-BmoI sites (Fig. 3D). Thus, both the I-TevI and I-BmoI GIY-YIG nuclease domains cleave DNA specifically in the context of a zinc-finger fusion.

[0147] To further demonstrate TevN201-ZFE cleavage specificity, mutations were introduced in the 5'-CXXXG-3' motif that were previously shown to drastically reduce I-TevI cleavage efficiency (Fig. 3E). Significantly, no survival was observed in the two-plasmid selection assay with pTox plasmids carrying either the single G5A (5'-CXXXA-3') or double C1A/G5A (5'-AXXXA-3') substitutions (Table 3), equivalent to mutations at positions C-27 and G-23 of the I-TevI td substrate. Cleavage assays were performed with wild-type and mutant substrates and increasing concentrations of TevN201-ZFE to determine the amount of protein required for half-maximal cleavage (EC0.5max). As shown in Figure 3E, ~60-fold and ~4.7-fold more protein were required to achieve half-maximal cleavage of the double- and single-mutant substrates, respectively, relative to the wild-type substrate. The greater substrate discrimination observed in the genetic assay likely reflects lower in vivo protein concentrations than those used for in vitro cleavage assays. These results show that the TevN201-ZFE fusion retains the cleavage specificity of the parental I-TevI enzyme and that double nucleotide substitutions can significantly reduce cleavage efficiency. Although the BmoN221-ZFE substrate specificity was not tested extensively, it was shown that the chimeric endonuclease cleaved the Bmo-ryA substrate plasmid, but not the target-less control plasmid.

GIY-ZFEs function as monomers

[0148] To determine if the GIY-YIG domain retained the ability to function as a monomer in the context of a zinc-finger fusion, cleavage assays were performed to determine the relationship between TevN201-ZFE enzyme concentration and initial reaction velocity. The reaction progress curves indicated an initial burst of cleavage followed by a slower rate of product accumulation (Fig. 5A), consistent with product release being the rate-limiting step. The initial burst phase was used to estimate initial velocity, and plotting against protein concentration yielded a linear relationship (Fig. 5A), suggesting that DNA hydrolysis catalyzed by TevN201-ZFE is first order with respect to protein concentration. Time-course cleavage assays under single-turnover conditions (~10-fold molar excess of protein to substrate) were also conducted with plasmids that contained one or two Tev-ryA target sites. Two-site plasmids that differed in whether the target sites were in the same or opposite orientations relative to each other were constructed. As shown in Figure 5B, cleavage of the one-site plasmid yielded kobs(1-site) = 0.099 ± 0.001 s⁻¹, and cleavage of the two-site plasmids with target sites in the same or opposite orientations generated very similar rate constants, kobs(2-site) = 0.088 ± 0.001 s⁻¹ and 0.089 ± 0.001 s⁻¹, respectively. Thus, TevN201-ZFE does not require two sites for efficient DNA hydrolysis, consistent with the enzyme functioning as a monomer.

Example 2

[0149] The TevN201(G4)-PthXo1 TAL-effector fusion (Tev201-TAL, Figure 7A) was purified from E. coli BL21(DE3) cells overexpressing the fusion that was cloned into pACYC-Duet. The fusion protein was purified un-tagged by ion-exchange chromatography. A number of fusion products were constructed which varied in the size of the I-TevI linking portion that was incorporated. As shown, regions including 201, 203 and 206, with or without additional glycine residues, were made. The full amino acid sequences of the fusion products constructed are shown in Figure 19. The final purification fractions were used for in vitro DNA cleavage assays using either PCR products or radioactively labeled duplex oligonucleotide substrates. As shown in Figure 8A, the substrates consisted of various lengths of the native I-TevI target sequence derived from the phage T4 td gene that were fused to the 5' end of the PthXo1 TAL-effector binding site. The substrates are designated TP (for Tev-PthXo1), and numbered according to the length of the I-TevI target site included (TP24 has 24 bp of the I-TevI target site). The substrates were designed as complementary oligonucleotides that were subsequently annealed and cloned into pLitmus. Alternatively, the oligonucleotides were radiolabeled with 32P, and then annealed. As shown in Figure 8, when incubated with Tev201-TAL, cleavage was observed on all the PCR products corresponding to the TP24-36 substrates, with varying degrees of efficiency. Divalent metal ion was omitted from one reaction, but cleavage was still observed. This result is consistent with previous data showing that the native I-TevI protein retains activity in the absence of exogenously added divalent metal ion, likely because the nuclease domain has metal bound during purification.

[0150] The radioactively labeled DNA substrates were used to map the cleavage sites of the Tev-TAL fusions. The substrates were labeled on both strands, meaning that both the top and bottom strand cleavage products could be mapped. As shown in Figure 9, two prominent cleavage products were observed with the TP series of substrates when incubated with Tev201-TAL. Note that the size of the bottom strand product varies with the TP substrate tested. The size difference arises because the bottom strand cleavage site moves closer to the 3' end of the duplex DNA substrate (i.e., closer to the TAL binding site) as the shorter TP substrates include less of the native I-TevI site. The top strand cleavage product does not change size, because the position of its cleavage site relative to the 5' end of the duplex substrates is the same in all of the substrates. The sizes of both cleavage products are consistent with specific cleavage by the Tev201-TAL fusion at the CNNNG cleavage motif.

[0151] Reference to the amino acid alignment of the linker regions of I-TulaI, I-TevI, and I-BmoI (see Figure 15D) indicates the regions of conservation and consensus. Indicated is the functionally critical region of the I-TevI linker (Kowalski et al. 1999 NAR; Liu et al. 2008, JMB). One knowledgeable in the art may generate an optimized linker that includes deletion, replacement, and addition of amino acid sequences using conventional methods. This may include the replacement of the functionally non-critical regions in the linker with other desired sequences.

Example 3

[0152] The nucleotide requirements of the I-TevI linker (residues 97-169) for its corresponding region on a substrate were determined. A coupled in vitro/in vivo selection system was used (Edgell et al. Current Biology (2003) 13:973-978) that relies on cleavage of a randomized DNA spacer plasmid library by the Tev169-Onu fusion protein (see Fig. 18 for amino acid sequences of a family of Tev-Onu fusion products that vary in the size of the Tev portion). Cleaved substrates are isolated and amplified in E. coli, followed by bar-coded PCR for deep-sequencing on an Ion Torrent sequencer.

[0153] The findings indicate that the I-TevI linker has a nucleotide preference at 3 positions within the DNA spacer, namely, positions 2, 8 and 15 (see Figure 10a and 10b). Thus, a consensus DNA sequence for the Tev169 constructs could be 5'-CNNNGN(A/T)NNNNNG(A/T), where N is any nucleotide and CNNNG is the required cleavage motif. This motif occurs at least once in >93% of all non-redundant human cDNAs (see Figure 11). Figure 10c demonstrates the relationship between the nucleotide bias in the DNA spacer region (bottom) and the evolutionarily conserved amino acids of the I-TevI native target gene, thymidylate synthase, in bacteriophage T4 (spp). Domain knowledge regarding the original sequence permits refinement of the spacer region identified in Figure 10b, so that potential artifacts linked to the original sequence bias can be identified and a viable consensus generated. It indicates the importance of the core spacer sequence comprising CNNNGN(A/T), and the scaled optional nature of the additional NNNNNG and the additional terminal (A/T) nucleotide.
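To make the consensus concrete, the sketch below scans one strand of a DNA sequence for the proposed 14-nucleotide consensus 5'-CNNNGN(A/T)NNNNNG(A/T) and, separately, for the CNNNG cleavage motif alone. It is an illustration only: the function name and the test sequence are hypothetical, only the given strand is searched, and a lookahead is used so that overlapping matches are reported.

```python
import re

# CNNNG cleavage motif alone; the lookahead lets overlapping sites be found.
CORE_MOTIF = re.compile(r"(?=(C[ACGT]{3}G))")
# Full consensus from the text: CNNNG N (A/T) NNNNN G (A/T).
CONSENSUS = re.compile(r"(?=(C[ACGT]{3}G[ACGT][AT][ACGT]{5}G[AT]))")


def find_sites(sequence, pattern=CONSENSUS):
    """Return (0-based start, matched text) for every site on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical sequence, for illustration only.
example = "TTCAACGCTCAGTAGATT"
print(find_sites(example))
print(find_sites(example, CORE_MOTIF))
```

A fuller search would also scan the reverse complement, since a target site can lie on either strand.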

[0154] Cleavage efficiency on individual substrates that were selected at random from the DNA spacer library was also tested. These data are shown in Figures 12, 13, and 14. Figure 12 shows the sequences, and the activity of the Tev169-Onu fusion on these sequences in the bacterial two-plasmid assay.

[0155] Also included in this analysis is the activity of the Tula-derived fusions (TulaK169, sequence as shown in Fig. 20). Figure 13 shows the activity of the Tev169-Onu fusions on the substrates in a yeast-based assay, relative to a normalized Zif268 control. Figure 14 shows the activity of the TulaK169 fusions on a subset of the sequences.

Example 4

[0156] Additional Tev-TAL fusions were constructed using standard molecular biology cloning techniques by fusing different lengths of the I-TevI nuclease domain to different N-terminal residues of the TAL effector PthXo1. The general schematic of the fusions is shown in Figure 22, and the constructs that have been rigorously tested are shown in Table 4. Note that the longest TAL domain used corresponds to essentially the full-length TAL effector protein. TAL domains that are truncated at the N-terminal side are labeled by the amino acid residue used as the fusion point (i.e., T120 is threonine 120). Model DNA substrates were also constructed to test activity of the Tev-TAL fusions. These substrates are shown in the 5'-3' direction in Figure 22. The substrates are tri-partite, and consist of the I-TevI cleavage motif (CAACG), a variable length spacer, and a TAL effector-binding site.
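The tri-partite layout of the model substrates can be sketched as simple string assembly. In the sketch below only the CAACG cleavage motif comes from the text; the spacer and the TAL effector-binding site are hypothetical placeholder sequences, and the function names are illustrative.

```python
CLEAVAGE_MOTIF = "CAACG"  # I-TevI cleavage motif given in the text


def build_substrate(spacer, tal_site, motif=CLEAVAGE_MOTIF):
    """Assemble a 5'-3' substrate: cleavage motif, spacer, TAL binding site."""
    return motif + spacer + tal_site


def spacer_length(substrate, tal_site, motif=CLEAVAGE_MOTIF):
    """Number of bases between the end of the motif and the TAL binding site."""
    motif_start = substrate.find(motif)
    if motif_start < 0:
        raise ValueError("cleavage motif not found")
    motif_end = motif_start + len(motif)
    tal_start = substrate.find(tal_site, motif_end)
    if tal_start < 0:
        raise ValueError("TAL binding site not found downstream of the motif")
    return tal_start - motif_end


# Placeholder sequences, NOT the actual spacer or PthXo1 binding site.
placeholder_tal_site = "TACGTACGTACG"
substrate = build_substrate("A" * 15, placeholder_tal_site)
print(substrate, spacer_length(substrate, placeholder_tal_site))
```

Varying the spacer argument gives a series of substrates that differ only in spacer length, which is the variable the model substrates are designed to test.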

TABLE 4: Tev-TAL constructs

Name | I-TevI fragment | Linker | PthXo1 N-terminus | PthXo1 C-terminal

[0158] The activity of the Tev-TAL fusions was tested using a yeast-based DNA repair assay. In this assay, the Tev-TAL target site is cloned on a plasmid between the segments of an interrupted and partially duplicated lacZ gene that codes for a nonfunctional beta-galactosidase enzyme. A separate plasmid expresses the Tev-TAL fusion under the control of either a weak or strong constitutive promoter. Separate yeast strains harboring each plasmid are mated, and a functional lacZ gene (and beta-galactosidase activity) is only produced when the Tev-TAL fusion cleaves its target, promoting DNA repair of the lacZ gene to generate a functional copy.

[0159] Shown in Figure 23 are representative data for a set of the Tev-TAL fusions tested in the yeast-based assay against substrates differing in the spacer length. Our data indicate that the N-terminal fusion point on the TAL effector has a major influence on the activity of the Tev-TAL fusions. The data also show that the fusions consisting of the longer I-TevI fragments (residues 1-201 or 1-206) are most active on the DNA substrates with the longest spacer lengths, consistent with the I-TevI linker region acting as a ruler to position the nuclease domain at the CNNNG cleavage motif. The fusion with the broadest range of activity corresponded to I-TevI residues 1 to 169 fused to T120 of the TAL effector (Tev-TAL12). This fusion showed maximal activity on the 15-bp spacer, and lower activity on a broader range of substrates. Finally, the data also show that C-terminal truncations of the TAL effector at residue P1118 reduce activity of the fusions by about a factor of 2.

[0160] The experiments shown in Figure 23 addressed the length requirement of the DNA spacer portion of the I-TevI linker. For the Tev169-T120 construct (Tev-TAL12), the optimal DNA spacer length is 15 bp. We next focused on determining the degree of tolerance to nucleotide substitution in the DNA spacer using the Tev-TAL12 construct. The DNA spacer is the region of substrate that is contacted by the I-TevI linker, which may require specific nucleotides for contact. The native I-TevI target lies within the phage T4 thymidylate synthase gene, and we reasoned that testing activity on a series of related thymidylate synthase genes from other phage would inform us of nucleotide requirements in the DNA spacer. We thus constructed a series of substrates where the native I-TevI DNA spacer sequence was replaced with homologous sequence from other phage thymidylate synthase genes (Figure 24), and tested the activity of the Tev-TAL12 fusion on those substrates in the yeast-based assay.

[0161] As shown in Figure 24, the Tev-TAL12 fusion showed activity on substrates derived from other phage that was on par with or better than its activity on the native T4 substrate (TP1.15). In particular, the Tev-TAL12 fusion was ~1.5-fold more active on the substrate derived from Tula phage than on its native substrate. Other phage-derived thymidylate synthase substrates exhibited a range of activity. Importantly, all the substrates had multiple substitutions in the DNA spacer relative to the T4 sequence (indicated by lower case red text), indicating that the I-TevI linker can tolerate multiple substitutions in the DNA spacer. The only exception was the substrate derived from the RB32 phage, notable because this substrate contains mutations that are also found in the other substrates tested. However, the RB32-derived substrate had a C to A mutation in the cleavage motif relative to the other substrates, which may explain the lower activity of the Tev-TAL12 fusion.

[0162] To demonstrate the relevance of the Tev-TAL fusions to mammalian systems, we tested activity of two fusions in HEK293 cell lines using a GFP reporter assay similar to the yeast lacZ repair assay. In the mammalian assay, the target site for the Tev-TAL nuclease is cloned between the segments of a partially duplicated but non-functional GFP gene. Cleavage by the Tev-TAL nuclease initiates a DNA repair and recombination event that generates a functional GFP gene. The Tev-TAL fusions were cloned into a mammalian expression vector and co-transfected into HEK293 cells with the GFP reporter plasmid. We tested the activity of the Tev-TAL11 and Tev-TAL12 fusions on different DNA substrates, and found that each construct was active in HEK293 cells as judged by GFP+ cells in bright field images, and by Western blot analyses of whole cell extracts for full-length GFP (Figure 25).

[0163] Targeted manipulation of complex genomes is greatly enhanced by site-specific DNA endonucleases that can introduce a nick or double-strand break at specific locations within a genome. There are currently two widely used technologies. The first is DNA endonucleases derived from naturally occurring homing endonucleases. The second uses the dimeric non-specific nuclease domain derived from the type IIS restriction enzyme FokI, fused to the C-terminus of zinc-fingers or TAL effector domains to create zinc-finger nucleases (ZFNs) or TAL effector nucleases (TALENs). Our Tev-TAL fusions represent a third technology for genome engineering applications including, but not limited to, targeted cleavage of a clinically relevant sequence in the human genome or the genome of model organisms (mouse, rat, Drosophila, etc.) for gene therapy purposes, or targeted cleavage of sequences in genomes to introduce mutagenic lesions with the goal of creating gene knockouts. Currently, there are a number of commercially available services for the design and testing of ZFNs and TALENs against desired targets, or that offer engineered ZFNs and TALENs designed against particular sequences. This currently existing architecture imposes design constraints on ZFNs. In contrast, the GIY-YIG nuclease domains used in the present invention do not impose the same design limitations. The GIY-YIG nuclease domains of the present invention will be useful for introducing targeted double-strand breaks at sequences defined by a distinct DNA-targeting domain. The GIY-YIG nuclease fusions are an improvement over ZFNs, TALENs, or engineered homing endonucleases, notably for targeted manipulation of complex genomes for gene replacement or gene knockouts.

[0164] The Tev-TAL fusions of the present invention are different from existing nucleases that fuse the FokI nuclease domain to the C-terminus of the TAL effector domain (the TALENs) in at least:

(1) the Tev-TAL nuclease needs a single DNA target site for cleavage whereas the TALENs need two DNA target sites (the Tev-TALs function as monomers, whereas TALENs function as dimers). This reduces the engineering requirements by a factor of 2, and also reduces the complexity of finding two suitable target sites in close proximity (as needed with ZFNs and TALENs);

(2) the Tev-TAL nucleases have cleavage specificity whereas TALENs have no inherent DNA cleavage specificity;

(3) the Tev-TAL nucleases are smaller than the corresponding TALENs;

(4) the I-TevI nuclease domain is fused to the N-terminus of the PthXo1 TAL effector; and

(5) the Tev-TAL fusions use different lengths of the I-TevI nuclease domain and PthXo1 TAL effector.

[0165] Currently, the most widely used technology is the catalytic domain derived from the FokI restriction endonuclease. There are published reports of different nuclease domains that have been fused to zinc-finger nucleases, specifically the non-specific nuclease domain of Staphylococcal nuclease that is sandwiched between two zinc-finger binding modules. In addition, the Staphylococcal nuclease domain has been fused to the phage lambda repressor protein to create an artificial nuclease. Recently, the nuclease domain from the restriction enzyme PvuII has been fused to zinc-fingers and to a catalytically inactive LAGLIDADG homing endonuclease to create novel site-specific nucleases. However, the PvuII nuclease functions as a dimer, like FokI, and has a 6-bp cleavage site requirement (in addition to the targeting requirements of the zinc finger or LAGLIDADG homing endonuclease). There are also published reports of generating zinc-finger nucleases using inorganic metal-protein complexes.

Example 5

[0166] Bacteria and yeast strains. Escherichia coli strains DH5α and ER2566 (New England Biolabs) were used for plasmid manipulations and protein expression, respectively. E. coli strains were grown in Luria-Broth media supplemented with the appropriate antibiotics. Saccharomyces cerevisiae strains YPH500(α) and YPH499(a) were used for the single-strand annealing assay, and grown in appropriate media as described (CHRISTIAN et al. 2010 Genetics 186: 757-761).

[0167] Construction of mTALENs and substrate plasmids. Substrates for mTALENs were constructed by first cloning oligonucleotides corresponding to the target site into the BglII/SphI sites of pTox. Each substrate, differing in the DNA spacer length, was PCR amplified with flanking primers and cloned into the BglII/SphI sites of the yeast vector pCP5.1 to create the TP series of plasmids (TPS- TPS^) for the yeast activity assay. Mammalian substrates were constructed in the same manner, and cloned into the SacI/XhoI sites of pcDNA3(+). mTALENs were first constructed in pACYC by changing the NcoI site to PciI, and by inserting a stop codon downstream of the BglII site, and the full-length PthXo1 TAL effector was then cloned into the BamHI/BglII sites. The I-TevI nuclease domain and various linker lengths were then cloned into the PciI/BamHI sites. mTALENs that differed in the N-terminal fusion point were constructed by first removing the N-terminal BamHI/SphI fragment from PthXo1, leaving the RVD repeats intact. PCR products corresponding to the new N-terminal fusion point were then cloned into the BamHI/SphI sites, and the I-TevI nuclease domain was cloned into the PciI/BamHI sites. For yeast assays, each mTALEN construct was digested with PciI/XhoI and subcloned into the NcoI/SalI sites of pGPD423 (ALBERTI et al. 2007 Yeast 24: 913-919). For mammalian assays, the pACYC backbone was first modified by including an RsrII site upstream of the PciI site, and mTALEN constructs were cloned as above. mTALENs were subsequently cloned into the PstI/RsrII sites of pExodus. A list of mTALENs tested for activity is found in the following table.

[0168] Table 5. mTALEN constructs, named according to the length of the I-TevI fragment and the N-terminal residue of the PthXo1 TALE domain.

mTALEN        I-TevI fragment  Linker     PthXo1 N-terminus  PthXo1 C-terminal truncation  Active
S206-T221     S206             GGGSGLQ    T221               No                            Yes
S206-T221.1   S206             DPISRSQLQ  T221               No                            Yes
S206-T120     S206             GGGSGLQ    T120               No                            Yes
S206-V152     S206             GGGSG      V152               No                            Yes
S206-G187     S206             GGGSGLQ    G187               No                            Yes
S206-G187.1   S206             DPISRSQLQ  G187               No                            Yes
S206-T221A    S206             GGGSGLQ    T221               P1135                         Yes
S206-T221.1A  S206             DPISRSQLQ  T221               P1135                         Yes
S206-T120A    S206             GGGSGLQ    T120               P1135                         Yes
S206-I214     S206             None       I214               No                            Weak
S206-P218     S206             None       P218               No                            Weak
N201-D1       N201             GGGGGS     D1                 No                            Yes
D184-V152     D184             GGSGGS     V152               No                            Yes
N169-T120     N169             GGSGGS     T120               No                            Yes
N169-V152     N169             GGSGGS     V152               No                            Yes
N169-E181     N169             GGSGGS     E181               No                            No
N169-V184     N169             GGSGGS     V184               No                            No
N169-G187     N169             GGSGGS     G187               No                            No
N169-A191     N169             GGSGGS     A191               No                            No
N169-A195     N169             GGSGGS     A195               No                            No
N169-T209     N169             GGSGGS     T209               No                            No
N169-Q211     N169             GGSGGS     Q211               No                            No
N169-T221     N169             GGSGGS     T221               No                            No
N140-D1       N140             G          D1                 No                            No
D127-D1       D127             G          D1                 No                            No
D127-T221     D127             GGGSGLQ    T221               No                            No
D127-T120     D127             GGGSGLQ    T120               No                            No
D127-P218     D127             None       P218               No                            No
D127-I214     D127             None       I214               No                            No
D127-T221     D127             GGGSG      T221               No                            No
D127-T221A    D127             None       T221               P1135                         No
D127-I214A    D127             None       I214               P1135                         No
D127-T221A    D127             GGGSG      T221               P1135                         No
S114-D1       S114             G          D1                 No                            No

[0169] Cleavage mapping. Mapping of mTALEN sites used N169-T120 or S206-D1 mTALENs that were purified untagged. Briefly, untagged mTALEN constructs in pACYC-Duet were overexpressed in E. coli ER2566, suspended in buffer A (20 mM Tris-HCl pH 7.5, 200 mM NaCl, 1 mM DTT, 5% glycerol and 0.1 mM EDTA), and lysed with a cell homogenizer (Avestin). Clarified extracts were applied to a Hi-Trap Heparin column (GE Healthcare) equilibrated in the same buffer and eluted with a linear gradient from 200 mM to 1 M NaCl. Fractions containing 250-325 mM NaCl were pooled, dialyzed, applied to an SP-FF column (GE Healthcare) equilibrated in buffer A, and eluted in steps of 200 mM NaCl to a final concentration of 1 M NaCl. The 400 mM elutions were pooled, applied to a FF-Q column (GE Healthcare) equilibrated in buffer A, and eluted in steps of 200 mM NaCl. The 400 mM fractions were pooled, concentrated to 0.5 ml, and loaded onto a 30-ml Superose 12 gel filtration column (GE Healthcare) equilibrated in buffer A, and 0.25-ml fractions were collected over 1 column volume. Endonuclease assays on substrates with different length spacers utilized oligonucleotides end-labeled at the 5' end with T4 polynucleotide kinase and 32P-γATP prior to annealing. Cleavage reactions were carried out in 20-μl volumes in 1X NEBuffer 3 for 10 min at 37°C, and were resolved on 10% denaturing urea-polyacrylamide gels. Mapping of cleavage sites utilized supercoiled pSP72-TP15 in 20-μl volumes of 1X NEBuffer 3 and a 5-fold molar excess of protein to DNA. Linear cleavage products were gel isolated and sent for sequencing at the London Regional Genomics Facility. Cleavage sites were determined from ABI traces, taking into account the additional A added by Taq polymerase during sequencing reactions.

[0170] Yeast β-galactosidase reporter assay. The yeast reporter assay was performed as described (CHRISTIAN et al. 2010). The protocol was adapted to microtitre plates, where three transformants of YPH499 harbouring the target plasmids (in pCP5.1) and YPH500 harbouring the mTALENs were grown in 96-well plates at 30°C overnight with shaking in synthetic complete medium lacking tryptophan and uracil (for the YPH499 target strain) or histidine (for the YPH500 mTALEN strains). The mTALEN and target strains were mated by combining 200-500 μl of overnight cultures and adding 1 ml of YPD medium, and incubated for 4-6 hrs without shaking at 30°C. Cell density was measured at 595 nm by plate reader. Cells were harvested by centrifugation, resuspended and lysed using YeastBuster Protein Extraction Reagent (Novagen) according to the manufacturer's protocol. A total of 60 μl of lysate was transferred to a 96-well plate and β-galactosidase activity was measured and normalized as previously described (TOWNSEND et al. 2009 Nature 459: 442-445). Miller units were normalized to a SurB dimeric FokI-TALEN or Zif268 zinc finger nuclease control for assays profiling the optimal mTALEN DNA spacer length, or to the N169-T120 mTALEN on the TP15 substrate for assays profiling CNNNG cleavage site and DNA spacer requirements.
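The normalization step described above can be illustrated with a minimal sketch. It assumes that normalization means dividing the mean Miller units of a test construct by the mean Miller units of the chosen control; the function name and the readings are hypothetical and are not taken from the application.

```python
from statistics import mean

def normalize_activity(sample_units, control_units):
    """Return mean sample activity as a fraction of mean control activity."""
    control_mean = mean(control_units)
    if control_mean == 0:
        raise ValueError("control activity must be non-zero")
    return mean(sample_units) / control_mean

# Hypothetical Miller-unit readings from three transformants each.
mtalen_units = [42.0, 45.0, 48.0]
control_units = [88.0, 90.0, 92.0]
relative_activity = normalize_activity(mtalen_units, control_units)
```

A value of 1.0 would indicate activity equal to the control under this assumed definition.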

[0171] Episomal and chromosomal assays in HEK 293T cells. HEK 293T cells (obtained from ATCC) were cultured in high glucose Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS), at 37°C in 5% CO2. Approximately 2.5 x 10^6 cells were seeded in 6 cm plates 24 hrs prior to transfection. Cells were co-transfected with 3 μg of pExodus mTALEN and 3 μg of pcDNA3(+) TP15 target DNA using calcium phosphate, and incubated at 37°C with 5% CO2 for 16 hrs before replacing the medium. After 48 hrs, cells were harvested in phosphate buffered saline (PBS). Plasmid DNA was isolated using the BioBasic miniprep kit. Target sites were PCR amplified and gel purified. After gel purification, 250 ng of each PCR product was incubated with 2 U of DdeI (N.E.B.) in 1X NEBuffer 2 for 1 hr at 37°C. Digests were electrophoresed on a 1.5% agarose gel and stained with ethidium bromide before analysis on an AlphaImager™ 3400 (Alpha Innotech).

[0172] Optimization of mTALEN architecture. To determine whether the monomeric I-TevI nuclease domain remains functional when fused to TALE domains (mTALEN), we constructed 33 different mTALENs by fusing varying lengths of the I-TevI nuclease domain and native protein linker to the N-terminus of the TAL effector PthXo1 from the rice pathogen Xanthomonas oryzae pv. oryzae (Figure 22A and Table 5) (YANG et al. 2006 Proc Natl Acad Sci U S A 103: 10503-10508). Tev-PthXo1 mTALEN constructs are named using the length of the I-TevI fragment, beginning at residue 1, followed by the N-terminal residue in PthXo1. All Tev-PthXo1 mTALENs were tested against model DNA substrates derived from the phage T4 td gene fused to the perfect match PthXo1 binding site (Figure 22B). The TevPth (TP) substrates mimic the modularity and orientation of the mTALENs as they consist of, in the 5' to 3' direction, a CNNNG cleavage motif, a DNA spacer (normally contacted by the I-TevI linker), and the PthXo1 binding site. To optimize fusion architecture, we tested the activity of mTALENs on the model substrates using a quantitative yeast-based assay where an mTALEN target site interrupts a partially duplicated lacZ gene (CHRISTIAN et al. 2010). Cleavage of the target site can restore the lacZ gene reading frame through single-strand annealing DNA repair, resulting in β-galactosidase activity that can be normalized to benchmarked ZFNs or TALENs.

[0173] Our initial mTALEN construct consisted of residues 1-206 of I-TevI (S206) as this fragment showed high activity in the Tev-ZFE and Tev-LHE scaffolds (KLEINSTIVER et al. 2012 Proc Natl Acad Sci U S A 109: 8061-8066). We found that selection of fusion point within the N-terminus of the TALE had the greatest effect on activity, as fusion to the D1, T120, V152, G187 or T221 residues of PthXo1 generated nucleases with varying levels of activity (Figure 26A and Table 5). Activity was measured on substrates where the DNA spacer length between the CNNNG cleavage motif and TALE binding site ranged from 5 bp to 31 bp. Interestingly, the two most active constructs, S206-T120 and S206-V152, displayed a periodic cleavage profile, with maximal cleavage correlating with a 10-bp helical turn, as was observed for the Tev-LHE.

[0174] While these constructs showed robust activity on model substrates, the S206 fragment of I-TevI contains the entire region of the native I-TevI linker, including all residues that are known to make base-specific contacts to substrate in the context of the native enzyme (VAN ROEY et al. 2001 EMBO J 20: 3631-3637). Thus, to remove potential base-specific interactions that may limit targeting (as had been done with the Tev-LHE fusion), we determined if progressively shorter lengths of the I-TevI linker could also function in the context of mTALENs. The I-TevI fragments consisting of residues 1-169 (N169) and 1-184 (D184) displayed high activity in the context of the T120 or V152 PthXo1 N-terminal fusion points, and both of these fusions also exhibited a 10-bp periodic activity on substrates with varying length spacers (Figure 26B).

[0175] Additionally, we found that deleting C-terminal residues past P1118 of the TAL domain had little effect on activity compared to analogous constructs with an intact C-terminus, except for the S206-T120 construct, where the C-terminal deletion reduced activity by ~2 fold. Cleavage by the mTALEN fusions was directed by the TALE domain and not the I-TevI domain, as we saw no activity on a different TALEN substrate, SurB, or on a substrate that contained the I-TevI cleavage site and DNA spacer but an LHE binding site (Tev-ONU). Collectively, the data show that fusion of different lengths of the I-TevI nuclease domain and linker to the PthXo1 TALE can create highly active mTALENs.

[0176] Mapping of mTALEN cleavage sites. In general, we found that the mTALENs with the longest I-TevI fragments were most active on the substrates with longer DNA spacers. However, mTALENs with shorter I-TevI fragments, particularly the N169 and D184 constructs, displayed activity on different length DNA substrates that was correlated with a 10-bp helical DNA turn. This behavior mimics that observed for the native I-TevI enzyme (BRYK et al. 1995 J Mol Biol 247: 197-210; DEAN et al. 2002 Proc Natl Acad Sci U S A 99: 8554-8561), and suggests that the nuclease domain and 5'-CNNNG-3' motif must be coordinately positioned for efficient cleavage. To map the cleavage sites of mTALENs, we used two different approaches. First, we purified the S206-D1 mTALEN, and performed in vitro cleavage assays with oligonucleotide substrates radioactively labeled on each strand that varied in the length of the DNA spacer (from 21-31 bp) (Figure 27A). When resolved on a denaturing polyacrylamide gel, both the top- and bottom-strand products are visible. The top-strand product is a constant length, as the 5' end is always the same distance from the 5'-CNNNG-3' cleavage motif regardless of the DNA spacer length. In contrast, the bottom-strand product's size varies proportionally with the distance of the 5'-CNNNG-3' cleavage motif to the TALE binding site. These results are consistent with the top-strand and bottom-strand nicks generated by mTALENs mapping to a single cleavage motif regardless of spacer length.

[0177] To extend these results, we purified the N169-T120 mTALEN and performed in vitro cleavage assays with a supercoiled plasmid substrate containing a target site with a 15-bp spacer. The linear product was gel-isolated and the cleavage sites were mapped by run-off sequencing. As shown in Figure 27B, the bottom (†) and top (|) strand nicking sites mapped to the 5'-C†AAC|G-3' motif positioned 15 bp from the TALE binding site. To determine if a secondary 5'-CNNNG-3' cleavage motif (5'-CTCAG-3') immediately adjacent to the preferred cleavage motif could enable substrate cleavage in a biological context, we mutated the preferred 5'-CAACG-3' sequence to 5'-AAACA-3' and tested for activity on this "cleavage-site minus" [CS(-)] substrate in the yeast-based assay. As shown in Figure 26B, levels of β-galactosidase activity observed with the mTALEN and the CS(-) mutant substrate were at background (no mTALEN) level, compared to high activity with the mTALEN and the preferred substrate. The CS(-) substrates place a CNNNG motif 10 bp from the TALE site, which our spacer length data show is non-permissive for cleavage, consistent with the I-TevI linker functioning as a molecular ruler to position the cleavage domain on substrate. Collectively, these data show that the I-TevI catalytic domain has a preferred 5'-CNNNG-3' cleavage motif on native substrate, and that inappropriately spaced, secondary motifs do not support cleavage. These data are in agreement with previous mapping studies of the native I-TevI endonuclease and engineered derivatives (BRYK et al. 1995; DEAN et al. 2002; KLEINSTIVER et al. 2012).

[0178] We also constructed a substrate that consisted of the PthXo1 binding site and 28 bp of the nptII gene encoding neomycin phosphotransferase from a previous study using cTALENs (BEURDELEY et al. 2013 Nat Commun 4: 1762). This region of the nptII substrate has four CNNNG motifs, with the G of each motif located at 7, 14, 19, and 24 bp from the TALE binding site, respectively. We mapped the major cleavage sites using purified N169-T120 mTALEN to the motif 14 bp from the TALE binding site (motif3, 5'-CTGTG-3'). This spacing of the cleavage motif from the TALE binding site agrees with our mapping data on model DNA substrates (Figure 27B). Run-off sequencing also showed evidence of weaker cleavage at the motif 7 bp from the TALE binding site. To assess the biological relevance of either motif, we used the yeast-based reporter assay to measure activity of two different mTALENs on the 28-bp nptII substrate, and a derivative of this substrate where the C and G of motif3 were mutated to A and T (nptACS), respectively (Figure 27C). Using the TevN169-V152 and TevD184-V152 mTALENs that more accurately mimic the cTALEN architecture, we observed activity on the nptACS substrate indistinguishable from background, and furthermore, only low levels of activity on the wild-type nptII substrate compared to the native td 15-bp substrate (Figure 27D). These data show that the nptII substrate is cleaved poorly by mTALENs, that the cleavage site maps to a CNNNG motif positioned 14 bp from the TALE binding site, and that additional CNNNG motifs in the substrate cannot support robust cleavage, in contrast to the conclusions drawn by the authors of the cTALEN study (BEURDELEY et al. 2013).
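The motif-to-binding-site spacing discussed above can be checked computationally. The following is a minimal sketch that lists every CNNNG motif upstream of a TALE binding site together with its spacer length. It assumes that the spacer is counted as the number of bases between the G of the motif and the first base of the binding site; the function name and the test sequence are hypothetical and are not the td or nptII sequences.

```python
import re

def cnnng_spacers(upstream):
    """List (motif, spacer) pairs for every CNNNG motif in `upstream`.

    `upstream` is the top-strand sequence, 5' to 3', that ends immediately
    before the first base of the TALE binding site. `spacer` is the number
    of bases between the G of the motif and the binding site (an assumed
    counting convention).
    """
    seq = upstream.upper()
    hits = []
    # A lookahead lets overlapping motifs be reported as well.
    for m in re.finditer(r"(?=(C[ACGT]{3}G))", seq):
        end_of_motif = m.start() + 5
        hits.append((m.group(1), len(seq) - end_of_motif))
    return hits

# Hypothetical upstream region: a CAACG motif followed by a 15-base spacer.
hits = cnnng_spacers("TTCAACG" + "ATATATATATATATA")
```

A candidate target could then be screened by keeping only motifs whose spacer falls within a permissive range determined experimentally.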

[0179] Defining nucleotide preference in the CNNNG cleavage motif. Accurate targeting of mTALENs requires an understanding of the tolerance of the I-TevI nuclease domain to nucleotide substitutions at the cleavage site motif. Previous data have shown that substitutions within the central 3 base pairs of the cleavage motif can influence the cleavage efficiency of wild-type I-TevI (BRYK et al. 1993 EMBO J 12: 2141-2149; EDGELL et al. 2004 J Mol Biol 343: 1231-1241), but a systematic study at a defined, biologically relevant cleavage site in the context of the TALE scaffold has not yet been undertaken. Our mTALEN mapping data identified a single major CNNNG motif that supports cleavage (Figure 27), allowing us to more accurately define the nucleotide requirements within a relevant motif. To do so, we compared activity of the wild-type triplet (AAC) to all 63 possible variants in the yeast-based assay using the N169-T120 mTALEN (Figure 28A). We found that substrates with single substitutions had near wild-type activity, and that some single or double substitutions showed activity equivalent to or greater than the wild-type sequence (Figure 28B and 28C). When nucleotide position within the NNN triplet was considered independently of other positions, triplets with C or G at position 1 and G at position 3 showed lower activity than triplets with other bases at those positions (Figure 28D). Substrates with the triplets ACG, CCT, and GGG were cleaved particularly poorly (Figure 28C). Collectively, these data reveal a substantial degree of tolerance to substitution at the biologically relevant CNNNG motif, consistent with previous reports of I-TevI GIY-HE cleavage specificity, but show that overall, A/T-rich triplets are preferred. The data also highlight the difficulty in defining a consensus cleavage site sequence, as proposed by others (BEURDELEY et al. 2013), as the information regarding cleavage efficiency of individual NNN triplets is diluted by the consensus.
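The count of 63 variants follows from the 64 possible NNN triplets minus the wild-type AAC. As an illustrative sketch only (the function name is hypothetical), the variants can be enumerated and grouped by the number of substitutions relative to AAC:

```python
from itertools import product

WILD_TYPE = "AAC"

def triplet_variants(wild_type=WILD_TYPE):
    """Group every non-wild-type NNN triplet by its number of substitutions."""
    groups = {1: [], 2: [], 3: []}
    for bases in product("ACGT", repeat=3):
        triplet = "".join(bases)
        mismatches = sum(a != b for a, b in zip(triplet, wild_type))
        if mismatches:
            groups[mismatches].append(triplet)
    return groups

variants = triplet_variants()
counts = {k: len(v) for k, v in variants.items()}
# Full motifs are formed by flanking each triplet with C and G.
motifs = ["C" + t + "G" for group in variants.values() for t in group]
```

This gives 9 single, 27 double, and 27 triple substitutions, which together account for the 63 variant substrates.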

[0180] Tolerance of mTALENs to substitutions in the DNA spacer. The I-TevI component of mTALENs consists of both the I-TevI nuclease domain (residues 1-92) and varying lengths of the I-TevI linker that presumably contact the DNA spacer region of substrate. A portion of the I-TevI linker, from residues 148 to 206, has been co-crystallized with its native DNA substrate (VAN ROEY et al. 2001 EMBO J 20: 3631-3637). The structure reveals a linker that wraps around the minor groove of substrate with a limited number of base-specific contacts. Through these contacts, the linker accurately positions the nuclease domain on the substrate to cleave at the 5'-CAACG-3' motif (DEAN et al. 2002 Proc Natl Acad Sci U S A 99: 8554-8561). Previous in vitro cleavage assays on partially randomized substrates revealed that wild-type I-TevI can accommodate nucleotide substitutions in the DNA spacer (BRYK et al. 1993), yet it is unknown if the I-TevI linker can tolerate nucleotide substitutions in the context of engineered DNA-binding domains.

[0181] To address this question, we used the yeast-based repair assay to test the in vivo activity of the N169-T120 mTALEN on a set of 45 substrates that encompassed all possible single nucleotide substitutions at each position in the 15-bp DNA spacer (Figure 29A). The "one-off" cleavage profile revealed a substantial degree of tolerance of the mTALEN to substitution. Many positions within the DNA spacer could accept all three substitutions and retain activity equal to or greater than the wild-type substrate. In three positions (2, 6, and 8), the mean activity of each mutant substrate was below wild-type activity, yet many of these substitutions had activity above background and only one substitution (C3T) completely abolished activity.
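The figure of 45 substrates corresponds to 15 spacer positions times 3 alternative bases per position. A minimal sketch of how such a "one-off" set could be enumerated is given below. The spacer sequence is a placeholder, since the td spacer sequence is not reproduced here, and the labels are assumed to follow the original-base, position, new-base style of C3T.

```python
def single_substitutions(spacer):
    """Return {label: mutant} for every single-base change in `spacer`.

    Labels are assumed to follow the C3T style used in the text:
    original base, 1-based position, substituted base.
    """
    mutants = {}
    for i, original in enumerate(spacer):
        for base in "ACGT":
            if base != original:
                label = f"{original}{i + 1}{base}"
                mutants[label] = spacer[:i] + base + spacer[i + 1:]
    return mutants

# Placeholder 15-bp spacer; NOT the actual td spacer sequence.
PLACEHOLDER_SPACER = "ACGTACGTACGTACG"
one_off = single_substitutions(PLACEHOLDER_SPACER)
```

Each entry differs from the parent spacer at exactly one position, so activities measured on these substrates can be tabulated by position and substituted base.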

[0182] The tolerance to single nucleotide substitutions also suggested that the mTALEN could retain high activity on substrates with multiple substitutions in the DNA spacer. We tested the tolerance in two distinct ways. First, we generated a set of hybrid substrates where the CNNNG cleavage motif and 15-bp DNA spacer were replaced by sequences derived from naturally occurring td genes that are 53%-87% identical to the phage T4 sequence targeted by I-TevI (Figure 29B). Three of the CNNNG motifs possess single nucleotide substitutions, yet are predicted to be cleavable (Figure 28C). In the yeast-based lacZ repair assay, the N169-T120 mTALEN showed similar or greater activity on most substrates relative to the cognate T4 substrate, with the exception of the RB32 td gene, which showed reduced activity.

[0183] Second, we generated a yeast plasmid library where all 15 nucleotides of the DNA spacer region were randomized (the TP15N library, Figure 30A). Independent transformants were gridded into 96-well microtitre plates, along with an N169-T120/TP15 positive control and a negative control consisting of the Zif268 ZFN against the TP15 substrate. The 376 transformants were screened in triplicate, and considered active if the mean activity of each transformant was greater than or within two standard deviations of the N169-T120 positive control. Inactive transformants were those with background activity. In all, the TP15N region was sequenced from 49 active and 62 inactive clones, and the average identity for both sets of clones to the TP15 wild-type sequence was 27%. For the active and inactive clones, we calculated the proportion (or enrichment) of all four nucleotides at each position from the sequences, and plotted the difference in proportions between the active versus inactive clones (Figure 30B). A positive value for a particular nucleotide indicates enrichment in the active clones, while a negative value indicates selection against that nucleotide. Selection was most evident at three positions (1, 2, and 7), paralleling the activity of some single nucleotide substitutions at these positions (Figure 29A), while little preference was observed at the remaining positions. Strong preference for G at position 1 may reflect selection of an alternative G to position the nuclease domain to nick the top strand. While the nucleotide preferences in Figure 30B are generated from a relatively small number of sequences, they show that the I-TevI linker domain is extremely tolerant of multiple substitutions within the DNA spacer, as supported by cleavage of td sequences derived from a variety of phage (Figure 29B). Moreover, the data suggest that sequence-dependent effects play a significant role in modulating cleavage activity, largely mitigating the effect of any single nucleotide substitution.
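The per-position comparison of active and inactive clones described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the enrichment is taken to be the simple difference in base proportions (active minus inactive), and the sequences are toy 3-nt examples rather than sequenced TP15N clones.

```python
from collections import Counter

BASES = "ACGT"

def position_proportions(seqs):
    """Per-position fraction of each base across equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: counts[b] / len(seqs) for b in BASES})
    return table

def enrichment(active, inactive):
    """Active-minus-inactive proportion for each base at each position."""
    act = position_proportions(active)
    inact = position_proportions(inactive)
    return [{b: a[b] - n[b] for b in BASES} for a, n in zip(act, inact)]

# Toy 3-nt "spacers"; real clones would carry 15 randomized nucleotides.
toy_active = ["GAT", "GCT", "GAA", "TAT"]
toy_inactive = ["AAT", "CCT", "GAA", "TAC"]
diff = enrichment(toy_active, toy_inactive)
```

In this toy data set G is enriched at the first position of the active set, which is the kind of signal a positive value in such a plot would represent. With small clone numbers, differences of this kind should be interpreted cautiously.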

[0184] Targeting of mTALENs in human cells. To test whether mTALENs function in human cells, we cloned various mTALENs into a mammalian expression vector where mTALEN translation can be assessed using a fused mCherry reading frame separated from the mTALEN coding sequence by a T2A peptide (Figure 31A). Initial trials were conducted with mTALENs comprising an I-TevI sequence that was not codon-optimized for human cell expression, and we observed very weak mCherry activity indicative of poor mTALEN expression. Subsequent experiments were performed with mTALEN constructs encoding human codon-optimized I-TevI, and this architecture yielded robust mCherry expression (Figure 31B).

[0185] mTALEN activity was determined by co-transfecting the N169-T120 and D184-V152 constructs with an episomal substrate plasmid containing the hybrid td/PthXo1 target with a 15-bp DNA spacer separating the CAACG motif and TALE binding site (Figure 31C). The substrate contains a DdeI site immediately adjacent to the I-TevI cleavage site, allowing us to estimate mTALEN cleavage efficiency as the proportion of subsequently PCR-amplified substrates that were rendered resistant to DdeI digestion as a result of mTALEN cleavage and non-homologous end joining (NHEJ) mediated mutagenic repair (Figure 31C). We observed ~5-10% cleavage-resistant fragments following transfection with either mTALEN, and cleavage was specific, as no cleavage-resistant fragments were observed when the substrate plasmids lacked the cognate TALE binding site. We cloned and sequenced cleavage-resistant fragments, revealing a range of deletions near the CNNNG motif. Thus, the mTALENs function in HEK 293T cells at a level sufficient to induce mutagenic DNA repair on an episomal substrate, and this level of activity is comparable to many reported FokI-TALENs.
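The estimate of cleavage efficiency from DdeI resistance can be written as a simple ratio. The sketch below is an assumption about how such a value might be computed from a gel: it takes background-subtracted band intensities as proportional to DNA mass and divides the uncut band by the sum of all bands in the lane. The quantification procedure is not described in the application, and the function name and intensities are hypothetical.

```python
def resistant_fraction(uncut_intensity, cut_intensities):
    """Fraction of PCR product left uncut by the diagnostic digest."""
    total = uncut_intensity + sum(cut_intensities)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return uncut_intensity / total

# Hypothetical background-subtracted band intensities from one gel lane.
fraction = resistant_fraction(8.0, [55.0, 37.0])
```

With these illustrative numbers the resistant fraction is 0.08, that is, 8% of the amplified substrate would be scored as DdeI-resistant.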

[0186] To determine if the DNA-binding specificity of the mTALEN architecture could be modified to target endogenous human loci, we fused the human codon-optimized version of the S206 I-TevI fragment to two customized versions of the full-length dHax3 TAL effector, respectively substituted with repeat arrays assembled to target sequences 16 bp and 25 bp downstream of the same CNNNG (5'-CTCAG-3') motif within the chromosomal AAVS1 locus. The mTALEN constructs were named A15 and A25 to reflect the 15-bp or 25-bp DNA spacers between the TALE binding site and CNNNG motif (Figure 31F). For comparison, we used a commercially available FokI-TALEN pair (AAVS1-R-SS1 and AAVS1-F-SS1) targeting a site in the AAVS1 locus 235 bp from the A15 mTALEN binding site. The constructs were transfected into HEK 293FT cells, and TALEN activity was monitored using a Surveyor assay. For both mTALENs, we observed ~5% activity, as compared to ~10% activity for the FokI-TALEN pair. Interestingly, the predicted 5'-CTCAG-3' cleavage site motif shows only ~10% of the activity of the wild-type AAC triplet in the yeast-based assay, yet supports robust activity in human cells. Collectively, these data demonstrate that mTALENs incorporating the I-TevI nuclease domain show robust, programmable activity on both episomal and chromosomal targets in human cells.

[0187] We have previously shown that the small, sequence-tolerant and monomeric nuclease domain derived from the GIY-YIG homing endonuclease I-TevI could be fused to both ZFs and LHEs to create monomeric nucleases. Here, we extend the utility of the I-TevI domain by showing that it can be fused to the N-terminus of TALEs to create a programmable monomeric TALEN platform that is functional in human cells. Further, we show that the requirement of I-TevI for the motif CNNNG at the cleavage site is retained in the mTALEN context, and we provide evidence for the first time that within the CNNNG motif, A/T-rich triplets are preferred. We have also shown that nucleotide preference within the CNNNG motif is not likely to be restrictive, as many NNN combinations are cleaved by the mTALEN as well as or better than the wild-type AAC sequence. Finally, we define DNA spacer length optima for mTALEN activity.

[0188] The present application demonstrates that mTALENs are a viable alternative genome-editing tool. One obvious advantage of the mTALEN platform is its monomeric nature, which simplifies the design requirements for targeting desired sequences. The moderate sequence requirements of the nuclease domain retained in mTALENs can be exploited to minimize off-targeting, or possibly to simplify constructs further by minimizing the number of TALE repeats incorporated.

[0189] While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be appreciated by one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention and appended claims. All patents and publications cited herein are entirely incorporated herein by reference.