

Title:
METHODS FOR INTRON-DRIVEN GENE EDITING AND CONTROL
Document Type and Number:
WIPO Patent Application WO/2024/107836
Kind Code:
A1
Abstract:
Provided herein are nucleic acid molecules that include (a) an effector RNA; (b) an escape module; and (c) a stabilization module, wherein the effector RNA includes a CRISPR guide RNA (gRNA), a small RNA, a complementary RNA, a nuclear RNA, an interfering RNA, or a homing CRISPR gRNA.

Inventors:
KALHOR REZA (US)
FORSMO JAMES EDWARD (US)
Application Number:
PCT/US2023/079813
Publication Date:
May 23, 2024
Filing Date:
November 15, 2023
Assignee:
UNIV JOHNS HOPKINS (US)
International Classes:
C12N15/113; A61K48/00; C12N15/68
Foreign References:
US20200268907A1 2020-08-27
US20190032054A1 2019-01-31
Other References:
NISSIM, L ET AL.: "Multiplexed and Programmable Regulation of Gene Networks with an Integrated RNA and CRISPR/Cas Toolkit in Human Cells", MOLECULAR CELL, vol. 54, no. 4, 22 May 2014 (2014-05-22) - 15 May 2014 (2014-05-15), pages 698 - 710, XP029028594, DOI: 10.1016/j.molcel.2014.04.022
Attorney, Agent or Firm:
YOON, Sohee Kim et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A nucleic acid molecule comprising:

(a) an effector RNA;

(b) an escape module; and

(c) a stabilization module.

2. The nucleic acid molecule of claim 1, wherein the effector RNA comprises a CRISPR guide RNA (gRNA), a small RNA, a complementary RNA, a nuclear RNA, an interfering RNA, or a homing CRISPR gRNA.

3. The nucleic acid molecule of claim 1 or 2, wherein the effector RNA comprises two or more small RNA sequences.

4. The nucleic acid molecule of any one of claims 1-3, wherein the escape module comprises a cleavable RNA sequence.

5. The nucleic acid molecule of claim 4, wherein the cleavable RNA sequence is at a 5’ end of the nucleic acid molecule.

6. The nucleic acid molecule of claim 4, wherein the cleavable RNA sequence is at a 3’ end of the nucleic acid molecule.

7. The nucleic acid molecule of claim 4, wherein the escape module comprises a first cleavable RNA sequence at a 5’ end of the nucleic acid molecule and a second cleavable RNA sequence at a 3’ end of the nucleic acid molecule.

8. The nucleic acid molecule of any one of claims 4-7, wherein the cleavable RNA sequence comprises a self-cleaving RNA sequence.

9. The nucleic acid molecule of claim 8, wherein the self-cleaving RNA sequence comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence.

10. The nucleic acid molecule of any one of claims 4-7, wherein the cleavable RNA sequence comprises a sequence recognized by a site-specific endoribonuclease, wherein the site-specific endoribonuclease comprises Csy4, Argonaute-family proteins, Cas12, Cas13, Cse3, or Cas6.

11. The nucleic acid molecule of any one of claims 1-10, wherein the stabilization module comprises an RNA stabilizing sequence comprising Epstein Barr Virus (EBV)-sisRNA-1, sno-lncRNA2, exoribonuclease-resistant triplex, or triple helix from MALAT1 lncRNA.

12. The nucleic acid molecule of any one of claims 1-11, wherein the nucleic acid molecule is located within a gene.

13. The nucleic acid molecule of claim 12, wherein the nucleic acid molecule is located within an intron of the gene.

14. The nucleic acid molecule of claim 12, wherein the gene is an endogenous gene in a eukaryotic cell.

15. The nucleic acid molecule of claim 12, wherein the gene is a transgene in a eukaryotic cell.

16. A synthetic intron comprising from 5’ to 3’:

(a) a 5’ splice site;

(b) the nucleic acid molecule of any one of claims 1-15; and

(c) a 3’ splice site.

17. The synthetic intron of claim 16, further comprising 3’ to the nucleic acid molecule a branch point and a polypyrimidine tract.

18. The synthetic intron of claim 16 or 17, wherein the nucleic acid molecule comprises an escape module positioned 5’ to the effector RNA sequence.

19. The synthetic intron of any one of claims 16-18, wherein the nucleic acid molecule comprises an escape module positioned 3’ to the effector RNA sequence.

20. The synthetic intron of claim 18 or 19, wherein the escape module comprises a hammerhead ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, or a hairpin ribozyme.

21. The synthetic intron of claim 16 or 17, wherein the nucleic acid molecule comprises from 5’ to 3’:

(a) a first escape module;

(b) an effector RNA sequence; and

(c) a second escape module.

22. The synthetic intron of claim 21, wherein the first escape module and the second escape module independently comprise a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence.

23. A nucleic acid comprising the synthetic intron of any one of claims 16-22.

24. The nucleic acid of claim 23, wherein the nucleic acid comprises a promoter and two or more exonic sequences flanking the synthetic intron.

25. The nucleic acid of claim 24, wherein the promoter is inducible.

26. An engineered cell comprising the nucleic acid molecule of any one of claims 1-15 or the nucleic acid of any one of claims 23-25.

27. The engineered cell of claim 26, wherein the effector RNA sequence comprises a sgRNA and wherein the engineered cell further comprises a nucleic acid sequence that expresses a Cas polypeptide.

28. The engineered cell of claim 27, wherein the Cas polypeptide is Cas9.

Description:
METHODS FOR INTRON-DRIVEN GENE EDITING AND CONTROL

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/425,576, filed on November 15, 2022, and U.S. Provisional Patent Application No. 63/540,242, filed on September 25, 2023, which are incorporated herein by reference in their entireties.

SEQUENCE LISTING

This application contains a Sequence Listing that has been submitted electronically as an XML file named 44807-0435WO1_ST26_SL. The XML file, created on November 14, 2023, is 16,238 bytes in size. The material in the XML file is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the area of gene editing and gene circuit engineering technologies. In particular, it relates to methods of gene editing, wherein intronic sequences are used to produce small nuclear RNAs (e.g., sgRNAs).

BACKGROUND

Many gene-editing technologies and molecular circuits used in synthetic biology approaches depend on small nuclear ribonucleic acids (snRNAs) to drive the target process. For example, CRISPR/Cas depends on an snRNA called the single guide RNA (sgRNA) which confers specificity to the system by identifying the target DNA locus. Despite their broad application and importance, technologies to control the expression of snRNAs are in short supply. If snRNA expression could be coupled to gene circuits of interest, it would expand the application of gene-editing and synthetic biology techniques. The inability to control snRNA expression or couple it to cellular processes of interest (e.g., DNA damage) is due to the unique features of promoters by which snRNAs are usually expressed. For example, sgRNAs are often expressed under universal and constitutive RNA Polymerase III promoters (e.g., U6). RNA Polymerase III (PolIII) promoters are constitutive and cannot be easily regulated.

Some strategies have been introduced to make the U6 promoter inducible with limited success. Other strategies aim to express sgRNAs using RNA Polymerase II (PolII) promoters which can be coupled to the promoters of genes of interest (i.e., driver genes). However, these strategies are limited in specificity and efficiency. First, PolII transcripts are always transferred to the cytoplasm and strategies to maintain parts of these transcripts in the nucleus as snRNAs have been inefficient or toxic to cells. Second, the promoter of the driver gene may not act as the native copy acts when used as a part of a transgenic construct. Therefore, there is a need to address these limitations and enable the control of expression of snRNAs in gene editing.

SUMMARY

Provided herein are nucleic acid molecules comprising: (a) an effector RNA; (b) an escape module; and (c) a stabilization module. In some embodiments, the effector RNA comprises a CRISPR guide RNA (gRNA), a small RNA, a complementary RNA, a nuclear RNA, an interfering RNA, or a homing CRISPR gRNA. In some embodiments, the effector RNA comprises two or more small RNA sequences.

In some embodiments, the escape module comprises a cleavable RNA sequence. In some embodiments, the cleavable RNA sequence is at a 5’ end of the nucleic acid molecule. In some embodiments, the cleavable RNA sequence is at a 3’ end of the nucleic acid molecule. In some embodiments, the escape module comprises a first cleavable RNA sequence at a 5’ end of the nucleic acid molecule and a second cleavable RNA sequence at a 3’ end of the nucleic acid molecule. In some embodiments, the cleavable RNA sequence comprises a self-cleaving RNA sequence. In some embodiments, the self-cleaving RNA sequence comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence. In some embodiments, the cleavable RNA sequence comprises a sequence recognized by a site-specific endoribonuclease, wherein the site-specific endoribonuclease comprises Csy4, Argonaute-family proteins, Cas12, Cas13, Cse3, or Cas6. In some embodiments, the stabilization module comprises an RNA stabilizing sequence comprising Epstein Barr Virus (EBV)-sisRNA-1, sno-lncRNA2, exoribonuclease-resistant triplex, or triple helix from MALAT1 lncRNA.

In some embodiments, the nucleic acid molecule is located within a gene. In some embodiments, the nucleic acid molecule is located within an intron of the gene. In some embodiments, the gene is an endogenous gene in a eukaryotic cell. In some embodiments, the gene is a transgene in a eukaryotic cell.

Also provided herein are synthetic introns comprising from 5’ to 3’: (a) a 5’ splice site; (b) the nucleic acid molecule of any one of claims 1-15; and (c) a 3’ splice site. In some embodiments, the synthetic intron further comprises 3’ to the nucleic acid molecule a branch point and a polypyrimidine tract.

In some embodiments, the nucleic acid molecule comprises an escape module positioned 5’ to the effector RNA sequence. In some embodiments, the nucleic acid molecule comprises an escape module positioned 3’ to the effector RNA sequence. In some embodiments, the escape module comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, or a hairpin ribozyme. In some embodiments, the nucleic acid molecule comprises from 5’ to 3’: (a) a first escape module; (b) an effector RNA sequence; and (c) a second escape module. In some embodiments, the first escape module and the second escape module independently comprise a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence.

Also provided herein are nucleic acids comprising any one of the synthetic introns described herein. In some embodiments, the nucleic acid comprises a promoter and two or more exonic sequences flanking the synthetic intron. In some embodiments, the promoter is inducible.

Also provided herein are engineered cells comprising any one of the nucleic acid molecules or any one of the nucleic acids described herein. In some embodiments, the effector RNA sequence comprises a sgRNA and wherein the engineered cell further comprises a nucleic acid sequence that expresses a Cas polypeptide. In some embodiments, the Cas polypeptide is Cas9.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIGs. 1A-1B show exemplary schematics of synthetic introns to accomplish intron-driven gene editing. FIG. 1A shows an exemplary schematic of intron constructs used herein. All introns contain a conserved dinucleotide GT 5’ splice site, as well as a branch point sequence (BP), polypyrimidine tract (PPT), and conserved AG 3’ splice site. Stabilizing sequences from sno-lncRNA2 and EBV-sisRNA-1 were added as indicated by ‘intronic sequences’. FIG. 1B shows an exemplary schematic of intron processing post-splicing from an unstable lariat to a mature, linear sgRNA.

FIG. 2 shows rate of observed indels as a percentage of total reads in introns containing consensus splicing sequences, an HSV branchpoint sequence, sno-lncRNA sequences, and EBV-sisRNA-1 sequences. Sno-lncRNA and EBV-sisRNA-1 introns bearing the cis-cleaving ribozyme motifs exhibit robust gene editing.

FIG. 3 shows predicted secondary structure of EBV-sisRNA-1. The AAVS1T1 spacer sequence and sgRNA scaffold were inserted immediately after the hairpin motif (downstream of the 26th base from the 5’ end).

FIG. 4A shows rate of observed indels in HEK293T cells for single 5’ hammerhead ribozyme (“con-HH”) or single 3’ hepatitis delta virus ribozyme (“con-HDV”) consensus introns containing an AAVS1T1 targeting sgRNA, compared to U6-promoter expressed sgRNA (“U6”). Error bars report the range of indel rates observed for each condition (n=2). All samples collected 72 hours post-transfection.

FIG. 4B shows a schematic of single-ribozyme intronic guides. All introns contain a highly conserved dinucleotide GT 5’ splice site, as well as a branch point sequence (BP), polypyrimidine tract (PPT), and highly conserved AG 3’ splice site.

FIG. 5A shows rate of observed indels in HEK293T cells for hybrid 5’ consensus splice donor and 3’ EBV-sisRNA-1 splice acceptor intron with flanking ribozymes (“con-EBV-ribo”), or hybrid 5’ EBV-sisRNA-1 hairpin and splice donor and 3’ consensus splice acceptor intron with flanking ribozymes (“EBV-con-ribo”), compared to U6-promoter expressed sgRNA (“U6”). Error bars report the range of indel rates observed for each condition (n=2).

FIG. 5B shows a schematic of hybrid consensus/EBV-sisRNA-1 intron constructs. All introns contain a highly conserved dinucleotide GT 5’ splice site. The 3’ consensus splice acceptor contains a branch point sequence (BP), polypyrimidine tract (PPT), and highly conserved AG 3’ splice site.

FIG. 6A shows comparison of dsRed fluorescence with a transgene containing an adenoviral intron, an sgRNA-containing intron with flanking ribozymes and sno-lncRNA-2 sequences, and an sgRNA with flanking ribozymes embedded in the 3’ UTR.

FIG. 6B shows a schematic of intron constructs. ‘CMV’ refers to the constitutive cytomegalovirus RNA Pol II promoter. ‘UTR’ refers to the untranslated region of the dsRed transgene.

FIG. 6C shows observed indel rates at AAVS1T1 locus across all three constructs tested, versus a U6-promoter expressed sgRNA.

FIG. 7 shows rate of observed indels as a percentage of total reads in the AAVS1T1 locus 72 hours post doxycycline treatment.

FIG. 8A shows a schematic of intron constructs used, wherein ‘TRE’ refers to the Tet-ON expression system promoter and ‘UTR’ refers to the untranslated region of the dsRed transgene.

FIG. 8B shows representative images of dsRed and BFP fluorescence collected 49 hours post-lipofection.

FIG. 9A shows a schematic of intron constructs used, wherein ‘TRE’ refers to the Tet-ON expression system promoter, ‘ITR’ refers to the inverted terminal repeats required for PiggyBac integration, and ‘pA’ refers to the polyadenylation signal.

FIG. 9B shows observed mutation rates at the intronic hgRNA locus 71.5 hours post-induction.

FIG. 9C shows dsRed fluorescence observed 71.5 hours post-induction in cell lines transfected with either a Cas9-expressing plasmid or a pUC19 negative control.

FIG. 10A shows EBFP2N1 fluorescence versus dsRed fluorescence for singlet cells as a contour plot. Cells transfected with the scramble-guide control are shown in the blue contour plot, while cells with BFP-targeting crRNA spacers are shown in red. All flow plots are representative of two independent lipofections.

FIG. 10B shows a histogram of EBFP2N1 fluorescence intensity for singlet cells gated on high dsRed fluorescence.

FIG. 10C shows mean percentage of total events from histogram in FIG. 10B that fall in the “EBFP2N1 high” gate. Error bars report the standard deviation observed for each condition (n=2).
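The quantities named in the figure descriptions above (indel rate as a percentage of total reads, the range across replicates, and the mean with standard deviation) can be sketched as follows. This is an illustrative Python sketch only: the function names and read counts are hypothetical and are not data from the figures, and the use of the sample standard deviation is an assumption, since the text does not state which convention was used.

```python
from statistics import mean, stdev


def indel_rate_percent(indel_reads: int, total_reads: int) -> float:
    """Indel rate expressed as a percentage of total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


def summarize(rates: list[float]) -> dict[str, float]:
    """Summary values that could back error bars across replicates."""
    if len(rates) < 2:
        raise ValueError("need at least two replicates")
    return {
        "mean": mean(rates),
        "min": min(rates),   # lower end of the observed range
        "max": max(rates),   # upper end of the observed range
        "sd": stdev(rates),  # sample standard deviation (assumed convention)
    }


# Hypothetical read counts for two replicates (n=2); not data from the figures.
replicate_counts = [(150, 1000), (250, 1000)]
rates = [indel_rate_percent(indels, total) for indels, total in replicate_counts]
summary = summarize(rates)
```

With only two replicates, the range and the standard deviation carry essentially the same information, which may be why both conventions appear in the figure descriptions.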

DETAILED DESCRIPTION

Many gene-editing technologies and molecular circuits used in synthetic biology approaches depend on small nuclear ribonucleic acids (snRNAs) to drive the target process. For example, CRISPR/Cas depends on an snRNA called the single guide RNA (sgRNA) which confers specificity to the system by identifying the target DNA locus. Described herein are methods and compositions directed to producing CRISPR guide RNAs through intron splicing that can be referred to as “intron-driven gene editing”. Because guide RNAs are processed from pre-mRNA, gene editing with intronic guide RNAs can be coupled to both inducible synthetic promoters (e.g., Tet-ON) as well as native cellular events (through cis-regulatory elements that bind native transcription factors). Additionally, because splicing ligates the flanking exons of mRNA after excision of the intron, these sequences can be incorporated into native genes with minimal impact on expression. Native genes bearing these sequences can be used to drive gene editing that is conditional on cell transcriptional state, allowing for molecular recording, cell type-specific gene editing, and cell state-specific gene editing. More broadly, and incorporating applications of snRNAs and CRISPR in synthetic biology (including activation or inhibition of gene expression), the methods and compositions described herein allow coupling of synthetic genetic circuits to the native transcription machinery of eukaryotic cells. Provided herein are nucleic acid molecules that include: (a) an effector RNA; (b) an escape module; and (c) a stabilization module. Also provided herein are synthetic introns that include from 5’ to 3’: (a) a 5’ splice site; (b) any one of the nucleic acid molecules described herein; and (c) a 3’ splice site.
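As a rough illustration of the 5’ to 3’ ordering described above, the following Python sketch concatenates the named elements into a single intron sequence. Apart from the GT and AG dinucleotides at the splice sites, every sequence is a placeholder string and not a sequence from this disclosure. The names `SyntheticIntron`, `assemble`, and `check_splice_sites` are hypothetical, and the position of the stabilization module relative to the other modules is an assumed arrangement chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticIntron:
    """Elements of a synthetic intron, listed 5' to 3' (DNA alphabet)."""

    splice_donor: str          # 5' splice site; begins with the conserved GT
    first_escape: str          # e.g., a 5' self-cleaving ribozyme (placeholder)
    effector: str              # e.g., sgRNA spacer plus scaffold (placeholder)
    second_escape: str         # e.g., a 3' self-cleaving ribozyme (placeholder)
    stabilizer: str            # stabilization module (position is assumed)
    branch_point: str          # BP, 3' to the nucleic acid molecule
    polypyrimidine_tract: str  # PPT, 3' to the nucleic acid molecule
    splice_acceptor: str       # 3' splice site; ends with the conserved AG

    def assemble(self) -> str:
        """Concatenate all elements in 5' to 3' order."""
        parts = [
            self.splice_donor,
            self.first_escape,
            self.effector,
            self.second_escape,
            self.stabilizer,
            self.branch_point,
            self.polypyrimidine_tract,
            self.splice_acceptor,
        ]
        return "".join(parts)

    def check_splice_sites(self) -> bool:
        """True if the intron starts with GT and ends with AG."""
        seq = self.assemble()
        return seq.startswith("GT") and seq.endswith("AG")


# Placeholder lengths only; N and Y are IUPAC ambiguity codes, not real sequence.
example = SyntheticIntron(
    splice_donor="GT",
    first_escape="N" * 10,
    effector="N" * 20,
    second_escape="N" * 10,
    stabilizer="N" * 15,
    branch_point="N" * 7,
    polypyrimidine_tract="Y" * 12,
    splice_acceptor="AG",
)
intron_sequence = example.assemble()
```

Keeping each element as a separate field makes it straightforward to model the single-escape-module embodiments by passing an empty string for the escape module that is absent.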

Various non-limiting aspects of these methods are described herein, and can be used in any combination without limitation. Additional aspects of various components of methods for intron-driven gene editing are known in the art.

It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.

As used herein, a “cell” can refer to either a prokaryotic or eukaryotic cell, optionally obtained from a subject or a commercially available source.

As used herein, “delivering”, “gene delivery”, “gene transfer”, “transducing” can refer to the introduction of an exogenous polynucleotide into a host cell, irrespective of the method used for the introduction. Such methods include a variety of well-known techniques such as vector-mediated gene transfer (e.g., viral infection/transfection, or various other protein-based or lipid-based gene delivery complexes) as well as techniques facilitating the delivery of “naked” polynucleotides (e.g., electroporation, “gene gun” delivery and various other techniques used for the introduction of polynucleotides). The introduced polynucleotide may be stably or transiently maintained in the host cell. Stable maintenance typically requires that the introduced polynucleotide either contains an origin of replication compatible with the host cell or integrates into a replicon of the host cell such as an extrachromosomal replicon (e.g., a plasmid) or a nuclear or mitochondrial chromosome.

In some embodiments, a polynucleotide can be inserted into a host cell by a gene delivery molecule. Examples of gene delivery molecules can include, but are not limited to, liposomes; micelles; biocompatible polymers, including natural polymers and synthetic polymers; lipoproteins; polypeptides; polysaccharides; lipopolysaccharides; artificial viral envelopes; metal particles; and bacteria, or viruses, such as baculovirus, adenovirus and retrovirus, bacteriophage, cosmid, plasmid, fungal vectors and other recombination vehicles typically used in the art which have been described for expression in a variety of eukaryotic and prokaryotic hosts, and may be used for gene therapy as well as for simple protein expression.

As used herein, the term “endogenous” refers to any material growing or originating from within a cell, a tissue, or an organism.

As used herein, the term “exogenous” refers to any material introduced from or originating from outside a cell, a tissue or an organism that is not produced by or does not originate from the same cell, tissue, or organism in which it is being introduced.

As used herein, the term “expression” refers to the process by which polynucleotides are transcribed into mRNA and/or the process by which the transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. In some embodiments, if the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. The expression level of a gene may be determined by measuring the amount of mRNA or protein in a cell or tissue sample; further, the expression level of multiple genes can be determined to establish an expression profile for a particular sample.

As used herein, “modulating” can refer to modifying, regulating, or altering the endogenous gene expression in a cell. In some embodiments, modulating gene expression can include systematically influencing RNA stability and/or translation by activating or suppressing the gene expression. In some embodiments, modulation of gene expression can include stabilizing a target RNA. In some embodiments, stabilizing a target RNA can increase translation of the target RNA. In some embodiments, modulation of gene expression can include destabilizing a target RNA. In some embodiments, destabilizing a target RNA can suppress translation of the target RNA. In some embodiments, modulation of gene expression can include increasing translation of a target RNA. In some embodiments, modulation of gene expression can include suppressing translation of a target RNA. In some embodiments, the gene expression of the target RNA is upregulated. In some embodiments, the gene expression of the target RNA is downregulated.

As used herein, “nucleic acid” or “nucleic acid molecule” is used to include any compound and/or substance that comprises a polymer of nucleotides. In some embodiments, a polymer of nucleotides is referred to as a polynucleotide. Exemplary nucleic acids or polynucleotides can include, but are not limited to, ribonucleic acids (RNAs), deoxyribonucleic acids (DNAs), threose nucleic acids (TNAs), glycol nucleic acids (GNAs), peptide nucleic acids (PNAs), locked nucleic acids (LNAs, including LNA having a β-D-ribo configuration, α-LNA having an α-L-ribo configuration (a diastereomer of LNA), 2’-amino-LNA having a 2’-amino functionalization, and 2’-amino-α-LNA having a 2’-amino functionalization) or hybrids thereof. Naturally-occurring nucleic acids generally have a deoxyribose sugar (e.g., found in deoxyribonucleic acid (DNA)) or a ribose sugar (e.g., found in ribonucleic acid (RNA)).

A nucleic acid can contain nucleotides having any of a variety of analogs of these sugar moieties that are known in the art. A deoxyribonucleic acid (DNA) can have one or more bases selected from the group consisting of adenine (A), thymine (T), cytosine (C), or guanine (G), and a ribonucleic acid (RNA) can have one or more bases selected from the group consisting of uracil (U), adenine (A), cytosine (C), or guanine (G).
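The base alphabets listed above can be illustrated with a short Python sketch that checks whether a sequence string is consistent with the DNA or RNA base set and converts a DNA string to its RNA equivalent. The function names are hypothetical, and the sketch deliberately handles only the four canonical bases of each type; modified bases and sugar analogs, which the definitions above also allow, are outside its scope.

```python
DNA_BASES = frozenset("ATCG")  # adenine, thymine, cytosine, guanine
RNA_BASES = frozenset("UACG")  # uracil, adenine, cytosine, guanine


def classify(seq: str) -> str:
    """Return 'DNA', 'RNA', 'either', or 'invalid' for a sequence string."""
    bases = set(seq.upper())
    if not bases:
        return "invalid"  # an empty string carries no base information
    is_dna = bases <= DNA_BASES
    is_rna = bases <= RNA_BASES
    if is_dna and is_rna:
        return "either"   # contains only A, C, G, so the alphabet is ambiguous
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "invalid"


def transcribe(dna: str) -> str:
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    if classify(dna) not in ("DNA", "either"):
        raise ValueError("input is not a canonical DNA sequence")
    return dna.upper().replace("T", "U")
```

For example, `classify("GATTACA")` reports a DNA alphabet, while a string mixing T and U is reported as invalid because no single canonical alphabet contains both.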

In some embodiments, the term “nucleic acid” or “nucleic acid molecule” refers to a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or a combination thereof, in either a single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses complementary sequences as well as the sequence explicitly indicated. In some embodiments of any of the isolated nucleic acids described herein, the isolated nucleic acid is DNA. In some embodiments of any of the isolated nucleic acids described herein, the isolated nucleic acid is RNA.

As used herein, the terms “nucleotides” and “nt” are used interchangeably to generally refer to biological molecules that comprise nucleic acids. Nucleotides can have moieties that contain the known purine and pyrimidine bases. Nucleotides may have other heterocyclic bases that have been modified. Such modifications include, e.g., methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses, or other heterocycles. The terms “polynucleotides,” “nucleic acid,” and “oligonucleotides” can be used interchangeably. They can refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise non-naturally occurring sequences. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

Intron-Driven Gene Editing

Provided herein are nucleic acid molecules comprising: (a) an effector RNA; (b) an escape module; and (c) a stabilization module.

Effector RNA

As used herein, an “effector RNA” refers to an RNA sequence that expresses a small nuclear ribonucleic acid (snRNA). In some embodiments, the effector RNA comprises a sequence encoding a small nuclear RNA (snRNA). As used herein, a “small nuclear RNA” or “short nuclear RNA” refers to a small RNA molecule that can be found in the cell nucleus in eukaryotic cells. The length of an snRNA can be about 150 nucleotides (e.g., about 10 nucleotides, about 20 nucleotides, about 30 nucleotides, about 40 nucleotides, about 50 nucleotides, about 60 nucleotides, about 70 nucleotides, about 80 nucleotides, about 90 nucleotides, about 100 nucleotides, about 110 nucleotides, about 120 nucleotides, about 130 nucleotides, about 140 nucleotides, about 160 nucleotides, about 180 nucleotides, or about 200 nucleotides). In some embodiments, the effector RNA comprises a CRISPR guide RNA (gRNA), a small RNA, a complementary RNA, a nuclear RNA, an interfering RNA, or a homing CRISPR gRNA. In some embodiments, the effector RNA comprises an ADAR (adenosine deaminases acting on RNA) guide RNA, an Argonaute small RNA guide, or a toe-hold RNA sensor. In some embodiments, the effector RNA comprises a Cas13 CRISPR-RNA (crRNA). In some embodiments, the effector RNA comprises a homing guide RNA (hgRNA). In some embodiments, the effector RNA comprises a small RNA sequence. In some embodiments, the effector RNA comprises one small RNA sequence. In some embodiments, the effector RNA comprises one or more (e.g., one, two, three, four, five, six, seven, eight, nine, or ten) small RNA sequences.

In some embodiments, the nucleic acid molecule can include a short nuclear RNA, wherein the short nuclear RNA comprises a single guide RNA (sgRNA). In some embodiments, the short nuclear RNA can include a guide RNA (gRNA). As used herein, the term “gRNA” or “guide RNA” refers to the guide RNA sequences used to target specific genes for correction employing the CRISPR technique. As used herein, the term “CRISPR” refers to a technique of sequence specific genetic manipulation relying on the clustered regularly interspaced short palindromic repeats pathway, which unlike RNA interference regulates gene expression at a transcriptional level. Techniques of designing gRNAs and donor therapeutic polynucleotides for target specificity are well known in the art. See, for example, Doench, J., et al. Nature biotechnology 2014; 32(12): 1262-7 and Graham, D., et al. Genome Biol. 2015; 16: 260. As used herein, the term “single guide RNA” or “sgRNA” is a specific type of gRNA that combines tracrRNA (transactivating RNA), which binds to Cas9 to activate the complex to create the necessary strand breaks, and crRNA (CRISPR RNA), comprising complementary nucleotides to the tracrRNA, into a single RNA construct. Exemplary methods of employing the CRISPR technique are described in WO 2017/091630, which is incorporated by reference in its entirety. In some embodiments, an sgRNA comprises a homing guide RNA (hgRNA).

In some embodiments, the single guide RNA can recognize a target RNA, for example, by hybridizing to the target RNA. In some embodiments, the single guide RNA comprises a sequence that is complementary to the target RNA. In some embodiments, the sgRNA can include one or more modified nucleotides. In some embodiments, the sgRNA has a length that is about 10 nt (e.g., about 20 nt, about 30 nt, about 40 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 120 nt, about 140 nt, about 160 nt, about 180 nt, about 200 nt, about 300 nt, about 400 nt, about 500 nt, about 600 nt, about 700 nt, about 800 nt, about 900 nt, about 1000 nt, or about 2000 nt).

In some embodiments, a single guide RNA can recognize a variety of RNA targets. For example, a target RNA can be messenger RNA (mRNA), ribosomal RNA (rRNA), signal recognition particle RNA (SRP RNA), transfer RNA (tRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), antisense RNA (aRNA), long noncoding RNA (lncRNA), microRNA (miRNA), piwi-interacting RNA (piRNA), small interfering RNA (siRNA), short hairpin RNA (shRNA), retrotransposon RNA, viral genome RNA, or viral noncoding RNA. In some embodiments, a target RNA can be an RNA involved in pathogenesis of conditions such as cancers, neurodegeneration, cutaneous conditions, endocrine conditions, intestinal diseases, infectious conditions, neurological conditions, liver diseases, heart disorders, congenital disease, genetic diseases, or autoimmune diseases. In some embodiments, a target RNA can be a therapeutic target for conditions such as cancers, neurodegeneration, cutaneous conditions, endocrine conditions, intestinal diseases, infectious conditions, neurological conditions, liver diseases, heart disorders, or autoimmune diseases.

Escape module

As used herein, an “escape module” refers to a nucleic acid sequence (e.g., RNA or DNA) with a catalytic activity, wherein the escape module can specifically cut an RNA fragment (e.g., RNA splicing). In some embodiments, the escape module comprises a cleavable RNA sequence. In some embodiments, the cleavable RNA sequence is at a 5’ end of the nucleic acid molecule. In some embodiments, the cleavable RNA sequence is at a 3’ end of the nucleic acid molecule. In some embodiments, the escape module comprises a first cleavable RNA sequence at a 5’ end of the nucleic acid molecule and a second cleavable RNA sequence at a 3’ end of the nucleic acid molecule.

In some embodiments, the cleavable RNA sequence comprises a self-cleaving RNA sequence. In some embodiments, the self-cleaving RNA sequence comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, a tRNA sequence, a HH9 human intronic hammerhead ribozyme sequence, a HH10 human intronic hammerhead ribozyme sequence, a human CPEB3 HDV-like ribozyme sequence, or any other cis-cleaving RNA sequence. Additional self-cleaving RNA sequences are well known in the art. In some embodiments, the cleavable RNA sequence comprises a sequence recognized by a site-specific endoribonuclease, wherein the site-specific endoribonuclease comprises Csy4, Argonaute-family proteins, Cas13, Cse3, or Cas6. In some embodiments, the site-specific endoribonuclease comprises LwCas13a. Additional site-specific endoribonucleases are well known in the art.

In some embodiments, the cleavable RNA sequence can include a ribozyme sequence. Ribozymes are RNA molecules that can break and form covalent bonds within a nucleic acid molecule, wherein these molecules can also bind specifically to and cleave an mRNA substrate. Ribozymes can also catalyze specific biochemical reactions, including RNA splicing in gene expression. In some embodiments, a ribozyme can include a GIR1 branching ribozyme, hairpin ribozyme, Hammerhead ribozyme, HDV ribozyme, RNase P, Group I self-splicing intron, Group II self-splicing intron, twister ribozyme, VS ribozyme, Hatchet ribozyme, HH9 human intronic hammerhead ribozyme, HH10 human intronic hammerhead ribozyme, human CPEB3 HDV-like ribozyme, or rRNA. In some embodiments, the nucleic acid molecule can include a ribozyme sequence. In some embodiments, the nucleic acid molecule further comprises a hammerhead (HH) ribozyme and a hepatitis delta virus (HDV) ribozyme.

In some embodiments, the cleavable RNA sequence can include a tRNA sequence. Transfer RNA (tRNA) is a type of RNA molecule that is transcribed as a precursor molecule before becoming functional for protein synthesis. tRNA splicing occurs when introns are first excised by a tRNA-splicing endonuclease and exons are subsequently sealed by an RNA ligase. In some embodiments, the nucleic acid molecule can include a tRNA sequence. In some embodiments, the nucleic acid molecule can further include a tRNA sequence positioned at both 5’ and 3’ ends of the effector RNA sequence.

Stabilization module

As used herein, a “stabilization module” refers to a non-coding nucleic acid sequence that plays a role in stabilizing an intronic sequence. In some embodiments, the nucleic acid molecule can further include a stabilizing module, wherein the stabilizing module stabilizes the nucleic acid molecule in the nucleus of the cell. In some embodiments, a stabilizing module can play a role in retaining the structural integrity of an RNA molecule and resist degradation. In some embodiments, the stabilizing module can stabilize an intronic RNA in the nucleus of a cell. In some embodiments, the stabilizing module can stabilize an sgRNA in an intronic region of an endogenous gene of a cell. In some embodiments, the stabilizing module can enhance the activity of the intronic sgRNA. In some embodiments, the stabilizing module can comprise an intronic sequence from sno-lncRNAs or EBV-sisRNAs. In some embodiments, a stabilizing module can include an RNA secondary structure that conceals the end moieties targeted by an exoribonuclease.

Epstein-Barr virus stable intronic-sequence RNAs (EBV-sisRNAs) are a class of noncoding RNAs generated by repeat introns in the Epstein-Barr virus. In EBV, sisRNAs can be generated from a region known as the W repeats, wherein the short W repeat intron, rather than being spliced and rapidly degraded, persists after splicing of the introns. Nucleotides 4 to 26 of EBV-sisRNA-1 form a short hairpin loop that presents a Uridine-rich sequence motif (a possible platform for protein interactions) into the loop. The remainder of the sequence is unlikely to form stable RNA structure, such that this unstructured stretch of sequence may be exposed to allow for interactions with nucleic acids or other proteins. In some embodiments, the nucleic acid molecule can further include a hairpin motif of the Epstein Barr Virus (EBV) sisRNA-1, wherein the hairpin motif is positioned at a 5’ position relative to the sequence encoding the short nuclear RNA. In some embodiments, the stabilization module comprises an RNA stabilizing sequence comprising Epstein Barr Virus (EBV)-sisRNA-1, sno-lncRNA2, exoribonuclease-resistant triplex, triple helix from MALAT1 lncRNA, or similarly stabilizing RNA sequences. In some embodiments, an EBV-sisRNA may include a non-canonical branch point sequence. In some embodiments, an EBV-sisRNA comprises a polypyrimidine tract that includes CA dinucleotides.

In some embodiments, the nucleic acid molecule is located within a gene. In some embodiments, the nucleic acid molecule is located within an intron of the gene. In some embodiments, the gene is an endogenous gene. In some embodiments, the gene is a transgene. In some embodiments, the gene is an endogenous gene in a eukaryotic cell. In some embodiments, the gene is a transgene in a eukaryotic cell.

In some embodiments, the nucleic acid molecule can further include a dinucleotide GT 5’ splice site, a branch point sequence, a polypyrimidine tract (PPT), an AG 3’ splice site, a spacer sequence, or any combinations thereof. As used herein, a “splice site” refers to a conserved sequence that can be found at the 5’ and 3’ ends of introns. In some embodiments, a 5’ splice site is a dinucleotide GT 5’ splice site. In some embodiments, the 3’ splice site is an AG 3’ splice site. As used herein, a “branch point sequence” refers to a cis-acting intronic sequence that can be located upstream from the 3’ end of an intron. In some embodiments, the branch point sequence can be located from 18 to 40 nucleotides (e.g., from 18 to 35 nucleotides, from 18 to 30 nucleotides, from 18 to 25 nucleotides, from 18 to 20 nucleotides, from 20 to 40 nucleotides, from 20 to 35 nucleotides, from 20 to 30 nucleotides, from 20 to 25 nucleotides, from 25 to 40 nucleotides, from 25 to 35 nucleotides, from 25 to 30 nucleotides, from 30 to 40 nucleotides, from 30 to 35 nucleotides, or from 35 to 40 nucleotides) upstream from the 3’ end of an intron. As used herein, a “polypyrimidine tract (PPT)” refers to a region of pre-mRNA that promotes the assembly of the spliceosome, a protein complex that carries out RNA splicing. The PPT sequence is rich with pyrimidine nucleotides, especially uracil, and is about 15 to 20 nucleotides (e.g., about 10 to 25, about 10 to 20, about 10 to 15, about 15 to 25, or about 20 to 25) long, located from about 5 to 40 nucleotides (e.g., about 5 to 45 nucleotides) upstream from the 3’ end of the intron to be spliced.
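By way of non-limiting illustration, the splicing elements defined above can be checked on a candidate intron sequence with a short Python sketch. The function name, the dictionary keys, the 40-nucleotide search window, and the 10-nucleotide minimum tract length are assumptions chosen for this example. The example sequence joins the first eight nucleotides and the 3’ portion of SEQ ID NO: 1 as transcribed from the listing. Treating TACTAAC as the branch point motif is likewise an assumption of this sketch, and the reported distance is measured from the first base of the motif to the 3’ end of the intron.

```python
def check_intron_elements(intron, branch_motif=None, ppt_window=40, min_ppt_len=10):
    """Report simple sequence features of a candidate intron (DNA alphabet)."""
    seq = "".join(intron.split()).upper()

    # Longest uninterrupted run of pyrimidines (C/T) within the last
    # `ppt_window` nucleotides that precede the terminal AG dinucleotide.
    tail = seq[-(ppt_window + 2):-2]
    longest = run = 0
    for base in tail:
        run = run + 1 if base in "CT" else 0
        longest = max(longest, run)

    report = {
        "gt_5_splice_site": seq.startswith("GT"),
        "ag_3_splice_site": seq.endswith("AG"),
        "longest_pyrimidine_run": longest,
        "has_ppt": longest >= min_ppt_len,  # threshold is an assumption
    }
    if branch_motif is not None:
        # Distance from the first base of the last motif occurrence to the 3' end.
        pos = seq.rfind(branch_motif.upper())
        report["branch_to_3_end"] = None if pos < 0 else len(seq) - pos
    return report


# 5' end and 3' portion of SEQ ID NO: 1, joined for illustration only.
EXAMPLE = "GTAAGTGG" + "TACTAACTTCGAGTCTTCTTTTTTTTTTTCACAG"
print(check_intron_elements(EXAMPLE, branch_motif="TACTAAC"))
```

Such a check only confirms that the sequence features are present; it does not predict whether a given intron will be spliced in a cell.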

In some embodiments, the exogenous nucleic acid molecule can comprise a sequence of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, or any combinations thereof.

SEQ ID NO: 1 - Consensus Intron + Ribozymes

GTAAGTGGGGACCTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTCGTCCCCT CCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTT ATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTGGCCGGCATGGTCCCAGCCTC CTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTACTAACTTC GAGTCTTCTTTTTTTTTTTCACAG

SEQ ID NO: 2 - Consensus Intron + tRNA

GTAAGTGGCTCGTTGGGAGGTCCCGGGTTGAAATCCCGGACGAGCCCGTCCCCTCCA CCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATC AACTTGAAAAAGTGGCACCGAGTCGGTGCAACCAGTTTGTGTCGGCTCGTTGGGAG GTCCCGGGTTGAAATCCCGGACGAGCCCTACTAACTTCGAGTCTTCTTTTTTTTTTTC ACAG

SEQ ID NO: 3 - Sno-lncRNA2 Intron + Ribozymes

GTAAGTGTTCATTTCTCAAAAGACCCTAATGTTCTTCCTTTACAGGAATGAATACTG T GCATGGACCAATGATGACTTCCATACATGCATTCCTTGGAAAGCTGAACAAAATGA GTGGGAACTCTGTACTATCATCTTAGTTGAACTGAGGTCCGGATCCGGGGACCTGAT GAGTCCGTGAGGACGAAACGAGTAAGCTCGTCGTCCCCTCCACCCCACAGTGGTTTT AGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTG GCACCGAGTCGGTGCTTTTGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTG GGCAACATGCTTCGGCATGGCGAATGGGACTCTAGATGGATCGATGATGACTTCCAT ATATACATTCCTTGGAAAGCTGAACAAAATGAGTGAAAACTCTATACCGTCATTCTC GTCGAACTGAGGTCCAACCGGTGCACATTACTCCAACAGGGGCTAGACAGAGAGGG

CCAACATTGATTCGTTGACATGGGTGGCTGCAGTACTAACTTCGAGTCTTCTTTTTT T TTTTCACAG

SEQ ID NO: 4 - Sno-lncRNA2 Intron + tRNA

GTAAGTGTTCATTTCTCAAAAGACCCTAATGTTCTTCCTTTACAGGAATGAATACTG T

GCATGGACCAATGATGACTTCCATACATGCATTCCTTGGAAAGCTGAACAAAATGA

GTGGGAACTCTGTACTATCATCTTAGTTGAACTGAGGTCCGGATCCGGCTCGTTGGG

AGGTCCCGGGTTGAAATCCCGGACGAGCCCGTCCCCTCCACCCCACAGTGGTTTTAG

AGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGC

ACCGAGTCGGTGCAACCAGTTTGTGTCGGCTCGTTGGGAGGTCCCGGGTTGAAATCC

CGGACGAGCCCTCTAGATGGATCGATGATGACTTCCATATATACATTCCTTGGAAAG

CTGAACAAAATGAGTGAAAACTCTATACCGTCATTCTCGTCGAACTGAGGTCCAACC

GGTGCACATTACTCCAACAGGGGCTAGACAGAGAGGGCCAACATTGATTCGTTGAC

ATGGGTGGCTGCAGTACTAACTTCGAGTCTTCTTTTTTTTTTTCACAG

SEQ ID NO: 5 - HSV Branchpoint Intron + Ribozymes

GTAAGTGGGGACCTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTCGTCCCCT

CCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTT

ATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTGGCCGGCATGGTCCCAGCC

TCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACGAGGGAG

TCGAGTCTTCTTTTTTTTTTTCACAG

SEQ ID NO: 6 - HSV Branchpoint Intron + tRNA

GTAAGTGGCTCGTTGGGAGGTCCCGGGTTGAAATCCCGGACGAGCCCGTCCCCTCCA

CCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATC

AACTTGAAAAAGTGGCACCGAGTCGGTGCAACCAGTTTGTGTCGGCTCGTTGGGAG

GTCCCGGGTTGAAATCCCGGACGAGCCCGAGGGAGTCGAGTCTTCTTTTTTTTTTTC A CAG

SEQ ID NO: 7 - EBV-sisRNA-1 Intron + Ribozymes

GTAAGTGGACTTTAATTTTTTCTGCTGGGGACCTGATGAGTCCGTGAGGACGAAACG

AGTAAGCTCGTCGTCCCCTCCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTT

AAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTGG

CCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGC

GAATGGGACAAGCCCAACACTCCACCACACCCAGGCACACACTACACACACCCACC CGTCTCAG

SEQ ID NO: 8 - EBV-sisRNA-1 Intron + tRNA

GTAAGTGGACTTTAATTTTTTCTGCTGGCTCGTTGGGAGGTCCCGGGTTGAAATCCC

GGACGAGCCCGTCCCCTCCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTA

AAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAACCAG

TTTGTGTCGGCTCGTTGGGAGGTCCCGGGTTGAAATCCCGGACGAGCCCAAGCCCAA

CACTCCACCACACCCAGGCACACACTACACACACCCACCCGTCTCAG

SEQ ID NO: 9 - EBV-sisRNA-1 Intron with sgRNA scaffold only

GTAAGTGGACTTTAATTTTTTCTGCTGTCCCCTCCACCCCACAGTGGTTTTAGAGCT A

GAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGA

GTCGGTGCAAGCCCAACACTCCACCACACCCAGGCACACACTACACACACCCACCC GTCTCAG

SEQ ID NO: 10 - Hammerhead Ribozyme

CTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTC

SEQ ID NO: 11 - Hepatitis Delta Virus Ribozyme

GGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATG GCGAATGGGAC

SEQ ID NO: 12 - AC55G tRNA

GGCTCGTTGGGAGGTCCCGGGTTGAAATCCCGGACGAGCCC

SEQ ID NO: 13 - EBV-sisRNA-1

GTAAGTGGACTTTAATTTTTTCTGCTAAGCCCAACACTCCACCACACCCAGGCACAC ACTACACACACCCACCCGTCTCAG

SEQ ID NO: 14 - AAVS1T1 sgRNA

GTCCCCTCCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTA GTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
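By way of non-limiting illustration, the modular layout of the listed constructs can be inspected with a short Python sketch that locates the hammerhead ribozyme (SEQ ID NO: 10), the AAVS1T1 sgRNA (SEQ ID NO: 14), and the hepatitis delta virus ribozyme (SEQ ID NO: 11) within the consensus intron construct (SEQ ID NO: 1). The sequences are transcribed from the listing above with line-wrap spaces removed. The function name and the 1-based inclusive coordinate convention are assumptions of this example.

```python
def locate_modules(construct, modules):
    """Return {name: (start, end)} in 1-based inclusive coordinates.

    A module that does not occur in the construct maps to None.
    """
    seq = "".join(construct.split()).upper()
    hits = {}
    for name, module in modules.items():
        query = "".join(module.split()).upper()
        i = seq.find(query)  # first occurrence only
        hits[name] = None if i < 0 else (i + 1, i + len(query))
    return hits


SEQ_1 = (  # SEQ ID NO: 1 - Consensus Intron + Ribozymes
    "GTAAGTGGGGACCTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTCGTCCCCT"
    "CCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTT"
    "ATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTGGCCGGCATGGTCCCAGCCTC"
    "CTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTACTAACTTC"
    "GAGTCTTCTTTTTTTTTTTCACAG"
)

MODULES = {
    "HH": "CTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTC",  # SEQ ID NO: 10
    "sgRNA": (  # SEQ ID NO: 14
        "GTCCCCTCCACCCCACAGTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTA"
        "GTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC"
    ),
    "HDV": (  # SEQ ID NO: 11
        "GGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATG"
        "GCGAATGGGAC"
    ),
}

for name, span in locate_modules(SEQ_1, MODULES).items():
    print(name, span)
```

The same function can be applied to the other listed constructs, for example by supplying SEQ ID NO: 12 as a module when inspecting the tRNA-flanked introns.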

Synthetic Intron

Provided herein are synthetic introns that include from 5’ to 3’ : (a) a 5’ splice site; (b) any one of the nucleic acid molecules described herein; and (c) a 3’ splice site. In some embodiments, the synthetic intron further comprises 3’ to the nucleic acid molecule a branch point and a polypyrimidine tract. In some embodiments, the nucleic acid molecule comprises an escape module positioned 5’ to the effector RNA sequence. In some embodiments, the nucleic acid molecule comprises an escape module positioned 3’ to the effector RNA sequence. In some embodiments, the nucleic acid molecule comprises a first escape module at a 5’ end of the nucleic acid molecule and a second escape module at a 3’ end of the nucleic acid molecule. In some embodiments, the escape module comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, or a hairpin ribozyme.

In some embodiments, the nucleic acid molecule comprises from 5’ to 3’ : (a) a first escape module; (b) an effector RNA sequence; and (c) a second escape module. In some embodiments, the first escape module and the second escape module independently comprise a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence. In some embodiments, the first escape module comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence. In some embodiments, the second escape module comprises a hammer-head ribozyme sequence, a twister ribozyme sequence, a twister-sister ribozyme sequence, a hatchet ribozyme, a HDV-like ribozyme sequence, a pistol ribozyme, a hairpin ribozyme, or a tRNA sequence. In some embodiments, the first escape module, the effector RNA sequence, and the second escape module are positioned between the dinucleotide GT 5’ splice site and the branch point sequence. In some embodiments, the first escape module comprises a hammer head ribozyme. In some embodiments, the second escape module comprises a hepatitis delta virus ribozyme. In some embodiments, the hammer head ribozyme is at a 5’ position and the hepatitis delta virus ribozyme is at a 3’ position relative to the effector RNA sequence, wherein the hammer head ribozyme, the effector RNA sequence, and the hepatitis delta virus ribozyme are positioned between the dinucleotide GT 5’ splice site and the branch point sequence.

In some embodiments, the escape module is at a 5’ position relative to the effector RNA sequence. In some embodiments, the hammer head ribozyme is at a 5’ position relative to the effector RNA sequence, wherein the hammer head ribozyme and the effector RNA sequence are positioned between the dinucleotide GT 5’ splice site and the branch point sequence. In some embodiments, the escape module is at a 3’ position relative to the effector RNA sequence. In some embodiments, the hepatitis delta virus ribozyme is at a 3’ position relative to the effector RNA sequence, wherein the hepatitis delta virus ribozyme and the effector RNA sequence are positioned between the dinucleotide GT 5’ splice site and the branch point sequence.

In some embodiments, the first escape module comprises a tRNA sequence. In some embodiments, the second escape module comprises a tRNA sequence. In some embodiments, the tRNA sequences and the effector RNA sequence are positioned between the dinucleotide GT 5’ splice site and the branch point sequence.

Also provided herein are nucleic acids comprising any one of the synthetic introns described herein. In some embodiments, the nucleic acid comprises a promoter and two or more exonic sequences flanking the synthetic intron. In some embodiments, the promoter is inducible. Non-limiting exemplary promoters can include CMV, CBA, CAG, Cbh, EF-1a, PGK, UBC, GUSB, UCOE, hAAT, TBG, Desmin, MCK, C5-12, NSE, Synapsin, PDGF, MecP2, CaMKII, mGluR2, NFL, NFH, n02, PPE, ENK, EAAT2, GFAP, MBP, Tet-ON, and U6 promoters.

Also provided herein are engineered cells comprising any one of the nucleic acid molecules described herein or any one of the nucleic acids described herein. In some embodiments, the effector RNA sequence comprises a sgRNA and wherein the engineered cell further comprises a nucleic acid sequence that expresses a Cas polypeptide. In some embodiments, the Cas polypeptide is Cas9. In some embodiments, a method of intron-driven gene editing can include delivering a nucleic acid molecule and a gene-editing agent into a cell. As used herein, the term “gene-editing agent” can refer to an agent that allows for changing the DNA or RNA (e.g., mRNA) in the genome. In some embodiments, gene-editing can include insertion, deletion, modification, or replacement of the DNA or RNA. In some embodiments, a gene-editing agent can include a nuclease-based gene editing platform. In some embodiments, a gene-editing agent can include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), engineered meganucleases, or a clustered regularly interspaced short palindromic repeats (CRISPR) system. In some embodiments, a gene-editing agent can include RNA base editors (e.g., ADAR) or DNA base editors (e.g., target-AID base editor and dddA-derived cytosine base editor). In some embodiments, a gene-editing agent can include a CRISPR-associated transposase. In some embodiments, a gene-editing agent can include RNA interference (e.g., short hairpin RNA (shRNA), small interfering RNA (siRNA), antisense oligonucleotide (ASO), or microRNA mimics). In some embodiments, the gene-editing agent can include an RNA-guided gene targeting system. In some embodiments, the gene-editing agent can include CRISPR components. For example, in some embodiments, CRISPR components can include, but are not limited to, a guide RNA and a CRISPR-associated endonuclease (Cas protein).
As used herein, a “CRISPR-associated endonuclease” or “CRISPR-associated protein” can refer to an enzyme or protein that uses CRISPR sequences as a guide to recognize and cleave specific nucleic acid strands that are complementary to the CRISPR sequence. In some embodiments, a gene-editing agent can include a CRISPR-associated protein. In some embodiments, the gene-editing agent can be a Cas9 endonuclease that makes a double-stranded break in a target DNA sequence. In some embodiments, the gene-editing agent can include a Cas9 protein. In some embodiments, the gene-editing agent can be a Cas12 (e.g., Cas12a) nuclease that also makes a double-stranded break in a target DNA sequence. In some embodiments, a gene-editing agent can irreversibly knock out the target gene via double strand breaks. In some embodiments, a gene-editing agent that irreversibly knocks out the target gene can be a Cas9 nuclease which targets RNA. In some embodiments, a gene-editing agent can transiently reduce the expression of a target gene. In some embodiments, a gene-editing agent that transiently reduces the expression of a target gene can include a Cas protein that targets RNA, including but not limited to Cas9, Cas13, Cmr, and Csm systems. As used herein, “delivering” refers to the introduction of an exogenous polynucleotide into a cell. Such methods include a variety of well-known techniques such as vector-mediated gene transfer (e.g., viral infection/transfection, or various other protein-based or lipid-based gene delivery complexes). In some embodiments, the delivering comprises transfection, electroporation, or a virus-based delivery.

In some embodiments, nucleic acid molecules may be inserted into an expression vector or viral vector by methods known to the art, and the nucleic acid molecules may be operably linked to an expression control sequence. Non-limiting examples of expression vectors include plasmid vectors, transposon vectors, cosmid vectors, and viral vectors (e.g., any adenoviral vectors (AV), cytomegaloviral (CMV) vectors, simian viral (SV40) vectors, adeno-associated virus (AAV) vectors, lentiviral vectors, and retroviral vectors). Additional sequences can be added to such cloning and/or expression sequences to optimize their function in cloning and/or expression, to aid in isolation of the polynucleotide, or to improve the introduction of the polynucleotide into a cell. Use of cloning vectors, recombinant vectors, adapters, and linkers is well known in the art. In some embodiments, the expression vector is a viral vector. In some embodiments, the short nuclear RNA produced in the cell is expressed by using a mammalian expression vector. In some embodiments, the mammalian expression vector comprises a promoter.

In some embodiments, the cell can be a eukaryotic cell. As used herein, the term “eukaryotic cell” refers to a cell having a distinct, membrane-bound nucleus. Such cells may include, for example, mammalian (e.g., rodent, non-human primate, or human), insect, fungal, or plant cells. In some embodiments, the eukaryotic cell is a yeast cell, such as Saccharomyces cerevisiae. In some embodiments, the eukaryotic cell is a higher eukaryote, such as mammalian, avian, plant, or insect cells. Non-limiting examples of mammalian cells include Chinese hamster ovary cells and human embryonic kidney cells (e.g., HEK293 cells). In some embodiments, the cell is from a HEK293FT or a HCT116 cell line.

In some embodiments, a method of intron-driven gene editing can further include detecting gene-editing of a target RNA in a cell, wherein a short nuclear RNA that is targeted to the target RNA is produced in the cell. In some embodiments, a short nuclear RNA can be targeted to a target RNA of a cell. In some embodiments, the short nuclear RNA is targeted to the AAVS1T1 gene. In some embodiments, the detecting step can include amplifying and sequencing of the target RNA. In some embodiments, the detecting can further include analyzing a mutation rate of the target RNA.

EXAMPLES

The disclosure is further described in the following examples, which do not limit the scope of the disclosure described in the claims.

Example 1 - Ribozyme-mediated intron-driven gene editing

A synthetic intron to accomplish ribozyme-mediated intron-driven gene editing is described here (FIG. 1A). The intron comprises requisite sequences for the recruitment of cellular splicing machinery and after splicing, the produced lariat is further processed by cis-cleaving ribozymes (FIG. 1B). Here, an sgRNA flanked by a hammerhead (HH) ribozyme on the 5’ side and a hepatitis delta virus (HDV) ribozyme on the 3’ side was inserted between the 5’ splice site sequence and the branch point of a gene’s intron (Table 1). The sgRNA used in this case targeted the AAVS1T1 locus of the human genome. HEK293T cells were transfected with a mammalian expression vector containing a CMV promoter-driven dsRed gene (cloned from Addgene #24218) interrupted by this intron and cultured for nine days post-transfection. Indel rate was measured by amplifying and sequencing the AAVS1T1 locus targeted by the intronic single guide RNA. The indel rate is also shown for a U6 promoter-expressed sgRNA as a positive control. Introducing this synthetic intron together with Cas9 protein into HEK293T cells resulted in editing of the AAVS1T1 locus (FIG. 2), thus demonstrating gene editing capability using the synthetic intron. These results suggest that this synthetic intron undergoes cis-cleavage after transcription and splicing, releasing a linear sgRNA.
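By way of non-limiting illustration, an indel rate of the kind reported in this example can be estimated from amplicon reads with a deliberately simplified Python sketch. The disclosure does not specify the analysis used, so the approach below is an assumption: every read is presumed to span the full amplicon after primer trimming, and a read is counted as carrying an indel when its length differs from that of the unedited reference. The function name and the toy sequences are hypothetical.

```python
def indel_rate(reference, reads):
    """Fraction of reads whose length differs from the reference amplicon.

    Assumes each read covers the whole amplicon. Balanced edits (for
    example, a 1-nt insertion plus a 1-nt deletion) and substitutions
    are not detected by this length-only criterion.
    """
    if not reads:
        raise ValueError("no reads supplied")
    ref_len = len(reference)
    edited = sum(1 for read in reads if len(read) != ref_len)
    return edited / len(reads)


reference = "ACGTACGTAC"                      # hypothetical unedited amplicon
reads = [
    reference,                                # unedited
    reference[:4] + reference[5:],            # 1-nt deletion
    reference[:5] + "T" + reference[5:],      # 1-nt insertion
    reference,                                # unedited
]
print(indel_rate(reference, reads))
```

In practice, alignment-based analysis around the expected cut site would be expected to give a more accurate estimate than this length comparison.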

Example 2 - tRNA-mediated intron-driven gene editing

Here, an sgRNA flanked by a tRNA on each side was inserted between the 5’ splice site sequence and the branch point of a gene’s intron (Table 1). The sgRNA used in this case targeted the AAVS1T1 locus of the human genome. HEK293T cells were transfected with a mammalian expression vector containing a CMV promoter-driven dsRed gene (cloned from Addgene #24218) interrupted by this intron and cultured for nine days post-transfection. Indel rate was measured by amplifying and sequencing the AAVS1T1 locus targeted by the intronic single guide RNA. The indel rate is also shown for a U6 promoter-expressed sgRNA as a positive control. Introducing this synthetic intron together with Cas9 protein into HEK293T cells resulted in editing of the AAVS1T1 locus (FIG. 2), thus demonstrating gene editing capability using the synthetic intron. These results suggest that flanking tRNA sequences are cleaved by the endogenous tRNA processing enzymes RNase P and RNase Z after transcription and splicing, releasing a linear functional sgRNA. Of note, the tRNA sequences used herein contained a single base mutation to disable the intrinsic RNA Pol III promoter activity of tRNA sequences. Therefore, any gene-editing activity observed here can be attributed to the promoter of the driver gene.

Example 3 - Epstein Barr Virus (EBV) sisRNA-1 in intron-driven gene editing

An sgRNA was inserted between the 5’ splice site and the branch point of the EBV-sisRNA-1 to create a synthetic intron (Table 1). The sgRNA used here targeted the AAVS1T1 locus of the human genome. HEK293T cells were transfected with a mammalian expression vector containing a CMV promoter-driven dsRed gene (cloned from Addgene #24218) interrupted by the indicated intron and cultured for nine days post-transfection. Indel rate was measured by amplifying and sequencing the AAVS1T1 locus targeted by the intronic single guide RNA. The indel rate is also shown for a U6 promoter-expressed sgRNA as a positive control. Sno-lncRNA and EBV-sisRNA-1 introns bearing the cis-cleaving ribozyme motifs exhibit robust gene editing. Introducing this synthetic intron as part of a constitutively expressed fluorescent protein coding gene together with Cas9 into HEK293T cells resulted in editing of the AAVS1T1 locus (FIG. 2), thus demonstrating gene editing capability using the synthetic intron. These results suggest that this minimal synthetic intron produces a linear functional sgRNA. The 5’ hairpin motif of the EBV-sisRNA-1 (which may play a role in stabilizing the intronic RNA in the nucleus) was preserved and the sgRNA scaffold was inserted downstream of this hairpin (FIG. 3). Any gene-editing activity observed here can be attributed to the promoter of the driver fluorescent gene. This method produces a single guide RNA from a Pol-II driven transcript that does not require any additional processing beyond splicing.

[Table 1]

Example 4 - Intron-expressed guide RNAs only require a single ribozyme for processing

In addition, it was explored whether the intronic strategy described herein would permit using only one ribozyme to generate snRNAs. Two constructs were produced in which an AAVS1T1-targeting sgRNA was flanked either with an upstream hammerhead (HH) ribozyme or a downstream hepatitis delta virus (HDV) ribozyme (FIG. 4B). After sequencing the AAVS1T1 locus, it was discovered that flanking the sgRNA with only one of the two ribozymes separately generated indels, with higher efficiencies observed for the intron with a single downstream HDV ribozyme (FIG. 4A). These results indicate that embedding the guide RNA in an intron facilitates snRNA production using only one cis-cleaving ribozyme for processing.

Example 5 - EBV-sisRNA-1/consensus splice donors and acceptors are interchangeable

It was also investigated whether the canonical, consensus human splicing sequences were compatible with the splice donor and acceptor portions of EBV-sisRNA-1. Because EBV-sisRNA-1 does not contain any obvious branch point sequence and has no polypyrimidine tract, it was unknown whether canonical spliceosomal processing would allow these different splicing donor/acceptor sequences to be used interchangeably. In addition, it was determined whether the observed increase in mutation rate for sgRNAs expressed from an intron with EBV-sisRNA-1 sequences was attributable to the entire sequence or specifically one part. By constructing hybrid introns with both consensus and EBV-sisRNA-1 sequences (FIG. 5B), it was found that these introns splice successfully and generate indels at comparable rates (FIG. 5A).

Example 6 - Expressing sgRNAs through introns preserves native gene expression

To establish that the strategy of embedding guide RNAs in an intron preserves native gene expression, an experiment was conducted comparing three constructs bearing a dsRed fluorescent protein gene with either a constitutively spliced adenoviral intron, an intron containing ribozymes, an AAVS1T1-targeting sgRNA, and sno-lncRNA-2 sequences, or a sgRNA flanked with ribozymes and embedded in the 3’ UTR (FIG. 6B). As expected, the construct with ribozymes embedded in the 3’ UTR showed dramatically diminished fluorescence owing to removal of the polyadenylation tract after cis-cleavage. However, the introns bearing ribozymes, stabilizing sno-lncRNA-2 sequences, and a sgRNA show comparable fluorescence to the adenoviral intron construct with no cis-cleaving ribozymes (FIG. 6A). Furthermore, the AAVS1T1 locus was sequenced after transfecting these constructs and a Cas9-expressing plasmid and it was found that the sgRNAs produced by the intronic construct showed comparable rates of editing to those produced in the 3’ UTR (FIG. 6C).

Example 7 - Application of technology to cell fate engineering and cell differentiation

Expressing sgRNAs through stabilized, ribozyme-processed introns enables the production of complex, conditional knock-out/knock-down circuits in cell lines. For instance, a line is engineered to express a sgRNA within the intron of a gene downstream of an inflammatory pathway (such as an interleukin or TNF-inducible gene). This sgRNA drives the knockout (through Cas9) or knockdown (through Cas13a) of a gene crucial for an undesired cellular differentiation event. Applying the technology in this manner produces engineered stem/progenitor cells that inducibly disable their ability to differentiate into unwanted cell types in the presence of inflammatory stimulus. Similar objectives may be accomplished through CRISPRi, CRISPRa, or other technologies relying on an effector RNA. In these contexts, the gene within which the effector RNA is located acts as a sensor.

Example 8 - Indel rate by intron and doxycycline pulse time

Stable, transgenic HEK293T lines were produced using a PiggyBac vector that expresses an intron-interrupted dsRed gene under a doxycycline-inducible promoter. Cell lines were pretransfected with a plasmid expressing Cas9 (“+Cas9”), a plasmid expressing Cas9 and a second plasmid expressing a single guide RNA under a U6 promoter (“+Cas9/+U6 sgRNA”), or pUC19 as a negative control (“-Cas9”). After transfection, cells were treated with 1 μg/mL doxycycline at different pulse times to induce dsRed at a range of expression levels. Intronic guide RNAs exhibit expression-dependent gene editing rates at a wide dynamic range (FIG. 7).

Example 9 - Cas13a crRNAs expressed through stabilized introns achieve knockdown without ribozyme processing

In cell fate engineering applications, it can be desirable to transiently reduce the expression of a target gene rather than driving irreversible knockout via double strand breaks. LwCas13a is an RNA-guided RNA endonuclease in the Type VI CRISPR system which catalyzes the degradation of target mRNAs using a complementary crRNA for targeting. Cas13a exhibits self-processing activity, where pre-crRNAs in CRISPR arrays are processed into single, mature crRNAs by Cas13a. To make use of this self-processing activity, expressing Cas13a crRNAs from an intron stabilized with sequences from sno-lncRNA-2 was explored without the flanking cis-cleaving ribozymes used in prior Cas9 sgRNA designs. In this experiment, HEK293T cells were lipofected with a doxycycline-inducible dsRed construct with a snoRNA-stabilized intron containing one of two previously validated Cas13a pre-crRNAs targeting blue fluorescent protein (BFP) mRNA, or an intron containing both pre-crRNAs in a tandem array (FIG. 8A). In addition, cells were transfected with a plasmid that expresses LwCas13a (Addgene #91924) and a plasmid that expresses EBFP2 (Addgene #54595). The intron-bearing dsRed gene was maximally induced after transfection using 1 μg/mL doxycycline. All transfected groups showed robust induction and splicing of the dsRed transgene as evidenced by high levels of fluorescence. Both intron-expressed crRNAs exhibited knockdown of BFP fluorescence, with the highest level observed in the cells expressing both in a tandem array (FIG. 8B).

Example 10 - Intron-expressed homing guide RNAs (hgRNAs) record the expression of their parent gene

Homing guide RNAs (hgRNAs) are modified single guide RNAs that direct Cas9 to induce double strand breaks at the guide RNA’s own locus, resulting in recursive editing that produces diverse and continuously evolving mutations at the target site. For molecular recording applications, embedding hgRNAs into the intron of a gene of interest will allow for the measurement of expression history by sequencing observed mutations in the hgRNA spacer sequence. Using the Super PiggyBac transposase system, stable HEK293T cell lines were established with integrated doxycycline-inducible dsRed transgenes interrupted with snoRNA-stabilized introns containing a ribozyme-processed homing guide RNA (FIG. 9A). These lines were then transfected with either a Cas9-expressing plasmid (“+Cas9”), a positive control sgRNA-expressing plasmid targeting the same spacer sequence as the hgRNA (“+sgRNA”), or a pUC19 negative control (“-Cas9”). Expression of the dsRed gene was induced by treating the cell lines with different pulse times of 1 µg/mL doxycycline in culture media and the hgRNA locus was then sequenced. At 71.5 hours post-induction, mutation rates at the hgRNA locus that correlated with the level of induced gene expression were observed (FIG. 9B). Notably, only a modest decrease in dsRed fluorescence associated with transfection of the Cas9-expressing plasmid was observed, suggesting that double strand breaks at the hgRNA locus in the intron do not substantially decrease the gene’s expression (FIG. 9C).
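As an illustration only, the mutation rate at an hgRNA locus could be summarized as the fraction of sequencing reads whose sequence differs from the unedited reference. The following is a minimal sketch under that assumption; the function name `mutation_rate`, the reference sequence, and the reads are hypothetical placeholders and are not data or methods from this disclosure.

```python
def mutation_rate(reads, reference):
    """Return the fraction of reads that differ from the unedited reference.

    Assumption: reads have already been trimmed to the hgRNA locus, so a
    simple string comparison is enough to call a read edited or unedited.
    """
    if not reads:
        raise ValueError("no reads supplied")
    mutated = sum(1 for read in reads if read != reference)
    return mutated / len(reads)


# Hypothetical placeholder sequences, for illustration only.
reference = "ACGTACGTAC"
reads_by_pulse = {
    "short pulse": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTACTAC"],
    "long pulse": ["ACGTACGTAC", "ACGTTAC", "ACGTACGGTAC", "ACGTACTAC"],
}

for pulse, reads in reads_by_pulse.items():
    print(pulse, mutation_rate(reads, reference))
```

Comparing the resulting fractions across doxycycline pulse times would give one simple readout of how editing at the locus tracks induced expression.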

Example 11 - Cas13a crRNAs expressed through stabilized introns achieve knockdown measurable by flow cytometry

To quantitatively measure knockdown by intronic Cas13 guides, lipofection in HEK293T cells was repeated with a doxycycline-inducible dsRed construct interrupted by a snoRNA-stabilized intron containing one of two previously validated Cas13a pre-crRNAs targeting blue fluorescent protein (BFP) mRNA, or an intron containing both validated pre-crRNAs in a tandem array. Additionally, cells were lipofected with a plasmid expressing LwCas13a and a plasmid expressing BFP. At 48 hours post-lipofection, cells were lifted and run on a BD FACSymphony A3 in order to measure dsRed and BFP fluorescence. A noticeable decrease was observed in BFP fluorescence relative to scramble control, primarily among cells with the highest dsRed fluorescence (FIG. 10A). Gating only on the population of cells with high dsRed fluorescence, a noticeable shift was observed in the distribution of BFP fluorescence when compared to scramble (FIG. 10B). Furthermore, after gating on events with high BFP fluorescence from 8b, there was a marked decrease in the average percentage of events falling within the high BFP gate compared to control (FIG. 10C).
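The two-step gating described above (first on high dsRed, then on high BFP) can be sketched as follows. This is a hedged illustration, not the analysis pipeline used in this example: the function name `fraction_in_high_bfp_gate`, the gate thresholds, and the event values are all hypothetical.

```python
def fraction_in_high_bfp_gate(events, dsred_min, bfp_min):
    """Fraction of high-dsRed events that also fall in the high-BFP gate.

    events: list of (dsred, bfp) fluorescence pairs, one per cell.
    dsred_min, bfp_min: hypothetical gate thresholds chosen by the analyst.
    """
    high_dsred = [(d, b) for d, b in events if d >= dsred_min]
    if not high_dsred:
        raise ValueError("no events pass the dsRed gate")
    in_gate = sum(1 for _, b in high_dsred if b >= bfp_min)
    return in_gate / len(high_dsred)


# Hypothetical per-cell values, for illustration only.
scramble = [(1000, 500), (1200, 450), (100, 900), (1500, 700)]
crrna = [(1000, 500), (1200, 50), (100, 900), (1500, 80)]

print("scramble:", fraction_in_high_bfp_gate(scramble, 800, 400))
print("crRNA:", fraction_in_high_bfp_gate(crrna, 800, 400))
```

A lower fraction in the crRNA sample than in the scramble control would correspond to the kind of decrease reported for the high BFP gate.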