Title:
COMPOSITIONS AND METHODS FOR DETECTING BLADDER CANCER
Document Type and Number:
WIPO Patent Application WO/2012/116019
Kind Code:
A1
Abstract:
Epigenetic alterations in tissues targeted for cancer play a causal role in carcinogenesis. Changes in DNA methylation in non-target tissues, specifically peripheral blood, can also affect risk of malignant disease. This application discloses specific profiles of DNA methylation in peripheral blood that are associated with bladder cancer risk and therefore serve as an epigenetic marker of disease susceptibility.

Inventors:
MARSIT CARMEN (US)
KELSEY KARL (US)
Application Number:
PCT/US2012/026038
Publication Date:
August 30, 2012
Filing Date:
February 22, 2012
Assignee:
UNIV BROWN (US)
MARSIT CARMEN (US)
KELSEY KARL (US)
International Classes:
G01N33/574; G06F19/10
Domestic Patent References:
WO2007050777A2 2007-05-03
Foreign References:
US20050021240A1 2005-01-27
US20100317000A1 2010-12-16
CA2759312A1 2010-10-28
Attorney, Agent or Firm:
BEATTIE, Ingrid A. et al. (P.C., One Financial Center, Boston, MA, US)
Claims:
Via EFS Attorney Docket No.: 35947 -005001 WO

Date of Deposit: February 22, 2012

WHAT IS CLAIMED IS:

1. A method of diagnosing bladder cancer or a susceptibility of developing said cancer, comprising providing a DNA from a non-bladder tissue and detecting methylation of a CpG locus, wherein said locus is located within 1 kilobase of a transcription-factor binding site related to immune modulation, an oncogenic transcription factor binding site, or a forkhead family member.

2. The method of claim 1, wherein said non-bladder tissue comprises peripheral blood.

3. The method of claim 2, wherein said locus is located within a gene selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2.

4. A method for identification of differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer in an individual, said method comprising:

(a) obtaining a biological sample comprising genomic DNA from said individual;

(b) measuring in said sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and

(c) comparing the level of methylation at said one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences,

wherein a difference in the level of methylation of said one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer.

5. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to diagnose bladder cancer in the individual.

6. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to predict the course of the cancer in the individual.

7. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to predict the susceptibility to cancer of the individual.

8. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to stage the progression of the cancer in the individual.

9. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to predict the likelihood of overall survival for said individual.

10. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences is used to predict the likelihood of recurrence of cancer for said individual.

11. The method of claim 4, wherein the level of methylation of said differentially methylated genomic CpG dinucleotide sequences in said sample is used to determine the effectiveness of a treatment course undergone by the individual.

12. The method of claim 11, wherein said reference level corresponds to the level of methylated genomic CpG dinucleotide sequences present in a corresponding sample obtained from said individual prior to treatment.

13. The method of claim 4, wherein said level of methylation in the biological sample is decreased in comparison to the reference level.

14. The method of claim 4, wherein said level of methylation in the biological sample is increased in comparison to the reference level.

15. The method of claim 4, wherein said biological sample is a biopsy sample.

16. The method of claim 4, wherein said biological sample is a blood sample.

17. A method, comprising:

a) providing a biological sample from a subject, said biological sample comprising genomic DNA;

b) detecting the presence or absence of DNA methylation in one or more gene loci to generate a methylation profile for said subject, wherein the one or more loci is selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2; and

c) comparing said methylation profile to one or more standard methylation profiles, wherein said standard methylation profiles are selected from the group consisting of methylation profiles of non-cancerous samples and methylation profiles of cancerous samples.

18. The method of claim 17, wherein said biological sample is a biopsy sample.

19. The method of claim 17, wherein said biological sample is a blood sample.

20. The method of claim 17, wherein said DNA methylation comprises CpG methylation.

21. A method of characterizing bladder cancer, comprising:

a) providing a biological sample from a subject diagnosed with bladder cancer, said biological sample comprising genomic DNA; and

b) detecting increased DNA methylation in one or more CpG loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, thereby characterizing cancer in said subject.

22. The method of claim 21, wherein said biological sample is a biopsy sample.

23. The method of claim 21, wherein said biological sample is a blood sample.

24. A non-transient computer readable storage medium, comprising executable instructions for detecting bladder cancer or a susceptibility of developing said cancer, the executable instructions configured to:

receive data characterizing the intensity of DNA methylation in one or more gene loci;

compare the intensity of DNA methylation to a model; and

calculate a likelihood of bladder cancer.

25. A system for predicting bladder cancer, comprising:

an input module for analyzing genomic DNA and detecting the intensity of DNA methylation in said genomic DNA;

wherein one or more gene loci of said genomic DNA are inspected;

a processor configured to determine a likelihood of bladder cancer using said intensity of DNA methylation and a model computed from one or more subjects known to have bladder cancer and one or more subjects known to not have bladder cancer.

Description:
COMPOSITIONS AND METHODS FOR DETECTING BLADDER CANCER

RELATED APPLICATIONS

[0001] This application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No: 61/445,270, filed February 22, 2011, which is incorporated herein by reference in its entirety.

GOVERNMENT SUPPORT CLAUSE

[0002] This invention was made with government support under Grant Nos. NIH-NCI R01CA121147 and NIEHS P42ES007373. The government has certain rights to this invention.

FIELD OF THE DISCLOSURE

[0003] The present invention relates to compositions and methods related to the detection of bladder cancer.

BACKGROUND OF THE DISCLOSURE

[0004] The incidence of transitional cell carcinoma of the urinary bladder (henceforth bladder cancer) in the United States in 2009 was predicted to be almost 71,000 new cases (Siegel et al., CA Cancer J. Clin. 59:225-49, 2009). Worldwide, almost 360,000 cases of the disease were diagnosed in 2009 (Parkin, Scand. J. Urol. Nephrol. Suppl.: 12-20, 2008). In addition, 16% of individuals initially diagnosed with bladder cancer will, in their lifetime, be diagnosed with additional primary tumors (Hayat et al., Oncologist 12:20-37, 2007). The highly successful treatment of bladder cancer comes at great economic burden to the healthcare system, with lifetime monitoring and treatment making bladder cancer one of the most expensive of all cancers, with diagnosis to death per patient costs ranging from $96,000 to $187,000, accounting for almost 3.7 billion U.S. dollars (2001 dollars) in direct costs to the U.S. medical system each year (Botteman et al., Pharmacoeconomics 21:1315-1330, 2003).

[0005] Tobacco carcinogen exposure through active smoking is the main established risk factor for bladder cancer; but the attributable risk is far less than for lung cancer, and much of the etiology of bladder cancer remains unclear (Kogevinas et al., Textbook of Cancer Epidemiology, London, Oxford University Press, 2002, pgs. 446-66). Other major risk factors for bladder cancer include occupational exposures, particularly aromatic amine and polycyclic aromatic hydrocarbon exposures, inorganic arsenic, use of certain hair dyes, exposure to chlorination by-products, individual fluid intake, and dietary factors (Colt et al., Cancer Causes Control 15:759-69, 2004; National Research Council, Washington D.C., National Academy Press, 2001; Chen et al., Am. J. Public Health 94:741-4, 2004; Smith et al., Bull. World Health Organ. 78:1093-103, 2000; Bates et al., Am. J. Epidemiol. 141:523-30, 1995; Karagas et al., Cancer Causes Control 15:465-72, 2004; King et al., Cancer Causes Control 7:596-604, 1996; and Wilkens et al., Cancer Epidemiol. Biomarkers Prev. 5:161-6, 1996). Host susceptibility also plays an important role in bladder carcinogenesis, and a family history of bladder cancer confers an almost two-fold risk of disease. Polymorphisms in genes related to environmental toxicant metabolism such as NAT2 and GSTM1 have been clearly linked to bladder cancer risk (Mueller et al., Urol. Oncol. 26:451-64, 2008; Engel et al., Am. J. Epidemiol. 156:95-109, 2002). Genome-wide association studies of bladder cancer identified SNPs on chromosome 8q24, upstream of the MYC oncogene, on chromosome 3q28 near the TP63 tumor suppressor gene, and in the PSCA gene to be associated with bladder cancer risk (Kiemeney et al., Curr. Opin. Urol. 19:540-6, 2009; Wu et al., Nat. Genet. 41:991-5, 2009). Although these SNP studies may point to novel mechanisms of susceptibility to the disease, it is becoming increasingly clear that their contribution to the attributable risk for the disease is minor. In fact, there is great controversy over whether common genetic variants will play a major role in defining disease susceptibility (Schork et al., Curr. Opin. Genet. Dev. 19:212-9, 2009).

[0006] It is now widely accepted that epigenetic alterations in target tissues are causal to the development of malignancy (Jones et al., Nat. Genet. 21:163-7, 1999; Gaudet et al., Science 300:489-92, 2003). The extent of variability of the cellular epigenome, and specifically DNA methylation at gene promoter regions, remains a critical question; the amount of variation in genomic methylation across the population is not currently known. Further, individual variation in the epigenome is likely to have multiple characteristics, with a component that is tissue specific and a component that is common to all tissues. We know that some of this variability, particularly in blood, is associated with aging and with exposures encountered throughout life, and the data now suggest that it is, in fact, associated with risk of breast, ovarian, and small cell lung cancer (Christensen et al., PLoS Genet. 5:e1000602, 2009; Teschendorff et al., Genome Res. 20:440-6, 2010; Widschwendter et al., PLoS One 3:e2656, 2008; Teschendorff et al., PLoS One 4:e8274, 2009; and Wang et al., J. Thorac. Oncol., 2010). The profiles of epigenetic change that are found to be associated with disease may reflect genetic or environmental factors (or their interaction) that establish these gene regulatory marks in a fashion that results in disease susceptibility.

SUMMARY OF THE DISCLOSURE

[0007] The methods of the invention enable detection of differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer. As disclosed herein, the methods of the invention have numerous diagnostic and prognostic applications.

[0008] According to some embodiments, methods are provided for diagnosing bladder cancer or a susceptibility of developing the cancer, comprising providing a DNA from a non-bladder tissue and detecting methylation of a CpG locus, wherein the locus is located within 1 kilobase of a transcription-factor binding site related to immune modulation, an oncogenic transcription factor binding site, or a forkhead family member. In some embodiments, the non-bladder tissue comprises peripheral blood. In some embodiments, the locus is located within a gene selected from the group consisting of NALP4 (e.g., GenBank Accession Number AF442488 (GI:17064171), incorporated herein by reference), BDKRB1 (e.g., GenBank Accession Number AY275464 (GI:18105039), incorporated herein by reference), C14ORF103 (see, e.g., GenBank Accession Number CM000265 (GI:74273668), incorporated herein by reference), COX7C (e.g., GenBank Accession Number NM_001867 (GI:18105039), incorporated herein by reference), ZNF322B (e.g., GenBank Accession Number XM_003403446 (GI:341915480), incorporated herein by reference), HIGD2A (e.g., GenBank Accession Number NM_138820 (GI:52851396), incorporated herein by reference), TBCA (e.g., GenBank Accession Number NM_004607 (GI:94421476), incorporated herein by reference), BRD7 (e.g., GenBank Accession Number AF152604 (GI:8452873), incorporated herein by reference), and PSME2 (e.g., GenBank Accession Number NM_002818 (GI:30410791), incorporated herein by reference). In addition to peripheral blood, other bodily fluids such as urine, saliva, and sputum can be tested. The diagnostic methods are non-invasive, a significant advantage over other methods that require biopsied tissues.

[0009] According to some embodiments, methods are provided for identification of differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer.
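
By way of illustration only, the comparison of step (c) could be sketched in Python as follows. The sketch assumes that methylation levels are expressed per locus as fractions between 0 and 1; the function name `differential_loci` and the `min_delta` cutoff are hypothetical and are not specified in this disclosure.

```python
# Loci named in this disclosure.
LOCI = ["NALP4", "BDKRB1", "C14ORF103", "COX7C", "ZNF322B",
        "HIGD2A", "TBCA", "BRD7", "PSME2"]


def differential_loci(sample, reference, min_delta=0.1):
    """Return {locus: sample - reference} for loci that differ from the reference.

    sample, reference: dicts mapping locus name to a methylation level
    (assumed here to be a fraction in [0, 1]).
    min_delta: hypothetical cutoff on the absolute difference.
    """
    differences = {}
    for locus in LOCI:
        if locus in sample and locus in reference:
            delta = sample[locus] - reference[locus]
            if abs(delta) >= min_delta:
                differences[locus] = delta
    return differences


if __name__ == "__main__":
    # Made-up values, for demonstration only.
    sample = {"NALP4": 0.8, "TBCA": 0.32}
    reference = {"NALP4": 0.5, "TBCA": 0.3}
    print(differential_loci(sample, reference))
```

A positive value indicates increased methylation relative to the reference and a negative value indicates decreased methylation; loci missing from either input are skipped.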

[0010] According to some embodiments, methods are provided for diagnosing bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to diagnose bladder cancer in the individual.

[0011] According to some embodiments, methods are provided for predicting the course of bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to predict the course of the cancer in the individual.

[0012] According to some embodiments, methods are provided for predicting susceptibility to bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to predict the susceptibility to bladder cancer of the individual.

[0013] According to some embodiments, methods are provided for staging the progression of bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to stage the progression of the cancer in the individual.

[0014] According to some embodiments, methods are provided for predicting the likelihood of overall survival of an individual with bladder cancer, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to predict the likelihood of overall survival for the individual.

[0015] According to some embodiments, methods are provided for predicting the likelihood of recurrence of bladder cancer in an individual, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences is used to predict the likelihood of recurrence of cancer for the individual.

[0016] According to some embodiments, methods are provided for determining the effectiveness of a treatment course undergone by an individual with bladder cancer, the method comprising: (a) obtaining a biological sample comprising genomic DNA from the individual; (b) measuring in the sample the level of one or more methylated genomic CpG dinucleotide sequences in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, and (c) comparing the level of methylation at the one or more genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level of methylation of the one or more genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer, and wherein the level of methylation of the differentially methylated genomic CpG dinucleotide sequences in the sample is used to determine the effectiveness of a treatment course undergone by the individual. In some embodiments, the reference level corresponds to the level of methylated genomic CpG dinucleotide sequences present in a corresponding sample obtained from the individual prior to treatment.

[0017] In some embodiments, the level of methylation in the biological sample is decreased in comparison to the reference level. In some embodiments, the level of methylation in the biological sample is increased in comparison to the reference level. For bladder cancer, the level of methylation is elevated compared to normal tissue samples for one or more (e.g., 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more) loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2.

[0018] According to some embodiments, methods are provided comprising: a) providing a biological sample from a subject, the biological sample comprising genomic DNA; b) detecting the presence or absence of DNA methylation in one or more gene loci to generate a methylation profile for the subject, wherein the one or more loci is selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2; and c) comparing the methylation profile to one or more standard methylation profiles, wherein the standard methylation profiles are selected from the group consisting of methylation profiles of non cancerous samples and methylation profiles of cancerous samples. In some embodiments, the DNA methylation comprises CpG methylation.

[0019] According to some embodiments, methods are provided for characterizing bladder cancer, comprising: a) providing a biological sample from a subject diagnosed with bladder cancer, the biological sample comprising genomic DNA; and b) detecting increased DNA methylation in one or more loci selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2, thereby characterizing cancer in the subject. In some embodiments, the DNA methylation comprises CpG methylation.

[0020] In some embodiments, the biological sample is a biopsy sample. In some embodiments, the biological sample is a blood sample. In some embodiments, the biological sample is a peripheral blood sample.

[0021] Also within the invention is a non-transient computer readable storage medium, comprising executable instructions for detecting bladder cancer or a susceptibility of developing said cancer, the executable instructions configured to: receive data characterizing the intensity of DNA methylation in one or more gene loci; compare the intensity of DNA methylation to a model; and calculate a likelihood of bladder cancer. A system for predicting bladder cancer comprises the following elements: an input module for analyzing genomic DNA and detecting the intensity of DNA methylation in the genomic DNA, wherein one or more gene loci of the genomic DNA are inspected; and a processor configured to determine a likelihood of bladder cancer using said intensity of DNA methylation and a model computed from one or more subjects known to have bladder cancer and one or more subjects known to not have bladder cancer.
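
For illustration, the three recited operations (receive methylation intensities, compare them to a model, and calculate a likelihood) could be sketched in Python as shown below. A logistic model is used here purely as a stand-in, since the form of the model is not fixed by this paragraph; the weights, the intercept, and the function name `bladder_cancer_likelihood` are hypothetical values chosen for the example and are not taken from this disclosure.

```python
import math


def bladder_cancer_likelihood(intensities, weights, intercept):
    """Map per-locus methylation intensities to a likelihood in (0, 1).

    intensities: dict of locus name -> measured methylation intensity.
    weights, intercept: parameters of a hypothetical logistic model that
        would, in practice, be fitted on subjects known to have and known
        not to have bladder cancer.
    """
    missing = [locus for locus in weights if locus not in intensities]
    if missing:
        raise ValueError("no intensity supplied for: " + ", ".join(missing))
    score = intercept + sum(weights[locus] * intensities[locus]
                            for locus in weights)
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Illustrative, made-up model parameters and measurements.
    weights = {"NALP4": 2.0, "BRD7": 1.5}
    print(bladder_cancer_likelihood({"NALP4": 0.6, "BRD7": 0.4},
                                    weights, intercept=-1.5))
```

Any classifier that outputs a probability could take the place of the logistic function in this sketch.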

[0022] The foregoing description of related art is not intended in any way as an admission that any of the documents described therein, including pending United States patent applications, are prior art to embodiments of the present disclosure. Moreover, the description herein of any disadvantages associated with the described products, methods, and/or apparatus, is not intended to limit the disclosed embodiments. Indeed, embodiments of the present disclosure may include certain features of the described products, methods, and/or apparatus without suffering from their described disadvantages.

[0023] Each of the publications cited herein, e.g., journal articles and GENBANK Accession numbers, is incorporated by reference herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] Figure 1. Diagram of the analysis strategy employed in defining the methylation profiles in the training set and applying those profiles for classification in the testing set.

[0025] Figures 2A and 2B. DNA Methylation Profiles defined by a panel of 9 loci are significantly associated with bladder cancer. (A) The RPMM based classification of methylation of 9 loci (columns) in the peripheral blood-derived DNA of the 230 subjects (rows) in the testing dataset is depicted in the heatmap, with the 7 classes separated by red lines. The overall mean methylation and confidence intervals (error bars) by class are depicted in the bar graph on the right. (B) The prevalence of cases and controls (y-axis) in each of the predicted classes (x-axis). A permutation based Chi-square test suggests that case control prevalence is significantly different by methylation class (P<0.0001). Top: Cases; Bottom: Control.
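
One way a permutation-based Chi-square test of case/control prevalence across methylation classes could be carried out is sketched below in Python. The exact procedure behind Figure 2B is not given in this passage, so the number of permutations, the random seed, and the function names are assumptions made for the example.

```python
import random
from collections import Counter


def chi_square(classes, labels):
    """Pearson Chi-square statistic for a class-by-label contingency table."""
    n = len(classes)
    class_totals = Counter(classes)
    label_totals = Counter(labels)
    observed = Counter(zip(classes, labels))
    stat = 0.0
    for c, c_total in class_totals.items():
        for l, l_total in label_totals.items():
            expected = c_total * l_total / n
            stat += (observed[(c, l)] - expected) ** 2 / expected
    return stat


def permutation_p_value(classes, labels, n_perm=2000, seed=0):
    """Estimate a P value by shuffling case/control labels across subjects."""
    rng = random.Random(seed)
    observed_stat = chi_square(classes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(classes, shuffled) >= observed_stat:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two methylation classes, perfectly separated by status.
    classes = [1] * 20 + [2] * 20
    labels = ["case"] * 20 + ["control"] * 20
    print(chi_square(classes, labels), permutation_p_value(classes, labels))
```

With the add-one correction, the smallest P value obtainable from `n_perm` permutations is 1/(`n_perm` + 1), so resolving P<0.0001 requires at least 10,000 permutations.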

[0026] Figures 3A and 3B. Receiver operating characteristic (ROC) curve analysis of methylation profiles. (A) ROC curve based on methylation class only results in a significant AUC of 0.70 (95% CI 0.63, 0.77). (B) ROC curve including methylation classes, patient gender, age, smoking status (never, former, current), and family history of bladder cancer results in a significant AUC of 0.76 (95% CI 0.70, 0.82).
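
As a hedged illustration of how an area under the ROC curve (AUC) can be computed from predicted scores, the Python sketch below uses the pairwise (Mann-Whitney) formulation: the AUC equals the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical; the confidence intervals reported for Figure 3 are not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a case outscores a control.

    scores: predicted risk scores, one per subject.
    labels: 1 for a case, 0 for a control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Made-up scores: two controls followed by two cases.
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.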

[0027] Figure 4. Diagram of the Gene-Set Enrichment Analysis on DNA Methylation Data. The upper panel depicts the transcription factor binding sites (TFBS) within 1 kb of differentially methylated loci associated with aging, bladder cancer, and their overlap grouped by functional role or family. The lower panel depicts the KEGG pathways that are over-represented amongst the loci with differential methylation associated with aging, bladder cancer, and their overlap, grouped by higher level pathways.

[0028] Figure 5 is a flow chart showing a method for detecting cancer or a risk of developing cancer.

DETAILED DESCRIPTION OF THE INVENTION

[0029] Epigenetics relates to gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. It refers to functionally relevant modifications to the genome that do not involve a change in the actual nucleotide sequence. Examples of such changes are DNA methylation and histone deacetylation. Such changes serve to regulate gene expression without altering the nucleotide sequence of the regulated gene.

[0030] The genome-wide DNA methylation profiles of peripheral blood from a population-based case control study of bladder cancer were used to identify profiles of DNA methylation in this accessible (but not diseased) tissue that are associated with bladder cancer. By examining the gene pathways involved as well as the genomic context of the loci with bladder cancer associated methylation, we provide insight into the functional consequences of these profiles and their genesis, respectively.

[0031] According to some embodiments, methods are provided for identification of differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer in an individual by obtaining a biological sample comprising genomic DNA from the individual, measuring the level or pattern of one or more methylated genomic CpG dinucleotide sequences in two or more of the genomic targets in the sample, and comparing the level of the one or more methylated genomic CpG dinucleotide sequences in the sample to a reference level of methylated genomic CpG dinucleotide sequences, wherein a difference in the level or pattern of methylation of the genomic CpG dinucleotide sequences in the sample compared to the reference level identifies differentially methylated genomic CpG dinucleotide sequences associated with bladder cancer.

[0032] The methods of the invention are directed to methods for diagnosing an individual with a condition that is characterized by a level and/or pattern of methylated genomic CpG dinucleotide sequences distinct from the level and/or pattern of methylated genomic CpG dinucleotide sequences exhibited in the absence of the particular condition. This invention also is directed to methods for predicting the susceptibility of an individual to a condition that is characterized by a level and/or pattern of methylated genomic CpG dinucleotide sequences that is distinct from the level and/or pattern of methylated genomic CpG dinucleotide sequences exhibited in the absence of the condition.

[0033] In one embodiment, this invention provides diagnostic markers for cancer. The markers of the invention are genomic sequences having methylation states that are diagnostic or prognostic of the presence or severity of cancer. A list of exemplary genes for which methylation state can be used to determine the presence or severity of bladder cancer includes CDPCR1/3 (see, e.g., GenBank Accession Number NG_029476 (GI:340745288), incorporated herein by reference), CDPCR3HD (see, e.g., GenBank Accession Number NG_029476 (GI:340745288), incorporated herein by reference), CHX10 (e.g., GenBank Accession Number AY336059 (GI:33285957), incorporated herein by reference), GFI1 (e.g., GenBank Accession Number NG_007874 (GI:188595651), incorporated herein by reference), HFH1 (e.g., GenBank Accession Number AF225950 (GI:12655883), incorporated herein by reference), LMO2 (e.g., GenBank Accession Number NM_001142316 (GI:214832218), incorporated herein by reference), MEF2 (e.g., GenBank Accession Number L16794 (GI:401768), incorporated herein by reference), MEIS (see, e.g., GenBank Accession Number NG_029108 (GI:335892881), incorporated herein by reference; GenBank Accession Number NG_011467 (GI:224809261), incorporated herein by reference), HOXA9 (e.g., GenBank Accession Number NG_029923 (GI:345525397), incorporated herein by reference), MRF2 (see, e.g., GenBank Accession Number AL671972.8 (GI:21911536), incorporated herein by reference), NKX25 (e.g., GenBank Accession Number NM_004387 (GI:260898746), incorporated herein by reference), PAX2/6 (e.g., GenBank Accession Number NM_003987 (GI:152963640), incorporated herein by reference; GenBank Accession Number NM_000280 (GI:189083678), incorporated herein by reference), POU3F2 (e.g., GenBank Accession Number NM_005604 (GI:51702520), incorporated herein by reference), POU6F1 (e.g., GenBank Accession Number NM_002702 (GI:223890224), incorporated herein by reference), PPARG (e.g., GenBank Accession Number NM_138712 (GI:116284369), incorporated herein by reference), SOX5 (e.g., GenBank Accession Number NM_006940 (GI:30061559), incorporated herein by reference), TAL1-b (see, e.g., GenBank Accession Number NM_003189 (GI:197927279), incorporated herein by reference), AREB6 (e.g., GenBank Accession Number D15050 (GI:457560), incorporated herein by reference), CEBP (e.g., GenBank Accession Number NM_004364 (GI:225735576), incorporated herein by reference; GenBank Accession Number NM_005194 (GI:356640243), incorporated herein by reference), GR (see, e.g., GenBank Accession Number NM_001204263 (GI:324021680), incorporated herein by reference), STAT5B (e.g., GenBank Accession Number NM_012448 (GI:42519913), incorporated herein by reference), CREBP1 (e.g., GenBank Accession Number U16028 (GI:1039380), incorporated herein by reference), FREAC3/4 (see, e.g., GenBank Accession Number BD206053 (GI:33015823), incorporated herein by reference), TATA, PBX1 (e.g., GenBank Accession Number NM_002585 (GI:326320046), incorporated herein by reference), FOXO1 (e.g., GenBank Accession Number NM_002015 (GI:133930787), incorporated herein by reference), FOXO2, FOXO3 (e.g., GenBank Accession Number NM_001455 (GI:146260266), incorporated herein by reference), FOXO4 (e.g., GenBank Accession Number NM_005938 (GI:283436081), incorporated herein by reference), CETSP54, C-REL (e.g., GenBank Accession Number L41414 (GI:8248631), incorporated herein by reference), E4BP4 (e.g., GenBank Accession Number X64318 (GI:30955), incorporated herein by reference), SREBP1 (e.g., GenBank Accession Number U00968 (U00968.1), incorporated herein by reference), ARNT (e.g., GenBank Accession Number NM_001178 (NM_001178.4), incorporated herein by reference), CDC5 (e.g., GenBank Accession Number NM_001253 (GI:356640202), incorporated herein by reference), SRF (e.g., GenBank Accession Number NM_003131 (NM_003131.2), incorporated herein by reference), GATA1 (e.g., GenBank Accession Number NM_002049.3 (GI:183227689), incorporated herein by reference), IRF1 (e.g., GenBank Accession Number NM_002198 (GI:196049386), incorporated herein by reference), IRF7 (e.g., GenBank Accession Number NM_001572 (GI:98985820), incorporated herein by reference), NFAT (see, e.g., GenBank Accession Number NM_138714 (GI:342672020), incorporated herein by reference), STAT1 (see, e.g., GenBank Accession Number NM_007315 (GI:189458859), incorporated herein by reference), STAT3 (e.g., GenBank Accession Number NM_213662 (GI:47458819), incorporated herein by reference), HNF1 (e.g., GenBank Accession Number NM_000545 (GI:256542296), incorporated herein by reference), LHX3 (e.g., GenBank Accession Number NM_178138 (GI:315013528), incorporated herein by reference), MIF1 (e.g., GenBank Accession Number BD248115 (GI:33057885), incorporated herein by reference), NFY (e.g., GenBank Accession Number NM_001142588 (GI:217272830), incorporated herein by reference), NALP4 (e.g., GenBank Accession Number AF442488 (GI:17064171), incorporated herein by reference), BDKRB1 (e.g., GenBank Accession Number AY275464 (GI:18105039), incorporated herein by reference), C14ORF103 (see, e.g., GenBank Accession Number CM000265 (GI:74273668), incorporated herein by reference), COX7C (e.g., GenBank Accession Number NM_001867 (GI:18105039), incorporated herein by reference), ZNF322B (e.g., GenBank Accession Number XM_003403446 (GI:341915480), incorporated herein by reference), HIGD2A (e.g., GenBank Accession Number NM_138820 (GI:52851396), incorporated herein by reference), TBCA (e.g., GenBank Accession Number NM_004607 (GI:94421476), incorporated herein by reference), BRD7 (e.g., GenBank Accession Number AF152604 (GI:8452873), incorporated herein by reference), and PSME2 (e.g., GenBank Accession Number NM_002818 (GI:30410791), incorporated herein by reference).

[0034] The prognostic methods of the invention are useful for determining if a patient is at risk for recurrence. Cancer recurrence is a concern relating to a variety of types of cancer. The prognostic methods of the invention can be used to identify surgically treated patients likely to experience cancer recurrence so that they can be offered additional therapeutic options, including preoperative or postoperative adjuncts such as chemotherapy, radiation, biological modifiers and other suitable therapies. The methods are especially effective for determining the risk of metastasis in patients who demonstrate no measurable metastasis at the time of examination or surgery.

[0035] The prognostic methods of the invention also are useful for determining a proper course of treatment for a patient having cancer. A course of treatment refers to the therapeutic measures taken for a patient after diagnosis or after treatment for cancer. For example, a determination of the likelihood for cancer recurrence, spread, or patient survival, can assist in determining whether a more conservative or more radical approach to therapy should be taken, or whether treatment modalities should be combined. For example, when cancer recurrence is likely, it can be advantageous to precede or follow surgical treatment with chemotherapy, radiation, immunotherapy, biological modifier therapy, gene therapy, vaccines, and the like, or adjust the span of time during which the patient is treated. As described herein, the diagnosis or prognosis of cancer state is typically correlated with the degree to which one or more of the genes selected from NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2 is methylated. Thus, the invention can include a determination made based on the methylation state for the entire set of genes or a subset of the genes.

[0036] This invention provides methods for determining a prognosis for survival for a cancer patient. One method involves (a) measuring a level of methylation for one or more of the genes selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2 in a biological sample from an individual, and (b) comparing the level of methylation in the sample to a reference level of methylation for the gene, wherein a low level of methylation for the gene in the sample correlates with increased survival of the patient.

[0037] The invention also provides a method for monitoring the effectiveness of a course of treatment for a patient with cancer. The method involves (a) determining a level of methylation for one or more of the genes selected from the group consisting of NALP4, BDKRB1, C14ORF103, COX7C, ZNF322B, HIGD2A, TBCA, BRD7, and PSME2 in a biological sample from an individual prior to treatment, and (b) determining the level of methylation for the gene in a biological sample from the patient after treatment, whereby comparison of the level of methylation for the gene prior to treatment with the level of methylation for the gene after treatment indicates the effectiveness of the treatment.

[0038] Methods of measuring DNA methylation

[0039] The level of methylation of the differentially methylated genomic CpG dinucleotide sequences can provide a variety of information about the cancer and can be used, for example, to diagnose bladder cancer in the individual; to predict the course of the cancer in the individual; to predict the susceptibility to cancer in the individual; to stage the progression of the cancer in the individual; to predict the likelihood of overall survival for the individual; to predict the likelihood of recurrence of cancer for the individual; or to determine the effectiveness of a treatment course undergone by the individual.

[0040] As described herein, the level of methylation that is detected in a biological sample can be decreased or increased in comparison to the reference level, and alterations that increase or decrease methylation can be detected and provide useful prognostic or diagnostic information. In addition to detecting levels of methylation, the present invention also allows for the detection of patterns of methylation. Analysis of methylation patterns across chromosomes in biological samples from afflicted individuals can reveal epigenetic changes in the form of altered levels of methylation of subsets of genomic CpG dinucleotide sequences that make up a pattern of affected genomic targets that can be correlated with a condition.

[0041] Methylation of CpG dinucleotide sequences can be measured using any of a variety of techniques known in the art for the analysis of specific CpG dinucleotide methylation status. For example, methylation can be measured by employing a restriction enzyme based technology, which utilizes methylation-sensitive restriction endonucleases for the differentiation between methylated and unmethylated cytosines. Restriction enzyme based technologies include, for example, restriction digest with methylation-sensitive restriction enzymes followed by Southern blot analysis, use of methylation-specific enzymes and PCR, restriction landmark genomic scanning (RLGS) and differential methylation hybridization (DMH).

[0042] Methods for determining and analyzing methylation profiles are disclosed in U.S. Publication No. 2005/0227230, U.S. Publication No. 2009/0264306, U.S. Publication No. 2005/0164246, the disclosures of which are incorporated herein by reference in their entireties.

Definitions

[0043] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. Other features and advantages of the invention will be apparent from the following detailed description and claims.

[0044] For the purposes of promoting an understanding of the embodiments described herein, reference will be made to preferred embodiments and specific language will be used to describe the same. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention. As used throughout this disclosure, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a composition" includes a plurality of such compositions, as well as a single composition, and a reference to "a therapeutic agent" is a reference to one or more therapeutic and/or pharmaceutical agents and equivalents thereof known to those skilled in the art, and so forth.

[0045] Throughout this application, the term "about" is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value.

[0046] Reference to numeric ranges throughout this specification encompasses all numbers falling within the disclosed ranges. Thus, for example, the recitation of the range of about 1% to about 5% includes 1%, 2%, 3%, 4%, and 5%, as well as, for example, 2.3%, 3.9%, 4.5%, etc.

[0047] The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or."

[0048] As used in this specification and claim(s), the words "comprising" (and any form of comprising, such as "comprise" and "comprises"), "having" (and any form of having, such as "have" and "has"), "including" (and any form of including, such as "includes" and "include") or "containing" (and any form of containing, such as "contains" and "contain") are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.

[0049] As used herein, the term "methylation profile" refers to a presentation of methylation status of one or more cancer marker genes in a subject's genomic DNA. In some embodiments, the methylation profile is compared to a standard methylation profile comprising a methylation profile from a known type of sample (e.g., cancerous or noncancerous samples or samples from different stages of cancer). In some embodiments, methylation profiles are generated using the methods of the present invention. The profile may be presented as a graphical representation (e.g., on paper or on a computer screen), a physical representation (e.g., a gel or array) or a digital representation stored in computer memory.

[0050] As used herein, the term "one or more" includes 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, or 9 or more.

[0051] DNA methylation: Methylation of bases contained in the DNA double helix, resulting in a loss of gene function. Generally occurring on cytosine residues in the DNA, methylation is important in regulating cell growth and differentiation and has resulted in the testing of DNA methyltransferase inhibitors as anti-cancer agents and differentiation agents.

[0052] Epigenetic: The transfer of information from one cell to its descendants without the information being encoded in the nucleotide sequence of the DNA. The methylation of the promoter to inactivate a gene is an example of an epigenetic change. Epigenetic inheritance is typically transmitted in dividing cells. Although rare, it is occasionally seen in traits being transmitted from one generation to another. Epigenetic variants can arise spontaneously and just as spontaneously revert.

[0053] Epigenome: The overall epigenetic state of a cell.

[0054] Gene Set Enrichment Analysis (GSEA): GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biologic states.

[0055] KEGG: The Kyoto Encyclopedia of Genes and Genomes is an integrated bioinformatic database resource that is used as a reference knowledge base for biologic interpretation of large-scale data sets generated by sequencing and other high-throughput experimental technologies. The KEGG website at www.kegg.jp has become the primary site of the KEGG database developed by Kanehisa Laboratories. The GenomeNet website at www.genome.jp, operated by Kyoto University Bioinformatics Center, will continue to mirror the KEGG database and provide additional KEGG-based analysis services.

[0056] Population-based: Study in which the subjects are drawn from a defined population in a manner that is representative of the source population studied. Such a design can avoid bias arising from the selective factors that guide affected individuals to a particular medical facility, allowing for greater generalizability of the findings.

[0057] Recursively partitioned mixture model (RPMM): A likelihood-based hierarchical clustering procedure that produces classification solutions similar to those of conventional mixture models in a computationally efficient manner and allows for precise inference regarding potential covariates.

[0058] Regulatory T cells (also known as suppressor T cells): A specialized subpopulation of T cells that act to suppress activation of the immune system and thereby maintain immune system homeostasis and tolerance to self-antigens. This is an important "self-check" built into the immune system so that responses do not go haywire. Regulatory T cells come in many forms, including those that express the CD8 trans-membrane glycoprotein (CD8 T cells), those that express CD4, CD25 and Foxp3 (CD4CD25 regulatory T cells or "Tregs") and other T cell types that have suppressive function. These cells are involved in closing down immune responses after they have successfully tackled invading organisms and also in keeping in check immune responses that may potentially attack one's own tissues (autoimmunity).

Example 1: DNA Methylation Array Analysis Identifies Profiles of Blood-derived DNA Methylation Associated with Bladder Cancer

[0059] Epigenetic alterations in tissues targeted for cancer play a causal role in carcinogenesis. Changes in DNA methylation in non-target tissues, specifically peripheral blood, can also affect risk of malignant disease. We sought to identify specific profiles of DNA methylation in peripheral blood that are associated with bladder cancer risk and therefore serve as an epigenetic marker of disease susceptibility.

[0060] We performed genome-wide DNA methylation profiling on subjects involved in a population-based incident case-control study of bladder cancer. In a training set of 112 cases and 118 controls, we identified a panel of 9 CpG loci whose profile of DNA methylation was significantly associated with bladder cancer in a masked, independent testing series of 111 cases and 119 controls (P<0.0001). Membership in 3 of the most methylated classes was associated with a 5.2 fold increased risk of bladder cancer (95% CI 2.8, 9.7), and a model including the methylation classification, subject age, gender, smoking status, and family history of bladder cancer was a significant predictor of bladder cancer (AUC 0.76, 95% CI 0.70, 0.82). CpG loci associated with bladder cancer and aging had neighboring sequences enriched for transcription-factor binding sites related to immune modulation and forkhead family members.

[0061] These results indicate that profiles of epigenetic states in blood are associated with risk of bladder cancer, and signal the utility of epigenetic profiles in peripheral blood as diagnostic and/or prognostic markers of susceptibility to this type of cancer and other malignancies.

METHODS

[0062] Study Population: The study population has been previously described (Karagas et al., Environ. Health Perspect. 106 Suppl. 4:1047-50, 1998; and Wallace et al., Cancer Prev. Res. 2:70-3, 2009). Briefly, cases of incident bladder cancer were identified from the New Hampshire state cancer registry from July 1, 1994 until June 30, 1998, and a standardized histopathologic review was conducted by a single study pathologist to verify the diagnosis and histopathology of the cases. The case group in this study is limited to Caucasian cases, due to the limited number of non-Caucasian subjects in the study population. For cases, blood samples were collected on average within 1 year following diagnosis. All controls less than 65 years of age were selected from records obtained from the New Hampshire Department of Transportation, and controls older than 65 years of age were chosen from records obtained from the Health Care Financing Administration's Medicare Program. Informed consent was obtained from each participant, and all procedures and study materials were approved by the Committee for the Protection of Human Subjects at Dartmouth College and Brown University. Consenting participants underwent a detailed in-person interview covering sociodemographic information and lifestyle factors such as the use of tobacco.

[0063] DNA Methylation and Statistical Analysis: DNA was extracted from peripheral blood buffy coats using the QIAamp DNA mini kit according to the manufacturer's protocol (Qiagen, Valencia, CA), and was subjected to sodium bisulfite modification using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) following the manufacturer's protocol. Methylation profiling was performed using the Illumina Infinium Methylation27 Bead Array at the UCSF Institute for Human Genetics Genomic Core Facility.

[0064] The scheme of our analysis strategy aimed at identifying and validating novel epigenetic biomarkers of bladder cancer in peripheral blood is depicted in Figure 1. We used the methods of Houseman et al. (BMC Bioinformatics 9:365, 2008), the recursively partitioned mixture model (RPMM), as this model-based clustering strategy has been demonstrated to perform effectively and efficiently for methylation data derived from the Illumina array technologies, and it allows for inference in addressing the associations between the methylation-based clusters and covariates. Training and testing sets were obtained by randomly sampling within bladder cancer case control status. We used a procedure called Semi-Supervised Recursively Partitioned Mixture Models (SS-RPMM) (Koestler et al., Bioinformatics 26:2578-85, 2010), similar in spirit to the semi-supervised methodologies proposed by Bair and Tibshirani (PLoS Biol. 2:E108, 2004), for identifying methylation profiles that are associated with case control status. To examine the robustness of the association identified in the SS-RPMM, we also utilized a LASSO approach for modeling the association between methylation profile and bladder cancer status, utilizing the same training and testing datasets. Details of these methods and results are included below.

[0065] Methylation Data QA/QC: The methylation status of a specific CpG site was calculated from the intensity of the methylated (M) and unmethylated (U) alleles, as the ratio of fluorescent signals β = Max(M,0)/[Max(M,0) + Max(U,0) + 100]. On this scale, 0 < β < 1, with β-values close to 1 indicating methylation and β-values close to 0 indicating no methylation. Quality assurance was assessed by detection P-values, and no loci had a sizable fraction (>25%) of P-values above a predetermined threshold (10^-5). In addition, the multivariate characteristics (e.g., Cholesky residuals (Subramanian et al., Proc. Natl. Acad. Sci. USA 102:15545-50, 2005) or Mahalanobis distance based on fitted mean vector and variance-covariance matrix) of array control probes supplied by Illumina were used to diagnose problems such as poor bisulfite conversion or color-specific problems, though none were noted. Finally, only autosomal loci were considered; thus our analysis was performed on 26,486 loci.
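The β-value formula and the detection P-value screen described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the argument layout, and the idea of passing one list of detection P-values per locus are assumptions made for the example, not part of the disclosed method.

```python
def beta_value(m, u, offset=100.0):
    """Beta-value from methylated (m) and unmethylated (u) signal intensities.

    Implements beta = Max(M,0) / [Max(M,0) + Max(U,0) + 100].
    """
    m_pos = max(m, 0.0)
    u_pos = max(u, 0.0)
    return m_pos / (m_pos + u_pos + offset)


def locus_fails_qc(detection_p_values, p_threshold=1e-5, max_fraction=0.25):
    """Flag a locus when more than max_fraction of samples exceed p_threshold.

    detection_p_values is a hypothetical list holding one detection P-value
    per sample for a single locus.
    """
    n_bad = sum(1 for p in detection_p_values if p > p_threshold)
    return n_bad / len(detection_p_values) > max_fraction
```

The constant offset of 100 keeps the ratio defined when both intensities are near zero, which is also why β never reaches exactly 1.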

[0066] Statistical Analysis of Methylation Data: Stratifying on case status, age, sex, and smoking status, we partitioned our data into training and test data sets (n=230 for each). Within the training data, we selected the top M loci most associated with case control status by fitting a series of linear mixed effects models, one for each of the 26,486 loci in our data set, where the choice of M is described below. For the linear mixed effects model, we let $y_{ijk}$ represent the arcsine square root transformed methylation for subject $i = 1, 2, \ldots, N$, locus $j = 1, 2, \ldots, J$, and plate/beadChip $k = 1, 2, \ldots, K$. We fit linear mixed effects models of the form:

$$y_{ijk} = \beta_{0j} + \beta_{1j}\,\mathrm{Case}_i + a_{jk} + \varepsilon_{ijk} \qquad [1]$$

where $\beta_{0j}$ is the intercept, $a_{jk}$ is a normally distributed random intercept assumed to have the same value for all arrays on the same plate/beadChip, $\varepsilon_{ijk}$ is a normally distributed error term (now assumed to represent technical variation), and $\beta_{1j}$ is a fixed effect term for case control status, where $\mathrm{Case}_i = 1$ if subject $i$ is a case and $\mathrm{Case}_i = 0$ if subject $i$ is a control.
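The locus-ranking step can be illustrated with a simplified Python sketch. Note the substitution: instead of the linear mixed effects model with a random plate/beadChip intercept, the sketch uses an ordinary pooled-variance two-sample t-statistic on the arcsine square root transformed values, which corresponds to the fixed case effect when plate effects are ignored. The function names and the dictionary layout of the input are hypothetical.

```python
import math
from statistics import mean, variance


def arcsine_sqrt(beta):
    """Arcsine square root transform of a beta-value in [0, 1]."""
    return math.asin(math.sqrt(beta))


def case_control_t(case_betas, control_betas):
    """Pooled-variance t-statistic for the case effect at one locus.

    Simplification: no random intercept for plate/beadChip is modelled.
    """
    x = [arcsine_sqrt(b) for b in case_betas]
    y = [arcsine_sqrt(b) for b in control_betas]
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (mean(x) - mean(y)) / se


def top_loci(data, m):
    """Return the m loci with the largest absolute t-statistic.

    data maps a locus name to a (case_betas, control_betas) pair.
    """
    scores = {locus: abs(case_control_t(c, k)) for locus, (c, k) in data.items()}
    return sorted(scores, key=scores.get, reverse=True)[:m]
```

A faithful implementation would fit the mixed model per locus with a statistics package that supports random effects; the ranking logic on the absolute t-statistic would stay the same.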

[0067] To determine the number M of loci with the largest absolute t-statistic to be used in fitting the RPMM, we randomly split the training set $D_0$ into two sets: $D_{0a}$ and $D_{0b}$. An RPMM was fit to $D_{0a}$ using the M pre-selected loci with the largest absolute t-statistics. Using the results from the RPMM fit on $D_{0a}$, we predicted class membership $C_i$ for the observations in $D_{0b}$ using Empirical Bayes and assigned the observations in $D_{0b}$ to the class with the largest posterior probability. We then computed and recorded the P-value from a permutation-based Chi-square test, testing the hypothesis that the predicted classes in $D_{0b}$ are not associated with case control status. The fitting of the RPMM was repeated, varying M from $M_{min} = 5$ to $M_{max} = 100$. This cross-validation was then repeated by making different splits of $D_0$ into $D_{0a}$ and $D_{0b}$, and the median P-value for each specification of M across the different splits of $D_0$ was computed. We selected M to be the value that minimized the median P-value. Using the M loci with the largest absolute t-statistics, which were derived using only the training data, we then fit an RPMM assuming beta-distributed responses to the training data. The resulting model provides a latent class structure on the pre-selected loci, which was then used in conjunction with Empirical Bayes to predict class for the observations in the testing data. We then assessed the association between predicted methylation class and case-control status in the test data using a permutation-based Chi-square test, and assessed clinical relevance by computing ROC curves and their associated AUC. The ROC curves and corresponding AUC were generated (1) based only on the predicted methylation classes in the testing data and (2) based on the predicted methylation classes in the testing data controlled for subject gender, age, smoking status (never, former, or current), and family history of bladder cancer.
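The permutation-based Chi-square test of association between predicted methylation class and case-control status can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names, the default of 2000 permutations, the fixed seed, and the add-one correction to the P-value are choices made for the example and are not specified in the text.

```python
import random
from collections import Counter


def chi_square(classes, labels):
    """Pearson chi-square statistic for the class-by-label contingency table."""
    n = len(classes)
    row = Counter(classes)
    col = Counter(labels)
    obs = Counter(zip(classes, labels))
    stat = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(classes, labels, n_perm=2000, seed=0):
    """P-value obtained by shuffling case-control labels against fixed classes."""
    rng = random.Random(seed)
    observed = chi_square(classes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square(classes, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In the cross-validation loop described above, this P-value would be recorded for each candidate M and each random split, and the M with the smallest median P-value retained.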

[0068] Gene Set Enrichment Analysis: For each defined gene set (described below), the GSEA enrichment score statistic was applied to correlation statistics obtained from regression coefficients. To quantify differential methylation between cases and controls, the following ordinary least squares regression model was fit to data for each autosomal CpG site:

$$y_{ijk} = \alpha_{0j} + \alpha_{1j}\,\mathrm{Case}_i + \gamma_k x_k + \varepsilon_{ijk}, \quad \gamma_1 = 0, \qquad [2]$$

similar to equation [1] defined above except for the fixed-effects parameterization of bead chip effects, and the use of the full training and test data set consisting of 223 cases and 237 controls. The correlation values $r_j^{(case)} = \alpha_{1j}\sqrt{p_j q_j}\,/\,\mathrm{SD}(y_j)$, where $p_j$ was the proportion of non-missing case values for CpG $j$ and $q_j$ was the corresponding proportion of non-missing control values, were used as inputs to the GSEA enrichment score function to obtain a gene set statistic G. A sample from the permutation distribution of G was obtained by fitting [2] with case status shuffled once for all CpG sites, and 1000 such samples were used to obtain the nominal significance of G, computed by calculating the proportion of permutation samples G* for which |G*| > |G|. Note that in this analysis, a fixed effects model was used instead of the random effects model used for equation [1], in order to obtain the permutation distribution in a computationally efficient manner. To quantify age-related changes, a similar procedure was used, except that [2] was replaced with

$$y_{ijk} = \alpha_{0j} + A_j\,\mathrm{Age}_i + \delta_j\,(\mathrm{Sex}_i = \mathrm{Female}) + \gamma_k x_k + \varepsilon_{ijk}, \quad \gamma_1 = 0, \qquad [3]$$

and fit to the data set consisting only of 237 controls, the correlation statistic was computed as $r_j^{(age)} = A_j\,\mathrm{SD}_j(\mathrm{age})\,/\,\mathrm{SD}(y_j)$, where the standard deviation $\mathrm{SD}_j(\mathrm{age})$ was recomputed for each CpG site $j$ using only the controls for which no methylation data were missing, and control age was shuffled in [3] to obtain the permutation distribution. Note that the age-methylation models were adjusted for sex, since the matched design induced a correlation between sex and age (due to the differential pattern of incidence of bladder cancer between males and females).
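To make the enrichment step concrete, the following Python sketch computes a GSEA-style enrichment score from per-site correlation values and the nominal significance from a set of permuted scores. It assumes the weighted running-sum form of the enrichment score from Subramanian et al. with weight 1; the text does not state which weighting was used, and the function names and input layout are hypothetical.

```python
def enrichment_score(corr, gene_set, weight=1.0):
    """Signed maximum deviation of the GSEA running sum.

    corr maps each CpG site to its correlation statistic; gene_set is the
    set of sites belonging to the gene set under test.
    """
    ranked = sorted(corr, key=corr.get, reverse=True)
    hit_total = sum(abs(corr[s]) ** weight for s in ranked if s in gene_set)
    n_miss = sum(1 for s in ranked if s not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for s in ranked:
        if s in gene_set:
            running += abs(corr[s]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def nominal_p(observed, permuted):
    """Proportion of permuted statistics G* with |G*| > |G|."""
    return sum(1 for g in permuted if abs(g) > abs(observed)) / len(permuted)
```

In the procedure described above, each permuted score would come from refitting the regression with case status (or control age) shuffled once for all CpG sites and recomputing the correlations before calling `enrichment_score`.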

[0069] KEGG protein interaction pathway data sets were constructed by parsing KEGG XML files for Homo sapiens and matching gene nodes to CpG sites by Entrez ID. Transcription factor binding site data sets were obtained by querying the UCSC Genome Browser table tfbsConsSites and excluding sites with Z-score less than or equal to 2. A separate gene set for each of 252 types of transcription factor binding sites was obtained by determining which CpG sites were within 1 kbp of the midpoint of the nearest instance of the transcription factor binding sites.
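The construction of one gene set per transcription factor binding site type can be sketched as below. The tuple layout for binding sites, the dictionary of CpG coordinates, and the function name are assumptions made for illustration; they do not reproduce the actual schema of the tfbsConsSites table.

```python
from collections import defaultdict


def build_tfbs_gene_sets(cpg_sites, tfbs_sites, min_z=2.0, max_dist=1000):
    """Group CpG sites by the binding-site types found within max_dist bp.

    cpg_sites:  {cpg_id: (chrom, position)}
    tfbs_sites: iterable of (name, chrom, start, end, z_score) tuples
    Sites with z_score <= min_z are excluded.
    """
    midpoints = defaultdict(list)
    for name, chrom, start, end, z in tfbs_sites:
        if z > min_z:
            midpoints[(name, chrom)].append((start + end) / 2.0)

    gene_sets = defaultdict(set)
    for (name, chrom), mids in midpoints.items():
        for cpg, (cpg_chrom, pos) in cpg_sites.items():
            if cpg_chrom != chrom:
                continue
            # Distance to the nearest instance of this binding-site type.
            if min(abs(pos - m) for m in mids) <= max_dist:
                gene_sets[name].add(cpg)
    return dict(gene_sets)
```

For genome-scale inputs, the inner loop would normally be replaced by a sorted-midpoint search per chromosome, but the membership rule is the same.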

[0070] Gene Set Enrichment Analysis (GSEA; Subramanian et al., Proc Natl Acad Sci U S A 102: 15545-50, 2005; hereby incorporated by reference) was used to explore the biological relevance of blood-based alterations in DNA methylation for distinguishing bladder cancer cases from controls and also as a result of aging.

RESULTS

[0071] The profile of DNA methylation was obtained for 460 peripheral blood samples using the Human Methylation Beadarray (Houseman et al., BMC Bioinformatics 9:365, 2008; hereby incorporated by reference). We utilized a semi-supervised strategy to identify profiles of DNA methylation associated with bladder cancer and to examine if the identified profiles can predict case status in a series of blinded test samples (Figure 1). Following quality assurance procedures, the dataset was split into training and testing series. Characteristics of the cases and controls are shown in Table 1, and do not differ significantly between training and testing sets.

Table 1. Characteristics of the subjects used in the analysis

Characteristic                        Controls      Cases
Total n                               237           223
Subject Age (yrs), median (range)     65 (28-74)    66 (25-74)
Gender, n (%)
  Male                                158 (48%)     171 (52%)
  Female                              79 (60%)      52 (40%)
Family History of Bladder Cancer*
  No                                  224 (53%)     199 (47%)
  Yes                                 7 (44%)       9 (56%)
Smoking History
  Never                               72 (64%)      40 (36%)
  Former                              126 (53%)     111 (47%)
  Current                             39 (35%)      72 (65%)
Tumor Stage-Grade Designation
  Carcinoma-in-situ                   -             6 (3%)
  Non-invasive Low Grade (G1-2)       -             140 (63%)
  Non-invasive High Grade (G3)        -             17 (7%)
  Invasive                            -             60 (27%)

* Data on family history not available on 13 subjects

[0072] The first step of our semi-supervised strategy was to identify those CpG loci whose methylation state was most significantly associated with being a bladder cancer case rather than a control. To do this we fit a series of linear mixed effects models, using the training data only, for each of the 26,486 CpGs in the data set. This allowed us to model each methylation value as the dependent variable, with a random effect for plate (to allow for inter-plate normalization) based on a single normalization sample run on all plates, and a fixed effect for case-control status. CpG loci were ranked based on the absolute value of the t-statistic derived from the model, and the top 9 loci were chosen based on a nested cross-validation procedure (Materials and Methods, Figure 1) for inclusion in the RPMM, which clustered the samples based on the methylation profile of these 9 loci in the training data. To predict class membership in the testing data using only the methylation status of these 9 loci, the latent class structure from the RPMM solution fit to the training data was used in conjunction with an empirical Bayes procedure. The methylation profile of these 9 loci in the testing data is depicted in Figure 2a, which also shows the mean methylation across loci within a given class and the relationships among the classes through the dendrogram. The right branch classes (those beginning with the letter R) had overall mean methylation that was significantly greater than the left branch classes (P<0.0001).
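The ranking step can be sketched as follows. This is a simplified stand-in, not the model described above: it replaces the linear mixed effects model (with its random plate effect) by a two-sample Welch t-statistic, and the function names and data layout are hypothetical. It shows only the idea of scoring each CpG by the absolute t-statistic for case-control status and keeping the top-ranked loci.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cases, controls):
    """Two-sample Welch t-statistic (assumed stand-in for the mixed model)."""
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def top_loci(methylation, is_case, k=9):
    """Rank CpG loci by |t| and return the k highest-ranked ids.

    methylation: {cpg_id: [methylation value per sample]}  (hypothetical layout)
    is_case:     list of booleans aligned with the sample order
    """
    scores = {}
    for cpg, values in methylation.items():
        cases = [v for v, c in zip(values, is_case) if c]
        controls = [v for v, c in zip(values, is_case) if not c]
        scores[cpg] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In the study the number of loci retained (9) was chosen by nested cross-validation on the training data; here `k` is simply a parameter.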

[0073] In the test set, we observed that class membership was significantly associated with case-control status (P<0.0001, permutation-based Chi-square test, Figure 2b), with the right branch classes containing a higher proportion of bladder cases than controls compared to the left branch classes. Each of the 9 CpG loci used in the classifier had greater methylation among cases than controls. We assessed performance of the classifier by using ROC curves and calculating the AUC. Using methylation class alone, the AUC was 0.70, with 95% bootstrap confidence interval (0.63, 0.77). After adjustment for subject age, gender, smoking status (never, former, current), and family history of bladder cancer the AUC increased to 0.76, with 95% bootstrap confidence interval (0.70, 0.82) (Figure 3a,b). To determine whether the association between methylation profiles and bladder cancer is sensitive to the statistical methodology employed, we also performed our analysis using a LASSO approach with the same training and testing datasets. The methods and results of these analyses suggest our identification of bladder cancer associated methylation classes is robust to the statistical method employed.
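For readers unfamiliar with the performance measure, the sketch below computes an AUC and a percentile bootstrap confidence interval from predicted scores and 0/1 case labels. It is a generic illustration under assumed defaults (`n_boot`, `alpha`, `seed` are hypothetical parameters), not the procedure used to produce the values reported above.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling subjects."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both cases and controls
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```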

[0074] Unconditional logistic regression was used to calculate the magnitude of the association between methylation class and bladder cancer, controlling for potential confounders. There was a trend of increasing risk of disease moving from the left to right branch of the classification, with the highest risk for members of class RR compared to LLL (OR = 8.7, 95% CI 1.5, 55.2). Comparing the right branch classes to the left, the OR for bladder cancer was 5.2 (95% CI 2.8, 9.7), controlled for subject age, gender, smoking status, and family history of bladder cancer. There was no difference in the prevalence of invasive disease across the predicted classes.
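To make the odds ratio and its confidence interval concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% interval from a 2x2 table. This is an assumed simplification: the estimates reported above come from logistic regression adjusted for age, gender, smoking status, and family history, which this example does not reproduce. The function name and the cell layout are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: cases in the right-branch classes     b: controls in the right-branch classes
    c: cases in the left-branch classes      d: controls in the left-branch classes
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

The interval is symmetric on the log scale, which is why reported intervals such as (1.5, 55.2) around 8.7 look lopsided on the original scale.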

[0075] As previous work has suggested that aging is associated with epigenetic states in peripheral blood and can be related to the alterations associated with cancer, we sought to examine if there was any overlap in the biological pathways impacted by differential DNA methylation associated with age or case status. We performed a gene set enrichment analysis (GSEA) based on KEGG-defined pathways using the combined training and testing data, and compared pathways over-represented among loci associated with subject age (in controls) to those associated with disease. Pathways with a nominal P<0.05 based on the GSEA enrichment statistic are provided in Figure 4, grouped by function. No overlapping pathways based on age and disease associated loci were identified. However, similar functional groupings of pathways were identified in both age-associated and bladder cancer-associated loci and are detailed in Figure 4. Genetic information processing pathways were identified exclusively amongst loci associated with bladder cancer.
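The enrichment statistic can be illustrated with a minimal sketch of a running-sum score. It assumes the unweighted (Kolmogorov-Smirnov-like) form, in which every locus in the gene set contributes equally; the GSEA method of Subramanian et al. also supports weighting each step by the strength of association, which is omitted here. The function name and inputs are hypothetical.

```python
def enrichment_score(ranked_loci, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_loci: locus ids ordered from most to least associated
    gene_set:    set of locus ids belonging to the pathway or binding-site set
    """
    hits = [locus in gene_set for locus in ranked_loci]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked loci")

    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # maximum deviation from zero
    return best
```

A nominal P-value such as the P<0.05 threshold above would then be obtained by comparing the observed score with scores from permuted data.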

[0076] In addition to examining the functional consequences of differential methylation in peripheral blood between cases and controls, we hypothesized that differential methylation profiles may represent a response of the hematopoietic system to a developing tumor, i.e. the methylation profiles capture the downstream effects of this response, which may be through differential binding of transcription factors near sites of altered methylation. The top half of Figure 4 depicts the results of this GSEA-based analysis, showing binding sites of transcription factors over-represented within 1 kb of loci whose DNA methylation was related to age, bladder cancer status, or both, grouped by similar structure or functional response. Binding sites for a forkhead-containing transcription factor and for a transcription factor involved in immune modulation (GATA1) overlapped between loci associated with age and disease status. Loci with differential methylation strongly associated with age were near binding sites of a large number of transcription factors related to developmental processes, including homeobox-containing transcription factors, as well as factors involved in immune modulation and stress response. Oncogenic transcription factor binding sites as well as immune modulation and development related transcription factor binding sites were exclusively over-represented near loci whose methylation was associated with bladder cancer.

[0077] Results of LASSO Analysis: To determine whether the association between methylation profiles and bladder cancer is sensitive to the statistical methodology employed, we also performed our analysis using a LASSO approach with the same training and testing datasets. In the same training series used for the SS-RPMM, we fit two models, the first investigating the association between the top 1000 most variable loci on the array and bladder cancer risk, and the second with the 1000 loci most associated with bladder cancer risk. The first model resulted in 48 loci being identified with non-zero estimates of the coefficient representing correlation with bladder cancer status, and the second in 66 loci. Among the 9 CpG loci that were used in the SS-RPMM analysis, none overlapped with the 48 loci from the model containing the most variable loci and 3 overlapped with the 66 CpG loci identified in the model containing the most associated loci. In fact, none of the 9 CpG loci used for the SS-RPMM analysis were among the top 1000 most variable CpG loci, which illustrates a potential bias in this strategy toward CpGs having greater biochemical range in the assay related to probe binding effects. There was, though, a high degree of correlation between the 9, 48, and 66 loci used in these 3 analytical strategies, suggesting that each, through potentially different loci, is identifying the same inherent epigenetically altered process related to disease.
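The way a LASSO fit leaves only some loci with non-zero coefficients can be seen in a small sketch of L1-penalized logistic regression fit by proximal gradient descent. This is an assumed, generic implementation for illustration only; the source does not state which LASSO software or tuning procedure was used, and the names `lam`, `lr`, and `n_iter` are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def soft_threshold(value, threshold):
    """Shrink toward zero; values inside [-threshold, threshold] become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of rows (one per subject), y: list of 0/1 case labels,
    lam: penalty strength. The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current fit.
        resid = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        b -= lr * sum(resid) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            # Gradient step followed by the L1 proximal (soft-threshold) step.
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam)
    return w, b
```

Loci whose coefficient is driven exactly to zero drop out of the model, which is how the counts of 48 and 66 retained loci arise in analyses of this kind.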

[0078] To investigate the predictive performance of these models, ROC curves were constructed and the AUC calculated. The AUCs and corresponding 95% bootstrap CIs using only the predicted probabilities of case/control status in the testing data were 0.76 (0.69, 0.82) and 0.78 (0.72, 0.84) for the LASSO models fit to the top 1000 most variable CpG loci and to the top 1000 CpG loci most associated with bladder case/control status, respectively. Furthermore, the AUCs and corresponding 95% bootstrap CIs using the predicted probabilities of case/control status, controlled for gender, age, smoking status, and family history of bladder cancer were 0.78 (0.72, 0.84) and 0.79 (0.73, 0.85) for the LASSO models fit to the top 1000 most variable CpG loci and the top 1000 CpG loci most associated with bladder case/control status, respectively. The previous analysis using SS-RPMM resulted in an AUC of 0.70, with 95% bootstrap confidence interval (0.63, 0.77) using only the predicted methylation classes in the testing data. When methylation class was controlled for subject age, gender, smoking status, and family history of bladder cancer the AUC increased to 0.76, with 95% bootstrap confidence interval (0.70, 0.82). The LASSO models we considered resulted in slightly better prediction performance compared to SS-RPMM, though they required the use of 5-7 times as many loci. Nevertheless, both the SS-RPMM and LASSO methods lead to the same basic message: a disease signal is detectable in methylation measured on peripheral blood.

DNA Methylation as a Biomarker

[0079] This study represents a novel, large-scale examination of the utility of DNA methylation in peripheral blood as a biomarker of bladder cancer risk, and suggests that there are epigenetic alterations detectable in accessible, non-diseased tissue that reflect susceptibility to bladder tumorigenesis. The profiles represent a directed alteration, and by looking at the genomic context of those loci whose differential methylation was associated with aging or bladder cancer, the manner in which these pathologic processes are influencing methylation status is defined.

[0080] Specifically, we examined the representation of transcription factor binding sites within 1 kb of the loci demonstrating differential methylation with age and with case status. Loci associated with aging and with disease demonstrated over-representation of specific transcription factors involved in immune modulation and proliferation. Specific to those loci associated with bladder cancer, we observed an over-representation of transcription factor binding sites for transcription factors which have been functionally characterized as oncogenes, and those involved in lipid/sterol homeostasis, while specific to loci associated with aging were a number of transcription factor binding sites critical in developmental processes. The key role of immune modulation in both aging and carcinogenesis, and particularly bladder carcinogenesis, highlights how the detected methylation alterations may represent specific changes to the immune system that enable tumorigenesis (Dietrich et al., Br. J. Cancer 101:1316-20, 2009). For instance, there is a growing literature on the role of T-regulatory (Treg) cells and their over-abundance in both the peripheral blood and the target epithelial tissues of a developing tumor, and these methylation alterations may represent changes to the representation of specific lymphocyte subsets in peripheral blood as either a mediator or consequence of bladder tumorigenesis (Loskog et al., J. Urol. 177:353-8, 2007; Wang et al., Curr. Opin. Immunol. 19:217-23, 2007; and Ruffell et al., Cytokine Growth Factor Rev. 21:3-10, 2010). In fact, recent work has demonstrated that Foxo1 and Foxo3 proteins are critical for the control of Treg cell differentiation and specifying Treg cell lineage (Ouyang et al., Nat. Immunol. 11:618-27, 2010). It is equally important to consider transcription factor binding sites which were not over-represented in our analyses; these include those involved in , angiogenesis/VEGF signaling, and cellular interaction and communication. Taken together, these results are striking, as they suggest that methylation of DNA in the hematopoietic system associated with aging and bladder cancer may be associated with the presence or absence of transcription factor binding that is both specific to the two processes and overlapping.

[0081] The nine genes identified as harboring signal CpGs that are most associated with bladder cancer represent a wide range of cellular processes. For example, BRD7 is an activator of the WNT signaling pathway, which plays a critical role in stem cell maintenance and is a pathway whose alteration has been linked to bladder cancer (Marsit et al., Cancer Res. 65:7081-5, 2005). TBCA encodes a member of the multiprotein complex responsible for appropriate folding of the tubulin protein, and may be involved in responding to cellular stress events leading to an unfolded protein response (Tian et al., Cell 86:287-96, 1996). COX7C is one member of the cytochrome c oxidase complex responsible for mitochondrial respiration, and changes in its expression have been observed in skin squamous cell carcinoma and in response to 5-fluorouracil treatment (Hofmann et al., Cytogenet. Cell Genet. 83:226-7, 1998; Dang et al., Oncol. Rep 16:513-9, 2006; and De Angelis et al., Mol. Cancer 5:20, 2006). At a higher level, similar pathways were disrupted in both age-associated and cancer-associated methylation, including those related to organismal systems, cellular processes, human disease and environmental information processing.

[0082] This work, in summary, indicates that there is untapped potential in the use of peripheral-blood based epigenetic profiling for bladder cancer risk prediction or early detection, as well as in understanding the complicated interplay of multiple systems in tumorigenesis. We have demonstrated the ability to distinguish bladder cancer cases from controls using a model containing the DNA methylation profile of 9 loci, patient age, gender, smoking status, and family history (AUC=0.76). Profiles of DNA methylation reflecting increased methylation of these 9 loci were associated with an over 5-fold increased risk for bladder cancer compared to profiles with lesser extents of methylation.

[0083] The addition of GWAS-based SNPs to bladder cancer prediction models does not appear to significantly improve their performance, compared to models including risk factors and demographics alone (Wu et al., Cancer Metastasis Rev. 28:269-80, 2009). Researchers recently developed a risk modeling strategy for bladder cancer, including epidemiologic variables as well as a phenotypic measure of mutagen sensitivity; this model showed similar performance (AUC=0.8) to our model, and points to the need for including phenotypically relevant data in risk prediction strategies (Wu et al., J. Clin. Oncol. 25:4974-81, 2007). Our use of phenotypically relevant DNA methylation profiles, though, may be more appealing, as measurement of DNA methylation is more amenable to the clinical setting than is the mutagen sensitivity assay, which is time intensive and laborious, requiring lymphocyte culture and microscopic assessment following exposure to a test mutagen. The results described herein demonstrate the tremendous clinical potential of epigenetic profiling of peripheral blood DNA (Widschwendter et al., PLoS One 3:e2656, 2008; Teschendorff et al., PLoS One 4:e8274, 2009; and Wang et al., J. Thorac. Oncol., 2010).

[0084] Figure 5 illustrates the method for detecting bladder cancer or a susceptibility of developing bladder cancer. The intensity, level or pattern of DNA methylation is characterized 502. The measurement/characterization of intensity, level, or pattern is compared to a model 504. The model may be predetermined from data derived from a set of subjects known to have bladder cancer and a set of subjects known to not have bladder cancer. A likelihood of cancer is computed 506.
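The three steps of Figure 5 can be sketched in code as follows. This is an illustrative assumption only: the text does not specify the form of the model, so the example uses a logistic model with hypothetical weights and intercept, and the function names and input layout are invented for the sketch.

```python
from math import exp


def characterize(methylation_levels, loci):
    """Step 502: collect the methylation level at each locus of interest.

    methylation_levels: {locus_id: measured level}  (hypothetical layout)
    """
    return [methylation_levels[locus] for locus in loci]


def likelihood_of_cancer(profile, weights, intercept):
    """Steps 504 and 506: compare the profile to a model, return a probability.

    The logistic form, weights, and intercept are assumptions; in practice
    they would be predetermined from subjects with and without bladder cancer.
    """
    score = intercept + sum(w * x for w, x in zip(weights, profile))
    return 1.0 / (1.0 + exp(-score))
```

For example, `likelihood_of_cancer(characterize(sample, loci), weights, intercept)` would chain the steps for one subject, where `sample`, `loci`, `weights`, and `intercept` are supplied by the user.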

[0085] Other embodiments are within the scope and spirit of the invention. For example, due to the nature of software, functions described above can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations. Operations such as generating, determining, identifying, etc., may include calculations performed by a machine configured with a processor and memory.

[0086] The subject matter described herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a machine-readable storage device), or embodied in a propagated signal, for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

[0087] The processes and logic flows described in this specification, including the method steps of the subject matter described herein, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the subject matter described herein by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the subject matter described herein can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

[0088] To provide for interaction with a user, the subject matter described herein can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.

[0089] The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the claims. Therefore, it is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the following claims, with equivalents of the claims to be included therein.