Title:
MGAT1-DEFICIENT CELLS FOR PRODUCTION OF VACCINES AND BIOPHARMACEUTICAL PRODUCTS
Document Type and Number:
WIPO Patent Application WO/2019/018310
Kind Code:
A1
Abstract:
Mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1)-deficient cell lines and methods for use of same for producing human immunodeficiency virus (HIV) envelope glycoprotein polypeptides or fragments thereof with terminal mannose-5 glycans are provided.

Inventors:
BERMAN PHILLIP (US)
BYRNE GABRIEL (US)
Application Number:
PCT/US2018/042335
Publication Date:
January 24, 2019
Filing Date:
July 16, 2018
Assignee:
UNIV CALIFORNIA (US)
International Classes:
A61K39/21; C07K14/16; C12N9/10
Domestic Patent References:
WO2014043220A2 2014-03-20
WO2013106515A1 2013-07-18
Other References:
JAVIER F. MORALES ET AL: "HIV-1 Envelope Proteins and V1/V2 Domain Scaffolds with Mannose-5 to Improve the Magnitude and Quality of Protective Antibody Responses to HIV-1", JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 289, no. 30, 28 May 2014 (2014-05-28), US, pages 20526 - 20542, XP055361872, ISSN: 0021-9258, DOI: 10.1074/jbc.M114.554089
STEPANENKO A A ET AL: "HEK293 in cell biology and cancer research: phenotype, karyotype, tumorigenicity, and stress-induced genome-phenotype evolution", GENE, ELSEVIER, AMSTERDAM, NL, vol. 569, no. 2, 27 May 2015 (2015-05-27), pages 182 - 190, XP029247983, ISSN: 0378-1119, DOI: 10.1016/J.GENE.2015.05.065
YAO-CHENG LIN ET AL: "Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations", NATURE COMMUNICATIONS, vol. 5, no. 1, 3 September 2014 (2014-09-03), XP055507043, DOI: 10.1038/ncomms5767
JOHN SY GOH ET AL: "Producing recombinant therapeutic glycoproteins with enhanced sialylation using CHO-gmt4 glycosylation mutant cells", BIOENGINEERED, vol. 5, no. 4, 9 June 2014 (2014-06-09), US, pages 269 - 273, XP055238680, ISSN: 2165-5979, DOI: 10.4161/bioe.29490
CHEN W ET AL: "Five Lec1 CHO cell mutants have distinct Mgat1 gene mutations that encode truncated N-acetylglucosaminyltransferase I", GLYCOBIO, OXFORD UNIVERSITY PRESS, US, vol. 13, no. 1, 1 January 2003 (2003-01-01), pages 43 - 50, XP008136866, ISSN: 0959-6658, DOI: 10.1093/GLYCOB/CWG003
NATALIE R. SEALOVER ET AL: "Engineering Chinese Hamster Ovary (CHO) cells for producing recombinant proteins with simple glycoforms by zinc-finger nuclease (ZFN)-mediated gene knockout of mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (Mgat1)", JOURNAL OF BIOTECHNOLOGY, vol. 167, no. 1, 1 August 2013 (2013-08-01), pages 24 - 32, XP055212692, ISSN: 0168-1656, DOI: 10.1016/j.jbiotec.2013.06.006
GRAV LISE MARIE ET AL: "Application of CRISPR/Cas9 Genome Editing to Improve Recombinant Protein Production in CHO Cells", 1 January 2017, IMMUNOGENETICS : METHODS AND APPLICATIONS IN CLINICAL PRACTICE; [METHODS IN MOLECULAR BIOLOGY ; 882], NEW YORK : SPRINGER, C2012, US, PAGE(S) 101 - 118, ISBN: 978-1-61779-841-2, XP009505060
JAE SEONG LEE ET AL: "CRISPR/Cas9-mediated genome engineering of CHO cell factories: Application and perspectives", BIOTECHNOLOGY JOURNAL, vol. 10, no. 7, 9 June 2015 (2015-06-09), DE, pages 979 - 994, XP055372758, ISSN: 1860-6768, DOI: 10.1002/biot.201500082
RACHEL DORAN ET AL: "Glycan modifications to the gp120 immunogens used in the RV144 vaccine trial improve binding to broadly neutralizing antibodies", PLOS ONE, 24 April 2018 (2018-04-24), pages 1 - 17, XP055507047, Retrieved from the Internet [retrieved on 20180913]
GABRIEL BYRNE ET AL: "CRISPR/Cas9 gene editing for the creation of an MGAT1-deficient CHO cell line to control HIV-1 vaccine glycosylation", PLOS BIOLOGY, vol. 16, no. 8, 29 August 2018 (2018-08-29), pages e2005817, XP055507048, DOI: 10.1371/journal.pbio.2005817
O'ROURKE ET AL: "Robotic selection for the rapid development of stable CHO cell lines for HIV vaccine production", PLOS ONE, 2 August 2018 (2018-08-02), pages 1 - 22, XP055507049, Retrieved from the Internet [retrieved on 20180913]
DATABASE Protein [O] "GenBank", Database accession no. AAB50262
SANDERS RW; MOORE JP, IMMUNOLOGICAL REVIEWS, vol. 275, no. 1, 1 January 2017 (2017-01-01), pages 161 - 182
SANDERS RW ET AL., PLOS PATHOGENS, vol. 9, no. 9, 19 September 2013 (2013-09-19), pages e1003618
SHARMA SK ET AL., CELL REPORTS, vol. 11, no. 4, 28 April 2015 (2015-04-28), pages 539 - 550
KARLSSON HEDESTAM GB ET AL., IMMUNOLOGICAL REVIEWS, vol. 275, no. 1, 1 January 2017 (2017-01-01), pages 183 - 202
HSU, P.D. ET AL., CELL, vol. 157, no. 6, pages 1262 - 1278
SANDER, J.D.; J.K. JOUNG, NAT BIOTECH, vol. 32, no. 4, 2014, pages 347 - 355
BINLEY, J.M. ET AL., JOURNAL OF VIROLOGY, vol. 84, no. 11, 2010, pages 5637 - 5655
ZHU, X. ET AL., BIOCHEMISTRY, vol. 39, no. 37, 2000, pages 11194 - 11204
GO, E.P. ET AL., JOURNAL OF PROTEOME RESEARCH, vol. 12, no. 3, 2013, pages 1223 - 1234
RERKS-NGARM, S. ET AL., NEW ENGLAND JOURNAL OF MEDICINE, vol. 361, no. 23, 2009, pages 2209 - 2220
KARASAVVAS, N. ET AL., AIDS RES HUM RETROVIRUSES, vol. 28, no. 11, 2012, pages 1444 - 1457
KIM, J.H. ET AL., ANNU REV MED, vol. 66, 2015, pages 423 - 437
DOORES, K.J. ET AL., PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 107, no. 31, 2010, pages 13800 - 13805
BONOMELLI, C. ET AL., PLOS ONE, vol. 6, no. 8, 2011, pages e23521
GO, E.P. ET AL., J VIROL, vol. 85, no. 16, 2011, pages 8270 - 8284
PRITCHARD, L.K. ET AL., NAT COMMUN, vol. 6, 2015, pages 7479
MCLELLAN, J.S. ET AL., NATURE, vol. 480, no. 7377, 2011, pages 336 - 343
PEJCHAL, R. ET AL., SCIENCE, vol. 334, no. 6059, 2011, pages 1097 - 1103
LAVINE, C.L. ET AL., JOURNAL OF VIROLOGY, vol. 86, no. 4, 2012, pages 2153 - 2164
KONG, L. ET AL., NAT STRUCT MOL BIOL, vol. 20, no. 7, 2013, pages 796 - 803
WURM, F.M.; D. HACKER, NAT BIOTECH, vol. 29, no. 8, 2011, pages 718 - 720
XU, X. ET AL., NAT BIOTECH, vol. 29, no. 8, 2011, pages 735 - 741
BIEBERICH, E., ADVANCES IN NEUROBIOLOGY, vol. 9, 2014, pages 47 - 70
MOREMEN, K.W.; M. TIEMEYER; A.V. NAIRN, NAT REV MOL CELL BIOL, vol. 13, no. 7, 2012, pages 448 - 462
CHANG, V.T. ET AL., STRUCTURE, vol. 15, no. 3, 2007, pages 267 - 273
SEALOVER, N.R. ET AL., JOURNAL OF BIOTECHNOLOGY, vol. 167, no. 1, 2013, pages 24 - 32
PATNAIK, S.K.; P. STANLEY, METHODS IN ENZYMOLOGY, vol. 416, 2006, pages 159 - 182
LEE, J. ET AL., BIOCHEMISTRY, vol. 42, no. 42, 2003, pages 12349 - 12357
CHRISTIANSEN, M.N. ET AL., PROTEOMICS, vol. 14, no. 4-5, 2014, pages 525 - 546
HAMOUDA, H. ET AL., JOURNAL OF PROTEOME RESEARCH, vol. 13, no. 12, 2014, pages 6144 - 6151
BERMAN, P.W., AIDS RES HUM RETROVIRUSES, vol. 14, no. 3, 1998, pages 277 - 289
BERMAN, P.W. ET AL., VIROLOGY, vol. 265, no. 1, 1999, pages 1 - 9
BURTON, D.R.; L. HANGARTNER, ANNU REV IMMUNOL, vol. 34, 2016, pages 635 - 659
DAVENPORT, T.M ET AL., JOURNAL OF VIROLOGY, vol. 85, no. 14, 2011, pages 7095 - 7107
HENZLER, H.-J.; K. KAISER, NAT BIOTECH, vol. 16, no. 11, 1998, pages 1077 - 1079
MOODY, M. ET AL., PDA J PHARM SCI TECHNOL, vol. 65, no. 6, 2011, pages 580 - 588
MORITZ, B. ET AL., SCIENTIFIC REPORTS, vol. 5, 2015, pages 16952
BOTHWELL, A.L. ET AL., CELL, vol. 24, no. 3, 1981, pages 625 - 637
SENAPATHY, P. ET AL., METHODS ENZYMOL, vol. 183, 1990, pages 252 - 278
LASKY, L.A. ET AL., SCIENCE, vol. 233, no. 4760, 1986, pages 209 - 212
SOUTHERN, P.J.; P. BERG, J MOL APPL GENET, vol. 1, no. 4, 1982, pages 327 - 341
LU, S.: "HIV Env. Manufacturing Workshop", 11 June 2015, NIAID
NAKAMURA, G.R. ET AL., J VIROL, vol. 67, no. 10, 1993, pages 6179 - 6191
SMITH, D.H. ET AL., PLOS ONE, vol. 5, no. 8, 2010, pages e12076
RERKS-NGARM, S. ET AL., N ENGL J MED, vol. 361, no. 23, 2009, pages 2209 - 2220
BERMAN, P.W., AIDS RES HUM RETROVIRUSES, vol. 14, no. 3, 1998, pages S277 - S289
HAYNES, B.F. ET AL., N ENGL J MED, vol. 366, no. 14, 2012, pages 1275 - 1286
MONTEFIORI, D.C. ET AL., J INFECT DIS, vol. 206, no. 3, 2012, pages 431 - 441
O'CONNELL, R.J. ET AL., EXPERT REV VACCINES, vol. 13, no. 12, 2014, pages 1489 - 500
WALKER, L.M. ET AL., SCIENCE, vol. 326, no. 5950, 2009, pages 285 - 9
WALKER, L.M. ET AL., NATURE, vol. 477, no. 7365, 2011, pages 466 - 70
HAAS, J. ET AL., CURR BIOL, vol. 6, no. 3, 1996, pages 315 - 24
SINCLAIR, A.M.; S. ELLIOTT, J PHARM SCI, vol. 94, no. 8, 2005, pages 1626 - 35
WU, X. ET AL., SCIENCE, vol. 329, no. 5993, 2010, pages 856 - 861
SHINGAI, M. ET AL., NATURE, vol. 503, no. 7475, 2013, pages 277 - 280
ASHKENAZI, A. ET AL., PROC NATL ACAD SCI USA, vol. 88, no. 16, 1991, pages 7056 - 60
CAPON, D.J. ET AL., NATURE, vol. 337, no. 6207, 1989, pages 525 - 31
LEE, C. ET AL., BIOPROCESS INTERNATIONAL, vol. 4, no. 3, 2006, pages 32 - 35
SOLA, R.J.; K. GRIEBENOW, BIODRUGS, vol. 24, no. 1, 2010, pages 9 - 21
LEONARD, C.K. ET AL., J BIOL CHEM, vol. 265, no. 18, 1990, pages 10373 - 82
YU, B. ET AL., PLOS ONE, vol. 7, no. 8, 2012, pages e43903
SRIVASTAVA, I.K. ET AL., J VIROL, vol. 76, no. 6, 2002, pages 2835 - 47
SELLHORN, G. ET AL., JOURNAL OF VIROLOGY, vol. 86, no. 1, 2012, pages 128 - 142
ARTHOS, J. ET AL., NAT IMMUNOL, vol. 9, no. 3, 2008, pages 301 - 9
LEONARD, C.K. ET AL.: "Assignment of intrachain disulfide bonds and characterization of potential glycosylation sites of the type 1 recombinant human immunodeficiency virus envelope glycoprotein (gp120) expressed in Chinese hamster ovary cells", J BIOL CHEM, vol. 265, no. 18, 1990, pages 10373 - 82
YU, B. ET AL.: "Glycoform and Net Charge Heterogeneity in gp120 Immunogens Used in HIV Vaccine Trials", PLOS ONE, vol. 7, no. 8, 2012, pages e43903, XP055323710, DOI: doi:10.1371/journal.pone.0043903
WANG, Z. ET AL., VACCINES, vol. 4, no. 2, 2016, pages 17
Attorney, Agent or Firm:
CHANDRA, Shweta (US)
Claims:

What is claimed is:

1. A genetically modified Chinese hamster ovary (CHO) cell line comprising: a heterologous nucleic acid comprising a nucleotide sequence encoding a human immunodeficiency virus (HIV) envelope glycoprotein polypeptide or fragment thereof comprising an N-linked glycosylation site; and

a mutation of an endogenous gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1),

wherein the mutation prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue present at the N-linked glycosylation site of the HIV envelope glycoprotein polypeptide such that at least 75% of the HIV envelope glycoprotein polypeptides produced by the genetically modified cell line comprise terminal mannose-5, mannose-8, or mannose-9 glycans at the N-linked glycosylation site.

2. The genetically modified cell line of claim 1, wherein the polypeptide is gp120 or an N-linked glycosylation site containing fragment thereof.

3. The genetically modified cell line of claim 2, wherein the fragment comprises variable regions 1 and 2 (V1/V2) or V3 domain comprising N-linked glycosylation sites N301 and N332.

4. The genetically modified cell line of claim 3, wherein the fragment comprising variable regions 1 and 2 is a monomer.

5. The genetically modified cell line of claim 1, wherein the polypeptide or fragment thereof is gp140.

6. The genetically modified cell line of claim 5, wherein the polypeptide or fragment thereof is expressed as a trimer.

7. The genetically modified cell line of any one of the preceding claims, wherein the polypeptide is fused to a heterologous signal sequence.

8. The genetically modified cell line of claim 7, wherein the heterologous signal sequence comprises the amino acid sequence set forth in one of SEQ ID NOs: 44-47.

9. The genetically modified cell line of any one of the preceding claims, wherein the polypeptide comprises a purification tag.

10. The genetically modified cell line of claim 9, wherein the purification tag comprises the amino acid sequence set forth in one of SEQ ID NOs: 48-56.

11. The genetically modified cell line of claim 1, wherein the polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

12. The genetically modified cell line of any one of the preceding claims, wherein the cell line produces the polypeptide at a concentration of at least 50 mg/L after 5 days of culturing.

13. The genetically modified cell line of any one of the preceding claims, wherein the cell line is of CHO K1 lineage.

14. The genetically modified cell line of any one of the preceding claims, wherein the cell line is of CHO-S lineage.

15. The genetically modified cell line of any one of the preceding claims, wherein the cell line comprises an endogenous gene encoding glutamine synthetase (GS).

16. The genetically modified cell line of any one of the preceding claims, wherein the cell line comprises an endogenous gene encoding dihydrofolate reductase (DHFR).

17. A genetically modified Chinese hamster ovary (CHO) cell line comprising a mutation of a gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1), wherein the genetically modified cell line is deposited with American Type Culture Collection (ATCC) as: i) PTA-124141; or ii) PTA-124142.

18. A method of producing a human immunodeficiency virus (HIV) envelope glycoprotein polypeptide or fragment thereof, the fragment comprising an N-linked glycosylation site, the polypeptide or fragment thereof comprising terminal mannose-5 glycans, the method comprising:

a) introducing a nucleic acid comprising a nucleotide sequence encoding the HIV envelope glycoprotein polypeptide into a genetically modified Chinese hamster ovary (CHO) cell line comprising a mutation of an endogenous gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1),

wherein the mutation prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue such that at least 75% of the HIV envelope glycoprotein polypeptide produced by the genetically modified cell line comprises terminal mannose-5, mannose-8, or mannose-9 glycans; and

b) culturing the cell line in a liquid culture medium under conditions sufficient for production of the HIV envelope glycoprotein polypeptide comprising terminal mannose-5, mannose-8, or mannose-9 glycans.

19. The method of claim 18, wherein the envelope glycoprotein fragment comprises variable region 3 (V3) and optionally, C3 domain.

20. The method of claim 18, wherein the envelope glycoprotein is gp120 or a fragment thereof.

21. The method of claim 18, wherein the fragment comprises variable regions 1 and 2 (V1/V2).

22. The method of claim 21, wherein the fragment comprising variable regions 1 and 2 is a monomer.

23. The method of claim 18, wherein the polypeptide is gp140 or a fragment thereof.

24. The method of claim 23, wherein the polypeptide is expressed as a trimer.

25. The method of any one of claims 18-24, wherein the polypeptide is fused to a heterologous signal sequence.

26. The method of claim 25, wherein the heterologous signal sequence comprises the amino acid sequence set forth in one of SEQ ID NOs: 44-47.

27. The method of any one of claims 18-26, wherein the polypeptide comprises a purification tag.

28. The method of claim 27, wherein the purification tag comprises the amino acid sequence set forth in one of SEQ ID NOs: 48-56.

29. The method of claim 18, wherein the polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

30. The method of claim 18, wherein the nucleic acid comprises a nucleotide sequence set forth in SEQ ID NO:4, 6, 8, 11, 14, 16, 19, 21, 24, 27, 29, 31, 33, 35, 37, 39, 41, or 43.

31. The method of any one of claims 18-30, comprising screening individual clones of the cell line to identify clones expressing the highest amounts of the polypeptide, the screening comprising plating the clones in a semisolid matrix and contacting the clones with a detectably labeled antibody that binds to the polypeptide.

32. The method of claim 31, wherein the antibodies are fluorescently labeled antibodies that bind to the polypeptide and form a precipitate around the clones, wherein the precipitate is visible under fluorescent light.

33. The method of claim 32, further comprising identifying clones surrounded by precipitate meeting a selection threshold and isolating the identified clones.

34. The method of any one of claims 31-33, wherein the antibodies are polyclonal antibodies.

35. The method of claim 34, wherein the polyclonal antibodies are affinity purified antibodies that bind to the polypeptide.

36. The method of any one of claims 32-35, wherein the fluorescent label is Alexa dye.

37. The method of any one of claims 18-36, further comprising recovering the HIV envelope glycoprotein polypeptide comprising terminal mannose-5, mannose-8, or mannose-9 glycans from the culture medium.

38. A recombinant HIV envelope glycoprotein polypeptide or a fragment thereof comprising at least one N-linked glycosylation site, wherein the polypeptide or the fragment comprises terminal mannose-5, mannose-8, or mannose-9 glycans at the N-linked glycosylation site.

39. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 38 comprising a plurality of N-linked glycosylation sites, wherein the polypeptide or the fragment comprises terminal mannose-5, mannose-8, or mannose-9 glycans at the plurality of N-linked glycosylation sites.

40. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 38, wherein at least 75% of the N-linked glycosylation sites of the polypeptide or the fragment comprise terminal mannose-5, mannose-8, or mannose-9 glycans.

41. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-40, wherein the polypeptide is gp120 or a fragment thereof.

42. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-41, wherein the fragment comprises variable regions 1 and 2 (V1/V2) or V3 domain comprising N-linked glycosylation sites N301 and N332.

43. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 42, wherein the fragment comprising variable regions 1 and 2 is a monomer.

44. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-41, wherein the polypeptide or fragment thereof is gp140.

45. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 44, wherein the polypeptide or the fragment is expressed as a trimer.

46. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-45, wherein the polypeptide or the fragment is fused to a heterologous signal sequence.

47. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 46, wherein the heterologous signal sequence comprises the amino acid sequence set forth in one of SEQ ID NOs: 44-47.

48. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-47, wherein the polypeptide or the fragment comprises a purification tag.

49. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of claim 48, wherein the purification tag comprises the amino acid sequence set forth in one of SEQ ID NOs: 48-56.

50. The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof of any one of claims 38-40, wherein the polypeptide or the fragment comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42 or comprises an amino acid sequence at least 85% identical to the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

51. A composition comprising the polypeptide or the fragment of any one of claims 38-50 and a pharmaceutically acceptable excipient.

52. A method for inducing an immune response to HIV in a mammal, the method comprising administering to the mammal the composition of claim 51.

Description:
MGAT1-DEFICIENT CELLS FOR PRODUCTION OF VACCINES AND BIOPHARMACEUTICAL PRODUCTS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/534,594 filed on July 19, 2017, which application is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] This invention was made with government support under grant no. R01 AI113893, awarded by the National Institutes of Health. The government has certain rights in the invention.

INTRODUCTION

[0003] Human immunodeficiency virus type 1 (HIV-1) entry into a host cell is dependent on envelope glycoprotein (Env), which consists of two noncovalently bound subunits, the external gp120 and the transmembrane gp41. Env is present on virion surfaces as trimers of gp120-gp41 complexes and is involved in the binding of the virus to the host receptor and co-receptor(s). Env is also the target for the binding of neutralizing antibodies.

[0004] The development of a vaccine able to provide protection from HIV-1 infection has long been a global public health priority. To achieve this goal, vaccine development efforts have focused on the discovery of immunogens able to elicit cellular immune responses (e.g., cytotoxic lymphocytes) or broadly neutralizing antibody (bNAb) responses. Cellular immune responses are detected soon after infection in most HIV-1 infected individuals, whereas bNAb responses are found in only 10-20% of infected individuals. Unfortunately, after more than 30 years of research, none of the candidate vaccines described to date have been effective in eliciting bNAbs.

[0005] The recent isolation and characterization of multiple human bNAbs from HIV-1 infected subjects has now identified the epitopes responsible for much of the neutralizing activity in sera from HIV-1-infected humans. Over the past several years, the structures of several bNAbs in complexes with gp120 fragments have been elucidated. Several of these bNAbs, including PG9, PG16, CH01, CH03, and PGT145, appear to target glycan-dependent epitopes (GDEs) in the V1/V2 domain of gp120. PG9 and PG9-like antibodies are particularly interesting, since the epitope they recognize appears to overlap with an epitope associated with protection from HIV-1 infection in the RV144 HIV-1 vaccine trial. Structural studies showed that the binding of PG9 was highly dependent on mannose-5 glycans at positions 156 and 160, as well as basic amino acid side chains at positions 167-169 and 171, and that this region is required for the binding of multiple neutralizing and non-neutralizing antibodies to the V1/V2 domain.

[0006] Mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1, also known as GntI) adds N-Acetylglucosamine to the Man5GlcNAc2 (Man5) N-glycan structure as part of complex N-glycan synthesis and is expressed by eukaryotic cell lines such as CHO cell lines.

[0007] Thus, there remains a need for the development of cell lines that do not have Mgat1 activity and can express exogenous polypeptides stably and in sufficient quantities.

SUMMARY

[0008] The present disclosure provides mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1)-deficient cell lines and methods for use of same for producing human immunodeficiency virus (HIV) envelope glycoprotein polypeptides or fragments thereof with terminal mannose-5 glycans (Man5GlcNAc2).

[0009] In certain aspects, a genetically modified Chinese hamster ovary (CHO) cell line is provided. The cell line includes a heterologous nucleic acid comprising a nucleotide sequence encoding a human immunodeficiency virus (HIV) envelope glycoprotein polypeptide comprising an N-linked glycosylation site; and a mutation of an endogenous gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1), where the mutation prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue present at the N-linked glycosylation site of the HIV envelope glycoprotein polypeptide such that at least 75% of the HIV envelope glycoprotein polypeptides produced by the genetically modified cell line comprise terminal mannose-5, mannose-8, or mannose-9 glycans at the N-linked glycosylation site. The mutation may be a targeted mutation.
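The 75% criterion above can be illustrated with a minimal sketch. It assumes a glycoform profile is available as relative abundances keyed by glycan label (for example from a mass spectrometry readout); the labels, the example numbers, and the function names are hypothetical and are not taken from this disclosure.

```python
# Glycan classes that count toward the threshold (labels are illustrative).
HIGH_MANNOSE = {"Man5", "Man8", "Man9"}


def high_mannose_fraction(glycoform_abundance):
    """Return the fraction of total abundance carried by Man5/Man8/Man9 glycoforms."""
    total = sum(glycoform_abundance.values())
    if total <= 0:
        raise ValueError("glycoform abundances must sum to a positive value")
    target = sum(v for k, v in glycoform_abundance.items() if k in HIGH_MANNOSE)
    return target / total


def meets_threshold(glycoform_abundance, threshold=0.75):
    """True when the Man5/Man8/Man9 fraction is at least the threshold (default 75%)."""
    return high_mannose_fraction(glycoform_abundance) >= threshold


# Hypothetical profile, in arbitrary relative units.
example_profile = {"Man5": 80.0, "Man8": 3.0, "Man9": 2.0, "complex": 15.0}
print(meets_threshold(example_profile))
```

The threshold is left as a parameter so the same check can be applied to other cut-offs.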

[0010] In certain aspects, the polypeptide is gp120 or an N-linked glycosylation site containing fragment thereof. The fragment may comprise variable regions 1 and 2 (V1/V2). The gp120 or an N-linked glycosylation site containing fragment thereof or the V1/V2 fragment may be a monomer. In certain aspects, the fragment comprising variable regions 1 and 2 may be at least 50 amino acids long (e.g., 50-100 amino acids) and may include a contiguous sequence having at least 60% sequence identity (e.g. at least 65%, 70%, 75%, 80%, 85%, 90%, 95% identity, or 100% identity) to the V1/V2 domain sequence set forth in SEQ ID NO: 70:

[0011] CVTLHCTNANLTKANLTNVNNRTNVSNIIGNITDEVRNCSFNMTTELRDKKQKVHALFYKLDIVPIEDNNDSSEYRLINCNTSVIKQAC (SEQ ID NO:70).

[0012] In certain aspects, the fragment of gp120 may comprise a 50-100 amino acids long sequence at least 60% identical (e.g. having at least 65%, 70%, 75%, 80%, 85%, 90%, 95% identity, or 100% identity) to SEQ ID NO:70.
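The percent-identity criterion can be sketched as follows. This is a deliberately simplified, ungapped comparison; sequence identity is more commonly computed from a gapped alignment, so the sketch shows the arithmetic only and is not the method used in this disclosure. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_ungapped_identity(fragment, reference):
    """Slide the fragment along the reference (no gaps) and return the best identity."""
    if len(fragment) > len(reference):
        raise ValueError("fragment must not be longer than the reference")
    span = len(fragment)
    return max(
        percent_identity(fragment, reference[i:i + span])
        for i in range(len(reference) - span + 1)
    )


def satisfies_identity(fragment, reference, min_length, min_identity):
    """Check a length floor and an identity floor, e.g. 50 residues and 60%."""
    return (len(fragment) >= min_length
            and best_ungapped_identity(fragment, reference) >= min_identity)


# Toy sequences: one mismatch in five positions gives 80% identity.
print(percent_identity("ACDEF", "ACDQF"))
```

For the V1/V2 case described above, the call would be `satisfies_identity(fragment, seq_id_no_70, 50, 60.0)`, with both arguments supplied by the user.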

[0013] In other embodiments, the fragment may comprise variable region 3 (V3). In other embodiments, the fragment comprising the V3 region or domain may be at least 35 amino acids in length (e.g. 35-50 amino acids) and may include a contiguous sequence having at least 60% sequence identity (e.g. at least 65%, 70%, 75%, 80%, 85%, 90%, 95% identity, or 100% identity) to the V3 domain sequence set forth in SEQ ID NO: 71:

[0014] QINCTRPNNNTRKRIHIGPGRAFYTTKNIKGTIRQAHCNISRAKWN (SEQ ID NO:71)

[0015] In certain aspects, the fragment of gp120 may comprise a 35-50 amino acids long sequence at least 60% identical (e.g. having at least 65%, 70%, 75%, 80%, 85%, 90%, 95% identity, or 100% identity) to SEQ ID NO:71.
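As a rough illustration of how candidate N-linked glycosylation sites could be located in a fragment such as SEQ ID NO:71, the sketch below scans for the canonical N-X-S/T sequon, where X is any residue except proline. The sequon rule is general background knowledge and is not stated in this disclosure, and the function name is hypothetical. Reported positions are 1-based and local to the fragment; they are not HXB reference numbering.

```python
import re

# V3 domain sequence as set forth in SEQ ID NO:71.
V3_SEQ_ID_71 = "QINCTRPNNNTRKRIHIGPGRAFYTTKNIKGTIRQAHCNISRAKWN"

# N followed by any residue except P, then S or T. The lookahead keeps
# overlapping candidates (e.g. runs of N) from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]


print(find_sequons(V3_SEQ_ID_71))  # -> [3, 9, 39]
```

A sequon marks only a potential site; whether a given asparagine is actually glycosylated has to be determined experimentally.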

[0016] In certain cases, the V3 region may comprise glycan residues N301 and N332. In certain cases, the V3 region may comprise glycan residues N301 and N332 and may extend from residue 291-342 or 296-337 of A244 gp120. The gp120 or an N-linked glycosylation site containing fragment thereof or the V1/V2 fragment may be a monomer. The numbering of the amino acid residues N301, N332, and N334 is with reference to the amino acid sequence of HIV-1 envelope polyprotein of HIV HXB having GenBank Accession No. AAB50262. AAB50262 provides an 856 amino acids long HIV-1 Env protein sequence; amino acids 34-511 define gp120 and amino acids 530 to 726 define gp41. Within gp120, the following domains are present: V1 (amino acid position 126-156); V2 (amino acid position 157-205); V3 (amino acid position 292-339); V4 (amino acid position 385-418) and V5 (amino acid position 461-471). The amino acid sequence of the envelope polyprotein of HIV HXB having GenBank Accession No. AAB50262 is as follows:

[0017] MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCVKLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFNISTSIRGKVQKEYAFFYKLDIIPIDNDTTSYKLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSVNFTDNAKTIIVQLNTSVEINCTRPNNNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNISRAKWNNTLKQIASKLREQFGNNKTIIFKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWFNSTWSTEGSNNTEGSDTITLPCRIKQIINMWQKVGKAMYAPPISGQIRCSSNITGLLLTRDGGNSNNESEIFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGIGALFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKDQQLLGIWGCSGKLICTTAVPWNASWSNKSLEQIWNHTTWMEWDREINNYTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWFNITNWLWYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTHLPTPRGPDRPEGIEEEGGERDRDRSIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLIVTRIVELLGRRGWEALKYWWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEVVQGACRAIRHIPRRIRQGLERILL (SEQ ID NO:72).
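The domain boundaries given in paragraph [0016] are 1-based, inclusive residue positions, which differ from the 0-based, end-exclusive slicing used by most programming languages. The following sketch shows one way the conversion might be handled; the dictionary and function names are hypothetical, and a placeholder string of the stated length stands in for the actual sequence.

```python
# 1-based inclusive coordinates as stated in the description for AAB50262.
ENV_DOMAINS = {
    "gp120": (34, 511),
    "gp41": (530, 726),
    "V1": (126, 156),
    "V2": (157, 205),
    "V3": (292, 339),
    "V4": (385, 418),
    "V5": (461, 471),
}


def extract_domain(env_sequence, name):
    """Return the residues of a named domain from a full-length Env sequence."""
    start, end = ENV_DOMAINS[name]
    if end > len(env_sequence):
        raise ValueError(f"sequence too short for {name} (needs {end} residues)")
    # Convert 1-based inclusive positions to a 0-based, end-exclusive slice.
    return env_sequence[start - 1:end]


# Placeholder of the stated length (856 residues); substitute SEQ ID NO:72.
placeholder_env = "A" * 856
for domain in ENV_DOMAINS:
    print(domain, len(extract_domain(placeholder_env, domain)))
```

Under these coordinates the V3 domain spans 48 residues and gp120 spans 478 residues.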

[0018] In certain aspects, the polypeptide is gp140 or an N-linked glycosylation site containing fragment thereof. In certain aspects, the gp140 polypeptide may be a trimer.

[0019] In certain aspects, the polypeptide may be fused to a signal sequence. The signal sequence may be a native signal sequence or a heterologous signal sequence. In certain aspects, the heterologous signal sequence may be cleaved off from the secreted polypeptide. In certain cases, the signal sequence may be linked to the polypeptide via a linker which may be a cleavable linker. In other embodiments, the signal sequence may not be cleaved off the secreted polypeptide.

[0020] In certain aspects, the heterologous signal sequence comprises the amino acid sequence set forth in one of SEQ ID NOs: 44-47.

[0021] In certain aspects, the polypeptide may be a fusion protein comprising a purification tag. The purification tag may be present at the N-terminus and/or the C-terminus of the polypeptide. In certain aspects, the purification tag may be present at the N-terminus, where the polypeptide comprises from the N-terminus to the C-terminus: native or heterologous signal sequence, purification tag, an optional linker sequence, and the envelope glycoprotein.

[0022] In certain aspects, the polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

[0023] In certain aspects, the cell line produces the polypeptide at a concentration of at least 50 mg/L after 5 days of culturing.

[0024] In certain aspects, the cell line is of CHO K1 lineage or of CHO-S lineage.

[0025] In certain aspects, the cell line comprises an endogenous gene encoding glutamine synthetase (GS). In certain aspects, the cell line comprises an endogenous gene encoding dihydrofolate reductase (DHFR).

[0026] In other aspects, the cell line does not express a GS and/or a DHFR. For example, the cell line may include an inactivation, e.g., deletion, of an endogenous gene encoding glutamine synthetase (GS) and/or an endogenous gene encoding dihydrofolate reductase (DHFR).

[0027] Also provided herein is a genetically modified Chinese hamster ovary (CHO) cell line comprising a mutation of a gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1), and expressing gp120 polypeptide, wherein the genetically modified cell line is deposited with American Type Culture Collection (ATCC) as PTA-124141; or PTA-124142. The mutation may be a targeted mutation.

[0028] In addition, a method of producing a human immunodeficiency virus (HIV) envelope glycoprotein polypeptide comprising terminal mannose-5 glycans is disclosed. The method may include: a) introducing a nucleic acid comprising a nucleotide sequence encoding the HIV envelope glycoprotein polypeptide into a genetically modified Chinese hamster ovary (CHO) cell line comprising a mutation of a gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1), wherein the mutation prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue such that at least 75% of the HIV envelope glycoprotein polypeptide produced by the genetically modified cell line comprises terminal mannose-5 glycans; and b) culturing the cell line in a liquid culture medium under conditions sufficient for production of the HIV envelope glycoprotein polypeptide comprising terminal mannose-5, mannose-8, or mannose-9 glycans. The mutation may be a targeted mutation. In certain cases, introducing the nucleic acid into the cell line may include electroporation.

[0029] The method may include screening individual clones of the cell line to identify clones expressing high levels of the polypeptide. The polypeptide may be the envelope glycoprotein gp120 or an N-linked glycosylation site containing fragment thereof, such as an N-linked glycosylation site containing fragment comprising variable regions 1 and 2 (V1/V2). The gp120 or an N-linked glycosylation site containing fragment thereof, such as an N-linked glycosylation site containing fragment comprising V1/V2, may be a monomer. The polypeptide may be the envelope glycoprotein gp140. In certain aspects, the cell line may produce the gp140 polypeptide as a trimer.

[0030] The method may include screening by plating the clones in a semisolid matrix and contacting the clones with a detectably labeled antibody that binds to the polypeptide. In certain cases, the contacting comprises contacting the clones with a plurality of fluorescently labeled antibodies that bind to the polypeptide and form a precipitate around the clones, wherein the precipitate is visible under fluorescent light. In certain cases, the method further includes identifying clones surrounded by precipitate "halo" meeting a selection threshold and isolating the identified clones. The contacting may be carried out by including the detectably labeled antibody (e.g., affinity purified polyclonal antibodies) in the semisolid matrix on which the cells are plated. The polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

[0031] The method may include recovering the HIV envelope glycoprotein polypeptide comprising terminal mannose-5 glycans from the culture medium.

[0032] Also disclosed herein is the use of HIV envelope gp comprising terminal mannose-5 glycans produced using the cell lines and methods disclosed herein for inducing an immune response to HIV. In certain cases, the method may include administering the purified HIV gp, produced using the cell lines and methods disclosed herein, in a method for treating or preventing HIV infection.

[0033] Also provided herein is a recombinant HIV envelope glycoprotein polypeptide or a fragment thereof comprising at least one N-linked glycosylation site, wherein the polypeptide or the fragment comprises terminal mannose-5, mannose-8, or mannose-9 glycans at the N-linked glycosylation site.

[0034] The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof may comprise a plurality of N-linked glycosylation sites, wherein the polypeptide or the fragment comprises terminal mannose-5, mannose-8, or mannose-9 glycans at the plurality of N-linked glycosylation sites. For example, the polypeptide or the fragment may include 2-20, e.g., 2-15, 2-12, 2-10, 2-8, 2-6, or 2-4, N-linked glycosylation sites and at least 50%-75% of these N-linked glycosylation sites of the polypeptide or the fragment comprise terminal mannose-5, mannose-8, or mannose-9 glycans.

[0035] The recombinant HIV envelope glycoprotein polypeptide or a fragment thereof may be as provided herein. For example, the polypeptide is gp120 or a fragment thereof, wherein the fragment comprises variable regions 1 and 2 (V1/V2) or V3 domain comprising N-linked glycosylation sites N301 and N332. For example, the fragment comprising variable regions 1 and 2 is a monomer. The polypeptide or fragment thereof may be gp140. The gp140 fragment may be a trimer. The polypeptide or the fragment may be fused to a heterologous signal sequence. The heterologous signal sequence comprises the amino acid sequence set forth in one of SEQ ID NOs: 44-47. The polypeptide or the fragment comprises a purification tag. The purification tag comprises the amino acid sequence set forth in one of SEQ ID NOs: 48-56.

[0036] In certain aspects, the polypeptide or the fragment comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42 or comprises an amino acid sequence at least 85% (e.g., 90%, 95%, 96%, 97%, 98%, or 99%) identical to the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.
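By way of illustration only, a percent identity threshold such as the 85% recited above can be evaluated on a pairwise alignment as sketched below. The sketch assumes the two sequences have already been aligned (gaps written as "-") and assumes one common convention, namely identical columns divided by total alignment length; the present text does not specify how identity is computed. The function names and the two toy sequences are hypothetical and are not any of the sequences of the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the full alignment length.

    Assumed convention: a column counts as identical only when both
    sequences carry the same residue; gap columns count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """True when the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 20-column toy alignment: one substitution and one gap.
TOY_A = "MKTLLVAGAVNSTWQPEKLR"
TOY_B = "MKTLLVAGAVNATWQ-EKLR"

identity = percent_identity(TOY_A, TOY_B)
print(f"identity: {identity:.1f}%")
print("meets 85% threshold:", meets_identity_threshold(TOY_A, TOY_B))
```

Other conventions (for example, excluding gap columns from the denominator) give different values for the same alignment, so the convention used should be stated alongside any reported percentage.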

[0037] Also provided herein is a composition comprising the polypeptide or the fragment of any one of claims 38-50 and a pharmaceutically acceptable excipient.

[0038] In addition, a method for inducing an immune response to HIV in a mammal by administering the polypeptide and compositions disclosed herein is provided.

BRIEF DESCRIPTION OF THE FIGURES

[0039] FIG. 1 depicts a simplified view of the N-linked glycosylation pathway.

[0040] FIG. 2 shows the GeneArt® CRISPR Nuclease vector.

[0041] FIG. 3A provides the sequence of the CHO Mgat1 gene (SEQ ID NO:64). A target of a guide RNA (gRNA) is underlined with the requisite protospacer adjacent motif in bold. FIG. 3B depicts the GeneArt CRISPR nuclease vector used to edit the CHO Mgat1 gene. GGGCATTCCAGCCCACAAAGGTTTT (SEQ ID NO: 65) and the complementary sequence (CTTTGTGGGCTGGAATGCCCCGGTG: SEQ ID NO: 66) for facilitating cloning into the vector are depicted.

[0042] FIG. 4 provides a flow chart of Mgat1 gene editing and the cell line selection strategy.

[0043] FIG. 5 shows results from a GNA lectin binding assay used to find cells with high mannose surface glycoproteins following CRISPR/Cas9 targeted cleavage of Mgat1.

[0044] FIG. 6A illustrates the native sequence at the region of the Mgat1 gene targeted by gRNA. FIGS. 6B-6D illustrate NHEJR induced changes to the Mgat1 gene. Nucleotides different from the native sequence are underlined.
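The complementarity of the two cloning oligonucleotides recited for FIG. 3B can be checked with a short script. The sketch below assumes, based only on inspection of the two sequences given above, that the first 20 nucleotides of each oligonucleotide form the annealed duplex and that the final 5 nucleotides of each remain as single-stranded overhangs; the variable and function names are illustrative.

```python
# Oligonucleotide sequences as recited for FIG. 3B.
TOP_OLIGO = "GGGCATTCCAGCCCACAAAGGTTTT"     # SEQ ID NO: 65
BOTTOM_OLIGO = "CTTTGTGGGCTGGAATGCCCCGGTG"  # SEQ ID NO: 66

# Assumption: 20-nucleotide annealing region followed by a 5-nucleotide overhang.
DUPLEX_LENGTH = 20

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


top_core, top_overhang = TOP_OLIGO[:DUPLEX_LENGTH], TOP_OLIGO[DUPLEX_LENGTH:]
bottom_core, bottom_overhang = (
    BOTTOM_OLIGO[:DUPLEX_LENGTH],
    BOTTOM_OLIGO[DUPLEX_LENGTH:],
)

# The duplex regions anneal if one is the reverse complement of the other.
cores_are_complementary = reverse_complement(top_core) == bottom_core

print("duplex regions complementary:", cores_are_complementary)
print("top-strand overhang:", top_overhang)
print("bottom-strand overhang:", bottom_overhang)
```

Under this assumption the 20-nucleotide regions are exact reverse complements of one another, which is consistent with the description of SEQ ID NO: 66 as the complementary sequence.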

[0045] FIG. 7A shows the cell doubling time of Mgat1- CHO cell lines. FIG. 7B shows the transient expression of gp120 in Mgat1- CHO cell lines (3.5D9, 3.5D8, 3.4F10, 3.5A2') and in CHO-S and Gnt1- cell lines.

[0046] FIGS. 8A-8C illustrate the expression of gp120 in a GB Mgat1- CHO cell line. FIG. 8A shows purified A244 produced by WT CHO-S, GB Mgat1- CHO, and 293 HEK Gnt1- cells. FIG. 8B shows samples of the same proteins digested with Endo H. FIG. 8C shows samples of the same proteins digested with PNGase F.

[0047] FIGS. 9A and 9B illustrate isoelectric focusing of CHO-S and Mgat1- gp120. FIG. 9A illustrates the isoelectric focusing of gp120 expressed in CHO-S. FIG. 9B illustrates the isoelectric focusing of gp120 expressed in Mgat1- cells.

[0048] FIGS. 10A and 10B show PG9 binding to monomeric gp120 and V1V2 scaffold was improved by Mgat1 knockout (Mgat1-) in CHO cells. FIG. 10A shows PG9 binding to monomeric gp120. FIG. 10B shows PG9 binding to V1/V2 fragment protein.

[0049] FIG. 11 provides a diagram of the UCSC1331 plasmid used to express A244_N332-rgp120.

[0050] FIG. 12 provides a diagram of the chimeric gene used for the expression of A244_N332-rgp120.

[0051] FIG. 13 provides the Emboss Needle pairwise sequence alignment of the amino acid sequence of the A244_N332-rgp120 transcription product with the A244-rgp120 transcription product used to produce rgp120 for the RV144 clinical trial. A is A244ucsc rgp120 (SEQ ID NO:71) and B is A244GNE rgp120 (SEQ ID NO:72).

[0052] FIG. 14 depicts the comparison of the wild-type A244-rgp120 transcription product with the A244-N332-rgp120 transcription product and the mature processed form of the A244_N332-rgp120 protein.

[0053] FIGS. 15A and 15B provide the Emboss Needle pairwise sequence alignment of the nucleotide sequence of the codon optimized A244_N332-rgp120 gene (SEQ ID NO:73) and the A244-rgp120 gene (SEQ ID NO:74) used to produce A244-rgp120 for the RV144 clinical trial.

[0054] FIG. 16 depicts an SDS-PAGE gel of gp120 proteins used for goat immunization.

[0055] FIGS. 17A-17D illustrate the measurement of antibodies to A244-, MN-, and CN97001 gp120s and to the HSV1 glycoprotein purification tag during the course of immunization of Goat 577.

[0056] FIGS. 18A-18C illustrate the comparison of ClonePix2 images obtained with protein G purified, Alexa 488 labeled goat IgG and with gp120-affinity-purified, Alexa 488 labeled IgG. FIG. 18A shows images of cells after a 14 day incubation of Mgat1- cells expressing A244-N332-rgp120 with polyclonal immuno-affinity purified Alexa 488 labeled goat IgG. FIG. 18B shows images of cells after a 14 day incubation of Mgat1- cells expressing A244_N332-rgp120 with 10 μg/ml of Alexa 488 labeled, protein G purified, goat IgG. FIG. 18C shows images of cells from a control experiment where Mgat1- cells expressing A244_N332-rgp120 were incubated for 14 days without added antibody.

[0057] FIG. 19 provides a diagram of a method for rapid production of cell lines expressing recombinant gp120.

[0058] FIG. 20 shows GFP expression after MaxCyte STX electroporation of CHO-S cells.

[0059] FIG. 21 shows white and fluorescent images from a single well of UCSC_CHO.A244N332 transfected cells on the ClonePix 2.

[0060] FIGS. 22A-22E provide ClonePix 2 Clone images at Day 16. FIG. 22A illustrates a single 35mm well of UCSC_CHO.A244N332 transfected colonies illuminated by white light alone. FIG. 22B shows the same well as in A but FITC imaged. FIG. 22C illustrates the superimposition of white and FITC images. FIG. 22D shows six colonies picked on Day 16, expanded, and visualized with white light and FITC. FIG. 22E shows Clone 5F recloned at 25 cells/ml and visualized with white light and FITC.

[0061] FIGS. 23A and 23B illustrate the expression of proteins in 2 ml wells. FIG. 23A provides a Western blot of tissue culture supernatant from 2 ml wells. FIG. 23B provides indirect ELISA quantification of rgp120 A244N332.

[0062] FIGS. 24A and 24B show batch fed culture expression of Clone 5F: accumulation of rgp120 during 600ml protein expression trial culture. FIG. 24A shows a SDS/PAGE gel with 10 μl DTT reduced tissue culture supernatant (days 0-5) loaded per lane. FIG. 24B shows a SDS/PAGE gel with 1 μl DTT reduced tissue culture supernatant (days 0-5) loaded per lane and western blotted with an antigen specific polyclonal rabbit serum.

[0063] FIGS. 25A-25F illustrate indirect ELISA results showing raw dilution data of tissue culture supernatant collected during a batch fed protein expression assay.

[0064] FIG. 26A depicts protein yield from 600ml batch fed cultures pre and post purification by immunoaffinity capture.

[0065] FIG. 26B shows a western blot of protein purified by affinity chromatography from 600ml batch fed cultures.

[0066] FIGS. 27A-27H illustrate direct binding of purified MGAT gp120 HIV-1 proteins to bNAbs.

[0067] FIGS. 28A-28J provide the comparison of bNAb binding to CHO A244GNE-rgp120 produced in normal CHO cells and used in the RV144 trial, and improved A244-N332-rgp120 produced in Mgat1- cells.

[0068] FIGS. 29A-29F show data from 2-dimensional isoelectric focusing gel analysis of MN-rgp120 produced in CHO and 293 HEK cells.

[0069] FIG. 30 illustrates the steps for purification of A244_N332-rgp120 by column chromatography.

[0070] FIG. 31 shows the comparison of A244_N332-rgp120 recovered by an immunoaffinity recovery process dependent on the 5B6 monoclonal antibody and a column chromatography (Desalting-IEXHP-SEC) recovery process.

[0071] FIG. 32 shows the steps for purification of A244_N332-rgp120 by immunoaffinity chromatography and size exclusion chromatography.

[0072] FIG. 33 provides the comparison of the recovered yields of A244_N332-rgp120 obtained from the recovery process containing an immunoaffinity step and the recovery process depending only on column chromatography.

DEFINITIONS

[0073] The practice of the present invention will employ, unless otherwise indicated, conventional methods of medicine, chemistry, biochemistry, immunology, cell biology, molecular biology and recombinant DNA techniques, within the skill of the art. Such techniques are explained fully in the literature. All publications, patents and patent applications cited herein are hereby incorporated by reference in their entireties.

[0074] In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.

[0075] It must be noted that, as used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a cell" includes a mixture of two or more such cells, and the like. It is further noted that the claims can be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements, or use of a "negative" limitation.

[0076] The term "heterologous" refers to two biological components that are not found together in nature. The components may be host cells, genes, or regulatory regions, such as promoters. Although the heterologous components are not found together in nature, they can function together, as when a promoter heterologous to a gene is operably linked to the gene. "Heterologous" in the context of recombinant cells can refer to the presence of a nucleic acid (or gene product, such as a polypeptide) that is of a different genetic origin than the host cell in which it is present. For example, a recombinant cell expressing a heterologous polypeptide refers to a cell that is genetically modified to introduce a nucleic acid encoding the polypeptide which nucleic acid is not naturally present in the cell.

[0077] "Endogenous" as used herein to describe a gene or a nucleic acid in a cell means that the gene or nucleic acid is native to the cell (e.g., a non-recombinant host cell) and is in its normal genomic and chromatin context, and which is not heterologous to the cell. Mgat1, glutamine synthetase, and dihydrofolate reductase are examples of genes that are endogenous to mammalian cells, such as CHO cells. When added to a cell, a recombinant nucleic acid would also be heterologous to the endogenous genes of the cell. Thus, in a chromosome, a heterologous nucleic acid would include a non-native (non-naturally occurring) nucleic acid that has integrated into the chromosome, or a non-native (non-naturally occurring) extrachromosomal nucleic acid. In contrast, a naturally translocated piece of chromosome would not be considered heterologous in the context of this patent application, as it comprises an endogenous nucleic acid sequence that is native to the mutated cell.

[0078] "Recombinant" as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term "recombinant" as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions. Thus, for example, recombinant cells, such as a recombinant host cell, express genes that are not found within the native (naturally occurring) form of the cell or express a second copy of a native gene that is otherwise normally or abnormally expressed, under expressed or not expressed at all.

[0079] The term "transformation" or "genetic modification" refers to a permanent or transient genetic change induced in a cell following introduction of a new nucleic acid. Thus, a "genetically modified host cell" is a host cell into which a new (e.g., exogenous; heterologous) nucleic acid has been introduced. Genetic change ("modification") can be accomplished either by incorporation of the new DNA into the genome of the host cell, or by transient or stable maintenance of the new DNA as an episomal element. In eukaryotic cells, a permanent genetic change is generally achieved by introduction of the DNA into the genome of the cell.

[0080] The terms "DNA regulatory sequences," "control elements," and "regulatory elements," used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.

[0081] "Encode," as used in reference to a nucleotide sequence of nucleic acid encoding a gene product, e.g., a polypeptide, of interest, is meant to include instances in which a nucleic acid contains a nucleotide sequence that is the same as in a cell or genome that, when transcribed and/or translated into a polypeptide, produces the gene product. In some instances, a nucleotide sequence or nucleic acid encoding a gene product does not include intronic sequences.

[0082] "Substantially purified" generally refers to isolation of a substance (compound, polynucleotide, protein, or polypeptide) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample, a substantially purified component comprises 50%, 80%-85%, or 90-95% of the sample. Techniques for purifying polynucleotides, oligonucleotides, and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.

[0083] The term "operably linked" refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a nucleotide sequence if the promoter affects the transcription or expression of the nucleotide sequence.

[0084] A "host cell," as used herein, denotes an in vitro eukaryotic cell (e.g., a mammalian cell, such as a CHO cell line), which eukaryotic cell can be, or has been, used as a recipient for a nucleic acid, and includes the progeny of the original cell which has been genetically modified by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation. A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a subject eukaryotic host cell is a genetically modified eukaryotic host cell, by virtue of introduction into the cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.

[0085] As used herein, the term "cell line" refers to a population of cells produced from a single cell and therefore consisting of cells with a uniform genetic makeup.

[0086] By "isolated" is meant, when referring to a polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macro-molecules of the same type. The term "isolated" with respect to a polynucleotide refers to a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.

[0087] The terms "polynucleotide," "nucleic acid" and "nucleic acid molecule" are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, the term also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. More particularly, the terms "polynucleotide," "nucleic acid" and "nucleic acid molecule" include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from the Anti-Virals, Inc., Corvallis, Oregon, as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA. There is no intended distinction in length between the terms "polynucleotide," "nucleic acid" and "nucleic acid molecule," and these terms will be used interchangeably.

[0088] The terms "label" and "detectable label" refer to a molecule capable of being detected, including, but not limited to, radioactive isotopes, fluorescers, chemiluminescers, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, chromophores, dyes, metal ions, metal sols, ligands (e.g., biotin or haptens) and the like. The term "fluorescer" refers to a substance or a portion thereof that is capable of exhibiting fluorescence in the detectable range. Particular examples of labels that may be used with the invention include, but are not limited to phycoerythrin, Alexa dyes, fluorescein, YPet, CyPet, Cascade blue, allophycocyanin, Cy3, Cy5, Cy7, rhodamine, dansyl, umbelliferone, Texas red, luminol, acridinium esters, biotin, green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (EYFP), blue fluorescent protein (BFP), red fluorescent protein (RFP), firefly luciferase, Renilla luciferase, NADPH, beta-galactosidase, horseradish peroxidase, glucose oxidase, alkaline phosphatase, chloramphenicol acetyl transferase, and urease.

[0089] Before describing the present invention in detail, it is to be understood that this invention is not limited to particular formulations or process parameters as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.

[0090] Although a number of methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the preferred materials and methods are described herein.

DETAILED DESCRIPTION

[0091] The present disclosure provides cell lines and methods for producing HIV envelope glycoprotein polypeptides that possess terminal mannose-5 glycans. The HIV envelope glycoproteins produced by the cell lines and methods provided herein are suitable for eliciting antibodies effective in prevention and/or treatment of HIV infection. In certain cases, the antibodies elicited by the HIV envelope glycoproteins produced by the cell lines disclosed herein are broadly neutralizing antibodies. Further details of the cell lines and methods are provided below.

CELL LINES

[0092] Provided herein are recombinant cell lines for producing biopharmaceuticals, such as HIV envelope glycoprotein polypeptides comprising terminal mannose-5 glycans. In certain embodiments, the cell line is derived from a CHO cell line that lacks or has limited expression of or function of the endogenous gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1). Mgat1 is also referred to as N-Glycosyl-Oligosaccharide-Glycoprotein N-Acetylglucosaminyltransferase I, Alpha-1,3-Mannosyl-Glycoprotein 2-Beta-N-Acetylglucosaminyltransferase, GlcNAc-T I, GLYT1, GLCT1, GNT-1, GLCNAC-TI, and Gnt1. Deletion of Mgat1 prevents glycosylation from advancing beyond the Man5GlcNAc2 state in the modified cell lines disclosed herein.

[0093] In certain embodiments, the CHO cell line has been genetically modified to delete the endogenous mgat1 gene. In such embodiments, the deletion of the endogenous mgat1 gene may be carried out by using CRISPR/Cas9 mediated gene editing. In certain embodiments, the CRISPR/Cas9 mediated deletion of the mgat1 gene prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue present at the N-linked glycosylation site of the HIV envelope glycoprotein polypeptide produced in the cell line, resulting in expression of the HIV envelope glycoprotein polypeptide with one or more terminal mannose, e.g., mannose-5, mannose-8, or mannose-9.

[0094] In certain embodiments, the Mgat1 deficient cell lines may include a Mgat1 encoding gene sequence that has been completely or partially inactivated. In certain embodiments, two copies of the mgat1 gene have been inactivated. In some embodiments, three or more copies of the mgat1 gene have been inactivated. Inactivation of the mgat1 gene may be due to deletion of a part of or the entire sequence of the mgat1 gene and/or due to insertion of at least one nucleotide. The inactivation may result in reduced expression or reduced activity of Mgat1. In some embodiments, the inactivation may result in lack of expression of Mgat1. In some examples, the inactivation of the mgat1 gene results in expression of a truncated or otherwise mutated Mgat1 that lacks detectable activity.

[0095] In certain aspects, the Mgat1 deficient cell lines may include an insertion in the mgat1 gene resulting in a frame shift mutation and a premature stop codon. In certain aspects, the premature stop codon may result in production of a truncated Mgat1 polypeptide that has no detectable activity. In certain aspects, the truncated Mgat1 may be an N-terminal fragment of full length Mgat1 and may be 10-50 amino acids or 20-50 amino acids, such as, 20, 30 or 40 amino acids long. In certain embodiments, the Mgat1 deficient cell line may include a mgat1 gene in which nucleotides have been deleted. The deletion may be in the sequence encoding the transmembrane region of Mgat1. The deletion may result in a Mgat1 polypeptide having a deletion of 8-30 amino acids in the transmembrane region, such as, deletion of 6 to 10 amino acids, 25-35 amino acids, such as, 8 or 30 amino acids, resulting in a Mgat1 polypeptide with reduced activity.

[0096] In certain cases, the mgat1 gene targeted for inactivation may have the sequence set forth in SEQ ID NO:64. The Mgat1 polypeptide may have the amino acid sequence set forth in SEQ ID NO: 75. In certain embodiments, the cell lines disclosed herein may comprise an inactivated mgat1 gene having the sequence set forth in SEQ ID NO:76, where the inactivated mgat1 gene encodes a truncated Mgat1 polypeptide having the sequence set forth in SEQ ID NO:77.
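The way a single-nucleotide insertion can shift the reading frame and bring a stop codon into frame prematurely may be illustrated with a toy example. The sketch below uses a short, hypothetical open reading frame that is not the Mgat1 sequence of SEQ ID NO:64, and the function names are likewise illustrative; it only demonstrates the general mechanism.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon_index(orf: str):
    """Return the zero-based codon index of the first in-frame stop codon.

    Reading starts at position 0; returns None if no in-frame stop is found.
    """
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def insert_nucleotide(seq: str, position: int, base: str) -> str:
    """Insert one nucleotide before the given zero-based position."""
    return seq[:position] + base + seq[position:]


# Hypothetical toy open reading frame: ATG GCT GAA CTG AAA GGC TAA
WILD_TYPE = "ATGGCTGAACTGAAAGGCTAA"

# Inserting a single T after the second codon shifts the frame so that the
# third codon reads TGA: ATG GCT TGA ...
MUTANT = insert_nucleotide(WILD_TYPE, 6, "T")

print("wild-type codons before stop:", first_stop_codon_index(WILD_TYPE))
print("mutant codons before stop:   ", first_stop_codon_index(MUTANT))
```

In this toy case the wild-type frame encodes six codons before its stop codon, while the frame-shifted sequence encodes only two, giving a truncated N-terminal product.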

[0097] In certain aspects, the glycosylation heterogeneity of the polypeptides produced by cell lines provided herein is markedly reduced such that a majority of the polypeptides have one or more terminal mannose, mannose-5, mannose-8, or mannose-9 glycans. In certain embodiments, the genetic modification to delete the endogenous mgat1 gene results in at least 75% of the HIV envelope glycoprotein polypeptides produced by the genetically modified cell line having terminal mannose glycans at the N-linked glycosylation site. In certain cases, at least 75% or more, such as, 75%-95%, 75%-96%, 75%-97%, 75%-98%, 80%-98%, 85%-99%, e.g., 80%, 85%, 90%, 95%, 98%, 99%, or more of the HIV envelope glycoprotein polypeptides produced by the genetically modified cell line have terminal mannose glycans at the N-linked glycosylation site. As used herein, the term "terminal mannose" or "terminal mannose glycans" refers to N-glycans having one or more mannose residues at the terminus of the N-glycan. This term encompasses N-glycans having 5, 8, or 9 terminal mannose residues.
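As a purely illustrative sketch, the proportion of terminal mannose glycoforms in a glycan profile can be compared against the 75% figure recited above as follows. The glycoform labels, the function name, and the abundance values are hypothetical and invented for the example; they are not measurements reported in this disclosure.

```python
# Glycoform labels treated as "terminal mannose" for this sketch (assumption).
TERMINAL_MANNOSE_GLYCANS = {"Man5", "Man8", "Man9"}


def percent_terminal_mannose(glycoform_abundance: dict) -> float:
    """Percentage of total abundance carried by terminal mannose glycoforms."""
    total = sum(glycoform_abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    terminal = sum(
        amount
        for glycoform, amount in glycoform_abundance.items()
        if glycoform in TERMINAL_MANNOSE_GLYCANS
    )
    return 100.0 * terminal / total


# Invented relative abundances, for illustration only.
EXAMPLE_PROFILE = {
    "Man5": 82.0,
    "Man8": 3.0,
    "Man9": 5.0,
    "complex": 6.0,
    "hybrid": 4.0,
}

share = percent_terminal_mannose(EXAMPLE_PROFILE)
print(f"terminal mannose glycoforms: {share:.1f}%")
print("meets 75% threshold:", share >= 75.0)
```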

[0098] The CHO cell line from which the cell lines disclosed herein are derived may be a CHO cell line adapted for growth in suspension culture, adherent culture, or both. In certain aspects, the genetically modified CHO cell line may be derived from a parent CHO cell line, such as, CHO S, CHO K1, CHO-DXB11 (also known as CHO-DUKX), CHO-PRO3, CHO-PRO5, or CHO-DG44 cell line, and the like.

[0099] In certain aspects, the genetically modified CHO cell line is not deficient in markers commonly used for selection of transfected CHO cells, such as, glutamine synthetase (GS), dihydrofolate reductase (DHFR), and the like. In certain aspects, the genetically modified CHO cell line is derived from a parental CHO cell line that includes a gene encoding GS, DHFR, or both. As such, in certain examples, the generation of the genetically modified CHO cell line does not require transfection of a nucleic acid encoding GS and/or DHFR. In certain aspects, the genetically modified CHO cell line is derived from a parental CHO S or CHO K1 cell line that includes a gene encoding GS, DHFR, or both. In certain aspects, the parental cell line is CHO S that expresses GS. In other embodiments, the parental cell line is CHO K1 that expresses GS. In certain embodiments, the genetically modified CHO cell line of the present disclosure is not derived from CHO Lec1 cells. In certain embodiments, the genetically modified CHO cell line of the present disclosure does not produce Mgat1 or fragments thereof. In certain embodiments, the Mgat1 encoding gene has been deleted from the cell lines disclosed herein such that the cell line has no detectable Mgat1 activity. In certain embodiments, the Mgat1 encoding gene has been disrupted in the cell lines disclosed herein such that the cell line has no detectable Mgat1 activity. In other aspects, the cell line may also be deficient in GS and/or DHFR.

[00100] In certain aspects, the cell lines provided herein produce the exogenous polypeptide at a concentration of at least 50 milligrams/Liter (mg/L), such as, at least 75 mg/L, 100 mg/L, 150 mg/L, 175 mg/L, 200 mg/L, 250 mg/L, 300 mg/L, e.g., 50-300 mg/L, 50-250 mg/L, or 50-200 mg/L. The cell line may express the exogenous polypeptide at a concentration of at least 50 mg/L after 1-30 days of culturing, e.g., 1 day, 2 days, 3 days, 5 days, 7 days, 10 days, 15 days, 20 days, or more.

[00101] A subject genetically modified host cell is generated using standard methods well known to those skilled in the art. In some cases, the nucleic acid encoding Mgat1 is disrupted (e.g., deleted) using a CRISPR/Cas9 system comprising: i) an RNA-guided endonuclease; and ii) a guide RNA (e.g., a single molecule guide RNA; or a double-molecule guide RNA) that provides for deletion of the endogenous Mgat1 gene; and iii) a donor DNA template. Suitable RNA-guided endonucleases include an RNA-guided endonuclease comprising an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence of Streptococcus pyogenes Cas9 (GenBank Accession No.: AKP81606.1) or Staphylococcus aureus Cas9 (NCBI Reference Sequence: WP_001573634.1). The guide RNA comprises a targeting sequence. A suitable targeting sequence can be determined by those skilled in the art. The donor template comprises a nucleotide sequence complementary to a Mgat1-encoding nucleotide sequence.

[00102] In certain aspects, a genetically modified Chinese hamster ovary (CHO) cell line comprising a targeted mutation of a gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1) and expressing gp120 glycoprotein, wherein the genetically modified cell line is deposited with American Type Culture Collection (ATCC) as PTA-124141 or PTA-124142, is also disclosed.

COMPOSITIONS AND METHODS FOR PRODUCING EXOGENOUS POLYPEPTIDE

[00103] The present disclosure provides a composition comprising: a) a genetically modified host cell line as described above or elsewhere herein; and b) a culture medium.

[00104] The present disclosure provides a method of producing a polypeptide of interest. The method may include culturing the composition for a time period and under conditions suitable for production of the exogenous polypeptide, where the composition comprises: a) a genetically modified host cell line of the present disclosure; and b) a culture medium; and separating the genetically modified host cell line from the culture medium, to generate a cell culture comprising secreted polypeptide of interest. Separating the genetically modified host cells from the culture medium can be accomplished by methods known in the art, such as centrifugation, filtration, and the like.

[00105] The exogenous polypeptide secreted into the culture medium may be purified using any standard process. For example, the exogenous polypeptide, such as an envelope glycoprotein, e.g., gp140 trimer, secreted into the culture medium may be purified using the process disclosed in Sanders RW, Moore JP. Immunological reviews. 2017 Jan 1;275(1):161-82; Sanders RW, et al., PLoS pathogens. 2013 Sep 19;9(9):e1003618; Sharma SK, et al., Cell reports. 2015 Apr 28;11(4):539-50; or Karlsson Hedestam GB, et al., Immunological reviews. 2017 Jan 1;275(1):183-202.

[00106] In certain embodiments, production of exogenous polypeptides using the cell lines provided herein does not require culturing in the presence of inhibitors that prevent glycosylation from proceeding beyond the Man5GlcNAc2 state. As such, the culture medium for culturing the cell lines for expressing an exogenous polypeptide does not include inhibitors such as kifunensine.

[00107] In certain embodiments, a method of producing a human immunodeficiency virus (HIV) envelope glycoprotein polypeptide comprising terminal mannose-5 glycans is disclosed. The method may include: a) introducing a nucleic acid comprising a nucleotide sequence encoding the HIV envelope glycoprotein polypeptide into a genetically modified Chinese hamster ovary (CHO) cell line comprising a mutation of the gene encoding mannosyl (alpha-1,3)-glycoprotein beta-1,2-N-Acetylglucosaminyltransferase (Mgat1), wherein the mutation prevents Mgat1-mediated addition of an N-acetylglucosamine moiety to a terminal mannose residue such that at least 75% of the HIV envelope glycoprotein polypeptide produced by the genetically modified cell line comprises terminal mannose-5, mannose-8, or mannose-9 glycans; and b) culturing the cell line in a liquid culture medium under conditions sufficient for production of the HIV envelope glycoprotein polypeptide comprising terminal mannose-5, mannose-8, or mannose-9 glycans.

[00108] The method may include screening by plating the clones in a semisolid matrix and contacting the clones with a detectably labeled antibody that binds to the polypeptide. In certain cases, the contacting comprises contacting the clones with a plurality of fluorescently labeled antibodies that bind to the polypeptide and form a precipitate around the clones, wherein the precipitate is visible under fluorescent light. In certain cases, the method further includes identifying clones surrounded by precipitate meeting a selection threshold and isolating the identified clones. The polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 1, 2, 3, 5, 7, 9, 10, 12, 13, 15, 17, 18, 20, 22, 23, 25, 26, 28, 30, 32, 34, 36, 38, 40, or 42.

[00109] In certain aspects, identification of an Mgat1-deficient (Mgat1⁻) cell line may be carried out using a positive selection method. In certain embodiments, the method may include contacting cells suspected of being Mgat1-deficient (Mgat1⁻) with a GNA lectin, where the GNA lectin is a mannose-binding lectin with a preference for α1,3-linked mannose residues. In certain aspects, the method for identifying Mgat1⁻ cells does not involve using ricin lectins, such as Ricinus communis agglutinin-I and II.

[00110] In certain cases, Mgat1⁻ cells expressing an exogenous polypeptide may be identified using polyclonal antibodies that have been purified based on their ability to bind to the exogenous polypeptide. For example, the exogenous polypeptide may be used to immunize an animal and elicit antibodies to the exogenous polypeptide. The antibodies may be affinity purified using a solid substrate (e.g., a bead, a column, etc.) to which the exogenous polypeptide is conjugated. The affinity purified antibodies may be conjugated to a detectable label and used for identifying cells expressing the exogenous polypeptide. In certain embodiments, the affinity purified polyclonal antibodies bind to exogenous polypeptide secreted by the cells expressing the polypeptide. In certain embodiments, the binding of the affinity purified antibodies to the exogenous polypeptide secreted by the cells expressing the polypeptide may be detected by visualizing the detectable label. In certain embodiments, the detectable label may be a fluorescent label, such as an Alexa dye. In certain embodiments, the affinity purified polyclonal antibodies form a fluorescent halo around the cells expressing the polypeptide, thereby facilitating rapid identification of cells expressing high levels of the polypeptide.

EXOGENOUS POLYPEPTIDE

[00111] Any exogenous polypeptide of interest can be produced using the cell lines described herein. In some embodiments, the exogenous polypeptide may be a polypeptide that can be used to elicit an immune response in a mammal. In certain embodiments, the immune response may result in prevention or treatment of HIV infection.

[00112] In certain embodiments, the exogenous polypeptide is a polypeptide that undergoes glycosylation when expressed in a eukaryotic host cell. In certain embodiments, the exogenous polypeptide includes an N-linked glycosylation site comprising the consensus sequence Asn-X-Ser/Thr, where X is any amino acid except proline (Pro). In certain embodiments, expressing the exogenous polypeptide in the cell lines provided herein prevents glycosylation from advancing beyond the Man5GlcNAc2 state.
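The consensus sequence Asn-X-Ser/Thr (X not Pro) can be located in a one-letter amino acid sequence with a short pattern search. The following is a minimal sketch; the function name and the toy peptide are illustrative assumptions, and a match indicates only a potential N-linked glycosylation site, not that the site is actually occupied.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g., "NNTS") each be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sequons(protein: str):
    """Return (1-based position of Asn, matched tripeptide) for each sequon."""
    # Remove whitespace such as the spaces found in line-wrapped listings.
    protein = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


# Toy peptide (hypothetical): "NPS" is skipped because X is proline.
sites = find_n_glyc_sequons("MNGTNPSANNTS")
```

Positions are reported 1-based so that they can be compared directly with the residue numbering used for the sequences set forth below.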

[00113] In certain embodiments, the exogenous polypeptide is an HIV-1 envelope glycoprotein (gp) or a fragment thereof, provided that the fragment contains an N-linked glycosylation site. In certain cases, the envelope gp is gp160, gp120 (e.g., gp120 monomer), gp140 (e.g., gp140 trimer) or an envelope gp fragment containing variable regions 1 and 2 (V1/V2).

[00114] In certain embodiments, the exogenous polypeptide is an envelope glycoprotein or a fragment thereof, provided that the fragment contains an N-linked glycosylation site, and may comprise an amino acid sequence set forth below.

Clade CRF01_AE: A244_ N332 c rgp120 (SEQ ID NO:1)

[00115] VPVWKEADTTLFCASDAKAHETEVHNVWATHACVPTDPNPQEIDLENV TENFNMWKNNMVEOMOEDVISLWDOSLKPCVKLTPPCVTLHCTNANLTKANLTNVN NRTNVSNIIGNITDEVRNCSFNMTTELRDKKOKVHALFYKLDIVPIEDNNDSSEYRLINC NTSVIKOACPKISFDPIPIHYCTPAGYAILKCNDKNFNGTGPCKNVSSVOCTHGIKPVVS T QLLLNGSLAEEEIIIRSENLTNNAKTIIVHLNKSVVINCTRPSNNTRTSITIGPGQVFYR TG DIIGDIRKAYCNISGTEWNKALKQVTEKLKEHFNNKPIIFQPPSGGDLEITMHHFNCRGE F FYCNTTRLFNNTCIANGTIEGCNGNITLPCKIKQIINMWQGAGQAMYAPPISGTINCVSN I TGILLTRDGGATNNTNNETFRPGGGNIKDNWRNELYKYKVVQIEPLGVAPTRAKRRVV EREKR

[00116] The V1/V2 domain is double underlined and starts at amino acid position 83 and ends at position 171, and the V3 domain is underlined and starts at amino acid position 259 and ends at amino acid position 304 in SEQ ID NO:1.
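Because the underlining that marks the domains does not survive in plain text, the stated positions can instead be used to recover each region from a sequence string. The following is a minimal sketch assuming 1-based, inclusive residue numbering; the helper name and the placeholder sequence are illustrative assumptions.

```python
def slice_region(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of seq."""
    # Strip whitespace first so line wrapping does not shift the numbering.
    seq = "".join(seq.split())
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region lies outside the sequence")
    return seq[start - 1:end]


# Domain boundaries as stated for SEQ ID NO:1.
V1V2 = (83, 171)
V3 = (259, 304)

# Expected region lengths implied by those boundaries.
v1v2_len = V1V2[1] - V1V2[0] + 1
v3_len = V3[1] - V3[0] + 1

# Toy demonstration on a placeholder string, not an envelope sequence.
example = slice_region("ABCDE FGHIJ", 3, 7)
```

Applied to a clean copy of SEQ ID NO:1, `slice_region(seq, *V1V2)` and `slice_region(seq, *V3)` would return the two domains.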

Clade CRF01_AE: A244_ N332 c rgp120 (SEQ ID NO:2)

[00117] VPVWKEADTTLFC ASD AKAHETEVHNVWATHACVPTDPNPQEIDLENV

TENFNMWKNNMVEOMOEDVISLWDOSLKPCVKLTPPCVTLHCTNANLTKANLTNVN NRTNVSNIIGNITDEVRNCSFNMTTELRDKKOKVHALFYKLDIVPIEDNNDSSEYRLINC NTSVIKOACPKISFDPIPIHYCTPAGYAILKCNDKNFNGTGPCKNVSSVOCTHGIKPVVS T QLLLNGSLAEEEIIIRSENLTNNAKTIIVHLNKSVVINCTRPSNNTRTSITIGPGQVFYR TG DIIGDIRKAYCNISGTEWNKALKQVTEKLKEHFNNKPIIFQPPSGGDLEITMHHFNCRGE F FYCNTTRLFNNTCIANGTIEGCNGNITLPCKIKQIINMWQGAGQAMYAPPISGTINCVSN I TGILLTRDGGATNNTNNETFRPGGGNIKDNWRNELYKYKVVQIEPLGVAPTRA

[00118] V1/V2 domain is double underlined and V3 domain is underlined.

Clade CRF01_AE: gD_A244_ N332 c rgp120 (UCSC1250) (SEQ ID NO:3)

[00119] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL DgLLEVPVWKEADTTLFCASDAKAHETEVHNVWATHACVPTDPNPQEIDLENVTENF NMWKNNMVEOMOEDVISLWDOSLKPCVKLTPPCVTLHCTNANLTKANLTNVNNRTN VSNIIGNITDEVRNCSFNMTTELRDKKOKVHALFYKLDIVPIEDNNDSSEYRLINCNTSV I KOACPKISFDPIPIHYCTPAGYAILKCNDKNFNGTGPCKNVSSVOCTHGIKPVVSTOLLL NGSLAEEEIIIRSENLTNNAKTIIVHLNKSVVINCTRPSNNTRTSITIGPGQVFYRTGDI IGDI RKAYCNISGTEWNKALKQVTEKLKEHFNNKPIIFQPPSGGDLEITMHHFNCRGEFFYCN TTRLFNNTCIANGTIEGCNGNITLPCKIKQIINMWQGAGQAMYAPPISGTINCVSNITGI L LTRDGGATNNTNNETFRPGGGNIKDNWRNELYKYKVVQIEPLGVAPTRA

[00120] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. V1/V2 domain is double underlined and V3 domain is underlined.

[00121] The exogenous polypeptide comprising an amino acid sequence set forth in SEQ ID NO:3 may be encoded by the nucleic acid sequence set forth in SEQ ID NO:4:

[00122] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC

ATAGTGGGCCTCCATGGGGTCCGCGGCAAATATGCCTTGGCGGATGCCTCTCTCAAG AT

GGCCGA CCCCAA TCGA TTTCGCGGCAAAGA CCTTCCGGTCCTGGA CCA GCTGCTCGAG

GTACCAGTGTGGAAGGAAGCCGACACAACCCTCTTCTGCGCCAGCGATGCCAAGGC

CCACGAGACGGAGGTCCACAATGTGTGGGCCACCCATGCCTGTGTGCCCACGGACC

CCAACCCCCAGGAGATTGACCTGGAGAATGTCACGGAGAACTTCAACATGTGGAAG

AACAACATGGTGGAGCAGATGCAGGAGGACGTCATCTCCCTGTGGGACCAGAGCCT

GAAACCCTGCGTCAAACTGACACCCCCCTGTGTGACCCTGCACTGCACGAACGCCA

ACCTGACCAAGGCCAACCTCACCAACGTGAACAATCGGACCAACGTGTCCAACATC

ATCGGGAACATCACAGATGAGGTGAGGAACTGCAGCTTCAATATGACAACCGAGCT

CCGGGACAAAAAGCAGAAGGTGCACGCGTTGTTCTACAAACTGGATATCGTCCCCA

TCGAGGACAATAATGACAGcTCCGAGTATCGCCTGATCAACTGCAACACCAGcGTCA

TCAAACAGGCCTGCCCCAAAATTTCCTTCGACCCCATCCCCATCCACTACTGCACCC

CAGCTGGGTACGCCATCCTGAAGTGCAATGACAAGAACTTCAACGGCACAGGGCCC

TGCAAGAATGTGAGCTCCGTCCAGTGCACCCACGGCATCAAGCCAGTGGTCTCCAC

CCAGCTCCTCCTGAATGGGAGCCTGGCAGAGGAAGAGATCATCATCCGCTCCGAGA

ACCTGACCAACAATGCCAAGACCATCATCGTCCACCTGAATAAGTCCGTGGTCATC

AACTGCACCAGACCCAGCAACAACACGCGGACCAGCATCACCATCGGCCCAGGGC

AGGTCTTCTATAGGACGGGGGACATCATTGGGGACATCAGGAAGGCCTACTGCAAC

ATCAGTGGGACCGAGTGGAACAAAGCCCTGAAACAGGTGACCGAAAAACTCAAGG

AGCACTTCAACAACAAGCCAATCATCTTCCAGCCCCCCAGCGGGGGGGACCTGGAG

ATCACCATGCACCATTTCAACTGCCGGGGGGAATTCTTCTACTGCAACACCACCCGC

CTGTTCAACAACACCTGCATCGCCAACGGCACCATCGAGGGCTGCAATGGCAACAT

CACCCTCCCATGCAAAATCAAGCAGATCATCAACATGTGGCAGGGGGCAGGCCAGG

CCATGTACGCCCCCCCCATCTCCGGCACGATCAACTGCGTGTCCAACATCACGGGG

ATCCTGCTGACCCGGGATGGGGGGGCTACCAACAATACGAACAATGAGACCTTCAG

GCCAGGGGGGGGGAACATCAAAGACAACTGGCGCAATGAGCTCTACAAGTACAAA GTGGTGCAGATCGAGCCCCTGGGGGTGGCCCCCACCCGGGCCAAACGCAGGGTGGT GGAGCGGGAGAAGCGG (SEQ ID NO:4)

[00123] Nucleotides encoding the gD signal sequence are underlined; nucleotides encoding the mature N-terminal gD purification tag are italicized; nucleotides encoding linker sequence are in bold.

[00124] Clade B: gD-MN468-rgp120; UCSC468 (SEQ ID NO:9)

[00125] VPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVELVNV

TENFNMWKNNMVEQMHEDIISLWDQSLKPCVKLTPLCVTLNCTDLRNTTNTNNSTDN

NNSKSEGTIKGGEMKNCSFNITTSIGDKMQKEYALLYKLDIEPIDNDSTSYRLISCN TSVI

TOACPKISFEPIPIHYCAPAGFAIXKCNDKKFSGKGSCKNVSTVOCTHGIRPVVSTO LLLN

GSLAEEEVVIRSEDFTDNAKTIIVHLNESVQINCTRPNNNTRKRIHIGPGRAFYTTK NIKG

TIROAHCNISRAKWNDTLROIVSKLKEOFKNKTIVFNPSSGGDPEIVMHSFNCGGEF FYC

NTSPLFNSrWNGNNTWNNTTGSNNNITLQCKIKQIINMWQKVGKAMYAPPIEGQIRC SS

NITGLLLTRDGGEDTDTNDTEIFRPGGGDMRDNWRSELYKYKVVTIEPLGVAPTKA

[00126] V1/V2 domain is double underlined and V3 domain is underlined.

[00127] gD-MN468-rgp120; UCSC468 (SEQ ID NO:10)

[00128] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL

DgLLEVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVELVNVTENF

NMWKNNMVEQMHEDIISLWDQSLKPCVKLTPLCVTLNCTDLRNTTNTNNSTDNNNSK

SEGTIKGGEMKNCSFNITTSIGDKMOKEYALLYKLDIEPIDNDSTSYRLISCNTSVI TOAC

PKISFEPIPIHYCAPAGFAIXKCNDKKFSGKGSCKNVSTVQCTHGIRPVVSTQLLLN GSLA

EEEVVIRSEDFTDNAKTIIVHLNESVQINCTRPNNNTRKRIHIGPGRAFYTTKNIKG TIRQ

AHCNISRAKWNDTLROIVSKLKEOFKNKTIVFNPSSGGDPEIVMHSFNCGGEFFYCN TSP

LFNSIWNGNNTWNNTTGSNNNITLQCKIKQIINMWQKVGKAMYAPPIEGQIRCSSNI TG

LLLTRDGGEDTDTNDTEIFRPGGGDMRDNWRSELYKYKVVTIEPLGVAPTKAKRRVV Q

RE

[00129] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. V1/V2 domain is double underlined and V3 domain is underlined.

[00130] gD_MN468_rgp120; UCSC468; (SEQ ID NO:11)

[00131] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC

ATAGTGGGCCTCCATGGGGTCCGCGGCAAATATGCCTTGGCGGATGCCTCTCTCAAG AT

GGCCGA CCCCAA TCGA TTTCGCGGCAAAGA CCTTCCGGTCCTGGA CCA GCTGCTCGAG

GTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGC

ATATGATACAGAGGCACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACC

CCAACCCACAAGAAGTAGAATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAA

AATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCT

AAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAG

GAATACTACTAATACCAATAATAGTACTGATAATAACAATAGTAAAAGCGAGGGAA

CAATAAAGGGAGGAGAAATGAAAAACTGCTCTTTCAATATCACCACAAGCATAGGA

GATAAGATGCAGAAAGAATATGCACTTCTTTATAAACTTGATATAGAACCAATAGA

TAATGATAGTACCAGCTATAGGTTGATAAGTTGTAATACCTCAGTCATTACACAAGC

TTGTCCAAAGATATCCTTTGAGCCAATTCCCATACACTATTGTGCCCCGGCTGGTTT T

GCGATTNTAAAGTGTAACGATAAAAAGTTCAGTGGAAAAGGATCATGTAAAAATGT

CAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGT

TAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGAGGATTTCACTGAT

AATGCTAAAACCATCATAGTACATCTGAACGAATCTGTACAAATTAATTGTACAAG

ACCCAACAACAATACCAGAAAAAGGATACATATAGGACCAGGGAGAGCATTTTAT

ACAACAAAAAATATAAAAGGAACTATAAGACAAGCACATTGTAACATTAGTAGAG

CAAAATGGAATGACACTTTAAGACAGATAGTTAGCAAGTTAAAAGAACAATTTAAG

AATAAAACAATAGTCTTTAATCCATCCTCAGGAGGGGACCCAGAAATTGTAATGCA

CAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACATCACCACTGTTTAATAG

TATTTGGAATGGTAATAATACTTGGAATAATACTACAGGGTCAAATAACAATATCA

CACTTCAATGCAAAATAAAACAAATTATAAACATGTGGCAGAAAGTAGGAAAAGC

AATGTATGCCCCTCCCATTGAAGGACAAATTAGATGTTCATCAAATATTACAGGGCT

ACTATTAACAAGAGATGGTGGTGAGGACACGGACACGAACGACACCGAGATCTTCA

GACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAA

AGTAGTAACAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTG

GTGCAGAGAGAA

[00132] gD signal sequence encoding sequence is underlined; mature N-terminal gD purification tag encoding sequence is italicized; linker sequence encoding sequence is in bold.

[00133] gD_MN-rgp120_N301_N332 (SEQ ID NO:12)

[00134] VPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVELVNV

TENFNMWKNNMVEOMHEDIISLWDOSLKPCVKLTPLCVTLNCTDLRNTTNTNNSTDN

NNSKSEGTIKGGEMKNCSFNITTSIGDKMOKEYALLYKLDIEPIDNDSTSYRLISCN TSVI

TOACPKISFEPIPIHYCAPAGFAILKCNDKKFSGKGSCKNVSTVOCTHGIRPVVSTO LLLN

GSLAEEEVVIRSEDFTDNAKTIIVHLKESVQINCTRPNNNTRKRIHIGPGRAFYTTK NIKG

TIRQAHCNISRAKWNDTLRQIVSKLKEQFKNKTIVFNPSSGGDPEIVMHSFNCGGEF FYC

NTSPLFNSIWNGNNTWNNTTGSNNNITLQCKIKQIINMWQKVGKAMYAPPIEGQIRC SS

NITGLLLTRDGGEDTDTNDTEIFRPGGGDMRDNWRSELYKYKVVTIEPLGVAPT

[00135] V1/V2 domain is double underlined and V3 domain is underlined.

[00136] gD_MN-rgp120_N301_N332; UCSC1320; (SEQ ID NO:13)

[00137] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL

DgLLEVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVELVNVTENF

NMWKNNMVEOMHEDIISLWDOSLKPCVKLTPLCVTLNCTDLRNTTNTNNSTDNNNSK

SEGTIKGGEMKNCSFNITTSIGDKMOKEYALLYKLDIEPIDNDSTSYRLISCNTSVI TOAC

PKISFEPIPIHYCAPAGFAILKCNDKKFSGKGSCKNVSTVQCTHGIRPVVSTQLLLN GSLA

EEEVVIRSEDFTDNAKTIIVHLKESVOINCTRPNNNTRKRIHIGPGRAFYTTKNIKG TIRO

AHCNISRAKWNDTLRQIVSKLKEQFKNKTIVFNPSSGGDPEIVMHSFNCGGEFFYCN TSP

LFNSIWNGNNTWNNTTGSNNNITLQCKIKQIINMWQKVGKAMYAPPIEGQIRCSSNI TG

LLLTRDGGEDTDTNDTEIFRPGGGDMRDNWRSELYKYKVVTIEPLGVAPTKAKRRVV Q

RE

[00138] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. V1/V2 domain is double underlined and V3 domain is underlined.

[00139] gD-MN-rgp120_N301_N332; UCSC1320; (SEQ ID NO:14)

[00140] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC

ATAGTGGGCCTCCATGGGGTCCGCGGCAAA7ArGCC7TGGCGGArGCCrcrcrCAAG Ar

GGCCGACCCCAATCGATTTCGCGGCAAAGACCTTCCGGTCCTGGACCAG TG TCG G

GTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGC

ATATGATACAGAGGCACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACC

CCAACCCACAAGAAGTAGAATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAA

AATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCT AAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAG

GAATACTACTAATACCAATAATAGTACTGATAATAACAATAGTAAAAGCGAGGGAA

CAATAAAGGGAGGAGAAATGAAAAACTGCTCTTTCAATATCACCACAAGCATAGGA

GATAAGATGCAGAAAGAATATGCACTTCTTTATAAACTTGATATAGAACCAATAGA

TAATGATAGTACCAGCTATAGGTTGATAAGTTGTAATACCTCAGTCATTACACAAGC

TTGTCCAAAGATATCCTTTGAGCCAATTCCCATACACTATTGTGCCCCGGCTGGTTT T

GCGATTCTAAAGTGTAACGATAAAAAGTTCAGTGGAAAAGGATCATGTAAAAATGT

CAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGT

TAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGAGGATTTCACTGAT

AATGCTAAAACCATCATAGTACATCTGAAAGAATCTGTACAAATTAATTGTACAAG

ACCCAACAACAATACCAGAAAAAGGATACATATAGGACCAGGGAGAGCATTTTAT

ACAACAAAAAATATAAAAGGAACTATAAGACAAGCACATTGTAACATTAGTAGAG

CAAAATGGAATGACACTTTAAGACAGATAGTTAGCAAGTTAAAAGAACAATTTAAG

AATAAAACAATAGTCTTTAATCCATCCTCAGGAGGGGACCCAGAAATTGTAATGCA

CAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACATCACCACTGTTTAATAG

TATTTGGAATGGTAATAATACTTGGAATAATACTACAGGGTCAAATAACAATATCA

CACTTCAATGCAAAATAAAACAAATTATAAACATGTGGCAGAAAGTAGGAAAAGC

AATGTATGCCCCTCCCATTGAAGGACAAATTAGATGTTCATCAAATATTACAGGGCT

ACTATTAACAAGAGATGGTGGTGAGGACACGGACACGAACGACACCGAGATCTTCA

GACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAA

AGTAGTAACAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTG

GTGCAGAGAGAA

[00141] Nucleotides encoding the gD signal sequence are underlined; nucleotides encoding the mature N-terminal gD purification tag are italicized; nucleotides encoding linker sequence are in bold.

[00142] gD_BAL-rgp120; codon optimized (SEQ ID NO:17)

[00143] VPVWKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVALENV

TENFNMWKNNMVEOMHEDIISLWDOSLKPCVKLTPLCVTLNCTDLRNATSRNVTNTT S

SSRGMVGGGEMKNCSFNITTGIRGKVOKEYALFYELDIVPIDNKIDRYRLISCNTSV ITO

ACPKVSFEPIPIHYCAPAGFAILKCKDKKFNGKGPCSNVSTVOCTHGIRPVVSTOLL LNG

SLAEEEVVIRSENFTNNAKTIIVQLNESVEINCTRPNNNTRKSINIGPGRAFYTTGE IIGDIR

QAHCNLSRAKWNDTLNKIVIKLREQFGNKTIVFKHSSGGDPEIVTHSFNCGGEFFYC NST QLFNSTWNVTEESNNTVENNTITLPCRIKQIINMWQEVGRAMYAPPIRGQIRCSSNITGL LLTRDGGPEDNKTEVFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQRE

[00144] V1/V2 domain is double underlined and V3 domain is underlined.

[00145] gD_BAL-rgp120; UCSC 1375; codon optimized (SEQ ID NO:18)

[00146] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL

DgLLEVPVWKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVALENVTENF

NMWKNNMVEOMHEDIISLWDOSLKPCVKLTPLCVTLNCTDLRNATSRNVTNTTSSSR G

MVGGGEMKNCSFNITTGIRGKVQKEYALFYELDIVPIDNKIDRYRLISCNTSVITQA CPK

VSFEPIPIHYCAPAGFAILKCKDKKFNGKGPCSNVSTVQCTHGIRPVVSTQLLLNGS LAE

EEVVIRSENFTNNAKTIIVOLNESVEINCTRPNNNTRKSINIGPGRAFYTTGEIIGD IROAH

CNLSRAKWNDTLNKIVIKLREQFGNKTIVFKHSSGGDPEIVTHSFNCGGEFFYCNST QLF

NSTWNVTEESNNTVENNTITLPCRIKQIINMWQEVGRAMYAPPIRGQIRCSSNITGL LLT

RDGGPEDNKTEVFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQRE

[00147] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold.

[00148] gD_BAL-rgp120; UCSC 1375; codon optimized (SEQ ID NO:19)

[00149] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC

ATAGTGGGCCTCCATGGGGTCCGCGGCAAA7ArGCC7TGGCGGArGCCrcrcrCAAG Ar

GGCCGACCCCAATCGATTTCGCGGCAAAGACCTTCCGGTCCTGGACCAG TG TGG G

GTACCTGTGTGGAAAGAGGCCACCACCACACTGTTCTGTGCCTCCGATGCCAAGGC

CTACGATACCGAGGTGCACAACGTGTGGGCCACTCATGCCTGCGTGCCCACCGATC

CTAATCCTCAAGAAGTGGCCCTGGAAAACGTGACCGAGAACTTCAACATGTGGAAG

AACAACATGGTCGAGCAGATGCACGAGGACATCATCAGCCTGTGGGACCAGAGCCT

GAAGCCTTGCGTGAAGCTGACCCCTCTGTGCGTGACCCTGAACTGCACCGACCTGA

GAAACGCCACCAGCCGGAACGTGACCAATACCACCTCTAGCAGCAGAGGCATGGTT

GGAGGCGGCGAGATGAAGAACTGCAGCTTCAACATCACCACCGGCATCAGAGGCA

AGGTGCAGAAAGAGTACGCCCTGTTCTACGAGCTGGACATCGTGCCCATCGACAAC

AAGATCGACCGGTACAGACTGATCAGCTGCAACACCAGCGTGATCACCCAGGCCTG

TCCTAAGGTGTCCTTCGAGCCCATTCCTATCCACTACTGTGCCCCTGCCGGCTTCGC C

ATCCTGAAGTGCAAGGACAAGAAGTTCAACGGCAAGGGCCCCTGCAGCAACGTGTC

CACAGTGCAGTGTACACACGGCATCAGGCCCGTGGTGTCTACACAGCTGCTGCTGA

ATGGCAGCCTGGCCGAGGAAGAGGTGGTCATCAGAAGCGAGAATTTCACCAACAA CGCCAAGACCATCATCGTGCAGCTGAACGAGAGCGTGGAAATCAACTGCACCCGGC

CTAACAACAACACCCGGAAGTCCATCAACATCGGCCCTGGCAGAGCCTTCTACACA

ACCGGCGAGATCATCGGCGACATCAGACAGGCCCACTGCAACCTGTCTCGGGCCAA

GTGGAACGACACCCTGAACAAGATTGTGATCAAGCTGAGAGAGCAGTTCGGCAACA

AGACGATCGTGTTCAAGCACAGCTCTGGCGGCGACCCTGAGATCGTGACCCACAGC

TTTAATTGTGGCGGCGAGTTCTTCTACTGCAACAGCACCCAGCTGTTCAACTCCACC

TGGAATGTGACCGAGGAAAGCAACAATACCGTCGAGAACAACACCATCACACTGCC

CTGCCGGATCAAGCAGATCATCAATATGTGGCAAGAAGTCGGCAGGGCTATGTACG

CCCCTCCTATCAGAGGCCAGATCCGGTGCAGCAGCAATATCACAGGCCTGCTGCTC

ACCAGAGATGGCGGCCCTGAGGATAACAAGACCGAGGTGTTCAGACCCGGCGGAG

GCGACATGAGAGACAATTGGAGAAGCGAGCTGTACAAGTACAAGGTGGTCAAGAT

CGAGCCCCTGGGCGTCGCCCCTACAAAGGCTAAGAGAAGAGTGGTGCAGCGGGAA

[00150] TZ97008-rgp120; UCSC 1374; codon optimized (SEQ ID NO:23)

[00151] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL

DgLLEVPVWKEAKTTLFCASEAKGYEKEVHNVWATHACVPTDPSPHELVLENVTENF

NMWENDMVDOMHEDIISLWDOSLKPCVKLTPLCVTLNCTNVTGTNVTGNDMKGEMT

NCSFNATTEIKDRKKNVYALFYKLDVVOLEGNSSNSTYSTYRLINCNTSVITOACPK VSF

DPIPIHYCAPAGYAILKCNNKTFNGTGPCNNVSTVQCTHGIKPVVSTQLLLNGSLAE KEI

VIRS KNLTDNVKTIIVHLNESVEIT R^

E JKWNKTLQMVGEKLGKLFPNKTIKEPASGGDLEITTHSFNCRGEFFYCNTTKLFNSTY RPNANANSSSSNNTITLQCKIKQIINMWQEVGRAMYAPPIAGNITCTSNITGLLLVRDGG NNSTEEEIFRPGGGNMKDNWRSELYKYKVVEIKPLGVAPTGAKRRVyEREK VGroA VFLGFLGA

[00152] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. Dotted line ( ): location of basic residues that are targets for furin and trypsin-like enzymes. Translational stop codons for C-terminal purification tags can be incorporated at the beginning of this sequence. If a C-terminal purification tag is included, the stop codon can be inserted at either the beginning or the end of the sequence. Broken line ( ): C-terminal or 3' sequences not required for expression. V1/V2 domain is double underlined and V3 domain is indicated with a wavy line.

[00153] TZ97008-rgp120; UCSC1374; codon optimized (SEQ ID NO:24)

[00154] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC

ATAGTGGGCCTCCATGGGGTCCGCGGCAAATATGCCTTGGCGGATGCCTCTCTCAAG AT

GGCCGA CCCCAA TCGA TTTCGCGGCAAAGA CCTTCCGGTCCTGGA CCA GCTGCTGGAG

GTACCAGTGTGGAAAGAGGCCAAGACCACACTGTTCTGTGCCAGCGAGGCCAAGGG

CTACGAGAAAGAGGTGCACAACGTCTGGGCCACACACGCCTGTGTGCCTACCGATC

CTTCTCCTCACGAACTGGTGCTGGAAAACGTGACCGAGAACTTCAACATGTGGGAG

AACGACATGGTGGACCAGATGCACGAGGACATCATCAGCCTGTGGGACCAGAGCCT

GAAGCCTTGCGTGAAGCTGACCCCTCTGTGCGTGACCCTGAACTGCACCAATGTGA

CCGGCACCAACGTGACAGGGAACGATATGAAGGGCGAGATGACCAACTGCAGCTT

CAACGCCACCACCGAGATCAAGGACCGGAAGAAAAACGTGTACGCCCTGTTCTACA

AGCTGGACGTGGTGCAGCTGGAAGGCAACAGCAGCAACTCCACCTACAGCACCTAC

CGGCTGATCAACTGCAACACCAGCGTGATCACCCAGGCCTGTCCTAAGGTGTCCTTC

GATCCCATTCCTATCCACTACTGTGCCCCTGCCGGCTACGCCATCCTGAAGTGCAAC

AACAAGACCTTCAACGGCACAGGCCCCTGCAACAACGTGTCCACCGTGCAGTGTAC

CCACGGCATCAAGCCAGTGGTGTCCACACAGCTGCTGCTGAATGGAAGCCTGGCCG

AGAAAGAAATCGTGATCAGAAGCAAGAACCTGACCGACAACGTCAAGACCATCAT

CGTGCACCTGAACGAGAGCGTGGAAATCACCTGTATCAGACCCGGCAACAACACCA

GAAAGAGCATCAGAATCGGCCCAGGCCAGGCCTTTTATGCCACCGGCGATATCATC

GGCAACATCAGACAGGCCCACTGTAACATCAGCGAGGACAAGTGGAACAAGACCC

TGCAGATGGTCGGAGAGAAGCTGGGCAAGCTGTTCCCCAACAAGACAATCAAGTTC

GAGCCCGCCTCTGGCGGCGACCTGGAAATTACCACACACAGCTTCAATTGTCGGGG

CGAGTTCTTCTACTGCAATACCACCAAGCTGTTTAATAGCACCTACAGGCCCAACGC

CAATGCCAACAGCTCCAGCTCCAACAACACTATCACCCTGCAGTGCAAGATCAAGC

AGATCATCAATATGTGGCAAGAAGTCGGCAGGGCTATGTACGCCCCTCCTATCGCC

GGCAACATTACCTGCACCAGCAACATCACAGGCCTGCTGCTCGTTAGAGATGGCGG

CAACAATAGCACCGAGGAAGAGATCTTCAGACCTGGCGGCGGAAACATGAAGGAC

AACTGGCGGAGCGAGCTGTACAAGTACAAGGTGGTCGAGATTAAGCCCCTGGGCGT

TGCACCTACTGGCGCCAAGAGAAGAGTGGTGGAACGCGAGAAGAGAGCCGTTGGA

A GGG£GQQ LG T£G GG.GA J^Q GQGAJ JCJ.

[00155] gD signal sequence encoding sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. Dotted line ( ): C-terminal or 3' sequences not required for expression.

[00156] CN97001_D179N-rgp120; codon optimized (SEQ ID NO:25)

[00157] VPVWKEATTTLFCASDAKAYDTEVRNVWATHACVPADPNPQEMVLEN

VTENFNMWKNEMVNOMOEDVISLWDOSLKPCVKLTPLCVTLECRNVSSNSNGAHNET

YHESMKEMKNCSFNATTVVRDRKOTVYALFYRLNIVPLTKKNSSENSSEYYRLINCN TS

AITOACPKVTFDPIPIHYCTPAGYAILKCNDKIFNGTGPCHNVSTVOCTHGIKPVVS TOLL

LNGSLAEGEIIIRSENLTNNVKTIIVHLNQSVEIVCTRPGNNTRKSIRIGPGQTFYA TGDIIG

DIRQAHCNISEDKWNETLQRVSKKLAEHFQNKTIKFASSSGGDLEITTHSFNCRGEF FYC

NTSGLFNGTYTPNGTKSNSSSIITIPCRIKQIINMWQEVGRAMYAPPIEGNITCKSN ITGLL

LVRDGGTEPNDTETFRPGGGDMRNNWRSELYKYKVVEIKPLGVAPTTA

[00158] V1/V2 domain is double underlined and V3 domain is underlined.

[00159] CN97001_D179N-rgp120; UCSC199; codon optimized (SEQ ID NO:26)

[00160] MGGAAARLGAVILFVVIVGLHGVRG^FALADASL^MADPNR RG^DLPVL

DgLLEVPVWKEATTTLFCASDAKAYDTEVRNVWATHACVPADPNPQEMVLENVTEN

FNMWKNEMVNQMQEDVISLWDQSLKPCVKLTPLCVTLECRNVSSNSNGAHNETYHES

MKEMKNCSFNATTVVRDRKQTVYALFYRLNIVPLTKKNSSENSSEYYRLINCNTSAI TQ

ACPKVTFDPIPIHYCTPAGYAILKCNDKIFNGTGPCHNVSTVQCTHGIKPVVSTQLL LNG

SLAEGEIIIRSENLTNNVKTIIVHLNQSVEIVCTRPGNNTRKSIRIGPGQTFYATGD IIGDIR

QAHCNISEDKWNETLQRVSKKLAEHFQNKTIKFASSSGGDLEITTHSFNCRGEFFYC NTS

GLFNGTYTPNGTKSNSSSIITIPCRIKQIINMWQEVGRAMYAPPIEGNITCKSNITG LLLVR

DGGTEPNDTETFRPGGGDMRNNWRSELYKYKVVEIKPLGVAPTTA RMVERE .AV

GIGAVFLGFLGV*

[00161] gD signal sequence is underlined; mature N-terminal gD purification tag is italicized; linker sequence is in bold. Dotted line ( ): location of basic residues that are targets for furin and trypsin-like enzymes. Translational stop codons for C-terminal purification tags can be incorporated at the beginning of this sequence. If a C-terminal purification tag is included, the stop codon can be inserted at either the beginning or the end of the sequence. Broken line ( ): C-terminal or 3' sequences not required for expression. * indicates the location to insert a translational stop codon or C-terminal purification tag.

[00162] CN97001_D179N-rgp120; UCSC 199; codon optimized (SEQ ID NO:27):

[00163] ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTC ATAGTGGGCCTCCATGGGGTCCGCGGCAAA TATGCCTTGGCGGATGCCTCTCTCAAGAT GGCCGA CCCCAA TCGA TTTCGCGGCAAAGA CCTTCCGGTCCTGGA CCA GCTGCTGGAG GTACCAGTGTGGAAGGAAGCCACCACAACCCTCTTCTGCGCCAGCGATGCCAAGGC

CTACGACACGGAGGTCCGCAATGTGTGGGCCACCCATGCCTGTGTGCCCGCCGACC

CCAACCCCCAGGAGATGGTCCTGGAGAATGTCACGGAGAACTTCAACATGTGGAAG

AACGAGATGGTGAACCAGATGCAGGAGGACGTCATCTCCCTGTGGGACCAGAGCCT

GAAACCCTGCGTCAAACTGACACCCCTCTGTGTGACCCTGGAGTGCAGGAACGTGT

CCTCCAACAGCAACGGCGCCCACAACGAGACCTACCACGAAAGCATGAAAGAGAT

GAAGAACTGCAGCTTCAATGCCACAACCGTGGTGCGGGACCGGAAGCAGACGGTGT

ACGCGTTGTTCTACCGGCTGAATATCGTCCCCCTCACGAAGAAAAATTCCAGCGAG

AACTCCTCCGAGTATTATCGCCTGATCAACTGCAACACCAGCGCCATCACGCAGGC

CTGCCCCAAAGTGACCTTCGACCCCATCCCCATCCACTACTGCACCCCAGCTGGGTA

CGCCATCCTGAAGTGCAATGACAAAATCTTCAACGGCACAGGCCCCTGCCACAATG

TGAGCACCGTCCAGTGCACCCACGGCATCAAGCCAGTGGTCTCCACCCAGCTCCTCC

TGAATGGGAGCCTGGCAGAGGGCGAGATCATCATCCGCTCCGAGAACCTGACCAAC

AATGTCAAGACCATCATCGTCCACCTGAATCAGTCCGTGGAGATCGTCTGCACCAG

ACCCGGCAACAACACGCGGAAAAGCATCCGCATCGGCCCAGGGCAGACCTTCTATG

CCACGGGGGACATCATTGGGGACATCAGGCAGGCCCACTGCAACATCAGCGAAGA

CAAGTGGAACGAAACCCTGCAGCGGGTGTCCAAAAAACTCGCCGAGCACTTCCAGA

ACAAGACGATCAAGTTCGCATCCTCCAGCGGGGGGGACCTGGAGATCACCACGCAC

AGCTTCAACTGCCGGGGGGAATTTTTCTACTGCAACACCTCCGGGCTGTTCAACGGG

ACCTACACCCCCAACGGCACCAAGTCCAACTCCAGCAGCATCATCACCATCCCATG

CAGGATCAAGCAGATCATCAACATGTGGCAGGAGGTGGGCCGGGCCATGTACGCCC

CCCCCATCGAGGGCAATATCACCTGCAAGTCCAACATCACGGGGCTGCTGCTGGTG

CGGGATGGGGGGACCGAGCCCAACGACACCGAGACCTTCAGGCCAGGGGGGGGGG

ATATGCGGAACAACTGGCGCAGCGAGCTCTACAAGTACAAAGTGGTGGAGATCAA

ACCCCTGGGGGTGGCCCCCACCACAGCCAAACGCAGGATGGTGGAGCGGGAGAAG

CGGGCAGTGGGCATTGGGGCCGTGTTCTTGGGCTTCCTtGGCGtG

[00164] gD signal sequence encoding sequence is underlined; mature N-terminal gD purification tag encoding sequence is italicized; linker sequence encoding sequence is in bold.

[00165] A244_N334-rgp140; codon optimized (SEQ ID NO:5)

[00166] MRVKETQMNWPNLWKWGTLILGLVIICSA5DNL 7YyyGVPVWKEADT TLFCASDAKAHETEVHNVWATHACVPTDPNPQEIDLENVTENFNMWKNNMVEQMQE DVISLWDQSLKPCVKLTPLCVTLHCTNANLTKANLTNVNNRTNVSNIIGNITDEVRNCS FNMTTELRDKKQKVHALFYKLDIVPIEDNNDSSEYRLINCNTSVIKQACPKISFDPIPIH Y CTPAGYAILKCNDKNFNGTGPCKNVSSVQCTHGIKPVVSTQLLLNGSLAEEEIIIRSDNL T

NNAKTIIVHLNKSVVINCTRPSNNTRTSITIGPGQVFYRTGDIIGDIRKAYCEINGT EWNK

ALKQVTEKLKEHFNNKPIIFQPPSGGDLEITMHHFNCRGEFFYCNTTRLFNNTCIAN GTIE

GCNGNITLPCKIKQIINMWQGAGQAMYAPPISGTINCVSNITGILLTRDGGATNNTN NET

FRPGGGNIKDNWRNELYKYKVVQIEPLGVAPTRAKRRVVEREKRAVGIGAMIFGFLG A

AGSTMGAASITLTVQARQLLSGIVQQQSNLLRAIEAQQHLLQLTVWGIKQLQARVLA V

ERYLKDQKFLGLWGCSGKIICTTAVPWNSTWSNKSLEEIWSNMTWIEWEREISNYTN QI

YEILTKSQDQQDRNEKDLLELDKWASLWTWFDITN WLWYIK

[00167] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequences for gp140 trimers are italicized.

[00168] A244_N334-rgp140; codon optimized (SEQ ID NO:6)

[00169] ATGAGAGTGAAGGAGACACAGATGAATTGGCCAAACTTGTGGAAATG

GGGG ACTTTG ATCCTTGGGTTGGTG ATA ATTTGT AGTGCC TCAGA CAA CTTGTGGGTT

ACAG77TA7TArGGGGTACCAGTGTGGAAGGAAGCCGACACAACCCTCTTCTGCGCC

AGCGATGCCAAGGCCCACGAGACGGAGGTCCACAATGTGTGGGCCACCCATGCCTG

TGTGCCCACGGACCCCAACCCCCAGGAGATTGACCTGGAGAATGTCACGGAGAACT

TCAACATGTGGAAGAACAACATGGTGGAGCAGATGCAGGAGGACGTCATCTCCCTG

TGGGACCAGAGCCTGAAACCCTGCGTCAAACTGACACCCCCCTGTGTGACCCTGCA

CTGCACGAACGCCAACCTGACCAAGGCCAACCTCACCAACGTGAACAATCGGACCA

ACGTGTCCAACATCATCGGGAACATCACAGATGAGGTGAGGAACTGCAGCTTCAAT

ATGACAACCGAGCTCCGGGACAAAAAGCAGAAGGTGCACGCGTTGTTCTACAAACT

GGATATCGTCCCCATCGAGGACAATAATGACAGCTCCGAGTATCGCCTGATCAACT

GCAACACCAGCGTCATCAAACAGGCCTGCCCCAAAATTTCCTTCGACCCCATCCCCA

TCCACTACTGCACCCCAGCTGGGTACGCCATCCTGAAGTGCAATGACAAGAACTTC

AACGGCACAGGGCCCTGCAAGAATGTGAGCTCCGTCCAGTGCACCCACGGCATCAA

GCCAGTGGTCTCCACCCAGCTCCTCCTGAATGGGAGCCTGGCAGAGGAAGAGATCA

TCATCCGCTCCGAGAACCTGACCAACAATGCCAAGACCATCATCGTCCACCTGAAT

AAGTCCGTGGTCATCAACTGCACCAGACCCAGCAACAACACGCGGACCAGCATCAC

CATCGGCCCAGGGCAGGTCTTCTATAGGACGGGGGACATCATTGGGGACATCAGGA

AGGCCTACTGCGAAATCAATGGGACCGAGTGGAACAAAGCCCTGAAACAGGTGAC

CGAAAAACTCAAGGAGCACTTCAACAACAAGCCAATCATCTTCCAGCCCCCCAGCG

GGGGGGACCTGGAGATCACCATGCACCATTTCAACTGCCGGGGGGAATTCTTCTAC

TGCAACACCACCCGCCTGTTCAACAACACCTGCATCGCCAACGGCACCATCGAGGG CTGCAATGGCAACATCACCCTCCCATGCAAAATCAAGCAGATCATCAACATGTGGC

AGGGGGCAGGCCAGGCCATGTACGCCCCCCCCATCTCCGGCACGATCAACTGCGTG

TCCAACATCACGGGGATCCTGCTGACCCGGGATGGGGGGGCTACCAACAATACGAA

CAATGAGACCTTCAGGCCAGGGGGGGGGAACATCAAAGACAACTGGCGCAATGAG

CTCTACAAGTACAAAGTGGTGCAGATCGAGCCCCTGGGGGTGGCCCCCACCCGGGC

CAAACGCAGGGTGGTGGAGCGGGAGAAGCGGGCAGTGGGCATTGGGGCCATGATC

TTCGGCTTTCTGGGAGCCGCCGGATCTACAATGGGAGCTGCCAGCATCACCCTGACC

GTGCAGGCTAGACAACTGCTGTCTGGCATCGTGCAGCAGCAGAGCAATCTGCTGAG

AGCCATTGAGGCCCAGCAGCATCTGCTGCAGCTGACAGTGTGGGGCATCAAACAGC

TGCAGGCCAGAGTGCTGGCCGTGGAAAGATACCTGAAGGACCAGAAATTCCTCGGC

CTGTGGGGCTGCAGCGGCAAGATCATCTGTACAACAGCCGTGCCTTGGAACAGCAC

CTGGTCCAACAAGAGCCTGGAAGAGATCTGGTCCAATATGACCTGGATCGAGTGGG

AGAGAGAGATCAGCAACTACACCAACCAGATCTACGAGATCCTGACCAAGAGCCA

GGACCAGCAGGACCGGAACGAGAAGGATCTGCTGGAACTGGACAAGTGGGCCAGC

CTGTGGACTTGGTTTGACATCACCAACTGGCTGTGGTACATCAAG

[00170] Nucleic acid sequence encoding the wild type HIV signal sequence is underlined. Nucleic acid sequence encoding the mature N-terminal HIV envelope sequences for gp140 trimers is italicized.

[00171] A244_N332-rgp140 (SEQ ID NO:7)

[00172] MRVKETQMNWPNLWKWGTLILGLVIICSA5DNL 7YyyGVPVWKEADT

TLFCASDAKAHETEVHNVWATHACVPTDPNPQEIDLENVTENFNMWKNNMVEQMQE

DVISLWDQSLKPCVKLTPLCVTLHCTNANLTKANLTNVNNRTNVSNIIGNITDEVRN CS

FNMTTELRDKKQKVHALFYKLDIVPIEDNNDSSEYRLINCNTSVIKQACPKISFDPI PIHY

CTPAGYAILKCNDKNFNGTGPCKNVSSVQCTHGIKPVVSTQLLLNGSLAEEEIIIRS DNLT

NNAKTIIVHLNKSVVINCTRPSNNTRTSITIGPGQVFYRTGDIIGDIRKAYCNISGT EWNK

ALKQVTEKLKEHFNNKPIIFQPPSGGDLEITMHHFNCRGEFFYCNTTRLFNNTCIAN GTIE

GCNGNITLPCKIKQIINMWQGAGQAMYAPPISGTINCVSNITGILLTRDGGATNNTN NET

FRPGGGNIKDNWRNELYKYKVVQIEPLGVAPTRAKRRVVEREKRAVGIGAMIFGFLG A

AGSTMGAASITLTVQARQLLSGIVQQQSNLLRAIEAQQHLLQLTVWGIKQLQARVLA V

ERYLKDQKFLGLWGCSGKIICTTAVPWNSTWSNKSLEEIWSNMTWIEWEREISNYTN QI

YEILTKSQDQQDRNEKDLLELDKWASLWTWFDITNWLWYIK

[00173] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequences for gp140 trimers are italicized.

[00174] A244_N332-rgp140; codon optimized (SEQ ID NO:8)

[00175] ATGAGAGTGAAGGAGACACAGATGAATTGGCCAAACTTGTGGAAATG

GGGG ACTTTG ATCCTTGGGTTGGTG ATA ATTTGT AGTGCC TCAGA CAA CTTGTGGGTT

ACAG77TA7TArGGGGTACCAGTGTGGAAGGAAGCCGACACAACCCTCTTCTGCGCC

AGCGATGCCAAGGCCCACGAGACGGAGGTCCACAATGTGTGGGCCACCCATGCCTG

TGTGCCCACGGACCCCAACCCCCAGGAGATTGACCTGGAGAATGTCACGGAGAACT

TCAACATGTGGAAGAACAACATGGTGGAGCAGATGCAGGAGGACGTCATCTCCCTG

TGGGACCAGAGCCTGAAACCCTGCGTCAAACTGACACCCCCCTGTGTGACCCTGCA

CTGCACGAACGCCAACCTGACCAAGGCCAACCTCACCAACGTGAACAATCGGACCA

ACGTGTCCAACATCATCGGGAACATCACAGATGAGGTGAGGAACTGCAGCTTCAAT

ATGACAACCGAGCTCCGGGACAAAAAGCAGAAGGTGCACGCGTTGTTCTACAAACT

GGATATCGTCCCCATCGAGGACAATAATGACAGcTCCGAGTATCGCCTGATCAACTG

CAACACCAGcGTCATCAAACAGGCCTGCCCCAAAATTTCCTTCGACCCCATCCCCAT

CCACTACTGCACCCCAGCTGGGTACGCCATCCTGAAGTGCAATGACAAGAACTTCA

ACGGCACAGGGCCCTGCAAGAATGTGAGCTCCGTCCAGTGCACCCACGGCATCAAG

CCAGTGGTCTCCACCCAGCTCCTCCTGAATGGGAGCCTGGCAGAGGAAGAGATCAT

CATCCGCTCCGAGAACCTGACCAACAATGCCAAGACCATCATCGTCCACCTGAATA

AGTCCGTGGTCATCAACTGCACCAGACCCAGCAACAACACGCGGACCAGCATCACC

ATCGGCCCAGGGCAGGTCTTCTATAGGACGGGGGACATCATTGGGGACATCAGGAA

GGCCTACTGCAACATCAGTGGGACCGAGTGGAACAAAGCCCTGAAACAGGTGACC

GAAAAACTCAAGGAGCACTTCAACAACAAGCCAATCATCTTCCAGCCCCCCAGCGG

GGGGGACCTGGAGATCACCATGCACCATTTCAACTGCCGGGGGGAATTCTTCTACT

GCAACACCACCCGCCTGTTCAACAACACCTGCATCGCCAACGGCACCATCGAGGGC

TGCAATGGCAACATCACCCTCCCATGCAAAATCAAGCAGATCATCAACATGTGGCA

GGGGGCAGGCCAGGCCATGTACGCCCCCCCCATCTCCGGCACGATCAACTGCGTGT

CCAACATCACGGGGATCCTGCTGACCCGGGATGGGGGGGCTACCAACAATACGAAC

AATGAGACCTTCAGGCCAGGGGGGGGGAACATCAAAGACAACTGGCGCAATGAGC

TCTACAAGTACAAAGTGGTGCAGATCGAGCCCCTGGGGGTGGCCCCCACCCGGGCC

AAACGCAGGGTGGTGGAGCGGGAGAAGCGGGCAGTGGGCATTGGGGCCATGATCT

TCGGCTTTCTGGGAGCCGCCGGATCTACAATGGGAGCTGCCAGCATCACCCTGACC

GTGCAGGCTAGACAACTGCTGTCTGGCATCGTGCAGCAGCAGAGCAATCTGCTGAG

AGCCATTGAGGCCCAGCAGCATCTGCTGCAGCTGACAGTGTGGGGCATCAAACAGC

TGCAGGCCAGAGTGCTGGCCGTGGAAAGATACCTGAAGGACCAGAAATTCCTCGGC CTGTGGGGCTGCAGCGGCAAGATCATCTGTACAACAGCCGTGCCTTGGAACAGCAC

CTGGTCCAACAAGAGCCTGGAAGAGATCTGGTCCAATATGACCTGGATCGAGTGGG

AGAGAGAGATCAGCAACTACACCAACCAGATCTACGAGATCCTGACCAAGAGCCA

GGACCAGCAGGACCGGAACGAGAAGGATCTGCTGGAACTGGACAAGTGGGCCAGC

CTGTGGACTTGGTTTGACATCACCAACTGGCTGTGGTACATCAAG

[00176] The nucleic acid sequence encoding the wild type HIV signal sequence is underlined.

The nucleic acid sequence encoding the mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00177] MN-rgp140_N301_N332 (SEQ ID NO:15)

[00178] MRVKGIRRNYQHWWGWGTMLLGLLMICSATEKLWVTVYYGVPVWKEA

TTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVELVNVTENFNMWKNNMVEQM

HEDIISLWDQSLKPCVKLTPLCVTLNCTDLRNTTNTNNSTDNNNSKSEGTIKGGEMK NC

SFNITTSIGDKMQKEYALLYKLDIEPIDNDSTSYRLISCNTSVITQACPKISFEPIP IHYCAP

AGFAILKCNDKKFSGKGSCKNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSEDF TDN

AKTIIVHLKESVQINCTRPNNNTRKRIHIGPGRAFYTTKNIKGTIRQAHCNISRAKW NDTL

RQIVSKLKEQFKNKTIVFNPSSGGDPEIVMHSFNCGGEFFYCNTSPLFNSIWNGNNT WN

NTTGSNNNITLQCKIKQIINMWQKVGKAMYAPPIEGQIRCSSNITGLLLTRDGGEDT DTN

DTEIFRPGGGDMRDNWRSELYKYKVVTIEPLGVAPTKAKRRVVQREKRAAIGALFLG F

LGAAGSTMGAASVTLTVQARLLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQAR

VLAVERYLKDQQLLGFWGCSGKLICTTTVPWNASWSNKSLDDIWNNMTWMQWEREI

DNYTSLIYSLLEKSQTQQEKNEQELLELDKWASLWNWFDITNWLWYIK

[00179] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00180] MN-rgp140_N301_N332 (SEQ ID NO:16)

[00181] ATGAGAGTGAAGGGGATCAGGAGGAATTATCAGCACTGGTGGGGATG

GGGCACGATGCTCCTTGGGTTATTAATGATCTGTAGTGCTACAGAAAAATTGTGGGTC

ACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCAT

CAGATGCTAAAGCATATGATACAGAGGCACATAATGTTTGGGCCACACATGCCTGT

GTACCCACAGACCCCAACCCACAAGAAGTAGAATTGGTAAATGTGACAGAAAATTT

TAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTAT

GGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATT

GCACTGATTTGAGGAATACTACTAATACCAATAATAGTACTGATAATAACAATAGT AAAAGCGAGGGAACAATAAAGGGAGGAGAAATGAAAAACTGCTCTTTCAATATCA

CCACAAGCATAGGAGATAAGATGCAGAAAGAATATGCACTTCTTTATAAACTTGAT

ATAGAACCAATAGATAATGATAGTACCAGCTATAGGTTGATAAGTTGTAATACCTC

AGTCATTACACAAGCTTGTCCAAAGATATCCTTTGAGCCAATTCCCATACACTATTG

TGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAACGATAAAAAGTTCAGTGGAAAAG

GATCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTA

TCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATC

TGAGGATTTCACTGATAATGCTAAAACCATCATAGTACATCTGAAAGAATCTGTAC

AAATTAATTGTACAAGACCCAACAACAATACCAGAAAAAGGATACATATAGGACC

AGGGAGAGCATTTTATACAACAAAAAATATAAAAGGAACTATAAGACAAGCACAT

TGTAACATTAGTAGAGCAAAATGGAATGACACTTTAAGACAGATAGTTAGCAAGTT

AAAAGAACAATTTAAGAATAAAACAATAGTCTTTAATCCATCCTCAGGAGGGGACC

CAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAT

CACCACTGTTTAATAGTATTTGGAATGGTAATAATACTTGGAATAATACTACAGGGT

CAAATAACAATATCACACTTCAATGCAAAATAAAACAAATTATAAACATGTGGCAG

AAAGTAGGAAAAGCAATGTATGCCCCTCCCATTGAAGGACAAATTAGATGTTCATC

AAATATTACAGGGCTACTATTAACAAGAGATGGTGGTGAGGACACGGACACGAAC

GACACCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTG

AATTATATAAATATAAAGTAGTAACAATTGAACCATTAGGAGTAGCACCCACCAAG

GCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGCGATAGGAGCTCTGTTCC

TTGGGTTCTTAGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAGTGACGCTGACG

GTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAG

GGCCATTGAGGCGCAACAGCATATGTTGCAACTCACAGTCTGGGGCATCAAGCAGC

TCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGG

TTTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGCTAGT

TGGAGTAATAAATCTCTGGATGATATTTGGAATAACATGACCTGGATGCAGTGGGA

AAGAGAAATTGACAATTACACAAGCTTAATATACTCATTACTAGAAAAATCGCAAA

CCCAACAAGAAAAGAATGAACAAGAATTATTGGAATTGGATAAATGGGCAAGTTTG

TGGAATTGGTTTGACATAACAAATTGGCTGTGGTATATAAAA

[00182] The nucleic acid sequence encoding the wild type HIV signal sequence is underlined.

The nucleic acid sequence encoding the mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00183] BAL-rgp140 (SEQ ID NO:20)

[00184] MRVTEIRKSYQHWWRWGIMLLGILMICNAEEKLWVTVYYGVPVWKEATT

TLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVALENVTENFNMWKNNMVEQMH

EDIISLWDQSLKPCVKLTPLCVTLNCTDLRNATSRNVTNTTSSSRGMVGGGEMKNCS FN

ITTGIRGKVQKEYALFYELDIVPIDNKIDRYRLISCNTSVITQACPKVSFEPIPIHY CAPAGF

AILKCKDKKFNGKGPCSNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSENFTNN AK

TIIVQLNESVEINCTRPNNNTRKSINIGPGRAFYTTGEIIGDIRQAHCNLSRAKWND TLNKI

VIKLREQFGNKTIVFKHSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWNVTEESNN TVE

NNTITLPCRIKQIINMWQEVGRAMYAPPIRGQIRCSSNITGLLLTRDGGPEDNKTEV FRPG

GGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGIGAVFLGFLGAAGS

TMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVER

YLRDQQLLGIWGCSGKLICTTAVPWNASWSNKSLNKIWDNMTWMEWDREINNYTSII Y

SLIEESQNQQEKNEQELLELDKWASLWNWFDITKWLWYIK

[00185] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00186] BAL-rgp140 (SEQ ID NO:21)

[00187] ATGAGAGTGACGGAGATCAGGAAGAGTTATCAGCACTGGTGGAGATG

GGGCATCATGCTCCTTGGGATATTAATGATCTGTAATGCTGAAGAAAAATTGTGGGTC

ACAGTCTATTATGGGGTACCTGTGTGGAAAGAGGCCACCACCACACTGTTCTGTGCCT

CCGATGCCAAGGCCTACGATACCGAGGTGCACAACGTGTGGGCCACTCATGCCTGC

GTGCCCACCGATCCTAATCCTCAAGAAGTGGCCCTGGAAAACGTGACCGAGAACTT

CAACATGTGGAAGAACAACATGGTCGAGCAGATGCACGAGGACATCATCAGCCTGT

GGGACCAGAGCCTGAAGCCTTGCGTGAAGCTGACCCCTCTGTGCGTGACCCTGAAC

TGCACCGACCTGAGAAACGCCACCAGCCGGAACGTGACCAATACCACCTCTAGCAG

CAGAGGCATGGTTGGAGGCGGCGAGATGAAGAACTGCAGCTTCAACATCACCACCG

GCATCAGAGGCAAGGTGCAGAAAGAGTACGCCCTGTTCTACGAGCTGGACATCGTG

CCCATCGACAACAAGATCGACCGGTACAGACTGATCAGCTGCAACACCAGCGTGAT

CACCCAGGCCTGTCCTAAGGTGTCCTTCGAGCCCATTCCTATCCACTACTGTGCCCC

TGCCGGCTTCGCCATCCTGAAGTGCAAGGACAAGAAGTTCAACGGCAAGGGCCCCT

GCAGCAACGTGTCCACAGTGCAGTGTACACACGGCATCAGGCCCGTGGTGTCTACA

CAGCTGCTGCTGAATGGCAGCCTGGCCGAGGAAGAGGTGGTCATCAGAAGCGAGA

ATTTCACCAACAACGCCAAGACCATCATCGTGCAGCTGAACGAGAGCGTGGAAATC

AACTGCACCCGGCCTAACAACAACACCCGGAAGTCCATCAACATCGGCCCTGGCAG AGCCTTCTACACAACCGGCGAGATCATCGGCGACATCAGACAGGCCCACTGCAACC

TGTCTCGGGCCAAGTGGAACGACACCCTGAACAAGATTGTGATCAAGCTGAGAGAG

CAGTTCGGCAACAAGACGATCGTGTTCAAGCACAGCTCTGGCGGCGACCCTGAGAT

CGTGACCCACAGCTTTAATTGTGGCGGCGAGTTCTTCTACTGCAACAGCACCCAGCT

GTTCAACTCCACCTGGAATGTGACCGAGGAAAGCAACAATACCGTCGAGAACAACA

CCATCACACTGCCCTGCCGGATCAAGCAGATCATCAATATGTGGCAAGAAGTCGGC

AGGGCTATGTACGCCCCTCCTATCAGAGGCCAGATCCGGTGCAGCAGCAATATCAC

AGGCCTGCTGCTCACCAGAGATGGCGGCCCTGAGGATAACAAGACCGAGGTGTTCA

GACCCGGCGGAGGCGACATGAGAGACAATTGGAGAAGCGAGCTGTACAAGTACAA

GGTGGTCAAGATCGAGCCCCTGGGCGTCGCCCCTACCAAGGCAAAGAGAAGAGTG

GTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTGTGTTCCTTGGGTTCTTGGG

AGCAGCAGGAAGCACTATGGGCGCAGCATCAATGACGCTGACGGTACAGGCCAGA

CTATTATTGTCTGGTATAGTGCAACAGCAGAACAATCTGCTGAGAGCTATTGAGGC

GCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATTAAGCAGCTCCAGGCAAGAG

TCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTGGGGATTTGGGGTTGC

TCTGGAAAACTCATCTGCACCACTGCCGTGCCTTGGAATGCTAGTTGGAGTAATAAA

TCTCTGAATAAGATTTGGGATAACATGACCTGGATGGAGTGGGACAGAGAAATTAA

CAATTACACAAGCATAATATACAGCTTAATTGAAGAATCGCAGAACCAACAAGAAA

AGAATGAACAAGAATTATTAGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT

GACATAACAAAATGGCTGTGGTATATAAAA

[00188] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00189] Clade C: TZ97008-rgp120; UCSC 1374; codon optimized (SEQ ID NO:22)

[00190] VPVWKEAKTTLFCASEAKGYEKEVHNVWATHACVPTDPSPHELVLENV

TENFNMWENDMVDQMHEDIISLWDQSLKPCVKLTPLCVTLNCTNVTGTNVTGNDMK

GEMTNCSFNATTEIKDRKKNVYALFYKLDVVQLEGNSSNSTYSTYRLINCNTSVITQ AC

PKVSFDPIPIHYCAPAGYAILKCNNKTFNGTGPCNNVSTVQCTHGIKPVVSTQLLLN GSL

AEKEIVIRSKNLTDNVKTIIVHLNESVEITCIRPGNNTRKSIRIGPGQAFYATGDII GNIRQA

HCNISEDKWNKTLQMVGEKLGKLFPNKTIKEPASGGDLEITTHSFNCRGEFFYCNTT KL

FNSTYRPNANANSSSSNNTITLQCKIKQIINMWQEVGRAMYAPPIAGNITCTSNITG LLLV

RDGGNNSTEEEIFRPGGGNMKDNWRSELYKYKVVEIKPLGVAPTGAK

[00191] BG505-rgp120.L111A-rgp120; codon optimized (SEQ ID NO:28)

[00192] MPMGSLQPLATLYLLGMLVASVLAAENLWVTVYYGVPVWKDAETTLFCA

SDAKAYETEKHNVWATHACVPTDPNPQEIHLENVTEEFNMWKNNMVEQMHTDIISAW

DQSLKPCVKLTPLCVTLQCTNVTNNITDDMRGELKNCSFNMTTELRDKKQKVYSLFY R

LDVVQINENQGNRSNNSNKEYRLINCNTSAITQACPKVSFEPIPIHYCAPAGFAILK CKD

KKFNGTGPCPSVSTVQCTHGIKPVVSTQLLLNGSLAEEEVMIRSENITNNAKNILVQ FNT

PVQINCTRPNNNTRKSIRIGPGQAFYATGDIIGDIRQAHCNVSKATWNETLGKVVKQ LR

KHFGNNTIIRFANSSGGDLEVTTHSFNCGGEFFYCNTSGLFNSTWISNTSVQGSNST GSN

DSITLPCRIKQIINMWQRIGQAMYAPPIQGVIRCVSNITGLILTRDGGSTNSTTETF RPGGG

DMRDNWRSELYKYKVVKIEPLGVAPTRAKSSVVGSEKSG

[00193] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00194] BG505-rgp120.L111A-rgp120 (SEQ ID NO:29)

[00195] ATGCCTATGGGCAGCCTGCAGCCTCTGGCCACACTGTACCTGCTGGGC

ATGCTGGTGGCCTCTGTGCTGGCCGCCGAGAACCTGTGGGTGACAGTGTACTACGGCG

TGCCCGTGTGGAAGGACGCCGAGACAACCCTGTTCTGCGCCAGCGACGCCAAGGCC

TACGAGACAGAGAAGCACAACGTGTGGGCCACCCACGCCTGCGTGCCAACCGACCC

TAACCCCCAGGAAATCCACCTGGAAAACGTGACCGAAGAGTTCAACATGTGGAAGA

ACAACATGGTGGAACAGATGCACACCGACATCATCAGCGCCTGGGACCAGAGCCTG

AAGCCCTGCGTGAAGCTGACCCCCCTGTGCGTGACCCTGCAGTGCACCAACGTGAC

CAACAACATCACCGACGACATGCGGGGCGAGCTGAAGAACTGCAGCTTCAACATGA

CCACCGAGCTGCGGGACAAGAAACAGAAGGTGTACAGCCTGTTCTACCGGCTGGAC

GTGGTGCAGATCAACGAGAACCAGGGCAACAGAAGCAACAACAGCAACAAAGAGT

ACCGGCTGATCAACTGCAACACCAGCGCCATCACCCAGGCCTGCCCCAAGGTGTCC

TTCGAGCCCATCCCCATCCACTACTGCGCCCCTGCCGGCTTCGCCATCCTGAAGTGC

AAGGACAAGAAGTTCAACGGCACCGGCCCCTGCCCCAGCGTGTCCACAGTGCAGTG

TACCCACGGCATCAAGCCCGTGGTGTCCACCCAGCTGCTGCTGAACGGCAGCCTGG

CCGAAGAGGAAGTGATGATCAGAAGCGAGAACATCACCAACAACGCCAAGAACAT

CCTGGTGCAGTTCAACACCCCCGTGCAGATTAACTGCACCCGGCCCAACAACAACA

CCAGAAAGAGCATCCGGATCGGCCCAGGCCAGGCCTTCTACGCCACCGGCGACATC

ATCGGCGACATCCGGCAGGCCCACTGCAACGTGTCCAAGGCCACCTGGAACGAGAC

ACTGGGCAAGGTGGTGAAACAGCTGCGGAAGCACTTCGGGAACAACACCATCATCC

GCTTCGCCAACAGCTCTGGCGGCGACCTGGAAGTGACCACCCACAGCTTCAACTGT GGCGGCGAGTTCTTCTACTGCAATACCTCCGGCCTGTTCAACAGCACCTGGATCAGC

AATACCAGCGTGCAGGGCAGCAACAGCACCGGCAGCAACGACAGCATCACCCTGC

CCTGCCGGATCAAGCAGATCATCAATATGTGGCAGCGGATTGGCCAGGCTATGTAC

GCCCCACCCATCCAGGGCGTGATCAGATGCGTGTCCAATATCACCGGCCTGATCCTG

ACCCGGGACGGCGGCTCTACCAACAGCACCACCGAAACCTTCAGACCCGGCGGAGG

CGACATGAGAGACAACTGGCGGAGCGAGCTGTACAAGTACAAAGTGGTGAAAATC

GAGCCCCTGGGCGTGGCCCCCACCAGAGCCAAGAGCAGCGTGGTCGGAAGCGAGA

AGTCCGGC

[00196] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal HIV envelope sequence for gp140 trimers is italicized.
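Because each nucleic acid listing is paired with a polypeptide listing, one can cross-check the two by translating the nucleotides with the standard genetic code. The following is a minimal sketch, not part of the original disclosure; the helper name `translate` is hypothetical, and the input is the first row of SEQ ID NO:29 as printed above. The translated residues can be compared with the opening residues listed for SEQ ID NO:28.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unreadable codons become 'X'."""
    # Drop whitespace introduced by line wrapping and normalize case.
    dna = "".join(dna.split()).upper()
    # Ignore a trailing partial codon, if any.
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of SEQ ID NO:29 as printed in this document.
seq29_first_row = "ATGCCTATGGGCAGCCTGCAGCCTCTGGCCACACTGTACCTGCTGGGC"
print(translate(seq29_first_row))
```

Codons containing characters other than A, C, G and T are reported as `X`, which makes rows damaged during text extraction easy to spot.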

[00197] BG505-rgp140; not codon optimized (SEQ ID NO:30)

[00198] MRVMGIQRNCQHLFRWGTMILGMIIICSAAENLWVTVYYGVPVWKDAETT

LFCASDAKAYETEKHNVWATHACVPTDPNPQEIHLENVTEEFNMWKNNMVEQMHTDI

ISAWDQSLKPCVKLTPLCVTLQCTNVTNNITDDMRGELKNCSFNMTTELRDKKQKVY S

LFYRLDVVQINENQGNRSNNSNKEYRLINCNTSAITQACPKVSFEPIPIHYCAPAGF AILK

CKDKKFNGTGPCPSVSTVQCTHGIKPVVSTQLLLNGSLAEEEVMIRSENITNNAKNI LVQ

FNTPVQINCTRPNNNTRKSIRIGPGQAFYATGDIIGDIRQAHCNVSKATWNETLGKV VK

QLRKHFGNNTIIRFANSSGGDLEVTTHSFNCGGEFFYCNTSGLFNSTWISNTSVQGS NST

GSNDSITLPCRIKQIINMWQRIGQAMYAPPIQGVIRCVSNITGLILTRDGGSTNSTT ETFRP

GGGDMRDNWRSELYKYKVVKIEPLGVAPTRAKRRVVGREKRAVGIGAVFLGFLGAAG

STMGAASMTLTVQARNLLSGIVQQQSNLLRAIEAQQHLLKLTVWGIKQLQARVLAVE R

YLRDQQLLGIWGCSGKLICTTNVPWNSSWSNRNLSEIWDNMTWLQWDKEISNYTQII Y

GLLEESQNQQEKNEQDLLALDKWASLWNWFDISNWLWYIK

[00199] Wild type HIV signal sequence is underlined. Mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00200] BG505-rgp140; not codon optimized (SEQ ID NO:31)

[00201] ATGAGAGTGATGGGGATACAGAGGAATTGTCAGCACTTATTCAGATG

GGGAACTATGATCTTGGGGATGATAATAATCTGTAGTGCAGCAGAAA4C7 GrGGGr

CACrGrcrACrArGGGGTGCCCGTGTGGAAGGACGCCGAGACAACCCTGTTCTGCGC

CAGCGACGCCAAGGCCTACGAGACAGAGAAGCACAACGTGTGGGCCACCCACGCC

TGCGTGCCAACCGACCCTAACCCCCAGGAAATCCACCTGGAAAACGTGACCGAAGA

GTTCAACATGTGGAAGAACAACATGGTGGAACAGATGCACACCGACATCATCAGCG CCTGGGACCAGAGCCTGAAGCCCTGCGTGAAGCTGACCCCCCTGTGCGTGACCCTG

CAGTGCACCAACGTGACCAACAACATCACCGACGACATGCGGGGCGAGCTGAAGA

ACTGCAGCTTCAACATGACCACCGAGCTGCGGGACAAGAAACAGAAGGTGTACAG

CCTGTTCTACCGGCTGGACGTGGTGCAGATCAACGAGAACCAGGGCAACAGAAGCA

ACAACAGCAACAAAGAGTACCGGCTGATCAACTGCAACACCAGCGCCATCACCCAG

GCCTGCCCCAAGGTGTCCTTCGAGCCCATCCCCATCCACTACTGCGCCCCTGCCGGC

TTCGCCATCCTGAAGTGCAAGGACAAGAAGTTCAACGGCACCGGCCCCTGCCCCAG

CGTGTCCACAGTGCAGTGTACCCACGGCATCAAGCCCGTGGTGTCCACCCAGCTGCT

GCTGAACGGCAGCCTGGCCGAAGAGGAAGTGATGATCAGAAGCGAGAACATCACC

AACAACGCCAAGAACATCCTGGTGCAGTTCAACACCCCCGTGCAGATTAACTGCAC

CCGGCCCAACAACAACACCAGAAAGAGCATCCGGATCGGCCCAGGCCAGGCCTTCT

ACGCCACCGGCGACATCATCGGCGACATCCGGCAGGCCCACTGCAACGTGTCCAAG

GCCACCTGGAACGAGACACTGGGCAAGGTGGTGAAACAGCTGCGGAAGCACTTCG

GGAACAACACCATCATCCGCTTCGCCAACAGCTCTGGCGGCGACCTGGAAGTGACC

ACCCACAGCTTCAACTGTGGCGGCGAGTTCTTCTACTGCAATACCTCCGGCCTGTTC

AACAGCACCTGGATCAGCAATACCAGCGTGCAGGGCAGCAACAGCACCGGCAGCA

ACGACAGCATCACCCTGCCCTGCCGGATCAAGCAGATCATCAATATGTGGCAGCGG

ATTGGCCAGGCTATGTACGCCCCACCCATCCAGGGCGTGATCAGATGCGTGTCCAA

TATCACCGGCCTGATCCTGACCCGGGACGGCGGCTCTACCAACAGCACCACCGAAA

CCTTCAGACCCGGCGGAGGCGACATGAGAGACAACTGGCGGAGCGAGCTGTACAA

GTACAAAGTGGTGAAAATCGAGCCCCTGGGCGTGGCCCCCACCAGAGCCAAGAGA

AGAGTGGTGGGGAGAGAAAAAAGAGCAGTTGGAATAGGAGCTGTCTTCCTTGGGTT

CTTAGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAG

GCCAGAAATTTATTATCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTAT

AGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGG

CAAGGGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTTCTAGGAATTTGG

GGCTGCTCTGGAAAACTCATCTGCACCACTAATGTGCCCTGGAACTCTAGTTGGAGT

AATAGAAACCTGAGTGAGATATGGGACAACATGACCTGGCTGCAATGGGATAAAG

AAATTAGCAATTACACACAGATAATATATGGGCTACTTGAAGAATCGCAGAACCAG

CAGGAAAAGAATGAACAAGACTTATTGGCATTGGATAAGTGGGCAAGTCTGTGGAA

TTGGTTTGACATATCAAACTGGCTGTGGTATATAAAA

[00202] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal HIV envelope sequence for gp140 trimers is italicized.

[00203] TV1.21-rgp120 (SEQ ID NO:32)

[00204] MRVMGTQKNCQQWWIWGILGFWMLMICNTKDLWVTVYYGVPVWREAK

TTLFCASDAKAYETEVHNVWATHACVPTDPNPQEIVLGNVTENFNMWKNDMADQMH

EDIISLWDQSLKPCVKLTPLCVTLNCTETNVTGNRTVIGNTNDTNIANATYKYEEMK NC

SFNVTTELRNKKHKEYALFYRLDIVPLNENGDNSKYRLINCNTSAITQACPKVSFDP IPIH

YCAPAGYAILKCNNKTFNGTGPCYNVSTVQCTHGIKPVVSTQLLLNGSLAEEGMIIR SE

NLTENTKTIIVHLNESVEINCTRPNNNTRKSVRIGPGQAFYATNDVIGDIRQAHCNI STDR

WNKTLQQVMKKLGEHFPNKTIQFKPHAGGDIEITMHSFNCRGEFFYCNTSNLFNSTY HS

NNGTYKYNGNSSSPITLQCKIKQIVRMWQGVGQAMYAPPIAGNITCRSNITGILLTR DG

GFNTTNNTETFRPGGGDMRDNWRSELYKYKVVEIKPLGIAPTKAKRRVVQREKR

[00205] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized. The dotted line indicates the location of basic residues that are targets for furin- and trypsin-like enzymes. Translational stop codons for C-terminal purification tags can be incorporated at the beginning of this sequence. If a C-terminal purification tag is not included, then a stop codon can be inserted at either the beginning or the end of this sequence.
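The basic residues referred to above can be located programmatically. The sketch below is illustrative only and is not part of the original disclosure: it assumes the C-terminal residues KAKRRVVQREKR as they appear in the TV1.21 listings, treats every lysine (K) and arginine (R) as a candidate site for trypsin-like enzymes, and uses the commonly cited furin consensus R-X-K/R-R as an assumed motif. The function names are hypothetical.

```python
import re


def basic_positions(peptide: str) -> list[int]:
    """Return 0-based positions of lysine (K) and arginine (R) residues."""
    return [i for i, residue in enumerate(peptide) if residue in "KR"]


def furin_like_sites(peptide: str) -> list[int]:
    """Return 0-based start positions of R-X-[K/R]-R motifs.

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=R.[KR]R)", peptide)]


# C-terminal residues taken from the TV1.21 listings in this document.
tail = "KAKRRVVQREKR"
print(basic_positions(tail))
print(furin_like_sites(tail))
```

Under these assumptions the only R-X-K/R-R match in this stretch is the REKR motif at its C-terminal end.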

[00206] TV1.21-rgp120; not codon optimized (SEQ ID NO:33)

[00207] ATGAGAGTGATGGGGACACAGAAGAATTGTCAACAATGGTGGATATG

GGGCATCTTAGGCTTCTGGATGCTAATGATTTGTAATACAAAGGACTTGTGGGTCACA

GTCTATTATGGGGTACCTGTGTGGAGAGAAGCAAAAACTACCCTATTCTGTGCATCA

GATGCTAAAGCATATGAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGT

GCCCACAGACCCCAACCCACAAGAAATAGTTTTGGGAAATGTAACAGAAAATTTTA

ATATGTGGAAAAATGACATGGCAGATCAGATGCATGAGGATATAATCAGTTTATGG

GATCAAAGCCTAAAGCCATGTGTAAAGTTGACCCCACTCTGTGTCACTTTAAACTGT

ACAGAGACAAATGTTACAGGTAATAGAACTGTTATAGGTAATACAAATGATACCAA

TATTGCAAATGCTACATATAAGTATGAAGAAATGAAAAATTGCTCTTTCAATGTAAC

CACAGAACTAAGAAATAAGAAACATAAGGAGTATGCACTCTTTTATAGACTTGACA

TAGTACCACTTAATGAGAATGGTGACAACTCTAAATATAGATTGATAAATTGCAAT

ACCTCAGCCATAACACAAGCCTGTCCAAAGGTCTCTTTTGACCCGATTCCTATACAT

TACTGTGCTCCAGCTGGTTATGCGATTCTAAAGTGTAATAATAAGACATTCAATGGG

ACAGGACCATGTTATAATGTCAGCACAGTACAATGTACACATGGAATTAAGCCAGT

GGTATCAACTCAACTACTGTTAAATGGTAGCCTAGCAGAAGAAGGGATGATAATTA

GATCTGAAAATTTGACAGAAAATACCAAAACAATAATAGTACATCTTAATGAATCT

GTAGAGATTAATTGTACAAGACCCAACAATAATACAAGAAAAAGTGTAAGGATAG GACCAGGACAAGCCTTCTATGCAACAAATGATGTAATAGGAGACATAAGACAAGC

ACATTGTAACATTAGTACAGATAGATGGAACAAAACTCTACAACAGGTAATGAAAA

AACTAGGAGAGCATTTCCCTAATAAAACAATACAATTTAAACCACATGCAGGAGGG

GATATAGAAATTACAATGCATAGCTTTAATTGTAGAGGAGAATTTTTCTATTGCAAT

ACATCAAACCTGTTTAATAGTACATACCACTCTAATAATGGTACATACAAATATAAT

GGTAATTCAAGCTCACCCATCACACTCCAATGCAAAATAAAACAAATTGTACGCAT

GTGGCAAGGGGTAGGACAAGCAATGTATGCCCCTCCCATTGCAGGAAACATAACAT

GTAGATCAAACATCACAGGAATACTATTGACACGCGATGGAGGATTTAACACCACA

AACAACACAGAGACATTCAGACCTGGAGGAGGAGATATGAGGGATAACTGGAGAA

GTGAACTATATAAATATAAAGTAGTAGAAATTAAGCCATTGGGAATAGCACCCACT

AAGGCAAAAAGAAGAGTGGTGCAGAGAGAAAAAAGA

[00208] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal gD purification tag is italicized.

[00209] TV1.21-rgp140; not codon optimized (SEQ ID NO:34)

[00210] MRVMGTQKNCQQWWIWGILGFWMLMICNTKDLWVTVYYGVPVWREAK

TTLFCASDAKAYETEVHNVWATHACVPTDPNPQEIVLGNVTENFNMWKNDMADQMH

EDIISLWDQSLKPCVKLTPLCVTLNCTETNVTGNRTVIGNTNDTNIANATYKYEEMK NC

SFNVTTELRNKKHKEYALFYRLDIVPLNENGDNSKYRLINCNTSAITQACPKVSFDP IPIH

YCAPAGYAILKCNNKTFNGTGPCYNVSTVQCTHGIKPVVSTQLLLNGSLAEEGMIIR SE

NLTENTKTIIVHLNESVEINCTRPNNNTRKSVRIGPGQAFYATNDVIGDIRQAHCNI STDR

WNKTLQQVMKKLGEHFPNKTIQFKPHAGGDIEITMHSFNCRGEFFYCNTSNLFNSTY HS

NNGTYKYNGNSSSPITLQCKIKQIVRMWQGVGQAMYAPPIAGNITCRSNITGILLTR DG

GFNTTNNTETFRPGGGDMRDNWRSELYKYKVVEIKPLGIAPTKAKRRVVQREKRAVG I

GAVFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQSNLLKAIEAQQHMLQLTVWG I

KQLQARVLAIERYLKDQQLLGIWGCSGRLICTTAVPWNSSWSNKSEADIWDNMTWMQ

WDREINNYTEAIFRLLEDSQNQQEKNEKDLLELDKWNSLWNWFNISNWLWYIK

[00211] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00212] TV1.21-rgp140; not codon optimized (SEQ ID NO:35)

[00213] ATGAGAGTGATGGGGACACAGAAGAATTGTCAACAATGGTGGATATG

GGGCATCTTAGGCTTCTGGATGCTAATGATTTGTAATACAAAGGACTTGTGGGTCACA

GTCTATTATGGGGTACCTGTGTGGAGAGAAGCAAAAACTACCCTATTCTGTGCATCA

GATGCTAAAGCATATGAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGT

GCCCACAGACCCCAACCCACAAGAAATAGTTTTGGGAAATGTAACAGAAAATTTTA

ATATGTGGAAAAATGACATGGCAGATCAGATGCATGAGGATATAATCAGTTTATGG

GATCAAAGCCTAAAGCCATGTGTAAAGTTGACCCCACTCTGTGTCACTTTAAACTGT

ACAGAGACAAATGTTACAGGTAATAGAACTGTTATAGGTAATACAAATGATACCAA

TATTGCAAATGCTACATATAAGTATGAAGAAATGAAAAATTGCTCTTTCAATGTAAC

CACAGAACTAAGAAATAAGAAACATAAGGAGTATGCACTCTTTTATAGACTTGACA

TAGTACCACTTAATGAGAATGGTGACAACTCTAAATATAGATTGATAAATTGCAAT

ACCTCAGCCATAACACAAGCCTGTCCAAAGGTCTCTTTTGACCCGATTCCTATACAT

TACTGTGCTCCAGCTGGTTATGCGATTCTAAAGTGTAATAATAAGACATTCAATGGG

ACAGGACCATGTTATAATGTCAGCACAGTACAATGTACACATGGAATTAAGCCAGT

GGTATCAACTCAACTACTGTTAAATGGTAGCCTAGCAGAAGAAGGGATGATAATTA

GATCTGAAAATTTGACAGAAAATACCAAAACAATAATAGTACATCTTAATGAATCT

GTAGAGATTAATTGTACAAGACCCAACAATAATACAAGAAAAAGTGTAAGGATAG

GACCAGGACAAGCCTTCTATGCAACAAATGATGTAATAGGAGACATAAGACAAGC

ACATTGTAACATTAGTACAGATAGATGGAACAAAACTCTACAACAGGTAATGAAAA

AACTAGGAGAGCATTTCCCTAATAAAACAATACAATTTAAACCACATGCAGGAGGG

GATATAGAAATTACAATGCATAGCTTTAATTGTAGAGGAGAATTTTTCTATTGCAAT

ACATCAAACCTGTTTAATAGTACATACCACTCTAATAATGGTACATACAAATATAAT

GGTAATTCAAGCTCACCCATCACACTCCAATGCAAAATAAAACAAATTGTACGCAT

GTGGCAAGGGGTAGGACAAGCAATGTATGCCCCTCCCATTGCAGGAAACATAACAT

GTAGATCAAACATCACAGGAATACTATTGACACGCGATGGAGGATTTAACACCACA

AACAACACAGAGACATTCAGACCTGGAGGAGGAGATATGAGGGATAACTGGAGAA

GTGAACTATATAAATATAAAGTAGTAGAAATTAAGCCATTGGGAATAGCACCCACT

AAGGCAAAAAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCT

GTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATAAC

GCTGACGGTACAGGCCAGACAACTGTTGTCTGGTATAGTGCAACAGCAAAGCAATT

TGCTGAAGGCTATAGAGGCGCAACAGCATATGTTGCAACTCACAGTCTGGGGCATT

AAGCAGCTCCAGGCGAGAGTCCTGGCTATAGAAAGATACCTAAAGGATCAACAGCT

CCTAGGGATTTGGGGCTGCTCTGGAAGACTCATCTGCACCACTGCTGTGCCTTGGAA

CTCCAGTTGGAGTAATAAATCTGAAGCAGATATTTGGGATAACATGACTTGGATGC

AGTGGGATAGAGAAATTAATAATTACACAGAAGCAATATTCAGGTTGCTTGAAGAC

TCGCAAAACCAGCAGGAAAAGAATGAAAAAGATTTATTAGAATTGGACAAGTGGA

ACAGTCTGTGGAATTGGTTTAACATATCAAACTGGCTGTGGTATATAAAA

[00214] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal gD purification tag is italicized.

[00215] 1086C-rgp120; not codon optimized (SEQ ID NO:36)

[00216] MRVRGIWKNWPQWLIWSILGFWIGNMEGSWVTVYYGVPVWKEAKTTLFC

ASDAKAYEKEVHNVWATHACVPTDPNPQEMVLANVTENFNMWKNDMVEQMHEDIIS

LWDESLKPCVKLTPLCVTLNCTNVKGNESDTSEVMKNCSFKATTELKDKKHKVHALF

YKLDVVPLNGNSSSSGEYRLINCNTSAITQACPKVSFDPIPLHYCAPAGFAILKCNN KTF

NGTGPCRNVSTVQCTHGIKPVVSTQLLLNGSLAEEEIIIRSENLTNNAKTIIVHLNE SVNIV

CTRPNNNTRKSIRIGPGQTFYATGDIIGNIRQAHCNINESKWNNTLQKVGEELAKHF PSK

TIKFEPSSGGDLEITTHSFNCRGEFFYCNTSDLFNGTYRNGTYNHTGRSSNGTITLQ CKIK

QIINMWQEVGRAIYAPPIEGEITCNSNITGLLLLRDGGQSNETNDTETFRPGGGDMR DN

WRSELYKYKVVEIKPLGVAPTEAK

[00217] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00218] 1086C-rgp120; not codon optimized (SEQ ID NO:37)

[00219] ATGAGAGTGAGGGGGATATGGAAGAATTGGCCACAATGGTTGATATG

GAGCATCTTAGGCTTTTGGATAGGTAATATGGAGGGCTCGTGGGTCACAGTTTACTATG

GAGTGCCTGTGTGGAAAGAAGCAAAAACTACTCTATTCTGTGCATCAGATGCTAAA

GCATATGAGAAAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTGCCCACAGA

TCCCAACCCACAAGAAATGGTTTTGGCAAATGTAACAGAAAATTTTAACATGTGGA

AAAATGATATGGTAGAGCAGATGCATGAGGATATAATTAGTTTGTGGGATGAAAGC

CTGAAGCCATGTGTGAAGTTGACCCCACTCTGTGTCACTTTAAATTGTACAAATGTT

AAAGGGAATGAGAGTGACACCAGTGAAGTAATGAAAAATTGCTCTTTCAAGGCAAC

CACGGAACTAAAGGATAAAAAACATAAGGTGCATGCGCTTTTTTATAAACTTGATG

TAGTACCACTTAATGGAAACAGCAGCAGCTCTGGAGAGTATAGATTAATAAATTGC

AATACCTCAGCCATAACACAAGCCTGTCCAAAGGTCTCTTTTGACCCAATTCCTTTA

CATTACTGTGCACCAGCTGGTTTTGCGATTCTAAAGTGTAATAATAAGACATTCAAT

GGGACAGGACCATGTCGTAATGTCAGCACAGTACAATGTACACATGGAATTAAGCC

AGTGGTATCAACTCAACTACTGTTAAATGGTAGCCTAGCAGAAGAAGAGATAATAA

TTAGATCTGAAAATCTGACAAACAATGCCAAAACAATAATAGTACACCTCAATGAA

TCTGTAAACATTGTGTGTACAAGACCCAATAATAATACAAGAAAAAGTATAAGGAT

AGGACCAGGACAAACATTCTATGCAACAGGTGACATAATAGGAAACATAAGACAG GCACATTGTAACATTAATGAAAGTAAATGGAACAACACTTTACAAAAGGTAGGAGA

AGAATTAGCAAAACACTTCCCTAGTAAAACAATAAAGTTTGAACCATCCTCAGGAG

GGGATCTAGAAATTACAACACATAGCTTTAATTGTAGAGGAGAGTTTTTCTATTGCA

ATACATCAGACCTGTTTAATGGTACATACAGAAATGGTACATACAATCATACAGGA

AGAAGTTCAAATGGAACCATCACCCTCCAATGCAAAATAAAACAAATTATAAACAT

GTGGCAGGAGGTAGGAAGAGCAATATATGCCCCTCCCATTGAAGGAGAAATAACAT

GTAACTCAAATATCACAGGACTACTATTGCTACGTGATGGAGGTCAATCAAATGAA

ACAAATGACACAGAGACATTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA

GAAGTGAATTATATAAATATAAAGTAGTAGAAATTAAACCATTGGGAGTAGCACCC

ACTGAGGCAAAA

[00220] 1086C-rgp140 (SEQ ID NO:38)

[00221] MRVRGIWKNWPQWLIWSILGFWIGNMEGSWVTVYYGVPVWKEAKTTLFC

ASDAKAYEKEVHNVWATHACVPTDPNPQEMVLANVTENFNMWKNDMVEQMHEDIIS

LWDESLKPCVKLTPLCVTLNCTNVKGNESDTSEVMKNCSFKATTELKDKKHKVHALF

YKLDVVPLNGNSSSSGEYRLINCNTSAITQACPKVSFDPIPLHYCAPAGFAILKCNN KTF

NGTGPCRNVSTVQCTHGIKPVVSTQLLLNGSLAEEEIIIRSENLTNNAKTIIVHLNE SVNIV

CTRPNNNTRKSIRIGPGQTFYATGDIIGNIRQAHCNINESKWNNTLQKVGEELAKHF PSK

TIKFEPSSGGDLEITTHSFNCRGEFFYCNTSDLFNGTYRNGTYNHTGRSSNGTITLQ CKIK

QIINMWQEVGRAIYAPPIEGEITCNSNITGLLLLRDGGQSNETNDTETFRPGGGDMR DN

WRSELYKYKVVEIKPLGVAPTEAKRRVVEREKRAVGIGAVFLGFLGAAGSTMGAASM

TLTVQARQLLSGIVQQQSNLLRAIEAQQHMLQLTVWGIKQLQARVLAIERYLKDQQL L

GMWGCSGKLICTTAVPWNSSWSNKSQNEIWGNMTWMQWDREINNYTNTIYRLLEDS

QNQQEKNEKDLLALDSWKNLWNWFDISKWLWYIK

[00222] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00223] 1086C-rgp140 (SEQ ID NO:39)

[00224] ATGAGAGTGAGGGGGATATGGAAGAATTGGCCACAATGGTTGATATG

GAGCATCTTAGGCTTTTGGATAGGTAATATGGAGGGCTCGTGGGTCACAGTTTACTATG

GAGTGCCTGTGTGGAAAGAAGCAAAAACTACTCTATTCTGTGCATCAGATGCTAAA

GCATATGAGAAAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTGCCCACAGA

TCCCAACCCACAAGAAATGGTTTTGGCAAATGTAACAGAAAATTTTAACATGTGGA

AAAATGATATGGTAGAGCAGATGCATGAGGATATAATTAGTTTGTGGGATGAAAGC CTGAAGCCATGTGTGAAGTTGACCCCACTCTGTGTCACTTTAAATTGTACAAATGTT

AAAGGGAATGAGAGTGACACCAGTGAAGTAATGAAAAATTGCTCTTTCAAGGCAAC

CACGGAACTAAAGGATAAAAAACATAAGGTGCATGCGCTTTTTTATAAACTTGATG

TAGTACCACTTAATGGAAACAGCAGCAGCTCTGGAGAGTATAGATTAATAAATTGC

AATACCTCAGCCATAACACAAGCCTGTCCAAAGGTCTCTTTTGACCCAATTCCTTTA

CATTACTGTGCACCAGCTGGTTTTGCGATTCTAAAGTGTAATAATAAGACATTCAAT

GGGACAGGACCATGTCGTAATGTCAGCACAGTACAATGTACACATGGAATTAAGCC

AGTGGTATCAACTCAACTACTGTTAAATGGTAGCCTAGCAGAAGAAGAGATAATAA

TTAGATCTGAAAATCTGACAAACAATGCCAAAACAATAATAGTACACCTCAATGAA

TCTGTAAACATTGTGTGTACAAGACCCAATAATAATACAAGAAAAAGTATAAGGAT

AGGACCAGGACAAACATTCTATGCAACAGGTGACATAATAGGAAACATAAGACAG

GCACATTGTAACATTAATGAAAGTAAATGGAACAACACTTTACAAAAGGTAGGAGA

AGAATTAGCAAAACACTTCCCTAGTAAAACAATAAAGTTTGAACCATCCTCAGGAG

GGGATCTAGAAATTACAACACATAGCTTTAATTGTAGAGGAGAGTTTTTCTATTGCA

ATACATCAGACCTGTTTAATGGTACATACAGAAATGGTACATACAATCATACAGGA

AGAAGTTCAAATGGAACCATCACCCTCCAATGCAAAATAAAACAAATTATAAACAT

GTGGCAGGAGGTAGGAAGAGCAATATATGCCCCTCCCATTGAAGGAGAAATAACAT

GTAACTCAAATATCACAGGACTACTATTGCTACGTGATGGAGGTCAATCAAATGAA

ACAAATGACACAGAGACATTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA

GAAGTGAATTATATAAATATAAAGTAGTAGAAATTAAACCATTGGGAGTAGCACCC

ACTGAGGCAAAAAGGAGAGTGGTGGAGAGAGAAAAAAGAGCAGTGGGAATAGGA

GCTGTGTTCCTTGGGTTCTTGGGAGCAGCCGGAAGCACTATGGGCGCAGCATCAAT

GACGCTGACGGTACAGGCCAGGCAATTATTGTCTGGTATAGTGCAACAGCAAAGCA

ATTTGCTGAGGGCTATAGAGGCGCAACAGCATATGTTGCAACTCACGGTCTGGGGC

ATTAAACAGCTCCAGGCAAGAGTCCTGGCTATAGAAAGATACCTAAAGGATCAACA

GCTCCTAGGGATGTGGGGCTGCTCTGGAAAACTCATCTGCACCACTGCTGTGCCTTG

GAACTCCAGTTGGAGTAACAAATCTCAAAATGAAATTTGGGGGAACATGACCTGGA

TGCAGTGGGACAGAGAAATTAATAATTACACAAACACAATATATAGGTTACTTGAA

GACTCACAAAACCAGCAGGAAAAAAATGAGAAAGATTTGTTAGCATTGGACAGTTG

GAAAAATCTGTGGAATTGGTTTGACATATCAAAGTGGCTGTGGTATATAAAA

[00225] The sequence encoding the wild type HIV signal sequence is underlined. The sequence encoding the mature N-terminal gD purification tag is italicized.

[00226] CAP45.2.00.G3-rgp120; not codon optimized (SEQ ID NO:40)

[00227] MRVRGILRNWPQWWIWSILGFWMLIICRVMGNLWVTVYYGVPVWKEAK

ATLFCASDARAYEKEVHNVWATHACVPTDPNPQEIYLGNVTENFNMWKNDMVDQMH

EDIISLWDQSLKPCVKLTPLCVTLRCTNATINGSLTEEVKNCSFNITTELRDKKQKA YAL

FYRPDVVPLNKNSPSGNSSEYILINCNTSTITQACPKVSFDPIPIHYCAPAGYAILK CNNKT

FNGTGPCNNVSTVQCTHGIKPVVSTQLLLNGSLAEEDIIIKSENLTNNIKTIIVHLN KSVEI

VCRRPNNNTRKSIRIGPGQAFYATNDIIGDIRQAHCNINNSTWNRTLEQIKKKLREH FLN

RTIEFEPPSGGDLEVTTHSFNCGGEFFYCNTTRLFKWSSNVTNDTITIPCRIKQFIN MWQG

AGRAMYAPPIEGNITCNSSITGLLLTRDGGKTDRNDTEIFRPGGGNMKDNWRNELYK Y

KVVEIKPLGVAPTEARRRVVEREKR

[00228] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00229] CAP45.2.00.G3-rgp120 (SEQ ID NO:41)

[00230] ATGAGAGTGAGGGGGATACTGAGGAATTGGCCACAATGGTGGATATG

GAGCATCTTAGGCTTTTGGATGCTAATAATTTGTAGGGTGATGGGGAACTTGTGGGT

CACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAAAAGCTACTCTATTCTGTGCA

TCAGATGCTAGAGCATATGAGAAAGAAGTGCATAATGTCTGGGCTACACATGCCTG

TGTACCCACAGACCCCAACCCACAAGAAATATACTTGGGAAATGTAACAGAAAATT

TTAACATGTGGAAAAATGACATGGTGGATCAGATGCATGAGGATATAATCAGTTTA

TGGGATCAAAGTCTAAAGCCATGTGTAAAGTTGACCCCACTCTGTGTCACTTTAAGG

TGTACAAATGCTACTATTAATGGTAGCCTGACGGAAGAAGTAAAAAATTGCTCTTTC

AATATAACCACAGAGCTAAGAGATAAGAAACAGAAAGCGTATGCACTTTTTTATAG

ACCTGATGTAGTACCACTTAATAAGAATAGCCCTAGTGGGAATTCTAGTGAGTATAT

ATTAATAAATTGCAATACCTCAACCATAACACAAGCCTGTCCAAAGGTCTCTTTTGA

CCCAATTCCTATACATTATTGTGCTCCAGCTGGTTATGCGATTCTAAAGTGTAATAA

TAAGACATTTAATGGGACAGGACCATGCAATAATGTCAGCACAGTACAATGTACAC

ATGGAATTAAACCAGTGGTATCAACTCAACTACTGTTAAATGGTAGCTTAGCAGAA

GAAGATATCATAATTAAATCTGAAAATCTGACAAACAATATCAAAACAATAATAGT

ACACCTTAATAAATCTGTAGAAATTGTGTGTAGAAGACCCAACAATAATACAAGGA

AAAGTATAAGGATAGGACCAGGACAGGCTTTCTATGCAACAAATGACATAATAGGA

GACATAAGACAAGCACATTGTAATATTAATAATTCTACATGGAACAGAACTTTAGA

ACAGATAAAGAAAAAATTAAGAGAACACTTCCTTAATAGAACAATAGAATTTGAAC CACCCTCAGGGGGGGATCTAGAAGTTACAACACATAGCTTTAATTGTGGAGGAGAA

TTTTTCTATTGCAATACAACACGACTGTTTAAGTGGTCTAGTAATGTCACAAACGAC

ACAATCACAATCCCATGCAGAATAAAACAATTTATAAACATGTGGCAAGGGGCAGG

ACGAGCAATGTATGCCCCTCCCATTGAAGGAAACATAACATGTAACTCAAGTATCA

CAGGACTCCTATTGACACGTGATGGAGGGAAAACAGACAGGAATGACACAGAGAT

ATTCAGACCTGGAGGAGGAAATATGAAGGACAATTGGAGAAATGAATTATATAAAT

ATAAAGTGGTAGAAATTAAGCCATTGGGAGTAGCACCCACTGAGGCAAGAAGGAG

AGTGGTGGAGAGAGAAAAAAGA

[00231] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00232] CAP45.2.00.G3-rgp140 (SEQ ID NO:42)

[00233] MRVRGILRNWPQWWIWSILGFWMLIICRVMGNLWVTVYYGVPVWKEAK

ATLFCASDARAYEKEVHNVWATHACVPTDPNPQEIYLGNVTENFNMWKNDMVDQMH

EDIISLWDQSLKPCVKLTPLCVTLRCTNATINGSLTEEVKNCSFNITTELRDKKQKA YAL

FYRPDVVPLNKNSPSGNSSEYILINCNTSTITQACPKVSFDPIPIHYCAPAGYAILK CNNKT

FNGTGPCNNVSTVQCTHGIKPVVSTQLLLNGSLAEEDIIIKSENLTNNIKTIIVHLN KSVEI

VCRRPNNNTRKSIRIGPGQAFYATNDIIGDIRQAHCNINNSTWNRTLEQIKKKLREH FLN

RTIEFEPPSGGDLEVTTHSFNCGGEFFYCNTTRLFKWSSNVTNDTITIPCRIKQFIN MWQG

AGRAMYAPPIEGNITCNSSITGLLLTRDGGKTDRNDTEIFRPGGGNMKDNWRNELYK Y

KVVEIKPLGVAPTEARRRVVEREKRAVGIGAVLLGFLGAAGSTMGAASITLTVQARQ LL

SGIVQQQSNLLRAIEAQQHMLQLTVWGIKQLQTRVLAIERYLKDQQLLGLWGCSGKL I

CTTNVPWNSSWSNKSQTDIWDNMTWIQWDREISNYSNTIYKLLEGSQNQQEQNEKDL L

ALDSWNNLWNWFNITNWLWYIK

[00234] Wild type HIV signal sequence is underlined. Mature N-terminal gD purification tag is italicized.

[00235] CAP45.2.00.G3-rgp140 (SEQ ID NO:43)

[00236] ATGAGAGTGAGGGGGATACTGAGGAATTGGCCACAATGGTGGATATG

GAGCATCTTAGGCTTTTGGATGCTAATAATTTGTAGGGTGATGGGGAACTTGTGGGT

CACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAAAAGCTACTCTATTCTGTGCA

TCAGATGCTAGAGCATATGAGAAAGAAGTGCATAATGTCTGGGCTACACATGCCTG

TGTACCCACAGACCCCAACCCACAAGAAATATACTTGGGAAATGTAACAGAAAATT

TTAACATGTGGAAAAATGACATGGTGGATCAGATGCATGAGGATATAATCAGTTTA

TGGGATCAAAGTCTAAAGCCATGTGTAAAGTTGACCCCACTCTGTGTCACTTTAAGG

TGTACAAATGCTACTATTAATGGTAGCCTGACGGAAGAAGTAAAAAATTGCTCTTTC

AATATAACCACAGAGCTAAGAGATAAGAAACAGAAAGCGTATGCACTTTTTTATAG

ACCTGATGTAGTACCACTTAATAAGAATAGCCCTAGTGGGAATTCTAGTGAGTATAT

ATTAATAAATTGCAATACCTCAACCATAACACAAGCCTGTCCAAAGGTCTCTTTTGA

CCCAATTCCTATACATTATTGTGCTCCAGCTGGTTATGCGATTCTAAAGTGTAATAA

TAAGACATTTAATGGGACAGGACCATGCAATAATGTCAGCACAGTACAATGTACAC

ATGGAATTAAACCAGTGGTATCAACTCAACTACTGTTAAATGGTAGCTTAGCAGAA

GAAGATATCATAATTAAATCTGAAAATCTGACAAACAATATCAAAACAATAATAGT

ACACCTTAATAAATCTGTAGAAATTGTGTGTAGAAGACCCAACAATAATACAAGGA

AAAGTATAAGGATAGGACCAGGACAGGCTTTCTATGCAACAAATGACATAATAGGA

GACATAAGACAAGCACATTGTAATATTAATAATTCTACATGGAACAGAACTTTAGA

ACAGATAAAGAAAAAATTAAGAGAACACTTCCTTAATAGAACAATAGAATTTGAAC

CACCCTCAGGGGGGGATCTAGAAGTTACAACACATAGCTTTAATTGTGGAGGAGAA

TTTTTCTATTGCAATACAACACGACTGTTTAAGTGGTCTAGTAATGTCACAAACGAC

ACAATCACAATCCCATGCAGAATAAAACAATTTATAAACATGTGGCAAGGGGCAGG

ACGAGCAATGTATGCCCCTCCCATTGAAGGAAACATAACATGTAACTCAAGTATCA

CAGGACTCCTATTGACACGTGATGGAGGGAAAACAGACAGGAATGACACAGAGAT

ATTCAGACCTGGAGGAGGAAATATGAAGGACAATTGGAGAAATGAATTATATAAAT

ATAAAGTGGTAGAAATTAAGCCATTGGGAGTAGCACCCACTGAGGCAAGAAGGAG

AGTGGTGGAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTGTACTCCTTGGGTTCT

TGGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGC

CAGGCAACTGTTGTCTGGTATAGTGCAACAGCAAAGCAATTTGCTGAGAGCTATAG

AGGCGCAACAGCACATGTTGCAACTCACGGTCTGGGGCATTAAGCAGCTCCAGACA

AGAGTCCTGGCTATAGAAAGGTACCTAAAGGATCAACAGCTCCTAGGGCTTTGGGG

CTGCTCTGGAAAACTCATCTGCACCACTAATGTGCCTTGGAACTCCAGTTGGAGTAA

TAAATCTCAAACAGATATTTGGGATAACATGACCTGGATACAGTGGGATAGAGAAA

TTAGTAATTACTCAAACACAATATACAAGTTGCTTGAAGGCTCGCAAAATCAGCAG

GAGCAAAATGAAAAAGACTTATTAGCATTGGACAGTTGGAATAATCTGTGGAATTG

GTTCAACATAACAAATTGGCTGTGGTATATAAAA

[00237] Wild type HIV signal sequence encoding sequence is underlined. Mature N- terminal gD purification Tag encoding sequence is italicized.

[00238] As noted herein, the HIV envelope gp may be expressed with a tag at the N- terminus and/or the C-terminus. Sequences of exemplary tags are provided: [00239] Herpes simplex virus I glycoprotein D ss (gD-1 ss) (SEQ ID NO:44)

MGGAAARLGAVILFVVIVGLHGVRG.

[00240] Fruit bat herpes simplex virus glycoprotein D ss (FBgD-1 ss) (SEQ ID NO:45) MAYPAVIVLVCGLFWVPATQG.

[00241] Intracellular adhesion molecule ss (ICAM-1 ss) (SEQ ID NO:46)

MAPSSPRPALPALLVLLGALFPGPGNA.

[00242] Tissue plasminogen activator ss (TPA ss) (SEQ ID NO:47)

MDAMKRGLCCVLLLCGAVFVSPSQEIHARFRRGARW.

[00243] gD-1 tag (SEQ ID NO:48) KYALADASLKMADPNRFRGKDLPVLDQ

[00244] FBgD-1 tag (SEQ ID NO:49) YVRADPSLSMVNPNRFRGGHLPPLVQQ

[00245] HIV gp120 tag (SEQ ID NO:50) TDNLWVTVYYG

[00246] 6X His tag (SEQ ID NO:51) HHHHHH

[00247] Avi tag (SEQ ID NO: 52) GLNDIFEAQKIEWHE

[00248] Strep-Tactin (Strep) tag (SEQ ID NO:53) WSHPQFEK

[00249] His-Strep tag (SEQ ID NO: 54) HHHHHHSSWSHPQFEK

[00250] His-Strep-6X His tag (C-terminus) (SEQ ID NO:55)

HHHHHHSSWSHPQFEKSSHHHHHH

[00251] His-Strep-His (HSH) tag (N-terminus) (SEQ ID NO: 56)

HHHHHHWSHPQFEKHHHHHHQSG

[00252] As noted herein, HIV env gp can be expressed with or without the following sequence at the C-terminus (SEQ ID NO:57). This sequence includes the location of basic residues that are targets for furin and trypsin like enzymes. Translational stop codons for C-terminal purification tags can be incorporated at the beginning of this sequence. If a C-terminal purification tag is included, then a stop codon can be inserted at either the beginning or end of the sequence.

[00253] As noted herein, HIV env gp can be expressed with or without the following sequence at the C-terminus: YY^EK YG GAVFLGFLGA (SEQ ID NO: 58). Dotted line ( ): This sequence includes location of basic residues that are targets for furin and trypsin like enzymes. Translational stop codons for C-terminal purification tags can be incorporated at the beginning to this sequence. If a C-terminal purification tag is included, then stop codon can be inserted at either the beginning or end of the sequence. Broken line ( ):

C-terminal or 3' sequences not required for expression.

EXAMPLES

[00254] Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way.

[00255] Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

Example 1

Generation of Mgat1- CHO-S Cell Line

[00256] This report describes the use of the CRISPR/Cas9 gene editing system to inactivate the Mannosyl (Alpha-1,3-)-Glycoprotein Beta-1,2-N-Acetylglucosaminyltransferase (Mgat1) gene in CHO cells in order to create a stable cell line, with growth properties suitable for biopharmaceutical production, for producing HIV envelope proteins for use as vaccine immunogens.

[00257] It is widely believed that for an HIV vaccine to be successful, it needs to stimulate the formation of broadly neutralizing antibodies (bNAbs). After more than 30 years of research, none of the candidate vaccines developed to date are able to elicit these types of antibodies. For many years the specificity of bNAbs was unknown. Over the last few years, advancements in B-cell cloning technology have allowed the isolation of broadly neutralizing monoclonal antibodies (bN-mAbs) from HIV infected humans. Surprisingly, many of these were found to recognize glycan dependent epitopes that required specific types of N-linked glycosylation for binding. The N-linked glycans required are high-mannose forms, primarily mannose-5 and mannose-9, that normally are early intermediates in the N-linked glycosylation pathway. These glycans differ from the normal complex, sialic acid containing carbohydrates found in mature membrane bound and secreted proteins. The fact that virtually all previous HIV vaccines possessed the normal type of complex glycosylation may explain their inability to elicit glycan dependent bNAbs. Genetic techniques were used to create cell lines that incorporate these early intermediate glycoforms (mannose-5, mannose-8, and mannose-9) at N-linked glycosylation sites in all cellular proteins as well as heterologous proteins such as HIV envelope proteins. Disclosed herein is the use of the CRISPR/Cas9 gene editing system to knock out the Mannosyl (Alpha-1,3-)-Glycoprotein Beta-1,2-N-Acetylglucosaminyltransferase (Mgat1) gene of Chinese hamster ovary (CHO) cells to produce CHO cells suitable for biopharmaceutical production. This mutation prevents the processing of N-linked glycans beyond the mannose-5 (Man5) form, enabling the production of envelope proteins with a high level of glycosylation with mannose N-glycans. Monomeric gp120 produced by transient transfection of this cell line binds the prototypic glycan dependent bNAb PG9. Taking advantage of the robust productivity of CHO cells, this line has been established for the development of HIV-1 vaccine antigens as well as other vaccines, diagnostic, and therapeutic products requiring the incorporation of mostly mannose glycans.

Materials and Methods

[00258] Cells and antibodies. Suspension adapted CHO-S and 293 HEK Freestyle cells were obtained from Thermo Fisher (Thermo Fisher, Life Technologies, Carlsbad, CA). HEK GNT1- cells were obtained from ATCC (ATCC, Manassas, VA). Broadly neutralizing monoclonal antibody PG9 was produced from synthetic genes created on the basis of published sequence data (available from the NIH AIDS Reagent Program, Germantown, MD). The antibody genes were expressed in 293 HEK cells using standard techniques. Polyclonal rabbit sera were from rabbits immunized using a Complete Freund's Adjuvant/Incomplete Freund's Adjuvant (CFA/IFA) protocol (Pocono Rabbit Farms, AAALAC #926, Canadensis, PA) with A244 rgp120 produced in GNT1- HEK293 cells. Fluorescently conjugated anti-Human, anti-Rabbit, and anti-Murine antibodies were obtained from Invitrogen (Invitrogen, Thermo Fisher, Carlsbad, CA).

[00259] Cell culture conditions. Stocks of suspension adapted CHO-S, 293 HEK, and GNT1- cells were maintained in shake flasks (Corning, Corning, NY) using a Kuhner ISF1-X shaker incubator (Kuhner, Birsfelden, Switzerland). For normal cell propagation, shake flask cultures were maintained at 37°C, 8% CO2, and 125 rpm. Static cultures were maintained in 96 or 24 well cell culture dishes and grown in a Sanyo incubator (Sanyo, Moriguchi, Osaka, Japan) at 37°C and 8% CO2.

[00260] Cell culture media. For normal CHO cell growth, cells were maintained in CD-CHO medium supplemented with 0.1% pluronic acid, 8 mM GlutaMax and 1X Hypoxanthine/Thymidine (Thermo Fisher, Life Technologies, Carlsbad, CA). 293 HEK (Freestyle) and GNT1- 293 HEK cells were maintained in Freestyle 293 cell culture media (Life Technologies, Carlsbad, CA). For CHO cell protein production, the cells were maintained in OptiCHO medium supplemented with 0.1% pluronic acid, 8 mM GlutaMax and 1X H/T (Thermo Fisher, Life Technologies, Carlsbad, CA). For protein production experiments, the growth medium was supplemented with MaxCyte CHO A Feed (0.5% Yeastolate, BD, Franklin Lakes, NJ; 2.5% CHO-CD Efficient Feed A; 0.25 mM GlutaMAX; and 2 g/L glucose, Sigma-Aldrich, St. Louis, MO).

[00261] Cell counts and growth calculation. All cell counts were performed using a TC20™ automated cell counter (BioRad, Hercules, CA) with viability determined by trypan blue (Thermo Fisher, Life Technologies, Carlsbad, CA) exclusion. Cell-doubling time in hours was calculated using the formula: (((time2 - time1) x 24) x log(2)) / (log(density2) - log(density1)), with time1 and time2 expressed in days.
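The doubling-time formula above can be sketched in a few lines of Python. This is an illustration only: the function name, argument names, and the example densities are hypothetical and are not taken from the experiments described here.

```python
import math


def doubling_time_hours(day1, day2, density1, density2):
    """Doubling time in hours from two cell counts.

    day1, day2: sampling times in days (hence the factor of 24).
    density1, density2: viable cell densities (cells/mL) at those times.
    """
    if density1 <= 0 or density2 <= 0:
        raise ValueError("densities must be positive")
    if density2 <= density1:
        raise ValueError("no net growth between the two counts")
    elapsed_hours = (day2 - day1) * 24
    # Same expression as the text: elapsed hours x log(2) / log(density ratio)
    return elapsed_hours * math.log(2) / (math.log(density2) - math.log(density1))


# Hypothetical example: a culture that doubles in exactly one day
# gives a doubling time of about 24 hours.
print(round(doubling_time_hours(0, 1, 0.35e6, 0.70e6), 1))
```

Because the expression uses a ratio of logarithms, the base of the logarithm does not matter as long as the same base is used in the numerator and the denominator.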

[00262] Gene sequencing. The CHO Mgat1 gene sequence was confirmed using predicted mRNA transcript XM_007644560.1 to design primers. Genomic DNA was extracted using a Qiagen AllPrep kit (Qiagen, Germantown, MD). The Mgat1 gene was PCR amplified using the primers F_CAGGCAAGCCAAAGGCAGCCTTG (SEQ ID NO: 59) and R_CTCAGGGACTGCAGGCCTGTCTC (SEQ ID NO: 60) (Eurofins Genomics, Louisville, KY) with Taq and dNTPs supplied by New England BioLabs (Ipswich, MA). The PCR product was gel purified using a Zymoclean kit (Zymo Research, Irvine, CA), then sequenced by the Sanger method at the UC Berkeley Sequencing Center (UC Berkeley, Berkeley, CA). Mgat1 knockouts were sequenced in the same manner.

[00263] CRISPR/Cas9 target design and plasmid preparation. Target sequences to knock out CHO Mgat1 were designed using an online CRISPR RNA Configurator tool (GE Dharmacon, Lafayette, CO). Target 1: CCCTGGAACTTGCGGTGGTC (SEQ ID NO: 61), target 2: GGGCATTCCAGCCCACAAAG (SEQ ID NO: 62), and target 3: GGCGGAACACCTCACGGGTG (SEQ ID NO: 63). Each sequence was run in NCBI's BLAST tool to check for homologies with off-target sites in the CHO genome. Single stranded DNA oligonucleotides and their complement strands were synthesized (Eurofins Genomics, Louisville, KY) with extra bases on the 3' ends for ligation into the GeneArt CRISPR nuclease vector (Thermo Fisher, GeneArt). The strands were annealed and ligated into the GeneArt CRISPR vector using the protocol and reagents supplied with the kit. One Shot® TOP10 Chemically Competent E. coli were transformed and plated following the Invitrogen protocol (Thermo Fisher, Invitrogen, Carlsbad, CA). Five colonies from each target plate were picked the following day. These were incubated in 5 mL LB broth at 37°C and 225 rpm overnight. Minipreps were performed according to the manufacturer's instructions (Qiagen, Germantown, MD) and sent to the UC Berkeley DNA Sequencing Facility (Berkeley, CA) with the U6 primers included in the GeneArt® CRISPR kit to confirm successful integration of guide sequences via Sanger sequencing. A single 500 mL MaxiPrep was performed for each of the three target sequences using the PureLink™ MaxiPrep kit (Thermo Fisher, Invitrogen, Carlsbad, CA).
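As a rough illustration of the oligonucleotide design step, the sketch below computes the complement strand (written 5' to 3') and the GC fraction for the three target sequences listed above. The helper names are hypothetical, and the vector-specific extra 3' bases are not given in the text, so they are deliberately left out.

```python
# Target sequences as listed in the text (SEQ ID NOs: 61-63).
TARGETS = {
    "target 1": "CCCTGGAACTTGCGGTGGTC",
    "target 2": "GGGCATTCCAGCCCACAAAG",
    "target 3": "GGCGGAACACCTCACGGGTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand written 5' to 3'."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, top_strand in TARGETS.items():
    bottom_strand = reverse_complement(top_strand)
    # The overhang bases required by the cloning vector are NOT added here.
    print(name, top_strand, bottom_strand, f"GC={gc_fraction(top_strand):.2f}")
```

The complement strands produced this way would still need the kit-specific overhangs appended before ordering, following the vector manufacturer's protocol.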

[00264] Electroporation. Electroporation was performed using a MaxCyte STX scalable transfection system (MaxCyte Inc., Gaithersburg, MD) according to the manufacturer's instructions. Briefly, CHO-S cells were maintained at >95% viability prior to transfection. All steps were performed using aseptic technique. Cells were pelleted at 250 g for 10 minutes, and then re-suspended in MaxCyte EP buffer (MaxCyte Inc., Gaithersburg, MD) at a density of 2x10^8 cells/mL. Transfections were carried out in the OC-400 processing assembly (MaxCyte Inc., Gaithersburg, MD) with a total volume of 400 μL and 8x10^7 total cells. CRISPR/Cas9 endonuclease with guide sequence plasmid DNA, in endotoxin-free water, was added to the cells in EP buffer for a final concentration of 300 μg of DNA/mL. The processing assemblies were then transferred to the MaxCyte STX electroporation device and appropriate conditions (CHO protocol) were selected using the MaxCyte STX software. Following completion of electroporation, the cells in electroporation buffer were removed from the processing assembly and placed in 125 mL Erlenmeyer cell culture shake flasks (Corning, Corning, NY). The flasks were placed into 37°C incubators with no agitation for 40 minutes. Following the rest period, pre-warmed OPTI-CHO media was added to the flasks for a final cell density of 4x10^6 cells/mL. Flasks were then moved to the Kuhner shaker and agitated at 125 rpm.
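The quantities in the electroporation step follow from simple arithmetic, sketched below. The function and key names are hypothetical. The inputs are the values stated in the text: 2x10^8 cells/mL, 400 μL, 300 μg DNA/mL and a recovery density of 4x10^6 cells/mL.

```python
def electroporation_plan(cells_per_ml, volume_ml, dna_ug_per_ml, recovery_cells_per_ml):
    """Derive total cells, DNA mass, and post-recovery culture volume."""
    total_cells = cells_per_ml * volume_ml
    dna_ug = dna_ug_per_ml * volume_ml
    # Volume needed so that all electroporated cells sit at the recovery density.
    final_volume_ml = total_cells / recovery_cells_per_ml
    return {
        "total_cells": total_cells,
        "dna_ug": dna_ug,
        "final_volume_ml": final_volume_ml,
        "media_to_add_ml": final_volume_ml - volume_ml,
    }


plan = electroporation_plan(
    cells_per_ml=2e8,          # cell density in EP buffer
    volume_ml=0.4,             # OC-400 processing assembly volume
    dna_ug_per_ml=300,         # final plasmid concentration
    recovery_cells_per_ml=4e6, # density after adding pre-warmed media
)
for key, value in plan.items():
    print(f"{key}: {value:.3g}")
```

With these inputs the sketch gives about 8x10^7 cells (matching the stated total), roughly 120 μg of plasmid DNA per transfection, and a final culture volume of about 20 mL.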

[00265] Plating, expansion, and culture of CRISPR transfected CHO-S cells. 14 hours post transfection, a 100 μL aliquot was taken from each of the transfected pools for cell viability counts and to check for orange fluorescent protein expression using a light microscope (Zeiss Axioskop 2, Zeiss, Jena, Germany). 96 well flat bottom cell culture plates (Corning, Corning, NY) were filled with 50 μL of conditioned CD-CHO media. Each of the three transfected pools was serially diluted with warmed media to 10 cells/mL and added to five plates per pool in 50 μL volumes. Final calculated cell density was 0.5 cells/well in 100 μL of media. Once any single-colony well reached ~20% confluency, the contents were moved to a 24 well cell culture plate (Corning, Corning, NY) in 500 μL of media. When confluency reached 50%, a 200 μL aliquot was removed for testing via a GNA lectin-binding assay. Following successful lectin binding, cells were moved to a 6 well cell culture plate (Corning, Corning, NY) with 2 mL of media per well. After 5 days of growth in 6 well plates, the GNA assay was repeated. Those colonies that still showed uniform lectin binding to all cells were moved to 125 mL shake flasks with an initial 6 mL of media. Daily counts were taken and the cell culture expanded to maintain a density of 0.3x10^6 to 1.0x10^6 cells/mL.
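The limiting dilution above can be reasoned about with a simple Poisson model. This is an assumption made for illustration, not a method stated in the text, and it ignores cell viability and plating efficiency. Under it, a mean of 0.5 cells/well predicts how many wells of a 96-well plate receive zero, one, or more than one cell. Function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a given mean cells/well."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def expected_well_counts(mean_cells_per_well, wells=96):
    """Expected numbers of empty, single-cell and multi-cell wells per plate."""
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    return {
        "empty": wells * p_empty,
        "single": wells * p_single,
        "multiple": wells * (1.0 - p_empty - p_single),
    }


# 10 cells/mL x 0.05 mL added per well = 0.5 cells/well, as in the text.
mean = 10 * 0.05
for label, count in expected_well_counts(mean).items():
    print(f"{label}: {count:.1f} wells")
```

Under this idealized model about 29 wells per plate would start from a single cell, about 58 would be empty, and about 9 would receive more than one cell. In practice, not every seeded cell survives to form a colony, so observed single-colony counts can be lower.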

[00266] Lectin binding assay. Fluorescein labeled Galanthus nivalis lectin, GNA (Vector Laboratories, Burlingame, CA), was used to probe for the expression of Man5 glycoforms on the cell surfaces. 200 μL samples from 24 well plate wells were spun down at 3000 rpm for 3 minutes. The supernatant was discarded and the cell pellet washed three successive times with 500 μL of ice-cold 10 μM EDTA (Boston BioProducts, Ashland, MA) PBS (Thermo Fisher, Gibco, Carlsbad, CA). Following the final wash, the cell pellet was re-suspended in 200 μL ice-cold 10 μM EDTA PBS with 5 μg/mL of GNA-fluorescein. Samples were incubated with GNA in the dark, on ice, for 30 minutes. Following incubation, samples were washed three times and re-suspended to a volume of 50 μL in 10 μM EDTA PBS. Samples were then examined under a light microscope (Zeiss Axioskop 2, Zeiss, Jena, Germany) with 495 nm excitation. Wild type CHO-S cells were used as a negative control and HEK Gnt1- cells were used as a positive control. Representative images were taken on a Leica DM5500 B Widefield Microscope (Leica Microsystems, Buffalo Grove, IL) at the UC Santa Cruz microscopy center.

[00267] Small scale gp120 test transfection. 4x10^5 cells of each candidate line were placed in 450 μL of media in a 24 well cell culture plate. 1.7 μL of Fugene (Promega, Madison, WI) was pre-incubated at room temperature for 30 minutes with 550 ng of DNA in a total volume of 50 μL of media. Following the incubation period, 50 μL of the Fugene/DNA mixture was added to each well, for a final transfected volume of 500 μL. Aliquots of supernatant were removed for testing 72 hours post transfection.

[00268] Experimental protein production. Cells were electroporated following the above method. 24 hours post electroporation, the culture was supplemented a single time with 1 mM sodium butyrate (Thermo Fisher, Life Technologies, Carlsbad, CA) and the temperature lowered to 34°C. The production culture was fed daily with MaxCyte CHO A Feed equivalent to 3.5% of the original volume. Cultures were run until viability dropped below 50%. Supernatant was harvested by pelleting the cells at 250 g for 30 minutes followed by pre-filtration through Nalgene™ Glass Pre-filters (Thermo Scientific, Waltham, MA) and 0.45 micron SFCA filtration (Nalgene, Thermo Scientific, Waltham, MA), then stored frozen at -20°C before purification.

[00269] Protein purification. Proteins were purified using an N-terminal affinity tag as previously described (Yu, B. et al., 2012).

[00270] Glycosidase Digestion and SDS-PAGE. Endo H and PNGase F (New England BioLabs, Ipswich, MA) digests were performed per the manufacturer's protocol on 5 μg of purified protein using one unit of glycosidase. Digested samples were run on NuPAGE (Thermo Fisher, Invitrogen, Carlsbad, CA) 4-12% BisTris precast gels in MES running buffer, then stained with SimplyBlue stain (Thermo Fisher, Invitrogen, Carlsbad, CA). For Western blot analysis, the primary antibody was in-house 34.1 anti-gD flag mAb and the secondary was HRP conjugated goat anti-mouse IgG (American Qualex, San Clemente, CA). The substrate was WesternBright ECL (Advansta, Menlo Park, CA).

[00271] Isoelectric focusing. Isoelectric focusing was performed using the ReadyPrep™ 2-D kit (Bio-Rad Laboratories, Hercules, CA). 50 μg of protein was mixed with 150 μL IEF sample buffer. 4 μL of two internal weight standards were added: carbonic anhydrase isozyme (pI = 5.9, 29 kDa) and amyloglucosidase (pI = 3.6, 97 kDa) (Sigma-Aldrich, St. Louis, MO). The protein mixture was loaded onto a ReadyStrip™ IPG strip (pH 3-10, 11 cm) and separated by a preset protocol on a Protean® IEF Cell. Following first dimensional separation, the strips were loaded, along with a molecular weight marker (Novex® Sharp prestained standard, Invitrogen), onto a 4-20% polyacrylamide Tris-HCl gel (Bio-Rad, Hercules, CA) and run for 1 hour at 225 V. The gels were then stained with SimplyBlue™ SafeStain (Invitrogen).

[00272] Fluorescence intensity assays (FIA). A semi-automated fluorescence immunoassay (FIA) was used to measure the binding of polyclonal or monoclonal antibodies to recombinant envelope proteins. For antibody binding to purified proteins, Greiner Fluortrac 600 microtiter plates (Greiner Bio-one, Germany) were coated with 2 μg/mL of peptide overnight in PBS with shaking. Plates were blocked in PBS + 2.5% BSA (blocking buffer) for 90 min, then washed 4 times with PBS containing 0.05% Tween-20 (Sigma). Serial dilutions of PG9 were added in a range from 10 μg/mL to 0.0001 μg/mL, then incubated at 25°C for 90 min with shaking. After incubation and washing, fluorescently conjugated anti-Hu or anti-Mu (Invitrogen, CA) was added at a 1:3000 dilution. Plates were incubated for 90 minutes with shaking then washed three times with 0.05% Tween PBS using an automated plate washer. Plates were then imaged in a plate spectrophotometer (Envision System, Perkin Elmer) at excitation (ex395nm) and emission (em490nm).
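The PG9 titration range above spans five orders of magnitude. The text does not state the dilution factor, so the sketch below assumes a 10-fold series purely for illustration. The function name and the `fold` parameter are hypothetical.

```python
import math


def dilution_series(top, bottom, fold=10.0):
    """Concentrations from `top` down toward `bottom` in `fold`-fold steps.

    The number of steps is rounded to the nearest whole step, so the last
    point equals `bottom` only when top/bottom is an exact power of `fold`.
    """
    if not (top > bottom > 0) or fold <= 1:
        raise ValueError("require top > bottom > 0 and fold > 1")
    steps = round(math.log(top / bottom) / math.log(fold))
    return [top / fold ** i for i in range(steps + 1)]


# Range from the text: 10 ug/mL down to 0.0001 ug/mL; the 10-fold step is assumed.
for conc in dilution_series(10.0, 0.0001, fold=10.0):
    print(f"{conc:g} ug/mL")
```

With a 10-fold step this yields six concentrations. A smaller step (for example 3-fold) would give more points across the same range.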

[00273] For antibody binding to unpurified culture supernatant, Greiner Fluortrac 600 microtiter plates (Greiner Bio-one, Germany) were coated with 2 μg/mL of purified monoclonal antibody (Berman lab, anti gD tag 34.1 or anti V2 peptide 10C10) overnight in PBS with shaking. Plates were blocked in PBS + 2.5% BSA (blocking buffer) for 90 min, then washed 4 times with PBS containing 0.05% Tween-20 (Sigma). 150 μL of 40x diluted supernatant was then added to each well, or 10 μg/mL of purified protein in control lanes, then incubated at 25°C for 90 min with shaking. After incubation and washing, PG9 was added in a range from 10 μg/mL to 0.0001 μg/mL, then incubated at 25°C for 90 min with shaking. After incubation and washing, fluorescently conjugated anti-Hu or anti-Mu (Invitrogen, CA) was added at a 1:3000 dilution. Plates were incubated for 90 minutes with shaking then washed three times with 0.05% TWEEN® PBS using an automated plate washer. Plates were then imaged in a plate spectrophotometer (Envision System, Perkin Elmer, Waltham, MA) at excitation (ex395nm) and emission (em490nm).

[00274] All steps except coating were carried out at room temperature on a shaking platform; incubation steps were 90 min on a shaking platform. All dilutions were done in blocking buffer (1% BSA in PBS with 0.05% Normal Goat Serum). Polyclonal rabbit sera were from rabbits immunized using a CFA/IFA protocol (Pocono Rabbit Farms) with A244 rgp120 produced in GNT1-/- HEK293 cells.

[00275] Glycan composition analysis by MALDI-TOF-MS. Glycan analysis by mass spectrometry was performed by the Complex Carbohydrate Research Center at the University of Georgia (Athens, GA). Glycans were released from HIV-1 envelope proteins with PNGase F, permethylated, then analyzed by MALDI-TOF-MS.

[00276] MVM infectivity assay. The MVM infectivity assay was performed by IDEXX BioResearch (Columbia, MO). Cells were cultured at 4x10^5 cells/mL in 100 mL total volume under the conditions described above in a spinner flask for five days. Wild type CHO-S and MGAT1 cells were infected with 1 MOI of MVMp or MVMi and evaluated in triplicate. 5 mL aliquots were removed on days 1, 3, and 5, and cells were pelleted by centrifugation and stored at -20°C. Day 5 samples were evaluated by PCR for MVM and 18S using proprietary primers. qPCR crossing point (CP) values were reported, and copy numbers were calculated from standard curves.

Results

[00277] Target design and cleavage of CHO-S MGAT1. CRISPR/Cas9 allows for specific targeting of genes for knockout or modification by introducing double stranded breaks (DSB) followed by non-homologous end joining (NHEJ) or homology directed repair (HDR). The details of CRISPR/Cas9, NHEJ, and HDR have been covered in a number of review articles (Hsu, P.D. et al., Cell. 157(6): p. 1262-1278; Sander, J.D. and J.K. Joung, Nat Biotech, 2014. 32(4): p. 347-355). The GeneArt® CRISPR Nuclease Vector with OFP Reporter contains all the elements needed for gene knockout given a well-designed target sequence. A target specific double stranded guide sequence is ligated into the vector between a U6 promoter and a tracrRNA sequence. The same plasmid encodes the Cas9 endonuclease and an orange fluorescent protein reporter separated by a self-cleaving 2A peptide linker (Figure 2). Following ligation of the guide sequences into the vector, the plasmids were transfected into CHO-S cells using the MaxCyte electroporation system. This electroporation allows near 100% transfection, even with large plasmids, increasing the odds of finding successful knockouts in a given population. Targets 1 and 2 were introduced individually, and target 3 plasmid was mixed in equal ratio with target 2 and added together, creating three separate pools of transfected cells. Twenty-four hours post transfection, samples from each of the three conditions were serially diluted and spread across five 96-well flat-bottom plates at a calculated density of 0.5 cells per well. The plates were examined daily, and any well with more than a single colony was discarded. Across the fifteen total plates, between fifteen and thirty wells per plate contained single viable colonies. Upon reaching approximately 20% confluency, which took between twelve and fifteen days, those colonies were expanded to 24-well plate wells in 500 μL of media. Those that did not have at least several dozen cells by day fifteen were discarded. A total of 166 colonies were expanded to 24 well plates: 55 from the target 1 pool, 67 from the target 2 pool, and 44 from the combined target 2/3 pool.

[00278] Lectin binding assay. If Mgat1 was successfully knocked out, then any N-linked glycoprotein expressed by the cell should have exclusively high mannose glycans with a preponderance of Man5 isoforms. To determine successful knockout of Mgat1 at a phenotypic level, a fluorescein-conjugated Galanthus nivalis lectin (GNA - also known as GNL, Vector Laboratories, Burlingame, CA) was used. GNA is an unusual lectin in that it does not require a Ca2+ or Mg2+ cofactor to bind, allowing the use of 10 μM EDTA to ameliorate cell clumping during repeated centrifugation and wash steps.

[00279] A total of 20 candidate lines from the original 166 showed uniformly high GNA binding and were chosen for expansion and further analysis. This represents a potentially successful knockout rate of 12%, though many colonies were rejected early due to slow growth in the 96 well plates, so the overall rate may have been higher. Three days following initial GNA selection, the cell line candidates were re-examined and six were rejected for lack of uniform lectin binding across the sample population, leaving 14 candidates.
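The screening funnel described above reduces to a few percentages, sketched here with the colony counts reported in the Results. The variable and function names are hypothetical.

```python
# Colonies expanded to 24-well plates, per transfected pool (from the Results).
colonies_per_pool = {"target 1": 55, "target 2": 67, "target 2/3": 44}


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


total_colonies = sum(colonies_per_pool.values())  # 166 colonies in total
first_pass_hits = 20                      # uniformly high GNA binding
second_pass_hits = first_pass_hits - 6    # six rejected on re-examination

print(f"total colonies screened: {total_colonies}")
print(f"first-pass GNA-positive rate: {percent(first_pass_hits, total_colonies):.1f}%")
print(f"rate after re-screening: {percent(second_pass_hits, total_colonies):.1f}%")
```

The first-pass rate works out to about 12.0%, in line with the 12% quoted above, and the rate after the second GNA screen is about 8.4%.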

[00280] Cell growth and expression of full length gp120 and V1/V2 fragments. The fourteen candidate cell lines were grown in 125 mL shaker flasks for two weeks with cell counts taken daily. At the end of this period, the four lines with the shortest average population doubling time were transiently transfected via electroporation with a full-length gp120 gene (A244) (SEQ ID NO:4) and a V1/V2 Env fragment, also from A244. Five days post transfection, the proteins were purified by affinity chromatography and were tested via FIA for their ability to bind the PG9 bNAb, which requires mannoses for binding (Figures 10A and 10B). This assay identified the highest protein producer of the four lines, and confirmed the cell lines could produce envelope proteins with the correct glycans required to bind PG9. Material produced using wild type CHO-S and HEK Gnt1- cells was used as a comparator for both quantification and a PG9 high mannose binding baseline. From this analysis, a single Mgat1- CHO cell line, designated 3.4F10, was selected for further characterization and analysis.

[00281] Identification of CRISPR/Cas9 induced genetic alteration. Up until this point, all the analysis on the putative Mgat1- cell lines had been phenotypic. To confirm that Mgat1 had been altered to the point of non-functionality on a genetic level, the Mgat1 gene from the 3.4F10 line as well as the Mgat1 gene from the next three best candidates were sequenced. In 3.4F10, an extra thymidine had been inserted at the cleavage site, introducing a frame shift mutation, leading to 23 altered codons and a premature stop (Figure 6B). 3.5D8 has the same mutation, while 3.5D9 and 3.5A2 had in-frame deletions of 24 and 30 nucleotides, respectively. The deleted codons of 3.5D9 and 3.5A2 corresponded to the transmembrane domain of the Gnt1 protein, leaving the active domain intact. This may explain why the envelope protein produced in these lines did not bind PG9 while 3.4F10 produced envelope did.
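The distinction drawn above between the frameshift and in-frame alleles comes down to whether the net change in length is a multiple of three nucleotides. The sketch below applies that rule to the four sequenced lines. The dictionary layout and function name are hypothetical, and the indel sizes are those reported in the text.

```python
# Net length change at the cleavage site, in nucleotides
# (+ for insertion, - for deletion), as reported for each line.
INDELS = {
    "3.4F10": +1,   # single thymidine insertion
    "3.5D8": +1,    # same mutation as 3.4F10
    "3.5D9": -24,   # 24-nucleotide deletion
    "3.5A2": -30,   # 30-nucleotide deletion
}


def classify_indel(net_change_nt):
    """Classify an indel by its effect on the reading frame."""
    if net_change_nt == 0:
        return "no net length change"
    if net_change_nt % 3 == 0:
        codons = abs(net_change_nt) // 3
        return f"in-frame ({codons} codons)"
    return "frameshift"


for line, change in INDELS.items():
    print(f"{line}: {change:+d} nt -> {classify_indel(change)}")
```

The 24 and 30 nucleotide deletions correspond to 8 and 10 whole codons, which is why the downstream reading frame, including the catalytic domain, is preserved in those two lines.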

[00282] Characterization of CHO-S Mgat1- gp120 glycosylation. To fully characterize the glycosylation of the lead CHO-S Mgat1- cell line (hereafter, simply referred to as Mgat1) as high mannose, the following assays were performed: glycosidase digestion, 2-D isoelectric focusing, and mass spectrometry analysis. Affinity purified, monomeric A244 gp120 was produced in CHO-S, GnT1-, 293 HEK, and Mgat1- cells. These proteins were digested overnight with PNGase F or with Endo H, which removes only high mannose glycans. The digest products were then separated on an SDS-PAGE gel and stained with Coomassie blue (Figures 8A-8C). As expected, the proteins expressed in normal CHO and 293 cells were only partially sensitive to Endo H, whereas the proteins produced in the GNT1- 293 and Mgat1- CHO cells were about 20 kDa smaller than the CHO-S material, due to the lower mass of Man5 glycan structures. Endo H cleaves N-linked high-mannose glycan structures, while complex glycans are insensitive to it. Following Endo H digestion the CHO-S material is largely unaltered, but both the Mgat1 and Gnt1 products are reduced to ~60 kDa in size. This is consistent with the observation that approximately half the mass of a given gp120 molecule is from glycosylation (Binley, J.M., et al., Journal of Virology, 2010. 84(11): p. 5637-5655; Zhu, X., et al., Biochemistry, 2000. 39(37): p. 11194-11204; Go, E.P., et al., Journal of proteome research, 2013. 12(3): p. 1223-1234). The complete sensitivity to Endo H, consistent with that of Gnt1, indicates that the glycosylation of the Mgat1 line is exclusively high mannose. When digested with PNGase F, all samples dropped to the same size, confirming that the size variances of undigested gp120 were due to differences in glycosylation and not to underlying amino acid diversity.

[00283] The CHO-S and Mgat1 material were resolved on 11 cm IPG strips followed by fractionation in the second dimension (Figures 9A and 9B). The CHO-S material had a broad pI spread and was heterogeneous in both charge and mass, due to the varying levels and types of glycosylation. As expected, the charge of the Mgat1 material was highly homogeneous and collapsed to a single spot.

[00284] Beyond the strong indicators above that the selected Mgat1 line was producing glycoproteins with purely high-mannose residues, the precise glycan composition of the A244 rgp120 envelope proteins was then determined. MALDI-TOF-MS was used on CHO-S and Mgat1- produced material, confirming that the Mgat1 line produced only high-mannose material, with at least 70% of that being the Man5 isoform. Thus at least 70% of the glycosylation could be attributed to mannose 5 glycans and as much as 30% could be attributed to earlier glycan precursors such as mannose 8 and mannose 9.

[00285] Binding to PG9. To confirm whether the Mgat1- cell line could produce monomeric, full-length rgp120 capable of binding PG9, an FIA with both A244 rgp120 and A244 V1/V2 fragment proteins was performed (Figures 10A and 10B). Envelope proteins produced by HEK 293, HEK Gnt1-, CHO-S, and Mgat1- cells were all compared. Both the 293 and CHO-S material bound poorly, while the Gnt1 and Mgat1 material, containing the necessary Man5 epitope component, showed significant improvement over their glycan wild type counterparts.
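To give a sense of how the high-mannose species differ in a mass spectrum, the sketch below computes neutral monoisotopic masses for native Man5 to Man9 N-glycans (Man(n)GlcNAc2) from standard residue masses. These residue masses and the function name are additions for illustration. The analysis described here used permethylated glycans, whose observed masses are higher, so the numbers below are not the peak positions from this study.

```python
# Standard monoisotopic residue masses (Da) for underivatized sugars.
HEXOSE_RESIDUE = 162.0528   # mannose residue, C6H10O5
HEXNAC_RESIDUE = 203.0794   # GlcNAc residue, C8H13NO5
WATER = 18.0106             # added once for the free reducing end


def high_mannose_mass(n_mannose):
    """Neutral monoisotopic mass of native Man(n)GlcNAc2, n = 5..9."""
    if not 5 <= n_mannose <= 9:
        raise ValueError("high-mannose N-glycans considered here are Man5 to Man9")
    return n_mannose * HEXOSE_RESIDUE + 2 * HEXNAC_RESIDUE + WATER


for n in range(5, 10):
    print(f"Man{n}GlcNAc2: {high_mannose_mass(n):.2f} Da")
```

Each additional mannose adds one hexose residue (about 162.05 Da for native glycans), so the Man5 to Man9 species appear as an evenly spaced ladder, which is what makes their relative abundances straightforward to read from a spectrum.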

Discussion

[00286] The overwhelming majority of HIV-1 vaccine research over the better part of three decades has focused on designing an antigen capable of eliciting a safe and effective protective immune response. While this goal has not yet been realized, there is hope. The RV144 trial demonstrated for the first time that some level of protection could be achieved through the use of a subunit vaccine (Rerks-Ngarm, S., et al., New England Journal of Medicine, 2009. 361(23): p. 2209-2220; Karasavvas, N., et al., AIDS Res Hum Retroviruses, 2012. 28(11): p. 1444-57; Kim, J.H. et al., Annu Rev Med, 2015. 66: p. 423-37). Since that time much has been learned about both the envelope protein itself and the panoply of new bNAbs that bind to it. Two general concepts have clarified the requirements for an envelope protein based manufacturing scheme. First, the glycan topography became better understood, as did the critical role of high-mannose glycans for the binding of bNAbs; such glycans are generally avoided in bio-therapeutic production (Doores, K.J., et al., Proceedings of the National Academy of Sciences, 2010. 107(31): p. 13800-13805; Bonomelli, C, et al., PLOS ONE, 2011. 6(8): p. e23521; Go, E.P., et al., J Virol, 2011. 85(16): p. 8270-84; Pritchard, L.K., et al., Nat Commun, 2015. 6: p. 7479; Cao, L., et al., 2017. 8: p. 14954). Second, a new class of potently neutralizing bNAbs was discovered that specifically required interaction with these high-mannose structures (McLellan, J.S., et al., Nature, 2011. 480(7377): p. 336-43; Pejchal, R., et al., Science, 2011. 334(6059): p. 1097-103; Lavine, C.L., et al., Journal of Virology, 2012. 86(4): p. 2153-2164; Kong, L., et al., Nat Struct Mol Biol, 2013. 20(7): p. 796-803). The gp120 used in the RV144 trial was produced with cell lines and methods in keeping with the best understanding of both HIV-1 and biopharmaceutical production at the time. This meant CHO production of recombinant gp120 with as much sialic acid as possible to increase stability and improve pharmacokinetic/pharmacodynamic properties. As the understanding of HIV-1 and its interaction with the immune system has matured, it has become clear that high sialic acid content and complex glycosylation were likely a hindrance to the development of neutralizing antibodies. These new understandings are guiding the current development of what an HIV-1 vaccine may look like.

[00287] This creates the need for a cell platform capable of producing large amounts of recombinant high-mannose proteins. Disclosed herein is a cell line specifically for the scalable production of high-mannose HIV-1 vaccine antigen. A CHO-S Mgat1 knockout line limited to Man5-9 N-linked glycoforms was established using the CRISPR/Cas9 gene editing system.

[00288] With the recent sequencing of the CHO genome (Wurm, F.M. and D. Hacker, Nat Biotech, 2011. 29(8): p. 718-720; Xu, X., et al., Nat Biotech, 2011. 29(8): p. 735-741) and the advent of CRISPR gene technology, these were used as tools to efficiently knock out Mgatl. This particular glycosyltransferase is something of a standout in the N-linked glycosylation pathway in that its action is one of the few bottlenecks (Figure 1). While enzymes before and after this point in processing have their preferred substrates, there is some minor overlapping and branch points (Bieberich, E., Advances in Neurobiology, 2014. 9: p. 47-70; Moremen, K.W., M. Tiemeyer, and A.V. Nairn, Nat Rev Mol Cell Biol, 2012. 13(7): p. 448-62). This means that there are multiple potential paths to arrive at the same glycoform or diverge to create different structures. If the expression of Mgatl is silenced, then N-glycan processing essentially stops at Mans (though ocl,6 fucosylation of the primary GlcNAc by Fut8 may still occur independent of Mgatl (Chang, V.T., et al., Structure, 2007. 15(3): p. 267-73), preventing the formation of hybrid or complex type glycans. Though the maturation process cannot proceed beyond Man5, upstream high-mannose glycoforms such as Mans and Man9, required for 2G12 binding, are not precluded and may still be present on completed proteins.

[00289] When creating this cell line, an initial screening was performed by a positive selection test using GNA lectin, a mannose-binding lectin with a preference for α1,3-linked mannose residues. This is in contrast to previously isolated Mgat1/GnTI lines, generated by mutagenesis and zinc-finger nucleases, which have relied upon negative selection through ricin lectins, such as Ricinus communis agglutinin-I and II (RCA-I, RCA-II) (Sealover, N.R., et al., Journal of Biotechnology, 2013. 167(1): p. 24-32; Patnaik, S.K. and P. Stanley, Methods in Enzymology, 2006. 416: p. 159-182; Lee, J., et al., Biochemistry, 2003. 42(42): p. 12349-12357). Unlike complex and hybrid glycans, high-mannose glycans are rarely present at high concentrations on healthy cell surface glycoproteins (Christiansen, M.N., et al., Proteomics, 2014. 14(4-5): p. 525-46; Hamouda, H., et al., Journal of Proteome Research, 2014. 13(12): p. 6144-6151). Positive binding of GNA to surface high-mannose glycans would therefore be strongly indicative of successful knockout of Mgat1. Initial tests comparing the GNA-fluorescein surface staining of HEK GnTI- and CHO-S cells confirmed this with a clear difference in staining intensity (Figure 5).

[00290] To be viable for large-scale production, the cells must have a reasonable growth rate. One of the features that has made CHO the dominant substrate for biomanufacturing is its robust growth; CHO-S cells have an average doubling time of 24.3 hours when split daily to 0.35 × 10^6 cells/mL. When seeded at the same density, the four best candidate lines had doubling times between 24.0 and 38.3 hours. While rapid growth is one goal, the overall protein production level and quality are paramount. The candidate lines still had to demonstrate that they could produce sufficient gp120 with the correct glycosylation to bind glycan-dependent bNAbs. To show this, a small-scale transient transfection was performed using A244 gp120, and an FIA was then performed with the purified material. This indicated whether the candidate lines could produce monomeric gp120 with the correct glycosylation to bind PG9. Affinity-purified A244 gp120 produced in HEK GnTI- cells was used as a positive comparator. While material from all the candidate lines bound PG9, the candidate that grew the most slowly, 3.4F10 (38.3 hr doubling time), had the highest level of PG9 binding, equal to that of the HEK GnTI- material. As expected, the WT CHO-S material, with complex and hybrid glycosylation, bound poorly. When the Mgat1 gene was sequenced in the knockouts, the two lines with the lowest relative amount of PG9 binding showed only a partial knockout of the Mgat1 gene. They each had multiple-codon in-frame deletions, corresponding to the transmembrane domain of the Mgat1 protein (Figure 6A-6D). With the catalytic domain intact, it appears that Mgat1 enzymatic function in these two lines was curtailed, but not eliminated.
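The doubling times quoted above can be estimated from paired daily cell counts. The following is a minimal sketch, assuming exponential growth between the daily back-split and the next count; the function name and the example counts are hypothetical and are not data from this disclosure.

```python
import math

def doubling_time_hours(n_start, n_end, elapsed_hours):
    """Doubling time assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("counts must be positive and increasing")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)

# Hypothetical next-day counts (cells/mL) for a culture that is
# back-split to 0.35e6 cells/mL every 24 hours.
seed_density = 0.35e6
next_day_counts = [0.70e6, 0.66e6, 0.72e6]

times = [doubling_time_hours(seed_density, c, 24.0) for c in next_day_counts]
mean_doubling_time = sum(times) / len(times)
print(f"mean doubling time: {mean_doubling_time:.1f} h")
```

Averaging the per-day estimates, rather than fitting a single curve, matches a protocol in which the culture is reset to the same density each day.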

[00291] A single cell line, 3.4F10, was selected on the basis of the initial growth characteristics and PG9 binding FIA data to advance as a high-mannose HIV-1 antigen production line. A 1.3 L transient transfection of A244 gp120 was performed and the material was affinity purified for further glycan analysis and bNAb binding studies. Digestion with Endo H confirmed the uniformly high-mannose glycosylation of the gp120 produced (Figure 8). WT CHO-S and Mgat1- produced A244 gp120 were then compared by 2D isoelectric focusing (Figure 9). The CHO-S material, similar to what was used in the RV144 trial (Rerks-Ngarm, S., et al., New England Journal of Medicine, 2009. 361(23): p. 2209-2220; Berman, P.W., AIDS Res Hum Retroviruses, 1998. 14 Suppl 3: p. S277-89; Berman, P.W., et al., Virology, 1999. 265(1): p. 1-9), showed broad heterogeneity of charge caused by varying levels of sialylation. The Mgat1- material, devoid of sialic acid and complex glycosylation, collapsed to a single discrete point. All the tests performed up to this point (lectin binding, size shifts, glycosidase digests, 2D electrophoresis) had been secondary indicators that the Mgat1- line was producing solely high-mannose material. As a final confirmation, the Mgat1- A244 gp120 material was analyzed by MALDI-TOF mass spectrometry. This definitively showed that the Mgat1- line is limited to high-mannose glycoprotein production, with the preponderance of species being Man5.

[00292] It was then determined that the Mgat1- line was an improved substrate for the production of HIV-1 vaccines. The PG9 epitope is frequently described as quaternary, requiring a native-like gp120 trimer for binding (Burton, D.R. and L. Hangartner, Annu Rev Immunol, 2016. 34: p. 635-59; Davenport, T.M., et al., Journal of Virology, 2011. 85(14): p. 7095-7107). The requisite high-mannose glycans are thought to result from the high degree of glycosylation, large size, and complex nature of the trimeric gp120 molecule preventing glycosidases and glycosyltransferases from effectively maturing the initial high-mannose structures (Doores, K.J., et al., Proceedings of the National Academy of Sciences, 2010. 107(31): p. 13800-13805; Bonomelli, C., et al., PLOS ONE, 2011. 6(8): p. e23521; Go, E.P., et al., J Virol, 2011. 85(16): p. 8270-84). When these pathways are controlled, the same high-mannose structures can be generated on monomeric gp120, enabling PG9 binding. Compared with A244 gp120 produced by WT CHO-S cells, the Mgat1- material demonstrated a high level of PG9 binding (Figure 10A).

[00293] At large-scale manufacturing facilities a viral contamination can be devastating, effectively shutting down production, and can be cleared only with great effort and expense (Henzler, H.-J. and K. Kaiser, Nat Biotech, 1998. 16(11): p. 1077-1079; Moody, M., et al., PDA J Pharm Sci Technol, 2011. 65(6): p. 580-8). One of the principal causes of failed fermentation of CHO cells is infection by Minute Virus of Mice (MVM), a tiny (20 nm) non-enveloped single-stranded DNA parvovirus (Moody, M., et al., PDA J Pharm Sci Technol, 2011. 65(6): p. 580-8). Because the receptor for MVM is thought to be sialic acid, MVM infectivity assays were carried out. These studies showed that Mgat1- CHO cells were resistant to infection by the strain MVMc, but sensitive to two other strains. While full resistance to all MVM strains would be preferable, this removes one source of potential manufacturing contamination and factory shutdown.

[00294] Figure 1. Simplified view of the N-linked glycosylation pathway. N-linked glycosylation begins in the endoplasmic reticulum with the en bloc transfer of a highly conserved Glc3Man9GlcNAc2 structure (left) to asparagine residues within the N-X-S/T motif of nascent proteins. This initial structure is sequentially trimmed down to Man5GlcNAc2 (center) by a number of glycosidases as the protein moves from the ER to the Golgi apparatus. Various glycosyltransferases then add monosaccharides, creating hybrid (second from right) and complex (right) glycoforms. Kifunensine and swainsonine are both inhibitors that halt further processing at the points shown above. Endo H and PNGase F remove the glycan structures where indicated by the arrows, with hybrid and complex glycans being insensitive to Endo H.

[00295] Figure 2. GeneArt® CRISPR Nuclease vector. The orange fluorescent protein (OFP) reporter and Cas9 are expressed as a single unit, driven by a CMV promoter sequence, and joined by a self-cleaving 2A peptide linker. Nuclear localization signals NLS1 and NLS2 usher Cas9 to the nucleus. The target-sequence-specific double-stranded DNA oligo that will generate the crRNA is inserted into the pre-linearized vector via 5 base pair overhangs. The tracrRNA sequence is located 3' of the crRNA DNA oligo insert and is followed by an RNA polymerase III termination sequence to ensure correct RNA folding for loading into the Cas9 complex. A U6 promoter drives expression of the crRNA and tracrRNA, which together form the mature gRNA. Figure adapted from the GeneArt® technical manual.

[00296] Figures 3A and 3B. Vector to edit the CHO Mgat1 gene. The CHO Mgat1 gene (Figure 3A) is a single-exon gene. Three gRNA sequences were designed to correspond with three target sequences in the 5' region of the gene. One target is shown underlined above with the requisite protospacer adjacent motif (PAM) in bold. Since Cas9 causes a double-stranded break, either the template or non-template strand may be targeted. In this case the guide RNA was designed to be complementary to the template strand. Figure 3B: Following design of the gRNA, a complementary oligonucleotide was ligated to the gRNA with sticky ends complementary to the GeneArt CRISPR nuclease vector (Thermo Fisher) to ensure correct directionality following ligation into the vector. This vector includes an orange fluorescent protein (OFP) reporter attached by a self-cleaving 2A linker to the Cas9 endonuclease. Three separate gRNA sequences were created, each targeting the 5' end of the gene. The crRNA sequence shown was used for creation of the GB Mgat1- line.
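Candidate target sites of the kind described in this legend can be enumerated by scanning both strands for a protospacer followed by a PAM. The sketch below is illustrative only: it assumes the NGG PAM of Streptococcus pyogenes Cas9 (consistent with the AGG PAM cited for Figure 6) and a 20-nucleotide protospacer, and the input sequence is a made-up toy, not the CHO Mgat1 sequence.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_ngg_targets(seq, protospacer_len=20):
    """List (strand, protospacer, PAM) for every NGG PAM on either strand.

    Each strand is read 5'->3'; the protospacer is the stretch
    immediately 5' of the PAM on that strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(protospacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, s[i - protospacer_len:i], pam))
    return hits

# Hypothetical toy sequence used only to exercise the function.
toy = "ATGCTGAAGAAGCAGTCTGCAGGCTTGTGCTTAGGAACCTT"
for strand, protospacer, pam in find_ngg_targets(toy):
    print(strand, protospacer, pam)
```

Scanning the reverse complement as well reflects the point made in the legend that either the template or the non-template strand may be targeted.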

[00297] Figure 4. Flow chart of Mgat1 gene editing and cell line selection strategy. The Cas9 nuclease vector with the gRNA sequence inserted was electroporated into suspension-adapted CHO-S cells. The transfected cells were re-suspended in conditioned media and cloned in 96-well plates at a calculated density of 0.5 cells/well. Those single-cell-derived colonies that grew well after 10-14 days were moved to 24-well plates. Aliquots were removed from each well of the 24-well plates and screened for GNA lectin binding. Those that did not demonstrate uniform lectin binding were discarded. Candidate lines were expanded to shake flasks and screened for rapid growth, discarding slow growers. A test transient transfection was performed with A244 gp120 (SEQ ID NO:4) to determine relative expression levels and PG9 binding properties of gp120 produced by candidate lines. Those with the best growth and PG9 binding were moved forward. The Mgat1 gene was PCR amplified from the remaining candidates and sequenced. The clones with the most robust growth and gp120 expression were expanded and frozen banks created. Two of these cell lines are deposited at ATCC (PTA-124141 or PTA-124142).
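The choice of a calculated density of 0.5 cells/well can be rationalized with a simple occupancy estimate. The sketch below assumes, as is conventional for limiting dilution, that the number of cells landing in each well follows a Poisson distribution; that assumption and the function names are illustrative and are not stated in this disclosure.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a given mean cells/well."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def plate_expectations(mean_cells_per_well, wells=96):
    """Expected well occupancy for one plate under the Poisson assumption."""
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    p_multi = 1.0 - p_empty - p_single
    return {
        "empty_wells": wells * p_empty,
        "single_cell_wells": wells * p_single,
        "multi_cell_wells": wells * p_multi,
        # Fraction of occupied wells that were seeded with exactly one cell.
        "clonal_fraction": p_single / (1.0 - p_empty),
    }

result = plate_expectations(0.5)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Lowering the mean further raises the clonal fraction at the cost of more empty wells, which is the trade-off behind a sub-unit seeding density.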

[00298] Figure 5. A GNA lectin binding assay was used to find cells with high-mannose surface glycoproteins following CRISPR/Cas9 targeted cleavage of Mgat1. As a first step to determine successful knockout of the Mgat1 gene, the candidate cells were examined for fluorescein-conjugated GNA lectin binding to surface glycoproteins. GNA binds exposed mannose residues with a preference for terminal α1,3 mannose residues, such as those found on the Man5 glycoforms. Cells were removed from culture, washed free of media three times in ice-cold 10 μM EDTA PBS, then re-suspended in the same wash buffer with 5 μg/mL fluorescein-conjugated GNA and kept on ice for a 30-minute incubation. Following incubation, all cells were washed three more times to remove unbound GNA. Wild-type CHO-S cells should have predominantly complex and hybrid glycans on surface glycoproteins and demonstrated very little binding to GNA (E), serving as a negative control. HEK GnTI- is limited to Man5 glycans and demonstrated positive GNA binding (D). A representative sample of transfected CHO-S cells that showed uniform GNA binding is shown in C and F. Those wells that demonstrated uniform GNA binding were advanced for growth, productivity, and genetic characterization. All images are at 20X. A, B, and C are shown in differential interference contrast (DIC); D, E, and F are shown under 495 nm excitation.

[00299] Figures 6A-6D. NHEJR-induced changes to the Mgat1 gene. Following initially promising phenotypic analysis, the Mgat1 genes of the four leading candidate lines were sequenced by Sanger sequencing to confirm silencing of the gene. The guide RNA was designed to be complementary to the template strand, using the PAM AGG. Shown above are the coding strands with the PAM complement, CTT, in bold and the putative double-stranded cut site indicated by the black triangle. Changes from the native sequence are underlined. A: The native sequence. B: Clones 3.4F10 and 3.5D8, each of which had the same mutation. C: Clone 3.5A2. D: Clone 3.5D9.

[00300] Figures 7A and 7B. Cell doubling time and transient expression of gp120 in Mgat1- CHO cell lines. Figure 7A: Candidate cell lines were placed in 125 mL shake flasks at 20 mL volumes. Cell counts were taken daily for 14 days and cells were back split to 3.5 × 10^5 cells/mL daily. Figure 7B: Transient transfections were performed using Fugene in 24-well plates. Five days post transfection, unpurified supernatant was tested by FIA using a gD flag epitope capture and detection with PG9. Purified gp120 from a HEK GnTI- cell line was used as a comparator high-mannose line.

[00301] Figures 8A-8C. Expression of gp120 in GB Mgat1- CHO cell line. Figure 8A: Purified A244 gp120 produced by WT CHO-S, GB Mgat1- CHO, and 293 HEK GnTI- cells was reduced and denatured, then run on a pre-cast 4-12% tris-glycine SDS-PAGE gel (NuPage, ThermoFisher) and stained with Simply Blue Safe Stain (ThermoFisher). Samples of the same proteins were then digested with the glycosidases Endo H (New England BioLabs) or PNGase F (New England BioLabs) for 16 hours at 37°C. Figure 8B: Endo H digest. Figure 8C: PNGase F digest.

[00302] Figures 9A and 9B. Isoelectric focusing of CHO-S and Mgat1- gp120. Purified CHO-S (Figure 9A) and Mgat1- (Figure 9B) produced gp120 was fractionated in the first dimension by isoelectric focusing on 11 cm IPG (pI 3-10) strips. Second dimension fractionation was performed using a 4-20% Tris-HCl SDS-PAGE pre-cast gel. Two internal pH standards were included, pI 5.6 carbonic anhydrase isozyme II (solid arrow) and pI 3.9 amyloglucosidase (open arrow).

[00303] MALDI-TOF analysis of glycans present on gp120 produced by CHO-S and Mgat1- cell lines. The glycosylation on A244 gp120 produced by CHO-S and Mgat1- cells was stripped by PNGase F digestion and examined by MALDI-TOF MS. The CHO-S glycosylation is heterogeneous, with 72% being complex and 25% high mannose. The Mgat1- material was almost exclusively high mannose (99.47%). This analysis was performed by the Complex Carbohydrate Research Center at the University of Georgia.

[00304] Figures 10A and 10B. PG9 binding to monomeric gp120 and V1/V2 scaffold improved by Mgat1 knockout in CHO cells. Purified A244 gp120 (Figure 10A) and V1/V2 fragment (Figure 10B) proteins produced by WT CHO-S, 293 (gp120 only), HEK GnTI-, and GB Mgat1- cell lines were compared for binding affinity to the canonical glycan-dependent bNAb PG9.

Example 2

Construction of plasmid for the expression of A244_N332-rgp120 HIV-1 vaccine immunogen in CHO cell lines

[00305] A244-rgp120 produced in Mgat1- CHO-S cell lines showed increased binding to broadly neutralizing antibody (bNAb) PG9.

[00306] This report describes the construction of a plasmid (UCSC1331) for the expression of a mutated HIV-1 envelope gene, A244-N332-rgp120, in stable CHO cell lines.

[00307] The A244-N332-rgp120 gene encodes a recombinant protein that differs from the parental A244-rgp120 gene product in its ability to bind multiple broadly neutralizing antibodies (bNAbs) that depend on the presence of an N-linked glycosylation site at asparagine residue N332. The A244-rgp120 immunogen is significant since it was a major component of a prime/boost immunization regimen used in the RV144 clinical trial. This 16,000-person study carried out in Thailand (2003-2009) is the only vaccine trial to demonstrate vaccine-induced protection in humans. It is thought that the N332 mutation will improve the A244-rgp120 vaccine immunogen by adding multiple epitopes recognized by bNAbs. The use of a vaccine immunogen that contains multiple epitopes recognized by glycan-dependent bNAbs has the potential to improve the level of vaccine efficacy from the ~31% observed in the RV144 trial to the level of 50% or more required for regulatory approval and clinical deployment.

Materials and Methods

[00308] The starting plasmid for the construction of UCSC1331 was the pCF1 expression vector developed in the Berman lab at UCSC. Standard genetic engineering methods, including PCR-based mutagenesis, were used to splice and mutate specific gene fragments. A synthetic, codon-optimized gene encoding A244-rgp120 was mutagenized using standard methods to alter the location of N-linked glycosylation sites. Plasmids were propagated in the DH5α strain of E. coli and purified using the endotoxin-free Qiagen Gigaprep purification kit (cat No. 12391). DNA sequencing was carried out at the University of California at Berkeley Core Sequencing facility using Sanger chain termination sequencing.

Results

[00309] The UCSC1331 plasmid (Figure 11) was engineered to contain three principal elements: 1) a bacterial plasmid backbone originally derived from pBR322, containing a bacterial origin of replication and a bacterial transcription unit enabling the expression of a gene (β-lactamase) conferring resistance to ampicillin when expressed in bacterial cells; 2) a chimeric DNA fragment containing a transcription unit in which an SV40 promoter and origin of replication enable plasmid replication and the expression of neomycin phosphotransferase, which confers resistance to the antibiotic G418 when expressed in mammalian cells; and 3) a second transcription unit with cloning sites for the expression of any transgene (e.g. HIV envelope protein) with a 3' stop codon for expression in mammalian cells. The second transcription unit includes a partial CMV promoter sequence and a polyadenylation sequence from bovine growth hormone (BGH).

[00310] Design of promoter and 5' untranslated sequences. The core CMV promoter in UCSC1331 differs from the CMV promoter found in many commercially available vectors (e.g. pCDNA3.1), which are useful for transient transfection but unsuitable for the production of stable cell lines because of gradual inactivation of the CMV promoter by mammalian cell methyltransferases. To allow stable expression in mammalian cells by avoiding inactivation of the CMV promoter by CHO methyltransferases, the CMV promoter was mutated to remove two CpG sites through the substitutions C41G and C179G, as described by Moritz and Gopfert (Moritz, B. et al., Scientific Reports, 2015. 5: p. 16952). Other features designed to improve expression levels compared to those achieved by commercial expression vectors were the insertion of a chimeric intron downstream of the CMV promoter (Bothwell, A.L., et al., Cell, 1981. 24(3): p. 625-37; Senapathy, P. et al., Methods Enzymol, 1990. 183: p. 252-78) and a 5' UTR spacer upstream of the translational start codon. The precise arrangement of the CMV promoter, the intron, the A244_N332-rgp120 transgene, and the bovine growth hormone (BGH) poly A tail in the expression cassette is diagrammed in Figure 11.
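The effect of a C-to-G substitution on CpG content can be checked with a few lines of code. The following is a minimal sketch on a made-up toy sequence; it is not the CMV promoter sequence, and the positions used are chosen only to fit the toy. The helper names are hypothetical.

```python
def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")

def substitute(seq, position, expected, replacement):
    """Replace the base at a 1-based position after checking its identity."""
    index = position - 1
    if seq[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {seq[index]}"
        )
    return seq[:index] + replacement + seq[index + 1:]

# Toy sequence with CpG sites whose C residues sit at positions 3 and 8.
toy_promoter = "AACGTTACGA"
edited = substitute(toy_promoter, 3, "C", "G")
edited = substitute(edited, 8, "C", "G")

print(count_cpg(toy_promoter), "->", count_cpg(edited))
```

Checking the expected base before substituting guards against off-by-one errors when positions are transcribed from a publication into a working sequence file.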

[00311] Design of heterologous signal sequence and N-terminal purification tag. The A244_N332-rgp120 protein produced in these studies was expressed as a fusion protein (Figure 12) with the N-terminal signal sequence and a 27 amino acid purification tag from Herpes Simplex Virus Type 1 glycoprotein D (gD).

[00312] Mutagenesis of N332 and N334 N-linked glycosylation sites. A major functional difference between the wild-type and A244-N332 gene products is the location of a critical predicted N-linked glycosylation site (PNGS) at the base of the V3 loop (V3/C3 domain). The N334 PNGS in the wild-type A244-rgp120 gene was deleted and replaced with an alternative PNGS added at position N332 (Doran et al. 2017, manuscript in preparation). The change is known to facilitate the binding of a major class of broadly neutralizing monoclonal antibodies (bN-mAbs), such as PGT121, PGT128, and 10-1074, that require a glycosylation site at N332. A comparison of the A244-rgp120 and A244-N332-rgp120 protein sequences is provided in a pairwise alignment (Figure 13).
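Relocation of a predicted N-linked glycosylation site can be verified computationally by searching for the N-X-S/T sequon before and after mutagenesis. The sketch below uses the common convention that X may be any residue except proline; the two peptides are toy sequences invented to show a sequon moving two positions toward the N-terminus, and they are not taken from A244-rgp120.

```python
import re

# N, then any residue except proline, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNTS") be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(protein):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]

# Toy peptides: the sequon at position 7 is removed and a new one is
# created at position 5.
toy_wild_type = "QAYCEINGTK"
toy_mutant = "QAYCNISGTK"

print("wild type:", find_sequons(toy_wild_type))
print("mutant:   ", find_sequons(toy_mutant))
```

Running the same check on full-length sequences before and after mutagenesis is a quick way to confirm that only the intended site has moved.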

[00313] Assembly of the chimeric A244_N332-rgp120 coding sequence. High-level expression of multiple rgp120 genes was previously achieved (Lasky, L.A., et al., Science, 1986. 233(4760): p. 209-12) by codon optimization and by replacing the signal sequence and 5' UTR of HIV-1 with those of the Herpes Simplex Virus Type 1 glycoprotein D (HSV-1 gD) gene. In addition, a 27 amino acid purification tag and a 3 amino acid linker sequence (LEE) were fused to amino acid 12 of the mature, fully processed sequence of gp120. A diagram showing this structure is provided in Figure 14. To construct the A244_N332-rgp120 gene, a synthetic DNA sequence encoding a modified CMV core promoter, a chimeric intron, a 5' UTR spacer, the HSV-1 gD signal sequence, and a 27 amino acid HSV-1 gD flag epitope with a three amino acid (LLE) linker was purchased from Thermo Fisher (Waltham, MA, USA). The fragment was designed to include unique restriction sites after the 5' UTR (EcoRI) and at the gD-flag linker (KpnI), and a NotI site for convenient cloning of gp120 sequences missing the first 11 amino acids. It is flanked by HindIII and XbaI restriction sites, which are compatible with the HindIII-XbaI digested pCF1 vector fragment. In addition, multiple stop codons are encoded between the NotI and XbaI sites. Assembly of the expression construct was a two-step process. First, an intermediate was assembled by ligation of the HindIII-XbaI restricted synthetic sequence to the HindIII-XbaI fragment of pCF1 (+) to produce the "empty" expression cassette (UCSC1324). The resultant vector was then digested with KpnI and NotI and ligated to a KpnI-NotI fragment from plasmid UCSC1250 that encodes a codon-optimized A244ucsc gene sequence, and the resulting plasmid (UCSC_CHO.A244N332) was sequenced. A schematic of the fully ligated, codon-optimized chimeric gene used to express A244_N332-rgp120, compared to the wild-type A244-rgp120 sequence, is shown in Figures 15A and 15B. Chimeric protein expressed by the UCSC_CHO.A244N332 plasmid can be affinity purified using antibody to the gD flag. This vector can be used for transient expression, or for stable expression by selecting transfected cells with the antibiotic G-418 (Southern, P.J. and P. Berg, J Mol Appl Genet, 1982. 1(4): p. 327-41).

[00314] Figure 11. Diagram of UCSC1331 plasmid used to express A244_N332-rgp120.

[00315] Figure 12. Diagram of the chimeric gene used for the expression of A244_N332-rgp120.

[00316] Figure 13. Emboss Needle pairwise sequence alignment of the amino acid sequence of the A244_N332-rgp120 transcription product with the A244-rgp120 transcription product used to produce rgp120 for the RV144 clinical trial. A is A244ucsc rgp120. B is A244 GNE rgp120.

[00317] Figure 14. Comparison of the wild-type A244-rgp120 transcription product with the A244-N332-rgp120 transcription product and the mature processed form of the A244_N332-rgp120 protein.

[00318] Figure 15A and 15B. Emboss Needle pairwise sequence alignment of the nucleotide sequence of the codon-optimized A244_N332-rgp120 gene and the A244-rgp120 gene used to produce A244-rgp120 for the RV144 clinical trial.

Example 3

Preparation of Goat Polyclonal Antibody Required for Selection of Stable Cell Lines Expressing HIV Envelope Proteins Using the ClonePix 2 Robot

[00319] The production of affinity-purified polyclonal antibodies reactive with HIV envelope protein, gp120, derived from clade B (MN), clade C (CN97001), and clade CRF01_AE (TH023) strains of HIV-1 is described. These antibodies represent an essential reagent for use in the robotic selection of stable cell lines expressing high levels of recombinant HIV envelope proteins.

[00320] The ClonePix 2 robotic cell line selection technology requires a fluorescently labeled antibody mixture to a specific secreted gene product that is capable of forming a precipitin band around colonies of cells suspended in a semisolid matrix (e.g. methylcellulose or soft agar). The size of the precipitin band, and the intensity of antibody staining, is proportional to the amount of gene product secreted and serves as the basis for identifying and ranking cell colonies in order of the amount of protein being secreted. Based on this ranking, the ClonePix robot is able to sort through tens of thousands of individual cell colonies and identify the small percentage of unusual variants capable of secreting extraordinarily large amounts of protein. A typical ClonePix 2 experiment might involve screening 40,000-50,000 individual colonies and selecting 20-40 for further growth and analysis. Before the availability of this instrument, investigators had to manually pick, culture, and assay thousands of individual cell colonies (clones) in order to identify a rare cell line producing high levels of a secreted transgene product suitable for biopharmaceutical production. This process was extremely time and labor intensive, usually requiring a team of researchers. Some proteins, such as immunoglobulins, are easy to express, and high-producing cell lines can readily be identified by manual selection in 6 months. However, other proteins, such as HIV envelope proteins, are difficult to express, and the identification of high-producing cell lines by manual selection typically takes 12-24 months using selective conditions requiring repeated cycles of gene amplification and selection targeting selectable markers such as dihydrofolate reductase and glutamine synthetase.

[00321] The ClonePix2 instrument automates the selection of cell lines producing large amounts of secreted gene products, providing a significant reduction in the time and cost of selecting a high producing cell line that can be used for biopharmaceutical production.

Commercial antibody reagents are available for the isolation of cell lines producing monoclonal antibodies, but are not available for other proteins such as HIV envelope proteins. Therefore, reagents were created that could be used to identify transfected CHO cells expressing recombinant HIV envelope proteins at levels >50 mg/L. Initial experiments, based on the suggestions of the ClonePix 2 manufacturer (Molecular Devices, Mountain View, CA) and other HIV vaccine researchers (Lu, S. 2015. HIV Env. Manufacturing Workshop, NIAID, Bethesda, MD June 11, 2015), involved the production of mixtures of fluorescently labeled monoclonal antibodies. The formation of precipitin bands requires an antigen with at least three different epitopes and antibodies in approximately equal concentrations to each of these epitopes. However, after ~18 months of trying cocktails of three or more monoclonal antibodies to different gp120 epitopes, precipitin bands around colonies of cells known to express gp120 could not be observed using this technique. It was therefore concluded that the approach used in selecting cell lines producing monoclonal antibodies was unlikely to work for selecting cell lines producing gp120, and that a different strategy was needed. Protein A or protein G purified polyclonal rabbit and goat antibodies to recombinant gp120 were then used to label cell lines secreting HIV envelope proteins. This approach was similarly unsuccessful. Finally, it was reasoned that the background fluorescence in purified polyclonal sera might obscure the visualization of the minute precipitin bands surrounding each cell colony.

Materials and Methods

[00322] Ethics statement. Animal experiments were performed according to the guidelines of the Animal Welfare Act. Pocono Rabbit Farm and Laboratory, Inc. has an Animal Welfare Assurance on file with The Office of Laboratory Animal Welfare (OLAW). The Animal Welfare Assurance number is A3886-01 effective January 29, 2013 through January 31, 2017.

[00323] gp120 immunogens. Purified gp120s from three clades of HIV (CRF01_AE, B, and C) were expressed by large-scale transient expression in 293 cells. Each protein was expressed as a fusion protein containing an N-terminal 27 amino acid purification tag from Herpes Simplex Virus Type 1 glycoprotein D (gD). Growth-conditioned cell culture medium was harvested and filtered, and the gp120 proteins were purified by immunoaffinity chromatography using a monoclonal antibody to gD coupled to an insoluble matrix. The proteins recovered consisted of gp120s from the A244, MN, and CN97001 isolates of HIV-1. SDS-PAGE gels of the proteins used for immunization are provided in Figure 16. The lots of the three antigens used were: 1) CN97001-rgp120, produced in 293HEK cells; 2) MN468-rgp120 (lot 456; produced in GnTI- 293 cells); and 3) A244 GNE-rgp120 (lots 368, 329, and 338, produced in GnTI- 293 cells).

[00324] Goat immunization. A single male goat (557) weighing approximately 56 kg was immunized with a mixture of three gp120 antigens at Pocono Laboratories, Canadensis, PA. Immunization began on day 0 with a mixture of all three immunogens (100 μg each), followed by booster immunizations on days 7, 14, 35, 49, and 63. The primary immunization on day 0 was by intradermal injection using Complete Freund's Adjuvant (CFA). The boosts at days 7, 14, and 35 were intramuscular and used Incomplete Freund's Adjuvant with MightyQuick Stimulator (PRF&L's proprietary immune stimulator). Bleeds were taken on days 0 (prebleed), 21, 28, 35, 42, 56, 63, and 70, with a final exsanguination bleed at day 77. 2.5 L of goat 557 serum is stored at -20°C at UCSC.

[00325] Verification of antibody levels in goat serum. The goat serum was assayed by direct FIA using 96-well plates (Fluortrac 600, Greiner) coated with 2 μg/ml of protein overnight in PBS. Bound antibody was detected using a polyclonal donkey anti-goat antibody at a dilution of 1/5000 (Life Technologies, Carlsbad, CA), and plates were read on an Envision plate reader (Perkin Elmer, Waltham, MA). Results are shown in Figure 17.

[00326] Purification of antibodies. Total IgG was purified from goat serum by affinity chromatography using a HiTrap Protein G column (GE Healthcare, Little Chalfont, United Kingdom), following the manufacturer's instructions. The purified antibodies were stored at 20 mg/ml in PBS at -20°C. Immunoaffinity columns were prepared by coupling MN-rgp120 and A244-rgp120 to cyanogen bromide activated sepharose (GE Healthcare, Little Chalfont, United Kingdom). An aliquot of serum was purified by successive purification on two affinity columns created with TH023-rgp120 and MN-rgp120, respectively. Columns were washed with 10 column volumes of 50 mM Tris, 0.5 M NaCl, 0.1 M TMAC (tetramethyl ammonium chloride) buffer (pH 7.4), and eluted with 0.1 M sodium acetate buffer, pH 3.0. The pH of the eluate was neutralized by the addition of 1.0 M Tris (1:10 ratio) and the resulting solution was concentrated using an AMICON molecular weight cutoff centrifuge tube (Millipore, Billerica, MA). The purified protein was adjusted to a final concentration of 1-2 mg/mL in PBS buffer. Protein concentrations were determined using the bicinchoninic acid assay (BCA) method.

[00327] Alexa 488 antibody labeling. Two aliquots of goat 557 polyclonal antibody were labeled with Alexa 488 (Thermo Fisher Scientific, Waltham, MA). The first batch was protein G purified and the second was immunoaffinity purified. Conjugate labeling was performed using an Alexa Fluor labeling kit (Thermo Fisher Scientific, Waltham, MA) as per the instructions, except that the labeled antibody was separated from unbound dye using a 30K cutoff Amicon Ultra spin column, centrifuging three times for 10 min at 3750 rpm (2750 rcf) and washing with 10 ml of PBS each time until no dye was detected in the filtrate. The Alexa 488-conjugated antibody was concentrated to 1.8-2 mg/ml, and the amount of dye coupled to antibody was calculated using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA). It was determined to be four moles and six moles of dye per mole of antibody, respectively, for the protein G and immunoaffinity batches. Anywhere between 4-9 moles per mole is deemed acceptable by the manufacturer's protocol. Labeled antibody was filter sterilized through a 0.2 micron (will use 0.1 in future) filter and was stored at 4°C in the dark in a refrigerator in room 288 Baskin labs, UCSC.

Results

[00328] Recombinant gp120s for immunization studies. Recombinant gp120s from the A244, MN, and CN97001 isolates of HIV-1 were expressed in 293 HEK cells by transient transfection as described previously (Nakamura, G.R., et al., J Virol, 1993. 67(10): p. 6179-91; Smith, D.H., et al., PLoS One, 2010. 5(8): p. e12076). Growth conditioned cell culture supernatants were collected, filtered, and applied to an immunoaffinity column prepared with a purified monoclonal antibody reactive with the N-terminal 27 amino acids of Herpes Simplex Virus Type 1 glycoprotein D (gD). The column was eluted at pH 3, and the eluted gp120 was purified by immunoaffinity and size exclusion chromatography. The purified proteins were analyzed for purity and quality (e.g. proteolytic degradation) by SDS-PAGE. Visualization after Coomassie blue staining (Figure 16) showed that all of the protein ran as a single band and that there was little if any evidence of dimerization or proteolysis upon reduction with dithiothreitol.

[00329] Immunization of Goat 577 with purified gp120. A healthy goat with a documented record of veterinary care was immunized five times with a mixture of gp120s from 3 different clades using a protocol compliant with USDA guidelines and the Animal Welfare Act. Adjuvants were provided by the contract research organization, Pocono Laboratories, Canadensis, PA. Samples of antisera were collected after each immunization, and pooled sera from pairs of immunizations were monitored for the presence of antibodies to all three gp120s used for immunization as well as antibodies to the HSV-1 gD purification tag present on all three immunogens. Antibodies to all three antigens were detected in the pooled sera analyzed (Figure 17); however, the titers to CN97001-rgp120 were lower than the titers to the other two antigens until after bleeds 7-10. Serum was collected and stored as described above (Materials and Methods).

[00330] Comparison of Protein G and antigen specific affinity-purified antibody in ClonePix assay. Cells of an Mgat1- CHO cell line expressing A244_N332-rgp120 (clone 5F) were diluted to 25 cells/ml in CHO-A matrix (Molecular Devices, Sunnyvale, CA) containing 10 μg/ml of either Alexa 488-labeled, protein G-purified IgG from goat 577 or Alexa 488-labeled, immunoaffinity-purified IgG from goat 557. Colonies were imaged using the ClonePix 2 after 14 days in culture. Halos were visible around clones that had been incubated in the presence of immunoaffinity-purified, Alexa 488-labeled antibody, but were absent in the protein G-purified Alexa 488 test wells (Figure 18). These results demonstrate that polyclonal antibodies to gp120 can be used to visualize colonies of cells secreting recombinant HIV envelope proteins provided that they are immunoaffinity purified prior to labeling with an appropriate fluorophore (e.g. Alexa 488).

[00331] Figure 16. All three proteins were boiled for 2 minutes in LDS sample loading buffer (Invitrogen) with or without addition of the reducing reagent DTT. The samples were then run on a 4-12% Bis-Tris gel with MES running buffer (Thermo Fisher, Life Technologies, Carlsbad, CA), stained with SimplyBlue Safe stain (Thermo Fisher, Life Technologies, Carlsbad, CA) for one hour, and destained overnight in distilled water. The SDS gel image was captured with a Fluorchem Q system (Alpha Innotech, Genetic Technologies, Grover, MO). Lane 1: Molecular weight standard (Thermo Fisher, Life Technologies, Carlsbad, CA). Lanes 2 and 3: Clade C gp120, CN97001, produced from 293 cells at Genentech, with and without addition of the reducing reagent DTT, respectively. Lanes 4 and 5: Clade B gp120, MN468, a glycosylation mutant of the MN strain, produced from GnTI- cells and purified by affinity and gel filtration chromatography at UCSC, with and without addition of the reducing reagent DTT, respectively. Lanes 6 and 7: Clade AE gp120, A244, produced from GnTI- cells and purified by affinity and gel filtration chromatography at UCSC, with and without addition of the reducing reagent DTT, respectively.

[00332] Figures 17A-17D. Measurement of antibodies to A244-, MN-, and CN97001-gp120s and to the HSV-1 glycoprotein D purification tag during the course of immunization of Goat 577. Protein lots #647, #648 and #15 of gp120 and a synthetic peptide corresponding to the gD purification tag were used in a direct coat FIA assay. Titer data are grouped for production lots that were combined for purification purposes. Bleeds 2 and 3 were protein G purified and affinity purified for use in the ClonePix2 cell line selection experiments.

[00333] Figures 18A-18C. Comparison of ClonePix2 images obtained with protein G-purified, Alexa 488-labeled goat IgG and with affinity-purified, Alexa 488-labeled IgG. Mgat1- CHO cells were transfected with the UCSC 1331 plasmid by electroporation and the resulting cells were suspended in semi-solid CHO-A growth media (Molecular Devices, Sunnyvale, CA) containing Alexa 488-labeled IgG elicited against a mixture of recombinant gp120s from the MN, A244, and CN97001 strains of HIV-1. The cells were cultured for 14 days at 37°C in 8% CO2 and then visualized in the ClonePix 2 robotic selection system. Figure 18A, images of cells after a 14 day incubation of Mgat1- cells expressing A244_N332-rgp120 with polyclonal, immunoaffinity-purified, Alexa 488-labeled goat IgG (goat 557). Top row, white light; bottom row, fluorescent light (535 nm). Figure 18B, images of cells after a 14 day incubation of Mgat1- cells expressing A244_N332-rgp120 with 10 μg/ml of Alexa 488-labeled, protein G-purified goat IgG. Top row, white light; bottom row, fluorescent light (535 nm). Figure 18C, images of cells from a control experiment where Mgat1- cells expressing A244_N332-rgp120 were incubated for 14 days without added antibody. Top row, white light; bottom row, fluorescent light (535 nm).

Example 4

Method for the selection of Stable CHO Cell Lines Producing Recombinant HIV Envelope Proteins for Use as Vaccine Immunogens

[00334] This report describes a novel method for the rapid development of stable CHO cell lines producing recombinant forms of the HIV-1 envelope protein, gp120, where N-linked glycosylation is limited to mannose-5 glycans and earlier structures in the N-linked glycosylation pathway. This method provides major economic advantages in the HIV vaccine manufacturing process, and provides major biologic advantages in pharmacokinetics and antigenic structure. These improvements derive from an improved method for creating novel cell lines with extraordinarily high gp120 production capacity, as well as the use of a novel cell line, Mgat1- CHO, that limits N-linked glycosylation primarily to mannose-5 glycans. Because the final product incorporates multiple glycan dependent epitopes recognized by broadly neutralizing antibodies, the new molecule (A244_N332-rgp120) described in this report should be more effective than previous gp120 vaccines in eliciting protective immunity and can be manufactured more efficiently at a substantially reduced cost.

[00335] The development of a safe, effective, and affordable HIV vaccine is a global public health priority. After more than 30 years of vaccine development, a vaccine with these properties has yet to be described. To date, the only clinical study to show that vaccination can prevent HIV infection is the 16,000-person RV144 trial carried out in Thailand between 2003 and 2009 (Rerks-Ngarm, S., et al., N Engl J Med, 2009. 361(23): p. 2209-20). This study involved immunization with a recombinant canarypox virus to induce cellular immunity and a bivalent recombinant gp120 vaccine designed to elicit protective antibody responses (Berman, P.W., et al., Virology, 1999. 265(1): p. 1-9; Berman, P.W., AIDS Res Hum Retroviruses, 1998. 14 Suppl 3: p. S277-89). Unfortunately, only a modest level of protection (~31%) was achieved in this study, resulting in an urgent need to find a way to improve the level of protection. Improving the efficacy of gp120 vaccines from the 31% seen in RV144 to a level of protection of 50% or more (thought to be required for regulatory approval) is likely faster and more cost effective than developing a new vaccine concept from scratch. Several correlates of protection studies have suggested that the protection achieved in the RV144 trial can be attributed to antibodies to gp120 (Haynes, B.F., et al., N Engl J Med, 2012. 366(14): p. 1275-86; Montefiori, D.C., et al., J Infect Dis, 2012. 206(3): p. 431-41; O'Connell, R.J. et al., Expert Rev Vaccines, 2014. 13(12): p. 1489-500). A roadmap to improve the gp120 vaccine used in the RV144 trial has been provided by the recent identification of multiple broadly neutralizing monoclonal antibodies (bN-mAbs) to gp120. Surprisingly, many of these were found to recognize unusual glycan dependent epitopes that were dependent on mannose-5 or mannose-9 structures (Walker, L.M., et al., Science, 2009. 326(5950): p. 285-9; Walker, L.M., et al., Nature, 2011. 477(7365): p. 466-70).
Since the gp120 vaccine used in the RV144 trial lacked these structures, it also lacked multiple epitopes with the potential to stimulate protective virus neutralizing antibodies. The work described in this report represents the results of a focused effort to find a practical and economical way to produce improved gp120 vaccine antigens possessing the glycan structures required to bind bNAbs.

[00336] Previous experience showed that the production of recombinant HIV envelope proteins (gp120 and gp140) for clinical research and commercial deployment was extremely challenging. Not only was it difficult to isolate stable cell lines producing commercially acceptable yields (e.g. >50 mg/mL), but it was also difficult to consistently manufacture a high quality, well defined product with uniform glycosylation, free of proteolytic clipping and aggregated species. Key breakthroughs in improving the yields of HIV envelope expression came with the discovery that the native HIV envelope glycoprotein signal sequence often limited expression and that replacement with other signal sequences, such as that of Herpes Simplex Virus glycoprotein D (gD) or the prepro signal sequence of tissue plasminogen activator, enhanced expression (Lasky, L.A., et al., Science, 1986. 233(4760): p. 209-12). Additional progress was achieved when it was recognized that codon optimization could enhance HIV envelope glycoprotein expression (Haas, J. et al., Curr Biol, 1996. 6(3): p. 315-24). However, even with these improvements it was often difficult to create stable CHO cell lines suitable for vaccine production that expressed more than 2-20 mg/L. These low levels of expression necessitated production of candidate HIV vaccine antigens at large scale (up to 10,000 L) in order to produce sufficient material for large scale vaccine trials such as the 16,000-person RV144 HIV vaccine trial (Rerks-Ngarm, S., et al., N Engl J Med, 2009. 361(23): p. 2209-20). Vaccine production at this scale is very expensive and requires the use of manufacturing facilities costing in excess of $500 million. In principle, a major way to reduce the cost of manufacturing and production is to develop high producing cell lines yielding 200-2000 mg/L, such as those used to produce therapeutic monoclonal antibodies.
Because the required dosage of subunit vaccines is typically less than 1 mg, the size of the manufacturing facility required for the commercial production of gp120 vaccines from a high producing cell line is proportionally smaller (e.g. 1,000 L), as is the cost of materials and supplies required to recover the recombinant proteins from the smaller fermentation cultures. This report describes a rapid method to produce a high yielding CHO cell line producing gp120 that should result in a 10-fold or greater reduction in the cost of manufacturing and production compared to HIV vaccines described previously. Moreover, the disclosed method for producing gp120 cell lines requires only 2-3 months, compared to previous efforts which have taken 12-24 months.
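The scale argument in the preceding paragraph can be checked with simple arithmetic. The sketch below is illustrative only: it computes the gross protein mass per fermentation run from culture volume and titer, ignores purification losses, and treats 1 mg as an upper bound on the dose (the text states only that the dose is typically less than 1 mg). The function names are hypothetical and the chosen titers are single points taken from the ranges quoted above.

```python
def gross_yield_mg(volume_l: float, titer_mg_per_l: float) -> float:
    """Gross recombinant protein per run (mg), before any recovery losses."""
    return volume_l * titer_mg_per_l


def volume_needed_l(target_mg: float, titer_mg_per_l: float) -> float:
    """Culture volume (L) needed to reach a target gross protein mass."""
    return target_mg / titer_mg_per_l


def min_doses(gross_mg: float, dose_mg: float = 1.0) -> float:
    """Lower bound on doses per run, assuming a dose of at most dose_mg."""
    return gross_mg / dose_mg


# 10,000 L at 20 mg/L (upper end of the older 2-20 mg/L range)
legacy_mg = gross_yield_mg(10_000, 20)
# 1,000 L at 200 mg/L (lower end of the 200-2000 mg/L range)
high_titer_mg = gross_yield_mg(1_000, 200)

print(legacy_mg, high_titer_mg)            # same gross mass from 10x less volume
print(volume_needed_l(legacy_mg, 200))     # volume needed at the higher titer
print(min_doses(high_titer_mg))            # doses per run at <= 1 mg per dose
```

Under these assumptions a ten-fold increase in titer offsets a ten-fold reduction in fermenter volume, which is the proportionality the paragraph relies on.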

[00337] Another challenge in the development of recombinant gp120 derives from the fact that it is highly glycosylated and typically possesses 25-26 predicted N-linked glycosylation sites. Because each glycosylation site can be occupied by as many as 40 different glycans (Go, E.P., et al., J Virol, 2011. 85(16): p. 8270-84; Go, E.P., et al., Journal of proteome research, 2013. 12(3): p. 1223-1234), some with as many as 4 sialic acid residues, the net charge and biophysical properties of recombinant gp120 are highly variable. The variability in glycosylation makes it difficult to purify the recombinant protein and difficult to define its chemical structure. Moreover, since the pharmacokinetic and pharmacodynamic properties of glycoproteins such as gp120 are in large part determined by the sialic acid content, glycan variability represents a potential source of product variability (Sinclair, A.M. and S. Elliott, J Pharm Sci, 2005. 94(8): p. 1626-35). Disclosed herein is a solution to the problem of glycosylation heterogeneity. The solution involves the production of gp120 in a novel CHO cell line with a mutation in the Mgat1 gene (see Example 1). Production of recombinant gp120 in this cell line limits glycosylation primarily to mannose-5 and earlier structures in the N-linked glycosylation pathway. This approach considerably improves the homogeneity of the recombinant gp120 and simplifies the recovery process required to manufacture the protein. It also reduces "lot to lot" variation and should improve the consistency and biological activity of the protein. Finally, as described above, mannose-5 glycans are an essential feature of many epitopes recognized by broadly neutralizing antibodies. Thus the novel method for producing gp120 described in this report substantially improves the quality and biologic activity of recombinant gp120 while at the same time lowering the manufacturing costs compared to previous methods.

Materials and Methods

[00338] Broadly neutralizing human monoclonal antibodies. The following reagents were obtained through the NIH AIDS Reagent Program, Division of AIDS, NIAID, NIH: PG9 (Walker, L.M., et al., Science, 2009. 326(5950): p. 285-9), VRC01 (Wu, X., et al., Science, 2010. 329(5993): p. 856-861), PGT121, PGT128 (Walker, L.M., et al., Nature, 2011. 477(7365): p. 466-70), and 10-1074 (Shingai, M., et al., Nature, 2013. 503(7475): p. 277-280). PG16 was purchased from Polymun A.G. (Vienna, Austria). The antiviral compound CD4-IgG has been described previously (Ashkenazi, A., et al., Proc Natl Acad Sci U S A, 1991. 88(16): p. 7056-60; Capon, D.J., et al., Nature, 1989. 337(6207): p. 525-31) and was provided by GSID. All secondary polyclonal antibody conjugates were purchased from Jackson ImmunoResearch Laboratories (West Grove, PA).

[00339] Transfection of gp120 genes by electroporation. Mgat1- CHO is a novel cell line derived from the commercially available CHO-S cell line (Thermo Fisher, Life Technologies, Carlsbad, CA). The cell line possesses mutations that inactivate both copies of the Mannosyl (Alpha-1,3-)-Glycoprotein Beta-1,2-N-Acetylglucosaminyltransferase 1 gene (Mgat1). Cells with this mutation produce proteins where N-linked glycosylation is limited primarily to Man(5)GlcNAc(2) glycans, with a small percentage of glycans possessing earlier structures in the N-linked glycosylation pathway (e.g. mannose-8 and mannose-9) (Byrne et al 2017, manuscript in preparation). Cultures of Mgat1- cells were maintained in CD-CHO (Thermo Fisher, Life Technologies, Carlsbad, CA) supplemented with 8 mM GlutaMAX and 1X HT (Thermo Fisher, Life Technologies, Carlsbad, CA) at 37°C, with 8% CO2 and 85% humidity, rotating at 135 rpm in a Climo 1SF1X shaker (Kuhner, San Carlos, CA). Mgat1- cells were transfected with a linearized plasmid expression vector (UCSC1331) containing a chimeric gene directing the synthesis of a variant of the gp120 gene from the A244 isolate of HIV-1. The protein synthesized by this gene is termed A244ucsc-rgp120. This plasmid contains a gene encoding neomycin resistance, allowing selection in the antibiotic G418 (Southern, P.J. and P. Berg, J Mol Appl Genet, 1982. 1(4): p. 327-41). Also transfected was a plasmid directing the expression of dihydrofolate reductase (DHFR) that could be used as a selectable marker. Transfection of Mgat1- cells was accomplished by electroporation using the MaxCyte Scalable transfection system (MaxCyte Inc., Gaithersburg, MD) according to the manufacturer's protocol. Briefly, 120 μg of plasmid was mixed with 8 x 10^7 cells in a C400 cuvette in MaxCyte transfection buffer. After electroporation the cells were cultured for 24 hrs in 15 mL of non-selective CD OptiCHO media (Thermo Fisher, Life Technologies, Carlsbad, CA) supplemented with 2 mM GlutaMAX (Thermo Fisher, Life Technologies, Carlsbad, CA), 0.1% Pluronic (Thermo Fisher, Life Technologies, Carlsbad, CA), and 1X hypoxanthine/thymidine (Thermo Fisher, Life Technologies, Carlsbad, CA) in a 125 ml Corning flask (Thermo Fisher, Life Technologies, Carlsbad, CA) at 37°C, with 8% CO2 and 85% humidity, rotating at 135 rpm.

[00340] Seeding of transfected cells in semi-solid media. Twenty-four hours after electroporation, cells were counted and diluted to a concentration of 5 x 10^3/ml in 50 ml of semi-solid CHO-Growth A with L-glutamine (Molecular Devices, Sunnyvale, CA) containing 500 μg/ml of Geneticin (G418) (Thermo Fisher, Life Technologies, Carlsbad, CA), 2.5% New Zealand Fetal Bovine Serum (Thermo Fisher, Life Technologies, Carlsbad, CA), 100 ng/ml methotrexate (Sigma-Aldrich, St. Louis, MO), and 10 μg/ml Alexa 488-labeled, affinity-purified polyclonal antibody in 6-well plates (Greiner, Kremsmunster, Austria). The plates were incubated in static culture at 37°C with 8% CO2 and 85% humidity. Distinct colonies with a fluorescent halo were visible after 6 days, but robotic selection was performed after 16 days to allow for additional antibiotic selection.
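The dilution step in the seeding procedure follows the usual C1V1 = C2V2 relationship. The sketch below is a minimal illustration under stated assumptions: the target density (5 x 10^3 cells/ml) and final volume (50 ml) come from the text, while the counted stock density of 2 x 10^6 cells/ml is a hypothetical value chosen only to make the arithmetic concrete, and the function name is not from the source.

```python
def stock_volume_ml(stock_cells_per_ml: float,
                    target_cells_per_ml: float,
                    final_volume_ml: float) -> float:
    """Volume of counted cell suspension to add so that the final mixture
    reaches the target density (C1 * V1 = C2 * V2)."""
    if stock_cells_per_ml <= 0:
        raise ValueError("stock density must be positive")
    if target_cells_per_ml > stock_cells_per_ml:
        raise ValueError("stock is too dilute to reach the target density")
    return target_cells_per_ml * final_volume_ml / stock_cells_per_ml


# Target and final volume from the text; stock density is hypothetical.
v_stock = stock_volume_ml(stock_cells_per_ml=2e6,
                          target_cells_per_ml=5e3,
                          final_volume_ml=50.0)
total_cells = 5e3 * 50.0

print(f"add {v_stock:.3f} ml of cell suspension")
print(f"total cells seeded: {total_cells:.0f}")
```

The remainder of the 50 ml is semi-solid medium with the listed supplements; the sketch does not attempt to model how that volume is split across the 6-well plates.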

[00341] Isolation of single high producing clones. The ClonePix2 system (Molecular Devices, Sunnyvale, CA) was used to image colonies secreting A244ucsc-rgp120 into the semi-solid media. Colonies were imaged under white light and fluorescence (Lee, C, et al., Bioprocess International, 2006. 4(sup 3): p. 32-35). Both images were superimposed, and the colonies were sorted according to mean exterior fluorescent intensity. The top 0.1% were aspirated with micro-pins controlled by the ClonePix 2 system and dispersed automatically in a 96-well plate containing 100 μl of rescue media, consisting of XP CHO (Genetix, Molecular Devices, Sunnyvale, CA) and conditioned, 0.2 micron-filtered CD-CHO (Thermo Fisher, Life Technologies, Carlsbad, CA) at a 50:50 ratio, with 1x hypoxanthine and thymidine (HT) supplement (Thermo Fisher, Life Technologies, Carlsbad, CA), 1x Insulin/Transferrin/Selenium supplement (Thermo Fisher, Life Technologies, Carlsbad, CA), and a final concentration of 500 μg/ml Geneticin/G418 (Thermo Fisher, Life Technologies, Carlsbad, CA), and cultured at 37°C, with 8% CO2 and 85% humidity. After 5 days in culture, a further 100 μl of rescue media was added to each well. Cultures were assayed at day 9 to confirm rgp120 production, and positive colonies were transferred to 2 ml wells (37°C, 8% CO2 and 85% humidity). Supernatants from 2 ml wells were assayed for protein production by capture ELISA and western blot. Cells that were both at a viable density and positive for A244ucsc-rgp120 expression were transferred to 50 ml shaker tubes, then 125 ml shaker flasks, culturing at 37°C, with 8% CO2 and 85% humidity, rotating at 135 rpm in a Climo 1SF1X shaker (Kuhner, San Carlos, CA).
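The ranking step described above (sort colonies by mean exterior fluorescent intensity, then take the top 0.1%) can be sketched in a few lines. This is not the ClonePix 2 software's own logic or API; it is a hypothetical illustration of the selection rule, and the `Colony` record, its field names, and the synthetic intensities are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colony:
    colony_id: int
    mean_exterior_intensity: float  # halo fluorescence around the colony


def pick_top_fraction(colonies, fraction=0.001):
    """Return the brightest `fraction` of colonies, brightest first.

    fraction=0.001 corresponds to the top 0.1% described in the text.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_pick = max(1, round(len(colonies) * fraction))
    ranked = sorted(colonies,
                    key=lambda c: c.mean_exterior_intensity,
                    reverse=True)
    return ranked[:n_pick]


# Synthetic data: 45,000 colonies with made-up intensities.
colonies = [Colony(i, float(i)) for i in range(45_000)]
picked = pick_top_fraction(colonies)
print(len(picked), picked[0].colony_id, picked[-1].colony_id)
```

A ranking of this kind only nominates candidates; the picked clones still have to be confirmed by capture ELISA and western blot, as the paragraph describes.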

[00343] Batch fed culture expression. At day 56, clones selected for a larger scale batch fed protein production experiment were cultured in production media, that is, CD-OptiCHO (Thermo Fisher, Life Technologies, Carlsbad, CA) supplemented with 1 mM sodium butyrate, 2 mM GlutaMAX, 1X HT, and 0.1% Pluronic®, at 32°C, 8% CO2, 85% humidity, and a rotation speed of 135 rpm, at a starting density of 1 x 10^7 cells/ml, until the viability dropped below 50%. Cultures were fed daily with MaxCyte CHO A Feed, 0.5% Yeastolate (BD, Franklin Lakes, NJ), 2.5% CHO-CD Efficient Feed A, 0.25 mM GlutaMAX, 2 g/L Glucose (Sigma-Aldrich, St. Louis, MO). Supernatant was harvested by pelleting the cells at 250 g for 30 min, followed by pre-filtration through Nalgene™ Glass Pre-filters (Thermo Scientific, Waltham, MA) and 0.45 micron Nalgene SFCA filtration (Thermo Scientific, Waltham, MA), then stored frozen at -20°C before purification.

[00344] ELISA to measure A244ucsc-rgp120 production. An indirect capture ELISA was carried out as follows: 96-well Nunc MaxiSorb flat bottom plates (Thermo Fisher Scientific, Waltham, MA) were coated with 2 μg/ml of anti-gD flag antibody 34.1 in PBS. After blocking for 1 hr with 5% milk/PBS, recombinant protein from tissue culture supernatant was captured by overnight incubation at 4°C. The plates were washed four times with 0.05% Tween/PBS, and bound protein was detected using antigen specific anti-CRF01AE/MN rabbit polyclonal antibody (PB94) or the bNAb PG9, followed by either goat anti-rabbit or goat anti-human H and L chain affinity purified secondary Horse Radish Peroxidase (HRP) conjugated antibodies at a 1/5000 dilution in 5% milk/PBS, as appropriate (Jackson ImmunoResearch Laboratories, West Grove, PA). Control standards included three-fold serial dilutions of purified recombinant r-gD-gp120 proteins starting at 10 μg/ml. HRP was detected using o-Phenylenediamine dihydrochloride substrate (Thermo Fisher Scientific, Waltham, MA) following the manufacturer's instructions. Assays were stopped after 10 min of development with 3 M H2SO4, and read on a microtiter plate reader at a wavelength of 490 nm. Protein yield was quantified by serial dilution and interpolation from a standard curve prepared by serial dilution of purified A244ucsc-rgp120 HIV-1 produced by transient transfection of Mgat1- cells, and assayed at the same time.

[00345] ELISA to measure the binding of bNAbs. A direct ELISA format was used to measure the binding of monoclonal antibodies to purified A244ucsc-rgp120. The assay was carried out on 96-well Nunc MaxiSorb flat bottom plates (Thermo Fisher Scientific, Waltham, MA) coated overnight with 2 μg/ml of A244ucsc-rgp120 HIV-1 or A244GNE-rgp120 protein in PBS. Plates were blocked with 5% milk/PBS for 1 hr and washed four times with 0.05% Tween/PBS, and bNAbs, three-fold serially diluted in blocking buffer, were added for 1 hr. The plates were washed four times with 0.05% Tween/PBS, and specific bNAb binding was detected using a goat anti-human L and H chain HRP conjugated secondary antibody as previously described. Data were plotted and analyzed using Prism version 6.00 for Mac (GraphPad Software, La Jolla, CA, www.graphpad.com).
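The quantification step in the indirect capture ELISA (three-fold serial dilutions of a standard starting at 10 μg/ml, then interpolation of sample readings from the standard curve) can be sketched as follows. This is a simplified illustration, not the analysis used in the study: it interpolates linearly in log10(concentration) between the two bracketing standards rather than fitting a sigmoidal curve, and the optical density values and function names are hypothetical.

```python
import math


def serial_dilution(start: float, fold: float, n: int) -> list:
    """Concentrations of an n-point serial dilution, highest first."""
    return [start / fold ** i for i in range(n)]


def interpolate_conc(od: float, std_conc: list, std_od: list) -> float:
    """Concentration for a reading, interpolated between the two
    bracketing standards on a log10(concentration) scale."""
    pairs = sorted(zip(std_od, std_conc))
    for (od_lo, c_lo), (od_hi, c_hi) in zip(pairs, pairs[1:]):
        if od_lo <= od <= od_hi and od_hi > od_lo:
            t = (od - od_lo) / (od_hi - od_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("reading outside the standard curve; re-assay at another dilution")


def sample_titer(od: float, dilution_factor: float,
                 std_conc: list, std_od: list) -> float:
    """Concentration in the undiluted supernatant."""
    return interpolate_conc(od, std_conc, std_od) * dilution_factor


# Standards per the text: three-fold dilutions starting at 10 ug/ml.
std_conc = serial_dilution(10.0, 3.0, 8)
# Hypothetical A490 readings for those standards.
std_od = [2.4, 1.9, 1.2, 0.6, 0.28, 0.12, 0.06, 0.03]

print(sample_titer(0.9, dilution_factor=27, std_conc=std_conc, std_od=std_od))
```

Readings that fall outside the range of the standards are rejected rather than extrapolated, which is why samples are assayed at several dilutions.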

[00346] Western blot to detect antibody binding to gp120 produced by different clones. Growth conditioned cell culture supernatants (1-10 μl) or 50 ng of purified proteins were aliquoted and treated with SDS-PAGE sample buffer with or without reduction by dithiothreitol (DTT). The specimens were fractionated on a 4-12% NuPage PAGE SDS gel in MES buffer (Thermo Scientific, Waltham, MA). Protein was transferred to a PVDF membrane using the iBlot 2® Dry Blotting System (Thermo Fisher, Life Technologies, Carlsbad, CA). The membrane was blocked for 1 hr in 5% milk/PBS, then probed with polyclonal rabbit anti-A244/MNGNE antibody at 1 μg/ml overnight at 4°C, washed three times for 10 min, each wash using 100 ml of 0.05% Tween/PBS, then probed with an affinity purified secondary HRP conjugated goat anti-rabbit H+L chain antibody (Jackson ImmunoResearch Laboratories, West Grove, PA) for 1 hr at room temperature. After a final wash (three times) with 0.05% Tween/PBS, the membrane was developed using the WesternBright ECL kit (Advansta, Menlo Park, CA) and visualized using an Innotech FluoChem2 system (Genetic Technologies, Grover, MO).

[00347] Immunoaffinity purification of A244ucsc-rgp120. The A244ucsc-rgp120 proteins from individual clones were immunoaffinity purified using the gD purification tag as described previously (Lasky, L.A., et al., Science, 1986. 233(4760): p. 209-12; Smith, D.H., et al., PLoS One, 2010. 5(8): p. e12076). Briefly, 5 ml of cell culture medium was applied to a column of controlled pore glass coupled to an anti-gD flag monoclonal antibody. The column was washed with 10 column volumes of 50 mM Tris, 0.5 M NaCl, 0.1 M TMAC (tetramethylammonium chloride) buffer (pH 7.4), and eluted with 0.1 M sodium acetate buffer, pH 3.0. The pH of the eluate was neutralized by the addition of 1.0 M Tris (1:10 ratio) and the resulting solution was concentrated using an AMICON molecular weight cutoff centrifuge tube (Millipore, Billerica, MA). The purified protein was adjusted to a final concentration of 1-2 mg/mL in PBS buffer. Protein concentrations were determined using the bicinchoninic acid (BCA) assay.

[00348] bNAb binding to gp120s. The binding of bNAbs to gp120 proteins was assayed using a capture Fluorescence Immunoassay (FIA). Briefly, 2 μg/mL of anti-gD tag monoclonal antibody, 34.1, was diluted into PBS and incubated at 4°C overnight in 96-well black microtiter plates (Greiner Bio-One, USA). Plates were blocked in PBS containing 1% BSA, 0.05% normal goat serum, and 0.01% thimerosal for two hours at room temperature. Wells were incubated with 60 μL of blocking solution containing 6 μg/mL of purified rgp120 overnight at 4°C. Three-fold serial dilutions of primary antibody were added starting at 10 μg/mL, followed by incubation with a 1:3,000 dilution of goat anti-human or donkey anti-goat AlexaFluor 488 conjugated polyclonal antibody (Jackson ImmunoResearch Laboratories, West Grove, PA; Life Technologies, Carlsbad, CA). All dilutions were performed in a solution of PBS containing 1% BSA with 0.05% normal goat serum and 0.01% thimerosal, and incubations were carried out for 90 min at room temperature followed by a 4x wash in PBST buffer unless otherwise noted. Fluorescence was read using an EnVision Multilabel Plate Reader (PerkinElmer, Inc., Waltham, MA) with a FITC 535 emission filter and a FITC 485 excitation filter. Each assay was performed in duplicate and results were reported as the half maximal effective concentration (EC50), or the concentration of antibody required for half of the maximal binding readout. Polyclonal goat sera against the full-length gp120 and a human isotype control were used as coating and negative controls, respectively.
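The EC50 defined above (the antibody concentration giving half of the maximal binding readout) can be estimated from a titration in several ways. The sketch below uses the simplest one, interpolating on a log10(concentration) scale between the two titration points that bracket half of the highest observed signal. It is an illustrative assumption, not the analysis reported here, which would more typically involve a fitted sigmoidal dose-response curve; the signal values and the function name are hypothetical, and the approach presumes the titration actually reaches a plateau.

```python
import math


def ec50_half_max(conc: list, signal: list) -> float:
    """Estimate EC50 as the concentration at which the signal crosses
    half of the maximum observed signal, interpolating in log10(conc)."""
    pairs = sorted(zip(conc, signal))          # ascending concentration
    half = max(signal) / 2.0
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pairs, pairs[1:]):
        if s_lo <= half <= s_hi and s_hi > s_lo:
            t = (half - s_lo) / (s_hi - s_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("half-maximal signal is not bracketed by the titration")


# Three-fold dilutions starting at 10 ug/mL, as in the assay description.
conc = [10.0 / 3 ** i for i in range(6)]
# Hypothetical fluorescence readings (arbitrary units), highest conc first.
signal = [1000.0, 950.0, 800.0, 400.0, 150.0, 50.0]

print(f"EC50 ~ {ec50_half_max(conc, signal):.3f} ug/mL")
```

A lower EC50 indicates that less antibody is needed to reach half-maximal binding, which is how enhanced bNAb binding to one antigen relative to another would show up in this readout.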

Results

[00349] Colony selection. The timeline for production of clones expressing A244ucsc-rgp120/HIV-1 is shown in Figure 19. A total of 8 x 10^7 Mgat1- cells were transfected with the expression plasmid UCSC1331 (UCSC_CHO.A244N332) by electroporation. Transfection of CHO-S cells using the MaxCyte electroporation system is highly efficient (> 88% expression of GFP by FACS at 48 hr) (Figure 20). Just six days after setting up Mgat1-/UCSC1331 electroporated cells with gp120-specific, immunoaffinity-purified, Alexa 488-labeled polyclonal antibody, precipitin halos were visible under fluorescent light around a small percentage of colonies in each 6-well plate (Figure 21). After 16 days in selective media, cells transiently expressing protein had died or were dying off, and thriving colonies of cells expressing antibiotic resistance were clearly visible by white light (Figures 22A-22E). A total of 45,000 colonies from four 6-well plates were screened using the ClonePix 2, and of these, approximately 0.1% were picked and transferred into 96-well plates.

[00350] Forty-three of the selected colonies grew and actively secreted A244-rgp120 HIV-1. The highest mean external fluorescent intensity and final clone selection did not completely correlate. Only fifteen out of the forty-three positive clones selected secreted a protein that bound both polyclonal anti-gD gp120 (PB94) and the bNAb PG9 in ELISA. In general, with the exception of a single clone (5C), clones with the highest level of mean external intensity at pick did not bind PG9, and did not survive transfer from 96-well plate to 2 ml wells. After 31 days in culture, secretion of A244ucsc-rgp120 HIV-1 by 14 of the 15 PG9/PB94 positive clones was confirmed by western blot (Figures 23A and 23B) with an antigen specific polyclonal serum. Individual A244ucsc-rgp120 HIV-1 clones had slightly different growth characteristics, and some had a particular tendency to form large clumps in suspension. Clones were cryopreserved and the most promising ones were carried forward.
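The attrition through the selection steps reported above can be summarized as a simple funnel. The sketch below uses only the counts stated in the text (45,000 colonies screened, 43 clones that grew and secreted protein, 15 that bound both PB94 and PG9, and 14 confirmed by western blot) and reports each stage as a fraction of the previous stage and of the starting pool. The stage labels and the function name are illustrative, and the number of colonies actually picked is left out because the text gives it only as approximately 0.1%.

```python
def funnel(stages):
    """For each (label, count) stage, return
    (label, count, fraction of previous stage, fraction of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


stages = [
    ("colonies screened", 45_000),
    ("grew and secreted rgp120", 43),
    ("bound both PB94 and PG9", 15),
    ("confirmed by western blot", 14),
]

rows = funnel(stages)
for label, count, vs_prev, vs_start in rows:
    print(f"{label:28s} {count:6d}  {vs_prev:8.4f}  {vs_start:.6f}")
```

On these figures roughly 3 in 10,000 screened colonies yielded a confirmed clone, which illustrates why an automated screen of tens of thousands of colonies is needed to find them.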

[00351] Batch fed culture expression. Two months after the initial transfection, six Mgat1- A244_N332-rgp120 clones selected for optimal protein expression were assayed for protein production. Each clone was expanded to a 600 ml fed batch culture with a 1 x 10^7 cells/ml seed. Flasks were cultured in the presence of 1 mM sodium butyrate at 32°C, 135 rpm, with 8% CO2 and 85% humidity until the viability dropped below 50%. Protein accumulation was detectable in daily 10 μl samples of cell supernatant by SDS-PAGE (Figures 24A and 24B). By day 5, recombinant gp120 was the principal protein in the tissue culture supernatant. Measurement of protein production by indirect ELISA of supernatant, together with the raw three-fold dilution data for the six clones, demonstrated rapid protein accumulation (Figures 25A-25F). Clone 5C survived only 3 days and clone 5F only 5 days. All of the other clones were stable for 10-11 days in culture with daily feeding.

[00352] Protein recovery and bNAb binding. The A244ucsc-rgp120 proteins from different clones were immunoaffinity purified using the gD purification tag as described. A western blot using polyclonal anti-gp120 sera determined that there was minimal proteolysis or aggregation of the affinity purified proteins (Figures 26A and 26B). At least three clones (3E, 3D, and 5F) produced more than 200 mg/L of affinity purified protein. The protein produced by individual clones was assayed for binding to glycan dependent and glycan independent bNAbs (Figures 27A-27H). There was little or no difference in bNAb binding between gp120s recovered from different Mgat1- A244ucsc-rgp120 clones and protein isolated following transient protein production. The proteins all behaved in a similar manner by ELISA: all bound to the bNAbs PG9, PGT128, and VRC01 and to CD4-IgG, but not to PG16. In additional experiments (Figures 28A-28J), the antigenicity of A244_N332-rgp120 produced in the Mgat1- cell line was compared to that of A244-rgp120 produced in normal DG44 CHO cells and used in the RV144 clinical trial. These studies showed markedly enhanced binding of glycan dependent bNAbs (PG9, PGT128, CH01, PGT126, CH03, and 10-1074) to A244_N332-rgp120 expressed in Mgat1- CHO cells compared to A244-rgp120 expressed in normal DG44 CHO cells. Neither gp120 was able to bind the glycan dependent antibodies PGT121 and PGT122. Surprisingly, the protein produced in Mgat1- cells also exhibited enhanced binding of VRC01, an antibody that recognizes a glycan independent epitope that overlaps the CD4 binding site. Thus the incorporation of smaller high mannose structures appears to enhance the binding of antibodies to glycan dependent epitopes, perhaps by minimizing steric hindrance.

[00353] Cryopreservation of cells and pathogen testing. A master cell bank of cryopreserved cells was created from the 5F clone that secreted the highest levels of A244-N332-rgp120. Vials containing 1 x 10^7 cells were transferred to the ATCC for archival storage and distribution. Cells from this bank were also transferred to the IDEXX commercial cell line testing facility in Columbia, Missouri. These were tested for contamination by other cell lines (e.g. HeLa and 293), mycoplasma, and a large panel of human and animal viruses such as minute virus of mice (MVM). The results of these assays are provided in Berman Lab Technical Report TR-01-17.

[00354] Preliminary data indicate that clone 5F is stable for at least 90 days, as cells were still expressing >200 mg/L protein as measured by ELISA/FIA assay.

Discussion

[00355] This report describes the development of an improved method for the construction of stable CHO cell lines producing an improved variant of recombinant gp120 for use as a candidate HIV vaccine immunogen. The improved method for generating stable cell lines depended on the development of methods, reagents, and procedures that allow selection of rare high producing cell lines by robotic selection using the ClonePix 2 robot (Molecular Devices, Sunnyvale, CA). This protocol and the MaxCyte electroporation device allow the screening of at least 45,000 transfected CHO cells in a single day, a task that would take many months if cell lines were picked by conventional approaches such as manual selection. A major unexpected finding from these experiments was that it was not necessary to employ standard methods of gene amplification based on co-expression of dihydrofolate reductase (dhfr) or glutamine synthetase (GS) transgenes. Eliminating this step saves further months, if not years, in the identification of a high producing cell line. The results suggest that the disclosed screening method involving the ClonePix 2 can identify extremely rare high producing cell lines with protein yields in excess of 200 mg/L. These yields are comparable to those of cells selected using conventional techniques but are achieved in a fraction of the time (i.e. 2-3 months compared to 12-24 months).

[00356] Besides improving protein yield, another major goal of this project was to improve the antigenic structure of the A244-rgp120 protein thought to be the principal immunogen responsible for protection in the RV144 clinical trial (Rerks-Ngarm, S., et al., N Engl J Med, 2009. 361(23): p. 2209-20). The A244ucsc-rgp120 described in this report appears to represent an improved form of the original immunogen. Several studies have shown that the type and location of N-linked glycosylation sites are major determinants of antigenic structure and the binding of bNAbs. The A244ucsc-rgp120 produced in the Mgat1- cell line is the first gp120 produced under conditions suitable for biopharmaceutical production that is able to take advantage of both aspects of envelope structure. Although A244-rgp120 is unusual in its ability to bind several bNAbs such as PG9, PGT128, and 10-1074, the binding to these sites is enhanced by production in the Mgat1- cell line, which restricts glycosylation primarily to mannose-5 structures. The fact that glycans are not completely limited to mannose-5, and that approximately 20% of the glycans are mannose-9, is an unexpected benefit in that mannose-9 is preferred by PGT128. A further enhancement in binding is attributable to relocating the predicted N-linked glycosylation site at N334 in the wild-type A244-rgp120 protein to N332 in the A244_N332-rgp120 protein. The N332 glycan has been reported to be essential for a number of bNAbs (Walker, L.M., et al., Science, 2009. 326(5950): p. 285-9; Shingai, M., et al., Nature, 2013. 503(7475): p. 277-280).

[00357] Finally, an unanticipated benefit of gp120 production in the Mgat1- cell line is that the protein is far more homogeneous in glycan content and net charge than gp120s produced in normal cell lines. Historically, controlling glycosylation has been difficult in commercial manufacturing, and different fermentations often yield proteins with different glycan content. Differences in glycan content can affect recovery yields, pharmacokinetic half-life, biodistribution, and product immunogenicity (Sinclair, A.M. and S. Elliott, J Pharm Sci, 2005. 94(8): p. 1626-35; Sola, R.J. and K. Griebenow, BioDrugs, 2010. 24(1): p. 9-21). Variation in any one of these properties can alter the potency and biologic efficacy of protein biopharmaceuticals and affect regulatory approval. The improvement in glycan homogeneity in the A244-N332-rgp120 produced in the Mgat1- cell line allows for a simpler, more productive purification process and provides for improved manufacturing reproducibility and more consistent biologic activity. It is anticipated that these improvements in the location and structure of N-linked glycosylation sites will enhance the efficacy of the gp120 vaccine used in the RV144 trial from a level of ~31% to the vaccine efficacy of 50% or greater thought to be required for regulatory approval and clinical deployment.

[00358] Figure 19. Diagram of method for rapid production of cell lines expressing recombinant gp120.

[00359] Figure 20. GFP expression after MaxCyte STX electroporation of CHO-S cells. At 48 hr, 88.7% of live cells were expressing GFP. Gate P1, all cells; Gate P2, live cells; Gate P3, live cells expressing GFP.

[00360] Figure 21. White light and fluorescent images from a single well of UCSC_CHO.A244N332 transfected cells on the ClonePix 2. Images were captured 6 days (16-32 cells per colony) after plating in semi-solid selective matrix, in the presence of Alexa488 labeled affinity-purified polyclonal antibody. Antigen specific precipitin rings (or halos) are visible around a proportion of colonies under G418 selection.

[00361] Figures 22A-22E. ClonePix 2 clone images at Day 16. Figure 22A: Day 16, a single 35 mm well of UCSC_CHO.A244N332 transfected colonies illuminated by white light alone. Figure 22B: the same well as in Figure 22A but FITC imaged. Figure 22C: the superimposition of white light and FITC images reveals the "halo" area outside the colony where secreted antigen interacts with FITC labeled antibody in the matrix. Mean fluorescence intensity is calculated from these images by the ClonePix 2. Figure 22D: six colonies picked on Day 16 and expanded. Expression was tested at Day 24, Day 31, Day 56, and Day 90. Top row, colonies visualized with white light; bottom row, with FITC. Figure 22E: Clone 5F recloned (from early passage cryopreserved cells) at 25 cells/ml. Left panel, white light; right panel, FITC.

[00362] Figures 23A-23B. Expression of proteins in 2 ml wells (Day 31). Figure 23A: Western blot of tissue culture supernatant from 2 ml wells (not controlled for cell density or viability). 10 μl of supernatant from 4-5 days of growth (<5E+05 cells/ml) was reduced in DTT, electrophoresed on a 4-12% SDS/PAGE gel, transferred to a PVDF membrane, and probed with an antigen specific polyclonal rabbit serum. Bound antibody was detected with a goat anti-rabbit HRP conjugate. rgp120 A244GNE produced by transient transfection of CHO-S cells (682) and transiently expressed A244GNE (lot 767) are included as size markers. Figure 23B: Indirect ELISA quantification of rgp120 A244N332. Supernatants were captured by anti-gD (34.1 A64, 2 μg/ml). Bound antigen was detected with 1 μg/ml polyclonal rabbit sera followed by a goat anti-rabbit HRP conjugate at a 1/5000 dilution. Protein concentration was determined by serial dilution of cell supernatant followed by interpolation from a standard curve using GraphPad Prism version 6.00 for Mac (GraphPad Software, La Jolla, California, USA).
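The interpolation step described for Figure 23B can be illustrated with a minimal sketch. The source used GraphPad Prism; the code below is not that software's method but a simple log-linear interpolation between bracketing standards, swapped in for illustration. The function names, the standard concentrations, the optical density values, and the dilution factor are all hypothetical.

```python
import math

def interpolate_concentration(od, standards):
    """Estimate concentration for an optical density (OD) reading by
    log-linear interpolation between the two bracketing standards.

    standards: list of (concentration, od) pairs (hypothetical values).
    """
    pts = sorted(standards, key=lambda p: p[1])  # order by OD
    for (c_lo, od_lo), (c_hi, od_hi) in zip(pts, pts[1:]):
        if od_lo <= od <= od_hi:
            frac = (od - od_lo) / (od_hi - od_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("OD outside the range of the standard curve")

def supernatant_concentration(od, dilution_factor, standards):
    """Back-calculate the undiluted concentration from a diluted well."""
    return interpolate_concentration(od, standards) * dilution_factor

# Hypothetical standard curve: (concentration in ng/ml, OD reading)
STANDARDS = [(10.0, 0.10), (100.0, 0.60), (1000.0, 1.80)]

# A well read at OD 0.35 after a hypothetical 1:27 dilution of supernatant
estimate = supernatant_concentration(0.35, 27, STANDARDS)
```

Only readings that fall inside the range of the standards are interpolated; a reading outside that range raises an error rather than being extrapolated, which is why a serial dilution of each supernatant is needed.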

[00363] Figures 24A and 24B. Batch fed culture expression of Clone 5F: accumulation of rgp120 during a 600 ml protein expression trial culture. Figure 24A: 10 μl of DTT reduced tissue culture supernatant (days 0-5) was loaded per lane of a 4-12% Bis-Tris/MES buffer SDS/PAGE gel and stained with Coomassie blue. Figure 24B: 1 μl of DTT reduced tissue culture supernatant (days 0-5) was loaded per lane of a 4-12% Bis-Tris/MES buffer SDS/PAGE gel and western blotted with an antigen specific polyclonal rabbit serum. Bound antibody was detected with a goat anti-rabbit HRP conjugate. 100 ng of DTT treated purified MGAT CHO-S gDA244 N332 is included as a control on each gel.

[00364] Figures 25A-25F. Batch fed culture expression. Indirect ELISA showing raw dilution data of tissue culture supernatant collected during the course of a 600 ml batch fed protein expression assay. Wells were coated with 34.1 (A64) at 2 μg/ml for indirect capture of serial dilutions of supernatant containing gp120. Bound protein was detected using an antigen specific polyclonal rabbit serum (PB94) and an anti-rabbit HRP conjugate (Jackson ImmunoResearch, West Grove, PA).

[00365] Figures 26A and 26B. Protein yield. Figure 26A: Yield from 600 ml batch fed cultures before and after purification by immunoaffinity capture. Pre-purification yield was determined by indirect ELISA capture (anti-gD, 34.1 A64, 2 μg/ml) followed by detection of bound antigen by polyclonal rabbit anti-gp120 (PB94) and goat anti-rabbit HRP. Figure 26B: Western blot of protein purified by affinity chromatography from 600 ml batch fed cultures. 50 ng of each non-reduced or DTT reduced protein was loaded per lane (with the exception of 3E) of a 4-12% PAGE/SDS MES buffer gel. Protein 692, A244-rgp120 produced in DG44 cells, and protein lot 767, a transiently produced A244ucsc, were included as controls. Recombinant gp120 was detected using an antigen specific polyclonal rabbit serum (PB94) and an anti-rabbit HRP conjugate.

[00366] Figures 27A-27H. Direct binding of purified MGAT gp120 HIV-1 proteins to bNAbs. Nunc Immulon 96 well plates were coated overnight in PBS with 2 μg/ml of affinity purified protein from the stable lines (UCSC protein batches 782-787) and a transiently produced protein (767). After blocking for 1 hr at room temperature, 3-fold serial dilutions of antibody were made in 5% milk/PBS and incubated directly with the protein coated wells for 1 hr at room temperature. Bound antibody was detected by incubation with a 1/5000 dilution of rabbit anti-human HRP conjugate (Jackson ImmunoResearch, West Grove, PA) and developed with o-phenylenediamine dihydrochloride (OPD) (Thermo Fisher, Waltham, MA) according to the manufacturer's protocol. The reaction was stopped after 10 minutes using H2SO4 and the plates were read on a Maxisorb plate reader at 490 nm.

[00367] Figures 28A-28J. Comparison of bNAb binding to A244GNE-rgp120 produced in normal CHO cells and used in the RV144 trial, and improved A244-N332-rgp120 produced in Mgat1- cells. Recombinant gp120s were captured onto the surface of microtiter plates coated with a monoclonal antibody (34.1) to the gD purification tag present at the N-terminus of both proteins. Three-fold serial dilutions of primary antibody were added starting at 10 μg/mL, followed by incubation with a 1:3,000 dilution of goat-anti-human or donkey-anti-goat AlexaFluor 488 conjugated polyclonal antibody (Jackson ImmunoResearch Laboratories, West Grove, PA; Life Technologies, Carlsbad, CA).
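As a worked illustration of the three-fold serial dilution described for the primary antibody, the following sketch lists the concentrations in such a series. The starting concentration of 10 μg/mL and the three-fold step come from the figure legend; the number of dilution points (eight) and the function name are assumptions made for the example, since the source does not state how many dilutions were used.

```python
def serial_dilution(start, fold, n_points):
    """Return the concentrations in a serial dilution series.

    start:    starting concentration (here in ug/mL)
    fold:     dilution factor applied at each step (3 for three-fold)
    n_points: number of wells in the series (assumed, not from the source)
    """
    if start <= 0 or fold <= 1 or n_points < 1:
        raise ValueError("start must be > 0, fold > 1, n_points >= 1")
    return [start / fold ** i for i in range(n_points)]

# Three-fold series starting at 10 ug/mL, eight points (hypothetical count)
series = serial_dilution(10.0, 3, 8)
for well, conc in enumerate(series, start=1):
    print(f"well {well}: {conc:.4f} ug/mL")
```

Each well holds one third of the concentration of the previous one, so eight wells span roughly a 2,000-fold range.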

Example 5

Purification of Recombinant gp120 Produced in an Mgat1- Cell Line

[00368] It is disclosed herein that recombinant gp120 (A244-N332-rgp120) produced in Mgat1- cells, incorporating primarily mannose-5 glycans, is highly homogeneous in net charge and can be purified by conventional, cost-effective ion-exchange and size exclusion column chromatography. It is known that gp120 expressed in normal CHO cells incorporates highly heterogeneous, sialic acid containing glycans and cannot be efficiently purified using conventional column chromatography. The variation in sialic acid content in gp120 produced in normal CHO cell lines resulted in heterogeneity in net charge and other biophysical properties that prevented efficient purification by standard methods without a substantial loss in yield (e.g. 30-60%). As a consequence, most commercial scale recovery processes designed to purify gp120 involved the use of expensive affinity resins prepared from monoclonal antibodies or lectins (e.g. GNA) to recover the gp120 containing complex glycosylation. This affinity purification step added considerable time and expense related to the need to manufacture antibodies and lectins by processes compliant with current Good Manufacturing Practices (cGMP). It is disclosed herein that conventional methods of protein purification, suitable for biopharmaceutical manufacturing, can be used to efficiently purify A244-N332-rgp120, resulting in a final product with high yield (>90%) and high purity.

[00369] Historically, the development of HIV envelope proteins (e.g. gp120 and gp140) for use as vaccines has been limited by the fact that they are poorly expressed in conventional mammalian cell culture expression systems. Thus many investigators have reported expression levels in the 2-20 mg/L range, whereas yields for other recombinant proteins often exceed 50 mg/L and, as in the case of antibodies, can reach the 0.5 to 5 g/L range. Moreover, recombinant gp120s are difficult to purify because they are highly heterogeneous owing to the presence of approximately 26 N-linked glycosylation sites (Leonard, C.K., et al., J Biol Chem, 1990. 265(18): p. 10373-82). Many of these contain anywhere from one to four residues of sialic acid, leading to unusually large variation in net charge. When gp120 is expressed in normal mammalian cell lines (e.g. CHO or 293HEK), as many as 40 different glycan structures have been described for a single site (Go, E.P., et al., Journal of proteome research, 2013. 12(3): p. 1223-1234). The heterogeneity in glycosylation results in considerable heterogeneity in net charge (Figure 29), with 20-40 discrete bands typically visible on 2-dimensional isoelectric focusing gels (Yu, B., et al., PLoS One, 2012. 7(8): p. e43903). A consequence of the heterogeneity in net charge is that it has been difficult to purify recombinant HIV glycoproteins by standard, cost-effective chromatographic methods that can be used for biopharmaceutical production. To circumvent the dual problems of low yields and high heterogeneity in molecular mass and net charge, most approaches to purifying recombinant HIV envelope proteins rely on an affinity chromatography step that uses a monoclonal antibody (Yu, B., et al., PLoS One, 2012. 7(8): p. e43903; Lasky, L.A., et al., Science, 1986. 233(4760): p. 209-12) or a lectin (Srivastava, I.K., et al., J Virol, 2002. 76(6): p. 2835-47; Sellhorn, G., et al., Journal of virology, 2012. 86(1): p. 128-142; Arthos, J., et al., Nat Immunol, 2008. 9(3): p. 301-9). Either type of affinity column adds steps to the purification process and requires expensive custom reagents that must be produced and tested under validated current Good Manufacturing Practices (cGMPs). For example, the preparation of an antibody or lectin affinity column for the large scale (2,000-10,000 L) production of gp120 can easily cost hundreds of thousands to millions of dollars and requires extensive quality control and validation to define its ligand binding capacity, cleaning and elution procedures, antibody leaching into the final product, and the number of times it can be used before it needs to be replaced. Moreover, the proteins recovered from the affinity purification step are still heterogeneous with respect to glycosylation and need additional purification (polishing) and virus inactivation steps by standard methods such as ion-exchange chromatography (IEX), size exclusion chromatography (SEC), and tangential flow filtration (TFF) before they can be vialed and used as a vaccine. As a consequence of this multi-step process, there is typically considerable loss of material, often 30-50%. Additionally, the heterogeneous glycosylation in the conventionally purified proteins results in heterogeneity at critical epitopes recognized by glycan dependent monoclonal antibodies. Many of the most potent and broadly neutralizing antibodies to HIV-1 (e.g. PG9, PGT128, PGT121, and 10-1074) recognize glycan dependent epitopes. Uniformity in epitopes recognized by bNAbs may be a key factor in defining vaccine potency and efficacy. Production of vaccines in the Mgat1- cell line described in this report is currently the only scalable method to produce recombinant envelope proteins that primarily contain the mannose-5 glycans required for the binding of multiple bNAbs.

Materials and Methods

[00370] Growth conditioned cell culture medium containing A244_N332-rgp120. The stable 5F clone of the Mgat1- CHO cell line transfected with the gene encoding A244-N332-rgp120 was grown in a 1.6 L shake flask in serum free CD-OptiCHO growth medium (Gibco, Thermo Fisher) at 37°C. After the culture reached a density of 1 x 10^7 cells/mL, sodium butyrate was added (1 mM) and the temperature was shifted to 32°C. Once cell viability dropped to 50% (day 5), the growth conditioned cell culture medium was harvested by centrifugation, vacuum filtered through a 0.45 μm SCFS membrane, and stored frozen at -20°C.

[00371] Purification of gp120 by column chromatography. After thawing, the gp120 was recovered by column chromatography according to the process described in Figure 30.

[00372] Purification by affinity chromatography. After thawing, the gp120 was recovered by immunoaffinity chromatography and size exclusion chromatography according to the process described in Figure 32.

[00373] Carbohydrate content. After purification, the carbohydrate content of A244_N332-rgp120 was determined by MALDI-TOF mass spectrometry by Dr. Parastoo Azadi of the Complex Carbohydrate Research Center (University of Georgia, Athens, GA).

Results

[00374] Comparison of A244_N332-rgp120 purified by immunoaffinity chromatography and by conventional ion exchange chromatography. Experiments were carried out to determine whether A244_N332-rgp120 produced in the Mgat1- cell line could be purified by a practical, high yielding recovery process suitable for biopharmaceutical production. These experiments involved screening different chromatography resins and different conditions for adsorption and elution (data not shown). The recovery process described in Figure 30 represents the final method developed in this study. When analyzed by SDS-PAGE, the resulting gp120 (Figure 31) possessed physical properties closely resembling those of A244-N332-rgp120 protein purified by a process requiring immunoaffinity chromatography that was developed at Genentech in the early 1990s (Figure 32). This immunoaffinity process (Figure 32) was similar to that used in the large scale production of HIV vaccine for multiple clinical trials, including the 16,000 person RV144 trial (Rerks-Ngarm, S., et al., N Engl J Med, 2009. 361(23): p. 2209-20).

[00375] To compare the efficiency of purification by both processes, side-by-side purifications were carried out with the same starting material. The results of this study (Figure 33) showed that both recovery processes resulted in protein of comparable purity with yields of approximately 90%. Thus comparable results could be obtained by both processes; however, the conventional process is more economical to run and does not involve the use of custom made monoclonal antibodies. Another potential advantage of the conventional process is that it eliminates the low pH elution step required for the affinity process. Although there is no direct evidence from these studies that the low pH step harms protein structure, low pH treatment often results in conformational changes that lower the potency of treated proteins and hence is usually avoided.
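The yield comparison above reduces to simple arithmetic, sketched below under stated assumptions. The masses are hypothetical and chosen only so that each process returns roughly the 90% yield reported in the text; they are not data from Figure 33. The second function shows how per-step yields multiply in a multi-step recovery process, which is why adding steps tends to increase overall loss.

```python
def percent_recovery(recovered_mg, starting_mg):
    """Percent of the starting protein mass recovered after purification."""
    if starting_mg <= 0:
        raise ValueError("starting_mg must be positive")
    return 100.0 * recovered_mg / starting_mg

def overall_yield(step_yields):
    """Overall fractional yield of a multi-step process.

    step_yields: fractional yield of each step (e.g. 0.9 for 90%).
    """
    total = 1.0
    for y in step_yields:
        total *= y
    return total

# Hypothetical side-by-side comparison from the same starting material
starting_mg = 120.0
conventional = percent_recovery(108.0, starting_mg)    # column chromatography only
immunoaffinity = percent_recovery(109.0, starting_mg)  # immunoaffinity step included

# Hypothetical three-step process with per-step yields of 90%, 85%, and 80%
multi_step = overall_yield([0.90, 0.85, 0.80])
```

With these illustrative numbers, three steps that each look efficient in isolation still leave only about 61% of the starting material.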

[00376] Carbohydrate analysis. Finally, the glycosylation on the A244-rgp120 protein was characterized by mass spectrometry and compared to the glycosylation present on two gp120 proteins (TV1 and 1086) currently being tested in clinical trials in Africa. It can be seen that the N-linked glycosylation present on the gp120 made in the Mgat1- cell line is predominantly mannose-5, with small amounts of mannose-8 and mannose-9, whereas the glycans present on the gp120s made in normal CHO cells consist of a broad spectrum of high mannose and sialic acid containing glycans.

[00377] In summary, these results confirm that recombinant HIV-1 envelope proteins (e.g. A244_N332-rgp120) produced in the Mgat1- cell line are homogeneous and can be purified by conventional column chromatography without significant loss of material during recovery.

[00378] A diagram of gp120 from the IIIB strain of HIV-1 showing the location of N-linked glycosylation sites is published in Leonard et al., 1990 (Leonard, C.K., et al., Assignment of intrachain disulfide bonds and characterization of potential glycosylation sites of the type 1 recombinant human immunodeficiency virus envelope glycoprotein (gp120) expressed in Chinese hamster ovary cells. J Biol Chem, 1990. 265(18): p. 10373-82).

[00379] Figures 29A-29F. Data from 2-dimensional isoelectric focusing gel analysis of MN-rgp120 produced in CHO and 293 HEK cells. The data show the heterogeneity of net charge in proteins purified by immunoaffinity chromatography (Figures 29A and 29D). The sensitivity of the gp120s to digestion by the glycosidases neuraminidase (Figures 29B and 29E) and endoglycosidase H (Figures 29C and 29F) was also measured. Digestion with neuraminidase, which is specific for sialic acid, shows that much of the heterogeneity in isoelectric point and net charge can be attributed to the incorporation of sialic acid. Digestion with Endo H shows that glycans lacking sialic acid are present in the two gp120 preparations and account for more heterogeneity in CHO cells than in 293 cells. Data taken from Yu et al., 2012 (Yu, B., et al., Glycoform and Net Charge Heterogeneity in gp120 Immunogens Used in HIV Vaccine Trials. PLoS One, 2012. 7(8): p. e43903).

[00380] Figure 30. Purification of A244_N332-rgp120 by column chromatography.

[00381] Figure 31. Comparison of A244_N332-rgp120 recovered by an immunoaffinity recovery process dependent on the 5B6 monoclonal antibody and by a column chromatography (Desalting-IEXHP-SEC) recovery process. The data show the proteins in fractions from the size exclusion step common to both recovery processes. Pluses (+) and minuses (-) indicate the presence or absence of the reducing agent dithiothreitol (DTT).

[00382] Figure 32. Purification of A244_N332-rgp120 by immunoaffinity chromatography and size exclusion chromatography.

[00383] Figure 33. Comparison of the recovered yields of A244_N332-rgp120 obtained from the recovery process containing an immunoaffinity step and the recovery process depending only on column chromatography. AUC indicates area under the curve. BCA indicates data from a modified Bradford assay used to measure protein concentration.

[00384] Mass spectrometry analysis of glycans present in A244_N332-rgp120 recovered from the stable Mgat1- CHO cell line expressing A244_N332-rgp120, and in gp120s from the TV1.C and 1086.C strains of HIV-1 produced in normal CHO cell lines, shows that the glycosylation was 99.47% high mannose. Data on A244_N332-rgp120 were kindly provided by Dr. Parastoo Azadi (Complex Carbohydrate Research Center, University of Georgia, Athens, GA). Data showing the glycan analysis of the TV1 and 1086 gp120 proteins were taken from Wang et al. (Wang, Z., et al., Vaccines, 2016. 4(2): p. 17).

[00385] Although preferred embodiments of the subject invention have been described in some detail, it is understood that obvious variations can be made without departing from the spirit and the scope of the invention as defined herein.