

Title:
AFFIPHORES, THEIR ISOLATION AND USE
Document Type and Number:
WIPO Patent Application WO/2007/051623
Kind Code:
A1
Abstract:
The present invention concerns novel circularly permuted green fluorescent proteins (GFP and variants thereof), termed Affiphores, with the ability of specifically binding to target compounds and which retain their ability to fluoresce, polynucleotides encoding these, kits or libraries comprising these and their use for analytics and the diagnosis of diseases and conditions.

Inventors:
PASCHKE MATTHIAS (DE)
Application Number:
PCT/EP2006/010526
Publication Date:
May 10, 2007
Filing Date:
November 02, 2006
Assignee:
CHARITE UNIVERSITAETSMEDIZIN (DE)
PASCHKE MATTHIAS (DE)
International Classes:
C07K14/435; C07K16/18; C12N5/10; C12N15/09; G01N33/48
Other References:
BAIRD G S ET AL: "Circular permutation and receptor insertion within green fluorescent proteins", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE, WASHINGTON, DC, US, vol. 96, no. 20, 28 September 1999 (1999-09-28), pages 11241 - 11246, XP002187230, ISSN: 0027-8424
TOPELL S ET AL: "Circularly permuted variants of the green fluorescent protein", FEBS LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 457, no. 2, 27 August 1999 (1999-08-27), pages 283 - 289, XP004260167, ISSN: 0014-5793
YANG F ET AL: "The molecular structure of green fluorescent protein", NATURE BIOTECHNOLOGY, NATURE PUBLISHING, US, vol. 14, no. 10, October 1996 (1996-10-01), pages 1246 - 1251, XP002141412, ISSN: 1087-0156
NAGAI T ET AL: "Circularly permuted green fluorescent proteins engineered to sense Ca2+.", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA. 13 MAR 2001, vol. 98, no. 6, 13 March 2001 (2001-03-13), pages 3197 - 3202, XP002368359, ISSN: 0027-8424
ZEYTUN AHMET ET AL: "Fluorobodies combine GFP fluorescence with the binding characteristics of antibodies.", NATURE BIOTECHNOLOGY. DEC 2003, vol. 21, no. 12, December 2003 (2003-12-01), pages 1473 - 1479, XP002368360, ISSN: 1087-0156
ZEYTUN AHMET ET AL: "Retraction. Fluorobodies combine GFP fluorescence with the binding characteristics of antibodies.", NATURE BIOTECHNOLOGY. MAY 2004, vol. 22, no. 5, May 2004 (2004-05-01), pages 601, XP002368361, ISSN: 1087-0156
AKEMANN WALTHER ET AL: "Functional characterization of permuted enhanced green fluorescent proteins comprising varying linker peptides", PHOTOCHEMISTRY AND PHOTOBIOLOGY, vol. 74, no. 2, August 2001 (2001-08-01), pages 356 - 363, XP002368362, ISSN: 0031-8655
KAWAI YASUTOSHI ET AL: "Single color fluorescent indicators of protein phosphorylation for multicolor imaging of intracellular signal flow dynamics.", ANALYTICAL CHEMISTRY. 15 OCT 2004, vol. 76, no. 20, 15 October 2004 (2004-10-15), pages 6144 - 6149, XP002368363, ISSN: 0003-2700
PÉDELACQ JEAN-DENIS ET AL: "Engineering and characterization of a superfolder green fluorescent protein.", NATURE BIOTECHNOLOGY. JAN 2006, vol. 24, no. 1, January 2006 (2006-01-01), pages 79 - 88, XP002368364, ISSN: 1087-0156
Attorney, Agent or Firm:
VOSSIUS, Volker et al. (München, DE)
Claims:

1. A circularly permuted green fluorescent protein (GFP), wherein the native N and C termini of said protein are linked by a polypeptide chain of between 4 and 20 amino acids (loop 3), the loop region between β-sheet IX and β-sheet X (loop 2) comprises an insertion of 1 to 5 amino acids and/or between 1 and 11 mutations with respect to the native loop region and the new N and C termini are located in a further loop region of the native GFP, wherein the circularly permuted GFP is capable of specifically binding to a target compound and has fluorescence activity.

2. The protein according to claim 1, wherein the native N terminus comprises deletions of between 1 and 10 amino acids and/or the native C terminus comprises deletions of between 1 and 11 amino acids.

3. The protein according to claim 1 or 2, wherein loop 3 comprises 1 to 3 amino acids π at the junction to the GFP sequence, wherein π stands for an amino acid that is neither aromatic nor hydrophobic and is not cysteine.

4. The protein according to claim 3, wherein loop 3 has a length of 10, 12, 14, or 16 amino acids.

5. The protein according to any one of claims 1 to 4, wherein loop 2 comprises an insertion of 1 to 3 amino acids and between 2 and 10 mutations with respect to the native loop region.

6. The protein according to any one of claims 1 to 5, wherein loop 2 comprises two consecutive amino acids σ and η at the C-terminal end of the mutated loop region, wherein σ stands for an amino acid selected from the group consisting of Ala, Gly, Ile, Leu, Met, Ser, Thr and Val, and η stands for an amino acid selected from the group consisting of Ile, Leu, Met and Val.

7. The protein according to any one of claims 1 to 6, wherein the loop region between β-sheet VII and β-sheet VIII (loop 1) comprises an insertion of between 1 and 10 amino acids and/or between 2 and 15 mutations with respect to the native loop region.

8. The protein according to claim 7, wherein loop 1 comprises an insertion of between 2 and 4 amino acids and between 3 and 7 mutations with respect to the native loop region.

9. The protein according to any one of claims 1 to 8, wherein the new N and C termini are located in the loop region between β-sheet I and β-sheet II, between β-sheet II and β-sheet III, between β-sheet III and α-helix II, between β-sheet IV and β-sheet V, between β-sheet V and β-sheet VI, between β-sheet VI and β-sheet VII, or between β-sheet VIII and β-sheet IX, in particular in the loop region between β-sheet VIII and β-sheet IX.

10. The protein according to any one of claims 1 to 9, wherein the new N and/or C terminus(i) is(are) deleted by between 1 and 10 amino acids.

11. The protein according to any one of claims 1 to 10, wherein the new N and/or C terminus(i) only comprise(s) native GFP amino acids except for an M residue at the N terminus.

12. The protein according to any one of claims 1 to 11, wherein the fluorescence activity is at least 50% of the activity of the native GFP on which the circularly permuted GFP is based.

13. The protein according to any one of claims 1 to 12, wherein the native GFP on which the circularly permuted GFP is based has a sequence according to SEQ ID NO. 1 or is a derivative thereof.

14. The protein according to claim 13, wherein the native GFP comprises one or more stabilizing mutations selected from the group consisting of S30R, N39I, L42I, F64L, F99S, N105S, E111V, I128S, I128T, Y145F, M153T, K162N, V163A, K166T, I167V, I171V, S205T and A206V.

15. The protein according to claim 14, wherein the native GFP comprises the mutations L42I, F64L, F99S, M153T, V163A, and A206V.

16. A protein library comprising at least two proteins according to any one of claims 1 to 15, which differ in at least one amino acid insertion and/or mutation in loop 3, loop 2 and/or loop 1.

17. A polynucleotide selected from the group consisting of:

(a) a polynucleotide encoding at least the mature circularly permuted GFP according to any one of claims 1 to 15;

(b) polynucleotides encoding a derivative of a mature circularly permuted GFP encoded by a polynucleotide of (a), wherein in said derivative 1 to 10 amino acid residues are conservatively substituted compared to the native GFP, these substitutions being in addition to any mutations in the loop region(s), and said derivative having fluorescence activity;

(c) polynucleotides the complementary strand of which hybridizes, preferably under stringent conditions, to a polynucleotide as defined in (a) or (b) and which code for a circularly permuted GFP having fluorescence activity; or the complementary strand of such a polynucleotide.

18. The polynucleotide of claim 17, which is DNA, RNA, PNA or phosphorothioate.

19. A vector containing the polynucleotide of claim 17 or 18.

20. The vector of claim 19, which is an expression vector, a gene targeting vector and/or a gene transfer vector.

21. The vector of claim 19 or 20, wherein the polynucleotide is operatively linked to expression control sequences allowing expression in prokaryotic and/or eukaryotic host cells.

22. The vector of any one of claims 19 to 21, wherein the vector comprises an interaction domain and a protein translocation domain fused to the polynucleotide, which causes the encoded fusion protein, upon expression in a bacterium, to be translocated in an essentially folded state through the cytoplasmic membrane.

23. The vector according to claim 22, wherein the translocation domain is a Tat-dependent or thylakoid ΔpH-dependent sequence.

24. A host cell genetically engineered with the polynucleotide of claim 17 or 18 or the vector of any of claims 19 to 23.

25. A process for producing a circularly permuted GFP encoded by the polynucleotide of claim 17 or 18, comprising culturing the host cell of claim 24 and recovering the circularly permuted GFP encoded by said polynucleotide.

26. A process for producing cells capable of expressing circularly permuted GFP comprising genetically engineering cells in vitro with the vector of any of claims 19 to 23, wherein said circularly permuted GFP is encoded by the polynucleotide of claim 17 or 18.

27. A polypeptide encoded by the nucleic acid molecule of claim 17 or 18 or produced by the method of claim 25.

28. An antibody specifically directed to the polypeptide of claim 27.

29. A primer or pair of primers capable of specifically amplifying a nucleic acid molecule as defined in claim 17 or 18 but not the nucleic acid molecule encoding the native GFP on which the circularly permuted GFP is based.

30. A polynucleotide library comprising at least two polynucleotides according to claim 17 or 18, which differ in at least two nucleotides encoding amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1.

31. The polynucleotide library according to claim 30, generated by inserting random or partially random nucleotide triplet sequences instead of nucleotide triplets encoding specific amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1.

32. The polynucleotide library of claim 31, wherein

a) the sequence $(NNV)_n$, $(MTC)(NNV)_{n-1}$, $(NNV)_{n-o}(VVS)_o$, or $(MTC)(NNV)_{n-o-1}(VVS)_o$ is inserted into loop 2 between nucleotide triplets encoding amino acid 186, 187 or 188 and 195, 196 or 197 according to SEQ ID NO 1 or a derivative thereof, wherein n is 7 to 11 and o is 1 to 3;

b) the sequence $(NNV)_p$ or $(NNV)_{p-q}(VVS)_q$ is inserted into loop 3 between nucleotide triplets encoding amino acid 228, 229 or 230 and 3, 4 or 5 according to SEQ ID NO 1 or a derivative thereof, wherein p is 8 to 16 and q is 1 to 3; and

c) optionally the sequence $(NNV)_m$ is inserted into loop 1 between nucleotide triplets encoding amino acid 153, 154 or 155 and 159, 160 or 161 according to SEQ ID NO 1 or a derivative thereof, wherein m is 5 to 11.

33. A vector library comprising a polynucleotide library according to any one of claims 30 to 32.

34. A method for identifying a circularly permuted GFP or variant thereof specifically binding to a target compound, comprising the steps of:

a) contacting a protein library according to claim 16, a vector library according to claim 33, or a cell comprising a polynucleotide library according to any one of claims 30 to 32 or a vector library according to claim 33 with a target compound; and

b) selecting a protein, vector or cell showing specific binding to the target compound.

35. A protein, vector or cell identified with a method of claim 34.

36. A kit comprising a vector according to any one of claims 19 to 23 or claim 35, or a vector library according to claim 33 or 35, and a second vector comprising an interaction domain and a protein translocation domain, which causes the encoded fusion protein, upon expression in a bacterium, to be translocated in an essentially unfolded state through the cytoplasmic membrane.

37. A kit according to claim 36, wherein the second vector further comprises a phage coat protein, preferably selected from the group of M13 phage coat proteins pIII, pVI, pVII, pVIII and pIX.

38. Use of a protein according to claim 35, a vector of claim 35 or a cell according to claim 35 for the diagnosis of a disease or condition characterized by the presence, absence, increase or decrease of the target compound.

Description:

AFFIPHORES, THEIR ISOLATION AND USE

The present invention concerns novel circularly permuted green fluorescent proteins (GFPs) and variants thereof, termed Affiphores, with the ability of specifically binding to target compounds and which retain their ability to fluoresce, polynucleotides encoding these, kits or libraries comprising these and their use for analytics and diagnosis of diseases and conditions.

BACKGROUND OF THE INVENTION

Molecular libraries are a rich source for the selection of highly specific binding molecules. Peptide and antibody libraries comprising up to 10 different molecules are most commonly used when selecting peptides and antibodies that specifically bind a given target compound. Large libraries for the selection of binding proteins have also been produced on the basis of lipocalins, protein A domains, ankyrin repeat proteins and other protein scaffolds. Some of these binding proteins are smaller, more stable and/or cheaper to produce than antibodies and thus appear to have more desirable properties. However, regardless of whether a binding molecule is based on an antibody or on an alternative protein scaffold, it has to be labeled in a separate step, or secondary detection reagents have to be used, before it can be detected. It has therefore been desirable to design binding proteins that generate an intrinsic signal, e.g. fluorescence, which can be detected directly and without any further reagents. Binding proteins of this type would be widely applicable, in particular in diagnosis and research.

A large number of fluorescent proteins are now known, for example the green fluorescent protein (GFP), which is intrinsically fluorescent, i.e. it does not need to bind a separate fluorophore, and would therefore appear suitable for the production of such binding molecules. To that end, Zeytun, A. et al. (2003) Nature Biotechnology 21:1473-1479 inserted diverse antibody binding loops into 4 of the exposed loops on one side of GFP and claimed that these proteins mimicked the natural antibody binding footprint, creating robustly binding ligands that combine the advantages of antibodies (high affinity and specificity) with those of GFP (intrinsic fluorescence, high stability, expression and solubility). It was also claimed that these GFP variants, termed fluorobodies, had been used effectively in enzyme-linked immunosorbent assays (ELISAs), flow cytometry, immunofluorescence, arrays and gel shift assays, and showed affinities as high as those of antibodies. The fluorobodies were generated by inserting complementarity-determining regions (CDRs) of an anti-lysozyme antibody into various loop regions of GFP. A first CDR was inserted between β-sheet I and β-sheet II, a second into the loop region between β-sheet IV and β-sheet V, a third into the loop region between β-sheet VIII and IX, and a fourth into the loop connecting β-sheet X and XI. The numbering of the secondary structures of GFP, i.e. the β-sheets, α-helices and intervening loops, used with respect to the fluorobodies follows Yang, F. et al. (1996) Nat. Biotechnol. 14:1246-1251.

However, in Zeytun, A. et al. (2004) Nat. Biotechnol. 22:601, the above-cited paper by Zeytun, A. et al. (2003) supra was retracted because the results presented in it could not be verified and none of the fluorobodies expressed from frozen bacterial stocks showed any fluorescence. The retraction also revealed that, upon repetition of the experiments originally disclosed in Zeytun, A. et al. (2003) supra, only those phage vectors comprising randomized libraries in a single loop, i.e. in the loop between β-sheet VIII and IX, showed any fluorescence: only 30% of the clones of the library had a fluorescence of 5% of the native GFP fluorescence, with a maximum of 12%. Accordingly, although Zeytun, A. et al. (2003) supra attempted to provide a GFP protein with antibody-like properties, it did not in fact provide a protein that is capable of specific binding and retains fluorescence. In addition, the accompanying retraction taught that the introduction of additional sequences into more than one loop led to a loss of fluorescence.

US 6,548,632 contemplates the use of GFP fusion constructs in which a defined or randomized peptide is inserted at the N- and/or C-terminus of GFP; it also contemplates inserting a defined or random peptide into one of the loops between β-sheet VI and β-sheet VII, β-sheet VII and β-sheet VIII, β-sheet VIII and β-sheet IX, and β-sheet IX and β-sheet X. It was shown that the introduction of a defined peptide sequence into one loop led to a GFP protein that retained fluorescence activity; however, there is no indication that a single insertion in one loop confers specific binding capacity on any of these GFPs.

Accordingly, there is still a need in the art for GFP proteins that retain their autofluorescent properties and, in addition, are able to specifically bind a given target compound. There is also a need for scaffolds suitable for the insertion of defined or random polypeptide sequences, which can subsequently be screened, e.g. to identify those GFP proteins having the desired binding specificity.

SUMMARY OF THE INVENTION

The present invention concerns a circularly permuted green fluorescent protein (GFP or variant thereof), wherein the native N and C termini of said protein are linked by a polypeptide chain of between 4 and 20 amino acids (loop 3), the loop region between β-sheet IX and β-sheet X (loop 2) comprises an insertion of 1 to 5 amino acids and/or between 1 and 11 mutations with respect to the native loop region, and the new N and C termini are located in a further loop region of the native GFP, wherein the circularly permuted GFP or variant thereof is capable of specifically binding to a target compound and of fluorescing, and wherein the secondary structures of the GFP are numbered according to Yang, F. et al. (1996) supra.

In a further preferred embodiment of the protein of the present invention the native N terminus comprises deletions of between 1 and 10 amino acids and/or the native C terminus comprises deletions of between 1 and 11 amino acids.

In a preferred embodiment of the protein of the present invention loop 3 comprises 1 to 3 amino acids π at the junction to the GFP sequence, wherein π stands for an amino acid that is neither aromatic nor hydrophobic and is not cysteine.

In a preferred embodiment of the protein of the present invention loop 3 has a length of 10 to 16 amino acids.

In a preferred embodiment of the protein of the present invention loop 2 comprises an insertion of 1 to 3 amino acids and between 2 and 10 mutations with respect to the native loop region.

In a preferred embodiment of the protein of the present invention loop 2 comprises two consecutive amino acids σ and η at the C-terminal end of the mutated loop region, wherein σ stands for an amino acid, which is a small amino acid and η stands for an amino acid, which is a small hydrophobic amino acid.
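Claim 6 spells out which residues qualify: σ may be Ala, Gly, Ile, Leu, Met, Ser, Thr or Val, and η may be Ile, Leu, Met or Val. As an illustration only (not a method taken from this application), the following Python sketch checks whether the mutated stretch of a candidate loop 2, given N to C in one-letter code, ends with such a σ η pair. The function name and the example stretches are hypothetical.

```python
# Residue sets taken from claim 6 (one-letter codes).
SIGMA = set("AGILMSTV")  # Ala, Gly, Ile, Leu, Met, Ser, Thr, Val
ETA = set("ILMV")        # Ile, Leu, Met, Val


def ends_with_sigma_eta(mutated_stretch: str) -> bool:
    """Return True if the last two residues of the mutated loop 2 stretch are sigma, then eta.

    `mutated_stretch` is a hypothetical input: the mutated/randomized part of
    loop 2, written N -> C in one-letter code (case-insensitive).
    """
    stretch = mutated_stretch.upper()
    if len(stretch) < 2:
        return False
    return stretch[-2] in SIGMA and stretch[-1] in ETA
```

For example, a stretch ending in "...AL" passes (Ala as σ, Leu as η), whereas one ending in "...LE" fails because Glu is not an allowed η residue.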

In a preferred embodiment of the protein of the present invention the loop region between β-sheet VII and β-sheet VIII (loop 1) comprises an insertion of between 1 and 10 amino acids and/or between 2 and 15 mutations with respect to the native loop region.

In a preferred embodiment of the protein of the present invention loop 1 comprises an insertion of between 2 and 4 amino acids and between 3 and 7 mutations with respect to the native loop region.

In a preferred embodiment of the protein of the present invention the new N and C termini are located in the loop region between β-sheet I and β-sheet II, between β-sheet II and β-sheet III, between β-sheet III and α-helix II, between β-sheet IV and β-sheet V, between β-sheet V and β-sheet VI, between β-sheet VI and β-sheet VII, or between β-sheet VIII and β-sheet IX, in particular in the loop region between β-sheet VIII and β-sheet IX.

In a preferred embodiment of the protein of the present invention the new N and/or C terminus(i) is(are) deleted by between 1 and 10 amino acids.

In a preferred embodiment of the protein of the present invention the new N and/or C terminus(i) only comprise(s) native GFP amino acids except for an M residue at the N terminus.

In a preferred embodiment of the protein of the present invention the fluorescence activity is at least 50% of the activity of native GFP. In a preferred embodiment of the protein of the present invention the native GFP on which the circularly permuted GFP is based has a sequence according to SEQ ID NO. 1 or is a variant thereof.
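The preferred embodiment requiring at least 50% of native GFP fluorescence amounts to a simple ratio of two measurements. The Python sketch below assumes, which the application does not specify, that the variant and the native GFP are measured at equal protein concentration under identical conditions and that a buffer-only blank is subtracted from both readings; the function names are hypothetical.

```python
def relative_fluorescence(variant: float, native: float, blank: float = 0.0) -> float:
    """Fraction of native GFP fluorescence retained by a variant.

    Assumes both readings were taken at equal protein concentration under identical
    conditions, and that `blank` is a buffer-only reading subtracted from both.
    """
    native_signal = native - blank
    if native_signal <= 0:
        raise ValueError("native GFP signal must exceed the blank")
    return (variant - blank) / native_signal


def meets_threshold(variant: float, native: float, blank: float = 0.0,
                    threshold: float = 0.5) -> bool:
    """True if the variant retains at least `threshold` (default 50%) of native fluorescence."""
    return relative_fluorescence(variant, native, blank) >= threshold
```

With made-up readings of 5500 (variant), 10100 (native GFP) and 100 (blank), the ratio is (5500 - 100) / (10100 - 100) = 0.54, which satisfies the 50% criterion.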

In a preferred embodiment of the protein of the present invention the native GFP comprises one or more stabilizing mutations selected from the group consisting of S30R, N39I, L42I, F64L, F99S, N105S, E111V, I128S, I128T, Y145F, M153T, K162N, V163A, K166T, I167V, I171V, S205T and A206V.

In a preferred embodiment of the protein of the present invention the native GFP comprises the mutations L42I, F64L, F99S, M153T, V163A, and A206V.

In a preferred embodiment of the protein of the present invention the native GFP comprises one or more "colour" or fluorescence shift mutations selected from the group consisting of S65T, Y66H, Y66W and S65G/S72A/T203Y.
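The mutations above use the usual notation of wild-type residue, 1-based position and replacement residue (e.g. F64L), with combined mutations joined by "/" (e.g. S65G/S72A/T203Y). A minimal Python sketch of applying such a list to a sequence follows; it assumes the positions refer to the native GFP numbering, so the mutations would be applied before circular permutation. The function name is hypothetical, and the check that the stated wild-type residue matches the sequence is a safeguard added here, not something the application prescribes.

```python
import re

# Wild-type residue, 1-based position, replacement residue, e.g. "F64L".
_MUTATION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_mutations(sequence: str, mutations: list[str]) -> str:
    """Apply point mutations such as "F64L" or combined ones such as "S65G/S72A/T203Y".

    Raises ValueError if a mutation is malformed, lies outside the sequence, or
    if the stated wild-type residue does not match the sequence.
    """
    residues = list(sequence)
    for entry in mutations:
        for token in entry.split("/"):
            match = _MUTATION.fullmatch(token.strip())
            if match is None:
                raise ValueError(f"malformed mutation: {token!r}")
            wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
            if not 1 <= position <= len(residues):
                raise ValueError(f"position {position} outside sequence of length {len(residues)}")
            if residues[position - 1] != wild_type:
                raise ValueError(
                    f"{token}: expected {wild_type} at {position}, found {residues[position - 1]}")
            residues[position - 1] = replacement
    return "".join(residues)
```

Checking the wild-type residue catches numbering mix-ups early, for instance when positions from native GFP numbering are accidentally applied to an already permuted sequence.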

In a further preferred embodiment the protein of the present invention comprises one or more additional mutations selected from the group consisting of N105S, I128S, K162N and K166T.

In a further aspect the present invention concerns a protein library comprising at least two proteins of the present invention differing in at least one amino acid insertion and/or mutation in loop 3, loop 2 and/or loop 1.

In a further aspect the present invention concerns a polynucleotide selected from the group consisting of:

(a) a polynucleotide encoding at least the mature circularly permuted GFP of the present invention;

(b) polynucleotides encoding a derivative of a mature circularly permuted GFP encoded by a polynucleotide of (a), wherein in said derivative 1 to 10 amino acid residues are conservatively substituted compared to the native GFP, these substitutions being in addition to any mutations in the loop region(s), and said derivative having fluorescence activity;

(c) polynucleotides the complementary strand of which hybridizes, preferably under stringent conditions, to a polynucleotide as defined in (a) or (b) and which code for a circularly permuted GFP having fluorescence activity; or the complementary strand of such a polynucleotide.

In a preferred embodiment the polynucleotide of the present invention is DNA, RNA, PNA or phosphorothioate.

In a further aspect the present invention concerns a vector containing the polynucleotide of the present invention. In a preferred embodiment the vector of the present invention is an expression vector, a gene targeting vector and/or a gene transfer vector.

In a preferred embodiment of the vector of the present invention the polynucleotide is operatively linked to expression control sequences allowing expression in prokaryotic and/or eukaryotic host cells.

In a preferred embodiment the vector of the present invention comprises an interaction domain and a protein translocation domain fused to the polynucleotide, which causes the encoded fusion protein, upon expression in a bacterium, to be translocated in a folded or essentially folded state through the cytoplasmic membrane.

In a preferred embodiment of the vector of the present invention the translocation domain is a Tat-dependent or thylakoid ΔpH-dependent sequence.

In a further aspect the present invention concerns a host cell genetically engineered with the polynucleotide of the present invention or the vector of the present invention.

In a further aspect the present invention concerns a process for producing a circularly permuted GFP encoded by the polynucleotide of the present invention comprising culturing the host cell of the present invention and recovering the circularly permuted GFP encoded by said polynucleotide.

In a further aspect the present invention concerns a process for producing cells capable of expressing circularly permuted GFP comprising genetically engineering cells in vitro with the vector of the present invention, wherein said circularly permuted GFP is encoded by the polynucleotide of the present invention.

In a further aspect the present invention concerns a polypeptide encoded by the nucleic acid molecule of the present invention or produced by the method of the present invention.

In a further aspect the present invention concerns an antibody specifically directed to the polypeptide of the present invention. In a further aspect the present invention concerns a primer or pair of primers capable of specifically amplifying a nucleic acid molecule of the present invention but not the nucleic acid molecule encoding the native GFP on which the circularly permuted GFP is based.

In a further aspect the present invention concerns a polynucleotide library comprising at least two polynucleotides of the present invention, which differ in at least two nucleotides encoding amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1.

In a preferred embodiment the polynucleotide library of the present invention is generated by inserting random or partially random nucleotide triplet sequences instead of the nucleotide triplets encoding specific amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1. In a preferred embodiment of the polynucleotide library of the present invention

a) the sequence $(NNV)_n$, $(MTC)(NNV)_{n-1}$, $(NNV)_{n-o}(VVS)_o$, or $(MTC)(NNV)_{n-o-1}(VVS)_o$ is inserted into loop 2 between nucleotide triplets encoding amino acid 186, 187 or 188 and 195, 196 or 197 according to SEQ ID NO 1 or a derivative thereof, wherein n is 7 to 11 and o is 1 to 3;

b) the sequence $(NNV)_p$ or $(NNV)_{p-q}(VVS)_q$ is inserted into loop 3 between nucleotide triplets encoding amino acid 228, 229 or 230 and 3, 4 or 5 according to SEQ ID NO 1 or a derivative thereof, wherein p is 8 to 16 and q is 1 to 3; and

c) optionally the sequence $(NNV)_m$ is inserted into loop 1 between nucleotide triplets encoding amino acid 153, 154 or 155 and 159, 160 or 161 according to SEQ ID NO 1 or a derivative thereof, wherein m is 5 to 11;

wherein N has the meaning A, C, G or T; V has the meaning G, A or C; M has the meaning A or C; and S has the meaning G or C.
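To make the degenerate codon notation concrete, the following Python sketch expands a degenerate triplet into its concrete codons, using only the nucleotide codes defined in the preceding paragraph, and translates them with the standard genetic code. It also counts the distinct DNA sequences in a cassette such as $(NNV)_{n-o}(VVS)_o$. The function names are illustrative and not part of the application, and the count is a theoretical DNA-level figure that ignores transformation efficiency and codon bias.

```python
from itertools import product
from math import prod

# Nucleotide codes as defined in the text: N = A/C/G/T, V = G/A/C, M = A/C, S = G/C.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "V": "ACG", "M": "AC", "S": "CG"}

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def expand(triplet: str) -> list[str]:
    """All concrete codons represented by a degenerate triplet such as 'NNV'."""
    return ["".join(c) for c in product(*(CODES[base] for base in triplet))]


def encoded_residues(triplet: str) -> list[str]:
    """Sorted one-letter amino acids (and '*' for stop) encoded by a degenerate triplet."""
    return sorted({CODON_TABLE[codon] for codon in expand(triplet)})


def dna_diversity(cassette: list[tuple[str, int]]) -> int:
    """Distinct DNA sequences in a cassette given as (triplet, repeat) pairs,
    e.g. [("NNV", 6), ("VVS", 2)] for (NNV)_6(VVS)_2."""
    return prod(len(expand(triplet)) ** repeats for triplet, repeats in cassette)
```

Evaluating `encoded_residues` shows that VVS (18 codons) encodes only Ala, Asp, Glu, Gly, His, Lys, Asn, Pro, Gln, Arg, Ser and Thr and no stop codon, that MTC encodes Ile or Leu, and that NNV (48 codons) covers all 20 amino acids but also all three stop codons (TAA, TAG, TGA). A cassette $(NNV)_6(VVS)_2$ therefore contains $48^6 \times 18^2$ distinct DNA sequences.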

In a further aspect the present invention concerns a vector library comprising a polynucleotide library of the present invention.

In a further aspect the present invention concerns a method for identifying a circularly permuted GFP or variant thereof specifically binding to a target compound, comprising the steps of:

a) contacting a polypeptide library of the present invention, a vector library of the present invention, or a cell comprising a polynucleotide library of the present invention or a vector library of the present invention with a target compound; and

b) selecting a protein, vector or cell showing specific binding to the target compound.

In a further aspect the present invention concerns a protein, vector or cell identified with a method of the present invention.

In a further aspect the present invention concerns a kit comprising a vector of the present invention or a vector library of the present invention and a second vector comprising an interaction domain and a protein translocation domain, which causes the encoded fusion protein, upon expression in bacteria, to be translocated in an unfolded or essentially unfolded state through the cytoplasmic membrane.

In a preferred embodiment of the kit of the present invention the second vector further comprises a phage coat protein, preferably selected from the group of M13 phage coat proteins pIII, pVI, pVII, pVIII and pIX.

A further aspect of the present invention is the use of a protein of the present invention, a vector of the present invention or a cell of the present invention for the diagnosis of a disease or condition characterized by the presence, absence, increase or decrease of the target compound.

DETAILED DESCRIPTION

Before the present invention is described in detail below, it is to be understood that this invention is not limited to the particular methodology, protocols and reagents described herein as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Preferably, the terms used herein are defined as described in "A multilingual glossary of biotechnological terms: (IUPAC Recommendations)", Leuenberger, H. G. W., Nagel, B. and Kölbl, H. eds. (1995), Helvetica Chimica Acta, CH-4010 Basel, Switzerland.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

Several documents are cited throughout the text of this specification. Each of the documents cited herein (including all patents, patent applications, scientific publications, manufacturer's specifications, instructions, etc.), whether supra or infra, is hereby incorporated by reference in its entirety. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

The present inventors have observed that a circularly permuted GFP comprising an artificial loop region between the native N and C termini and further artificial sequence in at least one additional loop in the vicinity of the artificial loop provides an interaction surface capable of specific binding to a target compound without compromising GFP's autofluorescence. In particular, the GFP scaffold defined herein allows the generation of GFP libraries with randomized peptide regions created by insertion and/or mutation, which can be screened to identify GFPs capable of specific binding to a given target compound. These modified GFPs have been termed Affiphores.

Thus, the present invention in one aspect provides a circularly permuted green fluorescent protein (GFP), wherein the native N and C termini of said protein are linked by a polypeptide chain of between 4 and 20 amino acids (loop 3), the loop region between β-sheet IX and β-sheet X (loop 2) comprises an insertion of 1 to 5 amino acids and/or between 1 and 15 mutations with respect to the native loop region and the new N and C termini are located in a further loop region of the native GFP, wherein the secondary structures of the GFP are numbered according to Yang, F. et al. (1996) supra.
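The construction just described can be illustrated at the sequence level. The following Python sketch, which is not taken from the application, builds a circularly permuted sequence: the segment starting at the chosen new N terminus comes first, followed by the loop 3 linker (4 to 20 residues) and then the original N-terminal segment, with an optional Met added at the new N terminus. Optional trimming of the native termini mirrors the terminal deletions described in the summary. The function name, its parameters, the linker "GGSGGS" and the toy sequence used with it are hypothetical, and positions are 0-based Python indices rather than GFP residue numbers.

```python
def circular_permute(native: str, new_n_index: int, linker: str,
                     trim_n: int = 0, trim_c: int = 0, add_met: bool = True) -> str:
    """Return a circularly permuted sequence (one-letter code).

    native      -- native sequence, N -> C
    new_n_index -- 0-based index of the residue that becomes the new N terminus
    linker      -- loop 3, joining the (trimmed) native C terminus to the native N terminus
    trim_n      -- number of residues deleted from the native N terminus
    trim_c      -- number of residues deleted from the native C terminus
    add_met     -- prepend Met if the permuted sequence does not already start with one
    """
    if not 4 <= len(linker) <= 20:
        raise ValueError("loop 3 must be 4 to 20 residues long")
    if not trim_n < new_n_index < len(native) - trim_c:
        raise ValueError("new N terminus must lie within the retained native sequence")
    c_segment = native[new_n_index:len(native) - trim_c]  # new N terminus ... native C terminus
    n_segment = native[trim_n:new_n_index]                # native N terminus ... new C terminus
    permuted = c_segment + linker + n_segment
    if add_met and not permuted.startswith("M"):
        permuted = "M" + permuted
    return permuted
```

With the toy string "MSKGEELFTG", a new N terminus at index 5 and the linker "GGSGGS", the result is "MELFTGGGSGGSMSKGE": the original C-terminal half now leads, the linker occupies the former N/C junction, and the former N-terminal half ends the chain.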

The term "green fluorescent protein" (GFP) as used throughout this specification refers to GFP from Aequorea and variants thereof, including, but not limited to, GFP (Chalfie et al., "Green Fluorescent Protein as a Marker for Gene Expression," Science 263(5148):802-805 (1994)); enhanced GFP (EGFP; Clontech, GenBank Accession Number U55762); blue fluorescent protein (BFP; Quantum Biotechnologies, Inc., 1801 de Maisonneuve Blvd. West, 8th Floor, Montreal (Quebec), Canada H3H 1J9; Stauber, R. H. Biotechniques 24(3):462-471 (1998); Heim, R. and Tsien, R. Y. Curr. Biol. 6:178-182 (1996)); enhanced yellow fluorescent protein (EYFP; Clontech Laboratories, Inc., 1020 East Meadow Circle, Palo Alto, Calif. 94303); and the GFP disclosed in GenBank accession no. U62636 (the amino acid sequence is depicted in SEQ ID NO. 1 and the encoding nucleic acid sequence in SEQ ID NO. 2). These GFPs and variants thereof all have a similar, essentially similar or identical tertiary structure, e.g. the same number of loops, α-helices and β-sheets; accordingly, a reference to a specific loop, β-sheet or α-helix named according to Yang, F. et al. (1996) supra refers to the corresponding structure in the respective variant of Aequorea GFP. The respective loop, β-sheet and/or α-helix structures might differ in length and exact steric position within the molecule.

The term "native GFP" or "native GFP polypeptide sequence" as used below refers to the GFP, i.e. Aequorea GFP or a variant thereof, on which the respective Affiphor is based, in its state prior to the connection of the N and C termini by introduction of loop 3 and the modification of loop 2. Examples are indicated above and include EGFP, BFP, EYFP and a GFP variant having an amino acid sequence according to SEQ ID NO: 1, encoded by a nucleotide sequence according to SEQ ID NO: 2. The term "native N and C termini" refers to the N and C termini of a GFP protein within the above definition prior to their being linked by a polypeptide chain (loop 3). A polypeptide chain linking the native N and C termini preferably has a length of 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids. The second interaction domain of the circularly permuted green fluorescent protein of the present invention is provided in the loop region between β-sheet IX and β-sheet X, also termed loop 2. This loop can comprise an insertion of between 1 and 5 amino acids, e.g. 1, 2, 3, 4 or 5 amino acids, and/or between 1 and 15 mutations, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 mutations, with respect to the native loop region of the GFP protein on which the circularly permuted GFP is based. In this context the term "mutation" refers to a change of the amino acid at a given position compared to the amino acid in the native GFP, whereas the term "insertion" refers to the introduction of additional amino acids. For example, the loop region between β-sheet IX, spanning amino acids 178 to 187, and β-sheet X, spanning amino acids 199 to 208, has a native length of 11 amino acids (positions 188 to 198). Accordingly, the insertion of one amino acid leads to a total loop 2 length of 12 amino acids. While it is possible to provide a further binding surface in loop 2 either by inserting new amino acids or by mutating native amino acids, it is particularly preferred that loop 2 comprises both an insertion of 1 to 5 amino acids, e.g. 1, 2, 3, 4 or 5 amino acids, and between 1 and 15 mutations, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 mutations. Preferably, the insertion comprises between 1 and 3 amino acids, i.e. it provides a loop 2 with a length of 12, 13 or 14 amino acids. It is further preferred that the insertion lies within the mutated area. Accordingly, a stretch of, e.g., five mutations comprising one insertion provides six contiguous amino acids not found in the native GFP as a novel binding surface. Even though in most cases the mutated amino acids will be contiguous, a stretch of mutated sequence may be interrupted one or more times by 1, 2, 3, 4 or 5 consecutive amino acids that are identical to the native GFP on which the circularly permuted GFP is based.

The process of circular permutation is known in the prior art and has been described, e.g., in Mullens et al. (1994) J. Am. Chem. Soc. 116:5529-5533; Protasova et al. (1994) Prot. Eng. 7:1373-1377; Vignais et al. (1995) Protein Sci. 4:494-1000; Yang, R. Y. and Schachmann, H. K. (1993) PNAS U.S. 90:11980-11984; and WO 01/30998. Essentially, circular permutation involves connecting the native amino (N) and carboxy (C) termini of a protein or polypeptide. Circular permutation thus reorganizes the primary sequence of the protein so that its original amino and carboxy termini are covalently closed and new termini are created at a different site within the sequence. Covalent closure of the natural termini can involve the insertion of one or more amino acids, e.g. if the termini are close enough in space to be linked to each other directly. If the two termini are naturally not close to each other, a spacer of sufficient length must be provided to prevent disruption of the tertiary protein structure. Proteins reorganized in this way may retain some or all of their original function and properties and may acquire new functions or properties.

When deciding where to create the new N and C termini of the circularly permuted protein, care must be taken not to disrupt the tertiary GFP structure, i.e. the "barrel shape" of GFP. It is therefore required that the new N and C termini of the GFP of the present invention are located within a loop region of the protein, i.e. in a region which has neither a β-sheet nor an α-helical structure and which serves to connect such secondary structural elements of the native GFP. Once the position of the new N and C termini has been determined within one of the loop regions, the primary amino acid sequence of the modified protein has to be further modified to provide a methionine start codon. Such a codon can be created either at the new N terminus by mutating the first N-terminal amino acid into a methionine or, alternatively, by adding one or more additional amino acids to the new N terminus of the protein, the first of which must be a methionine.
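The circular permutation and start-methionine steps described above can be sketched at the sequence level as follows. This is a minimal illustration, not the inventors' procedure: it assumes 1-based residue numbering, treats the loop 3 linker and the split position as free parameters, and uses a hypothetical toy fragment rather than SEQ ID NO: 1.

```python
def circular_permute(seq: str, linker: str, new_n: int, add_met: bool = True) -> str:
    """Return a circularly permuted version of `seq`.

    seq:     native sequence in one-letter code; residue 1 is seq[0]
    linker:  peptide joining the native C terminus to the native N terminus (loop 3)
    new_n:   1-based position of the residue that becomes the new N terminus;
             residue new_n - 1 becomes the new C terminus
    add_met: True prepends Met if the new first residue is not Met;
             False mutates the new first residue into Met instead
    """
    if not 2 <= new_n <= len(seq):
        raise ValueError("new_n must lie within the sequence (2..len(seq))")
    # new_n..native C terminus, then loop 3, then native N terminus..new_n - 1
    permuted = seq[new_n - 1:] + linker + seq[:new_n - 1]
    if permuted[0] != "M":
        # provide the translational start methionine
        permuted = "M" + (permuted if add_met else permuted[1:])
    return permuted


# Hypothetical toy example: split an 8-residue fragment before residue 5.
print(circular_permute("MSKGEELF", "GG", 5))                 # MEELFGGMSKG
print(circular_permute("MSKGEELF", "GG", 5, add_met=False))  # MELFGGMSKG
```

The `add_met` switch mirrors the two options named in the text (adding a methionine or mutating the first residue); for a split between residues 173 and 174, `new_n` would simply be 174.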

In a preferred embodiment of the protein of the present invention the native N terminus comprises deletions of between 1 and 10 amino acids and/or the native C terminus comprises deletions of between 1 and 11 amino acids. For optimal binding activity, the complete sequence between β-sheet I, starting, e.g., at amino acid 12 for a GFP according to SEQ ID NO: 1, and β-sheet XI, ending, e.g., at amino acid 227 for a GFP according to SEQ ID NO: 1, may be replaced by loop 3 sequence. Non-GFP-native loop 3 sequence preferably has a length of between 18 and 28 amino acids, i.e. 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 or 28 amino acids. A loop of this length can be obtained by deleting native sequences of the N and/or C termini and replacing them, depending on the extent of the deletion, with a polypeptide chain of sufficient length to provide a new loop 3 having a length within the desired parameters. The first α-helix, spanning amino acids 3 to 9, may or may not be deleted. Preferably, the deletion does not comprise more than 4 amino acids of the native N terminus and not more than 10 amino acids of the native C terminus. It is preferred that loop 3 comprises 1, 2 or 3 π amino acids at the junction between the loop and the GFP sequence, wherein π stands for an amino acid which is neither aromatic nor hydrophobic nor cysteine. Preferably, the 1 to 3 π amino acids are consecutive and, even more preferably, they are arranged at the C-terminal junction of the polypeptide chain with the native or deleted N and C termini of GFP.
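The π-residue preference at the loop 3 junction can be checked mechanically once "aromatic" and "hydrophobic" are pinned down. The text does not define these classes, so the residue sets below are an assumption (one common convention); the linker strings in the usage example are hypothetical.

```python
# Assumed residue classes; the text does not define them explicitly.
AROMATIC = set("FWYH")
HYDROPHOBIC = set("AVLIMP")
CYSTEINE = {"C"}
NON_PI = AROMATIC | HYDROPHOBIC | CYSTEINE


def is_pi(residue: str) -> bool:
    """True if the residue is neither aromatic, hydrophobic nor cysteine."""
    return residue not in NON_PI


def c_terminal_pi_run(linker: str) -> int:
    """Length of the consecutive run of π residues at the C-terminal end of the linker."""
    run = 0
    for residue in reversed(linker):
        if not is_pi(residue):
            break
        run += 1
    return run


def has_preferred_junction(linker: str) -> bool:
    """Preferred embodiment: 1 to 3 consecutive π residues at the C-terminal junction."""
    return 1 <= c_terminal_pi_run(linker) <= 3


# Hypothetical linkers
for linker in ("AAVLKD", "GGSGGSL", "LLKDSE"):
    print(linker, c_terminal_pi_run(linker), has_preferred_junction(linker))
```

Changing the residue sets (e.g. counting glycine or histidine differently) changes the result, so the sets should be adapted to whichever classification is intended.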

As stated above, the binding capability of GFP is improved if the length of loop 3 is 10, 12, 14 or 16 amino acids and if the respective length of the amino acid chain between β-sheet I and β-sheet XI (amino acid 227) is 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 or 28 amino acids. These lengths are particularly preferred if the native N terminus of the GFP protein is deleted by between 2 and 5 amino acids and the native C terminus is deleted by between 8 and 10 amino acids.
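The interplay between terminal deletions, linker length and the resulting loop 3 span is simple arithmetic, sketched below. The defaults are assumptions: β-sheet I starting at residue 12 and the last β-sheet ending at residue 227, as stated in the text, and a total length of 238 residues as for wild-type Aequorea GFP (Swiss-Prot P42212); a different native GFP would need different values.

```python
def loop3_span(linker_len: int, del_n: int, del_c: int,
               first_strand_start: int = 12, last_strand_end: int = 227,
               protein_len: int = 238) -> int:
    """Residues between the end of the last beta-sheet and the start of beta-sheet I
    after deleting del_n native N-terminal and del_c native C-terminal residues and
    joining the remaining termini with a linker of linker_len residues."""
    n_tail = first_strand_start - 1          # native residues before beta-sheet I
    c_tail = protein_len - last_strand_end   # native residues after the last beta-sheet
    if not (0 <= del_n <= n_tail and 0 <= del_c <= c_tail):
        raise ValueError("deletion would reach into a beta-sheet")
    return (n_tail - del_n) + linker_len + (c_tail - del_c)


def in_preferred_range(span: int) -> bool:
    """Preferred total length of 18 to 28 residues."""
    return 18 <= span <= 28


for linker, dn, dc in [(10, 5, 10), (12, 4, 10), (16, 2, 8)]:
    span = loop3_span(linker, dn, dc)
    print(linker, dn, dc, span, in_preferred_range(span))
```

Under these assumptions the shortest combination (10-residue linker, deletions of 5 and 10) gives 17 residues, just below the preferred range, while the other two combinations give 20 and 28.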

In that respect it has been observed that the fluorescence activity of circularly permuted GFPs is preserved particularly well, while a new binding surface is provided, if 1, 2 or 3 additional amino acids are inserted into loop 2 and if 2, 3, 4, 5, 6, 7, 8, 9 or 10 mutations with respect to the native loop region are introduced into loop 2; preferably, the loop 2 region has a length of 11, 12 or 13 amino acids. Accordingly, the protein of the present invention preferably comprises 1 to 3 amino acid insertions and between 2 and 10 mutations in loop 2. It is even further preferred that the circularly permuted GFP of the present invention comprises one amino acid insertion into loop 2, i.e. a loop with a length of 12 amino acids, and between 7 and 9 mutations with respect to the native loop region. It is preferred that the insertion lies within a consecutive stretch of mutations. It is not required to provide so-called "linker regions" composed of, e.g., stretches of glycine N- and C-terminal of the peptide sequences with potential binding capacity. However, it is preferred that loop 2 comprises at least two, preferably consecutive, amino acids σ and η at the C-terminal end of the mutated loop region, wherein σ stands for a small amino acid, e.g. alanine, glycine, isoleucine, leucine, methionine, serine, threonine or valine, and η stands for a small hydrophobic amino acid, e.g. isoleucine, leucine, methionine or valine.
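Given an alignment of a loop 2 variant against its native loop, the insertion and mutation counts and the σ/η preference above can be checked as sketched below. The sketch assumes a pre-computed gapped alignment ('-' in the native string marks an inserted residue; deletions are not counted), reads "σ and η at the C-terminal end of the mutated region" as the last two edited positions, and uses a hypothetical 11-residue placeholder loop, not the real GFP loop 2 sequence.

```python
SIGMA = set("AGILMSTV")  # small residues listed for σ in the text
ETA = set("ILMV")        # small hydrophobic residues listed for η in the text


def loop_edits(native_aln: str, variant_aln: str) -> tuple:
    """Return (insertions, mutations, variant length) for an aligned loop variant."""
    if len(native_aln) != len(variant_aln):
        raise ValueError("alignment strings must have equal length")
    insertions = mutations = 0
    for nat, var in zip(native_aln, variant_aln):
        if nat == "-" and var != "-":
            insertions += 1
        elif nat != "-" and var != "-" and nat != var:
            mutations += 1
    return insertions, mutations, len(variant_aln.replace("-", ""))


def ends_with_sigma_eta(native_aln: str, variant_aln: str) -> bool:
    """True if the last two edited (mutated or inserted) residues read σ then η."""
    edited = [var for nat, var in zip(native_aln, variant_aln)
              if var != "-" and nat != var]
    return len(edited) >= 2 and edited[-2] in SIGMA and edited[-1] in ETA


def is_most_preferred(native_aln: str, variant_aln: str) -> bool:
    """One insertion, loop length 12 and 7 to 9 mutations."""
    ins, mut, length = loop_edits(native_aln, variant_aln)
    return ins == 1 and length == 12 and 7 <= mut <= 9


# Hypothetical native loop with one inserted position, and a designed variant
native, variant = "KLMNP-QRSTEW", "KYRWHDEKAVEW"
print(loop_edits(native, variant), ends_with_sigma_eta(native, variant),
      is_most_preferred(native, variant))
```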

A "target compound" to which the circularly permuted GFP of the present invention may bind can be any natural or synthetic compound. Preferred target compounds are biologically occurring polymers or monomers, e.g. hormones, fatty acids, proteins, polysaccharides or polynucleotides. In the field of antibodies, the term "specific binding" is generally understood to refer to the preferential binding of an antibody to, e.g., a target protein compared to its binding to non-target proteins. Since the circularly permuted GFPs of the present invention are antibody-like in that they provide a scaffold for binding to various targets, the term "specific binding" has the same meaning in the context of the present invention as in the context of antibodies. Accordingly, methods similar to those used for assessing the specific binding of antibodies, e.g. ELISA, surface plasmon resonance (SPR) or Western blots, can be employed to assess the binding specificity of a given circularly permuted GFP. The specific association of target compounds and circularly permuted GFPs depends on hydrogen bonds, hydrophobic interactions, electrostatic forces and van der Waals forces. These are all weak, non-covalent interactions, yet some associations between a target compound and a circularly permuted GFP can be quite strong. All binding between a target compound and a circularly permuted GFP is, however, reversible and follows the basic thermodynamic principles of any reversible bimolecular interaction:

$$K_A = \frac{[\mathrm{GFP{-}TC}]}{[\mathrm{GFP}]\,[\mathrm{TC}]}$$

where $K_A$ is the affinity constant, [GFP] and [TC] are the molar concentrations of unoccupied binding sites on the circularly permuted GFP and on the target compound, respectively, and [GFP-TC] is the molar concentration of the target compound-circularly permuted GFP complex.
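The affinity relation can be evaluated directly from measured equilibrium concentrations. This minimal sketch assumes all concentrations are given in mol/l, so that $K_A$ comes out in l/mol, and takes the dissociation constant as the reciprocal $K_D = 1/K_A$; the function names and the numbers in the usage line are illustrative only.

```python
def association_constant(complex_conc: float, free_gfp: float, free_target: float) -> float:
    """K_A = [GFP-TC] / ([GFP] [TC]); concentrations in mol/l, result in l/mol."""
    if free_gfp <= 0 or free_target <= 0:
        raise ValueError("free concentrations must be positive")
    return complex_conc / (free_gfp * free_target)


def dissociation_constant(k_a: float) -> float:
    """K_D = 1 / K_A, in mol/l."""
    return 1.0 / k_a


# Illustrative equilibrium: 2 µM complex, 1 µM free GFP sites, 1 mM free target
k_a = association_constant(2e-6, 1e-6, 1e-3)
print(k_a, dissociation_constant(k_a))  # approximately 2000 l/mol and 5e-4 mol/l
```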

In a preferred embodiment the affinity constant for target compound-circularly permuted GFP binding is in the range of from $10^{-4}$ mol to $10^{-13}$ mol. Preferably, a circularly permuted GFP has a binding affinity constant $K_A$ to its target compound of less than $10^{-4}$ mol, less than $10^{-5}$ mol, less than $10^{-6}$ mol, less than $10^{-7}$ mol, less than $10^{-8}$ mol, less than $10^{-9}$ mol, less than $10^{-10}$ mol, less than $10^{-11}$ mol, less than $10^{-12}$ mol, or less than $10^{-13}$ mol.

The binding specificity of the circularly permuted GFPs of the present invention can be enhanced if the protein comprises a further binding site in the loop region between β-sheet VII, spanning amino acids 145 to 155, and β-sheet VIII, spanning amino acids 160 to 170 (loop 1). This binding site can comprise an insertion of between 1 and 10 amino acids, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids, and/or between 2 and 15, e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15, amino acid mutations with respect to the native loop region. Loop 1 is located in the vicinity of the binding scaffold provided by loop 2 and of the newly created scaffold provided by the peptide connecting the native N and C termini (loop 3), and is capable of contributing to target binding in addition to these other binding regions within the molecule. It is preferred that loop 1 in the molecules of the invention comprises an insertion of 2, 3 or 4 amino acids and between 3 and 7 mutations with respect to the native loop region. The total length of loop 1, including mutations and/or insertions, is preferably between 4 and 15 amino acids, e.g. 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acids, more preferably between 6 and 8 amino acids. In one preferred embodiment the mutated region extends one amino acid into the adjacent β-sheet VII or β-sheet VIII, i.e. involves amino acid 155 or 160.
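The loop lengths quoted above follow from the β-sheet boundaries given in the text, counting the residues strictly between two secondary-structure elements. The sketch below assumes exactly that counting convention; mutations do not change the length, only insertions do.

```python
def native_loop_length(prev_strand_end: int, next_strand_start: int) -> int:
    """Native residues strictly between two secondary-structure elements."""
    return next_strand_start - prev_strand_end - 1


def loop1_length(insertion: int, prev_strand_end: int = 155,
                 next_strand_start: int = 160) -> int:
    """Loop 1 length after inserting `insertion` residues (mutations keep the length)."""
    return native_loop_length(prev_strand_end, next_strand_start) + insertion


print(native_loop_length(155, 160))             # 4: native loop 1 (residues 156-159)
print(native_loop_length(187, 199))             # 11: native loop 2 (residues 188-198)
print([loop1_length(k) for k in (2, 3, 4)])     # [6, 7, 8]
```

Under this convention an insertion of 2 to 4 residues into the 4-residue native loop 1 yields the preferred loop 1 length of 6 to 8 amino acids.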

As set out before, the new N and C termini created in the process of circular permutation can be located in various loop regions of GFP; however, they are preferably located between β-sheet I and β-sheet II, between β-sheet II and β-sheet III, between β-sheet III and α-helix II, between β-sheet IV and β-sheet V, between β-sheet V and β-sheet VI, between β-sheet VI and β-sheet VII, or between β-sheet VIII and β-sheet IX. Of these loop regions, the loop region between β-sheet VIII and β-sheet IX is particularly preferred for the creation of the new N and C termini. As outlined before, the creation of a new N terminus involves the introduction of an additional methionine or the mutation of the new N-terminal amino acid into a methionine residue. It is particularly preferred to create a new C terminus at amino acid residue 173 (with reference to wild-type Swiss-Prot P42212) and a new N terminus at amino acid 174. Again, in this context the residue at position 174 can be mutated into a methionine, or one or more additional amino acids comprising an N-terminal methionine can be added N-terminally of the amino acid at position 174. It is also possible to delete amino acids from the newly created N and/or C terminus; such a deletion can be in the range of between 1 and 10 amino acids but should preferably not remove any of the secondary structures adjacent to the loop, i.e. the β-sheets or α-helices of the GFP protein.

Circularly permuted GFP proteins have been described in the prior art (see, for example, Nagai, T. et al. (2001) PNAS 98:3197-3202), wherein a certain functionality was added to the newly created C terminus. In the cited publication, Xenopus calmodulin was added to the newly created C terminus to provide the GFP with the Ca2+-binding function of calmodulin. However, since the interaction surface of the circularly permuted GFP protein of the present invention is provided by the mutated/inserted amino acids in loop 3, loop 2 and, in preferred embodiments, loop 1, the circularly permuted GFP protein of the present invention preferably does not comprise any additional functionalities such as enzyme activities. Accordingly, it is preferred that no non-native amino acids are added to the new termini other than the methionine residue which is added to the newly created N terminus to provide a translational start point.

In some embodiments, e.g. where purification of the circularly permuted GFPs of the invention is desired, the protein can further comprise sequences allowing easy purification. A large number of such sequences is known in the art, including, without limitation, His-tags, Myc-tags, chitin-binding tags, GST and the like. Such tags are preferably inserted at the newly created N or C terminus to avoid masking of the tag by the rest of the protein. It is often possible to remove a tag attached to a GFP protein of the present invention after purification, e.g. by protease cleavage.

The GFP scaffold remains capable of fluorescing after the introduction of insertions and mutations in loop 3 and loop 2 and, optionally, in loop 1. It is one of the advantages of the circularly permuted GFPs of the present invention that the fluorescence of the protein, which additionally comprises two or three binding domains, is almost as strong as the fluorescence of the underlying GFP, i.e. the GFP without the mutations and/or insertions and without circular permutation. In a preferred embodiment of the protein of the present invention, the fluorescence activity of the circularly permuted GFP is considered to be essentially unaltered if it is at least 30% of the activity of native GFP, preferably at least 35%, more preferably at least 40%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and most preferably at least 95%.
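The retention thresholds above amount to comparing a relative fluorescence percentage against a fixed list. The sketch below assumes that the circularly permuted GFP and the native GFP are measured for the same molar amount of protein under identical conditions (the text uses a per-molar-amount comparison for derivatives); the signal values are illustrative.

```python
THRESHOLDS = (95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 40, 35, 30)  # percent, as listed


def relative_fluorescence(sample_signal: float, native_signal: float) -> float:
    """Fluorescence of the circularly permuted GFP as a percentage of native GFP."""
    if native_signal <= 0:
        raise ValueError("native signal must be positive")
    return 100.0 * sample_signal / native_signal


def highest_threshold_met(percent: float):
    """Highest listed threshold that is reached, or None if below 30%."""
    for threshold in THRESHOLDS:
        if percent >= threshold:
            return threshold
    return None


pct = relative_fluorescence(620.0, 1000.0)   # illustrative plate-reader values
print(pct, highest_threshold_met(pct))       # 62.0 60
```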

As set out above, the term GFP within the meaning of this invention refers to GFP from Aequorea and variants thereof, but also comprises structurally similar proteins from other species, e.g. proteins having a barrel shape and being capable of autofluorescence, as well as mutants with shifted absorption and/or emission spectra (e.g. red- or blue-shifted mutants), increased stability or more rapid folding. In a preferred embodiment of the protein of the present invention the native GFP on which the circularly permuted GFP is based has an amino acid sequence according to SEQ ID NO. 1 or is another GFP variant, in particular EGFP, BFP or EYFP. Also included in the present invention are derivatives of GFP, i.e. of Aequorea GFP or variants thereof. In this sense, a derivative is a protein comprising 1 to 15, i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15, additional mutations (also called substitutions), deletions or insertions, preferably substitutions in regions outside loop 1, loop 2 and loop 3. Preferably, the GFP derivative comprises conservative substitutions. Conservative substitutions are known in the art and typically include the substitution of, e.g., one polar amino acid with another polar amino acid or one acidic amino acid with another acidic amino acid. Accordingly, conservative substitutions preferably include substitutions within the following groups of amino acids: glycine, alanine, valine, proline, isoleucine and leucine (nonpolar, aliphatic side chain); aspartic acid and glutamic acid (negatively charged side chain); asparagine, glutamine, methionine, cysteine, serine and threonine (polar, uncharged side chain); phenylalanine, tryptophan and tyrosine (aromatic side chain); and lysine, arginine and histidine (positively charged side chain). However, any substitution, insertion or deletion is permissible as long as it does not substantially alter the fluorescence activity of the GFP according to SEQ ID NO. 1 or of a GFP variant, e.g. EGFP, BFP or EYFP. The fluorescence activity of a circularly permuted GFP derivative is considered to be essentially unaltered if the fluorescence activity of the derivative for a given molar amount of circularly permuted GFP is reduced by less than 50% compared to the circularly permuted GFP on which it is based.
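The conservative-substitution groups listed above lend themselves to a simple lookup. The sketch below encodes those five groups as given in the text (one-letter codes) and summarizes the substitutions between an equal-length reference and derivative; insertions and deletions are deliberately not handled, and the toy sequences and loop position are hypothetical.

```python
# Conservative-substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = (
    set("GAVPIL"),   # nonpolar, aliphatic side chain
    set("DE"),       # negatively charged side chain
    set("NQMCST"),   # polar, uncharged side chain
    set("FWY"),      # aromatic side chain
    set("KRH"),      # positively charged side chain
)


def is_conservative(wild_type: str, replacement: str) -> bool:
    """True if a substitution stays within one of the groups above."""
    return wild_type != replacement and any(
        wild_type in group and replacement in group for group in CONSERVATIVE_GROUPS)


def substitution_summary(reference: str, derivative: str, loop_positions=frozenset()) -> dict:
    """Count substitutions (1-based positions) in an equal-length derivative."""
    if len(reference) != len(derivative):
        raise ValueError("sequences must have equal length")
    total = conservative = in_loops = 0
    for pos, (ref, der) in enumerate(zip(reference, derivative), start=1):
        if ref == der:
            continue
        total += 1
        conservative += int(is_conservative(ref, der))
        in_loops += int(pos in loop_positions)
    return {"total": total, "conservative": conservative, "in_loops": in_loops}


summary = substitution_summary("MSKGEELF", "MTKGDELY", loop_positions={5})
print(summary, 1 <= summary["total"] <= 15)
```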
In a preferred embodiment of the protein of the present invention the native GFP, preferably the GFP according to SEQ ID NO. 1, or the derivative thereof additionally comprises one or more stabilizing mutations selected from the group consisting of S30R, N39I, L42I, F64L, F99S, N105S, E111V, I128S, I128T, Y145F, M153T, K162N, V163A, K166T, I167V, I171V, S205T and A206V. Of the large number of mutants described for GFP, a GFP derivative comprising one or more, preferably all six, of the mutations L42I, F64L, F99S, M153T, V163A and A206V is particularly preferred as a basis for the circularly permuted GFPs of the present invention, i.e. the circularly permuted GFP will comprise these mutations at the positions corresponding to the positions indicated above. It is even more preferred if the native GFP comprises one or both of the additional mutations N105S and I128S. Additionally or alternatively, the native GFP may further comprise one or both of the mutations K162N and K166T.
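The mutation notation used above (wild-type residue, position, new residue; combined mutations separated by "/") can be parsed and applied as sketched below. This is an illustrative helper, not part of the original disclosure; it assumes that positions refer to the numbering of the sequence passed in, so for a circularly permuted protein the positions would first have to be mapped to the corresponding residues, which this sketch does not do. The toy sequence is hypothetical.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq: str, notation: str) -> str:
    """Apply mutations such as 'F64L' or 'S65G/S72A/T203Y' to `seq` (1-based positions).

    The wild-type residue named in the notation is checked before substitution.
    """
    residues = list(seq)
    for token in notation.split("/"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


print(apply_mutations("MSKGEELF", "S2T/E5D"))  # MTKGDELF
```

Checking the wild-type residue catches numbering offsets early, which matters because the positions in the text are defined relative to the native GFP rather than to the permuted chain.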

As set out above, a large number of GFP variants is known which exhibit an altered absorption and/or emission spectrum. The use of such mutants is particularly preferred if more than one target compound needs to be detected simultaneously with the Affiphores of the present invention. Accordingly, in a preferred embodiment of the protein of the present invention the native GFP on which the circularly permuted GFP is based comprises one or more "colour" or fluorescence shift mutations selected from the group consisting of S65T, Y66H, Y66W and S65G/S72A/T203Y.

While the Affiphor as such is of primary importance in the context of the present invention the identification of an Affiphor for a given target compound will often involve selection processes, wherein a large number of circularly permuted GFP proteins bearing different amino acids in loop 2 and loop 3 and optionally in loop 1 are selected on the basis of their interaction with a given target compound. Accordingly, in a further aspect the present invention concerns a protein library comprising at least two proteins of the present invention differing in at least one amino acid insertion and/or mutation in loop 3, loop 2 and optionally in loop 1.
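A library of the kind described above can be illustrated at the protein level as follows. The text does not specify how the randomized regions are generated, so this sketch assumes uniform sampling over the 20 standard amino acids, a fixed set of randomized positions and a fixed insertion site; a real library would be constructed at the DNA level, and the native loop string used here is a hypothetical placeholder.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, uniform weighting assumed


def randomize_loop(native: str, randomized: range, insert_at: int,
                   insert_len: int, rng: random.Random) -> str:
    """One variant: residues at the 0-based indices in `randomized` are drawn at random,
    and `insert_len` random residues are inserted before index `insert_at`."""
    loop = list(native)
    for i in randomized:
        loop[i] = rng.choice(AMINO_ACIDS)
    insertion = [rng.choice(AMINO_ACIDS) for _ in range(insert_len)]
    return "".join(loop[:insert_at] + insertion + loop[insert_at:])


def build_library(native: str, randomized: range, insert_at: int, insert_len: int,
                  size: int, seed: int = 0) -> list:
    """Return `size` distinct variants; reproducible for a given seed."""
    rng = random.Random(seed)
    library = set()
    while len(library) < size:
        library.add(randomize_loop(native, randomized, insert_at, insert_len, rng))
    return sorted(library)


library = build_library("KLMNPQRSTEW", range(1, 9), insert_at=5, insert_len=1, size=50, seed=1)
print(len(library), library[:3])
```

A randomly drawn residue can coincide with the native one, so individual members may carry fewer mutations than randomized positions; `size` must also stay well below the number of possible variants, otherwise the loop would not terminate.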

In a further aspect the present invention concerns a polynucleotide selected from the group consisting of:

(a) polynucleotides encoding at least the mature circularly permuted GFP of the present invention;

(b) polynucleotides encoding a derivative of a mature circularly permuted GFP encoded by a polynucleotide of (a), wherein in said derivative 1 to 15, i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15, amino acid residues are substituted, preferably conservatively, compared to the native GFP, these substitutions being in addition to any mutations in the loop region(s), and said derivative having fluorescence activity, preferably an unaltered or essentially unaltered fluorescence activity;

(c) polynucleotides the complementary strand of which hybridizes, preferably under stringent conditions, to a polynucleotide as defined in (a) or (b) and which encode a circularly permuted GFP having fluorescence activity, preferably an unaltered or essentially unaltered fluorescence activity; or the complementary strand of such a polynucleotide.

The polynucleotide molecules of the invention can contain naturally occurring sequences, or sequences that differ from those that occur naturally but, due to the degeneracy of the genetic code, encode the same polypeptide. The circularly permuted GFP-encoding nucleic acid molecules of the invention can be DNA, cDNA, genomic DNA, synthetic DNA, RNA, PNA or phosphorothioate and can be double-stranded or single-stranded, i.e. the sense and/or the anti-sense strand.
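The degeneracy argument and the notion of a complementary strand can be illustrated with the standard genetic code. The sketch below builds the 64-codon table from the usual TCAG ordering, translates DNA or RNA reading frames and computes a reverse complement; the two coding sequences are hypothetical examples chosen to encode the same tetrapeptide.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleic_acid: str) -> str:
    """Translate a DNA or RNA reading frame; '*' marks a stop codon."""
    dna = nucleic_acid.upper().replace("U", "T")
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Two different (hypothetical) coding sequences for the same tetrapeptide:
print(translate("ATGAGTAAAGGA"), translate("ATGTCCAAGGGT"))  # MSKG MSKG
print(reverse_complement("ATGAGTAAAGGA"))                    # TCCTTTACTCAT
```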

The term "vector" refers to a means, including, for example, a protein or a polynucleotide or a mixture thereof, which is capable of being introduced into a cell or of introducing the proteins and/or polynucleotides of the invention into a cell. Certain vectors are particularly suitable for the introduction of polynucleotides or polypeptides into only some specific cell types, while other vectors can be introduced into a variety of different cell types. The skilled artisan knows how to choose a particular vector depending on the cell type into which the polynucleotide or polypeptide is to be introduced. In a preferred embodiment the vector of the present invention comprises plasmids; phagemids; phages, in particular M13 and derivatives thereof; cosmids; artificial chromosomes, in particular artificial mammalian chromosomes or artificial yeast chromosomes; knock-out or knock-in constructs; viruses, in particular adenovirus, vaccinia virus, attenuated vaccinia virus, canary pox virus, lentivirus (Chang and Gay, 2001), herpes virus, in particular Herpes simplex virus (HSV-1, Carlezon et al., 2000), baculovirus, retrovirus, adeno-associated virus (AAV, Carter and Samulski, 2000), rhinovirus, human immunodeficiency virus (HIV), filovirus, and engineered versions of the above-mentioned viruses (see, for example, Kobinger et al., 2001); virosomes; "naked" DNA; liposomes; virus-like particles; and nucleic acid-coated particles, in particular gold spheres. Particularly preferred are viral vectors such as adenoviral vectors, lentiviral vectors, baculovirus vectors or retroviral vectors (Lindemann et al., 1997, and Springer et al., 1998). Examples of plasmids which allow the generation of such recombinant viral vectors include pFastBac1 (Invitrogen Corp., Carlsbad, CA), pDCCMV (Wiznerowicz et al., 1997) and pShuttle-CMV (Q-biogene, Carlsbad, California). Liposomes are also preferred vectors; they are usually small unilamellar or multilamellar vesicles made of cationic, neutral and/or anionic lipids, prepared, for example, by ultrasound treatment of liposomal suspensions. The polynucleotides can, for example, be ionically bound to the surface of the liposomes or enclosed inside the liposomes.
Suitable lipid mixtures are known in the art and comprise, for example, DOTMA (1,2-dioleyloxypropyl-3-trimethylammonium bromide) and DOPE (dioleoylphosphatidylethanolamine), both of which have been used on a variety of cell lines.

Nucleic acid-coated particles are another means of introducing the polynucleotides of the invention into cells using so-called "gene guns", which allow the mechanical introduction of particles into the cells. The particles themselves are preferably inert and are therefore, in a preferred embodiment, gold spheres.

In a further aspect the polynucleotide of the present invention is operatively linked to one or more expression control sequences which allow expression in prokaryotic and/or eukaryotic host cells. These transcriptional/translational regulatory elements include, but are not limited to, inducible and non-inducible, constitutive, cell cycle-regulated and metabolically regulated promoters, enhancers, operators, silencers, repressors and other elements that are known to those skilled in the art and that drive or otherwise regulate gene expression. Such regulatory elements include, but are not limited to, regulatory elements directing constitutive expression, for example promoters transcribed by RNA polymerase III, e.g. the promoters of the snRNA U6 or scRNA 7SK gene, the cytomegalovirus hCMV immediate early gene promoter, the early or late promoters of SV40 or adenovirus, and viral promoter and activator sequences derived from, e.g., NBV, hepatitis C virus (HCV), herpes simplex virus (HSV), human papilloma virus (HPV), Epstein-Barr virus (EBV), human T-cell leukaemia virus (HTLV), mouse mammary tumor virus (MMTV) or HIV; regulatory elements which allow inducible expression, for example the CUP-1 promoter, the tet repressor as employed, for example, in the tet-on or tet-off systems, the lac system and the trp system; regulatory elements directing cell cycle-specific expression, for example the cdc2, cdc25C or cyclin A promoter; or the TAC system, the TRC system, the major operator and promoter regions of phage λ, the control regions of the fd coat protein, the promoter for 3-phosphoglycerate kinase (PGK), the promoters of acid phosphatase and the promoters of the yeast α- or a-mating factors. Particularly preferred promoters are the constitutive CMV immediate early gene promoter, the early or late SV40 promoter, the polyhedrin promoter, retroviral LTRs, the PGK promoter, the elongation factor 1-α (EF1-α) promoter and the phosphoenolpyruvate carboxykinase (PEPCK) promoter.

As used herein, "operatively linked" means incorporated into a genetic construct so that expression control sequences effectively control expression of a coding sequence of interest.

Another aspect of the present invention is a host cell genetically engineered with the polynucleotide(s) or vector(s) of the present invention outlined above. The host cells that may be used for purposes of the invention include, but are not limited to, prokaryotic cells such as bacteria (for example, E. coli and B. subtilis), which can be transformed with, for example, recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing the polynucleotide molecules of the invention; simple eukaryotic cells such as yeast (for example, Saccharomyces and Pichia), which can be transformed with, for example, recombinant yeast expression vectors containing the polynucleotide molecule of the invention; insect cell systems such as Spodoptera frugiperda and Trichoplusia ni cell lines, e.g. Sf9 or Hi5 cells, which can be infected with, for example, recombinant virus expression vectors (for example, baculovirus) containing the polynucleotide molecules of the invention; Xenopus oocytes, which can be injected with, for example, plasmids; plant cell systems, which can be infected with, for example, recombinant virus expression vectors (for example, cauliflower mosaic virus (CaMV) or tobacco mosaic virus (TMV)) or transformed with recombinant plasmid expression vectors (for example, Ti plasmid) containing a circularly permuted GFP polypeptide-encoding nucleotide sequence; or mammalian cell systems (for example, COS, CHO, BHK, HEK293, VERO, HeLa, MDCK, WI38, NS0 and NIH 3T3 cells), which can be transformed with recombinant expression constructs containing promoters derived, for example, from the genome of mammalian cells (for example, the metallothionein promoter), from mammalian viruses (for example, the adenovirus late promoter and the vaccinia virus 7.5K promoter) or from bacterial cells (for example, the tet repressor binding site employed in the tet-on and tet-off systems). Also useful as host cells are primary or secondary cells obtained directly from a mammal and transfected with a plasmid vector or infected with a viral vector. Depending on the host cell and the vector used to introduce the polynucleotide of the invention, the polynucleotide can, for example, integrate into the chromosome or the mitochondrial DNA, be maintained extrachromosomally, e.g. episomally, or be present in the cells only transiently. Preferred host cells are Spodoptera frugiperda and Trichoplusia ni cells; mammalian cells, in particular stem cells, hemopoietic cells, hepatocytes, adipocytes, neurons, osteoclasts, uterine endometrium cells, dermatocytes, myocardial cells, mucosal cells, leucocytes or tumor cells; bacterial cells, in particular of Escherichia or Bacillus species; and yeast cells, in particular of Pichia or Saccharomyces species.

A further problem arises from the different cellular folding requirements of certain proteins upon expression of GFP proteins, in particular in bacteria, if the protein only attains a correct fold in the cytoplasm, as is the case, for example, for green fluorescent proteins (GFP), which are incompatible with the Sec transport pathway. Thus, the expression of GFP, and in particular of GFP fusion proteins as desired in GFP phage display libraries, uses vectors which further comprise an interaction domain and a protein translocation domain fused to the polynucleotide, with the effect that the encoded fusion protein, upon expression in bacteria, is translocated through the cytoplasmic membrane in a folded or essentially folded state. Examples of such translocation domains are Tat-dependent or thylakoid ΔpH-dependent sequences. The interaction domains used in the GFP fusion protein allow binding of this protein to a second fusion protein. Preferred interaction domains are those which result in a relatively stable interaction between the two proteins, i.e. an interaction which remains stable in the oxidative environment of the periplasm, on the bacterial cell surface or outside the cell upon secretion of the heterodimer or heteromultimer. Suitable interaction domains of the first and second fusion proteins which can be comprised in the fusion protein according to the invention are, for example, a pair of leucine zipper domains, as described for the first time in the two oncoproteins cJun and cFos (Landschulz et al. (1988) supra), or variants thereof derived from other hetero- or homodimers, as well as artificial leucine zipper domains; a pair of helix-loop-helix domains (Moor et al. (1989) Cell 56:777-783); a calmodulin and a calmodulin-binding peptide (Montigiani, S. et al. (1996) JMB 258:6-13); or, in each case, one peptide of a peptide dimer. The term interaction domain also comprises domains which allow the formation of multimers of more than two fusion proteins. A system for translocating fusion proteins, in particular GFP fusion proteins, in a folded or essentially folded state and displaying them on bacterial surfaces has been described in WO 2004/050871, to which specific reference is made for all aspects of the design of such fusion proteins. A GFP fusion protein is considered to have been translocated through a bacterial membrane in a folded or essentially folded state if GFP activity can be detected on the surface of bacteria expressing the GFP fusion protein. Conversely, if no GFP activity is detectable on the surface of bacteria after expression of a GFP fusion protein (comprising GFP, translocation domain and interaction domain) and translocation through the bacterial membrane, then the GFP fusion protein has been translocated in an essentially unfolded or unfolded state.

A further aspect of the invention is a process for producing a circularly permuted GFP polypeptide encoded by the polynucleotide of the invention, comprising culturing the host cell of the present invention and recovering the circularly permuted GFP polypeptide encoded by said polynucleotide. The skilled practitioner is aware of a variety of expression systems which yield high-level expression of heterologous proteins and which can thus be used for production of the circularly permuted GFP polypeptide of the present invention. The choice of expression system depends on the required amount of protein and the required modifications. While it is standard to use unicellular organisms for the expression of heterologous proteins, it is also possible to use cells derived from multicellular organisms. In principle, any such cell culture is workable, whether from vertebrate or invertebrate culture. In addition to mammalian cells, these include insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus); plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid); yeast transformed with plasmids or artificial chromosomes; and prokaryotic cells transformed with plasmids, cosmids, phagemids or phage containing one or more circularly permuted GFP polypeptide coding sequences.

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda and Trichoplusia ni cells. Spodoptera frugiperda cells are used for amplification of the virus, while H5 cells isolated from Trichoplusia ni are employed for production of circularly permuted GFP. The circularly permuted GFP polypeptide coding sequences are cloned into nonessential regions (for example, the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for example, the polyhedrin promoter). Successful insertion of the coding sequences results in inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat encoded by the polyhedrin gene). These recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted gene is expressed (e.g., U.S. Patent No. 4,215,051, Smith, incorporated herein by reference). Examples of vectors which can be used to generate recombinant virus have been indicated above.

Examples of useful mammalian host cell lines are VERO and HeLa cells, CHO cell lines, WI 38, BHK, COS-7, 293, HepG2, NIH3T3, RIN, NS0 and MDCK cell lines. In addition, a host cell strain may be chosen that modulates the expression of the inserted sequences or modifies and processes the gene product in the specific fashion desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be important for the function of the protein. Different host cells have characteristic and specific mechanisms for the post-translational processing and modification of proteins. Appropriate cell lines or host systems can be chosen to ensure the correct modification and processing of the foreign protein expressed. Expression vectors for use in mammalian cells ordinarily include an origin of replication (as necessary) and a promoter located in front of the gene to be expressed, along with any necessary ribosome binding sites, RNA splice sites, a polyadenylation site and transcriptional terminator sequences. The origin of replication may be provided either by construction of the vector to include an exogenous origin, such as may be derived from SV40 or another viral source (e.g., polyoma, adenovirus, CMV, VSV, BPV), or by the host cell chromosomal replication mechanism. If the vector is integrated into the host cell chromosome, the latter is often sufficient.

The promoters may be derived from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter). Further, it is also possible, and may be desirable, to utilize promoter or control sequences normally associated with the circularly permuted GFP polynucleotide, provided such control sequences are compatible with the host cell system used.

A number of viral expression systems may be utilized; for example, commonly used promoters are derived from polyoma, adenovirus 2 and, most frequently, Simian Virus 40 (SV40). The early and late promoters of SV40 are particularly useful because both are easily obtained from the virus as a fragment that also contains the SV40 viral origin of replication. Smaller or larger SV40 fragments may also be used, provided that the approximately 250 bp sequence extending from the HindIII site toward the BglI site located in the viral origin of replication is included.
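Because the SV40 origin fragment is defined by its flanking restriction sites, such a fragment can be checked computationally. The following Python sketch is illustrative only: it locates HindIII (AAGCTT) and BglI (GCCNNNNNGGC) recognition sites in a DNA string and reports the distance from the first HindIII site to the next BglI site. The function names are hypothetical, and the sequence used below is a synthetic toy, not the SV40 genome.

```python
import re
from typing import List, Optional

# Recognition sequences written 5'->3'. Both are palindromic, so scanning the
# top strand alone finds every site. BglI accepts any 5 bases in the middle.
HINDIII = "AAGCTT"
BGLI = "GCC[ACGT]{5}GGC"


def find_sites(seq: str, pattern: str) -> List[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead makes re.finditer report overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def hindiii_to_bgli_distance(seq: str) -> Optional[int]:
    """Distance (bp) from the first HindIII site to the first BglI site downstream of it."""
    hind = find_sites(seq, HINDIII)
    if not hind:
        return None
    downstream = [p for p in find_sites(seq, BGLI) if p > hind[0]]
    return downstream[0] - hind[0] if downstream else None


# Toy sequence: a HindIII site, 244 bp of filler, then a BglI site.
toy = "AAGCTT" + "A" * 244 + "GCCAAAAAGGC"
print(hindiii_to_bgli_distance(toy))  # 250
```

On a real vector map one would apply the same functions to the actual SV40-derived sequence; the toy input only demonstrates the distance arithmetic.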

In cases where an adenovirus is used as an expression vector, the coding sequences may be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite leader sequence. This chimeric gene may then be inserted into the adenovirus genome by in vitro or in vivo recombination. Insertion into a non-essential region of the viral genome (e.g., region E1, E3 or E4) will result in a recombinant virus that is viable and capable of expressing circularly permuted GFP in infected hosts.

Specific initiation signals may also be required for efficient translation of circularly permuted GFP coding sequences. These signals include the ATG initiation codon and adjacent sequences. Exogenous translational control signals, including the ATG initiation codon, may additionally need to be provided. One of ordinary skill in the art would readily be capable of determining this and providing the necessary signals. It is well known that the initiation codon must be in-frame (or in-phase) with the reading frame of the desired coding sequence to ensure translation of the entire insert. These exogenous translational control signals and initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements and transcription terminators. In eukaryotic expression, one will also typically desire to incorporate into the transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not contained within the original cloned segment. Typically, the poly A addition site is placed about 30 to 2000 nucleotides "downstream" of the termination site of the protein at a position prior to transcription termination.
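The in-frame requirement and the placement of the polyadenylation signal described above lend themselves to a quick sanity check of a construct. The Python sketch below is a hypothetical helper, not part of the original disclosure: it verifies that a coding sequence starts with ATG, has a length divisible by three, ends in a stop codon and contains no internal stop codon, and it lists AATAAA signals lying 30 to 2000 nucleotides downstream of the stop codon. As a simplification, it treats the position of the AATAAA signal as a proxy for the poly A addition site, which in reality lies somewhat further downstream of the signal.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_reading_frame(cds: str) -> List[str]:
    """Return a list of problems; an empty list means the ORF looks intact and in frame."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons only, reading from the first base.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


def polya_signal_offsets(downstream: str, lo: int = 30, hi: int = 2000) -> List[int]:
    """Offsets of AATAAA in the sequence following the stop codon, kept if lo <= offset <= hi."""
    s = downstream.upper()
    hits = []
    i = s.find("AATAAA")
    while i != -1:
        if lo <= i <= hi:
            hits.append(i)
        i = s.find("AATAAA", i + 1)
    return hits


print(check_reading_frame("ATGAAATAA"))           # []
print(check_reading_frame("ATGTAAAAATAA"))        # flags the internal TAA
print(polya_signal_offsets("C" * 40 + "AATAAA"))  # [40]
```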

For long-term, high-yield production of recombinant fusion polypeptides, stable expression is preferred. For example, cell lines that stably express constructs encoding a circularly permuted GFP polypeptide may be engineered. Rather than using expression vectors that contain viral origins of replication, host cells can be transformed with vectors controlled by appropriate expression control elements (e.g., promoter and enhancer sequences, transcription terminators, polyadenylation sites, etc.) and a selectable marker. Following the introduction of foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium and are then switched to a selective medium. The selectable marker in the recombinant plasmid confers resistance to the selection and allows cells that have stably integrated the plasmid into their chromosomes to grow and form foci, which in turn can be cloned and expanded into cell lines.

A number of selection systems may be used, including, but not limited to, the herpes simplex virus thymidine kinase (tk), hypoxanthine-guanine phosphoribosyltransferase (hgprt) and adenine phosphoribosyltransferase (aprt) genes, in tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite resistance can be used as the basis of selection for dihydrofolate reductase (dhfr), which confers resistance to methotrexate; gpt, which confers resistance to mycophenolic acid; neomycin (neo), which confers resistance to the aminoglycoside G-418; and hygromycin (hygro), which confers resistance to hygromycin.

A further aspect of the invention is a process for producing cells capable of expressing circularly permuted GFP. This process comprises genetically engineering cells in vitro with at least one vector of the present invention, wherein said circularly permuted GFP is (are) encoded by the polynucleotide(s) of the present invention. The type of cell that can be transformed is not limited and depends on the respective vector or vector system used to genetically engineer the cells. Vectors and vector systems which are preferred for the transformation of certain cell types have been indicated above. In addition, it is preferred that the particular cells and cell lines outlined above are employed in this process of the invention.

In a further aspect the present invention concerns a polypeptide encoded by the nucleic acid molecule of the present invention or produced by the method of the present invention.

To detect the expression of the circularly permuted GFPs of the present invention, it is possible to use most antibodies raised against GFP. However, to distinguish between the expression of native GFP and of the circularly permuted protein based thereon, it is desirable in some aspects of the invention to have an antibody that specifically detects the circularly permuted protein but shows only weak or essentially no specificity for the native GFP alone. Thus, in a further aspect the present invention relates to an antibody specific to the polypeptide encoded by the polynucleotide of the present invention, or obtainable by the process of the present invention for producing the circularly permuted GFP, which is non-specific or essentially non-specific to the native GFP on which the circularly permuted GFP is based. It is known in the art how to generate antibodies which recognize only a particular epitope within a protein. It is possible, for example, to immunize mammals, in particular mice or rabbits, with a peptide that comprises the junction and/or part of loop 3, loop 2 or optionally loop 1. This will lead to antibodies which specifically recognize the circularly permuted GFP. From the antibodies thus produced, those showing only weak or essentially no specificity towards the native GFP are then selected.
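The junction peptide mentioned above exists only in the circularly permuted protein, because the original N- and C-termini are fused there. The following Python sketch illustrates this under simplifying assumptions: it builds a circular permutant as `seq[new_start:] + linker + seq[:new_start]` and extracts a short peptide spanning the new junction. The sequences, the linker and the flank length are hypothetical placeholders, not the actual GFP sequence or linker of SEQ ID NO 1 or 2.

```python
def circular_permute(seq: str, new_start: int, linker: str) -> str:
    """Make residue seq[new_start] the new N-terminus and join the old termini via linker."""
    if not 0 < new_start < len(seq):
        raise ValueError("new_start must lie strictly inside the sequence")
    return seq[new_start:] + linker + seq[:new_start]


def junction_peptide(seq: str, new_start: int, linker: str, flank: int = 5) -> str:
    """Peptide covering the linker plus up to `flank` residues of the old C- and N-termini."""
    permuted = circular_permute(seq, new_start, linker)
    j = len(seq) - new_start  # index in `permuted` where the linker begins
    return permuted[max(0, j - flank): j + len(linker) + flank]


native = "ABCDEFGH"  # placeholder "protein", one letter per residue
cp = circular_permute(native, 3, "GG")
print(cp)                                          # DEFGHGGABC
print(junction_peptide(native, 3, "GG", flank=2))  # GHGGAB
```

Because the junction peptide (here GHGGAB) does not occur in the native placeholder sequence, it is a candidate immunogen for permutant-specific antibodies; whether the resulting antibodies actually discriminate against native GFP still has to be tested, as described above.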

Similarly, the present invention further concerns a primer or pair of primers capable of specifically amplifying a nucleic acid molecule of the present invention but not the nucleic acid molecule encoding the native GFP on which the circularly permuted GFP is based. A primer is a polynucleotide capable of specifically hybridizing with a polynucleotide encoding a circularly permuted GFP of the present invention under stringent conditions, i.e. at a temperature above 50°C. Accordingly, it is preferred that the primer has a length of at least 15, preferably at least 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or at least 30 nucleotides.
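As a rough first screen of candidate primers against the length and temperature criteria above, one could use the Wallace rule of thumb, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), taking an estimated Tm above 50°C as a proxy for hybridization above 50°C. This is a swapped-in approximation, not a method stated in the source; it is only meaningful for short oligonucleotides under standard salt conditions, and nearest-neighbour models are more accurate. The function names and thresholds below are illustrative.

```python
def wallace_tm(primer: str) -> float:
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C). Ambiguity codes are ignored."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def meets_criteria(primer: str, min_len: int = 15, min_tm: float = 50.0) -> bool:
    """True if the primer is at least `min_len` nt long and its estimated Tm exceeds `min_tm`."""
    return len(primer) >= min_len and wallace_tm(primer) > min_tm


print(wallace_tm("ATGCATGCATGCATGCATGC"))  # 60.0
print(meets_criteria("ATATATATATATATA"))   # False: 15 nt, but only A/T (Tm 30.0)
```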

In a further aspect the present invention concerns a polynucleotide library comprising at least two polynucleotides of the present invention which differ in at least two nucleotides encoding amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1. In a preferred embodiment, the polynucleotide library of the present invention is generated by inserting random or partially random nucleotide triplet sequences in place of the nucleotide triplets encoding specific amino acid insertion(s) and/or mutation(s) in loop 3, loop 2 and/or loop 1. The insertion of random sequence is easily carried out using, e.g., known PCR-based methods as outlined, for example, in Fig. 3. These methods usually involve the use of primers with randomized sequences flanked by exactly matching sequences, which allow specific hybridization to the GFP sequence flanking the areas to be randomized. A primer with a "randomized sequence" is in fact a mixture of primers that differ in one or more nucleotides. Such mixtures are usually generated by including not one but two, three or four different nucleotide building blocks in a coupling step during primer synthesis. If, for example, equimolar amounts of dATP, dCTP, dGTP and dTTP are included in one coupling step of the synthesis of a primer, the resulting primer will be "randomized" at one position and will essentially comprise four different sets of primers, wherein one quarter carries an A, one quarter a C, one quarter a G and one quarter a T at the randomized position. Using this or other approaches known in the art, it is possible to generate primers with fully random nucleotide triplets, i.e. having all 4^3 = 64 possible codons at one triplet position, or partially random triplet sequences, which represent only a subset of the 64 possible codons and, accordingly, encode only a subset of the 20 possible amino acids.
It is known in the art how to randomize triplets so as to select only a subset of amino acids, for example polar amino acids. While in screening procedures such as phage display it is desirable to present as many different peptides as possible, at a later stage, during optimization of the binding of an Affiphor lead structure, it may be desirable to allow only small variations in loop 3, loop 2 and/or loop 1. For example, if a selected Affiphor comprises two alanine residues in the loop regions, a circularly permuted GFP-encoding polynucleotide can be randomized by introducing, at the positions encoding alanine, triplets which encode other small hydrophobic amino acids. The following abbreviations are commonly used to designate randomization: N = A,C,G,T; V = G,A,C; D = G,A,T; B = G,T,C; H = A,T,C; W = A,T; M = A,C; R = A,G; K = G,T; Y = C,T; S = G,C. Accordingly, the fully randomized triplet is represented by "NNN", while preferred partly randomized codon triplets can be represented by the following abbreviations: NNV, NND, NNB, NNH, NNW, NNM, NNR, NNK, NNY, NNS, NVN, NDN, NBN, NHN, NWN, NMN, NRN, NKN, NYN, NSN, VNN, DNN, BNN, HNN, WNN, MNN, RNN, KNN, YNN, SNN, NVV, NDD, NBB, NHH, NWW, NMM, NRR, NKK, NYY, NSS, VNV, DND, BNB, HNH, WNW, MNM, RNR, KNK, YNY, SNS, VVN, DDN, BBN, HHN, WWN, MMN, RRN, KKN, YYN, SSN, VVV, DDD, HHH, WWW, MMM, RRR, KKK, YYY, SSS, DDV, DDB, DDH, DDW, DDM, DDR, DDK, DDY, DDS, DVD, DBD, DHD, DWD, DMD, DRD, DKD, DYD, DSD, VDD, BDD, HDD, WDD, MDD, RDD, KDD, YDD, SDD, DVV, DBB, DHH, DWW, DMM, DRR, DKK, DYY, DSS, VDV, BDB, HDH, WDW, MDM, RDR, KDK, YDY, SDS, VVD, BBD, HHD, WWD, MMD, RRD, KKD, YYD, BBV, BBD, BBH, BBW, BBM, BBR, BBK, BBY, BBS, BVB, BDB, BHB, BWB, BMB, BRB, BKB, BYB, BSB, VBB, DBB, HBB, WBB, MBB, RBB, KBB, YBB, SBB, BVV, BDD, BHH, BWW, BMM, BRR, BKK, BYY, BSS, VBV, DBD, HBH, WBW, MBM, RBR, KBK, YBY, SBS, VVB, DDB, HHB, WWB, MMB, RRB, KKB, YYB, SSB, HHV, HHD, HHB, HHW, HHM, HHR, HHK, HHY, HHS, HVH, HDH, HBH, HWH, HMH, HRH, HKH, HYH, HSH, VHH, DHH, BHH, WHH, MHH, RHH, KHH, YHH, SHH, HVV, HDD, HBB, HWW, HMM, HRR, HKK, HYY, HSS, VHV, DHD, BHB, WHW, MHM, RHR, KHK, YHY, SHS, VVH, WWH, MMH, RRH, KKH, YYH, SSH, WWV, WWD, WWB, WWH, WWM, WWR, WWK, WWY, WWS, WVW, WDW, WBW, WHW, WMW, WRW, WKW, WYW, WSW, VWW, DWW, BWW, HWW, MWW, RWW, KWW, YWW, SWW, WVV, WDD, WBB, WHH, WMM, WRR, WKK, WYY, WSS, VWV, DWD, BWB, HWH, MWM, RWR, KWK, YWY, SWS, VVW, DDW, BBW, HHW, MMW, RRW, KKW, YYW, SSW, MMV, MMD, MMB, MMH, MMW, MMR, MMK, MMY, MMS, MVM, MDM, MBM, MHM, MWM, MRM, MKM, MYM, MSM, VMM, DMM, BMM, HMM, WMM, RMM, KMM, YMM, SMM, MVV, MHH, MWW, MRR, MKK, MYY, MSS, VMV, DMD, BMB, HMH, WMW, RMR, KMK, YMY, SMS, VVM, DDM, BBM, HHM, WWM, RRM, KKM, YYM, SSM, RRV, RRD, RRB, RRH, RRW, RRM, RRK, RRY, RRS, RVR, RDR, RBR, RHR, RWR, RMR, RKR, RYR, RSR, VRR, DRR, BRR, HRR, WRR, MRR, KRR, YRR, SRR, RVV, RDD, RBB, RHH, RWW, RMM, RKK, RYY, RSS, VRV, DRD, BRB, HRH, WRW, MRM, KRK, YRY, SRS, VVR, DDR, BBR, HHR, WWR, MMR, KKR, YYR, SSR, KKV, KKD, KKB, KKH, KKW, KKM, KKR, KKY, KKS, KVK, KDK, KBK, KHK, KWK, KMK, KRK, KYK, KSK, VKK, DKK, BKK, HKK, WKK, MKK, RKK, YKK, SKK, KVV, KDD, KBB, KHH, KWW, KMM, KRR, KYY, KSS, VKV, DKD, BKB, HKH, WKW, MKM, RKR, YKY, SKS, VVK, DDK, BBK, HHK, WWK, MMK, RRK, YYK, SSK, YYV, YYD, YYB, YYH, YYW, YYM, YYR, YYK, YYS, YVY, YDY, YBY, YHY, YWY, YMY, YRY, YKY, YSY, VYY, DYY, BYY, HYY, WYY, MYY, RYY, KYY, SYY, YVV, YDD, YBB, YHH, YWW, YMM, YRR, YKK, YSS, VYV, DYD, BYB, HYH, WYW, MYM, RYR, KYK, SYS, VVY, DDY, BBY, HHY, WWY, MMY, RRY, KKY, SSV, SSD, SSB, SSH, SSW, SSM, SSR, SSK, SSY, SVS, SDS, SBS, SHS, SWS, SMS, SRS, SKS, SYS, VSS, DSS, BSS, HSS, WSS, MSS, RSS, KSS, YSS, SVV, SDD, SBB, SHH, SWW, SMM, SRR, SKK, SYY, VSV, DSD, BSB, HSH, WSW, MSM, RSR, KSK, YSY, VVS, DDS, BBS, HHS, WWS, MMS, RRS, KKS, MTC or YYS.
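To see what a given degenerate triplet actually encodes, the abbreviations above can be expanded and translated with the standard genetic code. The Python sketch below does this; the helper names are hypothetical. It shows, for example, that NNN covers all 64 codons, that NNK (32 codons) and NNB (48 codons) still encode all 20 amino acids with only one stop codon (TAG), that NNV (48 codons) includes all three stop codons, and that VVS contains no stop codon at all.

```python
from itertools import product
from typing import List, Set

# IUPAC nucleotide ambiguity codes as used in the text above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "V": "ACG", "D": "AGT", "B": "CGT", "H": "ACT",
    "W": "AT", "M": "AC", "R": "AG", "K": "GT", "Y": "CT", "S": "CG",
}

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate: str) -> List[str]:
    """All concrete sequences represented by a degenerate string."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in degenerate.upper()))]


def translate_codons(degenerate_triplet: str) -> List[str]:
    """Translation of every codon represented by one degenerate triplet (with repeats)."""
    return [CODON_TABLE[c] for c in expand(degenerate_triplet)]


def encoded_residues(degenerate_triplet: str) -> Set[str]:
    """Distinct amino acids (and '*' for stop) encoded by a degenerate triplet."""
    return set(translate_codons(degenerate_triplet))


for triplet in ("NNN", "NNK", "NNB", "NNV", "VVS", "MTC"):
    aas = translate_codons(triplet)
    print(triplet, len(aas), "codons,", len(set(aas) - {"*"}), "amino acids,",
          aas.count("*"), "stop codon(s)")
```

The same functions can be used to check whether a partly randomized triplet chosen for lead optimization, such as one replacing an alanine codon, really restricts the encoded residues to the intended subset.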

In a preferred embodiment of the polynucleotide library of the present invention, a) the sequence $(NNV)_n$, $(MTC)(NNV)_{n-1}$, $(NNV)_{n-o}(VVS)_o$ or $(MTC)(NNV)_{n-o-1}(VVS)_o$ is inserted into loop 2 between the nucleotide triplets encoding amino acid 186, 187 or 188 and 195, 196 or 197 according to SEQ ID NO 1 or a derivative thereof, wherein n is 7 to 11 and o is 1 to 3; b) the sequence $(NNV)_p$ or $(NNV)_{p-q}(VVS)_q$ is inserted into loop 3 between the nucleotide triplets encoding amino acid 228, 229 or 230 and 3, 4 or 5 according to SEQ ID NO 1 or a derivative thereof, wherein p is 8 to 16 and q is 1 to 3; and c) optionally, the sequence $(NNV)_m$ is inserted into loop 1 between the nucleotide triplets encoding amino acid 153, 154 or 155 and 159, 160 or 161 according to SEQ ID NO 1 or a derivative thereof, wherein m is 5 to 11; wherein N has the meaning A, C, G or T; V has the meaning G, A or C; M has the meaning A or C; and S has the meaning G or C. A particularly preferred nucleic acid, which can be the basis for the randomized/mutated sequences, is depicted in SEQ ID NO 2. In most cases the screening for Affiphores will be carried out on cells, wherein each cell comprises and expresses only one type of circularly permuted GFP. To that end it is necessary to transfer the polynucleotide libraries of the present invention into a suitable vector. Accordingly, a further aspect of the present invention concerns a vector library comprising a polynucleotide library of the present invention. The vectors indicated above in the context of a single polynucleotide of the present invention can similarly be used for the generation of vector libraries. Particularly preferred libraries are based on phage libraries, which can be engineered to express a protein of interest, e.g. the circularly permuted GFP, as a fusion with a phage coat protein. Upon infection of bacteria with such phage, the bacteria will present the various versions of the circularly permuted GFPs encoded by the library on their surfaces.
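The number of distinct DNA sequences encoded by a degenerate insert grows geometrically with its length, which is why the loop designs above imply very large theoretical libraries. The Python sketch below, with hypothetical helper names, multiplies the number of bases allowed at each position; it builds the loop 2 patterns $(NNV)_{n-o}(VVS)_o$ and $(MTC)(NNV)_{n-o-1}(VVS)_o$, with o = 0 giving $(NNV)_n$ and $(MTC)(NNV)_{n-1}$. Note that it counts DNA sequences, not distinct peptides: several codons can encode the same amino acid, so protein-level diversity is smaller.

```python
from math import prod

# Number of bases allowed at each IUPAC position.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1, "N": 4,
    "V": 3, "D": 3, "B": 3, "H": 3,
    "W": 2, "M": 2, "R": 2, "K": 2, "Y": 2, "S": 2,
}


def dna_diversity(pattern: str) -> int:
    """Number of distinct DNA sequences represented by a degenerate pattern."""
    return prod(DEGENERACY[ch] for ch in pattern.upper())


def loop2_pattern(n: int, o: int = 0, mtc: bool = False) -> str:
    """(MTC)(NNV)_{n-o-1}(VVS)_o if mtc else (NNV)_{n-o}(VVS)_o; always n codons long."""
    if mtc:
        return "MTC" + "NNV" * (n - o - 1) + "VVS" * o
    return "NNV" * (n - o) + "VVS" * o


for n in range(7, 12):  # n = 7 to 11 as stated for loop 2
    print(n, f"{dna_diversity(loop2_pattern(n)):.2e}")
```

Since each NNV codon contributes a factor of 4 × 4 × 3 = 48, every additional randomized codon enlarges the theoretical library about 48-fold.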

In a further aspect the present invention concerns a method for identifying a circularly permuted GFP specifically binding to a target compound, comprising the steps of: a) contacting a polypeptide library of the present invention, a vector library of the present invention, or a cell comprising a polynucleotide library or a vector library of the present invention with a target compound; and b) selecting a protein, vector or cell showing specific binding to the target compound.

In a further aspect the present invention concerns a protein, vector or cell identified with a method of the present invention. The sequence of steps a) and b) is also referred to as a screening round. A first screening round will usually identify several different Affiphores binding to the target compound. Repeating the screening round 1, 2, 3, 4 or more times allows isolation of those Affiphores with the highest binding affinity. If the fluorescence intensity is measured simultaneously or subsequently, it is possible to specifically select those Affiphores which show both binding and strong fluorescence. Once Affiphores with the desired properties have been isolated, their binding can be further optimized as outlined above, i.e. by analyzing the sequence of the selected Affiphor, randomizing small, selected parts of it, and selecting from this subset the Affiphores with the highest binding affinity and/or strongest fluorescence.
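Why repeated screening rounds help can be illustrated with a deliberately simple model, which is an assumption for illustration and not part of the source: if every round multiplies the ratio of specific binders to non-binders by a constant enrichment factor E, the odds after r rounds are odds_r = odds_0 · E^r. The starting fraction and E used below are hypothetical; real enrichment factors vary between rounds and targets, and this model only separates binders from non-binders without ranking binders by affinity.

```python
def binder_fraction(f0: float, enrichment: float, rounds: int) -> float:
    """Binder fraction after `rounds` rounds if each round multiplies binder:non-binder odds by `enrichment`."""
    if not 0.0 < f0 < 1.0:
        raise ValueError("f0 must be strictly between 0 and 1")
    odds = f0 / (1.0 - f0) * enrichment ** rounds
    return odds / (1.0 + odds)


# Hypothetical example: one binder per million clones, 100-fold enrichment per round.
for r in range(5):
    print(r, round(binder_fraction(1e-6, 100.0, r), 4))
```

Under these assumed numbers, binders go from a negligible fraction to roughly half of the population after three rounds, which is consistent with picking single clones after the third round as in the experimental section.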

A "target compound" can be any natural or synthetic compound for which it is desired to isolate a binding Affiphore. Preferred target compounds are biologically occurring polymers or monomers, e.g. hormones, fatty acids, proteins, polysaccharides or polynucleotides.

In a further aspect the present invention concerns a kit comprising a vector or vector library of the present invention and a second vector comprising an interaction domain and a protein translocation domain, which cause the encoded fusion protein, upon expression in bacteria, to be translocated through the cytoplasmic membrane in an unfolded or essentially unfolded state. Such kits, vectors and vector libraries are described in detail in WO 2004/050871, to which specific reference is made for these aspects. In a preferred embodiment of the kit of the present invention, the second vector further comprises a phage coat protein, preferably selected from the group of M13 phage coat proteins pIII, pVI, pVII, pVIII and pIX.

A further aspect of the present invention is the use of a protein of the present invention, a vector of the present invention or a cell of the present invention for the diagnosis of a disease or condition characterized by the presence, absence, increase or decrease of the target compound.

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1: Tertiary structure of native GFP according to Yang F. et al. (1996) supra. The eleven β sheets are designated from N- to C-terminus as β sheets I through XI. The three α-helical structures are designated from N- to C-terminus as α helices I to III (SEQ ID NO 1). The first and last amino acids belonging to the respective secondary structure are also indicated.

Fig. 2: Primary structure of preferred randomized circularly permuted GFPs. Amino acids are numbered according to wild-type GFP; mutations compared to wild type (SwissProt P42212, Q80R variant) are in bold, fluorophore-forming amino acids are in [square brackets] and additional amino acids are in italics; X = any amino acid; σ = small amino acids (AILMSTV); η = small hydrophobic amino acids (ILMV); and π = no aromatic, no hydrophobic, no cysteine (ADEGHKNQRST). The preferred randomized nucleic acid molecules are designated SEQ ID NO 3 to 10. In all these sequences L2 is preferably randomized as follows: $(nnb)_6$ dyrntr. In that context the following further modifications/randomizations are comprised: SEQ ID NO 3: L3: $(nnb)_8$, L1: $(nnb)_2$; SEQ ID NO 4: L3: $(nnb)_8$, L1: $(nnb)_4$; SEQ ID NO 5: L3: $(nnb)_{10}$, L1: $(nnb)_2$; SEQ ID NO 6: L3: $(nnb)_{10}$, L1: $(nnb)_4$; SEQ ID NO 7: L3: $(nnb)_{12}$, L1: $(nnb)_2$; SEQ ID NO 8: L3: $(nnb)_{12}$, L1: $(nnb)_4$; SEQ ID NO 9: L3: $(nnb)_{14}$, L1: $(nnb)_2$; SEQ ID NO 10: L3:

Fig. 3: PCR strategy for randomizing the three selected loop regions, i.e. loop 1, loop 2 and loop 3.

Fig. 4: The upper panel shows the specificity of Affiphores isolated by three rounds of selection of phage specifically binding to egg white lysozyme (HeI). Depicted is the specificity of the isolated phages for three different targets: HeI, monoclonal anti-P24 antibody (CB) and casein. The lower panel depicts the specificity of proteins isolated after three rounds of selection against CB, as tested by incubation with CB, and with casein and HeI as controls.

Fig. 5: The graph depicts the excitation and emission spectra of GFPuv (dark coloured line), the native GFP on which the preferred Affiphores of the present invention are based, compared with the excitation and emission spectra of the Affiphor (light coloured line).

EXPERIMENTAL SECTION

Production of an Affiphor library. The PCR strategy leading to the randomisation of loop 1, loop 2 and loop 3, which allows the generation of a randomized Affiphor library, involves the following steps:

In 100 μl PCR reactions, two fragments of the circularly permuted GFP (Affiphor template) were amplified. In this amplification the loop regions (L1, L2 and L3) were replaced by primer-encoded degenerate codons (e.g. NNB). The first PCR reaction comprised 67.7 μl H2O, 10 μl 10x buffer, 3 μl 50 mM MgCl2, 8 μl 10 μM primer AF-fw 1 (SEQ ID No. 11), 8 μl 10 μM AF-bw 1-mix (comprising primers SEQ ID No. 12, SEQ ID No. 13, SEQ ID No. 14 and SEQ ID No. 15), 0.8 μl 100 mM dNTPs, 2 μl Taq polymerase and 0.5 μl template (10-100 ng). Preferably, a template comprising the nucleotide sequence as depicted in SEQ ID NO 2 was used. The PCR programme was run with 30 cycles of 60 s at 95°C, 60 s at 51°C and 30 s at 72°C, with a final elongation phase of 10 minutes at 68°C.
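The final concentrations in the first 100 µl reaction follow from C_final = C_stock × V_added / V_total. The Python sketch below (a hypothetical helper, not part of the original protocol) reproduces this arithmetic for the recipe above. It assumes that the 10 µM concentration of the primer mix refers to the total primer concentration of the mix, that the 100 mM dNTP stock refers to total dNTPs, and that the added MgCl2 comes on top of any magnesium already contained in the unspecified 10x buffer.

```python
# (component, volume in µl, stock concentration, unit); None = no concentration tracked.
PCR1 = [
    ("H2O", 67.7, None, None),
    ("10x buffer", 10.0, 10.0, "x"),
    ("MgCl2", 3.0, 50.0, "mM"),
    ("AF-fw 1", 8.0, 10.0, "µM"),
    ("AF-bw 1-mix", 8.0, 10.0, "µM"),
    ("dNTPs", 0.8, 100.0, "mM"),
    ("Taq polymerase", 2.0, None, None),
    ("template", 0.5, None, None),
]


def final_concentrations(recipe):
    """Return (total volume in µl, {component: (final concentration, unit)}) via C1*V1 = C2*V2."""
    total = sum(volume for _, volume, _, _ in recipe)
    finals = {
        name: (stock * volume / total, unit)
        for name, volume, stock, unit in recipe
        if stock is not None
    }
    return total, finals


total, finals = final_concentrations(PCR1)
print(f"total volume: {total:.1f} µl")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

The same function applies to the second PCR, whose recipe differs only in the primers, and to the final PCR (58.2 µl water and 10 µl template), which likewise sums to 100 µl.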

The second PCR comprised 67.7 μl H2O, 10 μl 10x buffer, 3 μl 50 mM MgCl2, 8 μl 10 μM primer AF-fw 2 (SEQ ID No. 16), 8 μl 10 μM AF-bw 2-mix (a mixture of SEQ ID No. 17 and SEQ ID No. 18), 0.8 μl 100 mM dNTPs, 2 μl Taq polymerase and 0.5 μl template (10-100 ng). The PCR programme was run with 30 cycles of 60 s at 95°C, 60 s at 51°C and 60 s at 72°C, with a final elongation phase of 10 minutes at 68°C.

After gel purification, equimolar amounts of the two PCR products (about 1 μg DNA in total) were mixed with dNTPs, Taq polymerase and buffer, and the two parts were assembled by 10 PCR cycles of 60 s at 95°C, 60 s at 51°C and 150 s at 72°C, with a final elongation phase of 10 minutes at 68°C. About 100 ng of the assembled product was subsequently used as template for the final PCR. In one particular example, the final PCR comprised 58.2 μl H2O, 10 μl 10x buffer, 3 μl 50 mM MgCl2, 8 μl 10 μM primer AF-fw (SEQ ID No. 19), 8 μl 10 μM AF-bw (SEQ ID No. 20), 0.8 μl 100 mM dNTPs, 2 μl Taq polymerase and 10 μl template. The PCR programme was run with 30 cycles of 60 s at 95°C, 60 s at 51°C and 30 s at 72°C, with a final elongation phase of 10 minutes at 68°C.
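Mixing equimolar amounts of the two PCR fragments means that the mass of each fragment must be proportional to its length, since the number of moles of a double-stranded fragment is its mass divided by (length × average mass per base pair). The Python sketch below splits a total mass, such as the roughly 1 µg used above, accordingly. The function name is hypothetical, and the fragment lengths in the example are placeholders because the source does not state them.

```python
from typing import List, Sequence


def equimolar_masses(lengths_bp: Sequence[int], total_ng: float = 1000.0) -> List[float]:
    """Mass (ng) of each dsDNA fragment so that all fragments are present in equal moles.

    moles = mass / (length * mass_per_bp); with the same mass_per_bp for every
    fragment, equal moles require mass proportional to length.
    """
    total_length = sum(lengths_bp)
    return [total_ng * length / total_length for length in lengths_bp]


# Hypothetical fragment lengths for the two PCR products (not given in the source).
masses = equimolar_masses([400, 600])
print(masses)  # [400.0, 600.0]
```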

Table 1

Selection of binding proteins from an Affiphor library using Tat-mediated phage display to isolate binders of monoclonal anti-P24 antibody (CB) and of egg white lysozyme (HeI)

The Affiphor library generated as described above was cloned into the phage display vector pCD87SA. Phages were subsequently produced as described in Paschke, M. and Hohne, W. (2005) Gene 350:79-88. The selection was carried out in 96-well microtiter plates coated overnight at 4°C with the respective protein solution, i.e. with CB or HeI (100 μg/ml). Subsequently, 250 μl 1x casein blocking buffer (Genosis) was used for blocking. The microtiter wells thus prepared were incubated with about 10 10 to 10 1 phages for two hours at room temperature and then washed ten times with PBST plus 0.1% Tween. Bound phage was eluted with 0.2 M glycine/HCl pH 2.2, 1 mg/ml BSA. The eluted phage was immediately neutralised with 1 M Tris/HCl pH 9. For each of the two targets, three successive selection rounds were carried out. After the third round of selection, single clones were picked and the specificity of the selected clones was assessed by ELISA (see Fig. 4). For functional characterisation, an Affiphor obtained by selection against CB was cloned into the vector pET9h, expressed and purified by IMAC. By ELISA, a dissociation constant of 3 × 10^-8 M was determined for its binding to CB. The excitation and emission spectra of this Affiphor are depicted in Fig. 5 and show that equal amounts of both the monoclonal anti-P24 antibody and the Affiphor led to essentially identical spectra. This result indicates that the GFP molecule serving as the basis for the modification/randomization has been provided with specific binding activity without significantly altering the fluorescent properties of the underlying GFP molecule.
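To put the reported dissociation constant of 3 × 10^-8 M into perspective, the fraction of CB occupied at a given free Affiphor concentration can be estimated from the simple 1:1 binding relation θ = [A] / ([A] + K_D). This is a textbook approximation that assumes a single binding site, equilibrium, and an Affiphor in excess so that free and total concentrations are about equal; the function name is hypothetical.

```python
KD_M = 3e-8  # dissociation constant reported above for binding to CB, in mol/l


def fraction_bound(free_affiphor_m: float, kd_m: float = KD_M) -> float:
    """Fractional occupancy of CB for simple 1:1 binding: [A] / ([A] + Kd)."""
    if free_affiphor_m < 0:
        raise ValueError("concentration must be non-negative")
    return free_affiphor_m / (free_affiphor_m + kd_m)


for conc_nm in (3, 30, 300):
    print(f"{conc_nm} nM -> {fraction_bound(conc_nm * 1e-9):.2f}")
```

At a free Affiphor concentration equal to K_D (30 nM), half of the binding sites are occupied by definition; roughly tenfold higher concentrations are needed for about 90 % occupancy.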