Indian Pediatrics - Technology Update

Indian Pediatrics 2001; 38: 875-883

Proteomics: Challenge for the New Millennium

Piyush Gupta
Lokesh Guglani

From the Department of Pediatrics, University College of Medical Sciences and G.T.B. Hospital, Delhi 110 095, India.

Correspondence to: Dr. Piyush Gupta, Block R-6-A, Dilshad Garden, Delhi 110 095, India.
E-mail: drpiyush@satyam.net.in

Mapping of the human genetic code, the biggest revolution of the 1990s, is all set to take us into the new millennium, with the hope of providing answers to the questions baffling mankind since the beginning of time. Nevertheless, how does one make sense of these huge reams of sequence data written with only four letters - A, T, C and G? How can it help us improve our understanding of disease, its mechanism at the cellular level and possible modes of treatment?

The genome provides a blueprint and reflects the potential of an organism but it still doesn’t tell us about what goes on in the back-alleys and the dark, dingy, congested lanes within a cell. True, genome sequencing has given us major insights into the complexity of this vast ghetto of life; but the eventual goal, i.e., to elucidate the organization and dynamics of the metabolic, signaling and regulatory networks through which the day-to-day life of the cells is transacted, remains elusive. To be able to understand how these networks become dysfunctional in disease, to reliably diagnose these alterations, and to manipulate their functions through drugs or genetic manipulations - that should be the real long-term goal for the new millennium.

The Philosophy of Proteome

Proteome, a term coined by Wilkins et al.(1) in 1996 (PROTEin complement of a genOME) and used to describe the entire protein spectrum of a species, is the actual working currency of the cell. It signifies the entire workforce that determines the biological phenotype of an organism. Proteomics is all about reading the proteome and has been defined as "the science that uses quantitative protein level measurement of gene expression to characterize biological processes (e.g., disease processes and drug effects) and decipher the mechanisms of gene expression control"(2).

Genes are recipes for proteins and proteins make up the executive faction of life. The way proteins are coiled into a three-dimensional structure decides their performance. For example, the Human Genome Project reads the sequence of a gene and annotates it, structural genomics checks out the shape of the protein the gene dictates, and proteomics checks out the functional relevance of the protein. Just as the sister discipline of genomics set itself the task of sequencing the complete human genome, proteomics now aims to map and identify the entire human proteome, and to compile the Human Protein Index (HPI) as a comprehensive, tissue-specific inventory of the proteins expressed in our species. Proteomics serves to complement the knowledge gained by genomics and, in fact, takes it many steps ahead.

The goal of proteomics is a comprehensive, quantitative description of protein expression and its change under the effect of biological perturbations such as disease or drug treatment. It studies protein properties (at the level of expression, post-translational modifications or interactions) on a large scale to obtain a global, integrated view of disease processes, cellular processes and networks at the protein level.

Genomics to Proteomics: Strategic Shift

Now that the complete genome sequences of about 18 organisms have been published (the first being that of Haemophilus influenzae about five years ago)(3), there is a shift of emphasis towards making genomics functional. If the humble baker’s yeast, Saccharomyces cerevisiae, is a guide, at least half the proteins encoded by the human genome may have no known function.

We cannot simply assume that there may be as many proteins as there are genes. A single mRNA may give rise to a number of proteins (due to extensive post-translational modifications); on the other hand, synthesis of a single protein may be regulated by a number of genes. Most cellular systems operate purely in the protein domain, without any mRNA involvement, and protein is more stable than mRNA in clinical samples; these are some of the important features that steer the focus towards proteomics. Proteins embody the active life of the cell while nucleic acids represent only plans. In proteomics, DNA has no function except to store information.

Therefore, the strategies required to approach completion in genomics and proteomics are quite different. The main advantage of DNA lies in its amazing alikeness; DNA in all of the estimated 252 different somatic cell types in man has the same basic sequence. Thus, the sequencing process can be applied to DNA from any source. The protein composition of different cell types is different, both quantitatively and qualitatively. Additionally, the three-dimensional structure, function, and final form of proteins cannot be predicted with certainty from the linear codes of genes, and many, if not most, proteins are modified after they are synthesized. The world of individual proteins is far larger, more complex, and potentially more rewarding than the world of the genome(4).

Past of Proteomics: How it all Began

The concept of Proteomics actually dates back to 1975 when high-resolution two-dimensional electrophoretic (2DE) methods were described for the first time by Klose(5), O’Farrell(6) and Scheele(7) almost simultaneously. This opened new vistas for exploring the detailed working of the cellular machinery. Built around 2DE, efforts progressed towards constructing the biological equivalent of the periodic table for man, and the projected database of human molecular anatomy, i.e., the "Human Protein Index (HPI)". Subsequently, an HPI Task Force, set up in 1980, sought to take up this project on a large scale(8). Had things gone right, we would have had a proteomic database first, based on which the human genome would have been studied. But compared to proteins, genes have a less complex structure. They are governed by simple rules of base pairing and are easy to amplify or modify in vitro. In addition, a change of government in the US, the serendipitous discovery of restriction enzymes, and the failure to attract large-scale investments and support for HPI led to proteomics taking a backseat. It took more than 20 years for scientists to rediscover and rejuvenate the science of the proteome!

Processing the Proteome: The Technology

The basic task in proteomics is to separate and visualize the proteins from a sample. Two dimensional electrophoresis (2DE) was initially used for display of proteins. Mass spectrometry (MS) allows for rapid large-scale identification of proteins resolved by 2DE. An integration of 2DE and MS based approaches will be highly advantageous.

The complex protein mixtures are separated into individual proteins by solubilizing and denaturing them into their subunits. In the first dimension, these subunits are separated according to charge using isoelectric focusing (IEF). In this technique, a current is passed through a polyacrylamide gel. It leads to migration of polypeptides to fixed points based on the charge carried by them. This is followed by the second dimension, wherein the polypeptides are separated on the basis of size (i.e., molecular weight). These polypeptides or proteins are then stained using Coomassie brilliant blue or silver stains to allow their visualization on the gel. Fluorescent stains and immunoblotting can also be used for the same purpose. These are being increasingly used nowadays as they provide enhanced sensitivity for the identification of specific molecules. This is followed by automated spot detection (using spot-excision robotics) and analysis of gel images with specialized software(9).
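
The two-step separation described above can be illustrated with a small Python sketch. It is only a simplified model and not a description of any particular gel system: the protein names, isoelectric points and molecular weights are hypothetical, the pH gradient is assumed to be linear, and migration in the second dimension is assumed to be roughly linear in the logarithm of molecular weight.

```python
import math

# Hypothetical (name, isoelectric point, molecular weight in kDa) triples;
# the values are illustrative only and are not measured data.
SAMPLE = [
    ("protein_A", 5.2, 66.0),
    ("protein_B", 9.1, 14.0),
    ("protein_C", 5.2, 25.0),
    ("protein_D", 7.0, 45.0),
]


def gel_position(pi, mw_kda, ph_range=(3.0, 10.0), mw_range=(10.0, 200.0)):
    """Map one protein to normalized (x, y) gel coordinates in [0, 1].

    x: first dimension (IEF), position along an assumed linear pH gradient.
    y: second dimension, 0 = top of the gel (largest), 1 = bottom (smallest),
       assuming migration is linear in log10(molecular weight).
    """
    ph_low, ph_high = ph_range
    mw_low, mw_high = mw_range
    x = (pi - ph_low) / (ph_high - ph_low)
    y = (math.log10(mw_high) - math.log10(mw_kda)) / (
        math.log10(mw_high) - math.log10(mw_low)
    )
    return x, y


def run_gel(proteins):
    """Return a dict mapping each protein name to its (x, y) spot position."""
    return {name: gel_position(pi, mw) for name, pi, mw in proteins}


if __name__ == "__main__":
    for name, (x, y) in run_gel(SAMPLE).items():
        print(f"{name}: x={x:.2f}, y={y:.2f}")
```

In this model, protein_A and protein_C share the same isoelectric point and would be indistinguishable after the first dimension alone; only the second, size-based dimension resolves them into separate spots.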

The proteins are finally identified and characterized for post-translational modifications by mass spectrometry. MALDI-TOF (matrix-assisted laser desorption ionization-time of flight) and nano-electrospray are the currently used techniques. In these, proteins or peptides are ionized by electrospray ionization from the liquid state and by matrix-assisted laser desorption ionization from the solid state, and the mass of the ions is measured very accurately by various coupled analyzers(10).
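
What the analyzer actually records is a mass-to-charge ratio (m/z) rather than a mass. As a minimal sketch, assuming positive-ion mode in which every charge is carried by an added proton (the function names below are illustrative and are not taken from any instrument software), the conversion between the two can be written in Python as:

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Recover the neutral mass from an observed m/z and a known charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    # A hypothetical peptide of 1000 Da seen as singly and doubly charged ions.
    for z in (1, 2):
        print(z, round(mz_from_mass(1000.0, z), 4))
```

MALDI typically yields singly charged ions, whereas electrospray often yields several charge states of the same molecule, so the same peptide may appear at more than one m/z value.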

The use of protein bioinformatics databases (via the Internet - ExPASy proteomics server: www.expasy.ch/www/tools.html) for creating protein maps of various complex protein mixtures, like those of human plasma, urine, CSF and tissues such as breast and heart, allows for comparison and further characterization of these proteins. It includes special software packages for quantitative analysis, databasing of electrophoretic separations and a range of bioinformatic tools for identifying proteins based on data from micro-chemical analysis. Given the dynamic nature of the proteome, this can be very helpful for standardization of results and also for the sharing of information from these experiments all over the world. In addition, a number of 2-D electrophoresis and annotated protein databases (e.g., SWISS-PROT server) are being generated by proteome projects all over the world. These can be browsed with interactive software and integrated with in-house results(11). Thus, real experiments done in the laboratories can be complemented with virtual experiments on computers and the Internet.

In addition to these relatively conventional approaches, newer approaches are coming up. In one such technique, all the proteins from a sample are digested with an endoproteinase (usually trypsin) to produce a mixture of hundreds of thousands of polypeptides. These, after separation on reverse-phase columns, are fed to online tandem mass spectrometers. The spectrometers automatically fragment the peptides and the generated fragments are then matched against databases to identify the proteins. Yet another approach is to establish the presence of certain proteins in a tissue sample by screening with a large number of antibodies known to be specific for certain protein sequences. In this, the peptides are first synthesized from expressed sequence tags (ESTs) or other nucleotide sequences of interest. The antibody probes to the peptides are produced using phage display techniques(12). The resulting antibodies can then be used in screening against tissue sections, e.g., those of tumors.
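
The digestion and database-matching idea can be sketched in Python as follows. This is a deliberately simplified illustration, not the tandem-MS fragment matching described above: it matches whole-peptide masses (peptide mass fingerprinting), uses the common rule of thumb that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and ignores missed cleavages and modifications. The sequences in the usage example are toy strings, and the function names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance=0.01):
    """Rank database proteins by how many observed masses they explain."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - theo) <= tolerance for theo in theoretical)
            for obs in observed_masses
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_database = {"toy_1": "AKGGRLLK", "toy_2": "MSTKWWR"}  # not real proteins
    observed = [peptide_mass(p) for p in tryptic_digest("AKGGRLLK")]
    print(rank_candidates(observed, toy_database))
```

Real search engines score matches statistically and allow for missed cleavages, modifications and measurement error; the simple count used here is only meant to show the principle of comparing measured masses against masses predicted from sequence databases.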

Proposed HPI Project: The Game Plan

The first stage of the project, now well under way, deals with high resolution mapping of major tissues and readily obtainable cell types. The second stage will involve cell separation and mapping of different cell types. Considerable progress has been made in this direction. The third stage involves subcellular fractionation and fractionation of soluble protein mixtures. This will allow determination of the subcellular location of each protein. The fourth stage involves the production of an antibody library containing antibodies against each human protein. The fifth and final stage is concerned with the production of solid state protein chips for routine clinical use, both for diagnosis and for treatment of diseases. Current technology is sufficient only for the first stage and partly for the second and third stages of the project; much of the science involved in the HPI remains to be invented, developed, or refined(4). HPI is, therefore, a long-term endeavor, requiring both friendly competition and funding from the public and private sectors.

Proteomics: Perspective and Prospects

Proteomics is making rapid strides riding on the back of major technological advancements in the recent past in the field of biotechnology. The study of proteomics can be visualized from two main aspects:

Expression Proteomics deals with the estimation of quantity of each protein component within a protein complex. This may vary with the cell or tissue type or under various circumstances within the same cell. In other words, it observes quantitatively how the pattern of expression changes in a particular disease or in response to a drug and how we can usefully modify it to our advantage. This technique relies on the two dimensional separation of complex protein mixtures using two dimensional polyacrylamide gel electrophoresis (2D PAGE) and creates expression maps of these proteins. This aspect holds promise for disease-marker discovery, toxicology and in drug target validation.
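
As a rough illustration of how two expression maps might be compared, the Python sketch below computes the log2 ratio of spot intensities between a control gel and a disease (or drug-treated) gel and reports the spots that change at least two-fold. The spot names and intensities are hypothetical, the intensities are assumed to be positive and already normalized between gels, and the two-fold threshold is an arbitrary choice for the example.

```python
import math


def differential_spots(control, treated, min_abs_log2=1.0):
    """Return {spot: log2(treated / control)} for spots that change markedly.

    Only spots present on both gels are compared. Intensities are assumed
    to be positive and already normalized between the two gels.
    """
    changes = {}
    for spot in control.keys() & treated.keys():
        log2_ratio = math.log2(treated[spot] / control[spot])
        if abs(log2_ratio) >= min_abs_log2:
            changes[spot] = log2_ratio
    return changes


if __name__ == "__main__":
    # Hypothetical normalized spot intensities from two 2D PAGE gels.
    control_gel = {"spot_1": 100.0, "spot_2": 100.0, "spot_3": 100.0, "spot_4": 100.0}
    disease_gel = {"spot_1": 105.0, "spot_2": 95.0, "spot_3": 30.0, "spot_4": 400.0}
    for spot, change in sorted(differential_spots(control_gel, disease_gel).items()):
        print(f"{spot}: log2 ratio = {change:+.2f}")
```

In this toy data set, spot_3 (reduced) and spot_4 (increased) would be flagged as candidate markers, while the small fluctuations in spot_1 and spot_2 are ignored.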

Cell Map Proteomics determines the subcellular location of proteins and their interactions, by purification of organelles or protein complexes followed by mass-spectrometric identification of components. Since these complexes form part of the cellular machinery, their identification would help us to define these machines and allow ‘physical maps’ to be created for a number of cell types and states. This strategy is likely to prove effective for the study of pathways, assignment of protein function and validation of new disease targets(13).

Pharmacoproteomics: The biggest explosion, among all the potential utilities of proteomics, has been in the field of drug development(14). Most drugs act on cell proteins. The mechanism of action of drugs can be studied by studying protein interactions and maps of cellular pathways. Comparison of drugs with respect to their effectiveness in restoring the normal protein expression in various disease states could help physicians choose the optimal-response-producing agents. It may become possible to rank a series of candidate drugs in terms of their profile of effects and side-effects. The side effects of a drug are also mediated by drug action on proteins of cells uninvolved in the disease process. These are again identifiable by proteomics.

Another major application of proteomics is in investigating the dynamics of drug resistance with the help of response maps constructed by comparing resistant and susceptible individuals, the sum total of the responses representing the "resistome". This may help identify the early reversible stages of drug resistance and give us a global picture of the mechanics of resistance (15).

Heart Disease: Studies on pacing-induced heart failure in the dog and bovine dilated cardiomyopathy have shown a seven-fold increase in the enzyme ubiquitin carboxyl terminal hydrolase, resulting in increased protein ubiquitination in the disease state, leading to proteolysis. Another study has shown that inappropriate ubiquitination of proteins could contribute to the development of heart failure. Studies on changes in cardiac proteins in response to alcohol and lead toxicity using proteomic analysis have also been done. It is likely that significant alterations in myocardial protein expression underlie most heart diseases and determine their progression and outcome(16). This aspect can be explored further with proteomics.

Neurological Disorders: Genomics may take us to the disease-causing genes. But it won’t tell us about any post-translational modifications taking place that may have a profound influence on the disease itself. Alzheimer’s disease shows the formation of neurofibrillary tangles whose major component is the microtubule associated protein ‘tau’, present in six alternatively spliced isoforms. Electrophoretic analyses have shown the biochemical differences between normal and pathological ‘tau’(17).

Another example is that of prion mediated diseases, especially Creutzfeldt-Jakob Disease (CJD), where the search is on for diagnostic and screening tests. Analysis of CSF in these patients by 2DE revealed two proteins designated p130 and p131, the presence of which could be used to differentiate between CJD and other dementias. The knowledge of these specific protein abnormalities may lead to the development of newer therapeutic approaches(18).

Infectious Diseases: Now that the genome sequencing of a large number of microorganisms is complete, the identification of proteins produced by these organisms through Proteomics will help in the search for determinants of virulence, new diagnostic markers, and candidate antigens for prospective vaccines. One of the best-studied organisms is Mycobacterium tuberculosis. Proteins secreted by this organism are being systematically characterized and antigens are being scrutinized for serodiagnostic potential in early disease state or for incorporation into trial vaccines(19). Proteomics also offers a new set of tools for investigating parasites and parasite related diseases. Immunoblotting of proteome maps with first infection or hyperimmune sera can be used to identify the major immunodominant proteins present in parasite extracts (the "immunome"). Using the proteomic approach, antigens can be divided into those that are secreted, those that are only present at particular stages of the life cycle, and those that are restricted by one or more factors(15).

Another field where proteomics proves to be a valuable tool in identifying proteins of importance for diagnosis is proteome analysis of pathogenic microorganisms such as Borrelia burgdorferi (Lyme disease) and Toxoplasma gondii (toxoplasmosis). In the latter, it is also possible to distinguish between acute and latent infection, an important diagnostic tool in both pregnant and immunosuppressed patients(20). Proteomics has also shown potential in differentiating between virulent and avirulent strains, and between drug resistant and drug sensitive strains of Candida albicans(21).

Vaccine Research: Rapid strides can be made in the field of vaccine development. As discussed above, principles of proteomics can be applied to identify potential candidate proteins that are vital for the organism and which are immunogenic. In this aspect, genomics may supplement proteomics in identifying the products of predicted genes. In the first instance of this type, a putative vaccine candidate, the outer membrane lipoprotein P6, was identified by applying appropriate tools for genomic mining to the published sequence of the Haemophilus influenzae Rd genome. Another milestone was achieved when vaccine candidates were identified for another pathogenic bacterium, Helicobacter pylori, using two different approaches. One was the 2-D analysis of outer membrane proteins followed by their tryptic digestion and peptide mass map analysis, and the other was the identification of monoclonal antibody reactive proteins from N-terminal sequence tags(22).

Cancer: Proteomics has extensive clinical applications in the field of oncology. Simply by comparing normal and tumor tissues using 2-DE-based techniques, a wealth of data is being generated relating to the basic riddles of tumorigenesis. The most notable studies so far have been on cancers of the bladder and breast(23,24). Celis et al.(23) have identified several disease specific proteins, which can be used to raise antibodies capable of identifying metaplastic lesions. They have identified "psoriasin" as a protein marker of squamous cell carcinoma of the bladder, which may prove to be a simple, non-invasive marker for this cancer.

Mental Disorders: Application of proteomics to the construction of a proteomic map of the human hippocampus has led to the identification of 18 proteins with abnormal expression in schizophrenia, several of which have been mapped to chromosome 6. The potential roles of these proteins in the pathogenesis are being investigated(25).

Toxicology: Proteomics may prove to be a novel approach for the screening of toxic substances and probing of toxicity mechanisms. Once a reference library of proteomic signatures of known toxic compounds is created, it may be applicable for the study of toxicity of new compounds as well. Promising animal studies to pursue this aspect include the rabbit model of lead toxicity to identify the change in protein expression, and liver protein changes in mice and hamsters following exposure to a peroxisome proliferator. The rabbit model has identified several molecules, provisionally identified as glutathione-S variants, which may be developed into valuable markers of lead toxicity in humans(26).

Aging: A lot of effort and investment has been devoted to aging research in the West, and for the biotech companies it is a lucrative commercial option for obvious reasons. The accumulation of non-enzymatic modifications of both DNA and protein molecules under the attack of reactive oxygen species is well established as one of the mechanisms of functional deterioration in aged cells. Characterization of these protein modifications, caused either by direct damage or via alterations in the DNA, by proteome analysis would prove to be an innovative tool for the investigation of molecular mechanisms of cellular aging(27).

Players in Proteomics: The Heat is On

The race to understand the proteins assembled by human genes is hotting up. Proteomics looms large on the horizon as the "next big science". Following the conquest of the human genome, the private sector is trying to get hold of the most lucrative business on earth, i.e., patenting the protein chips that are implicated in disease and those which will be useful in therapeutics.

Celera Genomics, Rockville, Maryland (the company that has nearly completed the human genomic sequences) is already set to embark on an ambitious effort to conquer proteomics as well(28). It raised $944 million in a stock offering last year, much of which will be devoted to proteomics. However, Celera faces a huge challenge and stiff competition from companies already in this field. Virtually every major pharmaceutical company has a proteomics effort under its belt and is gearing up to raise public money for expanded research. Large Scale Proteomics Corp (Rockville, USA), Genebio (Geneva, Switzerland), Protana (Odense, Denmark), Proteome Inc. (Beverly, USA), Proteome Systems (Sydney, Australia) and Oxford Glycosciences (Oxford, UK) are a few of those after the proteomics rush. Besides these, protein analysis continues to advance in many other comparatively smaller organizations, unlike genomic efforts, which are mainly coalesced around a few large centres. More than 6,000 papers have already been published in this field and by the time this article is published, the inaugural issue of "Proteomics", a new journal by Large Scale Proteomics, would have been released(4).

The competition promises to be fierce, with involvement of both big fish and big money. Proteomics appears to be a promising field and profitable business proposition, especially in the field of therapeutics. It is apparent now that the newly generated proteome data is going to be restricted to certain individuals and corporates. This is perhaps unfortunate but inevitable. Already, suggestions have been mooted to establish a worldwide, public access proteome database managed by an international secretariat. The proponents of this idea feel that the ultimate aim of benefiting the humans and the humanity should reign supreme and not be overawed by materialistic concerns.

To conclude, Proteomics has a vast undiscovered potential with regard to human science and research. Though very much in its intrauterine period right now, the arrival of this "new baby" will herald the beginning of an era of direct applicability of biotechnology in clinical science, something which had so far always been promised but never fully achieved. Proteomics is the precocious answer to the probable "what next?" question that would have arisen after the mapping of the entire human genome, due for completion in 2005. Till the whole mystery is unravelled, the research continues, as the proteome is ‘dynamic’; it keeps changing in response to every stimulus or disease process. As technology improves further, we may see nothing short of a revolution in almost every sphere of research and clinical application through the integration of genomics and proteomics.

Contributors: PG provided the framework and overall concept of the article and was in touch with some of the leading authorities doing active research in this field. He will act as the guarantor of the paper. LG collected the data and drafted the manuscript, which was edited by PG.

Funding: None.

Competing interests: None stated.

Key Messages

Proteome - the PROTEin complement of a genOME - is the basic functional machinery of the cell and its study is christened as Proteomics.
Proteomics is a newly emerging field, which deals with the protein expression and mapping to characterize biological processes and decipher gene expression control, so as to provide valuable insights into all life processes and its variation under various conditions.
Proteomics has vast potential in the fields of pharmacology, oncology, cardiology, infectious diseases and vaccine research, neurology, toxicology, gerontology and many more as yet unrealized applications.
A number of recent advances in the field of biotechnology have spurred on the work in the field of proteomics that holds great promise for the future.

References

1. Wilkins MR, Sanchez JC, Gooley AA, Appel RD, Humphery-Smith I, Hochstrasser DF, et al. Progress with proteome projects: Why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev 1996; 13: 19-50.

2. Anderson NL, Anderson NG. Proteome and Proteomics: New technologies, new concepts, and new words. Electrophoresis 1998; 19: 1853-1861.

3. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AF, et al. Whole genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995; 269: 496-512.

4. Anderson NG, Matheson A, Anderson NL. Back to the future: The Human Protein Index and the agenda for post proteome biology. Proteomics 2001 (18 Jan 2001, inaugural issue: In press).

5. Klose J. Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik 1975; 26: 231-243.

6. O’Farrell PH. High resolution two-dimensional electrophoresis of proteins. J Biol Chem 1975; 250: 4007-4021.

7. Scheele GA. Two-dimensional gel analysis of soluble proteins. Characterization of guinea pig exocrine pancreatic proteins. J Biol Chem 1975; 250: 5375-5385.

8. Anderson NG, Anderson L. The Human Protein Index. Clin Chem 1982; 28: 739-748.

9. Hochstrasser DF. Proteome in perspective. Clin Chem Lab Med 1998; 36: 825-836.

10. Yates JR III. Mass spectrometry-from genomics to proteomics. Trends in Genetics 2000; 16: 5-8.

11. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998. Nucleic Acids Res 1998; 26: 38-42.

12. Wilkins M. Proteomic paradigms for perceiving purpose. Trends Biotech 2000; 18: 91-92.

13. Blackstock WP, Weir MP. Proteomics: Quantitative and physical mapping of cellular proteins. Trends Biotech 1999; 17: 121-127.

14. Page MJ, Amess B, Rohlff C, Stubberfield C, Parekh R. Proteomics: A major new technology for the drug discovery process. Drug Discovery Today 1999; 4: 55-62.

15. Barrett J, Jefferies JR, Brophy PM. Parasite proteomics. Parasitology Today 2000; 16: 400-403.

16. Dunn MJ. Studying heart disease using the proteomic approach. Drug Discovery Today 2000; 5: 76-84.

17. Tolnay M, Probst A. Review: Tau protein pathology in Alzheimer’s disease and related disorders. Neuropathol Appl Neurobiol 1999; 25: 171-187.

18. Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ, et al. Proteomics: New perspectives, new biomedical opportunities. Lancet 2000; 356: 1749-1756.

19. Orme IM. New Vaccines against tuberculosis: The status of current research. Infect Dis Clin North Am 1999; 13: 169-185.

20. Jungblut PR, Zimny-Arndt U, Zeindl-Eberhart E, Stulik J, Koupilova K, Pleissner KP, et al. Proteomics in human disease: Cancer, heart and infectious diseases. Electrophoresis 1999; 20: 2100-2110.

21. Niimi M, Cannon RD, Monk BC. Candida albicans pathogenicity: A proteomic perspective. Electrophoresis 1999; 20: 2299-2308.

22. Chakravarti DN, Fiske MJ, Fletcher LD, Zagursky RJ. Application of genomics and proteomics for identification of bacterial gene products as potential vaccine candidates. Vaccine 2000; 19: 601-612.

23. Celis JE, Celis P, Ostergaard M, Basse B, Lauridsen JB, Ratz G, et al. Proteomics and immunohistochemistry define some of the steps involved in the squamous differentiation of bladder transitional epithelium: A novel strategy for identifying metaplastic lesions. Cancer Research 1999; 59: 3003-3009.

24. Anderson NL, Matheson AD, Steiner S. Proteomics: Applications in basic and applied biology. Curr Opin Biotech 2000; 11: 408-412.

25. Edgar PF, Douglas JE, Cooper GJ, Dean B, Kydd R, Faull RL. Comparative analysis of the hippocampus implicates chromosome 6q in schizophrenia. Mol Psychiatry 2000; 5: 85-90.

26. Kanitz MH, Witzman FA, Zhu H, Fultz CD, Skaggs S, Moorman WJ, et al. Alterations in rabbit kidney protein expression following lead exposure as analyzed by two dimensional gel electrophoresis. Electrophoresis 1999; 20: 2977-2985.

27. Toda T. Status and perspectives of proteomics in aging research. Exp Gerontol 2000; 35: 803-810.

28. Service RF. Can Celera do it again? Science 2000; 287: 2136-2138.