Top-down Mass Spectrometry based proteomics

Mass Spectrometry (MS)-based proteomics is a powerful tool for systems biology since it provides a systematic, global, unbiased, and quantitative assessment of proteins, including interactions, modifications, location, and function. 

Post-translational modifications (PTMs) modulate protein activity, stability, localization, and function, playing essential roles in many critical cell signaling events in both healthy and disease states. Dysregulation of a number of PTMs, such as protein acetylation, glycosylation, hydroxylation, and phosphorylation, has been implicated in a spectrum of human diseases. The conventional peptide-based bottom-up shotgun proteomics approach is widely used but has intrinsic limitations for mapping protein modifications. First, digesting each protein into many peptide components dramatically increases the complexity of an already complicated proteome. Second, specific information concerning the protein is lost, since only a small and variable fraction of the digested peptides is recovered.
In contrast, the protein-based top-down MS-based proteomics approach is arguably the most powerful technique for analysis of protein modifications. In the top-down approach, intact proteins are analyzed, which greatly simplifies sample preparation and reduces the mixture complexity as no proteolytic digestion is required. Subsequently, specific proteins of interest can be “gas-phase” purified and modification sites can be mapped by tandem MS (MS/MS) strategies. Top-down MS provides comprehensive sequence information for the whole protein by detecting all types of PTMs (e.g. phosphorylation, proteolysis, acetylation) and sequence variants (e.g. mutations, polymorphisms, alternatively spliced isoforms) simultaneously in one spectrum (a “bird’s eye view”) without a priori knowledge. We have made significant advances in top-down MS for analysis of large intact proteins purified from complex biological samples, including cell and tissue lysates as well as body fluids. We have shown that top-down MS has unique advantages for unraveling molecular complexity, quantifying modified protein forms, deep sequencing of intact proteins, mapping modification sites with full sequence coverage, discovering unexpected modifications, identifying and quantifying positional isomers, and determining the order of multiple modifications. Moreover, we have shown that a tandem mass spectrometry technique, electron capture dissociation (ECD), is especially useful for mapping labile PTMs such as phosphorylation, which is well preserved during the ECD fragmentation process. Notably, we have been able to isotopically resolve large proteins (>115 kDa) with very high mass accuracy (1-3 ppm) and have extended ECD to characterize very large phosphoproteins (>140 kDa).
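To give a sense of what a mass accuracy of 1-3 ppm means for a protein of this size, the following minimal Python sketch converts between a ppm error and an absolute mass tolerance in daltons. The function names and the example mass of 115,000 Da are illustrative assumptions, not values or tools taken from the work described above.

```python
def ppm_error(measured_mass: float, theoretical_mass: float) -> float:
    """Return the mass error in parts per million (ppm)."""
    if theoretical_mass <= 0:
        raise ValueError("theoretical_mass must be positive")
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def mass_tolerance_da(theoretical_mass: float, ppm: float) -> float:
    """Return the absolute mass window (in Da) that corresponds to a ppm tolerance."""
    return theoretical_mass * ppm / 1e6


# Hypothetical example: a 115 kDa protein measured with 3 ppm accuracy.
print(round(mass_tolerance_da(115_000.0, 3.0), 3))  # prints 0.345
```

Under these assumptions, a 3 ppm window on a 115 kDa protein corresponds to roughly 0.35 Da, and a 1 ppm window to roughly 0.12 Da.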
Nevertheless, the top-down MS approach still faces significant challenges in terms of protein solubility, separation, and the detection of low abundance and large proteins, as well as under-developed data analysis tools. Consequently, new technological developments are urgently needed to advance the field of top-down proteomics. We have been establishing an integrated top-down disease proteomics platform to globally examine intact proteins extracted from tissues for the identification and quantification of proteins and possible PTMs present in vivo. Specifically, we are developing novel approaches to address the current challenges in top-down MS-based proteomics.
A. To address the protein solubility challenge, we are developing new degradable surfactants that can effectively solubilize proteins and are compatible with top-down MS. We have recently developed an MS-compatible slowly degradable surfactant (MasDeS) that can effectively solubilize proteins.24 Furthermore, we demonstrated that the solubility of membrane proteins was significantly improved with the addition of this new surfactant. We are also developing different types of degradable surfactants and evaluating their performance for top-down proteomics.
B. To address the proteome complexity challenge, we are developing new chromatography materials and novel multi-dimensional liquid chromatography (MDLC) strategies to separate intact proteins. We have demonstrated the use of ultrahigh-pressure size exclusion chromatography (UHP-SEC) and hydrophobic interaction chromatography (HIC) for top-down proteomics. Moreover, we have developed a novel 3DLC strategy by coupling HIC with ion exchange chromatography (IEC) and reverse phase chromatography (RPC) for intact protein separation. We demonstrated that this 3D (IEC-HIC-RPC) approach greatly outperformed the conventional 2D IEC-RPC approach. We are now developing novel chromatography materials for intact protein separation.
C. To address the proteome dynamic range challenge, we have been developing, in collaboration with a nanotechnologist, Prof. Song Jin (U. of Wisconsin), novel nanomaterials that can bind low-abundance proteins with PTMs (e.g. phosphorylation) with high specificity. The current focus is to develop multivalent nanoparticle (NP) reagents for capturing phosphoproteins globally out of the human proteome for top-down MS analysis of intact phosphoproteins.
D. To address the challenge of under-developed software, we are developing a user-friendly and versatile software interface for comprehensive analysis of high-resolution top-down MS-based proteomics data. Previously, we developed MASH Suite, a versatile and user-friendly software interface for processing, interpreting, visualizing, and presenting high-resolution MS data. Recently, we have developed MASH Suite Pro, a comprehensive, user-friendly, and freely available program tailored for top-down high-resolution MS-based proteomics (manuscript submitted). MASH Suite Pro significantly simplifies and speeds up the processing and analysis of top-down proteomics data by combining tools for protein identification, quantitation, characterization, and validation into a customizable and user-friendly interface.
We envision that by taking this multi-pronged approach to overcome the challenges facing top-down proteomics, we will significantly advance the burgeoning top-down proteomics field, which recently gained momentum through the creation of the Consortium for Top-down Proteomics.

Systems proteomics of liver mitochondria function

Combined analysis of large data sets characterizing genes, transcripts, and proteins can elucidate biological functions and disease processes. Williams et al. report an exceptionally detailed characterization of mitochondrial function in a genetic reference panel of recombinant inbred mice. They measured the metabolic function of nearly 400 mice under various environmental conditions and collected detailed quantitative information from livers of the animals on over 25,000 transcripts. These data were integrated with quantitation of over 2500 proteins and nearly 1000 metabolites. Such analysis showed a frequent lack of correlation of transcript and protein abundance, enabled the identification of genomic variants of mitochondrial enzymes that caused inborn errors in metabolism, and revealed two genes that appear to function in cholesterol metabolism.
Structured Abstract
Over the past two decades, continuous improvements in “omics” technologies have driven an ever-greater capacity to define the relationships between genetics, molecular pathways, and overall phenotypes. Despite this progress, the majority of genetic factors influencing complex traits remain unknown. This is exemplified by mitochondrial supercomplex assembly, a critical component of the electron transport chain, which remains poorly characterized. Recent advances in mass spectrometry have expanded the scope and reliability of proteomics and metabolomics measurements. These tools are now capable of identifying thousands of factors driving diverse molecular pathways, their mechanisms, and consequent phenotypes and thus substantially contribute toward the understanding of complex systems.
Genome-wide association studies (GWAS) have revealed many causal loci associated with specific phenotypes, yet the identification of such genetic variants has been generally insufficient to elucidate the molecular mechanisms linking these genetic variants with specific phenotypes. A multitude of control mechanisms differentially affect the cellular concentrations of different classes of biomolecules. Therefore, the identification of the causal mechanisms underlying complex trait variation requires quantitative and comprehensive measurements of multiple layers of data (principally transcripts, proteins, and metabolites) and the integration of the resulting data. Recent technological developments now support such multiple layers of measurements with a high degree of reproducibility across diverse sample or patient cohorts. In this study, we applied a multilayered approach to analyze metabolic phenotypes associated with mitochondrial metabolism.
We profiled metabolic fitness in 386 individuals from 80 cohorts of the BXD mouse genetic reference population across two environmental states. Specifically, this extensive phenotyping program included the analysis of metabolism, mitochondrial function, and cardiovascular function. To understand the variation in these phenotypes, we quantified multiple, detailed layers of systems-scale measurements in the livers of the entire population: the transcriptome (25,136 transcripts), proteome (2622 proteins), and metabolome (981 metabolites). Together with full genomic coverage of the BXDs, these layers provide a comprehensive view of the overall variance induced by genetics and environment regarding metabolic activity and mitochondrial function in the BXDs. Among the 2600 transcript-protein pairs identified, 85% of observed quantitative trait loci uniquely influenced either the transcript or protein level. The transomic integration of molecular data established multiple causal links between genotype and phenotype that could not be characterized by any individual data set. Examples include the link between D2HGDH protein and the metabolite D-2-hydroxyglutarate, the BCKDHA protein mapping to the gene Bckdhb, the identification of two isoforms of ECI2, and mapping mitochondrial supercomplex assembly to the protein COX7A2L. The variants measured in these mitochondrial proteins were in turn associated with a variety of complex metabolic phenotypes, such as heart rate, cholesterol synthesis, and branched-chain amino acid metabolism. Of note, our transomics approach clarified the contested role of COX7A2L in mitochondrial supercomplex formation and identified and validated Echdc1 and Mmab as involved in the cholesterol pathway.
Overall, these findings indicate that data generated by next-generation proteomics and metabolomics techniques have reached a quality and scope to complement transcriptomics, genomics, and phenomics for transomic analyses of complex traits. Using mitochondria as a case in point, we show that the integrated analysis of these systems provides more insights into the emergence of the observed phenotypes than any layer can by itself, highlighting the complementarity of a multilayered approach. The increasing implementation of these omics technologies as complements, rather than as replacements, will together move us forward in the integrative analysis of complex traits.

Clinical Chemistry

BACKGROUND: There is an urgent need for blood-based molecular tests to assist in the detection and diagnosis of cancers at an early stage, when curative interventions are still possible, and to predict and monitor response to treatment and disease recurrence. The rich content of proteins in the blood that are impacted by tumor development and host factors provides an ideal opportunity to develop noninvasive diagnostics for cancer.
CONTENT: Mass spectrometry instrumentation has advanced sufficiently to allow the discovery of protein alterations directly in plasma across no less than 7 orders of magnitude of protein abundance. Moreover, the use of proteomics to harness the immune response in the form of seropositivity to tumor antigens has the potential to complement circulating protein biomarker panels for cancer detection. The depth of analysis currently possible in a discovery setting allows the detection of potential markers at concentrations of less than 1 μg/L. Such low concentrations may exceed the limits of detection of ELISAs and thus require the development of clinical assays with exquisite analytical sensitivity. Clearly, the availability for discovery and validation of biospecimens that are highly relevant to the intended clinical application and have been collected, processed, and stored with the use of standard operating procedures is of crucial importance to the successful application of proteomics to the development of blood-based tests for cancer.
SUMMARY: The realization of the potential of proteomics to yield blood biomarkers will benefit from a collaborative approach and a substantial investment in resources.
For disease investigation, the profiling of blood constituents, notably serum and plasma, using protein characterization technologies holds long-standing interest because of the easy accessibility of this circulating fluid and its rich content of proteins that inform scientists about the health status of an individual. The available methodologies to analyze proteins have evolved dramatically over the past few decades. The initial method consisted of 1-dimensional protein separations, which was followed by the use of 2-dimensional polyacrylamide gel electrophoresis coupled with Edman sequencing. The advent of mass spectrometry, coupled with the sequencing of human and other genomes, has had a dramatic impact on the field of proteomics. The capabilities of current proteomics technologies in terms of coverage of the proteome and depth of analysis that can be achieved in a quantitative manner are truly astounding compared with just a decade ago. Recent advances include substantial increases in speed, analytical sensitivity, and dynamic range, and the availability of multiple fragmentation techniques. Equally important is orthogonal sample fractionation before mass spectrometry. Yet there remains a perception that proteomics technologies are inadequate to address the protein complexities inherent in cells, tissues, and biological fluids. Here we outline strategies for the application of proteomics to the development of blood-based cancer markers.

Proteomics Cells

Researchers can count on improved proteomics method: 
Every cell in the body contains thousands of different protein molecules, and cells can change this composition whenever they are induced to perform a particular task or convert into a different cell type. Understanding how cells function depends on proteomics, the ability to measure all of the changes in a cell’s protein components.
In a recent paper published in the journal Analytical Chemistry, Martin Wühr and colleagues in Princeton University’s Department of Molecular Biology described an improved method to accurately count the proteins present in a cell under different circumstances.
The basic tool for counting proteins is a machine called a mass spectrometer. Cell samples can be run through this type of instrument one at a time, but this is laborious and it can be difficult to detect any changes between different samples. An alternative approach is to label all of the proteins in a particular sample with a unique “isobaric” tag. Multiple samples–up to 11–can then be mixed together and run through the mass spectrometer at the same time, with the isobaric tag functioning as an identifying barcode that tells the researcher which sample the protein originally came from. This speeds things up and makes it easier to quantify any changes in the protein composition of different samples.
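The barcode idea can be sketched in a few lines of Python. The example below assumes that the reporter signal for one peptide has already been extracted from the spectrum for each tag channel. The channel labels, sample names, and intensities are hypothetical, and the sketch ignores corrections (for example for tag impurities or interfering signals) that a real analysis pipeline would likely need.

```python
from typing import Dict


def relative_abundance(
    reporter_intensities: Dict[str, float],
    channel_to_sample: Dict[str, str],
) -> Dict[str, float]:
    """Map each tag channel back to its sample and return the fraction
    of the total reporter signal contributed by that sample."""
    total = sum(reporter_intensities.values())
    if total <= 0:
        raise ValueError("total reporter intensity must be positive")
    return {
        channel_to_sample[channel]: intensity / total
        for channel, intensity in reporter_intensities.items()
    }


# Hypothetical two-sample experiment: each channel acts as a barcode.
channels = {"channel_1": "sample_A", "channel_2": "sample_B"}
intensities = {"channel_1": 100.0, "channel_2": 300.0}
print(relative_abundance(intensities, channels))
# prints {'sample_A': 0.25, 'sample_B': 0.75}
```

Because all samples are measured in the same run, the fractions can be compared directly across samples for the same peptide.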
“However, with the simplest version of isobaric tagging, known as TMT-MS2, there are major difficulties in distinguishing real signals from background noise,” Wühr explains. “That makes the readouts unreliable and only semi-quantitative.”
A more complex version of isobaric tagging, called TMT-MS3, can improve this signal-to-noise problem, but it is slower and less sensitive. Moreover, it relies on a much more expensive type of mass spectrometer beyond the reach of most researchers.
While he was a postdoc at Harvard University, Wühr developed a different approach to isobaric tagging that solved the signal-to-noise problem while remaining compatible with cheaper, widely available mass spectrometers. But the technique–known as TMTc–was not without its own problems, particularly a lack of precision that made it hard to obtain consistent results.
In their recent Analytical Chemistry paper, Wühr and two of his graduate students, Matthew Sonnett and Eyan Yeung, described an improved version of TMTc that they named TMTc+. By changing how the cell samples are prepared and altering the computer algorithm that extracts data from the mass spectrometer, Wühr and colleagues were able to address many of the limitations associated with the various methods of isobaric tagging.
“The TMTc+ method is in a kind of sweet spot compared to the other methods,” Wühr says. “It provides superb measurement accuracy and precision, it’s at least as sensitive as any other method, and it’s compatible with around ten times more mass spectrometers than TMT-MS3.”
Naturally, Wühr says, there is still room for improvement. TMTc+ only allows a maximum of 5 samples to be run at the same time, and the detection of proteins in these samples is relatively inefficient. Both of these problems can be solved by developing new types of isobaric tags. “We have to explore the chemical space of these tags and find ones that work really well,” Wühr says. “To this end, we have started a collaboration with the Carell group, organic chemistry experts at the LMU Munich, and already published a proof of principle paper. Eventually, these efforts should lead to an approach that will allow researchers to count every protein in a cell as it changes its form and function.”

Host/Pathogen Proteomics

Host/Pathogen Proteomics: Mycoplasma pneumoniae
The bacterium Mycoplasma pneumoniae colonizes host pulmonary epithelium and is the most common cause of human community-acquired pneumonia. It successfully avoids detection by the host immune system, as the microbe alters its own cell membrane to mimic its host in order to establish chronic respiratory infections. Progression to autoimmune disease is not unknown.
Initial infection by M. pneumoniae stimulates production and release of proteins by the host cell in response to binding to the infectious agent and its internalization. Li et al. (2014) used a label-free shotgun quantitative proteomics approach to investigate the host secretome.1 By characterizing the biologically active proteins released in this initial phase, the researchers hope to elucidate the pathways involved and the roles of these secretory proteins in disease pathogenesis.
Li and colleagues used the human alveolar carcinoma cell line A549 to establish initial proteomic events following infection by M. pneumoniae. They chose airway epithelial cells because these are the most common primary cells colonized by the microorganism upon infection.

The researchers cultured A549 cells either with or without the infectious agent before harvesting the conditioned media. The scientists analyzed the tryptic digests by nano liquid chromatography–tandem mass spectrometry (LC-MS/MS) using an LTQ Velos ion trap mass spectrometer (Thermo Scientific). The authors searched the data obtained against the IPI human protein database v3.60 to identify proteins secreted by the cell cultures.
Initial LC-MS/MS quantification using DeCyder software showed that 113 out of the 256 proteins identified showed at least 1.5-fold differential expression between the control (uninfected) and the infected cell cultures. Of these, 65 were elevated in abundance and 48 were reduced in the infected cells. Nine proteins were found only in the uninfected control cells, whereas 10 were exclusive to the cells post-infection. Interleukin-33 (IL-33) was one of these proteins, and the researchers confirmed increased levels following infection by utilizing an enzyme-linked immunosorbent assay (ELISA).
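The 1.5-fold criterion described above can be illustrated with a short Python sketch. This is not the DeCyder workflow used by the authors; it is a simplified classification that assumes each protein has one summary abundance value per condition, and the function name, labels, and example values are hypothetical.

```python
def classify_fold_change(control: float, infected: float, threshold: float = 1.5) -> str:
    """Classify a protein by its infected/control abundance ratio.

    Proteins detected in only one condition are reported separately,
    because a ratio cannot be computed for them.
    """
    if control < 0 or infected < 0:
        raise ValueError("abundances must be non-negative")
    if control == 0 and infected == 0:
        return "not_detected"
    if control == 0:
        return "infected_only"
    if infected == 0:
        return "control_only"
    ratio = infected / control
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical abundances for a single protein.
print(classify_fold_change(control=100.0, infected=200.0))  # prints up
```

Treating proteins that appear in only one condition as their own classes mirrors the way the study reports them separately from the elevated and reduced groups.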
The researchers confirmed the proteomics results using Western blotting and real-time polymerase chain reaction (PCR) for selected proteins. Immunoblotting identified the same proteins in cell lysates and conditioned media, showing results consistent with the LC-MS/MS findings for proteins ADAM9, SERPINE1, IL-33, IGFBP4, Gal-1, and MIF, which were more abundant in conditioned media from infected cells.
Of the 256 proteins identified, over 59% (n = 152) were classified as either classical secretory (n = 83) or non-classical secretory (n = 69). A total of 190 proteins were associated with exosomes. Further analysis using GO classification showed that the majority were nuclear-associated. The researchers used DAVID 6.7 for functional annotation clustering analysis and found that most proteins were associated with vesicles or were from the extracellular region or matrix.
Using the KEGG database for pathway analysis, the researchers found that pathways associated with metabolism, infection, and proliferation were over-represented among the proteins identified post-infection. Li and co-workers further analyzed their data to discover more about the functional processes targeted by the post-infection secretome, using the BiNGO tool and STRING algorithm to examine differentially expressed proteins. The STRING analysis highlighted clusters involving stress and immune response pathways, among others. In summary, pathway analyses gave the researchers indications of where M. pneumoniae could induce protein secretion that alters the host cell function.
Finally, Li and co-authors repeated their investigation and found similar proteins in bronchoalveolar lavage samples and plasma from patients with confirmed M. pneumoniae infection. When they measured IL-33 in these two sample types, they found elevated levels compared with a control group of patients who presented with a respiratory foreign body. Statistical testing showed that IL-33 could indeed be used as a diagnostic marker of infection.
Overall, Li et al. are confident that, by characterizing the post-infection secretome, they have uncovered new regulatory pathways in the host response to M. pneumoniae that can be explored further to elucidate disease pathogenesis and treatment options.