Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. It combines biology, computer science, mathematics and statistics to analyze and interpret biological data, and it has been used for in silico analyses of biological questions using mathematical and statistical techniques.

Bioinformatics is both an umbrella term for the body of biological studies that use computer programming as part of their methodology and a reference to specific analysis “pipelines” that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidate genes and single nucleotide polymorphisms (SNPs). Often, such identification is made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable properties (especially in agricultural species), or differences between populations. In a less formal way, bioinformatics also tries to understand the organizational principles within nucleic acid and protein sequences; applied to proteins on a large scale, this work overlaps with proteomics.
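As a toy illustration of what SNP identification involves at its simplest, the Python sketch below compares two sequences that are assumed to be already aligned to the same length and reports single-base mismatches. The function name `find_snps` is hypothetical, and real variant calling also weighs read depth, base quality and population frequency, all of which this sketch ignores.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned to the same length; gap
    columns ('-') are skipped because they indicate indels, not SNPs.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if "-" in (ref_base, alt_base):
            continue  # indel column, not a single-nucleotide substitution
        if ref_base != alt_base:
            snps.append((pos, ref_base, alt_base))
    return snps


print(find_snps("ACGTACGTAC", "ACGTTCGTAA"))  # [(5, 'A', 'T'), (10, 'C', 'A')]
```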
Historically, the term bioinformatics did not mean what it means today. Paulien Hogeweg and Ben Hesper coined it in 1970 to refer to the study of information processes in biotic systems.
To study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
Development and implementation of computer programs that enable efficient access to, and use and management of, various types of information.
Development of new algorithms and statistical measures that assess relationships among members of large data sets. For example, there are methods to locate a gene within a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of related sequences.
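Locating a gene within a sequence can, in its most basic form, be sketched as a search for open reading frames (ORFs). The hypothetical `find_orfs` below scans the three forward reading frames for an ATG start codon followed by an in-frame stop codon (TAA, TAG or TGA). It assumes the standard genetic code, ignores the reverse strand and introns, and is far simpler than the statistical gene predictors used in practice.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    An ORF runs from an ATG start codon to the first in-frame stop codon,
    stop codon included, and must span at least `min_codons` codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # the i += 3 below resumes just after the stop
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```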
The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies, the modeling of evolution and cell division/mitosis.
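Sequence alignment, listed above as a major research effort, is classically solved by dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration; protein alignment tools normally use substitution matrices such as BLOSUM62 together with affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev is the previous DP row: best scores for a[:i-1] against each prefix b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GCATGCA"))
```

Keeping only two rows of the dynamic-programming table keeps memory linear in the length of `b`; recovering the alignment itself (not just its score) would require keeping the full table for a traceback.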
Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Over the past few decades, rapid developments in genomic and other molecular research technologies, together with developments in information technology, have produced a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to gain an understanding of biological processes from this information.

Industrializing Proteomics


Researchers are discovering a plethora of potential new biomarkers every year, each touted as the ‘next big thing’ that will help herald a new era of precision medicine. But so far, very few have made it into clinical practice. We find out how some proteomics laboratories are now tackling this bottleneck using a factory-type setup – to get more biomarkers into the clinic, faster.
“In many diseases, the medication given to the patient is often not effective – so we need to be able to stratify patients to give them the right drug, at the right dosage, at the right time,” says Professor Tony Whetton, Director of the Stoller Biomarker Discovery Centre at the University of Manchester.
But there is a giant hurdle in the way of this revolutionary new approach – known as precision medicine – becoming commonplace. To enable it to happen, doctors will need a battery of clinically robust biomarkers.
“The biomarker is the buzzword – it’s what allows us to distinguish between different states of the pathological process of disease,” explains Dr. Alexandre Zougman, Team Leader in Clinical Proteomics at the University of Leeds. “For us, a biomarker is a protein – for others, it could be something else – a gene or a metabolite.”
“If you look into the literature there are literally thousands of papers about biomarkers, but in reality, you don’t see that many of them coming into the clinic. It has to change,” he adds.


The Stoller Biomarker Discovery Centre in Manchester, which Whetton heads up, is the largest clinical biomarker facility in Europe. Its aim is not only to discover new biomarkers for diagnosis, prognosis, and response to therapy – but also to validate and verify them for clinical use. 
“Normally, the time-course for developing a new proteomics biomarker would be about 12 years or so. But by integrating all of the various aspects that are needed into a single center we plan to cut down that time considerably,” says Whetton. The team has set out to overcome all the potential pinch-points in the pipeline from lab to the clinic as effectively as possible.
“Our first challenge is associating with a decent clinical study. The second is running the samples on mass spectrometers with the highest possible quality control you can achieve so that the data actually means something. And then you need to do informatics on extensive and deep datasets in order to turn it into information as swiftly as you possibly can,” explains Whetton.


Biologically relevant findings for precision medicine depend on having statistically meaningful numbers of samples, and one way of tackling this is to analyze ever-larger sample sets.
“What we’ve done is industrialize the proteomics so that we can turn out digitized maps of sample after sample very swiftly,” says Whetton.
“Ordinarily, a proteomics lab might have one or two mass-spectrometers – but we’ve got 13 machines that can pump samples through the pipeline very effectively. Quality control is of a high level – and we’ve got a lot of high-end computing power so that we can process the data in a matter of seconds or minutes, whereas other labs may take hours,” he adds.
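Why larger sample sets matter can be made concrete with a standard power calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison of means, n per group = 2 * (z_(1-alpha/2) + z_power)^2 / d^2, where d is the standardized effect size (difference in means divided by the common standard deviation). The function name and the defaults of alpha = 0.05 and 80% power are illustrative assumptions, not the centre's actual study design.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    rounded up to the next whole sample.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Smaller (more subtle) biomarker effects need many more samples per group.
for d in (0.8, 0.5, 0.2):
    print(d, samples_per_group(d))
```

Because n scales with 1/d^2, halving the detectable effect size roughly quadruples the number of samples required, which is one reason high-throughput pipelines matter for biomarker studies.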
Moving biomarkers from the lab to the clinic
Importantly, the team is then able to contextualize their proteomics data with patients’ electronic healthcare records. And as the whole lab is built around good clinical practice, everything is in place to enable new biomarkers to move into the clinic as swiftly as possible.
Although there are many diseases where new clinical biomarkers may be helpful, the center is currently focusing on inflammatory diseases and cancer.
“For example, we’ve been looking for markers of risk in ovarian and lung cancer and have had some successes,” says Whetton.
In other diseases, including rheumatoid arthritis, they are seeking to identify new biomarkers that can help determine whether someone is responding to a particular treatment.
Advances in proteomics technology
A key enabler for this new factory-like approach has been a coming-of-age for mass spectrometry coupled with liquid chromatography (LC-MS) alongside better data-acquisition methods.
Mark Cafazzo, Director, Global Academic & Applied Markets Business at SCIEX, explains: “Over the last few years, we have seen a step-change in the speed and sensitivity and also the dynamic range of these instruments to be able to acquire enough data that can also show you a measurement on the very low-abundance protein in the presence of high-abundance proteins.”

“And new methods of acquiring the data are enabling labs to run more and more samples and get a reproducible quantitative result across the sample set for every protein that they’re looking for,” he adds.
Despite recent advances in technology, antibody-based assays remain the mainstay of clinical pathology, although there is hope that mass spectrometry platforms could become a fixture of pathology labs in the future.
“We also employ two professors of pathology to try and develop new tests that can actually get used as opposed to just being a technique or a technology that doesn’t impact on the clinic,” says Whetton.
Next-generation bioinformatics
The next big challenge will be to find ways to handle the increasingly large datasets – and to integrate the various ‘omics data to tie it all together at the biological level.
Cafazzo explains: “If the study is designed right and you can get RNA-Seq, proteomics and metabolomics results on the same set of samples then you have a much more powerful, very multi-dimensional set of data to play with to try and tease out the most useful markers.”
But the informatics solutions needed to actually do that are still in their infancy, with bigger advances necessary to manage those very rich sets of data.
“Clusters and arrays of hardware in a local site is one way to address it. Or another is to put your data into a cloud solution and to make use of a number of more powerful technology software applications,” says Cafazzo.
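At its simplest, integrating RNA-Seq, proteomics and metabolomics results “on the same set of samples” means aligning the layers by sample identifier. The hypothetical `integrate_omics` below performs an inner join on sample IDs using plain dictionaries. The sample IDs, feature names and values are invented for illustration, and a real study would also need normalization, batch correction and a strategy for missing values.

```python
# A layer maps sample ID -> {feature name: measured value}.
Layer = dict[str, dict[str, float]]


def integrate_omics(*layers: Layer) -> Layer:
    """Inner-join omics layers on sample ID.

    Only samples measured in every layer are kept. Feature names are assumed
    to be unique across layers (here they are prefixed with the layer name).
    """
    shared = set.intersection(*(set(layer) for layer in layers))
    merged: Layer = {}
    for sample in sorted(shared):
        row: dict[str, float] = {}
        for layer in layers:
            row.update(layer[sample])
        merged[sample] = row
    return merged


# Hypothetical toy data: sample S3 lacks proteomics and metabolomics results.
rna = {"S1": {"rna:GeneA": 5.2}, "S2": {"rna:GeneA": 3.1}, "S3": {"rna:GeneA": 4.0}}
prot = {"S1": {"prot:ProteinA": 1.7}, "S2": {"prot:ProteinA": 0.9}}
metab = {"S1": {"met:lactate": 2.2}, "S2": {"met:lactate": 2.8}}
print(integrate_omics(rna, prot, metab))
```

An inner join is the conservative choice here: it keeps only samples with a complete multi-dimensional profile, at the cost of discarding partially measured samples.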
Unlocking the benefits of precision medicine
The future looks bright for clinical proteomics, particularly with the added power of industrialized proteomics to help propel more biomarkers into the clinic. Realizing the many benefits promised by precision medicine depends on that success.
Zougman sums up: “If you can find a molecule that’s either a prognostic or a diagnostic tool in different diseases that’s just great – it’s great for patient management, for the disease outcome, and for the health care system economically.”

Personalized Proteomics

Personalized Proteomics: The Future of Precision Medicine

Medical diagnostics and treatment have advanced from a one-size-fits-all science to the treatment of each patient as a unique individual. Currently, this is limited solely to genetic analysis. However, epigenetic, transcriptional, proteomic, post-translational modification, metabolic, and environmental factors all influence a patient’s response to disease and treatment. As more analytical and diagnostic techniques are incorporated into medical practice, the personalized medicine initiative is transitioning to precision medicine, which gives a holistic view of the patient’s condition. The high accuracy and sensitivity of mass spectrometric analysis of proteomes make it well suited for incorporating proteomics into precision medicine. This review begins with an overview of the advance to precision medicine and the current state of the art in technology and instrumentation for mass spectrometry analysis. It then focuses on the benefits and potential uses of personalized proteomic analysis in the diagnosis and treatment of individual patients. Finally, it calls for basic scientists and clinical researchers to work with practicing clinicians to design proteomic studies that generate meaningful and applicable translational medicine. As clinical proteomics is just beginning to emerge from its infancy, this overview is intended for newcomers to the field.

Top-down Mass Spectrometry based proteomics

Mass Spectrometry (MS)-based proteomics is a powerful tool for systems biology since it provides a systematic, global, unbiased, and quantitative assessment of proteins, including interactions, modifications, location, and function. 

Post-translational modifications (PTMs) modulate protein activity, stability, localization, and function, playing essential roles in many critical cell signaling events in both healthy and disease states. Dysregulation of a number of PTMs, such as protein acetylation, glycosylation, hydroxylation, and phosphorylation, has been implicated in a spectrum of human diseases. The conventional peptide-based bottom-up shotgun proteomics approach is widely used but has intrinsic limitations for mapping protein modifications. Because each protein is digested into many peptides, the complexity of an already complicated proteome increases dramatically, and information specific to the intact protein is lost, since only a small and variable fraction of the digested peptides is recovered.
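The complexity problem described above can be seen with an in silico digest. The sketch below applies the commonly used trypsin rule (cleave after lysine, K, or arginine, R, but not before proline, P) to a protein sequence. The function name and example sequence are made up for illustration, and missed cleavages, which real digests produce, are ignored.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence into fully cleaved tryptic peptides.

    The zero-width pattern matches positions preceded by K or R and not
    followed by P; empty strings (e.g. after a C-terminal K/R) are dropped.
    """
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", protein.upper()) if peptide]


print(tryptic_digest("AKPGRLMKEE"))  # ['AKPGR', 'LMK', 'EE']
```

Every protein yields several peptides in this way, so a proteome of thousands of proteins becomes a far larger peptide mixture, and a PTM observed on one peptide can no longer be directly linked to modifications elsewhere on the same molecule.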
In contrast, the protein-based top-down MS proteomics approach is arguably the most powerful technique for analysis of protein modifications. In the top-down approach, intact proteins are analyzed, which greatly simplifies sample preparation and reduces the mixture complexity because no proteolytic digestion is required. Specific proteins of interest can then be “gas-phase” purified, and modification sites can be mapped by tandem MS (MS/MS) strategies. Top-down MS provides comprehensive sequence information for the whole protein, detecting all types of PTMs (e.g. phosphorylation, proteolysis, acetylation) and sequence variants (e.g. mutations, polymorphisms, alternatively spliced isoforms) simultaneously in one spectrum (a “bird’s eye view”) without a priori knowledge. We have made significant advances in top-down MS for analysis of large intact proteins purified from complex biological samples, including cell and tissue lysates as well as body fluids. We have shown that top-down MS has unique advantages for unraveling molecular complexity, quantifying modified protein forms, deep sequencing of intact proteins, mapping modification sites with full sequence coverage, discovering unexpected modifications, identifying and quantifying positional isomers, and determining the order of multiple modifications. Moreover, we have shown that a tandem mass spectrometry technique, electron capture dissociation (ECD), is especially useful for mapping labile PTMs such as phosphorylation, which is well preserved during the ECD fragmentation process. Notably, we have been able to isotopically resolve large proteins (>115 kDa) with very high mass accuracy (1-3 ppm) and have extended ECD to characterize very large phosphoproteins (>140 kDa).
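To put the quoted 1-3 ppm mass accuracy in perspective, the mass error in parts per million is (observed - theoretical) / theoretical * 10^6. The helper functions below are illustrative sketches, not part of any software named in this text. They show that a 3 ppm window on a 115 kDa protein corresponds to about 0.345 Da, well below the roughly 1 Da spacing between adjacent isotopic peaks that must be resolved.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def ppm_window_da(mass: float, ppm: float) -> float:
    """Absolute tolerance in daltons that corresponds to a given ppm tolerance."""
    return mass * ppm / 1e6


print(round(ppm_window_da(115_000, 3), 3))  # 0.345
print(round(ppm_window_da(115_000, 1), 3))  # 0.115
print(round(ppm_error(100_000.2, 100_000.0), 6))  # 2.0
```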
Nevertheless, the top-down MS approach still faces significant challenges in terms of protein solubility, separation, and the detection of low abundance and large proteins, as well as under-developed data analysis tools. Consequently, new technological developments are urgently needed to advance the field of top-down proteomics. We have been establishing an integrated top-down disease proteomics platform to globally examine intact proteins extracted from tissues for the identification and quantification of proteins and possible PTMs present in vivo. Specifically, we are developing novel approaches to address the current challenges in top-down MS-based proteomics.
A. To address the protein solubility challenge, we are developing new degradable surfactants that can effectively solubilize proteins and are compatible with top-down MS. We recently developed an MS-compatible slowly degradable surfactant (MasDeS) that can effectively solubilize proteins [24]. Furthermore, we demonstrated that the solubility of membrane proteins was significantly improved by adding this new surfactant. We are also developing different types of degradable surfactants and evaluating their performance for top-down proteomics.
B. To address the proteome complexity challenge, we are developing new chromatography materials and novel multi-dimensional liquid chromatography (MDLC) strategies to separate intact proteins. We have demonstrated the use of ultrahigh-pressure size exclusion chromatography (UHP-SEC) and hydrophobic interaction chromatography (HIC) for top-down proteomics. Moreover, we have developed a novel three-dimensional LC (3DLC) strategy that couples HIC with ion exchange chromatography (IEC) and reverse phase chromatography (RPC) for intact protein separation, and we demonstrated that this 3D IEC-HIC-RPC approach greatly outperformed the conventional 2D IEC-RPC approach. We are now developing novel chromatography materials for intact protein separation.
C. To address the proteome dynamic range challenge, we have been developing, in collaboration with nanotechnologist Prof. Song Jin (U. of Wisconsin), novel nanomaterials that bind low-abundance proteins carrying PTMs (e.g. phosphorylation) with high specificity. The current focus is to develop multivalent nanoparticle (NP) reagents that capture phosphoproteins globally from the human proteome for top-down MS analysis of intact phosphoproteins.
D. To address the challenge of under-developed software, we are developing user-friendly and versatile software for comprehensive analysis of high-resolution top-down MS-based proteomics data. We previously developed MASH Suite, a versatile and user-friendly software interface for processing, interpreting, visualizing and presenting high-resolution MS data. Recently, we developed MASH Suite Pro, a comprehensive, user-friendly and freely available program tailored for top-down high-resolution mass spectrometry (MS)-based proteomics (manuscript submitted). MASH Suite Pro significantly simplifies and speeds up the processing and analysis of top-down proteomics data by combining tools for protein identification, quantitation, characterization, and validation into a customizable and user-friendly interface.
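One basic step that top-down processing software must perform is converting multiply charged ions into neutral masses. The sketch below assumes positive-mode [M + zH]^z+ ions, so M = z * (m/z - proton mass), and estimates the charge state from the m/z spacing of adjacent isotopic peaks (about 1.00335 / z, using the 13C-12C mass difference). It is a simplified illustration with hypothetical names and peak values; it is not how MASH Suite or MASH Suite Pro is implemented.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton
ISOTOPE_SPACING = 1.00335  # Da, approximate 13C - 12C mass difference


def charge_from_spacing(spacing_mz: float) -> int:
    """Charge state z from the m/z gap between adjacent isotopic peaks."""
    return round(ISOTOPE_SPACING / spacing_mz)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M + zH]^z+ ion: M = z * (m/z - proton mass)."""
    return charge * (mz - PROTON_MASS)


# Hypothetical isotopic peaks near m/z 1000 spaced 0.1 m/z apart.
z = charge_from_spacing(0.1)
print(z, round(neutral_mass(1000.0, z), 4))  # 10 9989.9272
```

Large intact proteins carry many charges, so their isotopic peaks sit only a small fraction of an m/z unit apart; this is why high resolving power is needed before charge states, and hence neutral masses, can be assigned reliably.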
We envision that by taking this multi-pronged approach to overcome the challenges facing top-down proteomics, we will significantly advance the burgeoning top-down proteomics field, which recently gained momentum through the creation of the Consortium for Top-down Proteomics.

Systems proteomics of liver mitochondria function

Combined analysis of large data sets characterizing genes, transcripts, and proteins can elucidate biological functions and disease processes. Williams et al. report an exceptionally detailed characterization of mitochondrial function in a genetic reference panel of recombinant inbred mice. They measured the metabolic function of nearly 400 mice under various environmental conditions and collected detailed quantitative information from livers of the animals on over 25,000 transcripts. These data were integrated with quantitation of over 2500 proteins and nearly 1000 metabolites. Such analysis showed a frequent lack of correlation of transcript and protein abundance, enabled the identification of genomic variants of mitochondrial enzymes that caused inborn errors in metabolism, and revealed two genes that appear to function in cholesterol metabolism.
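The transcript-protein comparison mentioned above amounts to correlating, for each gene, its transcript and protein abundances across the same strains. The sketch below computes a Pearson correlation per gene for genes quantified in both layers. The gene names and abundance values are hypothetical, and the study itself may have used different correlation measures and normalization.

```python
from math import sqrt
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def transcript_protein_correlations(transcripts: dict[str, list[float]],
                                    proteins: dict[str, list[float]]) -> dict[str, float]:
    """Correlate transcript and protein levels per gene, using genes present in both layers.

    Each value list holds one abundance per strain, in the same strain order.
    """
    shared = sorted(transcripts.keys() & proteins.keys())
    return {gene: pearson(transcripts[gene], proteins[gene]) for gene in shared}


# Hypothetical abundances for four strains; GeneC has no protein measurement.
transcripts = {"GeneA": [1.0, 2.0, 3.0, 4.0], "GeneB": [1.0, 2.0, 3.0, 4.0], "GeneC": [2.0, 2.5, 3.0, 3.5]}
proteins = {"GeneA": [2.0, 4.0, 6.0, 8.0], "GeneB": [3.0, 1.0, 4.0, 2.0]}
for gene, r in transcript_protein_correlations(transcripts, proteins).items():
    print(gene, round(r, 2))  # GeneA 1.0, then GeneB 0.0
```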
Structured Abstract
Over the past two decades, continuous improvements in “omics” technologies have driven an ever-greater capacity to define the relationships between genetics, molecular pathways, and overall phenotypes. Despite this progress, the majority of genetic factors influencing complex traits remain unknown. This is exemplified by mitochondrial supercomplex assembly, a critical component of the electron transport chain, which remains poorly characterized. Recent advances in mass spectrometry have expanded the scope and reliability of proteomics and metabolomics measurements. These tools are now capable of identifying thousands of factors driving diverse molecular pathways, their mechanisms, and consequent phenotypes and thus substantially contribute toward the understanding of complex systems.
Genome-wide association studies (GWAS) have revealed many causal loci associated with specific phenotypes, yet the identification of such genetic variants has been generally insufficient to elucidate the molecular mechanisms linking these genetic variants with specific phenotypes. A multitude of control mechanisms differentially affect the cellular concentrations of different classes of biomolecules. Therefore, the identification of the causal mechanisms underlying complex trait variation requires quantitative and comprehensive measurements of multiple layers of data—principally of transcripts, proteins, and metabolites and the integration of the resulting data. Recent technological developments now support such multiple layers of measurements with a high degree of reproducibility across diverse sample or patient cohorts. In this study, we applied a multilayered approach to analyze metabolic phenotypes associated with mitochondrial metabolism.
We profiled metabolic fitness in 386 individuals from 80 cohorts of the BXD mouse genetic reference population across two environmental states. This extensive phenotyping program included the analysis of metabolism, mitochondrial function, and cardiovascular function. To understand the variation in these phenotypes, we quantified multiple, detailed layers of systems-scale measurements in the livers of the entire population: the transcriptome (25,136 transcripts), proteome (2622 proteins), and metabolome (981 metabolites). Together with full genomic coverage of the BXDs, these layers provide a comprehensive view of the overall variance in metabolic activity and mitochondrial function induced by genetics and environment in the BXDs. Among the 2600 transcript-protein pairs identified, 85% of observed quantitative trait loci uniquely influenced either the transcript or the protein level. The transomic integration of molecular data established multiple causal links between genotype and phenotype that could not be characterized by any individual data set. Examples include the link between D2HGDH protein and the metabolite D-2-hydroxyglutarate, the BCKDHA protein mapping to the gene Bckdhb, the identification of two isoforms of ECI2, and mapping mitochondrial supercomplex assembly to the protein COX7A2L. The variants in these mitochondrial proteins were in turn associated with diverse complex metabolic phenotypes, such as heart rate, cholesterol synthesis, and branched-chain amino acid metabolism. Of note, our transomics approach clarified the contested role of COX7A2L in mitochondrial supercomplex formation and identified and validated Echdc1 and Mmab as involved in the cholesterol pathway.
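The quantitative trait loci (QTL) referred to above are found by testing, marker by marker, whether a trait differs between strains carrying different parental alleles (B or D in the BXD panel). The sketch below computes a single-marker LOD score, LOD = (n/2) * log10(RSS_null / RSS_marker), comparing a model with one overall mean against a model with one mean per genotype. It is a simplified illustration with invented trait values and hypothetical marker names, not necessarily the mapping procedure used in the study, and it omits permutation-based significance thresholds.

```python
from math import log10
from statistics import mean


def marker_lod(trait: list[float], genotypes: list[str]) -> float:
    """Single-marker LOD score: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(trait)
    overall = mean(trait)
    rss_null = sum((t - overall) ** 2 for t in trait)  # one mean for all strains
    groups: dict[str, list[float]] = {}
    for t, g in zip(trait, genotypes):
        groups.setdefault(g, []).append(t)
    group_means = {g: mean(vals) for g, vals in groups.items()}
    # Residuals around the mean of each strain's own genotype group.
    rss_marker = sum((t - group_means[g]) ** 2 for t, g in zip(trait, genotypes))
    if rss_marker == 0:
        return float("inf")  # genotype explains the trait perfectly
    return (n / 2) * log10(rss_null / rss_marker)


def lod_scan(trait: list[float], marker_genotypes: dict[str, list[str]]) -> dict[str, float]:
    """LOD score at every marker; marker_genotypes maps marker -> genotype per strain."""
    return {marker: marker_lod(trait, geno) for marker, geno in marker_genotypes.items()}


trait = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5]  # invented values, one per strain
markers = {
    "marker_1": ["B", "B", "B", "D", "D", "D"],
    "marker_2": ["B", "D", "B", "D", "B", "D"],
}
for marker, lod in lod_scan(trait, markers).items():
    print(marker, round(lod, 2))
```

Running the same scan separately on a gene's transcript levels and on its protein levels would show whether a locus affects one layer, the other, or both, which is the kind of comparison behind the transcript-protein QTL result above.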
Overall, these findings indicate that data generated by next-generation proteomics and metabolomics techniques have reached a quality and scope to complement transcriptomics, genomics, and phenomics for transomic analyses of complex traits. Using mitochondria as a case in point, we show that the integrated analysis of these systems provides more insights into the emergence of the observed phenotypes than any layer can by itself, highlighting the complementarity of a multilayered approach. The increasing implementation of these omics technologies as complements, rather than as replacements, will together move us forward in the integrative analysis of complex traits.