Chemometrics in Metabolomics (Part-I): Overview of Biomarker Discovery
Diagnosis is the process of identifying the exact cause of the adverse symptoms experienced by a subject. To identify the cause, the diagnostic process often examines the constituents of a biofluid and checks for the presence of a marker that is unique to the disease.
A biomarker is defined as a cellular, biochemical, or molecular alteration that is measurable in biological media such as human tissues, cells, or fluids.
Common examples of biomarkers include surface proteins in a biopsy sample for cancer, bilirubin in urine for jaundice, and blood glucose for diabetes. An ideal biomarker should a) be sensitive and specific to a particular disease condition, b) be present in a fluid that can be collected non-invasively or minimally invasively, c) be detectable at a very early stage of disease onset, d) allow rapid analysis, and e) be cost-effective to measure. Therefore, although diagnostic markers for many diseases are already available, the hunt for markers that best match these criteria is still on. In this article, I will confine my discussion to the identification of biochemical markers using metabolomics.
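To make criterion (a) concrete, sensitivity is the fraction of diseased subjects that a marker correctly flags, and specificity is the fraction of healthy subjects it correctly clears. The sketch below is only an illustration: the marker readings, disease labels, and the cutoff value are hypothetical numbers invented for the example, not data from any study.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels (1 = diseased, 0 = healthy)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical marker readings and known disease status (assumed values).
marker_levels = [9.1, 8.4, 7.9, 5.2, 4.8, 4.1, 6.5, 3.9]
disease_status = [1, 1, 1, 1, 0, 0, 0, 0]
CUTOFF = 6.0  # assumed decision threshold

# Call a subject "diseased" when the marker level exceeds the cutoff.
calls = [1 if level > CUTOFF else 0 for level in marker_levels]
sens, spec = sensitivity_specificity(disease_status, calls)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Moving the cutoff trades one quantity against the other, which is why a good biomarker needs to separate the two groups well rather than simply differ on average.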
Metabolomics is emerging as the latest revolution in the functional genomics arena. The complete set of metabolites and their abundances/concentrations in a biological system is known as the metabolome, a term coined in 1998. The technological approach to capturing as much of this metabolome information as possible is known as metabolomics. Currently, gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), and nuclear magnetic resonance (NMR) are the major technology platforms used for metabolomics analysis. However, no single platform can identify all the metabolites present in a biological matrix such as blood, serum, plasma, or urine. It is estimated that close to 3000 metabolites are present in the human body; however, depending on the sensitivity, resolution, and type of instrument, a single platform can identify up to 1000 or a little more.
In fact, the techniques used to capture metabolome information are not new. All these analytical platforms, be they mass spectrometric or magnetic resonance based, have been known since the 1960s or 1970s. However, with continuous improvement in their sensitivity and resolution, the number of molecules detected by these instruments has increased manyfold. This was aided by concomitant advances in the field of chemometrics. Chemometrics has become relevant to every step of metabolomics, be it data acquisition, raw data pre-processing, pattern analysis, or identification of important feature(s).
The real challenge is to identify the biomarker(s) of a particular disease from the hundreds of metabolites detected by metabolomics. This is where chemometrics plays an important role. Chemometrics can be defined as the analysis of chemical data using mathematical, statistical, and informatics tools and techniques. For diagnostic marker discovery, a case-control design is used, i.e., a comparative analysis between well-characterized patients and healthy controls. In some cases, a patient cohort is followed from the diseased condition to the clinically treated condition after therapeutic intervention, allowing a comparison of the same subjects before and after treatment. Figure 1 demonstrates the steps involved in metabolomics-based biomarker discovery.
After data acquisition on an appropriate platform, each sample yields a data file commonly called a raw data file. The process of mining meaningful information from the raw data file for further analysis is called data pre-processing. On mass spectrometric platforms (GC-MS, LC-MS), data pre-processing includes the following steps: baseline correction, noise filtering, peak picking, deconvolution, spectral matching, library annotation, alignment, and data integration. Currently, most instrument manufacturers develop their own software for data pre-processing, although third-party software is also available for pre-processing MS raw data.
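The first three steps listed above can be sketched on a toy intensity trace. This is a deliberately simplified illustration and not how any vendor or third-party package implements them: the baseline is assumed to be flat (so subtracting the minimum is enough), the noise filter is a plain moving average, and the function names, trace values, and height threshold are all hypothetical. Deconvolution, spectral matching, annotation, and alignment are not shown.

```python
def subtract_baseline(signal):
    """Crude baseline correction: assume a flat baseline and subtract the minimum."""
    floor = min(signal)
    return [value - floor for value in signal]


def smooth(signal, window=3):
    """Moving-average noise filter; points near the edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def pick_peaks(signal, min_height):
    """Return indices of local maxima whose height is at least min_height."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] >= min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


# Hypothetical detector intensities along the retention-time axis.
raw_trace = [10, 11, 10, 12, 30, 55, 31, 12, 10, 11, 10, 25, 42, 24, 11, 10]

corrected = subtract_baseline(raw_trace)
denoised = smooth(corrected, window=3)
peak_indices = pick_peaks(denoised, min_height=15)
print(peak_indices)
```

On real chromatograms the baseline drifts and peaks overlap, so dedicated software uses considerably more robust versions of each step.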
Following alignment and integration of the raw data, the analyst has a data matrix ready for pattern analysis and identification of important features. This involves multiple statistical steps to identify a robust biomarker or a set of biomarkers. Considering the complexity of the data matrix and the variation within and between groups, metabolomics researchers use both univariate and multivariate statistical tools to identify biomarkers. In practice, they develop a statistical model using a set of samples known as the discovery set to identify important features whose levels differ between or among groups, viz., disease/non-disease, before/after disease, mild/moderate/severe disease. The validity of the model is then checked in a different set of subjects to see whether these tentative biomarkers can place the subjects in the appropriate group.
In the upcoming articles on Chemometrics in Metabolomics, I will discuss raw data analysis using bioinformatics tools and techniques, as well as the statistical investigation for biomarker discovery.
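The discovery/validation idea can be sketched with a purely univariate example. Everything here is assumed for illustration: the metabolite names and levels are invented, Welch's t statistic is just one possible way to rank features, and the midpoint cutoff is a stand-in for a real classification model. Multivariate methods, which are also used in practice, are not shown.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = stdev(group_a) ** 2 / len(group_a)
    var_b = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(var_a + var_b)


# Hypothetical discovery set: metabolite -> (patient levels, control levels).
discovery = {
    "metabolite_A": ([8.1, 7.9, 8.4, 8.0], [5.0, 5.2, 4.9, 5.1]),
    "metabolite_B": ([3.0, 3.4, 2.9, 3.2], [3.1, 3.3, 3.0, 3.2]),
}

# Rank features by the size of the group difference and keep the strongest one.
scores = {name: abs(welch_t(p, c)) for name, (p, c) in discovery.items()}
candidate = max(scores, key=scores.get)

# Build a very simple "model": a cutoff halfway between the two group means.
patients, controls = discovery[candidate]
cutoff = (mean(patients) + mean(controls)) / 2

# Hypothetical validation set: (level of the candidate metabolite, true group).
validation = [
    (7.8, "patient"),
    (8.3, "patient"),
    (5.3, "control"),
    (4.8, "control"),
    (6.0, "patient"),
]
predictions = ["patient" if level > cutoff else "control" for level, _ in validation]
correct = sum(pred == truth for pred, (_, truth) in zip(predictions, validation))
accuracy = correct / len(validation)
print(candidate, round(cutoff, 3), accuracy)
```

The key design point is that the cutoff is fixed using the discovery set only; the validation subjects are never used to tune it, so the accuracy measured on them is an honest check of the tentative biomarker.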
Bibliography and Further Reading:
- Johnson, C.H., Ivanisevic, J., Benton, H.P., Siuzdak, G., 2015. Bioinformatics: The Next Frontier of Metabolomics. Analytical Chemistry 87 (1), 147-156.
- Wishart, D., 2009. Bioinformatics for Metabolomics. In: Krawetz, S., (Ed.,): Bioinformatics for Systems Biology, pp 581-599. DOI 10.1007/978-1-59745-440-7_30.
- Blekherman, G., Laubenbacher, R., Cortes, D.F., Mendes, P., Torti, F.M., Akman, S., Torti, S.V., Shulaev, V., 2011. Bioinformatics tools for cancer metabolomics. Metabolomics 7 (3) 329-343.
Tutorial: Vina Output Analysis Using PyMol
Analyzing AutoDock Vina results can be tricky when it comes to viewing all the interactions and selecting the best pose. In our last video tutorial, we explained how to analyze docking results obtained from Vina using PyMol. This article is the written guide for the same.
Video Tutorial: Autodock Vina Result Analysis with PyMol
This is a video tutorial to demonstrate the analysis of Autodock Vina results using PyMol, in continuation of our existing docking tutorial.
Genome editing of human embryos using CRISPR/Cas9: crossing the ethics of gene editing?
The CRISPR/Cas9 system is a recently developed multi-purpose technology for genome editing [1,2], and its tool CRISPR-ERA/Cas9 is widely used, as explained in the previous article. The possible applications of this system have been discussed by its developers, and it has been successfully applied for genome editing, gene function identification, and gene therapy in animals and human cells [5-9]. Recently, a group of Chinese researchers reported editing a genome in human embryos using the CRISPR/Cas9 system for the first time in history. The team attempted to remove ‘harmful’ genetic code so that it could potentially be replaced by ‘good’ code. The results are published in the journal Protein & Cell but have opened a debate over crossing the ethical limits of gene editing.
How to perform docking in a specific binding site using AutoDock Vina?
AutoDock Vina is a bioinformatics tool used to perform in silico docking of a protein with a ligand. It provides many options depending on the needs of the user. The tool offers both blind docking and docking in a specific pocket, the latter often being preferred when the binding site is already known. This article will guide you through docking a protein with a ligand in a specific binding site/pocket.
BiR: Impact report – July 2016 & More
As we improve the quality of our articles and near our first collective birthday, Bioinformatics Review has expanded its reach, opening new avenues. The progress is summarised below:
- We have partnered with London Business Conference Group to be an official media partner for the Discovery Informatics and Analytics Summit 2016.
- Our scientific articles are starting to appear in Google Scholar; articles indexed there can be viewed by following this link.
- Bioinformatics Review was visited more than 143,801 times in July 2016.
As for the DIAS Summit 2016, we are giving away 15% discount coupons to our readers. To claim a coupon, please visit our Facebook page.
Do share this page with your colleagues and refer them for a coupon.
Deciding the right journal for your paper: 5 things to look for
Considering the amount of effort and hard work that goes into writing a research paper, it is critical to choose the right journal to reach the right scientific audience. It is particularly important not to waste your valuable work by falling prey to predatory journals. So we came up with this short guide to make it easier for you to decide where to publish without running into problems or getting duped.
First things first, OA or Not?
Some journals are purely Open Access (OA). Every paper published in such journals is available directly via the internet without any fee or subscription. Open access journals are good to go, but making science available to the public for free comes at a cost: they charge the authors publication fees. This puts them out of reach for independent researchers or authors without much financial backing. To overcome this, some journals are partly open access and let authors choose whether to make their paper open access by paying an open access charge. So decide carefully whether you want to go open access. If so, you can search for the journal of your interest on DOAJ (Directory of Open Access Journals).
Beware of predatory journals
Predatory journals are ‘fake’ journals that are fraudulently set up to earn some easy cash. These journals attract young researchers by offering an acceptance rate close to 100% and little or no peer review. Typically, you can find out whether a journal is predatory by looking at this list or this list (popularly known as Beall’s List). Although this is not an exhaustive list, it may come in handy when deciding. You can also identify a predatory journal by general warning signs such as the lack of a contact or office address on the website (or only an email address as contact information), no phone numbers for the editors, and missing details of the editorial board.
Does your journal participate in archiving programs?
What if the website goes down tomorrow? Or maybe the journal goes bankrupt? What will happen to the valuable research that was published in it? The good news is that most journals participate in an archiving program where they deposit data for permanent storage, i.e., even if the journal shuts down, your paper will not cease to exist. Each article is also given a DOI (digital object identifier), which is unique and points to the same article permanently. Okay, so what if the data center gets nuked? Or what if the data center where the content is stored suddenly catches fire? To guard against such scenarios, many copies of the same content are kept at different locations around the world. This is achieved when a journal participates in the LOCKSS (Lots of Copies Keep Stuff Safe) program. Some journals also deposit their published articles in NCBI's PubMed Central. So the next time you choose a journal, make sure it participates in an archiving program such as LOCKSS.
Know the scope of the journal you are publishing in
The basic aim and essence of publishing research is to reach the scientific fraternity, telling them about the significant work you have done and what it could be used for (although some people publish merely to gain credit). So reach the right audience by selecting a journal whose readership and editorial board comprise people in a related field. The aims and scope of the journal should match your work. For example, the Journal of Theoretical Biology is aimed at theoretical studies and will not accept your wet lab research. Furthermore, the areas outlined on the journal’s homepage are the ones that will be accepted. Articles falling outside the scope are usually rejected, resulting in a loss of precious time.
Impact (Not Impact Factor)
Ask yourself whether the contents of the journal are accessible via search. Make sure you can find the articles published in your selected journal via Google Scholar, PubMed, or PubMed Central. You can also consider the Impact Factor (by Thomson Reuters) of a journal while deciding, but try not to overemphasize it. The IF is a measure of other people's work and does not by itself add any weight to your research. That said, a journal with a good impact factor is likely to have a larger readership and thus attract more citations. You should target a journal that not only publishes your research but also makes an effort to publicize it. What is the point of publishing in a journal that does not really publicize your research?
It is wise to use caution while selecting a journal. It will save you time, money, and effort that might otherwise go in vain. Working through these five points may be time-consuming, but in the end, it pays to be cautious.
May 27, 2016 at 2:18 am
A very informative article with insights into practical approaches in this field. Eagerly waiting for the coming issues