Meta-analysis of biological literature: Explained

It’s a fine Monday morning, and the new intern finds his way to the laboratory of biological data mining. His brief interview with the scientist in charge has left him with only a limited understanding of the subject. Upon arrival he is greeted with a humongous corpus of mixed articles, say some 4000, and is asked to assemble specific information from the data set by diligently scrutinizing each article.

Well, the situation could be frightening to a purely wet-lab biologist, but anyone who has had some exposure to the real power of file handling in a programming language will know how to let a few simple lines of code do the bidding.
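To make "a few simple lines of code" concrete, here is a minimal Python sketch of that kind of file handling. It assumes the articles have already been saved as plain-text `.txt` files in one folder; the function names `count_mentions` and `scan_corpus` are made up for this illustration, and the matching is naive substring counting, so "p53" would also be counted inside "Tp53".

```python
from pathlib import Path


def count_mentions(text, keywords):
    """Return {keyword: count} of case-insensitive substring hits in text."""
    lowered = text.lower()
    return {kw: lowered.count(kw.lower()) for kw in keywords}


def scan_corpus(corpus_dir, keywords):
    """Yield (filename, counts) for each .txt file mentioning any keyword."""
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="ignore" skips undecodable bytes instead of stopping the scan
        text = path.read_text(encoding="utf-8", errors="ignore")
        counts = count_mentions(text, keywords)
        if any(counts.values()):
            yield path.name, counts
```

Calling `list(scan_corpus("articles", ["TP53", "BRCA1"]))` on a folder named `articles` (a hypothetical path) would list only the files worth opening, which is already a big cut from 4000.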

So what is meta-analysis about?

“Meta-analysis”, the new cool term in the biological realm, is easier to understand once you look at the first half of the word. “Meta” here means data about data, which makes meta-analysis an analysis of already published data: rearranging it, sorting it, and trying to find hidden patterns in the published literature.

At its most rudimentary, meta-analysis means reading the corpus of research and review articles on a particular topic. The topic may be as wide as a whole eukaryotic genome, or it may be narrowed down to a phylum, group, or species, a specific disease, or even a particular gene. While narrowing down to a disease or gene, one must also remember that biological systems are among the most complex systems known, and present-day computer simulations cannot match that complexity. Any analysis narrowed down to a gene must therefore take into account that the gene may well be found in multiple organisms, so a search can return a considerable number of results irrelevant to the study.
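The multiple-organism problem can be illustrated with a small Python sketch. It assumes each article is available as a dictionary with an `abstract` field (a hypothetical record layout), and it only separates articles that name both the gene and the organism of interest from those that name the gene alone. The second pile still has to be checked by a person.

```python
import re


def mentions(term, text):
    """True if term occurs as a whole word or phrase in text, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_records(records, gene, organism):
    """Split records that mention the gene into (relevant, needs_review).

    relevant     : gene and organism both appear in the abstract
    needs_review : gene appears, organism does not (may be another species)
    """
    relevant, needs_review = [], []
    for rec in records:
        if not mentions(gene, rec["abstract"]):
            continue  # the gene is not mentioned at all, so drop the record
        if mentions(organism, rec["abstract"]):
            relevant.append(rec)
        else:
            needs_review.append(rec)
    return relevant, needs_review
```

Keeping a separate review pile, rather than silently discarding those records, is a deliberate choice: an abstract may simply not state the organism even though the study is relevant.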

Rigorous manual inspection of the program-sorted data is required to weed out such entries. Since meta-analysis relies heavily on statistical treatment of data, researchers tend to use languages such as Stata and R to write their analysis code. R, unlike Stata, is free, produces publication-quality output, and provides a plethora of packages. A few of these provide tools such as PDF miner and PubMed miner for accessing the PubMed database. These packages contain code to query the database and extract information from it through a command-based interface, handling huge data sets at once and cutting down the manual effort and time the task takes.
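For readers who want to see what such a database query looks like under the hood, here is a rough sketch in Python rather than R. It does not use the packages mentioned above; it only builds a search URL for NCBI's public E-utilities `esearch` endpoint, and the helper name `build_esearch_url` is invented for this example. Nothing is sent over the network until you actually open the URL.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=100):
    """Build a PubMed search URL for the given query term.

    retmax caps how many article IDs the service returns per request.
    """
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # the query string, e.g. "BRCA1 AND breast cancer"
        "retmax": retmax,
        "retmode": "json",  # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)
```

Opening the URL from `build_esearch_url("BRCA1 AND breast cancer")` should return a list of matching PubMed IDs, which can then be fetched in batches. NCBI asks users to limit their request rate, so a real script should pause between calls.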

All praise aside, the method has its fair share of drawbacks and issues. The current query systems of NCBI and its sister organizations often fail to recognize synonymous terms and treat them as separate, unlinked entities, so the results depend heavily on exactly which query items are supplied together. A more robust query system is needed to improve the results and make the whole approach more efficient.

The need of the hour is to invest more resources in developing well-structured and somewhat intelligent query systems that can truly recognize gene names and abbreviations, the scientific and English names of organisms, and the different ways in which technique names are written.
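Until such systems exist, one stopgap is to expand the synonyms yourself before sending the query. The Python sketch below assumes a hand-made synonym table (the `SYNONYMS` dictionary and both function names are purely illustrative) and turns each search term into an OR group, joining the groups with AND.

```python
# Hand-curated synonym table; keys are lowercase. Illustrative entries only.
SYNONYMS = {
    "tp53": ["p53", "tumor protein p53"],
    "mouse": ["Mus musculus", "mice"],
}


def expand_term(term, synonyms):
    """Return the term, or an OR group of the term plus its known synonyms."""
    extra = [s for s in synonyms.get(term.lower(), []) if s.lower() != term.lower()]
    variants = [term] + extra
    # Quote multi-word variants so they are searched as phrases
    quoted = ['"%s"' % v if " " in v else v for v in variants]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(terms, synonyms):
    """Join the expanded terms with AND into one boolean query string."""
    return " AND ".join(expand_term(t, synonyms) for t in terms)
```

With the table above, `build_query(["TP53", "mouse"], SYNONYMS)` gives `(TP53 OR p53 OR "tumor protein p53") AND (mouse OR "Mus musculus" OR mice)`. The obvious weakness is that someone has to maintain the table by hand, which is exactly the gap a smarter query system would close.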