Bioinformatics Softwares,Concepts,Articles,Career & More

Do you HYPHY with (Data)Monkey !!

HyPhy, an acronym for Hypothesis Testing Using Phylogenies, was written and designed by Kosakovsky Pond and co-workers to provide likelihood-based analyses of molecular evolutionary data sets and to help detect differential rates of variability within coding sequence datasets. It is freely available, has a graphical user interface, and can be used with or without much programming experience.

Substitution rates were once presumed to be uniform over an alignment of homologous DNA or protein sequences. Many researchers studying the processes that shape the rates and patterns of molecular evolution have since refuted this presumption with a large body of data, especially for rapidly evolving gene families and for viral genomes. Natural selection acts differently on different domains, regions and sites, which may be under positive, negative or neutral selection pressure. Positive selection shows up as an excess of non-synonymous substitutions in a protein-coding sequence, which alter the protein and can confer a fitness advantage (through changed protein structure and function). Negative selection shows up as an excess of synonymous substitutions, which leave the amino acid sequence, and hence protein structure and function, unchanged. Evolution is said to be neutral when non-synonymous substitutions do not affect protein structure and function, so that they accumulate at about the same rate as synonymous substitutions.

The rates of synonymous and non-synonymous substitutions are denoted dS and dN, respectively. Under neutral evolution, dS and dN are expected to be approximately equal. Accordingly, the ratio ω = β/α = dN/dS, where β is the non-synonymous rate and α is the synonymous rate, has become a standard measure of selective pressure. The ω estimated for a whole sequence alignment is referred to as the global ω. A global ω of approximately 1 signifies neutral evolution, a value below 1 suggests negative selection, and a value above 1 implies positive selection.

To start the analyses, all one needs is a suitable substitution model as selected by the MODELTEST program (available online), a NEXUS-formatted sequence alignment file (which must contain codon data) and a maximum likelihood tree of the data.

Datamonkey is a web interface that uses HyPhy batch files to execute most of its tools and packages for computational analyses. It can be used to estimate dS and dN over an alignment of coding sequences and to identify codons and lineages under selection. It also provides “state of the art” tests of codon-based models to infer signatures of positive Darwinian selection by comparing the rates of synonymous (dS) and non-synonymous (dN) mutations, even in the presence of recombination. It reports ω (= dN/dS) under a variety of evolutionary models. Apart from this, Datamonkey also offers a number of packages such as GARD, SLAC, REL, FEL, EVOBLAST etc. These will be discussed in the next issue. Keep reading!!


A comprehensive list of references for this article is available from the author upon request.

Perl one-liners for bioinformaticians

Perl one-liners are extremely short Perl scripts written as a string of commands that fits onto one line. For most purposes that amounts to a bit less than 80 characters. Here’s the obligatory “Hello World!” one-liner in Perl and its output (the “-e” switch tells Perl to execute the code that follows it on the command line):

$ perl -e 'print "Hello World!\n";'
Hello World!

Try it! (of course, Perl must be installed on your computer for the “perl” command to work).

The most common and useful way to use such one-liners is as stream processors on the command line, sometimes connected by pipes to other utilities typical of a Linux command-line environment. To process the stream one would commonly use Perl regular expression syntax to match (m/string/) or substitute (s/string1/string2/). Let us use “echo” to generate a single empty line of input to act upon and “-p” to tell Perl to loop over the input and print the $_ variable (the current line) after each iteration. Here $_ holds the empty line itself (a lone newline), so using it as the pattern matches the whole line:

$ echo | perl -pe 's/$_/Hello World!\n/;'
Hello World!

Notice that Perl iterates over every line of the input (first create a file named “test” containing 3 empty lines):

$ cat test | perl -pe 's/$_/Hello World!\n/;'
Hello World!
Hello World!
Hello World!

Finally, let us introduce the “-i” switch to make Perl do the changes directly on a supplied file:

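$ perl -pi -e 's/^.*$/Hello World!/;' test2

This overwrites the contents of test2 so that every line now reads “Hello World!”. (The pattern ^.*$ matches the content of any line; using $_ as the pattern, as above, is only reliable for lines free of regular-expression metacharacters.) Needless to say, the “-i” switch can be quite dangerous because of its ability to completely overwrite files.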

Suppose you have a file whose lines you would like to number. This is a no-brainer with Perl one-liners! Just replace the beginning of each line with its number (add “-i” to make the change directly in the file):

$ cat test2 | perl -pe '$i++; s/^/$i: /;'
1: Hello World!
2: Hello World!
3: Hello World!

The “^” symbol denotes the beginning of the line in Perl regular expressions. Notice that the one-liner actually contains two Perl statements separated by a semicolon (;).

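Bioinformaticians often process FASTA files with nucleotide or amino-acid sequences. Suppose you have a FASTA file that you would like to convert to a format where every sequence occupies only one line, so that you can apply “grep” to look for a specific k-mer in the sequence (say TATATAA for the TATA box). This can be done by removing the end-of-line character from every non-header line and starting each header after the first on a fresh line (sequences.fasta is a placeholder file name):

$ cat sequences.fasta | perl -pe 's/^([^>]+)\n/$1/; s/^>/\n>/ if $. > 1; END{print "\n"}' | grep -B1 TATATAA

“$1” is a special Perl variable that holds the text matched by the first pair of parentheses in the regular expression. Here that is an entire line containing no “>” character (“^” inside brackets, as in “[^>]”, means NOT “>”), i.e. a non-header line. The second substitution puts a newline in front of every header except the first (“$.” is the current line number); without it, each header would be glued to the end of the preceding sequence. The “-B1” option makes grep print the header line before each match.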

Perl one-liners can be very useful for ad-hoc processing or parsing of files and streams from a plethora of sources. Many additional examples of clever Perl one-liners can be found online.
