Bioinformatics Softwares,Concepts,Articles,Career & More

Category archive

Bioinformatics Programming - page 2

TIN: R package to analyze Transcriptome Instability

in Bioinformatics Programming/Softwares by

Alternative Splicing plays a very essential role in proper functioning of eukaryotic cells. It acts as a regulatory mechanism for gene expression and any kind of disruption in this mechanism may lead to human diseases. Alternative splicing of pre-mRNA is a major source of genetic variation in human beings and disruption of the splicing process may cause human diseases such as cancer.  Continue reading “TIN: R package to analyze Transcriptome Instability” »

Perl one-liners for bioinformaticians

in Bioinformatics Programming/Tools by

Perl one-liners are extremely short Perl scripts written in the form of a string of commands that fits onto one line. That would amount to a bit less than 80 symbols for most purposes. Here’s the obligatory “Hello World!” one-liner in Perl and it’s output:

$ perl -e 'print "Hello World!\n";'
Hello World!

Try it! (of course, Perl must be installed on your computer for the “perl” command to work).

The most common and useful way to use such one-liners is to use them as stream processors on the command line, sometimes connected by pipes to other utilities typical for a Linux command-line environment. To process the stream one would commonly use Perl regular expression syntax to match (m/string/) or substitute (s/string1/string2/). Let us use “echo” to generate an empty input to act upon and “-p” to tell Perl to print the $_ variable (entire line) at the end:

$ echo | perl -pe 's/$_/Hello World!\n/;'
Hello World!

Notice that Perl iterates over all lines of the input (first create a file test with 3 empty lines):

$ cat test | perl -pe 's/$_/Hello World!\n/;'
Hello World!
Hello World!
Hello World!

Finally, let us introduce the “-i” switch to make Perl do the changes directly on a supplied file:

$ perl -pi -e 's/$_/Hello World!\n/;' test2

This will result in the contents of test2 getting overwritten with “Hello World!” now present on every line! Needless to say, the “-i” switch can be quite dangerous for it’s ability to completely overwrite files.

Suppose you have a file where you would like to number the lines directly in the file. This is a no-brainer with Perl one-liners! Just replace the beginning of each line with it’s number:

cat test2 | perl -pe '$i++; s/^/$i: /;'
1: Hello World!
2: Hello World!
3: Hello World!

The “^” symbol denotes the beginning of the line in Perl regular expressions. Notice that the one-liner actually contains two lines of Perl code separated by a semicolon (;).

Bioinformaticians often process FASTA files with nucleotide or amino-acid sequences. Suppose you have a FASTA file you would like to convert to a format where every sequence occupies only one line, so that you can apply “grep” to look for a specific k-mer in the sequence (say TATATAA for TATA-box). This can be easily done by removing every end-of-line symbol on non-header lines:

$ cat test2 | perl -pe 's/^([^>]+)\n/$1/;END{print "\n"}' | grep -B1 TATATAA

The “$1” is a special Perl variable created in regular expressions whenever you enclose something in parentheses. Here we do that with entire lines that do not begin with a “>” character (“^” in brackets like “[^>]” means NOT “>”, in this case we choose non-header lines).

Perl one-liners can be very useful in ad-hoc processing or parsing of files and streams from a plethora of sources. Additional examples of clever Perl one-liners can be found here or here.

Go to Top