A collection of articles on bioinformatics programming published in Bioinformatics Review.

Perl script to find duplicate FASTA sequences using their header?

In a large file of FASTA sequences, it is nearly impossible to perform some operations manually.

This is a simple Perl script that finds duplicate sequences in a multi-FASTA file by comparing their FASTA headers.
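The header-based check described above can be sketched in a few lines. This is a minimal Python illustration, not the Perl script from the article. The function name `find_duplicate_headers` is hypothetical, and comparing the full header line (everything after `>`) is an assumption of this sketch; the original script may compare only part of the header.

```python
from collections import Counter


def find_duplicate_headers(lines):
    """Return headers that occur more than once, in order of first appearance.

    `lines` is any iterable of strings, e.g. an open FASTA file handle.
    Assumption: two records are duplicates when their full header lines
    (the text after '>') are identical.
    """
    # A FASTA header is any line that starts with '>'.
    headers = [line[1:].strip() for line in lines if line.startswith(">")]
    counts = Counter(headers)
    # Counter keeps keys in first-seen order (Python 3.7+).
    return [header for header, n in counts.items() if n > 1]
```

To run it on a file, pass an open handle, for example `with open("sequences.fasta") as fh: print(find_duplicate_headers(fh))`, where `sequences.fasta` is a placeholder file name.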

How to perform graph-based clustering of peptide/protein sequences using MCL?


The Markov Cluster Algorithm (MCL) is an algorithm for clustering networks (graphs) [1]. One of its applications is clustering protein or peptide sequences, for which it is fast and scalable. Previously, we showed protein/peptide sequence clustering using the CD-HIT software.

How to concatenate FASTA sequences using Perl?


Here is a simple Perl script to concatenate multiline FASTA sequences into a single line. A similar Perl script was provided in a previous article (A Perl script to convert multiline FASTA sequences into a single line). That script is useful for FASTA files in which sequences are split into a fixed number of residues per line. The script provided in this article can also be used for large files containing multiple FASTA sequences split into a variable number of residues per line.
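As a rough illustration of the concatenation step, the sketch below joins the lines of each record regardless of how many residues each line holds. It is written in Python rather than Perl and is not the script from the article. The function name `join_fasta_lines` is hypothetical, and skipping blank lines is an assumption of this sketch.

```python
def join_fasta_lines(lines):
    """Yield (header, sequence) pairs with each sequence joined into one line.

    `lines` is any iterable of strings, e.g. an open FASTA file handle.
    Line length inside a record does not matter, so records wrapped at
    different widths are handled the same way.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no sequence data
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # Emit the last record, which has no following header to close it.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, it holds only one record in memory at a time, which suits large multi-FASTA files. Each yielded pair can be written out as the header line followed by the single-line sequence.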

