Perl programming in Bioinformatics

Perl script to find duplicate FASTA sequences using their header?

In a large file of FASTA sequences, it is nearly impossible to perform some operations manually.

This is a simple Perl script that finds duplicate sequences in a multi-FASTA file by their FASTA header: given a header, it prints every record whose header line matches it exactly.

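Let’s say your multi-FASTA file is ‘sequence.fasta’. Pass the file name as the first argument to the script and the header to look for (without the leading ‘>’) as the second.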

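#!/usr/bin/perl
use warnings;
use strict;

# Arguments: the multi-FASTA file and the header text (without the leading '>').
my ($infile, $header) = @ARGV;
die "Usage: $0 <fasta_file> <header>\n" unless defined $infile && defined $header;

my $duplicate;    # true while reading a record whose header matches
open my $input, '<', $infile or die "Cannot open '$infile': $!";
while (<$input>) {
    # On each header line, check whether the text after '>' equals $header.
    $duplicate = $1 eq $header if /^>(.*)/;
    # Print the header and sequence lines of every matching record.
    print if $duplicate;
}

close $input;
exit;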
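The script above needs the header to be known in advance. If you do not yet know which headers are repeated, a minimal sketch along the following lines could list them first. It is only an illustration, not part of the original script: the subroutine names `count_headers` and `duplicate_headers` are hypothetical, and it assumes that two headers count as duplicates only when the full text after ‘>’ is identical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA lines from a filehandle and count
# how many times each header (the text after '>') occurs.
sub count_headers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # drop Unix or Windows line endings
        if ($line =~ /^>(.*)/) {       # header lines start with '>'
            $count{$1}++;
        }
    }
    return \%count;
}

# Hypothetical helper: return the headers seen more than once, sorted.
sub duplicate_headers {
    my ($count) = @_;
    my @dups = grep { $count->{$_} > 1 } keys %$count;
    return sort @dups;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my ($infile) = @ARGV;
    open my $input, '<', $infile or die "Cannot open '$infile': $!";
    my $count = count_headers($input);
    close $input;
    # One line per repeated header: header, a tab, number of occurrences.
    print "$_\t$count->{$_}\n" for duplicate_headers($count);
}
```

Each header reported this way can then be passed as the second argument to the script above to print the matching records.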
Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.

HOW TO CITE THIS ARTICLE Muniba Faiza (2020). Perl script to find duplicate FASTA sequences using their header?. Bioinformatics Review, 6 (06)