Perl script to find duplicate FASTA sequences using their header

Dr. Muniba Faiza

In a large file of FASTA sequences, some operations are nearly impossible to perform manually.

This is a simple Perl script that finds duplicate sequences in a multi-FASTA file by their FASTA header.

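Let’s say your multi-FASTA file is ‘sequence.fasta’. The script takes two command-line arguments, the input file and the header to search for (without the leading ‘>’), and prints every record whose header matches exactly. If a header occurs more than once, all of its copies are printed.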


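#!/usr/bin/perl
use warnings;
use strict;

# Usage: perl script.pl <multi-fasta file> <header without the leading '>'>
my ($infile, $header) = @ARGV;
die "Usage: $0 <multi-fasta file> <header>\n" unless defined $header;

my $duplicate;    # true while we are inside a record whose header matches
open my $input, '<', $infile or die "Cannot open $infile: $!\n";
while (<$input>) {
    # On each header line, check whether it matches the requested header exactly
    $duplicate = ($1 eq $header) if /^>(.*)/;
    # Print the header and sequence lines of every matching record
    print if $duplicate;
}

close $input;
exit;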
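The script only extracts records for a header you already know. To find which headers occur more than once in the first place, one option is to count every header in a hash and report those seen two or more times. The sketch below is a hypothetical companion, not part of the original post: the subroutine name `find_duplicate_headers` and the script name `list_duplicates.pl` are illustrative, and it assumes headers are compared exactly as in the script above (everything after the ‘>’ up to the end of the line).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a multi-FASTA stream and return every header that
# occurs more than once, as [header, count] pairs in order of first appearance.
sub find_duplicate_headers {
    my ($fh) = @_;
    my (%count, @order);
    while (my $line = <$fh>) {
        # Same header test as the original script: text after '>' up to the newline
        next unless $line =~ /^>(.*)/;
        my $header = $1;
        # Remember the order in which each header is first seen
        push @order, $header unless $count{$header}++;
    }
    return map { [ $_, $count{$_} ] } grep { $count{$_} > 1 } @order;
}

# Command-line use (hypothetical name): perl list_duplicates.pl sequence.fasta
# Runs only when a file name is given, so the subroutine can be reused elsewhere.
if (@ARGV) {
    my $infile = $ARGV[0];
    open my $input, '<', $infile or die "Cannot open $infile: $!\n";
    for my $dup (find_duplicate_headers($input)) {
        # Output: header <TAB> number of occurrences
        printf "%s\t%d\n", @$dup;
    }
    close $input;
}
```

Run on ‘sequence.fasta’, this prints each repeated header with its count, tab-separated; any of those headers can then be passed as the second argument to the script above to pull out all of its copies.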
Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.