Perl script to find duplicate FASTA sequences using their header

Last updated: June 29, 2020 4:03 pm


In a large file of FASTA sequences, operations such as checking whether a record appears more than once are impractical to do by hand.

This is a simple Perl script that finds duplicate sequences in a multi-FASTA file by matching a given FASTA header.

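Let’s say your multi-FASTA file is ‘sequence.fasta’. The script takes two arguments: the file name and the header to look for (without the leading ‘>’). It prints every record whose header matches that text exactly, so a header that occurs more than once shows up as repeated records in the output.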

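#!/usr/bin/perl
use strict;
use warnings;

# Arguments: <fasta_file> <header text without the leading '>'>
my ($infile, $header) = @ARGV;
die "Usage: $0 <fasta_file> <header>\n" unless defined $infile && defined $header;

my $duplicate = 0;    # true while reading a record whose header matches
open my $input, '<', $infile or die "Cannot open $infile: $!";
while (my $line = <$input>) {
    # On a header line, compare the text after '>' with the requested header
    if ($line =~ /^>(.*)/) {
        (my $name = $1) =~ s/\s+$//;    # drop trailing whitespace (e.g. CR from Windows files)
        $duplicate = $name eq $header;
    }
    print $line if $duplicate;          # header and sequence lines of matching records
}

close $input;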
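
The script above needs the header to be known in advance. When it is not, a hash can count how often each header occurs and report the ones seen more than once. The following is only a minimal sketch of that variation, not part of the original script; the subroutine name `duplicate_headers` and the tab-separated output are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count each header read from an open filehandle and
# return a hash reference (header => count) for headers seen more than once.
sub duplicate_headers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        next unless $line =~ /^>(.*)/;      # only header lines matter here
        (my $header = $1) =~ s/\s+$//;      # drop trailing whitespace / CR
        $count{$header}++;
    }
    my %dup;
    for my $header (keys %count) {
        $dup{$header} = $count{$header} if $count{$header} > 1;
    }
    return \%dup;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    my ($infile) = @ARGV;
    open my $input, '<', $infile or die "Cannot open $infile: $!";
    my $dup = duplicate_headers($input);
    close $input;
    # One line per duplicated header: header, tab, number of occurrences
    print "$_\t$dup->{$_}\n" for sort keys %$dup;
}
```

Run with the FASTA file as its only argument, this prints each repeated header and its count. Any of those headers can then be passed to the script above to print the matching records. Note that this compares headers only, so two records with the same sequence but different headers are not reported.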


By Dr. Muniba Faiza


Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
