Perl script to find duplicate FASTA sequences using their header

Last updated: June 29, 2020 4:03 pm

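In a large multi-FASTA file, checking for duplicated records by hand is impractical.

This simple Perl script finds duplicate sequences in a multi-FASTA file by their FASTA header.

Let’s say your multi-FASTA file is ‘sequence.fasta’. The script takes two command-line arguments, the input file and the header to look for (without the leading ‘>’), and prints every record whose header matches it exactly. If more than one record is printed, that header is duplicated.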

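#!/usr/bin/perl
use warnings;
use strict;

# Arguments: <fasta_file> <header>, e.g. sequence.fasta and the header text without '>'
my ($infile, $header) = @ARGV;
die "Usage: $0 <fasta_file> <header>\n" unless defined $header;

my $duplicate;    # true while the current record's header equals $header
open my $input, '<', $infile or die "Cannot open $infile: $!";
while (<$input>) {
    # On every header line, compare the text after '>' with the requested header
    $duplicate = $1 eq $header if /^>(.*)/;
    # Print the header line and the sequence lines of each matching record
    print if $duplicate;
}

close $input;
exit;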
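
The script above needs to be told which header to look for. As a complementary sketch (not part of the original script), the following lists every header that occurs more than once in the file, so that each one can then be passed to the script above. The subroutine name `duplicated_headers` is a hypothetical helper introduced here for illustration. The sketch assumes that a record is identified by the full text after ‘>’, and it only compares headers, so records with identical headers but different sequences are still reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original article):
# reads FASTA text from a filehandle and returns a hash reference that
# maps each header seen more than once to its number of occurrences.
sub duplicated_headers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Header lines start with '>'; capture the rest, ignoring a trailing CR
        next unless $line =~ /^>(.*?)\r?$/;
        $count{$1}++;
    }
    # Keep only the headers that appear at least twice
    my %dup;
    for my $h (keys %count) {
        $dup{$h} = $count{$h} if $count{$h} > 1;
    }
    return \%dup;
}

# Run as a script only when a file name is given on the command line
if (@ARGV) {
    my $infile = shift @ARGV;
    open my $input, '<', $infile or die "Cannot open $infile: $!";
    my $dup = duplicated_headers($input);
    close $input;
    # One line per duplicated header: header, tab, occurrence count
    print "$_\t$dup->{$_}\n" for sort keys %$dup;
}
```

Saved under a hypothetical name such as list_duplicates.pl, it could be run as `perl list_duplicates.pl sequence.fasta`. Headers are counted in a hash, so the file is read only once regardless of how many distinct headers it contains.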

By Dr. Muniba Faiza

Dr. Muniba is a Bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
