Perl script to find duplicate FASTA sequences using their header?

Last updated: June 29, 2020 4:03 pm

In a large file of FASTA sequences, it is nearly impossible to perform some operations manually.

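This simple Perl script finds duplicate sequences in a multi-FASTA file by their FASTA header. Given a header, it prints every record whose header line matches it exactly, so a header that occurs more than once in the file yields more than one record in the output.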

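Let’s say your multi-FASTA file is ‘sequence.fasta’. Pass the file name as the first argument and the header to look for (without the leading ‘>’) as the second.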

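#!/usr/bin/perl
use warnings;
use strict;

# Arguments: <fasta_file> <header>
# <header> is the full header line without the leading '>'.
my ($infile, $header) = @ARGV;
die "Usage: $0 <fasta_file> <header>\n"
    unless defined $infile && defined $header;

my $duplicate;    # true while reading a record whose header matches
open my $input, '<', $infile or die "Cannot open '$infile': $!\n";
while (<$input>) {
    # On each header line, test whether it equals the requested header.
    $duplicate = $1 eq $header if /^>(.*)/;
    # Print the header and sequence lines of every matching record.
    print if $duplicate;
}

close $input;
exit;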
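
The script above needs to be told which header to look for. When the duplicated headers are not known in advance, a first pass that counts every header can list them. The following is only a minimal sketch of that idea, not part of the original script: the subroutine name `find_duplicate_headers` is hypothetical, and it assumes that "duplicate" means two header lines that are identical character for character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): count how often each
# header occurs and return only those seen more than once.
# Takes an open filehandle; returns a hash reference of header => count.
sub find_duplicate_headers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;             # strip Unix or Windows line endings
        next unless $line =~ /^>(.*)/;    # only header lines are counted
        $count{$1}++;
    }
    my %dup;
    for my $h (keys %count) {
        $dup{$h} = $count{$h} if $count{$h} > 1;
    }
    return \%dup;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    my ($infile) = @ARGV;
    open my $in, '<', $infile or die "Cannot open '$infile': $!\n";
    my $dup = find_duplicate_headers($in);
    close $in;
    # One line per duplicated header: header, a tab, then its count.
    print "$_\t$dup->{$_}\n" for sort keys %$dup;
}
```

Run on ‘sequence.fasta’, this prints each duplicated header with the number of times it occurs. Any header from that list can then be passed to the script above to print the matching records.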


By Dr. Muniba Faiza

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
