In a large file of FASTA sequences, it is nearly impossible to perform some operations manually.
This simple Perl script finds duplicate sequences in a multi-FASTA file by matching a given FASTA header.
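Suppose your multi-FASTA file is ‘sequence.fasta’. The script takes two arguments, the file name and the header to search for (without the leading ‘>’), and prints every record whose header matches it exactly.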
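#!/usr/bin/perl
use warnings;
use strict;

# Arguments: the FASTA file and the header to look for (without '>').
my ($infile, $header) = @ARGV;
die "Usage: $0 <fasta_file> <header>\n" unless defined $header;

my $duplicate;
open my $input, '<', $infile or die "Cannot open $infile: $!";
while (<$input>) {
    # On each header line, record whether it matches the requested header.
    $duplicate = $1 eq $header if /^>(.*)/;
    # Print the header and its sequence lines while inside a matching record.
    print if $duplicate;
}
close $input;
exit;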
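The script above expects you to already know which header is duplicated. As a complement, here is a minimal sketch that lists the headers occurring more than once, so you know what to pass as the second argument. The helper names `count_headers` and `duplicated_headers` are assumptions made for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count how many
# times each FASTA header occurs in a list of lines.
sub count_headers {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # A header line starts with '>'; capture the rest of the line.
        $count{$1}++ if $line =~ /^>(.*)/;
    }
    return \%count;
}

# Hypothetical helper: return the headers seen more than once, sorted.
sub duplicated_headers {
    my ($count) = @_;
    my @dups = sort(grep { $count->{$_} > 1 } keys %$count);
    return @dups;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    my ($infile) = @ARGV;
    open my $input, '<', $infile or die "Cannot open $infile: $!";
    chomp(my @lines = <$input>);
    close $input;
    my $count = count_headers(@lines);
    # One line per duplicated header: header, tab, number of occurrences.
    print "$_\t$count->{$_}\n" for duplicated_headers($count);
}
```

Saved under a name of your choice (for example `list_duplicates.pl`, a placeholder), it could be run as `perl list_duplicates.pl sequence.fasta`. Each header it reports can then be passed to the script above to print the matching records. Like the original, it compares headers only, so two records with different headers but identical sequences are not detected.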