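Here is a simple Perl script to search for motif patterns in a large FASTA file containing multiple sequences (a multi-FASTA file).
Suppose your multi-FASTA file, in which you want to search for the motif pattern, is named “input.fa”. The script writes every record (header and sequence) that contains the motif to “output.fa”.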
Case-I: Search for a pre-defined motif pattern.
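use strict;
use warnings;

# Motif to search for, interpreted as a Perl regular expression;
# replace "motif_pattern" with your own (e.g. "GA[AT]TC")
my $regex = "motif_pattern";

# Read all sequences into a hash: header line => sequence
my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Write every matching record (in sorted header order) to output.fa
open( my $out, '>', 'output.fa' ) or die "Cannot open output.fa: $!";
foreach my $header ( sort keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $out $header, "\n";
        print $out $sequences{$header}, "\n";
    }
}
close $out or die "Cannot close output.fa: $!";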
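# Read a multi-FASTA file and return a reference to a hash that maps
# each header line (including the leading '>') to its full sequence.
sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;                     # start of a new record
        }
        elsif ( $line !~ /^\s*$/ ) {                  # skip blank lines
            $sequences{$current_header} .= $line;     # join wrapped sequence lines
        }
    }
    close $fh or die "Cannot close $filename: $!";
    return \%sequences;
}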
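To make it clearer how read_fasta_file turns a multi-FASTA file into a hash, here is a minimal sketch that runs the same parsing loop on a small, hypothetical FASTA text held in memory (Perl can open a reference to a scalar as a filehandle), so no input.fa is needed. The record names and sequences are invented for illustration, and GAATTC (the EcoRI recognition site) is only an example motif.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text; the second sequence is wrapped
# over two lines, and the example motif GAATTC spans the line break.
my $fasta = <<'END';
>seq1 example
ACGTACGTAC
>seq2 example
TTACGGAA
TTCCGT
END

# Same parsing logic as read_fasta_file, but reading from an
# in-memory filehandle instead of a file on disk.
my %sequences;
my $current_header = '';
open( my $fh, '<', \$fasta ) or die "Cannot open in-memory FASTA: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^(>.*)$/ ) {
        $current_header = $1;                    # start of a new record
    }
    elsif ( $line !~ /^\s*$/ ) {                 # skip blank lines
        $sequences{$current_header} .= $line;    # join wrapped lines
    }
}
close $fh;

# Each record is now a single string, keyed by its header line
print "$_ => $sequences{$_}\n" for sort keys %sequences;

# Search the joined sequences for the example motif
my @hits = grep { $sequences{$_} =~ /GAATTC/ } sort keys %sequences;
print "Records containing GAATTC: @hits\n";
```

This prints ">seq1 example => ACGTACGTAC", ">seq2 example => TTACGGAATTCCGT" and "Records containing GAATTC: >seq2 example". Neither "TTACGGAA" nor "TTCCGT" contains the motif on its own; it is found only because the wrapped lines are concatenated before matching, which is why the scripts search whole sequences rather than individual lines.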
Case-II: Search for a motif pattern entered by the user.
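use strict;
use warnings;

$| = 1;    # flush STDOUT so the prompt appears before input is read

# Ask for the motif; it is interpreted as a Perl regular expression
print "Enter a motif pattern to search for: ";
my $regex = <STDIN>;
chomp $regex if defined $regex;    # drop the trailing newline
die "No motif pattern entered\n" unless defined $regex && length $regex;

# Read all sequences into a hash: header line => sequence
my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Write every matching record (in sorted header order) to output.fa
open( my $out, '>', 'output.fa' ) or die "Cannot open output.fa: $!";
foreach my $header ( sort keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $out $header, "\n";
        print $out $sequences{$header}, "\n";
    }
}
close $out or die "Cannot close output.fa: $!";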
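# Read a multi-FASTA file and return a reference to a hash that maps
# each header line (including the leading '>') to its full sequence.
sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;                     # start of a new record
        }
        elsif ( $line !~ /^\s*$/ ) {                  # skip blank lines
            $sequences{$current_header} .= $line;     # join wrapped sequence lines
        }
    }
    close $fh or die "Cannot close $filename: $!";
    return \%sequences;
}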
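A motif typed at a prompt and read with Perl's <STDIN> operator arrives with its trailing newline, and it is then interpolated into the match as a regular expression rather than a plain string. The sketch below uses a single hypothetical sequence (lowercase letters stand in for soft-masked bases, as found in some genome FASTA files) to show three consequences: an unchomped motif never matches, regex syntax such as [AT] acts as a wildcard unless it is quoted with \Q...\E, and matching is case-sensitive unless /i is added.

```perl
use strict;
use warnings;

# Hypothetical sequence; the lowercase tail mimics soft-masked bases
my $sequence = 'acgtGAATTCttga';

# A line read from STDIN keeps its "\n", so the pattern requires a
# newline that a joined sequence string never contains.
my $typed     = "GAATTC\n";
my $unchomped = $sequence =~ /$typed/ ? 1 : 0;

chomp( my $motif = $typed );
my $chomped = $sequence =~ /$motif/ ? 1 : 0;

# Regex syntax in the motif is live: [AT] means "A or T".
# \Q...\E quotes it so the text must appear literally instead.
my $class   = $sequence =~ /GA[AT]TTC/ ? 1 : 0;
my $literal = $sequence =~ /\QGA[AT]TTC\E/ ? 1 : 0;

# Matching is case-sensitive by default; /i also matches lowercase bases
my $upper  = $sequence =~ /TTGA/ ? 1 : 0;
my $nocase = $sequence =~ /TTGA/i ? 1 : 0;

print "unchomped=$unchomped chomped=$chomped class=$class ",
      "literal=$literal upper=$upper nocase=$nocase\n";
```

This prints "unchomped=0 chomped=1 class=1 literal=0 upper=0 nocase=1". If you want users to search for exact sequences only, or to ignore case, adding \Q...\E or /i to the match in the script is one possible adjustment.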
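Save either script with a .pl extension (for example, script.pl) and run it with perl script.pl in a terminal (Linux) or at the Command Prompt (Windows). The script reads input.fa from the current directory and writes the matching records to output.fa.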
