Here is a simple Perl script to search for motif patterns in a large FASTA file with multiple sequences.
Suppose your multi-FASTA file is “input.fa” and you want to search it for motif patterns.
Case-I: Search for a pre-defined motif pattern.
use strict;
use warnings;

# Regular expression describing the motif; replace it with your own pattern.
my $regex = "motif_pattern";

# Load all records as: header line => concatenated sequence.
my %sequences = %{ read_fasta_file('input.fa') };

# Redirect STDOUT so that every print below is written to output.fa.
open( STDOUT, ">", "output.fa" ) or die $!;

# keys returns the headers in no particular order.
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename       = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, "<", $filename ) or die $!;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die $!;
    return \%sequences;
}
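To see what a concrete motif pattern might look like, here is a minimal sketch that works on a small in-memory hash instead of a file. The headers and sequences are made up for illustration, the helper motif_positions is a hypothetical name that is not part of the script above, and the pattern N[^P][ST][^P] is only an assumed example motif. Besides testing for a match, the sketch also reports where each match starts.

```perl
use strict;
use warnings;

# Hypothetical example data; headers and sequences are made up for illustration.
my %sequences = (
    '>seq1' => 'MKNGSAPLLV',
    '>seq2' => 'MKTAYIAKQR',
);

# Assumed example motif: N, any residue except P, S or T, any residue except P.
my $regex = qr/N[^P][ST][^P]/;

# Return the 1-based start positions of the non-overlapping matches of $re in $seq.
sub motif_positions {
    my ( $seq, $re ) = @_;
    my @starts;
    while ( $seq =~ /$re/g ) {
        push @starts, $-[0] + 1;    # $-[0] is the 0-based offset of the current match
    }
    return @starts;
}

# Print the header and match positions of every sequence that contains the motif.
foreach my $header ( sort keys %sequences ) {
    my @starts = motif_positions( $sequences{$header}, $regex );
    print "$header\t@starts\n" if @starts;
}
```

With the example data only >seq1 contains the motif, starting at position 3. Because the loop uses the /g modifier, overlapping occurrences are not reported.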
Case-II: Search for a user-supplied motif pattern.
use strict;
use warnings;

# Ask for the motif before STDOUT is redirected, so the prompt is shown on screen.
print "Enter a motif pattern to search: ";
my $regex = <STDIN>;
chomp $regex;    # remove the trailing newline typed by the user

# Load all records as: header line => concatenated sequence.
my %sequences = %{ read_fasta_file('input.fa') };

# Redirect STDOUT so that every print below is written to output.fa.
open( STDOUT, ">", "output.fa" ) or die $!;

# keys returns the headers in no particular order.
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename       = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, "<", $filename ) or die $!;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die $!;
    return \%sequences;
}
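A pattern typed by the user is interpreted as a regular expression, so a typo such as an unclosed bracket makes the match die at run time. One possible way to guard against this is sketched below. The helper compile_motif is a hypothetical name that is not part of the script above, and the two candidate patterns are made-up examples.

```perl
use strict;
use warnings;

# Compile a user-supplied pattern.
# Returns the compiled regex, or undef if the input is empty or not a valid regex.
sub compile_motif {
    my ($input) = @_;
    return unless defined $input;
    chomp $input;
    return if $input eq '';
    my $re = eval { qr/$input/ };    # eval traps the fatal error raised by an invalid pattern
    return $re;
}

# Try one valid and one invalid (unclosed bracket) example pattern.
for my $candidate ( "ATG[AC]TT\n", "ATG[AC\n" ) {
    my $re = compile_motif($candidate);
    chomp( my $shown = $candidate );
    print defined $re ? "ok: $shown\n" : "invalid pattern: $shown\n";
}
```

This prints "ok: ATG[AC]TT" followed by "invalid pattern: ATG[AC". If the input should be matched literally rather than as a regular expression, Perl's quotemeta function (or \Q...\E inside the pattern) escapes the metacharacters.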
Save the script with a .pl extension and run it as perl script.pl
in a terminal (Linux) or in the Command Prompt (Windows). The matching records are written to output.fa.