How to search motif pattern in FASTA sequences using Perl hash?

//
2 mins read

Here is a simple Perl script to search for motif patterns in a large FASTA file with multiple sequences.

Suppose, your multifasta file is “input.fa”, in which you want to search for the motif patterns.

Case-I Search a pre-defined motif pattern.

use strict 'vars';
use warnings;

my $regex = "motif_pattern";

my %sequences = %{ read_fasta_file( 'input.fa' ) };
open( STDOUT, ">", "output.fa" ) or die!$;
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
		print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open FILE, "$filename" or die $!;
	    while ( my $line = <FILE>) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header  = $1;
        }

        elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close FILE or die $!;

    return \%sequences;
}

Case-II Search a user input motif pattern.

use strict 'vars';
use warnings;

print "Enter a motif pattern to search";
my $regex = ;

my %sequences = %{ read_fasta_file( 'input.fa' ) };
open( STDOUT, ">", "output.fa" ) or die!$;
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
		print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open FILE, "$filename" or die $!;
	    while ( my $line = <FILE> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header  = $1;
        }

        elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close FILE or die $!;

    return \%sequences;
}

Save this script with .pl extension and run as perl script.pl in terminal (in Linux) or in command prompt (in Windows).

Muniba is a Bioinformatician based in the South China University of Technology. She has cutting edge knowledge of bioinformatics tools, algorithms, and drug designing. When she is not reading she is found enjoying with the family. Know more about Muniba

Leave a Reply

Previous Story

Should predatory journals be eliminated completely from the research community?

Next Story

How to cluster peptide/protein sequences using cd-hit software?

Latest from Algorithms

miRBase: Explained

Micro RNAs (miRNAs) are the short endogenous RNAs (~22 nucleotides) and originate from the non-coding RNAs

0 $0.00