How to search motif pattern in FASTA sequences using Perl hash?


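Here is a simple Perl script to search for motif patterns in a large FASTA file containing multiple sequences. It loads every sequence into a hash keyed by its header line, then tests each sequence against the motif.

Suppose your multi-FASTA file, in which you want to search for the motif pattern, is named “input.fa”.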
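Both scripts below first load the whole file into a hash. Each header line (including the leading >) becomes a key, and the sequence lines that follow it are joined into one string as the value. The minimal sketch below shows that layout on a small, made-up two-record FASTA held in a string instead of input.fa (the record names and sequences are hypothetical). It reads the string through an in-memory filehandle, so you can try the parsing logic without creating a file:

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text; the second sequence spans two lines.
my $fasta_text = <<'END';
>seq1 example protein
MKTAYIAKQR
>seq2 another protein
MNGTEGPNFY
VPFSNKTGVV
END

# Read the string through an in-memory filehandle (Perl 5.8+).
open my $fh, '<', \$fasta_text or die "Cannot open in-memory file: $!";

my %sequences;
my $current_header = '';
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^(>.*)$/ ) {
        $current_header = $1;                  # start a new record
    }
    elsif ( $line !~ /^\s*$/ ) {
        $sequences{$current_header} .= $line;  # append this sequence line
    }
}
close $fh;

# Print each header with its joined sequence, sorted for stable output.
for my $header ( sort keys %sequences ) {
    print "$header => $sequences{$header}\n";
}
```

This prints ">seq2 another protein => MNGTEGPNFYVPFSNKTGVV", which shows that the two sequence lines were joined under one key. Because hash keys are unique, two records with identical header lines would be merged into one concatenated sequence.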

Case-I: Search for a predefined motif pattern.

Replace motif_pattern in the script with your motif, written as a Perl regular expression.

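use strict;
use warnings;

# Motif to search for, written as a Perl regular expression
my $regex = "motif_pattern";

# Load all sequences: header line (with ">") => full sequence
my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Write every sequence that contains the motif to output.fa
open( my $out, '>', 'output.fa' ) or die "Cannot write output.fa: $!";
foreach my $header ( sort keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print {$out} $header, "\n";
        print {$out} $sequences{$header}, "\n";
    }
}
close $out or die "Cannot close output.fa: $!";

# Read a (multi-)FASTA file into a hash keyed by header line.
# Sequence lines following a header are joined into one string.
sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;    # tolerate Windows (CRLF) line endings
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die "Cannot close $filename: $!";

    return \%sequences;
}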
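Case-I only reports whether a sequence contains the motif. If you also need to know where the motif occurs, one option is to scan each sequence with a global match and read the start offset of each match from the @- array. The minimal sketch below does this with a hypothetical helper, find_motif_positions, applied to made-up sequences. The example motif is the N-glycosylation sequon N-X-S/T (X not P). The i flag is an added assumption so that lowercase sequences also match:

```perl
use strict;
use warnings;

# Return the 1-based start position of every non-overlapping match
# of $regex in $sequence. Hypothetical helper, not part of the script above.
sub find_motif_positions {
    my ( $sequence, $regex ) = @_;
    my @positions;
    while ( $sequence =~ /$regex/gi ) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

# Hypothetical data in the same shape as the hash built by read_fasta_file.
my %sequences = (
    '>seq1' => 'MNGTEGPNFYVPFSNKTGVV',
    '>seq2' => 'MKTAYIAKQR',
);

# N, then any residue except P, then S or T.
my $regex = 'N[^P][ST]';

for my $header ( sort keys %sequences ) {
    my @hits = find_motif_positions( $sequences{$header}, $regex );
    next unless @hits;    # skip sequences without the motif
    print "$header\t", join( ',', @hits ), "\n";
}
```

With these sample sequences, only >seq1 is printed, with matches starting at positions 2 and 15. Because //g resumes after each match, overlapping occurrences are not reported. A lookahead such as (?=N[^P][ST]) would be needed for those.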

Case-II: Search for a motif pattern entered by the user.

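use strict;
use warnings;

# Ask for the motif (a Perl regular expression) on the terminal
$| = 1;    # show the prompt immediately
print "Enter a motif pattern to search: ";
my $regex = <STDIN>;
chomp $regex if defined $regex;
# An empty pattern would silently reuse the last successful match, so reject it
die "No motif pattern given\n" unless defined $regex && length $regex;

my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Write every sequence that contains the motif to output.fa
open( my $out, '>', 'output.fa' ) or die "Cannot write output.fa: $!";
foreach my $header ( sort keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print {$out} $header, "\n";
        print {$out} $sequences{$header}, "\n";
    }
}
close $out or die "Cannot close output.fa: $!";

# Read a (multi-)FASTA file into a hash keyed by header line.
# Sequence lines following a header are joined into one string.
sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;    # tolerate Windows (CRLF) line endings
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die "Cannot close $filename: $!";

    return \%sequences;
}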
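Because Case-II passes whatever the user types straight into a regex, a typo such as an unmatched [ stops the script with a regex compilation error. One way to guard against that is sketched below with a hypothetical helper, compile_motif. It compiles the input with qr// inside eval. Optionally, it escapes the input with quotemeta so that the motif is matched as plain text rather than as a pattern:

```perl
use strict;
use warnings;

# Compile a user-supplied motif. With $literal true, regex metacharacters
# are escaped with quotemeta so the motif is matched as plain text.
# Returns a compiled regex, or undef (with a warning) if the input is unusable.
sub compile_motif {
    my ( $input, $literal ) = @_;
    return unless defined $input;
    $input =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and newline
    return if $input eq '';
    my $pattern = $literal ? quotemeta($input) : $input;
    my $re = eval { qr/$pattern/ };
    if ( !defined $re ) {
        warn "Invalid motif pattern '$input': $@";
        return;
    }
    return $re;
}

my $ok      = compile_motif('N[^P][ST]');    # valid regex
my $bad     = compile_motif('N[^P');         # unmatched [ -> undef
my $literal = compile_motif( 'A.T', 1 );     # matches the text "A.T" only

print defined $ok  ? "ok compiled\n"  : "ok failed\n";
print defined $bad ? "bad compiled\n" : "bad rejected\n";
print 'AGT' =~ $literal ? "literal matched AGT\n" : "literal did not match AGT\n";
```

In Case-II, a call such as my $regex = compile_motif( scalar <STDIN> ) or die "No valid motif\n"; would replace the read-and-chomp lines. The scalar keyword matters here: without it, <STDIN> in the argument list would read every remaining input line.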

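Save either script with a .pl extension (for example, script.pl) and run it as perl script.pl in a terminal (Linux) or at the Command Prompt (Windows). Sequences that contain the motif are written, with their headers, to output.fa in the current working directory.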

Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.


How to cite this article: Muniba Faiza (2019). How to search motif pattern in FASTA sequences using Perl hash? Bioinformatics Review, 5(09).