
How to search for a motif pattern in FASTA sequences using a Perl hash


Here is a simple Perl script to search for motif patterns in a large FASTA file with multiple sequences.

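Suppose your multi-FASTA file is “input.fa” and you want to search it for motif patterns. The scripts below read every record into a hash keyed by its header line, test each sequence against the pattern, and write the matching records to “output.fa”.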
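To make the data structure concrete, here is a minimal sketch of how multi-FASTA text maps onto a Perl hash: each header line becomes a key, and the sequence lines that follow it are joined into the value. The two records and the helper name parse_fasta_text are made up for illustration, and the text is read from an in-memory string rather than from input.fa.

```perl
use strict;
use warnings;

# Hypothetical two-record multi-FASTA text, used only for illustration.
my $fasta_text = <<'END_FASTA';
>seq1 example record
ATGCGTAC
GGTTAACC

>seq2 another record
TTTTGGGG
END_FASTA

# parse_fasta_text is an illustrative helper, not part of the scripts in this post.
sub parse_fasta_text {
    my ($text) = @_;
    my %sequences;
    my $current_header = '';

    # Open the string as an in-memory file so it can be read line by line.
    open( my $fh, '<', \$text ) or die "Cannot read in-memory FASTA: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_header = $1;                    # start of a new record
        }
        elsif ( $line !~ /^\s*$/ ) {                 # skip blank lines
            $sequences{$current_header} .= $line;    # join wrapped sequence lines
        }
    }
    close $fh;
    return \%sequences;
}

my $seqs = parse_fasta_text($fasta_text);
foreach my $header ( sort keys %$seqs ) {
    print "$header => $seqs->{$header}\n";
}
```

Because a hash has no inherent order, the sketch sorts the keys before printing; records come out in header order, not necessarily in file order.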

Case I: Search for a predefined motif pattern.

use strict 'vars';
use warnings;

my $regex = "motif_pattern";    # replace with your motif; any Perl regular expression works

my %sequences = %{ read_fasta_file( 'input.fa' ) };
# From here on, everything printed to STDOUT goes to output.fa.
open( STDOUT, ">", "output.fa" ) or die "Cannot open output.fa: $!";
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
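    # Three-argument open with a lexical filehandle is safer than a bareword handle.
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {    # header line: start a new record
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;    # join wrapped sequence lines
        }
    }
    close $fh or die $!;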

    return \%sequences;
}
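The script above only reports whether a sequence contains the motif. If the positions of the hits matter as well, a global match in a loop can collect them. The following is a minimal sketch, not part of the original script: find_motif_positions is a hypothetical helper, the two sequences are made up, and GAATTC is used as a sample motif.

```perl
use strict;
use warnings;

# find_motif_positions is a hypothetical helper: it returns the 1-based start
# position of every non-overlapping match of $regex in $sequence.
sub find_motif_positions {
    my ( $sequence, $regex ) = @_;
    my @positions;
    while ( $sequence =~ /$regex/g ) {
        # $-[0] holds the 0-based offset where the last successful match began.
        push @positions, $-[0] + 1;
    }
    return @positions;
}

# Made-up sequences standing in for the hash returned by read_fasta_file.
my %sequences = (
    '>seq1' => 'AAGAATTCGGGAATTCTT',
    '>seq2' => 'CCCCGGGGTTTT',
);

my $regex = 'GAATTC';

foreach my $header ( sort keys %sequences ) {
    my @hits = find_motif_positions( $sequences{$header}, $regex );
    next unless @hits;    # no match in this sequence
    print "$header\tmatches at: @hits\n";
}
```

With the sample data, only >seq1 is reported, with matches starting at positions 3 and 11. Overlapping occurrences are not counted, because each search resumes after the end of the previous match.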

Case II: Search for a user-supplied motif pattern.

use strict 'vars';
use warnings;

print "Enter a motif pattern to search: ";
my $regex = <STDIN>;
chomp $regex;    # remove the trailing newline so it is not part of the pattern

my %sequences = %{ read_fasta_file( 'input.fa' ) };
# From here on, everything printed to STDOUT goes to output.fa.
open( STDOUT, ">", "output.fa" ) or die "Cannot open output.fa: $!";
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
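    # Three-argument open with a lexical filehandle is safer than a bareword handle.
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {    # header line: start a new record
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;    # join wrapped sequence lines
        }
    }
    close $fh or die $!;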

    return \%sequences;
}
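A pattern typed in by a user may be empty or may not be a valid regular expression, in which case the match in the loop would die partway through the run. One possible safeguard, sketched here under the assumption that case-insensitive matching is wanted, is to compile the input once with qr// inside an eval and reject it if that fails. The helper name compile_motif and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# compile_motif is a hypothetical helper: it trims the input, compiles it as a
# case-insensitive regex, and returns undef if the pattern is empty or invalid.
sub compile_motif {
    my ($input) = @_;
    return unless defined $input;
    $input =~ s/^\s+|\s+$//g;    # drop the trailing newline and stray spaces
    return if $input eq '';
    # qr// dies on a malformed pattern; eval turns that into an undef result.
    my $compiled = eval { qr/$input/i };
    return $compiled;
}

# Sample inputs: a line as read from STDIN, a character-class motif, a broken pattern.
foreach my $candidate ( "gaattc\n", 'TATA[AT]A', 'A[CG' ) {
    my $regex = compile_motif($candidate);
    if ( defined $regex ) {
        print "usable pattern: $regex\n";
    }
    else {
        print "rejected pattern\n";
    }
}
```

In the script above, the compiled value could be used in place of $regex in the match, for example $sequences{$header} =~ $regex, after checking that it is defined.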

Save the script with a .pl extension and run it with the command perl script.pl in a terminal (Linux) or in the command prompt (Windows). The matching records are written to output.fa in the current directory.


Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
