Algorithms

How to search for a motif pattern in FASTA sequences using a Perl hash?

Dr. Muniba Faiza

Here is a simple Perl script to search for motif patterns in a large FASTA file with multiple sequences.

Suppose your multi-FASTA file is “input.fa” and you want to search its sequences for a motif pattern.

Case-I: Search for a pre-defined motif pattern.

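use strict;
use warnings;

# Motif to search for, written as a Perl regular expression
my $regex = "motif_pattern";

# Read all records into a hash: header line => concatenated sequence
my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Redirect STDOUT so that matching records are written to output.fa
open( STDOUT, ">", "output.fa" ) or die $!;
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}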

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {     # a header line starts a new record
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die $!;

    return \%sequences;
}
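The placeholder "motif_pattern" can be any Perl regular expression. As a minimal sketch, the snippet below writes the N-glycosylation sequon (N, then any residue except P, then S or T) as a regex and also reports where each match starts. The sequences and the helper name motif_positions are hypothetical and used only for illustration; in the script above the hash would come from read_fasta_file().

```perl
use strict;
use warnings;

# Hypothetical example data; the real script fills this hash from input.fa.
my %sequences = (
    '>seq1' => 'MKNGSAPLNPTW',
    '>seq2' => 'MKAAAPLLLW',
);

# N-glycosylation sequon: N, any residue except P, then S or T.
my $regex = qr/N[^P][ST]/;

# Return the 1-based start position of every non-overlapping match of $re in $seq.
sub motif_positions {
    my ( $seq, $re ) = @_;
    my @positions;
    while ( $seq =~ /$re/g ) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based offset of the match start
    }
    return @positions;
}

# Sort the headers so the report order is stable between runs.
foreach my $header ( sort keys %sequences ) {
    my @hits = motif_positions( $sequences{$header}, $regex );
    print "$header\t@hits\n" if @hits;
}
```

Note that a match loop with /g does not report overlapping occurrences, and that keys returns headers in no particular order, which is why the sketch sorts them.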

Case-II: Search for a user-supplied motif pattern.

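use strict;
use warnings;

print "Enter a motif pattern to search: ";
my $regex = <STDIN>;    # read the pattern typed by the user
chomp $regex;           # remove the trailing newline so the pattern can match

# Read all records into a hash: header line => concatenated sequence
my %sequences = %{ read_fasta_file( 'input.fa' ) };

# Redirect STDOUT so that matching records are written to output.fa
open( STDOUT, ">", "output.fa" ) or die $!;
foreach my $header ( keys %sequences ) {
    if ( $sequences{$header} =~ /$regex/ ) {
        print $header, "\n";
        print $sequences{$header}, "\n";
    }
}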

sub read_fasta_file {
    my $filename = shift;
    my $current_header = '';
    my %sequences;
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {     # a header line starts a new record
            $current_header = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $sequences{$current_header} .= $line;
        }
    }
    close $fh or die $!;

    return \%sequences;
}
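Because the pattern in Case-II is typed by the user, it may not be a valid regular expression, and a malformed pattern would make the script die at the first match attempt. One possible safeguard, sketched below, is to compile the pattern inside eval before using it. The helper name compile_motif is hypothetical and is not part of the script above.

```perl
use strict;
use warnings;

# Compile a user-supplied pattern.
# Returns a compiled regex, or undef if the pattern is empty or malformed.
sub compile_motif {
    my ($pattern) = @_;
    return unless defined $pattern && length $pattern;
    my $re = eval { qr/$pattern/ };    # qr// dies on a malformed pattern; eval traps it
    return $re;                        # undef when compilation failed
}

my $input = 'N[^P][ST';                # malformed on purpose: unclosed character class
my $re    = compile_motif($input);
print defined $re ? "pattern OK\n" : "invalid pattern: $input\n";
```

In the Case-II script, the value read from STDIN could be passed through such a check before the sequences are searched, so that the user gets a clear message instead of a regex compilation error.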

Save the script with a .pl extension and run it as perl script.pl in a terminal (Linux) or at the command prompt (Windows). The headers and sequences that contain the motif are written to output.fa.

Dr. Muniba is a Bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
