
How to read FASTA sequences as a hash using Perl?


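This is a simple Perl script that reads a multi-FASTA file into a hash. Each header line (including the leading ">") becomes a key, and the sequence that follows it becomes the value.

Suppose your multi-FASTA file is “input.fasta” and you want to read it into a hash.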

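#!/usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %sequences;     # header line => concatenated sequence
my $id = '';       # header of the record currently being read

open( my $fh, '<', $infile ) or die "Cannot open $infile: $!";

while ( my $line = <$fh> ) {
      chomp $line;
      if ( $line =~ /^(>.*)$/ ) {
           $id = $1;                     # new record: remember its header
      }
      elsif ( $line !~ /^\s*$/ ) {
           $sequences{$id} .= $line;     # append sequence line, skipping blank lines
      }
}

close($fh);
exit;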
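
Once the hash is filled, each header line maps to its full sequence. As a minimal sketch of how it might be used, the snippet below runs the same kind of loop over a small in-memory FASTA string and reports the length of each sequence. The `$fasta` data is made up so the snippet runs without input.fasta, and `@report` is an illustrative name that is not part of the script above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up two-record FASTA text standing in for input.fasta (assumption).
my $fasta = ">seq1\nATGC\nGGTA\n\n>seq2\nTTAA\n";

my %sequences;
my $id = '';

# Perl can open a filehandle on a reference to a string (an in-memory file).
open( my $fh, '<', \$fasta ) or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^(>.*)$/ ) {
        $id = $1;
    }
    elsif ( $line !~ /^\s*$/ ) {
        $sequences{$id} .= $line;
    }
}
close($fh);

# One report line per record, sorted by header for a stable order.
my @report;
for my $header ( sort keys %sequences ) {
    push @report, sprintf( "%s\t%d", $header, length $sequences{$header} );
}
print "$_\n" for @report;
```

Sorting the keys matters only for display: Perl hashes have no guaranteed order, so records do not come back in file order unless you also keep the headers in an array.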

If you want to put the FASTA-reading code in a subroutine, you can do it like this:

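#!/usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %seqs = %{ read_fasta_as_hash($infile) };   # call the subroutine and dereference the returned hash reference
          # your code goes here

# Reads a FASTA file and returns a reference to a hash of header => sequence.
sub read_fasta_as_hash {
  my $inputfile = shift;
  my $id = '';
  my %sequences;
  open( my $in, '<', $inputfile ) or die "Cannot open $inputfile: $!";

  while ( my $line = <$in> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $id = $1;                     # new record: remember its header
        }
        elsif ( $line !~ /^\s*$/ ) {
             $sequences{$id} .= $line;    # append sequence line, skipping blank lines
        }
   }
   close($in);
   return \%sequences;                    # a reference, to match the %{ ... } in the caller
}

exit;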
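
FASTA headers often carry a description after the sequence ID, which makes the whole header line an awkward hash key. The sketch below is one possible variant, not the subroutine above: `read_fasta_by_id` is a hypothetical name, and the choices to use the first word after ">" as the key and to stop on duplicate IDs are assumptions for illustration. The example data is made up and is read through an in-memory filehandle so the snippet runs on its own.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical variant of read_fasta_as_hash: keys are bare IDs
# (the first word of the header, without '>'), not whole header lines.
sub read_fasta_by_id {
    my $source = shift;    # file name, or reference to a string holding FASTA text
    my %sequences;
    my $id;

    open( my $in, '<', $source ) or die "Cannot open FASTA source: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        if ( $line =~ /^>\s*(\S+)/ ) {
            $id = $1;
            die "Duplicate sequence ID: $id\n" if exists $sequences{$id};
            $sequences{$id} = '';
        }
        elsif ( $line =~ /^>/ ) {
            die "Header without an ID at line $.\n";
        }
        elsif ( defined $id && $line !~ /^\s*$/ ) {
            $sequences{$id} .= $line;    # sequence lines before any header are ignored
        }
    }
    close($in);
    return \%sequences;
}

# Made-up example data (assumption), passed as a string reference.
my $fasta = ">gene1 first example record\nATGC\nGGTA\n>gene2\nTTAA\n";
my $seqs  = read_fasta_by_id( \$fasta );

# Work through the reference directly instead of copying the hash.
my $gene1_length = length $seqs->{gene1};
my @ids          = sort keys %$seqs;
print "IDs: @ids\n";
print "gene1 is $gene1_length bases long\n";
```

Keeping the result as a reference (`$seqs->{gene1}`) avoids copying every sequence into a second hash, which can matter for large files.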

Tariq's areas of expertise include algorithm design, phylogenetics, microarrays, plant systematics, and genome data analysis. He is looking for a suitable PhD position in bioinformatics.
