How to read FASTA sequences as a hash using Perl?


This is a simple Perl script that reads a multi-FASTA file into a hash, using each header line as a key and the corresponding sequence as its value.

Suppose your multi-FASTA file is named "input.fasta" and you want to read it into a hash.

#! /usr/bin/perl
use warnings;
use strict;
my $infile = "input.fasta";
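my %sequences;
my $id;    # declared outside the loop so it persists from one line to the next

open( my $fh, '<', $infile ) or die "Cannot open $infile: $!";

while ( my $line = <$fh> ) {
      chomp $line;
      if ( $line =~ /^(>.*)$/ ) {
           $id = $1;    # header line (including '>') becomes the hash key
      }
      elsif ( defined $id and $line !~ /^\s*$/ ) {
           $sequences{$id} .= $line;    # append sequence lines, skipping blank ones
      }
}

close($fh);
exit;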
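
Once the sequences are in the hash, they can be looked up by header or looped over. The following is only a minimal sketch of that idea: it parses a small made-up set of records from an in-memory string (so it runs without "input.fasta") and prints each header with its sequence length. The sample data and the helper name seq_lengths are assumptions for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample data, standing in for the contents of input.fasta.
my $fasta = ">seq1 first example\nATGC\nGGTA\n\n>seq2\nTTAACC\n";

# Open the string as an in-memory file so the same reading loop applies.
open( my $fh, '<', \$fasta ) or die "Cannot open in-memory file: $!";

my %sequences;
my $id;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^(>.*)$/ ) {
        $id = $1;    # header line, '>' included, becomes the key
    }
    elsif ( defined $id and $line !~ /^\s*$/ ) {
        $sequences{$id} .= $line;    # join multi-line sequences
    }
}
close($fh);

# seq_lengths is a hypothetical helper: it takes a hash reference of
# header => sequence and returns a hash of header => sequence length.
sub seq_lengths {
    my ($seqs) = @_;
    my %lengths;
    for my $header ( keys %{$seqs} ) {
        $lengths{$header} = length( $seqs->{$header} );
    }
    return %lengths;
}

my %lengths = seq_lengths( \%sequences );
for my $header ( sort keys %lengths ) {
    print "$header\t$lengths{$header}\n";
}
```

Note that the keys keep the leading ">" and the full description, because the whole header line is captured. If shorter identifiers are preferred, the capture in the regular expression would need to be adjusted accordingly.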

If you would rather wrap the FASTA reading in a subroutine, you can do it like this:

#! /usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %seqs = %{ read_fasta_as_hash($infile) };    # call the subroutine; it returns a hash reference
          # your code goes here

exit;

sub read_fasta_as_hash {
  my $inputfile = shift;
  my $id = '';
  my %sequences;
  open( my $in, '<', $inputfile ) or die "Cannot open $inputfile: $!";

  while ( my $line = <$in> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $id = $1;    # no 'my' here, or the outer $id would be shadowed
        }
        elsif ( $line !~ /^\s*$/ ) {
             $sequences{$id} .= $line;
        }
   }
   close($in);
   return \%sequences;    # return a reference, since the caller dereferences it with %{ }
}

Tariq is founder of Bioinformatics Review and a professional Software Developer at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
