
How to read FASTA sequences as a hash using Perl?

Last updated: May 20, 2020 5:48 pm

This is a simple Perl script that reads a multi-FASTA file into a hash, using each header line as the key and the corresponding sequence as the value.

Suppose your multi-FASTA file is named "input.fasta" and you want to read it into a hash.

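#!/usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %sequences;
my $id = '';    # current header; declared outside the loop so it persists between lines

open( FH, '<', $infile ) or die "Cannot open $infile: $!";

while( my $line = <FH> ){
      chomp $line;
      if ( $line =~ /^(>.*)$/ ){
           $id = $1;                    # header line (including '>') becomes the current key
      }
      elsif ( $line !~ /^\s*$/ ){
           $sequences{$id} .= $line;    # append each non-blank sequence line to the current record
      }
}

close (FH);
exit;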
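
Once the hash is filled, each sequence can be looked up by its header. The following is only a minimal sketch of one way to use it: the helper name sequence_lengths and the two records in %sequences are made up for illustration and stand in for the hash built by the script above, where the keys are full header lines including the '>'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes a reference
# to a hash of header => sequence and returns a hash of header => length.
sub sequence_lengths {
    my ($seqs) = @_;
    my %lengths;
    foreach my $id ( keys %{$seqs} ) {
        $lengths{$id} = length( $seqs->{$id} );
    }
    return %lengths;
}

# Made-up data standing in for the hash filled from "input.fasta"
my %sequences = (
    '>seq1 example' => 'ATGCATGC',
    '>seq2 example' => 'GGCC',
);

my %lengths = sequence_lengths( \%sequences );

# Print each header with its sequence length, sorted by header
foreach my $id ( sort keys %lengths ) {
    print "$id\t$lengths{$id}\n";
}
```

Sorting the keys gives a stable output order, since Perl hashes do not preserve the order in which records were read.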

If you prefer to put the FASTA-reading code in a subroutine, you can do it like this:

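#!/usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %seqs = %{ read_fasta_as_hash($infile) };    # call the subroutine and dereference the result
          #your code goes here

# Reads a multi-FASTA file and returns a reference to a hash of header => sequence
sub read_fasta_as_hash{
  my $inputfile = shift;
  my $id = '';
  my %sequences;
  open( INFILE, '<', $inputfile ) or die "Cannot open $inputfile: $!";

  while( my $line = <INFILE> ){
        chomp $line;
        if ( $line =~ /^(>.*)$/ ){
            $id = $1;                    # header line (including '>') becomes the current key
        }
        elsif ( $line !~ /^\s*$/ ){
             $sequences{$id} .= $line;   # append each non-blank sequence line to the current record
        }
   }
   close (INFILE);
   return \%sequences;                   # return a reference, since the caller dereferences it
}

exit;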
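
The same idea can also be written against an already-open filehandle, which makes it easy to try the parser without a file on disk. The following is a sketch under stated assumptions rather than the author's method: the name parse_fasta_fh and the example records are hypothetical, and unlike the scripts above it uses only the first word of each header (without the '>') as the hash key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA records from an open filehandle and
# return a reference to a hash of id => sequence.
sub parse_fasta_fh {
    my ($fh) = @_;
    my ( $id, %sequences );
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;    # tolerate Windows line endings
        if ( $line =~ /^>(\S+)/ ) {
            $id = $1;                    # first word of the header, without '>'
            $sequences{$id} = '';
        }
        elsif ( defined $id and $line =~ /\S/ ) {
            $sequences{$id} .= $line;    # skip blank lines and text before the first header
        }
    }
    return \%sequences;
}

# Made-up example data, read through an in-memory filehandle
my $fasta_text = ">seq1 first example\nATGC\nATGC\n\n>seq2 second example\nGGCC\n";
open( my $fh, '<', \$fasta_text ) or die "Cannot open in-memory file: $!";
my $seqs = parse_fasta_fh($fh);
close($fh);

print "$_\t$seqs->{$_}\n" for sort keys %{$seqs};
```

To read a real file instead, open it with a lexical filehandle in the same way, for example open( my $fh, '<', "input.fasta" ), and pass $fh to the subroutine.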

By Tariq Abdullah

Tariq is founder of Bioinformatics Review and Lead Developer at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
