Extract FASTA sequences based on sequence length using Perl

Here are two simple Perl scripts that extract FASTA sequences from a multi-FASTA file based on sequence length.

Suppose our input file, which contains multiple FASTA sequences, is ‘input.fasta’.
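#!/usr/bin/perl
use strict;
use warnings;

my ($infile, $minlen) = @ARGV;
die "Usage: $0 input.fasta minlen\n" unless defined $minlen;
open(my $fh, '<', $infile) or die "Cannot open $infile: $!\n";
{
    local $/ = ">";          # read one FASTA record at a time
    while (<$fh>) {
        chomp;               # remove the trailing '>'
        next unless /\w/;    # skip the empty chunk before the first '>'
        my @keep = split /\n/;
        shift @keep;         # drop the header line
        my $seqlen = length join "", @keep;
        if ($seqlen >= $minlen) {
            print ">$_";
        }
    }
}
close $fh;
exit;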

Save this Perl script as ‘extractfasta.pl’ and run it in the terminal as

$ perl extractfasta.pl input.fasta <minlen> > output.fasta

For example,

$ perl extractfasta.pl input.fasta 100 > output.fasta
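The key trick in these scripts is setting Perl's input record separator ‘$/’ to ">", so that each read returns one whole FASTA record (header plus sequence lines) instead of a single line. The sketch below reads records from an in-memory string rather than a file and prints each header with its sequence length; the sample sequences and the subroutine name ‘record_lengths’ are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; in the real script this comes from input.fasta.
my $fasta = ">seq1 first\nACGTACGT\nACGT\n>seq2 second\nACG\n";

# Return a list of [header, sequence length] pairs, one per FASTA record.
sub record_lengths {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory string: $!\n";
    local $/ = ">";                    # each read now ends at the next '>'
    my @result;
    while (my $record = <$fh>) {
        chomp $record;                 # drop the trailing '>' (if any)
        next unless $record =~ /\w/;   # skip the empty chunk before the first '>'
        my ($header, @seqlines) = split /\n/, $record;
        push @result, [$header, length join "", @seqlines];
    }
    close $fh;
    return @result;
}

for my $r (record_lengths($fasta)) {
    print "$r->[0]\t$r->[1]\n";
}
```

Running it prints ‘seq1 first’ with length 12 and ‘seq2 second’ with length 3, tab-separated. Because the sequence lines are joined before measuring, a sequence wrapped over several lines is counted as one.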

If you want to set a maximum length limit as well, then use the following script.
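#!/usr/bin/perl
use strict;
use warnings;

my ($infile, $minlen, $maxlen) = @ARGV;
die "Usage: $0 input.fasta minlen maxlen\n" unless defined $maxlen;
open(my $fh, '<', $infile) or die "Cannot open $infile: $!\n";
{
    local $/ = ">";          # read one FASTA record at a time
    while (<$fh>) {
        chomp;               # remove the trailing '>'
        next unless /\w/;    # skip the empty chunk before the first '>'
        my @keep = split /\n/;
        shift @keep;         # drop the header line
        my $seqlen = length join "", @keep;
        if ($seqlen >= $minlen && $seqlen <= $maxlen) {
            print ">$_";
        }
    }
}
close $fh;
exit;

Save this Perl script as ‘extractfasta.pl’ and run it in the terminal as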

$ perl extractfasta.pl input.fasta <minlen> <maxlen> > output.fasta

For example,

$ perl extractfasta.pl input.fasta 100 350 > output.fasta
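If you prefer a single script in which the maximum is optional, the length test can be pulled into a small subroutine. This is a sketch, not the original script: the helper name ‘in_range’ is hypothetical, and it assumes that leaving out the third argument should mean "no upper limit".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $len is at least $min and, when $max is
# defined, at most $max. An undefined $max means "no upper limit".
sub in_range {
    my ($len, $min, $max) = @_;
    return 0 if $len < $min;
    return 0 if defined $max && $len > $max;
    return 1;
}

# Usage: perl script.pl input.fasta minlen [maxlen]
if (@ARGV >= 2) {
    my ($infile, $minlen, $maxlen) = @ARGV;    # $maxlen may be undef
    open(my $fh, '<', $infile) or die "Cannot open $infile: $!\n";
    local $/ = ">";                            # one FASTA record per read
    while (<$fh>) {
        chomp;                                 # remove the trailing '>'
        next unless /\w/;                      # skip the empty first chunk
        my @keep = split /\n/;
        shift @keep;                           # drop the header line
        print ">$_" if in_range(length(join "", @keep), $minlen, $maxlen);
    }
    close $fh;
}
else {
    print STDERR "Usage: $0 input.fasta minlen [maxlen]\n";
}
```

Saved under a name of your choice (for example the hypothetical ‘extractfasta_range.pl’), it accepts either ‘input.fasta 100’ or ‘input.fasta 100 350’ as arguments.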

v.berriosfarias@gmail.com says:

April 13, 2021 at 3:52 am

I got an error when using the second command:

readline() on unopened filehandle at extractfasta.pl line 7

This is what I ran:
perl extractfasta.pl /path-to/BAC4A_L00M_R1_001.fasta 50 100 > 100_maxln.fasta

The input FASTA is OK, I don't know what is wrong ):
