Bioinformatics Programming

How to extract FASTA sequences from a multi-FASTA file based on matching headers in a separate file?

This is a simple Perl script that extracts sequences from a large multi-FASTA file when their headers match the headers listed in a separate file.

For example, suppose your FASTA sequences are in a file named “input.fa” and the headers you want to extract are listed in another file called “headers.txt”, separated by whitespace or newlines.

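#!/usr/bin/perl
use strict;
use warnings;

my $headerfile = 'headers.txt';
my $input      = 'input.fa';
my $output     = 'extracted.fa';    # must differ from $input, or the input would be overwritten

# Read the wanted headers, splitting each line on whitespace.
open( my $header_fh, '<', $headerfile ) or die "Cannot open $headerfile: $!";
my @headers = map { split } <$header_fh>;
close $header_fh;
s/^>// for @headers;    # allow headers to be listed with or without the leading ">"

# Load every record of the multi-FASTA file, keyed by the first word of its header line.
my %sequences;
open( my $input_fh, '<', $input ) or die "Cannot open $input: $!";
{
    local $/ = "\n>";    # read one FASTA record at a time
    while ( my $record = <$input_fh> ) {
        chomp $record;            # remove the trailing "\n>" separator
        $record =~ s/^>//;        # the first record still carries its leading ">"
        my ( $defline, $sequence ) = split /\n/, $record, 2;
        next unless defined $defline && defined $sequence;
        my ($id) = $defline =~ /^\s*(\S+)/;
        next unless defined $id;
        $sequence =~ s/\s+\z//;   # drop the trailing newline of the last record
        $sequences{$id} = $sequence;
    }
}
close $input_fh;

# Write the matching records, in the order the headers were listed.
open( my $out_fh, '>', $output ) or die "Cannot open $output: $!";
foreach my $header (@headers) {
    if ( exists $sequences{$header} ) {
        print {$out_fh} ">$header\n$sequences{$header}\n";
    }
}
close $out_fh;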
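The script above keeps every sequence of the input file in a hash, which can use a lot of memory for very large FASTA files. As a rough sketch of an alternative, the wanted headers can be stored in a hash and the FASTA file streamed line by line. The helper name extract_by_id and the sample sequences below are hypothetical and only serve as an illustration; in-memory filehandles stand in for the real input and output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): copy every record whose
# ID is a key of %$wanted from $in_fh to $out_fh and return how many were copied.
sub extract_by_id {
    my ( $in_fh, $wanted, $out_fh ) = @_;
    my $keep  = 0;    # true while the current record is a wanted one
    my $count = 0;
    while ( my $line = <$in_fh> ) {
        if ( $line =~ /^>/ ) {    # header line: decide once per record
            my ($id) = $line =~ /^>\s*(\S+)/;
            $keep = ( defined $id && exists $wanted->{$id} ) ? 1 : 0;
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

# Small demonstration with made-up sequences held in memory.
my $fasta  = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n>seq3\nAAAA\n";
my %wanted = map { $_ => 1 } qw(seq1 seq3);

open( my $in_fh, '<', \$fasta ) or die "Cannot open in-memory input: $!";
my $result = '';
open( my $out_fh, '>', \$result ) or die "Cannot open in-memory output: $!";
my $n = extract_by_id( $in_fh, \%wanted, $out_fh );
close $in_fh;
close $out_fh;

print $result;
```

Unlike the script above, this variant writes records in the order they appear in the FASTA file rather than in the order of the header list, and it keeps the full header line, including any description after the ID.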

Tariq is founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
