Bioinformatics Programming

A Perl script to convert multiline FASTA sequences into a single line

Different software tools require different input formats, which matters especially when you are developing a pipeline or processing multiple large files.

If you are working with a large FASTA file containing thousands of sequences, each wrapped at a fixed number of residues per line, and you want every sequence on a single line, you can use this simple Perl program.
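To make the conversion concrete, here is a minimal sketch that unwraps a small made-up sample held in memory. The helper name `unwrap_fasta` and the sample records are assumptions for illustration only; they are not part of the original script. Because it holds the whole text in memory, it suits small inputs rather than the large files discussed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): joins the wrapped sequence
# lines of every record in an in-memory FASTA string.
sub unwrap_fasta {
    my ($text) = @_;
    my @records;
    # Split on ">" at the start of a line; the first chunk is empty.
    for my $chunk (split /^>/m, $text) {
        next unless length $chunk;
        my ($header, @seq_lines) = split /\r?\n/, $chunk;
        push @records, ">$header\n" . join("", @seq_lines) . "\n";
    }
    return join("", @records);
}

# Made-up sample data: two records wrapped at 10 residues per line.
my $multiline = <<'END';
>seq1 sample protein
MKTAYIAKQR
QISFVKSHFS
RQ
>seq2 sample DNA
ATGCGTACGT
TAGC
END

print unwrap_fasta($multiline);
# Prints:
# >seq1 sample protein
# MKTAYIAKQRQISFVKSHFSRQ
# >seq2 sample DNA
# ATGCGTACGTTAGC
```

Each header line is kept unchanged, and the residue lines that follow it are concatenated into one line.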

There are two ways to supply your multiline FASTA file: define the filename in the Perl script, or pass it on the command line.

1. Define input file within the script

The multifasta input file is “input.fasta”.

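#!/usr/bin/perl
use strict;
use warnings;

# Name of the multiline FASTA file to convert
my $input_fasta = "input.fasta";
open(my $in, "<", $input_fasta) or die("Can't open $input_fasta: $!");

# Print the first header line as-is, so no blank line precedes it
my $line = <$in>;
print $line if defined $line;

while ($line = <$in>)
{
    chomp $line;
    if ($line =~ m/^>/) {
        # A new header ends the previous sequence and starts a new record
        print "\n", $line, "\n";
    }
    else {
        # Sequence lines are printed back to back, without line breaks
        print $line;
    }
}

# Terminate the last sequence
print "\n";
close($in);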

2. As a command-line argument

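#!/usr/bin/perl
use strict;
use warnings;

# Take the input file name from the first command-line argument
my $input_fasta = $ARGV[0];
die("Usage: $0 input.fasta\n") unless defined $input_fasta;
open(my $in, "<", $input_fasta) or die("Can't open $input_fasta: $!");

# Print the first header line as-is, so no blank line precedes it
my $line = <$in>;
print $line if defined $line;

while ($line = <$in>)
{
    chomp $line;
    if ($line =~ m/^>/) {
        # A new header ends the previous sequence and starts a new record
        print "\n", $line, "\n";
    }
    else {
        # Sequence lines are printed back to back, without line breaks
        print $line;
    }
}

# Terminate the last sequence; redirect STDOUT to save the result
print "\n";
close($in);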
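Both scripts above assume Unix line endings and no blank lines in the input. As a hedged variation, the sketch below wraps the same loop in a subroutine that also strips Windows (CRLF) line endings and skips empty lines. The name `convert_fasta` and the in-memory sample are assumptions made for illustration; in real use you would pass filehandles opened on your own input and output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): copies FASTA records from one
# filehandle to another, writing each sequence on a single line.
sub convert_fasta {
    my ($in, $out) = @_;
    my $in_sequence = 0;    # true while a sequence line is still open
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;         # strip LF or CRLF line endings
        next if $line =~ /^\s*\z/;    # skip blank lines
        if ($line =~ /^>/) {
            print {$out} "\n" if $in_sequence;    # close previous sequence
            print {$out} "$line\n";
            $in_sequence = 0;
        }
        else {
            print {$out} $line;
            $in_sequence = 1;
        }
    }
    print {$out} "\n" if $in_sequence;    # terminate the last sequence
    return;
}

# Demo on in-memory data, so the sketch runs without any files.
my $sample = ">seq1\r\nACGT\r\nTTGA\r\n\r\n>seq2\r\nGGCC\r\n";
my $result = "";
open(my $in,  "<", \$sample) or die "Can't read sample: $!";
open(my $out, ">", \$result) or die "Can't write result: $!";
convert_fasta($in, $out);
close($in);
close($out);
print $result;
# Prints:
# >seq1
# ACGTTTGA
# >seq2
# GGCC
```

Tracking whether a sequence line is open means a newline is written only when one is needed, so the output contains no blank lines even if the input has empty lines or consecutive headers.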

Dr. Muniba is a Bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
