Bioinformatics Programming

Modifying multi-FASTA files using Bash: ‘Sed’ Command

Tariq Abdullah

Editing thousands of FASTA sequences by hand is tedious. Many small, repetitive tasks, such as removing, adding, or substituting characters in the headers, or changing the sequence format, are far easier to script. For such tasks, Bash and standard command-line tools provide a quick solution.

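Here are some simple sed commands to manipulate FASTA headers in multi-FASTA files. Each command reads input.fasta and writes the result to output.fasta, leaving the input file unchanged.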

1. To remove everything from the first ‘/’ or ‘_’ onward (the delimiter included) in FASTA headers.

$ sed '/^>/ s|/.*||' input.fasta > output.fasta

$ sed '/^>/ s|_.*||' input.fasta > output.fasta

2. To remove everything from the last ‘/’ or ‘_’ onward (the delimiter included) in FASTA headers.

$ sed '/^>/ s|/[^/]*$||' input.fasta > output.fasta

$ sed '/^>/ s|_[^_]*$||' input.fasta > output.fasta
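The difference between cutting at the first and at the last delimiter is easiest to see side by side. The following is a minimal sketch on made-up records (the `demo_fasta` helper and its headers are assumptions for illustration, not part of any real dataset); the `/^>/` address limits each substitution to header lines.

```shell
# Made-up records for illustration only; real headers will differ.
demo_fasta() {
  printf '%s\n' '>sample_01_contigA' 'ACGTACGT' '>sample_02_contigB' 'TTGACCGA'
}

# First underscore: the greedy ".*" matches from the first "_" to the end of the line.
first_cut=$(demo_fasta | sed '/^>/ s|_.*||')

# Last underscore: "[^_]*$" permits no further "_" before the end of the line.
last_cut=$(demo_fasta | sed '/^>/ s|_[^_]*$||')

printf 'first:\n%s\n\nlast:\n%s\n' "$first_cut" "$last_cut"
```

In this sample, cutting at the first underscore reduces both headers to the same identifier, so it is worth checking that the shortened headers are still unique before using the output downstream.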

3. To remove all FASTA headers and output only sequences.

$ sed '/^>/d' input.fasta > output.fasta
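In sed, a line address is normally written as `/regex/`. Another delimiter can be used for an address, but its opening character must then be escaped with a backslash. The sketch below, run on made-up records (the `demo_fasta` helper is an assumption for illustration), shows both forms deleting the header lines.

```shell
# Made-up records for illustration only.
demo_fasta() {
  printf '%s\n' '>seq1 first record' 'ACGTAC' 'GTTGCA' '>seq2 second record' 'TTGACC'
}

# Default address delimiter: /regex/ followed by the d (delete) command.
seqs_slash=$(demo_fasta | sed '/^>/d')

# Custom address delimiter: the opening "|" must be escaped as "\|".
seqs_pipe=$(demo_fasta | sed '\|^>|d')

printf '%s\n' "$seqs_slash"
```

Note that once the headers are gone, nothing marks where one record ends and the next begins, so this output suits tasks such as counting residues rather than per-sequence analysis.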

4. To remove everything from the first dot (.) onward (the dot included) in FASTA headers.

$ sed '/^>/ s|[.].*$||' input.fasta > output.fasta

5. To replace every dot with an underscore (_) in FASTA headers. The /^>/ address limits the substitution to header lines, so sequence lines are left untouched.

$ sed '/^>/ s|\.|_|g' input.fasta > output.fasta

6. To delete the last n characters from each FASTA header.

$ sed '/^>/ s|.\{n\}$||' input.fasta > output.fasta

Here, replace n with the number of characters to delete, for example, 3, 5, or 10.
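Rather than editing the number by hand each time, it can be passed in from a shell variable. This is a minimal sketch on made-up records (the `demo_fasta` helper, the headers, and the value of `n` are assumptions for illustration).

```shell
# Made-up records for illustration only.
demo_fasta() {
  printf '%s\n' '>geneA_rev' 'ACGTACGT' '>geneB_rev' 'TTGACCGA'
}

n=4  # number of trailing characters to drop from each header (assumed value)

# Double quotes let the shell expand $n; "\$" keeps the end-of-line anchor literal.
trimmed=$(demo_fasta | sed "/^>/ s|.\{$n\}\$||")

printf '%s\n' "$trimmed"
```

A header with fewer than n characters in total is left unchanged, because the pattern cannot match it.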

7. To remove everything from the first space onward in FASTA headers.

$ sed '/^>/ s| .*||' input.fasta > output.fasta
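Several of these edits can be combined in a single pass by giving sed more than one `-e` expression; the expressions are applied to each line in the order written. The sketch below uses made-up records (the `demo_fasta` helper and the accession-like identifiers are assumptions for illustration) to drop the description after the first space and then replace the remaining dots with underscores.

```shell
# Made-up records for illustration only; the accession-like IDs are not real.
demo_fasta() {
  printf '%s\n' '>AB123.1 hypothetical protein [Example sp.]' 'MKTAYIAK' \
                '>CD456.2 example enzyme' 'MSLLTEVE'
}

# Expressions run in order on each header line:
# first drop the description, then swap the remaining dots for underscores.
cleaned=$(demo_fasta | sed -e '/^>/ s| .*||' -e '/^>/ s|\.|_|g')

printf '%s\n' "$cleaned"
```

The order matters here: trimming the description first means that dots inside the description never need to be rewritten.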

These are a few examples of sed commands. Other tools, such as awk, grep, and Perl one-liners (perl -e), can perform similar operations on multi-FASTA files.
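As a rough illustration of the other tools, the sketch below counts records with grep and keeps only the first header field with awk. The `demo_fasta` helper and its records are made up for illustration, and these one-liners are only one of several ways to write the same operations.

```shell
# Made-up records for illustration only.
demo_fasta() {
  printf '%s\n' '>seq1 first record' 'ACGTAC' '>seq2 second record' 'TTGACC'
}

# grep -c counts matching lines, so this reports the number of records.
n_records=$(demo_fasta | grep -c '^>')

# awk: on header lines print only the first whitespace-delimited field;
# all other lines pass through unchanged.
ids_only=$(demo_fasta | awk '/^>/ { print $1; next } { print }')

printf 'records: %s\n%s\n' "$n_records" "$ids_only"
```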

Tariq is founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
