Modifying multi-FASTA files using Bash: ‘Sed’ Command

Editing thousands of FASTA sequences by hand is tedious. A little scripting takes care of the many small, repetitive tasks involved, such as removing, adding, or substituting characters in the headers, or changing the sequence format. For such tasks, Bash commands provide an easy solution.

Here are some simple sed commands to manipulate FASTA headers in multi-FASTA files.

1. To remove everything from the first ‘/’ or ‘_’ onward (the delimiter included) in FASTA headers.

$ sed 's|/.*||' input.fasta > output.fasta

$ sed 's|_.*||' input.fasta > output.fasta

2. To remove everything from the last ‘/’ or ‘_’ onward (the delimiter included) in FASTA headers.

$ sed 's|/[^/]*$||' input.fasta > output.fasta

$ sed 's|_[^_]*$||' input.fasta > output.fasta
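The difference between the first and the last delimiter is easiest to see on a small example. The sketch below uses a made-up header (`>seq1_chr2_partA`) and, as an extra precaution that is our own addition, a `/^>/` address so that only header lines are touched. The same logic applies to ‘/’.

```shell
# Hypothetical record: a header with two underscores and one sequence line.
sample=$(printf '>seq1_chr2_partA\nACGTACGT\n')

# First underscore: ".*" is greedy, so everything from the first "_" is removed.
first=$(printf '%s\n' "$sample" | sed '/^>/ s|_.*||')

# Last underscore: "[^_]*$" cannot cross another "_",
# so only the final underscore and the text after it are removed.
last=$(printf '%s\n' "$sample" | sed '/^>/ s|_[^_]*$||')

printf '%s\n' "$first"   # header becomes >seq1
printf '%s\n' "$last"    # header becomes >seq1_chr2
```

In both cases the sequence line passes through unchanged because it does not start with ‘>’.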

3. To remove all FASTA headers and output only sequences.

$ sed '/^>/d' input.fasta > output.fasta
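As a quick illustration, here is the header-deleting command applied to a tiny, made-up multi-FASTA input. Note that once the headers are gone, nothing marks where one record ends and the next begins.

```shell
# Hypothetical two-record input; the first sequence spans two lines.
fasta=$(printf '>seq1 sample A\nACGT\nTTGA\n>seq2 sample B\nGGCC\n')

# Delete every line that starts with ">", keeping only sequence lines.
seqs=$(printf '%s\n' "$fasta" | sed '/^>/d')

printf '%s\n' "$seqs"
```

The output contains the three sequence lines in their original order, without any headers.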

4. To remove everything from the first dot (.) onward (the dot included) in FASTA headers.

$ sed 's/[.].*$//' input.fasta > output.fasta

5. To replace every dot with an underscore (_) in FASTA headers (provided no dot is present in the sequences).

$ sed 's/\./_/g' input.fasta > output.fasta
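If the sequences may contain dots (some alignment formats use ‘.’ as a gap character), the substitution can be limited to header lines by putting a `/^>/` address in front of it. The following sketch compares both variants on a made-up record; the header-only form is our suggestion, not a required change.

```shell
# Hypothetical record whose sequence line contains dots as gap characters.
rec=$(printf '>seq.1.a\nAC..GT\n')

# Without an address, dots are replaced on every line, sequence included.
all_lines=$(printf '%s\n' "$rec" | sed 's/\./_/g')

# With the /^>/ address, only header lines are edited.
headers_only=$(printf '%s\n' "$rec" | sed '/^>/ s/\./_/g')

printf '%s\n' "$all_lines"      # sequence line becomes AC__GT
printf '%s\n' "$headers_only"   # sequence line stays AC..GT
```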

6. To delete a specific number of characters (n) from the end of the FASTA header.

$ sed '/^>/ s/.\{n\}$//' input.fasta > output.fasta

Here, replace n with the specific number, for example, 3, 5, 10, etc.
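When the number of characters comes from a shell variable, the sed expression has to be double-quoted so that the shell can expand it. Below is a minimal sketch with a made-up header and a hypothetical variable `n`; the final `$` is escaped so the shell passes it on to sed as the end-of-line anchor.

```shell
n=3   # hypothetical: number of trailing header characters to drop

# Double quotes let the shell expand $n; sed receives: /^>/ s/.\{3\}$//
trimmed=$(printf '>gene42_v1\nACGTACGT\n' | sed "/^>/ s/.\{$n\}\$//")

printf '%s\n' "$trimmed"   # header becomes >gene42
```

Headers shorter than n characters are left as they are, while a header of exactly n characters (the ‘>’ included) is emptied completely, so it is worth choosing n with the shortest header in mind.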

7. To remove all characters after the first space in the FASTA header.

$ sed '/^>/ s/ .*//' input.fasta > output.fasta
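Cutting headers down to their first word can make two records share the same identifier. The sketch below trims the descriptions from a made-up input and then lists any header that now occurs more than once. The duplicate check with grep, sort, and uniq is our own addition, shown only as one possible sanity check.

```shell
# Hypothetical input in which two records share the ID "sp1".
fasta=$(printf '>sp1 Homo sapiens\nACGT\n>sp2 Mus musculus\nGGCC\n>sp1 Homo sapiens isoform 2\nTTAA\n')

# Keep only the first word of each header.
short=$(printf '%s\n' "$fasta" | sed '/^>/ s/ .*//')

# List headers that appear more than once after trimming.
dups=$(printf '%s\n' "$short" | grep '^>' | sort | uniq -d)

printf '%s\n' "$dups"   # prints >sp1
```

An empty result means that all trimmed headers are still unique.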

These are a few examples of ‘sed’ commands. Other tools, such as awk, grep, and Perl one-liners (perl -e), can be used for similar operations on multi-FASTA files.

Tariq is the founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, microarrays, plant systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
