Multiple Sequence Alignment and Phylogenetic Tree construction using ClustalW2 command-line tool

Tariq Abdullah

ClustalW2 is a bioinformatics tool for multiple sequence alignment of DNA or protein sequences. It can align sequences and generate a phylogenetic tree online (https://www.genome.jp/tools-bin/clustalw). However, when we need to process a large number of FASTA sequences, the command-line version of ClustalW2 [1] is often the more practical choice: it runs locally and writes its results to output files quickly. In this article, we will perform these operations using the stand-alone ClustalW2 tool. Additionally, we will generate a percent identity matrix (PIM) for the input sequences. The PIM reports the percent identity between each pair of input sequences.

Let’s assume our input file is named ‘input.fasta’. In this article, we will run ClustalW2 on Ubuntu. The same command also works on Windows: open the command prompt (cmd) and enter it there. Don’t forget to provide the full path to the ClustalW2 binary installed on your system.
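Before running the alignment, it can help to confirm that the input file is readable and to see how many sequences it holds. The sketch below is only an illustration: `count_fasta_records` is a hypothetical helper name, and it assumes every record in the file starts with a `>` header line, as in standard FASTA.

```shell
# Hypothetical helper: count the records in a FASTA file.
# Assumes each record starts with a header line beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

infile=input.fasta   # the file name assumed in this article
if [ -r "$infile" ]; then
    n=$(count_fasta_records "$infile")
    echo "$infile contains $n sequences"
else
    echo "$infile not found or not readable" >&2
fi
```

A multiple sequence alignment needs more than one sequence, so a count of 0 or 1 usually means the file is not the one you intended.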

Open a terminal (Ctrl+Alt+T) in Ubuntu and type the following command:

$ /usr/local/bin/clustalw2 -infile=input.fasta -tree -pim -type=protein -case=upper

Provide the full path to the ClustalW2 binary; it is usually installed in /usr/local/bin/.
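If you are not sure where the binary is installed, a small lookup like the one below may help. This is a minimal sketch: `find_clustalw2` and the `CLUSTALW2` variable are hypothetical names, and the fallback location is simply the path used in this article.

```shell
# Hypothetical helper: locate the clustalw2 binary.
# Checks PATH first, then falls back to the location used in this article.
find_clustalw2() {
    if command -v clustalw2 >/dev/null 2>&1; then
        command -v clustalw2
    elif [ -x /usr/local/bin/clustalw2 ]; then
        echo /usr/local/bin/clustalw2
    else
        return 1
    fi
}

if CLUSTALW2=$(find_clustalw2); then
    echo "Using ClustalW2 at: $CLUSTALW2"
else
    echo "clustalw2 not found; install it or use its full path" >&2
fi
```

When the lookup succeeds, `$CLUSTALW2` can be used in place of the hard-coded path in the command above.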

If you want the sequence residues to appear in lowercase letters in the alignment, use -case=lower instead. Define the type of the input sequences with the -type argument (protein or dna).

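It will generate a .aln file as the alignment output, a tree file as the phylogenetic tree output, and a .pim file as the PIM output. The extension of the tree file depends on the ClustalW2 options; it is commonly .ph (or .dnd for the guide tree) rather than .tree, so check the file names that ClustalW2 prints when it finishes.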
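To see which output files a run actually produced, you can list the files that share the input's base name. The sketch below assumes ClustalW2 writes its outputs next to the input file under the same base name; `list_outputs` is a hypothetical helper, not part of ClustalW2.

```shell
# Hypothetical helper: list files that share the input's base name,
# e.g. input.aln and input.pim for input.fasta.
# $1 = input FASTA file
list_outputs() {
    base=${1%.*}                     # strip the extension: input.fasta -> input
    for f in "$base".*; do
        [ -e "$f" ] || continue      # no match: the glob stays literal, skip it
        if [ "$f" != "$1" ]; then    # skip the input file itself
            echo "$f"
        fi
    done
}

list_outputs input.fasta
```

The listed files can then be opened in any text editor, since the alignment, tree, and PIM outputs are plain text.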

References

  1. Higgins, D. G., & Sharp, P. M. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73(1), 237-244.

Tariq is the founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
