
Homology search against a local dataset using NCBI-BLAST+ command-line tool

Tariq Abdullah
Last updated: October 27, 2022 9:13 pm

The NCBI-BLAST+ [1] command-line tools offer multiple functions for working with large sequence datasets. Previously, we showed how to run BLAST against a local dataset of sequences. This article explains how to search a local database for sequences homologous to a query sequence and how to obtain the top 100 hits from the search results.

To perform a homology search against a local database, follow the steps given below:

1. Install NCBI-BLAST+ on Ubuntu

Open a terminal (Ctrl+Alt+T) and type the following command:

$ sudo apt-get install ncbi-blast+
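
Once the installation finishes, it can help to confirm that the three programs used in this article are available on your PATH. The snippet below is a minimal sketch; the function name check_blast_tools is our own and is not part of BLAST+.

```shell
# Hypothetical helper (not part of BLAST+): report which of the tools
# used in this article can be found on the PATH.
check_blast_tools() {
    missing=0
    for tool in makeblastdb blastp blastdbcmd; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}
```

Calling check_blast_tools prints one line per tool. A "missing" line suggests that the package did not install correctly.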

2. Make a BLAST database of your sequences

$ makeblastdb -in input.fasta -parse_seqids -dbtype prot -out blastdb

The details of these arguments are given in the previous article.

We have used -dbtype prot since we are demonstrating with protein sequences. If you are working with nucleotide sequences, set -dbtype nucl instead and run the search in step 3 with blastn rather than blastp.
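
The choice of search program follows from the database type. As a minimal sketch, assuming the query and the database hold the same kind of sequence (no translated search), the mapping can be written as a small shell function. The name blast_program_for is hypothetical.

```shell
# Hypothetical helper: pick the search program that matches the -dbtype
# passed to makeblastdb. Assumes query and database are the same type.
blast_program_for() {
    case "$1" in
        prot) echo "blastp" ;;
        nucl) echo "blastn" ;;
        *)    echo "unknown dbtype: $1" >&2; return 1 ;;
    esac
}
```

For example, blast_program_for nucl prints blastn, the program to use in step 3 for a database built with -dbtype nucl.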

3. Perform homology search

$ blastp -query query.fasta -db blastdb -outfmt '6 sseqid' -max_target_seqs 100 -out homologousids.txt

Here, -query defines the input query sequence saved in the file ‘query.fasta’.

-db is the local BLAST database.

-outfmt defines the output format. ‘6 sseqid’ means tabular format with only the subject sequence ID (Subject Seq-id) column.

-max_target_seqs defines the maximum number of hits to keep in the output; here it is set to 100. You can set it to any number.

-out defines the output filename.

This command will produce a simple text file containing the sequence IDs of the homologous sequences found, limited to the top 100 hits.
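
Tabular output contains one line per alignment, so a subject with several aligned regions can appear more than once in homologousids.txt. If you want each ID only once, a minimal sketch is given below. The function name unique_ids and the output filename uniqueids.txt are our own choices for illustration.

```shell
# Hypothetical helper: print each subject ID once, keeping the order in
# which BLAST reported the hits (first occurrence wins). Blank lines
# are skipped.
unique_ids() {
    awk 'NF && !seen[$1]++ { print $1 }' "$1"
}
```

Running unique_ids homologousids.txt > uniqueids.txt writes the de-duplicated list to a new file, which can then be used in place of homologousids.txt when extracting the sequences.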

4. Extract the sequences for those homologous sequence IDs

In this step, we will obtain the sequences of all homologous sequence ids from the constructed local database. This can be achieved by using the blastdbcmd binary of the NCBI-BLAST+ package.

$ blastdbcmd -db blastdb -entry_batch homologousids.txt -out homologseqs.fasta -outfmt %f

Here, -entry_batch names a file of entries to retrieve in batch. Each entry should be on its own line, beginning with the sequence ID, optionally followed by other space-delimited specifiers.

-outfmt %f means output in FASTA format.

There are several other output formats; run blastdbcmd -help to read about them in detail.

The output file (homologseqs.fasta) will contain the sequences of the top hits from the homology search (up to 100).
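
A quick way to check the result is to count the FASTA header lines in the output file. This is a minimal sketch; count_fasta_records is a hypothetical helper name.

```shell
# Hypothetical helper: count the records in a FASTA file by counting
# the header lines, which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running count_fasta_records homologseqs.fasta prints the number of sequences retrieved. If the ID list contains no repeated entries, this number should not exceed the value given to -max_target_seqs.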

References

1. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421.