Genome "Fingerprinting": Identifying species with BLAST | STEM: DNA Biology and Bioinformatics

How do researchers identify the species of a bacterium or other organism? In the past, scientists relied on traits (phenotypes) such as the ability to catabolize ("eat") particular nutrients, growth on specific types of media, the presence of flagella, or the shape of the cells. Now we can also determine the sequence of an organism's genomic DNA.

By comparing the sequence of parts of an isolate's genome to sequences from known bacteria, we can identify the known bacteria that are most similar to it. For bacteria, we will use the gene for the 16S ribosomal RNA (rRNA), a component of the small ribosomal subunit. Because every bacterium carries this gene, we can compare its DNA sequence across all bacteria to group them into species.

First, make sure you are in your home directory by typing the command:

cd ~

"~" is a shortcut for your home folder.

Change to your data directory:

cd data

A folder containing 10 FASTA files is located here:

ls /nfs1/Teaching/CGRB/dbbc_s16/data/examples/

Each of these files contains the 16S rDNA sequence from a bacterial plant pathogen. We will identify that bacterium by performing a BLAST search against the SILVA database of 16S sequences.

Use a random number generator to pick a number between 1 and 10:

rand 10
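If the rand command is not available on your system, the number can be drawn another way. The following is a minimal sketch that assumes you are working in the bash shell, where $RANDOM is a built-in variable; the variable name n is chosen here for illustration only.

```shell
# Pick a whole number from 1 to 10 using bash's built-in $RANDOM variable
n=$(( RANDOM % 10 + 1 ))
echo "$n"
```

Each run prints a new number between 1 and 10.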

Copy the file corresponding to your number to the current directory with cp, replacing # with your number:

cp /nfs1/Teaching/CGRB/dbbc_s16/data/examples/#.fasta ./
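Before searching, it can help to confirm that the copied file looks like FASTA: a header line beginning with ">" followed by lines of sequence. The sketch below uses a made-up stand-in file (example.fasta, with an invented header and sequence) so that it runs anywhere; in practice you would point the fasta variable at your own numbered file instead.

```shell
# Hypothetical stand-in file so the example runs anywhere;
# set fasta to your own numbered file instead
fasta=./example.fasta
printf '>example_seq made-up header\nACGTACGTACGTACGT\n' > "$fasta"

# Show the first few lines: a ">" header line followed by sequence lines
head -n 5 "$fasta"

# Count the sequences in the file (each record starts with ">")
count=$(grep -c '^>' "$fasta")
echo "Sequences found: $count"
```

Each of the example files is described as holding one 16S sequence, so a count of 1 is what you would expect to see for your own file.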

Now we will BLAST your file against a database using the blastn program.

The blastn program takes several arguments:

-query ./inputfile.fasta    Your input file, containing the sequence to search for
-db /path/to/database       The database of sequences to search against
-out ./outputfile.txt       The file in which to store the output of your BLAST search

Let's run the BLAST search, replacing # with your number:

blastn -query ./#.fasta -db /nfs1/Teaching/CGRB/dbbc_s16/data/db/bacteria16s -out ./blastoutput.txt

Look at the output of your BLAST search using nano:

nano blastoutput.txt

The search results are sorted from most similar to least similar. What are the genus and species of the bacterium most similar to yours?
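The default report can be long. As an optional variation, blastn can also write one line per hit using its -outfmt 6 option, and -max_target_seqs limits how many database sequences are reported. The sketch below is not part of the original exercise: the file name ./1.fasta assumes you drew the number 1, blasttable.txt is an output name chosen here for illustration, and the check at the top simply skips the search if blastn or the query file is missing.

```shell
# Assumptions: you drew number 1, and the course database path is reachable
query=./1.fasta
db=/nfs1/Teaching/CGRB/dbbc_s16/data/db/bacteria16s

# Only run the search if blastn is installed and the query file exists
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    # -outfmt 6 writes tab-separated columns, one hit per line;
    # -max_target_seqs 5 keeps at most five database sequences
    blastn -query "$query" -db "$db" -outfmt 6 -max_target_seqs 5 -out ./blasttable.txt
    status="ran"
else
    status="skipped"
fi
echo "Search $status"
```

With the default -outfmt 6 columns, the second column is the database sequence identifier, the third is the percent identity, and the last two are the E-value and bit score. Depending on how the database was built, the identifier alone may not include the species name, so the full report in blastoutput.txt remains the easiest place to read off the genus and species.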
