Day 5: Identify your strain/protein analysis | STEM: DNA Biology and Bioinformatics

Now that you have annotated your genome, we will try to identify which species of bacteria it comes from.

We will do that by finding the 16S rDNA gene in your assembly, and then performing a BLAST search against a database of known 16S sequences.

Make sure you are in the folder with your genome assembly:

cd ~/genome_assembly/kmer75assembly/

First, we will extract the 16S gene from your assembly in two steps. Start by making a BLAST database from your assembled genome sequence:

makeblastdb -in ./contigs.fa -dbtype nucl
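
If you want to confirm that the database was built, you can check for the index files that makeblastdb writes next to the input file. The following is only a sketch: the function name check_blastdb is hypothetical, and it assumes a small, single-volume nucleotide database whose files end in .nhr, .nin and .nsq.

```shell
# Hypothetical helper: report whether the nucleotide BLAST index files exist.
# $1 = the path given to makeblastdb with -in (for example ./contigs.fa)
check_blastdb() {
    for ext in nhr nin nsq; do
        if [ ! -e "$1.$ext" ]; then
            echo "missing $1.$ext"
            return 1
        fi
    done
    echo "database files found for $1"
}

# Example usage:
# check_blastdb ./contigs.fa
```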

Next, we will search your genome sequence database using an example 16S sequence as the query:

blastn -query /nfs1/Teaching/CGRB/dbbc_s16/data/16S_query.fasta -db ./contigs.fa -out my16S.fasta -outfmt "6 sseq" -max_target_seqs 1

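This will create a new file called "my16S.fasta". Because of the -outfmt "6 sseq" option, the file contains only the matching sequence from your assembly, one line per alignment, with no FASTA header line. You can view the 16S rDNA sequence from your assembly with nano: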

nano my16S.fasta
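
The aligned sequence can contain gap characters ("-"), and the file may hold more than one line if BLAST reported several alignments. If you would like a conventional FASTA record instead, a minimal sketch is shown here. The function name make_fasta, the header name and the output file name are all assumptions made for illustration.

```shell
# Hypothetical helper: wrap the first alignment in a FASTA record.
# $1 = file of bare sequences (one per line), $2 = name to use in the header
make_fasta() {
    printf '>%s\n' "$2"
    # Keep only the first line (the top alignment) and delete gap characters
    head -n 1 "$1" | sed 's/-//g'
}

# Example usage (output file name is an assumption):
# make_fasta my16S.fasta my16S > my16S_clean.fasta
```

The rest of this page works with my16S.fasta as it is, so this step is optional.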

Now we will search against a database of 16S rDNA genes using your 16S sequence:

blastn -query ./my16S.fasta -db /nfs1/Teaching/CGRB/dbbc_s16/data/db/bacteria16s -outfmt 7 -out ./mysearchresults.txt

Look at the output of your BLAST search using nano:

nano mysearchresults.txt

The search results are listed from best match to worst match. What are the genus and species of the bacterium most similar to yours?
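
To pull out the best match without scrolling through the file, you can skip the comment lines (they start with "#") and print the first data row. This is a sketch that assumes the default columns of -outfmt 7, where column 2 is the subject ID, column 3 is the percent identity and column 12 is the bit score. The function name top_hit is hypothetical.

```shell
# Hypothetical helper: print subject ID, percent identity and bit score
# of the first (best) hit in a tabular BLAST report.
# $1 = BLAST output written with -outfmt 7
top_hit() {
    awk -F '\t' '!/^#/ && NF > 1 { print $2, $3, $12; exit }' "$1"
}

# Example usage:
# top_hit mysearchresults.txt
```

Depending on how the database was built, the subject ID may be an accession number rather than a species name, in which case you would still need to look that ID up.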

Now we will try to find known Agrobacterium virulence genes in your assembled genome. First we will copy a file that contains a few known virulence genes:

cp /nfs1/Teaching/CGRB/dbbc_s16/data/knownvirgenes.fasta ./

Finally, we will search for those virulence genes in the BLAST database you made from your assembly:

blastn -query ./knownvirgenes.fasta -db ./contigs.fa -outfmt 7 -out virgenesearchresults.txt

And view the results:

nano virgenesearchresults.txt

Did you find any virulence genes in your genome?
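
One way to answer this quickly is to count the hits for each query gene. The sketch below again assumes the default -outfmt 7 columns, where column 1 is the query ID taken from the headers in knownvirgenes.fasta. The function name count_hits is hypothetical. Genes with no hits do not appear in the output.

```shell
# Hypothetical helper: count data rows per query in a tabular BLAST report.
# $1 = BLAST output written with -outfmt 7
count_hits() {
    awk -F '\t' '!/^#/ && NF > 1 { n[$1]++ }
                 END { for (q in n) print q, n[q] }' "$1" | sort
}

# Example usage:
# count_hits virgenesearchresults.txt
```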