ANNOTATE BACTERIAL GENOME SEQUENCES (PREDICT GENES AND THEIR FUNCTIONS) USING ARTEMIS
Today we will learn about genome annotation. We will use the program Prokka, which is designed for annotating bacterial genomes, to annotate our genome assemblies from earlier today. In this section, we will explore an assembly by hand using the genome browser Artemis.
First, copy your genome assembly from crick to your desktop using FileZilla. Connect to crick with FileZilla, navigate to the folder genome_assembly/kmer75assembly/, and download the file contigs.fa by dragging it to your desktop.
Next, open the program Artemis by going to the following link and clicking the "LAUNCH" button:
http://www.sanger.ac.uk/science/tools/artemis
Click on the File menu and click “Open ...”
Next, in the file open window, find your contigs.fa file in your desktop folder and click “Open”.
A new window will appear showing the regions of the genome.
This is your assembled genome sequence. All of the assembled contigs will be displayed next to each other (but not in the correct order). Scroll around to see the different sequences.
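If you would like to check how many contigs are in the file and how long each one is, a short script can do it. The following is a minimal sketch, not part of Artemis: the function name read_fasta and the tiny example sequences are made up for illustration, and the script assumes an ordinary FASTA file where each contig starts with a ">" header line.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {contig name: sequence}."""
    parts_by_name = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after ">" as the contig name
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            parts_by_name[name] = []
        elif name is not None:
            # Sequence line: store in upper case, join the pieces later
            parts_by_name[name].append(line.upper())
    return {n: "".join(parts) for n, parts in parts_by_name.items()}


# Tiny made-up example standing in for the contents of contigs.fa
example = """>contig_1
ATGAAACCC
GGGTTTTAA
>contig_2
ttgacagct
""".splitlines()

contigs = read_fasta(example)
for contig_name, sequence in contigs.items():
    print(contig_name, len(sequence))
```

To run this on your own assembly, open the file and pass the file handle instead of the example lines, for instance: with open("contigs.fa") as handle: contigs = read_fasta(handle)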
Now let’s try to find an open reading frame in your genome sequence.
Click on the “Create” menu and select the option “Mark Open Reading Frames ...”
Change the minimum open reading frame size to 100 if it isn’t already and click OK.
This will display all of the open reading frames (regions starting with “ATG” and ending with a stop codon) in your genome sequence, each shown as a blue box.
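To make the idea of an open reading frame concrete, here is a minimal sketch that scans a sequence for regions starting with ATG and ending with a stop codon, on both strands and in all three reading frames. It is an illustration of the definition given above, not the method Artemis itself uses. The function names and the min_codons parameter are hypothetical, and treating the minimum size as a number of codons is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return a list of (strand, frame, start, end) tuples.

    start and end are 0-based, end-exclusive positions on the strand that
    was scanned, and the stop codon is included in the region.
    min_codons counts the codons before the stop (assumed unit).
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


# Made-up toy sequence: ATG, three AAA codons, then the stop codon TAA
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=4))
```

With a real contig you would keep the default minimum size of 100. The small value is used here only so that the toy sequence produces a hit.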
Let’s find out what one of these regions might be. Pick an open reading frame and click on it to select it.
Click the menu option “Run”, then select “NCBI Searches” and then “blastx”. Then click “OK” in the window that appears.
This will perform a BLAST search of a translation of your chosen open reading frame sequence against a database of known protein sequences.
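The translation step can be sketched in a few lines. blastx translates the nucleotide query in all six reading frames before searching. The sketch below translates a single frame using the standard genetic code, which is enough to see what "a translation of your open reading frame" means. The names CODON_TABLE and translate are made up for this example, and the short input sequence is invented.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon from the first base.

    Stop codons become "*"; codons with unexpected letters (such as N)
    become "X". Trailing bases that do not fill a codon are dropped.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Invented example: ATG GCT AAA TGA
print(translate("ATGGCTAAATGA"))  # MAK*
```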
A new window will pop up with the results of your BLAST search. This may take a while to run.
Look at the first few results of your BLAST search. What function might your chosen sequence have?
It might not be a gene at all. If you just see genome sequences or no results, try again with a different open reading frame.
Try looking around the genome sequence using Artemis and see what you find.