Kevin Wyllie Week 3

From LMU BioDB 2015
Revision as of 18:56, 21 September 2015 by Kwyllie (Talk | contribs) (Began the protocol, checking to see if my format/syntax worked out.)


Complement of a Strand

Kwscreenshot1.jpg
  • Shown in green: the following command displays the contents of the file (using prokaryote.txt as an example).
cat prokaryote.txt
  • Shown in red: the following command generates the complementary strand, written in the 5' -> 3' direction (hence the "rev" command).
cat prokaryote.txt | sed "y/atgc/tacg/" | rev
  • These commands yield the nucleotide sequence:
    • 5'- gttaaaatgccgccagcggaactggcggctgggttatacgcggaacatggtattaggcaacgtttcaagttcaatattgtcttctttggccatcgtacctattgaaatatagtaga -3'
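As a quick sanity check on the pipe above, taking the reverse complement twice should give back the original sequence. The following is only a sketch: it uses a short made-up sequence in place of prokaryote.txt, and the helper name revcomp is hypothetical.

```shell
# Made-up lowercase test sequence (not the contents of prokaryote.txt).
seq="atgccagtt"

# Hypothetical helper wrapping the same two commands used above.
revcomp() { sed "y/atgc/tacg/" | rev; }

once=$(echo "$seq" | revcomp)     # reverse complement of the sequence
twice=$(echo "$once" | revcomp)   # reverse complement of the reverse complement

echo "$once"
echo "$twice"
```

If the second line printed matches the starting sequence, the complement and reversal steps are behaving as expected.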

Reading Frames

The original sequence in the prokaryote.txt file will be assumed to be the top strand for this exercise.

+1, +2, and +3 Frames

Kwscreenshot2.jpg
  • Shown in green: to separate the strand into codons (resulting in the +1 frame):
cat prokaryote.txt | sed "s/.../& /g"
  • Shown in red: to convert to the mRNA sequence (treating the DNA strand as the mRNA-like strand):
cat prokaryote.txt | sed "s/.../& /g" | sed "y/t/u/"
  • Shown in blue: to translate this mRNA sequence (yielding the +1 frame):
cat prokaryote.txt | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
  • For the +2 frame, one command is added near the start of the pipe to delete the first base before the strand is split into codons:
cat prokaryote.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
  • And similarly, for the +3 frame:
cat prokaryote.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
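The three pipes above differ only in how many bases are deleted from the start of the line. A minimal sketch of that idea, assuming a made-up sequence and a hypothetical helper named frame; the translation step is left out because the contents of genetic-code.sed are not shown here:

```shell
# Hypothetical helper: print the mRNA codons of reading frame +1, +2, or +3.
# $1 = frame number; a lowercase DNA sequence is read from standard input.
frame() {
  skip=$(( $1 - 1 ))                 # number of leading bases to delete
  sed "s/^.\{$skip\}//" | sed "s/.../& /g" | sed "y/t/u/"
}

# Made-up test sequence (not the contents of prokaryote.txt).
for n in 1 2 3; do
  printf '%s\n' "+$n: $(echo "atgccagtt" | frame "$n")"
done
```

Leftover bases that do not fill a whole codon are printed as a short final group, just as in the original pipes.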


Kwscreenshot3.jpg
  • These pipes yield the following amino acid sequences (shown on right):
    • +1 Nter- S T I F Q - V R W P K K T I L N L K R C L I P C S A Y N P A A S S A G G I L -Cter (shown in red)
    • +2 Nter- L L Y F N R Y D G Q R R Q Y - T - N V A - Y H V P R I T Q P P V P L A A F -Cter (shown in green)
    • +3 Nter- Y Y I S I G T M A K E D N I E L E T L P N T M F R V - P S R Q F R W R H F N -Cter (shown in blue)





-1, -2, and -3 Frames

  • For the -1 frame, display the file as usual, and then use the pipe from "Complement of a Strand" so that the commands after it are applied to the complementary strand (instead of the original strand). Then, add the same commands used for the +1 frame:
cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
  • As before, the -2 and -3 frames can be found by making a single adjustment to the pipe for the -1 frame. For the -2 frame:
cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
  • And for the -3 frame:
cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
Kwscreenshot4.jpg
  • These pipes yield the following amino acid sequences (shown on right):
    • -1 Nter- V K M P P A E L A A G L Y A E H G I R Q R F K F N I V F F G H R T Y - N I V -Cter (shown in red)
    • -2 Nter- L K C R Q R N W R L G Y T R N M V L G N V S S S I L S S L A I V P I E I - - -Cter (shown in green)
    • -3 Nter- - N A A S G T G G W V I R G T W Y - A T F Q V Q Y C L L W P S Y L L K Y S R -Cter (shown in blue)
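The negative frames follow the same pattern, except that the strand is complemented and reversed before any bases are deleted. A minimal sketch under the same assumptions as before (a made-up sequence, a hypothetical helper named minus_frame, and no translation step):

```shell
# Hypothetical helper: print the mRNA codons of reading frame -1, -2, or -3.
# $1 = frame number without the sign; a lowercase DNA sequence is read from stdin.
minus_frame() {
  skip=$(( $1 - 1 ))                 # bases to delete from the reverse complement
  sed "y/atgc/tacg/" | rev | sed "s/^.\{$skip\}//" | sed "s/.../& /g" | sed "y/t/u/"
}

# Made-up test sequence (not the contents of prokaryote.txt).
for n in 1 2 3; do
  printf '%s\n' "-$n: $(echo "atgccagtt" | minus_frame "$n")"
done
```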




XMLPipeDB Match Practice

Kwscreenshot5.jpg
  • To count the occurrence of GO:0005, GO:0006, and GO:0007 (shown on right):
cat 493.P_falciparum.xml | java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]"
  • There are three unique matches (the maximum possible for this command).
    • GO:0005 occurred 1,371 times.
    • GO:0006 occurred 1,100 times.
    • GO:0007 occurred 113 times.
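A similar per-match tally can be approximated with standard tools. This is only a sketch and not how Match itself works: it uses a few made-up sample lines instead of 493.P_falciparum.xml, and grep is case-sensitive by default, so its totals may not agree with Match's in every case.

```shell
# Made-up sample lines standing in for the XML file.
sample='<dbReference type="GO" id="GO:0005737"/>
<dbReference type="GO" id="GO:0006412"/><dbReference type="GO" id="GO:0005634"/>
<dbReference type="GO" id="GO:0007049"/>'

# grep -o prints each match on its own line; sort and uniq -c tally them.
printf '%s\n' "$sample" | grep -o "GO:000[567]" | sort | uniq -c

# Count for a single ID, with any padding from wc removed.
go5=$(printf '%s\n' "$sample" | grep -o "GO:0005" | wc -l | tr -d ' ')
echo "GO:0005 occurred $go5 times in the sample"
```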



Kwscreenshot6.jpg
  • To find GO:0007 "in situ" (shown on right):
grep "GO:0007" 493.P_falciparum.xml
  • Judging by the text found on the same lines as this pattern, "GO:0007" appears to be the first few characters of a gene ID. Based on prior knowledge, it may also have something to do with gene ontology, as I have seen "GO" used as an abbreviation for that term before.



Kwscreenshot7.jpg
  • To count the occurrence of \"Yu.*\" (shown on right):
cat "493.P_falciparum.xml" | java -jar xmlpipedb-match-1.1.1.jar "\"Yu.*\""
  • There are three unique matches.
    • "yu b." occurred one time.
    • "yu k." occurred 228 times.
    • "yu m." occurred one time.
  • A grep command for this pattern brings up lines such as:
<person name="Yu K."/>

So these may be names of biologists, perhaps those who were responsible for the discovery of a given gene.



Kwscreenshot8.jpg
  • To count the occurrences of "ATG":
    • The match function finds 830,101 matches in hs_ref_GRCh37_chr19.fa (shown on right, in green).
    • Connecting grep to wc finds 502,410 lines, 502,410 words and 35,671,048 characters (shown on right, in red).
    • This discrepancy is due to what each tool counts. Match counts every occurrence of the pattern, while grep returns each line that contains the pattern at least once, so a line with several occurrences of "ATG" is counted only once. The numbers that wc reports therefore describe the lines in which "ATG" is found, not the "ATG" matches themselves.
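The difference between counting lines and counting occurrences can be seen on a small scale. This is a sketch with a made-up two-line sample rather than hs_ref_GRCh37_chr19.fa, and it uses grep -o to stand in for an occurrence count; it is not a description of how Match is implemented.

```shell
# Made-up sample: the first line contains "ATG" twice, the second once.
sample='ATGCCATGA
GGATGTTTT'

# grep -c counts the lines that contain the pattern at least once.
lines=$(printf '%s\n' "$sample" | grep -c "ATG")

# grep -o prints every match on its own line, so wc -l counts occurrences.
hits=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $hits"
```

Here the occurrence count is larger than the line count, which is the same direction as the gap between the Match and grep-wc results above.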


Protocol

Protocol - Complement of a Strand

  1. First, "ssh" into the server with the following command: ssh <username>@my.cs.lmu.edu
    • The window will prompt you to enter your password. Type it in and press enter.
  2. Open "prokaryote.txt" to view the DNA sequence it contains.
  3. To generate the complementary strand, two things must be done to the original strand.
    1. Each base of the original strand must be given its complement.
      • The command that corresponds to this step is: sed "y/atgc/tacg/" . This replaces every g with c, every t with a, and so on (the sequence in the file is lowercase).
    2. Since it is customary to express a nucleotide sequence in the 5' to 3' direction, the sequence must be reversed.
      • The command that corresponds to this step is: rev
    • Connecting these commands results in: cat prokaryote.txt | sed "y/atgc/tacg/" | rev
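The two sub-steps can also be run one at a time to see the intermediate result. A minimal sketch, assuming a short made-up sequence in place of prokaryote.txt:

```shell
# Made-up lowercase test sequence (not the contents of prokaryote.txt).
seq="atgccagtt"

# Step 1: give each base its complement (still in the original order).
complement=$(echo "$seq" | sed "y/atgc/tacg/")

# Step 2: reverse the line so the new strand reads 5' to 3'.
reversed=$(echo "$complement" | rev)

echo "$complement"
echo "$reversed"
```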