Difference between revisions of "Jwoodlee Week 3"

From LMU BioDB 2015

Revision as of 18:22, 21 September 2015

Electronic Lab Notebook

Use ssh to log in to my.cs.lmu.edu with your username, and enter your password when prompted.

Complement of a Strand

Go to ~dondi/xmlpipedb/data, where prokaryote.txt is located, and enter the following command:

cat prokaryote.txt | sed "y/actg/tgac/"

This will yield prokaryote.txt’s complementary DNA strand.
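
To see what the y command does, the complement step can be tried on a short sequence first. This is a minimal sketch: the function name complement and the sample sequence are made up for illustration and are not part of prokaryote.txt.

```shell
# complement: swap each base with its pairing partner (a<->t, c<->g).
# The y command maps characters one-to-one: a->t, c->g, t->a, g->c.
complement() {
  sed "y/actg/tgac/"
}

# Short made-up sequence, not taken from prokaryote.txt.
printf 'atgcca\n' | complement   # prints tacggt
```

Note that y needs all three slashes and straight quotes; curly quotes pasted from a word processor are not treated as quotes by the shell.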

Reading Frames

These commands are more complicated than the one for Complement of a Strand. This is essentially what I had to accomplish:

Take the sequence file, replace the t’s with u’s, break the sequence into groups of three, use genetic-code.sed as the translation “chart”, and then eliminate any leftover nucleotides. For the other reading frames, I just delete the first one or two nucleotides before grouping.

After lots of googling I came up with this basic outline in terminal:

cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
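
The outline can be walked through on a short sequence. This is only a sketch: the three -e rules below are a hypothetical stand-in for sed -f genetic-code.sed (the real file covers every codon and may use different symbols), the "-" for a stop codon is an assumption, and the sample sequence is made up. The final stage removes any leftover nucleotides.

```shell
# translate: lowercase DNA on stdin -> one symbol per codon on stdout.
translate() {
  sed "s/t/u/g" |        # replace every t with u
  sed "s/.../& /g" |     # add a space after each group of three letters
  sed -e "s/aug/M/g" -e "s/gcu/A/g" -e "s/uaa/-/g" |  # stand-in for genetic-code.sed
  sed "s/[acug]//g"      # drop nucleotides that did not form a full codon
}

# atggcttaag -> auggcuuaag -> "aug gcu uaa g" -> "M A - g" -> "M A - "
printf 'atggcttaag\n' | translate
```

The trailing g in the sample is the leftover nucleotide, which is why the last sed stage is needed.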

For different reading frames, insert sed "s/^.//g" or sed "s/^..//g" as an extra pipeline stage right after cat prokaryote.txt; to use a different DNA sequence, replace prokaryote.txt with that file's name. The following commands are written out in full and should return the correct output. Enter them in the terminal after navigating to nfs/home/dondi/xmlpipedb/data/.

+1
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+2
cat prokaryote.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+3 
cat prokaryote.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
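
The only difference between the three commands above is how many leading letters are removed. A minimal sketch of that step on its own, where shift_frame is a hypothetical helper name and the sequence is made up:

```shell
# shift_frame N: drop N-1 leading letters from each input line (N = 1, 2 or 3).
shift_frame() {
  case "$1" in
    1) cat ;;               # frame +1: keep the line as is
    2) sed "s/^.//" ;;      # frame +2: delete the first letter
    3) sed "s/^..//" ;;     # frame +3: delete the first two letters
    *) echo "frame must be 1, 2 or 3" >&2; return 1 ;;
  esac
}

# Prints atggcttaag, tggcttaag and ggcttaag on separate lines.
for frame in 1 2 3; do
  printf 'atggcttaag\n' | shift_frame "$frame"
done
```

Because ^ anchors the pattern to the start of the line, the substitution can match only once per line, so the g flag in the commands above is harmless but not required.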

For the negative frames, I complemented the DNA using sed "y/actg/tgac/" and then reversed it with rev, so that the complementary strand is translated from the proper end (5' --> 3').

-1
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
-2
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g" 
-3
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g" 
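
The first two stages of the negative-frame commands produce the reverse complement. A small sketch on a made-up sequence, assuming rev is installed (it usually ships with util-linux); reverse_complement is a hypothetical name:

```shell
# reverse_complement: complement every base, then reverse each line,
# so the complementary strand reads 5' --> 3'.
reverse_complement() {
  sed "y/actg/tgac/" | rev
}

# aacgt -> complement ttgca -> reversed acgtt
printf 'aacgt\n' | reverse_complement   # prints acgtt
```

Applying the function twice returns the original sequence, which is a quick sanity check before translating.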


Check Your Work

Fortunately, online tools are available for checking your work; we recommend the ExPASy Translate Tool, sponsored by the same people who run SwissProt. You’re free to use this tool to see if your text processing commands produce the same results.

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • Describe how you did this.
    • Based on where you find this occurrence, what kind of information does this pattern represent?
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
    • What information do you think this pattern represents?
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
    • What answer does grep + wc give you?
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
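
For the grep side of question 4, it may help to see what grep piped into wc actually counts. The sketch below uses three made-up lines rather than hs_ref_GRCh37_chr19.fa, and the exact grep invocations are assumptions, since the assignment does not spell them out. It says nothing about how Match counts; it only shows that grep followed by wc -l counts lines containing the pattern, while grep -o (supported by GNU and BSD grep) prints each occurrence on its own line.

```shell
# Three made-up lines: the first contains ATG twice, the second once, the third never.
sample='ATGATG
CCATGC
GGGGGG'

# Lines that contain ATG at least once (tr strips padding some wc versions add).
lines_with_match=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# Individual occurrences of ATG, one per output line thanks to -o.
total_matches=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines containing ATG: $lines_with_match"   # prints 2
echo "occurrences of ATG:   $total_matches"      # prints 3
```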


BIOL 367, Fall 2015, User Page, Team Page
