Difference between revisions of "Jwoodlee Week 3"

From LMU BioDB 2015
(edited wording)
(Reading Frames: fixed typo)
 
(17 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
ssh into my.cs.lmu.edu using your username, and enter your password.
 
ssh into my.cs.lmu.edu using your username, and enter your password.
  
Complement of a Strand:
+
===Complement of a Strand===
 +
 
 
locate the file in ~dondi/xmlpipedb/data, and enter the following command:
 
locate the file in ~dondi/xmlpipedb/data, and enter the following command:
 
  cat prokaryote.txt | sed “y/actg/tgac”
 
  cat prokaryote.txt | sed “y/actg/tgac”
 
This will yield prokaryote.txt’s complementary DNA strand.
 
This will yield prokaryote.txt’s complementary DNA strand.
  
Reading Frames:
+
===Reading Frames===
  
 
These sets of commands are more complicated than Complement of a Strand.  This is essentially what I had to accomplish:  
 
These sets of commands are more complicated than Complement of a Strand.  This is essentially what I had to accomplish:  
Line 18: Line 19:
 
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
 
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
  
For different reading frames, insert sed “s/^.//g” after prokaryote.txt, and of course in order to use a different DNA sequence prokaryote.txt would be different. The following commands will be written exactly and should return the correct output.  Enter these commands into terminal after navigating to nfs/home/dondi/xmlpipedb/data/.
+
For different reading frames, insert sed “s/^.//g” or "s/^..//g" after prokaryote.txt, and of course in order to use a different DNA sequence prokaryote.txt would be different. The following commands will be written exactly and should return the correct output.  Enter these commands into terminal after navigating to nfs/home/dondi/xmlpipedb/data/.
  
 
  +1
 
  +1
  cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g”
+
  cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
  
 
  +2
 
  +2
  cat prokaryote.txt | sed “s/^.//g” | sed "s/t/u/g" |sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+
  cat prokaryote.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
  
 
  +3  
 
  +3  
  cat prokaryote.txt | sed “s/^..//g” | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+
  cat prokaryote.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
  
In order to do these frames, I transcribed the DNA using sed "y///" and then reversed them in order to translate them from the proper side.
+
In order to do these frames, I transcribed the DNA using sed "y///" and then reversed them in order to translate them from the proper side. (5' --> 3')
  
 
  -1
 
  -1
Line 36: Line 37:
 
  -2
 
  -2
 
  cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"  
 
  cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"  
+
 
 
  -3
 
  -3
 
  cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"  
 
  cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"  
  
  
 +
Checked with Expasy translation tool.
  
 
+
For the XMLPipeDB utility I used the wiki provided on the course website. The first command I found on the wiki after scrolling down to the "Running Command-Line Java Programs" section. I entered the commands into the command prompt window under the directory, ~dondi/xmlpipedb/data, this allowed me to use the XMLPipeDB utility.
 
+
 
+
 
+
 
+
 
+
 
+
 
+
 
+
 
+
 
+
==== Complement of a Strand ====
+
 
+
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:
+
 
+
    cat ''sequence_file'' | '''?????'''
+
 
+
For example, if ''sequence_file'' contains:
+
 
+
    agcggtatac
+
 
+
Then your text processing commands should display:
+
 
+
    tcgccatatg
+
 
+
==== Reading Frames ====
+
 
+
Write ''6'' sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:
+
 
+
    cat ''sequence_file'' | '''?????'''
+
 
+
You should have 6 different sets of commands, one for each possible reading frame. For example, if ''sequence_file'' contains:
+
 
+
    agcggtatac
+
 
+
Then your text processing commands for 5’-3’ frame 1 should display:
+
 
+
    SGI
+
 
+
Your text processing commands for 5’-3’ frame 3 should display:
+
 
+
    RY
+
 
+
...and so on.
+
 
+
* '''Hint 1:''' The 6 sets of commands are very similar to each other.
+
* '''Hint 2:''' Under the ''~dondi/xmlpipedb/data'' directory in the Keck lab, you will find a file called ''genetic-code.sed''. To save you some typing, this file has already been prepared with the correct sequence of '''sed''' commands for converting any base triplets into the corresponding amino acid.  For example, this line in that file: <pre>s/ugc/C/g</pre> ...corresponds to a uracil-guanine-cytosine sequence transcribing to the cysteine amino acid (C).  The trick is to figure out how to use this file to your advantage, in the commands that you'll be forming.
+
 
+
==== Check Your Work ====
+
 
+
Fortunately, online tools are available for checking your work; we recommend the ExPASy Translate Tool, sponsored by the same people who run SwissProt. You’re free to use this tool to see if your text processing commands produce the same results.
+
  
 
=== XMLPipeDB Match Practice ===
 
=== XMLPipeDB Match Practice ===
Line 99: Line 51:
  
 
# What Match command tallies the occurrences of the pattern <code>GO:000[567]</code> in the ''493.P_falciparum.xml'' file?
 
# What Match command tallies the occurrences of the pattern <code>GO:000[567]</code> in the ''493.P_falciparum.xml'' file?
 +
#*Using the wiki page I found this command and ran it in ~dondi/xmlpipedb/data
 +
#*java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
 
#* How many unique matches are there?
 
#* How many unique matches are there?
 +
#*I read the output of the function, and wrote it here.
 +
#**Total Unique Matches: 3
 
#* How many times does each unique match appear?
 
#* How many times does each unique match appear?
 +
#**More reading of output brought me to this spot, and I wrote it down after clicking on "edit" on this wiki page.
 +
#**go:0007: 113
 +
#**go:0006: 1100
 +
#**go:0005: 1371
 
# Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
 
# Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
 +
#*One such occurrence: <dbReference type="GO" id="GO:0005777">
 
#* Describe how you did this.
 
#* Describe how you did this.
 +
#**I entered grep "GO:000[567]" 493.P_falciparum.xml | more, as a command and then picked out a random occurrence. I then edited the wiki page and wrote it down.
 
#* Based on where you find this occurrence, what kind of information does this pattern represent?
 
#* Based on where you find this occurrence, what kind of information does this pattern represent?
 +
#**The ID of the gene ontology within a database, or an identifier of a gene ontology term.  I found this out on the match utility wiki page.
 
# What Match command tallies the occurrences of the pattern <code>\"Yu.*\"</code> in the ''493.P_falciparum.xml'' file?
 
# What Match command tallies the occurrences of the pattern <code>\"Yu.*\"</code> in the ''493.P_falciparum.xml'' file?
 +
#* Entered this command into terminal, have not changed directories.
 +
#*java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
 
#* How many unique matches are there?
 
#* How many unique matches are there?
 +
#** Read output and jotted it down here.
 +
#**3
 
#* How many times does each unique match appear?
 
#* How many times does each unique match appear?
 +
#**"yu b.": 1
 +
#**"yu k.": 228
 +
#**"yu m.": 1
 
#* What information do you think this pattern represents?
 
#* What information do you think this pattern represents?
 +
#** I used grep on the same pattern to try to figure this out and based on what I found, I would say it is somebody's name.
 
# Use Match to count the occurrences of the pattern <code>ATG</code> in the ''hs_ref_GRCh37_chr19.fa'' file (this may take a while).  Then, use '''grep''' and '''wc''' to do the same thing.
 
# Use Match to count the occurrences of the pattern <code>ATG</code> in the ''hs_ref_GRCh37_chr19.fa'' file (this may take a while).  Then, use '''grep''' and '''wc''' to do the same thing.
 
#* What answer does Match give you?
 
#* What answer does Match give you?
 +
#**atg: 830101
 +
#**Total unique matches: 1
 
#* What answer does '''grep''' + '''wc''' give you?
 
#* What answer does '''grep''' + '''wc''' give you?
 +
#** <code> 502410  502410  35671048 </code>  from left to right: lines, words, bytes
 
#* Explain why the counts are different. (''Hint:'' Make sure you understand what exactly is being counted by each approach.)
 
#* Explain why the counts are different. (''Hint:'' Make sure you understand what exactly is being counted by each approach.)
 +
#**grep + wc counts the lines with at least one occurrence of "ATG" while the Match utility counts each individual instance of "ATG".  Therefore grep+wc gets a lower number for word count.
  
  
 
{{Template:Jwoodlee}}
 
{{Template:Jwoodlee}}

Latest revision as of 23:25, 21 September 2015

Electronic Lab Notebook

Log in to my.cs.lmu.edu over ssh using your username, and enter your password when prompted.

Complement of a Strand

Locate prokaryote.txt in ~dondi/xmlpipedb/data, and enter the following command:

cat prokaryote.txt | sed "y/actg/tgac/"

This will yield prokaryote.txt’s complementary DNA strand.
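
As a quick sanity check of the complement step, here is a minimal sketch that applies the sed y (transliterate) command to a short ten-base sequence instead of prokaryote.txt. The function name complement is only illustrative and is not part of the course files.

```shell
# Complement a lowercase DNA sequence passed as the first argument.
# "complement" is a hypothetical helper name used only for this sketch.
complement() {
    # y/actg/tgac/ maps a->t, c->g, t->a, g->c, one character at a time.
    printf '%s\n' "$1" | sed "y/actg/tgac/"
}

complement agcggtatac   # prints tcgccatatg
```

Note that the y command needs all three slashes, and the two character lists must be the same length.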

Reading Frames

These sets of commands are more complicated than Complement of a Strand. This is essentially what I had to accomplish:

Take the sequence file, replace the t's with u's, break the sequence into groups of 3, use genetic-code.sed as the translation "chart", and then eliminate any leftover nucleotides. For the different reading frames, I just delete the first one or two nucleotides.

After lots of googling I came up with this basic outline in terminal:

cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
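
To see what the first two stages of this outline produce before genetic-code.sed is applied, the sketch below runs them on a short sample sequence. The variable names seq, rna and codons are my own and only illustrative.

```shell
# Sample input; in the notebook the input comes from prokaryote.txt.
seq="agcggtatac"

# Stage 1: replace every t with u (DNA letters to RNA letters).
rna=$(printf '%s\n' "$seq" | sed "s/t/u/g")

# Stage 2: put a space after every complete group of three letters.
# "&" stands for whatever the pattern "..." matched.
codons=$(printf '%s\n' "$rna" | sed "s/.../& /g")

echo "$rna"      # prints agcgguauac
echo "$codons"   # prints agc ggu aua c
```

The trailing c is an incomplete codon, which is why a later step has to delete any lowercase letters that are left over.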

For the other reading frames, insert sed "s/^.//g" or sed "s/^..//g" right after cat prokaryote.txt; to translate a different DNA sequence, substitute its file name for prokaryote.txt. The commands below are written out in full and should return the correct output. Enter them in the terminal after navigating to nfs/home/dondi/xmlpipedb/data/.

+1
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+2
cat prokaryote.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+3 
cat prokaryote.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
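
The three forward frames differ only in how many leading bases are dropped, so they can be sketched as one function. This is a minimal sketch under stated assumptions: the temporary table below is a hypothetical stand-in for genetic-code.sed that holds only the codons needed for the sample sequence, translate_frame is an illustrative name, cut -c is used here in place of sed "s/^.//g" to drop the leading bases, and the last step also deletes spaces so the amino acids print without gaps.

```shell
# Hypothetical stand-in for genetic-code.sed (the real file covers all codons).
table=$(mktemp)
trap 'rm -f "$table"' EXIT
cat > "$table" <<'EOF'
s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g
EOF

# translate_frame SEQ N: translate SEQ in forward frame N (1, 2 or 3).
# cut -c "N-" keeps everything from character N onward, i.e. drops N-1 bases.
translate_frame() {
    printf '%s\n' "$1" |
        cut -c "$2-" |
        sed "s/t/u/g" |
        sed "s/.../& /g" |
        sed -f "$table" |
        sed "s/[acug ]//g"
}

translate_frame agcggtatac 1   # prints SGI
translate_frame agcggtatac 2   # prints AVY
translate_frame agcggtatac 3   # prints RY
```

With the real genetic-code.sed, the same function would work by pointing sed -f at that file instead of the temporary table.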

For the reverse (negative) frames, I first took the complement of the DNA using sed "y/actg/tgac/" and then reversed it with rev, so that the complementary strand is translated from the proper side (5' --> 3').

-1
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
-2
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g" 
-3
cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g" 
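
The reverse-strand preparation can be checked on a short sample before translating. The sketch below is only illustrative: revcomp and reverse_codons are hypothetical helper names, cut -c stands in for the sed "s/^.//g" step, and the translation with genetic-code.sed is left out so that the codon grouping for each negative frame is visible.

```shell
# Reverse complement: complement each base, then reverse the whole line.
revcomp() {
    printf '%s\n' "$1" | sed "y/actg/tgac/" | rev
}

# reverse_codons SEQ SKIP: codon groups on the reverse strand after
# dropping SKIP leading bases (0 for frame -1, 1 for -2, 2 for -3).
reverse_codons() {
    revcomp "$1" |
        cut -c "$(( $2 + 1 ))-" |
        sed "s/t/u/g" |
        sed "s/.../& /g"
}

revcomp agcggtatac            # prints gtataccgct
reverse_codons agcggtatac 0   # prints gua uac cgc u
reverse_codons agcggtatac 2   # prints aua ccg cu
```

Piping the output of reverse_codons through sed -f genetic-code.sed and then sed "s/[acug]//g" would complete the translation, as in the commands above.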


I checked the results with the ExPASy Translate tool.

For the XMLPipeDB Match utility, I used the wiki provided on the course website. I found the first command on the wiki after scrolling down to the "Running Command-Line Java Programs" section. I entered the commands at the command prompt in the ~dondi/xmlpipedb/data directory, which allowed me to use the XMLPipeDB utility.

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • Using the wiki page I found this command and ran it in ~dondi/xmlpipedb/data
    • java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
    • How many unique matches are there?
    • I read the output of the command and wrote it here.
      • Total Unique Matches: 3
    • How many times does each unique match appear?
      • More reading of output brought me to this spot, and I wrote it down after clicking on "edit" on this wiki page.
      • go:0007: 113
      • go:0006: 1100
      • go:0005: 1371
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • One such occurrence: <dbReference type="GO" id="GO:0005777">
    • Describe how you did this.
      • I ran the command grep "GO:000[567]" 493.P_falciparum.xml | more and then picked out an occurrence at random. I then edited the wiki page and wrote it down.
    • Based on where you find this occurrence, what kind of information does this pattern represent?
      • The ID of the gene ontology within a database, or an identifier of a gene ontology term. I found this out on the match utility wiki page.
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • I entered this command in the terminal without changing directories.
    • java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
    • How many unique matches are there?
      • Read output and jotted it down here.
      • 3
    • How many times does each unique match appear?
      • "yu b.": 1
      • "yu k.": 228
      • "yu m.": 1
    • What information do you think this pattern represents?
      • I used grep on the same pattern to try to figure this out and based on what I found, I would say it is somebody's name.
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
      • atg: 830101
      • Total unique matches: 1
    • What answer does grep + wc give you?
      • 502410 502410 35671048 from left to right: lines, words, bytes
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
      • grep + wc counts the lines that contain at least one occurrence of "ATG", while the Match utility counts each individual instance of "ATG". A line with several matches is therefore counted only once by grep + wc, so its line count (502410) is lower than the Match total (830101).
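
The difference between counting lines and counting occurrences can be reproduced on a tiny made-up file. This is a sketch under stated assumptions: the sample data is invented, and the -o option (print each match on its own line) is a widely available grep extension rather than something the assignment prescribes.

```shell
# Invented three-line sample: line 1 has two ATGs, line 3 has one.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'ATGCATGA\nCCCC\nGGATGG\n' > "$sample"

# Lines containing at least one ATG (what grep piped into wc measures).
lines=$(grep "ATG" "$sample" | wc -l | tr -d ' ')

# Individual ATG occurrences: -o prints every match on its own line.
matches=$(grep -o "ATG" "$sample" | wc -l | tr -d ' ')

echo "lines with ATG: $lines"      # prints lines with ATG: 2
echo "ATG occurrences: $matches"   # prints ATG occurrences: 3
```

Neither grep form can see a match that is split across a line break, which is another way a line-based count can differ from a count over the whole sequence.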


BIOL 367, Fall 2015, User Page, Team Page
