Difference between revisions of "Jwoodlee Week 3"
Latest revision as of 23:25, 21 September 2015
Electronic Lab Notebook
ssh into my.cs.lmu.edu using your username, and enter your password.
Complement of a Strand
Locate the file in ~dondi/xmlpipedb/data, and enter the following command:
cat prokaryote.txt | sed "y/actg/tgac/"
This will yield prokaryote.txt’s complementary DNA strand.
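As a quick sanity check, the same sed "y" command can be tried on a short made-up strand before running it on the real file. This is only a sketch: the sample sequence and the function name complement are illustrative assumptions, not part of the assignment.

```shell
# Hypothetical helper: swaps each base for its complement (a<->t, c<->g).
complement() {
  sed "y/actg/tgac/"
}

# Made-up sample strand standing in for the contents of prokaryote.txt.
echo "atgcctga" | complement
# prints: tacggact
```

Note that this gives the complement read in the same left-to-right order as the input; it does not reverse the strand.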
Reading Frames
These commands are more complicated than the ones for Complement of a Strand. This is essentially what I had to accomplish:
Take the sequence file, replace the t's with u's, break the sequence up into groups of 3, use genetic-code.sed as the translation "chart", and then eliminate any leftover nucleotides. For the other reading frames, I just delete the first one or two nucleotides.
After lots of googling I came up with this basic outline in terminal:
cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
For different reading frames, insert sed "s/^.//g" or sed "s/^..//g" after prokaryote.txt. To use a different DNA sequence, replace prokaryote.txt with that file's name. The following commands are written out exactly and should return the correct output. Enter these commands into terminal after navigating to nfs/home/dondi/xmlpipedb/data/.
+1: cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+2: cat prokaryote.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
+3: cat prokaryote.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
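To see what each stage of the forward-frame pipeline does, it can help to run it on a tiny made-up sequence. The sketch below does not use the real genetic-code.sed; it substitutes a three-codon stand-in, on the assumption that the real file maps lowercase RNA codons to uppercase one-letter amino acid codes. The function name translate and the sample sequences are also illustrative.

```shell
# Hypothetical stand-in for "sed -f genetic-code.sed": only three codons.
translate() {
  sed "s/t/u/g" |       # DNA to RNA: t becomes u
    sed "s/.../& /g" |  # split into groups of 3, each followed by a space
    sed -e "s/aug/M/g" -e "s/ccu/P/g" -e "s/uga/-/g" |
    sed "s/[acug]//g"   # drop leftover, untranslated nucleotides
}

# Frame +1: the trailing "c" is left over and removed by the last step.
echo "atgccttgac" | translate

# Frame +2: drop the first nucleotide, then translate as before.
echo "catgccttga" | sed "s/^.//" | translate
```

Both commands print M P - followed by a trailing space. The last sed step only works because the amino acid letters are uppercase while the nucleotides are lowercase.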
For the reverse (-) frames, I complemented the DNA using sed "y/actg/tgac/" and then reversed it with rev, so that the complementary strand is translated from the proper side (5' --> 3').
-1: cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
-2: cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
-3: cat prokaryote.txt | sed "y/actg/tgac/" | rev | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
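The complement-then-reverse step used in the three commands above can be checked on its own with a short made-up strand. This is a minimal sketch: the function name revcomp and the sample sequence are assumptions for illustration, and rev is assumed to be installed (it ships with util-linux and with macOS).

```shell
# Hypothetical helper: complement each base, then reverse the line,
# giving the opposite strand written 5' to 3'.
revcomp() {
  sed "y/actg/tgac/" | rev
}

# Made-up sample strand; the real input would be prokaryote.txt.
echo "tcaaggcat" | revcomp
# prints: atgccttga
```

Because rev reverses each line separately, this approach assumes the whole sequence sits on a single line.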
I checked the results with the ExPASy Translate tool.
For the XMLPipeDB Match utility, I used the wiki provided on the course website. I found the first command on the wiki after scrolling down to the "Running Command-Line Java Programs" section. I entered the commands into the command prompt window from the ~dondi/xmlpipedb/data directory, which allowed me to use the utility.
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- Using the wiki page I found this command and ran it in ~dondi/xmlpipedb/data:
- java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
- How many unique matches are there?
- I read the output of the command and wrote it here.
- Total Unique Matches: 3
- How many times does each unique match appear?
- More reading of output brought me to this spot, and I wrote it down after clicking on "edit" on this wiki page.
- go:0007: 113
- go:0006: 1100
- go:0005: 1371
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- One such occurrence: <dbReference type="GO" id="GO:0005777">
- Describe how you did this.
- I entered grep "GO:000[567]" 493.P_falciparum.xml | more as a command and then picked out a random occurrence. I then edited the wiki page and wrote it down.
- Based on where you find this occurrence, what kind of information does this pattern represent?
- The ID of the gene ontology within a database, or an identifier of a gene ontology term. I found this out on the match utility wiki page.
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- I entered this command into terminal without changing directories:
- java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
- How many unique matches are there?
- Read output and jotted it down here.
- 3
- How many times does each unique match appear?
- "yu b.": 1
- "yu k.": 228
- "yu m.": 1
- What information do you think this pattern represents?
- I used grep on the same pattern to try to figure this out and based on what I found, I would say it is somebody's name.
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
- atg: 830101
- Total unique matches: 1
- What answer does grep + wc give you?
- 502410 502410 35671048 (from left to right: lines, words, bytes)
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
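- grep + wc counts the lines with at least one occurrence of "ATG", while the Match utility counts each individual instance of "ATG". A line that contains several instances is counted only once by grep + wc, so it reports the lower number (502410 matching lines versus 830101 individual matches).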
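The difference between counting matching lines and counting individual matches can be reproduced on a tiny made-up sample. This sketch does not use the Match utility; instead it uses grep -o (supported by GNU and BSD grep, though not required by POSIX) to print each match on its own line, as a stand-in for a per-occurrence count. The three-line sample is an assumption for illustration.

```shell
# Made-up three-line sample standing in for hs_ref_GRCh37_chr19.fa.
sample=$(printf 'ATGCCATGA\nCCCGGG\nTTATGATGATG\n')

# grep + wc: number of lines containing at least one ATG.
lines=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# grep -o + wc: every individual ATG, one per output line.
matches=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines: $lines, matches: $matches"
# prints: lines: 2, matches: 5
```

The first and third lines contain two and three occurrences respectively, so two matching lines hold five matches in total.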
BIOL 367, Fall 2015, User Page, Team Page