Difference between revisions of "Nanguiano Week 3"
(→XMLPipeDB Match Practice: added answer to question 1)
(→XMLPipeDB Match Practice: Answered question 2.)
Revision as of 22:47, 15 September 2015
The Genetic Code, by Computer
Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.
To prepare for this assignment, I ran the following commands:
ssh my.cs.lmu.edu -l nanguia1
mkdir biodb
cat >"sequence_file.txt"
agcggtatac
mkdir biodb/week3
mv sequence_file.txt biodb/week3
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~nanguia1/biodb/week3
cp xmlpipedb-match-1.1.1.jar ~nanguia1/biodb/week3
cp 493.P_falciparum.xml ~nanguia1/biodb/week3
cp hs_ref_GRCh37_chr19.fa ~nanguia1/biodb/week3
cd ~nanguia1/biodb/week3
Complement of a Strand
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.
On a sequence_file.txt file containing the sequence "agcggtatac", the command and output were as follows:
cat sequence_file.txt | sed "y/atgc/tacg/"
tcgccatatg
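One quick sanity check on the complement step is that complementing a sequence twice should return the original. The sketch below wraps the same sed expression in a shell function; the name complement is a hypothetical helper, not part of the assignment, and it assumes lowercase input.

```shell
# complement: hypothetical helper wrapping the sed command used above.
# Reads a lowercase DNA sequence on standard input and prints its complement.
complement() {
  sed "y/atgc/tacg/"
}

# Complementing once gives the complementary strand.
echo "agcggtatac" | complement

# Complementing twice should give back the original sequence.
echo "agcggtatac" | complement | complement
```

The first command should print tcgccatatg and the second agcggtatac.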
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. You should have 6 different sets of commands, one for each possible reading frame.
On a sequence_file.txt containing the sequence "agcggtatac", the commands and output were as follows:
+1
cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
SGI
+2
cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
AVY
+3
cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
RY
The remaining three commands were split across two lines on the wiki because they could not fit on one line without causing display problems. Each command was actually entered as a single line.
-1
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
VYR
-2
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
YTA
-3
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
IP
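To see what each of the six pipelines feeds into genetic-code.sed, it can help to print just the codons for every frame. The following is a minimal sketch under a few assumptions: codons is a hypothetical helper name, the input is lowercase, and the translation step is left out so that the example does not depend on genetic-code.sed.

```shell
# codons OFFSET: hypothetical helper. Reads a lowercase sequence on stdin,
# drops OFFSET leading bases (0, 1, or 2), converts t to u, and prints the
# result split into space-separated groups of three.
codons() {
  sed "s/^.\{$1\}//" | sed "s/t/u/g" | sed "s/.../& /g" | sed "s/ *$//"
}

dna="agcggtatac"

# Forward frames +1, +2, +3: read the sequence as given.
for offset in 0 1 2; do
  echo "+$((offset + 1)): $(echo "$dna" | codons "$offset")"
done

# Reverse frames -1, -2, -3: complement, then reverse, then split.
for offset in 0 1 2; do
  echo "-$((offset + 1)): $(echo "$dna" | sed "y/acgt/tgca/" | rev | codons "$offset")"
done
```

For "agcggtatac" the +1 frame should come out as agc ggu aua c and the -1 frame as gua uac cgc u. The leftover bases at the end are the incomplete codons that the final sed "s/[acgu]//g" step removes in the commands above.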
Check Your Work
Using the ExPASy Translate Tool, I entered my sample DNA sequence, "agcggtatac". The result was as follows:
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
Note: I used this wiki page to learn about the match utility.
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
  - java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
  - How many unique matches are there?
    - 3
  - How many times does each unique match appear?
    - GO:0007 : 113
    - GO:0006 : 1100
    - GO:0005 : 1371
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
  - One example was: <dbReference type="GO" id="GO:0005622">
  - Describe how you did this.
    - grep "GO:000[567]" 493.P_falciparum.xml | more
  - Based on where you find this occurrence, what kind of information does this pattern represent?
    - Based on where I found it, this pattern shows the gene ontology ID of a particular gene in the database.
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
  - java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
  - How many unique matches are there?
    - 3
  - How many times does each unique match appear?
    - "Yu b." : 1
    - "Yu k." : 228
    - "Yu m." : 1
  - What information do you think this pattern represents?
    - I believe this pattern represents a name.
    - This was confirmed by running the command grep "Yu.*" 493.P_falciparum.xml
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
  - What answer does Match give you?
  - What answer does grep + wc give you?
  - Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
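The per-match tallies in the questions above can be approximated with standard tools. The sketch below is not the Match utility itself: tally is a hypothetical helper built on grep -o, sort, and uniq, and the sample lines are invented input for illustration rather than content copied from 493.P_falciparum.xml.

```shell
# tally PATTERN: hypothetical helper. Prints each distinct match of PATTERN
# found on standard input, followed by the number of times it occurs.
tally() {
  grep -o -- "$1" | sort | uniq -c |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " count }'
}

# Invented sample input; the third line contains two matches.
sample='<dbReference type="GO" id="GO:0005622">
<dbReference type="GO" id="GO:0006412">
<dbReference type="GO" id="GO:0005840"> <dbReference type="GO" id="GO:0007165">
<dbReference type="GO" id="GO:0016020">'

# Count every occurrence, grouped by the matched text.
printf '%s\n' "$sample" | tally 'GO:000[567]'

# Count the lines that contain at least one match.
printf '%s\n' "$sample" | grep -c 'GO:000[567]'
```

On this sample, tally reports four matches in total (GO:0005 twice, GO:0006 once, GO:0007 once), while grep -c reports 3, because grep counts matching lines and a line with two matches is counted once. This may be worth keeping in mind when comparing a per-match count with a grep and wc line count.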
 
 
Links
 Nicole Anguiano
 BIOL 367, Fall 2015
