Ajvree Week 3

From LMU BioDB 2013

Latest revision as of 16:11, 17 September 2013

Week 3 Individual Assignment

Notes:

sed review
& = "repeat what you found" (the replacement reuses the text the pattern matched), e.g. /Wisconsin is still better than &/
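A quick way to see what & does is to run a substitution on a short string. This is a minimal sketch; the input strings (including "Michigan") are made up for illustration:

```shell
# & in the replacement is replaced by whatever text the pattern matched.
echo "gattaca" | sed "s/.../& /g"                                  # prints "gat tac a"
echo "atg" | sed "s/atg/[&]/"                                      # prints "[atg]"
echo "Michigan" | sed "s/.*/Wisconsin is still better than &/"     # hypothetical input
```

The first line is the same triplet-splitting trick used for the reading frames: every run of three characters is replaced by itself plus a space.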

Shortcuts:

  • cd to change directories, ls to list the files in a directory
  • up and down arrows to step through command history, or type history and then !number to rerun that command
  • CTRL-R for reverse search: type part of a past command to recall it
  • Tab to autocomplete file names
  • grep: text finder that searches a file for a pattern, e.g. grep "ACTG" filename
  • grep is case sensitive
  • . = "wildcard" (any one character), so "A......T" finds an A followed by a T with exactly six characters in between
  • beginning of line: ^, e.g. "^A......T"
  • end of line: $, e.g. "A......T$"
  • command | command: a pipe sends the output of the first command into the second
  • pipe a previous command into wc (e.g. grep "ACTG" filename | wc) to count its output
  • wc: word count
  • typed on its own, wc reads the lines you enter until CTRL-D, then prints # lines, # words, # characters
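The grep patterns and the wc pipe from the list above can be tried on a small made-up file. The three sequences here are hypothetical, chosen so each pattern matches a different line:

```shell
# Hypothetical three-line sample file, made up for illustration.
sample=$(mktemp)
printf 'ATTTTTTTGG\nGGACCCCCCT\nGATTACA\n' > "$sample"

grep "A......T" "$sample"        # A, any six characters, T: matches lines 1 and 2
grep "^A......T" "$sample"       # anchored to the start of a line: line 1 only
grep "A......T$" "$sample"       # anchored to the end of a line: line 2 only
grep "a......t" "$sample"        # grep is case sensitive: prints nothing here
grep "A......T" "$sample" | wc   # 2 lines, 2 words, 22 characters
```

Line 3 (GATTACA) never matches because it is too short to hold an A and a T seven positions apart.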

To use XMLPipeDB Match, enter java -jar xmlpipedb-match-1.1.1.jar followed by the pattern in quotes; to give it a file, put a < sign in front of the file name:
java -jar xmlpipedb-match-1.1.1.jar "A......T" < hs_ref_GRCh37_chr19.fa
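The < in the Match command is shell input redirection: the shell opens the file and hands its contents to the program on standard input, so the program never sees the file name. The notes' usage suggests this is how Match takes its input. The effect is easy to see with wc on a made-up file:

```shell
# Hypothetical three-line file, for illustration only.
sample=$(mktemp)
printf 'line one\nline two\nline three\n' > "$sample"

wc -l "$sample"     # wc opens the file itself: prints the count and the file name
wc -l < "$sample"   # the shell feeds the file to standard input: prints only the count
```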

1) "What Match command..."
-2 unique matches
-2,1
-what does info represent?

2) double quote within a double quote: escape it with a backslash, "\"James.*\""; asterisk (*) = zero or more of the preceding character
-unique 2
-2,1
-what info?
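To check what the escaped pattern actually turns into, and how .* behaves, here is a small sketch. The XML line is made up for illustration:

```shell
# The shell strips the outer quotes and the backslashes, so the program
# receives the pattern "James.*" with its double quotes intact.
printf '%s\n' "\"James.*\""

# .* is greedy: it runs to the LAST double quote on the line, not the next one.
echo '<person name="James Smith" role="author"/>' | grep -o "\"James.*\""
```

The second command prints "James Smith" role="author", not just "James Smith", which is worth keeping in mind when reading the unique matches for this pattern.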

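Reading frames:
-break into triplets: s/.../& /g (& followed by a space)
-change t to u: sed "s/t/u/g"
-convert into the genetic code: sed -f genetic-code.sed, a file of rules such as s/cgu/R/g and s/aug/M/g (USE -f, lowercase: it tells sed to read its commands from a file)
-frames +2 and +3: first drop 1 or 2 characters, s/^.//g or s/^..//g
-negative frames (read 3' to 5'): reverse the sequence with rev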
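As a small illustration of the -f flag and of the reverse-complement step, here is a sketch that uses a hypothetical two-rule file standing in for genetic-code.sed (whose full contents are not shown in these notes):

```shell
# Hypothetical two-rule script standing in for genetic-code.sed.
rules=$(mktemp)
printf '%s\n' 's/aug/M/g' 's/cgu/R/g' > "$rules"

# Split into triplets, change t to u, then apply every rule in the file (-f).
echo "atgcgt" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$rules"   # prints "M R "

# Negative frames start from the reverse complement: rev reverses the line,
# and y/atgc/tacg/ swaps each base for its complement.
echo "atgcgt" | rev | sed "y/atgc/tacg/"                             # prints "acgcat"
```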



Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:

+1:
cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+2:
cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+3:
cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-1:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-2:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-3:
rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
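The six pipelines can be wrapped into one script that prints every frame for a single sequence. This is a sketch under stated assumptions, not the course's solution: it writes its own standard codon table instead of using genetic-code.sed, marks stop codons with *, assumes a one-line lowercase sequence, and uses cut -c to drop the first 0, 1 or 2 bases (the same job as s/^.// and s/^..//). The names translate and six_frames are hypothetical.

```shell
#!/bin/sh
# ASSUMPTIONS: input is one line of lowercase DNA (a, c, g, t); the table
# below stands in for the course's genetic-code.sed; "*" marks a stop codon.
codon_table=$(mktemp)
trap 'rm -f "$codon_table"' EXIT
cat > "$codon_table" <<'EOF'
s/uuu/F/g;s/uuc/F/g;s/uua/L/g;s/uug/L/g;s/ucu/S/g;s/ucc/S/g;s/uca/S/g;s/ucg/S/g
s/uau/Y/g;s/uac/Y/g;s/uaa/*/g;s/uag/*/g;s/ugu/C/g;s/ugc/C/g;s/uga/*/g;s/ugg/W/g
s/cuu/L/g;s/cuc/L/g;s/cua/L/g;s/cug/L/g;s/ccu/P/g;s/ccc/P/g;s/cca/P/g;s/ccg/P/g
s/cau/H/g;s/cac/H/g;s/caa/Q/g;s/cag/Q/g;s/cgu/R/g;s/cgc/R/g;s/cga/R/g;s/cgg/R/g
s/auu/I/g;s/auc/I/g;s/aua/I/g;s/aug/M/g;s/acu/T/g;s/acc/T/g;s/aca/T/g;s/acg/T/g
s/aau/N/g;s/aac/N/g;s/aaa/K/g;s/aag/K/g;s/agu/S/g;s/agc/S/g;s/aga/R/g;s/agg/R/g
s/guu/V/g;s/guc/V/g;s/gua/V/g;s/gug/V/g;s/gcu/A/g;s/gcc/A/g;s/gca/A/g;s/gcg/A/g
s/gau/D/g;s/gac/D/g;s/gaa/E/g;s/gag/E/g;s/ggu/G/g;s/ggc/G/g;s/gga/G/g;s/ggg/G/g
EOF

# Translate stdin: drop the first $1 bases (0, 1 or 2), split into triplets,
# change t to u, then replace each codon with its one-letter amino acid.
translate() {
  cut -c "$(( $1 + 1 ))"- | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$codon_table"
}

# Print frames +1..+3 from the sequence itself and -1..-3 from its
# reverse complement (rev, then y/atgc/tacg/).
six_frames() {
  for n in 0 1 2; do
    printf '%s: ' "+$((n + 1))"
    printf '%s\n' "$1" | translate "$n"
  done
  for n in 0 1 2; do
    printf '%s: ' "-$((n + 1))"
    printf '%s\n' "$1" | rev | sed "y/atgc/tacg/" | translate "$n"
  done
}

six_frames atggcctaa
```

For atggcctaa, frame +1 reads M A * (Met, Ala, stop) and frame -1 reads L G H. Leftover bases that do not fill a whole codon stay lowercase at the end of the line, which makes it easy to see where each frame runs out.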


XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

1. What Match command tallies the occurrences of the pattern GO:000916. in the 493.P_falciparum.xml file?
java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml
How many unique matches are there?
2
How many times does each unique match appear?
2 and 1
What information do you think the pattern GO:000916. represents?
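I'm not entirely sure, but it looks like an identification tag: GO: is the prefix of Gene Ontology term IDs, so GO:000916. probably picks out the Gene Ontology terms (GO:0009160 through GO:0009169) annotated to proteins in the file.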
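Match reports a tally: how many distinct matches there are and how often each occurs. The same kind of tally can be approximated with standard tools. This is a swapped-in technique, not how Match itself works, and the sample lines are made up, loosely modeled on XML database cross-references:

```shell
# Made-up sample lines (hypothetical data, for illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<dbReference type="GO" id="GO:0009165"/>
<dbReference type="GO" id="GO:0009165"/>
<dbReference type="GO" id="GO:0009161"/>
<dbReference type="GO" id="GO:0005737"/>
EOF

# grep -o prints each match on its own line; sort groups identical matches
# together; uniq -c prefixes each distinct match with how many times it occurs.
grep -o "GO:000916." "$sample" | sort | uniq -c
```

Here the tally shows two unique matches, GO:0009161 once and GO:0009165 twice; GO:0005737 does not fit the pattern.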

2. What Match command tallies the occurrences of the pattern \"James.*\" in the 493.P_falciparum.xml file?
java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml
How many unique matches are there?
2
How many times does each unique match appear?
8231 and 1
What information do you think the pattern \"James.*\" represents?
It probably represents a reference to a person's name listed in the database.
3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
What answer does Match give you?
830101
What answer does grep/wc give you?
502410
Do the answers make sense? Explain your response.
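At first the answers don't seem to make sense, since the two values are very different. The difference comes from what each tool counts: grep prints each matching line only once, so piping it into wc counts the lines that contain ATG, not the individual ATGs, while Match tallies every occurrence. That is why the grep/wc number is the smaller one.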
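The line-versus-occurrence difference can be reproduced on a tiny made-up file. This sketch only shows grep's behavior; it does not reproduce Match:

```shell
# Hypothetical three-line sample; real FASTA files usually wrap a long
# sequence over many short lines.
sample=$(mktemp)
printf 'ATGATG\nCCATGC\nGGGGGG\n' > "$sample"

grep "ATG" "$sample" | wc -l      # 2: each matching line is printed once
grep -o "ATG" "$sample" | wc -l   # 3: -o prints every occurrence on its own line
```

Because grep works line by line, it also never sees an ATG that is split across a line break in the FASTA file.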

Ajvree (talk) 08:48, 12 September 2013 (PDT)
User Page Week 3
