Ajvree Week 3
Week 3 Individual Assignment
Notes:
sed review
& = "repeat what you found": in the replacement part of a sed substitution, & stands for the text the pattern matched, e.g. /Wisconsin is still better than &/
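A quick way to see & in action, as a minimal sketch. The input word Minnesota and the full s/.../.../ command are made up for illustration; only the replacement text comes from the note above.

```shell
# & in the replacement is filled in with whatever the pattern matched.
# "Minnesota" is a made-up input, not the class example.
result=$(echo "Minnesota" | sed "s/Minnesota/Wisconsin is still better than &/")
echo "$result"
```

Here the pattern matches Minnesota, so & becomes Minnesota in the output.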
Shortcuts-
- cd to change directories, ls to list the files in a directory
- up and down arrows to view command history, or type history, !number to redo that command
- CTRL R for reverse search: type part of a past command and it will be recalled
- tab to fill in file name
- grep- text finder - looks for a pattern in a file: grep "ACTG" filename
- grep is case sensitive
- A followed by T with multiple things in between:
- . = "wildcard" "A......T"
- indicate beginning of line: ^ "^A......T"
- end of line: $ "A......T$"
- pipe a command into wc (previous command | wc) to count the lines, words, and characters in its output
- command | command
- wc- word count
- with no file given, enter lines, then CTRL D
- output: # lines, # words, # characters
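The grep wildcard, the ^ and $ anchors, and the pipe into wc can be tried together on a small file. This is only a sketch: the four sequence lines are made up, and wc -l (lines only) plus tr to trim spaces are my choices for keeping the output to a single number.

```shell
# Four made-up sequence lines; the last is lowercase on purpose.
seqs=$(mktemp)
printf 'ACCGGTTT\nGGACCGGTTT\nACCGGTTTGG\nacccggtt\n' > "$seqs"

# A, then any six characters, then T, anywhere in the line
anywhere=$(grep "A......T" "$seqs" | wc -l | tr -d ' ')
# same pattern, but it must start the line
at_start=$(grep "^A......T" "$seqs" | wc -l | tr -d ' ')
# same pattern, but it must end the line (single quotes keep $ literal)
at_end=$(grep 'A......T$' "$seqs" | wc -l | tr -d ' ')
# grep is case sensitive unless -i is given
any_case=$(grep -i "A......T" "$seqs" | wc -l | tr -d ' ')

echo "anywhere=$anywhere start=$at_start end=$at_end ignoring_case=$any_case"
rm -f "$seqs"
```

The lowercase line only shows up in the -i count, which is the case sensitivity noted above.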
To use XMLPipeDB Match, enter java -jar xmlpipedb-match-1.1.1.jar FIRST
to give it a file, put a < sign in front of the file name
java -jar xmlpipedb-match-1.1.1.jar "A......T" < hs_ref_GRCh37_chr19.fa
1) "What Match command..."
-2 unique matches
-2,1
-what does info represent?
2)
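double quote w/in a double quote: put a backslash before each inner quote: "\"James.*\""
asterisk = zero or more of whatever comes right before it, so .* = any run of characters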
-unique 2
-2,1
-what info?
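To check how the escaped quotes and .* behave without the Match jar, here is a rough stand-in using grep -o, sort, and uniq -c. The sample lines are made up (they are not from 493.P_falciparum.xml), and the output format is uniq's, not necessarily what Match prints.

```shell
# Made-up lines for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
author="James Doe"
editor="Jamesina Roe"
author="James Doe"
author="Jane Doe"
EOF

# \" lets a double quote sit inside the double-quoted pattern.
# grep -o prints each match on its own line; sort | uniq -c tallies them.
tally=$(grep -o "\"James.*\"" "$sample" | sort | uniq -c | sed "s/^ *//")
echo "$tally"
rm -f "$sample"
```

Each output line is a count followed by one unique match, which is the same kind of information the "unique matches" questions ask about.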
Reading frames
-break into triplets: sed "s/.../& /g" (& followed by a space)
-change t to u: sed "s/t/u/g"
-convert into genetic code: sed -f genetic-code.sed (rules like s/cgu/R/g, s/aug/M/g); USE -f
-drop between 0-2 characters from the front: s/^.//g
-3-5- reverse sequence: rev
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:
+1:
cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+2:
cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
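One way all six frames might be put together, as a sketch under several assumptions. The 9-letter sequence is made up. The small rule file below is a stand-in for genetic-code.sed that covers only the codons this sequence needs, with - for a stop codon. The notes only mention rev for the minus frames, so the complement step (sed "y/atcg/tagc/") is my addition, assuming those frames are read from the complementary strand.

```shell
# Stand-in for genetic-code.sed: only the codons this toy sequence produces.
# The real file is assumed to cover all 64 codons.
code=$(mktemp)
cat > "$code" <<'EOF'
s/aug/M/g
s/cgu/R/g
s/cgc/R/g
s/uaa/-/g
s/ugc/C/g
s/guu/V/g
s/gcg/A/g
s/gca/A/g
s/uua/L/g
s/acg/T/g
s/cau/H/g
s/aac/N/g
EOF

seq="atgcgttaa"   # made-up sequence

# Split into triplets, change t to u, look up each codon, trim trailing spaces.
translate() {
  sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$code" | sed 's/ *$//'
}

# Reverse the sequence, then swap each base for its complement.
revcomp() {
  rev | sed "y/atcg/tagc/"
}

plus1=$(echo "$seq" | translate)
plus2=$(echo "$seq" | sed "s/^.//" | translate)
plus3=$(echo "$seq" | sed "s/^..//" | translate)
minus1=$(echo "$seq" | revcomp | translate)
minus2=$(echo "$seq" | revcomp | sed "s/^.//" | translate)
minus3=$(echo "$seq" | revcomp | sed "s/^..//" | translate)

printf '%s\n' "+1: $plus1" "+2: $plus2" "+3: $plus3" \
              "-1: $minus1" "-2: $minus2" "-3: $minus3"
rm -f "$code"
```

Leftover letters at the end of a frame (one or two nucleotides that do not fill a triplet) are left untranslated.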
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
1. What Match command tallies the occurrences of the pattern GO:000916. in the 493.P_falciparum.xml file?
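java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml
The java -jar part runs XMLPipeDB Match, which tallies the occurrences of the pattern in the file given after the < sign.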
How many unique matches are there?
-2
How many times does each unique match appear?
-2,1
What information do you think the pattern GO:000916. represents?
I'm not entirely sure, but it looks like a type of identification tag for a protein.
2. What Match command tallies the occurrences of the pattern \"James.*\" in the 493.P_falciparum.xml file?
How many unique matches are there?
-2
How many times does each unique match appear?
-2,1
What information do you think the pattern \"James.*\" represents?
3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
What answer does Match give you?
What answer does grep/wc give you?
Do the answers make sense? Explain your response.
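One thing worth checking when comparing the two answers: grep prints matching lines, so grep | wc -l counts lines that contain ATG, not every ATG. A small sketch with made-up sequence lines (not the chr19 file); how Match itself counts is an assumption I have not confirmed here.

```shell
# Four made-up lines; two of them contain ATG twice.
fasta=$(mktemp)
printf 'ATGATGCC\nCCATGAAA\nGGGGCCCC\nTTATGATG\n' > "$fasta"

# Counts lines that contain ATG at least once
lines_with_atg=$(grep "ATG" "$fasta" | wc -l | tr -d ' ')
# grep -o prints every match on its own line, so this counts every ATG
total_atg=$(grep -o "ATG" "$fasta" | wc -l | tr -d ' ')

echo "lines with ATG: $lines_with_atg, total ATG: $total_atg"
rm -f "$fasta"
```

Neither grep count sees an ATG that is split across a line break, since grep works one line at a time.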