Difference between revisions of "Gleis Week 3"
Revision as of 18:44, 12 September 2013
XMLPIPEDB MATCH
1. What Match command tallies the occurrences of the pattern GO:000916. in the 493.P_falciparum.xml file?
- Two unique matches
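- The first unique match occurred twice; the second occurred once.
- The GO: prefix marks a Gene Ontology term identifier, and the trailing dot in the pattern matches any single character. The matches are therefore GO terms annotating the protein, not identification numbers for a portion of a protein sequence.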
2. What Match command tallies the occurrences of the pattern "James.*" in the 493.P_falciparum.xml file?
- Two unique matches
- The first unique match occurred 8283 times and the second unique match occurred once.
- "James.*" likely refers to the last name of an author in a journal article.
3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- Match: 830101
- grep and wc: 502410
- The two counts make sense: grep piped into wc counts each line that contains ATG only once, even if ATG occurs more than once on that line, whereas Match counts every occurrence.
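The line-versus-occurrence difference in question 3 can be reproduced on a small scale. This is only a sketch: the three-line input is made up for illustration, and the -o option (print each match on its own line) is an assumption about the installed grep, although both GNU and BSD grep support it.

```shell
# Tiny stand-in for the chromosome file (made-up data, for illustration only).
tmpfile=$(mktemp)
printf 'ATGATGCC\nCCGGTTAA\nTTATGCAT\n' > "$tmpfile"

# grep prints each matching line once, so wc -l counts lines, not occurrences.
line_count=$(grep "ATG" "$tmpfile" | wc -l | tr -d ' ')

# With -o, grep prints every match on its own line, so wc -l counts occurrences.
match_count=$(grep -o "ATG" "$tmpfile" | wc -l | tr -d ' ')

echo "lines containing ATG: $line_count"   # 2
echo "occurrences of ATG:   $match_count"  # 3

rm -f "$tmpfile"
```

The first line of the sample holds ATG twice, which is why the occurrence count is one higher than the line count.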
The Genetic Code, By Computer
Complement of a Strand
cat sequence_file | sed "y/atgc/tacg/"
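As a quick check of the complement command, the same sed expression can be run on a short strand. The strand below is a made-up example standing in for sequence_file.

```shell
# Made-up example strand; the command above reads from sequence_file instead.
strand="atgccgta"

# y/atgc/tacg/ swaps each base for its partner: a<->t and g<->c.
complement=$(echo "$strand" | sed "y/atgc/tacg/")

echo "$complement"   # tacggcat
```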
Reading Frames
+1 cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+2 cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
+3 cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-1 cat sequence_file | sed "y/atgc/tacg/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-2 cat sequence_file | sed "y/atgc/tacg/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
-3 cat sequence_file | sed "y/atgc/tacg/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
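The frame-shifting steps can be tried without genetic-code.sed by stopping just before the translation step. In this sketch the sequence and the helper name codons are hypothetical; the helper only groups bases into triplets and converts t to u, as the commands above do before applying the sed file.

```shell
# Made-up nine-base sequence standing in for sequence_file.
seq="atgcgtacc"

# Hypothetical helper: split stdin into triplets, then transcribe t -> u.
codons() { sed "s/.../& /g" | sed "s/t/u/g"; }

# Forward frames: drop zero, one, or two leading bases before grouping.
frame_p1=$(echo "$seq" | codons)
frame_p2=$(echo "$seq" | sed "s/^.//" | codons)
frame_p3=$(echo "$seq" | sed "s/^..//" | codons)

# Reverse frame -1: complement, reverse, then group as for +1.
frame_m1=$(echo "$seq" | sed "y/atgc/tacg/" | rev | codons)

echo "+1: $frame_p1"   # aug cgu acc
echo "+2: $frame_p2"   # ugc gua cc
echo "+3: $frame_p3"   # gcg uac c
echo "-1: $frame_m1"   # ggu acg cau
```

Leftover bases at the end of a frame (cc and c above) do not form a full codon. Piping each result through sed -f genetic-code.sed would then perform the translation.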