Difference between revisions of "Laurmagee: Week 3"

Revision as of 03:35, 13 September 2013

The appropriate processing command for finding the complementary strand is the following: cat sequence_file | sed "y/atgc/tacg/"
This will turn a nucleotide sequence such as "agcggtatac" into its complement, "tcgccatatg".
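
As a quick check of the complement command, here is a minimal sketch that feeds the example sequence in directly with echo rather than reading it from sequence_file. The variable names are only for illustration.

```shell
# Example sequence taken from the text above.
seq="agcggtatac"

# y/atgc/tacg/ swaps each base for its pairing partner: a<->t and g<->c.
complement=$(echo "$seq" | sed "y/atgc/tacg/")
echo "$complement"   # prints tcgccatatg

# Complementing a second time should give back the original sequence.
restored=$(echo "$complement" | sed "y/atgc/tacg/")
echo "$restored"     # prints agcggtatac
```

Because the y command maps characters one-for-one, the output always has the same length as the input, and only lowercase a, t, g, and c are changed.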

First Reading Frame (+1)
- cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
Second Reading Frame (+2)
- cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
Third Reading Frame (+3)
- cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
Fourth Reading Frame (-1)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
Fifth Reading Frame (-2)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
Sixth Reading Frame (-3)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
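
The frame pipelines above can be tried on a short sequence. The sketch below stops before the translation step, because the contents of genetic-code.sed are not shown here. It assumes the codon-splitting step is meant to put a space after each triplet. The helper name split_codons and the example sequence are only for illustration.

```shell
seq="agcggtatac"   # example sequence, typed in instead of read from sequence_file

# Group the bases into space-separated triplets, then write the strand as RNA (t -> u).
split_codons() {
  sed "s/.../& /g" | sed "s/t/u/g"
}

# +1 frame: start at the first base.
frame_plus1=$(echo "$seq" | split_codons)

# +2 frame: drop the first base before splitting.
frame_plus2=$(echo "$seq" | sed "s/^.//" | split_codons)

# -1 frame: reverse the strand and complement it, then split.
frame_minus1=$(echo "$seq" | rev | sed "y/atgc/tacg/" | split_codons)

echo "+1: $frame_plus1"    # agc ggu aua c
echo "+2: $frame_plus2"    # gcg gua uac
echo "-1: $frame_minus1"   # gua uac cgc u
```

Bases left over at the end (the lone "c" and "u" above) do not form a full codon. The remaining frames follow the same idea, dropping one or two bases before splitting.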

You must use the MATCH command java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
- The MATCH command finds two unique solutions.
- The pattern "go:0009165" appears twice and "go:0009168" appears once.
- The pattern "GO:000916." seems to match an ID, most likely a Gene Ontology (GO) term identifier. The "." at the end of the pattern matches any single character, which is why both "go:0009165" and "go:0009168" are found. I assume other entries in the file are tagged with other identification numbers as well.
You must use the MATCH command java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
- The MATCH command finds two unique solutions.
- The pattern "james k.d." appears 8238 times and "james a.a." appears once.
- I think the pattern "\"James.*\"" matches a quoted string that begins with James, which is most likely someone's name. The name could appear in a book, movie script, or any other piece of writing.
Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- MATCH gives the answer 830101.
- grep/wc give the answer 502410.
- These answers do make sense, because grep/wc only count the number of lines that contain ATG, but MATCH also takes into account the number of times ATG occurs in each line.
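
The difference between counting lines and counting occurrences can be seen on a tiny made-up file. This is only a sketch: the three sample lines are hypothetical, and using grep -o to stand in for the way MATCH counts is an assumption, not something taken from the MATCH program itself.

```shell
# Hypothetical three-line sample standing in for the chromosome file.
sample=$(mktemp)
printf 'ATGCCATGA\nCCCGGG\nTTATGTT\n' > "$sample"

# grep prints each matching line once, so wc -l counts lines that contain ATG.
line_count=$(grep "ATG" "$sample" | wc -l | tr -d ' ')

# grep -o prints every match on its own line, so wc -l counts occurrences.
match_count=$(grep -o "ATG" "$sample" | wc -l | tr -d ' ')

echo "lines with ATG: $line_count"        # 2
echo "occurrences of ATG: $match_count"   # 3

rm -f "$sample"
```

The first sample line contains ATG twice, so it adds one to the line count but two to the occurrence count. That is the same kind of gap seen between 502410 and 830101.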

@@ Line 21: / Line 21: @@
 #*The MATCH command finds two unique solutions.
 #*The pattern "go:0009165" appears twice and "go:0009168" appears once.
-#*
+#*The pattern "GO:000916" seems to be an ID. It probably relates to a much longer sequence, because it includes a large number of identification, 000916. I assume there are other points in the sequence that are identified with other identification numbers as well.
 #You must use the MATCH command java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
 #*The MATCH command finds two unique solutions.
 #*The pattern "james k.d." appears 8238 times and "james a.a." appears once.
-#*
+#*I think the pattern "\"James.*\"" stands for someone's name. The name could appear in a book, movie script, or any other piece of writing.
-#
+#Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
-#*
+#*MATCH gives the answer of 830101.
-#*
+#*grep/wc give the answer 502410
-#*
+#*These answers do make sense, because grep/wc only count the amount of lines that contain ATG, but MATCH will also take into account the number times ATG occurs in each line.