Difference between revisions of "Laurmagee: Week 3"

From LMU BioDB 2013
(Added XMLPipeDB Information)
(Finished Assignment!)

Revision as of 03:35, 13 September 2013

Complement of a Strand

  • The appropriate processing commands are the following: cat sequence_file | sed "y/atgc/tacg/"
  • This turns the nucleotide sequence "agcggtatac" into its complement, "tcgccatatg".
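
The complement step above can be checked on the example sequence without a file. This is a minimal sketch; the function name complement is made up for illustration, and it assumes the sequence is lowercase, as in the command above.

```shell
# complement: hypothetical helper; reads a lowercase DNA sequence on stdin
# and writes its complement (a<->t, g<->c) using sed's y (transliterate) command.
complement() {
  sed 'y/atgc/tacg/'
}

# Feed the example sequence from the text through the helper.
printf 'agcggtatac\n' | complement
```

Uppercase bases pass through unchanged, so an uppercase file would need "y/ATGC/TACG/" instead.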

Reading Frames

  1. First Reading Frame (+1)
    • cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  2. Second Reading Frame (+2)
    • cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  3. Third Reading Frame (+3)
    • cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  4. Fourth Reading Frame (-1)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  5. Fifth Reading Frame (-2)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
  6. Sixth Reading Frame (-3)
    • rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
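
The six pipelines above differ only in whether the strand is reverse-complemented and how many leading bases are dropped. The sketch below folds that into one function so the codon splitting can be inspected before translation. The function name codons and its arguments are made up for illustration; it assumes a single-line, lowercase sequence and stops before the genetic-code.sed step, since that file's contents are not shown here.

```shell
# codons: hypothetical helper that splits a lowercase DNA sequence into
# mRNA codons for one reading frame.
#   $1 = DNA sequence, $2 = frame: 1, 2, 3, -1, -2 or -3
codons() {
  dna=$1
  frame=$2
  case $frame in
    -*)
      # Negative frames read the reverse complement of the strand.
      dna=$(printf '%s\n' "$dna" | rev | sed 'y/atgc/tacg/')
      skip=$(( 0 - frame - 1 ))
      ;;
    *)
      skip=$(( frame - 1 ))
      ;;
  esac
  # Drop the leading bases, group into triplets, convert t to u,
  # and trim the trailing space left after the last full codon.
  printf '%s\n' "$dna" |
    sed "s/^.\{$skip\}//" |
    sed 's/.../& /g' |
    sed 's/t/u/g' |
    sed 's/ $//'
}

codons agcggtatac 1
codons agcggtatac 2
codons agcggtatac -1
```

Piping the output of codons into sed -f genetic-code.sed would then perform the translation, as in the commands above. Leftover bases that do not fill a codon stay at the end of the line.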

XMLPipeDB Match Practice

  1. Use the Match command java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
    • The Match command finds two unique matches.
    • The match "go:0009165" appears twice and "go:0009168" appears once.
    • The pattern "GO:000916." appears to pick out identifiers: the prefix "GO:" followed by a seven-digit number, which is consistent with a Gene Ontology term ID. The trailing "." matches any single character, which is why both "go:0009165" and "go:0009168" are found. I assume other entries in the file are tagged with other identification numbers as well.
  2. Use the Match command java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
    • The Match command finds two unique matches.
    • The match "james k.d." appears 8238 times and "james a.a." appears once.
    • The pattern "\"James.*\"" matches a quoted string that begins with "James", so I think it picks out a person's name, possibly someone credited in the file.
  3. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • Match gives a count of 830101.
    • grep/wc give a count of 502410.
    • These answers make sense, because grep/wc count only the number of lines that contain ATG, while Match also counts how many times ATG occurs within each line.
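
The difference between counting lines and counting occurrences can be reproduced on a small made-up sample. This is a sketch under assumptions: the function names and the three-line sample are invented for illustration, and it relies on the -o option of grep (supported by GNU and BSD grep), which prints each match on its own line.

```shell
# count_matching_lines: hypothetical helper; counts stdin lines containing the pattern $1.
count_matching_lines() {
  grep "$1" | wc -l | tr -d ' '
}

# count_occurrences: hypothetical helper; counts every non-overlapping match of $1 on stdin.
# grep -o emits one line per match, so wc -l counts matches rather than lines.
count_occurrences() {
  grep -o "$1" | wc -l | tr -d ' '
}

# Made-up sample: two of the three lines contain ATG, five ATGs in total.
sample='ATGCCATGA
CCCGGG
TTATGATGATG'

printf '%s\n' "$sample" | count_matching_lines ATG
printf '%s\n' "$sample" | count_occurrences ATG
```

On this sample the first count is 2 and the second is 5, the same kind of gap seen between the grep/wc and Match results above.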