Laurmagee: Week 3
From LMU BioDB 2013
Complement of a Strand
- The appropriate processing command is the following: cat sequence_file | sed "y/atgc/tacg/"
- This turns a nucleotide sequence such as "agcggtatac" into its complement, "tcgccatatg".
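As a quick check of the command above, here is a minimal sketch that runs the same sed transliteration on the example sequence. The function name complement is only an illustrative choice, and the input is assumed to contain nothing but lowercase a, t, g, and c.

```shell
# Complement a lowercase DNA sequence read from standard input.
# "complement" is an illustrative name, not part of the assignment.
complement() {
    # y/atgc/tacg/ swaps a<->t and g<->c one character at a time.
    sed "y/atgc/tacg/"
}

echo "agcggtatac" | complement
```

Run on "agcggtatac", this prints "tcgccatatg", matching the result described above.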
Reading Frames
- First Reading Frame (+1)
- cat sequence_file | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
- Second Reading Frame (+2)
- cat sequence_file | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
- Third Reading Frame (+3)
- cat sequence_file | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
- Fourth Reading Frame (-1)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
- Fifth Reading Frame (-2)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
- Sixth Reading Frame (-3)
- rev sequence_file | sed "y/atgc/tacg/" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed
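The six pipelines above differ only in whether the strand is reverse-complemented first and in how many leading bases are dropped. A minimal sketch that wraps those steps in one function could look like the following. The function name codons and its two arguments are hypothetical, and the final translation step (piping through the course's genetic-code.sed rules) is left out because it depends on that file.

```shell
# Print the space-separated mRNA codons for one reading frame.
# $1 = DNA sequence in lowercase a/t/g/c
# $2 = frame: 1, 2, 3, -1, -2, or -3
# "codons" is a hypothetical helper name used only for illustration.
codons() {
    seq=$1
    frame=$2

    # Negative frames are read from the reverse complement of the strand.
    case $frame in
        -*)
            seq=$(printf '%s\n' "$seq" | rev | sed "y/atgc/tacg/")
            frame=${frame#-}
            ;;
    esac

    # Frames 2 and 3 skip one or two leading bases, respectively.
    case $frame in
        2) seq=$(printf '%s\n' "$seq" | sed "s/^.//") ;;
        3) seq=$(printf '%s\n' "$seq" | sed "s/^..//") ;;
    esac

    # Insert a space after every three bases, then convert t to u.
    printf '%s\n' "$seq" | sed "s/.../& /g" | sed "s/t/u/g"
}

codons agcggtatac 1
codons agcggtatac -1
```

For "agcggtatac", frame +1 gives "agc ggu aua c" and frame -1 gives "gua uac cgc u". Any leftover bases at the end that do not fill a complete codon are printed as they are.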
XMLPipeDB Match Practice
- Use the Match command java -jar xmlpipedb-match-1.1.1.jar "GO:000916." < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
- Match finds two unique matches.
- The match "go:0009165" appears twice and "go:0009168" appears once.
- The pattern "GO:000916." seems to match an ID: the prefix GO followed by a seven-digit number, where the final period stands for any single character in the last position. The GO prefix most likely marks a Gene Ontology term ID, though that is my assumption. Because the ID number is so long, it probably belongs to a much larger set of records, and I assume other entries in the file carry other ID numbers as well.
- Use the Match command java -jar xmlpipedb-match-1.1.1.jar "\"James.*\"" < 493.P_falciparum.xml to find the occurrences of the pattern in the file.
- Match finds two unique matches.
- The match "james k.d." appears 8238 times and "james a.a." appears once.
- I think the pattern "\"James.*\"" matches someone's name: the quoted text must begin with James, and .* allows any characters after it. The name could appear in a book, movie script, or any other piece of writing.
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- Match gives an answer of 830101.
- grep/wc gives an answer of 502410.
- I think both answers make sense: grep and wc count only the number of lines that contain ATG, whereas Match counts every occurrence of ATG, including multiple occurrences on the same line.
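The difference between counting lines and counting occurrences shows up on a small made-up sample. This sketch uses grep -o, which prints each match on its own line, as a stand-in for counting occurrences. It is not the Match utility itself, and the three sample lines are hypothetical rather than taken from the chromosome file.

```shell
# Hypothetical three-line sample: two ATGs, none, and three ATGs.
sample=$(printf 'ATGCCATGA\nCCCGGG\nTTATGATGATG\n')

# grep prints each matching LINE once, so wc -l counts lines containing ATG.
lines_with_atg=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# grep -o prints each MATCH on its own line, so wc -l counts occurrences.
total_atg=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines containing ATG: $lines_with_atg"
echo "total ATG occurrences: $total_atg"
```

On this sample the line count is 2 and the occurrence count is 5, the same kind of gap seen between the grep/wc and Match results.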