Msaeedi23 Week 3
The Genetic Code, by Computer
Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.
For these exercises, two files are available in the Keck lab system for practice; of course, you can always make your own sequences up. The practice files are ~dondi/xmlpipedb/data/prokaryote.txt and ~dondi/xmlpipedb/data/infA-E.coli-K12.txt.
Complement of a Strand
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:
Implemented the command: cat "sequence_file" | sed "y/atcg/tagc/"
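As a quick check, the same sed command can be run on the short example sequence agcggtatac instead of a file. This is only a minimal sketch: the function name complement is made up for illustration, and it assumes lowercase input, since the sed y command maps only the characters listed.

```shell
# Hypothetical helper: complement a lowercase nucleotide sequence read from stdin.
complement() {
    sed "y/atcg/tagc/"
}

echo "agcggtatac" | complement
# prints: tcgccatatg
```

The result is the complement written in the same left-to-right order as the input; the reverse-strand reading frames additionally pass it through rev.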
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequences, one for each possible reading frame of the nucleotide sequence. In other words, fill in the question marks:
cat sequence_file | ?????
You should have 6 different sets of commands, one for each possible reading frame. For example, if sequence_file contains:
agcggtatac
Frame +1: The goal is to separate the sequence into groups of 3 nucleotides
cat "sequence_file.txt" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Frame +2:
cat "sequence_file.txt" | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Frame +3:
cat "sequence_file.txt" | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Frame -1:
cat "sequence_file.txt" | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Frame -2:
cat "sequence_file.txt" | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Frame -3:
cat "sequence_file.txt" | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
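To see which codons each frame actually reads before genetic-code.sed is applied, the six pipelines can be folded into one helper. This is a sketch under stated assumptions: the function name codons is hypothetical, it uses cut -c to skip the leading bases in place of the sed "s/^.//g" step, it assumes lowercase input, and it stops short of translation because the course's genetic-code.sed file is not reproduced here.

```shell
# Hypothetical helper: list the complete codons (as RNA) of one reading frame.
# Usage: codons SEQUENCE FRAME, where FRAME is 1, 2, 3, -1, -2, or -3.
codons() {
    s=$1
    f=$2
    case $f in
        -*)
            # Negative frames are read from the reverse complement.
            s=$(printf '%s\n' "$s" | sed "y/acgt/tgca/" | rev)
            f=${f#-}
            ;;
    esac
    # Skip the first (f - 1) bases, group into triplets, convert t to u,
    # then drop the trailing space and any incomplete final codon.
    printf '%s\n' "$s" | cut -c "$f-" | sed "s/.../& /g" | sed "s/t/u/g" |
        sed -e "s/ *$//" -e "s/ [acgu]\{1,2\}$//"
}

for frame in 1 2 3 -1 -2 -3; do
    printf '%s: %s\n' "$frame" "$(codons agcggtatac "$frame")"
done
# 1: agc ggu aua
# 2: gcg gua uac
# 3: cgg uau
# -1: gua uac cgc
# -2: uau acc gcu
# -3: aua ccg
```

Under the standard genetic code these codons correspond to SGI, AVY, RY, VYR, YTA, and IP, which gives a small reference to compare against the output of the full pipelines.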
Check Your Work
Fortunately, online tools are available for checking your work; we recommend the ExPASy Translate Tool, sponsored by the same people who run SwissProt. You’re free to use this tool to see if your text processing commands produce the same results.
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml
- How many unique matches are there?
- How many times does each unique match appear?
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- Describe how you did this.
- Based on where you find this occurrence, what kind of information does this pattern represent?
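One way to cross-check the unique-match tally without Match is a grep, sort, and uniq pipeline. The sketch below is an assumption-laden illustration: the function name tally is made up, the sample lines are invented and are not taken from 493.P_falciparum.xml, and it assumes grep's extended regular expressions treat this simple pattern the same way Match does.

```shell
# Hypothetical helper: tally each distinct match of a regex in text on stdin.
tally() {
    # -o prints every match on its own line; uniq -c then counts duplicates.
    grep -o -E "$1" | sort | uniq -c
}

# Made-up sample lines (not real data from the XML file).
printf 'id="GO:0005xxx" id="GO:0006xxx"\nid="GO:0005xxx" id="GO:0008xxx"\n' |
    tally "GO:000[567]"
# Reports two distinct matches: GO:0005 twice and GO:0006 once.
```

Against the real file the call would be tally "GO:000[567]" < 493.P_falciparum.xml, and grep -n with the same pattern is one way to locate an occurrence in situ by line number.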
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- How many unique matches are there?
- How many times does each unique match appear?
- What information do you think this pattern represents?
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
- What answer does grep + wc give you?
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
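The hint about what is being counted can be illustrated on a tiny made-up sample. In this sketch the function names count_lines and count_hits are hypothetical and the three sample lines are invented, not chromosome data; it also assumes, as the exercise wording suggests, that Match tallies every occurrence of the pattern.

```shell
# Hypothetical helpers: both read text from stdin and take the pattern as $1.
count_lines() {
    # grep prints each matching LINE once, so wc -l counts lines.
    grep "$1" | wc -l
}
count_hits() {
    # grep -o prints each MATCH on its own line, so wc -l counts occurrences.
    grep -o "$1" | wc -l
}

# Made-up sample: the first line contains ATG twice.
printf 'ATGATG\nCCC\nATG\n' | count_lines "ATG"   # 2
printf 'ATGATG\nCCC\nATG\n' | count_hits "ATG"    # 3
```

A plain grep piped into wc -l therefore undercounts whenever a line holds more than one occurrence, which is likely for a sequence file with long lines.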