Lenaolufson Week 3
The Genetic Code, by Computer
Complement of a Strand
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:
cat sequence_file.txt | sed "y/atgc/tacg/"
Output: tcgccatatg
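As a quick check of the complement step, the sketch below wraps the same sed transliteration in a shell function and feeds it the example sequence directly instead of reading sequence_file.txt. The function name `complement` is an illustrative assumption, not part of the assignment.

```shell
#!/bin/sh
# complement: hypothetical helper wrapping the sed transliteration used above.
# y/atgc/tacg/ maps a->t, t->a, g->c, c->g, one character at a time.
complement() {
    printf '%s\n' "$1" | sed "y/atgc/tacg/"
}

complement agcggtatac
```

For the input agcggtatac this prints tcgccatatg. Note that the result is the complement read in the same left-to-right order; the reverse strand additionally needs `rev`.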
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one set for each possible reading frame of the nucleotide sequence. In other words, fill in the question marks:
The sequence used was "agcggtatac"
- +1
cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: SGI
- +2
cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: AVY
- +3
cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: RY
- -1
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: VYR
- -2
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: YTA
- -3
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
Output: IP
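The six pipelines differ only in which strand they start from and how many leading bases they drop, so they can be folded into one helper. The sketch below is one possible way to do that, not the assignment's required form. The name `translate` and the temporary translation table are assumptions for illustration: the real genetic-code.sed is not reproduced on this page, so the stand-in lists only the twelve codons that occur in the example sequence, in an assumed `s/codon/letter/g` format.

```shell
#!/bin/sh
# Stand-in for genetic-code.sed (assumed format), covering only the codons
# that appear in the six frames of the example sequence.
code=$(mktemp)
trap 'rm -f "$code"' EXIT
cat > "$code" <<'EOF'
s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g
s/cgc/R/g
s/acc/T/g
s/gcu/A/g
s/ccg/P/g
EOF

# translate: hypothetical helper.
#   $1 = strand to read, written 5' to 3'
#   $2 = number of leading bases to drop (0, 1, or 2)
translate() {
    printf '%s\n' "$1" |
        sed "s/^.\{$2\}//" |   # shift the reading frame
        sed "s/.../& /g" |     # split into codons
        sed "s/t/u/g" |        # DNA -> mRNA alphabet
        sed -f "$code" |       # codon -> amino acid letter
        sed "s/ //g" |         # drop the codon separators
        sed "s/[acgu]//g"      # drop leftover bases from an incomplete codon
}

seq=agcggtatac
# Reverse complement: complement each base, then reverse the line.
rc=$(printf '%s\n' "$seq" | sed "y/acgt/tgca/" | rev)

for skip in 0 1 2; do
    echo "+$((skip + 1)) $(translate "$seq" "$skip")"
done
for skip in 0 1 2; do
    echo "-$((skip + 1)) $(translate "$rc" "$skip")"
done
```

For agcggtatac this prints SGI, AVY, and RY for frames +1 to +3, and VYR, YTA, and IP for frames -1 to -3, which agrees with the results recorded above. With the full genetic-code.sed in place, the `-f "$code"` argument would simply point at that file instead.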
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- How many unique matches are there?
- How many times does each unique match appear?
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- Describe how you did this.
- Based on where you find this occurrence, what kind of information does this pattern represent?
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- How many unique matches are there?
- How many times does each unique match appear?
- What information do you think this pattern represents?
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
- What answer does grep + wc give you?
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
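The hint in the last question turns on what each tool actually counts. As a small illustration of the grep side only (it does not run Match, whose behavior should be checked directly), the sketch below uses a made-up two-line sample rather than the chromosome file. The `-o` option is available in GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/sh
# Made-up sample: the first line contains ATG twice, the second line once.
sample=$(printf 'ATGCCATGA\nTTATGCC\n')

# Plain grep prints each matching LINE once, so wc -l counts lines.
lines=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# grep -o prints every non-overlapping match on its own line,
# so wc -l counts occurrences instead.
hits=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines containing ATG: $lines"
echo "occurrences of ATG:   $hits"
```

On this sample the first count is 2 and the second is 3. Because grep works one line at a time, neither form finds an ATG that is split across a line break in a FASTA file.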