Nanguiano Week 3
The Genetic Code, by Computer
Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.
For this exercise, I performed the following series of commands to prepare for the assignment.
ssh my.cs.lmu.edu -l nanguia1
mkdir biodb
cat >"sequence_file.txt"
agcggtatac
cd biodb
mkdir week3
mv sequence_file.txt biodb/week3
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~nanguia1/biodb/week3
cp xmlpipedb-match-1.1.1.jar ~nanguia1/biodb/week3
cp 493.P_falciparum.xml ~nanguia1/biodb/week3
cp hs_ref_GRCh37_chr19.fa ~nanguia1/biodb/week3
cd ~nanguia1/biodb/week3
Complement of a Strand
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.
On a sequence_file.txt file containing the sequence "agcggtatac", the command and output were as follows:
cat sequence_file.txt | sed "y/atgc/tacg/"
tcgccatatg
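The same substitution can be wrapped in a small shell function so it is easy to reuse in later pipelines. This is only a sketch: the function name complement is my own, and the upper-case handling is an addition that the original command does not have.

```shell
# complement: read a nucleotide sequence on stdin and print its complementary strand.
# The name "complement" is a hypothetical helper, not part of the assignment files.
# Upper-case bases are handled too (an addition to the original lower-case command).
complement() {
  sed 'y/acgtACGT/tgcaTGCA/'
}

echo "agcggtatac" | complement               # prints tcgccatatg
echo "agcggtatac" | complement | complement  # complementing twice restores agcggtatac
```

Because sed's y command maps each character independently, applying the function twice returns the original sequence, which is a quick sanity check on the mapping.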
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. You should have 6 different sets of commands, one for each possible reading frame.
On a sequence_file.txt containing the sequence "agcggtatac", the commands and outputs were as follows:
+1
cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
SGI
+2
cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
AVY
+3
cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
RY
The remaining three were divided onto two lines on this wiki because they could not fit onto one without causing graphical bugs. The actual command was written without newlines.
-1
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
VYR
-2
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
YTA
-3
cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
IP
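The six commands above differ only in whether the strand is reverse-complemented first and in how many leading bases are dropped. The sketch below makes that shared structure explicit by printing just the codons for each frame. It is an illustration, not the commands I actually ran: the helper name frame_codons is hypothetical, it assumes lower-case input, and it stops before translation so that it does not depend on genetic-code.sed.

```shell
# frame_codons SEQ FRAME
# Print the complete codons of SEQ for reading frame FRAME (+1 +2 +3 -1 -2 -3),
# separated by spaces. Hypothetical helper; assumes a lower-case acgt sequence.
frame_codons() {
  seq=$1
  frame=$2
  case $frame in
    # Negative frames read the reverse complement of the strand.
    -*) seq=$(printf '%s\n' "$seq" | sed 'y/acgt/tgca/' | rev) ;;
  esac
  offset=$(( ${frame#[+-]} - 1 ))        # leading bases to skip: 0, 1, or 2
  printf '%s\n' "$seq" |
    cut -c "$((offset + 1))-" |          # drop the skipped bases
    sed 's/.../& /g' |                   # put a space after every triplet
    sed 's/ [acgt]\{1,2\}$//; s/ $//'    # drop a trailing partial codon or space
}

for f in +1 +2 +3 -1 -2 -3; do
  printf '%s\t%s\n' "$f" "$(frame_codons agcggtatac "$f")"
done
```

For "agcggtatac", the +1 frame comes out as "agc ggt ata" and the -3 frame as "ata ccg", which line up with the SGI and IP results above. To translate, the codon line can be piped through the same sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" steps used in the original commands.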
Check Your Work
Using the ExPASy Translate Tool, I entered my sample DNA sequence, "agcggtatac". The result was as follows:
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
Note: I used this wiki page to learn about the match utility.
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
- How many unique matches are there?
- 3
- How many times does each unique match appear?
- GO:0007 : 113
- GO:0006 : 1100
- GO:0005 : 1371
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- One example was: <dbReference type="GO" id="GO:0005622">
- Describe how you did this.
- grep "GO:000[567]" 493.P_falciparum.xml | more
- Based on where you find this occurrence, what kind of information does this pattern represent?
- Based on where I found it, this pattern shows the Gene Ontology (GO) ID of a particular gene in the database.
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
- How many unique matches are there?
- 3
- How many times does each unique match appear?
- "Yu b." : 1
- "Yu k." : 228
- "Yu m." : 1
- What information do you think this pattern represents?
- I believe this pattern represents a name.
- This was confirmed by running the command grep "Yu.*" 493.P_falciparum.xml
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
- What answer does grep + wc give you?
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
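The hint about what each approach counts can be tried out on a small file before running anything on the full chromosome. The sketch below uses made-up sample data (not taken from hs_ref_GRCh37_chr19.fa) and only standard grep, wc, and tr. It shows that grep piped to wc -l counts the lines containing a match, while grep -o piped to wc -l counts the individual matches.

```shell
# Build a tiny FASTA-like sample (hypothetical data, not from the chromosome file).
sample=$(mktemp)
printf '%s\n' '>sample' 'ATGCCATGATG' 'CCCGGG' 'TTATGA' > "$sample"

# grep prints each matching LINE once, so wc -l counts lines that contain ATG.
lines_with_atg=$(grep "ATG" "$sample" | wc -l | tr -d ' ')

# grep -o prints every non-overlapping match on its own line,
# so wc -l counts occurrences of ATG instead of lines.
atg_occurrences=$(grep -o "ATG" "$sample" | wc -l | tr -d ' ')

echo "lines containing ATG: $lines_with_atg"    # 2
echo "occurrences of ATG:   $atg_occurrences"   # 4
rm -f "$sample"
```

In this sample, two lines contain ATG but the pattern occurs four times. Assuming Match tallies every occurrence, as its per-match tallies in the earlier questions suggest, a line with several ATGs would be counted once by grep with wc -l but several times by Match. Note also that grep works line by line, so an ATG split across a line break in the FASTA file is not seen by either grep variant.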
Links
Nicole Anguiano
BIOL 367, Fall 2015