Difference between revisions of "Anuvarsh Week 3"
Revision as of 07:30, 20 September 2015
The Genetic Code, by Computer
Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.
I did so by running the following command:
ssh avarshne@my.cs.lmu.edu
and then inputting my password.
I then created a folder for this class.
mkdir biodb2015
I then created a sequence_file.txt file:
echo 'agcggtatac' >sequence_file.txt
I then moved to Dondi's repository and copied over some files using the following commands:
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~avarshne/biodb2015
cp xmlpipedb-match-1.1.1.jar ~avarshne/biodb2015
cp prokaryote.txt ~avarshne/biodb2015
cp infA-E.coli-K12.txt ~avarshne/biodb2015
Complement of a Strand
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:
cat sequence_file | ?????
For example, if sequence_file contains:
agcggtatac
Then your text processing commands should display:
tcgccatatg
To do this, I first worked out the steps the computer needs to perform, in order:
- Read the contents of sequence_file.txt so that the following commands can operate on the text in that file.
- Replace each A, T, C, and G with its complementary base.
These steps can be achieved with the following command, which produces the following result:
cat "sequence_file.txt" | sed "y/atcg/tagc/"
tcgccatatg
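As a minimal sketch, the same substitution can be wrapped in a shell function. The name complement is hypothetical and not part of the assignment. The second command shows why sed's y command is used here rather than s: y maps characters one for one, while s searches for the literal string atcg, which never occurs in this sequence.

```shell
# complement SEQUENCE: hypothetical helper, not part of the assignment.
# y/atcg/tagc/ replaces a with t, t with a, c with g, and g with c.
complement() {
  printf '%s\n' "$1" | sed "y/atcg/tagc/"
}

complement agcggtatac
# prints: tcgccatatg

# With s instead of y, sed looks for the four-letter string "atcg",
# finds nothing, and prints the sequence unchanged.
printf 'agcggtatac\n' | sed "s/atcg/tagc/g"
# prints: agcggtatac
```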
Reading Frames
Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:
cat sequence_file | ?????
In this case, the steps that the computer needs to complete are as follows:
- Read the contents of the sequence_file.txt file.
- cat "sequence_file.txt"
- Replace every "t" with a "u" when finding the +1, +2, and +3 protein sequences. For the -1, -2, and -3 sequences, create the complementary strand by replacing each a, t, c, and g with its complementary RNA base (u, a, g, and c), and then reverse the strand.
- sed "s/t/u/g"
- sed "y/atcg/uagc/" | rev
- Remove any necessary bases from the beginning of the sequence in order to start at the correct reading frame.
- nothing for the first frame, sed "s/^.//g" for the second, or sed "s/^..//g" for the third
- Add a space after every codon (every 3 characters).
- sed "s/.../& /g"
- Use the sed commands already written in the genetic-code.sed file to convert each codon into its corresponding amino acid.
- sed -f genetic-code.sed
- Remove all added spaces between codons.
- sed "s/ //g"
- Remove any leftover bases that weren't part of a complete codon and so couldn't be translated.
- sed "s/[aucg]//g"
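The steps above can be traced one stage at a time. The sketch below follows the +3 frame and prints each intermediate result. Because the contents of genetic-code.sed are not reproduced here, the translated string "R Y ac" is a stand-in that assumes the file replaces each codon with an uppercase amino acid letter and leaves the rest untouched.

```shell
seq="agcggtatac"

# Stage 1: switch to the RNA alphabet (t becomes u).
rna=$(printf '%s\n' "$seq" | sed "s/t/u/g")

# Stage 2: drop the first two bases to start at the third reading frame.
shifted=$(printf '%s\n' "$rna" | sed "s/^..//")

# Stage 3: add a space after every complete group of three bases.
codons=$(printf '%s\n' "$shifted" | sed "s/.../& /g")

printf '%s\n' "$rna" "$shifted" "$codons"
# prints:
# agcgguauac
# cgguauac
# cgg uau ac

# Stages 5 and 6, applied to a stand-in for the translated line (assumption):
# remove the spaces, then remove the lowercase bases that never formed a codon.
cleaned=$(printf '%s\n' "R Y ac" | sed "s/ //g" | sed "s/[aucg]//g")
printf '%s\n' "$cleaned"
# prints: RY
```

The final cleanup only works because the amino acid letters are uppercase while the bases are lowercase, so [aucg] cannot delete part of the protein sequence.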
Because there are 6 different reading frames on this fragment of DNA, 6 different commands need to be written, one for each reading frame. Each of the following commands represents one reading frame and is followed by the resulting protein sequence.
+1
cat "sequence_file.txt" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
SGI
+2
cat "sequence_file.txt" | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
AVY
+3
cat "sequence_file.txt" | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
RY
-1
cat "sequence_file.txt" | sed "y/atcg/uagc/" | rev | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
VYR
-2
cat "sequence_file.txt" | sed "y/atcg/uagc/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
YTA
-3
cat "sequence_file.txt" | sed "y/atcg/uagc/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g"
IP
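Since the six commands differ only in the strand and the number of bases dropped, they could also be generated by a loop. This is only a sketch: the helper name frame is hypothetical, cut is swapped in for the sed "s/^.//g" step, and the translation step runs only if genetic-code.sed is present in the current directory. Without that file, the loop prints the codons instead of the protein sequence.

```shell
# frame STRAND OFFSET SEQUENCE: hypothetical helper, not part of the assignment.
# Prints the RNA for the chosen strand, shifted to the chosen frame and split into codons.
frame() {
  if [ "$1" = "+" ]; then
    rna=$(printf '%s\n' "$3" | sed "s/t/u/g")
  else
    rna=$(printf '%s\n' "$3" | sed "y/atcg/uagc/" | rev)
  fi
  # cut -cN- keeps everything from character N onward, so offset 1 drops nothing.
  printf '%s\n' "$rna" | cut -c"$2"- | sed "s/.../& /g"
}

for strand in + -; do
  for offset in 1 2 3; do
    codons=$(frame "$strand" "$offset" "agcggtatac")
    if [ -f genetic-code.sed ]; then
      # Same translation and cleanup steps as in the commands above.
      protein=$(printf '%s\n' "$codons" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[aucg]//g")
      printf '%s%s %s\n' "$strand" "$offset" "$protein"
    else
      printf '%s%s %s\n' "$strand" "$offset" "$codons"
    fi
  done
done
```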
Check Your Work
I checked my work with the ExPASy Translate Tool. I entered my original strand in its RNA form (agcgguauac) and received the following results from the translator:
- 5'3' Frame 1: S G I
- 5'3' Frame 2: A V Y
- 5'3' Frame 3: R Y
- 3'5' Frame 1: V Y R
- 3'5' Frame 2: Y T A
- 3'5' Frame 3: I P
XMLPipeDB Match Practice
For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:
- What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
  - How many unique matches are there?
  - How many times does each unique match appear?
- Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
  - Describe how you did this.
  - Based on where you find this occurrence, what kind of information does this pattern represent?
- What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
  - How many unique matches are there?
  - How many times does each unique match appear?
  - What information do you think this pattern represents?
- Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
  - What answer does Match give you?
  - What answer does grep + wc give you?
  - Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
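As a rough illustration of the hint in the last question, the sketch below shows that grep with wc can count three different things depending on how it is invoked. The three-line sample is made up for this example and is not taken from hs_ref_GRCh37_chr19.fa, and nothing here is a claim about how the Match utility counts.

```shell
# Hypothetical sample file; note the "AT" ending line 2 and the "G" starting line 3.
sample=$(mktemp)
printf 'ATGATG\nCCCAT\nGTTATG\n' > "$sample"

# grep prints each matching line once, so wc -l counts lines that contain ATG.
lines=$(grep "ATG" "$sample" | wc -l | tr -d ' ')

# grep -o prints every match on its own line, so wc -l counts matches within lines.
matches=$(grep -o "ATG" "$sample" | wc -l | tr -d ' ')

# Removing the line breaks first also finds the ATG that spans lines 2 and 3.
joined=$(tr -d '\n' < "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines: $lines, matches: $matches, joined: $joined"
# prints: lines: 2, matches: 3, joined: 4
rm -f "$sample"
```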