Vpachec3 Week 3
The Genetic Code, by Computer
cat prokaryote.txt | sed "y/atcg/tagc/"
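The complement command swaps each base for its partner but keeps the left-to-right order, so its output is the complementary strand read 3'-5'. Reading that strand 5'-3' (which the reverse frames need) also requires reversing the sequence. A minimal sketch, assuming the common `rev` utility is available (it ships with util-linux on Linux and with macOS, but is not part of POSIX), using the sample sequence from the assignment text on this page:

```shell
# Complement only (same order as the original strand): prints tcgccatatg
echo agcggtatac | sed "y/atcg/tagc/"

# Reverse complement (complementary strand read 5'-3'): prints gtataccgct
echo agcggtatac | rev | sed "y/atcg/tagc/"
```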
Reading Frames
+1
cat prokaryote.txt | sed "s/..$//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed
S T I F Q - V R W P K K T I L N L K R C L I P C S A Y N P A A S S A G G I L
+2
cat prokaryote.txt | sed "s/^.//g" | sed "s/.$//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed
L L Y F N R Y D G Q R R Q Y - T - N V A - Y H V P R I T Q P P V P L A A F -
+3
cat prokaryote.txt | sed "s/^..//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed
Y Y I S I G T M A K E D N I E L E T L P N T M F R V - P S R Q F R W R H F N
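Each frame command above inserts a space after every group of three bases (`sed "s/.../& /g"`) before running genetic-code.sed. That step keeps every substitution inside a single codon; without it, a pattern can match three bases that straddle two neighbouring codons. A small illustration with one substitution written by hand for this purpose (not copied from genetic-code.sed):

```shell
# agc|ggt are two codons, but "cgg" straddles the boundary; a bare
# substitution still matches it and corrupts both codons: prints agRt
echo agcggt | sed "s/cgg/R/g"

# With a space after every three bases the false match disappears:
# prints "agc ggt " (only the spacing has changed)
echo agcggt | sed "s/.../& /g" | sed "s/cgg/R/g"
```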
-1
-2
-3
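The -1, -2 and -3 frames are still blank above. One possible approach, sketched here assuming `rev` is available and the sequence sits on a single line, is to reverse-complement the strand and then reuse the forward-frame steps: drop 0, 1 or 2 leading bases, transcribe t to u, and space out the codons. The helper name `reverse_frame_codons` is hypothetical, and the sketch stops just before `sed -f genetic-code.sed` so the codon boundaries of each reverse frame stay visible:

```shell
# Hypothetical helper: prints reverse frame -N (N = 1, 2 or 3) of the DNA
# read on stdin as space-separated RNA codons, ready for genetic-code.sed.
reverse_frame_codons() {
  rev | sed "y/atcg/tagc/" |          # reverse complement
    sed "s/^.\{$(($1 - 1))\}//" |     # drop N-1 leading bases
    sed "y/t/u/" |                    # transcribe DNA to RNA
    sed "s/.../& /g"                  # space after each codon
}

# Sample sequence from the assignment text
for n in 1 2 3; do
  echo agcggtatac | reverse_frame_codons "$n"
done
```

This should print `gua uac cgc u`, `uau acc gcu ` and `aua ccg cu`. Passing each line through `sed -f genetic-code.sed` would then give V Y R, Y T A and I P, with any leftover partial codon left untranslated in lowercase.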
cat sequence_file | ?????
You should have 6 different sets of commands, one for each possible reading frame. For example, if sequence_file contains:
agcggtatac
Then your text processing commands for 5’-3’ frame 1 should display:
SGI
Your text processing commands for 5’-3’ frame 3 should display:
RY
...and so on.
Hint 1: The 6 sets of commands are very similar to each other.
Hint 2: Under the ~dondi/xmlpipedb/data directory in the Keck lab, you will find a file called genetic-code.sed. To save you some typing, this file has already been prepared with the correct sequence of sed commands for converting any base triplet into the corresponding amino acid. For example, this line in that file:
s/ugc/C/g
...corresponds to the uracil-guanine-cytosine codon, which is translated into the amino acid cysteine (C). The trick is to figure out how to use this file to your advantage in the commands you write.
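Putting the pieces together, here is one way all six frames could be produced and checked against the agcggtatac example (SGI for 5'-3' frame 1, RY for frame 3). Because the real genetic-code.sed is only available in the Keck lab, the sketch first writes a stand-in, `standin-genetic-code.sed` (a hypothetical file name), from the standard genetic code, using the same line format as `s/ugc/C/g` and assuming `-` for stop codons as in the outputs above. The helper `translate_frame` is also hypothetical: instead of trimming a fixed number of trailing bases as the +1 and +2 commands do, it deletes whatever partial codon is left over (still lowercase after translation) and then removes the spaces to match the SGI format.

```shell
#!/bin/sh
# Sketch only. Step 1: write a stand-in for genetic-code.sed (hypothetical
# name), one "s/codon/aa/g" line per RNA codon, "-" for stop codons.
# Codons are enumerated in u, c, a, g order at each position; aas lists
# the standard genetic code in that same order.
bases="u c a g"
aas="FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
i=1
for b1 in $bases; do
  for b2 in $bases; do
    for b3 in $bases; do
      aa=$(printf '%s' "$aas" | cut -c "$i")   # i-th amino acid letter
      echo "s/$b1$b2$b3/$aa/g"
      i=$((i + 1))
    done
  done
done > standin-genetic-code.sed

# Step 2 (hypothetical helper): translate_frame SKIP reads one line of DNA,
# drops SKIP leading bases (0, 1 or 2), transcribes t to u, spaces out the
# codons, translates them, deletes the lowercase leftover, and joins letters.
translate_frame() {
  sed "s/^.\{$1\}//" | sed "y/t/u/" | sed "s/.../& /g" |
    sed -f standin-genetic-code.sed | sed "s/[acgu]*$//" | sed "s/ //g"
}

# Step 3: all six frames of the example sequence from the assignment.
seq=agcggtatac
for skip in 0 1 2; do
  echo "5'-3' frame $((skip + 1)): $(echo "$seq" | translate_frame "$skip")"
done
for skip in 0 1 2; do
  rc=$(echo "$seq" | rev | sed "y/atcg/tagc/")   # reverse complement
  echo "3'-5' frame $((skip + 1)): $(echo "$rc" | translate_frame "$skip")"
done
```

Run as-is, it should print SGI, AVY and RY for the 5'-3' frames and VYR, YTA and IP for the 3'-5' frames. To use the real file, point `sed -f` at genetic-code.sed and replace `echo "$seq"` with `cat sequence_file`; like the commands above, this assumes the whole sequence is on one line.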