Difference between revisions of "Troque Week 3"
Revision as of 22:12, 21 September 2015
The Genetic Code, by computer
Complement of DNA
To find the complementary strand of a DNA strand given in the standard 5' to 3' direction, match each of the four bases A, T, C, and G with T, A, G, and C, respectively. On the computer, we use the sed command to replace each base with its complement; the y command of sed replaces every character in the first list with the character at the same position in the second list. To find the complement of the DNA strand, the following command is used:
Command: sed "y/actg/tgac/" prokaryote.txt
Yields: agatgatataaagttatccatgctaccggtttcttctgttataacttgaactttgcaacggattatggtacaaggcgcatattgggtcggcggtcaaggcgaccgccgtaaaattg
where "prokaryote.txt" is the file that contains the original DNA strand. Similarly, we can use this command for a file with a longer DNA strand:
Command: sed "y/actg/tgac/" infA-E.coli-K12.txt
(The resulting strand is not shown because it is too long.)
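As a quick sanity check of the y command, the following sketch applies the same substitution to a short made-up strand. The strand and the variable names are illustrative only and are not taken from prokaryote.txt. Complementing twice should give back the original strand.

```shell
# Hypothetical 10-base strand used only for illustration
strand="atgcaattcg"

# y/actg/tgac/ maps a->t, c->g, t->a, g->c, one character at a time
complement=$(printf '%s\n' "$strand" | sed "y/actg/tgac/")
echo "$complement"

# Complementing a second time should restore the original strand
roundtrip=$(printf '%s\n' "$complement" | sed "y/actg/tgac/")
echo "$roundtrip"
```

Note that the output is the complement read in the same left-to-right order as the input; no reversal takes place at this step.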
Reading Frames
For each reading frame of the mRNA strand, we first have to convert the DNA strand into mRNA. We use the sed command again to change every T into a U (the strand in prokaryote.txt is written in lowercase, so the command replaces t with u):
sed "s/t/u/g"
Next, we divide the strand into its codons. Each "." is a wildcard that matches any single character, so the pattern "..." matches any 3 letters; we do not care which letters they are. In the replacement, "&" stands for the text that was matched, so "& " keeps the same 3 letters and adds a space after them. The result is the same string of letters, but with a space character after every 3 letters.
sed "s/.../& /g"
Next, we use the existing file genetic-code.sed, which already has all the codons and their corresponding amino acids. We use the following command:
sed -f genetic-code.sed
Then, we will get something that looks like this (using the prokaryote.txt file as an example):
S T I F Q - V R W P K K T I L N L K R C L I P C S A Y N P A A S S A G G I L ac
Since this strand still has residual bases at the end that were not converted to amino acids, we want to remove them. We use the command:
sed "y/acug/    /"
so that each of the residual bases is replaced with a space character (the y command needs one replacement character per letter, hence the four spaces). We can then remove ALL space characters with the following command:
sed "s/ //g"
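The steps so far can be traced on a short made-up strand. The sketch below assumes that genetic-code.sed consists of one substitution per codon of the form s/aug/M/g, with "-" standing for a stop codon; this matches the sample output above, but the actual file is not shown here. The temporary file is a hypothetical four-codon stand-in, not the real genetic-code.sed.

```shell
# Hypothetical stand-in for genetic-code.sed covering only four codons
code_file=$(mktemp)
cat > "$code_file" <<'EOF'
s/aug/M/g
s/ucu/S/g
s/acu/T/g
s/uag/-/g
EOF

strand="atgtctacttagac"   # made-up DNA strand, 14 bases

# DNA -> mRNA
mrna=$(printf '%s\n' "$strand" | sed "s/t/u/g")
# Add a space after every 3 letters
codons=$(printf '%s\n' "$mrna" | sed "s/.../& /g")
# Codons -> one-letter amino acids
amino=$(printf '%s\n' "$codons" | sed -f "$code_file")
# Blank out leftover bases (four spaces, one per letter of acug), then drop all spaces
protein=$(printf '%s\n' "$amino" | sed "y/acug/    /" | sed "s/ //g")

echo "$mrna"
echo "$codons"
echo "$amino"
echo "$protein"
rm -f "$code_file"
```

The two leftover bases "ac" survive the translation step and disappear only in the last step, just as in the prokaryote.txt example.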
For the +1 reading frame, the above commands suffice. Combined into a pipeline of commands, they give the following:
+1: cat prokaryote.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: STIFQ-VRWPKKTILNLKRCLIPCSAYNPAASSAGGIL
However, for the +2 and +3 reading frames, we have to shift the point where codon reading starts by 1 and 2 letters, respectively. The same commands from above are still used, but we add another sed command that removes that many letters from the start of the strand. For +2, we add the command:
sed "s/^.//g"
so that reading starts 1 letter after the very first letter (the symbol "^" anchors the match to the beginning of the line, so only the first character is removed).
+2: cat prokaryote.txt | sed "s/t/u/g" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: LLYFNRYDGQRRQY-T-NVA-YHVPRITQPPVPLAAF-
For +3, similar to +2, we use the command:
sed "s/^..//g"
for shifting by 2 letters in the strand.
+3: cat prokaryote.txt | sed "s/t/u/g" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: YYISIGTMAKEDNIELETLPNTMFRV-PSRQFRWRHFN
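To see what the extra command does to the codon boundaries, the following sketch prints the codons for the three forward frames of a short made-up strand, leaving out the translation step. The function name frame_codons is hypothetical, and the final trailing-space cleanup is added only to make the output tidy; neither is part of the pipelines above.

```shell
# Print the space-separated codons of a strand for forward frame 1, 2, or 3.
# frame_codons is an illustrative helper, not part of the original pipeline.
frame_codons() {
  seq=$1
  frame=$2
  case $frame in
    1) shift_cmd="s/^//" ;;     # +1: remove nothing
    2) shift_cmd="s/^.//" ;;    # +2: remove the first letter
    3) shift_cmd="s/^..//" ;;   # +3: remove the first two letters
    *) echo "frame must be 1, 2, or 3" >&2; return 1 ;;
  esac
  printf '%s\n' "$seq" | sed "s/t/u/g" | sed "$shift_cmd" | sed "s/.../& /g" | sed "s/ *$//"
}

strand="atgtctacttag"   # made-up 12-base DNA strand
frame1=$(frame_codons "$strand" 1)
frame2=$(frame_codons "$strand" 2)
frame3=$(frame_codons "$strand" 3)
echo "+1: $frame1"
echo "+2: $frame2"
echo "+3: $frame3"
```

Each shift moves every codon boundary by one letter, which is why the three frames translate to entirely different amino acid sequences.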
For the -1, -2, and -3 reading frames, 2 additional commands are needed: rev, to reverse the strand, and sed "y/acug/ugac/", to find the complementary mRNA strand. Together they produce the opposite strand written in the 5' to 3' direction, so we do not have to deviate much from the commands shown above; we are only adding 2 steps. The resulting reading frames are as follows:
-1: rev prokaryote.txt | sed "s/t/u/g" | sed "y/acug/ugac/" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: VKMPPAELAAGLYAEHGIRQRFKFNIVFFGHRTY-NIV
-2: rev prokaryote.txt | sed "s/t/u/g" | sed "y/acug/ugac/" | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: LKCRQRNWRLGYTRNMVLGNVSSSILSSLAIVPIEI--
-3: rev prokaryote.txt | sed "s/t/u/g" | sed "y/acug/ugac/" | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "y/acug/    /" | sed "s/ //g"
Yields: -NAASGTGGWVIRGTWY-ATFQVQYCLLWPSYLLKYSR
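The first three steps of the negative-frame pipelines can be checked on a short made-up strand. This is a minimal sketch that assumes the rev utility is installed, as in the commands above; the strand and variable names are illustrative.

```shell
strand="atgtctacttag"   # made-up 12-base DNA strand

# rev reverses the line, s/t/u/g turns it into mRNA,
# and y/acug/ugac/ replaces each base with its complement
revcomp=$(printf '%s\n' "$strand" | rev | sed "s/t/u/g" | sed "y/acug/ugac/")
echo "$revcomp"

# Codons of the -1 frame for the made-up strand (trailing space trimmed for tidiness)
minus1=$(printf '%s\n' "$revcomp" | sed "s/.../& /g" | sed "s/ *$//")
echo "$minus1"
```

From this point on, the -1, -2, and -3 frames are handled exactly like +1, +2, and +3.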
For the other file, we need only replace the file name "prokaryote.txt" with "infA-E.coli-K12.txt" in the first command of each pipeline (cat for the + frames, rev for the - frames).