Bklein7 Week 4
From LMU BioDB 2015
Revision as of 19:14, 28 September 2015 by Bklein7 (Talk | contribs) (Added the minus 35 box procedure)
Transcription and Translation "Taken to the Next Level"
This assignment centers on text manipulation of the nucleotide sequence in the file infA-E.coli-K12.txt. I therefore began by accessing the directory ~dondi/xmlpipedb/data and copying this file to my personal directory using the command cp infA-E.coli-K12.txt bklein7. From there, I invoked cd to return to the directory bklein7 and begin the assignment.
Using Code to Tag the Nucleotide Sequence (Part #1)
- Minus 35 Box
- I began by identifying the minus 35 box. Using the tips from the assignment, I used the command
grep "tt[gt]ac[at]" infA-E.coli-K12.txt
to identify sequence matches for the minus 35 box. There were two matches.
- To determine which match was the minus 35 box, I ran the command
grep "[ct]at[at]at" infA-E.coli-K12.txt
to identify possible minus 10 box matches. Because the minus 10 box must occur 17 nucleotides after the minus 35 box, only one pair of matches for the minus 35 and minus 10 boxes was possible. Although this process requires some subjective analysis, we discussed in class that this was permissible here to initiate the tagging process.
- Having identified the minus 35 box as the first of the two grep matches, I used a sed command to tag this sequence.
- A sed command of the form s///1 was used to act on only the first match of the pattern tt[gt]ac[at].
- The "\" symbol was used to escape the forward slash in the closing tag.
- A new line was started after the tag to make it simpler to perform subsequent operations on the sequence.
- The command is as follows:
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt
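The spacing check described above was done by eye, but it could also be sketched in the shell. The example below is only an illustration under stated assumptions: the sequence in seq is a made-up toy string (not the infA sequence), the variable names are hypothetical, and it relies on GNU grep printing the byte offset of each match when -o and -b are combined, and on GNU sed treating \n in the replacement as a newline.

```shell
# Hypothetical toy sequence, NOT the real infA sequence:
# two -35 candidates (offsets 4 and 39) and one -10 candidate (offset 27).
seq="ggccttgacagcgcgcgcgcgcgcgcgtataatggccggtttactggcc"

# 0-based offsets of every candidate match (GNU grep: -o with -b reports per-match offsets).
minus35_offsets=$(printf '%s\n' "$seq" | grep -ob 'tt[gt]ac[at]' | cut -d: -f1)
minus10_offsets=$(printf '%s\n' "$seq" | grep -ob '[ct]at[at]at' | cut -d: -f1)

# Keep the -35 candidate whose 6-nucleotide box ends exactly 17 nucleotides
# before the start of a -10 candidate.
chosen=""
for m35 in $minus35_offsets; do
  for m10 in $minus10_offsets; do
    if [ $((m10 - (m35 + 6))) -eq 17 ]; then
      chosen=$m35
    fi
  done
done
echo "chosen -35 offset: $chosen"

# Tag the first match and break the line after the closing tag,
# mirroring the s///1 command used on the real file.
tagged=$(printf '%s\n' "$seq" | sed 's/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1')
printf '%s\n' "$tagged"
```

In this toy case the first candidate is the one that satisfies the spacing rule, which matches the situation described for the real file, so the s///1 form is enough to tag it.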
- Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):
- -35 box of the promoter
... <minus35box>...</minus35box> ...
- -10 box of the promoter
... <minus10box>...</minus10box> ...
- transcription start site
... <tss>...</tss> ...
- ribosome binding site
... <rbs>...</rbs> ...
- start codon
... <start_codon>...</start_codon> ...
- stop codon
... <stop_codon>...</stop_codon> ...
- terminator
... <terminator>...</terminator> ...
- What is the exact mRNA sequence that is transcribed from this gene?
- What is the amino acid sequence that is translated from this mRNA?
Preliminary Code
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt | sed -r "2s/^.{17}/&<minus10box>/g" | sed "s/<minus10box>....../&<\/minus10box>/g" | sed -r "s/<\/minus10box>.{11}/&<tss>/g" | sed "s/<tss>./&<\/tss>/g" | sed "s/gagg/<rbs>&<\/rbs>\n/g" | sed "3s/atg/<start_codon>&<\/start_codon>\n/1" | sed "s/aaaaggt/\n<terminator>&/g" | sed -r "s/aaaaggt.*gcctttt.{4}/&<\/terminator>/g" | sed "4s/.../& /g" | sed -r "4s/taa|tag|tga/<stop_codon>&<\/stop_codon>/g" | sed "4s/ //g" | sed ':a;N;$!ba;s/\n//g'
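The stop codon steps in the pipeline above (split into codons, tag, remove the spaces) are easier to see on a small input. The sketch below is an illustration only: tag_stop_codon is a hypothetical helper, the input is a made-up toy sequence assumed to begin in the reading frame, and the three steps are collapsed into one sed call with several -e expressions (using -E for extended regular expressions in place of -r).

```shell
# Hypothetical helper: tag the first in-frame stop codon of a sequence
# that is assumed to start in the correct reading frame.
tag_stop_codon() {
  printf '%s\n' "$1" |
    sed -E \
      -e 's/.../& /g' \
      -e 's/taa|tag|tga/<stop_codon>&<\/stop_codon>/' \
      -e 's/ //g'
  # 1st expression: put a space after every codon so matches cannot span codons.
  # 2nd expression: tag the first stop codon that sits inside a single codon.
  # 3rd expression: remove the spaces again (the tags contain none).
}

# Toy input: "taa" also occurs out of frame (inside "cta" + "a"),
# but only the in-frame codon is tagged.
tag_stop_codon "atgctaatgtaaccc"
```

Without the codon-splitting step, a plain search for taa would hit the out-of-frame occurrence first, which is why the spaces are inserted before tagging.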
- String together redundant sed commands with ; or find a way to make them more compact.
- Verify where the terminator sequence starts.
- Is there a way to count backwards in a line with sed, or to replace only the last instance that matches? Previous experiments:
- sed "s/x.*$/y/g" - does not work; the wildcard consumes everything from the first match to the end of the line, including the last instance
- sed "s/x/y/1$" - does not work
- Answer: use the rev command!
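A minimal sketch of the rev idea, using made-up strings rather than the gene sequence: reverse the line, replace the first match (which is the last match of the original), and reverse it back. It assumes the rev utility is installed; the variable names are only for illustration.

```shell
# Replace only the LAST "x" on a line: reverse, replace the first match, reverse back.
last_x=$(printf 'axbxcx\n' | rev | sed 's/x/y/' | rev)
printf '%s\n' "$last_x"

# For a multi-character pattern, the pattern and the replacement must be
# written reversed as well: "taa" becomes "aat" and "TAA" becomes "AAT".
last_taa=$(printf 'ctaagtaac\n' | rev | sed 's/aat/AAT/' | rev)
printf '%s\n' "$last_taa"
```

The same caveat applies to tags: a tag inserted while the line is reversed has to be typed backwards so that it reads correctly after the second rev.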
Links
- User Page: Brandon Klein
- Team Page: The Class Whoopers