Kevin Wyllie Week 4
Contents
Transcription and Translation “Taken to the Next Level”
Question 1
cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g"
The sed command shown adds the beginning tag for the -35 box and the end tag for the -10 box, based on their consensus sequences and the known distance between them...
cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g"
...and these sed commands add the remaining tags for the -35 and -10 boxes, based on the location of each existing tag.
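As a quick check of the promoter-tagging commands, the sketch below runs them on a short fragment taken from the region around the promoter instead of the whole file. The variable names are only illustrative, and "-E" is used as the equivalent of "-r" (this assumes GNU sed).

```shell
# Short fragment around the promoter (a toy input, not the full file).
fragment='caaatctttacttatttacagaacttcggcattatcttgccggttc'

# Step 1 wraps the whole -35 ... -10 span; steps 2 and 3 add the inner tags.
promoter_tagged=$(printf '%s\n' "$fragment" \
  | sed -E 's/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g' \
  | sed 's/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g' \
  | sed 's/[ct]at[at]at<\/minus10box>/ <minus10box>&/g')

printf '%s\n' "$promoter_tagged"
# caaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttc
```

The 17 wildcard characters between the two boxes are what tie the -10 box to this particular -35 box, so other "cattat"-like stretches elsewhere in the sequence are left alone.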
cat infA-E.coli-K12.txt | ... | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" | sed "s/<tss>./&<\/tss> /g"
These commands add the tags for the transcription start site (TSS), based on its consensus sequence and known distance from the -10 box.
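A minimal sketch of the TSS step on a toy input that already carries the -10 box tags (the input string and variable names are assumptions for illustration; GNU sed is assumed). Note that the six wildcards also count the space that the earlier command inserted after the -10 end tag, so only five bases are skipped before the TSS.

```shell
# Toy input: a fragment that already has the -10 box tagged.
box_tagged='tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttcaaat'

# Skip six characters (one space plus five bases) after the -10 box,
# open the TSS tag there, then close it after one base.
tss_tagged=$(printf '%s\n' "$box_tagged" \
  | sed -E 's/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g' \
  | sed 's/<tss>./&<\/tss> /g')

printf '%s\n' "$tss_tagged"
# tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaat
```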
cat infA-E.coli-K12.txt | ... | sed "s/gagg/ <rbs>&<\/rbs> /g"
This sed command adds tags for the ribosome binding site (RBS), based on its consensus sequence.
cat infA-E.coli-K12.txt | ... | sed "s/aaaaggt/ <terminator>&/g" | sed "s/gcctttt..../&<\/terminator> /g"
These sed commands add the tags for the terminator, based on its consensus sequence and hairpin behavior.
cat infA-E.coli-K12.txt | ... | sed "s/<\/rbs> /&\n/g" | sed "2s/atg/ <start_codon>&<\/start_codon> /1"
These commands start a new line after the RBS and then add start-codon tags to the first occurring "atg" sequence on the newly separated line.
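The line-address trick can be tried on a small RBS-tagged fragment, as in the sketch below. The toy input and variable names are assumptions, and a newline written as "\n" in the replacement relies on GNU sed.

```shell
# Toy input: RBS already tagged, followed by the start of the coding region.
rbs_tagged='cccca <rbs>gagg</rbs> attagatggccaaagaagacaatattgaaatgcaa'

# Break the line after the RBS, then tag only the first "atg" on line 2.
start_tagged=$(printf '%s\n' "$rbs_tagged" \
  | sed 's/<\/rbs> /&\n/g' \
  | sed '2s/atg/ <start_codon>&<\/start_codon> /1')

printf '%s\n' "$start_tagged"
# Line 2 of the output:
# attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaa
```

The "2" address restricts the substitution to the new second line, and the "1" flag limits it to the first match, so the later "atg" in "gaaatgcaa" is not tagged.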
cat infA-E.coli-K12.txt | ... | sed "2s/ta[ag]/\n&/g" | sed "3,10s/tga/\n&/g"
These commands start a new line for each potential stop codon sequence occurring after the RBS.
cat infA-E.coli-K12.txt | ... | sed "17s/tga/ <stop_codon>&<\/stop_codon> /g"
This command adds stop-codon tags to the potential stop codon sequence occurring closest to (but not after) the terminator.
cat infA-E.coli-K12.txt | ... | sed ':a;N;$!ba;s/\n//g'
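This rather cryptic command undoes any line breaks, combining the text back into one line: the loop (":a" ... "$!ba") uses "N" to append every remaining line to the pattern space, and "s/\n//g" then deletes the embedded newlines.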
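To see the line-joining step in isolation, the sketch below applies it to three toy lines (GNU sed is assumed, since the one-line label syntax is not portable). It also shows "tr -d '\n'", a simpler alternative that is not part of the original pipe.

```shell
# Three toy lines standing in for the split-up sequence.
joined=$(printf 'aaa\nccc\nggg\n' | sed ':a;N;$!ba;s/\n//g')

# Alternative (not used in the original pipe): delete every newline with tr.
joined_tr=$(printf 'aaa\nccc\nggg\n' | tr -d '\n')

printf '%s\n' "$joined"      # aaacccggg
printf '%s\n' "$joined_tr"   # aaacccggg
```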
- The final pipe is shown entered into the command line, along with the output. The tagged sequence is:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaattacggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcc <stop_codon>tga</stop_codon> tgggcgaagagaaagaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat
Question 2
The mRNA sequence can be isolated from the file using some of the same commands as before.
sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g" | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g"
The pipe starts with question 1's first few commands, up to the point of adding the TSS tags. However, the TSS end tag has been omitted.
sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/gcctttt..../&<\/terminator> /g"
This command adds the terminator's end tag, but not its beginning tag.
sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/<tss>/&\n/g" | sed "s/<\/terminator>/\n&/g"
These commands begin new lines after the TSS beginning tag and before the terminator's end tag.
sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "3D" | sed "1D"
These commands delete the top and bottom (first and third) lines, leaving only the middle.
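The split-and-delete idea can be sketched on a shortened toy line that already has the TSS beginning tag and the terminator end tag (the toy sequence and variable names are assumptions; GNU sed is assumed for "\n" in the replacement). Printing only line 2 with "sed -n '2p'" is shown as an alternative that is not in the original pipe.

```shell
# Toy input: "<tss>" marks where the transcript starts,
# "</terminator>" marks where it ends.
marked='cttgc <tss>cggttcaaattgcctttttatt</terminator> ttat'

# Split into three lines at the two markers.
split=$(printf '%s\n' "$marked" \
  | sed 's/<tss>/&\n/g' \
  | sed 's/<\/terminator>/\n&/g')

# Original approach: delete line 3, then line 1.
mrna=$(printf '%s\n' "$split" | sed '3D' | sed '1D')

# Alternative: print only line 2.
mrna_alt=$(printf '%s\n' "$split" | sed -n '2p')

printf '%s\n' "$mrna"      # cggttcaaattgcctttttatt
printf '%s\n' "$mrna_alt"  # cggttcaaattgcctttttatt
```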
- The final pipe is shown entered into the command line, along with the output. The transcribed sequence is:
cggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttatt
Question 3
Again, the amino acid sequence can be found by using modified bits and pieces from question 1.
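The commands used for this step are not listed, so the following is only one possible sketch of how a translation could be done with sed, not necessarily the approach taken here. It splits a toy reading frame into codons and translates them with a deliberately partial codon table (six codons only). The toy sequence is the first five codons after the start-codon tag plus a hypothetical "tga" stop, and "-" marks a stop codon as in the output below.

```shell
# Toy open reading frame: first five codons of the gene plus an added stop.
orf='atggccaaagaagactga'

# Put a space after every three bases, then drop the trailing space.
codons=$(printf '%s\n' "$orf" | sed -E 's/[acgt]{3}/& /g' | sed 's/ $//')

# Partial codon table (illustrative only): one substitution per codon.
protein=$(printf '%s\n' "$codons" | sed \
  -e 's/atg/M/g' -e 's/gcc/A/g' -e 's/aaa/K/g' \
  -e 's/gaa/E/g' -e 's/gac/D/g' -e 's/tga/-/g')

printf '%s\n' "$codons"   # atg gcc aaa gaa gac tga
printf '%s\n' "$protein"  # M A K E D -
```

Splitting into codons first keeps each substitution from matching across a codon boundary, because every three-letter pattern is then fenced in by spaces.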
- The pipe is shown entered into the command line, along with the output. The amino acid sequence this returns is:
M A K E D N I E M Q G T V L E T L P N T M F R V E L E N G H V V T A H I S G K M R K N Y I R I L T G D K V T V E L T P Y D L S K G R I V F R S R - L F Y R L M G E E K E R V K G R F N R P F Y