Kevin Wyllie Week 4

Transcription and Translation “Taken to the Next Level”

Question 1

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g"

The sed command shown finds the beginning tag for the -35 box and the end tag for the -10 box, based on their consensus sequence and known distance from each other...
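As a minimal sketch of what this first substitution does, the snippet below runs it on a short excerpt of the promoter region rather than on the whole file. The variable names `seq` and `step1` are illustrative only, GNU sed is assumed, and `-E` is used as an equivalent spelling of `-r`.

```shell
# Short excerpt around the promoter, taken from the sequence on this page
# with the tags stripped out.
seq="aaatctttacttatttacagaacttcggcattatcttgccggttc"

# "&" stands for the whole match, so the opening -35 tag is placed before it
# and the closing -10 tag after it. (.){17} is the fixed 17-base spacer.
step1=$(printf '%s\n' "$seq" \
  | sed -E "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g")

printf '%s\n' "$step1"
```

At this point the two tags enclose the whole promoter, from the start of the -35 box to the end of the -10 box.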

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g"

...and these sed commands add the remaining tags for the -35 and -10 boxes, based on the location of the tags already in place.
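The three promoter steps can be tried together on a short excerpt, as in the sketch below. The names `seq` and `promoter` are illustrative, GNU sed is assumed, and `-E` stands in for `-r`.

```shell
# Short promoter excerpt (tags stripped) from the sequence on this page.
seq="aaatctttacttatttacagaacttcggcattatcttgccggttc"

promoter=$(printf '%s\n' "$seq" \
  | sed -E "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" \
  | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" \
  | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g")

# The second sed anchors on the existing opening tag to close the -35 box;
# the third anchors on the existing closing tag to open the -10 box.
printf '%s\n' "$promoter"
```

Anchoring each later substitution on a tag that is already present avoids tagging other places where the six-base consensus happens to occur.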

cat infA-E.coli-K12.txt | ... | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" | sed "s/<tss>./&<\/tss> /g"

These commands add the tags for the transcription start site (TSS), based on its consensus sequence and known distance from the -10 box.
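A small sketch of the TSS step, run on an excerpt that already carries the -10 box tags. The names `tagged` and `with_tss` are illustrative, and GNU sed is assumed. Note that the six characters matched by `(.){6}` include the space that the earlier step left after `</minus10box>`, so the tag lands five bases past the box.

```shell
# Excerpt that already has the -10 box tagged, as produced by the earlier steps.
tagged="tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttc"

with_tss=$(printf '%s\n' "$tagged" \
  | sed -E "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" \
  | sed "s/<tss>./&<\/tss> /g")

# The second sed closes the tag after exactly one character, the TSS base.
printf '%s\n' "$with_tss"
```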

cat infA-E.coli-K12.txt | ... | sed "s/gagg/ <rbs>&<\/rbs> /g"

This sed command adds the tags for the ribosome binding site (RBS), based on its consensus sequence.

cat infA-E.coli-K12.txt | ... | sed "s/aaaaggt/ <terminator>&/g" | sed "s/gcctttt..../&<\/terminator> /g"

These sed commands add the tags for the terminator, based on its consensus sequence and hairpin behavior.
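The two terminator substitutions can be checked on the tail end of the sequence, as in this sketch. The names `tail_seq` and `with_term` are illustrative, and GNU sed is assumed.

```shell
# Last stretch of the sequence on this page, with the tags stripped out.
tail_seq="gaacgagtaaaaggtcggtttaaccggcctttttattttat"

with_term=$(printf '%s\n' "$tail_seq" \
  | sed "s/aaaaggt/ <terminator>&/g" \
  | sed "s/gcctttt..../&<\/terminator> /g")

# The four dots extend the match by four bases past "gcctttt",
# so the closing tag follows "gcctttttatt".
printf '%s\n' "$with_term"
```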

cat infA-E.coli-K12.txt | ... | sed "s/<\/rbs> /&\n/g" | sed "2s/atg/ <start_codon>&<\/start_codon> /1"

These commands start a new line after the RBS and then add start-codon tags to the first occurring "atg" sequence on the newly separated line.
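The sketch below shows the line-break trick on a short excerpt. It relies on GNU sed, which accepts `\n` in the replacement text; the names `region` and `out` are illustrative.

```shell
# Excerpt around the RBS; it contains two "atg" sequences after the RBS.
region="ccca <rbs>gagg</rbs> attagatggccaaagaagacaatattgaaatgcaagg"

out=$(printf '%s\n' "$region" \
  | sed "s/<\/rbs> /&\n/g" \
  | sed "2s/atg/ <start_codon>&<\/start_codon> /1")

# The address "2" limits the substitution to the second line, and the
# flag "1" limits it to the first match on that line.
printf '%s\n' "$out"
```

Only the first "atg" after the RBS is tagged; the later one in "gaaatgcaagg" is left alone.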

cat infA-E.coli-K12.txt | ... | sed "2s/ta[ag]/\n&/g" | sed "3,10s/tga/\n&/g"

These commands start a new line for each potential stop codon sequence occurring after the RBS.

cat infA-E.coli-K12.txt | ... | sed "17s/tga/ <stop_codon>&<\/stop_codon> /g"

This command adds stop-codon tags to the potential stop codon sequence occurring closest to (but not after) the terminator.

cat infA-E.coli-K12.txt | ... | sed ':a;N;$!ba;s/\n//g'

This rather unintelligible command undoes any line breaks, combining the text file back into one line.
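The join command is easier to read piece by piece. A minimal sketch on three dummy lines (GNU sed assumed; the name `joined` is illustrative):

```shell
# :a          define a label named "a"
# N           append the next input line to the pattern space
# $!ba        if this is not the last line, branch back to label "a"
# s/\n//g     once every line is loaded, delete the embedded newlines
joined=$(printf 'aaa\nccc\nggg\n' | sed ':a;N;$!ba;s/\n//g')

printf '%s\n' "$joined"
```

The script is in single quotes here, as in the command above, so that the shell leaves `$!` untouched. If only the newlines need to go, `tr -d '\n'` should give the same joined text, though it also drops the final newline.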


The final pipe is shown entered into the command line, along with the output. The tagged sequence is:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataa ggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgc cgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box> cattat</minus10box> cttgc <tss>c</tss> ggttcaaattacggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttc cgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtg actgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcc <stop_codon>tga</stop_codon> tgggcgaagagaaagaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt </terminator> ttat

Question 2

The mRNA sequence can be isolated from the file using some of the same commands as before.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g" | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g"

The pipe starts with question 1's first few commands, up to the point of adding the TSS tags. However, the TSS end tag has been omitted.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/gcctttt..../&<\/terminator> /g"

This command adds the terminator's end tag, but not its beginning tag.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/<tss>/&\n/g" | sed "s/<\/terminator>/\n&/g"

These commands begin new lines after the TSS beginning tag and before the terminator's end tag.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "3D" | sed "1D"

These commands delete the top and bottom (first and third) lines, leaving only the middle.
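The isolate-the-middle-line idea can be sketched on a shortened sequence that already carries the `<tss>` opening tag. The names `partial` and `mrna` are illustrative, and GNU sed is assumed for `\n` in the replacement.

```shell
# Shortened stand-in: upstream bases, the <tss> tag, then the transcript
# running through the terminator, followed by four trailing bases.
partial="cttgc <tss>cggttcaaaaggtcggtttaaccggcctttttattttat"

mrna=$(printf '%s\n' "$partial" \
  | sed "s/gcctttt..../&<\/terminator> /g" \
  | sed "s/<tss>/&\n/g" \
  | sed "s/<\/terminator>/\n&/g" \
  | sed "3D" \
  | sed "1D")

# Line 1 held everything up to <tss>, line 3 everything from </terminator> on.
printf '%s\n' "$mrna"
```

Because each line is processed on its own here, `D` behaves the same as the more common lowercase `d`.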


The final pipe is shown entered into the command line, along with the output. The transcribed sequence is:

cggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacg tggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctg attgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttatt

Question 3

Again, the amino acid sequence can be found by using modified bits and pieces from question 1.


The pipe is shown entered into the command line, along with the output. The amino acid sequence this returns is:

M A K E D N I E M Q G T V L E T L P N T M F R V E L E N G H V V T A H I S G K M R K N Y I R I L T G D K V T V E L T P Y D L S K G R I V F R S R - L F Y R L M G E E K E R V K G R F N R P F Y
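The pipe for this question is not reproduced in the text, so the following is only a hypothetical sketch of how a sed-based translation could work, not the command actually used. It splits the first five codons after the start-codon tag into triplets and maps each to its amino acid using the standard genetic code; only those five codons are covered, and the names `cds` and `protein` are illustrative.

```shell
# First five codons of the coding sequence, starting at the tagged atg.
cds="atggccaaagaagac"

protein=$(printf '%s\n' "$cds" \
  | sed "s/.../& /g" \
  | sed "s/atg/M/g; s/gcc/A/g; s/aaa/K/g; s/gaa/E/g; s/gac/D/g" \
  | sed 's/ $//')

# The first sed puts a space after every triplet so that the codon
# substitutions cannot match across codon boundaries.
printf '%s\n' "$protein"
```

The result, `M A K E D`, agrees with the first five residues listed above. A full version would need one substitution for each of the 64 codons.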
