**#A new line was started after the tag to make it simpler to perform subsequent operations on the sequence.

**#The command is as follows:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt
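As a small illustration of this step, the sketch below runs the same substitution on a short fragment of the sequence surrounding the minus 35 box instead of the full file. The variable names are hypothetical, and GNU sed is assumed, since <code>\n</code> in the replacement text is a GNU extension.

```shell
# Short fragment around the minus 35 box (not the full infA-E.coli-K12.txt file).
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgccggttc"

# Tag the first match of tt[gt]ac[at] and start a new line after the closing tag.
tagged=$(printf '%s\n' "$fragment" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1")

# Line 1 ends with the tagged box; line 2 holds the rest of the fragment.
printf '%s\n' "$tagged"
```

The fragment also contains a second match (''tttaca'') a few characters later; because the substitution is applied only once, that match is left untouched.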

*'''Minus 10 Box'''

**After identifying the minus 35 box, I split the nucleotide sequence into two lines as detailed above. Because the minus 10 box begins 17 nucleotides after the minus 35 box and is 6 nucleotides long, it should always occupy characters 18-23 of line 2 of the output. I verified this for the sequence in ''infA-E.coli-K12.txt'' by checking that the 6 nucleotides starting 17 nucleotides after the tagged minus 35 box did indeed match the minus 10 box sequence.

**#I wrote the second sed command to add the closing tag <code></minus10box></code> 6 characters after the opening <code><minus10box></code> tag.

**#With the addition of these two commands, my command sequence was as follows:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g"
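The position check and the two tagging commands can be tried on a short two-line fragment. This is only a sketch: the input lines are typed in by hand to mimic the result of the minus 35 step, the variable names are hypothetical, and GNU sed is assumed for <code>-r</code>.

```shell
# Two lines as they would look after the minus 35 step (short fragment, not the full file).
two_lines=$(printf '%s\n' "gcgcaaatc<minus35box>tttact</minus35box>" "tatttacagaacttcggcattatcttgccggttc")

# Check which characters sit at positions 18-23 of line 2.
candidate=$(printf '%s\n' "$two_lines" | sed -n 2p | cut -c18-23)

# Insert the opening tag after the first 17 characters of line 2,
# then the closing tag 6 characters after the opening tag.
tagged=$(printf '%s\n' "$two_lines" | sed -r "2s/^.{17}/&<minus10box>/" | sed "s/<minus10box>....../&<\/minus10box>/")

printf '%s\n' "$candidate" "$tagged"
```

Here <code>cut -c18-23</code> plays the role of the manual check described above: it prints the six characters that the sed commands then wrap in tags.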

*'''tss'''

**Because the transcription start site is the 12th nucleotide counting from the beginning of the 6-nucleotide minus 10 box, I reasoned that it would be simplest to locate it as the 6th character after the <code></minus10box></code> tag. This means that 5 characters lie between that tag and the point where the <code><tss></code> tag should be inserted. The <code></tss></code> end tag should then be placed one character after the <code><tss></code> start tag.

**#To accomplish this, I combined two substitution commands in a single sed invocation, separated by a semicolon, to insert the start and end tags at the right locations.

**#I appended this command to the growing pipeline:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g"

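The offset arithmetic for the tss (6 nucleotides of minus 10 box, 5 skipped characters, then the tss itself as the 12th) can be checked on a single hand-typed line. This is a sketch on a short fragment with a hypothetical variable name, assuming GNU sed.

```shell
# Line 2 as it would look after the minus 10 step (short fragment).
line='tatttacagaacttcgg<minus10box>cattat</minus10box>cttgccggttc'

# Skip 5 characters after the closing minus 10 tag and open <tss>,
# then close the tag one character later and start a new line.
tagged=$(printf '%s\n' "$line" | sed -r "s/<\/minus10box>.{5}/&<tss>/;s/<tss>./&<\/tss>\n/")

printf '%s\n' "$tagged"
```

Both substitutions run inside one sed invocation, so the second one already sees the <code><tss></code> tag inserted by the first.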
*'''rbs'''

**The consensus sequence for the ribosome binding site is ''gagg''. Although a quick grep search shows there is only one instance of this sequence in ''infA-E.coli-K12.txt'', I chose to write code that labels only the first instance of ''gagg'' after the tss (as there could be many instances of this sequence in a long gene). To accomplish this, I wrote a sed command to insert the rbs tags before and after the first occurrence of ''gagg'' in the third line (this line begins after the tss). In addition, I decided to start a new line after the rbs. I appended this command to the pipeline:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g" |
sed "3s/gagg/<rbs>&<\/rbs>\n/1"

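To show why the line address matters, the sketch below uses three hypothetical lines in which line 1 deliberately contains an extra ''gagg''. The input is invented for illustration (only line 3 resembles the real sequence), and GNU sed is assumed.

```shell
# Three hypothetical lines; line 1 deliberately contains an extra "gagg".
input=$(printf '%s\n' "aagaggtt" "cc<tss>c</tss>" "taccccagaggattagatggcc")

# Only line 3 is searched, so the "gagg" on line 1 is left alone.
tagged=$(printf '%s\n' "$input" | sed "3s/gagg/<rbs>&<\/rbs>\n/1")

printf '%s\n' "$tagged"
```

The output has four lines: the first two are unchanged, the third ends with the tagged rbs, and the fourth holds the remainder of the original third line.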
*'''start codon'''

**The method I used for labelling the start codon was nearly identical to that used to label the rbs above. The only difference was that the sed command I wrote was specified to work on the 4th line (beginning after the rbs) instead of the third line. With the addition of this command, the new command sequence labelled the first instance of the sequence ''atg'' after the rbs as the start codon. The updated version of the code was as follows:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g" |
sed "3s/gagg/<rbs>&<\/rbs>\n/1" |
sed "4s/atg/<start_codon>&<\/start_codon>\n/1"
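The same idea can be sketched for the start codon. The four input lines below are hypothetical stand-ins for the pipeline's intermediate output; line 1 contains upstream ''atg'' triplets that should not be tagged. GNU sed is assumed.

```shell
# Four hypothetical lines; line 1 contains upstream "atg" triplets that must not be tagged.
input=$(printf '%s\n' "gaatgaatg" "cc<tss>c</tss>" "tacccca<rbs>gagg</rbs>" "attagatggccaaagaa")

# Tag the first "atg" on line 4 only and start a new line after it.
tagged=$(printf '%s\n' "$input" | sed "4s/atg/<start_codon>&<\/start_codon>\n/1")

printf '%s\n' "$tagged"
```

Because the new line begins immediately after the start codon, line 5 starts in the same reading frame as the coding sequence.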

*'''terminator'''

**I chose to tag the terminator region prior to tagging the stop codon. This is because I found it most sensible to separate the region of the sequence between the start codon and terminator into its own line prior to writing code to find and tag the stop codon.

**#Next, I wrote a command using the <code>.*</code> wildcard to match the entire terminator hairpin sequence, account for the remaining 4 nucleotides in the terminator, and place the <code></terminator></code> tag at its end. It wasn't necessary to introduce a new line break after this tag.

**#The updated code is as follows:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g" |
sed "3s/gagg/<rbs>&<\/rbs>\n/1" |
sed "4s/atg/<start_codon>&<\/start_codon>\n/1" |
sed "5s/aaaaggt/\n<terminator>&/g" |
sed -r "6s/aaaaggt.*gcctttt.{4}/&<\/terminator>/g"
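The two terminator commands can be tried in isolation on a short fragment from the end of the sequence. In this sketch, lines 1-4 are placeholders so that the line addresses 5 and 6 still apply; the variable names are hypothetical and GNU sed is assumed.

```shell
# Lines 1-4 are placeholders; line 5 is a short fragment from the end of the sequence.
input=$(printf '%s\n' "line1" "line2" "line3" "line4" "gaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat")

# Break line 5 before the terminator and open the tag,
# then close the tag 4 characters after the end of the hairpin on the new line 6.
tagged=$(printf '%s\n' "$input" | sed "5s/aaaaggt/\n<terminator>&/" | sed -r "6s/aaaaggt.*gcctttt.{4}/&<\/terminator>/")

printf '%s\n' "$tagged"
```

The two commands need separate sed invocations: the newline inserted by the first one only becomes a real line boundary (line 6) when the next sed reads the output.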

*'''stop codon'''

**Having isolated the region of the sequence between the start codon and the terminator region in line 5, I set out to identify the stop codon within this line. This presented two main challenges. First, it was necessary to search for all three possible stop codons at once in order to identify the first occurrence of a stop codon within the line. Second, it was also necessary to account for reading frames when finding the stop codon.

**#Finally, I added a command to the sequence to delete spaces present in the 5th line that were no longer necessary.

**#Piping these commands yielded the following sequence:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g" |
sed "3s/gagg/<rbs>&<\/rbs>\n/1" |
sed "4s/atg/<start_codon>&<\/start_codon>\n/1" |
sed "5s/aaaaggt/\n<terminator>&/g" |
sed -r "6s/aaaaggt.*gcctttt.{4}/&<\/terminator>/g" |
sed "5s/.../& /g" |
sed -r "5s/taa|tag|tga/<stop_codon>&<\/stop_codon>/1" |
sed "5s/ //g"
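The reading-frame issue can be seen on a small invented example. The coding fragment below is hypothetical (codons ''gct gaa ccg tga ttg''), chosen so that an out-of-frame ''tga'' spans the first two codons; the line address 5 is dropped because the input is a single line, and GNU sed is assumed.

```shell
# Hypothetical coding fragment: codons gct gaa ccg tga ttg.
# An out-of-frame "tga" spans the first two codons (gc-tga-a).
coding="gctgaaccgtgattg"

# Naive search: tags the out-of-frame match.
naive=$(printf '%s\n' "$coding" | sed -r "s/taa|tag|tga/<stop_codon>&<\/stop_codon>/1")

# Frame-aware search: split into codons, tag the first stop codon, remove the spaces.
framed=$(printf '%s\n' "$coding" | sed "s/.../& /g" | sed -r "s/taa|tag|tga/<stop_codon>&<\/stop_codon>/1" | sed "s/ //g")

printf '%s\n' "$naive" "$framed"
```

Inserting a space after every third character means a match can no longer straddle two codons, so the first hit of the alternation is necessarily in frame.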

*'''Combining Lines'''

**To conclude the tagging command sequence, I added one final command that joins the 6 lines I had created back into a single sequence. The final command sequence and its output are listed below:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt |
sed -r "2s/^.{17}/&<minus10box>/g" |
sed "s/<minus10box>....../&<\/minus10box>/g" |
sed -r "s/<\/minus10box>.{5}/&<tss>/g;s/<tss>./&<\/tss>\n/g" |
sed "3s/gagg/<rbs>&<\/rbs>\n/1" |
sed "4s/atg/<start_codon>&<\/start_codon>\n/1" |
sed "5s/aaaaggt/\n<terminator>&/g" |
sed -r "6s/aaaaggt.*gcctttt.{4}/&<\/terminator>/g" |
sed "5s/.../& /g" |
sed -r "5s/taa|tag|tga/<stop_codon>&<\/stop_codon>/1" |
sed "5s/ //g" |
sed ':a;N;$!ba;s/\n//g'

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccg
ctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagcc
gtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>t
atttacagaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtga
tacccca<rbs>gagg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgca
aggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggta
aaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggcc
gcattgtcttccgtagtcgc<stop_codon>tga</stop_codon>ttgttttaccgcctgatgggcgaagagaaagaacg
agt<terminator>aaaaggtcggtttaaccggcctttttatt</terminator>ttat
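The line-joining command at the end of the pipeline can be looked at on its own. In this sketch the three input lines are hypothetical and GNU sed is assumed: <code>:a</code> defines a label, <code>N</code> appends the next input line to the pattern space, <code>$!ba</code> jumps back to the label unless the last line has been read, and the final substitution deletes every newline collected along the way.

```shell
# Three hypothetical tagged lines standing in for the pipeline's multi-line output.
parts=$(printf '%s\n' "aa<tss>c</tss>" "gg<rbs>gagg</rbs>" "tt")

# Accumulate all lines in the pattern space, then delete the newlines between them.
joined=$(printf '%s\n' "$parts" | sed ':a;N;$!ba;s/\n//g')

printf '%s\n' "$joined"
```

The script is single-quoted so that the shell does not try to expand <code>$!</code> before sed sees it.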

===Determining the Transcription Product (Part #2)===

#*Finally, I added a command to convert the t's in the mRNA-like strand to u's, representing the process of transcription: <code>sed "s/t/u/g"</code>.

#*The final command sequence including the above steps and its output was as follows:
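Taken on its own, the t-to-u conversion step (not the full command sequence) can be sketched on a short untagged fragment taken from just after the tss. The variable names are hypothetical. Note that the substitution replaces every ''t'', so it is only safe on text that no longer contains tag names such as <code><tss></code>.

```shell
# Short untagged fragment of the mRNA-like strand, starting at the tss.
dna="cggttcaaattacgg"

# Replace every t with u to represent transcription.
rna=$(printf '%s\n' "$dna" | sed "s/t/u/g")

printf '%s\n' "$rna"
```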
