I opened Terminal and used the ssh command to get into dondi's directory: ~dondi/xmlpipedb/data. There I found infA-E.coli-K12.txt, which contains the nucleotide sequence I will be using for this assignment.

* Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):

* -35 box of the promoter

**As shown in class, I used the sed command to tag the first occurrence of the -35 box in the sequence:

** The stop codon requires finding one of three possible three-character sequences. At first I tried using brackets, <code>t[ag][ag]</code>, but I soon found that this matched too much: there are only three stop codons (taa, tag, tga), and the brackets match four codons, including tgg. So into the wiki I went, and realized I could use a vertical bar to separate the three codons and search for exactly those. The problem, however, was that this did not work on its own. After being stumped for a while, I realized that before piping to that command I needed to break the line into sets of three, just like I did in the Week 3 assignment. As a result I got this command: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1"</code>
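The difference between the bracket pattern, the alternation, and the reading-frame problem can be checked on a short made-up string. This is only a minimal sketch, assuming GNU sed (where <code>-E</code> behaves like the <code>-r</code> used above); the sequence <code>atgaaatggccctgattt</code> is invented for illustration and is not part of infA.

```shell
# Hypothetical toy sequence (not from infA): atg aaa tgg ccc tga ttt
seq="atgaaatggccctgattt"

# Unframed search: the first hit is "tga" spanning two codons (a[tga]aa).
unframed=$(printf '%s\n' "$seq" \
  | sed -E 's/tag|tga|taa/<stop_codon>&<\/stop_codon>/')

# Put a space after every three bases so a match cannot straddle codons.
framed=$(printf '%s\n' "$seq" | sed 's/.../& /g')

# Bracket pattern in frame: tags tgg, which is not a stop codon.
bracket=$(printf '%s\n' "$framed" \
  | sed -E 's/t[ag][ag]/<stop_codon>&<\/stop_codon>/' | sed 's/ //g')

# Alternation in frame: tags the real stop codon tga.
alternation=$(printf '%s\n' "$framed" \
  | sed -E 's/tag|tga|taa/<stop_codon>&<\/stop_codon>/' | sed 's/ //g')

printf '%s\n' "$unframed" "$bracket" "$alternation"
```

Only the last variant marks the in-frame <code>tga</code>, which is why the pipeline above splits the line into triplets before searching.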

*terminator

** The first part of the terminator hairpin is <code>aaaaggt</code>, which means, by the terminator rules provided to us, that this first half pairs with <code>gcctttt</code>. The trick now is to grab the correct terminator sequence. I ended up breaking the terminator step into two commands: one to insert the opening tag and a second to insert the closing tag. I did this because I wasn't sure how long the sequence between the two halves of the hairpin would be. This is what I used to capture the terminator sequence: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" </code>
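The two-step tagging can be tried in isolation on the untagged 3' end of the sequence. The sketch below is not part of the assignment pipeline; the variable names are made up, and the one-step variant is a hypothetical alternative that lets <code>.*</code> span the stretch of unknown length between the two halves.

```shell
# Excerpt from the 3' end of the infA sequence, with the tags removed.
tail_seq="gaacgagtaaaaggtcggtttaaccggcctttttattttat"

# Two steps: open the tag at the first half of the hairpin, then close it
# four bases after the complementary half.
two_step=$(printf '%s\n' "$tail_seq" \
  | sed 's/aaaaggt/<terminator>&/' \
  | sed 's/gcctttt..../&<\/terminator>/')

# Hypothetical one-step alternative: .* covers the bases in between.
one_step=$(printf '%s\n' "$tail_seq" \
  | sed 's/aaaaggt.*gcctttt..../<terminator>&<\/terminator>/')

printf '%s\n' "$two_step" "$one_step"
```

Both variants give the same tagged string here. The one-step form relies on a greedy <code>.*</code>, so it would run to the last <code>gcctttt</code> if the pattern occurred more than once.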

* And so, finally, it is all marked up. However, I'm not quite done yet: I need to get rid of all the newlines I created. To do this I used the command <code>sed ':a;N;$!ba;s/\n//g'</code> (from the wiki), so the final output is as follows.

*(a)

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcg

gagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca

<rbs>gagg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttac

tgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgc<stop_codon>tga</stop_codon>

ttgttttaccgcctgatgggcgaagagaaagaacgagt<terminator>aaaaggtcggtttaaccggcctttttatt</terminator>ttat

*(b) And the final command is as follows:

<code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" | sed ':a;N;$!ba;s/\n//g' </code>
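The last stage of the pipeline, the newline-joining idiom, can be tested by itself. A minimal sketch, assuming GNU sed (which accepts the semicolon-separated label syntax); the three fragments are placeholders, and the <code>tr</code> line is just a shorter way to get the same result.

```shell
# :a      define a label named a
# N       append the next input line to the pattern space
# $!ba    if this is not the last line, branch back to a
# s/\n//g once everything is loaded, delete every embedded newline
joined=$(printf 'aaa\nccc\nggg\n' | sed ':a;N;$!ba;s/\n//g')

# tr deletes every newline character directly.
joined_tr=$(printf 'aaa\nccc\nggg\n' | tr -d '\n')

echo "$joined"
```

The sed version keeps the final newline of the output and <code>tr</code> drops it; command substitution hides that difference here.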

* What is the ''exact'' mRNA sequence that is transcribed from this gene?

**To get the mRNA sequence, I need the sequence between the transcription start site and the terminator. I found it easiest to make new lines based on the markup tags already there; from that point I can pick and choose which lines to transcribe. Using sed, I can delete lines, for example <code> sed "2,4D"</code>. Using this trick, I deleted all the unnecessary lines; every nucleotide that remains should be transcribed into mRNA. I was going to make the new lines by typing out a separate sed command for each tag, but it can be done with just two, which put each tag on its own line: <code> sed "s/>/&\n/g" | sed "s/</\n&/g"</code>. Now I go through, delete the tags and the unneeded sequences, remove the extra newlines, and transcribe by replacing each t with u. Here is the sequence followed by the command.

**(a)

cgguucaaauuacgguagugauaccccagaggauuagauggccaaagaagacaauauugaaaugcaagguaccguucuug

aaacguugccuaauaccauguuccgcguagaguuagaaaacggucacgugguuacugcacacaucuccgguaaaaugcgca

aaaacuacauccgcauccugacgggcgacaaagugacuguugaacugaccccguacgaccugagcaaaggccgcauugu

cuuccguagucgcugauuguuuuaccgccugaugggcgaagagaaagaacgaguaaaaggucgguuuaaccggccuuuuuauu

**(b)

<code>cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1"

| sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g"

| sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1"

| sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" | sed ':a;N;$!ba;s/\n//g'

| sed "s/>/&\n/g" | sed "s/</\n&/g" | sed "1,10D;12D;14D;16D;18D;20D;22D;24D;26D;28D;29D"

| sed ':a;N;$!ba;s/\n//g' | sed "s/t/u/g" </code>
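The split, delete, join, and transcribe steps can be followed on a much smaller tagged string. This is a sketch under assumptions: the string below is invented, GNU sed is assumed (for <code>\n</code> in the replacement), and lowercase <code>d</code> is used, which behaves the same as <code>D</code> here because each line is processed on its own.

```shell
# Hypothetical tagged string: upstream bases, TSS, body, terminator, leftover.
tagged="aa<tss>c</tss>ggt<terminator>aaat</terminator>tt"

# Put every tag on its own line. The result is nine lines:
# 1 aa  2 <tss>  3 c  4 </tss>  5 ggt  6 <terminator>  7 aaat  8 </terminator>  9 tt
split=$(printf '%s\n' "$tagged" | sed 's/>/&\n/g' | sed 's/</\n&/g')

# Drop the upstream bases, the tags, and the leftover bases (keep lines 3, 5, 7),
# join the remaining lines, and transcribe t to u.
mrna=$(printf '%s\n' "$split" \
  | sed '1,2d;4d;6d;8,9d' \
  | sed ':a;N;$!ba;s/\n//g' \
  | sed 's/t/u/g')

echo "$mrna"
```

The line numbers passed to the delete step depend entirely on how many tags the input contains, which is why the full command above uses a much longer list.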
