I opened up terminal, and used the ssh command to get into dondi's directory: ~dondi/xmlpipedb/data.  In there I got access to infA-E.coli-K12.txt which is the nucleotide sequence I will be using for this assignment.   
 
I opened up terminal, and used the ssh command to get into dondi's directory: ~dondi/xmlpipedb/data.  In there I got access to infA-E.coli-K12.txt which is the nucleotide sequence I will be using for this assignment.   
# Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):
+
* Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):
 
* -35 box of the promoter
 
* -35 box of the promoter
 
**As shown in class I used the sed command to get the first occurrence of the minus 35 strand in the sequence:
 
**As shown in class I used the sed command to get the first occurrence of the minus 35 strand in the sequence:
 
** The stop codon requires I find one of three possible three character sequences.  At first I tried using brackets: "t[ag][ag]", but I soon found out that that yielded too many results.  There are only three stop codons and the brackets give me 4 unique codons.  So into the wiki I went, and realized I could use a vertical bar to separate three unique codons, and search for them.  The problem however, was that this did not work.  After being stumped for awhile I realized that before I piped to that command I needed to break up the line into sets of 3, just like I did in the week 3 assignment.  As a result I got this command: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1"</code>
 
** The stop codon requires I find one of three possible three character sequences.  At first I tried using brackets: "t[ag][ag]", but I soon found out that that yielded too many results.  There are only three stop codons and the brackets give me 4 unique codons.  So into the wiki I went, and realized I could use a vertical bar to separate three unique codons, and search for them.  The problem however, was that this did not work.  After being stumped for awhile I realized that before I piped to that command I needed to break up the line into sets of 3, just like I did in the week 3 assignment.  As a result I got this command: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1"</code>
 
*terminator
 
*terminator
** The first part of the terminator hairpin is: <code> aaaaggt </code>, which means, abiding by the rules of the terminator provided to us, that the first half bonds with <code> gcctttt </code>.  So now the trick is to grab the correct terminator sequence.  I ended up breaking the terminator command into two different commands.  I used one to insert the first tag, and the second one to insert the second tag.  I did this because I wasn't sure how long the sequence would be between the two hairpin sequences. This is what I got to capture the terminator sequence: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" </code>
+
** The first part of the terminator hairpin is: <code>aaaaggt</code>, which means, abiding by the rules of the terminator provided to us, that the first half bonds with <code> gcctttt </code>.  So now the trick is to grab the correct terminator sequence.  I ended up breaking the terminator command into two different commands.  I used one to insert the first tag, and the second one to insert the second tag.  I did this because I wasn't sure how long the sequence would be between the two hairpin sequences. This is what I got to capture the terminator sequence: <code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" </code>
 
* And so, finally, it is all marked up.  However I'm not quite done yet, I need to get rid of all the new lines I created.  In order to do this I used this command:  sed ':a;N;$!ba;s/\n//g' (from wiki), so the final output is as follows.
 
* And so, finally, it is all marked up.  However I'm not quite done yet, I need to get rid of all the new lines I created.  In order to do this I used this command:  sed ':a;N;$!ba;s/\n//g' (from wiki), so the final output is as follows.
#*(a)<code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" | sed ':a;N;$!ba;s/\n//g' </code>
+
*(a)
#*(b) 
+
    +
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcg
 +
gagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca
 +
<rbs>gagg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttac
 +
tgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgc<stop_codon>tga</stop_codon>
 +
ttgttttaccgcctgatgggcgaagagaaagaacgagt<terminator>aaaaggtcggtttaaccggcctttttatt</terminator>ttat
   −
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcg    gagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca<rbs>gagg</r bs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctg acgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgc<stop_codon>tga</stop_codon>ttgttttaccgcctgatgggcgaagagaaagaacgagt<terminator>aaaaggtcggtttaaccggcctttttat t</terminator>ttat
+
*(b)And the final command is as follows:
    +
<code> cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1" | sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1" | sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g" | sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" | sed ':a;N;$!ba;s/\n//g' </code>
   −
# What is the ''exact'' mRNA sequence that is transcribed from this gene?
  −
#*(a)
  −
#*(b)
  −
# What is the amino acid sequence that is translated from this mRNA?
  −
#*(a)
  −
#*(b)
     −
==== Supplementary Information ====
+
* What is the ''exact'' mRNA sequence that is transcribed from this gene?
 +
**In order to get the mRNA sequence I need to get the sequence between the transcription start site and the terminator.  I found it easiest to make new lines based on the mark up tags already there.  From that point I can pick and choose which lines I need to transcribe.  Using sed, I can delete lines.  Example:  <code> sed "2,4D"</code>  So, using this trick, I deleted all unnecessary lines.  From there all nucleotides not deleted should be transcribed into mRNA.  I was going to make new lines by typing out a bunch of different sed commands for each different tag, but I can do it simply by using two.  This puts each tag on its own line: <code> sed "s/>/&\n/g" | sed "s/</\n&/g"</code>.  Now I go through, delete the tags and the useless sequences, remove the extra lines, and transcribe.  Here is the sequence followed by the command.
 +
**(a)
 +
cgguucaaauuacgguagugauaccccagaggauuagauggccaaagaagacaauauugaaaugcaagguaccguucuug
 +
aaacguugccuaauaccauguuccgcguagaguuagaaaacggucacgugguuacugcacacaucuccgguaaaaugcgca
 +
aaaacuacauccgcauccugacgggcgacaaagugacuguugaacugaccccguacgaccugagcaaaggccgcauugu
 +
cuuccguagucgcugauuguuuuaccgccugaugggcgaagagaaagaacgaguaaaaggucgguuuaaccggccuuuuuauu
 +
**(b)
 +
<code>cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box>\n/1"
 +
| sed -r "2s/^.{17}/&\n/g" | sed -r "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" | sed -r "4s/^.{5}/&\n/g"
 +
| sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" | sed "7s/atg/<start_codon>&<\/start_codon>\n/1"
 +
| sed "8s/.../& /g"| sed -r "8s/tag|tga|taa/<stop_codon>&<\/stop_codon>/1" | sed "8s/ //g"
 +
| sed "8s/aaaaggt/<terminator>&/g" | sed -r "8s/gcctttt..../&<\/terminator>/g" | sed ':a;N;$!ba;s/\n//g'
 +
| sed "s/>/&\n/g" | sed "s/</\n&/g" | sed "1,10D;12D;14D;16D;18D;20D;22D;24D;26D;28D;29D"
 +
| sed ':a;N;$!ba;s/\n//g' | sed "s/t/u/g" </code>
   Exception encountered, of type "Error"
[99fb0a40] /biodb/fall2015/index.php?diff=cur&oldid=1825&title=Jwoodlee_Week_4 Error from line 434 of /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php: Call to undefined function each()
Backtrace:
#0 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(544): DiffEngine->diag()
#1 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(344): DiffEngine->compareSeq()
#2 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(227): DiffEngine->diffLocal()
#3 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(721): DiffEngine->diff()
#4 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(859): Diff->__construct()
#5 /apps/xmlpipedb/biodb/fall2015/includes/diff/DairikiDiff.php(980): MappedDiff->__construct()
#6 /apps/xmlpipedb/biodb/fall2015/includes/diff/TableDiffFormatter.php(194): WordLevelDiff->__construct()
#7 /apps/xmlpipedb/biodb/fall2015/includes/diff/DiffFormatter.php(140): TableDiffFormatter->changed()
#8 /apps/xmlpipedb/biodb/fall2015/includes/diff/DiffFormatter.php(111): DiffFormatter->block()
#9 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(888): DiffFormatter->format()
#10 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(802): DifferenceEngine->generateTextDiffBody()
#11 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(733): DifferenceEngine->generateContentDiffBody()
#12 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(662): DifferenceEngine->getDiffBody()
#13 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(632): DifferenceEngine->getDiff()
#14 /apps/xmlpipedb/biodb/fall2015/includes/diff/DifferenceEngine.php(453): DifferenceEngine->showDiff()
#15 /apps/xmlpipedb/biodb/fall2015/includes/page/Article.php(795): DifferenceEngine->showDiffPage()
#16 /apps/xmlpipedb/biodb/fall2015/includes/page/Article.php(506): Article->showDiffPage()
#17 /apps/xmlpipedb/biodb/fall2015/includes/actions/ViewAction.php(44): Article->view()
#18 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(395): ViewAction->show()
#19 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(273): MediaWiki->performAction()
#20 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(566): MediaWiki->performRequest()
#21 /apps/xmlpipedb/biodb/fall2015/includes/MediaWiki.php(414): MediaWiki->main()
#22 /apps/xmlpipedb/biodb/fall2015/index.php(44): MediaWiki->run()
#23 {main}