I began by using grep to find the potential -35 box and -10 box because grep highlights the searched pattern in red. I simply entered

cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"

which gave me two possible answers for the -35 box, '''tttact''' and '''tttaca''', both of which fit the pattern. Now it was a matter of finding out which one was the correct one. I also searched for the -10 box using

cat infA-E.coli-K12.txt | grep "[ct]at[at]at"

which also revealed two potential sites, '''tataat''' and '''cattat'''. I realized that in order to find out which sequences were the correct ones I needed to see both sets of matches together, but grep doesn't do this, so I used sed instead. I chained the sed commands in a pipe and added three spaces on either side of each occurrence of the consensus sequences (both -35 and -10) to make them stand out. This is done with sed "s/<pattern>/   &   /g", where <pattern> is what I wish to find, "&" stands for the matched text, and the spaces on each side of the "&" are what gets added around the match (instructions found [[Introduction to the Command Line | here]]). The pipe looked like this:

cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/   &   /g" | sed "s/tt[gt]ac[at]/   &   /g"

This made it clear that it was the first -35 box option, '''tttact''', and the second -10 box option, '''cattat''', that I was looking for in this gene. Using this information, it was then much simpler for me to highlight the specific sequences for the assignment.
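
The spacing trick can be tried on a short string before running it on the real file. The following is only a sketch: it assumes GNU grep and sed, and the variable toy holds a made-up sequence for illustration, not the infA sequence. It uses grep -o, which prints only the matched text, and then the same two-step sed pipe as above.

```shell
# Toy sequence invented for illustration; it is NOT the real infA sequence.
toy="aatttactggcattatcc"

# grep -o prints only the part of the line that matches the pattern.
minus35=$(printf '%s\n' "$toy" | grep -o "tt[gt]ac[at]")
minus10=$(printf '%s\n' "$toy" | grep -o "[ct]at[at]at")

# Pad every match of either consensus pattern with three spaces per side.
spaced=$(printf '%s\n' "$toy" | sed "s/[ct]at[at]at/   &   /g" | sed "s/tt[gt]ac[at]/   &   /g")

printf '%s\n' "$minus35" "$minus10" "$spaced"
```

On this toy input each pattern matches exactly once, so the padded output shows one candidate for each box.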

To highlight the -35 box, I needed to use sed to put <minus35box> before and </minus35box> after the first option, along with three spaces on each side. I consulted the Text Processing page of the wiki and found that I can replace g with the number of the occurrence I wish to change. Because I only needed the first option ('''tttact''') to be highlighted, the command looked like this:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/   <minus35box>&<\/minus35box>   /1"

Next, to highlight the -10 box, I did the same thing, except that my goal was to wrap the second -10 box option in <minus10box> tags. The command looked like this:

cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/   <minus10box>&<\/minus10box>   /2"

This highlighted the -10 box, '''cattat'''.
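
The effect of the occurrence number can be checked on a small example. This is a sketch under stated assumptions: GNU sed, and a made-up toy string (not the infA sequence) that contains two candidate -10 boxes. The padding spaces are left out here to keep the output easy to read.

```shell
# Toy sequence invented for illustration, with two [ct]at[at]at matches.
toy="ggtataatccgcattatgg"

# A trailing number N on the s command replaces only the Nth match on the line.
first=$(printf '%s\n' "$toy" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/1")
second=$(printf '%s\n' "$toy" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/2")

printf '%s\n' "$first" "$second"
```

With /1 the tags land on tataat, and with /2 they land on cattat, while the other candidate is left untouched.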

In order to find the transcription start site, I learned from the assignment page that the site is located at the 12th nucleotide after the first nucleotide of the -10 box. Since the -10 box is six nucleotides long, this means that transcription starts at the sixth nucleotide after '''cattat'''. To find it, I broke up the gene by inserting a new line right after the -35 box. In the "picking lines" section of More Text Processing Features, I found that to restrict a substitution to the second line I had to replace sed s///g with sed 2s///g. The command looked like this:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/   <minus35box>&<\/minus35box>   /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/   <minus10box>&<\/minus10box>   /1"

I noted that the flag after the -10 box pattern should be /1, not /2: because line 2 only contains what comes after the -35 box, '''cattat''' is now the first occurrence of [ct]at[at]at on that line.
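
The split-then-address idea can be sketched on a toy string. The assumptions are GNU sed (which accepts \n in the replacement) and a made-up sequence, not the infA sequence, in which a -10-like site also appears before the -35 box. Padding spaces are omitted for readability.

```shell
# Toy sequence invented for illustration: tataat comes before the -35 box,
# cattat comes after it.
toy="cctataatggtttactccgcattatgg"

tagged=$(printf '%s\n' "$toy" \
  | sed 's/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1' \
  | sed 's/<\/minus35box>/&\n/' \
  | sed '2s/[ct]at[at]at/<minus10box>&<\/minus10box>/1')

# Each sed in the pipe sees the newline added by the previous one,
# so the address 2 refers to the text after the -35 box.
line1=$(printf '%s\n' "$tagged" | sed -n '1p')
line2=$(printf '%s\n' "$tagged" | sed -n '2p')
printf '%s\n' "$line1" "$line2"
```

The tataat on line 1 is never considered, because the last substitution is limited to line 2, where cattat is the first match.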

My next goal was to find a command that would allow me to skip over 5 more nucleotides to the transcription start site <tss>...</tss> on the 6th nucleotide after the -10 box. I did this by adding the command

sed -r "s/<\/minus10box> (.){5}/&\n/g"

This indicated that I meant to skip over 5 nucleotides (the number in the curly braces). The '''-r''' option turns on extended regular expressions, which lets the parentheses and curly braces be written without backslashes.
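
How the repetition count behaves can be seen on a small example. This sketch assumes GNU sed (for -r and for \n in the replacement) and a made-up string, not the infA sequence, in which the closing tag is followed by three padding spaces, five a's, and then a g. Because "." matches any character, padding spaces are counted just like nucleotides.

```shell
# Toy input invented for illustration: three spaces after the closing tag,
# then five a's, so the g is the 6th nucleotide after the -10 box.
tagged='<minus10box>cattat</minus10box>   aaaaagcccc'

# The literal space in the pattern matches one padding space; (.){N} then
# consumes the next N characters, whatever they are.
five=$(printf '%s\n' "$tagged" | sed -r 's/<\/minus10box> (.){5}/&\n/' | sed -n '2p')
seven=$(printf '%s\n' "$tagged" | sed -r 's/<\/minus10box> (.){7}/&\n/' | sed -n '2p')

printf '%s\n' "$five" "$seven"
```

With {5} the new line starts two nucleotides too early, and with {7} it starts at the g.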

This had me starting at the 10th nucleotide, not the 12th. I realized that this was because I had added three spaces around <minus10box>...</minus10box>: the single space in my pattern matched the first of them, and the other two counted as (.). To fix this, I put {7} in the curly braces instead of {5}, which gave me a newline at the right place, just before the 12th nucleotide. Then, to highlight the transcription start site I added

sed "3s/^./<tss>&<\/tss> /g"

to tell the computer that I wished to add <tss> labels around the first character in the third line. The command looked like this:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/   <minus35box>&<\/minus35box>   /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/   <minus10box>&<\/minus10box>   /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g"

Next, to find the ribosome binding site (which has to be after the transcription start site), I searched the same line (line 3) for gagg, as hinted by the assignment page. I did this by invoking the command

sed "3s/gagg/   <rbs>&<\/rbs>   /1"

just like I did for the -35 box much earlier. The sequence then looked like this:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/   <minus35box>&<\/minus35box>   /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/   <minus10box>&<\/minus10box>   /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g" | sed "3s/gagg/   <rbs>&<\/rbs>   /1"
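
The two line-3 steps (tagging the first character as the transcription start site, then tagging the first gagg) can be tried in isolation. This is only a sketch, assuming GNU sed and three made-up lines of sequence that stand in for the output of the earlier steps. They are not taken from the infA file, and padding spaces are omitted.

```shell
# Three toy lines invented for illustration; line 1 also contains gagg,
# to show that the address 3 leaves it alone.
three_lines=$(printf '%s\n' "ttgaggcc" "aacc" "gcatgaggtt")

tagged=$(printf '%s\n' "$three_lines" \
  | sed '3s/^./<tss>&<\/tss>/' \
  | sed '3s/gagg/<rbs>&<\/rbs>/1')

# ^. matches only the first character of the line, so no g flag is needed.
line1=$(printf '%s\n' "$tagged" | sed -n '1p')
line3=$(printf '%s\n' "$tagged" | sed -n '3p')
printf '%s\n' "$line1" "$line3"
```

Only line 3 changes: its first character is wrapped in tss tags and its first gagg in rbs tags.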

For the next part I needed to find the start codon, which codes for f-Met. In the mRNA it is AUG, but since this is the mRNA-like strand of the DNA, the sequence is ATG (atg in the lowercase sequence file). To find it, I added a new line after the ribosome binding site and used sed to search for the first occurrence of atg after that. I did this by adding two commands to the pipe, as seen below; they follow the same pattern as the ones used for the other sites.

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/   <minus35box>&<\/minus35box>   /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/   <minus10box>&<\/minus10box>   /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g" | sed "3s/gagg/   <rbs>&<\/rbs>   /1" | sed "s/<\/rbs>/&\n/g" | sed "4s/atg/   <start_codon>&<\/start_codon>   /1"

Next I was presented with the challenge of finding the stop codon, which is TAA, TAG, or TGA on this strand of the DNA. From our Week 3 assignment I remembered that it would be necessary to space out the nucleotides into 3-nucleotide codons in order to find the stop codon, and from [[Introduction to the Command Line | Intro to the Command Line]] I was able to recall the command for this, which was sed "s/.../& /g". I invoked this and began a new line using the same command as earlier for a newline (sed "s//&\n/g"). Once everything was separated into codons it became very easy to find the stop codon. All I had to do was add a new line and then tag it. The only difference was that the pattern was t[ag][ga], with the brackets representing an either/or choice at each position. (Written this way, the pattern also matches tgg, which is not a stop codon, so any match has to be checked against TAA, TAG, and TGA.) I then used /1 on the new line in order to find the first occurrence of t[ag][ga].
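
As a rough illustration of the codon-splitting idea (this is not the original pipe from the assignment, which is not reproduced here), the sketch below assumes GNU sed and grep and a made-up coding sequence that starts at its atg. The tag name stop_codon is hypothetical, chosen by analogy with start_codon. The three stop codons are spelled out as an alternation, because a bracket pattern such as t[ag][ga] would also match tgg, which codes for tryptophan.

```shell
# Toy coding sequence invented for illustration, beginning at the start codon.
cds="atggctgaccgttaagg"

# Searching the unspaced string finds an out-of-frame match first.
unframed=$(printf '%s\n' "$cds" | grep -o -E 'taa|tag|tga' | head -n 1)

# Insert a space after every three characters to expose the reading frame.
codons=$(printf '%s\n' "$cds" | sed 's/.../& /g')

# Tag the first in-frame stop codon; a match cannot span a space,
# so only whole codons can match.
tagged=$(printf '%s\n' "$codons" | sed -E 's/taa|tag|tga/<stop_codon>&<\/stop_codon>/1')

printf '%s\n' "$unframed" "$codons" "$tagged"
```

On this toy input the unspaced search reports tga, which straddles two codons, while the spaced search tags the in-frame taa.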
