==Transcription and Translation "Taken to the Next Level"==
==Modifying the Gene Sequence==
    
To start this assignment I began by opening Terminal on my laptop. I entered  
 
    
I began by using grep to find the potential -35 box and -10 box, because grep highlights the matched pattern in red. I entered
 
  cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"
 
which gave me two possible answers for the -35 box, '''tttact''' and '''tttaca''', both of which fit the pattern. Now I had to work out which one was correct. I also searched for the -10 box using
 
  cat infA-E.coli-K12.txt | grep "[ct]at[at]at"
 
which also revealed two potential sites, '''tataat''' and '''cattat'''. I realized that to find out which sequences were the correct ones I needed to see both sets of matches together, which a single grep search doesn't show, so I used sed instead. I entered the sed commands as a pipe and added spaces on either side of each occurrence of the consensus sequences (both -35 and -10) in the file to make them more visible. This is done with sed "s/<pattern>/  &  /g", where <pattern> is what I wish to find, "&" stands for the text that was matched, and the spaces on either side of the "&" are what I wished to add around the pattern (instructions found [[Introduction to the Command Line | here]]). The pipe looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/  &  /g" | sed "s/tt[gt]ac[at]/  &  /g"
 
This made it clear that it was the first -35 box option, '''tttact''', and the second -10 box option, '''cattat''', that I was looking for in this gene. Using this information, it was then much simpler for me to highlight the specific sequences for the assignment.  
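The padding step can be tried on a short stand-in string before running it on the real file. The sketch below assumes a made-up sequence (not the infA data) that happens to contain two candidates for each box; the variable names <code>seq</code> and <code>padded</code> are chosen only for illustration.

```shell
# Hypothetical stand-in sequence, NOT the real infA gene.
# It contains two -10 box candidates and two -35 box candidates.
seq="cctataatggtttactccgcattatggtttacagg"

# Pad every match of either consensus pattern with two spaces per side.
# "&" in the replacement stands for whatever text the pattern matched.
padded=$(printf '%s\n' "$seq" \
  | sed "s/[ct]at[at]at/  &  /g" \
  | sed "s/tt[gt]ac[at]/  &  /g")

printf '%s\n' "$padded"
```

With all four candidates set apart on one line, their order and spacing relative to each other can be compared by eye.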
 
    
To highlight the -35 box, I needed to use sed to wrap the first option in <minus35box>...</minus35box> tags, along with spaces on each side. I consulted the Text Processing page of the wiki and found that I can replace the g at the end of the command with the number of the occurrence I wish to change. Because I only needed the first option to be highlighted ('''tttact'''), the command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1"  
    
Next, to highlight the -10 box, I did the same thing except my goal was to add <minus10box> to each side of the second -10 box option. The command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /2"
 
This highlighted the -10 box, '''cattat'''.
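As a small illustration of the occurrence number at the end of the substitution, the sketch below tags only the second match on a line. It assumes a made-up sequence (not the infA data) in which '''tataat''' comes before '''cattat'''; the variable names are hypothetical.

```shell
# Hypothetical stand-in sequence, NOT the real infA gene.
seq="cctataatggtttactccgcattatggtttacagg"

# The trailing 2 tells sed to replace only the second match on the line,
# so tataat (the first match) is left alone and cattat is tagged.
tagged=$(printf '%s\n' "$seq" \
  | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/2")

printf '%s\n' "$tagged"
```

Changing the trailing 2 to 1 would tag '''tataat''' instead, and g would tag both.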
 
    
In order to find the transcription start site, I learned from the assignment page that the site is located at the 12th nucleotide after the first nucleotide of the -10 box. This means that the start of transcription was the sixth nucleotide after '''cattat'''. To find this, I broke up the gene by inserting a new line right after the -35 box. In the "picking lines" section of More Text Processing Features, I found that to restrict a substitution to line 2 I had to replace sed s///g with sed 2s///g. The command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1"
    
I noted that it should be /1, not /2, for the -10 box: the substitution now looks only at line 2, which holds everything after the -35 box, so '''cattat''' is the first occurrence of [ct]at[at]at on that line.
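The effect of splitting the line and then addressing line 2 can be checked on a short example. This is a sketch under two assumptions: the sequence is made up (not the infA data), and the sed in use is GNU sed, which accepts \n in the replacement text. The padding spaces are left out to keep the output easy to read.

```shell
# Hypothetical stand-in sequence, NOT the real infA gene.
seq="cctataatggtttactccgcattatggtttacagg"

out=$(printf '%s\n' "$seq" \
  | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1" \
  | sed "s/<\/minus35box>/&\n/g" \
  | sed "2s/[ct]at[at]at/<minus10box>&<\/minus10box>/1")

# Line 1 ends with the closing -35 tag; line 2 holds the rest of the gene.
printf '%s\n' "$out"
```

Because tataat now sits on line 1, the first match on line 2 is cattat, which is why /1 is the right occurrence number after the split.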
 
    
My next goal was to find a command that would allow me to skip over 5 more nucleotides to the transcription start site <tss>...</tss> on the 6th nucleotide after the -10 box. I did this by adding the command
 
  sed -r "s/<\/minus10box> (.){5}/&\n/g"
 
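This told sed to skip over 5 characters (the number in the curly braces) after the closing tag. The '''-r''' option turns on extended regular expressions, which is what lets me write the group (.) and the repetition {5} without backslashes.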
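A short sketch of the repetition count is given here. It assumes GNU sed (where -E is a synonym for -r and \n is accepted in the replacement) and a made-up line that has no padding spaces around the tags, so every skipped character is a nucleotide.

```shell
# Hypothetical line 2 after tagging, NOT taken from the real infA output.
line="ccg<minus10box>cattat</minus10box>ggtttacagg"

# Match the closing tag plus the next 5 characters, then break the line
# right after them, so the 6th nucleotide starts a new line.
out=$(printf '%s\n' "$line" | sed -E "s/<\/minus10box>.{5}/&\n/")

printf '%s\n' "$out"
```

Any extra characters between the tag and the nucleotides, such as padding spaces, are also matched by the dot and have to be added to the count.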
 
    
This had me starting at the 10th nucleotide, not the 12th. I realized that this was because I had added extra spaces around the <minus10box>...</minus10box>, and the spaces counted as (.). To fix this, I put {7} in curly braces instead of {5}, which gave me a newline at the right nucleotide (the 12th one). Then, to highlight the transcription start site I added  
 
  sed "3s/^./<tss>&<\/tss> /g"  
    
to tell the computer that I wished to add <tss> labels around the first character in the third line. The command looked like this:  
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g"
    
Next, to find the ribosome binding site (which has to be after the transcription start site), I searched the same line (line 3) for gagg, as hinted by the assignment page. I did this by invoking the command  
 
  sed "3s/gagg/ <rbs>&<\/rbs> /1"
    
just as I did for the -35 box earlier. The pipe then looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g" | sed "3s/gagg/ <rbs>&<\/rbs> /1"
    
For the next part I needed to find the start codon, which codes for f-Met. In mRNA this codon is AUG, but since the file holds the mRNA-like DNA strand, the sequence to look for is ATG (atg in the lowercase file). To find it, I added a new line after the ribosome binding site and used sed to search for the first occurrence of atg after that. I did this by adding two commands to the pipe, as seen below, following the same pattern as for the other sites.
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1" | sed -r "s/<\/minus10box> (.){7}/&\n/g" | sed "3s/^./<tss>&<\/tss> /g" | sed "3s/gagg/ <rbs>&<\/rbs> /1" | sed "s/<\/rbs>/&\n/g" | sed "4s/atg/ <start_codon>&<\/start_codon> /1"
    
Next I was presented with the challenge of finding the stop codon, which is TAA, TAG, or TGA on this strand of the DNA. From our Week 3 assignment I remembered that the nucleotides must be spaced out into 3-nucleotide codons in order to find the stop codon, and from [[Introduction to the Command Line | Intro to the Command Line]] I recalled the command for this, which was sed "s/.../& /g". I invoked this and began a new line using the same kind of command as earlier (sed "s/<pattern>/&\n/g"). Once everything was separated into codons, it became very easy to find the stop codon. All I had to do was add a new line and then tag it. The only difference was that the search pattern was t[ag][ga], with each pair of brackets matching either of the two letters inside it. I then used /1 with the new line in order to tag the first occurrence of t[ag][ga]. The pipe looked like this:
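As a rough sketch of the codon-splitting idea only, and not a reproduction of the pipe used for the assignment, the example below works on a made-up reading frame. The tag name <stop_codon> is an assumption modelled on <start_codon>, and -E assumes a sed with extended regular expressions (GNU sed). Since t[ag][ga] would also match tgg, which codes for tryptophan, the sketch spells out the three stop codons instead.

```shell
# Hypothetical reading frame starting at atg, NOT the real infA sequence.
orf="atggccaaatgatcc"

# Put a space after every group of three characters (one codon each).
codons=$(printf '%s\n' "$orf" | sed "s/.../& /g")

# Tag the first whole codon that is taa, tag, or tga. The spaces around
# the group keep the match inside the reading frame.
tagged=$(printf '%s\n' "$codons" \
  | sed -E "s/ (taa|tag|tga) / <stop_codon>\1<\/stop_codon> /")

printf '%s\n' "$tagged"
```

Matching on the spaced-out codons means a stop-like triplet that straddles two codons, which would be out of frame, is not tagged.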
 