==Transcription and Translation "Taken to the Next Level"==
==Modifying the Gene Sequence==
    
To start this assignment I began by opening Terminal on my laptop. I entered  
 
I began by using grep to find the potential -35 box and -10 box, because grep highlights the matching text in red. For the -35 box, I simply entered
 
  cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"
 
which gave me two possible answers for the -35 box, '''tttact''' and '''tttaca''', both of which fit the pattern. Next I had to work out which one was correct. I also searched for the -10 box using
 
  cat infA-E.coli-K12.txt | grep "[ct]at[at]at"
 
which also revealed two potential sites, '''tataat''' and '''cattat'''. To work out which sequences were the correct ones, I needed to see the candidates side by side in the sequence, which grep alone does not make easy, so I used sed instead. I chained two sed commands in a pipe that added spaces on either side of every match to the -35 and -10 consensus patterns, to make the matches more visible. The general form is sed "s/<pattern>/ & /g", where <pattern> is the text to find, & stands for whatever was matched, and the spaces before and after the & are added on each side of every match (instructions found [[Introduction to the Command Line | here]]). The pipe looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/  &  /g" | sed "s/tt[gt]ac[at]/  &  /g"
 
This made it clear that the first -35 box option, '''tttact''', and the second -10 box option, '''cattat''', were the ones I was looking for in this gene. With this information, it was much simpler to mark the specific sequences for the assignment.
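
One way to see why '''tttact''' and '''cattat''' belong together is to measure the distance between the candidates. The sketch below is hypothetical: it assumes GNU grep (for the -o and -b options) and the usual spacing rule for ''E. coli'' σ70 promoters, where roughly 15 to 19 nucleotides separate the -35 and -10 boxes. It runs on a made-up toy sequence, not on infA-E.coli-K12.txt, and the function names list_matches and find_pairs are illustrative only.

```shell
#!/bin/sh
# Made-up toy sequence (NOT the real infA-E.coli-K12.txt), built from parts:
# decoy -10 match, -35 box, 17-nt spacer, -10 box, filler, decoy -35 match.
seq="gc""tataat""gcgc""tttact""gcgcgcgc""gcgcgcgc""g""cattat""gcgcg""tttaca""gc"

# Hypothetical helper: list every match as "offset:match".
# -o prints only the matched text, -b prefixes its 0-based byte offset (GNU grep).
list_matches() {  # $1 = regex, $2 = sequence
  printf '%s\n' "$2" | grep -o -b "$1"
}

# Hypothetical helper: pair each -35 candidate with each -10 candidate and keep
# pairs whose spacer (nucleotides between the boxes) is 15-19 nt, an assumed
# window based on typical E. coli sigma70 promoters.
find_pairs() {  # $1 = sequence
  m35=$(list_matches "tt[gt]ac[at]" "$1")
  m10=$(list_matches "[ct]at[at]at" "$1")
  for a in $m35; do
    for b in $m10; do
      spacer=$(( ${b%%:*} - ${a%%:*} - 6 ))   # both boxes are 6 nt long
      if [ "$spacer" -ge 15 ] && [ "$spacer" -le 19 ]; then
        printf '%s ... %s (spacer %d nt)\n' "${a#*:}" "${b#*:}" "$spacer"
      fi
    done
  done
}

list_matches "tt[gt]ac[at]" "$seq"   # 12:tttact and 46:tttaca
list_matches "[ct]at[at]at" "$seq"   # 2:tataat and 35:cattat
find_pairs "$seq"                    # tttact ... cattat (spacer 17 nt)
```

The spacer window is a heuristic, so on real data it narrows the choice rather than proving it.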
 
To mark the -35 box, I needed to use sed to wrap the first option in <minus35box>...</minus35box> tags, with two spaces on either side. I consulted the Text Processing page of the wiki and found that I could replace the g flag with the number of the occurrence I wanted to change. Because I only needed the first option ('''tttact''') to be marked, the command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1"  
    
Next, to mark the -10 box, I did the same thing, except that this time I wrapped the second -10 box option in <minus10box>...</minus10box> tags. The command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /2"
 
This marked the -10 box, '''cattat'''.
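
To make the numbered flag concrete, here is a small sketch on a made-up toy sequence (not the real infA file), assuming GNU sed. It first shows the flag on a trivial string and then applies the two tagging commands from above, so that only the first -35 match and the second -10 match receive tags.

```shell
#!/bin/sh
# Made-up toy sequence (NOT the real infA file): decoy -10 match, -35 box,
# 17-nt spacer, -10 box, filler, decoy -35 match.
seq="gc""tataat""gcgc""tttact""gcgcgcgc""gcgcgcgc""g""cattat""gcgcg""tttaca""gc"

# A number N as the last flag of s/regex/replacement/N replaces only the
# Nth match on each input line (no flag means the first match only).
printf '%s\n' "aXbXcX" | sed "s/X/[&]/2"    # aXb[X]cX

# Tag the 1st -35 candidate and the 2nd -10 candidate. In the replacement,
# & is the matched text and <\/ escapes the slash that would end the s command.
marked=$(printf '%s\n' "$seq" \
  | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" \
  | sed "s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /2")
printf '%s\n' "$marked"
```

The decoys '''tataat''' (first -10 match) and '''tttaca''' (second -35 match) are left untouched.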
 
To find the transcription start site, I learned from the assignment page that it is the 12th nucleotide counting from the first nucleotide of the -10 box. Since the -10 box is six nucleotides long, the start of transcription is the sixth nucleotide after '''cattat'''. To find it, I broke the sequence up by inserting a new line right after the -35 box. In the "picking lines" section of More Text Processing Features, I found that I could restrict a substitution to a single line by putting the line number in front of it, writing sed 2s///g instead of sed s///g. This command looked like this:
 
  cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" | sed "s/<\/minus35box>/&\n/g" | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1"
    
I noted that the flag after the -10 box pattern should now be /1, not /2: since the substitution only looks at the text after the -35 box, '''cattat''' is the first occurrence of [ct]at[at]at there.
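
The effect of the line address can be checked on a made-up toy sequence. This sketch assumes GNU sed, since \n in the replacement text is a GNU extension, and the helper names tag35 and split35 are invented for the example. It compares the -10 box command with and without the 2 address.

```shell
#!/bin/sh
# Made-up toy sequence (NOT the real infA file): decoy -10 match, -35 box,
# 17-nt spacer, -10 box, filler, decoy -35 match.
seq="gc""tataat""gcgc""tttact""gcgcgcgc""gcgcgcgc""g""cattat""gcgcg""tttaca""gc"

# Hypothetical helpers that filter stdin.
tag35()   { sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1"; }
split35() { sed "s/<\/minus35box>/&\n/g"; }   # GNU sed: \n inserts a newline

# With the address 2, only line 2 (the text after the -35 box) is searched,
# so the first [ct]at[at]at match there is the real -10 box.
with_address=$(printf '%s\n' "$seq" | tag35 | split35 \
  | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1")

# Without the address, /1 applies to every line, so the decoy on line 1
# (before the -35 box) gets tagged as well.
without_address=$(printf '%s\n' "$seq" | tag35 | split35 \
  | sed "s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1")

printf '%s\n' "$with_address" "---" "$without_address"
```

Because the numeric flag counts matches per line, splitting the sequence changes which match counts as "first".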
 
My next goal was to find a command that would skip over 5 more nucleotides to reach the transcription start site, the 6th nucleotide after the -10 box, which I wanted to mark with <tss>...</tss>. I did this by adding the command
 
  sed -r "s/<\/minus10box>(.){5}/&\n/g"
 
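This told sed to skip over 5 characters (the number in the curly braces), since (.) matches any single character, and to insert a newline after them. The '''-r''' option turns on extended regular expressions, so that sed reads the parentheses and curly braces as regular-expression syntax rather than as literal characters; the g at the end is what applies the substitution to every match on a line.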
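
The sketch below chains these steps on a made-up toy sequence (not the real infA file) to show what -r and the count in {N} do. It assumes GNU sed and writes the pattern with no literal space between </minus10box> and (.), which is the reading under which {5} starts the third line at the 10th nucleotide (counted from the first base of the -10 box) and {7} at the 12th. The function name mark_tss is hypothetical.

```shell
#!/bin/sh
# Made-up toy sequence (NOT the real infA-E.coli-K12.txt): decoy -10 match,
# -35 box, 17-nt spacer, -10 box, then downstream nucleotides.
seq="gc""tataat""gcgc""tttact""gcgcgcgc""gcgcgcgc""g""cattat""gcgcg""tttaca""gc"

# Hypothetical helper: tag both boxes, split after the -35 box, split again
# $1 characters after </minus10box>, then tag the first character of line 3.
mark_tss() {
  printf '%s\n' "$seq" \
    | sed "s/tt[gt]ac[at]/  <minus35box>&<\/minus35box>  /1" \
    | sed "s/<\/minus35box>/&\n/g" \
    | sed "2s/[ct]at[at]at/  <minus10box>&<\/minus10box>  /1" \
    | sed -r "s/<\/minus10box>(.){$1}/&\n/g" \
    | sed "3s/^./<tss>&<\/tss> /g"
}

# -r enables extended regular expressions: (.){N} means "any one character,
# N times". The two spaces after </minus10box> use up two of those N.
mark_tss 5 | sed -n 3p    # <tss>c</tss> gtttacagc  (10th nucleotide)
mark_tss 7 | sed -n 3p    # <tss>t</tss> ttacagc    (12th nucleotide)

# Basic-regex spelling of the same pattern, without -r: \(.\)\{7\}
printf '%s\n' "</minus10box>  gcgcgtttacagc" \
  | sed "s/<\/minus10box>\(.\)\{7\}/&\n/g"
```

GNU sed also accepts -E for extended regular expressions, which is the spelling BSD/macOS sed uses.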
 
With this command, the third line started at the 10th nucleotide, not the 12th. I realized that this was because of the extra spaces I had added around <minus10box>...</minus10box>: the two spaces after the closing tag were also being matched by (.). To fix this, I put {7} in the curly braces instead of {5}, which placed the newline so that the third line started at the right nucleotide (the 12th one). Then, to mark the transcription start site, I added
 
  sed "3s/^./<tss>&<\/tss> /g"  
    
to add <tss>...</tss> tags around the first character of the third line. The command looked like this:
 