Difference between revisions of "Troque Week 4"

From LMU BioDB 2015
Revision as of 05:27, 29 September 2015



Transcription and Translation “Taken to the Next Level”

First, log in to the LMU CS server using ssh. Type the following in a command prompt (Windows) or terminal (Mac) window:

ssh <username>@my.cs.lmu.edu

Enter your password. Note: the cursor will not move while you type your password, so just keep typing. Then change to dondi's directory, which holds the practice files and other miscellaneous files, using the following command:

cd ~dondi/xmlpipedb/data

Here, you can use the command ls to list the files in the directory. Then we can start manipulating some files. Note: I collaborated with Lena Olufson when starting this assignment. We first decided to use grep to see where the pattern would be (this was my fault: we could have jumped to using sed right away, but I didn't read the assignment description thoroughly and didn't notice that we were supposed to add the tags).

In this assignment, we will be manipulating the file infA-E.coli-K12.txt.

Adding tags to the strand

We start off by adding the -35 box of the promoter. The tag that we will add is

...<minus35box>...</minus35box>...

We do this by using the sed command in order to "replace" the empty string around the pattern we are looking for. For this part, we are looking for the pattern tt[gt]ac[at].

Type the following command to insert the tags around every occurrence of the pattern:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/g" infA-E.coli-K12.txt

Since we want to keep the matched pattern in the output and add the tags around it, we use the & symbol, which stands for the text that was matched. We also add a new line after the closing tag using \n. This will be especially useful later on when we are looking for the -10 box. Note: we cannot simply type a forward slash (/) in the closing tag; sed uses the forward slash to separate the parts of the s command, so we have to "escape" it with the escape character backslash (\). Running the command above produces output like this:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc
gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag
ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>
ta<minus35box>tttaca</minus35box>
gaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaa
ggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaa
atgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtc 
ttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

(Note: This is actually just 3 lines of bases: a new line starts after each </minus35box> tag. For visual purposes, I broke up the strand into shorter lines for this wiki.)
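To make the role of & and the escaped slash concrete, here is a minimal sketch on a short made-up fragment (the variable names and the toy sequence are my own, not part of the assignment files). It also shows that picking a different delimiter, such as |, is another way to avoid escaping the slash.

```shell
# Toy sequence (hypothetical, not taken from infA-E.coli-K12.txt)
# containing exactly one match of tt[gt]ac[at], namely "tttact".
seq="ccgatttactgg"

# "&" is replaced by whatever the pattern matched;
# "\/" keeps the slash in the closing tag from ending the s command.
tagged=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/")
echo "$tagged"

# Same substitution with "|" as the delimiter, so "/" needs no backslash.
tagged_alt=$(printf '%s\n' "$seq" | sed "s|tt[gt]ac[at]|<minus35box>&</minus35box>|")
echo "$tagged_alt"
```

Both commands should print the same tagged fragment, with the six matched bases sitting between the opening and closing tags.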

Notice that sed found 2 matches for the -35 box pattern. We'll have to decide which is the real one by adjusting our sed command; more specifically, we change the flag at the end of the s command so that only one of the two matches is tagged. To tag just the first match, we use the command:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" infA-E.coli-K12.txt

For the second match only:

sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/2" infA-E.coli-K12.txt

We have to decide which one to use; in this case, we'll just choose the first one and hope we get lucky! So then we will have the following:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc
gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag
ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>
tatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattg
aaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatct
ccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggcc
gcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggccttttt
attttat
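The difference between the 1, 2 and g flags used above can be checked on a small piece of the strand. This is only a sketch: the fragment is the stretch of the sequence around the two -35 box candidates, the variable names are my own, and square brackets stand in for the tags to keep the output short.

```shell
# Stretch of the strand that contains both -35 box candidates.
frag="aaatctttacttatttacagaa"

# A number N as the flag substitutes only the Nth match on the line.
first=$(printf '%s\n' "$frag" | sed "s/tt[gt]ac[at]/[&]/1")
second=$(printf '%s\n' "$frag" | sed "s/tt[gt]ac[at]/[&]/2")

# The g flag substitutes every match on the line.
all=$(printf '%s\n' "$frag" | sed "s/tt[gt]ac[at]/[&]/g")

echo "$first"
echo "$second"
echo "$all"
```

With the 1 flag only tttact is bracketed, with the 2 flag only tttaca, and with g both are.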

Next, we can add the -10 box tag. We will use a command similar to the one we used for the -35 box:

sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g"

When we pipeline the commands for -35 and -10,

cat infA-E.coli-K12.txt |  sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |  sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g"

the tags around the patterns found will be added so we have the following strand:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box>
tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgc
tcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box> tatttacagaacttcgg<minus10box>cattat</minus10box>
cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaa
acgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactac
atccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctga
ttgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Notice that this, too, found 2 matches. We can determine which is the correct one by remembering that the -35 box always comes before the -10 box and that the two are generally 17 bases apart; that is, there are 17 bases from the end of the -35 box to the start of the -10 box. We use sed again to count off the 17 bases; the More Text Processing Features page explains how to match a certain number of characters without typing 17 dots and how to select which match to use. In this case, the real -10 box is the second match. Since we do not care which bases they are, we use "." as a placeholder:

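cat infA-E.coli-K12.txt |
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" |
sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1"

The 17-base split runs before the -10 box command here, as in the later commands; in the other order, the -10 box command adds its line breaks first and line 2 is no longer the line that follows the -35 box. The -10 box command also tags the first match, tataat, on line 1; that match lies before the -35 box, so it cannot be the real -10 box and it is not shown tagged in the output below: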

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc
gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag
ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>
tatttacagaacttcgg      
<minus10box>cattat</minus10box>
cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaa
acgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactac
atccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctga
ttgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
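The spacing check can be tried on its own with a two-line toy input that mimics the state of the strand right after the -35 box has been tagged. This is a sketch under a few assumptions: GNU sed (for -r and for \n in the replacement), a shortened second line, and variable names of my own.

```shell
# Line 1 ends with the tagged -35 box; line 2 is the start of what follows it.
two_lines=$(printf 'aaatc<minus35box>tttact</minus35box>\ntatttacagaacttcggcattatcttgcc\n')

# "2s" restricts the substitution to line 2; ^.{17} matches its first
# 17 characters, and "&\n" puts them back followed by a line break.
split=$(printf '%s\n' "$two_lines" | sed -r "2s/^.{17}/&\n/")
echo "$split"

# The real -10 box should now be at the very start of line 3.
spacer=$(printf '%s\n' "$split" | sed -n '2p')
third=$(printf '%s\n' "$split" | sed -n '3p')
printf '%s\n' "$third" | grep -q '^[ct]at[at]at' && echo "line 3 starts with a -10 box match"
```

If line 3 did not begin with a match for [ct]at[at]at, that would suggest the chosen -35 box candidate is not 17 bases away from a -10 box.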

Next, we find the transcription start site. We know that it is the 12th base counting from the start of the -10 box. Since the -10 box itself is 6 bases long, the transcription start site is the 6th base after the -10 box. We use sed again: we split off the 5 bases after the -10 box, and the 6th one is the transcription start site, which gets the <tss></tss> tags:

cat infA-E.coli-K12.txt | 
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" | 
sed -r "2s/^(.){17}/&\n/g" |
sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" |
sed -r "5s/^(.){5}/&\n/g" |
sed "6s/^./<tss>&<\/tss>\n/g"

Then the result will be the following:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box>
tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccg
ctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgc
aaatc<minus35box>tttact</minus35box>
tatttacagaacttcgg
<minus10box>cattat</minus10box>
cttgc
<tss>c</tss>
ggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgtt
gcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatcc
gcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattg
ttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
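The counting can be double-checked on just the bases that follow the -10 box. The sketch below is an alternative to the two line-splitting commands above, not the approach used in this journal: it uses capture groups to skip 5 bases and tag the 6th in one step, and it keeps everything on one line. It assumes GNU sed for -r, and the variable names are my own.

```shell
# First bases of the strand right after </minus10box>.
after_box="cttgccggttc"

# (.{5}) captures the 5 bases to skip, (.) captures the 6th base;
# \1 and \2 put them back, with the tags wrapped around the 6th.
with_tss=$(printf '%s\n' "$after_box" | sed -r "s/^(.{5})(.)/\1<tss>\2<\/tss>/")
echo "$with_tss"
```

The tagged base is the same c that ends up on its own line in the output above.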

Next is the ribosome binding site. We know that the sequence will be gagg. We also know that this pattern should come after the transcription start site, so we start our search after the </tss> tag, i.e. on the 7th line.

cat infA-E.coli-K12.txt | 
sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" | 
sed -r "2s/^(.){17}/&\n/g" |
sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" |
sed -r "5s/^(.){5}/&\n/g" |
sed "6s/^./<tss>&<\/tss>\n/g" |
sed "7s/gagg/\n<rbs>&<\/rbs>\n/"

And we get the following:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box>
tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccg
ctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgc
aaatc<minus35box>tttact</minus35box>
tatttacagaacttcgg
<minus10box>cattat</minus10box>
cttgc
<tss>c</tss>
ggttcaaattacggtagtgatacccca
<rbs>gagg</rbs>
attagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaa
acggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaac
tgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgag
taaaaggtcggtttaaccggcctttttattttat

(Note: the rbs tag is on the 8th line in a command window even though it is shown on the 10th here.)
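The line number in front of the s command is what keeps sed from tagging a gagg that appears earlier in the strand. Here is a small sketch with a made-up three-line input (the lines and variable names are hypothetical) in which gagg occurs on both line 1 and line 3, but only line 3 is addressed.

```shell
# Hypothetical input: "gagg" occurs on line 1 and again on line 3.
input=$(printf 'ttgaggaa\n<tss>c</tss>\nggttccagaggatta\n')

# "3s" limits the substitution to line 3, so line 1 is left alone.
out=$(printf '%s\n' "$input" | sed "3s/gagg/<rbs>&<\/rbs>/")
echo "$out"

line1=$(printf '%s\n' "$out" | sed -n '1p')
line3=$(printf '%s\n' "$out" | sed -n '3p')
```

Only the gagg after the <tss> line should come out wrapped in <rbs> tags.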


