Vpachec3 Week 4

Modify the Gene Sequence with Tags

-35 Box

-10 Box

My lab partner, Nicole, was a big help and helped me go through the Week 4 homework. Here is how far we got:

vpachec3@ab201:/nfs/home/dondi/xmlpipedb/data$ cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1"|sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"

This is what the command gave us:

 ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact </minus35box> tatttacagaacttcgg <minus10box>cattat </minus10box>     cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Right before we were stopped to bring it back into a larger group discussion, Nicole taught me that \n would break it into two lines. We just didn't get to apply it to the command line just yet.

Transcription Start Site

Now trying this on my own. I used the \n to break the line to start to figure out how to add the transcription start site. I wanted to break the information into two line so i used this command:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1" |sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"| sed "s/ <minus10box>/& \n/g"

However, I wanted to break the line after the minus 10 box so I modified the command:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1" |sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"| sed "s/ <\/minus10box> /&\n/g"

This command gave me:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact </minus35box> tatttacagaacttcgg <minus10box>cattat </minus10box>

cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Breaking it into two lines would be easier to insert the transcription start site because we were told:The transcription start site is located at the 12th nucleotide after the first nucleotide of the -10 box.

This means that I could count to insert the transcription start site and use commands that I have used before to insert the tag. I used :

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1" |sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"| sed "s/ <\/minus10box> /&\n/g"| sed "2s/cc/ <tts> /1"

However this was problematic because it replace the nucleotide rather than put it in front. Therefore, revision of the command was needed.

New command

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1" |sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"| sed "s/ <\/minus10box> /&\n/g"| sed "2s/c/ <tss>& <\/tss> /5"

This command did exactly what I needed:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact </minus35box> tatttacagaacttcgg <minus10box>cattat </minus10box>

cttgccggttcaaatta <tss>c </tss>ggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Finally to tie it ll back together, I used this command at the end : sed ':a;N;$!ba;s/\n//g'

This leaves me with this sequence:

 ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact </minus35box> tatttacagaacttcgg <minus10box>cattat </minus10box> cttgccggttcaaatta <tss>c </tss> ggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Ribosome Binding Site

I assume that adding the ribosome binding site would be similar to adding in the other tags. The tricky part should be where to put the tag on the sequence. This is the information given:"A consensus sequence for the ribosome binding site is gagg". So I'm going to break the sequence into two lines and look for the gagg pattern after the transcription start site.

So I broke the line into two:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>& <\/minus35box> /1" |sed "s/[ct]at[at]at/ <minus10box>& <\/minus10box> /2"| sed "s/ <\/minus10box> /&\n/g"| sed "2s/c/ <tss>& <\/tss> /5"|  sed ':a;N;$!ba;s/\n//g'| sed "s/ <\/tss> /&\n/g"

Start Codon

Stop Codon

Terminator

Links

Vpachec3 User Page

Vpachec3 Week 4

Contents

Modify the Gene Sequence with Tags

-35 Box

-10 Box

Transcription Start Site

Ribosome Binding Site

Start Codon

Stop Codon

Terminator

Links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools