Malverso Week 4

From LMU BioDB 2015
Revision as of 20:46, 28 September 2015 by Malverso (Talk | contribs) ("cleaned up" answer for #1 start_codon)

Jump to: navigation, search

Transcription and Translation "Taken to the Next Level"

I completed this assignment using Putty.exe by accessing infA-E.coli-K12.txt in ~dondi/xmlpipedb/data. I added line breaks in the sequences and commands when necessary to enhance readability.

#1

-35 box of the promoter

  • Looking over my notes of when I first attempted this assignment in class, I used the sed command to find all the places where the pattern tt[gt]ac[at] occurred and attached a <minus35box> tag to the beginning of that sequence and a </minus35box> to the end. This resulted with two possible locations for the -35 box being tagged.
  • Since it was given that there were 17 base pairs between the -35 box and the -10 box, I used that clue to identify which -35 box tags were correct, as well as a bit pf guess and check. I just assumed the first -35 box tag was correct and modified my sed command to only tag the first instance of the -35 box pattern by replacing the g at the end with a 1. I also added an \n to the end of the box tags so that a new line would start after the last -35 box tag. Referring back to my in class notes, I added a sed command on to search for the base pair possibilities for the -10 box as well as a command that would show me the point after 17 characters from the end of my -35 box:
cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" 
| sed -r "2s/^(.){17}/&<here?>/g" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"

This confirmed that the first instance of the -35 box pattern match was the correct one. To calculate just where the -35 box is, I can now use the code:

cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1"

Which produces:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttac
gctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcccgctcccttatac
gttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgc
aaatc<minus35box>tttact</minus35box>tatttacagaacttcggcattatcttgccggttcaaatt
acggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgca
aaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgca
ttgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaacc
ggcctttttattttat

-10 box of the promoter

  • By figuring out which -35 box tags were correct I also figured out which -10 box tags were correct. I added more line breaks to make this clear, and used the code below to produce the code with the correct -10 box tags and correct -35 box tags:
 cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" 
| sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"

Which returned:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>
tatttacagaacttcgg
<minus10box>cattat</minus10box>cttgccggttcaaattacggtagtgataccccagaggattagatg
gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttag
aaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaa
agtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgc
ctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

transcription start site

  • It was given that the tss is at the 12th nucleotide after the first nucleotide of the -10 box. I put a line break after the -10 box and then used the sed command: sed -r "4s/^(.){6}/&<tss>/g" to tag the beginning of the tss. Since the -10 box is 6 nucleotides, I put the tss tag before the 6th character after the line break. I wasn't sure if it should be at the 6th or 7th character after, so I asked Mahrad.

This sed command, however, is not at helpful to tag the end of the tss. In order to tag both the beginning and the end of the location, I changed the command to:

cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" 
| sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" 
| sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>/g"


This way, the nucleotide that was at the tss location would be at the beginning of the line, and I could easily surround it with the appropriate tags, as shown below:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>
tatttacagaacttcgg
<minus10box>cattat</minus10box>
cttgc
<tss>c</tss>ggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgc
aaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcaca
catctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtac
gacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgag
taaaaggtcggtttaaccggcctttttattttat


ribosome binding site

  • It was given that the ribsome binding site was gagg, so I just used a sed command to find an occurence of that pattern after the tss.

I modified the sed command so that only the first instance of the pattern would show, in case the pattern occurred in the sequence more than once:

cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" 
| sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" 
| sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>/1"


Which produced the sequence:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>
tatttacagaacttcgg
<minus10box>cattat</minus10box>
cttgc
<tss>c</tss>
ggttcaaattacggtagtgatacccca<rbs>gagg</rbs>attagatggccaaagaagacaatattgaaatgca
aggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacac
atctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacg
acctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagt
aaaaggtcggtttaaccggcctttttattttat


start codon

  • At this point I am pretty comfortable repeating the same techniques, and I add a line break after the rbs so that I can find only the nearest occurrence of the pattern "atg" and insert the start_codon tags. I needed to refer to my notes to remember that "atg" = start codon.To "clean it up", I added sed ':a;N;$!ba;s/\n//g' to the end to negate all of the line breaks I had added, and the final command sequence is as follows:
cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" 
| sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" 
| sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1"
| sed "7s/atg/<start_codon>&<\/start_codon>/1" | sed ':a;N;$!ba;s/\n//g'

Which produced:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgct
gccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgt
gttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttaca
gaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca<rbs>g
agg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgca
tcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttacc
gcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat





Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries