Malverso Week 4

From LMU BioDB 2015
Revision as of 20:54, 28 September 2015 by Malverso (Talk | contribs) ("cleaned up" all answers by removing line breaks)


Transcription and Translation "Taken to the Next Level"

I completed this assignment using Putty.exe by accessing infA-E.coli-K12.txt in ~dondi/xmlpipedb/data. I added line breaks in the sequences and commands when necessary to enhance readability. I also reread the directions halfway through this assignment and decided to "clean up" all of my answers: appending the sed command ':a;N;$!ba;s/\n//g' to each pipeline removes the line breaks that would otherwise have shown up in the command-line output.
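
The clean-up command is dense, so here is a small sketch of what it does on a made-up three-line input. The input and the join_lines helper name are my own and not part of the assignment, and the sketch assumes GNU sed.

```shell
# join_lines: hypothetical helper that removes every newline inside the input
join_lines() {
  # :a       define a label named "a"
  # N        append the next input line to the pattern space
  # $!ba     if this is not the last line, branch back to "a"
  # s/\n//g  once all lines are loaded, delete the embedded newlines
  sed ':a;N;$!ba;s/\n//g'
}

printf 'acgt\nttga\ncc\n' | join_lines   # prints acgtttgacc
```

tr -d '\n' gives a similar result, but it also drops the final newline at the end of the output.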

#1

-35 box of the promoter

  • Looking over my notes from when I first attempted this assignment in class, I used sed to find every place where the pattern tt[gt]ac[at] occurred, attaching a <minus35box> tag to the beginning of each match and a </minus35box> tag to the end. This resulted in two possible locations for the -35 box being tagged.
  • Since it was given that there are 17 base pairs between the -35 box and the -10 box, I used that clue, along with a bit of guess and check, to identify which -35 box tag was correct. I assumed the first -35 box tag was correct and modified my sed command to tag only the first instance of the pattern by replacing the g at the end with a 1. I also added a \n to the end of the closing box tag so that a new line would start right after the -35 box. Referring back to my in-class notes, I then added a sed command to search for the possible -10 box sequences, as well as a command that marks the point 17 characters after the end of my -35 box:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&<here?>/g" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"

This confirmed that the first instance of the -35 box pattern match was the correct one. To tag just the -35 box, I can now use the command:

cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1"

Which produces:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttac
gctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatac
gttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgc
aaatc<minus35box>tttact</minus35box>tatttacagaacttcggcattatcttgccggttcaaatt
acggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgca
aaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgca
ttgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaacc
ggcctttttattttat
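
The difference between the g and 1 flags described in this section can be seen on a short made-up sequence that contains the -35 pattern twice. This is only a sketch: the toy sequence and the variable names are assumptions, and GNU sed is assumed.

```shell
seq="aattgacaccttgacagg"   # made-up sequence, not taken from infA

# g flag: every match on the line gets tagged
all=$(echo "$seq" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g")

# 1 flag: only the first match on the line gets tagged
first=$(echo "$seq" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1")

echo "$all"     # aa<minus35box>ttgaca</minus35box>cc<minus35box>ttgaca</minus35box>gg
echo "$first"   # aa<minus35box>ttgaca</minus35box>ccttgacagg
```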

-10 box of the promoter

  • By figuring out which -35 box tag was correct, I also figured out which -10 box tag was correct. I added more line breaks to make this clear and used the command below to produce the sequence with the correct -35 box and -10 box tags:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>/g" |
sed ':a;N;$!ba;s/\n//g'

Which returned:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus
10box>cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgc
aaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcaca
catctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtac
gacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgag
taaaaggtcggtttaaccggcctttttattttat
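
The 17-base spacing check can be tried on a made-up sequence built from a -35 box, a 17-character spacer, and a -10 box. In this sketch the sequence pieces and variable names are assumptions, and the ^ anchor in the third sed is my own addition (the commands above do not use it): with the anchor, the -10 tag appears only if the box starts exactly 17 characters after the -35 box. GNU sed is assumed.

```shell
box35="ttgaca"
spacer="cgcgcgcgcgcgcgcgc"   # 17 made-up bases
box10="tataat"
seq="gg${box35}${spacer}${box10}cc"

tagged=$(echo "$seq" |
  sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |   # break the line after the -35 box
  sed -r "2s/^(.){17}/&\n/" |                              # break again 17 characters later
  sed "3s/^[ct]at[at]at/<minus10box>&<\/minus10box>/" |    # tag a -10 box only at the line start
  sed ':a;N;$!ba;s/\n//g')                                 # join the lines back together

echo "$tagged"
```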

transcription start site

  • It was given that the tss is at the 12th nucleotide after the first nucleotide of the -10 box. I put a line break after the -10 box and then used the sed command: sed -r "4s/^(.){6}/&<tss>/g" to tag the beginning of the tss. Since the -10 box is 6 nucleotides, I put the tss tag before the 6th character after the line break. I wasn't sure if it should be at the 6th or 7th character after, so I asked Mahrad.

This sed command, however, is not as helpful for tagging the end of the tss. In order to tag both the beginning and the end of the location, I changed the command to:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>/g" | sed ':a;N;$!ba;s/\n//g'


This way, the nucleotide that was at the tss location would be at the beginning of the line, and I could easily surround it with the appropriate tags, as shown below:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus
10box>cttgc<tss>c</tss>ggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaa
tattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtg
gttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaac
tgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaaga
gaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
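
As a cross-check on the tss step, the sketch below runs the line-splitting approach on just the fragment around the -10 box (copied from the output above) and compares it with a single substitution that uses capture groups. The single-substitution form is my own alternative, not what I ran for the assignment, and the variable names are made up. GNU sed is assumed.

```shell
frag="cattatcttgccggttc"   # the -10 box and the bases after it, from the output above

# Line-splitting approach (line numbers differ from the full pipeline
# because this fragment starts at the -10 box)
split=$(echo "$frag" |
  sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/1" |
  sed -r "2s/^(.){5}/&\n/" |
  sed "3s/^./<tss>&<\/tss>/" |
  sed ':a;N;$!ba;s/\n//g')

# Single substitution: \1 = -10 box, \2 = the next 5 bases, \3 = the tss base
single=$(echo "$frag" |
  sed -r "s/([ct]at[at]at)(.{5})(.)/<minus10box>\1<\/minus10box>\2<tss>\3<\/tss>/")

echo "$split"
echo "$single"
```

Both commands should print the same tagged fragment, with the tss tags around the sixth base after the -10 box.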


ribosome binding site

  • It was given that the ribosome binding site was gagg, so I just used a sed command to find an occurrence of that pattern after the tss.

I modified the sed command so that only the first instance of the pattern would show, in case the pattern occurred in the sequence more than once:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>/1" |
sed ':a;N;$!ba;s/\n//g'


Which produced the sequence:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgct
cattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgc
gcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<mi
nus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>ct
tgc<tss>c</tss>ggttcaaattacggtagtgatacccca<rbs>gagg</rbs>attagatggccaaagaagaca
atattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtgg
ttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactga
ccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaag
aacgagtaaaaggtcggtttaaccggcctttttattttat
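
The reason for breaking the line after the tss is that an addressed substitution such as 6s/.../ only looks at that one line, so any gagg before the tss is ignored. Here is a sketch on a made-up input that has gagg on both sides of the tss. The input and variable names are assumptions, and GNU sed is assumed.

```shell
seq="gaggaa<tss>c</tss>ttgaggat"   # made-up input, not taken from infA

tagged=$(echo "$seq" |
  sed "s/<\/tss>/&\n/" |             # start a new line right after the tss
  sed "2s/gagg/<rbs>&<\/rbs>/1" |    # tag the first gagg on line 2 only
  sed ':a;N;$!ba;s/\n//g')           # join the lines back together

echo "$tagged"   # gaggaa<tss>c</tss>tt<rbs>gagg</rbs>at
```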


start codon

  • At this point I was pretty comfortable repeating the same techniques, so I added a line break after the rbs so that I could find only the nearest occurrence of the pattern "atg" and insert the start_codon tags. I needed to refer to my notes to remember that "atg" is the start codon. To "clean it up", I added sed ':a;N;$!ba;s/\n//g' to the end to remove all of the line breaks I had added. The final command sequence is as follows:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" |
sed "7s/atg/<start_codon>&<\/start_codon>/1" | sed ':a;N;$!ba;s/\n//g'

Which produced:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgct
gccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgt
gttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttaca
gaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca<rbs>g
agg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgca
tcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttacc
gcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
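
One way to sanity-check the tagged output is to strip the tags again and confirm that only the original bases remain. This is a sketch of that idea on a fragment of the output above, not a step from the assignment. The variable names are made up.

```shell
tagged="<rbs>gagg</rbs>attag<start_codon>atg</start_codon>gccaaa"   # fragment of the output above

# Delete everything from a "<" up to the next ">"
plain=$(echo "$tagged" | sed 's/<[^>]*>//g')

echo "$plain"   # gaggattagatggccaaa
```

Applied to the full tagged sequence, the stripped result should match the contents of infA-E.coli-K12.txt if none of the steps changed any bases.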




