Malverso Week 4

From LMU BioDB 2015
Revision as of 20:54, 28 September 2015

Transcription and Translation "Taken to the Next Level"

I completed this assignment using Putty.exe to access infA-E.coli-K12.txt in ~dondi/xmlpipedb/data. I added line breaks to the sequences and commands where necessary to improve readability. Halfway through the assignment I reread the directions and decided to "clean up" all of my answers by appending the sed command ':a;N;$!ba;s/\n//g', which removes the line breaks my commands had inserted into the output.
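To see what the clean-up command does on its own, here is a minimal sketch on a made-up three-line input. It assumes bash and GNU sed; the one-line form with a semicolon right after the label is a GNU sed convenience, and the `\n` in the replacement text used throughout this page is also a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical toy input: three short lines, standing in for a sequence
# that earlier sed commands split apart with \n
lines=$'acgt\nttga\ncc'

# :a      define a label named "a"
# N       append the next input line to the pattern space (joined by a newline)
# $!ba    if this is not the last line, branch back to label "a"
# s/\n//g once every line is in the pattern space, delete all the newlines
joined=$(printf '%s\n' "$lines" | sed ':a;N;$!ba;s/\n//g')
echo "$joined"
```

Because the whole file ends up in one pattern space before the substitution runs, the output is a single continuous line.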

#1

-35 box of the promoter

  • Looking over my notes from my first attempt at this assignment in class, I used the sed command to find every place where the pattern tt[gt]ac[at] occurred, adding a <minus35box> tag to the beginning of each match and a </minus35box> tag to the end. This tagged two possible locations for the -35 box.
  • Since it was given that there are 17 base pairs between the -35 box and the -10 box, I used that clue, plus a bit of guess and check, to identify which -35 box tag was correct. I assumed the first -35 box tag was correct and modified my sed command to tag only the first instance of the -35 box pattern by replacing the g flag at the end with a 1. I also added \n after the closing </minus35box> tag so that the rest of the sequence would start on a new line. Referring back to my in-class notes, I then added sed commands that search for the -10 box pattern [ct]at[at]at and mark the point 17 characters after the end of the -35 box with a <here?> placeholder:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&<here?>/g" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"
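The 17-base-pair clue can also be checked directly instead of by guess and check. The sketch below is an alternative, not what I ran in class. It assumes bash and GNU grep, whose -o option prints each match on its own line and whose -b option prefixes the match's 0-based byte offset. It works on a short excerpt copied from the -35 box output below with the tags removed, and for each tt[gt]ac[at] match it tests whether the 6 nucleotides starting 17 bases past the end of the match fit the -10 box pattern [ct]at[at]at.

```shell
#!/usr/bin/env bash
# Excerpt around the -35 box, copied from the output below with the tags removed
seq="aaatctttacttatttacagaacttcggcattatcttgccggttc"
best=""

# Each line from grep looks like "offset:match"
while IFS=: read -r offset box35; do
  # The -10 box should start after the 6-nt -35 box plus the 17-nt spacer
  start10=$(( offset + ${#box35} + 17 ))
  candidate=${seq:start10:6}
  if [[ $candidate =~ ^[ct]at[at]at$ ]]; then
    echo "$box35 at offset $offset: -10 box $candidate found after a 17-nt spacer"
    best=$offset
  else
    echo "$box35 at offset $offset: no -10 box after a 17-nt spacer (found $candidate)"
  fi
done < <(grep -ob 'tt[gt]ac[at]' <<<"$seq")
```

In this excerpt the pattern matches twice (tttact at offset 5 and tttaca at offset 13), but only the first match is followed 17 nucleotides later by cattat, which agrees with choosing the first -35 box tag.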

This confirmed that the first match of the -35 box pattern was the correct one. To tag just the -35 box, I can now use the command:

cat infA-E.coli-K12.txt |sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1"

Which produces:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttac
gctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatac
gttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgc
aaatc<minus35box>tttact</minus35box>tatttacagaacttcggcattatcttgccggttcaaatt
acggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgca
aaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgca
ttgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaacc
ggcctttttattttat
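The only difference between tagging every candidate and tagging just the first one is the flag at the end of the s command. A minimal sketch on a hypothetical toy sequence that contains two matches of tt[gt]ac[at], assuming bash and GNU sed:

```shell
#!/usr/bin/env bash
# Hypothetical toy sequence with two matches of the -35 box pattern
seq="ccttgacaggttttactgg"

# g flag: tag every non-overlapping match on the line
all=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g")
# 1 flag: tag only the first match, as in the command above
first=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1")
echo "$all"
echo "$first"
```

The & in the replacement stands for whatever text the pattern matched, which is why the same command works for both ttgaca and tttact.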

-10 box of the promoter

  • Identifying the correct -35 box tag also identified the correct -10 box tag. I replaced the <here?> placeholder with a line break so that the -10 box search (now restricted to line 3) starts exactly 17 characters after the -35 box, and used the command below to produce the sequence with the correct -35 box and -10 box tags:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>/g" |
sed ':a;N;$!ba;s/\n//g'

Which returned:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat
</minus10box>cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgc
aaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcaca
catctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtac
gacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgag
taaaaggtcggtttaaccggcctttttattttat
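The line-address trick in the command above can be traced step by step on a short excerpt of the sequence (copied from the output above with the tags removed). This is a minimal sketch assuming bash and GNU sed. One difference from my command: it uses the 1 flag on the -10 box search instead of g, a small precaution of my own so that only one -10 box can be tagged on line 3.

```shell
#!/usr/bin/env bash
# Excerpt of the infA upstream region, tags removed
seq="aaatctttacttatttacagaacttcggcattatcttgccggttc"

# 1) tag the first -35 box and start a new line after it
# 2) on line 2, start a new line after exactly 17 characters (the spacer)
# 3) on line 3, tag the -10 box pattern
staged=$(printf '%s\n' "$seq" |
  sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
  sed -r "2s/^(.){17}/&\n/" |
  sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>/1")

# Line 2 should now hold only the spacer between the two boxes
spacer=$(printf '%s\n' "$staged" | sed -n '2p')
echo "spacer: $spacer (${#spacer} nt)"

# Join the lines again, as in the cleaned-up command
tagged=$(printf '%s\n' "$staged" | sed ':a;N;$!ba;s/\n//g')
echo "$tagged"
```

Each sed in the pipe is a separate process that counts lines from its own input, so the newline inserted by one command is what lets the next command address "line 2" or "line 3".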

transcription start site

  • It was given that the TSS is the 12th nucleotide after the first nucleotide of the -10 box. I put a line break after the -10 box and then used the sed command sed -r "4s/^(.){6}/&<tss>/g" to tag the beginning of the TSS. Since the -10 box is 6 nucleotides long, the TSS should be about 6 characters past the line break; this command inserts the tag after the first 6 characters, that is, before the 7th. I wasn't sure whether the TSS should be the 6th or the 7th character after the break, so I asked Mahrad.

This sed command, however, is not helpful for tagging the end of the TSS. To tag both the beginning and the end, I changed the command so that it breaks the line after 5 characters, putting the 6th character at the start of line 5:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>/g" | sed ':a;N;$!ba;s/\n//g'


This way, the TSS nucleotide sits at the beginning of a line, and I could easily surround it with the appropriate tags, as shown below:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacg
ctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacg
ttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaa
atc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat
</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaa
tattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtg
gttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaac
tgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaaga
gaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
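The TSS step can be checked on the nucleotides that follow the -10 box in the output above. A minimal sketch, assuming bash and GNU sed: the first pipeline mirrors my two-step approach (break the line after 5 characters, then tag the first character of the new line), and the second is an alternative I did not use, a single substitution with two -r capture groups. The arithmetic at the end counts the first -10 box nucleotide as position 1.

```shell
#!/usr/bin/env bash
# Excerpts from the output above: the -10 box and the nucleotides after it
box10="cattat"
after10="cttgccggttcaaatt"

# Two-step approach: new line after 5 characters, tag the first character of
# the new line, then join the lines again
two_step=$(printf '%s\n' "$after10" |
  sed -r "1s/^(.){5}/&\n/" |
  sed "2s/^./<tss>&<\/tss>/" |
  sed ':a;N;$!ba;s/\n//g')

# Alternative: keep the first 5 characters (\1) and wrap the 6th (\2) in tags
one_step=$(printf '%s\n' "$after10" | sed -r "s/^(.{5})(.)/\1<tss>\2<\/tss>/")

# 6 nucleotides of -10 box + 5 skipped + the TSS itself
tss_pos=$(( ${#box10} + 5 + 1 ))
echo "$two_step"
echo "$one_step"
echo "TSS position counting from the first -10 box nucleotide: $tss_pos"
```

Both pipelines tag the same nucleotide; the capture-group version avoids the extra line break but needs the -r (extended regex) option for the unescaped parentheses and braces.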


ribosome binding site

  • It was given that the ribosome binding site is gagg, so I added a line break after the </tss> tag and used a sed command on the next line to find an occurrence of that pattern after the TSS.

I modified the sed command so that only the first instance of the pattern would be tagged, in case the pattern occurs more than once in the sequence:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>/1" |
sed ':a;N;$!ba;s/\n//g'


Which produced the sequence:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgct
cattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgc
gcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>ct
tgc<tss>c</tss>ggttcaaattacggtagtgatacccca<rbs>gagg</rbs>attagatggccaaagaagaca
atattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtgg
ttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactga
ccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaag
aacgagtaaaaggtcggtttaaccggcctttttattttat
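Restricting the gagg search to the line after the TSS is what keeps an upstream gagg (if there were one) from being tagged. A minimal sketch on a hypothetical two-line toy input, assuming bash and GNU sed: line 1 stands for the sequence up to and including the TSS and contains a decoy gagg, line 2 stands for the sequence after the TSS.

```shell
#!/usr/bin/env bash
# Hypothetical toy input: a decoy gagg before the TSS (line 1) and the
# real candidate after it (line 2)
toy=$'tgaggt\nagaggat'

# No line address: the first gagg on every line is tagged, decoy included
everywhere=$(printf '%s\n' "$toy" | sed "s/gagg/<rbs>&<\/rbs>/1" | sed ':a;N;$!ba;s/\n//g')

# Line address 2: only the line after the TSS is searched, like 6s in the command above
after_tss=$(printf '%s\n' "$toy" | sed "2s/gagg/<rbs>&<\/rbs>/1" | sed ':a;N;$!ba;s/\n//g')

echo "$everywhere"
echo "$after_tss"
```

Note that the 1 flag alone only limits sed to the first match per line, not per file, so the line address is what does the real work here.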


start codon

  • At this point I was comfortable repeating the same techniques, so I added a line break after the RBS so that I could find only the nearest occurrence of the pattern "atg" after it and insert the start_codon tags. I needed to refer to my notes to remember that "atg" is the start codon. To "clean it up", I added sed ':a;N;$!ba;s/\n//g' to the end to remove all of the line breaks I had added. The final command sequence is as follows:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>\n/1" |
sed -r "2s/^(.){17}/&\n/g" | sed "3s/[ct]at[at]at/<minus10box>&<\/minus10box>\n/g" |
sed -r "4s/^(.){5}/&\n/g" | sed "5s/^./<tss>&<\/tss>\n/g" | sed "6s/gagg/<rbs>&<\/rbs>\n/1" |
sed "7s/atg/<start_codon>&<\/start_codon>/1" | sed ':a;N;$!ba;s/\n//g'

Which produced:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgct
gccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgt
gttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttaca
gaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>ggttcaaattacggtagtgatacccca
<rbs>gagg</rbs>attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgt
tgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgca
tcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttacc
gcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
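The 1 flag matters again for the start codon, because atg also appears further downstream. A minimal sketch on the sequence right after </rbs> (copied from the final output above, tags removed), assuming bash and GNU sed:

```shell
#!/usr/bin/env bash
# Excerpt: the nucleotides immediately after </rbs> in the final output
after_rbs="attagatggccaaagaagacaatattgaaatgc"

# 1 flag: tag only the nearest atg after the RBS, as in the final command
first=$(printf '%s\n' "$after_rbs" | sed "s/atg/<start_codon>&<\/start_codon>/1")

# g flag: would also tag the later atg inside the coding sequence
every=$(printf '%s\n' "$after_rbs" | sed "s/atg/<start_codon>&<\/start_codon>/g")

echo "$first"
echo "$every"
```

This only finds the first atg text match after the RBS; it does not check the reading frame, which is fine for this assignment but is an assumption worth keeping in mind for other genes.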





Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries