Blitvak Week 4

From LMU BioDB 2015
Revision as of 06:45, 27 September 2015 by Blitvak (Talk | contribs) (second edit of the -10/-35 section)

Individual Journal Assignment Week 4

Finding the -35 and -10 boxes of the promoter

  • I first located infA-E.coli-K12.txt by changing to its directory with cd ~dondi/xmlpipedb/data
  • I copied the sequence in that file to a separate location for future reference and checking
  • I assumed that the sequence is the mRNA-like strand and that it runs 5' to 3'
  • From the Week 4 Assignment Page, I found that the -10 box generally matches [ct]at[at]at and that the -35 box generally matches tt[gt]ac[at]
  • I skimmed the More Text Processing Features page and found that sed "s/Title/<h1>&<\/h1>/g" outputs <h1>Title</h1>, because & in the replacement stands for the matched text; the same approach is useful for tagging the sequence with its various parts
  • I then tried cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"
    • This gave me the output:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box>tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggag
taatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>ta<minus35box>tttaca</minus35box>gaacttcgg<minus10box>cattat</minus10box>cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgc
aaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaa
gagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
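The tagging step can be tried on a short piece of the sequence without access to the original file. The following is a minimal sketch: the variable names fragment and tagged are illustrative only, and the fragment is a hand-copied stretch of the output above with the tags removed.

```shell
# Illustrative fragment copied from the region around the tagged boxes
# (an assumption for this sketch, not the full infA sequence).
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgccgg"

# Tag -35 candidates first, then -10 candidates.
# In the replacement, & stands for whatever text the pattern matched.
tagged=$(printf '%s\n' "$fragment" \
  | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g" \
  | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g")

printf '%s\n' "$tagged"
```

On this fragment the two sed passes should tag tttact and tttaca as -35 candidates and cattat as the -10 candidate, matching the corresponding stretch of the full output.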
  • I realized that there are two candidates for the minus35box and only one usable candidate for the minus10box: since the minus10box must come after the minus35box, the first "minus10box" match (tataat, which lies upstream of both minus35box candidates) is to be ignored
  • I looked at the Week 4 Assignment Page and found that the ideal spacing between the -35 and -10 boxes is 17 base pairs. Only <minus35box>tttact</minus35box> fits this criterion: it is 17 bp away from <minus10box>cattat</minus10box>, while <minus35box>tttaca</minus35box> is only 9 bp away
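The spacing check can also be done with sed instead of counting by hand. This is only a sketch under stated assumptions: spacer_len is a hypothetical helper written for this example, and fragment is a hand-copied, untagged stretch of the sequence around the boxes rather than the full file.

```shell
# Illustrative untagged fragment around the candidate boxes (assumption).
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgccgg"

# spacer_len BOX35 BOX10 (hypothetical helper): prints the number of bases
# between the end of BOX35 and the start of BOX10 in $fragment.
# Caveats: sed matching is greedy, so the last occurrence of each box is used,
# and if either box is absent the whole line is counted instead.
spacer_len() {
  spacer=$(printf '%s\n' "$fragment" | sed "s/.*$1\(.*\)$2.*/\1/")
  printf '%s\n' "${#spacer}"
}

for box35 in tttact tttaca; do
  echo "$box35 to cattat: $(spacer_len "$box35" cattat) bp"
done
```

For this fragment the loop should report 17 bp for tttact and 9 bp for tttaca, which agrees with choosing tttact as the -35 box.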