Blitvak Week 4
From LMU BioDB 2015
Revision as of 06:45, 27 September 2015
Individual Journal Assignment Week 4
Finding the -35 and -10 boxes of the promoter
- I first located infA-E.coli-K12.txt by changing into its directory:
cd ~dondi/xmlpipedb/data
- I copied the sequence from that file to an external location for later reference and checking
- I assumed that the sequence is the mRNA-like strand and that it runs 5' to 3'
- By reading the Week 4 Assignment Page, I found that the -10 box generally matches the pattern [ct]at[at]at and the -35 box generally matches the pattern tt[gt]ac[at]
- I skimmed over the More Text Processing Features page and found that
sed "s/Title/<h1>&<\/h1>/g"
turns the input Title into <h1>Title</h1>, because & in the replacement stands for the matched text; this command would be useful for tagging the parts of the sequence
- I then tried
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g" | sed "s/[ct]at[at]at/<minus10box>&<\/minus10box>/g"
- This gave me the output:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt<minus10box>tataat</minus10box>tgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggag
taatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>ta<minus35box>tttaca</minus35box>gaacttcgg<minus10box>cattat</minus10box>cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgc
aaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaa
gagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
- I realized that there are two candidates for the -35 box (tttact and tttaca) but only one usable -10 box (cattat): since the -10 box must come after the -35 box, the first tagged minus10box (tataat) can be ignored
- I looked at the Week 4 Assignment Page and found that the ideal spacing between the -35 and -10 boxes is 17 base pairs. Only <minus35box>tttact</minus35box> meets this criterion: it is separated from <minus10box>cattat</minus10box> by the 17 bp spacer tatttacagaacttcgg, whereas tttaca is only 9 bp (gaacttcgg) away
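As a quick sanity check on the two box patterns from the Week 4 Assignment Page, the shell sketch below lists every 6-base sequence each pattern allows and confirms that grep -E accepts it. The helper names minus10_variants and minus35_variants are hypothetical, not part of the assignment. Each bracket expression matches exactly one of the listed bases, so each pattern allows 2 x 2 = 4 sequences.

```shell
# Box patterns as given on the Week 4 Assignment Page
minus10_pattern='[ct]at[at]at'
minus35_pattern='tt[gt]ac[at]'

# Hypothetical helper: print every sequence the -10 pattern allows
minus10_variants() {
    for first in c t; do
        for fourth in a t; do
            echo "${first}at${fourth}at"
        done
    done
}

# Hypothetical helper: print every sequence the -35 pattern allows
minus35_variants() {
    for third in g t; do
        for sixth in a t; do
            echo "tt${third}ac${sixth}"
        done
    done
}

# -x makes grep require the whole line to match the pattern
minus10_variants | grep -xE "$minus10_pattern"
minus35_variants | grep -xE "$minus35_pattern"
```

The printed lists (cataat, cattat, tataat, tattat and ttgaca, ttgact, tttaca, tttact) include the consensus sequences tataat and ttgaca as well as the candidates actually tagged in the infA output (cattat, tttact, tttaca).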
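One caveat with the sed pipeline used above: sed edits its input one line at a time, and the output above comes in four separate chunks, which suggests the file stores the sequence on more than one line. A box that happened to straddle a line break would then go untagged. A minimal sketch that joins the lines with tr before running the same two sed expressions is shown here; tag_boxes is a hypothetical helper name, and the test input is a short stretch of the infA sequence, deliberately split so that tttact crosses the line break.

```shell
# Hypothetical helper: join all input lines into one, then tag candidate
# -35 and -10 boxes with the same sed expressions used in the journal
tag_boxes() {
    # tr deletes every newline; echo adds one back so sed gets a complete line
    { tr -d '\n'; echo; } |
        sed 's/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g' |
        sed 's/[ct]at[at]at/<minus10box>&<\/minus10box>/g'
}

# Short stretch of the infA sequence, split so that tttact crosses the break
printf 'gcgcaaatctttac\nttatttacagaacttcggcattatcttgcc\n' | tag_boxes
```

On the real file this would be run as tag_boxes < infA-E.coli-K12.txt. Fed the same split input line by line, the original pipeline still tags tttaca and cattat but misses tttact.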
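The 17 bp spacing check in the last step can also be done in the shell instead of counting by hand. This is only a sketch under stated assumptions: spacer_length is a hypothetical helper, it relies on awk's index(), which returns the 1-based position of the first occurrence only, so each box is assumed to appear once, and the test sequence is the untagged stretch of infA around the two -35 candidates.

```shell
# Hypothetical helper: number of bases between the end of a -35 box and the
# start of a -10 box. index() finds only the FIRST occurrence of each string.
spacer_length() {
    # $1 = untagged sequence, $2 = -35 box, $3 = -10 box
    printf '%s\n' "$1" | awk -v m35="$2" -v m10="$3" '{
        start35 = index($0, m35)
        start10 = index($0, m10)
        print start10 - (start35 + length(m35))
    }'
}

seq='gcgcaaatctttacttatttacagaacttcggcattatcttgcc'
spacer_length "$seq" tttact cattat   # spacer is tatttacagaacttcgg
spacer_length "$seq" tttaca cattat   # spacer is gaacttcgg
```

This prints 17 for tttact and 9 for tttaca, matching the conclusion that only tttact sits at the ideal spacing. A negative result would mean the -10 candidate lies upstream of the -35 box, which is why the first minus10box (tataat) was ignored.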