Emilysimso Week 4
From LMU BioDB 2015
Question 1
- Used grep "[ct]at[at]at" infA-E.coli-K12.txt to highlight the -10 box
- Used grep "tt[gt]ac[at]" infA-E.coli-K12.txt to highlight the -35 box
- Used sed "s/cattat/<minus10box>&<\/minus10box>/g" infA-E.coli-K12.txt | sed "s/tttact/<minus35box>&<\/minus35box>/g" to add the labels to the -35 box and -10 box
- Added sed -r "s/<\/minus10box>.{11}/<tss>&<\/tss>/g"
- This did not work because & stands for the whole match, so the <tss> label got added before the </minus10box> and wrapped the 11 bases after it
- Used sed -r "s/<\/minus10box>.{5}/&<\/tss>/g" | sed "s/<\/tss>/<tss>c&/g" to add the tss markers
- Used grep "gagg" to find the rbs
- Added sed "s/gagg/<rbs>&<\/rbs>/g" to wrap the gagg in rbs tags
- Used grep "atg" to find possible start codons
- Added sed -r "s/<\/rbs>.{8}/&<\/start_codon>/g" | sed "s/<\/start_codon>/<start_codon>atg&/g" to mark the start codon (atg)
- Possible stop codons: taa, tag, or tga
- Used sed "s/.../ & /g" infA-E.coli-K12.txt | grep "taa" | grep "tag" | grep "tga" to split the sequence into codons and look for possible stop codons
- tga is the only possible stop codon
- Added sed "1s/tga/<stop_codon>&<\/stop_codon>/g", which tags every tga on line 1
- Looked for the first one after the start codon; it is the third tga on line 1
- Changed the g flag to 3 to tag only that one: sed "1s/tga/<stop_codon>&<\/stop_codon>/3"
- Used sed "s/aaaaggt/<terminator>&/g" to mark the first part of the terminator
- Used grep "gcctttt" infA-E.coli-K12.txt to find the rest of the hairpin
- Looked at the next four bases; they were tatt
- Used sed "s/gcctttttatt/&<\/terminator>/g" to mark the end of the terminator
- Final command: sed "s/cattat/<minus10box>&<\/minus10box>/g" infA-E.coli-K12.txt | sed "s/tttact/<minus35box>&<\/minus35box>/g" | sed -r "s/<\/minus10box>.{5}/&<\/tss>/g" | sed "s/<\/tss>/<tss>c&/g" | sed "s/gagg/<rbs>&<\/rbs>/g" | sed -r "s/<\/rbs>.{8}/&<\/start_codon>/g" | sed "s/<\/start_codon>/<start_codon>atg&/g" | sed "1s/tga/<stop_codon>&<\/stop_codon>/3" | sed "s/aaaaggt/<terminator>&/g" | sed "s/gcctttttatt/&<\/terminator>/g"
- Final result: ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc<minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>cttgc<tss>c</tss>cggttcaaattacggtagtgatacccca<rbs>gagg</rbs>attagatg<start_codon>atg</start_codon>gccaaagaagacaatat<stop_codon>tga</stop_codon>aatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagt<terminator>aaaaggtcggtttaaccggcctttttatt</terminator>ttat
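A minimal sketch of how the tss and start codon tags could instead be written with capture groups, so that sed wraps the c and the atg already present in the sequence. In the two-step commands above, the c and atg typed into the replacement are written in addition to the bases in the file. The last part looks for the first stop codon in the reading frame that begins at the tagged atg. The short sequence is a made-up stand-in for infA-E.coli-K12.txt, the variable names are placeholders, and sed -E is used as the equivalent of the -r option above.

```shell
# Hypothetical toy sequence (not the infA file):
# aa | cattat | cttgc | c | ggtt | gagg | attag | atg | gcc aaa tga ttt
seq="aacattatcttgccggttgaggattagatggccaaatgattt"

# \1 is the text up to the base of interest, \2 is the base (or codon) itself,
# so the replacement only adds tags and never adds bases.
tagged=$(printf '%s\n' "$seq" \
  | sed -E 's/cattat/<minus10box>&<\/minus10box>/' \
  | sed -E 's/(<\/minus10box>.{5})(.)/\1<tss>\2<\/tss>/' \
  | sed -E 's/gagg/<rbs>&<\/rbs>/' \
  | sed -E 's/(<\/rbs>.{5})(atg)/\1<start_codon>\2<\/start_codon>/')
printf '%s\n' "$tagged"

# Deleting every tag should give back the input sequence.
stripped=$(printf '%s\n' "$tagged" | sed -E 's/<[^>]*>//g')

# Reading frame from the start codon: keep the atg and everything after it,
# drop the tags, print one codon per line, and report the first stop codon.
orf=$(printf '%s\n' "$tagged" | sed -E 's/.*<start_codon>//; s/<[^>]*>//g')
first_stop=$(printf '%s\n' "$orf" | grep -o -E '.{3}' | grep -n -m 1 -E '^(taa|tag|tga)$')
printf '%s\n' "$first_stop"   # codon number and codon, here 4:tga
```

Comparing the stripped output with the input is a quick check that tagging did not add or remove any bases.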
Question 2
- Took the strand from the tss to the end of the terminator
- Used sed "y/atcg/uagc/" to write the complementary RNA base for each DNA base (a to u, t to a, c to g, g to c)
- Resulting sequence: ggccaaguuuaaugccaucacuauggggucuccuaaucuacuaccgguuucuucuguuauaacuuuacguuccauggcaagaacuuugcaacggauuaugguacaaggcgcaucucaaucuuuugccagugcaccaaugacguguguagaggccauuuuacgcguuuuugauguaggcguaggacugcccgcuguuucacugacaacuugacuggggcaugcuggacucguuuccggcguaacagaaggcaucagcgacuaacaaaauggcggacuacccgcuucucuuucuugcucauuuuccagccaaauuggccggaaaaauaa
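A small sketch of the Question 2 steps on a made-up tagged string (not the real infA output; the variable names are placeholders): cut out the region from the tss to the end of the terminator, drop the tags, then convert to RNA. Two conversions are shown side by side because the right one depends on an assumption about the input. The y command above writes the complementary strand, which fits if the file holds the template strand. Replacing only t with u fits if the file holds the coding strand.

```shell
# Hypothetical tagged toy string standing in for the Question 1 result.
tagged="aa<tss>c</tss>gt<terminator>aatt</terminator>tt"

# Keep the text from the tss to the end of the terminator, then delete the tags.
region=$(printf '%s\n' "$tagged" \
  | sed -E 's/.*<tss>//; s/<\/terminator>.*//; s/<[^>]*>//g')

# Option 1: complement every base and write u in place of t.
complement_rna=$(printf '%s\n' "$region" | sed 'y/atcg/uagc/')

# Option 2: keep the bases as they are and only replace t with u.
same_strand_rna=$(printf '%s\n' "$region" | sed 's/t/u/g')

printf '%s\n%s\n%s\n' "$region" "$complement_rna" "$same_strand_rna"
# cgtaatt
# gcauuaa
# cguaauu
```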