Difference between revisions of "Kevin Wyllie Week 4"
(Added a screenshot for the first pipe.)
(Added the answer, in text, for question 1.)
Revision as of 04:00, 29 September 2015
Transcription and Translation “Taken to the Next Level”
Adding Tags for Each Gene "Landmark"
Question 1
cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g"
The sed command shown adds the opening tag for the -35 box and the closing tag for the -10 box, based on their consensus sequences and the known spacing between them...
cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g"
...and these sed commands add the remaining tags for the -35 and -10 box, based on the location of either existing tag.
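As a rough illustration of how the three substitutions build on one another, the sketch below runs them on a short fragment of the promoter region copied from the output on this page. The variable names (fragment, tagged) are hypothetical, and -E is used in place of -r; GNU sed accepts both for extended regular expressions.

```shell
# Short fragment of the infA promoter region (taken from the output on this page).
fragment='gcgcaaatctttacttatttacagaacttcggcattatcttgccggttcaaatt'

# Step 1: wrap the whole -35 ... -10 stretch in one opening and one closing tag.
# Step 2: close the -35 box right after its six bases.
# Step 3: open the -10 box right before its six bases.
tagged=$(printf '%s\n' "$fragment" \
  | sed -E 's/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g' \
  | sed 's/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g' \
  | sed 's/[ct]at[at]at<\/minus10box>/ <minus10box>&/g')

printf '%s\n' "$tagged"
```

On this fragment the result should read gcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttcaaatt, which matches the corresponding stretch of the final output.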
cat infA-E.coli-K12.txt | ... | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" | sed "s/<tss>./&<\/tss> /g"
These commands add the tags for the transcription start site (TSS), based on its known distance from the -10 box (the pattern accepts any base at that position).
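The following sketch applies the two TSS commands to a fragment that already carries the -10 box tags. The variable names are hypothetical. Note that the six characters skipped by (.){6} include the space inserted after the closing -10 tag, so only five bases are skipped and the tagged base is the sixth one after the box.

```shell
# Fragment that already has the -10 box tagged (taken from the output on this page).
after_boxes='<minus10box>cattat</minus10box> cttgccggttcaaatt'

# First sed: skip six characters after the closing tag and open <tss>.
# Second sed: close </tss> after the single character that follows <tss>.
with_tss=$(printf '%s\n' "$after_boxes" \
  | sed -E 's/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g' \
  | sed 's/<tss>./&<\/tss> /g')

printf '%s\n' "$with_tss"
```

This should print <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaatt.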
cat infA-E.coli-K12.txt | ... | sed "s/gagg/ <rbs>&<\/rbs> /g"
This sed command adds the tags for the ribosome binding site (RBS), based on its consensus sequence.
cat infA-E.coli-K12.txt | ... | sed "s/aaaaggt/ <terminator>&/g" | sed "s/gcctttt..../&<\/terminator> /g"
These sed commands add the tags for the terminator, based on its consensus sequence and hairpin behavior.
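A small sketch of the terminator tagging on the tail end of the sequence is given below. It assumes the closing pattern is gcctttt followed by four wildcard dots, which is what the tagged output on this page implies (four bases, tatt, sit between gcctttt and the closing tag). The variable names are hypothetical.

```shell
# Tail of the infA sequence (taken from the output on this page, tags removed).
tail_seq='gaacgagtaaaaggtcggtttaaccggcctttttattttat'

# First sed: open the tag at the start of the terminator.
# Second sed: close it four bases past the gcctttt stretch.
with_term=$(printf '%s\n' "$tail_seq" \
  | sed 's/aaaaggt/ <terminator>&/g' \
  | sed 's/gcctttt..../&<\/terminator> /g')

printf '%s\n' "$with_term"
```

This should print gaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat, matching the end of the final output.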
cat infA-E.coli-K12.txt | ... | sed "s/<\/rbs> /&\n/g" | sed "2s/atg/ <start_codon>&<\/start_codon> /1"
These commands start a new line after the RBS and then add start-codon tags to the first occurring "atg" sequence on the newly separated line.
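The line-splitting trick can be tried on a toy input, as sketched below. The input string and variable names are made up for illustration, and the \n in the replacement assumes GNU sed. The split has to happen in its own sed invocation: only the next sed in the pipe sees the text after the RBS as line 2, so the address 2 skips every "atg" that occurs upstream of the RBS.

```shell
# Toy input: two "atg" sequences before the RBS, two after it.
toy='aagaatgaatgttt <rbs>gagg</rbs> attagatggccaaagaagacaatattgaaatgcaagg'

# First sed: break the line right after the RBS closing tag (GNU sed \n).
# Second sed: on line 2 only, tag the first "atg".
split=$(printf '%s\n' "$toy" \
  | sed 's/<\/rbs> /&\n/g' \
  | sed '2s/atg/ <start_codon>&<\/start_codon> /1')

printf '%s\n' "$split"
```

Line 1 keeps its untouched "atg" sequences, and line 2 should read attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaagg, with the later "atg" left alone.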
cat infA-E.coli-K12.txt | ... | sed "2s/ta[ag]/\n&/g" | sed "3,10s/tga/\n&/g"
These commands start a new line for each potential stop codon sequence occurring after the RBS.
cat infA-E.coli-K12.txt | ... | sed "17s/tga/ <stop_codon>&<\/stop_codon> /g"
This command adds stop-codon tags to the potential stop codon sequence occurring closest to (but not after) the terminator.
cat infA-E.coli-K12.txt | ... | sed ':a;N;$!ba;s/\n//g'
This rather unintelligible command undoes any line breaks, combining the text file back into one line.
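For readers puzzled by that command, here is a small sketch of what it does, assuming GNU sed (which allows labels and commands to be separated by semicolons). The label :a marks a loop start, N appends the next input line to the pattern space, $!ba jumps back to :a on every line except the last, and s/\n//g then deletes all the collected newlines. The tr command is shown only as a possibly more readable alternative; it is not what the page uses. The sample lines and variable names are hypothetical.

```shell
# Three short lines standing in for the split-up sequence.
joined=$(printf '%s\n' 'attag' ' <start_codon>atg</start_codon> ' 'gcc' \
  | sed ':a;N;$!ba;s/\n//g')

# Alternative: delete every newline character with tr.
joined_tr=$(printf '%s\n' 'attag' ' <start_codon>atg</start_codon> ' 'gcc' \
  | tr -d '\n')

printf '%s\n' "$joined"
printf '%s\n' "$joined_tr"
```

Both variables should hold attag <start_codon>atg</start_codon> gcc on a single line.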
- The final pipe is shown entered into the command line, along with the output.
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgat ttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaattacggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccg tagtcgctgattgttttaccgcc <stop_codon>tga</stop_codon> tgggcgaagagaaagaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat