Difference between revisions of "Kevin Wyllie Week 4"

From LMU BioDB 2015
(More formatting edits.)
(Finished the questions and added the answers in-text. Hopefully the formatting will work out.)

Revision as of 05:29, 29 September 2015

Transcription and Translation “Taken to the Next Level”

Adding Tags for Each Gene "Landmark"

Question 1

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g"

The sed command shown adds the beginning tag for the -35 box and the end tag for the -10 box, based on their consensus sequences and their known distance from each other...

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g"

...and these sed commands add the remaining tags for the -35 and -10 box, based on the location of either existing tag.
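As a quick check of the three commands above, the sketch below runs them on a short fragment copied from the promoter region of the tagged output. The function name `tag_promoter_boxes` and the variable `fragment` are illustrative names, not part of the original pipe, and `-E` is used in place of `-r` (GNU sed accepts both).

```shell
# Fragment copied from the infA promoter region (-35 box, 17-base spacer, -10 box).
fragment="gcaaatctttacttatttacagaacttcggcattatcttgccggttc"

# Hypothetical helper wrapping the three sed steps from Question 1.
tag_promoter_boxes() {
  # Step 1: wrap the whole span from the -35 box through the -10 box.
  # Step 2: close the -35 box right after its six bases.
  # Step 3: open the -10 box right before its six bases.
  sed -E 's/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g' |
    sed 's/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g' |
    sed 's/[ct]at[at]at<\/minus10box>/ <minus10box>&/g'
}

echo "$fragment" | tag_promoter_boxes
```

On this fragment the output should read `gcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttc`, which agrees with the corresponding stretch of the full tagged sequence.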

cat infA-E.coli-K12.txt | ... | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" | sed "s/<tss>./&<\/tss> /g"

These commands add the tags for the transcription start site (TSS), based on its consensus sequence and known distance from the -10 box.
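The TSS step can be tried on its own with the sketch below, which starts from a fragment that already carries the -35 and -10 tags. The names `tagged` and `tag_tss` are illustrative. Note that the six wildcard characters in `(.){6}` include the space inserted after `</minus10box>` by the earlier step, so the base that ends up tagged is the sixth nucleotide after the -10 box.

```shell
# Fragment with the promoter boxes already tagged (same text as the real output).
tagged='gcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgccggttc'

# Hypothetical helper wrapping the two TSS commands from Question 1.
tag_tss() {
  # First sed: skip six characters past the -10 box and open the TSS tag.
  # Second sed: close the TSS tag after exactly one base.
  sed -E 's/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g' |
    sed 's/<tss>./&<\/tss> /g'
}

echo "$tagged" | tag_tss
```

The tail of the result should read `cttgc <tss>c</tss> ggttc`, as in the full tagged sequence.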

cat infA-E.coli-K12.txt | ... | sed "s/gagg/ <rbs>&<\/rbs> /g"

This sed command adds the tags for the ribosome binding site (RBS), based on its consensus sequence.

cat infA-E.coli-K12.txt | ... | sed "s/aaaaggt/ <terminator>&/g" | sed "s/gcctttt..../&<\/terminator> /g"

These sed commands add the tags for the terminator, based on its consensus sequence and hairpin behavior. The four dots after "gcctttt" match the four bases that follow the hairpin.
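A small sketch of the terminator step, run on the last stretch of the infA sequence. The names `tail_fragment` and `tag_terminator` are illustrative only.

```shell
# End of the infA sequence, copied from the tagged output with the tags removed.
tail_fragment="gaacgagtaaaaggtcggtttaaccggcctttttattttat"

# Hypothetical helper wrapping the two terminator commands.
tag_terminator() {
  # First sed: open the tag where the hairpin stem begins (aaaaggt).
  # Second sed: close the tag four bases after "gcctttt".
  sed 's/aaaaggt/ <terminator>&/g' |
    sed 's/gcctttt..../&<\/terminator> /g'
}

echo "$tail_fragment" | tag_terminator
```

This should print `gaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat`, matching the end of the tagged sequence.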

cat infA-E.coli-K12.txt | ... | sed "s/<\/rbs> /&\n/g" | sed "2s/atg/ <start_codon>&<\/start_codon> /1"

These commands start a new line after the RBS and then add start-codon tags to the first occurring "atg" sequence on the newly separated line.
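The line-splitting trick can be seen in isolation with the sketch below. It assumes GNU sed, since `\n` in the replacement is a GNU extension. The fragment is copied from the region around the RBS, and the names `fragment` and `tag_rbs_and_start` are illustrative. The fragment contains two "atg" sequences after the RBS, and only the first should be tagged.

```shell
# Region around the RBS and start codon, copied from the infA sequence.
fragment="tgataccccagaggattagatggccaaagaagacaatattgaaatgcaa"

# Hypothetical helper: tag the RBS, then tag only the first atg after it.
tag_rbs_and_start() {
  # 1) tag the RBS, 2) break the line after the RBS end tag,
  # 3) tag the first atg on line 2, 4) join the lines again.
  sed 's/gagg/ <rbs>&<\/rbs> /g' |
    sed 's/<\/rbs> /&\n/g' |
    sed '2s/atg/ <start_codon>&<\/start_codon> /1' |
    sed ':a;N;$!ba;s/\n//g'
}

echo "$fragment" | tag_rbs_and_start
```

The expected result is `tgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaa`; the later "atg" inside "gaaatgcaa" is left untagged because the substitution is limited to the first match on line 2.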

cat infA-E.coli-K12.txt | ... | sed "2s/ta[ag]/\n&/g" | sed "3,10s/tga/\n&/g"

These commands start a new line for each potential stop codon sequence occurring after the RBS.

cat infA-E.coli-K12.txt | ... | sed "17s/tga/ <stop_codon>&<\/stop_codon> /g"

This command adds stop-codon tags to the potential stop codon sequence occurring closest to (but not after) the terminator.

cat infA-E.coli-K12.txt | ... | sed ':a;N;$!ba;s/\n//g'

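This rather cryptic command removes every line break, combining the text back into one line. Here ":a" sets a label, "N" appends the next line to the pattern space, "$!ba" jumps back to the label unless the last line has been read, and "s/\n//g" then deletes all of the collected newlines.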
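The line-joining step can be checked on a three-line input, as in the sketch below. The function names are illustrative. The second function uses `tr -d '\n'`, which is a different tool from the one in the original pipe but gives the same joined text (it also drops the final newline).

```shell
# Hypothetical wrappers around two ways of removing line breaks.
join_lines_sed() {
  # Loop: append each following line to the pattern space, then strip newlines.
  sed ':a;N;$!ba;s/\n//g'
}

join_lines_tr() {
  # Delete every newline character, including the trailing one.
  tr -d '\n'
}

printf 'tttact\ncattat\ngagg\n' | join_lines_sed
printf 'tttact\ncattat\ngagg\n' | join_lines_tr
echo
```

Both calls should print `tttactcattatgagg`.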

KwWeek4screenshot1.jpg
  • The final pipe is shown entered into the command line, along with the output. The tagged sequence is:

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctca ccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacag aacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaattacggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon> gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatc cgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcc <stop_codon>tga</stop_codon> tgggc gaagagaaagaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat







Question 2

The mRNA sequence can be isolated from the file using some of the same commands as before.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g" | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g"

The pipe starts with question 1's first few commands, up to the point of adding the TSS tags. However, the TSS end tag has been omitted.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/gcctttt..../&<\/terminator> /g"

This command adds the terminator's end tag, but not its beginning tag.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "s/<tss>/&\n/g" | sed "s/<\/terminator>/\n&/g" 

These commands begin new lines after the TSS beginning tag and before the terminator's end tag.

sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | ... | sed "3D" | sed "1D"

These commands delete the top and bottom (first and third) lines, leaving only the middle.
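The whole Question 2 pipe can be tried on a shortened sequence, as sketched below. The input is an assumption made for illustration: it splices the infA promoter fragment directly onto the terminator, with the coding region cut out. The names `short_seq` and `extract_mrna` are also illustrative. Lowercase `d` is used where the pipe above writes `D`; the two behave the same when each line is processed on its own. GNU sed is assumed for `\n` in the replacement.

```shell
# Shortened stand-in for infA: promoter region + first bases of the transcript
# + terminator (coding region removed for brevity).
short_seq="gcaaatctttacttatttacagaacttcggcattatcttgccggttcaaattacggaaaaggtcggtttaaccggcctttttattttat"

# Hypothetical helper reproducing the Question 2 steps in order.
extract_mrna() {
  sed -E 's/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g' |
    sed 's/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g' |
    sed 's/[ct]at[at]at<\/minus10box>/ <minus10box>&/g' |
    sed -E 's/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g' |
    sed 's/gcctttt..../&<\/terminator> /g' |
    sed 's/<tss>/&\n/g' |
    sed 's/<\/terminator>/\n&/g' |
    sed '3d' | sed '1d'
}

echo "$short_seq" | extract_mrna
```

For this input the surviving middle line should be `cggttcaaattacggaaaaggtcggtttaaccggcctttttatt`, that is, everything from the TSS through the end of the terminator.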

KwWeek4screenshot2.jpg
  • The final pipe is shown entered into the command line, along with the output. The transcribed sequence is:

cggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacg tggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctg attgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttatt







Question 3

Again, the amino acid sequence can be found by using modified bits and pieces from question 1.

KwWeek4screenshot3.jpg
  • The pipe is shown entered into the command line, along with the output. The amino acid sequence this returns is:

M A K E D N I E M Q G T V L E T L P N T M F R V E L E N G H V V T A H I S G K M R K N Y I R I L T G D K V T V E L T P Y D L S K G R I V F R S R - L F Y R L M G E E K E R V K G R F N R P F Y
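Since the pipe for this question appears only in the screenshot, the sketch below is not the method used there. It is one possible way to reproduce the translation with awk, under these assumptions: reading starts at the first "atg", the standard genetic code is used, stop codons are printed as "-", and translation continues past stop codons to the end of the input, as the output above appears to do. The function name `translate_from_first_atg` is hypothetical.

```shell
# Hypothetical helper: translate a DNA/mRNA string from its first atg onward.
translate_from_first_atg() {
  echo "$1" | awk '{
    bases = "tcag"   # base order used to index the codon table
    # Standard genetic code in tcag order; "-" marks a stop codon.
    aa = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    s = tolower($0)
    gsub(/u/, "t", s)          # accept RNA input as well
    start = index(s, "atg")    # position of the first start codon, 0 if none
    out = ""
    if (start > 0) {
      for (i = start; i + 2 <= length(s); i += 3) {
        a = index(bases, substr(s, i, 1)) - 1
        b = index(bases, substr(s, i + 1, 1)) - 1
        c = index(bases, substr(s, i + 2, 1)) - 1
        out = out (out == "" ? "" : " ") substr(aa, a * 16 + b * 4 + c + 1, 1)
      }
    }
    print out
  }'
}

translate_from_first_atg "cccatggccaaagaagactga"
```

The example call should print `M A K E D -`; the first five residues match the start of the amino acid sequence above. Characters other than a, c, g, t and u are not handled in this sketch.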