Difference between revisions of "Lenaolufson Week 4"

From LMU BioDB 2015
(added link to template with the curly brackets)
(fixed the bullet points from double to single)
 

Latest revision as of 21:52, 29 September 2015

Transcription and Translation "Taken to the Next Level"

  • The first step was to open the Terminal app and log in to the server to access the files for this assignment.
ssh eolufson@my.cs.lmu.edu
  • Then I went into the folder for the class and created a folder for this assignment.
cd biodb
mkdir week4
  • Then I went into Dondi's files to get the assigned file for the assignment.
cd ~dondi/xmlpipedb/data
cp infA-E.coli-K12.txt ~eolufson/biodb/week4
  • Next I went into my own directory to do the assignment.
cd ~eolufson/biodb/week4

This computer exercise examines gene expression at a much more detailed level than before, requiring knowledge in both the biological aspects of the process and the translation of these steps into computer text-processing equivalents.

The following sequence represents a real gene, called infA and found in E. coli K12. As you might have guessed, it’s stored as infA-E.coli-K12.txt in ~dondi/xmlpipedb/data.

ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgc
tcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
tttacttatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

For each of the following questions pertaining to this gene, provide (a) the actual answer, and (b) the sequence of text-processing commands that calculates this answer. Specific information about how these sequences can be identified is included after the list of questions.

Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):

  • -35 box of the promoter
... <minus35box>...</minus35box> ...
  • By using the info that the consensus sequence for the -35 site is tt[gt]ac[at] as well as the hints and help from class, I was able to determine that in order to add a tag for the -35 box, the command is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1"
  • The 1 is used instead of "g" at the end so that only the first match is tagged rather than every match on the line. This is the output:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgc
gtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttag
cgcgcaaatc<minus35box>tttact</minus35box>tatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggatt
agatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacac
atctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgt
agtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
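  • As a small sketch of the difference between the 1 and g flags, the commands below run the same substitution on a short made-up sequence (not the infA gene) that contains two -35-like matches. The variable names are only for illustration.

```shell
# Toy sequence (made up for illustration) with two matches of tt[gt]ac[at].
toy="aattgacaggttgacatt"

# Flag 1: tag only the first match on the line.
first_only=$(printf '%s\n' "$toy" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/1")

# Flag g: tag every match on the line.
every_match=$(printf '%s\n' "$toy" | sed "s/tt[gt]ac[at]/<minus35box>&<\/minus35box>/g")

echo "$first_only"
echo "$every_match"
```

With the 1 flag only the first ttgaca is wrapped in the tag, while with g both are.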
  • -10 box of the promoter
... <minus10box>...</minus10box> ...
  • By using the info that the consensus sequence for the -10 site is [ct]at[at]at, that there are 17 nucleotides between the -35 and the -10 box sites, and the instructions given in class, I was able to figure out that the command for the -10 box tag is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1" | sed ':a;N;$!ba;s/\n//g'
  • The output of this command is:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcg
tttatctcaccgctcccttatacgttgc  gcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box>    
tatttacagaacttcgg  <minus10box>cattat</minus10box>  cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatg
caaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacggg
cgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggt
ttaaccggcctttttattttat
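  • The reason for the newline trick can be sketched on a short made-up sequence. Here a decoy tataat sits upstream of the -35 box (the real infA sequence also appears to contain a tataat before its -35 box), so searching the whole line would tag the wrong site. This sketch assumes GNU sed, which turns \n in the replacement into a newline; the variable names are only for illustration.

```shell
# Toy sequence (made up): a decoy tataat sits upstream of the -35 box tttact.
toy="cctataatggtttactaaaccattatgg"

# Without a line address, sed tags the leftmost match, which is the decoy.
naive=$(printf '%s\n' "$toy" | sed "s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1")

# Break the line after the -35 tag, search only line 2, then join the lines again.
# Assumes GNU sed (\n in the replacement becomes a newline).
tagged=$(printf '%s\n' "$toy" |
  sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" |
  sed "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1" |
  sed ':a;N;$!ba;s/\n//g')

echo "$naive"
echo "$tagged"
```

The first result tags the upstream tataat, while the second tags the cattat that follows the -35 box.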
  • transcription start site
... <tss>...</tss> ...
  • The transcription start site is located at the 12th nucleotide after the first nucleotide of the -10 box. With the help provided in class and from my homework partner, I realized that because the newline created after the -35 box was still there, I could work on the second line and search for the "> " that ends the -10 box tag. The tss is the 6th nucleotide after the end of that tag, so I put a newline after the first 5 nucleotides and tagged the first character of the new third line. To make it easier on myself, I used -r with sed so that I could write the repetition as (.){5}. The command is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | sed "3s/^./ <tss>&<\/tss> /g" | 
sed ':a;N;$!ba;s/\n//g'
  • The output performed by this command is:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcg
tttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box>
tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss>    
ggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgc
ctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctg
acgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgg
gcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
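  • The counting step can be tried on its own with a short fragment. The fragment below is a shortened piece of the tagged line, and the line numbers differ from the full command because the fragment starts as line 1. GNU sed is assumed for -r and for \n in the replacement.

```shell
# Shortened fragment: the -10 box tag followed by a few downstream nucleotides.
fragment=" <minus10box>cattat</minus10box> cttgccggttcaaat"

# 1. Put a newline after "> " plus 5 more characters, so the 6th nucleotide starts line 2.
# 2. Wrap the first character of line 2 in the tss tag.
# 3. Join the lines back together.
tagged_tss=$(printf '%s\n' "$fragment" |
  sed -r "s/> (.){5}/&\n/" |
  sed "2s/^./ <tss>&<\/tss> /" |
  sed ':a;N;$!ba;s/\n//g')

echo "$tagged_tss"
```

The tag lands on the c that follows cttgc, which matches the tss shown in the output above.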
  • Ribosome binding site
... <rbs>...</rbs> ...
  • I used the info that the consensus sequence for the ribosome binding site is gagg, as well as help from my homework partner, to figure out the correct command. The transcription start site is at the beginning of the third line, so I searched that same line for the ribosome binding site, which saved me time when typing in the command:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1" | sed ':a;N;$!ba;s/\n//g'
  • The output was as follows:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgt
caggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcg
caaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> 
ggttcaaattacggtagtgatacccca <rbs>gagg</rbs>        attagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaatac
catgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaact
gaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
  • start codon
... <start_codon>...</start_codon> ...
  • This input was very similar to the one for the rbs: I created a newline after the rbs and searched for the start codon on the 4th line. The command is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1;3s/<\/rbs> /&\n/g" | 
sed "4s/atg/ <start_codon>&<\/start_codon> /1" | sed ':a;N;$!ba;s/\n//g'
  • The output was:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggtaa
cgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
 <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaa
attacggtagtgatacccca <rbs>gagg</rbs>  attag<start_codon>atg</start_codon>gccaaagaagacaatattgaaatgcaaggtaccgttcttga
aacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatg
cgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgt
cttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
  • Stop codon
... <stop_codon>...</stop_codon> ...
  • The stop codon was challenging for me to figure out, as it was pretty advanced and required more knowledge of the command line. I honestly had to look at the work of my peers to help me figure out the correct command, but after searching and asking my homework partner some questions, I was able to come up with the following command:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1;3s/<\/rbs> /&\n/g" | 
sed "4s/atg/ <start_codon>&<\/start_codon> /1;4s/<\/start_codon> /&\n/g" | 
sed -r "5s/.../& /g;5s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1;5s/ //g;5s/<stop_codon>/ &/g;5s/<\/stop_codon>/& /g" | 
sed ':a;N;$!ba;s/\n//g'
  • The output of this command was:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttc
gcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
<minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttcaaattac
ggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon>
gccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtg
gttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgta
gtcgc   <stop_codon>tga</stop_codon> ttgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
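  • The part of this command that splits the line into codons can be sketched on a short made-up reading frame. The idea is that once every triplet is followed by a space, a stop codon pattern can only match a whole in-frame codon. The sequence and variable names are only for illustration, and GNU sed is assumed for -r.

```shell
# Toy reading frame (made up): codons gct gac aaa taa ggc.
# An out-of-frame tga spans the first two codons (gcTGAc).
frame="gctgacaaataaggc"

# Searching the raw string finds the out-of-frame tga first.
naive=$(printf '%s\n' "$frame" | sed -r "s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1")

# Space out the codons, tag the first in-frame stop, remove all spaces,
# then put single spaces back around the tag.
in_frame=$(printf '%s\n' "$frame" |
  sed -r "s/.../& /g;s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1;s/ //g;s/<stop_codon>/ &/g;s/<\/stop_codon>/& /g")

echo "$naive"
echo "$in_frame"
```

The first result tags the tga that straddles two codons, while the second tags the in-frame taa.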
  • terminator
... <terminator>...</terminator> ...
  • From my knowledge and Dondi's demonstration in class, I know that a hairpin loops around and binds to itself. aaaaggt is the sequence where the t binds with a g, and gcctttt will also exist in the terminator, which makes it simple enough to construct a command to tag the terminator:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1" | sed "3s/<\/rbs> /&\n/g" | 
sed "4s/atg/ <start_codon>&<\/start_codon> /1;4s/<\/start_codon> /&\n/g" | 
sed -r "5s/.../& /g;5s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1;5s/ //g;5s/<stop_codon>/ &/g;5s/<\/stop_codon>/& /g;
5s/<\/stop_codon> /&\n/g" | sed "6s/aaaaggt/ <terminator>&\n/g" | sed "7s/gcctttt..../&<\/terminator> /g" | 
sed ':a;N;$!ba;s/\n//g'
  • The output of the command is:
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttcgcgtcaggta
acgcccatcgtttatctcaccgctcccttatacgttgc  gcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
<minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgc <tss>c</tss> ggttca
aattacggtagtgatacccca <rbs>gagg</rbs> attag <start_codon>atg</start_codon>g
ccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatg
cgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgc <stop_codon>tga</stop_codon> 
ttgttttaccgcctgatgggcgaagagaaagaacgagt <terminator>aaaaggtcggtttaaccggcctttttatt</terminator> ttat
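  • The two terminator steps can be checked on just the tail end of the gene. This sketch uses the last part of the sequence as a fragment, so the line numbers are 1 and 2 instead of 6 and 7, and it assumes GNU sed for \n in the replacement.

```shell
# Tail end of the infA sequence, used here as a standalone fragment.
tail_end="gagtaaaaggtcggtttaaccggcctttttattttat"

# 1. Open the tag in front of aaaaggt and break the line right after it.
# 2. On the new second line, close the tag 4 characters after gcctttt.
# 3. Join the lines back together.
tagged_term=$(printf '%s\n' "$tail_end" |
  sed "s/aaaaggt/ <terminator>&\n/" |
  sed "2s/gcctttt..../&<\/terminator> /" |
  sed ':a;N;$!ba;s/\n//g')

echo "$tagged_term"
```

The tagged region matches the terminator shown at the end of the output above.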

What is the exact mRNA sequence that is transcribed from this gene?

  • I used sed many times in this command because it lets me delete lines, so I can manipulate the data into the form I want. I put each tag name and each stretch of sequence on its own line, then deleted the tag lines and everything outside the transcribed region, joined what was left, and replaced every t with u. The command is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1;3s/<\/rbs> /&\n/g" | 
sed "4s/atg/ <start_codon>&<\/start_codon> /1;4s/<\/start_codon> /&\n/g" | 
sed -r "5s/.../& /g;5s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1;5s/ //g;5s/<stop_codon>/ &/g;
5s/<\/stop_codon>/& /g;5s/<\/stop_codon> /&\n/g" | sed "6s/aaaaggt/ <terminator>&\n/g" | 
sed "7s/gcctttt..../&<\/terminator> /g" | sed ':a;N;$!ba;s/\n//g' | sed "s/ //g" | 
sed -r "s/<|>/\n/g" | sed "1,10D;12D;14D;16D;18D;20D;22D;24D;26D;28,29D" | sed ':a;N;$!ba;s/\n//g' | sed "s/t/u/g"
  • The sequence is:
cgguucaaauuacgguagugauaccccagaggauuagauggccaaagaagacaauauugaaaugcaagguaccguucuug
aaacguugccuaauaccauguuccgcguagaguuagaaaacggucacgugguuacugcacacaucuccgguaaaaugcgca
aaaacuacauccgcauccugacgggcgacaaagugacuguugaacugaccccguacgaccugagcaaaggccgcauugu
cuuccguagucgcugauuguuuuaccgccugaugggcgaagagaaagaacgaguaaaaggucgguuuaaccggccuuuuuauu
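  • The split-and-delete idea can be sketched on a much shorter tagged string. The string below is made up and has only two tags, so the line numbers to delete are different from the ones in the full command. GNU sed is assumed for -r and for \n in the replacement.

```shell
# Toy tagged string (made up) with a tss tag and a terminator tag.
toy_tagged="aaa <tss>c</tss> ggtt <terminator>aaaa</terminator> cc"

# After removing spaces and splitting at < and >, the lines are:
#   1 aaa  2 tss  3 c  4 /tss  5 ggtt  6 terminator  7 aaaa  8 /terminator  9 cc
# Keep lines 3, 5 and 7 (tss through terminator), join them, and change t to u.
toy_mrna=$(printf '%s\n' "$toy_tagged" |
  sed "s/ //g" |
  sed -r "s/<|>/\n/g" |
  sed "1,2d;4d;6d;8,9d" |
  sed ':a;N;$!ba;s/\n//g' |
  sed "s/t/u/g")

echo "$toy_mrna"
```

Only the sequence from the tss through the terminator survives, written with u in place of t.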

What is the amino acid sequence that is translated from this mRNA?

  • Using the same technique as before, I figured out I needed to separate the lines into codons, similar to the week 3 assignment. The command is:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1;s/<\/minus35box>/&\n/g" | 
sed -r "2s/[ct]at[at]at/ <minus10box>&<\/minus10box> /1;2s/> (.){5}/&\n/g" | 
sed "3s/^./ <tss>&<\/tss> /g;3s/gagg/ <rbs>&<\/rbs> /1;3s/<\/rbs> /&\n/g" | 
sed "4s/atg/ <start_codon>&<\/start_codon> /1;4s/<\/start_codon> /&\n/g" | 
sed -r "5s/.../& /g;5s/tag|tga|taa/ <stop_codon>&<\/stop_codon> /1;5s/ //g;5s/<stop_codon>/ &/g;
5s/<\/stop_codon>/& /g;5s/<\/stop_codon> /&\n/g" | sed "6s/aaaaggt/ <terminator>&\n/g" | 
sed "7s/gcctttt..../&<\/terminator> /g" | sed ':a;N;$!ba;s/\n//g' | sed -r "s/ //g;s/<|>/\n/g" | 
sed "1,18D;20D;22,29D" | sed ':a;N;$!ba;s/\n//g' | sed "s/.../& /g;s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g"
  • The amino acid sequence is:
MAKEDNIEMQGTVLETLPNTMFRVELENGHVVTAHISGKMRKNYIRILTGDKVTVELTPYDLSKGRIVFRSR
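  • The last translation step can be sketched with a tiny stand-in for genetic-code.sed. The four rules below are only a hypothetical excerpt covering the first four codons of this gene; the real genetic-code.sed from class is not reproduced here and presumably has a rule for every codon.

```shell
# First four codons of the infA mRNA: aug gcc aaa gaa.
toy_mrna="auggccaaagaa"

# Space out the codons, translate each one with a few stand-in rules,
# then remove the spaces. The -e rules are a hypothetical excerpt of a codon table.
toy_protein=$(printf '%s\n' "$toy_mrna" |
  sed "s/.../& /g" |
  sed -e "s/aug/M/g" -e "s/gcc/A/g" -e "s/aaa/K/g" -e "s/gaa/E/g" |
  sed "s/ //g")

echo "$toy_protein"
```

The result is the same four letters that begin the amino acid sequence above.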

Lenaolufson (talk) 22:33, 28 September 2015 (PDT)