Rlegaspi Week 4

From LMU BioDB 2015

Revision as of 08:20, 29 September 2015

Transcription and Translation “Taken to the Next Level”

This computer exercise examines gene expression at a much more detailed level than before. It requires knowledge of the biological aspects of the process and of how to translate these steps into computer text-processing equivalents.

To begin the assignment, I needed to log into my account through the Terminal application on my MacBook:

ssh rlegaspi@my.cs.lmu.edu

After entering my password, I changed into Dondi's folder, which contains the infA-E.coli-K12.txt file:

cd ~dondi/xmlpipedb/data 

To view the sequence we are working with in this week's assignment, I entered the following command:

cat infA-E.coli-K12.txt
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgc
tcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
tttacttatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat

Modifying the gene sequence string: Highlighting ("Tagging") the special sequences within the gene

Completing this assignment required reviewing the Introduction to the Command Line page and reading the More Text Processing Features page. In class, my homework partner Anu Varshneya and I got clues and hints on how to complete the assignment from Professor Dondi.

-35 box and -10 box

First, we learned how to locate the possible -35 box and -10 box sequences and how to "tag" them. Knowing that the -35 box pattern is tt[gt]ac[at] and the -10 box pattern is [ct]at[at]at, we searched for both patterns at once. We could have used grep to search for each pattern individually, but to see all potential -35 box and -10 box locations together (marking -35 candidates with single asterisks and -10 candidates with double asterisks), we used the following command:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ *&* /g" | sed "s/[ct]at[at]at/ **&** /g"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt **tataat** tgcggtcgcagagttggttacgct
cattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
*tttact* ta *tttaca* gaacttcgg **cattat** cttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
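As the paragraph notes, grep can search for each box on its own. A minimal bash sketch, assuming GNU grep (its -o option prints only the matched text and -b prepends the match's 0-based byte offset); the helper name find_sites is hypothetical. Line breaks are removed first so the offsets are positions in the sequence rather than byte positions in the file.

```shell
#!/usr/bin/env bash
# find_sites (hypothetical helper): print "offset:match" for every
# non-overlapping match of an extended regex in a DNA sequence read on stdin.
# Assumes GNU grep: -o prints only the matching text, -b its 0-based byte offset.
find_sites() {
  local pattern=$1
  # Remove line breaks so offsets count nucleotides, not file bytes.
  tr -d '\n' | grep -o -b -E "$pattern"
}

# Usage on the assignment file (run from ~dondi/xmlpipedb/data):
#   find_sites 'tt[gt]ac[at]' < infA-E.coli-K12.txt   # -35 box candidates
#   find_sites '[ct]at[at]at' < infA-E.coli-K12.txt   # -10 box candidates
```

Listing offsets makes it easy to see which candidate comes first, which is the information the next step relies on.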

After finding the potential -35 box and -10 box locations, it was clear where the actual -35 box and actual -10 box were: the first match for the -35 box pattern and the second match for the -10 box pattern. To highlight only the actual boxes, we replace the "g" flag in sed "s///g" with the number of the desired occurrence ("1" or "2"). This tells sed to replace only that occurrence, as described on the More Text Processing Features page. The resulting command produces the following result:

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1" | 
sed "s/[ct]at[at]at/ <minus10box>&<\/minus10box> /2" 
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc
gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag
ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc 
<minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> 
cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttctt
gaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaa
aaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttc
cgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
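One caveat: sed counts occurrences separately on each line of its input, so /1 and /2 select the first and second match on a line, not in the whole file. If the sequence file stores the sequence across several lines, the counts could land on the wrong matches. A hedged bash sketch that joins the lines with tr before tagging, so the counts apply to the whole sequence; tag_boxes is a hypothetical name.

```shell
#!/usr/bin/env bash
# tag_boxes (hypothetical helper): tag the 1st -35 box candidate and the
# 2nd -10 box candidate in a DNA sequence read on stdin.
tag_boxes() {
  # sed's numeric flag counts matches per input line, so join lines first.
  tr -d '\n' |
    sed 's/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1' |
    sed 's/[ct]at[at]at/ <minus10box>&<\/minus10box> /2'
}

# Usage:
#   tag_boxes < infA-E.coli-K12.txt
```

The output is a single long line, so any later step that addresses a line number (such as "2s/...") should count from this joined line.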

Transcription Start Site (TSS)

After finding and tagging the -35 and -10 boxes, the assignment required us to tag the transcription start site (TSS), which is the twelfth nucleotide after the first nucleotide of the -10 box. Because the -10 box is 6 nucleotides long, the TSS is the seventh nucleotide after the end of the -10 box tag. To tag it with a command sequence, I used a technique described in More Text Processing Features: insert a newline after the six nucleotides that follow the -10 box, so that the TSS becomes the first nucleotide of line 2, and then tag the first character of line 2.

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1" | 
sed "s/[ct]at[at]at/ <minus10box>&<\/minus10box> /2" | sed -r "s/<\/minus10box> (.){6}/&\n/g" | 
sed "2s/^./ <tss>&<\/tss> /g"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttc
gcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcga
tttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgcc
<tss>g</tss> gttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaa
acgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgc
atcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcc
tgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
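The position arithmetic above can be cross-checked without inserting tags. A minimal bash sketch, assuming GNU grep (for -o and -b) and that, as in this gene, the second -10 candidate is the actual -10 box; tss_offset is a hypothetical helper. It prints the 0-based offset of the nucleotide 12 positions past the first base of the -10 box, followed by that nucleotide.

```shell
#!/usr/bin/env bash
# tss_offset (hypothetical helper): read a DNA sequence on stdin and print
# "offset:base" for the nucleotide 12 positions after the first base of the
# 2nd -10 box candidate. Assumes GNU grep (-o, -b) and bash substrings.
tss_offset() {
  local sequence box_start tss
  sequence=$(tr -d '\n')
  # 0-based offset of the 2nd [ct]at[at]at match (the actual -10 box here).
  box_start=$(printf '%s' "$sequence" | grep -o -b -E '[ct]at[at]at' |
    sed -n '2p' | cut -d: -f1)
  [ -n "$box_start" ] || return 1   # fewer than two -10 candidates
  # The box is 6 nt long; skipping the box plus 6 more nt gives start + 12.
  tss=$((box_start + 12))
  printf '%s:%s\n' "$tss" "${sequence:tss:1}"
}

# Usage:
#   tss_offset < infA-E.coli-K12.txt
```

If the printed base differs from the one inside the tss tag, the occurrence numbers or the six-character skip are worth rechecking.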

Ribosome Binding Site (RBS)

Now that I've found the TSS, it's time to find and tag the ribosome binding site (RBS). I used the same technique as in the previous portion of the assignment (finding the TSS): a newline is inserted after the TSS tag so that the sequence downstream of the TSS starts line 3, and then the first gagg on line 3 is tagged. The special sequence for the RBS is gagg, as indicated in the hints section of the homework assignment.

cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1" | 
sed "s/[ct]at[at]at/ <minus10box>&<\/minus10box> /2" | sed -r "s/<\/minus10box> (.){6}/&\n/g" | 
sed "2s/^./ <tss>&<\/tss> /g" | sed "s/<\/tss> /&\n/g" | sed "3s/gagg/ <rbs>&<\/rbs> /1"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgataaggaatttttc
gcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcga
tttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box> cttgcc
 <tss>g</tss> 
gttcaaattacggtagtgatacccca <rbs>gagg</rbs> attagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaa
acgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgc
atcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcc
tgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
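Splitting the line after the TSS tag restricts the gagg search to the sequence downstream of the TSS. The same restriction can be sketched with bash parameter expansion instead of sed: drop everything up to and including the TSS, then measure the text that precedes the first gagg. rbs_offset is a hypothetical helper that takes the 0-based TSS offset as its argument; this is an illustrative alternative, not the assignment's required method.

```shell
#!/usr/bin/env bash
# rbs_offset (hypothetical helper): read a DNA sequence on stdin and print the
# 0-based offset of the first "gagg" strictly downstream of TSS offset $1.
rbs_offset() {
  local tss=$1 sequence downstream before
  sequence=$(tr -d '\n')
  downstream=${sequence:tss+1}                 # everything after the TSS base
  before=${downstream%%gagg*}                  # text preceding the first gagg
  [ "$before" != "$downstream" ] || return 1   # no gagg downstream
  echo $((tss + 1 + ${#before}))
}

# Usage (TSS offset taken from your own position check):
#   rbs_offset TSS_OFFSET < infA-E.coli-K12.txt
```

Because the search starts after the TSS, a gagg upstream of the TSS is ignored, matching the effect of tagging only on line 3.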

Start Codon

Stop Codon

Terminator

Exact mRNA sequence transcribed from this gene

Amino Acid sequence translated from this mRNA

Electronic Lab Notebook

  • Discussion in class with Anu: finding the minus35box and minus10box and inserting the tags around the sequences (in-class work time). To be written.

Links to User Page and Journal Pages

Ron Legaspi
BIOL 367, Fall 2015

Assignment Links
Individual Weekly Journals
Shared Weekly Journals

Homework Partner: Anu Varshneya