Rlegaspi Week 4
Revision as of 22:46, 28 September 2015
Transcription and Translation “Taken to the Next Level”
This computer exercise examines gene expression at a much more detailed level than before, requiring knowledge of both the biological aspects of the process and how to translate those steps into their computer text-processing equivalents.
To begin the assignment, I needed to log into my account through the Terminal application on my MacBook:
ssh rlegaspi@my.cs.lmu.edu
After typing in my password, I changed to Dondi's folder, which contains the infA-E.coli-K12.txt file:
cd ~dondi/xmlpipedb/data
To view the sequence we are practicing with for this week's assignment, I entered the following command:
cat infA-E.coli-K12.txt
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgc
tcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
tttacttatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
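The terminal wraps long lines for display. Because of that, the cat output alone does not show whether the file stores the sequence on one line or on several. This matters later, because sed's numeric occurrence flags count matches separately on each line. Below is a minimal sketch for checking this. It assumes a POSIX shell with awk, tr, and wc, and seq_summary is a hypothetical helper name, not part of the assignment:

```shell
# Hypothetical helper (not from the assignment): summarize a sequence file.
# Prints how many lines the file has and how many characters remain once
# line breaks are removed (for a plain sequence file, the nucleotide count).
seq_summary() {
  # awk counts the last line even if it has no trailing newline.
  lines=$(awk 'END { print NR }' "$1")
  # $(( )) strips the leading spaces that some versions of wc print.
  bases=$(( $(tr -d '\n' < "$1" | wc -c) ))
  echo "lines: $lines, nucleotides: $bases"
}

# Usage (from ~dondi/xmlpipedb/data):
#   seq_summary infA-E.coli-K12.txt
```

A result of "lines: 1" means occurrence counts in sed apply to the whole sequence. A larger number means they restart on every line.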
Modifying the gene sequence string: Highlighting ("Tagging") the special sequences within the gene
Completing this assignment required a review of the Introduction to the Command Line page and a reading of the More Text Processing Features page. In class, my homework partner, Anu Varshneya, and I got clues and hints on how to complete the assignment from Professor Dondi.
-35 box and -10 box
First, we learned how to locate the possible -35 box and -10 box sites in the sequence and how to "tag" these special sequences. The -35 box matches the pattern tt[gt]ac[at] and the -10 box matches the pattern [ct]at[at]at. Each bracketed set matches any one of the nucleotides listed inside it, so [gt] matches either g or t. Knowing these patterns, we could search for both special sequences at the same time. We could have used grep to search for each sequence individually. To see all of the potential -35 box and -10 box locations at once, we instead used the following command, which surrounds each -35 box candidate with single asterisks and each -10 box candidate with double asterisks:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ *&* /g" | sed "s/[ct]at[at]at/ **&** /g"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt **tataat** tgcggtcgcagagttggttacgct
cattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
*tttact* ta *tttaca* gaacttcgg **cattat** cttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
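The text mentions that grep could have searched for each box pattern on its own. grep -o prints each matching piece of text, and a small awk loop can also report where each match starts. This is a sketch that assumes the sequence is fed in as a single line, for example by joining the file's lines with tr -d '\n'. list_matches is a hypothetical helper. Like sed's g flag, it reports non-overlapping matches from left to right.

```shell
# Hypothetical helper: print "position match" for every non-overlapping
# match of a regular expression, with positions counted from 1 within the
# line. The pattern must not be able to match the empty string, or the
# loop would never advance.
list_matches() {
  awk -v re="$1" '{
    s = $0          # part of the line not yet searched
    consumed = 0    # number of characters of the line already searched
    while (match(s, re)) {
      print consumed + RSTART, substr(s, RSTART, RLENGTH)
      consumed += RSTART + RLENGTH - 1
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}

# Usage: list candidate -35 boxes, then candidate -10 boxes.
#   tr -d '\n' < infA-E.coli-K12.txt | list_matches 'tt[gt]ac[at]'
#   tr -d '\n' < infA-E.coli-K12.txt | list_matches '[ct]at[at]at'
# A quick grep alternative (matched text only, no positions):
#   grep -o 'tt[gt]ac[at]' infA-E.coli-K12.txt
```

Listing positions makes it easy to see which candidate comes first and how far apart the -35 and -10 candidates are.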
After finding the potential -35 box and -10 box locations, it was clear where the actual -35 box and the actual -10 box were: the first match of the -35 box pattern and the second match of the -10 box pattern. To highlight only the actual boxes, all we needed to do was replace the "g" flag in sed "s///g" with the number of the desired occurrence: "1" for the -35 box and "2" for the -10 box. As described on the More Text Processing Features page, a numeric flag tells sed to replace only that occurrence instead of every occurrence. We also swapped the asterisks for descriptive tags, <minus35box> and <minus10box>. The slash in each closing tag is written as \/ so that sed does not mistake it for the end of the replacement text. The resulting command produced the following result:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1" | sed "s/[ct]at[at]at/ <minus10box>&<\/minus10box> /2"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattacccc
gctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttag
ccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
<minus35box>tttact</minus35box> tatttacagaacttcgg <minus10box>cattat</minus10box>
cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttctt
gaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaa
aaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttc
cgtagtcgctgattgttttaccgcctgatgggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
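sed's numeric flag counts matches within each line of input, not across the whole file. The /1 and /2 flags therefore pick the intended boxes only when the whole sequence sits on one line. If a copy of the sequence were wrapped over several lines, joining the lines first (for example with tr -d '\n') would keep the count global. The sketch below shows the difference on two small test strings. It uses tag_nth, a hypothetical helper that generalizes the command above, and omits the extra spaces around the tags for brevity:

```shell
# Hypothetical helper: wrap the Nth match of a pattern on each line in
# <tag>...</tag>. Arguments: pattern, tag name, occurrence number.
# The pattern and tag name must not contain "/", the s command's delimiter.
tag_nth() {
  sed "s/$1/<$2>&<\/$2>/$3"
}

# Both -10 box candidates on one line: the second one is tagged.
#   printf 'tataatcattat\n' | tag_nth '[ct]at[at]at' minus10box 2
#   -> tataat<minus10box>cattat</minus10box>
# Same candidates on separate lines: no line has a second match,
# so nothing is tagged.
#   printf 'tataat\ncattat\n' | tag_nth '[ct]at[at]at' minus10box 2
# Joining the lines first restores the intended behavior:
#   printf 'tataat\ncattat\n' | tr -d '\n' | tag_nth '[ct]at[at]at' minus10box 2
```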
Transcription Start Site (TSS)
After finding and tagging the -35 and -10 boxes, the assignment required us to highlight the transcription start site (TSS), which is the twelfth nucleotide after the first nucleotide in the -10 box. Because the -10 box sequence is 6 nucleotides long, I inferred that the TSS is the character 6 nucleotides after the end of the -10 box, so it can be located by counting from the closing </minus10box> tag.
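The counting described above can also be done numerically instead of by reading the tagged output. This sketch follows the inference in the paragraph above: it treats the TSS as the nucleotide 6 positions past the last nucleotide of the chosen -10 box, which is 11 positions past the box's first nucleotide. The offset is a parameter because that choice is an assumption about how the "twelfth nucleotide" is counted. tss_position is a hypothetical helper, and it expects the sequence on a single line:

```shell
# Hypothetical helper: print the 1-based position and the nucleotide of the
# TSS, measured from the Nth match of the -10 box pattern [ct]at[at]at.
# Arguments: occurrence number (2 for the actual box in this file) and an
# optional offset from the box's first nucleotide. The default of 11 means
# 6 nucleotides past the end of the 6-nucleotide box; this is an assumption.
tss_position() {
  awk -v n="$1" -v offset="${2:-11}" '{
    s = $0; consumed = 0; count = 0
    while (match(s, /[ct]at[at]at/)) {
      count++
      if (count == n + 0) {
        pos = consumed + RSTART + offset   # position within the whole line
        print pos, substr($0, pos, 1)
        exit
      }
      consumed += RSTART + RLENGTH - 1
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}

# Usage:
#   tr -d '\n' < infA-E.coli-K12.txt | tss_position 2
```

Passing 12 as the second argument shifts the result by one nucleotide, which is a quick way to compare the two possible readings of "twelfth nucleotide after".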
Exact mRNA sequence transcribed from this gene
Amino Acid sequence translated from this mRNA
Electronic Lab Notebook
- Discussion in class with Anu
- Finding the minus35box and the minus10box and inserting the descriptive tags around those sequences (in-class work time)
- To be written
Links to User Page and Journal Pages
Ron Legaspi
BIOL 367, Fall 2015
Assignment Links
- Week 1 Assignment
- Week 2 Assignment
- Week 3 Assignment
- Week 4 Assignment
- Week 5 Assignment
- Week 6 Assignment
- Week 7 Assignment
- Week 8 Assignment
- Week 9 Assignment
- Week 10 Assignment
- Week 11 Assignment
- Week 12 Assignment
- Week 14 Assignment
- Week 15 Assignment
Individual Weekly Journals
- Individual Journal Week 1 - This is my User Page
- Individual Journal Week 2
- Individual Journal Week 3
- Individual Journal Week 4
- Individual Journal Week 5
- Individual Journal Week 6
- Individual Journal Week 7
- Individual Journal Week 8
- Individual Journal Week 9
- Individual Journal Week 10
- Individual Journal Week 11
- Individual Journal Week 12
- Individual Journal Week 14
- Individual Journal Week 15
- Shared Journal Week 1
- Shared Journal Week 2
- Shared Journal Week 3
- Shared Journal Week 4
- Shared Journal Week 5
- Shared Journal Week 6
- Shared Journal Week 7
- Shared Journal Week 8
- Shared Journal Week 9
- Heavy Metal HaterZ Team Page - Week 10-15 Shared Journal
Homework Partner: Anu Varshneya