Rlegaspi Week 4
Revision as of 21:55, 28 September 2015
Individual Journal Assignment
Homework Partner
Anindita Varshneya
Transcription and Translation “Taken to the Next Level”
This computer exercise examines gene expression at a much more detailed level than before, requiring knowledge in both the biological aspects of the process and the translation of these steps into computer text-processing equivalents.
To begin the assignment, I needed to log into my account through the Terminal application on my MacBook:
ssh rlegaspi@my.cs.lmu.edu
After entering my password, I changed into Dondi's data directory, which contains the infA-E.coli-K12.txt file:
cd ~dondi/xmlpipedb/data
To view the sequence we are practicing with for this week's assignment, I entered the following command:
cat infA-E.coli-K12.txt
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgc
tcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
tttacttatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
Modifying the gene sequence string: Highlighting ("Tagging") the special sequences within the gene
Completing this assignment required a review of the Introduction to the Command Line page and a reading of the More Text Processing Features page. In class, my homework partner Anu Varshneya and I got clues and hints on how to complete the assignment from Professor Dondi. First, we learned how to find possible -35 box and -10 box locations in the sequence and how to "tag" these special sequences. The consensus pattern for a -35 box is tt[gt]ac[at] and the consensus pattern for a -10 box is [ct]at[at]at, so we could search for both at the same time. We could have used grep to search for each sequence individually, but to see all of the potential -35 and -10 boxes at once, we piped the file through two sed substitutions. In a sed replacement, & stands for the matched text, so single asterisks mark the -35 box candidates and double asterisks mark the -10 box candidates:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ *&* /g" | sed "s/[ct]at[at]at/ **&** /g"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt **tataat** tgcggtcgcagagttggttacgc
tcattaccccgctgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgtt
gcgcttttggtgcggcttagccgtgtgttttcggagtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc
*tttact* ta *tttaca* gaacttcgg **cattat** cttgccggttcaaattacggtagtgataccccagaggattagatggcc
aaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaa
cggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatg
ggcgaagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
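The same idea can emit the XML-style tags used in the sample answer under Supplementary Information instead of asterisks. The sketch below is one possible way to do that, not the command we used; the function name tag_boxes is made up for illustration, and the sample input is line 4 of infA-E.coli-K12.txt as printed above. Like the asterisk version, it tags every candidate, not only the correctly spaced pair.

```shell
# Hypothetical helper: wrap every -35 and -10 box candidate in XML-style tags.
# "&" in a sed replacement is the whole matched text; "|" is used as the
# delimiter so the "/" in the closing tags does not need escaping.
tag_boxes() {
  sed -e 's|tt[gt]ac[at]|<minus35box>&</minus35box>|g' \
      -e 's|[ct]at[at]at|<minus10box>&</minus10box>|g'
}

# Line 4 of infA-E.coli-K12.txt
printf 'tttacttatttacagaacttcggcattatcttgccggttcaaattacggtagtgataccccagaggattagatggcc\n' | tag_boxes
```

The two -e expressions run in order on each line, which behaves the same as piping through two separate sed commands.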
Exact mRNA sequence transcribed from this gene
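This part is still to be written. As a starting point, and assuming the file shows the coding (non-template) strand, the mRNA has the same sequence as that strand with u in place of t. A minimal sketch using a hypothetical helper, dna_to_mrna, on a short piece of the file (the end of line 4 and the start of line 5); it does not yet cut the sequence down to the region between the transcription start site and the terminator:

```shell
# Hypothetical helper: join a multi-line DNA sequence into one line and
# rewrite it in the RNA alphabet (t -> u).
dna_to_mrna() {
  # delete the line breaks, then replace every t with u
  tr -d '\n' | tr 't' 'u'
  # tr -d also removed the final newline, so print one
  echo
}

printf 'ggattagatggcc\naaagaagacaat\n' | dna_to_mrna
# prints: ggauuagauggccaaagaagacaau
```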
Amino Acid sequence translated from this mRNA
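This part is also still to be written. One possible approach, sketched here with awk rather than the sed-based method from Week 3, is to build the standard codon table and read the mRNA three bases at a time. The function name translate is hypothetical, and the sketch assumes the input already starts at the start codon and uses u rather than t; it stops at the first stop codon.

```shell
# Hypothetical helper: translate an mRNA string (u, not t) into one-letter
# amino acids using the standard genetic code, stopping at the first stop codon.
translate() {
  awk '
  BEGIN {
    # Codons enumerated with the first, second, and third base each in u,c,a,g
    # order; "*" marks a stop codon.
    bases = "ucag"
    aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    n = 0
    for (i = 1; i <= 4; i++)
      for (j = 1; j <= 4; j++)
        for (k = 1; k <= 4; k++) {
          n++
          codon = substr(bases, i, 1) substr(bases, j, 1) substr(bases, k, 1)
          table[codon] = substr(aa, n, 1)
        }
  }
  # join all input lines into one sequence
  { seq = seq $0 }
  END {
    protein = ""
    for (p = 1; p + 2 <= length(seq); p += 3) {
      c = substr(seq, p, 3)
      if (table[c] == "*") break
      protein = protein table[c]
    }
    print protein
  }'
}

# Start of the infA coding region, rewritten with u in place of t
printf 'auggccaaagaagacaauauugaaaug\n' | translate
# prints: MAKEDNIEM
```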
Supplementary Information
As a sample answer for the first question, Week 2’s paper handout sequence would have been marked as follows (line breaks are included only for clarity):
agtgta <minus35box>ttgaca</minus35box> tgatagaagcactctac <minus10box>tatatt</minus10box> tcaat <tss>a</tss> ttcctag <rbs>gagg</rbs> tttgacct <start_codon>atg</start_codon> attgaacttgaa...aataccatggta <stop_codon>taa</stop_codon> ccca <terminator>gccgccagttccgctggcggcatttt</terminator> aac
Note: The commands needed to generate the output above will be similar, but not exactly the same as the ones needed for infA.
Base your commands on the following hints/guidelines about the gene, plus your own knowledge learned from the past few weeks:
- The consensus sequence for the -10 site is [ct]at[at]at.
- The consensus sequence for the -35 site is tt[gt]ac[at].
- The ideal number of base pairs between the -35 and -10 box is 17, counting from the first nucleotide after the end of the -35 sequence up to the last nucleotide before the -10 sequence.
- The transcription start site is located at the 12th nucleotide after the first nucleotide of the -10 box.
- A consensus sequence for the ribosome binding site is gagg.
- The first half of the terminator “hairpin” is aaaaggt, where the u in the mRNA binds with a g instead of the usual a.
- The terminator includes 4 more nucleotides after the hairpin completes.
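The 17-nucleotide spacing hint can be written directly into an extended regular expression, so that only a -35 box followed by exactly 17 nucleotides and then a -10 box gets tagged. This is a sketch under the assumption that both boxes fall on the same line (sed works line by line); tag_spaced_boxes is a hypothetical name, and the sample is the start of line 4 of infA-E.coli-K12.txt.

```shell
# Hypothetical helper: tag a -35/-10 pair only when exactly 17 nucleotides
# separate them. \1 = -35 box, \2 = 17-nt spacer, \3 = -10 box.
tag_spaced_boxes() {
  sed -E 's|(tt[gt]ac[at])(.{17})([ct]at[at]at)|<minus35box>\1</minus35box>\2<minus10box>\3</minus10box>|'
}

printf 'tttacttatttacagaacttcggcattatcttgcc\n' | tag_spaced_boxes
# prints: <minus35box>tttact</minus35box>tatttacagaacttcgg<minus10box>cattat</minus10box>cttgcc
```

On this sample, only the first -35 candidate (tttact) sits exactly 17 nucleotides upstream of cattat; the second candidate (tttaca) is only 9 nucleotides away, so it is not tagged. Near-miss spacings and boxes split across a line break are not handled.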
Computer Tips
- Remember that sed is line-based, and that you can add and count lines to get certain things done, say strictly before or after a certain point.
- Don’t forget how you enforced reading frames in Week 3.
- If you do add lines or spaces to get the job done, make sure to clean up after yourself by removing them from the final answer.
- This exercise is difficult enough that you might be thinking to yourself, “I’d rather do this by hand!” This sentiment is understandable, but when you find yourself feeling this way, consider the following:
- Part of the difficulty is learning these things for the first time. Once you’ve gotten the hang of it, there’s no way that doing things by hand will be faster.
- Consider trying to do this over and over, for multiple genes, with lots of potential variations. Doing this by hand not only takes longer at this point, but risks errors that a computer won’t make (once the correct commands have been determined).
- Form your commands so that they can be strung together into a single pipeline of processing directives in the end. In other words, once you’ve figured out how to do each step, no human intervention should be needed to perform everything from beginning to end.
- You will need the More Text Processing Features wiki page to complete this assignment. The How to Read XML Files wiki page gives you an idea for why the requested output was formatted the way it was.
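The tips about adding spaces and then cleaning up, together with enforcing a reading frame as in Week 3, can be combined roughly as follows. This is a sketch, not the required solution: tag_stop_codon is a made-up name, and the input is assumed to be a single line that already begins at the start codon (a real gene would first need its lines joined and everything before the start codon set aside).

```shell
# Hypothetical helper: enforce the reading frame with helper spaces, tag the
# first in-frame stop codon, then remove the helper spaces again.
tag_stop_codon() {
  # 1) split into codons, 2) tag the first codon that is a stop codon,
  # 3) strip the helper spaces
  sed 's/.../& /g' \
    | sed -E 's/(taa|tag|tga) /<stop_codon>\1<\/stop_codon> /' \
    | sed 's/ //g'
}

printf 'atggtaagctgaccc\n' | tag_stop_codon
# prints: atggtaagc<stop_codon>tga</stop_codon>ccc
```

The sample contains an out-of-frame taa (in gtaagc) that a plain s/taa/.../ on the unsplit string would tag by mistake; splitting into codons first means only the in-frame tga is tagged.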
Electronic Lab Notebook
- Discussion in class with Anu
- Finding the minus35box and the minus10box and inserting the tags around the matched sequences (in-class work time)
- To be written
Links to User Page and Journal Pages
Ron Legaspi
BIOL 367, Fall 2015
Assignment Links
- Week 1 Assignment
- Week 2 Assignment
- Week 3 Assignment
- Week 4 Assignment
- Week 5 Assignment
- Week 6 Assignment
- Week 7 Assignment
- Week 8 Assignment
- Week 9 Assignment
- Week 10 Assignment
- Week 11 Assignment
- Week 12 Assignment
- Week 14 Assignment
- Week 15 Assignment
Individual Weekly Journals
- Individual Journal Week 1 - This is my User Page
- Individual Journal Week 2
- Individual Journal Week 3
- Individual Journal Week 4
- Individual Journal Week 5
- Individual Journal Week 6
- Individual Journal Week 7
- Individual Journal Week 8
- Individual Journal Week 9
- Individual Journal Week 10
- Individual Journal Week 11
- Individual Journal Week 12
- Individual Journal Week 14
- Individual Journal Week 15