Difference between revisions of "Nanguiano Week 4"
From LMU BioDB 2015
Revision as of 23:43, 22 September 2015
Transcription and Translation “Taken to the Next Level”
- First, I needed to log in to my LMU CS account to access the data used in this week's assignment.
ssh nanguia1@lion.lmu.edu
- Next, I needed to enter the folder that I'd created for the class, and create a new folder for this week's assignment.
cd biodb
mkdir week4
- Next, I moved into Dondi's directory so I could obtain the file required for the assignment - infA-E.coli-K12.txt.
cd ~dondi/xmlpipedb/data
cp infA-E.coli-K12.txt ~nanguia1/biodb/week4
- Then, I moved into my directory to prepare to do the assignment.
cd ~nanguia1/biodb/week4
For each of the following questions pertaining to this gene, provide (a) the actual answer, and (b) the sequence of text-processing commands that calculates this answer. Specific information about how these sequences can be identified is included after the list of questions.
- Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):
- -35 box of the promoter
... <minus35box>...</minus35box> ...
- First, I needed to identify the sequence I'd be looking for within the file. The Week 4 assignment gives the consensus sequence for the -35 box of the promoter as tt[gt]ac[at], so I needed to pass this pattern to sed in order to find it. Because I wanted to replace one sequence, I expected sed's substitute command, s///g, to be the best option. My first idea was to try sed "s/tt[gt]ac[at]/ & /g", which puts a space on either side of the matched sequence. This would test whether the pattern was finding the sequence correctly before I put in the tags.
- I tested this using the command
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ & /g"
However, this command did not do what I wanted, since it changed every match that appeared, not just the first. Since I only wanted the first one to be changed, I did some research to find out how to change only the first occurrence using sed. From this question on Unix & Linux Stack Exchange (http://unix.stackexchange.com/questions/155805/sed-replace-first-k-instances-of-a-word-in-the-file), I learned that the g flag at the end of the command tells sed to change every occurrence on a line. Changing it to 1 causes sed to change only the first occurrence on each line. Running
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ & /1"
resulted in the output I expected. All that was left was to find the first and last space and replace them with the starting and ending tags.
- -10 box of the promoter
... <minus10box>...</minus10box> ...
- transcription start site
... <tss>...</tss> ...
- ribosome binding site
... <rbs>...</rbs> ...
- start codon
... <start_codon>...</start_codon> ...
- stop codon
... <stop_codon>...</stop_codon> ...
- terminator
... <terminator>...</terminator> ...
- What is the exact mRNA sequence that is transcribed from this gene?
- What is the amino acid sequence that is translated from this mRNA?
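To revisit the -35 box step above, here is a minimal sketch that compares the g and 1 flags on a short made-up sequence rather than the real infA file, and then shows one possible way to put in the tags in a single substitution. The test sequence and the variable names (seq, all_hits, first_hit, tagged) are assumptions for illustration only, and the tagging command is not necessarily the one used for the final answer.

```shell
# Made-up test sequence (NOT the infA gene). It contains two matches
# of tt[gt]ac[at]: "ttgaca" and "tttaca".
seq="aattgacacgtttacagg"

# The g flag pads every match on the line with spaces.
all_hits=$(printf '%s\n' "$seq" | sed 's/tt[gt]ac[at]/ & /g')

# The 1 flag pads only the first match on each line.
first_hit=$(printf '%s\n' "$seq" | sed 's/tt[gt]ac[at]/ & /1')

# One possible tagging step (an assumption, not the journal's final command):
# wrap the first match in the tags directly, keeping a space before the
# start tag and after the end tag. The slash in the end tag is escaped
# because / is also the sed delimiter.
tagged=$(printf '%s\n' "$seq" | sed 's/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1')

printf '%s\n' "$all_hits" "$first_hit" "$tagged"
```

On this test sequence the three printed lines are:

aa ttgaca cg tttaca gg
aa ttgaca cgtttacagg
aa <minus35box>ttgaca</minus35box> cgtttacagg

Because sed works one line at a time, the 1 flag picks the first match on every line of input, so this approach finds only the first match in the file when the sequence sits on a single line.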
Links
Nicole Anguiano
BIOL 367, Fall 2015