Difference between revisions of "Nanguiano Week 4"

From LMU BioDB 2015
(Transcription and Translation “Taken to the Next Level”: added the homework questions)
(Transcription and Translation “Taken to the Next Level”: added initial thoughts and concepts for question 1)

Revision as of 23:43, 22 September 2015

Transcription and Translation “Taken to the Next Level”

  • First, I needed to log in to my LMU CS account to access the data used in this week's assignment.
ssh nanguia1@lion.lmu.edu
  • Next, I needed to enter the folder that I'd created for the class, and create a new folder for this week's assignment.
cd biodb
mkdir week4
  • Next, I moved into Dondi's directory so I could copy the file required for the assignment, infA-E.coli-K12.txt.
cd ~dondi/xmlpipedb/data
cp infA-E.coli-K12.txt ~nanguia1/biodb/week4
  • Then, I moved into my directory to prepare to do the assignment.
cd ~nanguia1/biodb/week4

For each of the following questions pertaining to this gene, provide (a) the actual answer, and (b) the sequence of text-processing commands that calculates this answer. Specific information about how these sequences can be identified is included after the list of questions.

  1. Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene, as follows (ellipses indicate bases in the sequence; note the spaces before the start tag and after the end tag):
    • -35 box of the promoter
      ... <minus35box>...</minus35box> ...
      • First, I needed to identify the sequence I'd be looking for within the file. The week 4 assignment indicated that the consensus sequence for the -35 box of the promoter is tt[gt]ac[at]. Thus, I needed to give this pattern to sed in order to find it. Because I wanted to replace the match with a modified version of itself, I knew that sed's substitution command, in the form sed s/.../.../g, would be the best option. My first idea was to try sed "s/tt[gt]ac[at]/ & /g", which puts a space on either side of the matched sequence (& stands for the matched text). This would test whether the pattern was finding the sequence correctly before I put in the tags.
      • I tested using the command cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ & /g". However, this did not give the result I wanted, since it changed every match that appeared, not just the first! Since I only wanted the first one to be changed, I did some research to find out how to change only the first occurrence using sed. Using this question on Unix & Linux Stack Exchange (http://unix.stackexchange.com/questions/155805/sed-replace-first-k-instances-of-a-word-in-the-file), I learned that the g flag at the end of the command tells sed to change every occurrence. Changing it to 1 causes sed to change only the first occurrence on each line, which is also what sed does when no flag is given. Running cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ & /1" resulted in the output I expected. As a result, all that was left was to find the first and last space and replace them with the starting and ending tags.
    • -10 box of the promoter
      ... <minus10box>...</minus10box> ...
    • transcription start site
      ... <tss>...</tss> ...
    • ribosome binding site
      ... <rbs>...</rbs> ...
    • start codon
      ... <start_codon>...</start_codon> ...
    • stop codon
      ... <stop_codon>...</stop_codon> ...
    • terminator
      ... <terminator>...</terminator> ...
  2. What is the exact mRNA sequence that is transcribed from this gene?
  3. What is the amino acid sequence that is translated from this mRNA?
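
The sed flag behavior described for the -35 box under question 1 can be checked on a small example. This is only a sketch: the sequence in seq is made up for illustration (it is not the infA data), and the variable names are hypothetical. The last command shows one possible way to insert the tags in a single substitution instead of adding spaces first; it is an alternative under these assumptions, not necessarily the approach the assignment intends. Note that sed works line by line, so "first match" means the first match on each line.

```shell
# Made-up toy sequence (NOT the real infA data) containing two matches
# of the -35 box pattern tt[gt]ac[at]: "ttgaca" and "tttact".
seq="aaattgacaccctttactggg"

# g flag: every match on the line gets a space on either side.
all=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/ & /g")

# Numeric flag 1: only the first match on each line is changed.
first=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/ & /1")

# No flag at all: sed also changes only the first match on each line.
noflag=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/ & /")

# Possible shortcut: wrap the first match in the tags directly.
# The / inside the closing tag is escaped so sed does not read it
# as the end of the replacement.
tagged=$(printf '%s\n' "$seq" | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /")

printf '%s\n' "$all"     # aaa ttgaca ccc tttact ggg
printf '%s\n' "$first"   # aaa ttgaca ccctttactggg
printf '%s\n' "$noflag"  # aaa ttgaca ccctttactggg
printf '%s\n' "$tagged"  # aaa <minus35box>ttgaca</minus35box> ccctttactggg
```

If the real file keeps the whole sequence on one line, the same substitution applied to infA-E.coli-K12.txt would tag only the first -35 box candidate in the file.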

Links

Nicole Anguiano
BIOL 367, Fall 2015
