Difference between revisions of "Kzebrows Week 4"

From LMU BioDB 2015

Revision as of 22:06, 27 September 2015

Transcription and Translation "Taken to the Next Level"

To start this assignment I began by opening Terminal on my laptop. I entered

ssh kzebrows@my.cs.lmu.edu 

followed by my password to log into the LMU CMSI database. As I usually do, I entered the following commands in order to enter Dr. Dionisio's directory, list the files in the directory, and choose the appropriate file for this assignment:

cd ~dondi/xmlpipedb/data && ls && cat infA-E.coli-K12.txt

This took me to the E. coli file and showed me the nucleotide sequence. To complete this assignment I frequently used this page as a resource.

I began by using grep to find the potential -35 box and -10 box because grep highlights the searched pattern in red. I simply entered

cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"

which gave me two possible answers for the -35 box, tttact and tttaca, both of which fit the pattern. Now it was a matter of finding out which one was the correct one. I also searched for the -10 box using

cat infA-E.coli-K12.txt | grep "[ct]at[at]at"

which also revealed two potential sites, tataat and cattat. To find out which sequences were the correct ones I needed to see both boxes together, which a single grep search doesn't show, so I used sed instead. I entered the sed commands as a pipe and added three spaces on either side of each occurrence of the consensus sequences (both -35 and -10) in the file to make them more visible. This is done with sed "s/<pattern>/   &   /g", where <pattern> is what I wish to find, "&" stands for the text that was matched, and the spaces before and after the "&" are what gets added to each side of the pattern (instructions found here). The pipe looked like this:

cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/   &   /g" | sed "s/tt[gt]ac[at]/   &   /g"

This made it clear that it was the first -35 box option, tttact, and the second -10 box option, cattat, that I was looking for in this gene.
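
The two steps above (listing the candidate boxes, then spacing them out in the sequence) can be tried without access to the course server. The following is only a minimal sketch: the sequence in dna is a short made-up string, not the real infA sequence, and the variable names dna, minus35, minus10 and marked are assumptions chosen for illustration. It uses grep -o, which prints just the matched text with one match per line, in place of the colour highlighting.

```shell
# Toy sequence, made up for illustration; it is NOT the real infA sequence.
dna="ggcatttactgcgtataatcctttacagcattatgc"

# grep -o prints only the matched text, one match per line.
minus35=$(printf '%s\n' "$dna" | grep -o "tt[gt]ac[at]")
minus10=$(printf '%s\n' "$dna" | grep -o "[ct]at[at]at")

# Pad every candidate box with three spaces on each side.
# In the replacement, & stands for whatever the pattern matched.
marked=$(printf '%s\n' "$dna" \
  | sed "s/[ct]at[at]at/   &   /g" \
  | sed "s/tt[gt]ac[at]/   &   /g")

printf '%s\n' "-35 candidates:" "$minus35" "-10 candidates:" "$minus10" "$marked"
```

On the real file, the printf '%s\n' "$dna" part would be replaced by cat infA-E.coli-K12.txt. Because the first sed inserts spaces into the line, a -35 candidate that overlapped a -10 candidate would no longer match in the second sed; that does not happen in this toy sequence, but it is worth checking on real data.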