Difference between revisions of "Kzebrows Week 4"
Revision as of 22:06, 27 September 2015
Transcription and Translation "Taken to the Next Level"
To start this assignment I began by opening Terminal on my laptop. I entered
ssh kzebrows@my.cs.lmu.edu
followed by my password to log into the LMU CMSI database. As I usually do, I entered the following commands in order to enter Dr. Dionisio's directory, list the files in the directory, and choose the appropriate file for this assignment:
cd ~dondi/xmlpipedb/data
ls
cat infA-E.coli-K12.txt
This displayed the nucleotide sequence in the E. coli file. To complete this assignment I frequently used the More Text Processing Features page as a resource.
I began by using grep to find the potential -35 box and -10 box because grep highlights the searched pattern in red. I simply entered
cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"
which gave me two possible answers for the -35 box, tttact and tttaca, both of which fit the pattern. Now it was a matter of finding out which one was the correct one. I also searched for the -10 box using
cat infA-E.coli-K12.txt | grep "[ct]at[at]at"
which also revealed two potential sites, tataat and cattat. To work out which sequences were the correct ones I needed to see both sets of candidates together, which my grep searches did not do, so I used sed instead. I piped the file through two sed commands, each adding three spaces on either side of every occurrence of a consensus sequence (both -35 and -10) to make the sequences more visible. This is done with sed "s/<pattern>/   &   /g", where <pattern> is what I wish to find, "&" stands for the matched text, and the spaces on either side of the "&" are what gets added around the match (instructions found on the Introduction to the Command Line page). The pipe looked like this:
cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/   &   /g" | sed "s/tt[gt]ac[at]/   &   /g"
This made it clear that it was the first -35 box option, tttact, and the second -10 box option, cattat, that I was looking for in this gene.
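The grep and sed steps above can be tried without access to the course server. The following is a minimal sketch, assuming a made-up one-line sequence written to a temporary file; the sequence and the variable names (seq_file, minus35, minus10, spaced) are hypothetical and are not the real infA data. It uses grep -o to list each candidate on its own line rather than relying on color highlighting, and a single sed call with two -e expressions in place of the two-command pipe.

```shell
# Hypothetical toy sequence, NOT the real contents of infA-E.coli-K12.txt.
seq_file=$(mktemp)
printf '%s\n' 'gcatttactggcgtaacgtagctcattatcggatttacaggctataatgc' > "$seq_file"

# -o prints only the matched text, one match per line.
minus35=$(grep -o 'tt[gt]ac[at]' "$seq_file")   # -35 box candidates
minus10=$(grep -o '[ct]at[at]at' "$seq_file")   # -10 box candidates

# Pad every candidate with three spaces on each side.
# "&" in the replacement stands for whatever the pattern matched.
spaced=$(sed -e 's/[ct]at[at]at/   &   /g' \
             -e 's/tt[gt]ac[at]/   &   /g' "$seq_file")

printf '%s\n' '-35 candidates:' "$minus35"
printf '%s\n' '-10 candidates:' "$minus10"
printf '%s\n' 'Spaced sequence:' "$spaced"

rm -f "$seq_file"
```

On this toy line, the last command prints the sequence with all four candidates set apart by spaces, which is the same view I used to compare the -35 and -10 options in the real file.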