Difference between revisions of "Kzebrows Week 4"

Revision as of 22:06, 27 September 2015

Transcription and Translation "Taken to the Next Level"

To start this assignment I began by opening Terminal on my laptop. I entered

ssh kzebrows@my.cs.lmu.edu

followed by my password to log into the LMU CMSI database. As I usually do, I entered the following commands in order to enter Dr. Dionisio's directory, list the files in the directory, and choose the appropriate file for this assignment:

cd ~dondi/xmlpipedb/data && ls && cat infA-E.coli-K12.txt

This took me to the E. coli file and showed me the nucleotide sequence. To complete this assignment I frequently used the More Text Processing Features page as a resource.

I began by using grep to find the potential -35 box and -10 box because grep highlights the searched pattern in red. I simply entered

cat infA-E.coli-K12.txt | grep "tt[gt]ac[at]"

which gave me two possible answers for the -35 box, tttact and tttaca, both of which fit the pattern. Now it was a matter of finding out which one was the correct one. I also searched for the -10 box using

cat infA-E.coli-K12.txt | grep "[ct]at[at]at"

which also revealed two potential sites at tataat and cattat. I realized that in order to find out which sequences were the correct ones I needed to visualize them both together, but grep doesn't do this, so instead I used sed. To do this, I entered the sed commands as a pipe and added three spaces on either side of each occurrence of the consensus sequences (both -35 and -10) in the file to make the sequences more visible. This is done by adding sed "s/<pattern>/   &   /g", where <pattern> is what I wish to find, "&" stands for the text that matched, and the spaces on either side of the "&" are what gets added around each match (instructions found on the Introduction to the Command Line page). The pipe looked like this:

cat infA-E.coli-K12.txt | sed "s/[ct]at[at]at/   &   /g" | sed "s/tt[gt]ac[at]/   &   /g"

This made it clear that it was the first -35 box option, tttact, and the second -10 box option, cattat, that I was looking for in this gene.
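
As a small illustration of the two steps above, here is a sketch that runs the same patterns on a short made-up sequence. The sequence in seq is hypothetical and is not the real infA data, and the -o option (print only the matching text) is assumed to be available, as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical toy sequence for illustration only (NOT the real infA data).
seq="ggcatttactgcgtaacgcattatccg"

# grep -o prints only the text that matched, one match per line.
minus35=$(printf '%s\n' "$seq" | grep -o "tt[gt]ac[at]")
minus10=$(printf '%s\n' "$seq" | grep -o "[ct]at[at]at")

# sed "s/<pattern>/   &   /g" pads every match with three spaces per side;
# "&" is replaced by whatever text the pattern matched.
marked=$(printf '%s\n' "$seq" \
  | sed "s/[ct]at[at]at/   &   /g" \
  | sed "s/tt[gt]ac[at]/   &   /g")

printf '%s\n' "$minus35" "$minus10" "$marked"
```

On this toy input the two grep calls should report tttact and cattat, and the sed pipe should print the whole sequence with both boxes set apart by spaces, which is what makes it possible to judge their spacing relative to each other.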

