Difference between revisions of "Kevin Wyllie Week 4"

From LMU BioDB 2015
(Had to collapse the middle portion of each command because the pipes were beginning to run off of the page.)

Revision as of 03:45, 29 September 2015

Transcription and Translation “Taken to the Next Level”

Adding Tags for Each Gene "Landmark"

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g"

This sed command adds the opening tag for the -35 box and the closing tag for the -10 box, based on their consensus sequences and the known 17-nucleotide spacing between them...

cat infA-E.coli-K12.txt | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g"

...and these sed commands add the remaining tags for the -35 and -10 boxes, based on the location of each existing tag.
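As a quick check of the three commands above, here is a minimal sketch run on a made-up toy sequence (not the real infA sequence). It assumes GNU sed, so that -r is available; the variable names seq and tagged are only for illustration.

```shell
# Toy sequence (hypothetical): cc + ttgaca + 17 spacer nucleotides + tataat + gg
seq="ccttgacagctagctagctagctagtataatgg"

tagged=$(printf '%s\n' "$seq" \
  | sed -r "s/tt[gt]ac[at](.){17}[ct]at[at]at/ <minus35box>&<\/minus10box> /g" \
  | sed "s/<minus35box>tt[gt]ac[at]/&<\/minus35box> /g" \
  | sed "s/[ct]at[at]at<\/minus10box>/ <minus10box>&/g")

# Print the fully tagged promoter region
printf '%s\n' "$tagged"
```

On this toy input the output should read: cc <minus35box>ttgaca</minus35box> gctagctagctagctag <minus10box>tataat</minus10box> gg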

cat infA-E.coli-K12.txt | ... | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" | sed "s/<tss>./&<\/tss> /g"

These commands add the tags for the transcription start site (TSS), based on its known distance from the -10 box.
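To see which nucleotide actually gets tagged, here is a small sketch on a hypothetical, already-tagged -10 box (the sequence after the box is made up). Because the earlier commands leave a space after </minus10box>, the six characters matched by (.){6} are that space plus five nucleotides, so the tag should land on the sixth nucleotide after the box. GNU sed is assumed.

```shell
# Toy input (hypothetical): a -10 box already tagged by the earlier commands
seq=' <minus10box>tataat</minus10box> gcatgacgt'

tss=$(printf '%s\n' "$seq" \
  | sed -r "s/[ct]at[at]at<\/minus10box>(.){6}/& <tss>/g" \
  | sed "s/<tss>./&<\/tss> /g")

# The single character after <tss> is wrapped in the TSS tags
printf '%s\n' "$tss"
```

On this toy input the tagged nucleotide is the "a" that follows "gcatg".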

cat infA-E.coli-K12.txt | ... | sed "s/gagg/ <rbs>&<\/rbs> /g"

This sed command adds tags for the ribosome binding site (RBS), based on its consensus sequence.
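One thing worth checking is how many times the pattern occurs, since the g flag tags every match and not just the real RBS. A minimal sketch on a made-up sequence (the variable names are hypothetical, and grep -o is assumed to be available):

```shell
# Toy sequence (hypothetical) containing "gagg" twice
seq="aagaggttatgccgaggaa"

# Count the occurrences before tagging
count=$(printf '%s\n' "$seq" | grep -o "gagg" | wc -l | tr -d ' ')

# Tag every occurrence, as the command above does
rbs=$(printf '%s\n' "$seq" | sed "s/gagg/ <rbs>&<\/rbs> /g")

printf '%s matches\n%s\n' "$count" "$rbs"
```

If the count is greater than one on the real sequence, the extra tags would need to be sorted out by position relative to the other landmarks.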

cat infA-E.coli-K12.txt | ... | sed "s/aaaaggt/ <terminator>&/g" | sed "s/gcctttt..../&<\/terminator> /g"

These sed commands add the tags for the terminator, based on its consensus sequence and hairpin structure.
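Here is a minimal sketch of the two terminator commands on a made-up sequence. It assumes the pattern after gcctttt is four single-character wildcards (....), so the closing tag lands four nucleotides past the second half of the hairpin; the toy sequence and variable names are hypothetical.

```shell
# Toy sequence (hypothetical): cc + aaaaggt + cgcgac + gcctttt + ttac + gg
seq="ccaaaaggtcgcgacgccttttttacgg"

term=$(printf '%s\n' "$seq" \
  | sed "s/aaaaggt/ <terminator>&/g" \
  | sed "s/gcctttt..../&<\/terminator> /g")

# The opening tag sits before aaaaggt, the closing tag four characters after gcctttt
printf '%s\n' "$term"
```

On this toy input the output should read: cc <terminator>aaaaggtcgcgacgccttttttac</terminator> gg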

cat infA-E.coli-K12.txt | ... | sed "s/<\/rbs> /&\n/g" | sed "2s/atg/ <start_codon>&<\/start_codon> /1"

These commands start a new line after the RBS and then add start-codon tags to the first "atg" sequence on the newly separated line (line 2).
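The line break matters because the second sed is a separate process, so it sees the text after the RBS as its own line 2. A minimal sketch on a made-up, already-tagged sequence (GNU sed is assumed, since it interprets \n in the replacement as a newline):

```shell
# Toy input (hypothetical): one "atg" before the RBS and two after it
seq='ccatg <rbs>gagg</rbs> ttatgccatgaa'

start=$(printf '%s\n' "$seq" \
  | sed "s/<\/rbs> /&\n/g" \
  | sed "2s/atg/ <start_codon>&<\/start_codon> /1")

# Line 1 ends at the RBS; only the first "atg" on line 2 is tagged
printf '%s\n' "$start"
```

The "atg" before the RBS stays on line 1 and is left alone, as is the second "atg" on line 2.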

cat infA-E.coli-K12.txt | ... | sed "2s/ta[ag]/\n&/g" | sed "3,10s/tga/\n&/g"

These commands start a new line before each potential stop codon sequence (taa, tag, or tga) occurring after the RBS.
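A small sketch of how the two commands split the text, using a made-up two-line input (line 1 stands in for everything up to the RBS). GNU sed is assumed. Note that the second sed numbers lines as it receives them, so 3,10 refers to the lines produced by the first split.

```shell
# Toy input (hypothetical): line 1 is a placeholder, line 2 holds the coding region
input=$(printf '%s\n' 'header' 'atgccctaaggtgacctagcctgaaa')

split=$(printf '%s\n' "$input" \
  | sed "2s/ta[ag]/\n&/g" \
  | sed "3,10s/tga/\n&/g")

# Every taa, tag, and tga after line 2's start now begins its own line
printf '%s\n' "$split"
```

On this toy input the result should be six lines: header, atgccc, taagg, tgacc, tagcc, tgaaa.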

cat infA-E.coli-K12.txt | ... | sed "17s/tga/ <stop_codon>&<\/stop_codon> /g"

This command adds stop-codon tags to the potential stop codon sequence occurring closest to (but not after) the terminator.
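The line number 17 above was found by looking at the split output. As an alternative sketch (not what was done here), sed can report the number of the line that holds the terminator tag, which on this made-up input is also the line that begins with the last "tga" before the terminator. The toy lines and the names lines, term_line, and stop are hypothetical.

```shell
# Toy input (hypothetical): already split so each candidate stop codon starts a line
lines=$(printf '%s\n' 'atgccc' 'taagg' 'tgacc' 'tagcc' 'tgaaa <terminator>aaaaggt')

# Line number of the first line containing the terminator tag
term_line=$(printf '%s\n' "$lines" | sed -n '/<terminator>/=' | head -n 1)

# Tag the "tga" on that line only
stop=$(printf '%s\n' "$lines" \
  | sed "${term_line}s/tga/ <stop_codon>&<\/stop_codon> /")

printf '%s\n' "$stop"
```

This only works when the line holding the terminator tag starts with "tga"; if the nearest candidate were taa or tag, the pattern would need adjusting.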

cat infA-E.coli-K12.txt | ... | sed ':a;N;$!ba;s/\n//g'

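This rather cryptic command undoes any line breaks, combining the text file back into one line. Reading it piece by piece: ":a" defines a label, "N" appends the next input line to the pattern space, "$!ba" jumps back to the label unless the last line has been reached, and "s/\n//g" then deletes every newline that was collected.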
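A minimal sketch of the join on a made-up three-line input, assuming GNU sed (which accepts labels and commands separated by semicolons). For comparison, tr -d '\n' is shown as a simpler way to get the same text; it is an alternative, not the command used above.

```shell
# Join three toy lines with the sed loop
joined=$(printf 'ab\ncd\nef\n' | sed ':a;N;$!ba;s/\n//g')

# Same result by deleting newline characters directly
same=$(printf 'ab\ncd\nef\n' | tr -d '\n')

printf '%s\n%s\n' "$joined" "$same"
```

Both variables should hold abcdef. The one difference is that tr also removes the final newline at the end of the file, while the sed loop keeps it.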