Week 4 E-notes Eyanosch
3 command sequences
- one for each question
So far, for the first question:
cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <minus35box> & <\/minus35box> /g" | sed "s/cattat/ <minus10box> & <\/minus10box> \n/" | sed "s/a/ <TSS>&<\/TSS>/" | sed "2s/gagg/ <RBS>&<\/RBS> /" | grep "aaaaggt.*gcctttt"
- The only problem is that only the second line is shown when I add grep "aaaaggt.*gcctttt" at the end to find the hairpin loop.
| sed "s/aaaaggt.*tttttatt/ <Terminator>&<\/Terminator>/g"
- adds the description of the terminator sequence
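The "only the second line shows" behavior noted above can be reproduced on a small scale. The sketch below uses a made-up toy sequence (not the real infA data) and assumes GNU sed, where "\n" in the replacement starts a new line. Because grep works one line at a time, it prints only the line that contains the whole hairpin pattern and drops the other line.

```shell
# Toy sequence, made up for illustration (not the real infA data).
toy="aacattatccaaaaggtccgccttttgg"

# GNU sed (assumed): "\n" in the replacement starts a new line after the match.
split=$(printf '%s\n' "$toy" | sed "s/cattat/&\n/")

# grep prints only the lines that match, so the first line is dropped.
filtered=$(printf '%s\n' "$split" | grep "aaaaggt.*gcctttt")

printf 'after sed:\n%s\n' "$split"
printf 'after grep:\n%s\n' "$filtered"
```

Joining the lines back into one before the final grep keeps the whole sequence in the output.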
What I'm trying to do is use the sed ':a;N;$!ba;s/\n//g' format to combine line 1 with line 2, but I'm unable to do so. I think it has to do with the way I'm typing the command into the Mac terminal. My thinking was that, to find the "a" for the TSS, I would start a new line and count down 12 nucleotides; that nucleotide happened to be an a, and no prior nucleotides on the new line were adenine. The problem is combining lines 1 and 2 after finding the TSS.
- Changed my plan of attack after going through more of the wiki. Copied and pasted the 3 sed commands for manipulating lines and it worked out.
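As a minimal sketch of the line-joining step, here is the same sed one-liner applied to two made-up lines, with each part commented. It assumes GNU sed. The BSD sed that ships with macOS treats labels differently and may reject the semicolon-separated form, which could explain the trouble in the Mac terminal, though that is only a guess. A tr alternative that avoids labels altogether is included for comparison.

```shell
# Two toy lines standing in for the split sequence (made-up data).
two_lines=$(printf 'aacattat\nccggttcaa\n')

# GNU sed one-liner from the notes:
#   :a       define a label named "a"
#   N        append the next input line to the pattern space
#   $!ba     if this is not the last line, branch back to "a"
#   s/\n//g  once every line is loaded, delete the embedded newlines
joined_sed=$(printf '%s\n' "$two_lines" | sed ':a;N;$!ba;s/\n//g')

# Alternative that does not rely on sed labels: delete every newline.
joined_tr=$(printf '%s\n' "$two_lines" | tr -d '\n')

printf '%s\n%s\n' "$joined_sed" "$joined_tr"
```

Both forms should leave a single line, so later commands see the whole sequence at once.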
Code as is:
cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <minus35box> & <\/minus35box> /g" | sed "s/cattat/ <minus10box> & <\/minus10box> /" | sed "s/ccggttc/&\n/g" | sed "2s/a/ <TSS>&<\/TSS> /1" | sed ':a;N;$!ba;s/\n//g' | sed "2s/gagg/ <RBS>&<\/RBS> /" | grep "aaaaggt.*gcctttt"
Question 1 code:
eyanosch@ab201:/nfs/home/dondi/xmlpipedb/data$ cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <minus35box> & <\/minus35box> /g" | sed "s/cattat/ <minus10box> & <\/minus10box> /" | sed "s/ccggttc/&\n/g" | sed "2s/a/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <terminator>&<\/terminator> /g" | sed "s/atg/ <start codon>&<\/start codon> /4" | sed "s/tga/ <stop codon>&<\/stop codon> /11"
- After making a few adjustments:
cat infA-E.coli-K12.txt | grep "[ct]at[at]at" | grep "tt[gt]ac[at]" | sed "s/tttact/ <MINUS35BOX> & <\/MINUS35BOX> /g" | sed "s/cattat/ <MINUS10BOX> & <\/MINUS10BOX> /" | sed "s/cttgcc/&\n/g" | sed "2s/g/ <TSS>&<\/TSS>/1" | sed ':a;N;$!ba;s/\n//g' | sed "s/gagg/ <RBS>&<\/RBS> /g" | grep "aaaaggt.*gcctttt" | sed "s/aaaaggtc.*tttttatt/ <TERMINATOR>&<\/TERMINATOR> /g" | sed "s/tac/ <START CODON>&<\/START CODON> /7" | sed "s/att/ <STOP CODON>&<\/STOP CODON> /11"
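The pipelines above use numbers in two places in a sed substitution: a number before the s is a line address, and a number after the final slash picks which occurrence on the line to replace. The sketch below shows both on made-up toy data; the tag names X, T, and R are placeholders, not the tags used in the assignment. It also shows why a "2s" command has no effect once the lines have been joined.

```shell
# Toy data, made up for illustration (not the real infA sequence).
toy="atgccatgttatgaa"

# A trailing number tags only that occurrence on the line: here the 2nd "atg".
second=$(printf '%s\n' "$toy" | sed 's/atg/<X>&<\/X>/2')

# A leading number is a line address: the command runs on line 2 only.
line2=$(printf 'gag\ngag\n' | sed '2s/g/<T>&<\/T>/1')

# After the lines are joined there is no line 2, so a 2s command does nothing.
no_line2=$(printf 'gagg\n' | sed '2s/gagg/<R>&<\/R>/')

printf '%s\n%s\n%s\n' "$second" "$line2" "$no_line2"
```

This is consistent with the RBS step dropping the "2" address once it runs after the join.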