Difference between revisions of "Anuvarsh Week 4"
Revision as of 20:24, 27 September 2015
Transcription and Translation "Taken to the Next Level"
Before anything else, I logged into my account using:
ssh avarshne@my.cs.lmu.edu
I then entered my password and changed into the directory where I had copied infA-E.coli-K12.txt from Dondi's library.
cd biodb2015
Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene
To complete this task, I reviewed the Introduction to the Command Line page and looked over the More Text Processing Features page. My partner Ron Legaspi and I were then led through the first couple of steps of the homework in class. In particular, we learned how to add the -35 box and -10 box tags. We first searched infA-E.coli-K12.txt for all instances of the -35 sequence, which was given to us as a hint on the homework assignment, using the following command:
grep "tt[gt]ac[at]" infA-E.coli-K12.txt
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgcc
gataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcgg
agtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatctttacttatttacagaacttcggcattatcttgccggttcaaattacggtagt
gataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgta
gagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaaga
acgagtaaaaggtcggtttaaccggcctttttattttat
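Because the whole sequence sits on one line, plain grep prints all of it and the individual matches are hard to spot. A minimal sketch of one way to list only the matches is shown here. It assumes a grep that supports the -o option (GNU and BSD grep do, but it is not required by POSIX), and it runs on a short fragment copied from the sequence above rather than on the file itself; the variable names are only for illustration.

```shell
# Illustrative fragment copied from the region around the two -35 candidates.
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgcc"

# -o prints each match on its own line instead of the whole input line.
matches=$(printf '%s\n' "$fragment" | grep -o "tt[gt]ac[at]")
printf '%s\n' "$matches"

# Count the matches (one per output line).
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "matches: $count"
```

On this fragment the two candidates tttact and tttaca are printed on separate lines, which is the same pair that stands out in the full sequence.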
When we ran this test, we noticed that there were two instances of this pattern with only two base pairs between them. Because we understood that the -10 box must occur after the -35 box, we searched for the -10 box sequence while also searching for the -35 box. Plain grep was not helpful here: the whole sequence is a single line, so grep prints all of it and does not show where each pattern occurs. To locate both sequences relative to each other, we ran the following command:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ ***&*** /g" | sed "s/[ct]at[at]at/ ***&*** /g"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt ***tataat*** tgcggtcgcagagttggttacgctcattaccccgc
tgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcgg
agtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc ***tttact*** ta ***tttaca*** gaacttcgg ***cattat***
cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcc
taataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacg
ggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcga
agagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
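The same marking can probably be done in a single sed call by passing both substitutions with -e, which applies them in order to each line. The sketch below is not the command we ran in class; it uses a short fragment copied from the sequence (an assumption made to keep the example small) and hypothetical variable names.

```shell
# Illustrative fragment containing both -35 candidates and one -10 candidate.
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgcc"

# Two -e expressions run one after the other on the same line,
# equivalent to piping one sed into a second sed.
marked=$(printf '%s\n' "$fragment" |
  sed -e 's/tt[gt]ac[at]/ ***&*** /g' -e 's/[ct]at[at]at/ ***&*** /g')

printf '%s\n' "$marked"
# prints: gcgcaaatc ***tttact*** ta ***tttaca*** gaacttcgg ***cattat*** cttgcc
```

sed can also read the file directly, so with the real data the leading cat could be dropped by giving infA-E.coli-K12.txt as the last argument.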
At this point it became clear that the "real" -35 box was the first instance of its sequence, and the "real" -10 box was the second instance of its sequence, that is, the first instance after the -35 box. We began by tagging the -35 box. To replace just the first instance of a sequence using sed, we found that we only needed to replace the "g" in sed "s///g" with "1". This tells sed to replace only the first instance on each line; since the whole sequence is on one line, that means the first instance in the file. We found this information on the More Text Processing Features page. The resulting command was as follows:
cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1"
ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgat
aaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaa
tgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcggcattatct
tgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcct
aataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacg
ggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcg
aagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
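To see what the number at the end of the substitution does, the sketch below runs the same tagging on a short fragment copied from the sequence, once with 1 and once with 2. The fragment and variable names are assumptions for illustration only. It also uses | as the delimiter instead of /, which should avoid having to escape the slash in the closing tag.

```shell
# Illustrative fragment with two candidate -35 matches: tttact and tttaca.
fragment="gcgcaaatctttacttatttacagaacttcggcattatcttgcc"

# A numeric flag N replaces only the Nth match on each line.
first=$(printf '%s\n' "$fragment" |
  sed 's|tt[gt]ac[at]| <minus35box>&</minus35box> |1')
second=$(printf '%s\n' "$fragment" |
  sed 's|tt[gt]ac[at]| <minus35box>&</minus35box> |2')

printf '%s\n' "$first"
# prints: gcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcggcattatcttgcc
printf '%s\n' "$second"
# prints: gcgcaaatctttactta <minus35box>tttaca</minus35box> gaacttcggcattatcttgcc
```

The count restarts on every line, so this only picks out "the first instance in the file" because the sequence is stored as a single line.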