Difference between revisions of "Anuvarsh Week 4"

From LMU BioDB 2015

Revision as of 20:24, 27 September 2015

Transcription and Translation "Taken to the Next Level"

Before anything else, I logged into my account using:

   ssh avarshne@my.cs.lmu.edu

I then entered my password and changed into the directory into which I had copied infA-E.coli-K12.txt from Dondi's library.

   cd biodb2015

Modify the gene sequence string so that it highlights or “tags” the special sequences within this gene

To complete this task, I reviewed the Introduction to the Command Line page and looked over the More Text Processing Features page. My partner Ron Legaspi and I were led through the first couple of steps of the homework in class. In particular, we learned how to add the -35 box and -10 box tags. We first searched infA-E.coli-K12.txt for all instances of the -35 sequence, whose pattern was provided as a hint on the homework assignment, using the following command:

   grep "tt[gt]ac[at]" infA-E.coli-K12.txt
   ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgcc
   gataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcgg
   agtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatctttacttatttacagaacttcggcattatcttgccggttcaaattacggtagt
   gataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgta
   gagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtga
   ctgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcgaagagaaaga
   acgagtaaaaggtcggtttaaccggcctttttattttat
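Because grep prints the whole matching line, the individual matches are hard to spot in the output above. As a minimal sketch, the `-o` option (available in GNU and BSD grep, though not required by POSIX) prints each match on its own line. The `seq` variable below is a hypothetical stand-in holding a short excerpt of the sequence, not the full file.

```shell
# Hypothetical sample: a short excerpt of the sequence around the matches.
seq="gcgcgcaaatctttacttatttacagaacttcggcattatcttgccgg"

# -o prints only the matched text, one match per line,
# instead of the entire line that contains a match.
minus35_matches=$(printf '%s\n' "$seq" | grep -o 'tt[gt]ac[at]')

printf '%s\n' "$minus35_matches"
```

On this excerpt the command lists the two candidate -35 sequences, tttact and tttaca, one per line.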

When we ran this search, we noticed two instances of the pattern with only two base pairs between them. Because the -10 box must occur after the -35 box, we needed to see where the -10 box matches fell relative to the -35 box matches. On its own, grep was not enough for this, because by default it prints each matching line in full without marking where the matches fall. To locate both sequences relative to each other, we instead ran the following command, which uses sed to surround every match of either pattern with asterisks:

   cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/    ***&***    /g" | sed "s/[ct]at[at]at/    ***&***    /g"
   ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgt    ***tataat***   tgcggtcgcagagttggttacgctcattaccccgc
   tgccgataaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcgg
   agtaatgtgccgaacctgtttgttgcgatttagcgcgcaaatc    ***tttact***    ta    ***tttaca***    gaacttcgg    ***cattat***   
   cttgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcc
   taataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacg
   ggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcga
   agagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
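The same marking can probably be done without chaining two sed processes, since one sed call accepts several `-e` expressions and applies them in order. This is a sketch on a hypothetical short excerpt held in `seq`, with narrower spacing around the asterisks than in the command above.

```shell
# Hypothetical sample: a short excerpt of the sequence around the matches.
seq="gcgcgcaaatctttacttatttacagaacttcggcattatcttgccgg"

# First expression marks every -35 candidate, second marks every -10 candidate.
# & stands for the text that the pattern matched.
marked=$(printf '%s\n' "$seq" |
  sed -e 's/tt[gt]ac[at]/ ***&*** /g' \
      -e 's/[ct]at[at]at/ ***&*** /g')

printf '%s\n' "$marked"
# Prints: gcgcgcaaatc ***tttact*** ta ***tttaca*** gaacttcgg ***cattat*** cttgccgg
```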

At this point it became clear that the "real" -35 box was the first instance of its pattern (tttact) and that the "real" -10 box was the second instance of its pattern (cattat), that is, the first instance after the -35 box. In the output above, 17 base pairs separate these two matches, a spacing in the range commonly reported for bacterial promoters. We began by tagging the -35 box. To replace just the first instance of a pattern using sed, we found that we only needed to replace the "g" in sed "s///g" with "1", which tells sed to replace only the first match on each line. We found this information on the More Text Processing Features page. The resulting command was as follows:

   cat infA-E.coli-K12.txt | sed "s/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1"
   ttttcaccacaagaatgaatgttttcggcacatttctccccagagtgttataattgcggtcgcagagttggttacgctcattaccccgctgccgat
   aaggaatttttcgcgtcaggtaacgcccatcgtttatctcaccgctcccttatacgttgcgcttttggtgcggcttagccgtgtgttttcggagtaa
   tgtgccgaacctgtttgttgcgatttagcgcgcaaatc <minus35box>tttact</minus35box> tatttacagaacttcggcattatct
   tgccggttcaaattacggtagtgataccccagaggattagatggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcct
   aataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacg
   ggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctgattgttttaccgcctgatgggcg
   aagagaaagaacgagtaaaaggtcggtttaaccggcctttttattttat
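The same numeric flag could plausibly be used for the -10 box, which was identified as the second instance of its pattern. The sketch below is not a command from this journal: the `minus10box` tag name is assumed by analogy with `minus35box`, and `seq` is a hypothetical, abridged sample that keeps the first -10 candidate (tataat) and the promoter region but drops the bases in between.

```shell
# Hypothetical abridged sample: two non-adjacent pieces of the sequence joined together.
seq="agtgttataattgcggtcgcaaatctttacttatttacagaacttcggcattatcttgcc"

# /1 tags only the first -35 candidate on the line;
# /2 tags only the second -10 candidate on the line.
tagged=$(printf '%s\n' "$seq" |
  sed -e 's/tt[gt]ac[at]/ <minus35box>&<\/minus35box> /1' \
      -e 's/[ct]at[at]at/ <minus10box>&<\/minus10box> /2')

printf '%s\n' "$tagged"
```

Counting occurrences is fragile, because it only works when the number of earlier false matches is known in advance, as it is here from the asterisk-marked output.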


Other Links

User Page: Anindita Varshneya
Class Page: BIOL/CMSI 367: Biological Databases, Fall 2015
Group Page: GÉNialOMICS
