Rlegaspi Week 3

From LMU BioDB 2015

Revision as of 17:09, 18 September 2015

Individual Journal Assignment

Homework Partner

Kevin Wyllie

The Genetic Code, by Computer

Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.

For these exercises, two files are available in the Keck lab system for practice; of course, you can always make your own sequences up. The practice files are ~dondi/xmlpipedb/data/prokaryote.txt and ~dondi/xmlpipedb/data/infA-E.coli-K12.txt.

Complement of a Strand

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand. In other words, fill in the question marks:

   cat sequence_file | ?????

Sequence file: ~dondi/xmlpipedb/data/prokaryote.txt

tctactatatttcaataggtacgatggccaaagaagacaatattgaacttgaaacgttgcctaataccatgttccgcgtataacccagccgccagttccgctggcggcattttaac

Sequence of Piped Text Processing Commands:

cat prokaryote.txt | sed "y/atcg/tagc/"

Result of Text Processing Commands: The Complementary Strand of the Nucleotide Sequence from ~dondi/xmlpipedb/data/prokaryote.txt

agatgatataaagttatccatgctaccggtttcttctgttataacttgaactttgcaacggattatggtacaaggcgcatattgggtcggcggtcaaggcgaccgccgtaaaattg
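The answer above can be wrapped in a small reusable function. This is a minimal sketch: the function names complement and complement_tr are hypothetical, the upper-case pairs are an added assumption for sequence files that use capital letters, and tr is shown only as an equivalent of sed's y (transliterate) command. Both calls use the ten-base example agcggtatac.

```shell
#!/bin/sh
# Hypothetical helper: complement each base (a<->t, c<->g) with sed's y
# (transliterate) command. The upper-case pairs are an assumption, added
# in case a sequence file uses capital letters.
complement() {
  sed "y/atcgATCG/tagcTAGC/"
}

# Hypothetical alternative: tr maps each character in the first set to the
# character at the same position in the second set.
complement_tr() {
  tr 'atcgATCG' 'tagcTAGC'
}

echo agcggtatac | complement     # prints tcgccatatg
echo agcggtatac | complement_tr  # prints tcgccatatg
```

On the Keck lab system, cat ~dondi/xmlpipedb/data/prokaryote.txt | complement should print the same complementary strand shown above.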


Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. In other words, fill in the question marks:

   cat sequence_file | ?????

You should have 6 different sets of commands, one for each possible reading frame. For example, if sequence_file contains:

   agcggtatac

Then your text processing commands for 5’-3’ frame 1 should display:

   SGI

Your text processing commands for 5’-3’ frame 3 should display:

   RY

...and so on.

  • Hint 1: The 6 sets of commands are very similar to each other.
  • Hint 2: Under the ~dondi/xmlpipedb/data directory in the Keck lab, you will find a file called genetic-code.sed. To save you some typing, this file has already been prepared with the correct sequence of sed commands for converting any base triplet into the corresponding amino acid. For example, this line in that file:
    s/ugc/C/g
    ...corresponds to a uracil-guanine-cytosine codon translating to the amino acid cysteine (C). The trick is to figure out how to use this file to your advantage in the commands that you'll be forming.
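One way the hint could be put to use, sketched under assumptions: convert t to u, drop 0, 1, or 2 leading bases to pick the frame, split the rest into space-separated triplets, apply the codon substitutions, then strip the spaces and any leftover lower-case bases. Because the real genetic-code.sed is not reproduced here, the sketch writes a small hypothetical stand-in that covers only the codons in the agcggtatac example; on the Keck lab system you would point sed -f at ~dondi/xmlpipedb/data/genetic-code.sed instead, assuming its lines all follow the s/codon/X/g form shown above. The helper names translate and reverse_strand are hypothetical, and the reverse-strand frames reuse the complement step and reverse the line with awk.

```shell
#!/bin/sh
# Hypothetical stand-in for ~dondi/xmlpipedb/data/genetic-code.sed.
# It lists only the codons needed for the "agcggtatac" example.
codes=$(mktemp)
trap 'rm -f "$codes"' EXIT
cat > "$codes" <<'EOF'
s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g
s/cgc/R/g
s/acc/T/g
s/gcu/A/g
s/ccg/P/g
EOF

# translate OFFSET: read a DNA line on stdin and print the amino acids
# of the reading frame that starts OFFSET (0, 1 or 2) bases in.
translate() {
  sed "y/t/u/" |                # DNA -> RNA alphabet
    sed "s/^.\{$1\}//" |        # skip OFFSET leading bases
    sed "s/.../& /g" |          # split into space-separated triplets
    sed -f "$codes" |           # codon -> one-letter amino acid
    sed "s/ //g; s/[acgu]//g"   # drop spaces and untranslated bases (e.g. a trailing partial codon)
}

# reverse_strand: complement the bases, then reverse the line.
reverse_strand() {
  sed "y/atcg/tagc/" |
    awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
}

dna=agcggtatac
for offset in 0 1 2; do
  echo "5'-3' frame $((offset + 1)): $(echo "$dna" | translate "$offset")"
done
for offset in 0 1 2; do
  echo "3'-5' frame $((offset + 1)): $(echo "$dna" | reverse_strand | translate "$offset")"
done
# Expected output:
# 5'-3' frame 1: SGI
# 5'-3' frame 2: AVY
# 5'-3' frame 3: RY
# 3'-5' frame 1: VYR
# 3'-5' frame 2: YTA
# 3'-5' frame 3: IP
```

The forward frames 1 and 3 reproduce the SGI and RY examples given above; the remaining lines are simply what the stand-in codon table yields for the other frames, which can be compared against the ExPASy Translate Tool.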

Check Your Work

Fortunately, online tools are available for checking your work; we recommend the ExPASy Translate Tool, sponsored by the same people who run SwissProt. You’re free to use this tool to see if your text processing commands produce the same results.

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • Describe how you did this.
    • Based on where you find this occurrence, what kind of information does this pattern represent?
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
    • What information do you think this pattern represents?
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
    • What answer does grep + wc give you?
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
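The Match questions can be cross-checked with standard tools. This is a hedged sketch: it runs on tiny made-up inputs (go-sample.txt and chr-sample.fa are hypothetical stand-ins, not the course files), uses only grep, sort, uniq and wc, and does not reproduce the Match utility's own output or options. It shows one way to tally unique matches, one way to view a match in context, and the difference between counting matching lines and counting individual matches.

```shell
#!/bin/sh
# Hypothetical miniature inputs, written to a temporary directory; the real
# exercise files live under ~dondi/xmlpipedb/data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/go-sample.txt" <<'EOF'
id GO:0008150
id GO:0005737
id none
id GO:0005634 id GO:0005737
id GO:0006412
EOF

cat > "$dir/chr-sample.fa" <<'EOF'
>sample
ATGATGCC
CCATGTTA
GGGCCCTT
EOF

# 1. Tally each distinct match: -o prints every match on its own line,
#    sort groups identical matches, uniq -c counts each group.
grep -o 'GO:000[567]' "$dir/go-sample.txt" | sort | uniq -c

# 2. Show the first match with its line number and one line of context
#    on each side (grep options -n, -m, -B, -A).
grep -n -m 1 -B 1 -A 1 'GO:000[567]' "$dir/go-sample.txt"

# 3. grep + wc -l counts matching LINES; grep -o + wc -l counts MATCHES.
#    $(( )) strips the leading spaces some wc versions print.
lines=$(( $(grep ATG "$dir/chr-sample.fa" | wc -l) ))
occurrences=$(( $(grep -o ATG "$dir/chr-sample.fa" | wc -l) ))
echo "lines containing ATG: $lines"        # prints 2
echo "ATG occurrences:      $occurrences"  # prints 3
```

Because grep works one line at a time, neither count includes an ATG that is split across two lines of a FASTA file.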

Electronic Lab Notebook

  1. To complete the first part of The Genetic Code, by Computer (Complement of a Strand), I first had to connect to the my.cs.lmu.edu workstation from my MacBook Pro laptop. Thankfully, my homework partner Kevin told me in class that we need to be connected to the LMU network [i.e., Student(Secure)], which saved a lot of frustration because I was planning to work from home. This requirement makes sense because the computer that hosts my.cs.lmu.edu is on campus. Turning to the assignment itself, I found it useful to write my thoughts out on paper before using my Terminal app and the my.cs.lmu.edu workstation. I chose the shorter of the two practice files. To form the complementary strand of the DNA sequence, every nucleotide had to be changed into its complement (a to t, t to a, c to g, g to c). From Dondi's Tuesday lecture, I remembered the command that replaces characters with desired characters: sed "y/old/new/", which maps each character in old to the character at the same position in new. The piped text processing commands cat prokaryote.txt | sed "y/atcg/tagc/" produced the expected output, the complementary strand.

Links to User Page and Journal Pages

Ron Legaspi
BIOL 367, Fall 2015

Assignment Links
Individual Weekly Journals
Shared Weekly Journals