Nanguiano Week 3

From LMU BioDB 2015
Revision as of 22:53, 15 September 2015 by Nanguiano (Talk | contribs) (XMLPipeDB Match Practice: answered most of question 4)


The Genetic Code, by Computer

Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.

To prepare for the assignment, I ran the following commands.

ssh my.cs.lmu.edu -l nanguia1 
mkdir biodb
cat >"sequence_file.txt" 
agcggtatac 
cd biodb
mkdir week3
mv ~/sequence_file.txt week3
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~nanguia1/biodb/week3
cp xmlpipedb-match-1.1.1.jar ~nanguia1/biodb/week3
cp 493.P_falciparum.xml ~nanguia1/biodb/week3
cp hs_ref_GRCh37_chr19.fa ~nanguia1/biodb/week3
cd ~nanguia1/biodb/week3

Complement of a Strand

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.

On a sequence_file.txt file containing the sequence "agcggtatac", the command and output were as follows:

cat sequence_file.txt | sed "y/atgc/tacg/"
tcgccatatg
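
As a small aside, the same substitution can be wrapped in a shell function so it can be reused in later pipelines. This is only a sketch: the function name complement and the extra uppercase mapping are additions for illustration and were not part of the command I ran.

```shell
# Hypothetical helper: complement each base read from standard input.
# The uppercase half of the mapping is an addition for illustration.
complement() {
    sed "y/atgcATGC/tacgTACG/"
}

echo "agcggtatac" | complement   # prints tcgccatatg
```

Because y maps characters one-for-one, the strand keeps its original order. Reversing it (as the negative reading frames require) is a separate step.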

Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. You should have 6 different sets of commands, one for each possible reading frame.

On a sequence_file.txt containing the sequence "agcggtatac", the commands and outputs were as follows:

+1

cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
SGI

+2

cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
AVY

+3

cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
RY

The remaining three commands are each split across two lines on this wiki because a single line caused display problems. Each command was actually entered on one line, without the newline.

-1

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
VYR

-2

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
YTA

-3

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
IP
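
The six pipelines above differ only in whether the strand is first complemented and reversed, and in how many leading bases are dropped. The sketch below folds that into one function so the pattern is easier to see. It makes several assumptions: translate_frame and CODONS are hypothetical names, cut replaces the sed "s/^.//g" step for dropping leading bases, and CODONS is a partial stand-in for genetic-code.sed that covers only the codons occurring in "agcggtatac" (the real file was not reproduced here).

```shell
# Partial codon table: a hypothetical stand-in for genetic-code.sed.
# It covers only the codons that occur in the sample sequence.
CODONS='s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g
s/cgc/R/g
s/acc/T/g
s/gcu/A/g
s/ccg/P/g'

# translate_frame SEQUENCE FRAME, where FRAME is one of +1 +2 +3 -1 -2 -3
translate_frame() {
    dna=$1
    frame=$2
    # Negative frames read the reverse complement of the strand.
    case $frame in
        -*) dna=$(printf '%s\n' "$dna" | sed "y/acgt/tgca/" | rev) ;;
    esac
    # Frame 1 skips no bases, frame 2 skips one, frame 3 skips two.
    skip=$(( ${frame#[+-]} - 1 ))
    printf '%s\n' "$dna" |
        cut -c "$((skip + 1))-" |
        sed "s/.../& /g" | sed "s/t/u/g" |
        sed "$CODONS" |
        sed "s/ //g" | sed "s/[acgu]//g"
}

for frame in +1 +2 +3 -1 -2 -3; do
    printf '%s %s\n' "$frame" "$(translate_frame agcggtatac "$frame")"
done
# +1 SGI
# +2 AVY
# +3 RY
# -1 VYR
# -2 YTA
# -3 IP
```

With the full genetic-code.sed available, the sed "$CODONS" step could be swapped back to sed -f genetic-code.sed so that any sequence can be translated.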

Check Your Work

Using the ExPASy Translate Tool, I entered my sample DNA sequence, "agcggtatac". The result was as follows:

NAW3TranslationTest.png

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

Note: I used this wiki page to learn about the match utility.

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml
    • How many unique matches are there?
      • 3
    • How many times does each unique match appear?
      • GO:0007 : 113
      • GO:0006 : 1100
      • GO:0005 : 1371
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • One example was: <dbReference type="GO" id="GO:0005622">
    • Describe how you did this.
      • grep "GO:000[567]" 493.P_falciparum.xml | more
    • Based on where you find this occurrence, what kind of information does this pattern represent?
      • Based on where I found it, this pattern shows the gene ontology ID of a particular gene in the database.
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
    • How many unique matches are there?
      • 3
    • How many times does each unique match appear?
      • "Yu b." : 1
      • "Yu k." : 228
      • "Yu m." : 1
    • What information do you think this pattern represents?
      • I believe this pattern represents a name.
      • This was confirmed by running the command grep "Yu.*" 493.P_falciparum.xml
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
      • java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
      • Total unique matches: 1
      • Number of matches: 830101
    • What answer does grep + wc give you?
      • grep "ATG" hs_ref_GRCh37_chr19.fa | wc
      • Lines: 502410
      • Words: 502410
      • Characters: 35671048
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
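
A small experiment can show what the grep + wc approach is counting. The sketch below uses a made-up three-line file (not the chromosome 19 data), and the variable names are illustrative. grep prints each line that contains at least one ATG, so wc counts matching lines. Adding -o makes grep print every occurrence on its own line, which appears to be closer to what Match reports as "Number of matches".

```shell
# Made-up sample: the first line contains ATG twice, the second not at all.
sample=$(mktemp)
printf 'ATGATGCC\nCCGGTT\nTTATGAA\n' > "$sample"

# Lines containing at least one ATG (what grep "ATG" file | wc counts).
lines_with_match=$(grep "ATG" "$sample" | wc -l)

# Every occurrence of ATG, one per output line thanks to -o.
total_matches=$(grep -o "ATG" "$sample" | wc -l)

echo "lines: $lines_with_match, occurrences: $total_matches"
rm -f "$sample"
```

On this sample the first count is 2 and the second is 3, because the line with two occurrences is counted only once by the line-based approach. Neither grep variant sees an ATG that is split across a line break.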

Links

Nicole Anguiano
BIOL 367, Fall 2015