Difference between revisions of "Nanguiano Week 3"

From LMU BioDB 2015
(Reading Frames: clarified the divided lines)
(added more commands, and added part 2 of the assignment)

Revision as of 22:26, 15 September 2015

The Genetic Code, by Computer

Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.

To prepare for the assignment, I ran the following commands.

ssh my.cs.lmu.edu -l nanguia1 
mkdir biodb
cat >"sequence_file.txt" 
agcggtatac 
cd biodb 
mkdir week3
mv ~/sequence_file.txt week3
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~nanguia1/biodb/week3
cp xmlpipedb-match-1.1.1.jar ~nanguia1/biodb/week3
cp 493.P_falciparum.xml ~nanguia1/biodb/week3
cp hs_ref_GRCh37_chr19.fa ~nanguia1/biodb/week3
cd ~nanguia1/biodb/week3
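
The directory and sample-file steps above can also be done without interactive input. The following is a minimal sketch, not the commands actually run for this journal; WORKDIR is a hypothetical stand-in for ~/biodb/week3 so that the sketch can be tried anywhere without touching a home directory.

```shell
# Sketch: build the working directory and the sample file non-interactively.
# WORKDIR is a hypothetical stand-in for ~/biodb/week3.
WORKDIR="$(mktemp -d)/biodb/week3"

# -p creates biodb and week3 in one step and does not fail if they exist.
mkdir -p "$WORKDIR"

# printf writes the sequence directly, so no Ctrl-D is needed to end the input.
printf 'agcggtatac\n' > "$WORKDIR/sequence_file.txt"

cat "$WORKDIR/sequence_file.txt"
```

Writing the file straight into its final location avoids the separate mv step.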

Complement of a Strand

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.

On a sequence_file.txt file containing the sequence "agcggtatac", the command and output were as follows:

cat sequence_file.txt | sed "y/atgc/tacg/"
tcgccatatg
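
The same substitution can be wrapped in a reusable function. This is a sketch under assumptions: the function name complement is hypothetical, and tr is swapped in for sed's y command (both map characters one-to-one). It also accepts uppercase bases, which the original command does not.

```shell
# complement: read a nucleotide sequence on stdin, write its complementary strand.
# Lowercase and uppercase bases are both mapped; any other character passes through.
complement() {
  tr 'acgtACGT' 'tgcaTGCA'
}

printf 'agcggtatac\n' | complement
```

For the sample sequence this prints tcgccatatg, matching the output above.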

Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, return the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. You should have 6 different sets of commands, one for each possible reading frame.

On a sequence_file.txt containing the sequence "agcggtatac", the commands and outputs were as follows:

+1

cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
SGI

+2

cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
AVY

+3

cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
RY

The remaining three commands are each split across two lines on this wiki because they did not fit on one line without breaking the page layout. Each was entered as a single line, without the newline.

-1

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
VYR

-2

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
YTA

-3

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
IP
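
The six pipelines differ only in whether the strand is reverse-complemented first and in how many leading bases are dropped, so they can be generated in a loop. The sketch below is one possible way to do that, not the method used for this assignment. The translate function, the CODE_SED file, and the variable names are hypothetical. CODE_SED is a small stand-in for genetic-code.sed that covers only the twelve codons occurring in the sample sequence; its rule format is an assumption, and the real file may be organized differently.

```shell
# Hypothetical stand-in for genetic-code.sed: only the codons found in the
# six frames of "agcggtatac". Any codon not listed here is silently dropped
# by the final cleanup step, so use the full course file for other sequences.
CODE_SED="$(mktemp)"
cat > "$CODE_SED" <<'EOF'
s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g
s/cgc/R/g
s/acc/T/g
s/gcu/A/g
s/ccg/P/g
EOF

# translate SEQ OFFSET: drop OFFSET leading bases, split into codons,
# convert t to u, translate, then remove spaces and leftover bases.
translate() {
  printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))-" \
    | sed "s/.../& /g" | sed "s/t/u/g" | sed -f "$CODE_SED" \
    | sed "s/ //g" | sed "s/[acgu]//g"
}

seq_in="agcggtatac"
# Reverse complement, used for the three negative frames.
revcomp="$(printf '%s\n' "$seq_in" | sed "y/acgt/tgca/" | rev)"

for offset in 0 1 2; do
  echo "+$(( offset + 1 )) $(translate "$seq_in" "$offset")"
done
for offset in 0 1 2; do
  echo "-$(( offset + 1 )) $(translate "$revcomp" "$offset")"
done
```

Here cut -c replaces the sed "s/^.//g" and sed "s/^..//g" steps so that the number of dropped bases can be passed as a parameter.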

Check Your Work

Using the ExPASy Translate Tool, I entered my sample DNA sequence, "agcggtatac". The result was as follows:

[Image: NAW3TranslationTest.png]

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • Describe how you did this.
    • Based on where you find this occurrence, what kind of information does this pattern represent?
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
    • What information do you think this pattern represents?
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
    • What answer does grep + wc give you?
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
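
As background for the hint in question 4, the sketch below shows on a tiny made-up sample (not the chromosome 19 file) that grep piped to wc -l counts lines containing the pattern, while grep -o piped to wc -l counts individual occurrences. The file contents and variable names are hypothetical, and nothing here is a claim about how Match itself counts.

```shell
# Hypothetical three-line sample; the real FASTA file is far larger.
SAMPLE="$(mktemp)"
printf 'ATGCCATGA\nGGGCCC\nTTATGAAAT\n' > "$SAMPLE"

# Number of lines that contain ATG at least once.
lines_with_match="$(grep "ATG" "$SAMPLE" | wc -l | tr -d ' ')"

# Number of individual ATG occurrences: -o prints each match on its own line.
total_matches="$(grep -o "ATG" "$SAMPLE" | wc -l | tr -d ' ')"

echo "lines: $lines_with_match, occurrences: $total_matches"
```

The first line of the sample contains ATG twice, so the two counts differ. Because grep works line by line, neither count includes an ATG that is split across a line break.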


Links

Nicole Anguiano
BIOL 367, Fall 2015
