Difference between revisions of "Vpachec3 Week 3"

From LMU BioDB 2015
(Added answer to -1)
(Added to number 5)
 
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
==The Genetic Code, by Computer==
 
==The Genetic Code, by Computer==
  
   cat prokaryote.txt | see "y/atcg/tagc/"
+
   cat prokaryote.txt | sed "y/atcg/tagc/"
  
 
==Reading Frames==
 
==Reading Frames==
Line 26: Line 26:
  
 
===-2===
 
===-2===
 +
  cat prokaryote.txt | sed "y/tagc/aucg" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed
 +
 +
L K C R Q R N W R L G Y T  R N M V L G N V S S S I L S S L A I V P I E I - -
 +
 
===-3===
 
===-3===
  . cat sequence_file | ?????
 
You should have 6 different sets of commands, one for each possible reading frame. For example, if sequence_file contains:
 
  
  agcggtatac
+
cat prokaryote.txt | sed "y/tagc/aucg/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed
Then your text processing commands for 5’-3’ frame 1 should display:
+
 
 +
N A A S G T G G W V I R G T W Y - A T F Q V Q Y C L L W P S Y L L K Y S R
 +
 
 +
 
 +
==XMLPipeDB Match Practice==
 +
 
 +
''What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?''
 +
'''How many unique matches are there?'''
 +
There were 3 unique matches.
 +
 
 +
'''How many times does each unique match appear?'''
 +
go:0007: 113
 +
go:0006: 1100
 +
go:0005: 1371
 +
 
 +
''Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.''
 +
 
 +
Example:<dbReference type="GO" id="GO:0005622">
 +
 
 +
grep “GO:000[567]” 493.P_falciparum.xml | more
 +
 
 +
Based on where you find this occurrence, what kind of information does this pattern represent?
 +
ontology ID of a gene.
 +
 
 +
'''What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?'''
 +
java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
 +
 
 +
'''How many unique matches are there?'''
 +
3
 +
'''How many times does each unique match appear?'''
 +
"yu b.": 1
 +
"yu k.": 228
 +
"yu m.": 1
 +
 
 +
'''What information do you think this pattern represents?'''
 +
I think that Yu is a last name and the letters following the the first letter of the first name,
 +
 
 +
Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
 +
 
 +
What answer does Match give you?
 +
java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
 +
 
 +
atg: 830101
 +
 
 +
Total unique matches: 1
  
  SGI
+
'''What answer does grep + wc give you?'''
Your text processing commands for 5’-3’ frame 3 should display:
+
502410  502410 35671048
  
  RY
+
'''Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)'''
...and so on.
+
  
Hint 1: The 6 sets of commands are very similar to each other.
+
I think they are counting the lines, words and characters. I recall it going over was wc meant in class and going over how to get the different counts.
Hint 2: Under the ~dondi/xmlpipedb/data directory in the Keck lab, you will find a file called genetic-code.sed. To save you some typing, this file has already been prepared with the correct sequence of sed commands for converting any base triplets into the corresponding amino acid. For example, this line in that file:
+
s/ugc/C/g
+
...corresponds to a uracil-guanine-cytosine sequence transcribing to the cysteine amino acid (C). The trick is to figure out how to use this file to your advantage, in the commands that you'll be forming.
+
  
 +
==Electronic Lab Book==
 +
# Go to the magnifying glass symbol at the top of the computer screen and type in ' Terminal'
 +
# Click on Terminal and type in: ssh my dot cs dot lmu dot edu and click 'Enter'
 +
# Type in password and press enter
 +
# ( I personally took a while to figure this out) Then type: cd ~dondi/xmlpipedb/data - this gets you into the directory
 +
# For each section of the assignment, there were different files to be accessed. See individual questions above.
  
 
==Links==
 
==Links==

Latest revision as of 07:10, 22 September 2015

The Genetic Code, by Computer

  cat prokaryote.txt | sed "y/atcg/tagc/"
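As a quick check of what the `y` command does here, the same transliteration can be run on a short made-up sequence instead of prokaryote.txt. This is only a sketch: the function name `complement` and the sample sequence are chosen for illustration.

```shell
# complement: swap each base for its pairing partner (a<->t, c<->g).
complement() {
  sed "y/atcg/tagc/"
}

echo "agcggtatac" | complement   # prints tcgccatatg
```

Each character in the first list is replaced by the character at the same position in the second list, so the output has the same length as the input.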

Reading Frames

+1

cat prokaryote.txt | sed "s/..$//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed 
 S T I F Q - V R W P K K T I L N L K R C L I P C S A Y N P A A S S A G G I L

+2

cat prokaryote.txt | sed "s/^.//g" | sed "s/.$//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed
L L Y F N R Y D G Q R R Q Y - T - N V A - Y H V P R I T Q  P P V P L A  A  F -

+3

cat prokaryote.txt | sed "s/^..//g" | sed "y/t/u/" | sed "s/.../& /g" | sed -f genetic-code.sed
Y Y I S I G T M A K E D N I E L E T L P N  T M F R V - P S R Q F R W R H F N
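The three forward frames differ only in how many leading bases are skipped, so they can be sketched as one small function. This is a minimal sketch under stated assumptions: `translate_frame` and `toy_code` are hypothetical names, and `toy_code` is a tiny stand-in for genetic-code.sed that covers only the codons in the assignment's sample sequence agcggtatac.

```shell
# Hypothetical stand-in for genetic-code.sed: only the codons the sample needs.
toy_code='s/agc/S/g
s/ggu/G/g
s/aua/I/g
s/gcg/A/g
s/gua/V/g
s/uac/Y/g
s/cgg/R/g
s/uau/Y/g'

# translate_frame N: skip N leading bases (0, 1, or 2), then translate stdin.
translate_frame() {
  sed -E "s/^.{$1}//" |          # drop the bases before the frame starts
    sed "y/t/u/" |               # DNA to mRNA
    sed "s/.../& /g" |           # split into codons
    sed -E "s/ [acgu]{0,2}$//" | # drop the trailing space or incomplete codon
    sed "$toy_code"              # codon to amino acid
}

for skip in 0 1 2; do
  echo "agcggtatac" | translate_frame "$skip"
done
```

For the sample sequence this prints S G I, A V Y, and R Y, which agrees with the SGI and RY that the assignment text gives for frames 1 and 3. With the real file, the last step would be `sed -f genetic-code.sed`, as in the commands above.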

-1

cat prokaryote.txt | sed "y/tagc/aucg/" | rev | sed "s/.../& /g" | sed "s/..$//g" | sed -f genetic-code.sed
V K M P P A E L A A G L Y A E H G I R Q R F K F N I V F F G H R T Y - N I V

-2

cat prokaryote.txt | sed "y/tagc/aucg/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed -f genetic-code.sed
L K C R Q R N W R L G Y T  R N M V L G N V S S S I L S S L A I V P I E I - -

-3

cat prokaryote.txt | sed "y/tagc/aucg/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed -f genetic-code.sed
N A A S G T G G W V I R G T W Y - A T F Q V Q Y C L L W P S Y L L K Y S R
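The reverse frames all start from the same two steps: complement the strand (here combined with the t-to-u conversion) and then reverse it. A small sketch on the assignment's sample sequence, stopping before translation so that it does not depend on genetic-code.sed. The function name `reverse_strand_codons` is made up for illustration.

```shell
# reverse_strand_codons: complement and transcribe in one step (t->a, a->u,
# g->c, c->g), reverse the line, then group the bases into codons.
reverse_strand_codons() {
  sed "y/tagc/aucg/" | rev | sed "s/.../& /g"
}

echo "agcggtatac" | reverse_strand_codons   # prints: gua uac cgc u
```

Dropping one or two leading bases after `rev` shifts the grouping to give the other two reverse frames.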


XMLPipeDB Match Practice

What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?

java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml

How many unique matches are there?

There were 3 unique matches.

How many times does each unique match appear?

go:0007: 113
go:0006: 1100
go:0005: 1371
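A similar tally can be approximated with standard tools, which is a handy cross-check on the Match output. This is a sketch only: `tally_matches` is a hypothetical helper, the sample lines are made up (apart from GO:0005622, which appears in the file), and it is not a description of how Match works internally.

```shell
# Made-up sample lines standing in for 493.P_falciparum.xml.
sample='<dbReference type="GO" id="GO:0005622">
<dbReference type="GO" id="GO:0006412">
<dbReference type="GO" id="GO:0005737">
<dbReference type="GO" id="GO:0016020">'

# tally_matches REGEX: print each distinct match on stdin with its count.
tally_matches() {
  grep -o "$1" | sort | uniq -c | awk '{print $2 ": " $1}'
}

printf '%s\n' "$sample" | tally_matches "GO:000[567]"
```

On the sample this prints "GO:0005: 2" and "GO:0006: 1". Unlike the Match output above, the matches keep their original case.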

Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.

Example: <dbReference type="GO" id="GO:0005622">

grep "GO:000[567]" 493.P_falciparum.xml | more

Based on where you find this occurrence, what kind of information does this pattern represent?

It represents the Gene Ontology (GO) ID associated with a gene.

What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?

java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml

How many unique matches are there?

3

How many times does each unique match appear?

"yu b.": 1
"yu k.": 228
"yu m.": 1

What information do you think this pattern represents?

I think that Yu is a last name and the letter that follows is the first initial of the first name.

Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.

What answer does Match give you?

java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa

atg: 830101

Total unique matches: 1

What answer does grep + wc give you?

502410  502410 35671048

Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)

I think wc is counting the lines, words, and characters. I recall going over what wc meant in class, along with how to get the different counts.
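One way to see why the two numbers can differ is to compare lines against occurrences on a few made-up lines. grep prints each matching line once, so the first number from wc is the count of lines containing ATG, while a per-occurrence count (presumably what Match reports) goes up every time ATG appears. The sample text and variable names below are illustrative only.

```shell
# Made-up sample standing in for hs_ref_GRCh37_chr19.fa.
sample='ATGCCATGAA
CCGGTTAACC
TTATGATGATG'

# Lines that contain ATG at least once (what grep piped into wc -l counts).
matching_lines=$(printf '%s\n' "$sample" | grep "ATG" | wc -l | tr -d ' ')

# Every individual occurrence of ATG (grep -o prints one match per line).
occurrences=$(printf '%s\n' "$sample" | grep -o "ATG" | wc -l | tr -d ' ')

echo "lines: $matching_lines, occurrences: $occurrences"
```

Here two lines contain ATG, but ATG occurs five times in total, so the occurrence count is the larger of the two.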

Electronic Lab Book

  1. Click the magnifying glass symbol at the top of the computer screen and type in 'Terminal'.
  2. Click on Terminal, type in: ssh my dot cs dot lmu dot edu, and press Enter.
  3. Type in your password and press Enter.
  4. (I personally took a while to figure this out.) Then type: cd ~dondi/xmlpipedb/data, which gets you into the directory.
  5. Each section of the assignment used different files. See the individual questions above.

Links

Vpachec3 User Page