Difference between revisions of "Kzebrows Week 3"

From LMU BioDB 2015

Revision as of 01:39, 22 September 2015

Complement of a Strand

I decided to use the E. coli file for practice in the first part of the assignment. My first instinct was to use sed substitutions to replace the letters in the sequence: A with T, T with A, C with G, and G with C. However, I realized that chained substitutions run one after another, so every letter changed by one substitution would be changed back by a later one (the a's turned into t's would become a's again), defeating the purpose of the command. I then remembered that I can use sed "y/<original characters>/<new characters>/" to replace everything in one go.

I opened the prokaryote file using cat infA-E.coli-K12.txt, which printed the DNA sequence of the mRNA-like strand, read from 5’ to 3’. To create the complementary strand, I piped that output into the sed rule listing the replacements I wanted. The complete command was:

cat infA-E.coli-K12.txt | sed "y/atcg/tagc/"
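
To make the difference between chained substitutions and a single transliteration concrete, here is a minimal sketch. The sequence "atgcgtta" is a made-up sample, not the contents of infA-E.coli-K12.txt, and the function names are only for illustration.

```shell
# Chained substitutions: the second pass undoes the first,
# so every a and t ends up as a.
naive_complement() {
  sed "s/a/t/g" | sed "s/t/a/g"
}

# Transliteration: each character is mapped exactly once.
complement() {
  sed "y/atcg/tagc/"
}

sample="atgcgtta"   # hypothetical sample sequence

printf '%s\n' "$sample" | naive_complement   # prints aagcgaaa
printf '%s\n' "$sample" | complement         # prints tacgcaat
```

Note that the result is the complement written in the same left-to-right order as the input, not the reverse complement.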

Reading Frames

First I opened the file and replaced all of the T's with U's using

sed "s/t/u/g" 

which transcribed the DNA sequence into mRNA. This still gave me one long string of letters, so I used

sed "s/.../& /g" 

to indicate that I wanted a space every three letters, separating the sequence into codons.

Then, from looking at the file genetic-code.sed, which lists each codon with the letter of the corresponding amino acid, I knew that this file had to be passed to sed with the -f option so that its rules would be applied to the sequence from infA-E.coli-K12.txt. The final string of commands for the +1 reading frame then looks like this:

cat infA-E.coli-K12.txt | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed
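
The pipeline above can be tried on a short sequence as a rough sketch. The rule file below is a hypothetical stand-in for genetic-code.sed that covers only the four codons in the made-up sample; the rule format and the "-" used for the stop codon are assumptions, not copied from the real file.

```shell
# Hypothetical stand-in for genetic-code.sed (only the codons used below).
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/aug/M/g
s/gcc/A/g
s/aaa/K/g
s/uaa/-/g
EOF

# +1 frame: transcribe t to u, split into codons, then apply the codon rules.
translate_frame1() {
  sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules"
}

# Made-up 12-nucleotide sample; prints "M A K - " (with a trailing space).
printf 'atggccaaataa\n' | translate_frame1
```

Splitting into codons before applying the rules matters: the spaces keep a rule from matching three letters that straddle two neighboring codons.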

For the +2 reading frame I needed to figure out how to delete the first nucleotide in the sequence. I did this by adding the command

sed "s/^.//g" 

This command has to come right after the file is opened. Each period (.) after the caret (^) stands for one character to delete, counting from the beginning of the line. I also realized that there would be a nucleotide or two left over at the end, so I needed to remove them so that only the codons translated into amino acids would show. This is done with this command:

sed "s/[acug]//g"

I then proceeded with the rest of the commands, so the full list of commands looked like this:

cat infA-E.coli-K12.txt | sed "s/^.//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"

To get the +3 reading frame I added one more period (.) after the caret, so the command was

sed "s/^..//g"

indicating that I wanted to delete the first two characters of the line. The entire command was:

cat infA-E.coli-K12.txt | sed "s/^..//g" | sed "s/t/u/g" | sed "s/.../& /g" | sed -f genetic-code.sed | sed "s/[acug]//g"
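
As a sketch of how the +2 and +3 frames behave, the same steps can be run on a short made-up sequence. The rule file is again a hypothetical stand-in for genetic-code.sed, limited to the codons that occur in this sample, and the function names are illustrative only.

```shell
# Hypothetical stand-in for genetic-code.sed (only the codons used below).
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/ugg/W/g
s/cca/P/g
s/aau/N/g
s/ggc/G/g
s/caa/Q/g
s/aua/I/g
EOF

# Shared tail of the pipeline: transcribe, split into codons, translate,
# then delete the leftover lowercase nucleotides.
translate_rest() {
  sed "s/t/u/g" | sed "s/.../& /g" | sed -f "$rules" | sed "s/[acug]//g"
}

frame2() { sed "s/^.//"  | translate_rest; }   # drop one leading nucleotide
frame3() { sed "s/^..//" | translate_rest; }   # drop two leading nucleotides

sample="atggccaaataa"   # made-up sample sequence

printf '%s\n' "$sample" | frame2   # prints "W P N " (trailing space)
printf '%s\n' "$sample" | frame3   # prints "G Q I " (trailing space)
```

The final cleanup step works because the amino acid letters are uppercase while the untranslated nucleotides are lowercase, and sed matches case-sensitively by default. The g flag is omitted on the deletion step here because a pattern anchored with ^ can match only once per line.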

XML PipeDB Match Utility Practice

  1. What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file? java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml
    • How many unique matches are there? 3
    • How many times does each unique match appear? The first appears 113 times, the second appears 1,100 times, and the third appears 1,371 times.
  2. Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
    • Describe how you did this.
    • Based on where you find this occurrence, what kind of information does this pattern represent?
  3. What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
    • How many unique matches are there?
    • How many times does each unique match appear?
    • What information do you think this pattern represents?
  4. Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
    • What answer does Match give you?
    • What answer does grep + wc give you?
    • Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
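
For the last question, it may help to check exactly what grep and wc count on a tiny made-up file before running them on the chromosome file. The sketch below makes no claim about how Match counts; it only shows that grep piped into wc -l counts matching lines, while grep -o piped into wc -l counts individual matches. The file contents and function names are hypothetical.

```shell
# Tiny made-up FASTA-like body: three lines, three occurrences of ATG.
sample=$(mktemp)
printf 'ATGATG\nCCATGC\nTTTTTT\n' > "$sample"

# Number of LINES containing at least one ATG.
count_matching_lines() {
  grep "ATG" "$1" | wc -l
}

# Number of individual ATG occurrences (grep -o prints each match on its own line).
count_occurrences() {
  grep -o "ATG" "$1" | wc -l
}

count_matching_lines "$sample"   # 2
count_occurrences "$sample"      # 3
```

Neither count includes an ATG that is split across a line break, since grep examines one line at a time.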