Nanguiano Week 3

The Genetic Code, by Computer

Connect to the my.cs.lmu.edu workstation as shown in class and do the following exercises from there.

For this exercise, I performed the following series of commands to prepare for the assignment.

ssh my.cs.lmu.edu -l nanguia1 
mkdir biodb
cat >"sequence_file.txt" 
agcggtatac 
cd biodb 
mkdir week3
mv ~/sequence_file.txt week3
cd ~dondi/xmlpipedb/data
cp genetic-code.sed ~nanguia1/biodb/week3
cp xmlpipedb-match-1.1.1.jar ~nanguia1/biodb/week3
cp 493.P_falciparum.xml ~nanguia1/biodb/week3
cp hs_ref_GRCh37_chr19.fa ~nanguia1/biodb/week3
cd ~nanguia1/biodb/week3

Complement of a Strand

Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.

On a sequence_file.txt file containing the sequence "agcggtatac", the command and output were as follows:

cat sequence_file.txt | sed "y/atgc/tacg/"
tcgccatatg
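As a small sketch of the same idea, the complement step can be wrapped in a reusable shell function. The function names complement and reverse_complement are hypothetical, and tr is swapped in for the sed "y" command used above; both perform the same character-for-character mapping. The sketch assumes lowercase input and a system that provides rev.

```shell
# Hypothetical helper: read a lowercase DNA sequence on stdin
# and print its complementary strand (a<->t, c<->g).
complement() {
  tr 'acgt' 'tgca'
}

# Hypothetical helper: complement each base, then reverse the line,
# giving the opposite strand read in the 5' to 3' direction.
reverse_complement() {
  complement | rev
}

echo "agcggtatac" | complement
echo "agcggtatac" | reverse_complement
```

The first call prints tcgccatatg, matching the output above, and the second prints gtataccgct, the strand that the negative reading frames start from.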

Reading Frames

Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence. You should have 6 different sets of commands, one for each possible reading frame.

On a sequence_file.txt containing the sequence "agcggtatac", the commands and outputs were as follows:

+1

cat sequence_file.txt | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
SGI

+2

cat sequence_file.txt | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
AVY

+3

cat sequence_file.txt | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
RY

The remaining three commands are split across two lines on this wiki because they would not fit on one line without breaking the page layout. Each command was actually entered as a single line, without newlines.

-1

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
VYR

-2

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
YTA

-3

cat sequence_file.txt | sed "y/acgt/tgca/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "s/t/u/g" | sed -f genetic-code.sed | 
sed "s/ //g" | sed "s/[acgu]//g"
IP
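The six commands above differ only in which strand is read and how many leading bases are skipped. The sketch below parameterizes just that part. The function frame_codons is a hypothetical helper, not part of the assignment, and it stops before translation because the contents of genetic-code.sed are not shown on this page. It assumes lowercase input and drops any incomplete trailing codon.

```shell
# Hypothetical helper: print the RNA codons of a DNA sequence for one
# reading frame. Usage: frame_codons SEQUENCE FRAME
# FRAME is 1, 2, 3 (or +1, +2, +3) for the given strand,
# and -1, -2, -3 for the reverse complement.
frame_codons() {
  seq=$1
  frame=$2
  case $frame in
    -*) seq=$(printf '%s\n' "$seq" | sed 'y/acgt/tgca/' | rev) ;;
  esac
  # Strip the sign, then turn the frame number into a count of skipped bases.
  offset=$(( ${frame#[-+]} - 1 ))
  printf '%s\n' "$seq" |
    cut -c "$((offset + 1))-" |   # skip the leading bases
    sed 's/t/u/g' |               # DNA to RNA
    sed 's/.../& /g' |            # one codon per space-separated group
    sed 's/ [acgu]*$//'           # drop the incomplete trailing codon
}

for f in 1 2 3 -1 -2 -3; do
  frame_codons agcggtatac "$f"
done
```

Piping any one of these lines through sed -f genetic-code.sed and then removing the spaces should give the same amino acid sequences recorded above, assuming the sed file maps each lowercase RNA codon to its one-letter amino acid code.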

Check Your Work

Using the ExPASy Translate Tool, I entered my sample DNA sequence, "agcggtatac". The result was as follows:

XMLPipeDB Match Practice

For your convenience, the XMLPipeDB Match Utility (xmlpipedb-match-1.1.1.jar) has been installed in the ~dondi/xmlpipedb/data directory alongside the other practice files. Use this utility to answer the following questions:

Note: I used this wiki page to learn about the match utility.

What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
- How many unique matches are there?
  - 3
- How many times does each unique match appear?
  - GO:0007 : 113
  - GO:0006 : 1100
  - GO:0005 : 1371
Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- One example was: <dbReference type="GO" id="GO:0005622">
- Describe how you did this.
  - grep "GO:000[567]" 493.P_falciparum.xml | more
- Based on where you find this occurrence, what kind of information does this pattern represent?
  - Based on where I found it (inside a dbReference element of type "GO"), this pattern matches the beginning of a Gene Ontology (GO) term ID that an entry in the database is annotated with.
What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
- How many unique matches are there?
  - 3
- How many times does each unique match appear?
  - "Yu b." : 1
  - "Yu k." : 228
  - "Yu m." : 1
- What information do you think this pattern represents?
  - I believe this pattern represents a name.
  - This was confirmed by running the command grep "Yu.*" 493.P_falciparum.xml
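For comparison, a per-match tally like the ones Match reports can be approximated with standard tools. This is only a sketch: tally_matches is a hypothetical helper, the three input lines are made-up sample data rather than lines from 493.P_falciparum.xml, and grep -E is assumed to treat these simple patterns the same way Match does, which may not hold for every pattern.

```shell
# Hypothetical helper: read text on stdin and print each distinct match
# of the extended regular expression in $1 with its number of occurrences.
tally_matches() {
  grep -o -E "$1" |   # print every match on its own line
    sort |            # group identical matches together
    uniq -c |         # count each group
    sort -rn          # most frequent first
}

# Made-up sample input for illustration only.
printf '%s\n' \
  '<dbReference type="GO" id="GO:0005622">' \
  '<dbReference type="GO" id="GO:0006412">' \
  '<dbReference type="GO" id="GO:0005737">' |
  tally_matches 'GO:000[567]'
```

On the sample input this reports GO:0005 twice and GO:0006 once. To run it on the real file, redirect the file into the function, for example tally_matches 'GO:000[567]' < 493.P_falciparum.xml.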
Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- What answer does Match give you?
  - java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
  - Total unique matches: 1
  - Number of matches: 830101
- What answer does grep + wc give you?
  - grep "ATG" hs_ref_GRCh37_chr19.fa | wc
  - Lines: 502410
  - Words: 502410
  - Characters: 35671048
- Explain why the counts are different. (Hint: Make sure you understand what exactly is being counted by each approach.)
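One likely explanation, offered as an illustration rather than as the recorded answer: grep prints each line that contains the pattern once, no matter how many times the pattern occurs on that line, so piping it to wc counts matching lines. A tool that counts every occurrence will report a larger number whenever a line contains ATG more than once. The sketch below shows the difference on a made-up three-line sample; count_matching_lines and count_all_matches are hypothetical helper names.

```shell
# Hypothetical helper: count lines on stdin containing the pattern at least once.
count_matching_lines() {
  grep -c "$1"
}

# Hypothetical helper: count every non-overlapping occurrence of the pattern,
# using grep -o to print each match on its own line.
count_all_matches() {
  grep -o "$1" | wc -l
}

# Made-up sample: "ATG" occurs four times in total, on two of the three lines.
printf 'ATGATG\nCCCC\nATGCATG\n' | count_matching_lines ATG
printf 'ATGATG\nCCCC\nATGCATG\n' | count_all_matches ATG
```

The first count is 2 and the second is 4. Running grep -o "ATG" hs_ref_GRCh37_chr19.fa | wc -l on the chromosome file would therefore be a closer comparison to the Match total, although the two tools may still differ in details such as case sensitivity.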

Links

Nicole Anguiano
BIOL 367, Fall 2015

Assignment Links

Individual Journals

Shared Journals
