Kevin Wyllie Week 3

These pipes yield the following amino acid sequences (shown on right):
- +1 Nter- S T I F Q - V R W P K K T I L N L K R C L I P C S A Y N P A A S S A G G I L -Cter (shown in red)
- +2 Nter- L L Y F N R Y D G Q R R Q Y - T - N V A - Y H V P R I T Q P P V P L A A F -Cter (shown in green)
- +3 Nter- Y Y I S I G T M A K E D N I E L E T L P N T M F R V - P S R Q F R W R H F N -Cter (shown in blue)

For the -1 frame, open the file as usual, and then use the pipe from "Complement of a Strand" so that the commands after it will be applied to the complementary strand (instead of the original strand). Then, add the same pipe used for the +1 strand:

cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed

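As before, the -2 and -3 frames can be found by making a single adjustment to the pipe for the -1 frame: after reversing the strand, delete the first one or two bases before splitting the sequence into codons. For the -2 frame: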

cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/^.//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed

For the -3 frame:

cat prokaryote.txt | sed "y/atgc/tacg/" | rev | sed "s/^..//g" | sed "s/.../& /g" | sed "y/t/u/" | sed -f genetic-code.sed
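The effect of the frame shift can be checked on a toy sequence before running the full pipe. The sketch below is only an illustration: the function name minus_frame_codons and the sample sequence are made up, and cut is swapped in for the sed deletions above to drop the leading bases. It stops at the codon-splitting step, so genetic-code.sed is not needed.

```shell
# Hypothetical helper (not part of the assignment files).
# $1 = DNA sequence in lowercase, $2 = number of leading bases to skip
# (0, 1 or 2 for the -1, -2 and -3 frames).
minus_frame_codons() {
  printf '%s\n' "$1" \
    | sed "y/atgc/tacg/" \
    | rev \
    | cut -c "$(( $2 + 1 ))-" \
    | sed "s/.../& /g"
}

# Toy sequence, chosen only for illustration.
for skip in 0 1 2; do
  minus_frame_codons "atgcatgca" "$skip"
done
# Prints:
#   tgc atg cat
#   gca tgc at
#   cat gca t
```

Leftover bases at the end of a frame (one or two characters) are not a full codon, so the substitution leaves them ungrouped.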

These pipes yield the following amino acid sequences (shown on right):
- -1 Nter- V K M P P A E L A A G L Y A E H G I R Q R F K F N I V F F G H R T Y - N I V -Cter (shown in red)
- -2 Nter- L K C R Q R N W R L G Y T R N M V L G N V S S S I L S S L A I V P I E I - - -Cter (shown in green)
- -3 Nter- - N A A S G T G G W V I R G T W Y - A T F Q V Q Y C L L W P S Y L L K Y S R -Cter (shown in blue)

cat 493.P_falciparum.xml | java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]"

There are three unique matches (the maximum possible for this command).
- GO:0005 occurred 1,371 times.
- GO:0006 occurred 1,100 times.
- GO:0007 occurred 113 times.
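A similar tally of unique matches can be approximated with standard tools. This is a sketch under assumptions, not a description of how xmlpipedb-match works internally: the function name tally_matches is hypothetical, the sample lines are invented for the example, and grep -o (print only the matched text) is used to get one line per match.

```shell
# Hypothetical helper: read text on stdin, print each distinct match of the
# pattern in $1 followed by the number of times it occurred.
tally_matches() {
  grep -o "$1" | sort | uniq -c | awk '{print $2, $1}'
}

# Invented sample input, for illustration only.
printf 'id="GO:0005515"\nid="GO:0006412" id="GO:0005737"\nid="GO:0016020"\n' \
  | tally_matches "GO:000[567]"
# Prints:
#   GO:0005 2
#   GO:0006 1
```

Because the character class [567] allows exactly three endings, the tally can never contain more than three distinct entries, which is why three is the maximum for this command.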

grep "GO:0007" 493.P_falciparum.xml

Judging by the text on the lines where this pattern appears, it looks like the first few characters of a gene ID. It may also relate to gene ontology, since I have seen "GO" used as an abbreviation for that term before.

cat "493.P_falciparum.xml" | java -jar xmlpipedb-match-1.1.1.jar "\"Yu.*\""

There are three unique matches.
- "yu b." occurred one time.
- "yu k." occurred 228 times.
- "yu m." occurred one time.
A grep command for this pattern brings up lines such as:

<person name="Yu K."/>

So these may be names of biologists, perhaps those who were responsible for the discovery of a given gene.

To count occurrences of "ATG":
- The match function finds 830,101 matches in hs_ref_GRCh37_chr19.fa (shown on right, in green).
- Connecting grep to wc finds 502,410 lines, 502,410 words and 35,671,048 characters (shown on right, in red).
- This discrepancy arises because the two tools count different things. The Match function counts every occurrence of the pattern, while grep passes along the whole of each line in which the pattern is found. The numbers that wc returns therefore describe the lines that contain "ATG" (their line, word and character counts), not the "ATG" occurrences themselves, and a line containing several copies of "ATG" is still counted only once.
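The difference between counting lines and counting occurrences can be seen on a small made-up input. In this sketch the function names and the sample text are hypothetical, grep -c is used to count matching lines, and grep -o is swapped in as a stand-in for a per-occurrence count (it is not the Match program itself).

```shell
# Invented three-line sample: the first line holds "ATG" twice.
sample_lines() { printf 'ATGCCATGA\nCCCCCC\nTTATGTT\n'; }

# Number of lines that contain the pattern at least once.
count_matching_lines() { grep -c "$1"; }

# Number of individual occurrences (one output line per match).
count_occurrences() { grep -o "$1" | wc -l | tr -d ' '; }

sample_lines | count_matching_lines "ATG"   # prints 2
sample_lines | count_occurrences "ATG"      # prints 3
```

Neither count sees an "ATG" that is split across a line break, since grep works one line at a time.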

First, "ssh" into the server with the following command: ssh <username>@my.cs.lmu.edu
- The window will prompt you to enter your password. Type it in and press enter.
Move into Dondi's folder with: cd ~dondi/xmlpipedb/data
Open "prokaryote.txt" to view the DNA sequence it contains.
To obtain the sequence of the complementary strand, two operations must be applied to the original DNA sequence.
1. Each base of the original strand must be given its complement.
  - The command that corresponds to this step is: sed "y/atgc/tacg/" . This replaces all G's with C's, all T's with A's, and so on.
2. Since it is customary to express a nucleotide sequence in the 5' to 3' direction, the sequence must be reversed.
  - The command that corresponds to this step is: rev . Simply put, this reverses the sequence.
- Connecting these commands results in: cat prokaryote.txt | sed "y/atgc/tacg/" | rev
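The two steps can be tried on a short sequence to confirm they behave as described. This is a minimal sketch: the function name revcomp and the sample sequence are made up for the example, and it assumes lowercase input, as in the sed command above.

```shell
# Hypothetical helper: complement each base, then reverse the result.
revcomp() {
  printf '%s\n' "$1" | sed "y/atgc/tacg/" | rev
}

revcomp "aatgcc"              # prints ggcatt
revcomp "$(revcomp aatgcc)"   # prints aatgcc (applying it twice restores the input)
```

Running the operation twice and getting the original sequence back is a quick sanity check that neither step loses or mangles bases.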
