Blitvak Week 3
From LMU BioDB 2015
Revision as of 00:38, 21 September 2015 by Blitvak (minor edit to initial preparation text)
Individual Journal Assignment Week 3
Initial Preparations
- PuTTY was downloaded, installed, and initialized
- Connected to my.cs.lmu.edu workstation via PuTTY
- Entered ~dondi/xmlpipedb/data using
cd ~dondi/xmlpipedb/data
- All results were checked using the ExPASy Translate Tool and the Nucleic Acid Sequence Massager, provided by Attotron
- prokaryote.txt in ~dondi/xmlpipedb/data was examined using
cat prokaryote.txt
- prokaryote.txt was chosen for use in the first part of this assignment
- The sequence in prokaryote.txt was copied and pasted into a separate file for future reference and checking
Finding the Complementary Strand
sed "y/atcg/tagc/"
was found to replace all lowercase a's, t's, c's, and g's with t's, a's, g's, and c's, respectively, on every line of text
- Using prokaryote.txt, which contains the given nucleotide sequence, the complementary strand was found by using
cat prokaryote.txt | sed "y/atcg/tagc/"
- The given nucleotide sequence was:
tctactatatttcaataggtacgatggccaaagaagacaatattgaacttgaaacgttgcctaataccatgttccgcgtataacccagccgccagttccgctggcggcattttaac
- The complementary strand, found using
cat prokaryote.txt | sed "y/atcg/tagc/"
was:
agatgatataaagttatccatgctaccggtttcttctgttataacttgaactttgcaacggattatggtacaaggcgcatattgggtcggcggtcaaggcgaccgccgtaaaattg
- This result was confirmed by the Nucleic Acid Sequence Massager
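The complement step can be wrapped in a small reusable function. This is only a sketch: complement is a hypothetical helper name, and like the sed command above it only handles lowercase letters (uppercase letters pass through unchanged). Complementing twice should return the original sequence, which makes a quick sanity check.

```bash
#!/usr/bin/env bash
# complement: hypothetical helper wrapping the sed command used above.
# y/atcg/tagc/ maps a->t, t->a, c->g, g->c one character at a time (lowercase only).
complement() {
  sed "y/atcg/tagc/"
}

# First 10 bases of prokaryote.txt; expected complement: agatgatata
echo "tctactatat" | complement

# Complementing twice should give back the original sequence.
echo "tctactatat" | complement | complement
```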
Finding the 6 Reading Frames of prokaryote.txt
Initial Findings
- While still in ~dondi/xmlpipedb/data, genetic-code.sed was examined using
cat genetic-code.sed
- genetic-code.sed was found to contain all of the sed replacement commands needed to convert any mRNA triplet to an amino acid
- The large number of sed replacement commands in genetic-code.sed made it apparent that chaining them all together by hand in one pipeline would be difficult and tedious. Ideally, all of genetic-code.sed would be applied in one command
cat prokaryote.txt | sed "s/^.//g"
was found to remove the first letter from the nucleotide sequence
cat prokaryote.txt | sed "s/^..//g"
was found to remove the first two letters from the nucleotide sequence
cat prokaryote.txt | sed "s/.../ & /g"
was found to split the nucleotide sequence into triplets by padding each complete triplet with a space on either side
rev prokaryote.txt
was found to reverse the sequence, so that it reads 3'-5' instead of 5'-3' (or vice versa)
- It was assumed that the sequence in prokaryote.txt ran from 5' to 3'
sed "s/[atcg]//g"
was found to delete any lowercase nucleotide letters
sed "y/t/u/"
was found to replace any lowercase t's with u's, which is useful for converting a DNA sequence into RNA
- It was realized that a file containing a set of sed commands could be applied by using
sed -f <filename>
; this would pair well with genetic-code.sed!
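The building blocks above can be tried on a short made-up sequence before running them on prokaryote.txt. In this sketch, the three-rule sed file is a hypothetical stand-in for genetic-code.sed. It assumes that codons are matched together with the spaces that sed "s/.../ & /g" adds, and the real file's rules may be written differently.

```bash
#!/usr/bin/env bash
# Made-up test sequence (not from prokaryote.txt).
seq="atggcctaa"

echo "$seq" | sed "s/^.//g"       # drop first letter       -> tggcctaa
echo "$seq" | sed "s/^..//g"      # drop first two letters  -> ggcctaa
echo "$seq" | sed "s/.../ & /g"   # pad triplets            -> " atg  gcc  taa "
echo "$seq" | rev                 # reverse                 -> aatccggta
echo "$seq" | sed "y/t/u/"        # DNA -> RNA              -> auggccuaa

# HYPOTHETICAL three-codon stand-in for genetic-code.sed, applied with sed -f.
code_file=$(mktemp)
printf 's/ aug /M/g\ns/ gcc /A/g\ns/ uaa /-/g\n' > "$code_file"

protein=$(echo "$seq" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f "$code_file" | sed "s/ //g")
echo "$protein"                   # -> MA-
rm -f "$code_file"
```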
Finding the Reading Frames of the mRNA-like strand (5'-3')
- +1 reading frame was found by using:
cat prokaryote.txt | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
STIFQ-VRWPKKTILNLKRCLIPCSAYNPAASSAGGIL
- +2 reading frame was found by using:
cat prokaryote.txt | sed "s/^.//g" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
LLYFNRYDGQRRQY-T-NVA-YHVPRITQPPVPLAAF-
- +3 reading frame was found by using:
cat prokaryote.txt | sed "s/^..//g" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
YYISIGTMAKEDNIELETLPNTMFRV-PSRQFRWRHFN
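The three forward-frame pipelines differ only in how many letters are dropped at the start, so they can be folded into one function. This is a sketch under a few assumptions. frame_codons and translate_frame are hypothetical names, and cut -c takes the place of the sed "s/^.//g" and "s/^..//g" steps. The final cleanup also uses [acgu] instead of [atcg]: by that point any leftover t has already become u and would survive [atcg]. For prokaryote.txt, the leftover letters in every frame happen to contain no t, so the outputs above are unaffected.

```bash
#!/usr/bin/env bash
# frame_codons / translate_frame: hypothetical helpers, not existing tools.

# Print frame +OFFSET (1, 2, or 3) of the sequence on stdin as space-padded RNA triplets.
frame_codons() {
  local offset=$1
  cut -c "${offset}-" |     # keep from letter OFFSET onward (drops OFFSET-1 letters)
    sed "s/.../ & /g" |     # pad each complete triplet with spaces
    sed "y/t/u/"            # DNA -> RNA
}

# Translate frame +OFFSET with a codon sed script such as genetic-code.sed.
translate_frame() {
  local offset=$1 code_sed=$2
  frame_codons "$offset" |
    sed -f "$code_sed" |    # codon -> amino acid substitutions
    sed "s/ //g" |          # remove the padding spaces
    sed "s/[acgu]//g"       # drop leftover RNA bases that never formed a complete codon
}

# Usage (from ~dondi/xmlpipedb/data):
# for f in 1 2 3; do translate_frame "$f" genetic-code.sed < prokaryote.txt; done
```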
Finding the Reading Frames of the template strand (3'-5')
- -1 reading frame was found by using:
rev prokaryote.txt | sed "y/atcg/tagc/" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
VKMPPAELAAGLYAEHGIRQRFKFNIVFFGHRTY-NIV
- -2 reading frame was found by using:
rev prokaryote.txt | sed "y/atcg/tagc/" | sed "s/^.//g" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
LKCRQRNWRLGYTRNMVLGNVSSSILSSLAIVPIEI--
- -3 reading frame was found by using:
rev prokaryote.txt | sed "y/atcg/tagc/" | sed "s/^..//g" | sed "s/.../ & /g" | sed "y/t/u/" | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[atcg]//g"
- Output:
-NAASGTGGWVIRGTWY-ATFQVQYCLLWPSYLLKYSR
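The -1, -2, and -3 frames are read from the reverse complement, which rev followed by the complement sed command produces. This is a minimal sketch: revcomp and reverse_frame_codons are hypothetical helper names, and cut -c stands in for the sed steps that drop leading letters. The output is space-padded RNA triplets, ready for sed -f genetic-code.sed.

```bash
#!/usr/bin/env bash
# revcomp / reverse_frame_codons: hypothetical helpers, not existing tools.

# Reverse complement of the lowercase sequence on stdin.
revcomp() {
  rev | sed "y/atcg/tagc/"
}

# Print frame -OFFSET (1, 2, or 3) as space-padded RNA triplets.
reverse_frame_codons() {
  local offset=$1
  revcomp | cut -c "${offset}-" | sed "s/.../ & /g" | sed "y/t/u/"
}

# Usage (from ~dondi/xmlpipedb/data):
# reverse_frame_codons 1 < prokaryote.txt | sed -f genetic-code.sed | sed "s/ //g" | sed "s/[acgu]//g"
```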
Checking Results
- Using the ExPASy Translate Tool,
tctactatatttcaataggtacgatggccaaagaagacaatattgaacttgaaacgttgcctaataccatgttccgcgtataacccagccgccagttccgctggcggcattttaac
was entered and translated in all reading frames (with the output format set to compact). The 6 reading frames given by this tool matched those found above
XMLPipeDB Match Practice
Preparations
- The program xmlpipedb-match-1.1.1.jar was found in ~dondi/xmlpipedb/data
- It was found that java programs can be run by using
java -jar <program name>
- To match patterns, xmlpipedb-match-1.1.1.jar would be run using
java -jar xmlpipedb-match-1.1.1.jar <pattern> < <filename>
- 493.P_falciparum.xml was found in ~dondi/xmlpipedb/data and examined using
cat 493.P_falciparum.xml
; it took quite some time to fully load (viewing it with more seemed like a better idea)
Working with XMLPipeDB Match
- Match command for tallying the occurrences of the pattern GO:000[567] in 493.P_falciparum.xml:
java -jar xmlpipedb-match-1.1.1.jar "GO:000[567]" < 493.P_falciparum.xml
- This pattern matches occurrences of GO:0005, GO:0006, and GO:0007
- 3 total unique matches were found: GO:0005, GO:0006, and GO:0007
- Occurrences of each unique match: 113 for GO:0007, 1100 for GO:0006, and 1371 for GO:0005
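As a cross-check that does not need Java, grep -o prints every match on its own line, and sort | uniq -c then tallies each unique match. This is a sketch: tally is a hypothetical helper name, and it was not run against 493.P_falciparum.xml here. Whether it reproduces Match's counts exactly (for example, if Match treats case differently) is an open question.

```bash
#!/usr/bin/env bash
# tally: hypothetical helper that counts each unique match of a grep regex in a file.
tally() {
  local pattern=$1 file=$2
  # -o prints only the matched text, one match per line
  grep -o -- "$pattern" "$file" | sort | uniq -c
}

# Usage (from ~dondi/xmlpipedb/data):
# tally "GO:000[567]" 493.P_falciparum.xml
```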
- Observing "in situ" occurrences of GO:000[567] in 493.P_falciparum.xml
more 493.P_falciparum.xml
was used to make viewing the file more manageable
- While in more, typing /GO:0006 and pressing Enter brought a line containing the pattern GO:0006 to the top of the window, surrounded by the file's text
- Based on the surrounding text, the pattern likely represents the beginning of a Gene Ontology (GO) term ID tied to various genes in a gene database for Plasmodium falciparum
- In the text, it was found that various processes/metabolic pathways are connected to each database ID string (likely influenced by the genes in question)
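Instead of searching interactively in more, grep can print a match together with a few surrounding lines. This is a sketch: show_context is a hypothetical helper name. The options -n, -m, -A, and -B are standard GNU grep options.

```bash
#!/usr/bin/env bash
# show_context: hypothetical helper printing the first match of a pattern with nearby lines.
show_context() {
  local pattern=$1 file=$2
  # -n: show line numbers; -m 1: stop after the first matching line;
  # -B 2 / -A 2: also print 2 lines before and 2 lines after it
  grep -n -m 1 -B 2 -A 2 -- "$pattern" "$file"
}

# Usage (from ~dondi/xmlpipedb/data):
# show_context "GO:0006" 493.P_falciparum.xml
```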
- Match command for tallying the occurrences of the pattern \"Yu.*\" in 493.P_falciparum.xml
- 3 total unique matches were found: "yu b.", "yu k.", and "yu m."
- Occurrences of each unique match: 1 for "yu b.", 228 for "yu k.", and 1 for "yu m."
- I'm fairly certain that this pattern represents a person's name. By using
more 493.P_falciparum.xml
and typing /\"Yu.*\", an example of an in-text line containing this pattern was found. It was observed that this pattern is preceded by <person name=
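A similar grep-based tally can list the quoted "Yu..." names. This sketch assumes, as observed in the file, that each name sits in a quoted <person name=...> attribute, and yu_names is a hypothetical helper name. It uses [^"]* instead of .* because a greedy .* runs to the last quote on a line. That would swallow extra text on any line holding more than one quoted value.

```bash
#!/usr/bin/env bash
# yu_names: hypothetical helper that tallies quoted values starting with "Yu".
yu_names() {
  # [^"]* stops at the first closing quote, unlike the greedy .*
  grep -o '"Yu[^"]*"' "$1" | sort | uniq -c
}

# Usage (from ~dondi/xmlpipedb/data):
# yu_names 493.P_falciparum.xml
```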
- Using Match and grep + wc to count occurrences of the pattern ATG in hs_ref_GRCh37_chr19.fa
- hs_ref_GRCh37_chr19.fa was found in ~dondi/xmlpipedb/data
java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
was employed to find the instances of ATG via Match
- Output: 1 unique match, atg, was found. There are 830101 instances of atg in the file
grep "ATG" hs_ref_GRCh37_chr19.fa | wc
was used to find the instances of ATG using grep + wc
- Output: 502410 lines, 502410 words, and 35671048 characters (wc's output is unlabeled; it is always lines, words, and bytes, from left to right, and bytes equal characters in a plain-ASCII file like this one)
- There is a large difference between the Match and grep + wc counts of ATG (830101 vs. 502410). Match counts every individual instance of the ATG pattern, possibly several in one line. grep only prints each line that contains at least one instance of ATG, so wc ends up counting lines rather than instances. The line and word counts from wc are identical because wc counts whitespace-separated words, and each sequence line printed by grep contains no spaces, so every line counts as exactly one word
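The difference between counting lines and counting instances can be reproduced with grep alone. grep -c counts matching lines, which is the same number as the first column of grep | wc. grep -o prints each non-overlapping match on its own line, so piping it to wc -l counts instances. Since ATG cannot overlap itself, non-overlapping matching loses nothing here. This is a sketch with hypothetical helper names. Two caveats apply: grep works line by line, so an ATG split across a line break is missed, and grep "ATG" is case-sensitive, so lowercase atg would need -i. Whether occurrences_of reproduces Match's 830101 was not checked here.

```bash
#!/usr/bin/env bash
# lines_with / occurrences_of: hypothetical helpers for comparing the two counts.

# Number of LINES containing at least one match.
lines_with() {
  grep -c -- "$1" "$2"
}

# Number of individual (non-overlapping) matches: -o puts each match on its own line.
occurrences_of() {
  grep -o -- "$1" "$2" | wc -l | tr -d ' '
}

# Usage (from ~dondi/xmlpipedb/data):
# lines_with ATG hs_ref_GRCh37_chr19.fa
# occurrences_of ATG hs_ref_GRCh37_chr19.fa
```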
Brandon Litvak
BIOL 367, Fall 2015