Emilysimso Week 3
From LMU BioDB 2015
Write a sequence of piped text processing commands that, when given a nucleotide sequence, returns its complementary strand.
Sequence used: 5'-gcattaggcaac-3'
- Used sed "y/atgc/tacg/" to perform complementary base pairing
Resulting sequence: 3'-cgtaatccgttg-5'
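The step above can be written as a single pipe. This is a minimal sketch, assuming a POSIX shell with sed and a lowercase input sequence; the function name complement is made up for illustration.

```shell
# Hypothetical helper: swap a<->t and g<->c in a lowercase DNA sequence.
complement() {
    echo "$1" | sed "y/atgc/tacg/"
}

complement "gcattaggcaac"   # prints cgtaatccgttg
```

The output reads 3' to 5', because sed swaps the bases in place without reversing the string.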
Write 6 sets of text processing commands that, when given a nucleotide sequence, returns the resulting amino acid sequence, one for each possible reading frame for the nucleotide sequence.
Sequence used: 5'-gcattaggcaac-3'
- Used sed "y/t/u/" to change all t's to u's
Resulting sequence: 5'-gcauuaggcaac-3'
+1 Reading Frame: 5'-gca uua ggc aac-3'
- Used sed "s/gca/A/g" to replace the first codon with A
Resulting sequence: 5'-AuuagAac-3'
- This is not the desired result, because the g flag also replaced the gca that spans the last two codons (ggc aac)
- Used sed "s/^gca/A/g" to replace only the first codon with A
Resulting sequence: 5'-Auuaggcaac-3'
- Used sed "s/uua/L/g" then sed "s/ggc/G/g" then sed "s/aac/N/g" to get the final result
+1 Reading Frame Amino Acids: ALGN
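The +1 steps above can be chained into one pipeline. This is a minimal sketch, assuming a POSIX shell with sed; the function name translate_frame1 is made up for illustration, and it handles only the four codons that occur in this example.

```shell
# Hypothetical helper: translate the +1 frame of the example sequence.
# Only the four codons present in gcattaggcaac are handled.
translate_frame1() {
    echo "$1" |
        sed "y/t/u/" |
        sed "s/^gca/A/" |
        sed "s/uua/L/g" |
        sed "s/ggc/G/g" |
        sed "s/aac/N/g"
}

translate_frame1 "gcattaggcaac"   # prints ALGN
```

The ^ anchor on the first substitution is what keeps gca from also matching across the ggc aac boundary.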
+2 Reading Frame: 5'-g cau uag gca ac-3'
- Used sed "s/^g/ /g" then sed "s/cau/H/g" then sed "s/uaggcaac/ stop/g"
+2 Reading Frame Amino Acids: H stop
+3 Reading Frame: 5'-gc auu agg caa c-3'
- Used sed "s/^gc/ /g" then sed "s/auu/I/g" then sed "s/agg/R/g" then sed "s/caa/Q/g" then sed "s/c$/ /g"
+3 Reading Frame Amino Acids: IRQ
-1 Reading Frame: 5'-guu gcc uaa ugc-3'
- Used sed "y/t/u/" to change t's to u's from complementary strand (3'-cgtaatccgttg-5' to 3'-cguaauccguug-5')
- Used echo "cguaauccguug" | rev to reverse the strand
Resulting Strand: 5'-guugccuaaugc-3'
- Used sed "s/guu/V/g" then sed "s/gcc/A/g" then sed "s/uaa/ stop/g" then sed "s/ugc/ /g"
-1 Reading Frame Amino Acids: VA stop
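The strand preparation for the negative frames (complement, reverse, then t to u) can be sketched as one pipe. It assumes sed and rev are available; the function name reverse_complement_rna is made up for illustration.

```shell
# Hypothetical helper: complement the DNA, reverse it so it reads 5' to 3',
# then write it as RNA by replacing t with u.
reverse_complement_rna() {
    echo "$1" | sed "y/atgc/tacg/" | rev | sed "y/t/u/"
}

reverse_complement_rna "gcattaggcaac"   # prints guugccuaaugc
```

The order of rev and the t to u step does not matter, since each acts on the string independently.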
-2 Reading Frame: 5'-g uug ccu aau gc-3'
- Used sed "s/^g/ /g" then sed "s/uug/L/g" then sed "s/ccu/P/g" then sed "s/aau/N/g" then sed "s/gc/ /g"
-2 Reading Frame Amino Acids: LPN
-3 Reading Frame: 5'-gu ugc cua aug c-3'
- Used sed "s/^gu/ /g" then sed "s/ugc/C/g" then sed "s/cua/L/g" then sed "s/aug/M/g" then sed "s/c/ /g"
-3 Reading Frame Amino Acids: CLM
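A global substitution on the unbroken sequence can match across a codon boundary (for example, ugc also occurs inside aug c). One way to avoid this is to put a space after every codon first and translate only "codon + space". The following is a sketch under assumptions, not the method used above: the function name translate is made up, the codon table covers only the codons in this example, and * is used internally as a stop marker.

```shell
# Hypothetical helper. $1: lowercase RNA sequence, $2: bases to skip (0, 1, or 2).
# Steps: drop the skipped bases, put a space after every complete codon,
# translate each "codon + space", discard leftover bases, cut at the first stop.
# The table covers only the codons that occur in this example.
translate() {
    echo "$1" |
        cut -c "$(( $2 + 1 ))-" |
        sed 's/.../& /g' |
        sed -e 's/gca /A/g' -e 's/gcc /A/g' -e 's/ugc /C/g' \
            -e 's/ggc /G/g' -e 's/cau /H/g' -e 's/auu /I/g' \
            -e 's/uua /L/g' -e 's/uug /L/g' -e 's/cua /L/g' \
            -e 's/aug /M/g' -e 's/aac /N/g' -e 's/aau /N/g' \
            -e 's/ccu /P/g' -e 's/caa /Q/g' -e 's/agg /R/g' \
            -e 's/guu /V/g' -e 's/uaa /*/g' -e 's/uag /*/g' |
        sed -e 's/[a-z]*$//' -e 's/\*.*/ stop/'
}

translate "gcauuaggcaac" 0   # +1 frame
translate "gcauuaggcaac" 1   # +2 frame
translate "gcauuaggcaac" 2   # +3 frame
translate "guugccuaaugc" 0   # -1 frame
translate "guugccuaaugc" 1   # -2 frame
translate "guugccuaaugc" 2   # -3 frame
```

Because every complete codon is followed by a space, a pattern such as "ugc " can only match a whole codon in the chosen frame.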
XMLPipeDB Match Practice
What Match command tallies the occurrences of the pattern GO:000[567] in the 493.P_falciparum.xml file?
- Used the line: java -jar xmlpipedb-match-1.1.1.jar GO:000[567] < 493.P_falciparum.xml
There are 3 unique matches: go:0007 appears 113 times, go:0006 appears 1100 times, and go:0005 appears 1371 times.
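A similar per-match tally can be approximated with standard tools. This is a sketch that assumes a grep supporting -o; the function name tally and the sample lines are made up for illustration, and it is not a description of how Match works internally.

```shell
# Hypothetical helper. $1: pattern, $2: file.
# grep -o prints each match on its own line; sort | uniq -c counts them.
tally() {
    grep -o "$1" "$2" | sort | uniq -c
}

# Made-up sample data standing in for the real XML file.
sample=$(mktemp)
printf 'id="GO:0005515"\nid="GO:0006506" id="GO:0006412"\nid="GO:0008150"\n' > "$sample"
tally 'GO:000[567]' "$sample"
rm -f "$sample"
```

On the real file the same call would be tally 'GO:000[567]' 493.P_falciparum.xml.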
Try to find one such occurrence “in situ” within that file. Look at the neighboring content around that occurrence.
- Used grep "GO:0006" 493.P_falciparum.xml to find the lines containing the sequence
- This did not give context
- Used grep "...GO:0006" 493.P_falciparum.xml to find context
Result: id="GO:0006506
- These are presumably ID numbers of some kind (the GO: prefix suggests Gene Ontology terms)
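To see more of the neighboring text, the dots in the pattern can be replaced by a bounded repeat. This is a sketch assuming a grep that supports -o; the function name show_context and the sample XML line are made up for illustration.

```shell
# Hypothetical helper. $1: pattern, $2: file, $3: characters of context per side.
# Prints each match with up to $3 characters before and after it.
show_context() {
    grep -o ".\{0,$3\}$1.\{0,$3\}" "$2"
}

# Made-up sample line standing in for the real XML file.
sample=$(mktemp)
printf '<dbReference type="GO" id="GO:0006506" key="12">\n' > "$sample"
show_context "GO:0006" "$sample" 10
rm -f "$sample"
```

Raising the third argument widens the window; grep -C would instead show whole neighboring lines.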
What Match command tallies the occurrences of the pattern \"Yu.*\" in the 493.P_falciparum.xml file?
- Used java -jar xmlpipedb-match-1.1.1.jar \"Yu.*\" < 493.P_falciparum.xml
There are 3 unique matches: yu b. appears 1 time, yu k. appears 228 times, and yu m. appears 1 time.
- These are most likely names
Use Match to count the occurrences of the pattern ATG in the hs_ref_GRCh37_chr19.fa file (this may take a while). Then, use grep and wc to do the same thing.
- Used java -jar xmlpipedb-match-1.1.1.jar ATG < hs_ref_GRCh37_chr19.fa
Total matches: 830101
- This means that the sequence ATG appears 830,101 times in the file
- Used grep "ATG" hs_ref_GRCh37_chr19.fa | wc
Result: 502410 502410 35671048
- This means that 502,410 lines contain ATG; those lines hold 502,410 words (one per line) and 35,671,048 characters in total
The two counts are different because ATG may appear multiple times in a single line. grep counts each matching line only once, which explains why 502,410 is the lower number.
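The difference between counting lines and counting occurrences can be seen on a small made-up file. This is a sketch assuming a grep that supports -c and -o; the function names count_lines and count_matches are hypothetical.

```shell
# Hypothetical helpers. $1: pattern, $2: file.
# count_lines counts lines that contain the pattern at least once.
count_lines() {
    grep -c "$1" "$2"
}
# count_matches counts every non-overlapping occurrence, one per output line.
count_matches() {
    grep -o "$1" "$2" | wc -l | tr -d ' '
}

# Made-up sample: the first line holds ATG twice, the second once, the third never.
sample=$(mktemp)
printf 'ATGATG\nCCATGC\nGGGG\n' > "$sample"
count_lines "ATG" "$sample"     # prints 2
count_matches "ATG" "$sample"   # prints 3
rm -f "$sample"
```

Neither count includes an ATG that is split across a line break, since grep examines one line at a time.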