Stephen Louie Project Notebook
From LMU BioDB 2013
Week 12
11/12/2013
- Gave presentation for Genome paper to class
11/14/2013
- Conducted meeting with guilds. No meeting was conducted for Quality Assurance
- Sat in on the GenMAPP Builder guild meeting for an absent teammate
- Downloaded and extracted data source files with Mitchell
- UniProt XML
- Followed directions provided Here
- GOA
- Note: The current directions were not working. Follow these instructions for your respective species
- From Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
- Received an error message. Changed the URL scheme from "ftp" to "http" at the beginning of the address.
- Once the new URL was entered, was taken to Index of /pub/databases/GO/goa
- Clicked on "proteomes" folder
- Directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
- Note: R. meliloti is an alternative name for S. meliloti.
- GO OBO-XML
- Followed directions provided Here
- UniProt XML
- Created new database in PostgreSQL
- Followed directions provided Here
- Imported data into PostgreSQL
- Followed directions provided Here
- UniProt XML took 19.17 minutes
- GO OBO-XML took 17.81 minutes to import and 15.54 minutes to process
- GOA file took less than a minute
- Exported Gene Database
- Followed directions provided Here
- Export took ~8 hours
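The "ftp" to "http" workaround from the GOA download step can be sketched in Python. This is a minimal illustration rather than part of the GenMAPP Builder instructions: the function name `ftp_to_http` and the host `ftp.example.org` are hypothetical, and only the directory path is taken from the notes above.

```python
from urllib.parse import urlsplit


def ftp_to_http(url: str) -> str:
    """Return the URL with an ftp scheme swapped for http.

    URLs that do not use the ftp scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "ftp":
        return url
    # SplitResult is a named tuple, so _replace builds a copy with a new scheme.
    return parts._replace(scheme="http").geturl()


# Hypothetical host; only the path comes from the notebook entry.
example_url = "ftp://ftp.example.org/pub/databases/GO/goa/proteomes/"
print(ftp_to_http(example_url))
```

Whether a given server actually serves the same directory over HTTP is an assumption that has to be checked by opening the rewritten address.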
Week 13
11/19/2013
- Conducted a side-by-side comparison of the gene IDs in the .gdb file and the microarray data
- For the .gdb file, used MS Access
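The side-by-side comparison above was done by hand in MS Access. As a rough sketch of the same idea, the two ID lists could be compared with Python sets once they are exported to plain lists. The function name `compare_ids` and the sample IDs are hypothetical placeholders, not values from the actual .gdb file or microarray data.

```python
def compare_ids(gdb_ids, array_ids):
    """Split two collections of gene IDs into shared and unmatched groups."""
    gdb = set(gdb_ids)
    arr = set(array_ids)
    return {
        "shared": sorted(gdb & arr),       # present in both sources
        "only_gdb": sorted(gdb - arr),     # in the gene database only
        "only_array": sorted(arr - gdb),   # in the microarray data only
    }


# Placeholder IDs chosen only to exercise the function.
gdb_sample = ["ID001", "ID002", "ID003"]
array_sample = ["ID002", "ID003", "ID004"]

result = compare_ids(gdb_sample, array_sample)
for group, ids in result.items():
    print(group, ids)
```

The comparison is exact and case-sensitive, so IDs that differ only in capitalization are reported as unmatched.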
11/21/2013
- Ran a preliminary sanity check
- Used GenMAPP to analyze microarray data to see if there was a discrepancy in the gene IDs between the microarray data and the GenMAPP database.
- Followed instructions provided [here]
- After the first run, the conversion yielded over 20,000 errors with no matches
- In the microarray data, the gene IDs have a lowercase letter as the third character. To test whether this caused the discrepancy, one sample ID was changed so that its third letter was capitalized
- After the second run, the modification to the gene ID in the microarray data did not change the number of errors or matches
- Viewing the .gdb data in Microsoft Access showed that orderedlocusnames uses R.#### as the ID (Rhizobium meliloti was the former name of the species).
- R will be used in place of SM to see whether that makes a substantial difference.
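The two ID modifications tried above (capitalizing the third character, and replacing the SM prefix with R) can be tested on a list of IDs before rerunning GenMAPP. The sketch below is only an illustration: the helper names are hypothetical, and the sample IDs are made-up placeholders that do not reproduce the real ID formats in the microarray data or the gene database.

```python
def uppercase_third(gene_id: str) -> str:
    """Capitalize the third character of an ID, if it has one."""
    if len(gene_id) < 3:
        return gene_id
    return gene_id[:2] + gene_id[2].upper() + gene_id[3:]


def swap_prefix(gene_id: str, old: str = "SM", new: str = "R") -> str:
    """Replace a leading `old` prefix with `new`; other IDs are unchanged."""
    if gene_id.startswith(old):
        return new + gene_id[len(old):]
    return gene_id


def count_matches(ids, reference, transform=lambda s: s) -> int:
    """Count IDs that appear in `reference` after applying `transform`."""
    reference = set(reference)
    return sum(1 for gene_id in ids if transform(gene_id) in reference)


# Placeholder data: not real gene IDs.
array_ids = ["SMx0001", "SMx0002", "SMx0003"]
gdb_ids = ["Rx0001", "Rx0002"]

print("unchanged:", count_matches(array_ids, gdb_ids))
print("third letter capitalized:", count_matches(array_ids, gdb_ids, uppercase_third))
print("SM replaced by R:", count_matches(array_ids, gdb_ids, swap_prefix))
```

Counting matches for each candidate rewrite makes it possible to see which change, if any, closes the gap before committing to a full conversion run.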