Difference between revisions of "Stephen Louie Project Notebook"

From LMU BioDB 2013
(11/21/2013: added more to entry)
(Week13: added info to entry)
Line 32:
  ==Week13==
  ===11/19/2013===
+ *Conducted side by side comparison of GeneIDs for gdb. and microarray data
+ *For the gdb. file, used MS Access
  ===11/21/2013===
- *Used GenMAPP to analyze microarray data to see if there was a discrepancy in the gene IDs between the microarray data and the GenMAPP database.
+ *Ran a preliminary sanity check
- **Followed instructions provided [[http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols here]]
+ **Used GenMAPP to analyze microarray data to see if there was a discrepancy in the gene IDs between the microarray data and the GenMAPP database.
- **After the first run, the conversion yielded over 20,000 errors with no matches
+ ***Followed instructions provided [[http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols here]]
- **When observing the microarray data, the Gene IDs used an uncapitalized letter as the third character space in the ID.  To see whether this was the cause of the discrepancy, one of the samples was changed to have a third capitalized letter in the ID
+ ***After the first run, the conversion yielded over 20,000 errors with no matches
+ ***When observing the microarray data, the Gene IDs used an uncapitalized letter as the third character space in the ID.  To see whether this was the cause of the discrepancy, one of the samples was changed to have a third capitalized letter in the ID
  **After the second run, there was no change in the amount of errors or matches from the modification to the Gene ID in the microarray data
  *In viewing the gdb. data in Microsoft Access, it was realized that the orderedlocusnames utilizes R.#### as the ID (Rhizobium Meliloti was the former name of the species).

Revision as of 07:48, 22 November 2013



Week 12

11/12/2013

  • Gave presentation for Genome paper to class

11/14/2013

  • Held guild meetings. No meeting was held for Quality Assurance.
  • Sat in on the GenMAPP Builder guild meeting for an absent teammate
  • Downloaded and extracted data source files with Mitchell
    • UniProt XML
      • Followed directions provided Here
    • GOA
      • Note: The current directions were not working. Follow these instructions for your respective species.
      • From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
      • Received an error message. Changed the beginning of the URL from "ftp" to "http".
      • After entering the modified URL, was taken to Index of /pub/databases/GO/goa
      • Clicked on the "proteomes" folder
      • Was directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
      • Note: R. meliloti is an alternative name for S. meliloti.
    • GO OBO-XML
      • Followed directions provided Here
  • Created new database in PostgreSQL
    • Followed directions provided Here
  • Imported data into PostgreSQL
    • Followed directions provided Here
    • UniProt XML took 19.17 minutes
    • GO OBO-XML took 17.81 minutes to import and 15.54 minutes to process
    • GOA file took less than a minute
  • Exported Gene Database
    • Followed directions provided Here
    • Export took ~8 hours
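
The "ftp" to "http" workaround described under GOA above can be sketched in a few lines of Python. This is only an illustration of the scheme swap: the host name `ftp.example.org` is a placeholder (the notebook does not record the actual host), and `ftp_to_http` is a hypothetical helper name. Only the path and file name come from the notes.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_http(url):
    """Return the URL with an 'ftp' scheme replaced by 'http'; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "ftp":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


# Placeholder host (assumption); the path and file name are the ones recorded in the notes.
goa_url = "ftp://ftp.example.org/pub/databases/GO/goa/proteomes/58.R_meliloti.goa"
print(ftp_to_http(goa_url))
```

Whether a given server actually serves the same directory tree over HTTP has to be checked case by case; the sketch only rewrites the URL and does not download anything.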

Week 13

11/19/2013

  • Conducted a side-by-side comparison of the gene IDs in the .gdb file and in the microarray data
  • Used MS Access to view the .gdb file

11/21/2013

  • Ran a preliminary sanity check
    • Used GenMAPP to analyze the microarray data to check for discrepancies between the gene IDs in the microarray data and those in the GenMAPP database.
      • Followed instructions provided [here]
      • After the first run, the conversion yielded over 20,000 errors and no matches
      • In the microarray data, the third character of each gene ID is a lowercase letter. To test whether this caused the discrepancy, one of the samples was changed to have an uppercase third letter in its ID
    • After the second run, the number of errors and matches was unchanged by the modification to the gene ID in the microarray data
  • Viewing the .gdb data in Microsoft Access showed that orderedlocusnames uses R.#### as the ID (Rhizobium meliloti is the former name of the species).
    • R will be used in place of SM to see whether that makes any substantial difference.
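
The sanity check above amounts to testing candidate ID rewrites and counting how many microarray IDs then appear in the gene database. A minimal Python sketch of that idea follows. It is not how GenMAPP performs its conversion. The function names are hypothetical, the sample IDs are invented for illustration, and the prefix swap is a naive literal reading of "use R in place of SM" that may not reflect the real ID formats.

```python
def uppercase_third(gene_id):
    """Return the ID with its third character uppercased (shorter IDs are unchanged)."""
    if len(gene_id) < 3:
        return gene_id
    return gene_id[:2] + gene_id[2].upper() + gene_id[3:]


def swap_prefix(gene_id, old="SM", new="R"):
    """Replace a leading `old` prefix with `new`; other IDs pass through unchanged."""
    if gene_id.startswith(old):
        return new + gene_id[len(old):]
    return gene_id


def match_report(microarray_ids, database_ids, transform=lambda gene_id: gene_id):
    """Count transformed microarray IDs that are found (matches) or missing (errors)."""
    database = set(database_ids)
    matches = sum(1 for gene_id in microarray_ids if transform(gene_id) in database)
    return {"matches": matches, "errors": len(microarray_ids) - matches}


# Invented IDs for illustration only (assumption); real IDs may look different.
microarray_ids = ["SMc00001", "SMc00002", "SMa0005"]
database_ids = ["Rc00001", "Rc00002", "Ra0005"]

print(match_report(microarray_ids, database_ids))                   # IDs as given
print(match_report(microarray_ids, database_ids, uppercase_third))  # capitalization test
print(match_report(microarray_ids, database_ids, swap_prefix))      # SM -> R test
```

Running each candidate rewrite against the full ID list, instead of editing a single sample by hand, shows at a glance whether a rewrite changes the match count at all.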

External Links

User Page
Team Page