Difference between revisions of "Jkuroda Week 15"

From LMU BioDB 2015
(log)
(Log: log from class 12/08/15)

Revision as of 23:59, 8 December 2015

Log

  • Ran some incomplete statistical analysis data from the GenMAPP users through the process of creating a new expression dataset. This generated an exception file that revealed some issues with our database.
  • The first time we ran it, every single gene produced an exception, because we had not compensated for the missing underscore in their IDs. After inserting an underscore after the 'SO' prefix, we were able to find the actual errors.
  • First of all, there are 5408 genes listed in their data, compared to the 4196 genes we have in our database.
  • There are 760 gene IDs of the form SO_####F, none of which exist in our database.
  • There are 681 gene IDs that are in a 'normal' form (either SO_#### or SO_A####) but do not exist in our database.
  • Some of the gene IDs that end in 'F' appear more than once, so multiple genes share the same ID.
  • We attempted a batch search on UniProt of all 1441 missing IDs (760 + 681) and got zero results from their database. Furthermore, as a spot check, we searched for roughly every 100th ID in the UniProt KB and found that none of the IDs we searched for exist.
  • We also searched for the 'F' IDs in our MOD, and none of them exist there either.
  • After this analysis, we have concluded that these 1441 IDs can be safely ignored, since they do not exist in UniProt. We will simply need to modify our code to account for the absence of an underscore, much like we did with Vibrio cholerae.
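
The ID check described in the log can be sketched in Python as follows. This is only an illustration, not the code we actually ran: the function names (normalize_id, classify_missing), the category labels, and the assumption that every ID has exactly four digits after the prefix are all hypothetical, inferred from the SO_####, SO_A#### and SO_####F shapes noted above.

```python
import re
from collections import Counter

# Assumed ID shapes, inferred from the log: SO_####F and SO_#### / SO_A####.
F_FORM = re.compile(r"SO_\d{4}F")
NORMAL_FORM = re.compile(r"SO_A?\d{4}")


def normalize_id(raw_id):
    """Insert an underscore after the 'SO' prefix when it is missing."""
    gene_id = raw_id.strip()
    if gene_id.startswith("SO") and not gene_id.startswith("SO_"):
        gene_id = "SO_" + gene_id[2:]
    return gene_id


def classify_missing(dataset_ids, database_ids):
    """Group dataset IDs that are absent from the database by their shape.

    Returns (missing, duplicates): missing maps a category name to a sorted
    list of unique IDs, and duplicates maps each repeated ID to its count.
    """
    seen = Counter(normalize_id(raw) for raw in dataset_ids)
    missing = {"f_form": [], "normal_form": [], "other": []}
    for gene_id in sorted(seen):
        if gene_id in database_ids:
            continue  # present in our database, so not an exception
        if F_FORM.fullmatch(gene_id):
            missing["f_form"].append(gene_id)
        elif NORMAL_FORM.fullmatch(gene_id):
            missing["normal_form"].append(gene_id)
        else:
            missing["other"].append(gene_id)
    duplicates = {gene_id: n for gene_id, n in seen.items() if n > 1}
    return missing, duplicates


if __name__ == "__main__":
    # Made-up sample IDs, not taken from the GenMAPP users' data.
    sample = ["SO0001", "SO0002F", "SO0002F", "SOA0003", "SO9999"]
    database = {"SO_0001", "SO_A0003"}
    missing, duplicates = classify_missing(sample, database)
    print(missing)
    print(duplicates)
```

Normalizing before the lookup is what separates the real exceptions from the ones caused only by the missing underscore, and counting the normalized IDs is one way to surface the repeated 'F' entries.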

Individual Assessment & Reflection

Statement of Work

  • Describe exactly what you did on the project.
  • Provide references or links to artifacts of your work, such as:
    • Wiki pages
    • Other files or documents
    • Code or scripts

Assessment of Project

  • Give an objective assessment of the success of your project workflow and teamwork.
  • What worked and what didn't work?
  • What would you do differently if you could do it all over again?
  • Evaluate the Gene Database Project and Group Report in the following areas:
    1. Content: What is the quality of the work?
    2. Organization: Comment on the organization of the project and of your group's wiki pages.
    3. Completeness: Did your team achieve all of the project objectives? Why or why not?

Reflection on the Process

  • What did you learn?
    • With your head (biological or computer science principles)
    • With your heart (personal qualities and teamwork qualities that make things work or not work)?
    • With your hands (technical skills)?
  • What lesson will you take away from this project that you will still use a year from now?

Josh Kuroda's page
