Jkuroda Week 15
From LMU BioDB 2015
Log
- We ran some incomplete statistical analysis data from the GenMAPP users through the creation of a new expression dataset. This generated an exception file, which revealed some issues with our database.
- The first time we ran it, every single gene produced an exception, because we had not compensated for the underscore in the ID. After inserting the underscore after the 'SO', we were able to find the actual errors.
- First of all, there are 5408 genes listed in their data, compared to the 4196 genes we have in our database.
- There are 760 gene IDs in the form SO_####F; none of these genes exist in our database.
- There are 681 gene IDs in a 'normal' form (either SO_#### or SO_A####) that also do not exist in our database.
- For some of the gene IDs that have 'F's, multiple genes share the same ID.
- We attempted a batch search on UniProt for all 1441 missing IDs (760 + 681) and got zero results. Furthermore, we did a spot check by searching roughly every 100th ID in UniProtKB and found that none of the IDs we searched for exist.
- We also searched for the 'F' IDs in our MOD, and none of them exist there either.
- After this analysis, we have come to the conclusion that these 1441 IDs can be safely ignored, since they do not exist in UniProt. We will simply need to modify our code to account for the absence of an underscore, much like we did with Vibrio cholerae.
- In class on 12/10/15, we worked on figuring out the corrections for our GenMAPP code and made a new dataset using the finished data from the GenMAPP users.
- Then we ran MAPPFinder and ran into an issue because we were using the wrong column.
- Our group met up on 12/12/15 and continued working.
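The ID handling described in this log can be sketched in a few lines of Python. This is only an illustration, not our group's actual code: the function names `normalize_id` and `classify_missing` are hypothetical, the four-digit pattern is an assumption read off the `SO_####` notation above, and the IDs in the usage example are made up.

```python
import re

# Assumed ID shapes, taken from the log: SO_####, SO_A####, SO_####F.
# Four digits is an assumption based on the '####' notation.
_RAW_ID = re.compile(r"^SO_?(A?\d{4}F?)$")


def normalize_id(raw_id):
    """Return the ID with an underscore after 'SO', or None if the shape is unexpected."""
    match = _RAW_ID.match(raw_id.strip())
    if match is None:
        return None
    return "SO_" + match.group(1)


def classify_missing(raw_ids, database_ids):
    """Sort IDs that are absent from database_ids into 'F'-form and normal-form lists."""
    f_form, normal_form, unrecognized = [], [], []
    for raw in raw_ids:
        gene_id = normalize_id(raw)
        if gene_id is None:
            unrecognized.append(raw)      # does not look like an SO ID at all
        elif gene_id in database_ids:
            continue                      # found in the database, nothing to report
        elif gene_id.endswith("F"):
            f_form.append(gene_id)        # e.g. SO_####F
        else:
            normal_form.append(gene_id)   # e.g. SO_#### or SO_A####
    return f_form, normal_form, unrecognized


# Illustrative usage with invented IDs (not taken from the real dataset).
f_ids, normal_ids, bad_ids = classify_missing(
    ["SO0001", "SO0002F", "SOA0003", "not-an-id"],
    {"SO_0001"},
)
```

Normalizing before the lookup is what avoids the situation from our first run, where every gene raised an exception only because of the missing underscore.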
Individual Assessment & Reflection
Statement of Work
- Describe exactly what you did on the project.
- Provide references or links to artifacts of your work, such as:
- Wiki pages
- Other files or documents
- Code or scripts
Assessment of Project
- Give an objective assessment of the success of your project workflow and teamwork.
- What worked and what didn't work?
- What would you do differently if you could do it all over again?
- Evaluate the Gene Database Project and Group Report in the following areas:
- Content: What is the quality of the work?
- Organization: Comment on the organization of the project and of your group's wiki pages.
- Completeness: Did your team achieve all of the project objectives? Why or why not?
Reflection on the Process
- What did you learn?
- With your head (biological or computer science principles)
- With your heart (personal qualities and teamwork qualities that make things work or not work)?
- With your hands (technical skills)?
- What lesson will you take away from this project that you will still use a year from now?
Individual Journal Entries
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15