LMU BioDB 2013 - New pages [en]

Stephen Louie Deliverables

2013-12-13T23:22:34Z

Slouie: /* Assessment of Project */

==Statement of Work==

==Assessment of Project==
:In terms of individual roles, every group member was fairly competent when completing his or her own tasks. However, things did not work out as smoothly when the project had to come together. While each group member had an idea of what the other roles were responsible for, the specific procedures and details were known only by the people assigned to each task. This made it somewhat difficult to coordinate overall efforts. If this project were to be repeated, it would be better if the roles were less program-specific and team members shared more duties. This is not to say that the work was distributed unevenly, but that the workflow might have been better if everyone had worked at the same pace.
:In terms of quality, many of the results were skewed by an oversight, which led to a highly exaggerated error report that was only caught after the presentation. The corrections are reflected in the final report. As for organization, the wiki page is somewhat cluttered. This is simply due to the massive amount of content related to the project. The wiki is still navigable: all of the project content is present, just difficult to find. This has been a common issue with the group, where all members have their respective content but have a difficult time linking it together. The project was partially completed. With the NaCl dataset, a successful gene database was produced. For the sucrose dataset, there were several issues with the data itself, which caused a delay of several days. Thus, the gene database that was exported was done at the last minute and contained an exceptionally large number of errors.

==Reflection of Process==
I learned a lot about the wiki interface that I did not know before. I also had a small taste of coding from the Gene ID project. I learned about the importance of communication between team members. In terms of technical skills, I have become slightly more proficient with computer interfaces. In terms of what I will be doing a year from now, I am sure that the wiki skills I learned will come in handy.

Tauras' Assessment and Reflection

2013-12-13T22:43:13Z

Taur.vil: /* Evaluation of the Project and Group Report */

==Statement of Work==
:On this project, I worked as a combination of the coder and a general assistant on the other tasks in the project. My particular accomplishments include downloading and updating files for the gene database, completing four import-export cycles, modifying GenMAPP Builder for our strain and species, and preparing final versions of the ReadMe file, Testing Report, PowerPoints, and group report. Throughout the project, I did most of the updates to the group wiki pages.
:I spent a lot of time working with Kevin to interpret, understand, and analyze the microarray data. Together, we created the gene database schema, determined the number of significant genes, worked in MAPPFinder, and interpreted our results. I worked with Alina to complete the gene database testing report and write the introduction to the final report.
:I also coordinated individual contributions to the PowerPoints and deliverables and served as the final quality-control step.

===Some Primary Files and Wiki Pages===
*Code changes made to GenMAPP builder: [[Media:Code_to_redo_IDs.pdf|PDF file]]
*The final gene database: [[Media:Streptococcus_pneumoniae_TIGR4_20131125.gdb]]
**Other gene databases: [[Media:20131113 G45export tATK TPV.gdb|IE1 G54]], [[Media:20131113 R6export tATK TPV.gdb|IE1 R6]], [[Media:20131107_GenMAPPExport_tATK_TIGR4_TPV.gdb|IE1 TIGR4]], [[Media:20131118_E2_tATK_TIGR4.gdb|IE2 TIGR4]], and [[Media:20131120_E3_tATK_TIGR4.gdb|IE3 TIGR4]]
*Final Testing Report, particularly through TallyEngine results: [[tATK E4: TIGR4 Testing Report]]
**Partial Testing Report Files for Other Databases: [[tATK E2: TIGR4 Testing Report|IE2]] and [[tATK E3: TIGR4 Testing Report|IE3]]
*The ReadMe file: [[Media:ReadMe_Streptococcus_pneumoniae_TIGR4_20131125.pdf|ReadMe_Streptococcus_pneumoniae_TIGR4_20131125.pdf]]
*Gene Schema: [[Media:Streptococcus_schema_20131210.pdf|Streptococcus_schema_20131210.pdf]]
*Final Report: [[Media:TATK_GeneDatabaseReport.pdf|tATK Group Report]]

==Assessment of Project==
===Overall Assessment===
:I would say that this project was mostly successful. The GenMAPP work ran a bit behind schedule due to abnormalities in the data, and the time available for in-depth analysis was limited, but besides that everything in the project ran rather smoothly, especially the parts having to do with the gene database. We were able to successfully complete the project, found novel results, and everything came together in the end.

===Workflow and Teamwork===
:The three of us working together on this project had a good team chemistry and were all willing to come together to make things work. There were some issues of mass confusion but we were usually able to resolve those. The one issue we had in terms of teamwork was finding times to meet. We all had very busy schedules which made it difficult to find time outside of class, but by working independently and talking sporadically we were able to make progress.
:Workflow could have been optimized a lot. During the first couple of weeks of the project, when we didn't know precisely what we were doing, I wasted a lot of time running unnecessary import-export sequences that had to be redone the next week because of something we had changed. For example, I think it would have been valuable to wait until after we had made the species profile to run the first export. This would have opened up more time to really understand what was happening in the data before exporting unnecessary files, and it would also have let me start working with Kevin on the microarray data earlier and make more progress there. As it was, the microarray data took longer than any other part of the project, and there was a stressful point with two weeks left when everything else was complete but we had no idea how long the microarray work would take.

===What Worked and Didn't Work===
:In this project, pretty much everything worked in the end. The main issue was understanding how the microarray data differed from the VC data while doing the analysis. Because the two datasets had different numbers of columns, the commands in the protocol did not refer to the proper cells, and it was very difficult to determine what had to be done without that guide.
:Our team dynamic was very successful and pretty much everything else in the project worked well. From my perspective, working with Dondi on modifying GenMAPP builder was an especially smooth process.

===What I would do differently===
:If I were to do this project again, I would form a better basic understanding of what was happening in the first place. If I knew more about the project dynamics, we could have avoided unnecessary exports and gotten an earlier start on the microarray data which would have saved a lot of time and stress. Additionally, I felt like I really didn't understand anything for the first three weeks and I think I could have solved a lot of my confusion by looking at the project deliverables at that point instead of reading through the independent milestones and project overview.

==Evaluation of the Project and Group Report==
*Content: I consider the project to be of medium to medium-high quality. For the group report, the introduction is very well written, and the methods and results are thorough and easy to follow for a reader with an understanding of the field. The discussion itself seems a little weak in comparison to the rest of the project, likely because there was insufficient time for a full in-depth analysis. All of the other project deliverables are complete, and I am particularly proud of the ReadMe file and Testing Report. More work could have been put into some of the later figures, such as the MAPP file, which still looks slightly odd.
*Organization: Our project is well organized, clear, and easy to follow. The group wiki page is slightly more confusing because it was made as we were working through parts of the project at uneven paces, but it can be easily interpreted by looking at the template or the table of contents at the top of the page.
*Completeness: Yes, we did manage to achieve all the project objectives. We managed to do so by finally figuring out what to do in GenMAPP during the last week before the project deadline.

==Reflection on the Process==
*What did you learn?
**With your head (biological or computer science principles)
***I learned a lot from this project about how different parts of a project come together in multidisciplinary research. Before the project (and honestly halfway through it), I didn't quite grasp how research like this is not a linear line of thought but a convergent one where different lines of work combine in the end and build on each other.
**With your heart (personal qualities and teamwork qualities that make things work or not work)?
***In terms of personal skills, I learned about the importance of good communication and a clear structure to work in. Many times throughout the project, I had to help Kevin with the microarray information and if there wasn't good communication on the topic I was ultimately unable to help him. Other times when he could cleanly say what the issue was and what he had tried, we were much more successful. I also learned that it's very important to have group members who take initiative and read all the assignments. We had some difficulties early on in accurately completing projects because I was the only one who read the assignment and made sure each part was present. Later as well, there were many cases where Kevin needed help simply because he hadn't completely read the instructions/didn't know what the deliverable product was (although there were also many times the full directions were just confusing). The mindset of making sure everything is important and everybody in the group being obsessed with quality assurance is definitely a teamwork quality I will try to emphasize in future projects.
**With your hands (technical skills)?
***During the course of this project, I learned how to create a gene database and then how to use that database to interpret microarray data. In particular, I was fascinated by the way Eclipse works as a java editing software and, although I still have no idea what I'm doing in it, the small amount I learned working with the GenMAPP builder code was very valuable. I also learned technical skills related to maintaining a wiki page for a developing project, saving, updating, and organizing project files on my computer, and how to effectively transmit our results to other group members.
*What lesson will you take away from this project that you will still use a year from now?
**I don't know where I'll be a year from now (hopefully graduate school) or what I'll be doing, so I'm not sure if I will still use many, if any, of the technical skills from this class. The one thing I know I'll take away is an increased perspective on the research process as not just a series of linear steps but as a multi-headed creation with multiple avenues of approach that can be viewed simultaneously. I also hope to keep in mind that you can gain valuable new information by asking an old question with a new spin or looking for changes with prior research.

==Template==
{{Template:Team ATK}}

TATK Final Group Deliverables

2013-12-13T21:59:04Z

Taur.vil: /* Project Deliverables */ typo correction

==Team Presentations==

*[[Media:TeamATK.pdf|Journal Club Presentation]]
*[[Media:TeamATK_finalpresentation.pdf|tATK Final Presentation]]

==Project Deliverables==
*[[Media:Streptococcus_pneumoniae_TIGR4_20131125.gdb|GenMAPP Gene Database: Streptococcus Pneumoniae TIGR4 20131125]]
*[[Media:ReadMe_Streptococcus_pneumoniae_TIGR4_20131125.pdf|ReadMe_Streptococcus_pneumoniae_TIGR4_20131125.pdf]]
*[[Media:Streptococcus_schema_20131210.pdf|Gene Database Schema]]
*[[Media:Streptococcus pneumoniae TIGR4 20131125 GeneTestingReport.pdf|PDF Gene Testing Report]]
*[[Media:20131125_teamATK_KM_CompiledRawData.xls|Processed and Analyzed Microarray Dataset]]
*[[Media:20131212_teamATK_KM_compiledrawdata_GMAPP.gex|GenMAPP Expression Dataset File]]
*[[Media:20131210_tATK_filteredresults.xls|Filtered MAPPfinder Results]]
*[[Media:20131211_tATK_sugar_transmembrane_transporter_activity.mapp|MAPP File of Sugar Transmembrane Transporter Activity]]
**[[Media:MAPP.JPG|JPEG Image of MAPP File]]
*[[Media:TATK_GeneDatabaseReport.pdf|tATK Group Report]]

==Individual Assessments and Reflections==
*[[Kevin's Assessment and Reflection]]
*[[Alina's Assessment and Reflection]]
*[[Tauras' Assessment and Reflection]]

==Template==
{{Team ATK}}

Miles Malefyt deliverables

2013-12-13T21:48:43Z

Mmalefyt: /* week 15 */

==Statement of work==
==Week 12==
*Read the paper on the salinity and sucrose stress on gene expression
*Sorted the raw data into an XML file
*started to compile the raw data
**downloaded all raw data and sorted through the information needed
**used the Cy5 and Cy3 fold change as well as all the IDs
*Uploaded [[Media:Team_Name_NaCl_compiled_raw_Data.xls|300 NaCl compiled data set]]
==Week 13==
*I continued to sort the raw data and began to process the data in an xls file
*this was a very repetitive part because it involved a lot of replications for each time set
*finished my compiled raw data and processed raw data, as well as the data ready for GenMAPP
*Made individual LOG fold change ratios for each time point replicate then averaged all of the LOG fold change ratios for each of the time points
*performed a Tstat test
*performed a Pvalue test
*added a row of N next to the gene ID name in the forGenMAPP tab
*uploaded [[Media:Complete_processed_Data.xls|Processed Data]]
**NOTE:the GenMAPP version of the tab is labeled Complete Processed data_MPM and not forGenMAPP
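The log fold change and Tstat steps listed above can be sketched as follows. This is a minimal illustration only, not the spreadsheet that was actually used: it assumes the ratio is taken as Cy5 over Cy3, that the logarithm is base 2, and that the Tstat is a one-sample statistic against zero (mean divided by the standard error). The intensity values are made up for the example.

```python
import math
import statistics


def log2_fold_change(cy5, cy3):
    """Log2 ratio of the two channel intensities (assumed Cy5 / Cy3)."""
    return math.log2(cy5 / cy3)


def t_statistic(values):
    """One-sample t statistic against zero: mean / (stdev / sqrt(n))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical (Cy5, Cy3) intensities for three replicates of one gene.
replicates = [(2000.0, 1000.0), (4000.0, 1000.0), (8000.0, 1000.0)]

log_fc = [log2_fold_change(cy5, cy3) for cy5, cy3 in replicates]
avg_log_fc = statistics.mean(log_fc)
tstat = t_statistic(log_fc)
```

A Pvalue would then come from the t distribution with n - 1 degrees of freedom, which is not computed in this sketch.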
==Week 15==
*I worked on some of the mistakes that I had made in my prior data sets
**removed AVG_LOGFC_ALL row
**added individual Pvalues and TSTAT for each individual replicate of the experiment instead of one Tstat and P value for the whole experiment
*sanity check concluded the number of genes significantly changed at each time point
**T15- 5520
**T30- 7484
**T60- 6711
**T240- 5901
*Removed all of the #DIV/0! values from the data that was transferred over to the GenMAPP data
*uploaded [[Media:Complete_processed_Data_MPM.xls|XLS Version]] and [[Media:Complete_processed_Data_MPM.txt|TXT version, USE THIS]]
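One possible way to strip the #DIV/0! values from a tab-delimited export is sketched below. It assumes the error cells are simply blanked rather than the rows being dropped, which may differ from what was done by hand in Excel. The column names and gene IDs are hypothetical.

```python
import csv
import io

EXCEL_ERROR = "#DIV/0!"


def blank_excel_errors(tsv_text):
    """Return the tab-delimited text with every #DIV/0! cell emptied."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(["" if cell == EXCEL_ERROR else cell for cell in row])
    return out.getvalue()


# Hypothetical three-column sample; the real file has many more columns.
sample = (
    "ID\tAVG_LOGFC_T15\tPVALUE_T15\n"
    "GENE_0001\t0.52\t0.03\n"
    "GENE_0002\t#DIV/0!\t#DIV/0!\n"
)
cleaned = blank_excel_errors(sample)
```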

*had to change names of the columns in order to correctly upload to GenMAPP
**system code column was renamed
**Gene ID column was renamed to ID on the programmer's computer to resolve some issues
*it appears that there is something wrong with the actual gene IDs that is not compatible with GenMAPP
*ran the first integration of the data and came up with 5535 errors which is roughly half of the overall genes we loaded.
*loaded GenMAPP and MAPPFinder results

[[Media:MAPPFinder_results_T15-Criterion_Increased-GO.txt|Increased expression at T15]]

[[Media:MAPPFinder_results_T15-Criterion_Decreased-GO.txt|Decreased expression at T15]]

[[Media:T30_increased-Criterion0-GO.txt|Increased expression at t30 300mm NaCl]]

[[Media:T30_MAPP_results-Criterion_Decreased-GO.txt|Decreased expression at t30 300mm NaCl]]

[[Media:MAPPFinder_t60-Criterion_Increased-GO.txt|Increased expression at t60]]

[[Media:MAPPFinder_t60-Criterion_Decreasaed-GO.txt|Decreased expression at t60]]

[[Media:300N_T240_MAPPFINDER-Criterion_Increased-GO.txt|Increased expression t240 300mmNaCl]]

[[Media:300N_T240_MAPPFINDER-Criterion_Decreased-GO.txt|decreased expression at t240 300mmNaCl]]
*analyzed the Gene Ontology report
**found a specific metabolic pathway that was pertinent to the treatment of NaCl

Alina's Assessment and Reflection

2013-12-13T18:53:43Z

Ajvree: /* Reflection on the Process */

== Statement of Work ==
===Identification of Gene IDs===
*Used Microsoft Access to investigate files of original 3 strains (TIGR4, R6, G54)
*Identified/checked gene ID formats for consistency and looked for orderedlocus totals
*Format/total for TIGR4 strain of interest were as follows:
**TIGR4: SP_#### Orderedlocus: 2126

===Updating of Gene Database Testing Reports===
*Was responsible for running counts (XMLPipeDB Match, Tally Engine, SQL) for all export testing reports
*Performed visual inspections within gdb files
**Systems table
**UniProt table
**RefSeq table
**OriginalRowCounts
*Used snipping tool to provide screenshots for most results in testing
*(Did not update GenMAPP aspects, left to Kevin)
*Compared found Gene ID formats with corresponding online resources (UniProt, RefSeq)
*Investigated Ensembl (MOD) to compare gene totals
*Export Reports contributed to:
**[https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/TATK_E4:_TIGR4_Testing_Report#Compare_Gene_Database_to_Outside_Resource E4] [https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/TATK_E3:_TIGR4_Testing_Report E3] [https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/TATK_Export_One:_TIGR4_Testing_Report E1]

Electronic Notebooks: [[Ajvree Week 12|Week12]] [[Ajvree Week 13|Week13]] [[Ajvree Week 14|Week14]] [[Ajvree Week 15|Week15]]

===XML/Exception File Comparison===
*Used XMLPipeDB Match in order to search Uniprot XML file for species IDs with format SP_[0-9][0-9][0-9][0-9] and got 2126 results.
*This was saved as a text file in order to make compatible for Excel. This file and exceptions file were put in adjacent columns and Excel match function was used. All results found to be #N/A.
*''Excel File:'' [[Media:20131203_IDExcelcomparison_tATK_TIGR4_AJV.xls|ID Comparison]]
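The Excel comparison above amounts to checking whether each ID in one list appears in the other. A minimal sketch of that check is given here, assuming the exception IDs were looked up against the IDs found in the UniProt XML; the direction of the lookup is not stated above, and the IDs in the example are invented.

```python
def ids_missing_from_reference(candidate_ids, reference_ids):
    """Return the candidate IDs that do not occur in the reference list.

    An ID returned here corresponds to an #N/A result from Excel's MATCH.
    """
    reference = set(reference_ids)
    return [gene_id for gene_id in candidate_ids if gene_id not in reference]


# Hypothetical IDs standing in for the two columns of the spreadsheet.
xml_ids = ["SP_0001", "SP_0002", "SP_0003"]
exception_ids = ["SP_9001", "SP_9002"]

missing = ids_missing_from_reference(exception_ids, xml_ids)
```

If every candidate comes back as missing, as in this example, that matches the case where all MATCH results are #N/A.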

===Visual Representation of Data Information===
*Helped Kevin with visually representing the replicate information from the microarray paper
*Used Paint program to create diagram
[[Image:TATK tree.png|200px]]


== Assessment of Project ==

* '''Give an objective assessment of the success of your project workflow and teamwork.'''
**As a group we were very successful when it came to completing tasks and working together as a team. All members were committed to the tasks they were given and we had no problems getting together to work on the project. We had no problems with one another individually, and all group members were as supportive and as helpful as possible. Had it not been for the slight hiccup in terms of analyzing the microarray data, the workflow would have been very smooth.
* '''What worked and what didn't work?'''
** We had a little trouble getting the microarray data into a form we could use, but other than that most of our testing went flawlessly. All of our counts matched our expected values, and our ID exceptions/errors from the database could not be fixed due to their absence in our original file. Our team stayed on track despite the delay for the microarray data, and we were able to work efficiently enough to catch up once the data was in order.
* '''What would you do differently if you could do it all over again?'''
**I think I would try and learn more about the coder's tasks. While I did work side by side with my team, I feel I could have immersed myself a little more and not focused so much on just completing particular tasks. Another thing I would improve on is the detail content within the electronic journal. In some cases this semester I could have given more information on what I actually did, instead of just summarizing.
* Evaluate the Gene Database Project and Group Report in the following areas:
* '''Content: What is the quality of the work?'''
**The work we completed is high quality, due to the care and diligence my team put into making it the best they possibly could. The group had no problem asking each other questions or asking the professors questions when they needed clarification or advice on certain topics. The work really reflects the effort that was put into it.
* '''Organization: Comment on the organization of the project and of your group's wiki pages.'''
**Overall, I think the content on our wiki was pretty organized, despite having our respective information somewhat scattered in our journals. It all came together once we compiled everything into the testing reports and project. The team template really made it easy to navigate through all our team pages and keep track of our information. The project is also well-organized, with a flow of information that makes sense.
* '''Completeness: Did your team achieve all of the project objectives? Why or why not?'''
**My team was very on-the-ball when it came to completing objectives. The team worked very hard at making sure everything was done, and done correctly, using whatever resources were available.

== Reflection on the Process ==

* What did you learn?
** With your head (biological or computer science principles)
*** The different projects from the groups supplied me with some new knowledge about species I previously was not too familiar with. It was interesting to learn about what kinds of studies are being done on the different species using the skills and information about databases and the programs that go into creating them.
** With your heart (personal qualities and teamwork qualities that make things work or not work)?
*** A personal quality that I found I need to improve on is my public speaking ability. While I am able to perform fairly well, and perhaps better than quite a few of my peers, I still do not feel as confident as I should while talking about a topic in front of a group. Interacting with my group and seeing how well we worked together in comparison to some of the other groups gave me some insight into what working in real-life teams might be like, and you may not always get to work with people that have the team's success in mind.
** With your hands (technical skills)?
*** I got a chance to work with a lot of programs that I probably would have never even heard of had I not taken this class. The coding process was difficult, but I found it rewarding to try my hand at something I had never experienced. It was insightful to apply concepts I am familiar with in biology to a new frontier I have never stepped foot in.
* What lesson will you take away from this project that you will still use a year from now?
** I will take the lessons I learned from being involved in a long-term project and apply them to future group projects I will have to be involved with. Hopefully I will get to apply my new awareness of biological databases and use it for future research or in other future endeavours.

[[Category:Group Projects]]

Team H(oo)KD Final Project Deliverables

2013-12-13T11:44:43Z

Kdahlquist: fixed link to EB to RB with rif decreased

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:[[Media:ReadMe Ct-Std External 20131122.pdf|ReadMe_Ct-Std_External_20131122.pdf]]
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):[[Media:Final Excel Sheet Used for Project.xls | processed and analyzed DNA microarray dataset]]
*GenMAPP Expression Dataset file (.gex):
:*For data collected in absence of rifampicin: [[Media: For GenMAPP Chlamydia V4 20131205 KS.gex|For_GenMAPP_Absence_of_Rifampicin.gex]]
:*For data collected in presence of rifampicin: [[Media:For GenMAPP Chlamydia V4 20131212 KS Presence of Rifampicin.gex|For_GenMAPP Chlamydia_V4_20131212_KS_Presence_of_Rifampicin.gex]]
*Filtered MAPPFinder Results (.xls):
*[[Media: EB to RB No Rif 20131207 KS DW-Criterion0-GO.xls|EB to RB No Rif Increased]]
*[[Media: EB to RB No Rif 20131207 KS DW-Criterion1-GO.xls|EB to RB No Rif decreased]]
*[[Media: MAPPFinder Results EB to RB Rifampicin 20131212-Criterion0-GO (1).xls ‎| EB to RB Rif Increased]]
*[[Media:MAPPFinder_Results_EB_to_RB_Rifampicin_20131212-Criterion1-GO_(1).xls|EB to RB Rif decreased]]
*Sample MAPP file of a relevant biological pathway for your species (.mapp):[[Media:EBtoRB_Rifampicin_Cellular_Carbohydrate_Metabolic_Process.mapp]]
:*Picture of the MAPP (.jpeg):[[Media:EBtoRB Rifampicin Cellular Carbohydrate Metabolic Process.jpg]]
*Group Report (.pdf): [[Media:Report for Final Project HDDWKS 20131213.pdf|Report_for_Final_Project_HDDWKS_20131213.pdf]]
*PowerPoint presentation (given on Thursday, December 12): [[Media:Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf|Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf]]

Streptococcus pneumoniae TIGR4 20131125 GeneTestingReport

2013-12-13T07:33:59Z

Taur.vil: new page name

==Export Information==

Version of GenMAPP Builder: 2.0b73
:Database called: tATK_TIGR4_2013NOV25

Computer on which export was run: Tauras' Personal Computer

Postgres Database name: tATK_TIGR4_2013NOV25

UniProt XML filename: [[Media:20131118_UniProtXML_tATK_TIGR4_TPV.xml|20131118_UniProtXML_tATK_TIGR4_TPV.xml]]
* UniProt XML version: UniProt Release 2013_11; 2013Nov13
* Time taken to import: 3.15min

GO OBO-XML filename: [[Media:20131120_OBOXML_tATK_TPV.gz|20131120_OBOXML_tATK_TPV.obo-xml]]
* GO OBO-XML version: 2013Nov20
* Time taken to import: 10.59min
* Time taken to process: 9.25min

GOA filename: [[Media:20131118_GOA_tATK_TIGR4_TPV.goa|20131118_GOA_tATK_TIGR4_TPV.goa]]
* GOA version: 2013Nov12 14:49
* Time taken to import: 0.03min

Name of .gdb file: [[Media:Streptococcus_pneumoniae_TIGR4_20131125.gdb|Streptococcus_pneumoniae_TIGR4_20131125.gdb]]
* Time taken to export .gdb: less than 1 hour
**Started at 22:53
**Finished by 23:50
* Upload your file and link to it here. [[Media:Streptococcus_pneumoniae_TIGR4_20131125.gdb|Streptococcus_pneumoniae_TIGR4_20131125.gdb]]

==TallyEngine==
*Tally Engine run on Tauras' personal computer.
*'''Final Results:'''
**Ordered Locus XML Count: 2126
**Ordered Locus Database Count: 2126

[[Image:TallyEngine Capture TPV.JPG|thumb|left|upright=1.5]]
 

== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==
*XMLPipeDB match program was downloaded from Sourceforge [http://sourceforge.net/projects/xmlpipedb/]
*Moved xmlmatch jar file to Downloads folder on personal computer
*Ran cmd program on personal computer
*Changed to the Downloads folder from the command prompt (''cd Downloads'')
*Searched for pattern: SP_[0-9][0-9][0-9][0-9]
*Total unique matches found: 2126
*This total matched results found in Tally Engine count
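The count of unique matches above can be reproduced in principle with a regular expression. The snippet below is a sketch rather than the XMLPipeDB match program itself; the XML fragment is a made-up stand-in for the UniProt file.

```python
import re

# Same pattern that was searched for above.
ID_PATTERN = re.compile(r"SP_[0-9][0-9][0-9][0-9]")


def count_unique_matches(text):
    """Count distinct substrings of the text that match the ID pattern."""
    return len(set(ID_PATTERN.findall(text)))


# Illustrative fragment only; SP_0001 appears twice but is counted once.
sample_xml = (
    '<name type="ordered locus">SP_0001</name>'
    '<name type="ordered locus">SP_0002</name>'
    '<name type="ORF">SP_0001</name>'
)
unique_count = count_unique_matches(sample_xml)
```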

[[Image:20131107 XMLmatch tATK TIGR4 AJV.PNG|500px]]

== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine==
*Ran pgAdmin III through personal computer to run SQL query
*Command used:
**''select count(*) from genenametype where type = 'ordered locus' and value ~ 'SP_[0-9][0-9][0-9][0-9]';''
*Unique matches found: 2126
*These results matched those of Tally Engine and XMLPipeDB match, confirming values 
[[Image:20131119 SQLcountresults tATK TIGR4 AJV.PNG|500px]]

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#SQL | Follow the instructions on this page to query the PostgreSQL Database.]]
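For readers without access to the PostgreSQL database, the shape of the count query can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in, so two things are assumptions: the two-column table is a simplification of the real genenametype table, and SQLite's GLOB operator replaces PostgreSQL's ~ regular-expression operator (GLOB must match the whole value, while ~ matches anywhere in it). The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the genenametype table.
conn.execute("CREATE TABLE genenametype (type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO genenametype (type, value) VALUES (?, ?)",
    [
        ("ordered locus", "SP_0001"),
        ("ordered locus", "SP_0002"),
        ("ordered locus", "SP0003"),  # no underscore, so not counted
        ("ORF", "SP_0004"),           # wrong type, so not counted
    ],
)

query = (
    "SELECT count(*) FROM genenametype "
    "WHERE type = 'ordered locus' "
    "AND value GLOB 'SP_[0-9][0-9][0-9][0-9]'"
)
(count,) = conn.execute(query).fetchone()
conn.close()
```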

==OriginalRowCounts Comparison==
*The OriginalRowCounts table in the gdb file had a UniProt Ordered Locus count of 4252
*This is twice the expected 2126 (4252 = 2 × 2126) because the database includes each Gene ID both with and without the underscore

==Visual Inspection==
'''Systems Table''' 
*There are numerous missing dates for the gene ID systems.
[[Image:20131121 E3Systemstable tATK TIGR4 AJV.PNG|400px]]

'''OrderedLocusNames Table'''
*IDs take the forms SP_#### and SP####

'''UniProt Table'''
*IDs are all in the expected form SP_####

[[Image:Uniprotids2.JPG|400px]]

'''RefSeq Table'''
*IDs in form NP_######
*This is expected form for RefSeq, refers to protein accession number.

[[Image:Refseqsnip2.JPG|200px]]

==.gdb Use in GenMAPP==

Note:

===Putting a gene on the MAPP using the GeneFinder window===

* Try a sample ID from each of the gene ID systems. Open the Backpage and see if all of the cross-referenced IDs that are supposed to be there are there.

Note:

*no criteria met: Q97RY3 (Q97RY3_STRPN) matched with [http://www.uniprot.org/uniprot/Q97RY3 uniprot page]
*not found: Q97NJ5 (Q97NJ5_STRPN) matched with [http://www.uniprot.org/uniprot/Q97NJ5 uniprot page]
*decreased: Q97SJ6 (CPSC_STRPN) matched with [http://www.uniprot.org/uniprot/Q97SJ6 uniprot page]
*increased: Q97SB2 (Q97SB2_STRPN) matched with [http://www.uniprot.org/uniprot/Q97SB2 uniprot page]

===Creating an Expression Dataset in the Expression Dataset Manager===

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML, but were somehow not imported? or were they not present in the UniProt XML?

Note: 4689 out of 5022 IDs were imported. There were 333 exceptions, all due to not being present in UniProt.

===Coloring a MAPP with expression data===

Note: increased was colored red, decreased was colored green, no criteria met was colored grey, and not found was colored white

===Running MAPPFinder===

Note: MAPPFinder worked successfully for all of the data used in this project.

== Compare Gene Database to Outside Resource==
*Could not find downloadable gene ID list on Ensembl, which was used as MOD for the TIGR4 strain.
*A 'Coding Gene Count' was given, a value of 2125
*This total is one less than our expected value of 2126
*Inability to find ID list kept us from being able to identify reasons for differences between the values.

[[Image:Ensembl snip.JPG|150px]]

==Template==
{{Team ATK}}

Personal Assessment

2013-12-13T03:24:40Z

Lena: /* Links to Works */

[[Media:Leishmania PersonalAssesment 12122013 Hunt.pdf]]
==Links to Works==
:GenMAPP Gene Database for Leishmania: [[Media:LeishmaniaGDB Lena Gabe 20131205.zip]]
:ReadMe: [[Media:ReadMe Leishmania 20131212.pdf]]
:Gene Database Testing Report : [[Media:Leishmania Gene Database Testing Report.pdf]]
:[[Lena Project Notebook]]
:[[Leishmania major]]

Kevin's Assessment and Reflection

2013-12-12T19:12:05Z

Kmeilak: /* Assessment of Project */

==Statement of Work==

I compiled the raw data downloaded from ArrayExpress

#[[Media:20131106_ArrayExpressADF_tATK_KM.adf.txt|Associated Data Files]]
#[[Media:20131106_ArrayExpressarraydesign_tATK_KM.txt|Array Design]]
#[[Media:20131106_ArrayExpressrawdata_tATK_KM.zip|Raw Data]]
#[[Media:20131106_ArrayExpressprocesseddata_tATK_KM.zip|Processed Data]]
#[[Media:20131106_ArrayExpresssdrf_tATK_KM.txt|Sample and Data Relationship]]

into one Excel file [[Media:20131119_teamATK_KM_compiledrawdata_(1)_(1).xls|20131119_teamATK_KM_compiledrawdata.xls]], log transformed the data, scaled and centered it, and ran Tstat and Pvalue calculations on the data as described in the V. cholerae instructions. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae V. cholerae 1] However, averages were computed first over the technical replicates corresponding to the same biological replicate, and then over each time period. Furthermore, Tstats and Pvalues were calculated by replicate rather than as one total calculation.
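The scaling, centering, and two-level averaging described above can be sketched as follows. This is an illustration under assumptions, not the actual spreadsheet: scaling and centering is taken to mean subtracting the chip mean and dividing by the chip standard deviation, and the replicate labels and values are hypothetical.

```python
import statistics


def scale_and_center(log_ratios):
    """Subtract the chip mean and divide by the chip standard deviation."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return [(x - mean) / sd for x in log_ratios]


def average_by_biological_replicate(values_by_bio_rep):
    """Average technical replicates first, then the biological replicates."""
    bio_means = [statistics.mean(v) for v in values_by_bio_rep.values()]
    return bio_means, statistics.mean(bio_means)


# Hypothetical log ratios for every spot on one chip.
chip = [1.0, 2.0, 3.0]
normalized = scale_and_center(chip)

# Hypothetical technical replicates for one gene at one time point,
# grouped by biological replicate.
one_gene = {"B1": [0.5, 1.5], "B2": [2.0, 4.0]}
bio_means, time_point_mean = average_by_biological_replicate(one_gene)
```

Averaging in two stages keeps a biological replicate with more technical replicates from dominating the time-point mean.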

This data was imported into GenMAPP to create a gene database for S. pneumoniae, and color sets were created according to the instructions for V. cholerae [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols V. cholerae 2]. MAPPFinder was run, and a filtered ranked list was produced [[Media:20131210_tATK_filteredresults.xls|20131210_tATK_filteredresults.xls]]. The sugar transmembrane transporter activity pathway was selected for further investigation, and a MAPP of the genes in that pathway was produced using MAPPFinder [[Media:20131211_tATK_sugar_transmembrane_transporter_activity.mapp|20131211_tATK_sugar_transmembrane_transporter_activity.mapp]]
[[Media:MAPP.JPG|tATK jpeg MAPP]]. The function of each gene was determined based on the UniProt page for that particular gene, and a compiled list may be found here [[Media:Pathway_Gene_Info.xls]]. This information was incorporated into the report and presentation.

==Assessment of Project==

*The project workflow went well overall. There was some confusion concerning my role and how to change the V. cholerae instructions for the data set from this experiment, but once imported into GenMAPP everything ran smoothly. The work Tauras and Alina produced was excellent, and we worked together very well and efficiently to produce our final products.
*We worked well as a team, group meetings were productive, and most things went smoothly. I had some troubles interpreting the data before the column headers were acquired, and difficulty understanding how to apply the statistics tests and data transformations, but those issues were worked out with help from Dr. Dahlquist and Dr. Dionisio.
*I would work earlier on and focus more on getting the data ready for GenMAPP so that results rather than methods could receive more of my time.
*Gene Database Project and Group Report Assessment
#Content: the work is of high quality, but relatively limited in scope. Only one pathway at one time point was analyzed. It would have been much more interesting to see that pathway at multiple time points, or to see different pathways in conjunction with the sugar transmembrane transporter activity pathway that was examined.
#Organization: Our group was generally well organized, but our wiki pages were messy as we all were uploading and posting haphazardly. We fixed most of these issues of disorganization in retrospect, but some pages are still quite messy.
#Completeness: Our team achieved all of the project objectives because we were motivated, put in the time and effort, and asked for help from the professors or each other when necessary.

==Reflection on the Process==
*I learned how to transform data and to use programs that allow one to easily organize and understand an enormous volume of otherwise incomprehensible data. I also learned how to integrate computer information and programs with my biologist's understanding of the processes and workings of life. I had the opportunity to improve my teamwork skills, which helped this project succeed relatively smoothly.
*I will take away the importance and power of interdisciplinary approaches to research, as well as the importance of effective teamwork and planning.

Kevin McGee Assessment and Reflection

2013-12-12T06:33:29Z

Kevinmcgee: Created page with "Statement of Work Describe exactly what you did on the project. Kevinmcgee Week 10 # #*I used the PubMed database to find the reference geneome of Leishmania Major. #*I se..."

Statement of Work
Describe exactly what you did on the project.
[[Kevinmcgee Week 10]]
#
#*I used the PubMed database to find the reference genome of Leishmania Major.
#*I searched with the terms "Leishmania Major [MeSH Terms] AND Genome [Title]"
#*The search terms came back with 30 articles
#*The 9th article was titled: ''The Genome of the kinetoplastid parasite, Leishmania Major'' (Ivens et al., 2005). This is the reference genome.
#*[http://www.ncbi.nlm.nih.gov/pubmed/16020728 Link to Article Online]
#
#*On Web of Science, I searched using the search terms "Leishmania Major" for the Title and "Ivens AC" for the author.
#*I got 7 article results back from my search terms. The 1st article was the reference genome that I found on PubMed.
#*The last thing that Ivens published was the genome sequence, and he has not published work on Leishmania Major since then. However, looking at the people who have referenced his reference genome, you can see many of the directions people have taken his research. Many articles have been posted in the last year on determining Leishmania resistance to drugs and on the properties of different proteins encoded in the genome.
#
#*Didn't find any good sources on Leishmania microarray data. Went to ArrayExpress, to look for arrays on Leishmania major to backtrack to articles.
#*I typed in "Leishmania Major" in the organism field and "Array assay" in the technology field and filtered down my results.
#*[http://europepmc.org/abstract/MED/18638379/reload=0;jsessionid=UIW9RunP1XHUGCaWD7p4.48 ''Modulation of gene expression in drug resistant Leishmania is associated with gene amplification, gene deletion and chromosome aneuploidy.''] Found on ArrayExpress
#*[http://europepmc.org/abstract/MED/18510761 ''Genome-wide gene expression profiling analysis of Leishmania major and Leishmania infantum developmental stages reveals substantial differences between the two species.''] Found on ArrayExpress
[[Kevinmcgee Week 11]] Journal club Reference article
[[Kevinmcgee Week 12]]
#Downloaded SDRF file off of the wiki.
#Started to edit SDRF file
#*Left the following columns while deleting the rest:
#**Source Name
#**Characteristics
#**Comment (Sample_description)
#**Comment (Sample_source_name)
#**Label
#**Array Data File
#Filtered the file down to only L.Infantum samples
#Was left with the following image
#*[[image:SDRFL.Infantum (1).PNG]]
#*This image showed me what data was where when looking at the raw data files
#Proceeded to go into each data file for L.Infantum and kept the name of each gene along with the log ratio of each gene.
#Compiled all data into a single sheet
#Was left with the following image
#*[[image:L.InfantumCompiledRawData.PNG]]
#Uploaded the Compiled Raw data file onto the wiki
#*Sdrf file was uploaded by Viktoria

[[Kevinmcgee Week 13]]
*Opened [[media:L.infantumCompliedRawData(A).txt | L.infantumCompliedRawData(A).txt]]
*Finished the formatting by flipping the dye swap chips negative
*Created a column next to dye swap chips and did the formula:
=-1*(dye swap chip column)
*made a new sheet
*added all data from old sheet except only added the flipped dye swaps
*looked for background information in the array paper
**L. infantum MHOM/MA/67/ITMAP-263 and L. major LV39 MRHO/SU/59/P strains used in this study
**All microarray data will be freely available on the Geo NCBI database in the MIAME format
***The series accession number for our manuscript is GSE10407.
**Each chip compares promastigote vs. amastigote with different replicates
**Following data files found
[[File:LmjSampleInfo.PNG]]
*Finished naming sheet with helpful names to know what is what on the sheet
*Ready for statistical analysis
*Began analysis by taking the average and standard deviation of each data chip separately and using that information to scale and center our data:
=(B4-B$2)/B$3 This shows the equation we used to scale and center.
*Copied and pasted values of scaled centered onto a new page. From there, we edited out all #VALUE! cells and left them blank. GenMAPP will ignore these blanks when we input our data.
*Made a column of the average fold change for each gene called Avg_LogFC_All
=AVERAGE(B2:G2)
*Made a column of the Tstat and Pvalue for the fold changes of each gene:
=AVERAGE(B2:G2)/(STDEV(B2:G2)/SQRT(6)) TStat
=TDIST(ABS(I2),5,2) Pvalue
*Created a new page titled forGENMAPP
**Copied and pasted all values from statistics page
*Cut and pasted columns H-J and moved them to columns B-D
*Inserted a new column at B called System Code. Filled in column with the letter N
*File is now ready for GenMAPP import

*Sample of what the final file looked like
[[File:L.InfantumforGenMAPP.PNG]]
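
The spreadsheet steps above (flipping the dye-swap chips, scaling and centering each chip, then computing a T statistic and P value per gene) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the workbook that was used: the function names are hypothetical, blanks are represented as None, the T statistic divides by the number of non-blank values rather than a fixed 6, and the two-tailed P value is approximated by numerically integrating the Student t density in place of Excel's TDIST.

```python
import math
from statistics import mean, stdev


def flip_dye_swap(column):
    """Negate the log ratios of a dye-swap chip (the =-1*(...) step)."""
    return [None if v is None else -v for v in column]


def scale_and_center(column):
    """Subtract the chip mean and divide by the chip sample standard deviation."""
    present = [v for v in column if v is not None]
    mu, sigma = mean(present), stdev(present)
    return [None if v is None else (v - mu) / sigma for v in column]


def t_statistic(row):
    """One-sample T statistic: mean / (sample stdev / sqrt(n)), blanks ignored."""
    vals = [v for v in row if v is not None]
    return mean(vals) / (stdev(vals) / math.sqrt(len(vals)))


def two_tailed_p(t, df, steps=2000):
    """Approximate the two-tailed P value of a Student t statistic
    using Simpson's rule on the t density between 0 and |t|."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    if upper == 0:
        return 1.0
    # Keep the step small for large |t| and make the interval count even
    steps = max(steps, 2 * math.ceil(upper * 100))
    steps += steps % 2
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up values for one gene
t = t_statistic(row)
print(t, two_tailed_p(t, len(row) - 1))
```

With six chips per gene, the degrees of freedom are 5, matching the second argument of the TDIST formula above.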

[[Kevinmcgee Week 15]]
=Uploading Into GenMAPP=
==Datasheet==
#compiled all data onto a single data sheet including both L.Major and L.Infantum and uploaded into GenMAPP
#*Ran into some problems uploading (almost everything was an error)
#Filtered out any Lin genes and created a new datasheet for only LmfJ genes.
#*Still ran into problems uploading
#looked at the database OrthologicalNames sheet, and saw that the gene IDs were in a different format there than in the spreadsheet.
#*Made a quick-fix file by changing the names on the spreadsheet, but longterm fixes are being made to the coding so that other users do not have to change the names on their spreadsheets every time (for convenience).
#Were able to upload our data and continue on with the project.
==Sanity Check==
===Leishmania Infantum===
#Filtered P-value
#*1392 genes were <.05
#*327 genes were <.01
#*67 genes were <.001
#*28 genes were <.0001
#Filtered Average Log Fold Change
#*748 genes were >0
#*646 genes were <0
#*699 genes were >.25 while 646 were <.05
#*748 genes were >.05 while 606 were <-.25
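
A sanity check like the one above amounts to counting how many genes fall under each P value cutoff. A minimal Python sketch, assuming the P values have been read into a list with blanks stored as None (the function name and the sample values are hypothetical, not project data):

```python
def count_below(p_values, thresholds=(0.05, 0.01, 0.001, 0.0001)):
    """Return, for each threshold, the number of P values strictly below it."""
    return {
        t: sum(1 for p in p_values if p is not None and p < t)
        for t in thresholds
    }


# Made-up P values for illustration only
print(count_below([0.2, 0.04, 0.009, None, 0.00005]))
```

Because each cutoff is stricter than the previous one, the counts should never increase from one threshold to the next, which is a quick way to spot a filtering mistake.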
==MAPPFinder Color Changes==
#The colors were assigned to the two main criteria
#*Increased relative to control: Log FC > 0.25 and P-Value < 0.05; these were colored blue
#*Decreased relative to control: Log FC < -0.25 and P-Value < 0.05; these were colored purple
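
The two color criteria above can be expressed as a small classification rule. The Python sketch below is only an illustration of the logic; the function name and the label strings are hypothetical, and GenMAPP itself applies the criteria through its own Expression Dataset Manager.

```python
def classify(log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Label a gene by the criteria listed above.

    Returns "increased" (colored blue), "decreased" (colored purple),
    or "no criteria met" when neither rule applies or a value is blank.
    """
    if log_fc is None or p_value is None:
        return "no criteria met"
    if p_value < p_cutoff and log_fc > fc_cutoff:
        return "increased"
    if p_value < p_cutoff and log_fc < -fc_cutoff:
        return "decreased"
    return "no criteria met"


print(classify(0.8, 0.01))   # meets the increased rule
print(classify(-0.6, 0.03))  # meets the decreased rule
print(classify(0.8, 0.2))    # P value too high
```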
==Running MAPPFinder==
*Set up MAPPFinder to run with the file name LMajorGOMap
*Ran for about 1 1/2 hours
#Top Ten GO Terms
#*catalytic activity
#*Endonuclease activity
#*DNA catabolic process
#*Aromatic compound catabolic process
#*cellular nitrogen compound catabolic process
#*nucleobase-containing compound catabolic process
#*organic cyclic compound catabolic process
#*heterocycle catabolic process
#*oxidoreductase activity
#*macromolecular complex

[[image: Picture GO.PNG]]
=== Statement of Work ===

* Describe exactly what you did on the project.
* Provide references or links to artifacts of your work, such as:
** Wiki pages
** Other files or documents
** Code or scripts

=== Assessment of Project ===

* Give an objective assessment of the success of your project workflow and teamwork.
* What worked and what didn't work?
* What would you do differently if you could do it all over again?
* Evaluate the Gene Database Project and Group Report in the following areas:
*# Content: What is the quality of the work?
*# Organization: Comment on the organization of the project and of your group's wiki pages.
*# Completeness: Did your team achieve all of the project objectives? Why or why not?

=== Reflection on the Process ===

* What did you learn?
** With your head (biological or computer science principles)
** With your heart (personal qualities and teamwork qualities that make things work or not work)?
** With your hands (technical skills)?
* What lesson will you take away from this project that you will still use a year from now?

Laurmagee: Individual Assessment

2013-12-12T00:24:40Z

Laurmagee: /* Reflection on the Process */

==Statement of Work==
*Describe exactly what you did on the project.
*The first step in my role as GenMAPP user was to find an article that contained microarray data on the rhizobacterium ''Sinorhizobium meliloti''. I submitted a few articles for consideration, but ultimately my partner, Miles, found the article we have been using throughout this project. The article can be found on the following page: [http://jb.asm.org/content/188/21/7617 HTML version] and is referenced below.
*Domínguez-Ferreras, A., Pérez-Arnedo, R., Becker, A., Olivares, J., Soto, M.J., Sanjuán, J. (2006) Transcriptome Profiling Reveals the Importance of Plasmid pSymB for Osmoadaptation of Sinorhizobium meliloti ''Journal of Bacteriology'' 188:7617-7625
*From this article, we were able to procure the raw microarray data from the following website: [http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Osmotic upshift elicited by salt and sucrose].
*The article that we were studying carried out four different experiments, with different levels of NaCl and sucrose being the manipulated variable. I personally studied the experiment done with 700 mM sucrose. Raw Data File for 700S (1-3): [[File:Full Raw Data.xls]]
*The columns of interest in the above data file were collected, and scaling and centering was done to produce the log values needed for statistical analysis. Further description can be found on the following page: [[Laurmagee: Week 13]]. The following data file was produced: [[Media:Compiled Ratios and Logs.xls]].
*With this new file, statistical analysis could be done on the log values. First the Avg_LogFC values were calculated by averaging the log values of the three replicates present at each of the four individual time intervals (t15, t30, t60, t240). Therefore, I had to calculate four of these Avg_LogFC values, one for each time point. From these averages, I was able to calculate the T_stat and P_value for each time interval as well. This process is outlined in the "Statistical Analysis" section of the following journal: [[Laurmagee: Week 15]]. This spreadsheet was formatted specifically for GenMAPP standards and the following were produced: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.xls]] and [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]. The text file is to be fed into GenMAPP.
*A Sanity Check was done and highlighted on the following page: [[Laurmagee: Week 15]], and it shows the number of microarray dots that were changed significantly in the process of the experiment.
*After this check proved to provide accurate results, I moved on to load my datafile into GenMAPP. However, this is where I found a snag. My GenMAPP file would not load into GenMAPP and the whole program would stop responding immediately after I would input my text file.
*After trying a different computer, walking through every aspect of my data file with my partners, and troubleshooting different alternatives, I finally emailed Dr. Dondi, who found the main problem with my dataset. The gene IDs that had been present on my sheet did not follow the same format as those in the Gene Database that was created by my partners. The corresponding IDs were present, but they had extraneous information attached to them, which kept GenMAPP from recognizing them. The number of errors that the file was generating was so large that the program stopped responding entirely, which is why I wasn't getting an error message.
*However, even upon testing the modified data sheet, I found that it was giving me another error message. This time it was telling me my column titles were insufficient. After some research, I found that my text file was not in tab delimited format, despite the fact that I had saved it as such on my MacBook Pro computer. After transferring my previous .xls workbook onto a Windows computer, and saving it as a tab delimited text file, I finally got GenMAPP to accept the following data file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.txt]].
*The GenMAPP and MAPPFinder protocols can be viewed at the bottom of the following journal page: [[Laurmagee: Week 15]].
*The exception file created with the GenMAPP program is included here: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.EX.txt]] along with the following other three GenMAPP program exports: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]], [[Media:ColorSets.mapp]], [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gmf]], which are all detailed on the journal page noted above.
*In addition, the MAPPFinder documents are the following: [[Media:700S1-3-t15-Decreased-Criterion0-GO.txt]], [[Media:700S1-3-t15-Increased-Criterion0-GO.txt]]
*Beyond my assigned tasks, I helped my group out in whatever areas were necessary. It was a challenge to keep up with the tasks of my partners, but I was always accessible to offer input or advice.
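
The two import problems described above (gene IDs that did not match the Gene Database, and a text file that was not truly tab delimited) could in principle be caught before opening GenMAPP. The Python sketch below is a hypothetical pre-import check, not part of the actual workflow; the function names, the placeholder IDs, and the column headers are all made up for illustration.

```python
import csv
import io


def find_unmatched_ids(data_ids, database_ids):
    """Return the IDs from the data sheet that are absent from the gene database."""
    known = set(database_ids)
    return [gene_id for gene_id in data_ids if gene_id not in known]


def to_tab_delimited(header, rows):
    """Build tab-delimited text with explicit Windows-style line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder IDs: the second one carries extra text, so it will not match
print(find_unmatched_ids(["ID_1", "ID_2_extra"], ["ID_1", "ID_2"]))
print(repr(to_tab_delimited(["ID", "Value"], [["ID_1", "0.5"]])))
```

Writing the delimiter and line ending explicitly avoids depending on what a spreadsheet program chooses when saving on a particular operating system.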

==Assessment of Project==
*Give an objective assessment of the success of your project workflow and teamwork.
*#What worked and what didn't work?
*#*I think my group and I should have stressed collaboration more on this project. Since we were all given an independent job to handle, I think it was difficult for us to establish a "team" environment. All four of us were busy outside of class with conflicting schedules, so it was very challenging to find time when we could all meet one another and discuss where we were in our project. I think having each other as checks and balances would have been very helpful throughout the project, especially since I made it all the way to my GenMAPP stage of the project without anyone realizing I was using different gene IDs than those contained in the database.
*#What would you do differently if you could do it all over again?
*#*If I could do the project all over again, I would spend much less time scaling, centering, and performing statistical analysis on the data, and more time completing my GenMAPP and MAPPFinder analysis to produce more conclusive results. I would also set aside time from the beginning of the project when Miles and I could meet outside of class, because we had the exact same protocol yet failed to use each other as a resource.
*Evaluate the Gene Database Project and Group Report in the following areas:
*#Content: What is the quality of the work?
*#*Our quality of work thus far has been admittedly lackluster, due to extenuating circumstances that occurred only days before our project assignments were due. However, I think our final deliverable and our final report will reflect how much we can improve our quality of work under the right circumstances.
*#Organization: Comment on the organization of the project and of your group's wiki pages.
*#*[[Team Name]]: the wiki page is a bit overcrowded, but for the most part it is organized. All of our contacts are contained in the first section, and all the files related to the Microarray Paper are contained in the next section. The Coder and QA personnel in my group then organized different sections based on the times when the files had been exported. I believe that our PowerPoint presentation could have been a lot more organized, but I think we were able to fix those flaws in our written report. A common theme with our group, I think, has been not working as a cohesive unit, so our products come out disorganized and lacking fluidity. I think we remedied this in our final report, but earlier assignments may have reflected it.
*#Completeness: Did your team achieve all of the project objectives? Why or why not?
*#*There were some issues with time constraints, mainly with my portion of the project. I ran into a lot of GenMAPP issues during the end of our time with the project and this set me back many days. As far as final products go, however, I believe that we have completed the necessary items to the best of our ability, which is all I can ask of myself and my group.

==Reflection on the Process==
*What did you learn?
*#With your head (biological or computer science principles)
*#*I learned a lot about computers in this course. I was coming in with a fair amount of knowledge in the subject of biology, but very little knowledge of computers in general. I have now been exposed to coding, which was completely new to me, and to data analysis programs such as GenMAPP, which were otherwise foreign to me. Biologically, I learned what it was like to follow a biology-based project all the way through from data collection to conclusion. I do research on statistics education with Dr. Bargagliotti in the math department, so I already knew the processes of such projects, but had never had an opportunity to follow one through from a biological perspective.
*#With your heart (personal qualities and teamwork qualities that make things work or not work)?
*#*I learned how important working with a team is and how your cohesiveness as a unit can make or break a project. Everyone has busy schedules, but it is important to make the time to meet for collaboration; otherwise things like presentations seem choppy and you are much more prone to making errors. Although this project did assign personal responsibilities, that should not detract from the overall idea that you are a team and your success depends on one another.
*#With your hands (technical skills)?
*#*As I said previously, I learned a lot of computer skills. Whether inputting code, generating a .mapp in GenMAPP, or creating data calculation shortcuts in Excel, these were all new directions for my hands. I have also never had a class taught primarily on the computer, so this was altogether a new experience for me. In this project specifically, I learned how important it is to look over every single detail of a spreadsheet before you input it into a data processing program such as GenMAPP. After struggling with the GenMAPP program for hours, I will never again make the mistake of not checking the IDs of my file against those in the gene database.
*What lesson will you take away from this project that you will still use a year from now?
**I think all of the components that I listed above will stay with me even a year from now. I will definitely remember how to prepare a file for GenMAPP and complete an analysis on that file, but more generally, I will remember to look for the details, especially in research, that I have been tripped up on in this project and in this class as a whole. I will also take away the importance of teamwork, especially on such a multifaceted project as this one, and the excitement that can come out of your own research by creating a product that is completely your own.

Team Name Deliverables

2013-12-11T23:37:53Z

Laurmagee: /* Group */

=Group=
:GenMAPP Gene Database:[[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]]
:ReadMe file: [[Media:ReadMe-SM.pdf|ReadME_S.meliloti]]
:Gene Database Schema diagram: [[Media:S.Meliloti_schema_20131205.pdf|S.meliloti_schema_20131205.pdf]]
:Gene Database Testing Report: [[Media:Match Test Sheet1.pdf]]
:Processed and analyzed DNA microarray dataset:
* .7 M: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.txt]]
* .3 M: [[Media:Complete_processed_Data_MPM.xls|300mm NaCl XLS Version]]
:GenMAPP Expression Dataset file:
* .7 M: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
* .3 M: [[Media:T30.gex|Gene expression set for 300mm NaCl]]

:Filtered MAPPFinder Results:
*.7 M files:
**[[Media:700s1-3-t15-Decreased-Criterion-GO.xls]]
**[[Media:700s1-3-t15-Increased-Criterion-GO.xls]]
*.3 M files:
**[[Media:MAPPFinder_results_T15-Criterion_Increased-GO.txt|Increased expression at T15 300mm NaCl]]
**[[Media:MAPPFinder_results_T15-Criterion_Decreased-GO.txt|Decreased expression at T15 300mm NaCl]]
**[[Media:T30_increased-Criterion0-GO.txt|Increased expression at t30 300mm NaCl]]
**[[Media:T30_MAPP_results-Criterion_Decreased-GO.txt|Decreased expression at t30 300mm NaCl]]
**[[Media:MAPPFinder_t60-Criterion_Increased-GO.txt|Increased expression at t60 300mm NaCl]]
**[[Media:MAPPFinder_t60-Criterion_Decreasaed-GO.txt|Decreased expression at t60 300mm NaCl]]
**[[Media:300N_T240_MAPPFINDER-Criterion_Increased-GO.txt|Increased expression t240 300mmNaCl]]
**[[Media:300N_T240_MAPPFINDER-Criterion_Decreased-GO.txt|decreased expression at t240 300mmNaCl]]

:Sample MAPP file of a relevant biological pathway for your species:
* .7 M files: [[Media:regulation of transcription, DNA-dependent.mapp]]
* .3 M files: [[Media:Structural_constituent_of_Ribosome.mapp| Structural constituents of ribosome 300mm NaCl]]

:Group Report: [[Media: BiologicalDatabasesFinalReport.pdf | S. meliloti Final Report]]
:PowerPoint presentation: [[Media:Sinorhizobium_Meliloti_(Strain_1021)_FINAL_PRESENTATION.pdf | Sinorhizobium meliloti (Strain_1021) FINAL_PRESENTATION (PDF)]]
*Note: Wiki page is not accepting our converted .ppt format. The PDF version should suffice.

=Individual=

*[[Miles Malefyt deliverables]]
*[https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/Individual_Assessment_and_Reflection Mitchell Petredis deliverables]
*[[Stephen Louie Deliverables]]
*[[Laurmagee: Individual Assessment]]

Vkuehn Individual Assessment and Reflection

2013-12-11T10:51:07Z

Vkuehn: /* Final Week */ Finished summarizing last minute changes after the presentation

== Statement of Work ==

* Describe exactly what you did on the project.
* Provide references or links to artifacts of your work, such as:
** Wiki pages
** Other files or documents
** Code or scripts

=== [[Vkuehn Week 12]]===
* Compiled all of the raw data so that it would be ready to be analysed statistically and normalized
** Chip raw data files compared to SDRF and organized
** Named each chip in a way that gave more information to the project (Chip number from sdrf and species)
** Created the compiled raw data for L. major
*Updated the team page and made wiki improvements
* As the team leader created [[The Plan]] to keep our progress organized and documented
** Created an outline of the project goals and interim deadlines
**Wrote the updates that had to do with GenMAPP users for the team page
===[[Vkuehn Week 13]]===
*Reread the paper and took notes on the specific meaning behind the microarray data and on other relevant information
*Scaled and centered microarray data
*Performed statistical analysis
* Edited the wiki and made formatting improvements on the templates so that the page would be easier to update for everyone
**Wrote the updates that had to do with GenMAPP users for the week
===[[Vkuehn Week 15]]===
*Did sanity check on the statistical results from L. major
*Imported the microarray data to GenMAPP
*Assigned colors to the criteria for GenMAPP
*Ran GenMAPP and GO terms were found
** Found the ones with links to families and looked up the GO terms so that the mapp could be created
* Created the L. major week 15 Status Report Page
**Wrote the updates that had to do with GenMAPP users
===Final Week===
*Wrote my part of the paper, which corresponds to the journal club for the article, the microarray data compilation, and the statistical analysis, and made the PowerPoint outlining the project, with everyone adding their parts [[Media:Leishmania PowerPoint 12122013.pdf]]
*Found and analyzed the results from our GenMAPP and the results in the experiment and compared them
*Looked up the GO terms and the genes that were found to have changed significantly using MAPPFinder and UniProt to organize the genes.
*:Figured out a way to categorize them for the MAPP [[File:Arromatic Compound Catabolic Process Comparative Pathway Map.mapp]]
**Highlighted similarities
* Created a presentation on Google Drive so the group could add their parts of the project to the presentation
* Reviewed all of the pages and noted all of the requirements that need to be submitted for the project so nothing slips through, and communicated with the group on last minute deadlines
* Redid the statistical analysis of the L. major p-values and fold changes by filtering the excel file down to the relevant information after the presentation [[File:LeishmaniaCompiledStatAnalysis(C).txt ]]
* Wrote the conclusion comparing our results to the results of the article and made general conclusive remarks on the group write up, and made final touches throughout the paper. [[File:LeishmaniaFinalPaper.pdf]]

=== Assessment of Project ===

* Give an objective assessment of the success of your project workflow and teamwork.
*: I think our team worked very well together. We were always good about communicating where the different members stood and if there were problems everyone worked together to make sure that they would get resolved so that the team could move to the next step. We were very good about planning all of the goals for the weeks and would stay after to make sure everyone was updated for the goals for the next meeting.
* What worked and what didn't work?
*: We all did a good job dividing the workload, and completing all of the requirements in a timely manner worked well. The setup of the team page for the initial journal club presentation was hectic and disorganized, but we were still getting familiar with the project.
* What would you do differently if you could do it all over again?
*: I would make sure to complete even the last step before finals week, because this step did not have a lot of instruction and took a lot of time to understand.
* Evaluate the Gene Database Project and Group Report in the following areas:
*# Content: What is the quality of the work?
*#: I am personally very impressed with the quality of our work. With such a large project, understanding all of the details of each step can seem overwhelming, but because each team member took a part, we were each able to understand all of the details and see the big picture.
*# Organization: Comment on the organization of the project and of your group's wiki pages.
*#: The organization of the group's wiki could have been improved in my opinion. I think we were in a rush to set it all up in the beginning, so the layout and organization of the information lacked structure. I think that the project itself was organized, but this did not translate very well to the wiki page because of the ineffective formatting at the beginning.
*# Completeness: Did your team achieve all of the project objectives? Why or why not?
*#: Yes, our team achieved all of the project objectives. We worked to fix as many of the errors as we could up to the last minute. There are some genes that still did not make it into the database, but the coders worked hard and successfully reduced this number a lot. Other than that, everything was completed.

=== Reflection on the Process ===

* What did you learn?
** With your head (biological or computer science principles)
**: I learned the expanse of biological databases and why they are integral to biology today. I also learned the possibilities that these databases provide.
** With your heart (personal qualities and teamwork qualities that make things work or not work)?
**: I learned how to manage a long term project so that it does not become unmanageable at the end.
** With your hands (technical skills)?
**: I became more familiar with using computers and learned how to use different ways of manipulating data in different programs.
* What lesson will you take away from this project that you will still use a year from now?
*: The knowledge of how to use databases and how to navigate them to find information.

{{Template: Vkuehn}}

Final Project Deliverables

2013-12-11T01:23:27Z

Kevinmcgee: Added deliverables

:GenMAPP Gene Database for assigned species (.gdb): [[Media:LeishmaniaGDB Lena Gabe 20131205.zip]]
:ReadMe file to accompany the Gene Database (.pdf): [[Media:ReadMe Leishmania 20131212.pdf]]
:Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file): [[Media:Leishmania Gene Database Testing Report.pdf]]
:Processed and analyzed DNA microarray dataset (.xls): [[File:L.majorStats.xls]],[[Media:L.infantumStats_B.xls]]
:GenMAPP Expression Dataset file (.gex): [[Media:LeishmaniaCompiledStatAnalysisLMJFiltered(B).gex]]
:Filtered MAPPFinder Results (.xls)[[File:LMajorFilteredIncreasedGOTerms.xls]] [[File:LMajorDecreasedFilteredGoTerms.xls]]
:Sample MAPP file of a relevant biological pathway for your species (.mapp)[[File: Arromatic_Compound_Catabolic_Process_Comparative_Pathway_Map.mapp]]
:Group Report describing the creation of the Gene Database and the biological analysis of the data (.doc or .pdf)[[File:LeishmaniaFinalPaper.pdf]]
:PowerPoint presentation (.ppt, given on Thursday, December 12): [[Media:Leishmania PowerPoint 12122013.pdf]]

Individual Assessment and Reflection

2013-12-10T23:44:20Z

Mpetredi: added link to prompt

{{User Page Link}}

Reference page for this assignment: [[Gene Database Project Report Guidelines]]

==Statement of Work==

*As coder of [[Team Name]], I was responsible for creating and tweaking versions of gmbuilder under the supervision of Dr. Dionisio to help create a proper analysis tool for our species ''Sinorhizobium meliloti'' (Strain 1021). Additionally, I imported the unique UniProt XML, GO OBO-XML, and GOA files into the PostgreSQL database for our species using my custom versions of gmbuilder and exported the .gdb files necessary for the quality assurance and GenMAPP users to perform their respective roles.
*In total, 3 versions of gmbuilder were created and produced the following .gdb files (in order of oldest to newest):
**[[Media: gmbuilder-2.0b71.zip | gmbuilder-2.0b71.zip]]
***[[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb | Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb]]
**[[Media:GenMAPP_Builder_2.0b72 S. meliloti.zip|GenMAPP_Builder_2.0b72 S. meliloti.zip]]
***[[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb|Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb]]
**[[Media: SmelilotiGenMAPP_Builder_2.0b73.zip]]
***[[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]]

*My lab progression is listed below:
*[[mpetredi Week 12]]
*[[mpetredi Week 13]]
*[[mpetredi Week 14]]

NOTE: "Important Files 3" does not have a .gdb file because of accidentally importing duplicate data into PostgreSQL; I decided it would be a hindrance to use that .gdb file for analysis and therefore discarded it. A new database and .gdb file were created under "Important Files 4" to resolve this issue.

==Assessment of Project==

*Our project was somewhat successful, but it is still a work in progress. I believe my team displayed dedication to the quality of the project, and while we were not able to create an error-free database this semester, our work ethic contributed significantly to moving the gene database closer to completion. If my team's schedule outside of this class weren't as complex, I believe we could have delivered a finished product.
*If I could do it all over again, I'd start the project earlier. Additionally, having a laptop with all the lab software would have made it more convenient to organize the project and work on it away from the lab.
*Gene Database Project and Group Report Evaluation
*#Content: We described our methods with as much detail as we could and provided explanations for our results and discussion.
*#Organization: We followed the prompt given to us, and reading the report over again it seems like it's structured well.
*#Completeness: Considering the challenges we had with this project, I'd say our report feels as complete as it can be.

==Reflection on the Process==

*I learned the following
**With my head: Biology and Computer Science can drastically benefit from each other. As technology improves, scientists can discover new things that previously could not be comprehended or understood.
**With my heart: Close communication and flexibility in scheduling with team members is critical to achieving project goals.
**With my hands: I gained a better understanding of how database management and organization works, and even learned some coding.
*A year from now, I'll remember the various SQL queries and how to utilize filtering mechanisms in large datasets.

Teamname Week 15 Status Report

2013-12-05T19:02:01Z

Mmalefyt: /* Miles Malefyt */

=='''[[Team Name]]'''==

=='''[[user:mmalefyt|Miles Malefyt]]'''==
What worked?
*This week I finally began to see the fruits of my labor over the past couple of weeks. I successfully uploaded some data to GenMAPP; even though some of it didn't work, it turns out that a lot of the gene IDs went through. The test file I used was the t15 log fold change. I also went through and redid some of the original title headings in the xls file before I converted it into a text file and compiled it in GenMAPP.
What didn't work?
*There was an issue with a lot of the gene IDs: GenMAPP didn't recognize their format. Also, before I corrected it, the names of the columns weren't importing into GenMAPP.
What will I do next to fix what didn't work?
*First, we need to isolate the problem. There seem to be 3 sets of IDs for each replicate and 3 replicates. I suspect that it has something to do with the labeling they gave the genes to distinguish the two megaplasmids and the chromosome. If I am correct, then I will have to talk to the coder to see what we can do about it. If it's another error, then we will have to get to the bottom of that.

=='''[[user:laurmagee|Lauren Magee]]''': Reflection Questions==
#What worked?
#*A lot has been accomplished since the last status report, and I now have a file that is ready for the GenMAPP protocol. I have been going through an intense editing process of this file for some time now with Dr. Dahlquist and my fellow group members, so that the file I currently have will hopefully produce error-free results during GenMAPP analysis.
#What didn't work?
#*In my original data file for GenMAPP, I had accidentally copied over an entire column but had labeled it as something else. This made the rest of my calculations, which had been based on this data, invalid, so I had to carefully correct my mistakes. If this error hadn't been pointed out by Dr. Dahlquist during the editing process, the results for my entire GenMAPP analysis would have been incorrect. That is why professor and peer review is so important: reviewers are able to catch mistakes you may have glossed over in your own editing.
#What will I do next to fix what didn't work?
#*I have already fixed the incorrect column in my original data set, so I now have an error-free document to run in GenMAPP. New errors may appear while I am following the GenMAPP protocol, but for now, everything appears correct.

Leishmania major Week 15 Status Report

2013-12-05T18:56:40Z

Lena: /* Reflection */

== Team Journal Assignment Week 15 ==
===Status report as to progress on each milestone that you have set for the week:===
#GenMAPP users:
#*Did sanity check for the p-values of both L. major and L. infantum
#*Created GenMAPP criteria and colorsets
#*Ran MAPPFinder and are waiting for results
#*:For more details on these steps refer to [[Vkuehn Week 15]] and [[Kevinmcgee Week 15]]
#QA:
#Coder: Initiated Gene Database Testing Report, Edited code for species customization

=== Reflection ===
Each team member should reflect on the team's progress:

[[User:Vkuehn|Viktoria Kuehn]]
# What worked: We successfully looked at the results of our P-values and created the GenMAPP criteria and colorsets.
# What didn't work: We are waiting for MAPPFinder to give us results.
# What will I do next to fix what didn't work: If it does not work within 2 hours we will have to try it again and make sure no errors were made.
[[User:Vkuehn|Vkuehn]] ([[User talk:Vkuehn|talk]]) 11:03, 5 December 2013 (PST)

'''[[User:Kevinmcgee|Kevin McGee]]'''
#We were able to troubleshoot any problems with uploading our data into GenMAPP by going through everything step by step
#We were held up at times by errors that required us to wait for long periods of time. For instance, we needed to export a new gene database, but we could not because PostgreSQL was already being used on the computer we were using. Therefore, we had to wait hours for that to finish so we could export our database. We should plan ahead better so we don't have to wait like that.
#I don't know what we could do to fix that other than stay on our computers we had been working on the whole time
[[User:Kevinmcgee|Kevinmcgee]] ([[User talk:Kevinmcgee|talk]]) 11:08, 5 December 2013 (PST)

'''[[User:Gleis|Gabriel Leis]]'''
#Code for species customization was edited twice to ensure ease of use of microarray data in GenMAPP. The Gene Database Testing Report was initiated and nearly completed.
#Errors in the code as well as errors in the formatting of the microarray data held up the group project.
#Test new species customization code, finish gene database testing report
[[User:Gleis|Gleis]] ([[User talk:Gleis|talk]]) 22:16, 6 December 2013 (PST)

'''[[User:Lena | Lena Hunt]]'''
#We were able to run TallyEngine and postgres to figure out which IDs were in our database and how to capture them.
#Our latest database customization didn't take so we had to rerun that.
#We just need to re-export, then we will be able to move on to writing our report.
[[User:Lena|Lena]] ([[User talk:Lena|talk]]) 15:28, 7 December 2013 (PST)

{{Template:Leishmania Major Navigation}}
[[Category:Assignment]]
[[Category: Group Projects]]
[[Category: Journal Entry]]

Kmeilak Week 15

2013-12-05T18:29:23Z

Kmeilak: Created page with "==Electronic Lab Journal== == Map Onto Biological Pathways (GenMAPP & MAPPFinder) == '''Fall 2013:''' Beginning point for class on Tuesday, October 15 as part of the [https..."

==Electronic Lab Journal==

== Map Onto Biological Pathways (GenMAPP & MAPPFinder) ==

'''Fall 2013:''' Beginning point for class on Tuesday, October 15 as part of the [https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/Week_8 Week 8] journal assignment.

'''Fall 2010:''' Beginning point for class on Tuesday, October 26 and for the [[BIOL367/F10#Week_9_In-class_Exercise_and_Journal_Assignment | Week 9]] journal assignment.

Each time you launch GenMAPP, you need to make sure that the correct Gene Database (.gdb) is loaded.
* Look in the lower left-hand corner of the window to see which Gene Database has been selected.
* If you need to change the Gene Database, select Data > Choose Gene Database. Navigate to the directory C:\GenMAPP 2 Data\Gene Databases and choose the correct one for your species.
* For the exercise today, you will need to download the appropriate ''Vibrio cholerae'' Gene Database.
** Half of the class will use the Vc-Std_External_20090622.gdb Gene Database that was created by the Fall 2008 Biological Databases class.
*** To download this Gene Database, [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020090622/Vc-Std_External_20090622.zip/download follow '''''this link''''' to the XMLPipeDB SourceForge Download page].
** Half of the class will use a new Vc-Std_External_20101022.gdb Gene Database that was created by Drs. Dahlquist and Dionisio this weekend.
*** To download this Gene Database, [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download follow '''''this link''''' to the XMLPipeDB SourceForge Download page].
** The members of a pair should each choose a different gene database.
* Click on the link for the Gene Database to which you have been assigned, download the file, and save it into the folder C:\GenMAPP 2 Data\Gene Databases, and extract it.

=== GenMAPP Expression Dataset Manager Procedure ===

* Launch the GenMAPP Program. Check to make sure the correct Gene Database is loaded.
** Look in the lower, left-hand corner of the main GenMAPP Drafting Board window to see the name of the Gene Database that is loaded. If this is not the correct Gene Database or it says "No Gene Database", then go to the Data > Choose Gene Database menu item to select the Gene Database you need to perform the analysis.
** '''Remember, you and your buddy are going to use ''different versions'' of the ''Vibrio cholerae'' Gene Database for this exercise.'''
* Select the Data menu from the main Drafting Board window and choose Expression Dataset Manager from the drop-down list. The Expression Dataset Manager window will open.
* Select New Dataset from the Expression Datasets menu. Select the tab-delimited text file that you formatted for GenMAPP (.txt) in the procedure above from the file dialog box that appears.
** You may need to download your .txt file from the wiki onto your Desktop if you have not already done so.
* The Data Type Specification window will appear. GenMAPP is expecting that you are providing numerical data. If any of your columns has text (character) data, you would check the box next to the field (column) name.
** ''The Vibrio data we have been working with does not have any text (character) data in it.''
* Allow the Expression Dataset Manager to convert your data.
** This may take a few minutes depending on the size of the dataset and the computer’s memory and processor speed. When the process is complete, the converted dataset will be active in the Expression Dataset Manager window and the file will be saved in the same folder the raw data file was in, named the same except with a .gex extension; for example, MyExperiment.gex.
** A message may appear saying that the Expression Dataset Manager could not convert one or more lines of data. Lines that generate an error during the conversion of a raw data file are not added to the Expression Dataset. Instead, an exception file is created. The exception file is given the same name as your raw data file with .EX before the extension (e.g., MyExperiment.EX.txt). The exception file will contain all of your raw data, with the addition of a column named ~Error~. This column contains either error messages or, if the program finds no errors, a single space character.
*** '''Record the number of errors. For your journal assignment, open the .EX.txt file and use the Data > Filter > Autofilter function to determine what the errors were for the rows that were not converted. Record this information in your individual journal page.'''
*** '''It is likely that you will have a different number of errors than your buddy who is using a different version of the ''Vibrio cholerae'' Gene Database. Which of you has more errors? Why do you think that is? Record your answers in your journal page.'''
*** '''Upload your exceptions file: <code>EX.txt</code> to your wiki page.'''
* Customize the new Expression Dataset by creating new Color Sets which contain the instructions to GenMAPP for displaying data on MAPPs.
** Color Sets contain the instructions to GenMAPP for displaying data from an Expression Dataset on MAPPs. Create a Color Set by filling in the following different fields in the Color Set area of the Expression Dataset Manager: a name for the Color Set, the gene value, and the criteria that determine how a gene object is colored on the MAPP. Enter a name in the Color Set Name field that is 20 characters or fewer.
** The Gene Value is the data displayed next to the gene box on a MAPP. Select the column of data to be used as the Gene Value from the drop down list or select [none]. We will use "Avg_LogFC_all" for the Vibrio dataset you just created.
** Activate the Criteria Builder by clicking the New button.
** Enter a name for the criterion in the Label in Legend field.
** Choose a color for the criterion by left-clicking on the Color box. Choose a color from the Color window that appears and click OK.
** State the criterion for color-coding a gene in the Criterion field.
*** A criterion is stated with relationships such as "this column greater than this value" or "that column less than or equal to that value". Individual relationships can be combined using as many ANDs and ORs as needed. A typical relationship is
[ColumnName] RelationalOperator Value
::with the column name always enclosed in brackets and character values enclosed in single quotes. For example:
[Fold Change] >= 2
[p value] < 0.05
[Quality] = 'high'
::This is the equivalent to queries that you performed on the command line when working with the PostgreSQL movie database. GenMAPP is using a graphical user interface (GUI) to help the user format the queries correctly. The easiest and safest way to create criteria is by choosing items from the Columns and Ops (operators) lists shown in the Criteria Builder. The Columns list contains all of the column headings from your Expression Dataset. To choose a column from the list, click on the column heading. It will appear at the location of the cursor in the Criterion box. The Criteria Builder surrounds the column names with brackets.

::The Ops (operators) list contains the relational operators that may be used in the criteria: equals ( = ) greater than ( > ), less than ( < ), greater than or equal to ( >= ), less than or equal to ( <= ), is not equal to ( <> ). To choose an operator from the list, click on the symbol. It will appear at the location of the insertion bar (cursor) in the Criterion box. The Criteria Builder automatically surrounds the operators with spaces.
::The Ops list also contains the conjunctions AND and OR, which may be used to make compound criteria. For example:
[Fold Change] > 1.2 AND [p value] <= 0.05
::Parentheses control the order of evaluation. Anything in parentheses is evaluated first. Parentheses may be nested. For example:
[Control Average] = 100 AND ([Exp1 Average] > 100 OR [Exp2 Average] > 100)
::Column names may be used anywhere a value can, for example:
[Control Average] < [Experiment Average]
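
For readers who find code easier to follow than the Criteria Builder, the example criteria above can be sketched as plain Boolean tests in Python. This is only an illustration of the logic, not how GenMAPP evaluates criteria internally; the <code>row</code> dictionary and its values are hypothetical, and only the column names come from the examples above.

```python
# Hypothetical row from an Expression Dataset (values are made up).
row = {
    "Fold Change": 2.5,
    "p value": 0.01,
    "Quality": "high",
    "Control Average": 100,
    "Exp1 Average": 90,
    "Exp2 Average": 120,
}

# [Fold Change] > 1.2 AND [p value] <= 0.05
compound = row["Fold Change"] > 1.2 and row["p value"] <= 0.05

# [Control Average] = 100 AND ([Exp1 Average] > 100 OR [Exp2 Average] > 100)
# The parentheses make the OR be evaluated before the AND.
nested = row["Control Average"] == 100 and (
    row["Exp1 Average"] > 100 or row["Exp2 Average"] > 100
)

# [Quality] = 'high'  (character values are quoted)
text_match = row["Quality"] == "high"

print(compound, nested, text_match)
```

Note that the criterion syntax uses a single <code>=</code> for equality, while Python uses <code>==</code>.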

* After completing a new criterion, add the criterion entry (label, criterion, and color) to the Criteria List by clicking the Add button.
** For the Vibrio dataset, you will create two criteria. "Increased" will be [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05 and "Decreased" will be [Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05.
** You may continue to add criteria to the Color Set by using the previous steps.
*** The buttons to the right of the list represent actions that can be performed on individual criteria. To modify a criterion label, color, or the criterion itself, first select the criterion in the list by left-clicking on it, and then click the Edit button. This puts the selected criterion into the Criteria Builder to be modified. Click the Save button to save changes to the modified criterion; click the Add button to add it to the list as a separate criterion. To remove a criterion from the list, left-click on the criterion to select it, and then click on the Delete button. The order of Criteria in the list has significance to GenMAPP. When applying an Expression Dataset and Color Set to a MAPP, GenMAPP examines the expression data for a particular gene object and applies the color for the first criterion in the list that is true. Therefore, it is imperative that when criteria overlap the user put the most important or least inclusive criteria in the list first. To change the order of the criteria in the list, left-click on the criterion to select it and then click the Move Up or Move Down buttons. No criteria met and Not found are always the last two positions in the list.
* Save the entire Expression Dataset by selecting Save from the Expression Dataset menu. Changes made to a Color Set are not saved until you do this.
* Exit the Expression Dataset Manager to view the Color Sets on a MAPP. Choose Exit from the Expression Dataset menu or click the close box in the upper right hand corner of the window.
* '''Upload your .gex file to your journal entry page for later retrieval.'''
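
The rule described above, that GenMAPP applies the color for the first criterion in the list that is true, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the thresholds are the "Increased" and "Decreased" criteria for the Vibrio dataset, while the <code>classify</code> function, the <code>CRITERIA</code> list, and the sample rows are hypothetical names and data, not part of GenMAPP.

```python
# Criteria in list order; each entry is (label, test).
# Thresholds are the "Increased"/"Decreased" criteria for the Vibrio dataset.
CRITERIA = [
    ("Increased", lambda r: r["Avg_LogFC_all"] > 0.25 and r["Pvalue"] < 0.05),
    ("Decreased", lambda r: r["Avg_LogFC_all"] < -0.25 and r["Pvalue"] < 0.05),
]

def classify(row, criteria=CRITERIA):
    """Return the label of the first criterion in the list that is true."""
    for label, test in criteria:
        if test(row):
            return label
    return "No criteria met"

# Hypothetical rows, for illustration only.
rows = [
    {"ID": "gene_a", "Avg_LogFC_all": 0.80, "Pvalue": 0.01},
    {"ID": "gene_b", "Avg_LogFC_all": -0.60, "Pvalue": 0.03},
    {"ID": "gene_c", "Avg_LogFC_all": 0.90, "Pvalue": 0.20},
]
labels = {r["ID"]: classify(r) for r in rows}
print(labels)
```

Because the loop stops at the first match, a broad criterion placed ahead of a narrower one would hide the narrower one, which is why the least inclusive criteria belong first.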

=== MAPPFinder Procedure ===

'''''Note: You and your buddy will both do the same criterion, either "Increased" or "Decreased", but your group does not need to do both "Increased" and "Decreased". Sign up for the criterion you want on the group list ([[BIOL367/F10#Groups_3 | Fall 2010]] or [https://xmlpipedb.cs.lmu.edu/biodb/fall2013/index.php/Week_8#Groups Fall 2013]) so that we can make sure that as a class we are covering both criteria.'''''

* Launch the MAPPFinder program (or from within GenMAPP, select Tools > MAPPFinder).
* Make sure that the Gene Database for the correct species is loaded. The name of the Gene Database appears at the bottom of the window. If this is not the right one, go to File > Choose Gene Database and choose the correct one. (The Gene Databases are stored in the folder C:\GenMAPP 2 Data\Gene Databases\.)
* Click on the button "Calculate New Results".
* Click on "Find File" and choose your Expression Dataset file, for example, "MyDataset.gex", and click OK.
** MAPPFinder may have found it for you already if you already had it open in GenMAPP, in which case, you just need to click OK.
* Choose the Color Set and Criteria with which to filter the data. Click on either the "Increased" or the "Decreased" criterion in the right-hand box, depending on which one your group is doing. (You could select both by holding down the Control key while clicking.)
* Check the boxes next to "Gene Ontology" and "p value".
* Click the "Browse" button and create a meaningful filename for your results.
* Click "Run MAPPFinder". The analysis will take several minutes. It may look like the computer is stalled; be patient, it will eventually start running.
* When the results have been calculated, a Gene Ontology browser will open showing your results. All of the Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 will be highlighted yellow. A term with a p value less than 0.05 is considered a "significant" result. Browse through the tree to see your results.
* To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
** '''List the top 10 Gene Ontology terms in your individual journal entry.'''
** '''Compare your list with your buddy who used a different version of the Gene Database. Are your terms the same or different? Why do you think that is? Record your answer in your individual journal entry.'''
* One of the things you can do in MAPPFinder is to find the Gene Ontology term(s) with which a particular gene is associated. First, in the main MAPPFinder Browser window, click on the button "Collapse the Tree". Then, you can search for the genes that were mentioned by Merrell et al. (2002), VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583. Type the identifier for one of these genes into the MAPPFinder browser gene ID search field. Choose "OrderedLocusNames" from the drop-down menu to the right of the search field. Click on the GeneID Search button. The GO term(s) that are associated with that gene will be highlighted in blue. '''List the GO terms associated with each of those genes in your individual journal. (Note: they might not all be found.) Are they the same as your buddy who is using a different Gene Database? Why or why not?'''
* Click on one of the GO terms that are associated with one of the genes you looked up in the previous step. A MAPP will open listing all of the genes (as boxes) associated with that GO term. The genes named within the map are based on the UniProt identification system. To match the gene of interest to its identification go to the [http://www.uniprot.org/ UniProt site] and type in your gene ID into the search bar. Moreover, the genes on the MAPP will be color-coded with the gene expression data from the microarray experiment. '''List in your journal entry the name of the GO term you clicked on and whether the expression of the gene you were looking for changed significantly in the experiment.'''
** Double-click on the gene box. This will open an Internet Explorer window called the "Backpage" for this gene. This page has links to pages for this gene in the public databases. '''Click on the links to find out the function of this gene and record your answer in your individual journal page.'''
** The MAPP that has just been created is stored in the directory, C:\GenMAPP 2 Data\MAPPs\VC GO. '''Upload this file and link to it in your journal.'''
* In Windows, make a copy of your results (XXX-CriterionX-GO.txt) file.
** "XXX" refers to the name you gave to your results file.
** "CriterionX" refers to either "Criterion0" or "Criterion1". Since computers start counting at zero, "Criterion0" is the first criterion in the list you clicked on ("Increased" if you followed the directions) and "Criterion1" is the second criterion in the list you clicked on ("Decreased" if you followed the directions).
** '''Upload your results file to your journal page.'''
* Launch Microsoft Excel. Open the copies of the .txt files in Excel (you will need to "Show all files" and click "Finish" to the wizard that will open your file). This will show you the same data that you saw in the MAPPFinder Browser, but in tabular form.
* Look at the top of the spreadsheet. There are rows of information that give you the background information on how MAPPFinder made the calculations. '''Compare this information with your buddy who used a different version of the Vibrio Gene Database. Which numbers are different? Why are they different? Record this information in your individual journal entry.'''
* You will filter this list to show the top GO terms represented in your data for both the "Increased" and "Decreased" criteria. You will need to filter your list down to about 20 terms. Click on a cell in the row of headers for the data. Then go to the Data menu and click "Filter > Autofilter". Drop-down arrows will appear in the row of headers. You can now choose to filter the data. Click on the drop-down arrow for the column you wish to filter and choose "(Custom…)". A window will open giving you choices on how you want to filter. You must set these two filters:
Z Score (in column N) greater than 2
PermuteP (in column O) less than 0.05

:You will use these two filters depending on the number of terms you have:

Number Changed (in column I) greater than or equal to 4 or 5 AND less than 100
Percent Changed (in column L) greater than or equal to 25-50%

* Save your changes to an Excel spreadsheet. Select File > Save As and select Excel workbook (.xls) from the drop-down menu. Your filter settings won’t be saved in a .txt file.
* '''Are any of your filtered GO terms closely related to one another, meaning are they a direct child or parent to another term in the list? You can judge this by comparing your spreadsheet with the MAPPFinder browser. Highlight the terms that fit this relationship with the same color in your Excel spreadsheet. Upload your .xls file to your journal page.'''
* '''Interpret your results. Look up the definitions for any GO terms that are unfamiliar to you. The "official" definitions for GO terms can be found at [http://www.geneontology.org http://www.geneontology.org]. You can use one of the online biological dictionaries as a supplement, if needed. Write a paragraph relating the results of this GO analysis to the experiment performed (comparing laboratory-grown and patient-derived ''Vibrio cholerae''). You need to give a biological interpretation: what does each of the GO terms in your filtered list have to do with the pathogenicity of the bacterium? You may consult with your partner on this, but your explanation on your individual journal page needs to be in your own words. This is where the real "brain power" comes in with interpreting DNA microarray data. Even experienced scientists struggle with this part. Use your creativity as a scientist to stretch your brain in this question.'''
* '''There is one other file you need to save to your journal page. It has a .gmf extension and should be in the same folder as the .gex file that you created with the GenMAPP Expression Dataset Manager. You will need this file to re-open your results in MAPPFinder.'''
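
The Excel filters described in the procedure above can also be written out as a small Python sketch, which may help when checking which terms should survive the filter. The column names (Z Score, PermuteP, Number Changed, Percent Changed) are the ones named in the procedure; the <code>keep_term</code> function, its default thresholds (4 and 25, the low ends of the suggested ranges), the <code>name</code> key, and the sample rows are assumptions made for illustration.

```python
def keep_term(row, min_changed=4, max_changed=100, min_percent=25.0):
    """Apply the two required filters, then the two size-dependent ones."""
    required = float(row["Z Score"]) > 2 and float(row["PermuteP"]) < 0.05
    changed = float(row["Number Changed"])
    optional = (
        min_changed <= changed < max_changed
        and float(row["Percent Changed"]) >= min_percent
    )
    return required and optional

# Hypothetical GO-term rows; real values come from the XXX-CriterionX-GO.txt file.
terms = [
    {"name": "term A", "Z Score": "3.1", "PermuteP": "0.01",
     "Number Changed": "6", "Percent Changed": "40"},
    {"name": "term B", "Z Score": "1.5", "PermuteP": "0.01",
     "Number Changed": "6", "Percent Changed": "40"},
    {"name": "term C", "Z Score": "2.8", "PermuteP": "0.02",
     "Number Changed": "2", "Percent Changed": "60"},
]
kept = [t["name"] for t in terms if keep_term(t)]
print(kept)
```

Raising <code>min_changed</code> or <code>min_percent</code> corresponds to tightening the optional filters when more than about 20 terms remain.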

==== List of Files to Upload ====

It may be easier to zip all of these files together and then upload them as a single zipped file, rather than zipping and uploading individually (for filetypes not allowed by OpenWetware).

# Your exceptions file when you imported your data into GenMAPP: <code>.EX.txt</code>
# Your Expression Dataset file: <code>.gex</code>
# Your GO results file: <code>XXX-CriterionX-GO.txt</code>
# Your GO results saved as an Excel spreadsheet with filters applied: <code>.xls</code>
# The MAPP you looked at: <code>.mapp</code>
# The MAPPFinder GO mappings file: <code>.gmf</code>
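
For the exceptions file (item 1), the error tally that the procedure asks for can be done with Autofilter in Excel, or with a short script such as the sketch below. It assumes the file is tab-delimited and has a column named <code>~Error~</code> that holds either a message or a single space, as described in the procedure; the <code>tally_errors</code> function name and the sample contents are hypothetical.

```python
import csv
import io
from collections import Counter

def tally_errors(handle):
    """Count rows per message in the ~Error~ column, skipping rows without errors."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        message = (row.get("~Error~") or "").strip()
        if message:  # a single space means no error was found
            counts[message] += 1
    return counts

# Hypothetical file contents; the IDs and error text are made up for illustration.
sample = io.StringIO(
    "ID\tAvg_LogFC_all\t~Error~\n"
    "gene_a\t0.5\t \n"
    "gene_b\t-0.3\tExample error message\n"
    "gene_c\t1.1\tExample error message\n"
)
counts = tally_errors(sample)
print(sum(counts.values()), dict(counts))
```

To run it on a real file, pass an open file handle instead of <code>sample</code>, for example one opened on MyExperiment.EX.txt.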

[[Category:BIOL367/F10]]

Team H(oo)KD Week 15 Status Report

2013-12-05T18:17:14Z

Dwilliams: /* Reflection */ Formatting

{{Team H(oo)KD}}

'''Refer to the calendar on the team home page to see the milestones for this week.'''

==Coder Status Report==

The following was accomplished during Weeks 14-15:
*Determined why TallyEngine counted 917 genes in the gene database for ''C. trachomatis'' while the count was 919 when viewing the gene database in Access.
:*Apparently, one ordered locus ID (CTA_0406/CTA_0407/CTA_0408) is actually a combination of three ordered locus IDs, each of which was predicted to correspond to a different gene before it was found that all three IDs actually correspond to the same gene. TallyEngine does not separate the ordered locus IDs while Access does.
:*In consulting with Dr. Dahlquist and Dr. Dionisio, we decided to leave this as is and describe the discrepancy in the final testing report, which is recorded in my [[Ksherbina Project Notebook|project journal]] under November 27-28, 2013.
*Customized the Tally Engine for ''C. trachomatis'' and committed the changes to SourceForge.
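
The size of the discrepancy can be illustrated with a short Python sketch. It assumes the combined ID is split on the slash; this is not how TallyEngine or Access are implemented, and the ID list is hypothetical apart from the combined ID quoted above.

```python
# Hypothetical ID list; only the combined ID is taken from the report above.
ids = ["CTA_0001", "CTA_0002", "CTA_0406/CTA_0407/CTA_0408"]

# Counting each stored value once (the combined ID counts as one).
count_unsplit = len(ids)

# Counting after splitting combined values on "/".
count_split = sum(len(value.split("/")) for value in ids)

print(count_unsplit, count_split, count_split - count_unsplit)
```

One combined ID made of three parts adds two to the split count, which matches the difference between 917 and 919.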

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 22:41, 5 December 2013 (PST)

===Reflection===

#The Quality Assurance person (Hilda) and I were able to finish working on the gene database and figure out how to separate the gene IDs from the Affymetrix IDs appended to the gene IDs in the microarray data. This allowed us to devote the rest of our time to working with Dillon to perform the GenMAPP and MAPPFinder analysis. Luckily for us, no exceptions file was generated when running GenMAPP with the microarray data with the modified IDs and the gene database for ''C. trachomatis''.
#When we originally set the milestones, we had planned to start working on both the paper and the presentation this week. Unfortunately, we had not advanced far enough in finalizing the database and performing GenMAPP analysis at the beginning of the week to be able to do this.
#With next week being finals week, we will have several meetings leading up to the final presentation in order to work on the paper, the presentation, and deliverables to make sure we meet the final deadline.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 23:02, 5 December 2013 (PST)

==Quality Assurance Status Report==
Katrina, the Coder, and I were able to work on separating the gene IDs so that statistical analysis could be performed by Dillon, the GenMAPP User, and the results could be imported into GenMAPP. The process of separating the IDs was tricky, but with Katrina's advice and some Google searches I was able to manage it, although I must admit it took Katrina 5 minutes to separate the IDs whereas it took me close to an hour to figure out how to go about this process. In any case, once this was done, Dillon (GenMAPP User) was able to generate the statistical analysis and the three of us proceeded to import the data into GenMAPP. Luckily, we did not receive any errors during the import, so no exceptions file was generated. We are now able to proceed with the project, so we will meet and start on our paper by Saturday. I will also complete the relational database schema for the gene database by Saturday. Katrina and I looked at the schema today and asked Dr. Dionisio questions about the schema to make sure we understood its format.

===Reflection===
#What worked?
#:Our schedules were able to match much better this week, so we were all able to meet without any problems.
#What didn't work?
#:We had hoped to start on the paper this week, but unfortunately we faced difficulties with the microarray data and gmbuilder, so we were not able to get far enough in the process to run GenMAPP earlier in the week.
#What will I do next to fix what didn't work?
#:We have resolved the problems we faced, so we will make sure to begin the paper as soon as possible (by Saturday) and if there are any questions we will contact Dr. Dionisio and Dr. Dahlquist immediately.

[[User:HDelgadi|HDelgadi]] ([[User talk:HDelgadi|talk]]) 00:05, 6 December 2013 (PST)

==GenMAPP User Status Report==
This week, after fixing the initial issues that we had with running the statistics on the ''C. trachomatis'' raw data, I was able to create the final Excel spreadsheet that was to be used in GenMAPP. With the help of the coder and QA in formatting the gene IDs for the "for GenMAPP" sheet in the Excel master spreadsheet, I was able to create a sheet with all of the information necessary to run the data in GenMAPP.
*I used the database that Katrina (coder) created and proceeded to use the tab-delimited file of the final Excel spreadsheet to run the dataset in GenMAPP.
*We had no errors.
*After running GenMAPP, I used the LogFC changes and P values of the Rifampicin/No Rifampicin EB to RB data to create a color set. The color set was then saved as a .gex file on our team page.
*After completing this, Katrina and I used MAPPFinder to find the changes in some of the metabolic and catabolic pathways of the genes used.
*We are currently in the process of creating a map for the changes in these pathways.
===Reflection===
#) This week, I spent a lot of time sorting out a lot of little problems. When it came to some of the things that required collaboration of multiple group members, Katrina and I were able to get the work done fairly efficiently and we both spent a lot of time making sure that everything ran smoothly.
#) There wasn't anything that didn't work per se. Although we did not get as much done as we were hoping for, we are going to continue working on the project over the course of the weekend.
#) Next week we will focus our time and energy on ensuring that everything is done correctly and that the paper is done as well as possible. I personally will also contribute time to cleaning up my electronic lab notebook, streamlining some of the steps and making the page itself more readable (I have already begun to improve some of the formatting on my page). Ultimately, next week will consist of tying off any loose ends and ensuring that everything is running how it should be running.
-[[User:Dwilliams|Dwilliams]] ([[User talk:Dwilliams|talk]]) 22:07, 5 December 2013 (PST)

Vkuehn Week 15

2013-12-05T18:14:09Z

Vkuehn: /* GenMAPP Expression Dataset Manager */

=='''ELECTRONIC NOTEBOOK'''==
==12/5/13==
===Sanity Check===
*P-value less than 0.05: 5303
*P-value less than 0.01: 2130
*P-value less than 0.001: 317
*P-value less than 0.0001: 0
*Out of the 19201 T tests performed, more than 960 results had a P-value less than 0.05. Since 5303 genes passed this cut off it means that there were some significant changes.
*Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC" column to show all genes with an average log fold change greater than zero. These were the ones that increased relative to control. There were 2970
*Genes with an average log fold change less than zero. These were the ones that decreased relative to control. There were: 2334
*Average log fold change of > 0.25 and p < 0.05: 2861
*Average log fold change of < -0.25 and p < 0.05: 2431
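
The "more than 960" comparison above comes from multiplying the number of T tests by the p-value cutoff. A minimal Python sketch of that arithmetic, using only the counts recorded in this sanity check (the function name <code>expected_by_chance</code> is made up for illustration):

```python
def expected_by_chance(num_tests: int, p_cutoff: float) -> float:
    """Number of tests expected to pass the cutoff if no gene truly changed."""
    return num_tests * p_cutoff


# Counts copied from the sanity check above.
num_tests = 19201
observed = {0.05: 5303, 0.01: 2130, 0.001: 317, 0.0001: 0}

for cutoff, count in observed.items():
    expected = expected_by_chance(num_tests, cutoff)
    print(f"p < {cutoff}: observed {count}, expected by chance about {expected:.0f}")
```

When the observed count is well above the expected count, at least some of the genes passing the cutoff are likely to reflect real changes, although the comparison does not say which ones.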
===GenMAPP Expression Dataset Manager===
*For exact procedure reference [[Vkuehn Week 13]] or the procedure from http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols
* The colors were assigned to the two main criteria
**Increased relative to control had a Log FC> 0.25 and P-Value <0.05 these were colored '''blue'''
**Decreased relative to control had a Log FC< -0.25 and P-Value <0.05 these were colored '''purple'''
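
The two color criteria can be written out as a small rule. This is only a Python sketch of the logic, not how GenMAPP stores a Color Set; the function name and the "no criteria met" label are assumptions for illustration.

```python
def color_for_gene(log_fc: float, p_value: float,
                   fc_cutoff: float = 0.25, p_cutoff: float = 0.05) -> str:
    """Return the color a gene would get under the two criteria listed above."""
    if p_value < p_cutoff and log_fc > fc_cutoff:
        return "blue"      # increased relative to control
    if p_value < p_cutoff and log_fc < -fc_cutoff:
        return "purple"    # decreased relative to control
    return "no criteria met"


print(color_for_gene(0.40, 0.01))   # increased and significant
print(color_for_gene(-0.40, 0.01))  # decreased and significant
print(color_for_gene(0.40, 0.20))   # large change but p-value too high
```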

=='''12/7/13'''==
===GO Terms Analysed using MAPPFinder===
*Filtered Z Score (in column N) greater than 2 and PermuteP (in column O) less than 0.05
:*Came up with 6 results for increased
:*Two GO terms were within the same family. "integral to membrane" and "intrinsic to membrane"
:*:Integral to membrane: Penetrating at least one phospholipid bilayer of a membrane. May also refer to the state of being buried in the bilayer with no exposure outside the bilayer. When used to describe a protein, indicates that all or part of the peptide sequence is embedded in the membrane. (under cellular component)
:*:intrinsic to membrane:Located in a membrane such that some covalently attached portion of the gene product, for example part of a peptide sequence or some other covalently attached group such as a GPI anchor, spans or is embedded in one or both leaflets of the membrane. (under cellular component)
*Filtered Z Score (in column N) greater than 2, PermuteP (in column O) less than 0.05, and Number Changed (in column I) greater than or equal to 4 or 5 AND less than 100
:*Came up with 8 results for decreased
:*Found a lot that have to do with catabolic process.
:*"aromatic compound catabolic process" "cellular nitrogen compound catabolic process" "nucleobase-containing compound catabolic process" "heterocycle catabolic process"
:*:aromatic compound catabolic process:The chemical reactions and pathways resulting in the breakdown of aromatic compounds, any substance containing an aromatic carbon ring.
:*:cellular nitrogen compound catabolic process: The chemical reactions and pathways resulting in the breakdown of organic and inorganic nitrogenous compounds
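
The filters used on the MAPPFinder results can be sketched in Python as below. The rows are hypothetical placeholders, not values from the actual output, and the lower bound on Number Changed is left as a parameter because the notes above say "4 or 5".

```python
from typing import NamedTuple, Optional


class GoResult(NamedTuple):
    go_name: str
    number_changed: int   # column I
    z_score: float        # column N
    permute_p: float      # column O


def passes_filter(row: GoResult, min_changed: Optional[int] = None,
                  max_changed: Optional[int] = None) -> bool:
    """Z Score > 2 and PermuteP < 0.05, plus optional Number Changed bounds."""
    if not (row.z_score > 2 and row.permute_p < 0.05):
        return False
    if min_changed is not None and row.number_changed < min_changed:
        return False
    if max_changed is not None and row.number_changed >= max_changed:
        return False
    return True


# Hypothetical rows for illustration only.
example_rows = [
    GoResult("example term A", 12, 3.1, 0.01),
    GoResult("example term B", 2, 4.0, 0.00),
    GoResult("example term C", 150, 2.5, 0.03),
    GoResult("example term D", 30, 1.2, 0.20),
]

# Filter used for the increased list (no Number Changed bounds).
increased_style = [r.go_name for r in example_rows if passes_filter(r)]
# Filter used for the decreased list (Number Changed >= 4 and < 100).
decreased_style = [r.go_name for r in example_rows
                   if passes_filter(r, min_changed=4, max_changed=100)]
print(increased_style)
print(decreased_style)
```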

===GenMAPP Finder===
Ran genmapp and waiting for results to be generated

{{Template:Vkuehn}}
[[Category: Journal Entry]]

Kmeilak Week 14

2013-12-05T18:13:54Z

Kmeilak:

==Electronic Lab Notebook==

===11/29/13===

{{BIOL398-01/S10}}

<div style="padding: 10px; width: 720px; border: 5px solid #000000;">

This page has been written with the analysis of the ''Vibrio cholerae'' dataset in mind. However, these steps are similar to what needs to be performed with ''any'' microarray dataset (see [[BIOL398-01/S10:DNA_Microarrays#Overview_of_Microarray_Data_Analysis | Overview of Microarray Data Analysis]]), although the details will differ with the particular experimental design.

== Normalize the log ratios for the set of slides in the experiment ==

To scale and center the data (between chip normalization) perform the following operations:

* Insert a new Worksheet into your Excel file, and name it "scaled_centered".
* Go back to the "compiled_raw_data" worksheet, Select All and Copy. Go to your new "scaled_centered" worksheet, click on the upper, left-hand cell (cell A1) and Paste.
* Insert two rows in between the top row of headers and the first data row.
* In cell A2, type "Average" and in cell A3, type "StdDev".
* You will now compute the Average log ratio for each chip (each column of data). In cell B2, type the following equation:
=AVERAGE(B4:B5224)
: and press "Enter". Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
* You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
: and press "Enter".
* Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
* You have now computed the average and standard deviation of the log ratios for each chip. Now we will actually do the scaling and centering based on these values.
* Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank columns of cells. Edit the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc.
* In cell N4, type the following equation:
=(B4-B$2)/B$3
: In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column. Why is this important?
* Copy and paste this equation into the entire column.
* Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.
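
The scaling and centering steps above amount to subtracting a chip's average from each log ratio and dividing by the chip's standard deviation. A minimal Python sketch for one column, assuming the column has no empty cells (the function name is made up for illustration; <code>statistics.stdev</code> is the sample standard deviation, which is what Excel's STDEV computes):

```python
from statistics import mean, stdev


def scale_and_center(log_ratios):
    """Apply (value - average) / standard deviation to one chip's column."""
    avg = mean(log_ratios)
    sd = stdev(log_ratios)  # sample standard deviation, like Excel STDEV
    return [(value - avg) / sd for value in log_ratios]


# Small made-up column of log ratios, for illustration only.
column = [1.0, 2.0, 3.0]
print(scale_and_center(column))
```

After this step each column has an average of 0 and a standard deviation of 1, which is what makes the chips comparable to each other.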

== Perform statistical analysis on the ratios ==

We are going to perform this step on the scaled and centered data you produced in the previous step.

* Insert a new worksheet and name it "statistics".
* Go back to the "scaled_centered" worksheet and copy the first column ("ID").
* Paste the data into the first column of your new "statistics" worksheet.
* Go back to the "scaled_centered" worksheet and copy the columns that are designated "_scaled_centered".
* Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
* Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns.
* Compute the average log fold change for the replicates for each patient by typing the equation:
=AVERAGE(B2:E2)
: into cell N2. Copy this equation and paste it into the rest of the column.
* Create the equation for patients B and C and paste it into their respective columns.
* Now you will compute the average of the averages. Type the header "Avg_LogFC_all" into the first cell in the next empty column. Create the equation that will compute the average of the three previous averages you calculated and paste it into this entire column.
* Insert a new column next to the "Avg_LogFC_all" column that you computed in the previous step. Label the column "Tstat". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different than 0 (no change). Enter the equation:
=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))
: (NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column.
* Label the top cell in the next column "Pvalue". In the cell below the label, enter the equation:
=TDIST(ABS(R2),degrees of freedom,2)
The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column.
* Insert a new worksheet and name it "forGenMAPP".
* Go back to the "statistics" worksheet and Select All and Copy.
* Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
* Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
* Select Columns R and S. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
* Select Columns N through S and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
* Delete Rows 2 and 3 where it says "Average" and "StdDev" so that your data rows with gene IDs are immediately below the header row 1.
* Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
* Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP. But before we do that, we want to know a few things about our data as shown in the next section.
** Upload both the .xls and .txt files that you have just created to your journal page in the class wiki. Make sure that your file name is distinct from your other classmates so that nobody overwrites anyone else's file.
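
For reference, the Tstat and Pvalue equations used in this section can be sketched in Python. This sketch assumes exactly 3 replicates, so there are 2 degrees of freedom; for that specific case the two-tailed p-value of Student's t distribution has the closed form 1 - |t| / sqrt(t² + 2), which is used here in place of Excel's TDIST. The function names are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(values):
    """AVERAGE(values) / (STDEV(values) / SQRT(number of replicates))."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n))


def two_tailed_p_df2(t):
    """Two-tailed p-value for a t statistic with exactly 2 degrees of freedom."""
    return 1.0 - abs(t) / sqrt(t * t + 2.0)


# Made-up average log fold changes for three replicates, for illustration only.
replicates = [0.9, 1.0, 1.1]
t = t_statistic(replicates)
print(t, two_tailed_p_df2(t))
```

With a different number of replicates the closed form no longer applies, and a general t distribution function would be needed instead.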

== Sanity Check: Number of genes significantly changed ==

Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results of Merrell et al. (2002).

* Open your spreadsheet and go to the "forGenMAPP" tab.
* Click on cell A1 and select the menu item Data > Filter > Autofilter. Little drop-down arrows should appear at the top of each column. This will enable us to filter the data according to criteria we set.
* Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
** How many genes have p value < 0.05?
** What about p < 0.01?
** What about p < 0.001?
** What about p < 0.0001?
* When we use a p value cut-off of p < 0.05, what we are saying is that you would have seen a gene expression change that deviates this far from zero by chance less than 5% of the time.
* We have just performed 5221 T tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see this magnitude of a gene expression change in about 5% of our T tests, or 261 times. Since we have more than 261 genes that pass this cut off, we know that some genes are significantly changed. However, we don't know ''which'' ones.
* The "Avg_LogFC_all" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.
** Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
** Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
** What about an average log fold change of > 0.25 and p < 0.05?
** Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
* In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use the fold change cut-off of greater than 0.25 or less than -0.25 and the p value cut off of p < 0.05 for our analysis because we want to include several hundred genes in our analysis.
* What criteria did Merrell et al. (2002) use to determine a significant gene expression change? How does it compare to our method?
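
The Autofilter counts asked for above can also be checked outside Excel. A minimal Python sketch, assuming each gene is represented as an (Avg_LogFC_all, Pvalue) pair; the rows below are made up for illustration and are not from the ''Vibrio cholerae'' dataset.

```python
def count_genes(rows, p_cutoff, min_log_fc=None, max_log_fc=None):
    """Count rows with Pvalue < p_cutoff and, optionally, a fold-change bound.

    rows: iterable of (avg_log_fc, p_value) pairs.
    min_log_fc: keep only rows with avg_log_fc strictly greater than this.
    max_log_fc: keep only rows with avg_log_fc strictly less than this.
    """
    count = 0
    for log_fc, p_value in rows:
        if p_value >= p_cutoff:
            continue
        if min_log_fc is not None and log_fc <= min_log_fc:
            continue
        if max_log_fc is not None and log_fc >= max_log_fc:
            continue
        count += 1
    return count


# Made-up rows for illustration only.
rows = [(0.40, 0.003), (0.10, 0.04), (-0.30, 0.02), (-0.05, 0.30), (0.60, 0.00005)]

print(count_genes(rows, 0.05))                    # p < 0.05
print(count_genes(rows, 0.05, min_log_fc=0))      # increased
print(count_genes(rows, 0.05, min_log_fc=0.25))   # increased past the cut-off
print(count_genes(rows, 0.05, max_log_fc=-0.25))  # decreased past the cut-off
```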

== Sanity Check: Compare individual genes with known data ==

* Merrell et al. (2002) report that genes with IDs: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?

</div>

Kevinmcgee Week 15

2013-12-05T18:02:40Z

Kevinmcgee: /* Running MAPPFinder */

=Uploading Into GenMAPP=
==Datasheet==
#Compiled all data onto a single data sheet including both L. major and L. infantum and uploaded it into GenMAPP
#*Ran into some problems uploading (almost everything was an error)
#Filtered out any Lin genes and created a new datasheet for only LmjF genes.
#*Still ran into problems uploading
#Looked at the database OrthologicalNames sheet and saw that the gene IDs were in a different format there than in the spreadsheet.
#*Made a quick-fix file by changing the names on the spreadsheet, but long-term fixes are being made to the code so that other users do not have to change the names on their spreadsheets every time (for convenience).
#Were able to upload our data and continue on with the project.
==Sanity Check==
===Leishmania Infantum===
#Filtered P-value
#*1392 genes were <.05
#*327 genes were <.01
#*67 genes were <.001
#*28 genes were <.0001
#Filtered Average Log Fold Change
#*748 genes were >0
#*646 genes were <0
#*699 genes were >.25 while 646 were <.05
#*748 genes were >.05 while 606 were < -.25
==MAPPFinder Color Changes==
#The colors were assigned to the two main criteria
#*Increased relative to control had a Log FC> 0.25 and P-Value <0.05 these were colored blue
#*Decreased relative to control had a Log FC< -0.25 and P-Value <0.05 these were colored purple
==Running MAPPFinder==
*Set up MAPPFinder to run with the file name LMajorGOMap
*Ran for about 1 1/2 hours
#Top Ten GO Terms
#*catalytic activity
#*Endonuclease activity
#*DNA catabolic process
#*Aromatic compound catabolic process
#*cellular nitrogen compound catabolic process
#*nucleobase-containing compound catabolic process
#*organic cyclic compound catabolic process
#*heterocycle catabolic process
#*oxidoreductase activity
#*macromolecular complex

[[image: Picture GO.PNG]]

Taur.vil Week 15

2013-12-05T17:47:19Z

Taur.vil: /* Lab Notebook */

==Lab Notebook==
===Final Checks===
*Renamed the database as Streptococcus_pneumoniae_TIGR4_20131125.gdb so the name would include the species name
*Ran TallyEngine to verify the database readings in Access
**All values were the same
*Coordinated with Kevin to work in GenMAPP and with Alina to verify that the code worked and that the database exceptions were not present in the XML file

===Documentation===
*Created a ReadMe file for the Streptococcus_pneumoniae_TIGR4_20131125.gdb based on the VC example [[Media:ReadMe_Vc-Std_External_20131122.zip | ReadMe_Vc-Std_External_20131122.zip]]
*Organized/cleaned up the online accounts of the project including updating templates, organizing files, and creating progress report

==Template==
{{Template:Team ATK}}
===Personal Template===
{{Template:Taur.vil}}

Ajvree Week 15

2013-12-03T19:10:38Z

Ajvree: /* Links */

==12/3/13==

'''Comparing gene id's in excel:'''
*first search in xmlmatch:
**searched for SP_[0-9][0-9][0-9][0-9] in uniprot file
**got 2126 match results
**saved to text file
*used excel to compare lists
**to convert match IDs from SP_#### to SP####, used Ctrl+F (search/replace function) to replace the underscore with nothing
*link to instructions for Excel match comparison: http://xmlpipedb.sourceforge.net/wiki/index.php/Using_Microsoft_Excel_to_Compare_ID_Lists
*NO Matches found, all results were N/A
*'''Excel File:''' [[Media:20131203_IDExcelcomparison_tATK_TIGR4_AJV.xls|ID Comparison]]
**Out of all 333 Gene ID's, none were found in original XML file, so those errors cannot be fixed
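
The same comparison can be sketched in Python instead of Excel. The pattern is the one searched for above (SP_ followed by four digits); the sample text and the ID list are made up for illustration and are not taken from the UniProt file or the error list.

```python
import re

# Same pattern as the XMLPipeDB Match search above.
ID_PATTERN = re.compile(r"SP_[0-9]{4}")


def find_ids(text):
    """Return the set of SP_#### IDs that occur in the text."""
    return set(ID_PATTERN.findall(text))


def strip_underscore(ids):
    """Convert SP_#### to SP####, like the search/replace step in Excel."""
    return {gene_id.replace("_", "") for gene_id in ids}


def split_by_match(query_ids, reference_ids):
    """Return (found, missing) lists of query IDs relative to the reference set."""
    query = set(query_ids)
    return sorted(query & reference_ids), sorted(query - reference_ids)


# Made-up data for illustration only.
sample_xml_text = "<name>SP_0001</name> <name>SP_0002</name>"
ids_to_check = ["SP0002", "SP9999"]

reference = strip_underscore(find_ids(sample_xml_text))
found, missing = split_by_match(ids_to_check, reference)
print(found, missing)
```

An ID that ends up in the missing list corresponds to an N/A result in the Excel comparison.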

'''Visual Representation of Data Information'''
*Finished in-class 12/3/13
[[Image:TATK tree.png]]

'''Compilation of Counting Totals/Visual Information'''
*Began word document with all visual counting results in one place
*still missing information regarding comparisons to microarray data and MOD
*File: (uploaded as pdf since wiki refused to upload it as a .doc) [[Media:20131203 CompiledCounts tATK TIGR4 AJV.pdf|PDF]]

==12/5/13==
'''Updating of Export 4 Testing Page'''
*Added relevant information to Tally Engine, XMLPipeDB, Row Counts, SQL, (etc) sections.
*Reformatted images and added images previously found on Export 1 page.

==Links==
{{Team ATK}}

Leishmania Major Group Project Report

2013-12-03T19:03:39Z

Lena:

'''Methods'''

:'''Data Source Files'''
:The Uniprot XML proteome set was downloaded from the Uniprot complete proteomes page for Leishmania major. We used the version that was last updated on October 16th 2013. The GOA (GO association) file was downloaded from the Uniprot-GOA downloads page. Our version was downloaded on November 14th 2013. The GO file was downloaded from the Ontology Downloads page (we used the beta version of the page). The version was from November 4th 2013, 2:03:38 AM in the obo-xml.gz format.
:'''Generating Gene Database'''
:In PostgreSQL, a new database was created called Leishmania_05112013_Lena_Gabe. GenMAPP Builder tables were then created in PostgreSQL. GenMAPP Builder was downloaded from SourceForge (version: GenMAPP Builder 2.0b71). The Leishmania database that was created in PostgreSQL was then configured in GenMAPP Builder, and we imported our Uniprot XML, GO, and GOA files. Once all the files were imported, we exported a GenMAPP Gene Database for Leishmania major, which was saved as Leishmania_05112013_Lena_Gabe.gdb.
:'''Inspecting Database'''
:To make sure our Gene Database export had worked, we ran Tally Engine to make sure our XML count matched our Database count. We also used XMLpipedb Match and SQL query to make sure we could match our Gene IDs.
:# Prepare microarray data (organize, normalize, perform statistical analysis)
:# Run GenMAPP using the Gene Database.
:# Microarray data (import using Expression Dataset Manager)
:# Run MAPPFinder analysis
:# Place genes on MAPP and draw pathway

[[Category: Leishmania Major]]

Gleis Week 15

2013-12-03T18:50:37Z

Gleis:

==Lab Journal==
*Code needed ID changes in Eclipse
*Replaced first period with an "_" and removed second "_"
*Replaced to look like "LMJF_##_####"
*First export failed due to minor error in Eclipse code
*Initiated Gene Database Testing Report
*Began new export with edited code for database
*Need to review errors in GenMAPP
*New export somewhat successful
*Database not catching errors of the form LmjF##.#####
*IDs not present in XML follow the form LmjF01.[0160-1983]
:*The exception is IDs ending in zero, which are found in the XML. For example, IDs LmjF01.0160 and LmjF01.1970 are found, but IDs LmjF01.016[1-9] and LmjF01.197[1-9] are not.
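
The ID change described in this journal can be illustrated with a short Python sketch. It is not the Eclipse code used for the export; it assumes the incoming IDs look like LmjF01.0160 (two digits, a period, four digits, as in the examples above) and that the target form is LMJF_##_####. The function name is made up for illustration.

```python
import re
from typing import Optional

# Assumed input form: LmjF + two digits + "." + four digits (case-insensitive).
_ID_PATTERN = re.compile(r"^LmjF(\d{2})\.(\d{4})$", re.IGNORECASE)


def to_database_form(gene_id: str) -> Optional[str]:
    """Rewrite LmjF##.#### as LMJF_##_####, or return None if the ID does not fit."""
    match = _ID_PATTERN.match(gene_id.strip())
    if match is None:
        return None
    return f"LMJF_{match.group(1)}_{match.group(2)}"


print(to_database_form("LmjF01.0160"))
print(to_database_form("LmjF01.1970"))
print(to_database_form("not an ID"))
```

Returning None for anything that does not fit the assumed form makes it easy to collect those IDs separately and review them.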

[[Leishmania major]]

[[User:Gleis|Gleis]] ([[User talk:Gleis|talk]]) 22:24, 6 December 2013 (PST)

[[Category:Journal Entry]] [[Category:Leishmania major]]

Laurmagee: Week 15

2013-12-03T18:28:06Z

Laurmagee: /* GenMAPP and MAPPFinder Protocols */

==Statistical Analysis==
*Open the following spreadsheet [[File:Compiled Ratios and Logs.xls]] from [[Laurmagee: Week 13]].
*Begin a new workbook and copy over the Gene ID column into cell A1. Then in the subsequent columns, copy over the log values found in your previous sheet. Order all the different time periods in increasing intervals and sort the replicates at each time in increasing order as well. After this pasting has been done, your column titles across the top of the worksheet will say "Gene ID", "log_700S1-t15", "log_700S2-t15", "log_700S3-t15"... and so on for t30, t60, and t240.
*Begin scaling and centering the data by first inserting a new worksheet in Excel labeled "scaled_centered".
*Select and copy all of the data from your original worksheet. Then paste it into cell A1 in new worksheet.
*Insert two rows in between the top row of headers and the first data row. In cell A2, type "Average" and in cell A3, type "StdDev".
*You will now compute the Average log ratio for each replicate and time period.
*In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and press "Enter".
*Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
*You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
and press "Enter".
*Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
You have now computed the average and standard deviation of the log ratios for each replicate and time period.
*Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank columns of cells. Edit the names of the columns so that they now read: log_700S1-t15_scaled_centered, log_700S2-t15_scaled_centered, etc.
*In cell N4, type the following equation:
=(B4-B$2)/B$3
*In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column.
*Copy and paste this equation into the entire column.
*Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.

*Insert a new worksheet and name it "statistics".
*Go back to the "scaled_centered" worksheet and copy the first column ("ID").
*Paste the data into the first column of your new "statistics" worksheet.
*Go back to the "scaled_centered" worksheet and copy the columns that are designated "_scaled_centered".
*Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
*Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_t15", "Avg_LogFC_t30", "Avg_LogFC_60", and "Avg_LogFC_240" into the top cell of the next four columns.
*Compute the average log fold change for the replicates at each time point by typing the equation:
=AVERAGE(B2:D2)
into cell N2. Copy this equation and paste it into the rest of the column.
*Create the equation for times t30, t60, and t240 and paste it into their respective columns.
*Label the next four columns "Tstat_t15", "Tstat_t30", "Tstat_t60", and "Tstat_t240". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different than 0 (no change). Enter the equation:
=N2/(STDEV(B2:D2)/SQRT(3))
*(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column as well as the next three columns, making sure to change the cells involved in the equation accordingly.
*Label the top cell in the next four columns "Pvalue_t15", "Pvalue_t30", "Pvalue_t60", and "Pvalue_t240". In the cell below the label, enter the equation:
=TDIST(ABS(R2),2, 2)
*The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column and the next three columns making sure to change the cell involved to the appropriate Tstat value.
*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
*Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
*Select Columns R through Z. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
*Select Columns N through Z and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
*Delete Rows 2 and 3 where it says "Average" and "StdDev" so that your data rows with gene IDs are immediately below the header row 1.
*Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP.
*[[File:SinorhizobiumMeliloti LM GenMapp DataSheet.xls]]
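
The last two formatting steps (adding the "SystemCode" column filled with "N" and saving as tab-delimited text) can be sketched in Python as follows. This is only an illustration of the file layout, not a replacement for the Excel steps; the gene ID and value in the example rows are made up.

```python
import csv
import io


def add_system_code(rows, code="N"):
    """Insert a SystemCode column to the right of the first (ID) column."""
    header, *data = rows
    result = [[header[0], "SystemCode", *header[1:]]]
    for row in data:
        result.append([row[0], code, *row[1:]])
    return result


def to_tab_delimited(rows):
    """Return the rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows for illustration only.
rows = [["ID", "Avg_LogFC_t15"], ["gene0001", "0.31"]]
print(to_tab_delimited(add_system_code(rows)))
```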

==Sanity Check==
#How many genes have p value < 0.05 for the time set of 15 minutes?
#*3613 genes
#How many genes have p value < 0.05 for the time set of 30 minutes?
#*5225 genes
#How many genes have p value < 0.05 for the time set of 60 minutes?
#*5207 genes
#How many genes have p value < 0.05 for the time set of 240 minutes?
#*6790 genes
#How many genes have p value < 0.01 for the time set of 15 minutes?
#*907 genes
#How many genes have p value < 0.01 for the time set of 30 minutes?
#*1518 genes
#How many genes have p value < 0.01 for the time set of 60 minutes?
#*1553 genes
#How many genes have p value < 0.01 for the time set of 240 minutes?
#*2437 genes
#How many genes have p value < 0.001 for the time set of 15 minutes?
#*92 genes
#How many genes have p value < 0.001 for the time set of 30 minutes?
#*179 genes
#How many genes have p value < 0.001 for the time set of 60 minutes?
#*172 genes
#How many genes have p value < 0.001 for the time set of 240 minutes?
#*347 genes
#How many genes have p value < 0.0001 for the time set of 15 minutes?
#*7 genes
#How many genes have p value < 0.0001 for the time set of 30 minutes?
#*15 genes
#How many genes have p value < 0.0001 for the time set of 60 minutes?
#*13 genes
#How many genes have p value < 0.0001 for the time set of 240 minutes?
#*36 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 15 minutes?
#*1521 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 30 minutes?
#*1926 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 60 minutes?
#*2194 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 240 minutes?
#*2846 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 15 minutes?
#*2092 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 30 minutes?
#*3299 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 60 minutes?
#*3013 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 240 minutes?
#*3944 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 15 minutes?
#*1476 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 30 minutes?
#*1890 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 60 minutes?
#*2129 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 240 minutes?
#*2763 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 15 minutes?
#*2052 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 30 minutes?
#*3256 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 60 minutes?
#*2942 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 240 minutes?
#*3866 genes
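
The filter counts above can also be reproduced outside of Excel. The following is a minimal Python sketch, assuming each row holds a gene ID, the average log fold change for one time point, and the matching p value; the gene names and numbers are invented for illustration and are not taken from the dataset.

```python
# Hypothetical rows for one time point: (gene ID, average log fold change, p value).
rows = [
    ("gene1", 0.80, 0.010),
    ("gene2", -0.60, 0.030),
    ("gene3", 0.10, 0.040),
    ("gene4", -0.30, 0.200),
    ("gene5", 0.30, 0.049),
]


def count_genes(rows, min_fc=None, max_fc=None, p_cutoff=0.05):
    """Count rows with p < p_cutoff whose fold change is > min_fc and/or < max_fc."""
    count = 0
    for _gene, log_fc, p_value in rows:
        if p_value >= p_cutoff:
            continue  # not significant at the chosen cutoff
        if min_fc is not None and not log_fc > min_fc:
            continue
        if max_fc is not None and not log_fc < max_fc:
            continue
        count += 1
    return count


print(count_genes(rows, min_fc=0))      # significant and increased
print(count_genes(rows, max_fc=0))      # significant and decreased
print(count_genes(rows, min_fc=0.25))   # stricter increase cutoff
print(count_genes(rows, max_fc=-0.25))  # stricter decrease cutoff
```

Running the same function once per time point column gives one count per question in the list.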

==GenMAPP and MAPPFinder Protocols==
*To begin GenMAPP analysis, first launch GenMAPP 2 or download it off of the following website: http://genmapp.org.
*Look at the lower-left hand corner to see what gene database is loaded. For this assignment, the gene database [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]] should appear in the corner.
*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
*Once the correct database is loaded, go to Data > Expression Dataset Manager. This will allow you to input the data file created in the "Statistical Analysis" portion of this page.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. All of the boxes should remain unchecked, because none of the columns in our dataset contain non-numerical values.
*Give the Expression Dataset Manager time to convert your data into a GEX file.
*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but rather recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
*Open the exception file in Excel and filter the data to note what errors have been recorded.
*Using the .gdb Gene Database created by my partners, there were 5,538 errors, each of which was "Gene not found in OrderedLocusNames or any related system."
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
*In the "Color Sets" section, type in your own title into the "Name" field.
*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_t15" in the drop down menu in the "Gene Value" field.
*We are using the t15 time period for this step to represent the results from all four time intervals, because it would be too challenging to complete this protocol with all four time interval values.
*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_t15] < -0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Then choose the "<", ">", and "=" as appropriate, paired with -0.25.
*Type out the word "AND" in this same field and select "Pvalue_t15" and the "Ops" accordingly.
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*Enter "Decreased" as the name for the criterion in the "Label in Legend" field, as we are looking for the Avg_LogFC values that have decreased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color red was chosen.
*You may now click the "Add" button.
*Now we will add another criterion. In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) increase in the average log fold change (i.e. [Avg_LogFC_t15] > 0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Then choose the "<", ">", and "=" as appropriate, paired with 0.25.
*Type out the word "AND" in this same field and select "Pvalue_t15" and the "Ops" accordingly.
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*Enter "Increased" as the name for the criterion in the "Label in Legend" field, as we are looking for the Avg_LogFC values that have increased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color green was chosen.
*You may now click the "Add" button.
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset to view the Color Sets on a MAPP.
*[[Media:ColorSets.mapp]]
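
The two Color Set criteria built above amount to a pair of boolean tests on the t15 columns. Below is a minimal Python sketch of that logic, assuming the criteria are checked in the order they were added and that a gene meeting neither one gets a fallback label; the label "No criteria met" and the function name are placeholders chosen for this example.

```python
def classify(avg_log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Return the legend label of the first criterion a gene meets."""
    # Criterion 1: [Avg_LogFC_t15] < -0.25 AND [Pvalue_t15] < 0.05
    if avg_log_fc < -fc_cutoff and p_value < p_cutoff:
        return "Decreased"
    # Criterion 2: [Avg_LogFC_t15] > 0.25 AND [Pvalue_t15] < 0.05
    if avg_log_fc > fc_cutoff and p_value < p_cutoff:
        return "Increased"
    # Placeholder label for genes that satisfy neither criterion.
    return "No criteria met"


print(classify(-0.40, 0.01))
print(classify(0.90, 0.03))
print(classify(0.90, 0.20))
```

Because the two fold change ranges do not overlap, the order of the checks does not change the result for these particular criteria.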

*Moving on to the MAPPFinder protocol, we will stay in GenMAPP, but select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on "Decreased" criteria in the right-hand box then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**Permute P less than 0.05
**Number Changed greater than or equal to 5 and less than 100.
**Percent Changed greater than or equal to 25
**108 results found
*Save the file as a different Excel spreadsheet name by selecting File > Save As and select Excel workbook (.xls) from the drop-down menu.
*Now look at the "Increased" data. We will stay in GenMAPP, but select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on "Increased" criteria in the right-hand box then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File: MAPPFinder Capture1.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than .05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
**6 results found
*Save the file as a different Excel spreadsheet name by selecting File > Save As and select Excel workbook (.xls) from the drop-down menu.
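
The Excel filters applied to the MAPPFinder results can be written as a single test per GO term. This is a minimal Python sketch, assuming each row of the results file has already been read into a dictionary; the key names and the example rows are made up here and do not come from the actual results file.

```python
def passes_filters(term):
    """Apply the four filters used on the MAPPFinder results in Excel."""
    return (
        term["z_score"] > 2
        and term["permute_p"] < 0.05
        and 5 <= term["number_changed"] < 100
        and term["percent_changed"] >= 25
    )


# Hypothetical GO terms with invented statistics.
terms = [
    {"go_name": "term A", "z_score": 3.1, "permute_p": 0.01,
     "number_changed": 12, "percent_changed": 40.0},
    {"go_name": "term B", "z_score": 1.5, "permute_p": 0.01,
     "number_changed": 12, "percent_changed": 40.0},
    {"go_name": "term C", "z_score": 2.5, "permute_p": 0.01,
     "number_changed": 150, "percent_changed": 30.0},
    {"go_name": "term D", "z_score": 2.2, "permute_p": 0.04,
     "number_changed": 5, "percent_changed": 25.0},
]

kept = [term["go_name"] for term in terms if passes_filters(term)]
print(kept)
```

The length of the kept list corresponds to the "results found" count reported for each criterion.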

Electronic notebook: sinorhizobium meliloti

2013-12-03T18:23:50Z

Mmalefyt: /* Reflection on the Process */

==Week 12==
*Read the paper on the salinity and sucrose stress on gene expression
*Sorted the raw data into an XML file
*started to compile the raw data
**downloaded all raw data and sorted through the information needed
**used the cys5 and cys3 fold change as well as all the IDs
*Uploaded [[Media:Team_Name_NaCl_compiled_raw_Data.xls|300 NaCl compiled data set]]

==Week 13==
*I continued to sort the raw data and began to process the data in an xls file
*this was a very repetitive part because it involved a lot of replications for each time set
*finished my compiled raw data and processed raw data, as well as the data ready for GenMAPP
*Made individual LOG fold change ratios for each time point replicate then averaged all of the LOG fold change ratios for each of the time points
*performed a Tstat test
*Performed a Pvalue test
*added a row of N next to the gene ID name in the forGenMAPP tab
*uploaded [[Media:Complete_processed_Data.xls|Processed Data]]
**NOTE:the GenMAPP version of the tab is labeled Complete Processed data_MPM and not forGenMAPP
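
The averaging and Tstat steps for one gene at one time point can be sketched as follows. This is a minimal Python example, assuming a one-sample t statistic (the average divided by the standard error of the replicates), which may not match the spreadsheet formula exactly; the replicate values are invented. The p value is left out because the Python standard library has no t distribution.

```python
import math
import statistics


def average_and_t_stat(log_ratios):
    """Return (average, t statistic) for the replicate log ratios of one gene.

    The t statistic is None when it cannot be computed, which is the case
    that shows up as #DIV/0! in a spreadsheet.
    """
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    if n < 2:
        return mean, None  # a standard deviation needs at least two replicates
    sd = statistics.stdev(log_ratios)  # sample standard deviation
    if sd == 0:
        return mean, None  # identical replicates would divide by zero
    return mean, mean / (sd / math.sqrt(n))


mean, t_stat = average_and_t_stat([0.5, 0.7, 0.6])
print(mean, t_stat)
```

Returning None instead of an error value makes it easy to drop those rows before the file is handed to GenMAPP.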

==week 15==
*I worked on some of the mistakes that I had made in my prior data sets
**removed AVG_LOGFC_ALL row
**added individual Pvalues and TSTAT for each individual replicate of the experiment instead of one Tstat and P value for the whole experiment
*sanity check concluded the number of genes significantly changed at each time point
**T15- 5520
**T30- 7484
**T60- 6711
**T240- 5901
*Removed all of the #DIV/0! from the data that was transferred over to the GenMapp data
*uploaded [[Media:Complete_processed_Data_MPM.xls|XLS Version]] and [[Media:Complete_processed_Data_MPM.txt|TXT version, USE THIS]]

*had to change names of the columns in order to correctly upload to GenMAPP
**system code column was renamed
**Gene ID column was renamed to ID on the Programmers computer to resolve some issues
*it appears that there is something wrong with the actual gene IDs that is not compatible with GenMAPP
*ran the first integration of the data and came up with 5535 errors which is roughly half of the overall genes we loaded.
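
To see which error messages account for a count like this, the error column of the exception file can be tallied. Below is a minimal Python sketch, assuming the exception file is tab-delimited and has an extra column named ~Error~; the column layout, gene IDs, and values in the sample text are invented for illustration.

```python
import csv
import io
from collections import Counter


def tally_errors(ex_text, error_column="~Error~"):
    """Count how often each error message appears in exception-file text."""
    reader = csv.DictReader(io.StringIO(ex_text), delimiter="\t")
    # Rows that imported cleanly have an empty error field and are skipped.
    return Counter(row[error_column] for row in reader if row.get(error_column))


# Hypothetical exception-file contents.
sample = (
    "ID\tSystemCode\tAvg_LogFC_t15\t~Error~\n"
    "geneA\tN\t0.41\t\n"
    "geneB\tN\t-0.32\tGene not found in OrderedLocusNames or any related system.\n"
    "geneC\tN\t0.05\tGene not found in OrderedLocusNames or any related system.\n"
)

counts = tally_errors(sample)
for message, count in counts.most_common():
    print(count, message)
```

For a real file, the text would come from reading the .EX file from disk instead of the sample string.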

==Reflection on the Process==
'''What did you learn?'''
*'''With your head?'''
*I learned a lot about gene expression and metabolic pathways and how the cell responds in many ways to a single stimulus. Also I learned a lot about the principles of coding and how to organize a query in language that the computer understands
*'''With your Heart?'''
* I learned that people need to keep in contact every few days and need the latest updates on each other's progress. Coordinating with the team to tackle a broader objective that spans many parts and disciplines was also really rewarding: we all started working on the project from completely different ends but came together in the end with a single final result.
*'''with your Hands? (technical skills)'''
I learned to organize and operate GenMAPP and MAPPFinder with enough proficiency that I could probably do it again if I were given the same project but on a different species. I also learned how to format things in a way that the computer could read and understand what I wanted in order to get a desired result.
'''What lesson will you take away from this project that you will still use a year from now?'''
*The lesson that I will take away from this project is that you need to understand the gist of where your partners are in their work even if you don't understand what they are doing. This gives you a general idea of when your job and their job will meet, and could give you insight into a discrepancy between your work and theirs. In other words, it helps with the kind of problem that arises where your part of the project meets theirs but can't be found until both pieces of work come together.

TATK Week 15 Status Report

2013-12-03T18:03:32Z

Ajvree: /* Alina */ signature

==Project Progress==
*Coding and quality assurance complete
*Gex files uploaded
*Working on final processing of GenMapp data and pathway creation
*Tauras and Alina started formatting final deliverable documents

==Important Numbers and Results==

*numbers of significant p-values by hour for TIGR4
**4hr - 1218
**12hr - 1902
**24hr - 918
**48hr - 129

*333 errors resulted from GenMAPP import of TIGR4 data
**None of the gene IDs were found in the XML file, indicating there was no accompanying code issue

==Reflections==

===Tauras===

===Kevin===

===Alina===
*'''What worked?'''
We were able to compare the gene IDs in excel to make sure the errors were not found in the original XML file. Since none of the IDs matched the original file, there is nothing further we could do to correct the errors.
*'''What didn't work?'''
All of the data that I worked with was used without any obstacles.
*'''What will I do next to fix what didn't work?'''
Overall, there wasn't really anything that didn't work, so I will continue to strive to work without any obstacles while completing the rest of the project.

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 20:33, 5 December 2013 (PST)

==Template==
{{Template:Team ATK}}

Mpetredi Week 14

2013-12-03T17:08:09Z

Mpetredi: added image of new string of code

==[[user:mpetredi|'''Mitchell Petredis''']]==

'''Lab Progression'''

==December 3, 2013==
*Added the following string of code in Eclipse (line is highlighted)
**[[Image:2013123_New_String_of_Code.PNG]]
**String of code also available below as text:

 @Override
 public TableManager getSystemTableManagerCustomizations(TableManager tableManager, TableManager primarySystemTableManager, Date version) throws SQLException, InvalidParameterException {
     // Gene name types passed to the helper for the OrderedLocusNames system table
     List<String> comparisonList = new ArrayList<String>();
     comparisonList.add("ordered locus");
     comparisonList.add("ORF");
     return systemTableManagerCustomizationsHelper(tableManager, primarySystemTableManager, version, "OrderedLocusNames", comparisonList);
 }
*Used all files except for the .gdb to run another import/export, using the latest version of gmbuilder
*All information pertaining to import/export 3 can be found under [[Team Name]] in "Important Files 3".

[[Team Name]]

Gleis Week 14

2013-11-26T19:13:17Z

Gleis: /* Status Report */

==To Do List==
*Identify new URL pattern??
*Finish export and upload microarray data
==Status Report==
*Began new export
*Used postgres and TallyEngine to ensure proper database version
*Postgres identified geneID formatting and exceptions:
:[[Media:SQL Query ResultsandIDPattern Leishmania.PNG]]
*Five exceptions found- identify if relevant
*Export Status update @ 67%
*Screenshot of command prompt @ 67% of export
:[[Media:Leishmania GDBexport status 67percent.PNG]]
*Export to GenMAPP finished
*Screenshot of command prompt at finish
:[[Media:Leishmania GDBexport complete 26112013.PNG]]
*No errors appear to be present in export
*Tallyengine and postgres reported 8355 ORF IDs
*Access reports 8354 Ordered Locus Names
*attempted import of microarray data
*import yielded the following error:
:[[Media:Leishmania Data import error.PNG]]
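
When two tools report counts that differ by one ID, as TallyEngine/postgres and Access do above, comparing the two ID lists directly shows which ID is responsible. This is a minimal Python sketch, assuming both lists have been exported to plain text; the IDs shown are placeholders and not real gene IDs.

```python
def compare_ids(source_ids, exported_ids):
    """Return (IDs only in the source, IDs only in the export), both sorted."""
    source, exported = set(source_ids), set(exported_ids)
    return sorted(source - exported), sorted(exported - source)


# Placeholder ID lists standing in for the postgres and Access results.
postgres_ids = ["ID_0001", "ID_0002", "ID_0003", "ID_0004"]
access_ids = ["ID_0001", "ID_0002", "ID_0004"]

only_in_postgres, only_in_access = compare_ids(postgres_ids, access_ids)
print("Missing from the export:", only_in_postgres)
print("Unexpected in the export:", only_in_access)
```

The same comparison works between the microarray ID column and the gene database when an import reports IDs that were not found.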

Ajvree Week 14

2013-11-26T18:50:39Z

Ajvree: /* 11/26/13 */

==11/26/13==
*Began Export 4 tallying, counting, etc
*ran tally engine on E4 via Tauras's computer

'''Original Row Counts''' 
[[Image:20131126 OGrowcounts tATK TIGR4 E4 AJV.PNG]]

'''Match Investigation'''
*One rogue ID in original Tally Count (2126 vs 2127)
*ID found: spr0485
*further investigation in match found five other similar IDS within database, but they were not listed under orderedlocusnames
[[Image:20131126 xmlmatch tATK TIGR4 AJV.PNG]]
[[Image:20131126 xmlfileinvestigation tATK TIGR4 AJV.PNG]]

[[Image:20131126 xmlfileinvestigationsql tATK TIGR4 AJV.PNG]]

==Links==
{{Team ATK}}

Taur.vil Week 14

2013-11-26T06:58:01Z

Taur.vil: IE C4

==Laboratory Journal==

==TIGR4 IMP-EXP C4==
===Collecting files===
*Used files from I-E C3
*created new database in pgAdminIII called tATK_TIGR4_2013NOV25
*Executed gmbuilder.sql (the new version, see below) in SQL
**Verified success by the 159 tables created
===Import-Export Cycle===
*Opened new version of gmbuilder-32bit.bat (dist_Experimental20131121)
*Connected to database created earlier
*Imported UniProt XML file: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
*Imported the obo-xml file: 20131120_OBOXML_tATK_TPV.obo-xml
*Imported the goa file: 20131118_GOA_tATK_TIGR4_TPV.goa
*Exported to 20131125_E4_tATK_TIGR4.gdb

==Template==
{{Team ATK}}

TATK E4: TIGR4 Testing Report

2013-11-26T06:57:51Z

Ajvree: /* Compare Gene Database to Outside Resource */

Team H(oo)KD Week 13 Status Report

2013-11-22T01:04:47Z

Ksherbina: /* Reflection */ Added signature

{{Team H(oo)KD}}

'''Refer to the calendar on the team home page to see the milestones for this week.'''

==Coder Status Update==

*When first performing an import/export cycle for the gene database, I ran into Java heap space errors, which I tried to reconcile by manually increasing the maximum heap space allocated for gmbuilder. Through this process, I found that 32-bit gmbuilder has a limit for what the maximum heap space can be set to.
*For the first import/export cycle, I imported the files into the database created in pgAdminIII using 64-bit gmbuilder with increased maximum heap space and performed the export in 32-bit gmbuilder without changing the heap space.
*Tally Engine did not produce the same counts for the UniProt file and the OBO-XML file for this first exported database. We believe that this is a result of duplicate entries being formed when I imported the UniProt, OBO-XML, and GOA files multiple times when I was working out the heap space error.
*A new database was created in pgAdminIII and a new import/export cycle in 32-bit gmbuilder was performed with this new database.
*Hilda and I were able to find the gene link pattern, which I was able to add to the custom species profile for ''C. trachomatis''. With this final edit, I was able to build a new version of gmbuilder.
*Using the new build, I exported a new gene database.
*Hilda and I then performed quality assurance testing with the newest version of the ''C. trachomatis'' gene database. In so doing, we found that two genes were counted in Access and xmlpipedb match that were not counted by TallyEngine and pgAdminIII.
:*In addition, Hilda and I also found in using xmlpipedb match that some of the genes have the gene ID pCTA_#### rather than CTA_####.
:*Next week, we may need to consult with Dr. Dahlquist regarding what may be causing this discrepancy in the gene counts.
:*The plan then for next week is to try to run GenMAPP with the microarray data (or at least a dummy file with the gene IDs from the data and some fake numbers) and the new gene database.
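
One way to look into the gene ID discrepancy is to sort the IDs by the pattern they follow. Below is a minimal Python sketch, assuming each "#" in CTA_#### and pCTA_#### stands for one digit; the example IDs and group names are made up for this illustration.

```python
import re

# Patterns assumed from the ID formats described above (four digits after the underscore).
PLAIN_PATTERN = re.compile(r"^CTA_\d{4}$")
PREFIXED_PATTERN = re.compile(r"^pCTA_\d{4}$")


def group_ids(ids):
    """Sort gene IDs into plain, p-prefixed, and unrecognized groups."""
    groups = {"plain": [], "prefixed": [], "other": []}
    for gene_id in ids:
        if PLAIN_PATTERN.match(gene_id):
            groups["plain"].append(gene_id)
        elif PREFIXED_PATTERN.match(gene_id):
            groups["prefixed"].append(gene_id)
        else:
            groups["other"].append(gene_id)
    return groups


groups = group_ids(["CTA_0001", "pCTA_0002", "CT_123", "CTA_0450"])
print(groups)
```

Comparing the size of each group against the counts from each tool may show whether the prefixed IDs explain the difference.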

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 23:44, 21 November 2013 (PST)

===Reflection===

#What were the week’s key accomplishments?
#*This week's key accomplishments included finishing the custom species profile and then creating a new build of gmbuilder. With this new build, we were able to produce a gene database for ''C. trachomatis'' and run through the quality assurance testing for the database.
#*In addition, we were able to access the microarray data as well as normalize it using a software from Affymetrix.
#What are next week’s target accomplishments?
#*For next week, the plan is to finish formatting the microarray data so that we know which chip corresponds to which experimental conditions that were tested by Omsland ''et al.''
#*In addition, we want to try running GenMAPP using the microarray data (or at least a dummy file with the gene IDs from the data and some fake values) with the latest version of the gene database.
#What team strengths were seen this week?
#*I think our biggest strength was catching up to where the professors expected us to be in our project at this time.
#*We also were able to help each other with different tasks to make sure that we were all on the same page.
#What team weaknesses were seen this week?
#*Unlike last week, we were not able to meet as a whole group this week outside of class time to check up on our progress and work out some of the difficulties that we were having with some tasks.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 00:00, 22 November 2013 (PST)

==Quality Assurance Status Update==
*The coder, Katrina, and I have completed a full import and export cycle. We were able to look at the Tally Engine, XMLPipeDB Match, SQL, and Microsoft Access in order to check our gene ID counts on Katrina's laptop. Fortunately, our numbers matched for the Tally Engine which assured us that our export of our database was essentially successful. The XMLPipeDB Match gave us 911 total unique matches, while in the SQL query the count was 917 and in ACCESS there were 919 unique gene ID matches. Therefore, there is certainly a discrepancy with the SQL query and the Tally Engine in which 917 gene IDs are being found, but are not adding up with the amount found in ACCESS and XMLPipeDB Match. We will look into the discrepancies next week.
===Reflection===
#What were the week’s key accomplishments?
#*Within this week we were able to complete an import and export cycle of the files and database, respectively. We were also able to distinguish our specific gene IDs from the microarray data and we will work on categorizing the RBs and EBs with Rifampicin and without Rifampicin, so that we can make sense of which genes correlate to what category. We have also used the Tally Engine, XMLPipeDB Match, SQL, and Microsoft Access to compare our gene ID counts. Although we are facing some count discrepancies, we are in the midst of finding answers to these problems by looking closely at the formatting of the gene IDs.
#What are next week’s target accomplishments?
#*We are hoping to perform GenMAPP and MAPPFinder analysis as well as figure out the discrepancies that were faced this week.
#What team strengths were seen this week?
#*The team was very committed to putting our heads together and coming up with solutions to the obstacles that we faced. We work well in helping each other out and brainstorming to find potential answers to the particular wrinkles that we encounter.
#What team weaknesses were seen this week?
#*The team did not really meet all together as a group due to our conflicting class schedules, but Katrina was able to meet with both Dillon and me independently and fill us in as to what was discussed in the previous meeting.

[[User:HDelgadi|HDelgadi]] ([[User talk:HDelgadi|talk]]) 23:40, 21 November 2013 (PST)

==GenMAPP User Status Update==

===Status Report===
Milestone 2
Read the microarray paper to understand the experiment.
*Read Microarray Paper to understand experiment and find relations between raw data and article.
Create a table or list that shows the correspondence between the samples in the experiment and the files you have downloaded.
*Downloaded sdrf file and opened in excel to determine list that shows correspondence between samples in experiment and raw data file downloaded.
Determine how many biological or technical replicates, and which samples were labeled with Cy3 or Cy5.
*Determined 4 replicates based on article and data.
Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
*Created [[Media: Master Spreadsheet.xls|Master Spreadsheet]] and uploaded it to team wiki.
Consult with Dr. Dahlquist on how to process the data (normalization, statistics).
*Consulted with Dr. Dahlquist on how to process data and recorded steps in [[Dwilliams Project Notebook|Project Notebook]].

===Reflection===
# What were the week’s key accomplishments?
#*The key accomplishments (for my position in the group) were understanding how to run the dChip software and convert and collect the raw data used into a workable format.
# What are next week’s target accomplishments?
#*By next week we would like to perform GenMAPP and MAPPFinder analysis of the data.
# What team strengths were seen this week?
#*Every member on our team accomplishes their role in an efficient and effective manner. If one member needs help with their role, the others are more than willing to help.
# What team weaknesses were seen this week?
#*The Affymetrix data and use of the dChip software were confusing at first, but easily resolved once Dr. Dahlquist was consulted.
-[[User:Dwilliams|Dwilliams]] ([[User talk:Dwilliams|talk]]) 21:27, 21 November 2013 (PST)

[[Category:Journal Entry]]
[[Category:Team H(oo)KD]]

Laurmagee: Week 13

2013-11-21T20:35:11Z

Laurmagee:

The following link is to the Sinorhizobium meliloti team page: [[Team Name]]
* After downloading all of the files from the following link last week: [http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Microarray Downloads], I began compiling the data and computing log((Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background)) for every replicate (1-3) at each of the four time points in the experiment (15, 30, 60, and 240 minutes) after adding 0.7 M sucrose.
* The following files contain the data for each replicate at each time point. The label 700S represents 0.7 M sucrose, the number that follows (1-3) is the replicate number, and t(15, 30, 60, 240) is the time point at which the sample was taken.
**[[File:700S1-t15.xls]]
**[[File:700S1-t30.xls]]
**[[File:700S1-t60.xls]]
**[[File:700S1-t240.xls]]
**[[File:700S2-t15.xls]]
**[[File:700S2-t30.xls]]
**[[File:700S2-t60.xls]]
**[[File:700S2-t240.xls]]
**[[File:700S3-t15.xls]]
**[[File:700S3-t30.xls]]
**[[File:700S3-t60.xls]]
**[[File:700S3-t240.xls]]
* These files were created by extracting the Cy5 Signal, Cy5 Background, Cy3 Signal, and Cy3 Background median values and placing them into a new worksheet.
*Column A holds the Cy3 Signal median, column B the Cy3 Background median, and column C is Cy3 Signal - Cy3 Background, found by entering the following equation into C2: =A2-B2. To fill the rest of the column, copy cell C2, then scroll down to the last row and select its cell in column C while pressing the SHIFT key. This highlights the entire column, and you can paste the equation into all of its cells by pressing COMMAND and V.
* In column D, add the Cy5 Signal median, in column E the Cy5 Background median, and finally label column F Cy5 Signal - Cy5 Background. In cell F2, enter the equation: =D2-E2. Copy and paste this equation into the rest of column F using the steps described above for column C.
*Finally, in column G, find the value of (Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background) by entering the following equation into cell G2: =F2/C2 and pasting it into the rest of column G.
* The above steps were repeated twelve times, once for each of the three replicates at each of the four time points.
*The following file is a compilation of all the Cy5/Cy3 ratios, with the addition of the log of these values: [[File:Compiled Ratios and Logs.xls]]
*Column A includes the Gene IDs, and columns B, D, F, H, J, L, N, P, R, T, V, and X include the Cy5/Cy3 ratios for each time period and replicate. To find the log of each column, the following equation was entered into cell C2: =log(B2). Then the equation was copied and pasted into the whole C column. In the subsequent empty columns, the same log equation is inserted, with the appropriate letter representing the column to the left.
*The final data file is: [[File:Full Raw Data.xls]]
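
The spreadsheet steps for a single spot can be summarized in a few lines of code. This is a minimal Python sketch, assuming the log is base 10 (the default for Excel's LOG function when no base is given); the function name and the signal values are invented for this example.

```python
import math


def log_ratio(cy5_signal, cy5_background, cy3_signal, cy3_background):
    """Return log10 of background-corrected Cy5 over background-corrected Cy3.

    Returns None when either corrected value is zero or negative, since the
    ratio or its log is then undefined (the spreadsheet shows an error).
    """
    cy5 = cy5_signal - cy5_background
    cy3 = cy3_signal - cy3_background
    if cy5 <= 0 or cy3 <= 0:
        return None
    return math.log10(cy5 / cy3)


print(log_ratio(1100, 100, 200, 100))
print(log_ratio(150, 100, 100, 100))
```

A positive value means the Cy5 channel was brighter than the Cy3 channel for that spot, and a negative value means the opposite.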

[[User:Laurmagee|Laurmagee]] ([[User talk:Laurmagee|talk]]) 12:35, 21 November 2013 (PST)
[[Category: Journal Entry]][[Category: Sinorhizobium meliloti]]

Teamname Week 13 Status Report

2013-11-21T19:29:54Z

Slouie: /* Stephen Louie */ added more to entry

=='''[[Team Name]]'''==

=='''[[user:mmalefyt|Miles Malefyt]]'''==
#What were the week’s key accomplishments?
This week's key accomplishments were compiling the raw data from many individual data sources and then calculating whether or not each gene was shown to be repressed or induced. Then the fold change ratios, the standard deviations, and the averages were calculated in a new processed data folder. The process I used was based on the ''Vibrio cholerae'' exercise we did earlier in the semester.
#What are next week’s target accomplishments?
Next week's target accomplishments are finishing up the processed data files and then moving on to the GenMAPP portion of the project. It will mostly be the GenMAPP part, because I am going in tomorrow (Friday) to finish up processing the data.
#What team strengths were seen this week?
The team strength that was shown this week was our ability to understand each of our roles in the project and communicate with each other about any questions we had with our roles. The group has shown a good work ethic in getting done what they need to get done.
#What team weaknesses were seen this week?
I didn't exactly see many weaknesses this week since we were all working on our respective portions of the project. In the future, as all of our roles converge, I think we need to work on our communication, which will be integral to getting the work done.
[[User:Mmalefyt|Mmalefyt]] ([[User talk:Mmalefyt|talk]]) 17:25, 21 November 2013 (PST)

=='''[[user:laurmagee|Lauren Magee]]''': Reflection Questions==
#What were the week’s key accomplishments?
#*I was able to finalize a spreadsheet with the Gene IDs and corresponding log((Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background)) for every replicate (1-3) in the following file: [[File:Full Raw Data.xls]]. I was also able to clean up the data, so it was ready for the protocol used to prepare the [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Vibrio cholerae] data in [[Week 8]] for GenMapp analysis.
#What are next week’s target accomplishments?
#*Next week, I will hopefully be able to finish the protocol mentioned above and have a data file ready for GenMapp.
#What team strengths were seen this week?
#*We were more geared toward independent work this week, so all of our tasks were done on our own time. I think my teammates and I worked well in dividing our duties and taking responsibility for its completion. If one person fails to include their part, then the project is halted to a certain degree. Therefore, it is very important we all stay on top of our personal assignments and hold each other accountable.
#What team weaknesses were seen this week?
#*I don't think we had any blatant weakness this week, but in the future I think we could all work on our communication with one another. Since we were all doing our own assignments this week, I think we lost track of each other and what was getting done outside of our own role.
[[User:Laurmagee|Laurmagee]] ([[User talk:Laurmagee|talk]]) 12:31, 21 November 2013 (PST)

=='''[[user:slouie|Stephen Louie]]'''==
*On the GenMAPP user side, a raw data collection of the microarray data was completed. Specifically, the data concerned the transcriptional responses of S. meliloti when exposed to either 0.3 M NaCl or 0.7 M sucrose. A new export of the GenMAPP database for S. meliloti was made, based on an update made available for GenMAPP Builder. With these two things, I ran a preliminary sanity check using Match and MS Access. Based on the results, no IDs matched between the .gdb file and the microarray data.
*For next week, it is integral that I identify the discrepancy between the IDs used in the microarray data and the .gdb file. From what I can tell, the main difference is that the .gdb file uses the old name of S. meliloti (Rhizobium meliloti) and uses the gene ID RB####. This is the most apparent of the differences, but that does not mean it is the sole cause of the mismatch.
*The team primarily focused on their individual guild assignments as opposed to working together on one task. In light of this, I was particularly impressed as to how my teammates were able to finish their tasks and make their data available for the other members in an expedient and organized manner.
*This was more of a fault on my part than my team's. There was only limited communication between team members during this week's work sessions. This was not particularly harmful, as we all had an implicit understanding of each other's progress. Next week, communication will definitely be critical as we move further into the export stages.
[[User:Slouie|Slouie]] ([[User talk:Slouie|talk]]) 23:31, 21 November 2013 (PST)
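
A sanity check of this kind can be sketched in Python. This is only a minimal sketch, not the Match and MS Access procedure actually used; the function name <code>id_overlap</code> and the sample IDs are hypothetical, chosen to show how a count of zero shared IDs arises when two files use different naming schemes.

```python
def id_overlap(microarray_ids, gdb_ids):
    """Return the set of IDs present in both collections (exact, case-sensitive match)."""
    return set(microarray_ids) & set(gdb_ids)


# Hypothetical sample IDs: the two lists use different naming schemes.
microarray_ids = ["SMc00001", "SMb20001", "SMa0002"]
gdb_ids = ["RB0001", "RB0002", "RB0003"]

shared = id_overlap(microarray_ids, gdb_ids)
print(len(shared), "of", len(microarray_ids), "microarray IDs found in the .gdb")
```

An empty result only shows that no ID matches exactly; it does not by itself say which naming difference is responsible.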

[[Category:Journal Entry]]

TATK Week 13 Status Report

2013-11-21T05:49:13Z

Taur.vil: Tauras' wk13 status report responses

===Week Goals and Status Report===

#To create the compiled raw data file
#*This was completed by Kevin on 11/18/13 and uploaded to the team wiki page
#Properly document I-E cycle in testing report and create new export
#*Done by Tauras on 19NOV2013, I-E C3 on the wiki
#Work in Eclipse to modify gmbuilder for species code and to connect with website
#*Done by Tauras during class on Thursday, modified gm committed to server and on personal computer
#Complete Testing Report Information for I-E cycles that have been completed
#*Alina finished the one that was complete by Wednesday, still have C3 which was exported Wednesday night.

===Next Steps===

#Complete new I-E cycle after having worked with Dondi to link to the database and include the gene ID modification
#To fix the issues with the compiled raw data file
#To follow the instructions concerning the normalization of data and begin statistical tests
#To test microarray data in GenMapp after the next I-E cycle is complete

===Team Reflections===

====[[User:Kmeilak|Kevin Meilak]]====

1. What worked?

*I was able to make the compiled raw data file

2. What didn't work?

*I was not able to fix the issues with the compiled raw data file or begin statistical tests due to travel

3. What will I do next to fix what didn't work?

*Work with the data over the weekend

====[[User:Ajvree|Alina Vreeland]]====
'''What were the week’s key accomplishments?'''
*All three exports were completed, along with the counting from all the different programs. We were able to start analyzing the data and comparing the gene id's.

'''What are next week’s target accomplishments?'''
*To start navigating through the microarray data and looking for/fixing the mistakes found in the raw data file.

'''What team strengths were seen this week?'''
*We made significant progress in analyzing the microarray data and finishing the exports for TIGR4. We also completed the counting tasks using TallyEngine, XMLpipedb match, and SQL.

'''What team weaknesses were seen this week?'''
*It was unfortunate that Kevin wasn't able to join us on Thursday this week, since he has been working the closest with the microarray data. More work is needed in that area, so we can begin doing statistical tests.

====[[User:Taur.vil|Tauras]]====
#The key accomplishments this week were completing the last I-E cycle, modifying the gmbuilder for our website and gene IDs, and having a working microarray file which can be analyzed and then entered into GenMapp.
#This upcoming week, we will run a new I-E cycle to test the changes to gmbuilder, make sure that it was successful in representing all data, and test microarray data in GenMapp after it is processed and modified.
#The real weakness this week was Kevin not being present Thursday during class which would have helped get microarray data ready. Besides that everything seems to be going well although I doubt we have been working at maximum efficiency

===Template===
{{Team ATK}}

[[Category:Journal Entry]]

TATK E3: TIGR4 Testing Report

2013-11-21T05:26:49Z

Ajvree: /* Visual Inspection */

==Export Information==

Version of GenMAPP Builder: 2.0b71 modified and distributed 2013NOV19 and version not yet updated

Computer on which export was run: Tauras' Personal Computer

Postgres Database name: tATK_TIGR4_2013NOV20

UniProt XML filename: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
* UniProt XML version: UniProt Release 2013_11; 2013Nov13
* Time taken to import: 2.60min

GO OBO-XML filename: 20131120_OBOXML_tATK_TPV.obo-xml
* GO OBO-XML version: 2013Nov20
* Time taken to import: 9.85min
* Time taken to process: 7.95min

GOA filename: 20131118_GOA_tATK_TIGR4_TPV.goa
* GOA version: 2013Nov12 14:49
* Time taken to import: 0.03min

Name of .gdb file: 20131120_E3_tATK_TIGR4.gdb
* Time taken to export .gdb: less than 1 hour
**Started at 21:43
**Finished by 22:36
* Upload your file and link to it here. [[Media:20131120_E3_tATK_TIGR4.gdb]]

Note: There were no updates to XML or goa files since last IE cycle

==TallyEngine==

*Results for TallyEngine using new version gmbuilder72
*Gene Id's now visible, total count 2105
*Orderedlocusname count was consistent with previous count totals (2126)
*screenshot: [[Image:20131121 TallyEngineE3 tATK TIGR4 AJV.PNG]]

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 10:19, 21 November 2013 (PST)

Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the PostgreSQL databases (or you can upload and link to a screenshot of the results).

== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==
*Refer to [[TATK Export One: TIGR4 Testing Report|Export 1]] page for match results.

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#XMLPipeDB_Match | Follow the instructions found on this page to run XMLPipeDB match.]]

Are your results the same as you got for the TallyEngine? Why or why not?

== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine==
*Refer to [[TATK Export One: TIGR4 Testing Report|Export 1]] page for SQL results.

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#SQL | Follow the instructions on this page to query the PostgreSQL Database.]]

==OriginalRowCounts Comparison==
E3 TIGR4: [[Image:20131121 E3rowcounts tATK TIGR4 AJV.PNG]]
Benchmark:[[Image:20131121 benchmarkrowcounts tATK TIGR4 AJV.PNG]]

*Some of the tables are out of order within the files, but both files contain the same number and same categories.

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 10:26, 21 November 2013 (PST)

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file: (for the [[Week 9]] Assignment, use the "Vc-Std_External_20101022.gdb" as your benchmark, downloadable from [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download here]).

Copy the OriginalRowCounts table and paste it here:

Note:

==Visual Inspection==

Perform visual inspection of individual tables to see if there are any problems.

'''Systems Table''' 
[[Image:20131121 E3Systemstable tATK TIGR4 AJV.PNG]]
*There are missing dates for quite a few gene ID systems

'''OrderedLocusNames Table'''
*All ID's took the expected form, SP_####

'''UniProt Table'''
*IDs vary. They follow a general pattern of beginning with P or Q, followed by five characters (a mix of numbers and letters)

'''RefSeq Table'''
*all ID's in form NP_######

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 10:39, 21 November 2013 (PST)
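
The visual check of ID forms could also be automated. The following is a minimal sketch, assuming the IDs have already been read out of the .gdb tables into Python lists; the patterns are transcribed from the observations above rather than from any official ID specification, and <code>unexpected_ids</code> and the sample values are hypothetical.

```python
import re

# Patterns transcribed from the observations above; they are not official ID specifications.
ID_PATTERNS = {
    "OrderedLocusNames": re.compile(r"SP_[0-9]{4}"),
    "UniProt": re.compile(r"[PQ][A-Z0-9]{5}"),
    "RefSeq": re.compile(r"NP_[0-9]{6}"),
}


def unexpected_ids(system, ids):
    """Return the IDs that do not fully match the expected form for the given system."""
    pattern = ID_PATTERNS[system]
    return [i for i in ids if pattern.fullmatch(i) is None]


# Hypothetical sample values for illustration only.
print(unexpected_ids("OrderedLocusNames", ["SP_0001", "SP0002", "SP_12345"]))
```

Any ID returned by the function would then be inspected by hand, since an unexpected form is not necessarily an error.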

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

==.gdb Use in GenMAPP==

Note:

===Putting a gene on the MAPP using the GeneFinder window===

* Try a sample ID from each of the gene ID systems. Open the Backpage and see if all of the cross-referenced IDs that are supposed to be there are there.

Note:

===Creating an Expression Dataset in the Expression Dataset Manager===

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML, but were somehow not imported? or were they not present in the UniProt XML?

Note:

===Coloring a MAPP with expression data===

Note:

===Running MAPPFinder===

Note:

== Compare Gene Database to Outside Resource==

The OrderedLocusNames IDs in the exported Gene Database are derived from the UniProt XML. It is a good idea to check your list of OrderedLocusNames IDs to see how complete it is using the original source of the data (the sequencing organization, the MOD, etc.) Because UniProt is a protein database, it does not reference any non-protein genome features such as genes that code for functional RNAs, centromeres, telomeres, etc.

Note:



==Template==
{{Team ATK}}

Kmeilak Week 13

2013-11-19T18:32:49Z

Kmeilak: /* 11/19/2013 */

==Electronic Lab Notebook==

===11/19/2013===

*Created compiled raw data file using the following steps

#Opened first of 18 raw data files in excel
#Opened blank excel spreadsheet, titled 20131119_teamATK_KM_compiledrawdata
#Titled column A "ID" and column B "control_Cy3_48hr_Cy5_1" (Note: column B, and all columns following the ID for the individual raw data files, relate to the individual data files, and can be found here [[Data Information]])
#Copied column L from raw data file into compiled raw data file, and pasted in column B "control_Cy3_48hr_Cy5_1"
#Copied column AG from raw data file into compiled raw data file, and pasted in column A "ID"
#Repeated steps 1-5 for all 18 raw data files which may be found here [[Media:20131106_ArrayExpressrawdata_tATK_KM.zip|Raw Data]]
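
The copy-and-paste steps above can be sketched in Python. This is a rough sketch under stated assumptions, not the procedure that was actually run: it assumes each raw data file is tab-delimited text without a header row, and, unlike the Excel steps, it lines up values by ID rather than by row position. The helper names (<code>column_index</code>, <code>extract_columns</code>, <code>compile_raw_data</code>) are hypothetical.

```python
import csv
import io


def column_index(letters):
    """Convert an Excel column label such as 'L' or 'AG' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def extract_columns(handle, id_col="AG", value_col="L"):
    """Read a tab-delimited file and return {ID: value} from the two named columns."""
    id_idx, value_idx = column_index(id_col), column_index(value_col)
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) > max(id_idx, value_idx):
            values[row[id_idx]] = row[value_idx]
    return values


def compile_raw_data(named_handles):
    """Merge several files into rows of [ID, value_from_file_1, value_from_file_2, ...]."""
    per_file = [(name, extract_columns(h)) for name, h in named_handles]
    all_ids = sorted({gene_id for _, vals in per_file for gene_id in vals})
    header = ["ID"] + [name for name, _ in per_file]
    rows = [[g] + [vals.get(g, "") for _, vals in per_file] for g in all_ids]
    return [header] + rows


# Hypothetical one-row file: value in column L, ID in column AG.
row = [""] * 33
row[11], row[32] = "1.5", "SP_0001"
sample = io.StringIO("\t".join(row) + "\n")
print(compile_raw_data([("control_Cy3_48hr_Cy5_1", sample)]))
```

Joining by ID leaves a blank cell wherever a file lacks a given ID, which makes such gaps visible instead of silently shifting rows.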

{{Team ATK}}

Gleis Week 13

2013-11-19T18:28:22Z

Gleis: /* Lab Journal */

==Lab Journal==
*Identified database link pattern:
:*http://www.genedb.org/gene/~
*Customized species profile
:*Linked Ordered Locus Names to ORF
:*Updated Species Database Link
*Customized IDs
:*Links ordered locus names to ORF
*Create New database
:*New Import/Export sequence
:*[[Leishmania major Week 13 Status Report]]
*Export Successful
:*GenMAPP did not recognize as "custom" database
*Saved Database again in Eclipse
*Ran another export
*Identified Microarray Raw data file
*Edited file for initial GenMAPP review
:*Saved file to Team Homepage

[[User:Gleis|Gleis]] ([[User talk:Gleis|talk]]) 10:32, 21 November 2013 (PST)

[[Category:Journal Entry]] [[Category:Leishmania major]]

Leishmania major Week 13 Status Report

2013-11-19T18:18:29Z

Gleis:

==Import/Export GenMAPP==
Name:Leishmania_major_18112013
===Export Information===
:Uniprot: 7.42 minutes
:[[File:UniprotXML Leishmania 05112013 Gabe Lena.xml]]
:GO OBO: 5.96 minutes
:[[File:Leishmania 05112013 Gabe Lena.obo-xml.gz]]
:GOA: 0.04 minutes
:[[File:LeishmaniaGOA 19112013 Lena Gabe.goa]]

==Files==
:'''Newly Updated Database:'''[[File:Leishmania major 19112013 Dist.zip]]
:'''Name of .gdb File:'''[[Media:LeishmaniaGDB 21112013 Lena Gabe.gdb]]
:'''Fixed Microarray Data:'''[[File:L.majorCompiledRawData C 21112013.txt]]

==Reflections==
===[[User:Lena|Lena Hunt]]===
:1.)We got our database to export into GenMAPP. Originally it was showing a generic database, and it wouldn't recognize the OrderedLocusNames. We fixed it in Eclipse and uploaded it to the homepage. We then successfully got GenMAPP to recognize our database. We also updated our microarray data so that the gene IDs did not have extra numbers (we used the : as a separator.) We then went through and changed all the "Error" messages with a 0 so as not to confuse GenMAPP.
:2.)We managed to work through all the things that didn't work this week. At this time we are waiting for the export to complete to test gene database link and microarray data.
:3.)Currently we do not have any problems we are aware of, but after an entire work session of working through problems today, I think we are prepared to work through problems that may come up in the next work session.
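
The two clean-up steps described in item 1 can be sketched as follows. This is a minimal sketch with assumptions: it supposes the gene ID is the part before the first ":" and that error cells contain the literal text "Error". The function names and sample rows are hypothetical.

```python
def clean_gene_id(raw_id):
    """Keep the text before the first ':' (assumed to be the gene ID)."""
    return raw_id.split(":", 1)[0].strip()


def clean_value(cell):
    """Replace an 'Error' cell with '0'; leave all other cells unchanged."""
    return "0" if cell.strip().lower() == "error" else cell


# Hypothetical rows: an ID with trailing numbers, followed by one data cell.
rows = [["LmjF.01.0010:12345", "0.82"], ["LmjF.01.0020:67890", "Error"]]
cleaned = [[clean_gene_id(r[0])] + [clean_value(c) for c in r[1:]] for r in rows]
print(cleaned)
```

Replacing an error with 0 treats a missing measurement as a real value; Viktoria's reflection on this page reports leaving such cells blank and excluding them from the statistical analysis instead.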

===[[User:Gleis|Gabriel Leis]]===
#What worked?
#:As a team we accomplished almost all goals for the week. Successful organization allowed team members from various "guilds" to receive the information needed to accomplish various tasks.
#What didn't work?
#:Initial screening of new database yielded an error but the error appears to have been fixed.
#What will I do next to fix what didn't work?
#:Continue with exported .gdb to identify necessary edits in the code for customizations to Leishmania major database

[[User:Gleis|Gleis]] ([[User talk:Gleis|talk]]) 21:30, 21 November 2013 (PST)

===[[User:Kevinmcgee|Kevin McGee]]===
#We were able to successfully analyze our GenMAPP data to prepare it for import to GenMAPP
#It was hard to relate some of the commands from VC to our data set because there were some major differences in the data that we had.
#Trying to fully understand all the columns and the cells will help us understand what is different from our data compared to VC and therefore how to handle the data differently

[[User:Kevinmcgee|Kevinmcgee]] ([[User talk:Kevinmcgee|talk]]) 11:08, 21 November 2013 (PST)

===[[User:Vkuehn|Viktoria Kuehn]]===
#What worked?
#:We got all of the data ready for import to GenMAPP
#What didn't work?
#:Some of the data had errors instead of values
#What will I do next to fix what didn't work?
#:We already fixed this by making those blank and not using them in the statistical analysis

==GenMAPP Data==
#Centered and Scaled the Microarray data
#Found the Average Fold Change
#Statistical Analysis: Found T statistic and P-value
#Formatted to be compatible with GenMAPP
#Saved for GenMAPP as a txt file
#:For more detailed procedure info see [[Vkuehn Week 13]] for major or, for infantum see [[Kevinmcgee Week 13]]
#:The files can be found under [[Leishmania major]] in the File Updates section

[[Category:Journal Entry]] [[Category:Leishmania major]]

Ajvree Week 13

2013-11-19T18:02:24Z

Ajvree: /* 11/21/13 */ table analysis

==Week 12 Information==
Export Counting: -open Access, open file (remember to change all to all files) TIGR4 file: 
ids used: SP_#### 
orderedlocusnames count total: 2126 entries 
R6 file: 
orderedlocusnames count total: 2115 entries 
ids used: SPG_#### 
G54 file: 
ids used: SPG_#### 
orderedlocusnames count total: 2115 entries 
After finding identical results for R6 and G54 files, realized that R6 file was actually for the G54 strain. Checked a few times (reopened files multiple times to confirm) 
First try on Tally Engine for TIGR4: 
XML Count: 
orderedlocus: 2127 
refseq: 2106 
Database Count: 
ordered locus: 3831 
refseq: 3403 

=='''Week 13'''==
'''Tally Engine:'''
*created new database in pgadmin III
*in sql, opened gmbuilder.sql
*ran query, database tables were inserted
*went in to tally engine and imported files
**Xml import took 5.41 min
**GOA import took 0.07 min
*unzipped go-xml file
*OBO-XML import time: 19.92 min
*additional gene ontology information was processed, this took 14.96 min
*ran tally, came up with error
*refreshed gmbuilder and tried again successfully
Results:
[[Image:TallyEngineTrial2.PNG]]

'''XMLpipedb Match'''
*downloaded program from sourceforge
*opened cmd program
*cd Downloads file
*moved xmlmatch jar file to download folder
*used match to look for pattern SP_[0-9][0-9][0-9][0-9]
*Total unique matches: 2126
*almost identical to tally engine results of 2127, minus one result
Results:
[[Image:20131107 XMLmatch tATK TIGR4 AJV.PNG]]
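
For comparison, counting unique matches of the same pattern can be sketched in Python. This is not the XMLPipeDB Match tool itself, only a minimal sketch of the idea; <code>count_unique_matches</code> and the XML fragment are hypothetical.

```python
import re


def count_unique_matches(text, pattern=r"SP_[0-9][0-9][0-9][0-9]"):
    """Count distinct substrings of text that match the pattern."""
    return len(set(re.findall(pattern, text)))


# Hypothetical XML fragment; the real input would be the UniProt XML file contents.
sample = (
    '<name type="ordered locus">SP_0001</name>'
    '<name type="ordered locus">SP_0002</name>'
    '<property value="SP_0001"/>'
)
print(count_unique_matches(sample))
```

Because the search runs over the whole text, an ID that appears outside an ordered locus element is still counted, which is one reason a pattern count and a TallyEngine count can differ.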

'''OriginalRowCounts'''
*Looked at TIGR4 gdb file and benchmark VC file for table similarities/differences
*seemed to have same tables/same information
*took screenshots of both, included here:
TIGR4: [[Image:20131119 ogrowcounts tATK TIGR4 AJV.PNG]]
Benchmark: [[Image:20131119 benchmarkrowcounts tATK TIGR4 AJV.PNG]]

*Note: a few of the rows are missing in the benchmark screenshot- could not fit all of them on screen.

'''SQL'''
*used following query to search for matches:
**select count(*) from genenametype where type = 'ordered locus' and value ~ 'SP_[0-9][0-9][0-9][0-9]';
*Result given was 2126
[[Image:20131119 SQLcountresults tATK TIGR4 AJV.PNG]]
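
The counting query can be tried on a toy table with Python's built-in <code>sqlite3</code> module. This is a sketch under assumptions: the table here has only two columns and four made-up rows, and because SQLite has no <code>~</code> operator, <code>GLOB</code> stands in for the PostgreSQL regular expression match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genenametype (type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO genenametype VALUES (?, ?)",
    [
        ("ordered locus", "SP_0001"),
        ("ordered locus", "SP_0002"),
        ("ORF", "SP_0003"),
        ("ordered locus", "spr0004"),
    ],
)

# SQLite has no '~' operator, so GLOB stands in; unlike '~', GLOB must match the whole value.
(count,) = conn.execute(
    "SELECT count(*) FROM genenametype "
    "WHERE type = 'ordered locus' AND value GLOB 'SP_[0-9][0-9][0-9][0-9]'"
).fetchone()
print(count)
```

Only rows whose type is 'ordered locus' and whose value has the SP_#### form are counted, mirroring the two conditions in the query above.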

==11/21/13==

'''Tally Engine for Export 3'''
*downloaded Tauras's version of gmbuilder to redo tally engine counting
*used export 3 files instead of previous export 1 files
*connected to avreelan database in pgadminIII, inserted new gmbuilder tables
*opened new version of gmbuilder/tally engine, connected to avreelan database
*XML file import took: 2.02 min
*OBO-XML file import took: 6.25 min
**additional gene ontology data processing took: 4.81 min
*GOA file import took: 0.04 min
*Results:
**GeneId's now visible, total of 2105 in both xml and database counts
**orderedlocusnames still at same value of 2126 in both xml and database counts
**screenshot: [[Image:20131121 TallyEngineE3 tATK TIGR4 AJV.PNG]]

'''Original Row Counts for Export 3'''
*redid row counts using the export 3 file
*compared with benchmark file
*both files had identical number of tables with same categories in each, although some were out of order.
*Screenshots:
E3 TIGR4: [[Image:20131121 E3rowcounts tATK TIGR4 AJV.PNG]]
Benchmark:[[Image:20131121 benchmarkrowcounts tATK TIGR4 AJV.PNG]]

'''Table Analysis'''
*looked at tables within E3 gdb file to find inconsistencies in data

'''Systems Table''' 
[[Image:20131121 E3Systemstable tATK TIGR4 AJV.PNG]]
*There are missing dates for quite a few gene ID systems

'''OrderedLocusNames Table'''
*All ID's took the expected form, SP_####

'''UniProt Table'''
*IDs vary. They follow a general pattern of beginning with P or Q, followed by five characters (a mix of numbers and letters)

'''RefSeq Table'''
*all ID's in form NP_######

==Links==
{{Team ATK}}

Mpetredi Week 13

2013-11-19T17:59:09Z

Mpetredi: /* Lab Progression */ added export settings image

{{User Page Link}}

11/19/2013

==Lab Progression==
#Attempting to find URL pattern of S. meliloti
##Navigated to model organism database (link: http://cmr.jcvi.org/tigr-scripts/CMR/GenomePage.cgi?org=ntsm01)
##Clicked on all 3 GenBank Accession.Versions and looked for locus ID formatting
##*[http://www.ncbi.nlm.nih.gov/nucleotide/AL591688 Complete Chromosome]
##**SMc#####
##*[http://www.ncbi.nlm.nih.gov/nucleotide/AL591985 plasmid pSymB]
##**SMb#####
##***NOTE: in the XML file (from [[[[Team Name]]]]) appears as SM_b for gene ID; uses SMb for ORF name
##*[http://www.ncbi.nlm.nih.gov/nucleotide/AE006469 plasmid pSymA]
##**SMa####
##Copy any locus ID that follows the format of the options above
##After many trials, the general link looks like this: http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~
#Copied link to Eclipse
#Opened the .gdb file (from [[[[Team Name]]]]) in Microsoft Access to determine what GeneIDs look like
##Open OrderedLocusNames and look at how IDs are named (R.####, where "." is either a letter or a number) and how many records are displayed
##*4712 records
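
How a link pattern of this kind expands into a gene page URL can be sketched in Python. This is a minimal sketch, assuming the "~" in the pattern marks where the gene ID is substituted; <code>gene_link</code> is a hypothetical helper, and the sample ID is a placeholder in the SMc##### form rather than a verified locus.

```python
from urllib.parse import quote

LINK_PATTERN = "http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~"


def gene_link(gene_id, pattern=LINK_PATTERN):
    """Substitute a URL-encoded gene ID for the '~' placeholder in the link pattern."""
    return pattern.replace("~", quote(gene_id, safe=""))


# 'SMc00001' is a placeholder in the SMc##### form noted above, not a verified locus.
print(gene_link("SMc00001"))
```

If the IDs stored in the .gdb take a different form from the one the website expects, the generated link will not resolve, so the ID format and the link pattern need to be checked together.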

11/21/2013

==Lab Progression==
#Started a new export based on latest, custom build of gmbuilder (GenMAPP Builder 2.0b72).
#* Except for gmbuilder, I used all files posted on [[[[Team Name]]]] under "Important Files" section.
#Name of export file is Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb
#*Settings used for export: [[Image:Export_Information_20131121.PNG]]
{{Individual Assignment Categories}}
[[Category: Sinorhizobium meliloti]]
[[Category: Group Projects]]

Mpetredi Week 12

2013-11-19T17:42:21Z

Mpetredi: added categories

{{User Page Link}}

==Status Update==
*Completed milestones 1 and 2 in the [[Coder]] page.
*Organized files and information in [[[[Team Name]]]].
*First import/export was finished 2 weeks ago.

{{Individual Assignment Categories}}
[[Category: Sinorhizobium meliloti]]
[[Category: Group Projects]]

Kevinmcgee Week 13

2013-11-19T17:41:35Z

Kevinmcgee: Added pic

*Opened [[media:L.infantumCompliedRawData(A).txt | L.infantumCompliedRawData(A).txt]]
*Finished the formatting by flipping the dye swap chips negative
*Created a column next to dye swap chips and did the formula:
=-1*(dye swap chip column)
*made a new sheet
*added all data from old sheet except only added the flipped dye swaps
*looked for background information in the array paper
**L. infantum MHOM/MA/67/ITMAP-263 and L. major LV39 MRHO/SU/59/P strains used in this study
**All microarray data will be freely available on the Geo NCBI database in the MIAME format
***The series accession number for our manuscript is GSE10407.
**Each chip compares promastigote vs. amastigote with different replicates
**Following data files found
[[File:LmjSampleInfo.PNG]]
*Finished naming sheet with helpful names to know what is what on the sheet
*Ready for statistical analysis
*Began analysis by taking the average and standard deviation of each data chip separately and using that information to scale and center our data:
=(B4-B$2)/B$3 This shows the equation we used to scale and center.
*Copied and pasted the scaled and centered values onto a new page. From there, we cleared all #VALUE! cells and left them blank. GenMAPP will ignore these blanks when we input our data.
*Made a column of the average fold change for each gene called Avg_LogFC_All
=AVERAGE(B2:G2)
*Made a column of the Tstat and Pvalue for the fold changes of each gene:
=AVERAGE(B2:G2)/(STDEV(B2:G2)/SQRT(6)) TStat
=TDIST(ABS(I2),5,2) Pvalue
*Created a new page titled forGENMAPP
**Copied and pasted all values from statistics page
*Cut and pasted columns H-J and moved them to columns B-D
*Inserted a new column at B called System Code. Filled in column with the letter N
*File is now ready for GenMAPP import

*Sample of what the final file looked like
[[File:L.InfantumforGenMAPP.PNG]]
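
The dye-swap flip and the scale-and-center step can be sketched in Python. This is a minimal sketch rather than the spreadsheet that was actually used: it assumes each chip is a list of log ratios with <code>None</code> for blank cells, and the function names and sample values are hypothetical.

```python
import statistics


def flip_dye_swap(values):
    """Negate log ratios from a dye-swap chip; None marks a missing cell."""
    return [None if v is None else -v for v in values]


def scale_and_center(values):
    """Return (value - mean) / stdev for each cell, skipping missing cells."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation, like Excel's STDEV
    return [None if v is None else (v - mean) / sd for v in values]


# Hypothetical log ratios for one dye-swap chip.
chip = [0.5, -1.0, None, 1.5]
print(scale_and_center(flip_dye_swap(chip)))
```

The order follows the notebook above: the dye-swap chips are flipped first, and each chip is then scaled and centered with its own average and standard deviation.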

Data Information

2013-11-19T17:34:41Z

Ajvree: /* corresponding data files to ID and sample excel file */

==corresponding data files to ID and sample excel file==

#GSM664117_14090187_control_Cy3_48_hr_Cy5 = control_Cy3_48hr_Cy5_1
#GSM664117_14090181_control_Cy3_48_hr_Cy5 = control_Cy3_48hr_Cy5_2
#GSM664116_14090193_control_Cy3_24_hr_Cy5 = control_Cy3_24hr_Cy5_1
#GSM664116_14090175_24_hr_Cy3_control_Cy5 = 24hr_Cy3_control_Cy5_1
#GSM664116_14090170_control_Cy3_24_hr_Cy5 = control_Cy3_24hr_Cy5_2
#GSM664116_14087687_24_hr_Cy3_control_Cy5 = 24hr_Cy3_control_Cy5_2
#GSM664115_14090191_control_Cy3_12_hr_Cy5 = control_Cy3_12hr_Cy5_1
#GSM664115_14090185_control_Cy3_12_hr_Cy5 = control_Cy3_12hr_Cy5_2
#GSM664115_14090180_12_hr_Cy3_control_Cy5 = 12hr_Cy3_control_Cy5_1
#GSM664115_14090174_12_hr_Cy3_control_Cy5 = 12hr_Cy3_control_Cy5_2
#GSM664115_14090168_12_hr_Cy3_control_Cy5 = 12hr_Cy3_control_Cy5_3
#GSM664115_14087688_control_Cy3_12_hr_Cy5 = control_Cy3_12hr_Cy5_3
#GSM664114_14090192_control_Cy3_4_hr_Cy5 = control_Cy3_4hr_Cy5_1
#GSM664114_14090190_4_hr_Cy3_control_Cy5 = 4hr_Cy3_control_Cy5_1
#GSM664114_14090188_4_hr_Cy3_control_Cy5 = 4hr_Cy3_control_Cy5_2
#GSM664114_14090176_4_hr_Cy3_control_Cy5 = 4hr_Cy3_control_Cy5_3
#GSM664114_14090169_control_Cy3_4_hr_Cy5 = control_Cy3_4hr_Cy5_2
#GSM664114_14090167_control_Cy3_4_hr_Cy5 = control_Cy3_4hr_Cy5_3

all information data fold changes

"Pneumococcal biofilms compared to planktonic control at 4, 12, 24, 48 hours. 3 biological replicates each of 4 and 12 hour time points, and 2 biological replicates each of 24 and 48 hour time points. Flip dye (technical replicates) performed for 4, 12, and 24 hour time points; no technical replicate performed for 48 hour time point due to limiting material. Ratios were determined by averaging across technical and biological replicates. The following hybridizations made up each biological replicate: 14090167.tav.annot and 14090190.tav.annot (4hr biol rep 1); 14090169.tav.annot and 14090176.tav.annot (4hr biol rep 2); 14090192.tav.annot and 14090188.tav.annot (4hr biol rep 3); 14087688.tav.annot and 14090180.tav.annot (12hr biol rep 1); 14090185.tav.annot and 14090168.tav.annot (12hr biol rep 2); 14090191.tav.annot and 14090174.tav.annot (12hr biol rep 3); 14090170.tav.annot and 14090175.tav.annot (24hr biol rep 2); 14090193.tav.annot and 14087687.tav.annot (24hr biol rep 3); 14090181.tav.annot (48hr biol rep 1); 14090187.tav.annot (48hr biol rep 2)"

This excerpt from [http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-26976/?query=streptococcus+pneumoniae Array Express] was the basis for determining which biological replicates were related to which technical replicates as seen in the data

Tree Representation: 
[[Image:TATK tree.png]]

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 10:20, 3 December 2013 (PST)

50571 error messages replaced with single space character.

Vkuehn Week 13

2013-11-19T09:25:03Z

Vkuehn: /* 11/21/13 */

=='''Week 13 Individual Lab Notebook'''==

From this week on, your Individual Journal Assignment will be to complete your electronic laboratory notebook that records your work on your team's final project.
===11/19/13===
:Understanding the array data:
:*data from several independent L. infantum and L. major DNA microarray experiments were compiled and compared
:*compared each gene according to the normalized log2 ratio of the Alexa 647/Alexa 555 signal intensities (amastigotes/promastigotes) and to the signal mean intensity of each spot
:*Transcriptional analyses of L. infantum promastigote compared to L. infantum intracellular amastigote, and L. major promastigote compared to L. major intracellular amastigote
The full-genome DNA microarray includes one 70mer-oligonucleotide probe for each gene of L. infantum and for each gene of L.major LV39
:*Lmj Sample Info: The replicate numbers and sample IDs correspond to our txt files. (22, 24, 25, 28, 30 were dye swaps)
[[File:LmjSampleInfo.PNG]]
:You need to record a narrative that describes all of the methods that you are using in enough detail so that the instructors or your other team members can follow and reproduce your work.
Include links to artifacts you produce (files, images, testing reports, code, etc.)
===11/21/13===
Scaled and Centered the L. major data and performed statistical analysis
*Working with the data:
*#Create new worksheet and copy over all the data
*#Find STDEV and AVG for each column as new rows on top
*#Create new columns next to the sample columns, label name_SC
*#Scale and center doing (Value-AVG)/STDEV for each sample
*Inserted new worksheet called Statistics
*#Added new column: Avg_Log_FC_all
*#Delete the other columns and keep only the SC columns. Delete 2 empty rows on top
*#Compute average log fold change for all of the samples, ex: =AVERAGE(B2:E2)
*#T-test: =AVERAGE(samples)/(STDEV(samples)/SQRT(number of replicates)), with 4 replicates here
*#Create new column titled P-Value and calculate the P-value (looking for values lower than 0.05) using TDIST(ABS(tstat),degrees of freedom,tails)
*Created new worksheet titled GenMAPP
*#copy over the data from Statistics worksheet (using values only)
*#Select Fold Changes column and format cells under number tab to 2 decimal places
*#Make 4 decimal places for Tstat and Pvalue
*#Cut Tstat and Pvalue columns and paste to the right of gene ID column
*#Insert "SystemCode" column next to the ID column and input "N" for all rows
*#Save as tab-delimited text file and excel file
'''Excel File''': [[File:L.majorStats.xls]]
'''Text File''': [[File:L.majorStats.txt]]
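
The T statistic and P-value steps can be sketched with the Python standard library. This is a sketch under assumptions: it treats the test as a one-sample t test of the mean log fold change against 0, and it approximates Excel's two-tailed TDIST by numerically integrating the Student's t density with Simpson's rule. All function names and the sample values are hypothetical.

```python
import math
import statistics


def t_statistic(samples):
    """One-sample t statistic against a mean of 0: mean / (stdev / sqrt(n))."""
    n = len(samples)
    return statistics.mean(samples) / (statistics.stdev(samples) / math.sqrt(n))


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value, integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Hypothetical scaled-and-centered log ratios for one gene (4 replicates).
samples = [1.2, 0.8, 1.5, 1.1]
t = t_statistic(samples)
print(round(t, 4), round(two_tailed_p(t, len(samples) - 1), 4))
```

The degrees of freedom are the number of replicates minus one. A gene whose replicates are all identical has a standard deviation of 0, so the t statistic is undefined and the sketch raises an error for it.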

Taur.vil Week 13

2013-11-19T06:54:43Z

Taur.vil: /* gmbuilder coding in class */ bit more eclipse editing

==Laboratory Journal==

===TIGR4 IMP-EXP C2===
====Collecting Files====
*Signed off on closing Imp-Export cycle 1 due to updated files and unclear documentation
*Created section for I-E2, started with TIGR4
*Downloaded GenMapp Builder 2.0b71
*Found UniProt Complete Proteome [http://www.uniprot.org/taxonomy/?query=complete%3Ayes+ancestor%3A2+streptococcus+TIGR4&sort=score]
*Downloaded XML file from [http://www.uniprot.org/uniprot/?query=organism%3a170187+keyword%3a181&format=*]
*Named XML file: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
**Uploaded to wiki: [[Media:20131118_UniProtXML_tATK_TIGR4_TPV.xml]]
**Is the 2013Nov13 version
*Went to [http://www.ebi.ac.uk/GOA/downloads] to download goa file
*Followed link to proteomes directory [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/]
*Downloaded the goa file for: 57.S_pneumoniae_TIGR4.goa
**Version last updated 2013Nov12
**Downloaded file renamed 20131118_GOA_tATK_TIGR4_TPV.goa
**Uploaded to wiki: [[Media:20131118_GOA_tATK_TIGR4_TPV.goa]]
*Went to [http://www.geneontology.org/GO.downloads.ontology.shtml] to download go-obo file
*Went to beta version and downloaded the obo-xml.gz file
**Extracted using 7zip
**Could not find version information, only showed the date it was downloaded in properties
***Was downloaded 2013Nov18
**Renamed 20131118_OBOXML_tATK_TPV.obo-xml
**Uploaded compressed file to wiki: [[Media:20131118_OBOXML_tATK_TPV.gz]]
====Import-Export Process====
*Launched pgAdminIII
*Logged in with password
*Created new database: tATK_TIGR4_2013NOV18
*opened SQL window and opened gmbuilder.sql
**Executed file
**Created the 159 tables expected
*Verified 2.0b71 was the most updated version of gmbuilder
*Launched gmbuilder-32bit.bat
**Connected with database created earlier
*Imported XML file: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
*Imported and processed obo-xml file: 20131118_OBOXML_tATK_TPV.obo-xml
*Imported goa file: 20131118_GOA_tATK_TIGR4_TPV.goa
*Exported as 20131118_E2_tATK_TIGR4.gdb

==TIGR4 IMP-EXP C3==
===Collecting files===
*Verified that goa and XML files had not been updated
*Downloaded Nov. 20, 2013 obo-xml file from [http://beta.geneontology.org/page/download-ontology]
*created new database in pgAdminIII called tATK_TIGR4_2013NOV20
*Executed gmbuilder.sql (the new version, see below) in SQL
**Verified success by the 159 tables created
===Import-Export Cycle===
*Opened new version of gmbuilder-32bit.bat
*Connected to database created earlier
*Imported UniProt XML file: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
*Imported the obo-xml file: 20131120_OBOXML_tATK_TPV.obo-xml
*Imported the goa file: 20131118_GOA_tATK_TIGR4_TPV.goa
*Exported to 20131120_E3_tATK_TIGR4.gdb

==gmbuilder coding in class==
*Checked out gmbuilder in eclipse, labeled it tATK_gmbuilder3
*Edited StrptococcusPenumoniaeTIGR4UniProtSpeciesProfile.java in the src folder to include the appropriate species id of 170189 and gene identifier of http://www.streppneumoniae.com/gene_detail_output.asp?id=2741&name=~
*Copied over dist folder (what Dondi called the baby gmbuilder) onto my flash drive
*Will use for additional export cycle.

*Discovered we had linked to the wrong database
*Went back into eclipse and linked files to http://bacteria.ensembl.org/streptococcus_pneumoniae_tigr4/Gene/Summary?g=~
*Added VC-like code to adjust gene IDs to match microarray data, XML, and database
*Copied over dist_Experiment20131121 to flash drive
*Will use for I-E C4

==Microarray Data==
*Verified that all gene IDs matched across the different columns
*Deleted Gene IDs except for the first column
*In the microarray data:
**SP#### is used for TIGR4
**SPN#### is used for G54
**spr#### is used for R6
**Various other formats exist for controls
*Gene names for TIGR4 database are as SP_####
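
The mismatch between the two ID forms can be illustrated with a short Python sketch. The actual adjustment was made in gmbuilder's Java code, which is not shown here; this sketch only illustrates the mapping from the microarray form SP#### to the database form SP_####, and <code>to_database_id</code> and the sample IDs are hypothetical.

```python
import re


def to_database_id(microarray_id):
    """Rewrite a TIGR4 microarray ID of the form SP#### as SP_####; leave others unchanged."""
    return re.sub(r"^SP([0-9]{4})$", r"SP_\1", microarray_id)


# Hypothetical IDs in the formats listed above.
for gene_id in ["SP0001", "SPN0002", "spr0003", "SP_0004"]:
    print(gene_id, "->", to_database_id(gene_id))
```

Anchoring the pattern at both ends keeps G54 (SPN####) and R6 (spr####) IDs, as well as IDs already in the SP_#### form, from being altered.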

==Future Steps==
Export, check for custom link, load microarray data

==Template==
{{Team ATK}}

TATK E2: TIGR4 Testing Report

2013-11-19T06:52:23Z

Taur.vil: /* Export Information */ added more version info

==Export Information==

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Tauras' Personal Computer

Postgres Database name: tATK_TIGR4_2013NOV18

UniProt XML filename: 20131118_UniProtXML_tATK_TIGR4_TPV.xml
* UniProt XML version: UniProt Release 2013_11; 2013Nov13
* Time taken to import: 2.65min

GO OBO-XML filename: 20131118_OBOXML_tATK_TPV.obo-xml
* GO OBO-XML version: 2013Nov18
* Time taken to import: 10.12min
* Time taken to process: 7.90min

GOA filename: 20131118_GOA_tATK_TIGR4_TPV.goa
* GOA version: 2013Nov12 14:49
* Time taken to import: 0.04min

Name of .gdb file: 20131118_E2_tATK_TIGR4.gdb
* Time taken to export .gdb: <1hr
**Started at 22:52
**Finished by 23:52
* Upload your file and link to it here. [[Media:20131118_E2_tATK_TIGR4.gdb]]

Note:

==TallyEngine==

Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the PostgreSQL databases (or you can upload and link to a screenshot of the results).

== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#XMLPipeDB_Match | Follow the instructions found on this page to run XMLPipeDB match.]]

Are your results the same as you got for the TallyEngine? Why or why not?

== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine==

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#SQL | Follow the instructions on this page to query the PostgreSQL Database.]]

==OriginalRowCounts Comparison==

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file: (for the [[Week 9]] Assignment, use the "Vc-Std_External_20101022.gdb" as your benchmark, downloadable from [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download here]).

Copy the OriginalRowCounts table and paste it here:

Note:

==Visual Inspection==

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

==.gdb Use in GenMAPP==

Note:

===Putting a gene on the MAPP using the GeneFinder window===

* Try a sample ID from each of the gene ID systems. Open the Backpage and see if all of the cross-referenced IDs that are supposed to be there are there.

Note:

===Creating an Expression Dataset in the Expression Dataset Manager===

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML at all?

Note:

===Coloring a MAPP with expression data===

Note:

===Running MAPPFinder===

Note:

== Compare Gene Database to Outside Resource==

The OrderedLocusNames IDs in the exported Gene Database are derived from the UniProt XML. It is a good idea to check how complete your list of OrderedLocusNames IDs is against the original source of the data (the sequencing organization, the MOD, etc.). Because UniProt is a protein database, it does not reference any non-protein genome features such as genes that code for functional RNAs, centromeres, telomeres, etc.

Note:



==Template==
{{Team ATK}}

TATK Export One: TIGR4 Testing Report

2013-11-18T17:39:34Z

Ajvree: /* Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine */ SQL results/screenshot

==Export Information==

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: lab computer, front left

Postgres Database name: tATK_20131107

UniProt XML filename: [[Media:20131107_UniProtXML_tATK_TIGR4_AJV.xml|20131107_UniProtXML_tATK_TIGR4_AJV.xml]]
* UniProt XML version: 2013_10 (''has since been updated'')
* Time taken to import: 1.62min

GO OBO-XML filename: [[Media:20131107_GO-OBO_tATK_KM.obo-xml|20131107_GO-OBO_tATK_KM.obo-xml]]
* GO OBO-XML version: was last updated 2013OCT10
* Time taken to import: 5.46min
* Time taken to process: 4.09min

GOA filename: [[Media:20131107_UniProtXML_tATK_TIG4_AJV.goa|20131107_UniProtXML_tATK_TIG4_AJV.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 2013NOV12 (''will have to update'')
* Time taken to import: 0.03min

Name of .gdb file: 20131107_GenMappExport_tATK_TIGR4_TPV.gdb
* Time taken to export .gdb: Unknown, left overnight in the lab
* Upload your file and link to it here. [[Media:20131107_GenMappExport_tATK_TIGR4_TPV.gdb|20131107_GenMappExport_tATK_TIGR4_TPV.gdb]]

Note:

==TallyEngine==

'''TIGR4 Results:'''
(using gmbuilder70 version)
[[Image:TallyEngineTrial2.PNG]]

*Gene IDs were not shown due to a glitch in this version of gmbuilder; the tally will be redone using Tauras's newer version.

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 14:21, 20 November 2013 (PST)

== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==
'''XML Match Results:'''
[[Image:20131107 XMLmatch tATK TIGR4 AJV.PNG]]

*The final XMLPipeDB match count (2126) was 1 less than the TallyEngine count (2127 matches). XMLPipeDB match counted only the unique results, whereas the TallyEngine presumably included a repeated match in its total.

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 14:21, 20 November 2013 (PST)
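The difference between a total count and a unique count can be illustrated with a short Python sketch. It assumes the same SP_#### pattern that appears in the SQL query on this page; the function name `count_matches` and the sample text are hypothetical, and this is not how XMLPipeDB match or the TallyEngine is implemented.

```python
import re

# Assumption: ordered locus names follow the SP_#### pattern used in the SQL query.
LOCUS_PATTERN = re.compile(r"SP_[0-9][0-9][0-9][0-9]")


def count_matches(text):
    """Return (total, unique) counts of SP_#### matches found in text."""
    hits = LOCUS_PATTERN.findall(text)
    return len(hits), len(set(hits))
```

For a made-up input such as "SP_0001 SP_0002 SP_0001", the total is 3 and the unique count is 2, which is the kind of off-by-one gap a single repeated ID would produce.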

== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine==

*Query used in SQL:
**select count(*) from genenametype where type = 'ordered locus' and value ~ 'SP_[0-9][0-9][0-9][0-9]';
*The result was identical to the XMLPipeDB match count: 2126
[[Image:20131119 SQLcountresults tATK TIGR4 AJV.PNG]]

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 14:44, 20 November 2013 (PST)
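One detail of the query above is worth keeping in mind: the PostgreSQL `~` operator succeeds when the pattern matches anywhere in the value, so a value with extra characters around SP_#### would also be counted. The Python sketch below mimics the query on an in-memory list to show the effect of anchoring. The `rows` structure of (type, value) pairs and the function name are hypothetical stand-ins for the genenametype table, not part of the actual database code.

```python
import re

PATTERN = r"SP_[0-9][0-9][0-9][0-9]"


def count_ordered_locus(rows, anchored=False):
    """Count 'ordered locus' rows whose value matches SP_####.

    rows is an iterable of (type, value) pairs (a hypothetical stand-in
    for the genenametype table). With anchored=False the pattern may match
    anywhere in the value, like PostgreSQL's ~ operator; with anchored=True
    the whole value must match.
    """
    matcher = re.fullmatch if anchored else re.search
    return sum(
        1
        for kind, value in rows
        if kind == "ordered locus" and matcher(PATTERN, value)
    )
```

If the anchored and unanchored counts agree on the real data, the unanchored query is not over-counting.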

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways#SQL | Follow the instructions on this page to query the PostgreSQL Database.]]

==OriginalRowCounts Comparison==

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file: (for the [[Week 9]] Assignment, use the "Vc-Std_External_20101022.gdb" as your benchmark, downloadable from [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download here]).

Copy the OriginalRowCounts table and paste it here:

TIGR4: [[Image:20131119 ogrowcounts tATK TIGR4 AJV.PNG]]
Benchmark: [[Image:20131119 benchmarkrowcounts tATK TIGR4 AJV.PNG]]

Note: a few rows are missing from the benchmark screenshot because they could not all fit on screen.

[[User:Ajvree|Ajvree]] ([[User talk:Ajvree|talk]]) 14:36, 20 November 2013 (PST)
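Because the benchmark is a ''V. cholerae'' database, the record counts themselves are expected to differ; the more telling check is whether the same tables are present. A minimal Python sketch of that check follows, assuming the two OriginalRowCounts tables have been typed or exported into dictionaries of table name to row count. The function name is hypothetical, and no real counts from the screenshots are reproduced here.

```python
def compare_tables(exported, benchmark):
    """Compare two OriginalRowCounts-style dictionaries (table name -> row count).

    Returns (missing, extra): tables present only in the benchmark and
    tables present only in the exported database, each sorted by name.
    """
    missing = sorted(set(benchmark) - set(exported))
    extra = sorted(set(exported) - set(benchmark))
    return missing, extra
```

An empty `missing` list means every benchmark table also exists in the exported .gdb; any table listed there deserves a closer look.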

==Visual Inspection==

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

==.gdb Use in GenMAPP==

Note:

===Putting a gene on the MAPP using the GeneFinder window===

* Try a sample ID from each of the gene ID systems. Open the Backpage and see if all of the cross-referenced IDs that are supposed to be there are there.

Note:

===Creating an Expression Dataset in the Expression Dataset Manager===

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML at all?

Note:

===Coloring a MAPP with expression data===

Note:

===Running MAPPFinder===

Note:

== Compare Gene Database to Outside Resource==

The OrderedLocusNames IDs in the exported Gene Database are derived from the UniProt XML. It is a good idea to check how complete your list of OrderedLocusNames IDs is against the original source of the data (the sequencing organization, the MOD, etc.). Because UniProt is a protein database, it does not reference any non-protein genome features such as genes that code for functional RNAs, centromeres, telomeres, etc.

Note:

==Group Information==
[[Template: Team ATK]]
[[Category: Streptococcus pneumoniae]]
[[Category: Group Projects]]
[[Category: Team ATK]]