Difference between revisions of "Lenaolufson Week 15"
From LMU BioDB 2015
Lenaolufson (Talk | contribs) (→12/8/15: added in protocol for sanity check) |
Lenaolufson (Talk | contribs) (→12/8/15: added the protocol for creation of .gex file) |
Revision as of 21:43, 10 December 2015
12/8/15
- It was now time for me to prepare my file for GenMAPP, which I did by following the Vibrio cholerae instructions found here.
- I inserted a new worksheet and named it "forGenMAPP".
- I went back to the "statistics" worksheet and Selected All and Copied.
- I went to my new sheet and clicked on cell A1 and selected Paste Special, clicked on the Values radio button, and clicked OK.
- I then deleted the ID columns besides the far left one in column A, and I deleted the second MasterIndex column because it was unnecessary.
- I added a "1" before all of the titles of columns D through I so that none of the columns would have the same names due to the replicates.
- I selected Columns V through Y (all the fold changes). I selected the menu item Format > Cells. Under the number tab, I selected 2 decimal places. I clicked OK.
- I selected all the columns containing p values. I selected the menu item Format > Cells. Under the number tab, I selected 4 decimal places. I clicked OK.
- I deleted the left-most Bonferroni p value column, preserving the one that showed the result of my "if" statement.
- I inserted a column to the right of the "ID" column. I typed the header "SystemCode" into the top cell of this column. I filled the entire column (each cell) with the letter "N".
- I selected the menu item File > Save As, and chose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
- After preparing it for GenMAPP, here are the .xls and .txt files:
- Then it was time to perform a sanity check, which I did using the Vibrio cholerae instructions found here.
- I opened my spreadsheet and went to the "forGenMAPP" tab.
- I clicked on cell A1 and selected the menu item Data > Filter > Autofilter. Little drop-down arrows appeared at the top of each column. This enabled me to filter the data according to criteria I set.
- I clicked on the drop-down arrow on my "Pvalue" column. I selected "Custom". In the window that appeared, I set a criterion that filtered my data so that the Pvalue was less than 0.05.
- p-value less than 0.05: 1923/3552, 54%
- p-value less than 0.01: 1028/3552, 29%
- p-value less than 0.001: 242/3552, 7%
- p-value less than 0.0001: 40/3552, 1%
- p < 0.05 for the Bonferroni-corrected p value: 9/3552, 0.2%
- p < 0.05 for the Benjamini and Hochberg-corrected p value: 1365/3552, 38%
- Keeping the (unadjusted) "Pvalue" filter at p < 0.05, I filtered the "Avg_ABC_Samples" column to show all genes with an average log fold change greater than zero.
- 964/3552, 27%
- Keeping the (unadjusted) "Pvalue" filter at p < 0.05, I filtered the "Avg_ABC_Samples" column to show all genes with an average log fold change less than zero.
- 959/3552, 27%
- With an average log fold change of > 0.25 and p < 0.05
- 874/3552, 25%
- With an average log fold change of < -0.25 and p < 0.05
- 848/3552, 24%
- Combining both fold change cut-offs (greater than 0.25 or less than -0.25) with the unadjusted p value cut-off of p < 0.05:
- 1722/3552, 48%
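The filter counts above can be cross-checked outside the spreadsheet. The following is a minimal sketch, assuming the "forGenMAPP" rows have already been read into dictionaries keyed by the "Avg_ABC_Samples" and "Pvalue" column names; the function names and the toy rows are hypothetical and are not the real data.

```python
def sanity_counts(rows, fc_cutoff=0.25, p_cutoff=0.05):
    """Count genes passing the fold change and p value filters.

    rows: iterable of dicts with "Avg_ABC_Samples" and "Pvalue" keys.
    A value of None stands for a blank cell and never passes a filter.
    """
    total = increased = decreased = 0
    for row in rows:
        total += 1
        fc = row["Avg_ABC_Samples"]
        p = row["Pvalue"]
        if fc is None or p is None:
            continue
        if p < p_cutoff and fc > fc_cutoff:
            increased += 1
        elif p < p_cutoff and fc < -fc_cutoff:
            decreased += 1
    return {
        "total": total,
        "increased": increased,
        "decreased": decreased,
        "either": increased + decreased,
    }


def percent(count, total):
    """Percentage of total, rounded to a whole number."""
    return round(100 * count / total)


# Toy rows for illustration only (not values from the real dataset).
toy_rows = [
    {"Avg_ABC_Samples": 0.80, "Pvalue": 0.001},   # increased
    {"Avg_ABC_Samples": -0.60, "Pvalue": 0.02},   # decreased
    {"Avg_ABC_Samples": 0.10, "Pvalue": 0.01},    # fold change too small
    {"Avg_ABC_Samples": 1.20, "Pvalue": 0.30},    # p value too large
    {"Avg_ABC_Samples": -0.30, "Pvalue": 0.049},  # decreased
]
counts = sanity_counts(toy_rows)
print(counts)
```

As a consistency check on the reported numbers, the increased and decreased counts (874 and 848) sum to the combined count of 1722, and `percent(1722, 3552)` gives 48.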
- I then was ready to run my .txt file in GenMAPP.
- I downloaded the .gdb file from my team page (https://xmlpipedb.cs.lmu.edu/biodb/fall2015/index.php/The_Class_Whoopers) so that I would have it to run GenMAPP with.
- I opened the Expression Dataset Manager from the Data drop-down list in GenMAPP.
- I selected New Dataset from the Expression Datasets menu and chose the tab-delimited text file formatted for GenMAPP (.txt).
- Upon specifying that all data was numerical, the Expression Dataset Manager converted my data to a .gex file. This process took approximately one minute to complete. In addition to converting the data to a .gex file, an exceptions file (.EX.txt) was also produced, as 342 errors were reportedly detected in the raw data.
- However, there was a problem at this point because the data set had a few mistakes in it.
- I went back to my data sheet and with the help of Dr. Dahlquist, we discovered that some of the values were incorrect as they displayed: #DIV/0!
- We then replaced all of the #DIV/0! cells with blank cells.
- This came to 23 replacements of #DIV/0! cells.
- I then saved and exported this new .txt file and ran it through GenMAPP again.
- This resulted in fewer errors, and the conversion ran smoothly.
- 339 errors with new .txt file: Media:Errors in GenMAPP.png
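The #DIV/0! clean-up described above was done by hand in the spreadsheet. As a rough sketch of the same idea applied directly to the tab-delimited text, assuming the error cells contain exactly the string #DIV/0!, one could blank them as follows. The function name and the sample gene IDs are hypothetical.

```python
def blank_div_errors(tsv_text, marker="#DIV/0!"):
    """Replace every cell equal to `marker` with an empty cell.

    Returns the cleaned tab-delimited text and the number of replacements.
    """
    cleaned_lines = []
    replaced = 0
    for line in tsv_text.splitlines():
        cells = line.split("\t")
        for i, cell in enumerate(cells):
            if cell.strip() == marker:
                cells[i] = ""
                replaced += 1
        cleaned_lines.append("\t".join(cells))
    return "\n".join(cleaned_lines) + "\n", replaced


# Toy input for illustration only (hypothetical IDs and values).
sample = (
    "ID\tSystemCode\tAvg_ABC_Samples\tPvalue\n"
    "BP0001\tN\t0.52\t0.0130\n"
    "BP0002\tN\t#DIV/0!\t#DIV/0!\n"
)
cleaned, n_replaced = blank_div_errors(sample)
print(n_replaced)
```

Counting the replacements gives a number that can be compared against the count reported by the spreadsheet's find-and-replace dialog.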
- I customized the new Expression Dataset by creating a Color Set with instructions to GenMAPP for displaying data on MAPPs. The new Color Set was entitled "LogFoldChange".
- First, I created a criterion for this color set to label genes that demonstrated a significant increase in their expression.
- I specified the Gene value as "Avg_ABC_Samples" for the Vibrio dataset.
- I activated the Criteria Builder by clicking the New button and named the criterion "Increased".
- I selected the color for this criterion using the color box.
- I stated the criterion as follows and added it to the Criteria List:
[Avg_ABC_Samples] > 0.25 AND [Pvalue] < 0.05
- Second, I created a criterion for this color set to label genes that demonstrated a significant decrease in their expression.
- I specified the Gene value as "Avg_ABC_Samples" for the Vibrio dataset.
- I activated the Criteria Builder by clicking the New button and named the criterion "Decreased".
- I selected the color for this criterion using the color box.
- I stated the criterion as follows and added it to the Criteria List:
[Avg_ABC_Samples] < -0.25 AND [Pvalue] < 0.05
- Upon entering these color sets, I saved the entire Expression Dataset by selecting Save from the Expression Dataset menu.
- The updated .gex file produced by this procedure can be found here: File:Bpertussis CompiledRawData MS2015-3.gex
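To make the logic of the two criteria concrete, here is a small sketch that expresses them as an ordered list of named tests. It assumes a gene is labeled with the first criterion it meets and is left unlabeled otherwise; that ordering rule is an assumption for illustration, not something stated in this notebook. The names `CRITERIA` and `label_gene` are hypothetical and are not part of GenMAPP.

```python
# Each entry pairs a criterion name with a test on (fold change, p value),
# mirroring the "Increased" and "Decreased" criteria above.
CRITERIA = [
    ("Increased", lambda fc, p: fc > 0.25 and p < 0.05),
    ("Decreased", lambda fc, p: fc < -0.25 and p < 0.05),
]


def label_gene(avg_abc_samples, pvalue):
    """Return the name of the first criterion met, or None if none apply.

    Blank cells (None) are treated as meeting no criterion.
    """
    if avg_abc_samples is None or pvalue is None:
        return None
    for name, test in CRITERIA:
        if test(avg_abc_samples, pvalue):
            return name
    return None


print(label_gene(0.80, 0.001))
print(label_gene(-0.60, 0.02))
print(label_gene(0.10, 0.01))
```

Because the two tests cannot both be true for the same gene, the order of the list does not change the result here; it would matter only if overlapping criteria were added.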
- links to files created:
- File:Bpertussis CompiledRawData MS2015-3.EX.txt
- File:Bpertussis CompiledRawData MS2015-3.xlsx
- File:Bpertussis CompiledRawData MS2015-3.txt
- File:Bpertussis CompiledRawData MS2015-3.gex
- Media:MAPPFinder results for geneontologyresultsCriterion1-GOtxt.png
- Media:Gene ontology results.png
- Media:Errors in GenMAPP.png