Asandle1 Week 9
Purpose
The purpose of this assignment is to gain experience with the analysis part of the "data life cycle" for DNA microarray datasets and to develop an intuition about what different p-value cut-offs mean.
It is also to practice keeping a detailed electronic lab notebook so that our research is reproducible.
Methods/Results/Electronic Notebook
1) Downloaded the gln3 data
2) Opened in Excel
3) Enabled Editing
4) Saved File with AS&CK_ before the rest of its name
5) Added a new sheet called ANOVA_dGLN3
6) Copied the data from the Master_Sheet_dGLN3 to the new ANOVA_dGLN3
7) Created average log fold change columns called dGLN3_AvgLogFC_t, one for each time point, adding 15, 30, 60, 90, or 120 at the end of the name.
8) Then we computed the average for the first row of the t15 time point data, 2 through 5.
9) We copied the formula down to all the rows.
10) We noticed a #DIV/0! error in some rows because the data were blank.
11) Repeated the same process for the rest of the time points.
12) Created a column, dGLN3_ss_HO, for the sum of squares.
13) Highlighted the columns for all time points for this and entered =SUMSQ(all relevant data in the row)
14) Created columns for dGLN3_ss_t15 through t120
15) Repeated the same process, highlighting only the data in the row that applied to the specific time t.
16) Double-clicked to fill the formula down.
17) That was not the correct equation, so we deleted all of the =SUMSQ formulas we had just entered except for dGLN_ss_HO, which was still correct.
18) Instead, we did number 14 in the worksheet.
19) For dGLN_ss_t15, the equation was =SUMSQ(D2:G2)-COUNTA(D2:G2)*X2^2
20) Once that was complete, copied the formulas down all the columns.
21) Created dGLN3_SS_full to the right.
22) Created the dGLN_Fstat to the
23) Took a weekend break
24) Back to the task
25) Entered the =SUM of the ss columns for dGLN_SS_full.
26) Double-clicked to fill down for all of them.
27) Counted my n for the Fstat to be 20.
28) There was an error in my SS_full: I should have included only the time points t15 through t120, but I accidentally included HO. Removing it fixed the problem.
29) Now adding the p-value corrections, starting with Bonferroni.
30) Created a column called dGLN3_Bonferroni_p-value and, in the first cell, typed =, selected AK2, and then added *6189.
31) Double-clicked to fill the column.
32) Created a new column, dGLN3_Bonferroni_p-value correction. Entered =IF(AL2>1,1,AL2) and then double-clicked to fill the formula down.
33) Added a new worksheet to begin the Benjamini & Hochberg p-value correction
34) Named the worksheet dGLN3_ANOVA_B-H
35) Followed the rest of the Benjamini & Hochberg p-value correction section of the instructions exactly.
36) Saved to upload to the wiki.
37) Uploaded to Wiki with same file name.
38) Continuing, now with the "Sanity Check".
39) Following the instructions, opened the dGLN3_ANOVA worksheet.
40) Chose Data ribbon > Filter. This is a shortcut to AutoFilter instead of using the top bars.
41) Expanded all the column headers so that they wouldn't be cut off by selecting all of them and double-clicking between the column letter markers.
42) Clicked on the dropdown for the unadjusted p-value.
43) I selected a criterion to show only p-values less than 0.05. There are 2531 genes with p < 0.05.
44) I repeated the process with criteria of 0.01, 0.001, and 0.0001. The results were:
1204 at 0.01, 514 at 0.001, 180 at 0.0001
45) Copied these into a new worksheet called data_checks.
46) Removed the filter on the dGLN_ANOVA worksheet and verified that the total number of entries is 6189.
47) Calculated each count as a percentage of 6189. Rounded, the answers are 40.90%, 19.45%, 8.31%, and 2.91%.
48) Did the same filtering for the Bonferroni and B&H p values for 0.05.
49) Added the answer to the data_checks sheet.
50) For some reason, my B&H had fewer results than the Bonferroni correction, which makes me think I did something wrong somewhere. For Bonferroni I got 45 rows, and for B&H I have only 10.
51) Now I am looking for NSR1. Searched for YGR159C and highlighted the row yellow and the YGR159C and NSR1 boxes green. Its unadjusted p-value is 0.000506764. Its Bonferroni p-value is 3.136364678. Its B&H corrected value is 1.
52) The average log fold changes for YGR159C are as follows: t15 = 3.506225, t30 = 4.5319, t60 = 2.7592, t90 = -1.85025, t120 = -1.867425.
53) NSR1 does have a p-value of 0.0005, but that is unadjusted. Judging by the Bonferroni and B&H values, its expression does not seem to change significantly in response to cold shock.
54) Looking now at my favorite gene, Sir2 (YDL042C).
55) Located YDL042C and highlighted the row orange.
56) YDL042C's unadjusted p-value is 0.015418948. Its Bonferroni p-value is 95.42786686. Its B&H corrected value is 1.
57) The average log fold changes for YDL042C are as follows: t15 = 0.503525, t30 = 1.5781, t60 = 0.60005, t90 = -0.075175, t120 = -0.168375.
58) Figured out the error in the values with Dr. Dahlquist's help. The problem was in the B-H values.
59) Deleted the old B-H values since they were not correct.
60) Removed the filters and made sure the rows on both sheets were in the same order, from smallest to largest ID.
61) Replaced the B-H values with the new ones. Updated the filter for less than 0.05 and got 1185 values.
62) The new percentages are 40.90%, 19.45%, 8.31%, and 2.91% (unadjusted p < 0.05, 0.01, 0.001, and 0.0001), 0.727% (Bonferroni p < 0.05), and 19.147% (B-H p < 0.05).
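The sum-of-squares columns from steps 12 through 28 can also be sketched outside Excel. The Python below is only an illustration under stated assumptions: the replicate values are made up, the function names are hypothetical, X2 is assumed to hold the t15 average, and the F statistic formula is an assumption based on a standard within-gene ANOVA with 5 time points and n = 20, since the notebook does not record the exact formula used. It also assumes no blank cells.

```python
def sum_sq_about_mean(values):
    """Mirror of =SUMSQ(range)-COUNTA(range)*mean^2 for one time point."""
    mean = sum(values) / len(values)
    return sum(v * v for v in values) - len(values) * mean ** 2


def anova_row(groups):
    """Compute ss_HO, SS_full, and an F statistic for one gene.

    groups maps a time point label to its list of log fold changes.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n = len(all_values)  # 20 in the notebook (step 27)
    k = len(groups)      # 5 time points in the notebook
    ss_h0 = sum(v * v for v in all_values)  # =SUMSQ over every time point
    # SS_full sums only the per-time-point columns, not ss_HO (step 28).
    ss_full = sum(sum_sq_about_mean(vals) for vals in groups.values())
    # Assumed form of the F statistic; not taken from the notebook.
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return ss_h0, ss_full, f_stat


# Made-up replicate data for a single gene (4 replicates x 5 time points).
example_gene = {
    "t15": [1.0, 2.0, 3.0, 2.0],
    "t30": [2.5, 3.5, 3.0, 3.0],
    "t60": [1.0, 1.5, 0.5, 1.0],
    "t90": [-0.5, 0.0, -1.0, -0.5],
    "t120": [-1.0, -1.5, -0.5, -1.0],
}
ss_h0, ss_full, f_stat = anova_row(example_gene)
print(f"ss_HO={ss_h0:.3f} SS_full={ss_full:.3f} Fstat={f_stat:.3f}")
```

Keeping ss_HO out of the SS_full sum in the code matches the fix described in step 28.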
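The two p-value corrections from steps 29 through 35 can be sketched in Python as a cross-check. This is a minimal sketch, not the worksheet procedure: the B&H steps are not written out in this notebook, so the code follows the textbook Benjamini & Hochberg adjustment and may differ in detail from the class instructions. The function names and the short p-value list are hypothetical. One property is useful for the problem noticed in step 50: a B&H adjusted p-value is never larger than the Bonferroni one for the same gene, so the B&H filter at 0.05 should keep at least as many rows as Bonferroni.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Textbook B&H adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Rank genes from smallest to largest unadjusted p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that no adjusted value
    # exceeds the adjusted value of a gene ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values, deliberately left unsorted.
example_p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
for p, b, h in zip(example_p, bonf, bh):
    print(f"p={p:.3f} Bonferroni={b:.3f} B-H={h:.3f}")
```

Because the function returns values in the original row order, there is no separate sorted sheet to line up again, which is the step where the worksheet version went wrong (steps 58 through 61).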
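The percentages in steps 47 and 62 are each count divided by the 6189 total genes. As a quick arithmetic check, the short Python sketch below recomputes them from the counts recorded in this notebook; the variable and function names are hypothetical.

```python
TOTAL_GENES = 6189  # total rows verified in step 46

# Counts recorded in steps 43, 44, 50, and 61.
counts = {
    "unadjusted p < 0.05": 2531,
    "unadjusted p < 0.01": 1204,
    "unadjusted p < 0.001": 514,
    "unadjusted p < 0.0001": 180,
    "Bonferroni p < 0.05": 45,
    "B-H p < 0.05": 1185,
}


def percent_of_total(count, total=TOTAL_GENES):
    """Return count as a percentage of the total number of genes."""
    return 100.0 * count / total


for label, count in counts.items():
    print(f"{label}: {count} genes, {percent_of_total(count):.3f}%")
```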
Data and Files
Media:AS&CK_BIOL367_S24_microarray-data_dGLN3.xlsx
Media:BIOL367_S24_Andrew_dGLN3_p-value_slide.pptx
Conclusion
I am not sure we have concluded this task yet. I think the conclusion will come with part 2.
Acknowledgments & References
Acknowledgments
- I worked on this assignment with Charlotte Kaplan in class. I helped her with a lot of her Excel work because I had more experience. I texted her at 5:57 pm on Wednesday about the p-values for B&H being strangely low for me. For the most part I did the assignment on my own and was really helping her catch up to me.
- Dr. Dahlquist was very helpful in class when I ran into some errors with the data. The specific errors are in the electronic notebook.
- I used the Week 9 instructions and followed them.
- Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
18:04, 20 March 2024 (PDT)
References
- Dahlquist, K. Master_sheet_dGLN3.
- LMU BioDB 2024. (2024). Week 9. Retrieved Mar 20, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/Spring2024/index.php/Week_9