Difference between revisions of "Ckaplan Week 9"

From LMU BioDB 2024

Revision as of 22:12, 20 March 2024

Purpose:

We are using Excel to practice loading data and conducting statistical analysis on the gene expression data of the "Δgln3" strain. This exercise helps us refine our skills in organizing and interpreting biological data.

Methods:

We are analyzing Δgln3 data.

-Filenames: BIOL367_S24_microarray-data_dGLN3CKAS312.xlsx and, after later changes to the workbook, BIOL367_S24_microarray-data_dGLN3CKAS3121.xlsx

Media:BIOL367_S24_microarray-data_dGLN3CKAS312.xlsx.

Media:BIOL367_S24_microarray-data_dGLN3CKAS3121.xlsx.

Media:Untitled_presentation.pdf

Procedure:

  • Creating a New Worksheet: I created a new worksheet named "Δgln3_ANOVA" to conduct the ANOVA analysis for the strain "Δgln3".
  • Copying Data: I copied all the data from the "Master_Sheet" worksheet and pasted it into the new "Δgln3_ANOVA" worksheet.
  • Adding Column Headers: I added five column headers to the right of the data, named "Δgln3_AvgLogFC_15", "Δgln3_AvgLogFC_30", "Δgln3_AvgLogFC_60", "Δgln3_AvgLogFC_90", and "Δgln3_AvgLogFC_120".
  • Calculating Average Log Fold Change: I used the AVERAGE function to calculate the average log fold change for each timepoint. For example, in cell B2, I typed "=AVERAGE(C2:G2)" to calculate the average log fold change at t=15 minutes for the first gene, and then I double-clicked the fill handle to copy the formula for all genes.
  • Calculating Sum of Squares (SS): In the "Δgln3_ss_HO" column, I used the SUMSQ function to calculate the sum of squares for each timepoint.
  • Calculating Total SS: In the "Δgln3_SS_full" column, I summed the SS values for each timepoint to get the total SS.
  • Calculating F-statistic: Using the formula provided, I calculated the F-statistic for each gene by replacing placeholders with the appropriate cell references.
  • Calculating p-value: I used the FDIST function to calculate the p-value for each gene based on the F-statistic and the degrees of freedom.
  • Performing Sanity Check: I applied a filter to the p-value column to display only values less than 0.05 to verify the results.
  • Calculating Bonferroni-corrected p-value: I created a new column for Bonferroni-corrected p-values and applied the correction formula, replacing any values greater than 1 with 1.
  • Calculating Benjamini & Hochberg p-value Correction: I inserted a new worksheet named "Δgln3_ANOVA_B-H". I copied the relevant columns from the ANOVA worksheet and pasted them into the new worksheet. I sorted the data by ascending p-values and added a rank column from 1 to 6189. I calculated the B-H p-value correction using the provided formula. I replaced any B-H p-values greater than 1 with 1. I sorted the data back by MasterIndex and copied the corrected p-values to the ANOVA worksheet.
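The two correction steps above can be sketched in Python. This is a minimal illustration, not the course's Excel formula: it assumes the Bonferroni correction is p × (number of genes) and the B-H correction is p × (number of genes) / rank, both capped at 1, which matches the sort, rank, and cap steps described in the procedure. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Rank-based correction: p * n / rank, capped at 1, returned in input order."""
    n = len(p_values)
    # Sort indices by ascending p-value (the "sort and add a rank column" step).
    order = sorted(range(n), key=lambda i: p_values[i])
    corrected = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Writing back by original index plays the role of re-sorting by MasterIndex.
        corrected[idx] = min(p_values[idx] * n / rank, 1.0)
    return corrected


# Hypothetical p-values for five genes (illustration only, not course data).
example_p = [0.0004, 0.03, 0.2, 0.008, 0.6]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
for p, b, h in zip(example_p, bonf, bh):
    print(f"p={p:<7} Bonferroni={b:.4f} B-H={h:.4f}")
```

Because the rank is never less than 1, each B-H value computed this way is less than or equal to the matching Bonferroni value, so the number of genes below 0.05 after B-H correction should be at least the number below 0.05 after Bonferroni correction. This simple form does not enforce that corrected values increase with rank, which some statistics packages add as an extra step.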

How many genes have p < 0.05, and what is the percentage (out of 6189)? 2531

How many genes have p < 0.01, and what is the percentage (out of 6189)? 1204

How many genes have p < 0.001, and what is the percentage (out of 6189)? 514

How many genes have p < 0.0001, and what is the percentage (out of 6189)? 180

How many genes have p < 0.05 for the Bonferroni-corrected p-value, and what is the percentage (out of 6189)? There may be an issue with the data, as the number of rows seems disproportionate: Bonferroni had more data than Benjamini and Hochberg.

How many genes have p < 0.05 for the Benjamini and Hochberg-corrected p-value, and what is the percentage (out of 6189)? There may be an issue with the data, as the number of rows seems disproportionate.
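The percentage part of the unadjusted p-value questions is each reported count divided by the 6189 genes in the dataset. A short sketch of that arithmetic, using only the counts reported above (the variable and function names are illustrative):

```python
TOTAL_GENES = 6189  # number of genes in the dataset, as stated in the questions

# Counts of genes below each unadjusted p-value cutoff, as reported above.
counts = {0.05: 2531, 0.01: 1204, 0.001: 514, 0.0001: 180}


def percent_of_total(count, total=TOTAL_GENES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


percentages = {cutoff: percent_of_total(n) for cutoff, n in counts.items()}
for cutoff, pct in percentages.items():
    print(f"p < {cutoff}: {counts[cutoff]} genes = {pct:.2f}% of {TOTAL_GENES}")
```

The same function would apply to the Bonferroni and B-H counts once those are resolved.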

Analyzing nsir in the data:

  • Unadjusted p-value: 0.000506764
  • Bonferroni p-value: 3.136364678
  • B&H value: 1
  • Log fold changes at t15, t30, t60, t90, and t120: 3.50622, 4.532, 2.759, -1.8502, -1.86742

  • I don't think nsir changes with cold shock.

-Favorite gene: Sir2, with an unadjusted p-value of 0.01541894, a Bonferroni p-value of 95.427866, and a Benjamini and Hochberg corrected value of 1. Log fold changes at t15, t30, t60, t90, and t120: 0.50352, 0.578, 0.60005, -0.07517, -0.168375.
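Both Bonferroni values reported above are greater than 1, which is worth checking against the "replace values greater than 1 with 1" step in the procedure. The sketch below assumes the Bonferroni correction is the unadjusted p-value multiplied by 6189 (the number of genes). It compares that product with the reported value and shows what the capped value would be. The dictionary layout and function name are illustrative.

```python
TOTAL_GENES = 6189  # number of genes (tests) in the dataset

# Unadjusted and Bonferroni values as reported above for the two genes.
reported = {
    "nsir": {"p": 0.000506764, "bonferroni": 3.136364678},
    "Sir2": {"p": 0.01541894, "bonferroni": 95.427866},
}


def bonferroni_capped(p, n_tests=TOTAL_GENES):
    """Bonferroni correction with the result capped at 1 (assumed form)."""
    return min(p * n_tests, 1.0)


check = {}
for gene, vals in reported.items():
    uncapped = vals["p"] * TOTAL_GENES      # raw product, before any cap
    capped = bonferroni_capped(vals["p"])   # value after replacing >1 with 1
    check[gene] = (uncapped, capped)
    print(f"{gene}: p x {TOTAL_GENES} = {uncapped:.4f}, "
          f"reported = {vals['bonferroni']}, capped = {capped}")
```

Under this assumption the reported numbers match the uncapped products to within rounding, so these cells appear to come from the column before the cap was applied. With the cap, both genes would have a Bonferroni-corrected p-value of 1.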

Conclusion:

After speaking with Andrew, we both concluded that there may be an issue with the data, as the number of rows seems disproportionate: Bonferroni had more data than Benjamini and Hochberg. We are going to speak with Dr. Dahlquist in class tomorrow for clarification.

References:

Dahlquist, K. Master_sheet_dGLN3.

LMU BioDB 2024. (2024). Week 9. Retrieved Mar 20, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/Spring2024/index.php/Week_9

Acknowledgements:

I worked with my homework partner, Andrew Sandler, inside and outside of class. Andrew helped me keep up and catch up in class with Excel, since he has experience from his entrepreneurship degree. After completing this week's assignment, we discussed that we both may have gotten incorrect values for a certain section.

Ckapla12 (talk) 23:09, 20 March 2024 (PDT) Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

