Difference between revisions of "Data Analysts Week 13"

From LMU BioDB 2024

Revision as of 22:03, 17 April 2024

Charlotte and Katie's Data Analyst Journal

Milestone 1

Completed as of April 11th, when we gave our Journal Club Presentation with Hailey Ivanson.

Milestone 2

  1. With Quality Assurance team member Hailey Ivanson, we downloaded and examined the microarray dataset: SGD Processed Data.
  2. We made a sample-data relationship table in Excel labeled "reorganized" that lists every sample along with the time point at which it was collected and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is either 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is either 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
    • ID is the first column header, and within it are all of the SGD systematic names
    • Data columns are to the right, in increasing chronological order, using the column header pattern we created.
    • Treatments are grouped together
    • Replicates are grouped together
    • We deleted the "EWEIGHT" row and "GWEIGHT" column.
    • We then had to undo the log transformation of the raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation =2^<cell designation>, where the cell designation is the first cell of the respective Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this command to the remaining cells of the column.
    • We then created a new worksheet labeled "with_averages", into which we copied and special-pasted the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber.
    • We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 timepoint trials, and within them computed the average value of the t0 timepoint trials for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel command =AVERAGE(B2:D2) and then applied this command to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the command =AVERAGE(F2:H2) and then applied this command to all cells in the column.
    • We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes by using =LOG(<cell designation>, 2), where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column. We then applied this throughout all of the cells in the Log2 column.
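The Milestone 2 transformations for a single gene (one spreadsheet row) can be sketched in Python as a cross-check on the Excel formulas. This is only an illustration, not the procedure we ran: the function name relative_log2_fold_changes and the sample values are hypothetical, and the output header is derived by swapping "LogFC" for "Log2_Fold_Change" in the column name.

```python
import math

def relative_log2_fold_changes(log_fc, t0_columns):
    """Hypothetical helper: log_fc maps column header -> log2 value for one gene;
    t0_columns lists the t0 replicate headers for one treatment."""
    # Undo the log transform, as with =2^<cell designation>.
    fc = {name: 2 ** value for name, value in log_fc.items()}
    # Average the t0 replicates, as with an average over the three t0 cells.
    t0_avg = sum(fc[name] for name in t0_columns) / len(t0_columns)
    # Divide each later trial by the t0 average, then take log base 2,
    # as with =LOG(<cell designation>, 2).
    return {
        name.replace("LogFC", "Log2_Fold_Change"): math.log2(value / t0_avg)
        for name, value in fc.items()
        if name not in t0_columns
    }

# Made-up values for one gene in the control treatment.
row = {
    "Control_LogFC_0-1": 0.0,
    "Control_LogFC_0-2": 1.0,
    "Control_LogFC_0-3": 0.0,
    "Control_LogFC_3-1": 2.0,
}
t0 = ["Control_LogFC_0-1", "Control_LogFC_0-2", "Control_LogFC_0-3"]
result = relative_log2_fold_changes(row, t0)
print(result)
```

Here the un-logged t0 values are 1, 2, and 1, so the t0 average is 4/3, and the 3-minute replicate (un-logged value 4) has a fold change of 3 relative to t0.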

Milestone 3

  1. We created a new worksheet, naming it "CHP_ANOVA".
  2. We copied all the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and special pasted only the values into the new worksheet.
  3. To the right of each group of either the Control or CHP replicates at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
  4. In the cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed =AVERAGE(<replicate 1 cell designation>:<replicate 3 cell designation>), where the cell designations are the first cells of replicate 1 and replicate 3 at the same timepoint. This computed the average of the replicate values, and we then applied this throughout the remaining cells in the column.
  5. To the right, we then copied and special-pasted the values of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
  6. In the first cell below Control_ss_HO, we typed =SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2) and pressed Enter, and below CHP_ss_HO, we typed =SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2) and pressed Enter.
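The Milestone 3 calculations so far can be sketched the same way for one gene and one treatment. The function names and sample values below are hypothetical. SUMSQ returns the sum of the squares of its arguments. We assume the "ss_HO" header refers to the sum of squares under a null hypothesis in which the mean log2 fold change is zero at every timepoint, which is why no mean is subtracted before squaring.

```python
def replicate_averages(log2_fc_by_timepoint):
    """Average the replicates at each timepoint (the Avg_..._Log_FC_timepoint columns)."""
    return {
        timepoint: sum(replicates) / len(replicates)
        for timepoint, replicates in log2_fc_by_timepoint.items()
    }

def sum_of_squares_h0(log2_fc_by_timepoint):
    """Sum the squares of every replicate at every timepoint, like SUMSQ."""
    return sum(
        value * value
        for replicates in log2_fc_by_timepoint.values()
        for value in replicates
    )

# Made-up log2 fold changes for one gene, keyed by timepoint,
# with only two of the seven timepoints shown for brevity.
gene = {3: [1.0, 2.0, 3.0], 6: [0.5, -0.5, 0.0]}
averages = replicate_averages(gene)
ss_h0 = sum_of_squares_h0(gene)
print(averages, ss_h0)
```

In the worksheet, the same SUMSQ runs over all 21 cells (seven timepoints with three replicates each) for each treatment.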

Acknowledgements

This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: Data Analysis. The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.

Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

Kmill104 (talk) 22:50, 17 April 2024 (PDT)

References

LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13