Difference between revisions of "Data Analysts Week 13"

From LMU BioDB 2024
(creating milestone 3)
(Milestone 2: adding more specific steps)
Line 5: Line 5:
  
 
===Milestone 2===
 
===Milestone 2===
With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset, comparing it to the samples in our journal club article. We used the processed dataset from SGD. We made a sample-data relationship table that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number.
+
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
 
+
#We made a sample-data relationship table in Excel, labeled "reorganized", that lists every sample with the time point at which it was collected and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. We organized the data in a worksheet in an Excel workbook so that:
We came up with consistent column headers that summarize this information. We named each column CHP_LogFc_time-trial without using special characters. We organized the data in a worksheet in an Excel workbook so that:  
+
#*ID is the first column header, and within it are all of the SGD systematic names
 
+
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created
-ID (GSE7645) is in the first column
+
#*Treatments are grouped together
 
+
#*Replicates are grouped together
-Data columns are to the right, in increasing chronological order, using the column header pattern we created
+
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
 
+
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then entered the formula <code>=2^<cell designation></code> in the first cell of each column and applied it to the remaining cells. The specific formulas we used are shown below.
-Treatments are grouped together
+
#**Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
 
+
#**Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
-Replicates are grouped together
+
#**Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
 
+
#**Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
-We deleted the "EWEIGHT" row and "GWEIGHT" column.
+
#**Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
 
+
#**Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
-We converted the data into Log2 fold changes (LogFC).
+
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg and computed in them the average value of the t0 timepoint for the control and CHP-treated data, respectively. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
 
+
#*We then created new columns to the right of each treatment, with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_0-1 or CHP_Fold_Change_0-1. We then calculated the fold change by dividing each value for each timepoint by the average t0 value for the respective treatment (control or CHP-treated).
-We undid log transformations before we calculated the ratios.
 
 
 
-We created new column headers and then transformed all of the data with the equation "=2^<cell designation>"
 
 
 
-We computed the average value of the t0 timepoint for the control and CHP-treated data.
 
 
 
-We calculated the fold change by dividing each value for each timepoint by the average t0 value for the respective treatment (control or CHP-treated).
 
  
 
-We Log2 transformed the fold changes.
 
-We Log2 transformed the fold changes.

Revision as of 20:44, 17 April 2024

Charlotte and Katie's Data Analyst Journal

Milestone 1

Completed on April 11th, when we gave our Journal Club presentation with Hailey Ivanson.

Milestone 2

  1. With Quality Assurance team member Hailey Ivanson, we downloaded and examined the microarray dataset: SGD Processed Data.
  2. We made a sample-data relationship table in Excel, labeled "reorganized", that lists every sample with the time point at which it was collected and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. We organized the data in a worksheet in an Excel workbook so that:
    • ID is the first column header, and within it are all of the SGD systematic names
    • Data columns are to the right, in increasing chronological order, using the column header pattern we created
    • Treatments are grouped together
    • Replicates are grouped together
    • We deleted the "EWEIGHT" row and "GWEIGHT" column.
    • We then had to undo the log transformation of the raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then entered the formula =2^<cell designation> in the first cell of each column and applied it to the remaining cells. The specific formulas we used are shown below.
      • Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
      • Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
      • Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
      • Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
      • Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
      • Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
    • We then created new columns called Control_FC_0-avg and CHP_FC_0-avg and computed in them the average value of the t0 timepoint for the control and CHP-treated data, respectively. In the first cell below the Control_FC_0-avg header, we used the Excel formula =AVERAGE(B2:D2) and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula =AVERAGE(F2:H2) and applied it to all cells in the column.
    • We then created new columns to the right of each treatment, with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_0-1 or CHP_Fold_Change_0-1. We then calculated the fold change by dividing each value for each timepoint by the average t0 value for the respective treatment (control or CHP-treated).

    • We then log2-transformed the fold changes.
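
The spreadsheet steps above can also be sketched in Python as a cross-check. This is only a minimal illustration, not the procedure we actually ran: the function names column_header and log2_fold_changes are hypothetical, the 30-minute time point is an assumed placeholder, and the numbers are made up rather than taken from the SGD dataset. It handles one gene and one treatment at a time.

```python
import math
from statistics import mean


def column_header(treatment, kind, timepoint, replicate):
    """Build a header in the pattern treatment_kind_timepoint-replicatenumber."""
    return f"{treatment}_{kind}_{timepoint}-{replicate}"


def log2_fold_changes(log_values, baseline_timepoint=0):
    """Re-express one gene's log2 values relative to its t0 average.

    log_values maps a time point to the list of log2 replicate values
    for a single treatment (control or CHP).
    """
    # Step 1: undo the log2 transformation (the =2^<cell> step).
    linear = {t: [2 ** v for v in reps] for t, reps in log_values.items()}

    # Step 2: average the t0 replicates (the averaging step).
    t0_avg = mean(linear[baseline_timepoint])

    # Step 3: divide every value by the t0 average to get fold changes.
    fold = {t: [v / t0_avg for v in reps] for t, reps in linear.items()}

    # Step 4: log2-transform the fold changes.
    return {t: [math.log2(v) for v in reps] for t, reps in fold.items()}


# Hypothetical values for one gene under one treatment (not real data).
example = {0: [1.0, 1.0, 1.0], 30: [2.0, 3.0, 0.0]}
result = log2_fold_changes(example)

print(column_header("CHP", "LogFC", 0, 1))  # CHP_LogFC_0-1
print(result[30])  # [1.0, 2.0, -1.0]
```

In this made-up example every t0 replicate becomes 2 after undoing the log, so the t0 average is 2 and the later values 4, 8, and 1 become fold changes of 2, 4, and 0.5 before the final log2 step.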

Milestone 3

Acknowledgements

This procedure was adapted from the Data Analysis page Milestone protocols, linked here: Data Analysis

References

LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13