Data Analysts Week 13
Jump to navigation
Jump to search
Contents
Charlotte and Katie's Data Analyst Journal
Milestone 1
Completed as of April 11th when we gave our Journal Club Presentation with Hailey Ivanson
Milestone 2
- With Quality Assurance team member Hailey Ivanson, we downloaded and examined the microarray dataset: SGD Processed Data.
- We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples and at which time point they were collected, and their replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber and CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is either 0, 3 , 6, 12, 20, 40, 70, or 120, and the replicate number either 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
- ID is the first column header, and within it are all of the SGD systemic names
- Data columns are to the right, in increasing chronological order, using the column header pattern we created.
- Treatments are grouped together
- Replicates are grouped together
- We deleted the "EWEIGHT" row and "GWEIGHT" column.
- We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation
=2^<cell designation>
, and then applied it to the remaining cells. The specific commands we used are shown below.- Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
- Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
- Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
- Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
- Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
- Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
- Below Control_FC_3-1, we typed =2^N2 and applied it throughout the column.
- Below Control_FC_3-2, we typed =2^O2 and applied it throughout the column.
- Below Control_FC_3-3, we typed =2^P2 and applied it throughout the column.
- Below CHP_FC_3-1, we typed =2^T2 and applied it throughout the column.
- Below CHP_FC_3-2, we typed =2^U2 and applied it throughout the column.
- Below CHP_FC_3-3, we typed =2^V2 and applied it throughout the column.
- Below Control_FC_6-1, we typed =2^Z2 and applied it throughout the column.
- Below Control_FC_6-2, we typed =2^AA2 and applied it throughout the column.
- Below Control_FC_6-3, we typed =2^AB2 and applied it throughout the column.
- Below CHP_FC_6-1, we typed =2^AF2 and applied it throughout the column.
- Below CHP_FC_6-2, we typed =2^AG2 and applied it throughout the column.
- Below CHP_FC_6-3, we typed =2^AH2 and applied it throughout the column.
- We then created a new worksheet labeled "with_average", in which we copy and special pasted the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
- We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 timepoint trials, and then within them computed the average value of the t0 timepoint trials for the control and CHP-treated data. In the first cell below the column headed Control_FC_0-avg, we used the Excel command
=AVG(B2:D2)
and then applied this command to all cells in the column. In the first cell below the column headed CHP_FC_0-avg column, we used the command=AVG(F2:H2)
and then applied this command to all cells in the column. - We then created new columns to the right of each treatment with a column header either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We then calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applying this throughout the column. To the right of each new column, we also created created columns with a column header either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then Log2 transformed the fold changes by using
=LOG(cell designation, 2)
, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber columns. We then applied this throughout all of the cells in the Log2 column.
Milestone 3
Acknowledgements
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: Data Analysis
References
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13