LMU BioDB 2024 - User contributions [en]

Data Analysts Week 14

2024-05-03T19:12:29Z

Kmill104: /* Milestone 4 */

===Continuing Milestone 3===
[[User:Hivanson| Hailey Ivanson]] helped Katie and me with the Bonferroni and Benjamini-Hochberg (B-H) corrected p values.
#To cap the Bonferroni-corrected p values at 1, we used the formulas =IF(CHP_Bonferroni_p-value>1,1,CHP_Bonferroni_p-value) and =IF(Control_Bonferroni_p-value>1,1,Control_Bonferroni_p-value).
#We inserted two new worksheets, named "CHP_ANOVA_B-H" and "Control_ANOVA_B-H".
#We copied the "MasterIndex", "ID", and "Standard Name" columns from our previous worksheet and pasted them into the first three columns (A through C) of each new worksheet.
#We copied our unadjusted p values from the ANOVA worksheet and used Paste Special > Paste Values to paste them into column D.
#We selected columns A, B, C, and D and sorted them by the unadjusted p value in column D in ascending order (smallest to largest).
#We typed "Rank" in cell E1, "1" in cell E2, and "2" in cell E3. We then selected cells E2 and E3 and double-clicked the fill handle (the plus sign) to fill the column with the series 1 to 4697.
#[[User:Hivanson| Hailey Ivanson]] helped us calculate the Benjamini and Hochberg p value correction. We typed "CHP_B-H_p-value" into cell F1, entered the formula =(D2*4697)/E2 into cell F2, and copied it down the entire column. We repeated this for the control.
#We then typed "CHP_B-H_p-value" into cell G1.
#In cell G2, we entered =IF(F2>1,1,F2) to cap the corrected values at 1 and copied it down the entire column.
#We selected columns A through G and sorted them by MasterIndex (column A) in ascending order to restore the original gene order.
#We copied column G and used Paste Special > Paste Values to paste it into the next empty column of our ANOVA worksheet.
#We zipped and uploaded the .xlsx file.
#We performed a sanity check by selecting row 1 and applying Data > Filter > AutoFilter to count the genes with a B-H corrected p value less than 0.05.
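For readers who want to check the worksheet values, the spreadsheet steps above can be sketched in Python. This is a minimal sketch and not the procedure we actually ran. It assumes the unadjusted p values are already available as a plain list of floats, one per gene in MasterIndex order. The function names are hypothetical. `bonferroni` mirrors the =IF(x>1,1,x) cap. `benjamini_hochberg_spreadsheet` mirrors the rank formula =(D2*4697)/E2 followed by =IF(F2>1,1,F2), with 4697 replaced by the number of p values.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni correction: p * n, capped at 1 like =IF(x>1,1,x)."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg_spreadsheet(p_values: List[float]) -> List[float]:
    """B-H correction as computed in the worksheet (no monotonicity step).

    1. Sort genes by unadjusted p value, smallest first (columns A-D sorted on D).
    2. Rank them 1..n (column E).
    3. Compute p * n / rank (column F) and cap at 1 (column G).
    4. Return values in the original input order (re-sorting on MasterIndex).
    """
    n = len(p_values)
    # Indices of the genes ordered by ascending p value; ties keep input order.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(p_values[i] * n / rank, 1.0)
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # toy values, not from our data
    print(bonferroni(p))
    print(benjamini_hochberg_spreadsheet(p))
```

Both the worksheet formula and this sketch omit one step of the standard Benjamini-Hochberg adjustment. The standard method also enforces monotonicity by taking a running minimum of the adjusted values from the largest rank downward. Results can therefore differ slightly from library implementations such as R's `p.adjust(method = "BH")`.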

Sanity Check Results: number of genes (and percentage of all 4697 genes) with a Benjamini and Hochberg-corrected p value below each cut-off.
{| class="wikitable"
! B-H corrected p value !! Control: genes (% of 4697) !! CHP: genes (% of 4697)
|-
| < 0.05 || 3699 (79%) || 2863 (61%)
|-
| < 0.01 || 3219 (69%) || 2403 (51%)
|-
| < 0.001 || 2558 (54%) || 1884 (40%)
|-
| < 0.0001 || 1921 (41%) || 1435 (31%)
|-
| < 0.00001 || 1325 (28%) || 1076 (23%)
|}

We chose a p value cut-off of < 0.00001 because, for both the control and CHP, it gave the percentage of significant genes closest to 25%.
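The percentages in the sanity check, and the rule for choosing the cut-off, are simple arithmetic that can be reproduced as below. This sketch uses the gene counts reported above for each cut-off. `count_below` stands in for the AutoFilter "less than" count, and all function names are hypothetical. Percentages are computed out of 4697 genes, and the chosen cut-off is the one whose percentage is closest to 25%.

```python
from typing import Dict, Iterable

TOTAL_GENES = 4697  # genes in the dataset

# Genes with a B-H corrected p value below each cut-off (from the sanity check)
CONTROL_COUNTS = {0.05: 3699, 0.01: 3219, 0.001: 2558, 0.0001: 1921, 0.00001: 1325}
CHP_COUNTS = {0.05: 2863, 0.01: 2403, 0.001: 1884, 0.0001: 1435, 0.00001: 1076}


def count_below(p_values: Iterable[float], cutoff: float) -> int:
    """Number of genes with p < cutoff (what the AutoFilter shows)."""
    return sum(1 for p in p_values if p < cutoff)


def percent_significant(counts: Dict[float, int], total: int = TOTAL_GENES) -> Dict[float, float]:
    """Convert gene counts to percentages of all genes."""
    return {cutoff: 100.0 * n / total for cutoff, n in counts.items()}


def closest_cutoff(counts: Dict[float, int], target: float = 25.0, total: int = TOTAL_GENES) -> float:
    """Return the cut-off whose percentage of significant genes is closest to target."""
    pct = percent_significant(counts, total)
    return min(pct, key=lambda cutoff: abs(pct[cutoff] - target))


if __name__ == "__main__":
    for name, counts in (("Control", CONTROL_COUNTS), ("CHP", CHP_COUNTS)):
        pct = percent_significant(counts)
        print(name, {c: round(v, 1) for c, v in pct.items()}, "->", closest_cutoff(counts))
```

Run on the counts above, this selects < 0.00001 for both the control and CHP.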

===Milestone 4===
#We inserted a new worksheet into our Excel workbook and named it "CHP_stem".
#We selected all of the data in the "CHP_ANOVA" worksheet and used Paste Special > Paste Values to paste it into the "CHP_stem" worksheet.
#We renamed column A, "Master_Index", to "SPOT" and column B, "ID", to "Gene Symbol", and deleted the "Standard_Name" column.
#We filtered the data on the B-H corrected p value to show only values > 0.00001.
#We selected all of the filtered rows except the header row and deleted them by right-clicking and choosing "Delete Row" from the context menu. We then removed the filter.
#We deleted all of the data columns except the average log fold change columns for each time point.
#We renamed the data columns with just the time and units.
#We used Find and Replace to search for "#DIV/0!", left the replacement field blank, and clicked "Replace All" to remove the #DIV/0! errors.
#We saved this spreadsheet as Text (Tab-delimited) (*.txt).
#We downloaded the stem.zip file and selected "Extract All" from the menu, creating a folder named stem.
#We clicked the "Profile GO Table" button to see the list of Gene Ontology terms and saved these files to our desktop.
#We clicked the "Profile Gene Table" button to see the list of genes belonging to each profile and saved these files to BOX.
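The worksheet clean-up steps above can be sketched with Python's standard `csv` module. These steps filter on the B-H p value, keep only the renamed average log fold change columns, remove #DIV/0! errors, and save the result as tab-delimited text. This is a hypothetical illustration, not our actual workflow. It assumes each gene is a dictionary keyed by column header, and the p value and fold change column names in the usage example are placeholders. The cut-off is passed in explicitly rather than hard-coded.

```python
import csv
import io
from typing import Dict, List


def prepare_stem_rows(
    rows: List[Dict[str, str]],
    p_column: str,
    fold_change_columns: Dict[str, str],
    cutoff: float,
) -> List[Dict[str, str]]:
    """Return STEM-ready rows.

    rows: one dict per gene, keyed by the worksheet headers.
    p_column: header of the B-H corrected p value column.
    fold_change_columns: maps each average log fold change header to its
        new short name (time and units).
    cutoff: genes with a p value above this are dropped.
    """
    kept = []
    for row in rows:
        # Same effect as filtering for p > cutoff and deleting those rows.
        if float(row[p_column]) > cutoff:
            continue
        out = {"SPOT": row["SPOT"], "Gene Symbol": row["Gene Symbol"]}
        for old_name, new_name in fold_change_columns.items():
            value = row[old_name]
            # Replace "#DIV/0!" with an empty cell, like Replace All with a blank.
            out[new_name] = "" if value == "#DIV/0!" else value
        kept.append(out)
    return kept


def to_tab_delimited(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder rows; column names and values are hypothetical.
    genes = [
        {"SPOT": "1", "Gene Symbol": "GENE1", "CHP_B-H_p-value": "0.000001",
         "CHP_AvgLogFC_t1": "1.5", "CHP_AvgLogFC_t2": "#DIV/0!"},
        {"SPOT": "2", "Gene Symbol": "GENE2", "CHP_B-H_p-value": "0.2",
         "CHP_AvgLogFC_t1": "0.1", "CHP_AvgLogFC_t2": "0.3"},
    ]
    renames = {"CHP_AvgLogFC_t1": "t1", "CHP_AvgLogFC_t2": "t2"}
    stem_rows = prepare_stem_rows(genes, "CHP_B-H_p-value", renames, cutoff=0.00001)
    print(to_tab_delimited(stem_rows, ["SPOT", "Gene Symbol", "t1", "t2"]), end="")
```

Writing to a string rather than directly to a file keeps the sketch easy to test. In practice, the same `DictWriter` call could target a file opened with `newline=""`.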

We supplemented the GO term analysis with the GO enrichment tool at [http://geneontology.org/ http://geneontology.org/], analyzing clusters 41 and 7. For each cluster, we opened the gene list .txt file in Excel, copied the genes into the "GO Enrichment Analysis" box, chose "Saccharomyces cerevisiae" from the drop-down menu, and clicked "Launch". We then exported each results table and saved it as a .txt file in BOX, linked here: [https://lmu.app.box.com/file/1512453796651 GO Terms 41], [https://lmu.app.box.com/file/1512450115120 GO Terms 7]

#*We ran YEASTRACT for clusters 41 and 7 following the protocol from [[Week 10]].
#*We opened the gene list file for cluster 41 in Excel and copied the list of gene IDs.
#*We went to the [http://www.yeastract.com/ YEASTRACT database] website, clicked "Rank by TF" in the left panel, and pasted our gene list for cluster 41 into the box labeled "ORFs/Genes".
#* We checked the box for ''Check for all TFs''.
#* We accepted the defaults for the Regulations Filter (Documented, DNA binding or expression evidence).
#* We didn't apply a filter for "Filter Documented Regulations by environmental condition".
#* We ranked genes by TF using: The % of genes in the list and in YEASTRACT regulated by each TF.
#* We clicked the ''Search'' button.
#* We copied the table of results from the web page into a new Excel workbook to preserve them, then uploaded it to BOX as a .txt file, linked here: [https://lmu.app.box.com/file/1512473661599 YEASTRACT 41]
#* We repeated these steps for cluster 7 and uploaded the results to BOX, linked here: [https://lmu.app.box.com/file/1512481065512 YEASTRACT 7]
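YEASTRACT's "Rank by TF" option ranks transcription factors by the percentage of the submitted genes that each TF regulates. The sketch below illustrates that ranking and the step of taking the top TFs from the list. It is not YEASTRACT's implementation. The percentage is taken over genes that are both in the list and in YEASTRACT, so the sketch assumes two hypothetical inputs: a `regulations` dictionary (TF name to set of documented target genes) and a `database_genes` set standing in for the genes YEASTRACT knows about. Ties are broken alphabetically, which is also an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple


def rank_tfs_by_percent(
    gene_list: Iterable[str],
    regulations: Dict[str, Set[str]],
    database_genes: Set[str],
) -> List[Tuple[str, float]]:
    """Rank TFs by the % of listed genes (that are in the database) they regulate.

    regulations: hypothetical map from TF name to its documented target genes.
    database_genes: hypothetical set of all genes present in the database.
    """
    genes = set(gene_list) & database_genes  # genes "in the list and in YEASTRACT"
    if not genes:
        return []
    ranked = []
    for tf, targets in regulations.items():
        hits = genes & targets
        if hits:  # only report TFs that regulate at least one listed gene
            ranked.append((tf, 100.0 * len(hits) / len(genes)))
    # Highest percentage first; ties broken alphabetically (an assumption).
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def top_tfs(ranked: List[Tuple[str, float]], n: int) -> List[str]:
    """Names of the first n TFs in the ranking (e.g. n=23 for profile 41)."""
    return [tf for tf, _ in ranked[:n]]


if __name__ == "__main__":
    # Toy data only; these are not real YEASTRACT regulations.
    regs = {"TF_A": {"G1", "G2", "G3"}, "TF_B": {"G1"}, "TF_C": {"G9"}}
    ranking = rank_tfs_by_percent(["G1", "G2", "G3", "G4"], regs, {"G1", "G2", "G3", "G4", "G9"})
    print(ranking)
    print(top_tfs(ranking, 1))
```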

4/24/24 notes (Milestone 4):
*P value filter: > 0.00001
*Significant STEM profiles: 41 and 7

We then chose the following 23 transcription factors from profile 41:
*Rpn4
*Gcn4
*Pdr1
*Xbp1
*Met28
*Mga2
*Spt23
*Bas1
*Yap1
*Sok2
*Msn2
*Crz1
*Rlm1
*Fhl1
*Pdr3
*Cbf1
*Rph1
*Met31
*Stp1
*Msn4
*Tec1
*Rgm1
*Stp2

==Acknowledgements==
This procedure was adapted from the Milestone 3 and 4 protocol on the [[Data Analysis]] page. The procedure for Milestone 3 was also adapted from the steps outlined on the [[Week 9]] assignment page, and the procedure for Milestone 4 from the steps outlined on the [[Week 10]] assignment page. [[User:Hivanson| Hailey Ivanson]], our Quality Assurance team member, was key to completing this milestone, and her help was very valuable.
Except for what is noted above, this journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 14:28, 23 April 2024 (PDT)

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 19:15, 2 May 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis

{{Yeast Beasts}}
[[Category:Journal Entry]]
[[Category:Team Project]]

Data Analysts Week 14

2024-05-03T19:11:22Z

Kmill104: /* Milestone 4 */


Data Analysts Week 15

2024-05-03T18:58:37Z

Kmill104: talk

===Milestone 5===
#We examined the YEASTRACT output for profile 41 in Excel and selected the first 23 transcription factors for analysis. We sent this list to the Coders/Designers, who found 22 of these factors in the database.
#[[User:Msymond1| Dean Symonds]] helped us create an input workbook for GRNmap using queries to the Microsoft Access database built by the Coders/Designers. The query designs are linked here: [https://lmu.app.box.com/folder/261861431194?s=y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query designs]

The queries were created to fill our workbook sheets as follows:
*production_rates
*degradation_rates
*optimization_parameters
*threshold_b

The input data is linked here: [https://lmu.app.box.com/file/1518231373852 GRNmap input data]

*This data was sent to Dr. Dahlquist, who ran GRNmap and uploaded the GRNmap output to BOX.

After obtaining our GRNmap results, we met as a group to work on our presentation and paper and to upload the remaining deliverables to our page.

==Acknowledgements==
This procedure was adapted from the Milestone 3 and 4 protocol on the [[Data Analysis]] page.

The procedure for Milestone 3 was also adapted from the steps outlined in the [[Week 9]] assignment page.

The procedure for Milestone 4 was also adapted from the steps outlined in the [[Week 10]] assignment page.

For Milestone 5, we followed the procedure listed on the [[Data Analysis]] page.
[[User:Hivanson| Hailey Ivanson]], our Quality Assurance team member, was key to completing these milestones, and her help was very valuable.

[[User:Msymond1| Dean Symonds]] assisted us in creating an input workbook for GRNmap using queries to the Microsoft Access database created by the Coders/Designers.

==References==
LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 15. Retrieved May 2, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_15

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis

{{Yeast Beasts}}
[[Category:Journal Entry]]
[[Category:Team Project]]

Except for what is noted above, this entry was completed by Katie and Charlotte and not copied from another source. [[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 19:10, 2 May 2024 (PDT)

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 11:58, 3 May 2024 (PDT)

Data Analysts Week 14

2024-05-03T18:58:13Z

Kmill104: /* Milestone 4 */ adding final steps


Data Analysts Week 14

2024-05-03T18:46:36Z

Kmill104: /* Milestone 4 */ adding link titles


Data Analysts Week 14

2024-05-03T18:45:37Z

Kmill104: /* Milestone 4 */ adding more steps

===Continuing Milestone 3===
[[User:Hivanson| Hailey Ivanson]] helped Katie and I with the Bonferroni and B-H values.
#We used the formula =IFCHP_Bonferroni_p-value>1,1,CHP_Bonferroni_p-value) and =IFControl_Bonferroni_p-value>1,1,Control_Bonferroni_p-value)
#we inserted a new worksheet named "CHP_ANOVA_B-H" and "Control_ANOVA_B-H."
#we copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from our previous worksheet into the first two columns of the new worksheet.
#We used Paste special > Paste values and copied our unadjusted p values from our ANOVA worksheet and pasted it into Column D.
#We selected all of columns A, B, C, and D and sorted by ascending values, smallest to largest.
#We typed "Rank" in cell E1. we typed "1" into cell E2 and "2" into cell E3. Then we selected both cells E2 and E3 and double clicked on the plus sign to fill the column with a series of numbers from 1 to 4697.
#[[User:Hivanson| Hailey Ivanson]] assisted us in calculating the Benjamini and Hochberg p value correction. We typed CHP_B-H_p-value and repeated for the control. We copied that equation to the entire column. =(D2*4697)/E2
#We then typed "CHP_B-H_p-value" into cell G1.
#In cell G2: we used the equation =IF(F2>1,1,F2) and copied that equation to the entire column.
#We selected columns A through G and sorted them in ascending order.
#We copied column G and used Paste special > Paste values to paste it into the next column of our ANOVA sheet.
#We zipped and uploaded the .xlsx file.
#We performed a sanity check by selecting row 1: Data > Filter > Autofilter- p value less than 0.05

Sanity Check Results:
*CONTROL
How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 4697)?
3699; 79%
<.01
3219; 68%
<.001
2558; 54%
<.0001
1921; 41%
<.00001
1325; 28%

*CHP
How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 4697)?
2863; 61%
<.01
2403; 51%
<.001
1884; 40%
<.0001
1435; 31%
<.00001
1076; 22%

Our suitable p-value cut-off was chosen to be <.00001, as it was the closet value to 25%

===Milestone 4===
#We inserted a new worksheet into our Excel workbook and named it "CHP_stem".
#We selected all of the data from the "CHP_ANOVA" worksheet and Paste special > paste values into the "CHP_stem" worksheet.
#"Master_Index" was renamed to "SPOT". Column B named "ID" was renamed to "Gene Symbol". We deleted the column named "Standard_Name".
#We filtered the data on the B-H corrected p value to be > 0.0001.
#We selected all of the rows except for the header row and deleted the rows by right-clicking and choosing "Delete Row" from the context menu. Then we undid this filter.
#We deleted all of the data columns except for the Average Log Fold change columns for each timepoint.
#We renamed the data columns with only the timepoint and its units.
#We used Find and Replace to search for "#DIV/0!" and clicked "Replace All" with the replacement field left blank, removing the #DIV/0! errors.
#We saved this spreadsheet as Text (Tab-delimited) (*.txt).
#We downloaded the stem.zip file and selected "Extract all" from the menu, creating a folder called stem.
#We clicked the "Profile GO Table" button to see the list of Gene Ontology terms, and saved these files to our desktop.
#We clicked on the "Profile Gene Table" button to see the list of genes belonging to each profile, and then downloaded these files to BOX.
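The worksheet steps for preparing the stem input (filter by the B-H cut-off, keep only the ID and average log fold change columns, clear #DIV/0! errors, save as tab-delimited text) can be sketched in Python with the standard `csv` module. This is an illustration under assumptions, not what we ran: the input column names other than MasterIndex and ID, the timepoint labels, and the sample rows are hypothetical placeholders, and the cut-off defaults to the 0.00001 value we chose.

```python
import csv
import io

# Hypothetical mapping from ANOVA-sheet column names to stem headers
# (timepoint and units only). Replace with the real column names.
LFC_COLUMNS = {"15m": "CHP_AvgLogFC_t15", "30m": "CHP_AvgLogFC_t30"}
P_COLUMN = "CHP_B-H_p-value"


def build_stem_rows(rows, cutoff=0.00001):
    """Keep genes at or below the B-H cut-off, with only ID and log fold change columns."""
    kept_rows = []
    for row in rows:
        if float(row[P_COLUMN]) > cutoff:
            continue  # equivalent to filtering for p > cutoff and deleting those rows
        kept = {"SPOT": row["MasterIndex"], "Gene Symbol": row["ID"]}
        for stem_name, anova_name in LFC_COLUMNS.items():
            value = row[anova_name]
            # Equivalent to Find and Replace of "#DIV/0!" with nothing.
            kept[stem_name] = "" if value == "#DIV/0!" else value
        kept_rows.append(kept)
    return kept_rows


def to_stem_text(rows):
    """Serialize rows as tab-delimited text with a header row."""
    buffer = io.StringIO()
    fieldnames = ["SPOT", "Gene Symbol", *LFC_COLUMNS]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows for illustration only (not real data).
SAMPLE_ROWS = [
    {"MasterIndex": "1", "ID": "GENE_A", P_COLUMN: "0.000001",
     "CHP_AvgLogFC_t15": "0.8", "CHP_AvgLogFC_t30": "#DIV/0!"},
    {"MasterIndex": "2", "ID": "GENE_B", P_COLUMN: "0.2",
     "CHP_AvgLogFC_t15": "0.1", "CHP_AvgLogFC_t30": "0.3"},
]

if __name__ == "__main__":
    print(to_stem_text(build_stem_rows(SAMPLE_ROWS)))
```

Keeping the filter, column selection, and error cleanup in one function makes it easy to rerun the whole preparation if the cut-off changes.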

We refined the GO term analysis with the GO enrichment tool at [http://geneontology.org/ http://geneontology.org/], analyzing clusters 41 and 7. For each cluster, we opened its gene list .txt file in Excel, copied the genes into the "GO Enrichment Analysis" box, chose "Saccharomyces cerevisiae" from the drop-down menu, and clicked "Launch". We then exported each results table, saved it as a .txt file in BOX, and linked the files here: [https://lmu.app.box.com/file/1512453796651], [https://lmu.app.box.com/file/1512450115120]

4/24/24 notes (Milestone 4):
*P value > 0.00001
*Significant profiles: 41 and 7
*Profile 41 transcription factors (GRNsight): Rpn4, Gcn4, Pdr1, Xbp1, Met28, Mga2, Spt23, Bas1, Yap1, Sok2, Msn2, Crz1, Rlm1, Fhl1, Pdr3, Cbf1, Rph1, Met31, Stp1, Msn4, Tec1, Rgm1, Stp2

==Acknowledgements==
This procedure was adapted from the Milestone 3 and 4 protocol on the Data Analysis page, linked here: [[Data Analysis]]. The procedure for Milestone 3 was also adapted from the steps outlined in the [[Week 9]] assignment page, and the procedure for Milestone 4 from the steps outlined in the [[Week 10]] assignment page. Our Quality Assurance, [[User:Hivanson| Hailey Ivanson]], was a key part of completing this milestone, and her help was very valuable.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 14:28, 23 April 2024 (PDT)

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 19:15, 2 May 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis

{{Yeast Beasts}}
[[Category:Journal Entry]]
[[Category:Team Project]]

Data Analysts Week 14

2024-05-03T18:36:26Z

Kmill104: /* Continuing Milestone 3 */ adding cut off value

Data Analysts Week 14

2024-05-03T18:34:32Z

Kmill104: /* Continuing Milestone 3 */ adding details

Data Analysts Week 15

2024-05-03T18:28:24Z

Kmill104: /* Acknowledgements */ updating format

Data Analysts Week 15

2024-05-03T18:27:14Z

Kmill104: /* Milestone 5 */ updating procedure

===Milestone 5===
#We examined the YEASTRACT output for profile 41 in Excel, and selected the first 23 transcription factors for analysis. This list was sent to the Coder/Designers, who found 22 of these factors in the database.
#[[User:Msymond1| Dean Symonds]] assisted us in creating an input workbook for GRNmap using queries to the Microsoft Access database created by the Coders/Designers. The query designs are linked here: [https://lmu.app.box.com/folder/261861431194?s=y6x97kqcjlxfezhnbpvck4h4oe4pm366]

The queries were created to fill our workbook sheets as follows:
*production_rates
*degradation_rates
*optimization_parameters
*threshold_b

The input data is linked here: [https://lmu.app.box.com/file/1518231373852]

*This data was sent to Dr. Dahlquist, who ran GRNmap and uploaded the GRNmap output to BOX.
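Our Access query designs are linked above. As a rough illustration of what such a query does, the sketch below uses Python's built-in `sqlite3` module in place of Microsoft Access; the table name `gene`, its columns, the gene names, and the rate values are all hypothetical placeholders. It looks up degradation rates for a list of transcription factors and reports which names are missing from the database, similar to how 22 of our 23 factors were found.

```python
import sqlite3


def degradation_rates(conn, tf_names):
    """Return (standard_name, degradation_rate) rows for the requested factors."""
    placeholders = ", ".join("?" for _ in tf_names)
    query = (
        "SELECT standard_name, degradation_rate FROM gene "
        f"WHERE standard_name IN ({placeholders}) ORDER BY standard_name"
    )
    return conn.execute(query, list(tf_names)).fetchall()


def missing_factors(conn, tf_names):
    """List requested factors that have no row in the gene table."""
    found = {name for name, _ in degradation_rates(conn, tf_names)}
    return [name for name in tf_names if name not in found]


def demo_connection():
    """Build an in-memory database with placeholder rows (not real rates)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (standard_name TEXT PRIMARY KEY, degradation_rate REAL)")
    conn.executemany("INSERT INTO gene VALUES (?, ?)", [("TF_A", 1.0), ("TF_B", 2.0)])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    requested = ["TF_A", "TF_B", "TF_C"]
    print(degradation_rates(conn, requested))
    print(missing_factors(conn, requested))
```

Using `?` placeholders rather than pasting names into the SQL string keeps the query safe when gene names contain unexpected characters.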

After obtaining our GRNmap results, we met as a group to work on our presentation and paper and to upload the remaining deliverables to our page.

==Acknowledgements==
This procedure was adapted from the Milestone 3 and 4 protocol on the Data Analysis page, linked here: [[Data Analysis]]. The procedure for Milestone 3 was also adapted from the steps outlined in the [[Week 9]] assignment page, and the procedure for Milestone 4 from the steps outlined in the [[Week 10]] assignment page. For Milestone 5, we followed the procedure listed on [[Data Analysis]]. Our Quality Assurance, [[User:Hivanson| Hailey Ivanson]], was a key part of completing these milestones, and her help was very valuable. [[User:Msymond1| Dean Symonds]] assisted us in creating an input workbook for GRNmap using queries to the Microsoft Access database created by the Coders/Designers.

==References==
LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 15. Retrieved May 2, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_15

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis

{{Yeast Beasts}}
[[Category:Journal Entry]]
[[Category:Team Project]]

Except for what is noted above, this entry was completed by Katie and Charlotte and not copied from another source. [[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 19:10, 2 May 2024 (PDT)

Yeast Beasts

2024-05-03T04:53:20Z

Kmill104: /* Charlotte's Reflection Week 14/15: */ uploading weeks 14/15 reflection

* This page will be the main place from which the Yeast Beasts team project will be managed. Include all of the information/links that you think will be useful for your team to organize your work and communicate with each other and with the instructors. ''Hint: the kinds of things that are on your own User pages and on the course Main page can be used as a guide.''

[[Media:Yeast_Beasts_Presentation.pdf | Final version of presentation]]

[[Yeast Beasts Deliverables]]

==Week Reflections==
===Week 13===
====[[User:Hivanson|Hailey's]] Reflection [Quality Assurance]====
*I worked closely with Charlotte and Katie toward completing Milestones 2 and 3. We completed milestone 2 and are close to completing milestone 3.
*I thought it worked well to split up, with Natalija going with the coder/designers and me going with data analysis, ''but'' I would love to see where they are at on their progress so that we can join up for the upcoming milestone 4. I want to do this on or before next Tuesday, April 30th.
*It did not work to try to do tasks simultaneously with the data analysts. To fix this, we had one person with an open Excel sheet on their computer, another reading and checking off the steps, and another checking that all of the data and equations were being entered properly. This solution worked well for us and we will continue to have just one computer with Excel open on it, but switching roles between the person inputting data and the one checking off steps could be better for the future.

[[User:Hivanson|Hivanson]] ([[User talk:Hivanson|talk]]) 23:35, 17 April 2024 (PDT)

====Andrew's Reflection [Coder/Designer]====
To find my electronic notebook for this week please click on [[Asandle1 Week 13#Electronic Lab Notebook|Andrew Sandler's Week 13 Lab Notebook]]
'''Executive Summary'''

#Classified significant p values as 1 (p < 0.01) and all others as 0.
#Found issues with data including missing gene descriptions.
#Initially tried to use YeastMine to find the missing gene information, but it was inefficient.
#Found additional blanks in the dataset and need to speak with Dr. Dahlquist about how to solve this issue.
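The 1/0 classification in item 1 can be sketched as below. This is a minimal illustration, not the formula that was actually used: it assumes p values arrive as spreadsheet cell text, and it returns None for blanks, #REF!, stray text, and NaN instead of 0, so that the data problems noted in this reflection are not silently counted as non-significant. The function names are hypothetical.

```python
import math


def parse_p_value(cell):
    """Convert a cell to a float p value, or None if the cell is unusable."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None  # blank cells, "#REF!", or stray text
    return None if math.isnan(value) else value


def significance_flag(cell, alpha=0.01):
    """Return 1 if p < alpha, 0 if p >= alpha, or None if the cell is unusable."""
    p = parse_p_value(cell)
    if p is None:
        return None
    return 1 if p < alpha else 0


if __name__ == "__main__":
    cells = ["0.004", "0.2", "", "#REF!", "NaN"]
    print([significance_flag(c) for c in cells])
```

Returning None keeps problem cells visible so they can be reviewed instead of being mixed in with genuine 0 values.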

'''What worked?'''
Everything "worked" but some surprises came up.

'''What didn't work?'''
Nothing failed outright, but several data issues were challenging: #REF! cells, blank cells, random text in some cells, and NaN values in some spots. I don't know how to deal with these and will need help from Dr. Dahlquist so that I am not just guessing at a solution. I also need to figure out how to get a complete gene ID list and then compare it against the missing IDs. I also need that list of IDs for the Access database.

'''What will I do next to fix what didn't work?'''
I plan on speaking with Dean and Dr. Dahlquist in class tomorrow to fix these issues and then move onto working on the Access section of this assignment.

[[Category:Journal Entry]]
[[Category:Team Project]]

====Dean's Reflection [Coder/Designer]====
# This week, my partner and I completed milestones 1 and 2 and are currently working on milestone 3. Milestone 3 has some complications: a large part of it requires Microsoft Access, and there are also some issues importing tables into Excel. [[MSymond1]]
# Each team member should reflect on the team's progress:
## What worked well was cleaning up the data for the network table, which we did in class on Tuesday. The data table looks much more organized, and all of the p values have been successfully converted.
## The other data tables are not pasting into Excel as neatly as anticipated, and I do not know how to obtain the data from the YeastMine website.
## To fix these issues, I will ask Dr. Dahlquist for further advice in class on Thursday.
[[User:Msymond1|Msymond1]] ([[User talk:Msymond1|talk]]) 13:33, 18 April 2024 (PDT)

====Katie's Reflection====
#This week, Charlotte, Hailey, and I worked on completing Milestones 2 and 3. These milestones consisted of preparing the dataset from SGD for analysis and then performing an ANOVA analysis as we had done in Week 9. A more detailed summary of the steps we followed is outlined on Charlotte's and my individual page, linked below.
#* [[Data Analysts Week 13]]
#The data analysts, Charlotte and I, worked together with Hailey on Milestones 2 and 3 on the Data Analysis page. We contacted each other throughout the week to check in on what each person was doing. We then met in person to work together on performing the ANOVA analysis. This worked well, because when we couldn't meet we were still able to get some work done, and once we got together we were able to ask any questions that we had. It was slightly difficult to progress through the steps in person because only one person could make changes to the dataset at a time. I don't believe this issue can be fixed, since the steps must be followed in a specific order and multiple people cannot work on them at exactly the same time. In the future, we will continue to split up the steps so that each person does an equal amount of work, and to communicate about any questions that we have or can answer.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 23:09, 17 April 2024 (PDT)

====Charlotte's Reflection:====
[[User:Kmill104| Katherine Miller]] and I, being the data analysts, worked with Quality Assurance [[User:Hivanson| Hailey Ivanson]] to complete Milestone 2 and Milestone 3 in person on April 17th, 2024. We messaged the Coder/Designers and got an update from them. I wrote out the steps taken on our [[Data Analysts Week 13]] page. It was helpful that we were able to meet in person to collaborate. However, it was hard to make changes to the data since we were working on one computer. We ended up splitting up the work well, but at first everyone trying to make edits at once was hard. Now we know a system that works for us as a group.

[[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 14:00, 18 April 2024 (PDT)

===Week 14===
===Week 15===
====Dean's Reflection====
# This week and last week, the entire group and I completed milestones 3-6, as well as the rest of the deliverables for the final project.
# Each team member should reflect on the team's progress:
## What worked well was creating the database, getting it well organized, and working out its bugs and issues. Running the queries the way I built them also worked very well and was very quick once the database issues were resolved. Writing the final project report also went well, since we had already presented the project.
## What did not work well was collaborating on the queries within the Coder/Designer group. Andrew first tried a much more complicated approach that required typing all of the syntax by hand in SQL view, and there was little to no communication between us about how the queries were done or what still needed to be done.
## To fix these issues, I made sure that for the rest of the project we all collaborated and communicated well for the final presentation and project.

# Each person needs to write a short executive summary of that person's progress on the project for the week, with links to the relevant individual journal pages (which will have more detailed information).
# Each team member should reflect on the team's progress:
## What worked?
## What didn't work?
## What will I do next to fix what didn't work?
# Note that you will be directed to add specific information to your team's pages in the individual portion of the assignment for this and future weeks.

====Charlotte's Reflection Week 14/15:====
[[User:Kmill104| Katherine Miller]] and I, being the data analysts, worked with Quality Assurance [[User:Hivanson| Hailey Ivanson]] to complete Milestones 3 and 4 in person during Week 14. We created our ANOVA sheet, calculated the Bonferroni and B-H corrected p values, and performed a sanity check. We then created a CHP_stem worksheet, removed genes whose B-H corrected p values were above our cut-off, and got rid of any #DIV/0! errors. We saved this as a text file and went on to work on our GO and gene tables. We found profiles 41 and 7 to be significant. We had trouble creating the queries for Milestone 5, so [[User:Msymond1| Dean Symonds]] helped us create the queries needed to run GRNmap. Because we had experience from previous assignments, creating our CHP_stem file, filtering p values, and choosing our transcription factors worked well and did not raise many questions. What we struggled with was creating the queries Dr. Dahlquist needed to run GRNmap; we could not have done this on our own without Dean's help. It would have been easier to resolve these issues if all six of us could have met at the same time, although that was understandably difficult with our varying schedules.

[[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 19:21, 2 May 2024 (PDT)

====Katie's Reflection Weeks 14/15====
Because the wiki was down during Week 14, we combined our reflections for Weeks 14 and 15. During Week 14, Charlotte, Hailey, and I worked on completing Milestone 3 on the Data Analysis page, which contained instructions for performing the ANOVA analysis on our new log2-transformed data. After determining a suitable p-value cut-off, we filtered our data to that value. We then completed Milestone 4, which involved clustering with stem and then putting that data into YEASTRACT. More detailed instructions can be found in our Data Analysts Week 14 journal entry. We worked together better this week: with the ANOVA data, we were able to use Microsoft Excel online and work at the same time to compute our ANOVA values. We still had some trouble dividing up the steps for obtaining our stem and YEASTRACT data, as only one person could run the data through these programs on their computer.

During Week 15, we completed Milestone 5 with Dean, who helped design queries to obtain our GRNmap input data. A more detailed description can be found on our Data Analysts Week 15 page and the Coders' individual journal pages. After we sent the input data to Dr. Dahlquist, she ran GRNmap and uploaded the output data. We then analyzed our results and worked as a group to complete our presentation and the group report, and we uploaded all of the deliverables outlined by Dr. Dahlquist. We had some trouble finding a time when everybody could meet, but we worked both in person and virtually to complete everything for the project. If I were to do this project again, I would organize our time a little better, as we were completing it during finals week and had limited opportunities.

===Andrew's Section===
1. What worked?

Everything “worked,” but we noticed we could save time by adding both the gene name and the systematic name to the gene table.

2. What didn't work?

The class website did not work.

3. What will I do next to fix what didn't work?

I don't have a fix. Because the site was down, the notes I took did not save, so I lost some of my detailed notes on the methods for the past week. I am trying to determine what I would have saved them as.

Yeast Beasts Deliverables

2024-05-03T02:35:40Z

Kmill104: fixing typo

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap Input Workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to Box with GRNmap Output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 Database Diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'' so that all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T02:35:10Z

Kmill104: fixing capitals

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap Input Workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 Database Diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'' so that all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T02:33:47Z

Kmill104: uploading ppt

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'' so that all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

File:ANOVAslides.pdf

2024-05-03T02:32:34Z

Kmill104: ANOVA, STEM, and network results

== Summary ==
ANOVA, STEM, and network results

Data Analysts Week 15

2024-05-03T02:17:01Z

Kmill104: /* Milestone 5 */ adding more detail

===Milestone 5===
#We created a gene regulatory network for GRNmap using the Access database that the Coders/Designers made.
#[[User:Msymond1| Dean Symonds]] assisted us in creating an input workbook for GRNmap using queries to the Microsoft Access database created by the Coders/Designers.

The queries were created to fill our workbook sheets as follows:
*production_rates
*degradation_rates
*wt_log2_expression (based on the CHP-treated data)
*network

*We sent our data to Dr. Dahlquist and she helped us interpret the data.
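The network sheet can be thought of as an adjacency matrix built from regulator-target pairs. The sketch below assumes a layout with target genes as rows and regulators as columns, with 1 marking a regulatory connection and 0 otherwise; this layout, the function name, and the example gene names are assumptions, so the GRNmap documentation should be checked before relying on it.

```python
def network_matrix(genes, edges):
    """Build a square 0/1 matrix: rows are target genes, columns are regulators.

    `edges` is an iterable of (regulator, target) pairs. Pairs that mention a
    gene outside `genes` are skipped.
    """
    index = {gene: k for k, gene in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for regulator, target in edges:
        if regulator in index and target in index:
            matrix[index[target]][index[regulator]] = 1
    return matrix


if __name__ == "__main__":
    genes = ["A", "B", "C"]
    for gene, row in zip(genes, network_matrix(genes, [("A", "B"), ("B", "C"), ("A", "D")])):
        print(gene, row)
```

Skipping pairs with unknown genes mirrors what happens when a factor is not found in the database: it simply drops out of the network.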

==Acknowledgements==
This procedure was adapted from the Milestone 3 and 4 protocol on the Data Analysis page, linked here: [[Data Analysis]]. The procedure for Milestone 3 was also adapted from the steps outlined in the [[Week 9]] assignment page, and the procedure for Milestone 4 from the steps outlined in the [[Week 10]] assignment page. For Milestone 5, we followed the procedure listed on [[Data Analysis]]. Our Quality Assurance, [[User:Hivanson| Hailey Ivanson]], was a key part of completing these milestones, and her help was very valuable. [[User:Msymond1| Dean Symonds]] assisted us in creating an input workbook for GRNmap using queries to the Microsoft Access database created by the Coders/Designers.

==References==
LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 15. Retrieved May 2, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_15

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis

{{Yeast Beasts}}
[[Category:Journal Entry]]
[[Category:Team Project]]

Except for what is noted above, this entry was completed by Katie and Charlotte and not copied from another source. [[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 19:10, 2 May 2024 (PDT)

Data Analysts Week 14

2024-05-03T02:15:14Z

Kmill104: /* Acknowledgements */ adding signature

Yeast Beasts Deliverables

2024-05-03T02:08:07Z

Kmill104: uploading sample data relationship table

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'' so that all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

File:Sample Data Table.xlsx

2024-05-03T02:06:49Z

Kmill104: project sample data relationship table

== Summary ==
project sample data relationship table

Yeast Beasts Deliverables

2024-05-03T01:56:23Z

Kmill104: adding GO and gene list zipped files

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T01:54:37Z

Kmill104: uploading box link to yeastract results

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T01:53:02Z

Kmill104: fixing link

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'')
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T01:50:59Z

Kmill104: typo fix

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[ANOVA_STEM.xlsx|ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'')
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T01:50:37Z

Kmill104: updating file

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[ANOVA_STEM.xslx|ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'')
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap input workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to box with GRNmap output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 database diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

File:ANOVA STEM.xlsx

2024-05-03T01:49:36Z

Kmill104: project anova and stem spreadsheet

== Summary ==
project anova and stem spreadsheet

Data Analysts Week 15

2024-05-03T00:58:30Z

Kmill104: removing presentation link

Yeast Beasts Deliverables

2024-05-02T22:20:06Z

Kmill104: creating deliverables page

Yeast Beasts

2024-05-02T22:19:32Z

Kmill104: adding deliverables page

Final Project

2024-05-02T18:08:59Z

Kmill104: removing presentation link

{{Final Project Links}}

== Overall Project Goal ==

# To create a Microsoft Access database that contains microarray data and gene regulatory network data from published datasets that will be incorporated into the GRNsight gene expression database.
# To analyze the published microarray data and model candidate gene regulatory networks with GRNmap to gain new insights.
# To perform quality assurance of current GRNsight databases and the one created in this class.

The class will work as a team to achieve these goals. Each member of the team will have a specific role (in a ''guild''), detailed below. Even though team members have a specific role, you are also expected to work with each other as a whole team, in order to keep each component of the project coordinated and consistent.

== Group Deliverables ==

* You will give a final group [[Final Project Deliverables#PowerPoint_Presentation | PowerPoint presentation]] in class during the final exam period on '''Thursday, May 2, 11:00 AM'''.
* Final due date for all other deliverables is no later than '''Friday, May 3, 12:00 noon'''.
* The deliverables should be uploaded and organized onto one group wiki page.
* Detailed specifications, particularly for the [[Final Project Deliverables#Group Report|group report]] and [[Final Project Deliverables#Individual Assessment and Reflection|individual assessment and reflection]], are given on the [[Final Project Deliverables]] page.
* There are group and individual components to the grade for this project. For the group component, each member of the group receives the same grade. The individual component will be assessed based on individual effort and contributions.
* The Group Report is worth 30 points.
* The Group PowerPoint presentation is worth 60 points.
* All other deliverables listed on the [[Final Project Deliverables | Deliverables]] page, together, are worth 60 points.

=== Individual Deliverable ===

The individual deliverable is an [[Final Project Deliverables#Individual Assessment and Reflection|assessment and reflection]] on the process submitted via email to Dr. Dahlquist by the deadline (10 points):
* Statement of work
* Assessment of the work done, including justification for the group and individual components of the grade
* What was learned

== Team Journal Entries ==

* Each team will write a combined journal entry for each week with contributions from all members.
* [[Week 12]] Prepare for journal club presentation, reflections
* [[Week 13]] Creation of team page, executive summaries of progress, reflections
* [[Week 14]] Executive summaries of progress, reflections
* [[Week 15]] Organized deliverables page, prepare for final presentation

The project roles are:
* Team Name: Yeast Beasts
** Project Manager: Hailey
** Quality Assurance: Hailey & Natalija
** Data Analysis: Charlotte & Katie
** Coder/Designer: Andrew & Dean

== Roles (Guilds) ==

As the project moves forward, we will use class time for team meetings/work sessions. Each student has been assigned a primary role in the project by the instructors (see above).

=== [[Project Manager]] (PM) ===

The project manager makes sure that individuals are fulfilling their roles and performing the tasks on time.

=== [[Quality Assurance]] (QA) ===

The QA team member is the link between the Coder/Designers and the Data Analysts. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analysts. The QA will also perform checks on the GRNsight database.

=== [[Data Analysis]] (DA) ===

The role of the Data Analyst will be to apply the data analysis pipeline that you learned by analyzing the Dahlquist Lab microarray dataset to complete the analysis of a different published yeast timecourse microarray dataset. The Data Analysts are the end-users of the project, ultimately determining whether the work of the coder/designer and quality assurance members is useful to them.

=== [[Coder/Designer]] (CD) ===

The Coder/Designer is responsible for creating the Microsoft Access database based on the microarray data from the Data Analysts which will eventually be made available on the GRNsight website. The Coder/Designer is also the resident expert on the technology being used—assorted software, file management, version control, and troubleshooting. He or she coordinates with the QA to make sure that the database is accurate.

== Project Milestones ==

Specific project milestones are found on the individual guild pages.

* [[Project Manager#Milestones|Project Manager]]
* [[Quality Assurance#Milestones|Quality Assurance]]
* [[Data Analysis#Milestones|Data Analysis]]
* [[Coder/Designer#Milestones|Coder]]

{{Final Project Links}}

[[Category:Team Project]]

Final Project Deliverables

2024-05-02T18:07:06Z

Kmill104: /* PowerPoint Presentation */ removing presentation link

{{Final Project Links}}
== Deliverables Checklist ==

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file)
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'')
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'')
# GRNmap input workbook (''.xlsx'')
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together
# MS Access database, including all tables (''.accdb'')
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'')
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'')
# Electronic notebook corresponding to these microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'': all manipulations of the data and files are documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.
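For the query-design deliverable, the SQL route can be sketched outside MS Access. The example below is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical long-format table named <code>expression</code> with columns <code>standard_name</code>, <code>timepoint</code>, <code>replicate</code>, and <code>log2_fc</code>; your Access tables and column names will differ. The output layout (an <code>id</code> column followed by one column per replicate, each headed by its timepoint) is an assumed GRNmap expression-sheet layout that should be checked against the GRNmap documentation.

```python
import sqlite3

# Hypothetical long-format table (an assumption, not the team's actual Access schema):
# one row per gene, timepoint (minutes), and replicate.
SCHEMA = """
CREATE TABLE expression (
    standard_name TEXT    NOT NULL,
    timepoint     INTEGER NOT NULL,
    replicate     INTEGER NOT NULL,
    log2_fc       REAL    NOT NULL
);
"""


def expression_sheet(conn, genes):
    """Build a GRNmap-style sheet: a header row, then one row per network gene.

    Each column holds one replicate at one timepoint, and the header repeats
    the timepoint once per replicate (assumed layout). Missing values are None.
    """
    placeholders = ", ".join("?" for _ in genes)
    query = (
        "SELECT standard_name, timepoint, replicate, log2_fc "
        "FROM expression "
        f"WHERE standard_name IN ({placeholders}) "
        "ORDER BY standard_name, timepoint, replicate"
    )
    values = {}
    columns = set()
    for name, t, rep, fc in conn.execute(query, list(genes)):
        values[(name, t, rep)] = fc
        columns.add((t, rep))
    columns = sorted(columns)  # order columns by timepoint, then replicate
    header = ["id"] + [str(t) for t, _ in columns]
    rows = [
        [gene] + [values.get((gene, t, rep)) for t, rep in columns]
        for gene in sorted(genes)
    ]
    return [header] + rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?, ?)",
        [("ACE2", 15, 1, 0.5), ("ACE2", 15, 2, 0.7), ("ACE2", 30, 1, -0.2),
         ("CIN5", 15, 1, 1.1), ("CIN5", 30, 1, 0.9)],
    )
    for row in expression_sheet(conn, ["ACE2", "CIN5"]):
        print(row)
```

With the toy rows above, the header is <code>['id', '15', '15', '30']</code> and CIN5 gets <code>None</code> for its missing second replicate at 15 minutes, which is the kind of gap the QA check should catch before the workbook goes to GRNmap. The gene names and values are illustrative only.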

=== Grading ===

All members of the group will be assigned the same grade for the group work. However, the instructor reserves the right to make adjustments to an individual's grade based on their contributions to the group work.
* The Group Report is worth 30 points.
* The Group PowerPoint presentation is worth 60 points.
* All other deliverables listed above, together, are worth 60 points.

This is assessed on an individual basis.
* The Individual Assessment and Reflection is worth 10 points.

== Group Report ==

* The report should be written with contributions from all group members.
* Submit as ''.doc'', ''.docx'' or ''.pdf'' file.

=== Style Sheet ===

Use the following guidelines when formatting your report:
* 2.54 cm (1 in) margins on all sides
* Double-spaced
* 12 point Times/Times New Roman font
* Number the pages on the lower-right corner
* Use left justification (“jagged” on the right side)

=== Title Page ===

Include the following information in a standalone title page:
* A descriptive title for your project
** The function of the title is to identify the main result or take-home message of the paper. It should be as specific as possible. It can be a phrase or a sentence. What is the main result of your paper that you want to convey with the title?
* The names of the team members (with middle initials)
* The course number and title of the class
* The date of submission

=== Introduction ===

The introduction gives the background information necessary to understand your report and should be '''2-3''' pages long (double-spaced). The introduction should be in the form of a logical argument that “funnels” from broad to narrow:

<gallery mode="nolines" widths=322px heights=256px>
Funnel.jpg
</gallery>

* States importance of the problem
* States what is known about the problem
** Give an overview of what is known about the regulation of the transcriptional response in yeast from your team's journal article.
** Discuss how you will approach this by re-analyzing the yeast microarray data from your article.
* States what is unknown about the problem
** Which transcription factors belong in the gene regulatory network that controls the response in yeast is still unknown.
* States clues that suggest how to approach the unknown
** Introduce GRNmap and GRNsight as the answer to this problem.
* States the question the paper is trying to address
** In this case you want to discover new information about the microarray data using GRNmap and GRNsight.
* State the approach
** Creating an MS Access database to facilitate making a GRNmap input workbook that can be used by your group and other scientists in the future.

=== Combined Methods/Results/Discussion section ===

This section will summarize the entire workflow and findings for the project with contributions from all team members (Data Analyst, QA, and Coder/Designers). Create a combined flow chart of the tasks/milestones of each of the team members and then briefly describe the flow chart in the text.
* Number each of the figures sequentially and number each of the tables sequentially in order from first mention in the text. You can either embed your figures and tables in the appropriate place in the text or put them all at the end. Do not mix both styles, however.
* Write a descriptive legend for each figure and table that briefly states what the figure/table is and gives a brief key to any labels and abbreviations.

==== Data Analyst ====

* Table of ANOVA results, discussing the interpretation of the p values.
** With an ANOVA p value < 0.05, do more than 5% of the genes show a significant change in gene expression at any timepoint?
** Compare with what the authors of the paper considered a meaningful gene expression change.
* From the STEM analysis, include as figures the overall results (the screenshot showing all of the clusters) and then focus on the ones you interpreted.
** Were there clusters shown in your journal article? How do they compare with the ones you got?
** Include a table showing the GO results for that cluster (just the narrowed down list of terms that you have interpreted).
** Discuss what the p values for the cluster and for the GO term list mean.
** Discuss the biological interpretation of your GO terms.
** Were there GO terms in your article? Compare your terms with the article.
* Include a table that lists the transcription factors in your final network and their enrichment p value from YEASTRACT.
** Describe how and why you and your partner chose these transcription factors for your network.
** Include a figure (screenshot) of the unweighted networks visualized with GRNsight.
** What transcription factors were mentioned in your article? Are yours the same or different?
* GRNmap results
** Show the GRNsight visualization (screenshot) of the weighted networks, making sure that the genes are placed in the same relative positions as in the unweighted network figure.
** Provide the LSE:minLSE ratio
** Provide a table of the weights, P's, and b's
** Organize and show the individual expression plots
** Interpret the results of the model
*** What seem to be the most important transcription factors in the network? How does that compare with the journal article?
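Two of the quantitative checks above can be sketched in a few lines. The reasoning behind the 5% question is that if no gene truly changed, p values would be roughly uniform, so about 5% of genes would still fall below 0.05 by chance; a count well above that suggests real changes. For the LSE:minLSE ratio, this sketch assumes minLSE is the error of a "model" that passes exactly through the mean of the replicates at each timepoint (the best any curve can do given replicate noise), and that both errors are summed over the same data points, so any per-point normalization GRNmap applies cancels in the ratio. All function names here are hypothetical, not part of GRNmap.

```python
def fraction_significant(p_values, alpha=0.05):
    """Count genes with ANOVA p < alpha and return (count, fraction of all genes)."""
    count = sum(1 for p in p_values if p < alpha)
    return count, count / len(p_values)


def expected_by_chance(n_genes, alpha=0.05):
    """Genes expected below alpha if no gene truly changed (null p values are uniform)."""
    return alpha * n_genes


def lse(observed, predicted):
    """Sum of squared errors between replicate data and model values.

    observed:  {timepoint: [replicate log fold changes]}
    predicted: {timepoint: model value at that timepoint}
    """
    return sum((y - predicted[t]) ** 2 for t, ys in observed.items() for y in ys)


def min_lse(observed):
    """Smallest possible error: a model passing through each timepoint's mean."""
    total = 0.0
    for ys in observed.values():
        mean = sum(ys) / len(ys)
        total += sum((y - mean) ** 2 for y in ys)
    return total


def lse_ratio(observed, predicted):
    """LSE:minLSE; values near 1 mean the fit is close to the replicate-noise floor."""
    return lse(observed, predicted) / min_lse(observed)
```

For example, with replicates <code>{15: [1.0, 3.0], 30: [0.0, 2.0]}</code> and model values <code>{15: 2.5, 30: 1.0}</code>, <code>lse</code> is 4.5, <code>min_lse</code> is 4.0, and the ratio is 1.125. These numbers are toy values, not project results.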

==== Coder/Designers ====

* Give a narrative description of the database, including the design of the expression tables and metadata table(s). Include a figure of the schema.
* Provide a database schema.

==== QA ====

* Describe the QA process for the database, noting any issues that were encountered, and stating whether data are complete or not.
** Refer to the sample-data relationship table to discuss how microarray data format was regularized
** Discuss whether all yeast gene IDs were imported into the database and any formatting issues
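The gene-ID completeness check can be sketched as a set difference plus a format screen. This is a minimal sketch, not the course's QA protocol: <code>missing_ids</code> and <code>suspicious_ids</code> are hypothetical helpers, and the regular expression is a simplified pattern for nuclear ORF systematic names (such as YAL001C or YAL044W-A) that does not cover mitochondrial IDs or non-ORF features.

```python
import re

# Simplified pattern for nuclear ORF systematic names such as YAL001C or YAL044W-A
# (assumption: mitochondrial IDs like Q0010 and non-ORF features are not covered).
SYSTEMATIC_NAME = re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?")


def missing_ids(source_ids, database_ids):
    """Return IDs present in the source file but absent from the database, sorted."""
    return sorted(set(source_ids) - set(database_ids))


def suspicious_ids(ids):
    """Return IDs that do not look like clean systematic names (stray spaces, case, typos)."""
    return [i for i in ids if not SYSTEMATIC_NAME.fullmatch(i)]
```

Running <code>missing_ids</code> on the ID column of the original data file and on the ID column exported from the database gives the genes that were dropped on import; running <code>suspicious_ids</code> on the database column flags entries such as <code>" YBR020W"</code> (leading space) or <code>"yal002w"</code> (lowercase) that would silently fail to match in a query.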

=== Conclusions ===

* Write a '''1-2''' page conclusion that summarizes the overall project and your findings.
** Overall, what have you learned about the biological phenomenon described in your journal club article?
** How does the new database facilitate the data analysis?
** What future directions would you take if you were to continue this project?
** Relate the results of your project to the papers you presented for journal club.
*** Did you discover anything new that wasn't reported in the journal club paper (Data Analyst/QA)?

=== Acknowledgments ===

Write a short paragraph acknowledging the assistance of anyone who is not a member of your team.

=== References ===

* This section lists all of the references cited in the text of the report (and only those references cited in the paper). Follow the [[Media: BIOL367_Spring2024_GuidelinesforLiteratureCitations.pdf | Guidelines for Literature Citations in a Scientific Paper]] handout for general principles.
* Remember that you need to cite anything for which you are not the original source. Generally, in the introduction, you should aim for a minimum of two in-text citations per paragraph. You may reference the course web site using the appropriate format for a web reference.
* List your references in alphabetical order by first author using the [https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/reference_list_basic_rules.html APA style] that you have been using throughout the semester. Note that the proper format must be followed for both articles and websites; URLs alone are not sufficient.

== PowerPoint Presentation ==

Each team of students will prepare and give a 20-minute PowerPoint presentation to report the results of their project during the final exam period on Thursday, May 2, at 11:00 AM.
* Please follow the [http://brightspace.lmu.edu Presentation Guidelines on Brightspace] for how to format your slides.
* You will need to prepare ~20 slides (assume 1 slide per minute of presentation) and include the following content:
** Title slide that gives the main take-home message as the title of your presentation, the authors, date, and venue (course number and title).
** Outline slide that is a summary of take-home messages of your talk (should mirror your conclusion slide)
** The body of your talk (organized in a logical flow, not necessarily in the order given below).
*** Introduce the importance/significance of the problem and give background on the experiment in your journal article. You can draw from your journal club presentation. This should follow the logic of the Introduction section of your group paper.
*** Show a combined flow chart of the tasks/milestones completed by your group (see Materials and Methods of group paper).
*** Show the table of ANOVA results.
*** Show the screenshot of the overall clustering results and the cluster you focused on.
*** Show a table of the GO results from that cluster, giving an interpretation.
*** Show the table of regulatory transcription factors in your network and their p values for enrichment.
*** Show the unweighted and weighted networks in GRNsight (with the genes arranged in the same way in each figure)
*** Database schema diagram for the MS Access database
** Conclusion slide that mirrors your outline
** Future directions
** Acknowledgments
** References
* '''''Your PowerPoint slides must be uploaded to the wiki and linked to from your individual journal page and your team page by 10:00am, Thursday, May 2.'''''
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
* Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the [[Presentation Rubric]].

== Individual Assessment and Reflection ==

Each person on the team will complete an assessment and reflection ''individually''. If you are comfortable with making this assessment publicly available, you may write it up as a wiki page or as a Word or PDF document uploaded to your group deliverables page. If you prefer to communicate your assessment privately, then email this to Dr. Dahlquist.

=== Statement of Work ===

* Describe exactly what you did on the project.
* Provide references or links to artifacts of your work, such as:
** Wiki pages
** Other files or documents
** Code or scripts

=== Assessment of Project ===

* Give an objective assessment of the success of your project workflow and teamwork.
* What worked and what didn't work?
* What would you do differently if you could do it all over again?
* Evaluate your team’s portion of the Final Project and Group Report in the following areas:
*# Content: What is the quality of the work?
*# Organization: Comment on the organization of the project and of your group's wiki pages.
*# Completeness: Did your team achieve all of the project objectives? Why or why not?

=== Reflection on the Process ===

* What did you learn?
** With your head (biological or computer science principles)
** With your heart (personal qualities and teamwork qualities that make things work or not work)?
** With your hands (technical skills)?
* What lesson will you take away from this project that you will still use a year from now?

{{Final Project Links}}

[[Category:Team Project]]

File:Yeast Beasts Presentation.zip

2024-05-02T17:07:35Z

Kmill104: Kmill104 uploaded a new version of File:Yeast Beasts Presentation.zip

Data Analysts Week 15

2024-05-02T17:04:31Z

Kmill104: uploading slides

[[Media:~$Yeast Beasts Presentation (1).pptx.zip]]

Final Project

2024-05-02T17:03:48Z

Kmill104: /* Group Deliverables */ uploading presentation

{{Final Project Links}}

== Overall Project Goal ==

# To create a Microsoft Access database that contains microarray data and gene regulatory network data from published datasets that will be incorporated into the GRNsight gene expression database.
# To analyze the published microarray data and model candidate gene regulatory networks with GRNmap to gain new insights.
# To perform quality assurance of current GRNsight databases and the one created in this class.

The class will work as a team to achieve these goals. Each member of the team will have a specific role (in a ''guild''), detailed below. Even though team members have a specific role, you are also expected to work with each other as a whole team, in order to keep each component of the project coordinated and consistent.

== Group Deliverables ==

* You will give a final group [[Final Project Deliverables#PowerPoint_Presentation | PowerPoint presentation]] in class during the final exam period on '''Thursday, May 2, 11:00 AM'''. [[Media: ~$Yeast Beasts Presentation (1).pptx.zip]]
* Final due date for all other deliverables is no later than '''Friday, May 3, 12:00 noon'''.
* The deliverables should be uploaded and organized onto one group wiki page.
* Detailed specifications, particularly for the [[Final Project Deliverables#Group Report|group report]] and [[Final Project Deliverables#Individual Assessment and Reflection|individual assessment and reflection]], are given on the [[Final Project Deliverables]] page.
* There are group and individual components to the grade for this project. For the group component, each member of the group receives the same grade. The individual component will be assessed based on individual effort and contributions.
* The Group Report is worth 30 points.
* The Group PowerPoint presentation is worth 60 points.
* All other deliverables listed on the [[Final Project Deliverables | Deliverables]] page, together, are worth 60 points.

=== Individual Deliverable ===

The individual deliverable is an [[Final Project Deliverables#Individual Assessment and Reflection|assessment and reflection]] on the process submitted via email to Dr. Dahlquist by the deadline (10 points):
* Statement of work
* Assessment of the work done, including justification for the group and individual components of the grade
* What was learned

== Team Journal Entries ==

* Each team will write a combined journal entry for each week with contributions from all members.
* [[Week 12]] Prepare for journal club presentation, reflections
* [[Week 13]] Creation of team page, executive summaries of progress, reflections
* [[Week 14]] Executive summaries of progress, reflections
* [[Week 15]] Organized deliverables page, prepare for final presentation

The project roles are:
* Team Name: Yeast Beasts
** Project Manager: Hailey
** Quality Assurance: Hailey & Natalija
** Data Analysis: Charlotte & Katie
** Coder/Designer: Andrew & Dean

== Roles (Guilds) ==

As the project moves forward, we will use class time for team meetings/work sessions. Each student has been assigned a primary role in the project by the instructors (see above).

=== [[Project Manager]] (PM) ===

The project manager makes sure that individuals are fulfilling their roles and performing the tasks on time.

=== [[Quality Assurance]] (QA) ===

The QA team member is the link between the Coder/Designers and the Data Analysts. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analysts. The QA will also perform checks on the GRNsight database.

=== [[Data Analysis]] (DA) ===

The role of the Data Analyst will be to apply the data analysis pipeline that you learned by analyzing the Dahlquist Lab microarray dataset to complete the analysis of a different published yeast timecourse microarray dataset. The Data Analysts are the end-users of the project, ultimately determining whether the work of the coder/designer and quality assurance members is useful to them.

=== [[Coder/Designer]] (CD) ===

The Coder/Designer is responsible for creating the Microsoft Access database based on the microarray data from the Data Analysts which will eventually be made available on the GRNsight website. The Coder/Designer is also the resident expert on the technology being used—assorted software, file management, version control, and troubleshooting. He or she coordinates with the QA to make sure that the database is accurate.

== Project Milestones ==

Specific project milestones are found on the individual guild pages.

* [[Project Manager#Milestones|Project Manager]]
* [[Quality Assurance#Milestones|Quality Assurance]]
* [[Data Analysis#Milestones|Data Analysis]]
* [[Coder/Designer#Milestones|Coder/Designer]]

{{Final Project Links}}

[[Category:Team Project]]

Final Project Deliverables

2024-05-02T17:03:01Z

Kmill104: /* PowerPoint Presentation */ uploading slides

{{Final Project Links}}
== Deliverables Checklist ==

# Organized Team deliverables wiki page with table of contents
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist)
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file)
# Sample-data relationship table in Excel (''.xlsx'')
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'')
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file)
# YEASTRACT "rank by TF" results (''.xlsx'')
# GRNmap input workbook (''.xlsx'')
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together
# MS Access database, including all tables (''.accdb'')
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'')
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'')
# Electronic notebook corresponding to the microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) that supports ''reproducible research'': all manipulations of the data and files must be documented so that someone else could begin with your starting file, follow the protocol, and obtain your results.

=== Grading ===

All members of the group will be assigned the same grade for the group work. However, the instructor reserves the right to make adjustments to an individual's grade based on their contributions to the group work.
* The Group Report is worth 30 points.
* The Group PowerPoint presentation is worth 60 points.
* All other deliverables listed above, together, are worth 60 points.

This is assessed on an individual basis.
* The Individual Assessment and Reflection is worth 10 points.

== Group Report ==

* The report should be written with contributions from all group members.
* Submit as ''.doc'', ''.docx'' or ''.pdf'' file.

=== Style Sheet ===

Use the following guidelines when formatting your report:
* 2.54 cm (1 in) margins on all sides
* Double-spaced
* 12 point Times/Times New Roman font
* Number the pages on the lower-right corner
* Use left justification (“jagged” on the right side)

=== Title Page ===

Include the following information in a standalone title page:
* A descriptive title for your project
** The function of the title is to identify the main result or take-home message of the paper. It should be as specific as possible. It can be a phrase or a sentence. What is the main result of your paper that you want to convey with the title?
* The names of the team members (with middle initials)
* The course number and title of the class
* The date of submission

=== Introduction ===

The introduction gives the background information necessary to understand your report and should be '''2-3''' pages long (double-spaced). The introduction should be in the form of a logical argument that “funnels” from broad to narrow:

<gallery mode="nolines" widths=322px heights=256px>
Funnel.jpg
</gallery>

* States importance of the problem
* States what is known about the problem
** Give an overview of what is known about the regulation of the transcriptional response in yeast from your team's journal article.
** Discuss how you will approach this by re-analyzing the yeast microarray data from your article.
* States what is unknown about the problem
** Which transcription factors belong in the gene regulatory network that controls the response in yeast is still unknown.
* States clues that suggest how to approach the unknown
** Introduce GRNmap and GRNsight as the answer to this problem.
* States the question the paper is trying to address
** In this case you want to discover new information about the microarray data using GRNmap and GRNsight.
* State the approach
** Creating an MS Access database to facilitate making a GRNmap input workbook that can be used by your group and other scientists in the future.

=== Combined Methods/Results/Discussion section ===

This section will summarize the entire workflow and findings for the project with contributions from all team members (Data Analyst, QA, and Coder/Designers). Create a combined flow chart of the tasks/milestones of each of the team members and then briefly describe the flow chart in the text.
* Number each of the figures sequentially and number each of the tables sequentially in order from first mention in the text. You can either embed your figures and tables in the appropriate place in the text or put them all at the end. Do not mix both styles, however.
* Write a descriptive legend for each figure and table that briefly states what the figure/table is and gives a brief key to any labels and abbreviations.

==== Data Analyst ====

* Table of ANOVA results, discussing the interpretation of the p values.
** Are more than 5% of the genes significantly changed in expression (ANOVA p value < 0.05) at any timepoint?
** Compare with what the authors of the paper considered a meaningful gene expression change.
* From the STEM analysis, include as figures the overall results (the screenshot showing all of the clusters) and then focus on the ones you interpreted.
** Were there clusters shown in your journal article? How do they compare with the ones you got?
** Include a table showing the GO results for that cluster (just the narrowed down list of terms that you have interpreted).
** Discuss what the p values for the cluster and for the GO term list mean.
** Discuss the biological interpretation of your GO terms.
** Were there GO terms in your article? Compare your terms with the article.
* Include a table that lists the transcription factors in your final network and their enrichment p value from YEASTRACT.
** Describe how and why you and your partner chose these transcription factors for your network.
** Include a figure (screenshot) of the unweighted networks visualized with GRNsight.
** What transcription factors were mentioned in your article? Are yours the same or different?
* GRNmap results
** Show the GRNsight visualization (screenshot) of the weighted networks, making sure that the genes are placed in the same relative positions as in the unweighted network figure.
** Provide the LSE:minLSE ratio
** Provide a table of the weights, P's, and b's
** Organize and show the individual expression plots
** Interpret the results of the model
*** What seem to be the most important transcription factors in the network? How does that compare with the journal article?
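The first ANOVA question above rests on a simple comparison: even if no gene truly changed, roughly 5% of genes would fall below p < 0.05 by chance alone. A minimal Python sketch of that check follows, assuming the unadjusted ANOVA p values have been exported from the spreadsheet into a list; the function names and example p values are hypothetical and not taken from any team's data.

```python
def fraction_significant(p_values, alpha=0.05):
    """Fraction of genes whose ANOVA p value is below alpha."""
    if not p_values:
        raise ValueError("need at least one p value")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def more_than_chance(p_values, alpha=0.05):
    """True if more genes pass the cutoff than the alpha fraction expected under the null."""
    return fraction_significant(p_values, alpha) > alpha


# Hypothetical p values for ten genes (illustration only, not real results).
example_p = [0.001, 0.03, 0.2, 0.5, 0.04, 0.9, 0.6, 0.7, 0.8, 0.01]
print(fraction_significant(example_p))  # 0.4
print(more_than_chance(example_p))      # True
```

A fraction well above 5% suggests that many of the significant genes reflect real expression changes rather than chance, which is why the comparison is worth stating explicitly in the report.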

==== Coder/Designers ====

* Give a narrative description of the database, including the design of the expression tables and metadata table(s). Include a figure of the schema.
* Provide a database schema.

==== QA ====

* Describe the QA process for the database, noting any issues that were encountered, and stating whether data are complete or not.
** Refer to the sample-data relationship table to discuss how microarray data format was regularized
** Discuss whether all yeast gene IDs were imported into the database and any formatting issues

=== Conclusions ===

* Write a '''1-2''' page conclusion that summarizes the overall project and your findings.
** Overall, what have you learned about the biological phenomenon described in your journal club article?
** How does the new database facilitate the data analysis?
** What future directions would you take if you were to continue this project?
** Relate the results of your project to the papers you presented for journal club.
*** Did you discover anything new that wasn't reported in the journal club paper (Data Analyst/QA)?

=== Acknowledgments ===

Write a short paragraph acknowledging the assistance of anyone who is not a member of your team.

=== References ===

* This section lists all of the references cited in the text of the report (and only those references cited in the paper). Follow the [[Media: BIOL367_Spring2024_GuidelinesforLiteratureCitations.pdf | Guidelines for Literature Citations in a Scientific Paper]] handout for general principles.
* Remember that you need to cite anything for which you are not the original source. Generally, in the introduction, you should aim for a minimum of two in-text citations per paragraph. You may reference the course web site using the appropriate format for a web reference.
* List your references in alphabetical order by first author using the [https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/reference_list_basic_rules.html APA style] that you have been using throughout the semester. Note that the proper format must be followed for both articles and websites, just URLs are not sufficient.

== PowerPoint Presentation ==

Each team of students will prepare and give a 20-minute PowerPoint presentation to report the results of their project on Thursday, May 2.
* Please follow the [http://brightspace.lmu.edu Presentation Guidelines on Brightspace] for how to format your slides.
* You will need to prepare ~20 slides (assume 1 slide per minute of presentation) and include the following content:
** Title slide that gives the main take-home message as the title of your presentation, the authors, date, and venue (course number and title).
** Outline slide that is a summary of take-home messages of your talk (should mirror your conclusion slide)
** The body of your talk (organized in a logical flow, not necessarily in the order given below).
*** Introduce the importance/significance of the problem and give background on the experiment in your journal article. You can draw from your journal club presentation. This should follow the logic of the Introduction section of your group paper.
*** Show a combined flow chart of the tasks/milestones completed by your group (see Materials and Methods of group paper).
*** Show the table of ANOVA results.
*** Show the screenshot of the overall clustering results and the cluster you focused on.
*** Show a table of the GO results from that cluster, giving an interpretation.
*** Show the table of regulatory transcription factors in your network and their p values for enrichment.
*** Show the unweighted and weighted networks in GRNsight (with the genes arranged in the same way in each figure)
*** Database schema diagram for the MS Access database
** Conclusion slide that mirrors your outline
** Future directions
** Acknowledgments
** References
* '''''Your PowerPoint slides must be uploaded to the wiki and linked to from your individual journal page and your team page by 10:00am, Thursday, May 2.'''''
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
* Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the [[Presentation Rubric]].

[[Media:~$Yeast Beasts Presentation (1).pptx.zip]]

== Individual Assessment and Reflection ==

Each person on the team will complete an assessment and reflection ''individually''. If you are comfortable with making this assessment publicly available, you may write it up as a wiki page or as a Word or PDF document uploaded to your group deliverables page. If you prefer to communicate your assessment privately, then email this to Dr. Dahlquist.

=== Statement of Work ===

* Describe exactly what you did on the project.
* Provide references or links to artifacts of your work, such as:
** Wiki pages
** Other files or documents
** Code or scripts

=== Assessment of Project ===

* Give an objective assessment of the success of your project workflow and teamwork.
* What worked and what didn't work?
* What would you do differently if you could do it all over again?
* Evaluate your team’s portion of the Final Project and Group Report in the following areas:
*# Content: What is the quality of the work?
*# Organization: Comment on the organization of the project and of your group's wiki pages.
*# Completeness: Did your team achieve all of the project objectives? Why or why not?

=== Reflection on the Process ===

* What did you learn?
** With your head (biological or computer science principles)
** With your heart (personal qualities and teamwork qualities that make things work or not work)?
** With your hands (technical skills)?
* What lesson will you take away from this project that you will still use a year from now?

{{Final Project Links}}

[[Category:Team Project]]

File:~$Yeast Beasts Presentation (1).pptx.zip

2024-05-02T17:01:35Z

Kmill104:

Data Analysts Week 13

2024-04-18T06:14:25Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ final fixes

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel, labeled "reorganized", that lists every sample along with the timepoint at which it was collected and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log2 transformation of the values. We first created a new column for each trial, with headers in the format Control_FC_timepoint-replicatenumber or CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we used Paste Special > Paste Values to copy the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials, and in them computed the average of the t0 trials for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with a column header either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. In each Log2 column, we then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column. We then applied this formula to all of the cells in each Log2 column.
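The Milestone 2 arithmetic for a single gene can be sketched in Python as follows. This is a minimal illustration of the spreadsheet steps, not a replacement for them; it assumes the LogFC values are base-2 logarithms (which is what the <code>=2^</code> step implies), and the function names and input values are hypothetical.

```python
import math
from statistics import mean


def undo_log2(log_values):
    """Convert log2 values back to the linear scale (Excel: =2^cell)."""
    return [2 ** v for v in log_values]


def log2_fold_change(linear_values, t0_linear_values):
    """Divide by the average t0 value, then take log2 (Excel: =LOG(cell, 2))."""
    t0_avg = mean(t0_linear_values)  # Excel: =AVERAGE(B2:D2) for the control t0 replicates
    return [math.log2(v / t0_avg) for v in linear_values]


# Hypothetical values for one gene in the control treatment (not real data).
control_t0_log = [2.0, 0.0, 0.0]  # three t0 replicates, log2 scale
control_t3_log = [2.0, 3.0, 1.0]  # three timepoint-3 replicates, log2 scale

t0_linear = undo_log2(control_t0_log)  # [4.0, 1.0, 1.0], average 2.0
t3_linear = undo_log2(control_t3_log)  # [4.0, 8.0, 2.0]
print(log2_fold_change(t3_linear, t0_linear))  # [1.0, 2.0, 0.0]
```

Working in log2 space makes increases and decreases symmetric: a doubling relative to the t0 average gives +1 and a halving gives -1.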

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and special pasted only the values into the new worksheet.
#To the right of each group of either the Control or CHP replicates at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
# In the cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE(replicate 1 cell designation:replicate 3 cell designation)</code>, where the cell designations are the first cells of replicate 1 and replicate 3 at that timepoint. This computed the average of the replicate values, and we then applied this formula to the remaining cells in the column.
# To the right, we then used Paste Special > Paste Values to copy each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code> and pressed Enter, and below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code> and pressed Enter.
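The averaging and SUMSQ steps above can be sketched in Python for a single gene. In this minimal illustration, the data layout (a dictionary keyed by timepoint, holding three replicate log2 fold changes) and the function names are assumptions, not part of the spreadsheet. The ss_HO label appears to refer to the null hypothesis H0, under which every log2 fold change is expected to be zero, so SUMSQ measures squared deviations from zero rather than from a mean. Each SUMSQ formula lists 21 cells, one for each of the 7 timepoints by 3 replicates.

```python
from statistics import mean

# Timepoints averaged in Milestone 3.
TIMEPOINTS = [3, 6, 12, 20, 40, 70, 120]


def timepoint_averages(replicates_by_tp):
    """Average the three replicates at each timepoint (Excel: =AVERAGE(first:last))."""
    return {tp: mean(reps) for tp, reps in replicates_by_tp.items()}


def sum_of_squares_h0(replicates_by_tp):
    """Sum of squared log2 fold changes over every replicate (Excel: =SUMSQ(...))."""
    return sum(x * x for reps in replicates_by_tp.values() for x in reps)


# Hypothetical log2 fold changes for one gene: the same three replicates at every timepoint.
gene = {tp: [1.0, 2.0, 3.0] for tp in TIMEPOINTS}

print(timepoint_averages(gene)[3])               # 2.0
print(sum(len(reps) for reps in gene.values()))  # 21 cells, as in each SUMSQ formula
print(sum_of_squares_h0(gene))                   # 98.0
```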

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.

Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Yeast Beasts

2024-04-18T06:09:53Z

Kmill104: /* Katie's Reflection */ adding talk signature

Data Analysts Week 13

2024-04-18T06:05:15Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ typo fix


Data Analysts Week 13

2024-04-18T06:03:08Z

Kmill104: /* Acknowledgements */ spacing issue


Data Analysts Week 13

2024-04-18T06:02:58Z

Kmill104: /* Milestone 3 */ another typo

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.
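The Milestone 2 transformations for a single gene can be summarized in a short Python sketch. It assumes the downloaded values are base-2 logarithms (which is why <code>=2^</code> undoes them) and that each treatment has three t0 replicates; the function and variable names are hypothetical and the numbers are made up for illustration.

```python
from math import log2
from statistics import mean
from typing import List, Sequence


def log2_fold_changes(log_values: Sequence[float], t0_log_values: Sequence[float]) -> List[float]:
    """Re-express one gene's replicate values relative to its average t0 value.

    log_values    -- the gene's log2 values at a later timepoint (one per replicate)
    t0_log_values -- the gene's log2 values for the t0 replicates of the same treatment
    """
    # Undo the log transformation (Excel: =2^(cell designation)).
    linear = [2 ** v for v in log_values]
    t0_linear = [2 ** v for v in t0_log_values]

    # Average the t0 replicates on the linear scale (Excel: =AVERAGE(B2:D2)).
    t0_average = mean(t0_linear)

    # Fold change relative to the t0 average, then back to log2 (Excel: =LOG(cell designation, 2)).
    return [log2(value / t0_average) for value in linear]


# Made-up example: t0 replicates all at log2 = 1 (linear 2), later replicates at 2, 3, and 1.
print(log2_fold_changes([2.0, 3.0, 1.0], [1.0, 1.0, 1.0]))  # [1.0, 2.0, 0.0]
```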

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Paste Values to paste only their values into the new worksheet.
#To the right of each group of either the Control or CHP replicates at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
#In the first cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE(replicate 1 cell designation:replicate 3 cell designation)</code>, where the cell designations are the first cells of replicate 1 and replicate 3 at that timepoint. This computed the average of the replicate values, and we then applied the formula to the remaining cells in the column.
#We then copied the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns and used Paste Special > Paste Values to paste only their values to the right. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code> and pressed Enter, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code> and pressed Enter.
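The cell lists inside the two SUMSQ formulas follow a regular pattern: three replicate columns, then a gap of five columns, repeated for seven timepoints. The Python sketch below regenerates both lists as a way to double-check them. The stride of 8 columns per timepoint is inferred from the formulas themselves (presumably three Control replicates, one Control average, three CHP replicates, and one CHP average), and the helper names are hypothetical.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 26 -> Z, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def sumsq_cells(first_column: int, timepoints: int = 7, stride: int = 8,
                replicates: int = 3, row: int = 2) -> str:
    """Build the comma-separated cell list for one treatment's SUMSQ formula."""
    cells = []
    for t in range(timepoints):
        start = first_column + t * stride  # first replicate column at this timepoint
        cells.extend(f"{column_letter(start + r)}{row}" for r in range(replicates))
    return ",".join(cells)


print("=SUMSQ(" + sumsq_cells(2) + ")")  # Control replicates start in column B
print("=SUMSQ(" + sumsq_cells(6) + ")")  # CHP replicates start in column F
```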

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T06:02:36Z

Kmill104: /* Milestone 3 */ fixing step detail

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11th when we gave our Journal Club Presentation with [[User:Hivanson| Hailey Ivanson]]

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.
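One consequence of the steps above is worth noting: because the t0 replicates are averaged after undoing the log, the result is not the same as averaging the log values directly. The Python sketch below, with hypothetical names and made-up numbers, checks that un-logging, dividing by the t0 average, and re-logging gives the same value as subtracting log2 of the linear-scale t0 average from the original log value, and shows that this linear-scale average differs from the mean of the log values.

```python
from math import isclose, log2
from statistics import mean
from typing import Sequence


def via_linear_scale(x: float, t0_logs: Sequence[float]) -> float:
    """Route in the journal: =2^x, divide by the t0 average, then =LOG(..., 2)."""
    t0_average = mean(2 ** v for v in t0_logs)
    return log2(2 ** x / t0_average)


def via_log_scale(x: float, t0_logs: Sequence[float]) -> float:
    """Same quantity computed without leaving the log scale."""
    return x - log2(mean(2 ** v for v in t0_logs))


t0_example = [0.0, 2.0]  # made-up t0 log2 values; linear values 1 and 4
print(via_linear_scale(1.5, t0_example), via_log_scale(1.5, t0_example))

# The linear-scale t0 average (2.5) is not 2 raised to the mean log value (2**1 = 2).
print(log2(mean(2 ** v for v in t0_example)), mean(t0_example))
```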

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Paste Values to paste only their values into the new worksheet.
#To the right of each group of either the Control or CHP replicates at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
#In the first cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE(replicate 1 cell designation:replicate 3 cell designation)</code>, where the cell designations are the first cells of replicate 1 and replicate 3 at that timepoint. This computed the average of the replicate values, and we then applied the formula to the remaining cells in the column.
#We then copied the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns and used Paste Special > Paste Values to paste only their values to the right. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code> and pressed Enter, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code> and pressed Enter.
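The replicate-averaging step can also be expressed in Python for a single gene. This sketch assumes the row is stored as a dictionary keyed by the column headers described above (for example Control_Log2_Fold_Change_3-1) and that each timepoint has replicates 1 through 3; the function name and the values are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable


def timepoint_averages(row: Dict[str, float], treatment: str,
                       timepoints: Iterable[int],
                       replicates: Iterable[int] = (1, 2, 3)) -> Dict[str, float]:
    """Average one gene's replicate log2 fold changes at each timepoint.

    Mirrors =AVERAGE(replicate 1 cell:replicate 3 cell) and names each result
    with the Avg_<treatment>_Log_FC_<timepoint> header pattern.
    """
    replicates = tuple(replicates)  # allow any iterable to be reused for every timepoint
    return {
        f"Avg_{treatment}_Log_FC_{t}": mean(
            row[f"{treatment}_Log2_Fold_Change_{t}-{r}"] for r in replicates
        )
        for t in timepoints
    }


# Made-up values for one gene at two timepoints.
gene_row = {
    "Control_Log2_Fold_Change_3-1": 0.5,
    "Control_Log2_Fold_Change_3-2": 1.0,
    "Control_Log2_Fold_Change_3-3": 1.5,
    "Control_Log2_Fold_Change_6-1": -1.0,
    "Control_Log2_Fold_Change_6-2": 0.0,
    "Control_Log2_Fold_Change_6-3": 1.0,
}
print(timepoint_averages(gene_row, "Control", [3, 6]))
# {'Avg_Control_Log_FC_3': 1.0, 'Avg_Control_Log_FC_6': 0.0}
```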

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:55:07Z

Kmill104: /* Milestone 2 */ fixing another typo

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11th when we gave our Journal Club Presentation with [[User:Hivanson| Hailey Ivanson]]

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.
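The header naming convention from Milestone 2 can also be generated programmatically, which is one way to avoid typos when labeling 48 data columns (2 treatments × 8 timepoints × 3 replicates). The ordering in this Python sketch (timepoint first, then Control before CHP, then replicate) is an assumption inferred from the formulas above, where the Control t0 replicates occupy B through D and the CHP t0 replicates occupy F through H; the function name is hypothetical.

```python
from typing import List, Sequence

TIMEPOINTS = (0, 3, 6, 12, 20, 40, 70, 120)  # timepoints listed in Milestone 2
TREATMENTS = ("Control", "CHP")
REPLICATES = (1, 2, 3)


def data_headers(measure: str = "LogFC",
                 timepoints: Sequence[int] = TIMEPOINTS) -> List[str]:
    """Build headers such as Control_LogFC_0-1, in increasing chronological order."""
    return [
        f"{treatment}_{measure}_{t}-{r}"
        for t in timepoints          # chronological order
        for treatment in TREATMENTS  # assumed: Control block, then CHP block
        for r in REPLICATES          # replicates kept together
    ]


headers = data_headers()
print(len(headers))  # 48
print(headers[:4])   # ['Control_LogFC_0-1', 'Control_LogFC_0-2', 'Control_LogFC_0-3', 'CHP_LogFC_0-1']
print(data_headers("Log2_Fold_Change", TIMEPOINTS[1:])[0])  # Control_Log2_Fold_Change_3-1
```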

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Paste Values to paste only their values into the new worksheet.
#To the right of each group of either the Control or CHP trials at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
#In the first cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE()</code> with that timepoint's three replicate cells as the argument, and then applied this throughout the column.
#We repeated this averaging for each timepoint (3, 6, 12, 20, 40, 70, and 120).
#We then copied the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns and used Paste Special > Paste Values to paste only their values to the right. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:53:29Z

Kmill104: /* Milestone 2 */ fixing typo

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11th when we gave our Journal Club Presentation with [[User:Hivanson| Hailey Ivanson]]

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber (as in Control_FC_0-1 and CHP_FC_0-1), using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Paste Values to paste only their values into the new worksheet.
#To the right of each group of either the Control or CHP trials at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
#In the first cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE()</code> with that timepoint's three replicate cells as the argument, and then applied this throughout the column.
#We repeated this averaging for each timepoint (3, 6, 12, 20, 40, 70, and 120).
#We then copied the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns and used Paste Special > Paste Values to paste only their values to the right. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:50:58Z

Kmill104: /* Acknowledgements */ adding talk signature

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11th when we gave our Journal Club Presentation with [[User:Hivanson| Hailey Ivanson]]

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber (as in Control_FC_0-1 and CHP_FC_0-1), using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Paste Values to paste only their values into the new worksheet.
#To the right of each group of either the Control or CHP trials at one timepoint, we created columns with headers in the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is either 3, 6, 12, 20, 40, 70, or 120.
#In the first cell below each Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint header, we typed <code>=AVERAGE()</code> with that timepoint's three replicate cells as the argument, and then applied this throughout the column.
#We repeated this averaging for each timepoint (3, 6, 12, 20, 40, 70, and 120).
#We then copied the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns and used Paste Special > Paste Values to paste only their values to the right. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone 1, 2, and 3 protocols, linked here: [[Data Analysis]]
The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page.
Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 22:50, 17 April 2024 (PDT)

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:49:05Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ adding another detail

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11th when we gave our Journal Club Presentation with [[User:Hivanson| Hailey Ivanson]]

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in Excel labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information. We named each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint for each is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log transformation of the raw intensity values. We first created new columns for each trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the formula <code>=2^(cell designation)</code>, where the cell designation is the first cell of the corresponding Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber column, and then applied this formula to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages", into which we copied the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber (as in Control_FC_0-1 and CHP_FC_0-1), using Paste Special > Paste Values to paste only their values.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 trials and used them to compute the average t0 value for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we used the Excel formula <code>=AVERAGE(B2:D2)</code> and then applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we used the formula <code>=AVERAGE(F2:H2)</code> and then applied it to all cells in the column.
#*We then created new columns to the right of each treatment with a column header of either Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated) and then applied this throughout the column. To the right of each new column, we also created columns with a column header of either Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this throughout all of the cells in the Log2 column.

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Values to paste only their values into the new worksheet.
#To the right of each group of Control or CHP replicates at a given timepoint, we created a column with a header of the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is 3, 6, 12, 20, 40, 70, or 120.
# In the cell below the Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint headers, we typed <code>=AVERAGE()</code>, and then applied this throughout the column.
#We repeated the averaging step for every timepoint (3, 6, 12, 20, 40, 70, and 120).
#To the right, we then used Paste Special > Values to paste a copy of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.
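To show what the SUMSQ step computes, the sketch below reproduces the control and CHP sums of squares for one row in Python. It assumes the CHP_ANOVA layout implied by the formulas: eight columns per timepoint starting at column B, with the three Control replicates at offsets 0 to 2 and the three CHP replicates at offsets 4 to 6 of each block, so the other two columns are skipped. The helper `ss_ho` and the sample row are hypothetical.

```python
def ss_ho(row, first_offset, n_timepoints=7, block_width=8, n_reps=3):
    """Sum of squared log2 fold changes for one treatment (hypothetical helper).

    `row` holds one gene's values from column B onward. Each timepoint
    occupies `block_width` columns, and this treatment's replicates start
    at `first_offset` within each block; other columns are skipped,
    matching the cell lists in the SUMSQ formulas.
    """
    total = 0.0
    for t in range(n_timepoints):
        start = t * block_width + first_offset
        for value in row[start:start + n_reps]:
            total += value ** 2  # one SUMSQ term
    return total

# Hypothetical row: Control replicates are 1.0, CHP replicates are 2.0,
# and the skipped columns hold placeholder values that must not be counted.
block = [1.0, 1.0, 1.0, 99.0, 2.0, 2.0, 2.0, 99.0]
row = block * 7
control_ss = ss_ho(row, first_offset=0)
chp_ss = ss_ho(row, first_offset=4)
```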

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:47:05Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ fixing typo

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in an Excel worksheet labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names.
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, and then applied it to the remaining cells of the column.
#*We then created a new worksheet labeled "with_averages" and used Paste Special > Values to paste in the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 replicate columns and used them to compute the average of the t0 replicates for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we entered the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we entered <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
#*We then created new columns to the right of each treatment with headers of the form Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with headers of the form Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this to all of the cells in the Log2 column.
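One reason to undo the log before averaging the t0 replicates: averaging on the linear scale gives an arithmetic mean, whereas averaging the log2 values directly and then back-transforming gives a geometric mean, which is never larger. The short Python check below illustrates the difference with made-up replicate values; it is a sketch of the arithmetic, not part of our workbook.

```python
import math
from statistics import mean

# Hypothetical log2 values for three t0 replicates of one gene
t0_log = [0.0, 1.0, 3.0]

# Undo the log, then average (arithmetic mean of the linear values)
linear_avg = mean(2 ** v for v in t0_log)

# Average the log2 values first, then undo the log (geometric mean)
geometric_avg = 2 ** mean(t0_log)

# A later replicate's log2 fold change depends on which baseline is used
later_log = 2.0
lfc_linear = math.log2(2 ** later_log / linear_avg)
# Equivalent to log2(2**later_log / geometric_avg)
lfc_geometric = later_log - mean(t0_log)
```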

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Values to paste only their values into the new worksheet.
#To the right of each group of Control or CHP replicates at a given timepoint, we created a column with a header of the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is 3, 6, 12, 20, 40, 70, or 120.
# In the cell below the Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint headers, we typed <code>=AVERAGE()</code>, and then applied this throughout the column.
#We repeated the averaging step for every timepoint (3, 6, 12, 20, 40, 70, and 120).
#To the right, we then used Paste Special > Values to paste a copy of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.
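The cell lists inside the two SUMSQ formulas follow a regular pattern, which makes them easy to check for typos. The Python sketch below regenerates both formulas, assuming (as the formulas imply) that each timepoint occupies eight columns starting at column B, with Control replicates at offsets 0 to 2 and CHP replicates at offsets 4 to 6. The function names `column_letter` and `sumsq_cells` are hypothetical.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def sumsq_cells(first_offset, row=2, n_timepoints=7, block_width=8, n_reps=3):
    """List the cells one SUMSQ formula should reference (column B is number 2)."""
    cells = []
    for t in range(n_timepoints):
        for r in range(n_reps):
            col = 2 + t * block_width + first_offset + r
            cells.append(f"{column_letter(col)}{row}")
    return cells

control_formula = "=SUMSQ(" + ",".join(sumsq_cells(0)) + ")"
chp_formula = "=SUMSQ(" + ",".join(sumsq_cells(4)) + ")"
```

Under that layout assumption, the generated strings match the two formulas we typed, which is a quick way to confirm that no replicate cell was skipped or duplicated.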

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:46:14Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ updating steps

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in an Excel worksheet labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names.
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, and then applied it to the remaining cells of the column.
#*We then created a new worksheet labeled "with_average" and used Paste Special > Values to paste in the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 replicate columns and used them to compute the average of the t0 replicates for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we entered the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we entered <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
#*We then created new columns to the right of each treatment with headers of the form Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with headers of the form Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this to all of the cells in the Log2 column.
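Because the column headers follow a fixed pattern, they can be generated rather than typed by hand. The Python sketch below builds them in chronological order; it assumes that within each timepoint the three Control replicates come before the three CHP replicates, an ordering we infer from the column letters in our formulas rather than state outright. `make_headers` is a hypothetical name.

```python
TIMEPOINTS = [0, 3, 6, 12, 20, 40, 70, 120]
TREATMENTS = ["Control", "CHP"]
REPLICATES = [1, 2, 3]

def make_headers(kind="LogFC"):
    """Build headers like Control_LogFC_0-1 in chronological order (hypothetical helper).

    `kind` is the middle part of the header (e.g. "LogFC" or "FC").
    Assumes Control replicates precede CHP replicates within each timepoint.
    """
    headers = ["ID"]
    for t in TIMEPOINTS:
        for treatment in TREATMENTS:
            for rep in REPLICATES:
                headers.append(f"{treatment}_{kind}_{t}-{rep}")
    return headers

headers = make_headers()
```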

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Values to paste only their values into the new worksheet.
#To the right of each group of Control or CHP replicates at a given timepoint, we created a column with a header of the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is 3, 6, 12, 20, 40, 70, or 120.
# In the cell below the Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint headers, we typed <code>=AVERAGE()</code>, and then applied this throughout the column.
#We repeated the averaging step for every timepoint (3, 6, 12, 20, 40, 70, and 120).
#To the right, we then used Paste Special > Values to paste a copy of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.
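The <code>=AVERAGE()</code> step for the Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint columns amounts to grouping one gene's log2 fold changes by timepoint and averaging the replicates. The Python sketch below assumes three replicates per timepoint and reuses our header pattern as dictionary keys; the function `timepoint_averages` and any values passed to it are hypothetical.

```python
from statistics import mean

TIMEPOINTS = (3, 6, 12, 20, 40, 70, 120)

def timepoint_averages(values, treatment, n_reps=3):
    """Average one gene's replicate log2 fold changes at each timepoint (hypothetical helper).

    `values` maps headers such as Control_Log2_Fold_Change_3-1 to a number;
    the result maps headers such as Avg_Control_Log_FC_3 to the replicate mean.
    """
    averages = {}
    for t in TIMEPOINTS:
        reps = [values[f"{treatment}_Log2_Fold_Change_{t}-{r}"] for r in range(1, n_reps + 1)]
        averages[f"Avg_{treatment}_Log_FC_{t}"] = mean(reps)
    return averages
```

For example, passing a dictionary with all 21 Control headers and `treatment="Control"` returns seven averages, one per timepoint.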

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:43:47Z

Kmill104: /* Milestone 3 */ fixing details

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in an Excel worksheet labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names.
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, and then applied it to the remaining cells. The specific commands we used are shown below.
#**Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
#**Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
#**Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
#**Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
#**Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
#**Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
#**Below Control_FC_3-1, we typed =2^N2 and applied it throughout the column.
#**Below Control_FC_3-2, we typed =2^O2 and applied it throughout the column.
#**Below Control_FC_3-3, we typed =2^P2 and applied it throughout the column.
#**Below CHP_FC_3-1, we typed =2^T2 and applied it throughout the column.
#**Below CHP_FC_3-2, we typed =2^U2 and applied it throughout the column.
#**Below CHP_FC_3-3, we typed =2^V2 and applied it throughout the column.
#**Below Control_FC_6-1, we typed =2^Z2 and applied it throughout the column.
#**Below Control_FC_6-2, we typed =2^AA2 and applied it throughout the column.
#**Below Control_FC_6-3, we typed =2^AB2 and applied it throughout the column.
#**Below CHP_FC_6-1, we typed =2^AF2 and applied it throughout the column.
#**Below CHP_FC_6-2, we typed =2^AG2 and applied it throughout the column.
#**Below CHP_FC_6-3, we typed =2^AH2 and applied it throughout the column.
#*We then created a new worksheet labeled "with_average" and used Paste Special > Values to paste in the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 replicate columns and used them to compute the average of the t0 replicates for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we entered the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we entered <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
#*We then created new columns to the right of each treatment with headers of the form Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with headers of the form Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this to all of the cells in the Log2 column.
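In the list of =2^ formulas for the t0, t3, and t6 columns, the source cells advance by six columns from one replicate group to the next (B, H, N, T, Z, AF), which is consistent with three new FC columns sitting beside each group of three LogFC columns; that interpretation is our assumption. The Python sketch below regenerates the formulas under that assumption. `column_letter` and `back_transform_formulas` are hypothetical names.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def back_transform_formulas(timepoints=(0, 3, 6), n_reps=3, group_step=6):
    """Pair each FC header with its =2^cell formula (hypothetical helper).

    Assumes replicate groups alternate Control, CHP and that each group's
    first LogFC column sits `group_step` columns after the previous one,
    starting at column B.
    """
    formulas = {}
    col = 2  # column B holds the first Control LogFC replicate
    for t in timepoints:
        for treatment in ("Control", "CHP"):
            for rep in range(1, n_reps + 1):
                source = column_letter(col + rep - 1)
                formulas[f"{treatment}_FC_{t}-{rep}"] = f"=2^{source}2"
            col += group_step
    return formulas

formulas = back_transform_formulas()
```

Extending `timepoints` to the later timepoints would give the corresponding formulas, provided the same six-column spacing continues across the sheet.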

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Values to paste only their values into the new worksheet.
#To the right of each group of Control or CHP replicates at a given timepoint, we created a column with a header of the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is 3, 6, 12, 20, 40, 70, or 120.
# In the cell below the Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint headers, we typed <code>=AVERAGE()</code>, and then applied this throughout the column.
#We repeated the averaging step for every timepoint (3, 6, 12, 20, 40, 70, and 120).
#To the right, we then used Paste Special > Values to paste a copy of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:42:46Z

Kmill104: /* Milestone 3 */ adding anova steps

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in an Excel worksheet labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names.
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, and then applied it to the remaining cells. The specific commands we used are shown below.
#**Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
#**Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
#**Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
#**Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
#**Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
#**Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
#**Below Control_FC_3-1, we typed =2^N2 and applied it throughout the column.
#**Below Control_FC_3-2, we typed =2^O2 and applied it throughout the column.
#**Below Control_FC_3-3, we typed =2^P2 and applied it throughout the column.
#**Below CHP_FC_3-1, we typed =2^T2 and applied it throughout the column.
#**Below CHP_FC_3-2, we typed =2^U2 and applied it throughout the column.
#**Below CHP_FC_3-3, we typed =2^V2 and applied it throughout the column.
#**Below Control_FC_6-1, we typed =2^Z2 and applied it throughout the column.
#**Below Control_FC_6-2, we typed =2^AA2 and applied it throughout the column.
#**Below Control_FC_6-3, we typed =2^AB2 and applied it throughout the column.
#**Below CHP_FC_6-1, we typed =2^AF2 and applied it throughout the column.
#**Below CHP_FC_6-2, we typed =2^AG2 and applied it throughout the column.
#**Below CHP_FC_6-3, we typed =2^AH2 and applied it throughout the column.
#*We then created a new worksheet labeled "with_average" and used Paste Special > Values to paste in the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 replicate columns and used them to compute the average of the t0 replicates for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we entered the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we entered <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
#*We then created new columns to the right of each treatment with headers of the form Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with headers of the form Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this to all of the cells in the Log2 column.

===Milestone 3===
#We created a new worksheet, naming it "CHP_ANOVA".
#We copied all of the Control_Log2_Fold_Change_timepoint-replicatenumber and CHP_Log2_Fold_Change_timepoint-replicatenumber columns and used Paste Special > Values to paste only their values into the new worksheet.
#To the right of each group of Control or CHP replicates at a given timepoint, we created a column with a header of the form Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint, where timepoint is 3, 6, 12, 20, 40, 70, or 120.
# In the cell below the Avg_Control_Log_FC_timepoint or Avg_CHP_Log_FC_timepoint headers, we typed <code>=AVERAGE()</code>, and then applied this throughout the column.
#We repeated the averaging step for every timepoint (3, 6, 12, 20, 40, 70, and 120).
#To the right, we then used Paste Special > Values to paste a copy of each Avg_Control_Log_FC_timepoint and Avg_CHP_Log_FC_timepoint column. To the right of these, we created columns with the headers Control_ss_HO and CHP_ss_HO.
#In the first cell below Control_ss_HO, we typed <code>=SUMSQ(B2,C2,D2,J2,K2,L2,R2,S2,T2,Z2,AA2,AB2,AH2,AI2,AJ2,AP2,AQ2,AR2,AX2,AY2,AZ2)</code>, and in the first cell below CHP_ss_HO, we typed <code>=SUMSQ(F2,G2,H2,N2,O2,P2,V2,W2,X2,AD2,AE2,AF2,AL2,AM2,AN2,AT2,AU2,AV2,BB2,BC2,BD2)</code>.

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13

Data Analysts Week 13

2024-04-18T05:30:04Z

Kmill104: /* Charlotte and Katie's Data Analyst Journal */ adding more details

==Charlotte and Katie's Data Analyst Journal==

===Milestone 1===
Completed as of April 11, when we gave our Journal Club presentation with [[User:Hivanson| Hailey Ivanson]].

===Milestone 2===
#With Quality Assurance team member [[User:Hivanson| Hailey Ivanson]], we downloaded and examined the microarray dataset: [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip SGD Processed Data].
#We made a sample-data relationship table in an Excel worksheet labeled "reorganized" that lists all of the samples, the timepoint at which each was collected, and its replicate number. We came up with consistent column headers that summarize this information, naming each column either Control_LogFC_timepoint-replicatenumber or CHP_LogFC_timepoint-replicatenumber, as in Control_LogFC_0-1 and CHP_LogFC_0-1. The timepoint is 0, 3, 6, 12, 20, 40, 70, or 120, and the replicate number is 1, 2, or 3. We organized the data in a worksheet in an Excel workbook so that:
#*ID is the first column header, and within it are all of the SGD systematic names.
#*Data columns are to the right, in increasing chronological order, using the column header pattern we created.
#*Treatments are grouped together
#*Replicates are grouped together
#*We deleted the "EWEIGHT" row and "GWEIGHT" column.
#*We then had to undo the log-transformed raw intensity values. We first created new columns for each respective trial in the formats Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1. We then transformed the data in the first cell of each column with the equation <code>=2^<cell designation></code>, and then applied it to the remaining cells. The specific commands we used are shown below.
#**Below Control_FC_0-1, we typed =2^B2 and applied it throughout the column.
#**Below Control_FC_0-2, we typed =2^C2 and applied it throughout the column.
#**Below Control_FC_0-3, we typed =2^D2 and applied it throughout the column.
#**Below CHP_FC_0-1, we typed =2^H2 and applied it throughout the column.
#**Below CHP_FC_0-2, we typed =2^I2 and applied it throughout the column.
#**Below CHP_FC_0-3, we typed =2^J2 and applied it throughout the column.
#**Below Control_FC_3-1, we typed =2^N2 and applied it throughout the column.
#**Below Control_FC_3-2, we typed =2^O2 and applied it throughout the column.
#**Below Control_FC_3-3, we typed =2^P2 and applied it throughout the column.
#**Below CHP_FC_3-1, we typed =2^T2 and applied it throughout the column.
#**Below CHP_FC_3-2, we typed =2^U2 and applied it throughout the column.
#**Below CHP_FC_3-3, we typed =2^V2 and applied it throughout the column.
#**Below Control_FC_6-1, we typed =2^Z2 and applied it throughout the column.
#**Below Control_FC_6-2, we typed =2^AA2 and applied it throughout the column.
#**Below Control_FC_6-3, we typed =2^AB2 and applied it throughout the column.
#**Below CHP_FC_6-1, we typed =2^AF2 and applied it throughout the column.
#**Below CHP_FC_6-2, we typed =2^AG2 and applied it throughout the column.
#**Below CHP_FC_6-3, we typed =2^AH2 and applied it throughout the column.
#*We then created a new worksheet labeled "with_average" and used Paste Special > Values to paste in the values of the columns with headers Control_FC_timepoint-replicatenumber and CHP_FC_timepoint-replicatenumber, as in Control_FC_0-1 and CHP_FC_0-1.
#*We then created new columns called Control_FC_0-avg and CHP_FC_0-avg to the right of their respective t0 replicate columns and used them to compute the average of the t0 replicates for the control and CHP-treated data. In the first cell below the Control_FC_0-avg header, we entered the Excel formula <code>=AVERAGE(B2:D2)</code> and applied it to all cells in the column. In the first cell below the CHP_FC_0-avg header, we entered <code>=AVERAGE(F2:H2)</code> and applied it to all cells in the column.
#*We then created new columns to the right of each treatment with headers of the form Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber, as in Control_Fold_Change_3-1 or CHP_Fold_Change_3-1. We calculated the fold change by dividing the first cell of each trial by the average t0 value for the respective treatment (control or CHP-treated), and then applied this throughout the column. To the right of each new column, we also created columns with headers of the form Control_Log2_Fold_Change_timepoint-replicatenumber or CHP_Log2_Fold_Change_timepoint-replicatenumber, as in Control_Log2_Fold_Change_3-1 or CHP_Log2_Fold_Change_3-1. We then log2-transformed the fold changes using <code>=LOG(cell designation, 2)</code>, where the cell designation is the first cell of the respective Control_Fold_Change_timepoint-replicatenumber or CHP_Fold_Change_timepoint-replicatenumber column, and applied this to all of the cells in the Log2 column.

===Milestone 3===

==Acknowledgements==
This procedure was adapted from the Data Analysis page Milestone protocols, linked here: [[Data Analysis]]

==References==
LMU BioDB 2024. (2024). Week 13. Retrieved April 17, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_13