{{Heavy Metal HaterZ}}
 
= Goals for Week 14 =
 
== Data Preparation and Statistical Analysis ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 1, 2015 through December 3, 2015 ===
Referencing the Week 12 Feedback provided by Dr. Dahlquist, I began compiling the raw data into a single Excel file:
*Created an Excel file named ''Raw Data Shewanella RARL 20151201''
*Sheet 1 was entitled CompiledRawData Sheet:
**Column 1 = Gene ID
**Column 2 = MasterIndex (numbered from 1 to 11520)
**The remaining columns held the log ratio data for the 0, 5, 20, and 60 minute timepoints
***7 timepoints in total (C0, C5, C20, C60, F5, F20, F60) with 4 replicates each; therefore, 28 columns of data
*Created a MasterSheet and copied the data from the CompiledRawData sheet into this new sheet
**Sorted the Gene IDs in alphabetical order (A-Z) and deleted the rows that contained an ID of '''Blank, blank, gDNA, NC-, or ORF''', resulting in the deletion of '''705 rows.'''
**Replaced the cells that contained the <code>#NUM!</code> error message with blanks, clearing '''2,118 cells.'''
**Replaced the cells that contained the <code>#DIV/0!</code> error message with blanks, clearing '''23 cells.'''
*Created a ScalingCentering Sheet
**Copied over data from the MasterSheet
**Added two rows directly below the header row to hold the Average and the Standard Deviation of each column
**For the scaled and centered columns of data, typed the formula <code>=(C4-C$2)/C$3</code> into the first cell of the scaled-and-centered column for replicate 1 at timepoint C0, then filled the formula across the remaining cells to scale and center the rest of the data (see the sketch after this list).
*Sent this file to Dr. Dahlquist so that the duplicated spots could be split out: [[File:Raw Data Shewanella RARL 20151201.xlsx]]
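
The ScalingCentering layout can be summarized with the following minimal sketch. The cell references are only illustrative and assume, as described above, that row 1 holds the column headers, rows 2 and 3 hold the Average and Standard Deviation for each data column, and the raw log ratios for replicate 1 at timepoint C0 occupy column C beginning in row 4; the assumed last data row of 10818 simply reflects the 11520 spots minus the 705 deleted rows, so the exact references in the actual file may differ.

 =AVERAGE(C4:C10818)    (row 2 of column C: average of the raw log ratios; illustrative range)
 =STDEV(C4:C10818)      (row 3 of column C: standard deviation of the same range)
 =(C4-C$2)/C$3          (first cell of the scaled-and-centered column for C0 replicate 1)

Filling the last formula down its column, and adapting it for each of the other 27 data columns, scales and centers every value in the sheet.
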
=== December 3, 2015 through December 8, 2015 ===
Dr. Dahlquist brought to our attention discrepancies between my partner Emily Simso's data and mine; thus, the compiled raw data had to be reviewed so that our Excel sheets would match before continuing with the statistical analysis:
*Repeated the procedure from [[File:Raw Data Shewanella RARL 20151201.xlsx]], keeping Dr. Dahlquist's feedback in mind, and created a new Excel file called ''UpdatedCompiledRawData Shewanella RARL 20151201 HMH''
**The correct set of timepoints had been used in my previous Excel file, so no changes were needed there
**Ensured that I had 11520 Gene IDs; the last row, which was mislabeled "Gene ID", was changed to the correct Gene ID, "SO4357"
**Once all the necessary changes were made and I had touched base with my partner Emily on Sunday and Monday, I uploaded the file for splitting by Dr. Dahlquist: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*The split data was posted by Dr. Dahlquist as a file on our team's files page: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH forsplitting.xlsx]]
**Downloaded this file and copied the sheets of data into a new Excel file entitled ''StatisticalAnalysis Shewanella RARL 20151207 HMH''
**Created a new sheet called Averages
***Averaged together the replicate data from the two spots that are now split, using the formula <code>=AVERAGE(C2,AG2)</code> in the column for C0 replicate 1
***Used Excel's fill feature to copy this formula down the entire column, and adapted the formula for the average columns of the other replicates
**Created a new sheet called Statistics
***Copied and pasted values from the Averages sheet into this new sheet
***Computed the average of the biological replicates for each treatment and timepoint; the biological average for C0 was calculated with the formula <code>=AVERAGE(C2:F2)</code>, and an adapted version of this formula was used for every other timepoint.
***Calculated the average log ratios of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60
****Since the data are in log space, the ratio is taken by subtracting the average for C0 from the average for C5 (and likewise for each of the other comparisons; see the sketch below).
***Performed a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, and so on:
 
 =TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)
 
: This returned the p value. I uploaded the file to the team's files page for review by Dr. Dahlquist while performing a sanity check: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]
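
As a minimal sketch of the Statistics sheet calculations described above, suppose the four biological replicate values for C0 sit in cells C2:F2 (as in the averaging formula above), the replicates for C5 sit in the hypothetical cells G2:J2, and the per-timepoint averages for C0 and C5 sit in the hypothetical cells AE2 and AF2; the actual column letters in the workbook may differ. For the first gene, the two calculations would then be:

 =AF2-AE2                  (Average Log Ratio of C5/C0: a subtraction, because the data are in log space; illustrative cells)
 =TTEST(G2:J2,C2:F2,2,3)   (two-tailed, two-sample unequal-variance T test between C5 and C0, returning the p value)

Filling these formulas down the sheet, and repeating them for the other comparisons (C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60), gives one log ratio and one p value per gene for each comparison.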
  
= External Links =
{{Template:Rlegaspi}}
