Rlegaspi Week 14
From LMU BioDB 2015
Revision as of 23:52, 3 December 2015
Shewanella oneidensis
Our Gene Database Testing Report
Group Paper - File:Final Report 20151218 2 HMH.docx
Group Members
- Coder: Mary Alverson
- GenMAPP User & Project Manager: Ron Legaspi
- Quality Assurance: Josh Kuroda
- GenMAPP User: Emily Simso
Important Links
Our Files
Our Deliverables
Gene Database Project Links | |||||||
---|---|---|---|---|---|---|---|
Overview | Deliverables | Reference Format | Guilds | Project Manager | GenMAPP User | Quality Assurance | Coder |
Teams | Heavy Metal HaterZ | The Class Whoopers | GÉNialOMICS | Oregon Trail Survivors |
Individual Journal Entries | ||||
---|---|---|---|---|
Mary Alverson | Week 11 | Week 12 | Week 14 | Week 15 |
Emily Simso | Week 11 | Week 12 | Week 14 | Week 15 |
Ron Legaspi | Week 11 | Week 12 | Week 14 | Week 15 |
Josh Kuroda | Week 11 | Week 12 | Week 14 | Week 15 |
GenMAPP User Milestones in relation to Week 14
Milestone 2: Data Preparation
- Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
- Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
Milestone 3: On-going Analysis Cycle
- Perform the statistical analysis in Excel.
- Format the gene expression data for import into GenMAPP.
- Import data into GenMAPP, create ColorSets, and run MAPPFinder.
- Document and take notes on test runs with GenMAPP.
- Use the EX.txt file to help the Coder/Quality Assurance team members to validate the .gdb.
- Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
- Create a .mapp file showing one pathway that is changed in your data.
Summary of Progress
Compiling Raw Data and Statistical Analysis
Following the Week 12 Feedback from Dr. Dahlquist, I began compiling the raw data into a single data sheet.
- Created a CompiledRawData Sheet.
- Created a MasterSheet and deleted the rows whose ID matched the deletion criteria from the Week 12 Feedback.
  - Number of deletions: 705
- Found the error message #NUM! and replaced it with nothing (an empty cell); 2118 replacements were made.
- Found the error message #DIV/0! and replaced it with nothing (an empty cell); 23 replacements were made.
File was sent to Dr. Dahlquist: Media:Raw Data Shewanella RARL 20151201.xlsx
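The MasterSheet clean-up summarized above (numbering the rows with a MasterIndex, sorting on ID, deleting the control rows, and blanking out Excel errors) can be cross-checked outside Excel. The Python below is a minimal sketch under stated assumptions, not the procedure actually used (that was done by hand in Excel): it assumes each row is a dictionary with an "ID" key plus log2 columns, and that error cells were exported as the text "#NUM!" or "#DIV/0!". The function names `is_control` and `clean_master_sheet` and the example IDs are hypothetical; the deletion criteria are the ones listed in the Week 12 Feedback.

```python
# Hypothetical sketch: each row is a dict with an "ID" key plus log2 columns,
# and Excel error values were exported as the text "#NUM!" or "#DIV/0!".
EXCLUDED_IDS = {"Blank", "blank", "gDNA"}
EXCLUDED_PREFIXES = ("NC-", "ORF")
EXCEL_ERRORS = ("#NUM!", "#DIV/0!")


def is_control(gene_id):
    """True for IDs the Week 12 Feedback says to delete."""
    return gene_id in EXCLUDED_IDS or gene_id.startswith(EXCLUDED_PREFIXES)


def clean_master_sheet(rows):
    """Return (cleaned_rows, rows_deleted, replacements_per_error)."""
    # MasterIndex: number the rows 1..n so the original order can be restored.
    indexed = [dict(row, MasterIndex=i) for i, row in enumerate(rows, start=1)]
    # Sort A-->Z on ID, then drop the control/blank spots.
    indexed.sort(key=lambda row: row["ID"])
    kept = [row for row in indexed if not is_control(row["ID"])]
    deleted = len(indexed) - len(kept)

    # Replace every Excel error with "nothing" (None = empty cell) and count them.
    replacements = {error: 0 for error in EXCEL_ERRORS}
    cleaned = []
    for row in kept:
        new_row = {}
        for column, value in row.items():
            if isinstance(value, str) and value in replacements:
                replacements[value] += 1
                value = None
            new_row[column] = value
        cleaned.append(new_row)

    # Sort back into the original spot order using MasterIndex.
    cleaned.sort(key=lambda row: row["MasterIndex"])
    return cleaned, deleted, replacements


# Made-up example rows (not real data from this project).
example_rows = [
    {"ID": "SO0001", "Log2FC-C0-rep1": 0.42},
    {"ID": "Blank", "Log2FC-C0-rep1": "#NUM!"},
    {"ID": "NC-7", "Log2FC-C0-rep1": 0.10},
    {"ID": "SO0002", "Log2FC-C0-rep1": "#DIV/0!"},
]
cleaned, deleted, counts = clean_master_sheet(example_rows)
print(deleted, counts)
```

Because rows are deleted before errors are counted, an error inside a deleted control row (like the "#NUM!" in the "Blank" row above) is not included in the replacement count, which matches doing the deletion step first.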
Week 12 Feedback
- In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
- All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
- Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
- Type a "1" in cell B2 and a "2" in cell B3.
- Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
- Then you need to copy and paste (values) the "log2" column from your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
- The next set of manipulations should be performed in a new sheet called "MasterSheet".
- Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", or "gDNA", or an ID that starts with "NC-" or "ORF". Record how many rows were deleted.
- Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, and record how many cells were replaced.
- Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. You will perform the scaling and centering operations like you did for the Vibrio cholerae data.
- Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
- Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
- You will average the technical replicate spots for each sample to get one value for each sample.
- You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
- You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio you will actually subtract instead of divide.
- You will perform a two-sample t-test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for Vibrio. Instead, you will use the TTEST function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
- After that, you will need to format the data for GenMAPP, and then you'll be ready to import it into GenMAPP and run MAPPFinder.
- Let me know if you have any questions.
— Kdahlquist (talk) 13:34, 24 November 2015 (PST)
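The averaging and ratio steps in the feedback (technical replicates, then biological replicates, then the Average Log Ratio comparisons) can be illustrated for a single gene with a short Python sketch. This is a hypothetical helper, not part of the Excel workflow: it assumes each timepoint maps to a list of biological replicates, each holding its duplicated (technical replicate) spot values, with blanked-out error cells stored as None and skipped, much as Excel's AVERAGE ignores empty cells. The names `timepoint_averages` and `average_log_ratios` and the example values are made up.

```python
from statistics import mean

# Hypothetical layout for ONE gene: {timepoint: [rep1 spots, rep2 spots, ...]},
# where each inner list holds the duplicated (technical replicate) spot values
# and None marks a cell that was blanked out because of an Excel error.
COMPARISONS = [("C5", "C0"), ("C20", "C0"), ("C60", "C0"),
               ("F5", "C60"), ("F20", "C60"), ("F60", "C60")]


def mean_or_none(values):
    """Average the non-blank values; return None if every value is blank."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def timepoint_averages(gene):
    """Average technical spots per sample, then biological replicates per timepoint."""
    averages = {}
    for timepoint, replicates in gene.items():
        sample_values = [mean_or_none(spots) for spots in replicates]
        averages[timepoint] = mean_or_none(sample_values)
    return averages


def average_log_ratios(averages):
    """Ratio of average log ratios: a subtraction, because the values are in log space."""
    ratios = {}
    for numerator, reference in COMPARISONS:
        a, b = averages.get(numerator), averages.get(reference)
        ratios[f"{numerator}/{reference}"] = None if a is None or b is None else a - b
    return ratios


# Made-up values for one gene (only C0 and C5 filled in).
example_gene = {
    "C0": [[0.0, 0.5], [0.5, 1.0]],
    "C5": [[1.0, 2.0], [2.0, None]],
}
averages = timepoint_averages(example_gene)
ratios = average_log_ratios(averages)
print(averages, ratios["C5/C0"])
```

Comparisons whose timepoints are missing come back as None rather than a number, so a gene with no usable data at a timepoint stays blank instead of silently becoming zero.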
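For the multiple-testing corrections mentioned in the feedback, here is a minimal Python sketch of the Bonferroni and Benjamini-Hochberg adjustments. It assumes the per-gene p values have already been computed with Excel's TTEST function and that blank p values were dropped beforehand, so m is the number of genes actually tested; the function names and example p values are hypothetical. The Benjamini-Hochberg function returns the standard step-up adjusted p values, including a running minimum from the largest p value downward, so it may not match a spreadsheet column that only computes p × m / rank.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (step-up, with running minimum)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values, e.g. as returned by Excel's TTEST for four genes.
example_p = [0.01, 0.011, 0.5, 0.04]
print(bonferroni(example_p))
print(benjamini_hochberg(example_p))
```

The running minimum keeps the adjusted values in the same order as the raw p values: in this example, the gene with p = 0.01 would get 0.04 from p × m / rank alone, but it receives the same adjusted value (0.022) as the gene with p = 0.011.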