Difference between revisions of "Lenaolufson Week 14"

From LMU BioDB 2015
(added in the protocol followed in class for the data)
(added progress protocol)

Revision as of 22:00, 3 December 2015

12/1/15

  • I followed the protocol that Dr. Dahlquist left on my talk page to edit the Excel data sheet correctly. The steps were as follows:
    1. I renamed Sheet1 to "CompiledRawData".
    2. I renamed my column headings as follows:
      • I called my leftmost column "ID" instead of Code.
      • For the data columns, I got rid of the "(635/532)" from each header. I named them like this as an example: "LogRatio_SampleA_Cy3-Cy5".
    3. Once I renamed the columns, I did all further manipulations in a different sheet. I copied and pasted all of the data into Sheet2 which I renamed to "DyeSwap".
    4. I created a "MasterIndex" column as follows. I inserted a new column to the right of the "ID" column and named it "MasterIndex". In this column I created a numerical index of genes so that I can always sort them back into the same order that they started out in.
      • I typed a "1" in cell B2 and a "2" in cell B3.
      • I selected both cells. I hovered my mouse over the bottom-right corner of the selection until it made a thin black + sign. I double-clicked on the + sign to fill the entire column with a series of numbers from 1 to 8448 (the number of spots on the microarray).
    5. Then, I selected all of the data and sorted it A-->Z on the "ID" column.
    6. I deleted all of the rows that had an ID of "_". The number of records after deleting the "_" rows: 7104.
    7. Then I swapped the dye orientation so that all of the samples were Cy5/Cy3.
      • I inserted a column to the right of the columns that needed to be swapped. I named the new column the same as I did before, but added "_swapped" to the header to designate that I swapped the samples.
      • Then, I typed a formula in the column: =C2*(-1). I copied and pasted the formula to the entire column.
    8. I created a new worksheet that I named "MasterSheet". I copied the ID, MasterIndex, and data columns that were all in the Cy5/Cy3 orientation (the original ones and the ones I had just swapped) and pasted them in with Paste special > Paste values.
    9. This was then the starting point for the normalization and statistics. I copied and pasted the data from this sheet into a new worksheet, which I renamed "ScalingCentering".
    10. In this new sheet, I performed the scaling and centering according to the Vibrio cholerae instructions found here.
      • When I computed the average and standard deviation of the log ratios, all of the values that came out were much too high to make sense with the data. Upon looking at the data and consulting with Dr. Dahlquist, we found that some of the values in the raw data were extremely large, such as 100000.
    11. At this point, I posted my spreadsheet and e-mailed Dr. Dahlquist the link to it.
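
The indexing, filtering, and dye-swap steps above (4 through 7) can also be sketched outside of Excel. This is only a minimal illustration in plain Python, not the protocol itself: the function name `build_master_rows` and the gene IDs in the example rows are made up for the sketch, and the data are assumed to be already loaded as one dictionary per spot.

```python
def build_master_rows(rows, swap_columns):
    """Index, filter, and dye-swap rows of log ratios.

    rows: list of dicts, each with an "ID" key plus log-ratio columns.
    swap_columns: names of the Cy3/Cy5 columns to flip to Cy5/Cy3.
    """
    # Step 4: number the spots 1..N in their original order.
    indexed = [dict(row, MasterIndex=i) for i, row in enumerate(rows, start=1)]

    # Steps 5-6: drop the spots whose ID is "_" and sort the rest by ID.
    kept = sorted((r for r in indexed if r["ID"] != "_"), key=lambda r: r["ID"])

    # Step 7: negating a log ratio swaps the dye orientation,
    # because log(Cy3/Cy5) = -log(Cy5/Cy3). Empty cells stay empty.
    for r in kept:
        for col in swap_columns:
            value = r[col]
            r[col + "_swapped"] = None if value is None else -value
    return kept


# Hypothetical example rows; the IDs and values are invented.
example = [
    {"ID": "BP0002", "LogRatio_SampleA_Cy3-Cy5": 0.5},
    {"ID": "_", "LogRatio_SampleA_Cy3-Cy5": 0.0},
    {"ID": "BP0001", "LogRatio_SampleA_Cy3-Cy5": -1.25},
]
master = build_master_rows(example, ["LogRatio_SampleA_Cy3-Cy5"])
```

Because each row keeps its MasterIndex, sorting `master` on that key restores the original spot order, which is the reason the index column is created before any sorting.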

12/3/15

  • I followed the protocol given by Dr. Dahlquist after she reviewed the most recent version of the Excel spreadsheet:
  • First I fixed my MasterSheet by getting rid of the columns I didn't need.
  • What I did next was not to delete the genes; instead, I did a search and replace. I searched for "100000" and "-100000" and replaced each with an empty cell: in the find/replace window, I typed 100000 or -100000 in the find field, left the replace field empty, and clicked the replace all button.
    • replaced: 14 "100000" entries with nothing
  • Then I copied the data, used Paste special > Paste values to put it into my scaling and centering sheet, and tried again. Things are OK now because the average for each column is a very small number near zero and the standard deviation is near 1.
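
The cleanup and the final check can be sketched in plain Python as well. This is an illustration under assumptions, not the linked instructions: it assumes that scaling and centering means subtracting the column average and dividing by the column standard deviation (consistent with the check above that the average ends up near zero and the standard deviation near 1), it uses the sample standard deviation, and the function names and example values are invented.

```python
from statistics import mean, stdev

SENTINELS = {100000, -100000}  # placeholder values found in the raw data


def blank_sentinels(values):
    """Replace the placeholder values with empty cells (None)."""
    return [None if v in SENTINELS else v for v in values]


def scale_and_center(values):
    """Subtract the column average and divide by the standard deviation.

    Empty cells (None) are skipped in the statistics and left empty.
    Assumed definition; stdev() is the sample standard deviation.
    """
    present = [v for v in values if v is not None]
    avg = mean(present)
    sd = stdev(present)
    return [None if v is None else (v - avg) / sd for v in values]


# Hypothetical log-ratio column containing two placeholder values.
column = [0.8, -1.1, 100000, 0.3, -100000, 1.6, -0.4]
cleaned = blank_sentinels(column)
scaled = scale_and_center(cleaned)

# Sanity check: the scaled column should have an average near 0 and SD near 1.
check = [v for v in scaled if v is not None]
print(mean(check), stdev(check))
```

If the placeholder values were left in, a single 100000 would dominate both the average and the standard deviation of a column of small log ratios, which matches the too-high values seen on 12/1/15.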