Class Journal Week 8
Alina Vreeland
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
Baggerly and Coombs identified the issue of inconsistent data. DataONE stresses being diligent when entering data into a spreadsheet, and this practice was obviously violated, since others were not able to reproduce the results.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
He recommends documenting all steps and checking the labels on every gene, etc., so that your peers can easily follow what you're doing, just like keeping an electronic journal in bio databases so that other people can follow the process that you followed. This would include being consistent in how you enter data into your spreadsheet, having all information in one place, and using file types that can easily be used by others in the future, as DataONE stresses in their PowerPoint.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
I still have the same general feeling about the case. It seems odd that people in the science field would not take better care of their data, and be so lazy when it comes to making their data valid and easy to use by others. In order to have any sort of high reputation in your respective field it seems like you wouldn't want to seem like an amateur when presenting your data to others, especially if you present fraudulent data as something to be prized.
- Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
The methods and results were not clearly defined. Therefore, it would be extremely difficult for another person to attempt to reproduce the results, since even the results were not clearly defined. The authors used many terms such as "one possibility," suggesting that even they cannot be certain in their own findings. For one to be able to successfully reproduce the data, more detailed information about the process would be needed.
Miles Malefyt
1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
The main issues with the data and analysis as identified by Baggerly and Coombs were that the end result of the data did not match up with the methods used. The numbers must have been made up or manipulated in many cases in order to get the results, which ended up being non-reproducible.
2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Both DataONE and Dr. Baggerly recommend being consistent with the data used and adhering to the methods described so that the results can be reproduced.
3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
I feel that much of the scientific community is more oriented towards showing results than making data that is able to be reproduced. It makes me feel like this is more about money and fame than it is about the actual science.
4. Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
Mmalefyt (talk) 18:44, 17 October 2013 (PDT)
Lauren Magee
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The main issue with the data was that it didn't match up with the results that Baggerly and Coombs had formulated from the data. The data points given by the researchers were in between the extremes that were plotted by Baggerly and Coombs (i.e. those resistant to a specific drug or those sensitive to a specific drug). However, the heat maps they created were consistent with those done by Baggerly and Coombs for the most part, so it was their analysis of the graphs that was the main concern for the biostatisticians. At some points the researchers had even interpreted the data backwards, so that their conclusion was the opposite of what it should have been. Baggerly also thought it important to note that later on in their analysis the researchers had numerous repeats in their data that were also distorting their final results, and when the biostatisticians suggested taking these out, the researchers produced a new list that still had a few repeats, some of which even contradicted each other. The general issue that Baggerly speaks of is the need for researchers to keep a detailed log of what processes they used to analyze their data. Not only did the researchers in question lack a detailed log of their work, but they also refused to give the biostatisticians some of their data, because it was "confidential".
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Baggerly recommends keeping a detailed journal of your research, making sure that someone looking back at these notes would be able to reproduce your results exactly. If someone starts with the same numbers as you did, they should get the same numbers at the end of the analysis.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I think that this talk speaks volumes about the need for individuals who can analyze big data correctly. In this case, there may have been fraudulent data being produced by the researchers themselves to create a desired outcome, but I also think that in general researchers have issues analyzing their data correctly and effectively. There are so many amazing programs in place to help analyze big data, but the individual must know how to interpret such results to draw applicable conclusions. This is one of the reasons I decided to take this class, because I want to be able to enter the scientific research community with the knowledge of how to handle big data.
- Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
Hilda Delgadillo
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The labeling of the genes associated with the particular drugs was incorrect. Their gene list was off by one, so the set of genes that the research paper described did not actually correspond to the biology. Baggerly and Coombs were not able to replicate the data analysis. One of the common issues mentioned is that the simpler steps are often where errors occur, such as the labeling of genes, sample labels, and group labels, which are all very simple mistakes. The practices covered in the DataONE slides that were violated were keeping data consistent and labeling samples correctly.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- He recommends recording everything to avoid incomplete documentation. He recommends labeling as much as possible, for example on graphs, and as the slides mention, labeling columns is important if charts are used. Also, Dr. Baggerly recommends providing the code for the analysis.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- It is shocking how the analysis that showed the data to have significant errors was practically ignored, so that the clinical trials were eventually permitted. Therefore, the deaths of the cancer patients that took part in this research trial could have been prevented.
- Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
- I don't think there's enough information to reproduce their data analysis. Some of the descriptions are very specific, but some terms are not explained in detail and are just mentioned. I was also hoping to see their microarray data by clicking on the provided link (http://genome-www5.stanford.edu/microarray/SMD/), but it took me to a "Not Found" page.
HDelgadi (talk) 15:58, 17 October 2013 (PDT)
Lena Hunt
- 1.) What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- There was an off-by-one indexing error in the genes, so they were referencing the wrong genes in the paper. Furthermore, of the genes in the paper, some worked to split the test set, some worked to split the training set and some worked to explain why the biology worked, but there was no overlap. Overall, the genes had been mislabeled as sensitive or resistant when they were in fact the opposite. DataONE states that data should be valid and organized to support ease of use; the data sets from Duke were neither. Dr. Baggerly claimed that mixing up sample labels, gene labels, and group labels were common mistakes.
- 2.) What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends keeping better records at the research level and publishing data together with the code so that others can reproduce the results. DataONE recommends better data keeping as well, especially with regard to keeping data consistent.
- 3.) Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I am still shocked at how careless the researchers were about checking their data.
- 4.) Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
- I think so; it seems more or less straightforward. It was written for scientists with more knowledge about those particular techniques, so while I think I understand it mentally from what I have learned in class, I don't know how easy it would be to reproduce in a wet lab because I have never done it before.
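
The off-by-one indexing error mentioned in the answers above can be illustrated with a small sketch. This is not the Duke group's code or Baggerly and Coombs's analysis; the probe names, the scores, and the helper functions `top_probes` and `detect_shift` are all made up for illustration. It only shows how reading labels one row off returns a completely different gene list, and how trying a few small offsets can reveal the shift.

```python
# Hypothetical probe IDs and scores; none of these values come from the Duke data.
probe_ids = ["probe_A", "probe_B", "probe_C", "probe_D", "probe_E"]
scores = [0.1, 2.5, 0.3, 3.1, 0.2]


def top_probes(ids, values, k, offset=0):
    """Return the IDs attached to the k highest-scoring rows.

    offset simulates a labeling bug: offset=1 reads the label one row
    below the row that actually holds the score.
    """
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [ids[i + offset] for i in ranked[:k] if 0 <= i + offset < len(ids)]


def detect_shift(reported, ids, values, k, max_shift=2):
    """Return the label offset that reproduces the reported list, or None."""
    for offset in range(-max_shift, max_shift + 1):
        if top_probes(ids, values, k, offset) == reported:
            return offset
    return None


correct = top_probes(probe_ids, scores, k=2)            # labels read from the right rows
shifted = top_probes(probe_ids, scores, k=2, offset=1)  # labels read one row too low

print(correct)
print(shifted)
print(detect_shift(shifted, probe_ids, scores, k=2))
```

In this toy example the correct and shifted lists share no probes at all, which is why such a simple mistake can change the reported biology, and why sharing the code and the labeled data makes the error easy for someone else to find.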