Class Journal Week 8
From LMU BioDB 2015
Nicole Anguiano
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?
Nanguiano (talk) 15:06, 20 October 2015 (PDT)
Jake Woodlee
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?
Jwoodlee (talk) 15:35, 22 October 2015 (PDT)
Emily Simso
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The most common issue is that documentation in clinical trials is often poor, so it does not explain what the researchers actually did. A second problem is intuition: people make assumptions about the data instead of checking it. Overall, Baggerly and Coombs stress that a lack of thoroughness with data and research leads to complications further on.
- The DataONE best practices that were violated were consistency, descriptiveness, and completeness of data and information.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends that groups use the same standards for reports, reuse templates, follow a report structure for approval, and use executive summaries for complete documentation.
- This connects to DataONE because they stress reproducible research through appropriate file types, consistent formatting, clear definitions, and understanding how databases are set up before using them, amongst other points.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I think watching Dr. Baggerly's talk helped explain how something like the Duke case could happen, because there are so many details in data and research. It seems that mistakes could easily be covered up simply because there is a culture of not documenting every aspect of your work. I think that this needs to change so that future clinical trials are more closely regulated.
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?
- The values did not match between the Merrell et al. analysis and my analysis. This is probably because they are much better trained in this field and have more resources at their disposal. They were also able to perform more tests on the data, whereas we only had the given values. While I think we are able to get fairly accurate results, we are not close enough to the experiment to get the exact same results.
Emilysimso (talk) 20:49, 25 October 2015 (PDT)
Veronica Pacheco
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- A lot of the data being analyzed had discrepancies. Baggerly and Coombs noticed problems in the data and suspected that the original researchers had switched their inputs, which gave a much different resistance number than the actual resistance number. They figured this out by using a different study. They also made the point that there were other genes that did not make sense in the data. 14 of the 19 genes were accounted for by cross-referencing them with another paper, but that still leaves the other 5 genes. At this point, they contacted the journal to report these findings. They said the most common mistakes are simple ones. These include confounding in the experimental design and mixing up sample labels, gene labels, and group labels. DataONE explains that organization, consistency, and description are key to practicing good data preservation.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- They ask for the columns to be labeled to tell which sample is which, and DataONE emphasizes this point as well. They also ask researchers to provide their code so that the steps are clear when someone tries to reproduce the results of a given experiment.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I like how, at the end, he tells the audience what he and Coombs do as part of their protocol. For example, they use literate programming tools like Sweave. Overall, I liked how this assignment was structured. Reading the case first, then hearing this talk on how they went about figuring out the issues, was a neat experience.
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?
- The values did not match. I think it has to do with the accuracy of their method. They were able to use SAM (Significance Analysis of Microarrays), which is probably more advanced and accurate than calculating p values in Excel, although Excel is a great tool.
Vpachec3 (talk) 21:59, 25 October 2015 (PDT)
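The point in the last answer above, that the statistical method affects the p value, can be illustrated with a small sketch. It is neither the Excel calculation used in class nor SAM itself (SAM combines a modified t statistic with permutations); it is only a plain sign-flip permutation test on made-up log2 ratios, written in Python. The function name `sign_flip_p_value` and the four replicate values are assumptions for illustration and do not come from Merrell et al. (2002).

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(log_ratios):
    """Two-sided exact sign-flip permutation p value for 'mean log ratio is 0'."""
    observed = abs(mean(log_ratios))
    hits = 0
    total = 0
    # Under the null hypothesis each log ratio is equally likely to be + or -,
    # so every combination of sign flips is enumerated.
    for signs in product((1, -1), repeat=len(log_ratios)):
        flipped = [s * x for s, x in zip(signs, log_ratios)]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Hypothetical log2 ratios for one gene across four replicate chips.
log_ratios = [1.2, 0.9, 1.5, 1.1]

avg_log_fold_change = mean(log_ratios)
p_value = sign_flip_p_value(log_ratios)

print(f"average log2 fold change: {avg_log_fold_change:.3f}")
print(f"sign-flip permutation p value: {p_value}")
```

With only four replicates, this test cannot report a p value below 2/16 = 0.125, while a one-sample t test on the same four numbers gives a p value below 0.01. Two analyses of identical data can therefore disagree on "significance" simply because the methods differ, which is one plausible reason a sanity check would not match the paper.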
Kevin Wyllie
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- Baggerly points out confounding in experimental design (mixing up interpretation of results: sensitive versus resistant), mixing up sample/gene/group labels, and incomplete documentation (which hides the previously mentioned mistakes). DataONE mentions that data should be stored in a format that allows it to be used by any application. This was violated when the researchers added a column name to the gene IDs, which tricked their code into offsetting each gene by one.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Baggerly recommends appropriately labeling data, and providing code (provenance) so that it can be tested by third parties. Baggerly thinks these things should be required before beginning clinical trials. DataONE recommends using “descriptive column names” as well as dataset provenance.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Dr. Baggerly speaks quite fast and uses some terminology that is unfamiliar to me, so after watching the video twice, I’m still not entirely sure which “errors” he is implying Dr. Potti committed intentionally (if any). To me, this contrasts with the 60 Minutes segment which seemed to mention solely the deliberate manipulation of the data. The off-by-one index error, for example, seems like it could have been an honest mistake (not to say this would relieve Potti of culpability), as I can’t imagine how that would actually add to the (false) significance of the results.
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?
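As a side note on the off-by-one error discussed in the first and third answers above, here is a minimal Python sketch of how an extra header line can shift every gene by one when IDs and scores are matched purely by position. The gene names, the scores, and the functions `top_genes_by_position` and `top_genes_keyed` are all hypothetical; this is not the software or data from the Duke case.

```python
def top_genes_by_position(id_lines, scores, n):
    """Return the IDs at the positions of the n highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [id_lines[i] for i in ranked[:n]]


def top_genes_keyed(rows, n):
    """Return the IDs of the n highest-scoring rows; each ID travels with its score."""
    ranked = sorted(rows, key=lambda row: row["score"], reverse=True)
    return [row["gene_id"] for row in ranked[:n]]


# Hypothetical scores, one per gene, in file order.
scores = [0.2, 3.1, 0.4, 2.7]
ids_without_header = ["gene_A", "gene_B", "gene_C", "gene_D"]
# The same ID list after someone adds a column name on top.
ids_with_header = ["GeneID"] + ids_without_header

correct = top_genes_by_position(ids_without_header, scores, 2)
shifted = top_genes_by_position(ids_with_header, scores, 2)

# Keeping ID and score in the same record removes the dependence on position.
rows = [{"gene_id": g, "score": s} for g, s in zip(ids_without_header, scores)]
keyed = top_genes_keyed(rows, 2)

print("correct:", correct)
print("shifted:", shifted)
print("keyed:  ", keyed)
```

The shifted result reports the neighbors of the genes that actually scored highest, and nothing in the output signals that anything went wrong. That is consistent with reading this kind of error as an honest mistake that only clear labeling and shared code would expose.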
Brandon Klein
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Go back to the Merrell et al. (2002) paper and look at your "sanity check" where you compared the fold changes and p values for certain genes between your work and the paper. Did the values match? Why do you think that is? Do you think there is sufficient information there for you to reproduce their data analysis? Why or why not?