Class Journal Week 8
From LMU BioDB 2017
Zachary Van Ysseldyk's Responses
- The main issues with the data and analysis identified were that the data were not reproducible; in fact, the data were very far from reproducible. The authors reused the same genes across statistical analyses, their indexing was off by one (the sketch after this list illustrates this kind of error), and some of their validation tables (namely the 59-gene ovarian cancer model) matched 0% of what the published list said they should be. Based on his overall observations, Baggerly says the most common mistakes are simple ones, though their simplicity is often hidden. Specifically, he found the most common mistakes to involve mixing up sample labels, mixing up gene labels, mixing up group labels, and incomplete documentation, and he notes that the single most common mistake is complete confounding in the experimental design.
- Baggerly first suggests that the data should be clearly labeled so that anyone can tell which data are which.
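To make the off-by-one problem concrete, here is a minimal Python sketch. The gene names and scores are invented for illustration; this is not the Duke group's actual code, only the general kind of error Baggerly describes, in which a row number counted from 1 is used by software that counts from 0:

<pre>
# Toy gene list and scores; Python numbers rows from 0.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
scores = [0.91, 0.12, 0.55, 0.03]

# Correct lookup: the 0-based position of the top-scoring gene.
best = scores.index(max(scores))     # 0
print(genes[best])                   # GeneA (correct)

# Off-by-one bug: a row number copied from 1-based software
# (e.g., a spreadsheet) used directly as a Python index.
row_from_spreadsheet = best + 1      # "row 1" in 1-based counting
print(genes[row_from_spreadsheet])   # GeneB (the wrong gene is reported)
</pre>

Because every lookup is shifted by the same amount, the output still looks like a plausible gene list, which helps explain why this kind of mistake can go undetected when the workflow is poorly documented.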
QLanners Responses
- There were a number of issues with the data and analysis identified by Baggerly and Coombes. Some of the main issues included a universal off-by-one indexing error brought about by poor attention to the software being used, an inaccurate use of secondary source data (their data labels appeared to be flipped relative to the published labels), the use of duplicate data, and very poor documentation in general. The review panel even said that they could not figure out from the published data how to reproduce the work without some sort of outside help. A number of best practices enumerated by DataONE were broken, including a failure to maintain dataset provenance, a lack of documentation of all assumptions, a lack of any form of reproducible workflow documentation, and very poor labeling techniques. Dr. Baggerly claimed that several of these were common mistakes, most prominently the off-by-one indexing error and the mixing up of labels. Dr. Baggerly also pointed out that it is often poor documentation that lets these easy mistakes go undetected.
- Dr. Baggerly recommends more thorough documentation of the data, namely through labels for all published data. He also recommends stricter requirements for data provenance and for code to be published along with the data. Overall, Dr. Baggerly stresses the need for research to be reproducible. This corresponds very closely with what DataONE recommends, as several of the best practices (outlined above) are essential for properly documenting data and data analysis and for ensuring that someone else could perform the exact same steps on the data in the future using just the documentation.
- In this week's assignment we performed a number of best practices. We provided distinct labels for all of our data points, gave our data files descriptive names, handled missing data appropriately, and kept documentation on how we performed our data analysis so that it could be reproduced in the future by somebody else (a short sketch of these practices follows this list).
- I was very surprised at all of the pushback that Dr. Baggerly received from the scientific journals when he shared the errors in the data. I would have thought that scientific journals would be much more committed to ensuring that the papers they publish are accurate, and would have been more helpful to Dr. Baggerly and tougher on the Duke research team. I think that, going forward, a higher standard of accountability needs to be adopted in the scientific field to avoid scenarios like this.
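As an illustration of the practices listed above, here is a minimal Python sketch using pandas. The file and column names are assumptions invented for this example, not the actual files from this week's assignment:

<pre>
# A sketch of the best practices described above; all file and
# column names below are hypothetical.
import pandas as pd

# A descriptive file name and explicit column labels keep the data
# self-explanatory.
df = pd.read_csv("yeast_coldshock_expression_raw.csv")

# Handle missing data explicitly rather than silently: drop rows
# with no measurement and record how many were removed.
n_before = len(df)
df = df.dropna(subset=["log2_fold_change"])
print(f"Dropped {n_before - len(df)} rows missing log2_fold_change")

# Write the cleaned data to a new, descriptively named file so the
# raw data stays untouched (dataset provenance).
df.to_csv("yeast_coldshock_expression_cleaned.csv", index=False)
</pre>

Writing the cleaned data to a separate, descriptively named file while leaving the raw file untouched is what preserves the dataset's provenance for anyone reproducing the analysis later.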
Qlanners (talk) 17:28, 22 October 2017 (PDT)
QLanners Links
Main Page
User Page
Assignment Pages: Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 | Week 12 | Week 14 | Week 15
Journal Entry Pages: Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 | Week 12 | Week 14 | Week 15
Shared Journal Pages: Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10
Group Project Page: JASPAR the Friendly Ghost