Class Journal Week 8


Revision as of 05:04, 24 October 2017

Zachary Van Ysseldyk's Responses

  1. The main issue Baggerly and Coombs identified was that the results were not reproducible; in fact, they were very far from reproducible. The researchers reused the same genes in statistical analyses, their indexing was off by one, and some of their validation tables (namely the 59-gene ovarian cancer model) matched 0% of what the published list said they should be. Based on his overall observations, Dr. Baggerly says the most common mistakes are simple ones, and that their very simplicity is often what hides them. Specifically, the most common mistakes he found were mixing up sample labels, mixing up gene labels, mixing up group labels, and incomplete documentation, and he notes that the most serious problem was complete confounding in the experimental design. Many of the points Dr. Baggerly raises came up when looking at DataONE: not all of the labels were clear, and the workflow was not easily reproducible.
  2. Baggerly first suggests that data should be clearly labeled so it is possible to tell which data is which. The biggest thing he stresses, of course, is a reproducible workflow; essentially all of his suggestions point toward making the data and analysis reproducible. DataONE likewise strongly advocates proper data documentation as an essential practice.
  3. The main best practice we performed was making the data analysis reproducible, as outlined on the individual work page. We were able to tailor the instructions to our specific gene so that the analyses could be easily reproduced. We also made sure that all of the genes were labeled. Including summaries and electronic workbooks gives the user an overview of the project, which helps them go in with a clear objective.
  4. It seemed like the press and organizations didn't take him that seriously at first. Baggerly didn't seem too upset about it during his lecture, but I would be a little angry after going through that much work just to be brushed off. Although I am not looking to go into a biology-related workplace, his enthusiasm for the subject was inspiring. I also liked his quirkiness.


Zvanysse (talk) 19:40, 23 October 2017 (PDT)

BIOL/CMSI 367-01: Biological Databases Fall 2017


QLanners Responses

  1. There were a number of issues with the data and analysis identified by Baggerly and Coombs. Some of the main issues included a universal off-by-one indexing error brought about by poor attention to the software being used, an inaccurate use of secondary source data (their data labels seemed to be flipped from the published data labels), the use of duplicate data, and very poor documentation in general. The review panel even said that they could not figure out from the published data how to reproduce the work without some sort of outside help. A number of best practices enumerated by DataONE were broken, including a failure to maintain dataset provenance, a lack of documentation of assumptions, the absence of any reproducible workflow documentation, and very poor labeling. Dr. Baggerly claimed that several of these were common mistakes, most prominently the off-by-one indexing error and the mixing up of labels, and he pointed out that it is poor documentation that often lets these easy mistakes go undetected.
  2. Dr. Baggerly recommends more thorough documentation of the data, namely through labels for all published data. He also recommends stricter requirements for data provenance and for publishing code along with the data. Overall, Dr. Baggerly stresses the need for research to be reproducible. This corresponds very closely with what DataONE recommends, as several of the best practices outlined above are essential for properly documenting data and analysis and for ensuring that someone else can perform the exact same steps on the data in the future using just the documentation.
  3. In this week's assignment we performed a number of best practices. We provided distinct labels for all of our data points, gave our data files descriptive names, handled missing data consistently, and kept documentation of how we performed our data analysis so that it could be reproduced in the future by somebody else.
  4. I was very surprised at all of the pushback that Dr. Baggerly received from the scientific journals when he shared the errors in the data. I would have thought that scientific journals would have been much more committed to ensuring that the papers that they had published were accurate and would have been more helpful to Dr. Baggerly and tough on the Duke research team. I think going forward a higher sense of accountability needs to be adopted in the scientific field to avoid scenarios like this.
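The off-by-one indexing error described above is easy to demonstrate in a few lines. A minimal sketch, assuming hypothetical gene names and invented expression values, showing how mixing a 1-based row number from one tool with 0-based indexing in another silently shifts every label:

```python
# Hypothetical gene list and expression values for one sample.
genes = ["TP53", "BRCA1", "EGFR", "KRAS"]
values = [8.1, 3.4, 6.7, 2.2]  # values[i] belongs to genes[i]

# Correct pairing: 0-based indices line up.
correct = dict(zip(genes, values))

# A common mistake: treating 1-based row numbers from one tool
# as 0-based indices in another shifts every label by one.
shifted = {genes[i]: values[i - 1] for i in range(1, len(genes))}

print(correct["BRCA1"])  # 3.4, the true value
print(shifted["BRCA1"])  # 8.1, silently reporting TP53's value
```

Every downstream statistic still computes without error, which is exactly why this mistake goes undetected without good documentation.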

Qlanners (talk) 17:28, 22 October 2017 (PDT)


Mary Balducci's Responses

  1. The main issue with the data and analysis was that the results were not reproducible. There were errors with mislabeled data, as well as indexing errors. The data did not make sense, and the methods were not clear without outside help. Simple mistakes were also easy to miss due to poor documentation and recording of data. Dr. Baggerly claims that the most common mistakes are mixing up sample labels, mixing up gene labels, mixing up group labels, and incomplete documentation.
  2. Dr. Baggerly recommends labeling table columns, providing the code, and describing the steps taken, especially steps planned in advance. These are very similar to DataONE's recommendations for clear labeling and good documentation.
  3. Best practices that I performed for this week were labelling my data points with headers that explain exactly what the data point is. I also kept a detailed outline of every step I took to get my results.
  4. My reaction to this case after viewing the video is that I'm still shocked that this went on for so long. It seems like it was very obvious that the data was not reliable and yet it was allowed to go as far as clinical trials.

Mbalducc (talk) 20:44, 22 October 2017 (PDT)

Eddie Azinge's Responses

  1. The most prevalent issue was that Baggerly and Coombs weren't able to reproduce the results themselves, given a plethora of problems in the original data and analysis. Simple errors such as off-by-one indexing mistakes, the use of duplicate data, poor documentation, and mixed-up sample and gene labels all aggregated to create a dataset that was irreproducible by reasonable methods. This was compounded by the fact that the lab was not consistently following best practices, specifically those set forth by DataONE.
  2. Dr. Baggerly recommends more rigorous and strict adherence to proper protocol, such as following conventions for labeling data, heavy documentation of processes, and, most importantly, maintaining a reproducible workflow. DataONE echoes most of these points, specifically emphasizing reproducibility of experiments and proper documentation.
  3. This week, we adhered to best practices by documenting our process as we followed the assignment, practicing proper labeling conventions, consistently dealing with missing data, and ensuring that our results were reproducible by other students.
  4. Learning more about this case makes me understand just how vast this field of biology is. This whole fiasco at Duke, if not properly taken care of, potentially stood to earn people a vast amount of money off of illegitimate practices and false hopes. It really emphasizes how important sticking to best practices is in order to prevent our analyses from causing harm to the society at large.

Cazinge (talk) 20:13, 23 October 2017 (PDT)

Katie Wright's Response

  1. Baggerly and Coombs were not able to reproduce the data they analyzed, even after poring over it and using every available means of "forensic bioinformatics." And this was not because of one specific error, but because of a multitude of errors. Oftentimes the data was not labeled correctly, or software was used incorrectly in ways that led to mislabeled data (the +1 problem). The best practices violated were consistency in data labeling and documentation. Mislabeling and misdocumentation are some of the most common errors in data analysis, and they wouldn't be such an enormous problem if things were just done properly in the first place.
  2. Dr. Baggerly and DataONE both recommend creating a "reproducible workflow." Your process and reasoning should be transparent and understandable so they can be evaluated and critiqued by others. Processes should also be automated wherever possible.
  3. For this week we:
    • documented the entire procedure in minute detail (thanks to the procedure provided by the professors in the Week 8 assignment)
    • formatted the Excel spreadsheet with no spaces between rows or columns, and created new worksheets often to provide a step-by-step look at how the dataset was analyzed and manipulated.
  4. I think this talk just made me more angry about the whole fiasco. There were multiple "disturbing" errors (as Dr. Baggerly called them) that were pointed out to journals time after time. It took so long for the scientific community to listen to the biostatisticians and take an in-depth look at the data. I think that Dr. Baggerly makes a very good suggestion when he says that every institution should have their own biostatisticians independently review and reproduce the data analysis for every experiment before it is published.
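The step-by-step, automated workflow described above can also live in a small script rather than in manual spreadsheet edits, so every transformation is recorded and rerunnable. A minimal sketch, assuming invented column labels and data values, with missing entries marked "NA":

```python
import csv
import io

# Hypothetical raw data: clearly labeled columns, missing values marked "NA".
raw = """gene,log2_expression
GENE_A,2.5
GENE_B,NA
GENE_C,-1.2
"""

# Step 1: parse with explicit column headers rather than positional indices.
rows = list(csv.DictReader(io.StringIO(raw)))

# Step 2: handle missing data in one consistent, documented way (drop "NA").
kept = [r for r in rows if r["log2_expression"] != "NA"]

# Step 3: the transformation lives in code, so anyone can rerun and audit it.
print([r["gene"] for r in kept])  # ['GENE_A', 'GENE_C']
```

Because each step is a line of code instead of an undocumented click, the script itself becomes the documentation that reviewers like Baggerly and Coombs were missing.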

Kwrigh35 (talk) 14:28, 23 October 2017 (PDT)

Corinne Wong's Response

  1. The data was inconsistent, and the research was not reproducible. Baggerly and Coombs frequently found errors, discrepancies, or missing information in the data that were never fully corrected. The DataONE best practices violated concerned inconsistent and missing data. Among the common issues were standard input errors: mixing up the sample labels, gene labels, and group labels.
  2. Dr. Baggerly recommends providing the data and code and using clear labels, which relates to DataONE's advice to keep data accessible and organized. His recommendations of clearly documenting corrections, assumptions, and errors also correspond to DataONE's recommendations.
  3. The best practices that we performed for this week’s assignment were consistent and organized data entry, and accessible and reproducible research. We had clear labels for our datasets, and they are on accessible Excel spreadsheets with clear and detailed steps.
  4. I still can’t believe how long it took for them to finally pull their research after all of the red flags that Baggerly and Coombs found. After finding the report of so many errors, you would think the scientific community would look into them, especially when the responses from Potti and Nevins were not clear and did not provide documentation.

Cwong34 (talk) 20:35, 23 October 2017 (PDT)


Emma Tyrnauer's Responses

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
  3. What best practices did you perform for this week's assignment?
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Emmatyrnauer (talk) 20:48, 23 October 2017 (PDT)



Blair Hamilton's Responses

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
  3. What best practices did you perform for this week's assignment?
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?


Category Links
User Page Blair Hamilton
Weekly Assignments Bhamilton18 Week 2Bhamilton18 Week 3Bhamilton18 Week 4Animal QTLBhamilton18 Week 6Bhamilton18 Week 7Bhamilton18 Week 8Bhamilton18 Week 9Bhamilton18 Week 10Bhamilton18 Week 11Bhamilton18 Week 12Bhamilton18 Week 14Bhamilton18 Week 15
Weekly Assignment
Instructions
Week 1Week 2Week 3Week 4Week 5Week 6Week 7Week 8Week 9Week 10Week 11Week 12Week 14Week 15
Class Journals Class Journal Week 1Class Journal Week 2Class Journal Week 3Class Journal Week 4Class Journal Week 5Class Journal Week 6Class Journal Week 7Class Journal Week 8Class Journal Week 9Class Journal Week 10
Final Project Lights, Camera, InterACTION!Lights, Camera, InterACTION! Deliverables