Class Journal Week 7

From LMU BioDB 2019

Revision as of 23:00, 14 October 2019

Mihir Samdarshi's Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?

What best practices did you perform for this week's assignment?

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Iliana Crespin's Responses

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  • What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?
  • What best practices did you perform for this week's assignment?
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Naomi Tesfaiohannes's Responses

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

Baggerly states that our intuition about what makes sense is poor. Some research documentation is poorly written, which makes it harder to reproduce the same method. Genes were referenced even though they were not present; they are outliers. The p-values were offset by one. This was likely caused by the software they used, which required two input files: a quantification matrix and a list of gene names. The second input (the gene-name list) cannot have a header row. There was likely a swapping of data in the software, meaning that medication could be given to patients who don't need it. Poor clinical practice is a big issue in this case. Some samples were reused, labeled resistant in some places and not resistant in others. Of the 95 samples, 15 were duplicated and 6 were inconsistent with each other. When the samples were matched, not all of them lined up, and 16 did not match at all. Common mistakes are mixing up labels (gene labels and group labels) and incomplete documentation.
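The off-by-one problem described above can be illustrated with a minimal sketch, assuming the gene-name list is matched to the quantification matrix purely by row position and that one stray extra line (for example, a header) slipped into the name list. The gene names and values below are hypothetical toy data, not the data from the case, and `label_rows` and `check_lengths` are illustrative helpers rather than functions from the software that was actually used.

```python
# Hypothetical toy expression matrix: one row per gene, one column per sample.
expression_rows = [
    [5.1, 4.8],
    [2.3, 2.9],
    [7.7, 7.2],
    [0.4, 0.6],
]

# Hypothetical gene-name list, matched to the matrix by row position.
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

# The same list with one stray extra line (e.g. a header) at the top.
gene_names_shifted = ["gene_id"] + gene_names


def label_rows(names, rows):
    """Pair names with rows by position; zip silently stops at the shorter list."""
    return dict(zip(names, rows))


def check_lengths(names, rows):
    """Fail loudly instead of silently shifting every label by one."""
    if len(names) != len(rows):
        raise ValueError(f"{len(names)} gene names for {len(rows)} matrix rows")


good = label_rows(gene_names, expression_rows)
bad = label_rows(gene_names_shifted, expression_rows)

print(good["GENE_A"])   # [5.1, 4.8]
print(bad["GENE_A"])    # [2.3, 2.9]: GENE_A now carries GENE_B's values
print("GENE_D" in bad)  # False: the last gene falls off the end

try:
    check_lengths(gene_names_shifted, expression_rows)
except ValueError as err:
    print(err)  # 5 gene names for 4 matrix rows
```

Nothing in the shifted version raises an error on its own, which is why a cheap length check before pairing names with rows is worth having.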

  • What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?

Dr. Baggerly recommends labeled data columns, provenance, code, a description of non-scriptable steps, and a description of the planned design. DataONE recommends consistent columns of data and consistent names, codes, and formats. DataONE also suggests keeping all the data in one table. For missing data, leave the field empty or use a distinct value such as 9999 to indicate a missing value.
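The missing-value convention above can be sketched in a few lines, assuming a small CSV table in which one missing entry is left empty and another uses the sentinel 9999. The column names, sample IDs, and values are hypothetical, and `parse_response` is an illustrative helper, not part of any DataONE tool.

```python
import csv
import io
from statistics import mean

# Hypothetical sentinel; it must lie outside the range of real values.
MISSING_CODE = "9999"

# Hypothetical table: S2 uses an empty field, S3 uses the sentinel.
RAW = """sample_id,drug,response
S1,drug_A,0.82
S2,drug_A,
S3,drug_A,9999
S4,drug_A,0.64
"""


def parse_response(field):
    """Return a float, or None when the field is empty or holds the missing code."""
    field = field.strip()
    if field == "" or field == MISSING_CODE:
        return None
    return float(field)


rows = list(csv.DictReader(io.StringIO(RAW)))
responses = [parse_response(row["response"]) for row in rows]
observed = [value for value in responses if value is not None]

print(responses)                 # [0.82, None, None, 0.64]
print(round(mean(observed), 2))  # 0.73
```

Both encodings end up as `None`, so summaries are computed only over real measurements. If 9999 were parsed as an ordinary number, it would dominate the mean, which is why the sentinel has to be clearly distinct from valid data.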

  • What best practices did you perform for this week's assignment?


  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Poor clinical practice cost the lives of many hopeful patients with stage 4 cancer. They knew their options were slim and put their trust in the clinical trial. The samples were duplicated and inconsistent multiple times. When the samples were matched, 16 did not match at all and 43 were mislabeled. These errors produced an incorrect validation dataset for clinical trials that had been running for 2 years.

Aby Mesfin's Response

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

One of the main issues Baggerly and Coombes found was that the sensitive and resistant labels in the quantification matrix were intentionally reversed by Potti and his colleagues to produce more favorable data. Rather than interpreting 0 as "resistant" and 1 as "sensitive", Potti's team switched how they interpreted the input files. Dr. Baggerly notes that Potti's most prominent violation of best practices for data integration was his failure to maintain the provenance of his data. His results were heavily skewed, not only because the quantification matrix was mislabeled but also because it contained duplicates.
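Both problems named above, a reversed 0/1 label convention and duplicated samples, can be caught by simple cross-checks. This is a minimal sketch, assuming the two conventions described in the text (0 as "resistant" and 1 as "sensitive", or the reverse) and assuming a few samples whose status is known independently; the sample IDs, codes, and reference labels are hypothetical, and these helper functions are illustrative rather than the procedure Baggerly and Coombes actually used.

```python
from collections import defaultdict

# Hypothetical 0/1 codes for four samples, plus two conventions for reading them.
CODES = {"S1": 0, "S2": 1, "S3": 1, "S4": 0}
ZERO_IS_RESISTANT = {0: "resistant", 1: "sensitive"}
ZERO_IS_SENSITIVE = {0: "sensitive", 1: "resistant"}


def decode(codes, convention):
    """Translate each sample's 0/1 code into a label using one convention."""
    return {sample: convention[code] for sample, code in codes.items()}


def mismatches(decoded, reference):
    """Samples whose decoded label disagrees with an independently known label."""
    return sorted(s for s, label in reference.items() if decoded.get(s) != label)


def inconsistent_duplicates(records):
    """Sample IDs that appear more than once with different labels."""
    labels = defaultdict(set)
    for sample, label in records:
        labels[sample].add(label)
    return sorted(s for s, seen in labels.items() if len(seen) > 1)


# Hypothetical reference: two samples whose status is known from another source.
reference = {"S1": "resistant", "S2": "sensitive"}

print(mismatches(decode(CODES, ZERO_IS_RESISTANT), reference))  # []
print(mismatches(decode(CODES, ZERO_IS_SENSITIVE), reference))  # ['S1', 'S2']

# Hypothetical merged table in which S3 was entered twice with conflicting labels.
records = [("S1", "resistant"), ("S3", "sensitive"), ("S3", "resistant")]
print(inconsistent_duplicates(records))  # ['S3']
```

Under the reversed convention every reference sample disagrees at once, which is the signature of a flipped encoding rather than scattered noise.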

What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?

Dr. Baggerly recommends that researchers include data, code, descriptions of non-scriptable steps, and descriptions of the planned design, and that they maintain provenance. He also recommends that reproducible research reports follow a standard structure and include executive summaries, and that researchers reuse templates. Some of these practices parallel those recommended by DataONE, such as maintaining provenance.

What best practices did you perform for this week's assignment?

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I appreciated how this video broke down the bioinformatics and data sharing behind the discovery of fraud in Potti's research. It helped me better understand how this discovery was made while also underlining the value of reproducibility in research.

Christina Dominguez's Response

1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

2. What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?

3. What best practices did you perform for this week's assignment?

4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Ivy Macaraeg's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  2. What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?
  3. What best practices did you perform for this week's assignment?
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

DeLisa Madere's Response

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issues with the data and analysis that Baggerly and Coombes found were that the samples were inaccurate. The scientists scrambled the gene labels so badly that there is great uncertainty about which samples they belong to, and, unfortunately, these incorrect samples were the basis for the drugs the scientists used in their clinical trials for 2 years. The violated best practices include data consistency: the many gene mislabelings in the experiment left the gene names inconsistent. The scientists also had missing data that they still included in the research; when data are missing, the entry should be left empty to indicate that, rather than filled with made-up data that can be harmful. Dr. Baggerly claimed that the common issues occurred in the labeling of the genes themselves, which created inaccurate results in the data.
  2. What does Dr. Baggerly recommend for reproducible research? How do these recommendations correspond to what DataONE recommends?
    • Dr. Baggerly recommends that, for reproducible research, scientists provide data along with identifiers, provenance, code, descriptions of non-scriptable steps, and descriptions of the planned design. These correspond to DataONE's recommendations: DataONE recommends consistent data, for example using titles to properly label each column in a spreadsheet, and it recommends literate descriptions and data.
  3. What best practices did you perform for this week's assignment?
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • After hearing this talk, it seemed even more that the data presented were not complete at all, which implied that the scientists behind the scandal knew this and put only 50% of their effort into this experiment. The fact that the common issues with the data involved labeling revealed that they were not careful about even the simplest things, including the labels.