Difference between revisions of "Class Journal Week 9"

From LMU BioDB 2024

Revision as of 22:22, 20 March 2024

Katie Miller

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • One of the main issues identified in the talk was an off-by-one indexing error. When the original 7 gene lists were compared with the lists Baggerly and Coombs generated, the original lists were shifted by one row, so the authors were reporting genes that were not actually involved. The likely cause was that the software required 2 input files, one with a header row and one without. When the second file was entered, the header row was not accounted for, which shifted all of the data by one.
    • Although 6 of the 7 heat maps from the original data matched those of Baggerly and Coombs, only 3 of the 7 gene lists matched. The heat maps were produced by the prediction software itself, which explains why the maps could match while the reported lists did not.
    • In the graphs depicting resistant and sensitive genes, the labels for resistant and sensitive were swapped. If these results were carried into clinical trials, the incorrect labels would lead to the medication being given only to the people who would not benefit from it.
    • Test samples were reused and reported as if they were multiple samples, and the reused samples were not always labeled consistently as resistant or sensitive.
    • When new data was published in the midst of the clinical trials, Baggerly and Coombs found that, of the 59 samples, 43 were mislabeled and 16 had gene labels so scrambled that they could not tell what the labels referred to.
    • When a confidential document about the research was made public, it revealed that the study's own review committee could not identify the methods used in the research and could not sufficiently replicate the results.

So, several best practices were violated. The data formatting was not consistent, as shown by the indexing error in the gene lists. Names and labels in the data were often wrong, and the same samples were reused with no indication that the data came from the same source. Finally, the workflow was not reproducible: Baggerly and Coombs could not obtain the same results that the original authors reported.

The common issues were inconsistencies in data organization and data labeling, as sample and gene labels were often mixed up.
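
The off-by-one error described above can be illustrated with a minimal sketch. This is not the software used in the original study; the file contents, gene names, and function names below are all hypothetical. It only shows how pairing a file that has a header row with one that does not shifts every label by one row, and how a simple length check can catch the problem.

```python
import csv
import io

# Hypothetical data: a gene-name file WITH a header row and a
# score file WITHOUT one (names and numbers are made up).
GENES_WITH_HEADER = "gene\nGENE_A\nGENE_B\nGENE_C\n"
SCORES_NO_HEADER = "0.91\n0.15\n0.47\n"


def read_column(text):
    """Return the first column of every non-empty row in a CSV string."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]


def pair_naive(genes_text, scores_text):
    """Pair rows by position without skipping the header (the bug)."""
    genes = read_column(genes_text)
    scores = [float(s) for s in read_column(scores_text)]
    # zip silently stops at the shorter list, hiding the mismatch.
    return dict(zip(genes, scores))


def pair_checked(genes_text, scores_text):
    """Skip the header row, then require both files to have equal length."""
    genes = read_column(genes_text)[1:]
    scores = [float(s) for s in read_column(scores_text)]
    if len(genes) != len(scores):
        raise ValueError("row counts differ; check for a stray header row")
    return dict(zip(genes, scores))


naive = pair_naive(GENES_WITH_HEADER, SCORES_NO_HEADER)
checked = pair_checked(GENES_WITH_HEADER, SCORES_NO_HEADER)
print("naive:  ", naive)
print("checked:", checked)
```

In the naive pairing, the header text is treated as a gene and every real gene receives the score that belongs to the gene before it, while the last gene is dropped without any warning.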

  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?

For papers, Dr. Baggerly looks for:
    • Data
    • Provenance
    • Code
    • Descriptions of Nonscriptable Steps
    • Descriptions of Planned Design, if Used

He says that these should be absolute requirements for papers before clinical trials. He also says that all of his research is now written in Sweave, which embeds R code in LaTeX documents. Because the report is written in Sweave, an independent person can run the data through R and get the same numbers, which makes the research reproducible. DataONE's recommendations align with Dr. Baggerly's in that data should be clearly labeled and dataset provenance must be maintained. DataONE also says to use a reproducible workflow, which corresponds to providing descriptions of the steps and the planned design. DataONE also recommends using R for datasets, as it can be used to check and assure data quality.
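
The idea of keeping data, provenance, and code together can be sketched in a few lines. The example below is not Sweave and is not Dr. Baggerly's workflow; it is a small Python illustration under stated assumptions, with a made-up dataset, a hypothetical file name, and hypothetical function names. It records a checksum of the input next to the result so that an independent person can confirm they are analyzing the same data and getting the same numbers.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Fingerprint the raw input so a later reader can verify it."""
    return hashlib.sha256(data).hexdigest()


def analyze(raw: bytes) -> dict:
    """A stand-in analysis: count the values and take their mean."""
    values = [float(x) for x in raw.decode().split()]
    return {"n": len(values), "mean": sum(values) / len(values)}


def run_with_provenance(raw: bytes, source: str) -> dict:
    """Bundle the result with where the data came from and its checksum."""
    return {
        "source": source,
        "input_sha256": sha256_of(raw),
        "result": analyze(raw),
    }


# Hypothetical input; in practice this would be read from the shared data file.
raw_data = b"1.0 2.0 3.0 6.0\n"
record = run_with_provenance(raw_data, source="example_measurements.txt")
rerun = run_with_provenance(raw_data, source="example_measurements.txt")

print(json.dumps(record, indent=2))
print("rerun matches:", record == rerun)
```

If the shared data were altered in any way, the checksum in a rerun would no longer match the published record, which flags the discrepancy before any results are compared.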

  3. What best practices did you perform for this week's assignment?

We made sure that the data was copied over properly and formatted correctly before beginning our analysis. We also ensured that the columns of data had clear labels and that the correct equations were used to analyze the data. All of our data was in one table, instead of several small tables. We entered the data in Excel, a data entry tool whose validation features can help reduce entry errors.
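
Checks for clear and consistent labels can also be automated. The sketch below is only an illustration: the sample IDs, the rows, and the `check_labels` function are hypothetical and were not part of our assignment. It flags labels outside an allowed set and samples that appear more than once with conflicting labels.

```python
from collections import defaultdict

ALLOWED_LABELS = {"sensitive", "resistant"}

# Hypothetical rows of (sample_id, label).
rows = [
    ("S1", "sensitive"),
    ("S2", "resistant"),
    ("S1", "resistant"),   # same sample, conflicting label
    ("S3", "Sensitive "),  # stray capital and trailing space, still valid once cleaned
    ("S4", "unknown"),     # not an allowed label
]


def check_labels(rows):
    """Return (invalid_rows, conflicting_sample_ids)."""
    labels_by_sample = defaultdict(set)
    invalid = []
    for sample_id, label in rows:
        cleaned = label.strip().lower()
        if cleaned not in ALLOWED_LABELS:
            invalid.append((sample_id, label))
            continue
        labels_by_sample[sample_id].add(cleaned)
    # A sample with more than one distinct label was labeled inconsistently.
    conflicts = sorted(s for s, seen in labels_by_sample.items() if len(seen) > 1)
    return invalid, conflicts


invalid, conflicts = check_labels(rows)
print("invalid labels:", invalid)
print("conflicting samples:", conflicts)
```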

  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

I'm very surprised that even after all their work and documentation, Baggerly and Coombs's work was initially dismissed. I cannot believe that it took so long for anything to happen after these issues were brought up, especially because treatments were being administered in clinical trials and could have been causing harm or giving false promises to those who needed help.

Andrew Sandler

  1. There were many issues with the data and analysis, from not sharing the data to misrepresentation and faulty data entry. This is a simplified answer; there was truly an astounding number of issues.
  2. Dr. Baggerly recommends a few things. He recommends that data not just be mentioned but actually shared and linked. He recommends that columns be labeled to show which samples were used, and that the code be released. He recommends describing any steps that aren't scriptable. He also recommends describing the planned design of the experiments, if one was used.
  3. This week I followed the best practice of writing down every step I took in my electronic notebook. I was actually talking with my psychologist and telling him how annoying I find having to write down all the steps, since it broke my focus on the task. He was the head of psychology at Yale for a while and explained to me why it is so important, so this week I was on top of it.
  4. I sent the video to my cousin and dad who always make ridiculous non-scientific claims and find sketchy online science articles to back up their bad decisions.

Asandle1 (talk) 18:56, 20 March 2024 (PDT)




Charlotte Kaplan

  • What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

The main problems found by Baggerly and Coombs were mistakes in data handling and a lack of transparency. These problems violated best practices such as making research reproducible and ensuring data quality.

  • What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
  • What best practices did you perform for this week's assignment?
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
