Difference between revisions of "Class Journal Week 9"

From LMU BioDB 2024

Revision as of 22:34, 20 March 2024

Katie Miller

What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

  • The main issues identified in this talk mostly concerned data organization and labeling:
    • When the original 7 gene lists were compared to the lists generated by Baggerly and Coombes, the original lists were off by one. Because of this indexing error, the authors were referencing a set of genes that was not actually involved. The likely cause was that the software they were using required 2 input files, one with a header row and one without. When the second file was entered, the header row was not accounted for, which shifted all of the data by one.
    • While 6 of the 7 heat maps from the original data matched those of Baggerly and Coombes, only 3 of the 7 gene lists matched. The heat maps were produced directly by the prediction software, which explains why the maps could match while the reported lists did not.
    • In the graphs depicting resistant and sensitive genes, the labels for resistant and sensitive were mixed up and the values were swapped. If this had gone to clinical trials, the incorrect information would have been used to administer the medication only to the people who would not benefit from it.
    • Test samples were reused and reported as if they were multiple distinct samples, and even the reused samples were not always labeled consistently as resistant or sensitive.
    • When new data was published in the midst of clinical trials, Baggerly and Coombes found that of the 59 samples, 43 were mislabeled and 16 had gene labels so scrambled that it was impossible to tell what they referred to.
    • When a confidential document regarding the research was made public, it revealed that the committee reviewing the research could not identify the methods that were used and could not sufficiently replicate the results.
  • Several best practices were violated. The data formatting was not consistent, as shown by the indexing error in the gene lists. The data names and labels were often wrong, and the same samples were reused with no indication that the data came from the same source. The workflow was also not reproducible: Baggerly and Coombes could not obtain the same results from the same data.
  • The common issues were inconsistencies in data organization and data labeling, with sample and gene labels often mixed up.
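The header-row mix-up described above can be illustrated with a minimal Python sketch. The gene names and scores here are hypothetical placeholders, not data from the talk, and the `top_genes` helper is invented for illustration; it only assumes that names and values are paired by position, which is what makes an unskipped header row shift every label by one.

```python
# Hypothetical example: one input has a header row, the other does not.
gene_names = ["probe_id", "gene_A", "gene_B", "gene_C", "gene_D"]  # first entry is a header
scores = [0.1, 0.9, 0.8, 0.2]                                      # no header row


def top_genes(names, values, threshold=0.5):
    """Return the names whose positionally paired value exceeds the threshold."""
    return [name for name, value in zip(names, values) if value > threshold]


# Buggy pairing: the header is treated as data, so every name is shifted by one.
shifted = top_genes(gene_names, scores)

# Corrected pairing: drop the header row before matching names to values.
aligned = top_genes(gene_names[1:], scores)

print("shifted:", shifted)
print("aligned:", aligned)
```

Both calls run without any error or warning, which is part of why this kind of mistake is easy to miss: the reported list looks plausible but names the wrong genes.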

What recommendations does Dr. Baggerly make for reproducible research? How do these correspond to what DataONE recommends?

  • For papers, Dr. Baggerly looks for:
    • Data
    • Provenance
    • Code
    • Descriptions of Nonscriptable Steps
    • Descriptions of Planned Design, if Used
  • He says that these should be absolute requirements for papers before clinical trials. He also says that all of his research is now written in Sweave, which embeds R code in LaTeX documents. Because the report is written in Sweave, an independent person can run the data through R and get the same numbers, which makes the research reproducible. DataONE agrees with Dr. Baggerly that data should be clearly labeled and that dataset provenance must be maintained. DataONE also says to use a reproducible workflow, which means providing descriptions of the steps and the planned design. DataONE also recommends using R for datasets, as it can be used to check and assure data quality.
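As a rough illustration of what maintaining provenance can look like in practice, here is a minimal Python sketch (not Sweave, and not a method from the talk or from DataONE). It records a checksum of the exact input data together with the steps applied, so that someone else can confirm they are starting from the same file. The function name, the file name, and the listed steps are all hypothetical.

```python
import hashlib
import json
import platform


def provenance_record(data_bytes, source, steps):
    """Build a small record tying a result to its exact input and processing steps."""
    return {
        "source": source,                                   # where the data came from
        "sha256": hashlib.sha256(data_bytes).hexdigest(),   # fingerprint of the exact input
        "python_version": platform.python_version(),        # software used for the analysis
        "steps": list(steps),                               # includes non-scriptable steps
    }


# Hypothetical input table and steps, for illustration only.
example_data = b"gene,log_fold_change\ngene_A,1.2\ngene_B,-0.4\n"
record = provenance_record(
    example_data,
    "hypothetical_expression_table.csv",
    ["copied raw data into the analysis sheet", "computed average log fold change"],
)
print(json.dumps(record, indent=2))
```

If even one byte of the input changes, the checksum changes, so a mismatch is an early signal that two people are not analyzing the same data.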

What best practices did you perform for this week's assignment?

  • We made sure that the data was copied over properly and formatted correctly before beginning our analysis. We also ensured that the columns of data had clear labels and that the correct equations were used to analyze the data. All of our data was in one table instead of several small tables. We entered the data in Excel, which can help limit entry errors.
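Checks like these can also be scripted. The following is a minimal Python sketch, assuming the table has been exported as CSV text; the column labels (`gene`, `t15`, `t30`, `t60`) and the `check_table` helper are hypothetical and are not taken from the assignment.

```python
import csv
import io

EXPECTED_COLUMNS = ["gene", "t15", "t30", "t60"]  # hypothetical column labels


def check_table(text, expected_columns):
    """Return a list of problems found in CSV text; an empty list means no problems."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    # The header must match exactly, so a missing or shifted header is caught first.
    if not rows or rows[0] != expected_columns:
        problems.append(f"unexpected header: {rows[0] if rows else None}")
        return problems
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_columns):
            problems.append(
                f"line {line_number}: expected {len(expected_columns)} cells, found {len(row)}"
            )
            continue
        # Every column after the gene label should hold a number.
        for label, cell in zip(expected_columns[1:], row[1:]):
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_number}: {label} is not numeric ({cell!r})")
    return problems


good = "gene,t15,t30,t60\ngene_A,0.5,1.1,-0.2\n"
bad = "gene,t15,t30,t60\ngene_A,0.5,n/a,-0.2\ngene_B,0.3,0.4\n"
good_problems = check_table(good, EXPECTED_COLUMNS)
bad_problems = check_table(bad, EXPECTED_COLUMNS)
print(good_problems)
print(bad_problems)
```

A check like this does not replace looking at the data, but it gives a repeatable record of what was verified.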

Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

  • I'm very surprised that, even after all their work and documentation, Baggerly and Coombes's work was initially dismissed. I cannot believe that it took so long for anything to happen after these issues were brought up, especially because the treatments were already being administered in clinical trials and could have been harming or giving false promises to those who needed help.

Kmill104 (talk) 23:27, 20 March 2024 (PDT)

Andrew Sandler

  1. There were many issues with the data and analysis, ranging from not sharing the data to misrepresentation and faulty data entry. This is a simplified answer. There was truly an astounding number of issues.
  2. Dr. Baggerly recommends a few things. He recommends that data is not just mentioned but actually shared and linked. He recommends that columns are labeled, that the paper states which samples were used, and that the code is released. He recommends that there are descriptions of steps that aren't scriptable. He also recommends that authors describe their planned design for experiments if they used one.
  3. This week I followed the best practice of writing down every step I took in my electronic notebook. I was actually talking with my psychologist and telling him how annoying I find having to write down all the steps, since it takes my focus off the task. He was the head of psychology at Yale for a while and explained to me why it is so important, so this week I was on top of it.
  4. I sent the video to my cousin and dad who always make ridiculous non-scientific claims and find sketchy online science articles to back up their bad decisions.

Asandle1 (talk) 18:56, 20 March 2024 (PDT)




Charlotte Kaplan

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?

The main problems found by Baggerly and Coombes were mistakes in data handling and a lack of transparency. These problems violated DataONE best practices such as making research reproducible and ensuring the quality of data.

  • What recommendations does Dr. Baggerly make for reproducible research? How do these correspond to what DataONE recommends?

Dr. Baggerly recommends documenting and sharing data, as well as peer review, to make data and research reproducible. These recommendations align with DataONE's emphasis on quality assurance.

  • What best practices did you perform for this week's assignment?

Andrew and I consistently checked our data for quality control, and if we noticed something looked off, we brought it up to each other or asked Dr. Dahlquist for help.

  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?

Dr. Baggerly's talk further showed me why data quality control is important. This talk definitely helped me realize the importance of meticulously checking my data for mistakes to ensure reproducibility in the future.


Dean Symonds