Difference between revisions of "Class Journal Week 9"
(→Katie Miller: fixing formatting)
(→Dean Symonds: added my answers and signature)
Msymond1 (talk) 23:45, 20 March 2024 (PDT)
Latest revision as of 22:45, 20 March 2024
Katie Miller
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The main issues identified in this talk concerned data organization and labeling. First, when the original 7 gene lists were compared to Baggerly and Coombs's lists, the original data was off by one. Because of this indexing error, the original authors were referencing a set of genes that was not actually involved. The likely cause was that the software they were using required 2 files, one with a header row and one without. When entering the second file, they did not account for the header row, which shifted all of the data by one. Second, while 6 of the 7 heat maps from the original data matched Baggerly and Coombs's, only 3 of the 7 gene lists matched. The prediction software they were using produced the heat maps, which is why the maps could match even though the reported lists did not. Third, in their graphs depicting resistant and sensitive genes, the labels for resistant and sensitive were mixed up and the values were swapped. If they were to go to clinical trials, they would be using this incorrect information to administer the medication to only the people who would not benefit from it. Fourth, they reused test samples and reported them as multiple samples, and even when the same samples were used, they were not always labeled consistently as resistant or sensitive. Fifth, when new data was published in the midst of clinical trials, Baggerly and Coombs found that of the 59 samples, 43 were mislabeled, and 16 had gene labels that were so scrambled that they could not tell what the labels referred to. Finally, when a confidential document regarding the research was made public, it revealed that the research's own review committee could not identify the methods that were used and could not sufficiently replicate the data.
- Several best practices were violated. The data formatting was not consistent, as shown by the obvious indexing error in the gene lists. The data names and labels were often wrong, and the same samples were reused with no indication that the data came from the same source. Finally, the workflow was not reproducible: Baggerly and Coombs could not obtain the same results that the original authors reported.
- The common issues were the inconsistencies in data organization and data labeling, as sample and gene labels were often mixed up.
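The off-by-one error described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, scores, threshold, and the helper functions `read_column` and `top_genes` are all made up for this example and are not the software or data discussed in the talk. It shows how pairing a file that has a header row with one that does not shifts every label by one position.

```python
import csv
import io

# Hypothetical toy data: these gene names and scores are invented for illustration.
GENES_WITH_HEADER = "gene\nGENE_A\nGENE_B\nGENE_C\nGENE_D\n"
SCORES_NO_HEADER = "0.1\n0.9\n0.2\n0.8\n"


def read_column(text, has_header):
    """Return the first column of a CSV string, skipping the header row if told to."""
    rows = [row[0] for row in csv.reader(io.StringIO(text)) if row]
    return rows[1:] if has_header else rows


def top_genes(gene_rows, scores, threshold=0.5):
    """Pair rows by position and keep the names whose score exceeds the threshold."""
    return [g for g, s in zip(gene_rows, scores) if float(s) > threshold]


scores = read_column(SCORES_NO_HEADER, has_header=False)

# Wrong: the header row is treated as data, so every name shifts by one position.
shifted = top_genes(read_column(GENES_WITH_HEADER, has_header=False), scores)

# Right: the header row is skipped, so names and scores line up.
aligned = top_genes(read_column(GENES_WITH_HEADER, has_header=True), scores)

print("shifted:", shifted)
print("aligned:", aligned)
```

In this toy case the shifted list reports the genes immediately before the ones that actually scored highly, and nothing in the output signals that anything went wrong.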
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- For papers, Dr. Baggerly looks for:
- Data
- Provenance
- Code
- Descriptions of Nonscriptable Steps
- Descriptions of Planned Design, if Used
- He says that these should be absolute requirements for papers before clinical trials. He also says that all his research is now written in Sweave, which embeds R code in LaTeX documents. Because the report is written in Sweave, an independent person can run the data through R and get the same numbers, ensuring reproducible research. DataONE corresponds with Dr. Baggerly in that data should be clearly labeled and dataset provenance must be maintained. DataONE also says to use a reproducible workflow, which means having descriptions of steps and of the planned design. DataONE also recommends using R for datasets, as it can be used to check and assure data quality.
What best practices did you perform for this week's assignment?
- We made sure that data was properly copied over and formatted correctly before beginning our analysis. We also ensured that the columns of data had clear labels and that the correct equations were used to analyze the data. All of our data was in one table, instead of several small tables. We entered the data in Excel, a data entry tool that can help reduce entry errors.
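Checks like these can also be scripted so they are repeatable. The following is a minimal sketch, assuming the table has been exported as a list of rows; the function `check_table` and the column names `sample_id`, `strain`, and `log2_fc` are hypothetical and are not taken from the assignment.

```python
def check_table(rows, required_columns, id_column):
    """Return a list of human-readable problems found in a list-of-dicts table."""
    if not rows:
        return ["table is empty"]
    problems = []
    # Every required column label must be present.
    missing = [c for c in required_columns if c not in rows[0]]
    if missing:
        problems.append(f"missing columns: {missing}")
    seen = set()
    for i, row in enumerate(rows, start=1):
        # The same sample should not appear twice without explanation.
        sample = row.get(id_column)
        if sample in seen:
            problems.append(f"row {i}: duplicate sample id {sample!r}")
        seen.add(sample)
        # Blank cells often mean a value was lost while copying.
        blanks = [c for c, v in row.items() if v in ("", None)]
        if blanks:
            problems.append(f"row {i}: blank values in {blanks}")
    return problems


# Hypothetical example rows; the values are invented for illustration.
rows = [
    {"sample_id": "s1", "strain": "wt", "log2_fc": "0.5"},
    {"sample_id": "s1", "strain": "wt", "log2_fc": ""},
]
problems = check_table(rows, ["sample_id", "strain", "log2_fc"], "sample_id")
for p in problems:
    print(p)
```

A script like this does not replace a second person checking the table, but it gives the same answer every time it is run on the same data.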
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I'm very surprised that even after all their work and documentation, Baggerly and Coombs's work was initially dismissed. I cannot believe that it took so long for anything to happen after these issues were brought up, especially because the treatments were being administered in clinical trials and could have been causing harm or giving false promises to those who needed help.
Kmill104 (talk) 23:27, 20 March 2024 (PDT)
Andrew Sandler
- There were many issues with the data and analysis, ranging from not sharing the data to misrepresentation and faulty data entry. This is a simplified answer; there was truly an astounding number of issues.
- Dr. Baggerly recommends a few things. He recommends that data is not just mentioned but actually shared and linked. He recommends that columns are labeled to show which samples were used and that the code is shared. He recommends that there are descriptions of steps that aren't scriptable. He also recommends that researchers describe their planned experimental design if they used one.
- This week I followed the best practice of writing down every step I took in my electronic notebook. I was actually talking with my psychologist and telling him how annoying I find having to write down all the steps, since it breaks my focus on the task. He was the head of psychology at Yale for a while and explained to me why it is so important, so this week I was on top of it.
- I sent the video to my cousin and dad who always make ridiculous non-scientific claims and find sketchy online science articles to back up their bad decisions.
Asandle1 (talk) 18:56, 20 March 2024 (PDT)
Charlotte Kaplan
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
The main problems found by Baggerly and Coombs were mistakes in data handling and a lack of transparency. These problems violated best practices such as making research reproducible and ensuring data quality.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Dr. Baggerly recommends documenting and sharing data, as well as peer review, to make data and research reproducible. This aligns with DataONE's emphasis on quality assurance.
- What best practices did you perform for this week's assignment?
Andrew and I consistently checked our data for quality control, and if we noticed something looked off, we brought it up to each other or asked Dr. Dahlquist for help.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
Dr. Baggerly's talk further showed me why data quality control is important. This talk definitely helped me realize the importance of meticulously checking my data for mistakes to ensure reproducibility in the future.
Dean Symonds
- In this talk, several key issues emerged. One was the discrepancies that arose when comparing the original 7 gene lists with those of Baggerly and Coombs, due to an indexing error from the software. Additionally, the graphs mislabeled resistant and sensitive genes, which could potentially impact clinical trials if those were carried out. Test sample reuse and inconsistent labeling further muddled the results. Lastly, a confidential document exposed the inability to replicate the research due to unclear methods.
- Dr. Baggerly gives several recommendations, such as actively sharing and linking data, labeling columns to indicate sample usage, sharing the associated code, describing any steps that cannot be scripted, and outlining the planned experimental design if one was used.
- A good practice that we implemented this week was having 2 people work on each strain so we could double-check with each other. We had a reproducible procedure and we reproduced it simultaneously. Having two people working on the same dataset helped ensure that each of us was doing it correctly. Another practice that was part of our procedure was having our professor check our tables when we were about halfway through them, to ensure that all of us were on the right track before we started running more serious tests.
- Every time I hear about a case like this, with massive errors found in people's work, I find it hard to believe at first that working professionals make such errors too, since they are not students like me who make errors all the time. I sometimes have to remind myself that all of these fields are still run by people and cannot be thought of as exact science. I have to keep in mind, every time I do more research or learn more about this field, that the work can contain errors.