Difference between revisions of "Class Journal Week 7"
Revision as of 00:55, 16 October 2019
Contents
- 1 Mihir Samdarshi's Response
- 1.1 What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated?
- 1.2 Which of these did Dr. Baggerly claim were common issues?
- 1.3 What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- 1.4 What best practices did you perform for this week's assignment?
- 1.5 Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- 2 Iliana Crespin's Responses
- 3 Naomi Tesfaiohannes's Responses
- 4 Aby Mesfin's Response
- 4.1 What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- 4.2 What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- 4.3 What best practices did you perform for this week's assignment
- 4.4 Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- 5 Christina Dominguez's Response
- 5.1 1.What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- 5.2 2.What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- 5.3 3.What best practices did you perform for this week's assignment?
- 5.4 4.Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- 6 Ivy Macaraeg's Response
- 7 DeLisa Madere's Response
- 8 David Ramirez's Response
- 9 Jonar Cowan's Response
- 9.1 What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- 9.2 What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- 9.3 What best practices did you perform for this week's assignment?
- 9.4 Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- 10 Emma Young's Response
- 11 Michael Armas' Response
- 11.1 What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- 11.2 What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- 11.3 What best practices did you perform for this week's assignment?
- 11.4 Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- 12 Marcus Avila's Answers
- 13 Links
Mihir Samdarshi's Response
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated?
Which of these did Dr. Baggerly claim were common issues?
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
What best practices did you perform for this week's assignment?
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
Iliana Crespin's Responses
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- Dr. Baggerly states that the intuition of many scientists/biologists is poor: biologists continue to find patterns in random lists. Most of the time the documentation is very poor, which leads to "forensic bioinformatics". The DataONE best practices that were violated involved the reversal of the quantification matrix and the mislabeling of data; in addition, the software used required two input files and had issues with a hetero sample. He mentions that common issues are the outliers and predictions.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends that there shouldn't be a hetero sample. Having consistent columns and data in general is similar to what DataONE recommends. Consistency is very important.
- What best practices did you perform for this week's assignment?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- After this case, it is shocking how many misconceptions there are about the collection of data. The same mistakes continue to be made over and over. In a non-health setting, this might not mean much; however, in a case dealing with patients, it can cause a catastrophic situation. Many of them are barely holding on to hope, and work dealing with sensitive information should be reviewed more thoroughly. Icrespin (talk) 20:28, 15 October 2019 (PDT)
Naomi Tesfaiohannes's Responses
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
Baggerly states that our intuition of what makes sense is poor. Some research documentation is written poorly, making it more difficult to reproduce the same method. Genes were being referenced even though they were not present; they are outliers. The p-values were offset by one. This was likely caused by the software they used, which required two input files: a quantification matrix and gene names. The second input cannot have a hetero sample. There was likely a swapping of data in the software, meaning that medication was given to patients who did not need it. Poor clinical practice is a big issue in this case. Some samples were reused, sometimes labeled resistant and other times not. Of the 95 samples, 15 were duplicated and 6 were inconsistent with each other. When matching the samples, not all lined up and 16 did not match at all. Common mistakes are mixing up labels, such as gene labels and group labels, and incomplete documentation.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Dr. Baggerly recommends providing data with labeled columns, provenance, code, a description of non-scriptable steps, and a description of the planned design. DataONE states that data should have consistent columns, names, codes, and formats. DataONE also suggests keeping all data in one table. For missing data, leave the field empty or use a distinct value such as 9999 to indicate a missing value.
- What best practices did you perform for this week's assignment?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
Poor clinical practice cost the lives of many hopeful patients with stage 4 cancer. They knew their options were slim and put their trust in the clinical trial. The samples were duplicated and inconsistent multiple times. When trying to match the samples, 16 did not match at all and 43 were mislabeled. These errors produced an incorrect validation dataset for clinical trials that ran for 2 years. This video helped me understand how Dr. Potti's mistakes were caught.
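The duplicate and inconsistency checks described in this response can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample IDs and labels below are hypothetical, and `audit_sample_labels` is a made-up helper name, not the procedure Baggerly and Coombes actually used.

```python
from collections import defaultdict


def audit_sample_labels(records):
    """Report samples that appear more than once and samples whose labels disagree.

    `records` is a list of (sample_id, label) pairs. The IDs and labels used
    below are hypothetical and are not the Duke data.
    """
    labels_by_sample = defaultdict(list)
    for sample_id, label in records:
        labels_by_sample[sample_id].append(label)

    # A sample is duplicated if it was entered more than once.
    duplicated = sorted(s for s, labels in labels_by_sample.items() if len(labels) > 1)
    # A sample is inconsistent if its entries carry different labels.
    inconsistent = sorted(s for s, labels in labels_by_sample.items() if len(set(labels)) > 1)
    return duplicated, inconsistent


# Hypothetical records: S01 is entered twice with the same label,
# S03 is entered twice with conflicting labels.
records = [
    ("S01", "resistant"),
    ("S02", "sensitive"),
    ("S01", "resistant"),
    ("S03", "sensitive"),
    ("S03", "resistant"),
]

duplicated, inconsistent = audit_sample_labels(records)
print(duplicated)    # ['S01', 'S03']
print(inconsistent)  # ['S03']
```

A check like this is cheap to run before any analysis, which is part of why these count as "simple" mistakes.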
Ntesfaio Final Individual Reflection
Aby Mesfin's Response
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
One of the main issues Baggerly and Coombes found was that the sensitive and resistant labels in the Quantification Matrix were intentionally reversed by Potti and his colleagues in order to produce more favorable data. Rather than interpreting 0 as "resistant" and 1 as "sensitive", Potti's team switched how they interpreted the input files. Dr. Baggerly notes that the prominent violation of best practices with regard to data integration was Potti's failure to maintain the provenance of his data. The results of his research were skewed not only by the mislabeling of the quantification matrix but also because the data contained duplicates.
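To make the effect of the label reversal concrete, here is a minimal Python sketch. It assumes the coding described above (0 as "resistant", 1 as "sensitive"); the list of codes is hypothetical and `decode_labels` is an illustrative helper, not part of any software used in the case.

```python
def decode_labels(codes, mapping):
    """Translate numeric codes into label names using the given mapping."""
    return [mapping[code] for code in codes]


# Coding as described in the response above.
INTENDED = {0: "resistant", 1: "sensitive"}
# The same codes read the other way around.
REVERSED = {0: "sensitive", 1: "resistant"}

codes = [0, 1, 1, 0]  # hypothetical codes for four samples

intended = decode_labels(codes, INTENDED)
swapped = decode_labels(codes, REVERSED)

# With a binary label, reversing the mapping flips every single call.
disagreements = sum(a != b for a, b in zip(intended, swapped))
print(intended)       # ['resistant', 'sensitive', 'sensitive', 'resistant']
print(swapped)        # ['sensitive', 'resistant', 'resistant', 'sensitive']
print(disagreements)  # 4
```

Because every call flips, a reversed coding does not add noise to the results; it inverts them, which is why recording the coding explicitly alongside the data matters.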
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Dr. Baggerly recommends that researchers include data, code, descriptions of nonscriptable steps, descriptions of the planned design, and maintain provenance. He also recommends that reproducible research report structure, executive summaries, and reuse templates. Some of these practices parallel those recommended by DataONE, such as maintaining provenance.
What best practices did you perform for this week's assignment
The best practice that I used was creating a descriptive file name for the dataset.
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
I appreciated how this video broke down the bioinformatics and data sharing that went behind discovering the fraudulence of Potti's research. It helped me better understand the mechanisms that went into this discovery while also underlining the value of reproducibility in research.
Christina Dominguez's Response
1.What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
Issues with the data and analysis included swapping 0 and 1 when entering the data, which means that the data was reversed. The same sample was also labeled as both resistant and sensitive. This caused medicine to be prescribed as the best treatment when it was not, which is unfortunate for the people who were part of the clinical trials. The best practice of using descriptive column names was violated. Common issues include mixing up sample labels, which is easy to do in Excel. Creating descriptive column names and organizing your data correctly would help to resolve this.
2.What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Consistency is important in reproducible research. Dr. Baggerly explains that always using a certain program as well as maintaining accurate reports and data management serve to make it reproducible. DataONE always emphasizes consistency in column naming and data filing for future use. This is important for accuracy as well as allowing others to come to the same conclusion that you did by reproducing one's research.
3.What best practices did you perform for this week's assignment?
I used the best practice of using a descriptive file name. This is important to be able to track your files and organize them in an efficient way.
4.Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
It is shocking how much manipulation was done to the data. It makes it even sadder for those who were part of the clinical trials; however, it shows the importance of bioinformatics and its ability to analyze the legitimacy of the data.
Cdomin12 (talk) 22:04, 15 October 2019 (PDT)
Ivy Macaraeg's Response
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The main issues with the data and analysis identified by Baggerly and Coombes include heat maps and predictions that were completely reproducible, misinterpretation of the data (i.e., the switching of resistant and sensitive), and the replication of data. Best practices from DataONE that were violated include standard representation and files that are readable into the future. The presented data was not in these forms for researchers to analyze. Some of the common issues include confounding the experimental design, mixing up data labels, and simple mistakes, usually from Excel.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends that in their papers, researchers use well-maintained data, provenance, code, descriptions of nonscriptable steps, and descriptions of planned design. These are very similar to the DataONE recommendations.
- What best practices did you perform for this week's assignment?
- For this week's assignment, I tried to practice using descriptive names without spaces, formatting cells correctly, and making sure the selected data was complete.
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I think my main reaction is surprise that it took so long for Dr. Baggerly's data observations to be fully analyzed. It is sad that these data errors were not taken seriously first-hand, and I think a lot of people's time, money, resources, and health could've been better off if these errors were caught. This presentation reemphasized how important data analysis is, as it is something I haven't given a second thought to.
Imacarae (talk) 01:55, 16 October 2019 (PDT)
DeLisa Madere's Response
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The main issue with the data and analysis that Baggerly and Coombes found was that the samples were inaccurate. The scientists scrambled the gene labels so badly that there is large uncertainty as to which samples they belong to, and these unfortunately happened to be the samples that were incorrect for the drugs that the scientists used in their clinical trials for 2 years. The best practices that were violated concern the consistency of the data. In the experiment, there were many errors in the labeling of the genes, leaving inconsistencies in the gene titles. They also had missing data that they still included in the research, whereas, if data is missing, there should be no entry at all to indicate that, instead of making up data that can be harmful. Dr. Baggerly claimed that the common issues occurred in the labeling of the actual genes, creating inaccurate results in the data itself.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends that for reproducible research, the scientists should provide data along with identifiers, provenance, code, descriptions of nonscriptable steps, and descriptions of the planned design. These correspond to DataONE's ideas because DataONE recommends that data be consistent, for example by using titles to properly label each column in a spreadsheet, and recommends descriptions and data that are literate.
- What best practices did you perform for this week's assignment?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- After hearing this talk, it seemed more so that the data presented was not complete at all and implied that the scientists behind the scandal knew this and wanted to give 50% of their effort into this experiment. The fact that the common issues with the data had to do with the labeling revealed that they were not being careful about even the most simple things including the labels.
David Ramirez's Response
1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The first issue mentioned was
2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
3. What best practices did you perform for this week's assignment?
4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
Jonar Cowan's Response
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
What best practices did you perform for this week's assignment?
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
Misc. Links
Emma Young's Response
Michael Armas' Response
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- When Baggerly and Coombes were approached to try to recreate the results proposed by Potti, they used the publicly published data to do so. Baggerly and Coombes ran the same test but used known data to obtain "expected" results. When comparing these results to the results gathered from Potti's data, Baggerly and Coombes found data that was not consistent with the expected results. It's as if Potti's team just kept moving on even after receiving results that were not expected. Many data points were "off by one" or completely opposite of what was expected.
- The main thing Baggerly talked about that would go against the best practices as stated by DataONE had a lot to do with data organization. Duke's paper was unorganized, making the data difficult to read. There was one plot that Baggerly showed that had data points that were so badly mislabelled that the data was almost impossible to interpret.
- Baggerly claims that many common mistakes pertain to mislabelling samples and the lack of documentation. Both of these are important for the reproducibility of data, which was difficult due to the lack of organization provided by Duke.
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends literate programming, reusing templates, report structure, executive summaries, and appendices. As these points mention many ways to adapt to the standards of the community (such as publishing code, using correct formatting, etc.) this is congruent with DataONE supporting researchers using a set of best practices so that other community members are able to understand the research with ease. As for Dr. Baggerly and his team, they are using Sweave to easily be able to reproduce data by running code that will be able to show the same results as gathered originally. They are stepping away from anything private and want their information to be available to the public to ensure credible research.
What best practices did you perform for this week's assignment?
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I am shocked to see how much information was mostly ignored by Duke. So many results showed the opposite of what was expected, and they continued to run the clinical trials. After seeing the falsely interpreted data in front of me, I am even more shocked than last week about how Duke handled this. Additionally, the time frame Baggerly gave during his talk was not what I expected. I would expect Duke to shut down clinical trials immediately until an investigation was over, but it took them many months to even start an investigation, and they eventually restarted the trials. This case gets more shocking the more I learn about it.
Marcus Avila's Answers
Links
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- Baggerly and Coombes identified that the data was one unit off and also that some data samples were input more than once to increase statistical significance. They also found that the labels of "resistant" and "sensitive" were switched for the data samples. The best practices in DataONE that were violated include using consistent codes in each column. Instead, Potti et al. mixed up the sample labels, the gene labels, and the group labels. Baggerly and Coombes considered these simple mistakes, which are the most common.
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- What best practices did you perform for this week's assignment?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
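The "one unit off" problem mentioned in the first answer above can be illustrated with a small Python sketch that looks for a constant shift between a reported label list and a reference list. Everything here is an assumption for illustration: the gene names are placeholders, and `find_label_offset` is a hypothetical helper, not the method Baggerly and Coombes used.

```python
def find_label_offset(reported, reference, max_shift=3):
    """Return the shift k for which reported[i] == reference[i + k] wherever
    both positions exist, trying the smallest shifts first.
    Returns None if no shift within max_shift lines the two lists up.
    """
    # Try 0, -1, 1, -2, 2, ... so the smallest explanation wins.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        pairs = [
            (reported[i], reference[i + shift])
            for i in range(len(reported))
            if 0 <= i + shift < len(reference)
        ]
        if pairs and all(a == b for a, b in pairs):
            return shift
    return None


# Placeholder gene names; the reported list is the reference shifted by one row.
reference = ["geneA", "geneB", "geneC", "geneD", "geneE"]
reported = ["geneB", "geneC", "geneD", "geneE"]

print(find_label_offset(reported, reference))   # 1
print(find_label_offset(reference, reference))  # 0
```

A shift of 0 means the labels line up; any other value suggests that rows and labels were misaligned somewhere between the raw files and the reported results.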