Difference between revisions of "Ntesfaio Week 12/13"
(→Individual Journal Assignment: added to the electronic lab notebook) |
(→Electronic Lab Notebook: added sections for each person) |
||
Line 24: | Line 24: | ||
# Monitor the status of the report-in-progress and other related documentation. | # Monitor the status of the report-in-progress and other related documentation. | ||
# Coordinate team decisions and action items addressing any unforeseen delays or roadblocks. | # Coordinate team decisions and action items addressing any unforeseen delays or roadblocks. | ||
+ | |||
+ | ===Quality Assurance=== | ||
+ | |||
+ | === Milestone 1: Annotated Bibliography === | ||
+ | |||
+ | * The QA's will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper. | ||
+ | |||
+ | === Milestone 2: Journal Club Presentation === | ||
+ | |||
+ | * The QA's will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper. | ||
+ | |||
+ | === Milestone 3: Working with Data Analysts to understand microarray dataset === | ||
+ | |||
+ | As an overview, the QA team member is the link between the Coder/Designer and the Data Analyst. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analyst. The QA will independently check that all of the data retrieved from SGD is present and accurately represented in the MS Access database. The QA will also provide assistance to the Data Analyst, making sure that the data analysis steps are being performed correctly and are being correctly documented. | ||
+ | |||
+ | Initially, the QA's will need to do the following: | ||
+ | * Along with the Data Analysts, download the microarray data associated with your group's article. | ||
+ | ** [https://sgd-prod-upload.s3.amazonaws.com/S000204227/Barreto_2012_PMID_23039231.zip Barreto et al. (2012)] | ||
+ | ** [https://sgd-prod-upload.s3.amazonaws.com/S000204415/Kitagawa_2002_PMID_12269742.zip Kitagawa et al. (2002)] | ||
+ | ** [https://sgd-prod-upload.s3.amazonaws.com/S000204367/Thorsen_2007_PMID_17327492.zip Thorsen et al. (2007)] | ||
+ | ** Along with the Data Analysts, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number. | ||
+ | ** Are all the samples described in the paper in the dataset? | ||
+ | ** Are all the samples in the dataset described in the paper? | ||
+ | * Come up with consistent column headers that summarize this information | ||
+ | ** For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1. | ||
+ | * Organize the data in a worksheet in an Excel workbook so that: | ||
+ | * ID is in the first column | ||
+ | * Data columns are to the right, in increasing chronological order, using the column header pattern you created | ||
+ | * Replicates are grouped together | ||
+ | |||
+ | === Milestone 4: Making sure expression data has both Systematic Name and Standard Name ID's === | ||
+ | |||
+ | * The design of the expression tables in the final database will need both an ID field (yeast systematic name) and Standard Name fields. | ||
+ | * You will need to check the IDs in the expression data and potentially populate one or both of these fields. | ||
+ | * One way to do this is to use the [http://www.yeastract.com/formorftogene.php "ORF List <-> Gene List"] tool at YEASTRACT. | ||
+ | * The [http://llama.mshri.on.ca/synergizer/translate/ Synergizer] website may also be helpful. | ||
+ | * Here is a [https://rdrr.io/bioc/ClusterJudge/man/convert_Yeast_SGDId_2_systematic.html Bioconductor package] for it, too. | ||
+ | |||
+ | === Milestone 5: Work with Coder/Designers to Design a Database to Store Time-course Microarray Data from four sources === | ||
+ | |||
+ | * Databases created by the teams will be kept in a [https://lmu.box.com/s/gutpb5qm0a6b2pvjn1j6moqb6y47e903 "BIOL367_Fall2019 > Final Project Database" Box folder]. | ||
+ | * Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download. | ||
+ | * This folder will serve as the version control mechanism for the Coder/Designer guild. | ||
+ | * Coder/Designers will work with the QA's to create an MS Access database to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts. | ||
+ | * The starting point will be the database already used for the [[Week 10]] assignment, which can be found [https://lmu.box.com/s/kn8l6r639af683ioojqoce5w1g3z7kd2 here] on Box. | ||
+ | ** This database is already populated with tables for the Dahlquist Lab microarray data, degradation rates from Neymotin et al. (2014), and initial guesses for production rates. | ||
+ | *** You may need to change the table names of these existing tables so that they make sense with the overall database design. | ||
+ | ** You will need to add one or more expression tables for the expression data from your team's article. | ||
+ | *** Work with your team's QA and Data Analysts to determine appropriate column headings for the expression table. | ||
+ | ** You will also need to create one or more tables with metadata about the other tables because now the database will contain data from multiple sources, not just one. | ||
+ | *** A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data. | ||
+ | |||
+ | === Milestone 6: Validation and Quality Assurance on Database === | ||
+ | |||
+ | * After the Access database is built by the Coder/Designer, the QA will perform quality assurance to make sure that the database is correct and accurate. | ||
+ | ** In particular, the QA needs to make sure that all of the rows of data were imported into the database for the expression table(s). | ||
+ | ** The QA will make sure that both the ID (SGD systematic name) and Standard Names are included in each expression table and are correct. | ||
+ | * QA's will communicate to the Coder/Designers any changes needed to the database. | ||
+ | |||
+ | === Milestone 7: Final Documentation === | ||
+ | |||
+ | * With the Coder/Designer, finalize the [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] | ||
+ | |||
+ | ===Data Analysis=== | ||
+ | |||
+ | === Milestone 1: Annotated Bibliography === | ||
+ | |||
+ | * The Data Analysts will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper. | ||
+ | |||
+ | === Milestone 2: Journal Club Presentation === | ||
+ | |||
+ | * The Data Analysts will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper. | ||
+ | |||
+ | === Milestone 3: Getting the data ready for analysis === | ||
+ | |||
+ | # Download and examine the microarray dataset, comparing it to the samples and experiment described in your journal club article. | ||
+ | #* [https://sgd-prod-upload.s3.amazonaws.com/S000204227/Barreto_2012_PMID_23039231.zip Barreto et al. (2012)] | ||
+ | #* [https://sgd-prod-upload.s3.amazonaws.com/S000204415/Kitagawa_2002_PMID_12269742.zip Kitagawa et al. (2002)] | ||
+ | #* [https://sgd-prod-upload.s3.amazonaws.com/S000204367/Thorsen_2007_PMID_17327492.zip Thorsen et al. (2007)] | ||
+ | # Along with the QA's, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number. | ||
+ | #* Come up with consistent column headers that summarize this information | ||
+ | #** For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1. | ||
+ | # Organize the data in a worksheet in an Excel workbook so that: | ||
+ | #* ID is in the first column | ||
+ | #* Data columns are to the right, in increasing chronological order, using the column header pattern you created | ||
+ | #* Replicates are grouped together | ||
+ | |||
+ | === Milestone 4: ANOVA analysis === | ||
+ | |||
+ | # Perform an ANOVA analysis of the data, as you did on [[Week 8]] for the Dahlquist lab data. | ||
+ | #* Note that you will need to adjust your formulas to take into account the different number of timepoints and replicates in your article's dataset. | ||
+ | |||
+ | === Milestone 5: Clustering with stem and YEASTRACT === | ||
+ | |||
+ | # Cluster the data with stem, as you did on [[Week 9]]. | ||
+ | #* Note that we will make some adjustments to the GO term analysis because stem was not providing GO term names. | ||
+ | # Use YEASTRACT to generate a candidate gene regulatory network as you did on [[Week 9]]. | ||
+ | |||
+ | === Milestone 6: Create an input workbook for GRNmap using MS Access database === | ||
+ | |||
+ | # Create an input workbook for GRNmap based on the Microsoft Access database that the Coder/Designer and QA's make, following the protocol in [[Week 10]]. | ||
+ | # Run GRNmap and interpret the results. | ||
+ | # As the end users of the Access database, the Data Analysts will provide feedback to the QA's and Coder/Designer about the usability of the database. | ||
+ | |||
+ | ===Coder/Designer=== | ||
+ | |||
+ | === Milestone 1: Annotated Bibliography === | ||
+ | |||
+ | * The Coder/Designer will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper. | ||
+ | |||
+ | === Milestone 2: Journal Club Presentation === | ||
+ | |||
+ | * The Coder/Designer will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper. | ||
+ | |||
+ | === Milestone 3: Working Environment Setup === | ||
+ | |||
+ | Coder/Designer work will require the following software/accounts. The Seaver 120 lab computers are already set up for this; this list is provided for Coders/Designers who need to work on a different computer or outside of the lab. | ||
+ | * Microsoft Access | ||
+ | * Box account (provided by LMU) | ||
+ | ** Databases created by the teams will be kept in a [https://lmu.box.com/s/gutpb5qm0a6b2pvjn1j6moqb6y47e903 "BIOL367_Fall2019 > Final Project Database" Box folder]. | ||
+ | ** Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download. | ||
+ | ** This folder will serve as the version control mechanism for the Coder/Designer guild. | ||
+ | |||
+ | === Milestone 4: Design a Database to Store Time-course Microarray Data from four sources === | ||
+ | |||
+ | * Coder/Designers will work with the QA's to create an MS Access database to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts. | ||
+ | * The starting point will be the database already used for the [[Week 10]] assignment, which can be found [https://lmu.box.com/s/kn8l6r639af683ioojqoce5w1g3z7kd2 here] on Box. | ||
+ | ** This database is already populated with tables for the Dahlquist Lab microarray data, degradation rates from Neymotin et al. (2014), and initial guesses for production rates. | ||
+ | *** You may need to change the table names of these existing tables so that they make sense with the overall database design. | ||
+ | ** You will need to add one or more expression tables for the expression data from your team's article. | ||
+ | *** Work with your team's QA and Data Analysts to determine appropriate column headings for the expression table. | ||
+ | ** You will also need to create one or more tables with metadata about the other tables because now the database will contain data from multiple sources, not just one. | ||
+ | *** A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data. | ||
+ | *** Think about what information someone would need to know to understand how the dataset works. Consult with the QA and Data Analysts to figure out the sample-data relationships and how they should be encoded. | ||
+ | |||
+ | === Milestone 5: Build an individual database for your team === | ||
+ | |||
+ | * Once the design work has been completed, you need to actually import the data into the database. | ||
+ | * Initially, each team will have its own database so that the QA and Data Analysts can validate and use the database. | ||
+ | |||
+ | === Milestone 6: Validation and Quality Assurance on Database === | ||
+ | |||
+ | * The QA will perform quality assurance to make sure that the database is correct and accurate. | ||
+ | ** In particular, the QA needs to make sure that all of the rows of data were imported into the database for the expression table(s). | ||
+ | ** The QA will make sure that both the ID (SGD systematic name) and Standard Names are included in each expression table and are correct. | ||
+ | * QA's will communicate to the Coder/Designers any changes needed to the database. | ||
+ | |||
+ | === Milestone 7: Merge completed databases into a single database for the class === | ||
+ | |||
+ | * As a guild, the Coder/Designers will merge their separate databases into a final product. | ||
+ | * With the QA's, finalize the [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram]. | ||
+ | |||
{{Template:Ntesfaio}} | {{Template:Ntesfaio}} |
Revision as of 19:26, 20 November 2019
Contents
- 1 Electronic Lab Notebook
- 1.1 Project Manager
- 1.2 Milestone 1: Project “Scaffolding”
- 1.3 Milestone 2: Periodic Updates
- 1.4 Quality Assurance
- 1.5 Milestone 1: Annotated Bibliography
- 1.6 Milestone 2: Journal Club Presentation
- 1.7 Milestone 3: Working with Data Analysts to understand microarray dataset
- 1.8 Milestone 4: Making sure expression data has both Systematic Name and Standard Name ID's
- 1.9 Milestone 5: Work with Coder/Designers to Design a Database to Store Time-course Microarray Data from four sources
- 1.10 Milestone 6: Validation and Quality Assurance on Database
- 1.11 Milestone 7: Final Documentation
- 1.12 Data Analysis
- 1.13 Milestone 1: Annotated Bibliography
- 1.14 Milestone 2: Journal Club Presentation
- 1.15 Milestone 3: Getting the data ready for analysis
- 1.16 Milestone 4: ANOVA analysis
- 1.17 Milestone 5: Clustering with stem and YEASTRACT
- 1.18 Milestone 6: Create an input workbook for GRNmap using MS Access database
- 1.19 Coder/Designer
- 1.20 Milestone 1: Annotated Bibliography
- 1.21 Milestone 2: Journal Club Presentation
- 1.22 Milestone 3: Working Environment Setup
- 1.23 Milestone 4: Design a Database to Store Time-course Microarray Data from four sources
- 1.24 Milestone 5: Build an individual database for your team
- 1.25 Milestone 6: Validation and Quality Assurance on Database
- 1.26 Milestone 7: Merge completed databases into a single database for the class
- 2 Acknowledgments
- 3 References
Electronic Lab Notebook
Project Manager
Milestone 1: Project “Scaffolding”
This milestone pertains to setting up an initial schedule and any resources that your team will use for the duration of the project. It will be useful to get an overview of every team member’s own milestones so that you have an accurate big picture view.
- In consultation with your team, work backward from the final deadline to set intermediate deadlines for each deliverable. In particular you need to set deadlines for what you will accomplish by the journal deadline for Week 11, Week 12/13, and Week 15.
- Organize management tools for your team:
- Communication tools
- Workflow narratives
- Action items
- Testing results/reports
- Bugs/feature requests
- Question/answer sequences
Milestone 2: Periodic Updates
This is less a milestone than an ongoing task: once the project is up and running, the Project Manager is responsible for keeping track of everyone’s progress.
- Get periodic updates on progress; in particular, the project’s “place” in the overall flow should be known at all times (transparency). Team members will give status reports in class for the rest of the semester. However, the instructor will expect you to know and be able to report on the status of each member of your team at any time.
- Familiarize yourselves with the specific milestones of each team member so that you know how to monitor the team’s overall progress.
- Monitor the status of the report-in-progress and other related documentation.
- Coordinate team decisions and action items addressing any unforeseen delays or roadblocks.
Quality Assurance
Milestone 1: Annotated Bibliography
- The QA's will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper.
Milestone 2: Journal Club Presentation
- The QA's will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper.
Milestone 3: Working with Data Analysts to understand microarray dataset
As an overview, the QA team member is the link between the Coder/Designer and the Data Analyst. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analyst. The QA will independently check that all of the data retrieved from SGD is present and accurately represented in the MS Access database. The QA will also provide assistance to the Data Analyst, making sure that the data analysis steps are being performed correctly and are being correctly documented.
Initially, the QA's will need to do the following:
- Along with the Data Analysts, download the microarray data associated with your group's article.
- Barreto et al. (2012)
- Kitagawa et al. (2002)
- Thorsen et al. (2007)
- Along with the Data Analysts, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number.
- Are all the samples described in the paper in the dataset?
- Are all the samples in the dataset described in the paper?
- Come up with consistent column headers that summarize this information
- For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1.
- Organize the data in a worksheet in an Excel workbook so that:
- ID is in the first column
- Data columns are to the right, in increasing chronological order, using the column header pattern you created
- Replicates are grouped together
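The header convention and column ordering described above can be sketched in Python. This is only an illustration, assuming headers follow the Dahlquist Lab example `wt_LogFC_t15-1`; the function names `make_header`, `parse_header`, and `order_columns` are hypothetical, and your team's pattern may need extra fields such as treatment.

```python
import re

# Assumed pattern, modeled on the example header wt_LogFC_t15-1:
# <strain>_LogFC_t<timepoint>-<replicate>
HEADER_RE = re.compile(r"^(?P<strain>[^_]+)_LogFC_t(?P<time>\d+)-(?P<rep>\d+)$")


def make_header(strain, timepoint, replicate):
    """Build a column header from the sample's strain, time point, and replicate."""
    return f"{strain}_LogFC_t{timepoint}-{replicate}"


def parse_header(header):
    """Split a header into (strain, timepoint, replicate); fail loudly on a mismatch."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"header does not follow the pattern: {header!r}")
    return match["strain"], int(match["time"]), int(match["rep"])


def order_columns(headers):
    """Put ID first, then data columns by strain, time point, and replicate.

    Sorting on the parsed numbers (not the text) keeps t15 ahead of t120
    and keeps the replicates of each time point next to each other.
    """
    data_columns = sorted((h for h in headers if h != "ID"), key=parse_header)
    return ["ID"] + data_columns
```

Running every existing header through `parse_header` is also a quick way to catch columns that do not follow the agreed pattern before the worksheet is handed to the Coder/Designer.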
Milestone 4: Making sure expression data has both Systematic Name and Standard Name ID's
- The design of the expression tables in the final database will need both an ID field (yeast systematic name) and Standard Name fields.
- You will need to check the IDs in the expression data and potentially populate one or both of these fields.
- One way to do this is to use the "ORF List <-> Gene List" tool at YEASTRACT.
- The Synergizer website may also be helpful.
- Here is a Bioconductor package for it, too.
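Once a systematic-to-standard name list has been exported from one of the tools above, populating and checking the two fields can be sketched as follows. This is a minimal sketch under stated assumptions: the field names `ID` and `Standard_Name`, the function `fill_standard_names`, and the shape of the `lookup` dictionary are all hypothetical, and the lookup itself must come from your own export rather than from this example.

```python
def fill_standard_names(rows, lookup):
    """Return (filled_rows, unresolved_ids).

    rows   -- list of dicts with an "ID" key (systematic name) and an
              optional "Standard_Name" key (assumed field names)
    lookup -- dict mapping systematic name -> standard name, built from
              whatever export your team obtained (assumed input)
    """
    filled, unresolved = [], []
    for row in rows:
        systematic = row["ID"].strip().upper()
        standard = (row.get("Standard_Name") or "").strip()
        if not standard:
            # Only fill blanks; never overwrite a name already in the data.
            standard = lookup.get(systematic, "")
        if not standard:
            # Keep a list of IDs that still need a manual decision.
            unresolved.append(systematic)
        filled.append({**row, "ID": systematic, "Standard_Name": standard})
    return filled, unresolved
```

The `unresolved` list gives the QA a concrete set of IDs to look up by hand or to raise with the team, instead of leaving blank fields to be discovered later in the database.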
Milestone 5: Work with Coder/Designers to Design a Database to Store Time-course Microarray Data from four sources
- Databases created by the teams will be kept in a "BIOL367_Fall2019 > Final Project Database" Box folder.
- Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download.
- This folder will serve as the version control mechanism for the Coder/Designer guild.
- Coder/Designers will work with the QA's to create an MS Access database to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts.
- The starting point will be the database already used for the Week 10 assignment, which can be found here on Box.
- This database is already populated with tables for the Dahlquist Lab microarray data, degradation rates from Neymotin et al. (2014), and initial guesses for production rates.
- You may need to change the table names of these existing tables so that they make sense with the overall database design.
- You will need to add one or more expression tables for the expression data from your team's article.
- Work with your team's QA and Data Analysts to determine appropriate column headings for the expression table.
- You will also need to create one or more tables with metadata about the other tables because now the database will contain data from multiple sources, not just one.
- A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data.
Milestone 6: Validation and Quality Assurance on Database
- After the Access database is built by the Coder/Designer, the QA will perform quality assurance to make sure that the database is correct and accurate.
- In particular, the QA needs to make sure that all of the rows of data were imported into the database for the expression table(s).
- The QA will make sure that both the ID (SGD systematic name) and Standard Names are included in each expression table and are correct.
- QA's will communicate to the Coder/Designers any changes needed to the database.
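The row-completeness check above can be sketched as a comparison between the ID column of the source worksheet and the ID column exported from the expression table. This is only an illustration: it assumes both ID columns have already been read into Python lists (for example from CSV exports), and the function name `compare_ids` and the report keys are hypothetical.

```python
from collections import Counter


def compare_ids(source_ids, database_ids):
    """Summarize how the database's ID column differs from the source's."""
    source, database = set(source_ids), set(database_ids)
    counts = Counter(database_ids)
    return {
        "source_rows": len(source_ids),
        "database_rows": len(database_ids),
        # Rows that were in the worksheet but never made it into the table.
        "missing_from_database": sorted(source - database),
        # Rows in the table that have no counterpart in the worksheet.
        "unexpected_in_database": sorted(database - source),
        # IDs imported more than once.
        "duplicates_in_database": sorted(i for i, c in counts.items() if c > 1),
    }
```

Matching row counts alone are not sufficient, because a dropped row and a duplicated row cancel out; the three ID lists make that case visible and give the Coder/Designer a specific list of changes.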
Milestone 7: Final Documentation
- With the Coder/Designer, finalize the database schema diagram
Data Analysis
Milestone 1: Annotated Bibliography
- The Data Analysts will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper.
Milestone 2: Journal Club Presentation
- The Data Analysts will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper.
Milestone 3: Getting the data ready for analysis
- Download and examine the microarray dataset, comparing it to the samples and experiment described in your journal club article.
- Along with the QA's, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number.
- Come up with consistent column headers that summarize this information
- For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1.
- Organize the data in a worksheet in an Excel workbook so that:
- ID is in the first column
- Data columns are to the right, in increasing chronological order, using the column header pattern you created
- Replicates are grouped together
Milestone 4: ANOVA analysis
- Perform an ANOVA analysis of the data, as you did on Week 8 for the Dahlquist lab data.
- Note that you will need to adjust your formulas to take into account the different number of timepoints and replicates in your article's dataset.
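To see how the number of time points and replicates enters the calculation, here is a rough Python sketch of an F statistic. It assumes the test is whether the mean log fold change is zero at every time point, which is an assumption about the Week 8 protocol and not taken from this page; confirm the formula against that protocol before relying on it. The function name `anova_f_statistic` is hypothetical.

```python
def anova_f_statistic(groups):
    """Return (F, numerator_df, denominator_df) for one gene.

    groups -- one list per time point holding that time point's replicate
              log fold changes; None marks a missing value.
    Assumed null hypothesis: the mean is zero at every time point.
    """
    # Drop missing values so each time point may have its own replicate count.
    cleaned = [[x for x in group if x is not None] for group in groups]
    cleaned = [group for group in cleaned if group]

    t = len(cleaned)                        # number of time points with data
    n = sum(len(group) for group in cleaned)  # total number of data points
    if n <= t:
        raise ValueError("need more data points than time points")

    # Sum of squares under the null hypothesis (all means equal to zero).
    ss_h0 = sum(x * x for group in cleaned for x in group)

    # Sum of squares under the full model (each time point has its own mean).
    ss_full = 0.0
    for group in cleaned:
        mean = sum(group) / len(group)
        ss_full += sum((x - mean) ** 2 for x in group)
    if ss_full == 0:
        raise ValueError("no within-time-point variation; F is undefined")

    f_stat = ((n - t) / t) * (ss_h0 - ss_full) / ss_full
    return f_stat, t, n - t
```

The point of the sketch is that `t` and `n` are computed from the data, so the same function applies to a dataset with any number of time points and replicates; in a spreadsheet, those are the constants and cell ranges that have to be edited by hand.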
Milestone 5: Clustering with stem and YEASTRACT
- Cluster the data with stem, as you did on Week 9.
- Note that we will make some adjustments to the GO term analysis because stem was not providing GO term names.
- Use YEASTRACT to generate a candidate gene regulatory network as you did on Week 9.
Milestone 6: Create an input workbook for GRNmap using MS Access database
- Create an input workbook for GRNmap based on the Microsoft Access database that the Coder/Designer and QA's make, following the protocol in Week 10.
- Run GRNmap and interpret the results.
- As the end users of the Access database, the Data Analysts will provide feedback to the QA's and Coder/Designer about the usability of the database.
Coder/Designer
Milestone 1: Annotated Bibliography
- The Coder/Designer will work with their teams to develop an annotated bibliography of papers relating to their team's assigned paper.
Milestone 2: Journal Club Presentation
- The Coder/Designer will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper.
Milestone 3: Working Environment Setup
Coder/Designer work will require the following software/accounts. The Seaver 120 lab computers are already set up for this; this list is provided for Coders/Designers who need to work on a different computer or outside of the lab.
- Microsoft Access
- Box account (provided by LMU)
- Databases created by the teams will be kept in a "BIOL367_Fall2019 > Final Project Database" Box folder.
- Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download.
- This folder will serve as the version control mechanism for the Coder/Designer guild.
Milestone 4: Design a Database to Store Time-course Microarray Data from four sources
- Coder/Designers will work with the QA's to create an MS Access database to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts.
- The starting point will be the database already used for the Week 10 assignment, which can be found here on Box.
- This database is already populated with tables for the Dahlquist Lab microarray data, degradation rates from Neymotin et al. (2014), and initial guesses for production rates.
- You may need to change the table names of these existing tables so that they make sense with the overall database design.
- You will need to add one or more expression tables for the expression data from your team's article.
- Work with your team's QA and Data Analysts to determine appropriate column headings for the expression table.
- You will also need to create one or more tables with metadata about the other tables because now the database will contain data from multiple sources, not just one.
- A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data.
- Think about what information someone would need to know to understand how the dataset works. Consult with the QA and Data Analysts to figure out the sample-data relationships and how they should be encoded.
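One possible shape for such a metadata table is sketched below, using Python's built-in `sqlite3` module purely as a stand-in for MS Access so that the example can be run anywhere. Everything named here is hypothetical: the table `expression_metadata`, its fields, and the helper functions are one design among many, and the real design should come out of the discussion with the QA and Data Analysts.

```python
import sqlite3


def build_metadata(conn, samples):
    """Create a metadata table with one row per data column of an expression table."""
    conn.execute(
        """
        CREATE TABLE expression_metadata (
            table_name    TEXT NOT NULL,   -- which expression table the column is in
            column_name   TEXT NOT NULL,   -- header as it appears in that table
            source        TEXT NOT NULL,   -- lab or article the data came from
            treatment     TEXT,
            timepoint_min INTEGER,
            replicate     INTEGER,
            PRIMARY KEY (table_name, column_name)
        )
        """
    )
    conn.executemany(
        "INSERT INTO expression_metadata VALUES (?, ?, ?, ?, ?, ?)", samples
    )


def columns_for(conn, source, timepoint_min):
    """List the data columns for one source at one time point, in replicate order."""
    cursor = conn.execute(
        "SELECT column_name FROM expression_metadata "
        "WHERE source = ? AND timepoint_min = ? ORDER BY replicate",
        (source, timepoint_min),
    )
    return [row[0] for row in cursor]
```

Keying the table on table name plus column name means every data column is described exactly once, which is what lets a query pick out, say, all replicates of one source at one time point without relying on the reader decoding the column headers.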
Milestone 5: Build an individual database for your team
- Once the design work has been completed, you need to actually import the data into the database.
- Initially, each team will have its own database so that the QA and Data Analysts can validate and use the database.
Milestone 6: Validation and Quality Assurance on Database
- The QA will perform quality assurance to make sure that the database is correct and accurate.
- In particular, the QA needs to make sure that all of the rows of data were imported into the database for the expression table(s).
- The QA will make sure that both the ID (SGD systematic name) and Standard Names are included in each expression table and are correct.
- QA's will communicate to the Coder/Designers any changes needed to the database.
Milestone 7: Merge completed databases into a single database for the class
- As a guild, the Coder/Designers will merge their separate databases into a final product.
- With the QA's, finalize the database schema diagram.
Ntesfaio Final Individual Reflection