Difference between revisions of "Quality Assurance"
Kdahlquist (talk | contribs) (→Milestone 2: Working with Data Analysts to understand microarray dataset: change plural) |
Kdahlquist (talk | contribs) (→Milestone 4: Validation, Quality Assurance, and Documentation of the Database: paste in instructions for querying the access db) |
||
(8 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | |||
− | |||
{{Final Project Links}} | {{Final Project Links}} | ||
Line 25: | Line 23: | ||
As an overview, the QA team members are the link between the Coder/Designer and the Data Analyst. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analysts. The QA's will independently check that all of the data retrieved from SGD is present and accurately represented in the MS Access database. The QA will also provide assistance to the Data Analysts, making sure that the data analysis steps are being performed correctly and are being correctly documented. | As an overview, the QA team members are the link between the Coder/Designer and the Data Analyst. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analysts. The QA's will independently check that all of the data retrieved from SGD is present and accurately represented in the MS Access database. The QA will also provide assistance to the Data Analysts, making sure that the data analysis steps are being performed correctly and are being correctly documented. | ||
− | + | # Download and examine the microarray dataset, comparing it to the samples and experiment described in your journal club article. | |
− | + | #* [https://sgd-prod-upload.s3.amazonaws.com/S000204389/Sha_2013_PMID_24073228.zip Link to processed dataset from SGD.] | |
− | + | #* For your reference, this is the link to the dataset at the GEO Database:[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26169 GSE26169]. However, we will use the dataset processed by SGD. | |
− | * | + | # Along with the QA's, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number. |
− | + | # Report on these quality measures: | |
− | + | #* Are all the samples described in the paper in the dataset? | |
− | + | #* Are all the samples in the dataset described in the paper? | |
− | + | # Come up with consistent column headers that summarize this information | |
− | + | #* For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1. | |
− | + | #* Do not use any special characters except for "-" or "_" (e.g., no commas, etc.) | |
− | * Organize the data in a worksheet in an Excel workbook so that: | + | # Organize the data in a worksheet in an Excel workbook so that: |
− | * ID is in the first column | + | #* ID (SGD systematic name) is in the first column |
− | * Data columns are to the right, in increasing chronological order, using the column header pattern you created | + | #* Data columns are to the right, in increasing chronological order, using the column header pattern you created |
− | * Replicates are grouped together | + | #* Treatments are grouped together |
+ | #* Replicates are grouped together | ||
+ | # This information needs to be relayed to the Coder/Designers so that they can design an appropriate <code>expression</code> table. | ||
=== Milestone 3: Making sure expression data has both Sytematic Name and Standard Name ID's === | === Milestone 3: Making sure expression data has both Sytematic Name and Standard Name ID's === | ||
Line 46: | Line 46: | ||
* One way to do this is use the [http://www.yeastract.com/formorftogene.php "ORF List <-> Gene List"] tool at YEASTRACT. | * One way to do this is use the [http://www.yeastract.com/formorftogene.php "ORF List <-> Gene List"] tool at YEASTRACT. | ||
* The [http://llama.mshri.on.ca/synergizer/translate/ Synergizer] website may also be helpful. | * The [http://llama.mshri.on.ca/synergizer/translate/ Synergizer] website may also be helpful. | ||
− | * Here is a [https://rdrr.io/bioc/ClusterJudge/man/convert_Yeast_SGDId_2_systematic.html Bioconductor package] for it, too. | + | <!--* Here is a [https://rdrr.io/bioc/ClusterJudge/man/convert_Yeast_SGDId_2_systematic.html Bioconductor package] for it, too.--> |
− | === Milestone | + | === Milestone 3: Design a database to store data needed to create a GRNmap input workbook === |
− | * Databases created by the teams will be kept in a [https://lmu.box.com/s/ | + | * Designer/Coders will work with the QA's to create a MS Access Database that will contain data needed to create a GRNmap input workbook. |
− | * Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download. | + | ** Databases created by the teams will be kept in a [https://lmu.box.com/s/lorqp5d5hkzqb7q161ldb66mhqfham3f "BIOL367_Spring2024" Box folder]. |
− | * This folder will serve as as the version control mechanism for the Coder/Designer guild. | + | ** Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download. |
− | * | + | ** This folder will serve as as the version control mechanism for the Coder/Designer guild. |
− | + | * The database will need to have the following tables: | |
− | ** | + | ** A <code>gene</code> table that contains all of the gene IDs for the entire yeast genome, obtained from [https://yeastmine.yeastgenome.org/yeastmine/begin.do YeastMine]. |
− | ** | + | ** An <code>expression</code> table to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts. You will consult with the Data Analysts and QA's to figure out the sample-data relationships and how that should be encoded as fields in the database. |
− | + | ** A <code>degradation_rates</code> table that contains degradation rates from Neymotin et al. (2014). This table will be provided. | |
− | + | ** A <code>production_rates</code> table that contains initial guesses for the production rates for each gene. This table will be provided. | |
− | ** | + | ** A <code>network</code> table that contains the gene regulatory network data from the Harbison et al. (2004) paper. |
+ | ** A <code>metadata</code> table that encodes information about the database itself, i.e., other tables in the database. | ||
*** A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data. | *** A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data. | ||
− | === Milestone | + | === Milestone 4: Validation, Quality Assurance, and Documentation of the Database === |
+ | |||
+ | * The QA will perform quality assurance to make sure that the database is correct and accurate. | ||
+ | ** In particular, the QA's need to make sure that all of the rows of data were imported into the database for each table. | ||
+ | ** The QA's will make sure that both the ID (SGD systematic name) and Standard Names are included in the expression table and are correct. | ||
+ | * QA's will collect feedback from the Data Analysts and communicate to the Coder/Designers any changes needed to the database. | ||
+ | * With the QA's finalize the [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] | ||
− | + | ==== Using Microsoft Access Query Design ==== | |
− | |||
− | |||
− | |||
− | + | This is a loose set of instructions on how to use your Microsoft Access database to make the GRNmap input workbook. | |
− | * | + | # Import a table into the database that is the list of regulatory transcription factors that need to be included in the network (get from the Data Analysis team). |
+ | # Go to the Query Design view and select the tables that you need for the query. (For example, the TF table you just imported and the production_rates table). | ||
+ | # Link the ID fields that are equivalent. | ||
+ | # Right-click on the line between the fields and set the join properties: | ||
+ | #* Include all the records from the TF table, and only those records from the other table that match. | ||
+ | # Select the fields from the tables that you want to be output in the query and drag them to the grids at the bottom of the window. | ||
+ | # Choose "Make Table" query so that your results will be stored in a table. | ||
+ | # Run the query. | ||
+ | # Export the table created as tab-delimited text file. Bring it into Excel. | ||
+ | # Repeat as needed to create all of the worksheets you need. | ||
{{Final Project Links}} | {{Final Project Links}} | ||
[[Category:Team Project]] | [[Category:Team Project]] |
Latest revision as of 08:52, 26 April 2024
Final Project Links | |||||||
---|---|---|---|---|---|---|---|
Overview | Deliverables | Guilds | Project Manager | Quality Assurance | Data Analysis | Coder/Designer | |
Team | Yeast Beasts |
The QA team member is the link between the Coder/Designer and the Data Analyst. He or she needs to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analyst. The QA also makes sure that the data analysis steps are being performed correctly and are being correctly documented.
Contents
- 1 Guild Members
- 2 Milestones
- 2.1 Milestone 1: Journal Club Presentation
- 2.2 Milestone 2: Working with Data Analysts to understand microarray dataset
- 2.3 Milestone 3: Making sure expression data has both Systematic Name and Standard Name ID's
- 2.4 Milestone 3: Design a database to store data needed to create a GRNmap input workbook
- 2.5 Milestone 4: Validation, Quality Assurance, and Documentation of the Database
Guild Members
- Hailey
- Natalija
Milestones
The milestones do not necessarily correspond to particular weeks; instead they are sets of tasks grouped together.
- QA's can have a shared individual journal entry. Both students will be given the same grade and are expected to contribute equally to the electronic lab notebook.
- Detailed notes should be taken throughout, consistent with reproducible research, and should contribute to the final deliverables.
Milestone 1: Journal Club Presentation
- The QA's will work with their teams to create and deliver a Journal Club presentation about their team's assigned paper.
Milestone 2: Working with Data Analysts to understand microarray dataset
As an overview, the QA team members are the link between the Coder/Designer and the Data Analyst. They need to know the details of the microarray dataset being analyzed so that the database being designed and populated by the Coder/Designer is correct and useful for the Data Analysts. The QA's will independently check that all of the data retrieved from SGD is present and accurately represented in the MS Access database. The QA's will also assist the Data Analysts, making sure that the data analysis steps are performed correctly and documented correctly.
- Download and examine the microarray dataset, comparing it to the samples and experiment described in your journal club article.
- Link to processed dataset from SGD.
- For your reference, this is the link to the dataset at the GEO Database: GSE26169. However, we will use the dataset processed by SGD.
- Along with the Data Analysts, make a "sample-data relationship table" that lists all of the samples (microarray chips), noting the treatment, time point, and replicate number.
- Report on these quality measures:
- Are all the samples described in the paper in the dataset?
- Are all the samples in the dataset described in the paper?
- Come up with consistent column headers that summarize this information.
- For example, the Dahlquist Lab microarray data used strain_LogFC_timepoint-replicate number, as in wt_LogFC_t15-1.
- Do not use any special characters except for "-" or "_" (e.g., no commas, etc.)
- Organize the data in a worksheet in an Excel workbook so that:
- ID (SGD systematic name) is in the first column
- Data columns are to the right, in increasing chronological order, using the column header pattern you created
- Treatments are grouped together
- Replicates are grouped together
- This information needs to be relayed to the Coder/Designers so that they can design an appropriate expression table.
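The header and sample checks in this milestone can be partly automated. The following is a minimal sketch, not a required tool: it assumes every data column follows the pattern treatment_LogFC_t<timepoint>-<replicate> (modeled on the example wt_LogFC_t15-1), and the function names are hypothetical. Adjust the pattern to whatever header convention your team actually settles on.

```python
import re

# Assumed header pattern, modeled on the example wt_LogFC_t15-1:
# <treatment>_LogFC_t<timepoint>-<replicate>. Only letters, digits,
# "-" and "_" can appear, so commas and other special characters fail.
HEADER_RE = re.compile(r"([A-Za-z0-9]+)_LogFC_t(\d+)-(\d+)")


def parse_header(header):
    """Return (treatment, timepoint, replicate), or None if the header does not match."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        return None
    treatment, timepoint, replicate = match.groups()
    return treatment, int(timepoint), int(replicate)


def check_headers(headers):
    """Return a list of problems found in the data column headers (empty if none)."""
    problems = []
    parsed = []
    for header in headers:
        fields = parse_header(header)
        if fields is None:
            problems.append(f"header does not match pattern: {header}")
        else:
            parsed.append(fields)

    # Each treatment should occupy one contiguous block of columns.
    order = []
    for treatment, _, _ in parsed:
        if not order or order[-1] != treatment:
            if treatment in order:
                problems.append(f"treatment not grouped together: {treatment}")
            order.append(treatment)

    # Within a treatment, columns should run in increasing time point,
    # with replicates of the same time point next to each other.
    for treatment in dict.fromkeys(order):
        keys = [(tp, rep) for tr, tp, rep in parsed if tr == treatment]
        if keys != sorted(keys):
            problems.append(f"columns out of order for treatment: {treatment}")
    return problems


def compare_samples(paper_samples, dataset_samples):
    """Return (samples in the paper but not the dataset, samples in the dataset but not the paper)."""
    paper, dataset = set(paper_samples), set(dataset_samples)
    return sorted(paper - dataset), sorted(dataset - paper)
```

Calling check_headers on the header row of the worksheet (everything after the ID column) returns an empty list when the layout rules above are met. compare_samples answers the two quality-measure questions in one call, provided the paper's samples have first been written in the same header convention by hand.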
Milestone 3: Making sure expression data has both Systematic Name and Standard Name ID's
- The design of the expression tables in the final database will need both an ID field (yeast systematic name) and Standard Name fields.
- You will need to check the IDs in the expression data and potentially populate one or both of these fields.
- One way to do this is to use the "ORF List <-> Gene List" tool at YEASTRACT.
- The Synergizer website may also be helpful.
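Once a systematic-to-standard name list has been obtained from one of these tools, filling in the Standard Name field and flagging IDs that need manual attention might look like the sketch below. The function names and the lookup dictionary are hypothetical, and the regular expression is an assumption that covers only nuclear ORF names of the form YAL001C (mitochondrial IDs and other feature types would need their own rule).

```python
import re

# Assumed pattern for nuclear ORF systematic names, e.g. YAL001C or YAL068W-A:
# Y + chromosome letter (A-P) + arm (L/R) + three digits + strand (W/C),
# with an optional one-letter suffix.
SYSTEMATIC_RE = re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?")


def is_systematic_name(name):
    """Return True if name looks like a nuclear ORF systematic name."""
    return SYSTEMATIC_RE.fullmatch(name) is not None


def fill_standard_names(ids, lookup):
    """Pair each ID with its standard name.

    lookup is a dict {systematic name: standard name}, for example built
    from the output of a name-conversion tool (hypothetical input here).
    Returns (rows, unresolved): rows is a list of (id, standard name or None),
    unresolved lists the IDs that are malformed or have no entry in lookup.
    """
    rows, unresolved = [], []
    for gene_id in ids:
        standard = lookup.get(gene_id) if is_systematic_name(gene_id) else None
        rows.append((gene_id, standard))
        if standard is None:
            unresolved.append(gene_id)
    return rows, unresolved
```

The unresolved list is the part worth reporting in the lab notebook: each entry is either an ID that is not a systematic name at all or a gene for which the conversion tool returned nothing.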
Milestone 3: Design a database to store data needed to create a GRNmap input workbook
- Coder/Designers will work with the QA's to create an MS Access database that will contain the data needed to create a GRNmap input workbook.
- Databases created by the teams will be kept in a "BIOL367_Spring2024" Box folder.
- Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download.
- This folder will serve as the version control mechanism for the Coder/Designer guild.
- The database will need to have the following tables:
- A gene table that contains all of the gene IDs for the entire yeast genome, obtained from YeastMine.
- An expression table to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts. You will consult with the Data Analysts and QA's to figure out the sample-data relationships and how that should be encoded as fields in the database.
- A degradation_rates table that contains degradation rates from Neymotin et al. (2014). This table will be provided.
- A production_rates table that contains initial guesses for the production rates for each gene. This table will be provided.
- A network table that contains the gene regulatory network data from the Harbison et al. (2004) paper.
- A metadata table that encodes information about the database itself, i.e., other tables in the database.
- A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data.
Milestone 4: Validation, Quality Assurance, and Documentation of the Database
- The QA will perform quality assurance to make sure that the database is correct and accurate.
- In particular, the QA's need to make sure that all of the rows of data were imported into the database for each table.
- The QA's will make sure that both the ID (SGD systematic name) and Standard Names are included in the expression table and are correct.
- QA's will collect feedback from the Data Analysts and communicate to the Coder/Designers any changes needed to the database.
- With the QA's, finalize the database schema diagram.
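One way to check that every row was imported is to export the ID column of each Access table and compare it against the ID column of the source file. The sketch below assumes both have already been read into plain Python lists; the function name and the keys of the report are hypothetical.

```python
from collections import Counter


def compare_import(source_ids, db_ids):
    """Compare the IDs in a source file with the IDs found in the database table.

    Returns a dict with the row counts on both sides, the IDs missing from
    the database, the IDs present only in the database, and the IDs that
    appear more often in the database than in the source.
    """
    source_counts = Counter(source_ids)
    db_counts = Counter(db_ids)
    return {
        "source_rows": len(source_ids),
        "db_rows": len(db_ids),
        "missing_from_db": sorted(set(source_counts) - set(db_counts)),
        "unexpected_in_db": sorted(set(db_counts) - set(source_counts)),
        "duplicated_in_db": sorted(
            gene_id
            for gene_id, n in db_counts.items()
            if gene_id in source_counts and n > source_counts[gene_id]
        ),
    }
```

Matching row counts alone are not sufficient evidence of a clean import, since one dropped row and one duplicated row cancel out; that is why the report lists missing and duplicated IDs separately.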
Using Microsoft Access Query Design
This is a loose set of instructions on how to use your Microsoft Access database to make the GRNmap input workbook.
- Import a table into the database that is the list of regulatory transcription factors that need to be included in the network (get from the Data Analysis team).
- Go to the Query Design view and select the tables that you need for the query (for example, the TF table you just imported and the production_rates table).
- Link the ID fields that are equivalent.
- Right-click on the line between the fields and set the join properties:
- Include all the records from the TF table, and only those records from the other table that match.
- Select the fields from the tables that you want to be output in the query and drag them to the grids at the bottom of the window.
- Choose "Make Table" query so that your results will be stored in a table.
- Run the query.
- Export the table created as a tab-delimited text file. Bring it into Excel.
- Repeat as needed to create all of the worksheets you need.
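The join described in these steps (all records from the TF table, plus only the matching records from the other table) is a left outer join. As an illustration of the logic only, the sketch below reproduces it with Python's built-in sqlite3 module as a stand-in for Access. The table name tf_list, the output table tf_production, the column names, and the rate values are all made up for the example; only production_rates is a table name taken from this page.

```python
import csv
import io
import sqlite3


def build_worksheet():
    """Run a left join in an in-memory SQLite database and return (rows, tab-delimited text)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tf_list (id TEXT PRIMARY KEY);
        CREATE TABLE production_rates (id TEXT PRIMARY KEY, production_rate REAL);
        """
    )
    # Made-up example data: one TF has a production rate, one does not.
    conn.executemany("INSERT INTO tf_list VALUES (?)", [("YAL001C",), ("YAL002W",)])
    conn.executemany(
        "INSERT INTO production_rates VALUES (?, ?)",
        [("YAL001C", 0.5), ("YAL003W", 0.25)],
    )

    # Equivalent of a "Make Table" query: keep every row of tf_list and
    # attach the matching production rate where one exists.
    conn.execute(
        """
        CREATE TABLE tf_production AS
        SELECT t.id AS id, p.production_rate AS production_rate
        FROM tf_list AS t
        LEFT JOIN production_rates AS p ON t.id = p.id
        """
    )
    rows = conn.execute(
        "SELECT id, production_rate FROM tf_production ORDER BY id"
    ).fetchall()
    conn.close()

    # Export as tab-delimited text; unmatched rates are written as empty cells.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "production_rate"])
    writer.writerows(rows)
    return rows, buffer.getvalue()


rows, tsv_text = build_worksheet()
```

A transcription factor with no match in the other table still appears in the output, with an empty value. Such empty cells are worth reporting to the Coder/Designers, because they point to either a missing row in the database or an ID mismatch between tables.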