Coder/Designer

From LMU BioDB 2024
Latest revision as of 21:03, 1 May 2024

Final Project Links
Overview Deliverables Guilds Project Manager Quality Assurance Data Analysis Coder/Designer
Team Yeast Beasts

The Coder/Designers are responsible for creating the Microsoft Access database that the Data Analysts will use to prepare a GRNmap input workbook for the microarray dataset they are analyzing. The Coder/Designers are also the resident experts on the technology being used: assorted software, file management, version control, and troubleshooting. They coordinate with Dr. Dahlquist and fellow Coder/Designers in developing the Access database and storing it on Box.

Guild Members

  • Dean
  • Andrew

Milestones

The milestones do not necessarily correspond to particular weeks; instead they are sets of tasks grouped together. However, Milestone 3 is a hard prerequisite for proceeding to Milestone 4, so ideally the Coder/Designer guild should finish these milestones (they require some coordination; see below) as soon as possible.

  • Coder/Designers can have a shared individual journal entry. Both students will be given the same grade and are expected to contribute equally to the electronic lab notebook.
  • Detailed notes should be taken throughout, consistent with reproducible research practices, so that they can contribute to the final deliverables.


Milestone 1: Journal Club Presentation

  • The Coder/Designers will work with one of the QA's to create and deliver a Journal Club presentation about their assigned paper.

Milestone 2: Working Environment Setup

Coder/Designer work will require the following software/accounts. The Seaver 120 lab computers are already set up for this; this list is provided for Coders/Designers who need to work on a different computer or outside of the lab.

  • Microsoft Access
  • Box account (provided by LMU)
    • Databases created by the teams will be kept in a "BIOL367_Spring2024" Box folder.
    • Coder/Designer guild members have rights as editor to this folder; all others in the class can only view/download.
    • This folder will serve as the version control mechanism for the Coder/Designer guild.

Milestone 3: Design a database to store data needed to create a GRNmap input workbook

  • Coder/Designers will work with the QA's to create an MS Access database that will contain the data needed to create a GRNmap input workbook. It will need to have the following tables:
    • A gene table that contains all of the gene IDs for the entire yeast genome, obtained from YeastMine.
    • An expression table to store the yeast time-course microarray data for the dataset being analyzed by the Data Analysts. You will consult with the Data Analysts and QA's to figure out the sample-data relationships and how that should be encoded as fields in the database.
    • A degradation_rates table that contains degradation rates from Neymotin et al. (2014). This table is provided at this link.
    • A production_rates table that contains initial guesses for the production rates for each gene. This table is provided at this link.
    • A network table that contains the gene regulatory network data from the Harbison et al. (2004) paper. Here is the link to the data.
    • A metadata table that encodes information about the database itself, i.e., other tables in the database.
      • A major part of the design work will be to figure out what information needs to be in the metadata table so that queries can be easily and uniquely performed on the data.

Milestone 4: Build the database

  • Once the design work has been completed, you need to actually import the data into the database.

Milestone 5: Validation, Quality Assurance, and Documentation of the Database

  • The QA will perform quality assurance to make sure that the database is correct and accurate.
    • In particular, the QA's need to make sure that all of the rows of data were imported into the database for each table.
    • The QA's will make sure that both the ID (SGD systematic name) and Standard Names are included in the expression table and are correct.
  • QA's will communicate to the Coder/Designers any changes needed to the database.
  • With the QA's, finalize the database schema diagram.
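
The row-count check described above can be sketched in code. This is only an illustration, not the course's procedure. It uses Python's built-in sqlite3 module as a stand-in for Access, and the table contents, the expected counts, and the function name count_mismatches are all made up for the example.

```python
import sqlite3


def count_mismatches(conn, expected_counts):
    """Return {table: (expected, actual)} for tables whose row counts differ."""
    mismatches = {}
    for table, expected in expected_counts.items():
        # Table names come from our own hard-coded dictionary, not user input.
        actual = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


# Hypothetical miniature database (SQLite standing in for Access).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (id TEXT, standard_name TEXT);
    CREATE TABLE degradation_rates (id TEXT, degradation_rate REAL);
    INSERT INTO gene VALUES ('YLR131C', 'ACE2'), ('YOR028C', 'CIN5');
    INSERT INTO degradation_rates VALUES ('YLR131C', 0.1);
""")

# Expected counts would come from counting the data rows in each source file.
expected = {"gene": 2, "degradation_rates": 2}
result = count_mismatches(conn, expected)
print(result)  # any table listed here is missing (or has extra) rows
```

In Access itself, the same check can be done by comparing the record count shown at the bottom of each table's datasheet with the number of data rows in the source file.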

Milestone 6: Document the schema and design queries to create the GRNmap input workbook

  • Assist the Data Analysts with the queries needed to create a GRNmap input workbook
  • This is a sample of a GRNmap input workbook: 15-genes_28-edges_sample-GRNmap_Sigmoid_estimation.xlsx
    • Note that you will need to match the column ids and worksheet names exactly, including case. The worksheets need to be in the exact order of the sample workbook as well.
  • This is a description of what needs to go into each worksheet:
    • production_rates: two columns
      • id: standard names of the genes encoding the regulatory transcription factors in the network, in alphabetical order
      • production_rate: mRNA production rates for each of the genes
    • degradation_rates: two columns
      • id: standard names of the genes encoding the regulatory transcription factors in the network, in alphabetical order
      • degradation_rate: mRNA degradation rates for each of the genes
    • wt_log2_expression
      • id: standard names of the genes encoding the regulatory transcription factors in the network, in alphabetical order
      • Log2 fold changes for each gene for all the timepoints and replicates to be included in the analysis. The timepoints should be arranged in chronological order with the replicates grouped. The column headers are the time in minutes (just the number). See the sample for how it should be arranged.
    • network
      • Cell A1 needs to contain the text "cols regulators/rows targets".
      • This is an adjacency matrix encoding the network connections. All genes to be included in the network should appear in column 1 and row 1 in alphabetical order.
      • A "0" indicates no connection between the gene in the column and the gene in the row.
      • A "1" indicates that the transcription factor in the column regulates the corresponding transcription factor in the row.
    • network_weights
      • This sheet should be identical to the network sheet. It represents the initial guesses for the weight values for the modeling.
    • optimization_parameters: this worksheet contains information needed for Matlab to run the model. This sheet should be copied from the sample and only two lines need to be modified:
      • expression_timepoints: this row should list the expression timepoints corresponding to the timepoints in the wt_log2_expression worksheet.
      • simulation_timepoints: this row is a list of timepoints that the model will simulate from the estimated parameters. It should begin and end with the same timepoints as the expression data, but the intervals between timepoints should be shorter than those in the experimental expression data.
    • threshold_b: two columns
      • id: standard names of the genes encoding the regulatory transcription factors in the network, in alphabetical order
      • threshold_b: all 0. This represents the initial guesses for the threshold b parameter in the network.
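
As an illustration of the network worksheet layout described above, here is a minimal Python sketch that builds the adjacency matrix from a list of (regulator, target) pairs. The function name build_network_sheet and the three-gene example edges are made up for this example; real edges would come from the network table in your database.

```python
def build_network_sheet(genes, edges):
    """Return the rows of a network worksheet as a list of lists.

    genes: iterable of gene standard names.
    edges: iterable of (regulator, target) pairs.
    """
    ordered = sorted(genes, key=str.upper)  # alphabetical, ignoring case
    edge_set = set(edges)
    # Cell A1 holds the orientation label; the rest of row 1 lists regulators.
    rows = [["cols regulators/rows targets"] + ordered]
    for target in ordered:
        # 1 if the column gene (regulator) regulates the row gene (target).
        cells = [1 if (regulator, target) in edge_set else 0
                 for regulator in ordered]
        rows.append([target] + cells)
    return rows


# Hypothetical example: ACE2 regulates CIN5, and CIN5 regulates itself.
sheet = build_network_sheet(
    ["CIN5", "ACE2", "HAP4"],
    [("ACE2", "CIN5"), ("CIN5", "CIN5")],
)
for row in sheet:
    # Tab-separated output can be pasted directly into Excel.
    print("\t".join(str(cell) for cell in row))
```

Because network_weights is meant to be identical to network, the same rows can be reused for both sheets.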

Using Microsoft Access Query Design

This is a loose set of instructions on how to use your Microsoft Access database to make the GRNmap input workbook.

  1. Import a table into the database that is the list of regulatory transcription factors that need to be included in the network (get from the Data Analysis team).
  2. Go to the Query Design view and select the tables that you need for the query (for example, the TF table you just imported and the production_rates table).
  3. Link the ID fields that are equivalent.
  4. Right-click on the line between the fields and set the join properties:
    • Include all the records from the TF table, and only those records from the other table that match.
  5. Select the fields from the tables that you want to be output in the query and drag them to the grids at the bottom of the window.
  6. Choose "Make Table" query so that your results will be stored in a table.
  7. Run the query.
  8. Export the table that was created as a tab-delimited text file, then open it in Excel.
  9. Repeat as needed to create all of the worksheets you need.
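
The join set up in steps 3 to 6 is a left outer join whose result is saved as a new table. The sketch below shows the same logic in Python, using the built-in sqlite3 module as a stand-in for Access. The table name tf_list, the output table name production_rates_sheet, and all of the values are made up for this example. Note also that Access SQL writes a make-table query as SELECT ... INTO, whereas SQLite uses CREATE TABLE ... AS SELECT.

```python
import csv
import io
import sqlite3

# Hypothetical miniature database (SQLite standing in for Access).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tf_list (id TEXT PRIMARY KEY);
    CREATE TABLE production_rates (id TEXT PRIMARY KEY, production_rate REAL);
    INSERT INTO tf_list VALUES ('HAP4'), ('ACE2'), ('CIN5');
    INSERT INTO production_rates
        VALUES ('ACE2', 0.5), ('CIN5', 1.0), ('ZAP1', 0.2);
""")

# LEFT JOIN keeps every record from tf_list and only the matching records
# from production_rates; CREATE TABLE ... AS stores the result in a table.
conn.execute("""
    CREATE TABLE production_rates_sheet AS
    SELECT tf_list.id AS id,
           production_rates.production_rate AS production_rate
    FROM tf_list
    LEFT JOIN production_rates ON tf_list.id = production_rates.id
""")

# Read the new table back in alphabetical order of gene name.
rows = conn.execute(
    "SELECT id, production_rate FROM production_rates_sheet ORDER BY id"
).fetchall()

# Write the result as tab-delimited text (a missing value becomes blank).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["id", "production_rate"])
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

A transcription factor with no match in the other table (HAP4 in this made-up data) still appears in the output with a blank value, which makes missing data easy to spot before the worksheet goes into the GRNmap workbook.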