Ymesfin Week 10

From LMU BioDB 2019
Revision as of 19:17, 1 November 2019 by Ymesfin (Methods: added methods)

Purpose

The purpose of this assignment was to keep a detailed electronic lab notebook while statistically analyzing a DNA microarray dataset, to demonstrate our understanding of p-value cut-offs, and to visualize the network of regulatory relationships among transcription factors in Saccharomyces cerevisiae. This week, students created a GRNmap input workbook to further understand the relationships between genes in Saccharomyces cerevisiae and to become familiar with building database queries and with modeling.

Methods

Creating the GRNmap Input Workbook

  1. Opened a new Excel workbook.

production_rates sheet

  1. Named the first sheet "production_rates" and labeled the first and second columns "id" and "production_rate", respectively.
  2. Downloaded the Production Rates Database.
  3. Imported the list of genes into a new table in the database: clicked the "External Data" tab and selected the Excel icon with the "up" arrow on it.
  4. Clicked the "Browse" button and selected the Excel file containing the network that was uploaded to GRNsight.
  5. Made sure the button next to "Import the source data into a new table in the current database" was selected and clicked "OK".
  6. In the next window, selected the "network" worksheet if it was not already selected. Clicked "Next".
  7. In the next window, made sure the "First Row Contains Column Headings" box was checked. Clicked "Next".
  8. In the next window, changed the "Field Name" to "id". Clicked "Next".
  9. In the next window, selected the option "Choose my own primary key." and chose the "id" field from the drop-down menu next to it. Clicked "Next".
  10. In the next window, made sure it said "Import to Table: network". Clicked "Finish".
  11. Clicked "Close".
  12. Went to the "Create" tab. Clicked on the icon for "Query Design".
  13. In the window that appeared, clicked on the "network" table and clicked "Add". Clicked on the "production_rates" table and clicked "Add". Clicked "Close".
  14. Clicked on the word "id" in the "network" table, dragged it to the "standard_name" field in the "production_rates" table, and released.
  15. Right-clicked on the line between those words and selected "Join Properties" from the menu that appeared. Selected Option "2: Include ALL records from 'network' and only those records from 'production_rates' where the joined fields are equal." Clicked "OK".
  16. Clicked on the "id" word in the "network" table and dragged it to the bottom of the screen to the first column next to the word "Field" and released.
  17. Clicked on the "production_rate" field in the "production_rates" table and dragged it to the bottom of the screen to the second column next to the word "Field" and released.
  18. Right-clicked anywhere in the gray area near the two tables. In the menu that appeared, selected "Query Type > Make Table Query...".
  19. In the window that appeared, named the table "production_rates_1", made sure that "Current Database" was selected, and clicked "OK".
  20. Went to the "Query Tools: Menus" tab and clicked the exclamation point icon. A window appeared stating how many rows would be pasted into the new table. Clicked "Yes".
  21. The new "production_rates_1" table appeared in the list at the left. Double-clicked the table name to open it.
  22. Copied the data in this table and pasted it back into the Excel workbook.
    • Any missing production rates were replaced with the value 0.1980.
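The query assembled in steps 12 to 21 is, in effect, a left outer join followed by a make-table step. The sketch below reproduces that logic with Python's built-in sqlite3 module instead of the desktop database program used above (the menu names suggest Microsoft Access), assuming tables named "network" (column id) and "production_rates" (columns standard_name and production_rate) as described in the steps. The function name build_production_rates_1 and the gene rates in the demo are hypothetical. Unlike the notebook, which filled gaps after pasting into Excel, this sketch fills missing rates with 0.1980 when reading the new table back, using COALESCE.

```python
import sqlite3

# Value this notebook substituted for genes with no production rate.
DEFAULT_PRODUCTION_RATE = 0.1980


def build_production_rates_1(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Recreate the make-table query and return (id, production_rate) rows.

    Hypothetical helper: assumes tables network(id) and
    production_rates(standard_name, production_rate) already exist.
    """
    conn.execute("DROP TABLE IF EXISTS production_rates_1")
    # LEFT JOIN matches join option 2: keep ALL rows of network and only the
    # production_rates rows whose standard_name equals network.id.
    conn.execute(
        """
        CREATE TABLE production_rates_1 AS
        SELECT network.id AS id,
               production_rates.production_rate AS production_rate
        FROM network
        LEFT JOIN production_rates
               ON network.id = production_rates.standard_name
        ORDER BY network.rowid
        """
    )
    # Genes with no match come back as NULL; COALESCE swaps in the default.
    return conn.execute(
        "SELECT id, COALESCE(production_rate, ?) FROM production_rates_1 ORDER BY rowid",
        (DEFAULT_PRODUCTION_RATE,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE network (id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE production_rates (standard_name TEXT, production_rate REAL)")
    conn.executemany("INSERT INTO network VALUES (?)", [("CIN5",), ("GLN3",), ("HAP4",)])
    # Made-up rates for illustration; HAP4 deliberately has no row.
    conn.executemany(
        "INSERT INTO production_rates VALUES (?, ?)", [("CIN5", 0.25), ("GLN3", 0.4)]
    )
    for gene, rate in build_production_rates_1(conn):
        print(gene, rate)
```

In the demo, HAP4 has no matching row in production_rates, so it keeps its place in network order and receives the default rate.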

degradation_rates sheet

  1. Added a new sheet called "degradation_rates" and labeled the first two columns (from left to right) "id" and "degradation_rate".
  2. Executed a query analogous to the one used for the "production_rates" sheet, substituting the "degradation_rates" table for the "production_rates" table.
    • Substituted the value 0.0990 for any missing degradation rates.
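Because the degradation query is the same left join with a different table, the step worth illustrating here is the gap filling. A minimal sketch in plain Python, assuming the joined results are available as a mapping from gene id to rate (None where the join found no match); degradation_rate_rows is a hypothetical name.

```python
from typing import Mapping, Optional

# Value this notebook substituted for genes with no degradation rate.
DEFAULT_DEGRADATION_RATE = 0.0990


def degradation_rate_rows(
    gene_ids: list[str],
    joined_rates: Mapping[str, Optional[float]],
    default: float = DEFAULT_DEGRADATION_RATE,
) -> list[tuple[str, float]]:
    """Return (id, degradation_rate) rows in network order, filling gaps.

    Hypothetical helper: a gene that is absent from joined_rates, or present
    with None (an empty cell after the join), receives the default.
    """
    rows = []
    for gene in gene_ids:
        rate = joined_rates.get(gene)
        # Test against None, not truthiness, so a genuine rate of 0.0 is kept.
        rows.append((gene, default if rate is None else rate))
    return rows
```

For example, `degradation_rate_rows(["CIN5", "GLN3", "HAP4"], {"CIN5": 0.05, "GLN3": None})` returns `[("CIN5", 0.05), ("GLN3", 0.099), ("HAP4", 0.099)]`, where the rate for CIN5 is made up for illustration.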

Expression Data Sheets for Individual Yeast Strains

  1. Added four sheets, one each for the wt, dGLN3, dHAP4, and dCIN5 strains.
    • Each sheet was given a unique name following the convention "STRAIN_log2_expression", where "STRAIN" is replaced by the strain designation (for example, "dGLN3_log2_expression").
  2. The first column in each sheet was labeled "id".
  3. The next series of columns were labeled with the timepoints at which the data were collected, without units. For example, the 15 minute timepoint had the column header "15". Replicates for the same timepoint were placed in adjacent columns with identical headers; for example, three replicates of the 15 minute timepoint had "15", "15", "15" as their column headers.
    • If data were provided for multiple strains, each strain had data for the same timepoints, although the number of replicates could vary.
    • Only the 15, 30, and 60 minute timepoints were included; the 90 and 120 minute timepoints were omitted.
    • The expression data came from the same database used to obtain the production and degradation rates.
  4. For each strain, a query analogous to the one used for the "production_rates" sheet was executed to import that strain's expression data into the corresponding Excel sheet.
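The sheet-naming and column-header conventions above can be expressed as two small functions. This is a sketch under assumptions: the replicate counts in the usage example are hypothetical, and the functions only build names and header labels; they do not read the database.

```python
# Timepoints (minutes) kept in this notebook; 90 and 120 are omitted.
KEPT_TIMEPOINTS = (15, 30, 60)


def sheet_name(strain: str) -> str:
    """Apply the STRAIN_log2_expression naming convention."""
    return f"{strain}_log2_expression"


def header_row(replicates_per_timepoint: dict[int, int]) -> list[str]:
    """Build the header: "id", then one unitless label per replicate column.

    Replicates of a timepoint sit in adjacent columns with identical labels.
    Timepoints outside KEPT_TIMEPOINTS are ignored; a kept timepoint with no
    entry raises KeyError, since every strain should share the same timepoints.
    """
    header = ["id"]
    for t in KEPT_TIMEPOINTS:
        header.extend([str(t)] * replicates_per_timepoint[t])
    return header
```

For example, `sheet_name("dGLN3")` gives `"dGLN3_log2_expression"`, and `header_row({15: 3, 30: 3, 60: 4, 90: 3, 120: 3})` gives `["id", "15", "15", "15", "30", "30", "30", "60", "60", "60", "60"]`: the 90 and 120 minute columns are dropped, and the replicate counts may differ between timepoints and strains.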

network sheet

  1. Added a new sheet labeled "network".
  2. Copied the network derived from the YEASTRACT database for the Week 9 assignment and pasted it directly into this sheet.

network_weights sheet

  1. Added a new sheet labeled "network_weights".
  2. Copied the contents of the "network" sheet into this sheet.

optimization_parameters sheet

  1. A new sheet was added and labeled "optimization_parameters"
  2. The first two columns (from left to right) were labeled "optimization_parameter" and "value".
  3. This worksheet was copied from the sample workbook.
    • Row 15, "Strain", was modified to list the strain designations for which corresponding STRAIN_log2_expression sheets had been created.
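Since the "Strain" value should match the expression sheets that exist, it can be derived from the sheet names. A minimal sketch, assuming the sheet names follow the STRAIN_log2_expression convention exactly; strain_designations is a hypothetical helper, and how the resulting list is written into the "value" cell should follow the sample workbook, which this sketch does not model.

```python
SUFFIX = "_log2_expression"


def strain_designations(sheet_names: list[str]) -> list[str]:
    """Return strain designations, in sheet order, from *_log2_expression sheets."""
    return [
        name[: -len(SUFFIX)]
        for name in sheet_names
        # Skip other sheets, and a sheet named only "_log2_expression".
        if name.endswith(SUFFIX) and len(name) > len(SUFFIX)
    ]
```

Applied to this workbook's sheet names, the function returns `["wt", "dGLN3", "dHAP4", "dCIN5"]`, ignoring sheets such as "network" and "production_rates".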

threshold_b sheet

  1. Added a new sheet labeled "threshold_b".
  2. Labeled the first column "id" and listed the standard names for the genes in the model in the same order as in the other sheets.
  3. The second column was labeled "threshold_b" and contained an initial guess of 0 for every gene.
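Step 2 requires the genes to be listed in the same order as in the other sheets, which is easy to get wrong when pasting by hand. A sketch, assuming each sheet's id column has already been read into a list; threshold_b_rows is a hypothetical helper that refuses to build the sheet if any two gene orders differ.

```python
def threshold_b_rows(id_columns: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Build (id, threshold_b) rows with an initial guess of 0 for every gene.

    id_columns maps a sheet name to that sheet's "id" column, top to bottom.
    Raises ValueError if the sheets disagree on gene order.
    """
    if not id_columns:
        raise ValueError("no id columns supplied")
    sheets = list(id_columns.items())
    reference_sheet, reference = sheets[0]
    for sheet, ids in sheets[1:]:
        if ids != reference:
            raise ValueError(
                f"sheet {sheet!r} lists genes in a different order than {reference_sheet!r}"
            )
    return [(gene, 0.0) for gene in reference]
```

Checking order rather than just membership is deliberate: two sheets with the same genes in a different order would pass a set comparison but misalign the rows.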

Data/Files

GRNmap Input Sheet

Conclusion

Acknowledgements

References