Mavila9 Week 10
Links
Purpose
This investigation serves to build experience with database queries in Microsoft Access and with modeling in GRNmap and MATLAB, in preparation for the final project.
Methods
Create the GRNmap Input Workbook
This sample workbook was downloaded and used as the template for the network and microarray data worksheets.
production_rates sheet
- The production rates were provided in a Microsoft Access database, which was downloaded from here.
- A query was performed to get the list of production rates for each gene as a group.
- To perform the query, the following steps were taken:
- A list of the genes was imported to a new table in the database. The "External Data" tab was clicked and the Excel icon with the "up" arrow on it was selected.
- The "Browse" button was clicked and the Excel file containing the network that was used to upload to GRNsight was selected.
- The button next to "Import the source data into a new table in the current database" was selected and "OK" was clicked.
- In the next window, the "network" worksheet was selected, then "Next" was clicked.
- In the next window, "First Row Contains Column Headings" was checked and "Next" was clicked.
- In the next window, the left-most column was highlighted and the "Field Name" was changed to "id", then "Next" was clicked.
- In the next window, the button for "Choose my own primary key." was selected and the "id" field was chosen from the drop down next to it then "Next" was clicked.
- In the next field, "Import to Table: network" was input then "Finish" was clicked.
- In the next window the import step was not saved, then "Close" was clicked.
- A table called "network" appeared in the list of tables at the left of the window.
- "Query Design" was selected from the "Create" tab.
- In the window that appeared, the "network" table was chosen and "Add" was clicked. The "production_rates" table was selected and "Add" was clicked, then "Close" was clicked.
- The two tables appeared in the main part of the window. The word "id" in the "network" table was clicked, dragged to the "standard_name" field in the "production_rates" table, and released. A line appeared between those two words.
- The line between those words was right-clicked and "Join Properties" was selected from the menu that appeared. Option "2: Include ALL records from 'network' and only those records from 'production_rates' where the joined fields are equal." was selected and "OK" was clicked.
- The word "id" in the "network" table was clicked, dragged to the bottom of the screen to the first column next to the word "Field", and released.
- The "production_rate" field in the "production_rates" table was clicked, dragged to the bottom of the screen to the second column next to the word "Field", and released.
- The gray area near the two tables was right-clicked and, in the menu that appeared, "Query Type > Make Table Query..." was selected.
- In the window that appears, the table was named "production_rates_1" then "Current Database" was selected and "OK" was clicked.
- On the "Query Tools: Menus" tab the exclamation point icon was selected and "Yes" was selected.
- The "production_rates_1" table appeared in the list at the left. The table name was double-clicked to open.
- The data in this table was copied and pasted back into the Excel workbook.
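The query built above is a left outer join followed by a make-table step. The sketch below reproduces that logic with Python's built-in sqlite3 module rather than Access, purely as an illustration. The table and field names come from the steps above; the gene "XYZ1" and all rate values are made-up placeholders, not data from the database.

```python
import sqlite3

# Placeholder rows for illustration only; the real values live in the Access database.
network_ids = ["CIN5", "GLN3", "HAP4", "XYZ1"]  # "XYZ1" is a made-up gene with no rate
production_rates = [("CIN5", 0.5), ("GLN3", 0.25), ("HAP4", 1.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE production_rates (standard_name TEXT, production_rate REAL)"
)
conn.executemany("INSERT INTO network VALUES (?)", [(g,) for g in network_ids])
conn.executemany("INSERT INTO production_rates VALUES (?, ?)", production_rates)

# LEFT JOIN mirrors "Join Properties" option 2: keep ALL records from network
# and only the matching records from production_rates.
# CREATE TABLE ... AS plays the role of the Access make-table query.
conn.execute(
    """
    CREATE TABLE production_rates_1 AS
    SELECT network.id AS id,
           production_rates.production_rate AS production_rate
    FROM network
    LEFT JOIN production_rates
           ON network.id = production_rates.standard_name
    """
)

rows = conn.execute(
    "SELECT id, production_rate FROM production_rates_1 ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

A gene that is in the network but missing from production_rates still appears in the result, with an empty (None) rate, which makes gaps easy to spot before the data are pasted into the workbook.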
degradation_rates sheet
- The same steps as for the "production_rates" sheet were followed, but using the "degradation_rates" table instead.
Expression Data Sheets for Individual Yeast Strains
- Each sheet was given a unique name that follows the convention "STRAIN_log2_expression", where the word "STRAIN" is replaced by the strain designation, which will appear in the optimization_diagnostics sheet.
- The transcription factors GLN3, HAP4, and CIN5 were included in the network. Thus, the expression data from the dGLN3, dHAP4, and dCIN5 deletion strains were used as well, and the worksheets were named "dgln3_log2_expression", "dhap4_log2_expression", and "dcin5_log2_expression".
- The sheet had the following columns in this order:
- "id": list of all genes. The genes were listed in the same order in all the sheets in the Excel workbook.
- The next series of columns contained the expression data for each gene at a given timepoint given as log2 ratios (log2 fold changes). The column headers included the time at which the data were collected, without any units.
- Each strain had data for the same timepoints.
- The data for the 15, 30, and 60 minute timepoints were included, but not the 90 or 120 minute timepoints.
- The data used was contained in the Expression-and-Degradation-rate-database_2019.accdb file that was used to obtain the production and degradation rates.
- Copying and pasting all of these data by hand is tedious, so a query was executed in Microsoft Access. The steps listed for the "production_rates" sheet were followed for each strain's expression data. After the data were imported into Excel, the column headers were changed to "15", "15", etc., as described above.
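The naming and header conventions described above can be sketched in a few lines of Python. This is only an illustration, not part of the GRNmap workflow: the helper names and the number of replicate columns per timepoint are assumptions, and no real expression values are used.

```python
# Timepoints (in minutes) kept for the model; 90 and 120 are dropped.
KEPT_TIMEPOINTS = (15, 30, 60)


def sheet_name(strain):
    """Build a worksheet name following the STRAIN_log2_expression convention."""
    return f"{strain}_log2_expression"


def build_headers(column_timepoints):
    """Return the header row: "id" followed by one unit-free time label per
    data column, keeping only columns whose timepoint is in KEPT_TIMEPOINTS."""
    kept = [t for t in column_timepoints if t in KEPT_TIMEPOINTS]
    return ["id"] + [str(t) for t in kept]


# Hypothetical layout: two replicate columns at 15 minutes, one at each later time.
raw_timepoints = [15, 15, 30, 60, 90, 120]

names = [sheet_name(s) for s in ("dgln3", "dhap4", "dcin5")]
headers = build_headers(raw_timepoints)

print(names)
print(headers)
```

Replicate columns share the same header, so a timepoint with several replicates simply repeats its label.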
network sheet
- The network derived from the YEASTRACT database for the Week 9 assignment was copied and pasted into this sheet directly.
network_weights sheet
- Since these weights are initial guesses that will be optimized by GRNmap, the contents of this sheet were identical to those of the "network" sheet.
optimization_parameters sheet
- The optimization_parameters sheet had two columns (from left to right) entitled, "optimization_parameter" and "value".
- This worksheet was copied from the sample workbook provided. The only row that was modified was row 15, "Strain". Only the strain designations that had a corresponding STRAIN_log2_expression sheet were included. If the dgln3, dhap4, or dcin5 expression sheets were not present, those strains were deleted from this row.
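The rule for the "Strain" row can be checked mechanically: a strain belongs in the row only if a matching STRAIN_log2_expression sheet exists. The following Python sketch shows one way to derive the list from the worksheet names. The function name is hypothetical, and the "wt" sheet in the example is an assumption that is not stated on this page.

```python
SUFFIX = "_log2_expression"


def strains_with_expression_sheets(sheet_names):
    """Return the strain designations that have a STRAIN_log2_expression sheet,
    in the order the sheets appear in the workbook."""
    return [name[: -len(SUFFIX)] for name in sheet_names if name.endswith(SUFFIX)]


# Example workbook layout; "wt_log2_expression" is assumed for illustration,
# and the dcin5 sheet is deliberately left out.
workbook_sheets = [
    "production_rates",
    "degradation_rates",
    "wt_log2_expression",
    "dgln3_log2_expression",
    "dhap4_log2_expression",
    "network",
    "network_weights",
    "optimization_parameters",
    "threshold_b",
]

strain_row = strains_with_expression_sheets(workbook_sheets)
print(strain_row)
```

Because the dcin5 sheet is absent in this example, "dcin5" does not appear in the resulting row.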
threshold_b sheet
- These were the initial guesses for the estimation of the threshold_b parameters.
- There were two columns.
- The left-most column contained the header "id" and listed the standard names for the genes in the model in the same order as in the other sheets.
- The second column had the header "threshold_b" and contained the initial guesses, all of which were 0.
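The layout of this sheet is simple enough to generate rather than type. The sketch below builds the two-column table as a list of rows; the function name is hypothetical, and the gene list is only an example that would need to match the order used in the other sheets.

```python
def threshold_b_rows(gene_ids):
    """Build the threshold_b sheet as rows: a header row, then one row per
    gene with an initial guess of 0, preserving the given gene order."""
    rows = [("id", "threshold_b")]
    rows.extend((gene, 0) for gene in gene_ids)
    return rows


# Example gene order; the real list is the "id" column of the other sheets.
genes = ["CIN5", "GLN3", "HAP4"]
table = threshold_b_rows(genes)

for row in table:
    print(row)
```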
Dynamical Systems Modeling of your Gene Regulatory Network
- To run GRNmap from code, MATLAB R2014b must be installed on the computer.
- The GRNmap v1.10 code was downloaded from the GRNmap Downloads page.
- The file was unzipped. (Right-click, 7-zip > Extract here)
- MATLAB R2014b was launched.
- GRNmodel.m, located in the "matlab" directory of the unzipped GRNmap-1.10 folder, was opened.
- The Run button (green "play" arrow) was clicked.
- The workbook was selected to input.
- The optimization diagnostics graphic that shows the progress of the estimation appeared.
- When the run was over, expression plots were displayed.
- Output .xlsx and .mat files were written to the same folder as the input workbook, along with .jpg files containing the optimization diagnostic and the individual expression plots. All of these files were saved.
Data
media:GRNmap output mavila9.zip
Conclusion
The GRNmap run in MATLAB showed how the regulators in the network influence the transcription of the genes analyzed, including both repression (inhibition) and activation (promotion).