Difference between revisions of "Dmadere Week 10"
Revision as of 22:34, 6 November 2019
Purpose
Methods/Results
Created GRNmap Input Workbook
production_rates sheet
- Created "production_rate" column in Excel workbook.
- Used Microsoft Access database to acquire all data used in analysis.
- Used these steps to perform query:
- Imported list of genes to a new table in the database. Clicked on the "External Data" tab and selected the Excel icon with the "up" arrow on it.
- Clicked the "Browse" button and selected Excel file containing network that was used to upload to GRNsight.
- Made sure the button next to "Import the source data into a new table in the current database" was selected and clicked "OK".
- In the next window, selected the "network" worksheet if it had not already been selected automatically. Clicked "Next".
- In the next window, made sure the "First Row Contains Column Headings" was checked. Clicked "Next".
- In the next window, the left-most column was highlighted. Changed the "Field Name" to "id" if it didn't say that already. Clicked "Next".
- In the next window, selected the button for "Choose my own primary key." and chose the "id" field from the drop down next to it. Clicked "Next".
- In the next field, made sure it said "Import to Table: network". Clicked Finish.
- In the next window did not save import steps, clicked "Close".
- A table called "network" appeared in the list of tables at the left of the window.
- Went to the "Create" tab. Clicked on the icon for "Query Design".
- In the window that appeared, clicked on the "network" table and clicked "Add". Clicked on the "production_rates" table and clicked "Add". Clicked "Close".
- The two tables appeared in the main part of the window. Told Access which fields in the two tables correspond to each other by clicking on the word "id" in the "network" table, dragging the mouse to the "standard_name" field in the "production_rates" table, and releasing. A line appeared between those two words.
- Right-clicked on the line between those words and selected "Join Properties" from the menu that appeared. Selected Option "2: Include ALL records from 'network' and only those records from 'production_rates' where the joined fields are equal." Clicked "OK".
- Clicked on the "id" word in the "network" table and dragged it to the bottom of the screen to the first column next to the word "Field" and released.
- Clicked on the "production_rate" field in the "production_rates" table and dragged it to the bottom of the screen to the second column next to the word "Field" and released.
- Right-clicked anywhere in the gray area near the two tables. In the menu that appeared, selected "Query Type > Make Table Query...".
- In the window that appeared, named the table "production_rates_1" because two tables in the database could not have the same name. Made sure that "Current Database" was selected and clicked "OK".
- Went to the "Query Tools: Design" tab. Clicked on the exclamation point icon. A window appeared that said how many rows would be pasted into a new table. Clicked "Yes".
- New "production_rates_1" table appeared in the list at the left. Double-clicked on that table name to open it.
- Copied the data in this table and pasted it back into the Excel workbook, using "Paste Special > Paste values" so that the Access formatting was not carried along. Alternatively, the table can be exported to Excel by going to the "External Data" tab, selecting the Excel icon with the arrow pointing to the right, selecting the workbook to export the table to, making sure that "Preserve Access formatting" is not checked, and clicking "OK", then "Close".
- If there were missing values, substituted the value 0.1980 for the missing production rates.
- Genes were listed in same order in all sheets of Excel workbook.
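The Access query above (join option 2, then substituting a default for missing values) behaves like a SQL LEFT JOIN. The following is a minimal sketch of that idea using Python's built-in sqlite3 module instead of Access. The function name, the extra "pos" column used to preserve gene order, and the example rates in the usage note are assumptions for illustration, not values from the database.

```python
import sqlite3

# Value substituted for missing production rates, as stated in the notes above.
DEFAULT_PRODUCTION_RATE = 0.1980


def join_production_rates(network_ids, production_rates):
    """Left-join network gene ids to production rates, keeping network order.

    network_ids: list of gene ids in the order used in the workbook.
    production_rates: dict mapping standard_name -> production_rate
                      (hypothetical stand-in for the Access table).
    """
    conn = sqlite3.connect(":memory:")
    # "pos" is an assumed helper column that remembers the workbook order.
    conn.execute("CREATE TABLE network (pos INTEGER, id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE production_rates (standard_name TEXT, production_rate REAL)"
    )
    conn.executemany(
        "INSERT INTO network VALUES (?, ?)", list(enumerate(network_ids))
    )
    conn.executemany(
        "INSERT INTO production_rates VALUES (?, ?)", list(production_rates.items())
    )
    # LEFT JOIN keeps ALL rows of "network"; COALESCE fills in missing rates.
    rows = conn.execute(
        """
        SELECT n.id, COALESCE(p.production_rate, ?)
        FROM network AS n
        LEFT JOIN production_rates AS p ON n.id = p.standard_name
        ORDER BY n.pos
        """,
        (DEFAULT_PRODUCTION_RATE,),
    ).fetchall()
    conn.close()
    return rows
```

For example, calling join_production_rates(["CIN5", "GLN3", "HAP4"], {"GLN3": 0.25, "HAP4": 0.5}) with these made-up rates returns one row per network gene, with CIN5 receiving the default because it has no matching record.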
degradation_rates sheet
- This sheet contains degradation rates for all genes in the network, which are provided by the user.
- Currently, the Dahlquist Lab is using data based on published mRNA half-life data from Neymotin et al. (2014).
- We converted the half-life data values to the degradation rates by taking the natural log of 2 and dividing by the half-life.
- The sheet contained two columns (from left to right) entitled "id", and "degradation_rate".
- The id was an identifier that will be used to identify a particular gene.
- The "degradation_rate" column contained the absolute value of the degradation rate for the corresponding gene as described above, rounded to four decimal places.
- To obtain these values, used the same Microsoft Access database that was used to obtain the production rates in the first worksheet. Either copied and pasted the values one by one or followed the instructions above to execute a query, substituting the "degradation_rates" table in the query. The "network" table did not need to be re-imported; only the query needed to be created and executed.
- Genes were listed in the same order in all the sheets in the Excel workbook.
- Substituted the value 0.0990 for the missing degradation rates.
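For first-order decay, the rate constant equals ln(2) divided by the half-life. The sketch below shows that conversion, rounded to four decimal places, with the 0.0990 substitute used when no half-life is available. The function name and the use of None for a missing value are assumptions, and the half-life is assumed to be in minutes.

```python
import math

# Value substituted for missing degradation rates, as stated in the notes above.
DEFAULT_DEGRADATION_RATE = 0.0990


def degradation_rate(half_life_min):
    """Convert an mRNA half-life (assumed minutes) to a degradation rate.

    Returns the default when the half-life is missing (None).
    """
    if half_life_min is None:
        return DEFAULT_DEGRADATION_RATE
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    # First-order decay: rate = ln(2) / half-life, kept as a positive number.
    return round(math.log(2) / half_life_min, 4)
```

As a point of arithmetic only, a half-life of 7 minutes gives ln(2)/7, which rounds to 0.0990, the same number as the substitute value.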
Expression Data Sheets for Individual Yeast Strains
- Expression data can be provided for either a single strain or multiple strains of yeast (for example, the wild type strain and a transcription factor deletion strain).
- Each strain had its own sheet in the workbook.
- Each sheet was given a unique name that followed the convention "STRAIN_log2_expression", where the word "STRAIN" is replaced by the strain designation, which appeared in the optimization_diagnostics sheet.
- Everyone in the class had at least one expression worksheet called "wt_log2_expression".
- Included the transcription factors GLN3, HAP4, and CIN5 in the network. Thus, used the expression data from the dGLN3, dHAP4, and dCIN5 deletion strains in the workbook as well, naming the worksheets "dgln3_log2_expression", "dhap4_log2_expression", and "dcin5_log2_expression".
- If the network had not contained all three of those genes, only the expression data for the wild type and for the genes out of those three that were present in the network would have been included.
- The sheet had the following columns in this order:
- "id": list of all genes. The genes were listed in the same order in all the sheets in the Excel workbook.
- The next series of columns contained the expression data for each gene at a given timepoint given as log2 ratios (log2 fold changes). The column header was the time at which the data were collected, without any units. For example, the 15 minute timepoint had a column header "15" and the 30 minute timepoint had the column header "30". GRNmap supports replicate data for each of the timepoints. Replicate data for the same timepoint were in columns immediately next to each other and had the same column headers. For example, three replicates of the 15 minute timepoint had "15", "15", "15" as the column headers.
- If data was provided for multiple strains, each strain had data for the same timepoints, although the number of replicates may have varied.
- Included the data for the 15, 30, and 60 minute timepoints, but not the 90 or 120 minute timepoints.
- The data used was contained in the Expression-and-Degradation-rate-database_2019.accdb file that was used to obtain the production and degradation rates.
- Copying and pasting all of these data by hand is tedious, so executed a query in Microsoft Access instead. Followed the steps listed for the "production_rates" sheet for each strain's expression data. After importing the data into Excel, changed the column headers to "15", "15", etc., as described above.
- Missing values in the expression data sheets were OK; didn't need to put any values there like for the production_rates or degradation_rates sheets.
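The sheet naming convention and the replicate column headers described above can be sketched as two small helpers. The function names and the replicate counts in the usage note are hypothetical; only the "STRAIN_log2_expression" pattern and the unit-free, repeated timepoint headers come from the notes.

```python
def sheet_name(strain):
    """Build the worksheet name for a strain designation, e.g. "wt"."""
    return f"{strain}_log2_expression"


def column_headers(replicates_per_timepoint):
    """Build the header row for one expression sheet.

    replicates_per_timepoint: list of (timepoint, number_of_replicates)
    pairs in chronological order. Replicates of the same timepoint sit in
    adjacent columns and share the same header, written without units.
    """
    headers = ["id"]
    for timepoint, n_replicates in replicates_per_timepoint:
        headers.extend([str(timepoint)] * n_replicates)
    return headers
```

For instance, column_headers([(15, 3), (30, 2), (60, 1)]) produces an "id" column followed by three "15" columns, two "30" columns, and one "60" column; the replicate counts here are made up for illustration.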
network_weights sheet
- These are the initial guesses for the estimation of the weight parameters, w.
- Since these weights are initial guesses which will be optimized by GRNmap, the content of this sheet was identical to the "network" sheet.
optimization_parameters sheet
- The optimization_parameters sheet had two columns (from left to right) entitled, "optimization_parameter" and "value".
- Copied this worksheet from the sample workbook provided. The only row that was modified was row 15, "Strain". Included just the strain designations for which the workbook had a corresponding STRAIN_log2_expression sheet. If the dgln3, dhap4, or dcin5 expression sheets were absent, deleted those designations from this row, making sure there were no gaps between cells.
- What follows is an explanation of what the optimization parameters mean:
- alpha: Penalty term weighting (from the L-curve analysis)
- kk_max: Number of times to re-run the optimization loop. In some cases re-starting the optimization loop can improve performance of the estimation.
- MaxIter: Number of times MATLAB iterates through the optimization scheme. If this is set too low, MATLAB will stop before the parameters are optimized.
- TolFun: How small the difference between two successive least squares evaluations must be before the program determines that it is not making any improvement.
- MaxFunEval: Maximum number of times the program will evaluate the least squares cost.
- TolX: How close successive parameter estimates should be before the program determines that it is not making any improvement.
- production_function: =Sigmoid (case-insensitive) if sigmoidal model; =MM (case-insensitive) if Michaelis-Menten model
- L_curve: =0 if an L-curve analysis should NOT be run or =1 if an L-curve analysis SHOULD be run. The L-curve analysis will automatically run sequential rounds of estimation for an array of fixed alpha values (0.8, 0.5, 0.2, 0.1, 0.08, 0.05, 0.02, 0.01, 0.008, 0.005, 0.002, 0.001, 0.0008, 0.0005, 0.0002, and 0.0001). GRNmap makes a copy of the user's selected input workbook and changes alpha to the first alpha in the list. The estimation runs and the resulting parameter values are used as the initial guesses for the next round of estimation with the next alpha value. This process repeats until all alpha values have been run. New input and output workbooks are generated for each alpha value, although currently, the graphs are only saved for the last run.
- estimate_params =1 if the user wants to estimate parameters; =0 if the user wants to do just one forward run
- make_graphs =1 to output graphs; =0 to not output graphs
- fix_P =1 if the user does not want to estimate the production rate, P, parameter, just use the initial guess and never change; =0 to estimate
- fix_b =1 if the user does not want to estimate the b parameter, just use the initial guess and never change; =0 to estimate
- expression_timepoints: A row containing a list of the time points when the data was collected experimentally. Should correspond to the timepoint column headers in the STRAIN_log2_expression sheets.
- Strain: A row containing a list of all of the strains for which there is expression data in the workbook. Should correspond to the "STRAIN" portion of the names of the STRAIN_log2_expression sheets for each strain. Note that GRNmap will run the model for the wild type network (all genes present in the network) and for networks where the gene deleted from the designated STRAIN has been deleted from the network.
- simulation_timepoints: A row containing a list of the time points at which to evaluate the differential equations to generate the simulated data. This does not need to correspond to the actual measurement times, but should be in the same units (e.g. minutes).
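The sequential rounds described for L_curve (each alpha's result seeds the next round) can be illustrated with a toy problem. This is only a sketch of the warm-start loop, not GRNmap's code: the one-parameter model, the squared-parameter penalty, the gradient descent estimator, and all function names are assumptions, and the learning rate is tuned for the small example data only.

```python
# Fixed alpha values listed in the L_curve description above.
ALPHAS = [0.8, 0.5, 0.2, 0.1, 0.08, 0.05, 0.02, 0.01,
          0.008, 0.005, 0.002, 0.001, 0.0008, 0.0005, 0.0002, 0.0001]


def estimate(w_start, xs, ys, alpha, steps=500, learning_rate=0.01):
    """Toy estimator (assumption): gradient descent on the cost
    sum((w*x - y)^2) + alpha * w^2 for a single weight w."""
    w = w_start
    for _ in range(steps):
        gradient = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) + 2 * alpha * w
        w -= learning_rate * gradient
    return w


def l_curve(xs, ys, w_initial=0.0):
    """Run one estimation per alpha, warm-starting from the previous result.

    Returns a list of (alpha, w, least_squares_error, penalty) tuples.
    """
    results = []
    w = w_initial
    for alpha in ALPHAS:
        # The previous round's estimate is the initial guess for this round.
        w = estimate(w, xs, ys, alpha)
        residual = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
        results.append((alpha, w, residual, w ** 2))
    return results
```

Plotting the least squares error against the penalty for each alpha gives the L-shaped curve from which a value of alpha is chosen. With toy data such as xs = [1, 2, 3] and ys = [2, 4, 6], the estimate moves toward the unpenalized solution as alpha shrinks.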
threshold_b sheet
- These are the initial guesses for the estimation of the threshold_b parameters.
- There were two columns created:
- The left-most column contained the header "id" and listed the standard names for the genes in the model in the same order as in the other sheets.
- The second column had the header "threshold_b" and contained the initial guesses, which were all set to 0.
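Because every sheet must list the genes in the same order, a quick consistency check before running the model can catch a misaligned sheet. The sketch below assumes the "id" column of each sheet has already been read into a Python list; the function name and the dictionary layout are hypothetical.

```python
def sheets_with_wrong_gene_order(id_columns):
    """Return the names of sheets whose gene order differs from the first sheet.

    id_columns: dict mapping sheet name -> list of gene ids in row order,
    for example {"production_rates": [...], "threshold_b": [...]}.
    """
    names = list(id_columns)
    if not names:
        return []
    # The first sheet provided is treated as the reference order.
    reference = id_columns[names[0]]
    return [name for name in names[1:] if id_columns[name] != reference]
```

An empty result means all sheets agree with the first one; any names returned point to sheets that need to be re-sorted.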
Data and Files
- The Data and Files section includes the Excel workbook and the GRNmap input workbook at this point.
- Dmadere (talk) 21:07, 4 November 2019 (PST)
Dynamical Systems Modeling of your Gene Regulatory Network
- Ran model using GRNmap written in MATLAB.
- Downloaded GRNmap v1.10 code from the Downloads page.
- Unzipped the file.
- Launched MATLAB R2014b.
- Opened GRNmodel.m, which was in the directory that was unzipped.
- Clicked the Run button.
- Selected input workbook.
- When the run was over, the expression plots were displayed.
- Zipped up files and uploaded to wiki.
Data & Files
Conclusion
Acknowledgements
References
- MATLAB
- GRNsight
- bio website