Ntesfaio Week 10

Individual Journal Assignment

Purpose

The purpose of this week's assignment was to create a GRNmap input workbook with separate sheets describing the genes that were isolated from the STEM profile. We also used GRNsight to visualize the results.

Methods

Creating a GRNmap Input Workbook

production_rates sheet

This sheet contained initial guesses for the production rate parameters, P, for all genes in the network.

Assuming that the system is in steady state with the relative expression of all genes equal to 1, setting (P/2) - lambda = 0, i.e. P = 2 * lambda, where lambda is the degradation rate, gives a reasonable initial guess.
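The steady-state relation above can be rearranged to P = 2 * lambda. A minimal sketch of that arithmetic in Python follows; the function name is hypothetical and is not part of GRNmap.

```python
def initial_production_rate(degradation_rate: float) -> float:
    """Initial guess for P from (P/2) - lambda = 0, i.e. P = 2 * lambda."""
    # abs() because the workbook stores the absolute value of the degradation rate.
    # The result is rounded to four decimal places, as the sheet requires.
    return round(2 * abs(degradation_rate), 4)


print(initial_production_rate(0.0990))  # prints 0.198
```

For a degradation rate of 0.0990 this gives 0.1980, which is consistent with the two default values used for missing entries in this workbook.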

The sheet contained two columns (from left to right) entitled, "id", "production_rate".

The id was an identifier that the user used to identify a particular gene. In our case, we used the "StandardName", for example, GLN3.

The "production_rate" column contained the initial guesses for the P parameter as described above, rounded to four decimal places.

The production rates were provided in a Microsoft Access database.

I performed a query to get the list of production rates for each gene as a group.

I imported a list of genes into a new table in the database. I clicked on the "External Data" tab and selected the Excel icon with the "up" arrow on it.

I clicked the "Browse" button and selected the Excel file containing the network that was used to upload to GRNsight.

The button next to "Import the source data into a new table in the current database" was selected and I clicked "OK".

In the next window, I selected the "network" worksheet, if it wasn't already automatically selected. I clicked "Next".

In the next window, I made sure the "First Row Contains Column Headings" was checked and I clicked "Next".

In the next window, the left-most column was highlighted. I changed the "Field Name" to "id". I clicked "Next".

In the next window, I selected the button for "Choose my own primary key." and chose the "id" field from the drop down next to it. I clicked "Next".

In the next window, I made sure it said "Import to Table: network". I clicked "Finish".

In the next window I clicked "Close".

A table called "network" appeared in the list of tables at the left of the window.

I went to the "Create" tab. I clicked on the icon for "Query Design".

In the window that appeared, I clicked on the "network" table and clicked "Add". I clicked on the "production_rates" table and clicked "Add". Lastly, I clicked "Close".

The two tables appeared in the main part of the window. I clicked on the word "id" in the "network" table, dragged my mouse to the "standard_name" field in the "production_rates" table, and released.

I right-clicked on the line between those words and selected "Join Properties" from the menu that appeared.

I selected Option "2: Include ALL records from 'network' and only those records from 'production_rates' where the joined fields are equal." and clicked "OK".

I clicked on the "id" word in the "network" table, dragged it to the bottom of the screen to the first column next to the word "Field", and released.

I clicked on the "production_rate" field in the "production_rates" table, dragged it to the bottom of the screen to the second column next to the word "Field", and released.

I right-clicked anywhere in the gray area near the two tables. In the menu that appeared, I selected "Query Type > Make Table Query...".

In the window that appeared, I named the table "production_rates_1", made sure "Current Database" was selected, and clicked "OK".

I went to the "Query Tools: Menus" tab. I clicked on the exclamation point icon. A window appeared that said there are many rows being pasted into a new table. I clicked "Yes".

The new "production_rates_1" table appeared in the list at the left. I double-clicked on that table name to open it.

I copied the data in this table and pasted it back into the Excel workbook. When pasting, I used "Paste Special > Paste values" so that the Access formatting was not carried along. I selected the workbook to export the table to, making sure that "Preserve Access formatting" was not checked. I clicked "OK", then clicked "Close".

If there were missing values, I substituted the value 0.1980 for the missing production rates. Note that the genes should be listed in the same order in all the sheets in the Excel workbook.
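The Access query above is a left join from the network genes to the rate table, followed by filling in the default for genes with no match. The same logic can be sketched in plain Python. The function name and the example rates below are made up for illustration and do not come from the class database.

```python
DEFAULT_PRODUCTION_RATE = 0.1980  # substituted for genes missing from the rate table


def left_join_rates(network_ids, rate_table, default=DEFAULT_PRODUCTION_RATE):
    """Return (id, rate) rows for every gene in network_ids, in network order.

    rate_table maps standard_name -> production_rate. Genes absent from it
    receive the default, mirroring join option 2 plus the missing-value rule.
    """
    return [(gene, round(rate_table.get(gene, default), 4)) for gene in network_ids]


# Hypothetical rates, for illustration only
example_rates = {"GLN3": 0.25, "HAP4": 0.5}
rows = left_join_rates(["GLN3", "HAP4", "CIN5"], example_rates)
print(rows)  # [('GLN3', 0.25), ('HAP4', 0.5), ('CIN5', 0.198)]
```

Because the output follows the order of the network gene list, the resulting sheet keeps the same gene order as the network sheet.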

degradation_rates sheet

This sheet contained degradation rates for all genes in the network, which are provided by the user.

The sheet contained two columns (from left to right) entitled "id", and "degradation_rate".

The id was an identifier that the user used to identify a particular gene.

The "degradation_rate" column should then contain the absolute value of the degradation rate for the corresponding gene as described above, rounded to four decimal places.

To obtain these values, I used the same Microsoft Access database that was used to obtain the production rates for the first worksheet. Again, I copied and pasted the values one by one.

Again note, the genes should be listed in the same order in all the sheets in the Excel workbook.

If there are missing values, substitute the value 0.0990 for the missing degradation rates.
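Since every sheet must list the genes in the same order, a quick consistency check can catch a mismatch before the workbook is run. This is only a sketch: it assumes the "id" column of each sheet has already been read into a Python list, and the function name is hypothetical.

```python
def sheets_with_wrong_order(id_columns):
    """Return the names of sheets whose gene order differs from the first sheet.

    id_columns maps sheet name -> list of gene ids in the order they appear.
    """
    names = list(id_columns)
    if not names:
        return []
    reference = id_columns[names[0]]
    return [name for name in names[1:] if id_columns[name] != reference]


example = {
    "production_rates": ["GLN3", "HAP4", "CIN5"],
    "degradation_rates": ["GLN3", "HAP4", "CIN5"],
    "threshold_b": ["HAP4", "GLN3", "CIN5"],  # deliberately out of order
}
print(sheets_with_wrong_order(example))  # ['threshold_b']
```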

Expression Data Sheets for Individual Yeast Strains

Each strain has its own sheet in the workbook.

Each sheet should be given a unique name that follows the convention "STRAIN_log2_expression", where the word "STRAIN" is replaced by the strain designation, which appeared in the optimization_diagnostics sheet.

Everyone in the class had at least one expression worksheet called "wt_log2_expression".

You should have included the transcription factors GLN3, HAP4, and CIN5 in your network. Thus, we will use the expression data from the dGLN3, dHAP4, and dCIN5 deletion strains in our workbooks as well, naming the worksheets "dgln3_log2_expression", "dhap4_log2_expression", and "dcin5_log2_expression".

If, for some reason, you don't have all three of those genes in your network, only include expression data for the wild type and the genes out of those three that you have in your network.

The sheet should have the following columns in this order:

"id": list of all genes. The genes should be listed in the same order in all the sheets in the Excel workbook.

The next series of columns should contain the expression data for each gene at a given timepoint given as log2 ratios (log2 fold changes).

The column header should be the time at which the data were collected, without any units. For example, the 15 minute timepoint would have a column header "15" and the 30 minute timepoint would have the column header "30".

GRNmap supports replicate data for each of the timepoints. Replicate data for the same timepoint should be in columns immediately next to each other and have the same column headers. For example, three replicates of the 15 minute timepoint would have "15", "15", "15" as the column headers.

If data are provided for multiple strains, each strain should have data for the same timepoints, although the number of replicates can vary.
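The naming convention and header layout for the expression sheets can be sketched as follows. The function names are hypothetical, and the replicate counts in the example are made up rather than taken from the class data.

```python
def expression_sheet_name(strain):
    """Build the sheet name from the strain designation."""
    return f"{strain}_log2_expression"


def expression_header(replicates_per_timepoint):
    """Build the header row: "id", then one column per replicate.

    replicates_per_timepoint maps timepoint -> number of replicates, in the
    order the timepoints should appear. Replicates of one timepoint sit next
    to each other and share the same header, with no units.
    """
    header = ["id"]
    for timepoint, count in replicates_per_timepoint.items():
        header.extend([str(timepoint)] * count)
    return header


print(expression_sheet_name("wt"))  # wt_log2_expression
print(expression_header({15: 3, 30: 2, 60: 1}))
# ['id', '15', '15', '15', '30', '30', '60']
```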

I included the data for the 15, 30, and 60 minute timepoints, but not the 90 or 120 minute timepoints.

The data were contained in the Expression-and-Degradation-rate-database_2019.accdb file, the same file that was used to obtain the production and degradation rates.

network sheet

The network derived from the YEASTRACT database for the Week 9 assignment can be copied and pasted into this sheet directly.

This sheet contained an adjacency matrix representation of the gene regulatory network.

The columns correspond to the transcription factors and the rows correspond to the target genes controlled by those transcription factors.

A “1” means there is an edge connecting them and a “0” means that there is no edge connecting them.

The upper-left cell (A1) should contain the text “cols regulators/rows targets”. This text is there as a reminder of the direction of the regulatory relationships specified by the adjacency matrix.

The rest of row 1 should contain the names of the transcription factors that are controlling the other genes in the network, one transcription factor name per column.

The rest of column A should contain the names of the target genes that are being controlled by the transcription factors heading each of the columns in the matrix, one target gene name per row.

The transcription factor names should correspond to the "id" in the other sheets in the workbook. They should be capitalized the same way and occur in the same order along the top and side of the matrix.

The matrix needs to be symmetric, i.e., the same transcription factors should appear along the top and left side of the matrix. The genes should be listed in the same order in all the sheets in the Excel workbook.

Each cell in the matrix should then contain a zero (0) if there is no regulatory relationship between those two transcription factors, or a one (1) if there is a regulatory relationship between them. Again, the columns correspond to the transcription factors and the rows correspond to the target genes controlled by those transcription factors.
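The layout rules for the network sheet can be checked with a short script. This is a sketch under the assumption that the sheet has already been read into a list of rows; the function name is hypothetical and the edges in the example are invented.

```python
A1_TEXT = "cols regulators/rows targets"


def network_sheet_problems(rows):
    """Return a list of problems found in an adjacency-matrix sheet.

    rows is a list of lists: the first row holds A1 plus regulator names,
    and each later row holds a target name followed by 0/1 cells.
    """
    problems = []
    header, body = rows[0], rows[1:]
    if header[0] != A1_TEXT:
        problems.append("cell A1 does not contain the expected text")
    regulators = header[1:]
    targets = [row[0] for row in body]
    if regulators != targets:
        problems.append("names along the top and side differ or are in a different order")
    for row in body:
        if len(row) != len(header):
            problems.append(f"row {row[0]} has the wrong number of cells")
        if any(cell not in (0, 1) for cell in row[1:]):
            problems.append(f"row {row[0]} contains a value other than 0 or 1")
    return problems


# Invented edges: column HAP4 regulates row GLN3
good_sheet = [
    [A1_TEXT, "GLN3", "HAP4"],
    ["GLN3", 0, 1],
    ["HAP4", 0, 0],
]
print(network_sheet_problems(good_sheet))  # []
```

An empty list means none of the checked rules were violated. It does not confirm that the edges themselves are biologically correct.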

network_weights sheet

These are the initial guesses for the estimation of the weight parameters, w.

Since these weights are initial guesses which will be optimized by GRNmap, the content of this sheet can be identical to the "network" sheet.

optimization_parameters sheet

The optimization_parameters sheet should have two columns (from left to right) entitled, "optimization_parameter" and "value".

This worksheet was copied from the sample workbook provided. The only row that you need to modify is row 15, "Strain". Include just the strain designations for which you have a corresponding STRAIN_log2_expression sheet. If you don't have the dgln3, dhap4, or dcin5 expression sheets, then you will delete those from this row. If you do so, make sure that you don't leave any gaps between cells.
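Building the "Strain" row so that it lists only strains with a matching expression sheet, with no gaps, can be sketched as below. The function name is hypothetical, and the layout (label in the first cell, strain designations in the following cells) is an assumption based on the description above rather than a confirmed GRNmap format.

```python
def strain_row(candidate_strains, sheet_names):
    """Return the "Strain" row, keeping only strains that have an expression sheet.

    Filtering into a new list leaves no empty cells between the remaining
    strain designations.
    """
    present = [s for s in candidate_strains if f"{s}_log2_expression" in sheet_names]
    return ["Strain"] + present


sheets = {"wt_log2_expression", "dhap4_log2_expression"}
print(strain_row(["wt", "dgln3", "dhap4", "dcin5"], sheets))
# ['Strain', 'wt', 'dhap4']
```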

threshold_b sheet

There should be two columns.

The left-most column should contain the header "id" and list the standard names for the genes in the model in the same order as in the other sheets.

The second column should have the header "threshold_b" and should contain the initial guesses; we used 0 for all genes.

Dynamical Systems Modeling of your Gene Regulatory Network

I downloaded the GRNmap v1.10 code from the GRNmap Downloads page.

The download is 81 MB. I unzipped the file (right-click, 7-zip > Extract here) and launched MATLAB R2014b.

I opened GRNmodel.m, which is in the unzipped directory under GRNmap-1.10 > matlab.

I clicked the Run button (green "play" arrow).

I selected the input workbook.

When the run was over, the expression plots were displayed.

The output .xlsx and .mat files were saved in the same folder as the input workbook, along with .jpg files containing the optimization diagnostic and individual expression plots. I saved these files.

I uploaded the .xlsx file into GRNsight to visualize the results!

Conclusion

The purpose of this week's assignment was to continue working on the network Excel spreadsheet. The workbook used the log2 expression data for the 15, 30, and 60 minute timepoints. Using GRNsight, I was able to view my results after MATLAB produced the expression plots. The .xlsx, .mat, and .jpg outputs from the MATLAB run were uploaded to this wiki.

Acknowledgments

My homework partners this week were Aby User:Ymesfin and David User:Dramir36. We worked together on running the different tools and databases, such as GRNmap and YEASTRACT.

"Except for what is noted above, this individual journal entry was completed by me and not copied from another source."