2013-12-13T21:42:13Z

Laurmagee: /* GenMAPP and MAPPFinder Protocols */

==Statistical Analysis==
*Open the following spreadsheet [[File:Compiled Ratios and Logs.xls]] from [[Laurmagee: Week 13]].
*Begin a new workbook and copy the Gene ID column into column A, starting at cell A1. Then, in the subsequent columns, copy over the log values found in your previous sheet. Order the time points from earliest to latest, and within each time point order the replicates from 1 to 3. After this pasting has been done, your column titles across the top of the worksheet will say "Gene ID", "log_700S1-t15", "log_700S2-t15", "log_700S3-t15"... and so on for t30, t60, and t240.
*Begin scaling and centering the data by first inserting a new worksheet in Excel labeled "scaled_centered".
*Select and copy all of the data from your original worksheet. Then paste it into cell A1 of the new worksheet.
*Insert two rows in between the top row of headers and the first data row. In cell A2, type "Average" and in cell A3, type "StdDev".
*You will now compute the Average log ratio for each replicate and time period (each column of data).
*In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and press "Enter".
*Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
*You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
and press "Enter".
*Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
*You have now computed the average and standard deviation of the log ratios for each replicate and time period.
*Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank columns of cells. Edit the names of the columns so that they now read: log_700S1-t15_scaled_centered, log_700S2-t15_scaled_centered, etc.
*In cell N4, type the following equation:
=(B4-B$2)/B$3
*In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column.
*Copy and paste this equation into the entire column.
*Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.
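The scaling and centering steps above can also be sketched outside Excel. The following is a minimal Python sketch, not part of the original protocol; the function name and the sample values are made up for illustration. It mirrors =(B4-B$2)/B$3 for one column, using the sample standard deviation as Excel's STDEV does.

```python
from statistics import mean, stdev

def scale_and_center(column):
    """Subtract the column average and divide by the column standard deviation."""
    avg = mean(column)   # same role as =AVERAGE(B4:B5224)
    sd = stdev(column)   # sample standard deviation, same role as =STDEV(B4:B5224)
    return [(x - avg) / sd for x in column]

# Hypothetical log ratios for one chip (one column of data)
log_ratios = [0.5, -0.2, 0.1, 0.8, -0.6]
scaled = scale_and_center(log_ratios)
print(scaled)
```

After this transformation each column has an average of 0 and a standard deviation of 1, which is what makes the chips comparable to one another.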

*Insert a new worksheet and name it "statistics".
*Go back to the "scaled_centered" worksheet and copy the first column (the gene IDs).
*Paste the data into the first column of your new "statistics" worksheet.
*Go back to the "scaled_centered" worksheet and copy the columns that are designated "_scaled_centered".
*Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
*Go to a new column on the right of your worksheet. Type the headers "Avg_LogFC_t15", "Avg_LogFC_t30", "Avg_LogFC_t60", and "Avg_LogFC_t240" into the top cells of the next four columns.
*Compute the average log fold change of the three replicates for each gene at t15 by typing the equation:
=AVERAGE(B2:D2)
into cell N2. Copy this equation and paste it into the rest of the column.
*Create the equation for times t30, t60, and t240 and paste it into their respective columns.
*Label the next four columns "Tstat_t15", "Tstat_t30", "Tstat_t60", and "Tstat_t240". These columns will hold a T statistic that tells us whether the scaled and centered average log ratio is significantly different from 0 (no change). Enter the equation:
=N2/(STDEV(B2:D2)/SQRT(3))
*(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column as well as the next three columns, making sure to change the cells involved in the equation accordingly.
*Label the top cell in the next four columns "Pvalue_t15", "Pvalue_t30", "Pvalue_t60", and "Pvalue_t240". In the cell below the label, enter the equation:
=TDIST(ABS(R2),2,2)
*The second argument is the number of degrees of freedom, which is the number of replicates minus one, so in our case there are 2 degrees of freedom. The third argument (also 2) requests a two-tailed p value. Copy the equation and paste it into all rows in that column and the next three columns, making sure to change the cell involved to the appropriate Tstat value.
*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
*Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
*Select Columns R through Z (the T statistics and p values). Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
*Select Columns N through Z and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
*Delete Rows 2 and 3, where it says "Average" and "StdDev", so that your data rows with gene IDs are immediately below the header row 1.
*Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP.
*[[File:SinorhizobiumMeliloti LM GenMapp DataSheet.xls]]
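The average, T statistic, and p value computed in the "statistics" worksheet can be sketched in Python as a cross-check. This is a minimal sketch under stated assumptions, not the method used to produce the file above: the function names and replicate values are hypothetical, and the p value uses the closed form of the Student's t distribution for exactly 2 degrees of freedom, which corresponds to =TDIST(ABS(t),2,2).

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(replicates):
    """Mirror =N2/(STDEV(B2:D2)/SQRT(3)) for one gene at one time point."""
    n = len(replicates)
    # Note: fails with ZeroDivisionError if all replicates are identical
    return mean(replicates) / (stdev(replicates) / sqrt(n))

def two_tailed_p_df2(t):
    """Two-tailed p value for a t distribution with 2 degrees of freedom."""
    return 1.0 - abs(t) / sqrt(2.0 + t * t)

# Hypothetical scaled and centered log ratios for three replicates of one gene
replicates = [0.9, 1.1, 1.3]
avg_log_fc = mean(replicates)
t = t_statistic(replicates)
p = two_tailed_p_df2(t)
print(avg_log_fc, t, p)
```

The closed form only holds for 2 degrees of freedom; a different number of replicates would need a general t distribution routine instead.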

==Sanity Check==
#How many genes have p value < 0.05 for the time set of 15 minutes?
#*3613 genes
#How many genes have p value < 0.05 for the time set of 30 minutes?
#*5225 genes
#How many genes have p value < 0.05 for the time set of 60 minutes?
#*5207 genes
#How many genes have p value < 0.05 for the time set of 240 minutes?
#*6790 genes
#How many genes have p value < 0.01 for the time set of 15 minutes?
#*907 genes
#How many genes have p value < 0.01 for the time set of 30 minutes?
#*1518 genes
#How many genes have p value < 0.01 for the time set of 60 minutes?
#*1553 genes
#How many genes have p value < 0.01 for the time set of 240 minutes?
#*2437 genes
#How many genes have p value < 0.001 for the time set of 15 minutes?
#*92 genes
#How many genes have p value < 0.001 for the time set of 30 minutes?
#*179 genes
#How many genes have p value < 0.001 for the time set of 60 minutes?
#*172 genes
#How many genes have p value < 0.001 for the time set of 240 minutes?
#*347 genes
#How many genes have p value < 0.0001 for the time set of 15 minutes?
#*7 genes
#How many genes have p value < 0.0001 for the time set of 30 minutes?
#*15 genes
#How many genes have p value < 0.0001 for the time set of 60 minutes?
#*13 genes
#How many genes have p value < 0.0001 for the time set of 240 minutes?
#*36 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 15 minutes?
#*1521 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 30 minutes?
#*1926 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 60 minutes?
#*2194 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 240 minutes?
#*2846 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change less than zero. How many are there for the time set of 15 minutes?
#*2092 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change less than zero. How many are there for the time set of 30 minutes?
#*3299 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change less than zero. How many are there for the time set of 60 minutes?
#*3013 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the corresponding "Avg_LogFC" column to show all genes with an average log fold change less than zero. How many are there for the time set of 240 minutes?
#*3944 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 15 minutes?
#*1476 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 30 minutes?
#*1890 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 60 minutes?
#*2129 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 240 minutes?
#*2763 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 15 minutes?
#*2052 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 30 minutes?
#*3256 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 60 minutes?
#*2942 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 240 minutes?
#*3866 genes
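The counts above were obtained with Excel filters. The same kind of count can be sketched in Python as follows; this is an illustrative sketch only, with a hypothetical function name and made-up rows, where each row is an (average log fold change, p value) pair for one gene at one time point.

```python
def count_genes(rows, p_cutoff, fc_min=None, fc_max=None):
    """Count genes with p < p_cutoff and, optionally, fold change > fc_min or < fc_max."""
    count = 0
    for avg_log_fc, p_value in rows:
        if p_value >= p_cutoff:
            continue
        if fc_min is not None and avg_log_fc <= fc_min:
            continue
        if fc_max is not None and avg_log_fc >= fc_max:
            continue
        count += 1
    return count

# Made-up (Avg_LogFC, Pvalue) pairs, not taken from the real dataset
rows = [(0.80, 0.010), (0.10, 0.030), (-0.40, 0.004), (-0.10, 0.200), (1.20, 0.0005)]

print(count_genes(rows, 0.05))                # p < 0.05
print(count_genes(rows, 0.05, fc_min=0.25))   # p < 0.05 and fold change > 0.25
print(count_genes(rows, 0.05, fc_max=-0.25))  # p < 0.05 and fold change < -0.25
```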

==GenMAPP and MAPPFinder Protocols==
*To begin GenMAPP analysis, launch GenMAPP 2. If it is not installed, download it from the following website: http://genmapp.org.
*Look at the lower left-hand corner of the window to see which gene database is loaded. For this assignment, the gene database [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]] should appear in the corner.
*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
*Once the correct database is loaded, go to Data > Expression Dataset Manager. This will allow you to input the data file created in the "Statistical Analysis" portion of this page.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. All of the boxes should remain unchecked, because none of the columns in our dataset contain non-numerical values.
*Give the Expression Dataset Manager time to convert your data into a GEX file.
*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but rather recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
*Open the exception file in Excel and filter the data to note what errors have been recorded.
*Using the .gdb Gene Database created by my partners, there were 5,538 errors, each of which was "Gene not found in OrderedLocusNames or any related system."
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
*In the "Color Sets" section, type in your own title into the "Name" field.
*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_t15" in the drop down menu in the "Gene Value" field.
*We are using the t15 time period for this step to represent the results from all four time intervals, because it would be too challenging to complete this protocol with all four time interval values.
*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_t15] < -0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the "<" operator. Then type -0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field). The finished criterion should read: [Avg_LogFC_t15] < -0.25 AND [Pvalue_t15] < 0.05.
*Enter "Decreased" as the name for the criterion in the "Label in Legend" field, as we are looking for genes whose Avg_LogFC has decreased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color red was chosen.
*You may now click the "Add" button.
*Now we will add another criterion. In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) increase in the average log fold change (i.e. [Avg_LogFC_t15] > 0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the ">" operator. Then type 0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field). The finished criterion should read: [Avg_LogFC_t15] > 0.25 AND [Pvalue_t15] < 0.05.
*Enter "Increased" as the name for the criterion in the "Label in Legend" field, as we are looking for genes whose Avg_LogFC has increased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color green was chosen.
*You may now click the "Add" button.
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset to view the Color Sets on a MAPP.
*[[Media:ColorSets.mapp]]
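The logic of the two criteria built above can be sketched in Python. This is only an illustration of the cutoffs described in the steps, not GenMAPP's own implementation; the function name and the fallback label are assumptions.

```python
def classify(avg_log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Return the legend label a gene would receive under the two criteria."""
    if avg_log_fc < -fc_cutoff and p_value < p_cutoff:
        return "Decreased"   # [Avg_LogFC_t15] < -0.25 AND [Pvalue_t15] < 0.05
    if avg_log_fc > fc_cutoff and p_value < p_cutoff:
        return "Increased"   # [Avg_LogFC_t15] > 0.25 AND [Pvalue_t15] < 0.05
    return "No criteria met"  # placeholder label for genes matching neither criterion

print(classify(-0.60, 0.01))
print(classify(0.40, 0.03))
print(classify(0.40, 0.20))
```

Because the two fold change ranges do not overlap, a gene can never match both criteria, so the order in which they are checked does not matter here.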

*Moving on to the MAPPFinder protocol, we will stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on "Decreased" criteria in the right-hand box then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
**108 results found
*Save the file as a different Excel spreadsheet name by selecting File > Save As and select Excel workbook (.xls) from the drop-down menu.
*Now look at the "Increased" data. Stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on "Increased" criteria in the right-hand box then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File: MAPPFinder Capture1.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
**6 results found
*Save the file as a different Excel spreadsheet name by selecting File > Save As and select Excel workbook (.xls) from the drop-down menu.
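The four filters applied to the MAPPFinder results in Excel can be sketched in Python as follows. This is a minimal sketch with hypothetical dictionary keys and made-up rows; the real results file has its own column headers, which should be mapped onto these keys by hand.

```python
def passes_filters(term):
    """Apply the four MAPPFinder result filters listed above to one GO term."""
    return (
        term["z_score"] > 2
        and term["permute_p"] < 0.05
        and 5 <= term["number_changed"] < 100
        and term["percent_changed"] >= 25
    )

# Made-up rows standing in for lines of the MAPPFinder results file
terms = [
    {"name": "term A", "z_score": 3.1, "permute_p": 0.010, "number_changed": 12, "percent_changed": 40.0},
    {"name": "term B", "z_score": 1.5, "permute_p": 0.010, "number_changed": 12, "percent_changed": 40.0},
    {"name": "term C", "z_score": 4.0, "permute_p": 0.001, "number_changed": 150, "percent_changed": 60.0},
    {"name": "term D", "z_score": 2.5, "permute_p": 0.040, "number_changed": 5, "percent_changed": 25.0},
]

kept = [t["name"] for t in terms if passes_filters(t)]
print(kept)
```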

Laurmagee: Week 13

2013-12-13T19:10:31Z

Laurmagee:

The following link is to the Sinorhizobium meliloti team page: [[Team Name]]
* After downloading all of the files from the following link last week: [http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Microarray Downloads], I began compiling the data and computing log((Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background)) for every replicate (1-3) at each of the four time points in the experiment (15, 30, 60, and 240) after the addition of 0.7 M sucrose.
* The following files contain the data for each replicate at each time point. In the file names, 700S represents 0.7 M sucrose, the number that follows (1-3) is the replicate number, and t15, t30, t60, or t240 is the time point at which the sample was taken.
**[[File:700S1-t15.xls]]
**[[File:700S1-t30.xls]]
**[[File:700S1-t60.xls]]
**[[File:700S1-t240.xls]]
**[[File:700S2-t15.xls]]
**[[File:700S2-t30.xls]]
**[[File:700S2-t60.xls]]
**[[File:700S2-t240.xls]]
**[[File:700S3-t15.xls]]
**[[File:700S3-t30.xls]]
**[[File:700S3-t60.xls]]
**[[File:700S3-t240.xls]]
* These files were created by extracting the Cy5 Signal, Cy5 Background, Cy3 Signal, and Cy3 Background median values and placing them into a new worksheet.
*Column A holds the Cy3 Signal median, column B the Cy3 Background median, and column C the value Cy3 Signal - Cy3 Background, which is found by entering the following equation into C2: =A2-B2. To fill the rest of the column, copy cell C2, then scroll down to the last row and select the last cell while pressing the SHIFT key. This will highlight the entire column, and you can paste the equation into all of the selected cells by pressing COMMAND+V.
* In column D, add the Cy5 Signal median, in column E the Cy5 Background median, and label column F "Cy5 Signal-Cy5 Background". In cell F2, enter the equation: =D2-E2. Copy and paste this equation into the rest of column F by using the steps described above for column C.
*Finally, in column G, find the value of (Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background) by entering the following equation into cell G2: =F2/C2 (column F holds the background-subtracted Cy5 value and column C the background-subtracted Cy3 value). Paste it into the rest of column G.
* The above steps were repeated for each of the twelve files (three replicates at each of four time points).
*The following file is a compilation of all the Cy5/Cy3 ratios, with the addition of the log of these values: [[File:Compiled Ratios and Logs.xls]]
*Column A contains the Gene IDs, and columns B, D, F, H, J, L, N, P, R, T, V, and X contain the Cy5/Cy3 ratios for each time period and replicate. To find the log of the ratios in column B, the equation =log(B2) was entered into cell C2 (with a single argument, Excel's LOG function returns the base-10 logarithm). The equation was then copied and pasted into the whole C column. In each of the remaining empty columns, the same log equation was inserted, referring to the ratio column immediately to its left.
*The final data file is: [[File:Full Raw Data.xls]]
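The per-spot calculation described above can be sketched in Python. This is a minimal sketch with hypothetical names and made-up intensities; returning None for spots whose background-subtracted signal is not positive is an assumption made for the sketch (in Excel such cells would show an error instead).

```python
from math import log10

def log_ratio(cy5_signal, cy5_background, cy3_signal, cy3_background):
    """Return log10((Cy5 Signal - Cy5 Background)/(Cy3 Signal - Cy3 Background))."""
    cy5_net = cy5_signal - cy5_background  # same role as =D2-E2
    cy3_net = cy3_signal - cy3_background  # same role as =A2-B2
    if cy5_net <= 0 or cy3_net <= 0:
        return None  # assumption: skip spots where the log is undefined
    return log10(cy5_net / cy3_net)

# Made-up median intensities for two spots
print(log_ratio(1100, 100, 200, 100))
print(log_ratio(50, 100, 200, 100))
```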

[[User:Laurmagee|Laurmagee]] ([[User talk:Laurmagee|talk]]) 12:35, 21 November 2013 (PST)
[[Category: Journal Entry]][[Category: Sinorhizobium meliloti]]

Laurmagee: Week 15

2013-12-13T04:31:56Z

Laurmagee: /* GenMAPP and MAPPFinder Protocols */

==Statistical Analysis==
*Open the following spreadsheet [[File:Compiled Ratios and Logs.xls]] from [[Laurmagee: Week 13]].
*Begin a new workbook and copy over the Gene ID column into cell A1. Then in the subsequent columns, copy over the log values found in your previous sheet. Order all the different time periods in increasing intervals and sort the replicants at each time in increasing order as well. After this pasting has been done, your column titles across the top of the worksheet will say "Gene ID", "log_700S1-t15", "log_700S2-t15", "log_700S3-t15"... and so on for t30, t60, and t240.
*Begin scaling and centering the data by first inserting a new worksheet in Excel labeled "scaled_centered".
*Select and copy all of the data from your original worksheet. Then paste it into cell A1 in new worksheet.
*Insert two rows in between the top row of headers and the first data row. In cell A2, type "Average" and in cell A3, type "StdDev".
*You will now compute the Average log ratio for replicant and time period.
*In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and press "Enter".
*Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
*You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
and press "Enter".
*Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
You have now computed the average and standard deviation of the log ratios for replicant and time period.
*Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank colums of cells. Edit the names of the columns so that they now read: log_700S1-t15_scaled_centered, log_700S2-t15_scaled_centered, etc.
*In cell N4, type the following equation:
=(B4-B$2)/B$3
*In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column.
*Copy and paste this equation into the entire column.
*Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.

*Insert a new worksheet and name it "statistics".
*Go back to the "scaling_centering" worksheet and copy the first column ("ID").
*Paste the data into the first column of your new "statistics" worksheet.
*Go back to the "scaling_centering" worksheet and copy the columns that are designated "_scaled_centered".
*Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
*Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_t15", "Avg_LogFC_t30", "Avg_LogFC_60", and "Avg_LogFC_240" into the top cell of the next four columns.
*Compute the average log fold change for the replicates for each patient by typing the equation:
=AVERAGE(B2:D2)
into cell N2. Copy this equation and paste it into the rest of the column.
*Create the equation for times t30, t60, and t240 and paste it into their respective columns.
*Label the next four columns "Tstat_t15", "Tstat_t30", "Tstat_t60", and "Tstat_t240". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different than 0 (no change). Enter the equation:
=N2/(STDEV(B2:D2)/SQRT(3))
*(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column as well as the next three column making sure to change the cells involved in the equation accordingly.
*Label the top cell in the next four columns "Pvalue_t15", "Pvalue_t30", "Pvalue_t60", and "Pvalue_t240". In the cell below the label, enter the equation:
=TDIST(ABS(R2),2, 2)
*The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column and the next three columns making sure to change the cell involved to the appropriate Tstat value.
*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
*Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
*Select Columns R and Z. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
*Select Columns N through Z and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
*Delete Rows 2 and 3 where it says "Average" and "StDev" so that your data rows with gene IDs are immediately below the header row 1.
*Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP.
*[[File:SinorhizobiumMeliloti LM GenMapp DataSheet.xls]]

==Sanity Check==
#How many genes have p value < 0.05 for the time set of 15 minutes?
#*3613 genes
#How many genes have p value < 0.05 for the time set of 30 minutes?
#*5225 genes
#How many genes have p value < 0.05 for the time set of 60 minutes?
#*5207 genes
#How many genes have p value < 0.05 for the time set of 240 minutes?
#*6790 genes
#How many genes have p value < 0.01 for the time set of 15 minutes?
#*907 genes
#How many genes have p value < 0.01 for the time set of 30 minutes?
#*1518 genes
#How many genes have p value < 0.01 for the time set of 60 minutes?
#*1553 genes
#How many genes have p value < 0.01 for the time set of 240 minutes?
#*2437 genes
#How many genes have p value < 0.001 for the time set of 15 minutes?
#*92 genes
#How many genes have p value < 0.001 for the time set of 30 minutes?
#*179 genes
#How many genes have p value < 0.001 for the time set of 60 minutes?
#*172 genes
#How many genes have p value < 0.001 for the time set of 240 minutes?
#*347 genes
#How many genes have p value < 0.0001 for the time set of 15 minutes?
#*7 genes
#How many genes have p value < 0.0001 for the time set of 30 minutes?
#*15 genes
#How many genes have p value < 0.0001 for the time set of 60 minutes?
#*13 genes
#How many genes have p value < 0.0001 for the time set of 240 minutes?
#*36 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 15 minutes?
#*1521 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 30 minutes?
#*1926 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 60 minutes?
#*2194 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 240 minutes?
#*2846 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 15 minutes?
#*2092 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 30 minutes?
#*3299 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 60 minutes?
#*3013 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 240 minutes?
#*3944 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 15 minutes?
#*1476 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 30 minutes?
#*1890 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 60 minutes?
#*2129 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 240 minutes?
#*2763 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 15 minutes?
#*2052 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 30 minutes?
#*3256 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 60 minutes?
#*2942 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 240 minutes?
#*3866 genes

==GenMAPP and MAPPFinder Protocols==
*To begin GenMAPP analysis, first launch GenMAPP 2 or download it off of the following website: http://genmapp.org.
*Look at the lower-left hand corner to see what gene database is loaded. For this assignment, the gene database is [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]] should appear in the corner.
*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
*Once the correct database is loaded, go to Data > Expression Dataset Manager. This will allow you to input the data file created in the "Statistical Analysis" portion of this page.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. All of the boxes should remain unchecked, because none of the columns in our dataset contain non-numerical values.
*Give the Expression Dataset Manager time to convert your data into a GEX file.
*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but rather recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
*Open the exception file in Excel and filter the data to note what errors have been recorded.
*Using the .gdb Gene Database created by my partners, there were 5,538 errors, each of which was "Gene not found in OrderedLocusNames or any related system."
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
*In the "Color Sets" section, type in your own title into the "Name" field.
*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_t15" in the drop down menu in the "Gene Value" field.
*We are using the t15 time period for this step to represent the results from all four time intervals, because it would be too challenging to complete this protocol with all four time interval values.
*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_t15] < -0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the "<" operator, then type -0.25.
*Type the word "AND" in the same field, then select "Pvalue_t15" from the "Columns" menu.
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter "Decreased" as the name of the criterion, since we are looking for genes whose Avg_LogFC has decreased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color red was chosen.
*You may now click the "Add" button.
*Now we will add another criterion. In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) increase in the average log fold change (i.e. [Avg_LogFC_t15] > 0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the ">" operator, then type 0.25.
*Type the word "AND" in the same field, then select "Pvalue_t15" from the "Columns" menu.
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter "Increased" as the name of the criterion, since we are looking for genes whose Avg_LogFC has increased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color green was chosen.
*You may now click the "Add" button.
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset Manager to view the Color Sets on a MAPP.
*[[Media:ColorSets.mapp]]
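
The two Color Set criteria described above can be summarized as a small decision rule. The Python sketch below is not GenMAPP code. It only restates the "Decreased" and "Increased" criteria for the t15 columns, and the function name and the fallback label are hypothetical.

```python
def legend_label(avg_logfc_t15, pvalue_t15):
    """Return the legend label a gene would receive under the two criteria.

    Mirrors the criteria typed into the Criteria Builder:
      [Avg_LogFC_t15] < -0.25 AND [Pvalue_t15] < 0.05  -> "Decreased"
      [Avg_LogFC_t15] > 0.25 AND [Pvalue_t15] < 0.05   -> "Increased"
    """
    if avg_logfc_t15 < -0.25 and pvalue_t15 < 0.05:
        return "Decreased"  # colored red in this experiment
    if avg_logfc_t15 > 0.25 and pvalue_t15 < 0.05:
        return "Increased"  # colored green in this experiment
    # Hypothetical label for genes that satisfy neither criterion.
    return "No criteria met"


# Made-up example values, not taken from the data set.
examples = [(-0.50, 0.01), (0.60, 0.02), (0.60, 0.30), (0.10, 0.01)]
labels = [legend_label(fc, p) for fc, p in examples]
print(labels)
```

Because the two fold-change ranges do not overlap, a gene can never satisfy both criteria at once.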

*Moving on to the MAPPFinder Protocol, stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Decreased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Click "Browse", choose a location on your computer, and give the results file a meaningful name.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**Permute P less than 0.05
**Number Changed greater than or equal to 5 and less than 100.
**Percent Changed greater than or equal to 25
**108 results found
*Save the file under a new name by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
*Now look at the "Increased" data. Stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Increased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Click "Browse", choose a location on your computer, and give the results file a meaningful name.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[Media:MAPPFinder Capture1.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
**6 results found
*Save the file under a new name by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
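
The four Excel filters applied to the MAPPFinder results can also be written as a single test per GO term. The Python sketch below assumes each row has already been reduced to the four numeric columns named in the filter list. The dictionary keys, term names, and values are hypothetical placeholders, and the code does not reproduce the 108 and 6 results reported above.

```python
def passes_filters(z_score, permute_p, number_changed, percent_changed):
    """Apply the four filters used on the MAPPFinder results in Excel."""
    return (
        z_score > 2
        and permute_p < 0.05
        and 5 <= number_changed < 100
        and percent_changed >= 25
    )


# Made-up GO term rows for illustration only.
go_terms = [
    {"name": "term A", "z": 3.1, "p": 0.010, "changed": 12, "percent": 40.0},
    {"name": "term B", "z": 1.5, "p": 0.010, "changed": 12, "percent": 40.0},
    {"name": "term C", "z": 4.0, "p": 0.001, "changed": 150, "percent": 60.0},
    {"name": "term D", "z": 2.5, "p": 0.040, "changed": 5, "percent": 25.0},
]

kept = [
    term["name"]
    for term in go_terms
    if passes_filters(term["z"], term["p"], term["changed"], term["percent"])
]
print(kept)
```

In this made-up example, term B fails the Z Score filter and term C fails the upper bound on Number Changed, so only terms A and D remain.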

Team Name

2013-12-13T02:40:03Z

Laurmagee: /* Micro Array paper */

Species: ''Sinorhizobium meliloti'' (Strain 1021)
==Group Project==
[[media:Sinorhizobium_Meliloti_group_project.pdf|Group Project]]

==Personnel==
===Stephen Louie===
Project Manager, Quality Assurance
:slouie4 at lion.lmu.edu
:1 LMU Drive MSB 5194
:Los Angeles, CA 90045
:[[Stephen Louie Project Notebook|Stephen Louie Project Notebook]]

===Lauren Magee===
GenMAPP Expert
:lmagee1 at lion.lmu.edu
:1 LMU Drive MSB-5258,
:Los Angeles, CA 90045
===Mitchell Petredis===
Coding Supervisor
:mrpetredis at gmail dot com
:mpetredi at lion dot lmu dot edu
:Loyola Marymount University
:1 LMU Drive MSB-5957
:Los Angeles, CA 90045-2659
===Miles Malefyt===
GenMAPP Coordinator
:milesm@malefyt.com
:mmalefyt@lion.lmu.edu
:8416 Campion drive
:Westchester, CA 90045
cell: 831-236-5402

[[User:Mmalefyt|Mmalefyt]] ([[User talk:Mmalefyt|talk]]) 10:31, 31 October 2013 (PDT)

==Micro Array paper==

[http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Osmotic upshift elicited by salt and sucrose]
*[http://jb.asm.org/content/188/21/7617 HTML version]
*[http://jb.asm.org/content/188/21/7617.full.pdf+html PDF version]
*[[Media:A-MEXP-230.adf.txt]]
*[[Media:E-MEXP-785.eSet.r]] (unable to download because the file is too big)
*[[Media:E-MEXP-785.processed.1.zip]]
*[[Media:E-MEXP-785.raw.2.zip]]
*[[Media:E-MEXP-785.idf.txt]]
*[[Media:E-MEXP-785.raw.1.zip]] (unable to download because the file is too big)
*[[Media:E-MEXP-785.sdrf.txt]]
*Raw Data File for 700S (1-3): [[File:Full Raw Data.xls]]
*[[Media:Compiled Ratios and Logs.xls]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.xls]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.EX.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*[[Media:ColorSets.mapp]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gmf]]
*[[Media:700S1-3-t15-Decreased-Criterion0-GO.txt]]
*[[Media:700S1-3-t15-Increased-Criterion0-GO.txt]]
Domínguez-Ferreras, A., Pérez-Arnedo, R., Becker, A., Olivares, J., Soto, M.J., Sanjuán, J. (2006) Transcriptome Profiling Reveals the Importance of Plasmid pSymB for Osmoadaptation of ''Sinorhizobium meliloti''. ''Journal of Bacteriology'' 188:7617-7625.

==Genome Paper==

[http://search.proquest.com/docview/213572450?accountid=7418 The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293, 668–672.]

Galibert, F., Finan, T.M., Long, S., Puhler, A., et al. (2001) The composite genome of the legume symbiont ''Sinorhizobium meliloti''. ''Science'' 293:668-672.

[[Category: Sinorhizobium meliloti]]
[[Category: Group Projects]]

==Model Organism Database==
Link: http://cmr.jcvi.org/tigr-scripts/CMR/GenomePage.cgi?org=ntsm01

==Compiled raw Data==
[[Media:Team_Name_NaCl_compiled_raw_Data.xls|300 NaCl compiled data set]]
==Compiled processed data==
[[Media:Complete_processed_Data.xls|Processed Data]]

==Data ready for GenMAPP==
[[Media:Complete_processed_Data_MPM.xls|XLS Version]]

[[Media:Complete_processed_Data_MPM.txt|TXT version, USE THIS]]

==Important Files==

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: gmbuilder-2.0b71.zip | gmbuilder-2.0b71.zip]]

Computer on which export was run: Keck Lab Computer, back computer (furthest from the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 19.17 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 17.81 minutes
* Time taken to process: 15.54 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.21 minutes
*Note about GOA file
**From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Received an error message, so changed the beginning of the URL from "ftp" to "http".
**After entering the new URL, was taken to Index of /pub/databases/GO/goa.
**Clicked on the "proteomes" folder.
**Was directed to Index of /pub/databases/GO/goa/proteomes and downloaded 58.R_meliloti.goa.
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb | Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb]]
* Time taken to export .gdb: Started at 2pm on 11-7-2013, finished at 9:47pm

Note:

replace geneID with ~ when you find MOD link

==Important Files 2==

Info based on export done on 11/21/2013

Taxon ID: 266834

Version of GenMAPP Builder: [[Media:GenMAPP_Builder_2.0b72 S. meliloti.zip|GenMAPP_Builder_2.0b72 S. meliloti.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 7.34 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.31 minutes
* Time taken to process: 4.60 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Received an error message, so changed the beginning of the URL from "ftp" to "http".
**After entering the new URL, was taken to Index of /pub/databases/GO/goa.
**Clicked on the "proteomes" folder.
**Was directed to Index of /pub/databases/GO/goa/proteomes and downloaded 58.R_meliloti.goa.
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb|Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb]]
* Time taken to export .gdb: Started at 10:18 AM on 11/21/2013; ended at 1:34 PM

==Important Files 3==

'''From Week 14, December 3-5, 2013'''

NOTE: Forgot that I cannot import data into the same database. I will continue to do an export based on what I have, and will do another import/export cycle on another computer using the same version of gmbuilder that I used here.

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: SmelilotiGenMAPP_Builder_2.0b73.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 6.27 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.07 minutes
* Time taken to process: 12.42 minutes
**NOTE: gmbuilder told me that the GO OBO-XML file was already processed in the database, and wanted to know if I wanted to process the information again. I chose yes.

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Received an error message, so changed the beginning of the URL from "ftp" to "http".
**After entering the new URL, was taken to Index of /pub/databases/GO/goa.
**Clicked on the "proteomes" folder.
**Was directed to Index of /pub/databases/GO/goa/proteomes and downloaded 58.R_meliloti.goa.
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file:
* Time taken to export .gdb: started around 10am
* Upload your file and link to it here.

==Important Files 4==

'''From Week 14, December 3-5, 2013'''

NOTE: Here is the other import/export, using a new database on a different Keck Lab computer

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: SmelilotiGenMAPP_Builder_2.0b73.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: S meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 6.25 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.21 minutes
* Time taken to process: 4.50 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Received an error message, so changed the beginning of the URL from "ftp" to "http".
**After entering the new URL, was taken to Index of /pub/databases/GO/goa.
**Clicked on the "proteomes" folder.
**Was directed to Index of /pub/databases/GO/goa/proteomes and downloaded 58.R_meliloti.goa.
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]]
* Time taken to export .gdb: Started at 10:28am. Finished at 1:36pm


GEX File
*[[Media:2013125-Complete_processed_Data_MPM.gex]]

==Wiki Navigation==
[[Template:Team Name]]
{{Team Name}}
*[[Teamname Week 13 Status Report]]
*[[Laurmagee: Week 13]]
*[[Teamname Week 15 Status Report]]
*[[Laurmagee: Week 15]]
*[[Electronic notebook: sinorhizobium meliloti|Miles Malefyt electronic notebook]]

File:MAPPFinder Capture1.PNG

2013-12-13T01:52:36Z

Laurmagee:

Laurmagee: Week 15

2013-12-13T01:52:20Z

Laurmagee: /* GenMAPP and MAPPFinder Protocols */

==Statistical Analysis==
*Open the following spreadsheet [[File:Compiled Ratios and Logs.xls]] from [[Laurmagee: Week 13]].
*Begin a new workbook and paste the Gene ID column starting at cell A1. In the subsequent columns, paste the log values from your previous sheet. Arrange the time points in increasing order, and within each time point arrange the replicates in increasing order as well. After pasting, the column titles across the top of the worksheet will read "Gene ID", "log_700S1-t15", "log_700S2-t15", "log_700S3-t15", and so on for t30, t60, and t240.
*Begin scaling and centering the data by first inserting a new worksheet in Excel labeled "scaled_centered".
*Select and copy all of the data from your original worksheet. Then paste it into cell A1 in new worksheet.
*Insert two rows in between the top row of headers and the first data row. In cell A2, type "Average" and in cell A3, type "StdDev".
*You will now compute the average log ratio for each replicate and time period.
*In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and press "Enter".
*Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
*You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
and press "Enter".
*Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
*You have now computed the average and standard deviation of the log ratios for each replicate and time period.
*Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank columns of cells. Edit the names of the columns so that they now read: log_700S1-t15_scaled_centered, log_700S2-t15_scaled_centered, etc.
*In cell N4, type the following equation:
=(B4-B$2)/B$3
*In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column.
*Copy and paste this equation into the entire column.
*Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.
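The scaling and centering steps above can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original protocol: the function name <code>scale_and_center</code> and the example values are hypothetical. It uses the sample standard deviation, which is what Excel's STDEV computes.

```python
from statistics import mean, stdev

def scale_and_center(column):
    """Subtract the column average and divide by the column standard deviation.

    Mirrors the spreadsheet formula =(B4-B$2)/B$3 applied down one column.
    """
    avg = mean(column)
    sd = stdev(column)  # sample standard deviation, like Excel's STDEV
    return [(x - avg) / sd for x in column]

# Hypothetical log ratios for one chip (not taken from the dataset).
example_column = [1.0, 2.0, 3.0]
scaled = scale_and_center(example_column)
print(scaled)
```

After scaling and centering, each column has an average of 0 and a standard deviation of 1, which makes the chips comparable to one another.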

*Insert a new worksheet and name it "statistics".
*Go back to the "scaled_centered" worksheet and copy the first column ("ID").
*Paste the data into the first column of your new "statistics" worksheet.
*Go back to the "scaled_centered" worksheet and copy the columns that are designated "_scaled_centered".
*Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
*Go to a new column on the right of your worksheet. Type the headers "Avg_LogFC_t15", "Avg_LogFC_t30", "Avg_LogFC_t60", and "Avg_LogFC_t240" into the top cells of the next four columns.
*Compute the average log fold change for the replicates for each gene by typing the equation:
=AVERAGE(B2:D2)
into cell N2. Copy this equation and paste it into the rest of the column.
*Create the equation for times t30, t60, and t240 and paste it into their respective columns.
*Label the next four columns "Tstat_t15", "Tstat_t30", "Tstat_t60", and "Tstat_t240". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different from 0 (no change). Enter the equation:
=N2/(STDEV(B2:D2)/SQRT(3))
*(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column as well as the next three columns, making sure to change the cells involved in the equation accordingly.
*Label the top cell in the next four columns "Pvalue_t15", "Pvalue_t30", "Pvalue_t60", and "Pvalue_t240". In the cell below the label, enter the equation:
=TDIST(ABS(R2),2, 2)
*The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column and the next three columns making sure to change the cell involved to the appropriate Tstat value.
*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
*Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
*Select Columns R through Z. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
*Select Columns N through Z and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
*Delete Rows 2 and 3 where it says "Average" and "StdDev" so that your data rows with gene IDs are immediately below the header row 1.
*Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP.
*[[File:SinorhizobiumMeliloti LM GenMapp DataSheet.xls]]
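As a cross-check on the Tstat and Pvalue columns, the two spreadsheet formulas can be reproduced in a few lines of Python. This is a sketch under stated assumptions rather than the method used for this page: the function names and example values are hypothetical, and the p value uses the closed form of the Student's t distribution that holds only for 2 degrees of freedom (three replicates), corresponding to TDIST(ABS(t),2,2).

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(replicates):
    """Average divided by the standard error, as in =N2/(STDEV(B2:D2)/SQRT(3))."""
    n = len(replicates)
    return mean(replicates) / (stdev(replicates) / sqrt(n))

def two_tailed_p_df2(t):
    """Two-tailed p value for a t statistic with exactly 2 degrees of freedom.

    For 2 degrees of freedom the t distribution has the closed-form CDF
    F(t) = 1/2 + t / (2 * sqrt(2 + t^2)), so 2 * (1 - F(|t|)) simplifies to
    the expression below.
    """
    return 1.0 - abs(t) / sqrt(2.0 + t * t)

# Hypothetical scaled and centered log ratios for one gene at one time point.
replicates = [1.0, 2.0, 3.0]
t = t_statistic(replicates)
p = two_tailed_p_df2(t)
print(round(t, 4), round(p, 4))
```

Note that a gene whose three replicates are identical has a standard deviation of 0, so the T statistic is undefined (Excel reports #DIV/0! and this sketch raises ZeroDivisionError).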

==Sanity Check==
#How many genes have p value < 0.05 for the time set of 15 minutes?
#*3613 genes
#How many genes have p value < 0.05 for the time set of 30 minutes?
#*5225 genes
#How many genes have p value < 0.05 for the time set of 60 minutes?
#*5207 genes
#How many genes have p value < 0.05 for the time set of 240 minutes?
#*6790 genes
#How many genes have p value < 0.01 for the time set of 15 minutes?
#*907 genes
#How many genes have p value < 0.01 for the time set of 30 minutes?
#*1518 genes
#How many genes have p value < 0.01 for the time set of 60 minutes?
#*1553 genes
#How many genes have p value < 0.01 for the time set of 240 minutes?
#*2437 genes
#How many genes have p value < 0.001 for the time set of 15 minutes?
#*92 genes
#How many genes have p value < 0.001 for the time set of 30 minutes?
#*179 genes
#How many genes have p value < 0.001 for the time set of 60 minutes?
#*172 genes
#How many genes have p value < 0.001 for the time set of 240 minutes?
#*347 genes
#How many genes have p value < 0.0001 for the time set of 15 minutes?
#*7 genes
#How many genes have p value < 0.0001 for the time set of 30 minutes?
#*15 genes
#How many genes have p value < 0.0001 for the time set of 60 minutes?
#*13 genes
#How many genes have p value < 0.0001 for the time set of 240 minutes?
#*36 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 15 minutes?
#*1521 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 30 minutes?
#*1926 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 60 minutes?
#*2194 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 240 minutes?
#*2846 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 15 minutes?
#*2092 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 30 minutes?
#*3299 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 60 minutes?
#*3013 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 240 minutes?
#*3944 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 15 minutes?
#*1476 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 30 minutes?
#*1890 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 60 minutes?
#*2129 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 240 minutes?
#*2763 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 15 minutes?
#*2052 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 30 minutes?
#*3256 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 60 minutes?
#*2942 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 240 minutes?
#*3866 genes
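The counts above were obtained with Excel filters. The same kind of tally can be sketched in Python as follows; the function name <code>count_genes</code>, its parameters, and the sample rows are hypothetical and are not taken from the dataset. The column names follow the "Pvalue_" and "Avg_LogFC_" headers used in the "statistics" worksheet.

```python
def count_genes(rows, time, p_cutoff, fc_above=None, fc_below=None):
    """Count genes with p value below p_cutoff at the given time point.

    Optionally also require the average log fold change to be strictly
    greater than fc_above and/or strictly less than fc_below.
    """
    count = 0
    for row in rows:
        p = row["Pvalue_" + time]
        fc = row["Avg_LogFC_" + time]
        if p >= p_cutoff:
            continue
        if fc_above is not None and not fc > fc_above:
            continue
        if fc_below is not None and not fc < fc_below:
            continue
        count += 1
    return count

# Hypothetical rows standing in for the worksheet (one dict per gene).
sample_rows = [
    {"Pvalue_t15": 0.01, "Avg_LogFC_t15": 0.80},
    {"Pvalue_t15": 0.03, "Avg_LogFC_t15": -0.10},
    {"Pvalue_t15": 0.20, "Avg_LogFC_t15": 1.50},
    {"Pvalue_t15": 0.04, "Avg_LogFC_t15": -0.60},
]

significant = count_genes(sample_rows, "t15", 0.05)
increased = count_genes(sample_rows, "t15", 0.05, fc_above=0)
decreased = count_genes(sample_rows, "t15", 0.05, fc_below=0)
print(significant, increased, decreased)
```

A useful consistency check is that the increased and decreased counts at p < 0.05 add up to the total at p < 0.05, as they do for each time point listed above (for example, 1521 + 2092 = 3613 at 15 minutes).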

==GenMAPP and MAPPFinder Protocols==
*To begin GenMAPP analysis, first launch GenMAPP 2. If it is not installed, download it from the following website: http://genmapp.org.
*Look at the lower-left hand corner to see what gene database is loaded. For this assignment, the gene database [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]] should appear in the corner.
*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
*Once the correct database is loaded, go to Data > Expression Dataset Manager. This will allow you to input the data file created in the "Statistical Analysis" portion of this page.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. All of the boxes should remain unchecked, because none of the columns in our dataset contain non-numerical values.
*Give the Expression Dataset Manager time to convert your data into a GEX file.
*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but rather recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
*Open the exception file in Excel and filter the data to note what errors have been recorded.
*Using the .gdb Gene Database created by my partners, there were 5,538 errors, each of which was "Gene not found in OrderedLocusNames or any related system."
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
*In the "Color Sets" section, type in your own title into the "Name" field.
*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_t15" in the drop down menu in the "Gene Value" field.
*We are using the t15 time period for this step to represent the results from all four time intervals, because it would be too challenging to complete this protocol with all four time interval values.
*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue_t15] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_t15] < -0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the "<" operator. Then, type -0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter the name "Decreased" for the criterion, as we are looking for genes whose Avg_LogFC has decreased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color red was chosen.
*You may now click the "Add" button.
*Now we will add another criterion. In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue_t15] < 0.05) increase in the average log fold change (i.e. [Avg_LogFC_t15] > 0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the ">" operator. Then, type 0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter the name "Increased" for the criterion, as we are looking for genes whose Avg_LogFC has increased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color green was chosen.
*You may now click the "Add" button.
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset Manager to view the Color Sets on a MAPP.
*[[Media:ColorSets.mapp]]
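The logic of the two Color Set criteria built above can be summarized in a short Python sketch. This only illustrates the decision rule; it is not how GenMAPP evaluates criteria internally, and the function name <code>classify</code> and its parameters are hypothetical.

```python
def classify(avg_logfc, pvalue, fc_cutoff=0.25, p_cutoff=0.05):
    """Return the legend label a gene would receive, or None if neither applies.

    "Decreased": average log fold change < -fc_cutoff AND p value < p_cutoff
    "Increased": average log fold change >  fc_cutoff AND p value < p_cutoff
    """
    if pvalue < p_cutoff and avg_logfc < -fc_cutoff:
        return "Decreased"
    if pvalue < p_cutoff and avg_logfc > fc_cutoff:
        return "Increased"
    return None

# Hypothetical values for three genes at t15.
print(classify(-0.60, 0.01))  # meets the "Decreased" criterion
print(classify(0.90, 0.03))   # meets the "Increased" criterion
print(classify(0.90, 0.20))   # large change, but not significant
```

Because the two criteria cannot both be true for the same gene, the order in which they are listed does not change which color a gene receives.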

*Moving on to the MAPPFinder protocol, stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Decreased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
*Save the file under a different name by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
*Now look at the "Increased" data. Stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Increased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Then "Browse" through your computer and create a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[Media:MAPPFinder Capture1.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
*Save the file under a different name by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
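The four Excel filters applied to the MAPPFinder results can be written as a single test. The Python sketch below is only an illustration: the dictionary keys are assumed to match the column headers named in the filter list, and the sample terms and their values are hypothetical rather than taken from the results file.

```python
def passes_filters(term):
    """Apply the four filters listed above to one row of MAPPFinder results."""
    return (
        term["Z Score"] > 2
        and term["PermuteP"] < 0.05
        and 5 <= term["Number Changed"] < 100
        and term["Percent Changed"] >= 25
    )

# Hypothetical result rows (names and values invented for illustration).
sample_terms = [
    {"GO Name": "term A", "Z Score": 3.1, "PermuteP": 0.01,
     "Number Changed": 12, "Percent Changed": 40.0},
    {"GO Name": "term B", "Z Score": 2.5, "PermuteP": 0.20,
     "Number Changed": 30, "Percent Changed": 35.0},
    {"GO Name": "term C", "Z Score": 4.0, "PermuteP": 0.00,
     "Number Changed": 150, "Percent Changed": 60.0},
]

kept = [term["GO Name"] for term in sample_terms if passes_filters(term)]
print(kept)
```

The upper bound on Number Changed removes very broad GO terms, while the lower bound removes terms supported by only a handful of genes.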

File:MAPPFinder.PNG

2013-12-13T01:51:31Z

Laurmagee:

Laurmagee: Week 15

2013-12-13T01:51:05Z

Laurmagee: /* GenMAPP and MAPPFinder Protocols */

==Statistical Analysis==
*Open the following spreadsheet [[File:Compiled Ratios and Logs.xls]] from [[Laurmagee: Week 13]].
*Begin a new workbook and copy over the Gene ID column into cell A1. Then in the subsequent columns, copy over the log values found in your previous sheet. Order the time periods in increasing order and sort the replicates at each time point in increasing order as well. After this pasting has been done, your column titles across the top of the worksheet will say "Gene ID", "log_700S1-t15", "log_700S2-t15", "log_700S3-t15"... and so on for t30, t60, and t240.
*Begin scaling and centering the data by first inserting a new worksheet in Excel labeled "scaled_centered".
*Select and copy all of the data from your original worksheet. Then paste it into cell A1 in new worksheet.
*Insert two rows in between the top row of headers and the first data row. In cell A2, type "Average" and in cell A3, type "StdDev".
*You will now compute the average log ratio for each replicate and time period.
*In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and press "Enter".
*Excel is computing the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, you can click on the beginning cell, scroll down to the bottom of the worksheet, and shift-click on the ending cell.
*You will now compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation:
=STDEV(B4:B5224)
and press "Enter".
*Excel will now do some work for you. Copy these two equations (cells B2 and B3) and paste them into the empty cells in the rest of the columns. Excel will automatically change the equation to match the cell designations for those columns.
*You have now computed the average and standard deviation of the log ratios for each replicate and time period.
*Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank columns of cells. Edit the names of the columns so that they now read: log_700S1-t15_scaled_centered, log_700S2-t15_scaled_centered, etc.
*In cell N4, type the following equation:
=(B4-B$2)/B$3
*In this case, we want the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). We use the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though we will paste it for the entire column.
*Copy and paste this equation into the entire column.
*Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating.

*Insert a new worksheet and name it "statistics".
*Go back to the "scaled_centered" worksheet and copy the first column ("ID").
*Paste the data into the first column of your new "statistics" worksheet.
*Go back to the "scaled_centered" worksheet and copy the columns that are designated "_scaled_centered".
*Go to your new worksheet and click on the B1 cell. Select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK. This will paste the numerical result into your new worksheet instead of the equation which must make calculations on the fly.
*Go to a new column on the right of your worksheet. Type the headers "Avg_LogFC_t15", "Avg_LogFC_t30", "Avg_LogFC_t60", and "Avg_LogFC_t240" into the top cells of the next four columns.
*Compute the average log fold change for the replicates for each gene by typing the equation:
=AVERAGE(B2:D2)
into cell N2. Copy this equation and paste it into the rest of the column.
*Create the equation for times t30, t60, and t240 and paste it into their respective columns.
*Label the next four columns "Tstat_t15", "Tstat_t30", "Tstat_t60", and "Tstat_t240". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different from 0 (no change). Enter the equation:
=N2/(STDEV(B2:D2)/SQRT(3))
*(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column as well as the next three columns, making sure to change the cells involved in the equation accordingly.
*Label the top cell in the next four columns "Pvalue_t15", "Pvalue_t30", "Pvalue_t60", and "Pvalue_t240". In the cell below the label, enter the equation:
=TDIST(ABS(R2),2, 2)
*The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column and the next three columns making sure to change the cell involved to the appropriate Tstat value.
*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
*Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
*Select Columns R through Z. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
*Select Columns N through Z and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
*Delete Rows 2 and 3 where it says "Average" and "StdDev" so that your data rows with gene IDs are immediately below the header row 1.
*Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP.
*[[File:SinorhizobiumMeliloti LM GenMapp DataSheet.xls]]

==Sanity Check==
#How many genes have p value < 0.05 for the time set of 15 minutes?
#*3613 genes
#How many genes have p value < 0.05 for the time set of 30 minutes?
#*5225 genes
#How many genes have p value < 0.05 for the time set of 60 minutes?
#*5207 genes
#How many genes have p value < 0.05 for the time set of 240 minutes?
#*6790 genes
#How many genes have p value < 0.01 for the time set of 15 minutes?
#*907 genes
#How many genes have p value < 0.01 for the time set of 30 minutes?
#*1518 genes
#How many genes have p value < 0.01 for the time set of 60 minutes?
#*1553 genes
#How many genes have p value < 0.01 for the time set of 240 minutes?
#*2437 genes
#How many genes have p value < 0.001 for the time set of 15 minutes?
#*92 genes
#How many genes have p value < 0.001 for the time set of 30 minutes?
#*179 genes
#How many genes have p value < 0.001 for the time set of 60 minutes?
#*172 genes
#How many genes have p value < 0.001 for the time set of 240 minutes?
#*347 genes
#How many genes have p value < 0.0001 for the time set of 15 minutes?
#*7 genes
#How many genes have p value < 0.0001 for the time set of 30 minutes?
#*15 genes
#How many genes have p value < 0.0001 for the time set of 60 minutes?
#*13 genes
#How many genes have p value < 0.0001 for the time set of 240 minutes?
#*36 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 15 minutes?
#*1521 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 30 minutes?
#*1926 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 60 minutes?
#*2194 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there for the time set of 240 minutes?
#*2846 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 15 minutes?
#*2092 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 30 minutes?
#*3299 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 60 minutes?
#*3013 genes
#Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there for the time set of 240 minutes?
#*3944 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 15 minutes?
#*1476 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 30 minutes?
#*1890 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 60 minutes?
#*2129 genes
#What about an average log fold change of > 0.25 and p < 0.05 for the time set of 240 minutes?
#*2763 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 15 minutes?
#*2052 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 30 minutes?
#*3256 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 60 minutes?
#*2942 genes
#Or an average log fold change of < -0.25 and p < 0.05 for the time set of 240 minutes?
#*3866 genes

==GenMAPP and MAPPFinder Protocols==
*To begin GenMAPP analysis, first launch GenMAPP 2. If it is not installed, download it from the following website: http://genmapp.org.
*Look at the lower-left hand corner to see what gene database is loaded. For this assignment, the gene database [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]] should appear in the corner.
*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
*Once the correct database is loaded, go to Data > Expression Dataset Manager. This will allow you to input the data file created in the "Statistical Analysis" portion of this page.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. All of the boxes should remain unchecked, because none of the columns in our dataset contain non-numerical values.
*Give the Expression Dataset Manager time to convert your data into a GEX file.
*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but rather recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
*Open the exception file in Excel and filter the data to note what errors have been recorded.
*Using the .gdb Gene Database created by my partners, there were 5,538 errors, each of which was "Gene not found in OrderedLocusNames or any related system."
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
*In the "Color Sets" section, type in your own title into the "Name" field.
*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_t15" in the drop down menu in the "Gene Value" field.
*We are using the t15 time period for this step to represent the results from all four time intervals, because it would be too challenging to complete this protocol with all four time interval values.
*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue_t15] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_t15] < -0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the "<" operator. Then, type -0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter the name "Decreased" for the criterion, as we are looking for genes whose Avg_LogFC has decreased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color red was chosen.
*You may now click the "Add" button.
*Now we will add another criterion. In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue_t15] < 0.05) increase in the average log fold change (i.e. [Avg_LogFC_t15] > 0.25).
*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_t15", which will then appear in the "Criterion" field.
*Under "Ops", click on the ">" operator. Then, type 0.25.
*Type the word "AND" in this same field, then select "Pvalue_t15" under "Columns".
*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
*In the "Label in Legend" field, enter the name "Increased" for the criterion, as we are looking for genes whose Avg_LogFC has increased.
*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK. In this experiment, the color green was chosen.
*You may now click the "Add" button.
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset Manager to view the Color Sets on a MAPP.
*[[Media:ColorSets.mapp]]

*Moving onto the MAPPFinder Protocol, we will stay in GenMAPP, but selected Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Decreased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Then click "Browse", choose a location on your computer, and enter a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**Permute P less than 0.05
**Number Changed greater than or equal to 5 and less than 100.
**Percent Changed greater than or equal to 25
*Save the file under a different name as an Excel spreadsheet by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
*Now look at the "Increased" data. We will stay in GenMAPP and select Tools > MAPPFinder.
*Click on the button "Calculate New Results" and then choose the "Find File" button on the page to load your GEX file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*Choose the Color Set and Criteria with which to filter the data. Click on the "Increased" criterion in the right-hand box, then check the two boxes labeled "Gene Ontology" and "p value".
*Then click "Browse", choose a location on your computer, and enter a meaningful filename for the project.
*Now you can hit "Run MAPPFinder".
*It will take a while for this process to finish, but a Gene Ontology browser will open showing your results when it has been completed.
*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
*The top twenty GO terms are listed in the following image: [[File:MAPPFinder Capture1.PNG]]
*In Windows, make a copy of the results file and open it in Excel.
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
**Z Score greater than 2
**PermuteP less than 0.05
**Number Changed greater than or equal to 5 and less than 100
**Percent Changed greater than or equal to 25
*Save the file under a different name as an Excel spreadsheet by selecting File > Save As and choosing Excel workbook (.xls) from the drop-down menu.
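
The four Excel filters used for both the "Decreased" and "Increased" results can also be expressed as a single test per row. The following is a minimal sketch, assuming the results have been reduced to a tab-delimited table whose column headers match the names used in the filter list; the header spellings, the <code>GOName</code> column, and the sample rows are assumptions made for illustration, not actual MAPPFinder output.

```python
import csv
import io

def passes_filters(row):
    """Apply the four filters from the protocol to one row of results."""
    return (float(row["Z Score"]) > 2
            and float(row["PermuteP"]) < 0.05
            and 5 <= int(row["Number Changed"]) < 100
            and float(row["Percent Changed"]) >= 25)

# Made-up rows standing in for a results table (not real GO terms or values)
sample = (
    "GOName\tNumber Changed\tPercent Changed\tZ Score\tPermuteP\n"
    "term_A\t12\t40.0\t3.1\t0.01\n"
    "term_B\t3\t60.0\t4.0\t0.001\n"
    "term_C\t20\t30.0\t1.5\t0.2\n"
    "term_D\t150\t50.0\t5.0\t0.0\n"
    "term_E\t5\t25.0\t2.5\t0.049\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
kept = [r["GOName"] for r in rows if passes_filters(r)]
print(kept)
```

Here term_B fails on Number Changed, term_C fails on Z Score and PermuteP, and term_D fails the upper bound of 100, so only two rows survive.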

Laurmagee: Individual Assessment

2013-12-13T01:12:37Z

Laurmagee: /* Reflection on the Process */

==Statement of Work==
*Describe exactly what you did on the project.
*The first step in my role as GenMAPP user was to find an article that contained microarray data on the rhizobacterium ''Sinorhizobium meliloti''. I submitted a few articles for consideration, but ultimately my partner, Miles, found the article we have been using throughout this project. The article can be found on the following page: [http://jb.asm.org/content/188/21/7617 HTML version] and is referenced below.
*Domínguez-Ferreras, A., Pérez-Arnedo, R., Becker, A., Olivares, J., Soto, M.J., Sanjuán, J. (2006) Transcriptome Profiling Reveals the Importance of Plasmid pSymB for Osmoadaptation of Sinorhizobium meliloti ''Journal of Bacteriology'' 188:7617-7625
*From this article, we were able to procure the raw microarray data from the following website: [http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Osmotic upshift elicited by salt and sucrose].
*The article that we were studying carried out four different experiments, with the level of NaCl or sucrose as the manipulated variable. I personally studied the experiment done with 700 mM sucrose. Raw Data File for 700S (1-3): [[File:Full Raw Data.xls]]
*The columns of interest in the above data file were collected, and scaling and centering were done to produce the log values needed for statistical analysis. Further description can be found on the following page: [[Laurmagee: Week 13]]. The following data file was produced: [[Media:Compiled Ratios and Logs.xls]].
*With this new file, statistical analysis could be done on the log values. First, the Avg_LogFC values were calculated by averaging the log values of the three replicates present at each of the four time points (t15, t30, t60, t240). Therefore, I had to calculate four of these Avg_LogFC values, one for each time point. From these averages, I was able to calculate the T_stat and P_value for each time point as well. This process is outlined in the "Statistical Analysis" section of the following journal: [[Laurmagee: Week 15]]. This spreadsheet was formatted specifically for GenMAPP standards and the following were produced: [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.xls]] and [[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]. The text file is to be fed into GenMAPP.
*A Sanity Check was done and highlighted on the following page: [[Laurmagee: Week 15]], and it shows the number of microarray dots that were changed significantly in the process of the experiment.
*After this check gave sensible results, I moved on to load my data file into GenMAPP. However, this is where I hit a snag. My file would not load into GenMAPP, and the whole program would stop responding immediately after I input my text file.
*After trying different computers, walking through every aspect of my data file with my partners, and troubleshooting different alternatives, I finally emailed Dr. Dondi, who found the main problem with my dataset. The gene IDs on my sheet did not follow the same format as those in the Gene Database that was created by my partners. The corresponding IDs were present, but they had extraneous information attached to them, which prevented GenMAPP from recognizing them. The number of errors the file was generating was so large that the program stopped responding entirely, which is why I wasn't getting an error message.
*However, even upon testing the modified data sheet, I found that it was giving me another error message. This time it was telling me my column titles were insufficient. After some research, I found that my text file was not in tab delimited format, despite the fact that I had saved it as such on my MacBook Pro computer. After transferring my previous .xls workbook onto a Windows computer, and saving it as a tab delimited text file, I finally got GenMAPP to accept the following data file: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.txt]].
*The GenMAPP and MAPPFinder protocols can be viewed at the bottom of the following journal page: [[Laurmagee: Week 15]].
*The exception file created with the GenMAPP program is included here: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.EX.txt]], along with the following other three GenMAPP program exports: [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]], [[Media:ColorSets.mapp]], [[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gmf]], which are all detailed on the journal page noted above.
*In addition, the MAPPFinder documents are the following: [[Media:700S1-3-t15-Decreased-Criterion0-GO.txt]], [[Media:700S1-3-t15-Increased-Criterion0-GO.txt]]
*Beyond my assigned tasks, I helped my group out in whatever areas were necessary. It was a challenge to keep up with the tasks of my partners, but I was always accessible to offer input or advice.
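
Both loading problems described above (gene IDs that did not match the Gene Database, and a text file that was not truly tab-delimited) could in principle be caught before opening GenMAPP. The following Python sketch shows one way to do so; it is not part of the protocol I followed, the function names are hypothetical, and the gene IDs and values in the sample text are invented placeholders.

```python
def is_tab_delimited(text):
    """True if every non-empty line splits into the same number (>1) of tab-separated fields."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counts = {len(ln.split("\t")) for ln in lines}
    return len(counts) == 1 and next(iter(counts)) > 1

def unmatched_ids(text, database_ids):
    """Return first-column IDs (below the header row) that are absent from database_ids."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    first_column = [ln.split("\t")[0] for ln in lines[1:]]
    return [gene_id for gene_id in first_column if gene_id not in database_ids]

# Placeholder data: the IDs are invented and do not follow any real naming scheme
database_ids = {"GENE0001", "GENE0002"}
good = "Gene ID\tAvg_LogFC_t15\tPvalue_t15\nGENE0001\t0.5\t0.01\nGENE0002\t-0.3\t0.04\n"
bad_ids = good.replace("GENE0001", "GENE0001_extra")
space_separated = good.replace("\t", " ")

print(is_tab_delimited(good), is_tab_delimited(space_separated))
print(unmatched_ids(bad_ids, database_ids))
```

An empty list from <code>unmatched_ids</code> together with a <code>True</code> from <code>is_tab_delimited</code> would suggest the file is at least structurally ready to be loaded.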

==Assessment of Project==
*Give an objective assessment of the success of your project workflow and teamwork.
*#What worked and what didn't work?
*#*I think my group and I should have stressed collaboration more on this project. Since we were each given an independent job to handle, I think it was difficult for us to establish a "team" environment. All four of us were busy outside of class with conflicting schedules, so it was very challenging to find a time when we could all meet and discuss where we were in our project. I think having each other as checks and balances would have been very helpful throughout the project, especially since I made it all the way to the GenMAPP stage of the project without anyone realizing I was using different gene IDs than those contained in the database.
*#What would you do differently if you could do it all over again?
*#*If I could do the project all over again, I would spend much less time scaling, centering, and performing statistical analysis on the data, and more time completing my GenMAPP and MAPPFinder analysis to produce more conclusive results. I would also set aside time from the beginning of the project for Miles and me to meet outside of class, because we had the exact same protocol yet we failed to use each other as a resource.
*Evaluate the Gene Database Project and Group Report in the following areas:
*#Content: What is the quality of the work?
*#*Our quality of work thus far has admittedly been lackluster, due to extenuating circumstances that occurred only days before our project assignments were due. However, I think our final deliverable and our final report will reflect how much we can improve our quality of work under the right circumstances.
*#Organization: Comment on the organization of the project and of your group's wiki pages.
*#*[[Team Name]]: the wiki page is a bit overcrowded, but for the most part it is organized. All of our contacts are contained in the first section, and all the files related to the microarray paper are contained in the next section. The Coder and QA personnel in my group then organized different sections based on the times when the files had been exported. We believe that our PowerPoint presentation could have been a lot more organized, but I think we were able to fix those flaws in our written report. A common theme with our group, I think, has been not working as a cohesive unit, so our products came out disorganized and lacking fluidity. I think we remedied this in our final report, but prior assignments may have reflected it.
*#Completeness: Did your team achieve all of the project objectives? Why or why not?
*#*There were some issues with time constraints, mainly with my portion of the project. I ran into a lot of GenMAPP issues during the end of our time with the project and this set me back many days. As far as final products go, however, I believe that we have completed the necessary items to the best of our ability, which is all I can ask of myself and my group.

==Reflection on the Process==
*What did you learn?
*#With your head (biological or computer science principles)
*#*I learned a lot about computers in this course. I came in with a fair amount of knowledge of biology, but very little knowledge of computers in general. I have now been exposed to coding, which was completely new to me, and to data analysis programs such as GenMAPP, which were otherwise foreign to me. Biologically, I learned what it was like to follow a biology-based project all the way from data collection to conclusion. I do research on statistics education with Dr. Bargagliotti in the math department, so I already knew the process of such projects, but I had never had an opportunity to follow it through from a biological perspective.
*#With your heart (personal qualities and teamwork qualities that make things work or not work)?
*#*I learned how important working with a team is and how your cohesiveness as a unit can make or break a project. Everyone has busy schedules, but it is important to make the time to meet for collaboration; otherwise things like presentations seem choppy and you are much more prone to making errors. Although this project did assign personal responsibilities, that should not detract from the overall idea that you are a team and your success depends on one another.
*#With your hands (technical skills)?
*#*As I said previously, I learned a lot of computer skills. Whether it was inputting code, generating a .mapp in GenMAPP, or creating data calculation shortcuts in Excel, these were all new directions for my hands. I had also never had a class taught primarily on the computer, so this was altogether a new experience for me. In this project specifically, I learned how important it is to look over every single detail of a spreadsheet before you input it into a data processing program such as GenMAPP. After struggling with the GenMAPP program for hours, I will never again make the mistake of not checking the IDs in my file against those in the gene database.
*What lesson will you take away from this project that you will still use a year from now?
**I think all of the components that I listed above will stay with me even a year from now. I will definitely remember how to prepare a file for GenMAPP and complete an analysis on that file, but more generally, I will remember to look for the details, especially in research, that have tripped me up in this project and in this class as a whole. I will also take away the importance of teamwork, especially on such a multifaceted project as this one, and the excitement that can come out of your own research by creating a product that is completely your own.

Laurmagee: Individual Assessment

2013-12-13T00:51:46Z

Laurmagee: /* Assessment of Project */


Laurmagee: Individual Assessment

2013-12-13T00:21:47Z

Laurmagee: /* Assessment of Project */

2013-12-12T19:21:38Z

Laurmagee: /* Micro Array paper */

Species: ''Sinorhizobium meliloti'' (Strain 1021)
==Group Project==
[[media:Sinorhizobium_Meliloti_group_project.pdf|Group Project]]

==Personnel==
===Stephen Louie===
Project Manager, Quality Assurance
:slouie4 at lion.lmu.edu
:1 LMU Drive MSB 5194
:Los Angeles, CA 90045
:[[Stephen Louie Project Notebook|Stephen Louie Project Notebook]]

===Lauren Magee===
GenMAPP Expert
:lmagee1 at lion.lmu.edu
:1 LMU Drive MSB-5258,
:Los Angeles, CA 90045
===Mitchell Petredis===
Coding Supervisor
:mrpetredis at gmail dot com
:mpetredi at lion dot lmu dot edu
:Loyola Marymount University
:1 LMU Drive MSB-5957
:Los Angeles, CA 90045-2659
===Miles Malefyt===
GenMAPP Coordinator
:milesm@malefyt.com
:mmalefyt@lion.lmu.edu
:8416 Campion drive
:Westchester, CA 90045
cell: 831-236-5402

[[User:Mmalefyt|Mmalefyt]] ([[User talk:Mmalefyt|talk]]) 10:31, 31 October 2013 (PDT)

==Micro Array paper==

[http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-785/?keywords=&organism=Sinorhizobium%20meliloti&array=&exptype Osmotic upshift elicited by salt and sucrose]
*[http://jb.asm.org/content/188/21/7617 HTML version]
*[http://jb.asm.org/content/188/21/7617.full.pdf+html PDF version]
*[[Media:A-MEXP-230.adf.txt]]
*[[Media:E-MEXP-785.eSet.r]]
*[[Media:E-MEXP-785.processed.1.zip]]
*[[Media:E-MEXP-785.raw.2.zip]]
*[[Media:E-MEXP-785.idf.txt]]
*[[Media:E-MEXP-785.raw.1.zip]]
*[[Media:E-MEXP-785.sdrf.txt]]
*Raw Data File for 700S (1-3): [[File:Full Raw Data.xls]]
*[[Media:Compiled Ratios and Logs.xls]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_DataSheet.xls]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.EX.txt]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gex]]
*[[Media:ColorSets.mapp]]
*[[Media:SinorhizobiumMeliloti_LM_GenMapp_FinalFile.gmf]]
*[[Media:700S1-3-t15-Decreased-Criterion0-GO.txt]]
*[[Media:700S1-3-t15-Increased-Criterion0-GO.txt]]
Domínguez-Ferreras, A., Pérez-Arnedo, R., Becker, A., Olivares, J., Soto, M.J., Sanjuán, J. (2006) Transcriptome Profiling Reveals the Importance of Plasmid pSymB for Osmoadaptation of Sinorhizobium meliloti ''Journal of Bacteriology'' 188:7617-7625

==Genome Paper==

[http://search.proquest.com/docview/213572450?accountid=7418 The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293, 668–672.]

Galibert, F., Finan, T.M., Long, S., Puhler, A., et al. (2001) The composite genome of the legume symbiont Sinorhizobium meliloti ''Science'' 293:668-672

[[Category: Sinorhizobium meliloti]]
[[Category: Group Projects]]

==Model Organism Database==
Link: http://cmr.jcvi.org/tigr-scripts/CMR/GenomePage.cgi?org=ntsm01

==Compiled raw Data==
[[Media:Team_Name_NaCl_compiled_raw_Data.xls|300 NaCl compiled data set]]
==Compiled processed data==
[[Media:Complete_processed_Data.xls|Processed Data]]

==Data ready for GenMAPP==
[[Media:Complete_processed_Data_MPM.xls|XLS Version]]

[[Media:Complete_processed_Data_MPM.txt|TXT version, USE THIS]]

==Important Files==

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: gmbuilder-2.0b71.zip | gmbuilder-2.0b71.zip]]

Computer on which export was run: Keck Lab Computer, back computer (furthest from the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import:19.17 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 17.81 minutes
* Time taken to process: 15.54 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.21 minutes
*Note about GOA file
**From Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Was given an error message. Changed url from "ftp" to "http" at beginning.
**After pressing Enter, was taken to Index of /pub/databases/GO/goa
**Clicked on "proteomes" folder
**Directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb | Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_2013117.gdb]]
* Time taken to export .gdb: Started at 2pm on 11-7-2013, finished at 9:47pm

Note:

replace geneID with ~ when you find MOD link

==Important Files 2==

Info based on export done on 11/21/2013

Taxon ID: 266834

Version of GenMAPP Builder: [[Media:GenMAPP_Builder_2.0b72 S. meliloti.zip|GenMAPP_Builder_2.0b72 S. meliloti.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 7.34 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.31 minutes
* Time taken to process: 4.60 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Was given an error message. Changed url from "ftp" to "http" at beginning.
**After pressing Enter, was taken to Index of /pub/databases/GO/goa
**Clicked on "proteomes" folder
**Directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb|Sinorhizobium_meliloti_1021_GenMAPP_database_mpetredi_20131121.gdb]]
* Time taken to export .gdb: Started at 10:18 AM on 11/21/2013; ended at 1:34 PM

==Important Files 3==

'''From Week 14, December 3-5, 2013'''

NOTE: Forgot that I cannot import data into the same database. I will continue to do an export based on what I have, and will do another import/export cycle on another computer using the same version of gmbuilder that I used here.

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: SmelilotiGenMAPP_Builder_2.0b73.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: Sinorhizobium meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 6.27 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.07 minutes
* Time taken to process: 12.42 minutes
**NOTE: gmbuilder told me that the GO OBO-XML file was already processed in the database, and wanted to know if I wanted to process the information again. I chose yes.

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Was given an error message. Changed url from "ftp" to "http" at beginning.
**After pressing Enter, was taken to Index of /pub/databases/GO/goa
**Clicked on "proteomes" folder
**Directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file:
* Time taken to export .gdb: started around 10am
* Upload your file and link to it here.

==Important Files 4==

'''From Week 14, December 3-5, 2013'''

NOTE: Here is the other import/export, using a new database on a different Keck Lab computer

Taxon ID: 266834

Version of GenMAPP Builder: [[Media: SmelilotiGenMAPP_Builder_2.0b73.zip]]

Computer on which export was run: Keck Lab Computer, front computer (closest to the whiteboard)

Postgres Database name: S meliloti

UniProt XML filename:[[Media:Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml|Sinorhizobium meliloti 1021 mpetredi 2013115 UniProt XML.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_10
* Time taken to import: 6.25 minutes

GO OBO-XML filename:[[Media:Go daily-termdb.obo-xml mpetredi 2013116.gz| Go daily-termdb.obo-xml mpetredi 2013116.gz]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): from 11/4/2013
* Time taken to import: 6.21 minutes
* Time taken to process: 4.50 minutes

GOA filename:[[Media:R meliloti.goa|R meliloti.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): from 11/7/2013
* Time taken to import: 0.07 minutes
*Note about GOA file
**From the Running GenMAPP Builder page, clicked on the UniProt-GOA Downloads link.
**Was given an error message. Changed the URL from "ftp" to "http" at the beginning.
**After entering the new URL, was taken to Index of /pub/databases/GO/goa
**Clicked on the "proteomes" folder.
**Directed to Index of /pub/databases/GO/goa/proteomes. Downloaded 58.R_meliloti.goa
***Note: R. meliloti is an alternative name for S. meliloti.

Name of .gdb file: [[Media:Sinorhizobium_meliloti_1021_mpetredi_2013123-2.gdb]]
* Time taken to export .gdb: Started at 10:28am. Finished at 1:36pm

* Upload your file and link to it here.
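
The export time recorded above is given as a start and finish clock time (10:28am to 1:36pm) rather than in minutes like the import steps. A minimal sketch for converting it, assuming both times fall on the same day; the function name <code>elapsed_minutes</code> is hypothetical:

```python
from datetime import datetime


def elapsed_minutes(start: str, finish: str) -> int:
    """Minutes between two same-day clock times written like '10:28am'."""
    fmt = "%I:%M%p"  # 12-hour clock, minutes, am/pm marker
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Start and finish times as recorded for the .gdb export.
export_minutes = elapsed_minutes("10:28am", "1:36pm")
print(export_minutes, "minutes")  # 188 minutes, i.e. 3 hours 8 minutes
```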

GEX File
*[[Media:2013125-Complete_processed_Data_MPM.gex]]

==Wiki Navigation==
[[Template:Team Name]]
{{Team Name}}
*[[Teamname Week 13 Status Report]]
*[[Laurmagee: Week 13]]
*[[Teamname Week 15 Status Report]]
*[[Laurmagee: Week 15]]
*[[Electronic notebook: sinorhizobium meliloti|Miles Malefyt electronic notebook]]

Laurmagee: Week 15

2013-12-12T07:39:08Z

Laurmagee: /* Statistical Analysis */

Laurmagee: Week 15

2013-12-12T07:38:02Z

Laurmagee:

2013-12-12T00:20:58Z

Laurmagee: /* Extra Wiki Links */

==Name==
===Lauren Magee===

[[File:LaurenMagee.png]]

[[Media:Lauren_Magee-_Resume.pdf]]

==Contact Information==
===Email===
lmagee1@lion.lmu.edu
===Mailing Address===
1 LMU Drive MSB-5258,
Los Angeles, CA 90045

==Education==
===August 2011 to Current Day===
Loyola Marymount University
*Major: Individualized Scientific Studies
*Minor: Psychology
*Expected Date of Graduation: May 2015
*Upper Division Courses
*#Biological Databases
*#Introduction to Probability and Statistics
*#Genetics Lab

==Research Experience==
===August 2011 to May 2012===
HHMI: Phage Discovery Lab
*A special research lab for freshmen involving the isolation and characterization of phages collected from the Loyola Marymount University campus.
*Mentors: Dr. Kulick, Dr. Urbinati, and Dr. Fang
*Undergraduate Research Symposium Presenter: ''2012''
*West Coast Biological Sciences Undergraduate Research Symposium Presenter: ''2012''

===January 2012 to Current Day===
Project-SET
*Project-SET is an NSF-funded project aimed at developing teacher-level materials to better facilitate student learning in statistics.
*Mentor: Dr. Bargagliotti
*Rains Research Grant Recipient: ''Spring 2012, Fall 2012, Spring 2013, Fall 2013''
*Undergraduate Research Symposium: ''2013''
*USCOTS (United States Conference On Teaching Statistics) Participant: ''2013''
[http://www.project-set.com Project-SET Website]

==Work Experience==
===August 2011 to Current Day===
Seaver College of Science and Engineering Dean's Office Assistant
*A source of communication between students, faculty, and staff. An aid in answering phones, organizing paperwork, and inputting online data.
*Employer: NaKesha Mayfield

==Community Service==
===Fall 2012 to Current===
MESA (''McCarthy Experience in Service and Action'') 2012-2013, Center for Service and Action House 2013-2014
*An intentional living learning community focused on serving the Los Angeles area.
Underwings Praxis Service Club
*A service club committed to being a positive presence in the Boyle Heights Community.
*Dolores Mission After School Program and Guadalupe Homeless Project are both Underwings service programs for children and homeless men in downtown LA, creating a community base where they feel supported and connected to their environment.
El Espejo
*A weekly mentoring program at Lennox Middle School in Inglewood, CA.
Gryphon Circle Service Organization
*A service organization focused on reforming secondary education to accommodate children from all types of demographics and with all types of learning styles.
*Blood Drive Representative for the Organization
*Social Justice Committee Member and Head

==Journal Entries==
# [[Class Journal Week 1]]
# [[Class Journal Week 2]]
# [[Class Journal Week 3]]
# [[Class Journal Week 4]]
# [[Class Journal Week 5]]
# [[Class Journal Week 6]]
# [[Class Journal Week 7]]
# [[Class Journal Week 8]]
# [[Class Journal Week 9]]

==Extra Wiki Links==
*[[Laurmagee: Week 2]]
*[[Laurmagee: Week 3]]
*[[Laurmagee: Week 4]]
*[[Laurmagee: Week 5]]
*[[Laurmagee: Week 6]]
*[[Laurmagee: Week 7]]
*[[Laurmagee: Week 8]]
*[[Laurmagee: Week 9]]
*[[Laurmagee: Week 10]]
*[[Laurmagee: Week 11]]
*[[Laurmagee: Week 12]]
*[[Laurmagee: Week 13]]
*[[Laurmagee: Week 15]]
*[[GeneDB]]
*[[Team Name]]
*[[Team Name Week 12]]
*[[Teamname Week 13 Status Report]]
*[[Teamname Week 15 Status Report]]
*[[Laurmagee: Individual Assessment]]
*Assignment Pages:
*#[[Week 1]]
*#[[Week 2]]
*#[[Week 3]]
*#[[Week 4]]
*#[[Week 5]]
*#[[Week 6]]
*#[[Week 7]]
*#[[Week 8]]
*#[[Week 9]]
*#[[Week 10]]
*#[[Week 11]]
*#[[Week 12]]
*#[[Week 13]]
*#[[Week 15]]
*[[Template:Laurmagee]]

{{template:Laurmagee}}

[[Category:Individual Homework]]
[[Category:User Page]]