LMU BioDB 2024 - User contributions [en]

Yeast Beasts Deliverables

2024-05-03T18:58:15Z

Msymond1: fixed report

# [[Yeast_Beasts_Deliverables|Organized Team deliverables wiki page with table of contents]]
# Group Report (''.docx'' or ''.pdf'' file) [[media:Yeast_Beasts_Deliverable.pdf|Report]]
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist) [[media:Symonds Ind. Assessment-Reflection.docx|Dean's reflection]]
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap Input Workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to Box with GRNmap Output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 Database Diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to the microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'', so that all manipulations of the data and files are documented and someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts Deliverables

2024-05-03T18:57:54Z

Msymond1: added report

# [[Yeast_Beasts_Deliverables|Organized Team deliverables wiki page with table of contents]]
# Group Report (''.docx'' or ''.pdf'' file) [[Yeast_Beasts_Deliverable.pdf|Report]]
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist) [[media:Symonds Ind. Assessment-Reflection.docx|Dean's reflection]]
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap Input Workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to Box with GRNmap Output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 Database Diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to the microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'', so that all manipulations of the data and files are documented and someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

File:Yeast Beasts Deliverable.pdf

2024-05-03T18:57:19Z

Msymond1:

Yeast Beasts Deliverables

2024-05-03T18:22:33Z

Msymond1: added my reflection

# [[Yeast_Beasts_Deliverables|Organized Team deliverables wiki page with table of contents]]
# Group Report (''.docx'' or ''.pdf'' file)
# Individual statements of work, assessments, reflections (wiki page, ''.docx'', ''.pdf'', or e-mailed to Dr. Dahlquist) [[media:Symonds Ind. Assessment-Reflection.docx|Dean's reflection]]
# Group PowerPoint presentation (given on Thursday, May 2, ''.pptx'' or ''.pdf'' file) [[Media:Yeast_Beasts_Presentation.pdf|Presentation]]
# Sample-data relationship table in Excel (''.xlsx'') [[Media:Sample_Data_Table.xlsx | Sample-data Relationship Table]]
# Excel spreadsheet with ANOVA results/stem formatting (''.xlsx'') [[Media:ANOVA_STEM.xlsx | ANOVA and STEM Spreadsheet]]
# PowerPoint of ANOVA table, screenshots of stem results (''.pptx''), screenshot of black and white GRNsight input network and colored GRNmap/GRNsight output networks [[Media:ANOVAslides.pdf | PPT of ANOVA, STEM, and Networks]]
# Gene List and GO List files from each significant profile (''.txt'' compressed together in a ''.zip'' file) [https://lmu.app.box.com/file/1512445828707 GO List Zipped Files] and [https://lmu.app.box.com/file/1512445314285 Gene List Zipped Files]
# YEASTRACT "rank by TF" results (''.xlsx'') [https://lmu.app.box.com/file/1513580824058 YEASTRACT Results]
# GRNmap input workbook (''.xlsx'') [[media:GRNmap_930PM.xlsx|GRNmap Input Workbook]]
# GRNmap output workbook (''.xlsx'') and output plots (''.jpg'') zipped together [https://lmu.box.com/s/rathfd4jhdqdjzuci4kdagmptq1jbxeb link to Box with GRNmap Output]
# MS Access database, including all tables (''.accdb'') [https://lmu.box.com/s/2mcetil8n7vxzwe100avt5yhmc7psuh5 Database]
# ReadMe for the database that describes the design of the database, references the sources of the data, and has a [https://www.quackit.com/microsoft_access/microsoft_access_2016/howto/how_to_create_a_database_diagram_in_access_2016.cfm database schema diagram] (''.docx'', ''.pdf'') [https://lmu.box.com/s/0j6y9p1mrb3dji8wuvn8zhfwrb6lpo97 Database Diagram]
# Query design for populating a GRNmap input workbook from the database (screenshot of MS Access; or SQL code, ''.txt'') [https://lmu.box.com/s/y6x97kqcjlxfezhnbpvck4h4oe4pm366 Query Designs]
# Electronic notebook corresponding to the microarray results files ([[Week 13]], [[Week 14]], and [[Week 15]]) to support ''reproducible research'', so that all manipulations of the data and files are documented and someone else could begin with your starting file, follow the protocol, and obtain your results.

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

File:Symonds Ind. Assessment-Reflection.docx

2024-05-03T18:21:29Z

Msymond1:

Yeast Beasts

2024-05-03T02:05:35Z

Msymond1: /* Dean's Reflection */ added space

* This page will be the main place from which the Yeast Beasts team project will be managed. Include all of the information/links that you think will be useful for your team to organize your work and communicate with each other and with the instructors. ''Hint: the kinds of things that are on your own User pages and on the course Main page can be used as a guide.''

[[Media:Yeast_Beasts_Presentation.pdf | Final version of presentation]]

[[Yeast Beasts Deliverables]]

==Week Reflections==
===Week 13===
====[[User:Hivanson|Hailey's]] Reflection [Quality Assurance]====
*I worked closely with Charlotte and Katie toward completing Milestones 2 and 3. We completed milestone 2 and are close to completing milestone 3.
*I thought it worked well to split up, with Natalija going with the coder/designers and me going with data analysis, ''but'' I would love to see where they are at on their progress so that we can join up for the upcoming milestone 4. I want to do this on or before next Tuesday, April 30th.
*It did not work to try to do tasks simultaneously with the data analysts. To fix this, we had one person with an open Excel sheet on their computer, another reading and checking off the steps, and another checking that all of the data and equations were being entered properly. This solution worked well for us and we will continue to have just one computer with Excel open on it, but switching roles between the person inputting data and the one checking off steps could be better for the future.

[[User:Hivanson|Hivanson]] ([[User talk:Hivanson|talk]]) 23:35, 17 April 2024 (PDT)

====Andrew's Reflection [Coder/Designer]====
To find my electronic notebook for this week please click on [[Asandle1 Week 13#Electronic Lab Notebook|Andrew Sandler's Week 13 Lab Notebook]]
'''Executive Summary'''

#Classified each p-value as 1 (significant, P < 0.01) or 0 (not significant).
#Found issues with data including missing gene descriptions.
#Initially tried to use Yeastmine to find the missing gene information but it was inefficient.
#Found additional blanks in the dataset and need to speak with Dr. Dahlquist about how to solve this issue.
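
The 1/0 classification in the first step above can be sketched in Python. This is only an illustration, assuming the P < 0.01 cutoff stated in the summary; the function name <code>flag_significant</code> and the sample values are hypothetical, and the actual work was done in a spreadsheet rather than in code. Blank or non-numeric cells are returned as <code>None</code> instead of being guessed at.

```python
import math

def flag_significant(p_value, cutoff=0.01):
    """Return 1 if p_value < cutoff, 0 if it is not, and None if the cell is unusable."""
    try:
        p = float(p_value)
    except (TypeError, ValueError):
        return None  # blank cell or stray text such as "#REF!"
    if math.isnan(p):
        return None  # NaN cells are flagged for follow-up, not classified
    return 1 if p < cutoff else 0

# Hypothetical cells: two real p-values, a blank, an Excel error, and a NaN.
sample = [0.0004, 0.02, "", "#REF!", float("nan")]
flags = [flag_significant(p) for p in sample]
print(flags)
```

Returning <code>None</code> for unusable cells keeps them visible so they can be reviewed separately instead of silently counted as not significant.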

'''What worked?'''
Everything "worked" but some surprises came up.

'''What didn't work?'''
Nothing failed outright, but several things were challenging: the #REF boxes, the blank boxes, the random text in some boxes, and the NaN values coming up in some spots. I don't know how to deal with these and will need help from Dr. Dahlquist so that I am not just guessing at a solution. I also need to figure out how to get a complete Gene ID list so that I can compare it to the missing IDs. I also need that ID list for the Access database.

'''What will I do next to fix what didn't work?'''
I plan on speaking with Dean and Dr. Dahlquist in class tomorrow to fix these issues and then move onto working on the Access section of this assignment.

[[Category:Journal Entry]]
[[Category:Team Project]]

====Dean's Reflection [Coder/Designer]====
# This week, my partner and I completed milestones 1 and 2, and we are currently working on milestone 3. There are some complications in milestone 3, because a large part of it requires Microsoft Access, and there are also some issues with importing tables into Excel. [[MSymond1]]
# Each team member should reflect on the team's progress:
## What worked well was cleaning up the data for the network table, which was done in class on Tuesday. The data table looks much more organized, and the p values have all been successfully converted.
## The other data tables are not pasting into Excel as neatly as anticipated. I am also unsure how to obtain the data from the YeastMine website.
## To fix these issues, I will ask Dr. Dahlquist for further advice in class on Thursday.
[[User:Msymond1|Msymond1]] ([[User talk:Msymond1|talk]]) 13:33, 18 April 2024 (PDT)

====Katie's Reflection====
#This week, Charlotte, Hailey, and I worked on completing Milestones 2 and 3. These milestones consisted of preparing the dataset from SGD for analysis, and then performing an ANOVA analysis like we had done in Week 9. A more detailed summary of the steps we followed is outlined on Charlotte's and my shared page, linked below.
#* [[Data Analysts Week 13]]
#The data analysts, Charlotte and I, worked together with Hailey on progressing through Milestones 2 and 3 on the Data Analysis page. We contacted each other throughout the week to check in on what each person was doing. We then met in person to work together on performing the ANOVA analysis. This worked well, because when we couldn't meet we were still able to get some work done, and once we got together we were able to ask any questions that we had. It was slightly difficult to progress through the steps in person because, when attempting to work on the dataset at the same time, only one person could be actively making changes. I don't believe this issue can be fixed, as we cannot have multiple people working at exactly the same time when the steps need to be followed in a specific order. In the future, we will continue to split up the steps so that each person does an equal amount of work, and to communicate about any questions that we have or can answer.

[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 23:09, 17 April 2024 (PDT)

====Charlotte's Reflection:====
[[User:Kmill104| Katherine Miller]] and I, being the data analysts, worked with Quality Assurance [[User:Hivanson| Hailey Ivanson]] to complete Milestone 2 and Milestone 3 in person on April 17th, 2024. We messaged the Coder/Designers and got an update from them. I wrote out the steps taken on our [[Data Analysts Week 13]] page. It was helpful that we were able to meet in person to collaborate. However, it was hard to make changes to the data since we were working on one computer. We ended up splitting up the work well, but at first everyone trying to make edits at once was hard. Now we know a system that works for us as a group.

[[User:Ckapla12|Ckapla12]] ([[User talk:Ckapla12|talk]]) 14:00, 18 April 2024 (PDT)

===Week 14===
===Week 15===
====Dean's Reflection====
# This week and last week, the entire group and I completed milestones 3-6, as well as the rest of the deliverables for the final project.
# Each team member should reflect on the team's progress:
## The things that worked well are creating the database and getting it all well organized and working out any bugs or issues in the database. Running the queries in the database the way I did them also worked very well and was very quick once the issues in the database were resolved. The creation of the final project report also worked well since we also had presented on the project already.
## What did not work well was the collaboration among the coders/designers on running the queries. Andrew first tried doing it in a much more complicated way that required typing all of the syntax in SQL mode, and there was little to no communication between us about how the queries were done or what needed to be done in the future.
## To fix these issues, I made sure that for the rest of the project we all collaborated and communicated well for the final presentation and project.

# Each person needs to write a short executive summary of that person's progress on the project for the week, with links to the relevant individual journal pages (which will have more detailed information).
# Each team member should reflect on the team's progress:
## What worked?
## What didn't work?
## What will I do next to fix what didn't work?
# Note that you will be directed to add specific information to your team's pages in the individual portion of the assignment for this and future weeks.

[[File:Yeast_Beasts_Presentation.pdf]]

{{Yeast_Beasts}}
[[Template:Yeast_Beasts]]

Yeast Beasts

2024-05-03T02:05:21Z

Msymond1: /* Week 15 */ added my reflection

MSymond1 Week 15

2024-05-03T02:03:06Z

Msymond1: /* Acknowledgements */ finished acknowledgements

==Progress Report==
===Milestone 3 continued===
*I was able to obtain the data from the YeastMine website by scrolling down and selecting the table with “all verified uncharacterized dubious ORFs”. The data table included all of the columns that Dr. Dahlquist suggested I include
*The production and degradation tables could be saved as txt files which could then be opened in excel directly
*The metadata table was created with assistance from my professor in which she helped us gather the necessary materials for it
===Milestone 4===
*The database creation went mostly well, with a few issues along the way:
*Importing the tables from Excel led to all values being either 0 or 1 in the production rates and degradation rates tables, so we instead exported the tables from Excel as txt files and then opened them in Access as txt files
*In the relationships window, we connected the systematic name column for each of the tables to ensure they would all be connected since that column was the primary key in each of the data tables
===Milestone 5===
*The quality assurance team was able to verify that the database was correct and had all necessary fields
===Milestone 6===
*We were informed by the data analysts that they selected profile 41 to analyze, this profile included 23 genes initially
*At first we were stuck and were not sure how to obtain the necessary data for the GRNmap, but Dr. Dahlquist pointed out that we can use the database that we created to run queries and find the necessary fields for each of the 23 genes
*I was informed that Andrew had completed the first three queries for the first three tables on the GRNmap excel sheet
*I followed the sample GRNmap Excel sheet as a template for the queries, and I ran into problems running the query with the network table
*We imported profile 41 genes into access as another table to run queries with
*I noticed the problem was that profile 41 only included the standard names of the genes rather than the systematic names. The standard name is not the primary key for most of the database, and the network table did not include the standard names, so I could not link those two tables together using it
*My solution was to link the gene table to the profile table using their standard names. However, I noticed that the gene table did not include the standard names for the genes, so I imported another gene table from yeastmine that does include the standard names. Then I was able to link the profile 41 table to the gene table using the standard names and then I was able to run a query using the query design and I was able to answer question 4.
*I also noticed that the first three questions were incorrect since they did not include all of the genes, I'm not sure how that happened with Andrew's queries, but I went ahead and ran them again and fixed that issue
*Dr. Dahlquist also noticed that Andrew's queries included the data from the control expression group rather than the CHP treated expression group, so I fixed that too
*I followed the rest of the directions for the rest of the questions for milestone 6 and ran into no problems
*I was able to give the GRNmap excel file to Dr. Dahlquist for her to run it
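
The linking step described above (profile table to gene table by standard name, then gene table to a data table by systematic name) can be sketched with Python's built-in <code>sqlite3</code> module. This is a minimal illustration and not the team's actual Access query design: the table names, column names, and rate values are hypothetical placeholders, only three example genes are shown, and the SQL is in SQLite's dialect, so it may need adjusting for Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (systematic_name TEXT PRIMARY KEY, standard_name TEXT);
CREATE TABLE profile41 (standard_name TEXT);
CREATE TABLE degradation_rates (systematic_name TEXT PRIMARY KEY, rate REAL);
""")
# Gene names are real yeast identifiers; the rates are made-up placeholders.
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [("YML007W", "YAP1"), ("YOR028C", "CIN5"), ("YEL009C", "GCN4")])
conn.executemany("INSERT INTO profile41 VALUES (?)", [("YAP1",), ("CIN5",)])
conn.executemany("INSERT INTO degradation_rates VALUES (?, ?)",
                 [("YML007W", 0.1), ("YOR028C", 0.2), ("YEL009C", 0.3)])

# Link profile -> gene on the standard name, then gene -> data on the primary key.
rows = conn.execute("""
    SELECT g.systematic_name, p.standard_name, d.rate
    FROM profile41 AS p
    JOIN gene AS g ON g.standard_name = p.standard_name
    JOIN degradation_rates AS d ON d.systematic_name = g.systematic_name
    ORDER BY g.systematic_name
""").fetchall()

# Sanity check: list any profile genes that failed to match the gene table.
missing = conn.execute("""
    SELECT p.standard_name
    FROM profile41 AS p
    LEFT JOIN gene AS g ON g.standard_name = p.standard_name
    WHERE g.systematic_name IS NULL
""").fetchall()
print(rows, missing)
```

The second query is a simple way to confirm that every gene in the profile appears in the result, which is the kind of problem that had to be corrected in the first three queries.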

==Presentation==
===Progress===
*We were able to complete the presentation to the best of our abilities following the directions listed in the project deliverables page
*The biggest issue we ran into was structuring the presentation in a way that made sense. We each had a great understanding of our own part of the project but did not know everyone else's as well, so we had to learn a little bit about everyone's part in order to create a cohesive presentation
*We also did our best to take into account our feedback from the last presentation and tried to make our titles more descriptive and also make our bullet points more useful
*Slides: [https://docs.google.com/presentation/d/1DnYfkl9j5hy6EqTc1XT0tT7A_husJCJNNgsxhncsozY/edit?usp=sharing our presentation]

==Team Journal Assignment==
# This week and last week, the entire group and I completed milestones 3-6, as well as the rest of the deliverables for the final project.
# Each team member should reflect on the team's progress:
## The things that worked well are creating the database and getting it all well organized and working out any bugs or issues in the database. Running the queries in the database the way I did them also worked very well and was very quick once the issues in the database were resolved. The creation of the final project report also worked well since we also had presented on the project already.
## What did not work well was the collaboration among the coders/designers on running the queries. Andrew first tried doing it in a much more complicated way that required typing all of the syntax in SQL mode, and there was little to no communication between us about how the queries were done or what needed to be done in the future.
## To fix these issues, I made sure that for the rest of the project we all collaborated and communicated well for the final presentation and project.

==Acknowledgements==
All of my group members and my professor assisted me over the past two weeks in completing this project. Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
[[User:Msymond1|Msymond1]] ([[User talk:Msymond1|talk]]) 19:03, 2 May 2024 (PDT)

{{Template:MSymond1}}
[[Category:Team Project]]

MSymond1 Week 15

2024-05-03T02:02:23Z

Msymond1: added acknowledgements

MSymond1 Week 13

2024-04-18T03:18:44Z

Msymond1: added category and acknowledgements

2024-04-11T01:18:37Z

Msymond1: finished acknowledgements

==Individual Journal Page==
#A list of biological terms from the paper I did not know the definitions for when I first read the article
#*transcription regulator activity: A molecular function that controls the rate, timing and/or magnitude of gene transcription. The function of transcriptional regulators is to modulate gene expression at the transcription step so that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Genes are transcriptional units, and include bacterial operons (Gene Ontology, 2024) https://amigo.geneontology.org/amigo/term/GO:0140110.
#*transcription cis-regulatory region binding: Binding to a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA. The transcribed region might be described as a gene, cistron, or operon (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0000976
#*respiratory electron transport chain: A process in which a series of electron carriers operate together to transfer electrons from donors such as NADH and FADH2 to any of several different terminal electron acceptors to generate a transmembrane electrochemical gradient (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0022904
#*Lysis: The disintegration or rupture of the cell membrane, resulting in the release of cell contents or the subsequent death of the cell (Biology Online, 2024). https://www.biologyonline.com/dictionary/lysis
#*immunoprecipitate: the precipitate formed in an antigen‐antibody reaction (Oxford Reference, 2006). https://www.oxfordreference.com/display/10.1093/acref/9780198529170.001.0001/acref-9780198529170-e-9850?rskey=ZIIfXn&result=1
#*DNA ligation: The re-formation of a broken phosphodiester bond in the DNA backbone, carried out by DNA ligase (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0006266
#*Phylogenetics: the scientific study of phylogeny. It studies evolutionary relationships among various groups of organisms based on evolutionary history, similarities, and differences. It makes use of molecular sequencing data (such as homologous sequences, protein sequences, nucleotide sequences, etc.) and morphological data matrices to understand and analyze the protein and gene evolutions of genetically-related groups of organisms (Biology Online, 2024). https://www.biologyonline.com/dictionary/phylogenetics
#*PCR: A laboratory method used to make many copies of a specific piece of DNA from a sample that contains very tiny amounts of that DNA. PCR allows these pieces of DNA to be amplified so they can be detected. PCR may be used to look for certain changes in a gene or chromosome, which may help find and diagnose a genetic condition or a disease, such as cancer. It may also be used to look at pieces of the DNA of certain bacteria, viruses, or other microorganisms to help diagnose an infection. Also called polymerase chain reaction (National Cancer Institute, 2024). https://www.cancer.gov/publications/dictionaries/cancer-terms/def/pcr
#*Epitope: That part of an antigenic molecule to which the t-cell receptor responds, a site on a large molecule against which an antibody will be produced and to which it will bind (Biology Online, 2024). https://www.biologyonline.com/dictionary/epitope
#The main finding of the paper is that the architecture of a promoter, meaning the arrangement of its DNA binding sites, changes depending on environmental conditions, and that the binding arrangement can be predicted with confidence from the promoter and the environmental conditions.
#The significance of these findings is that they combine genome-wide location data with phylogenetic conservation data. Using both types of data allowed the authors to cluster all significant results from the genome-wide location data based upon their conservation data.
#A limitation of prior studies is that the locations of the recognition sites of transcriptional regulators cannot be determined from phylogenetic sequence data alone, or from any other prior knowledge from previous studies. The fact that the sequences have been conserved through evolution indicates that they can be regulated, but does not reveal information about the binding process, the conditions, or the architecture of such binding.
#They treated the Yeast cells by using PCR and they printed about 6000 DNA fragments to represent nearly all regions in the yeast genome.
#They used the W303 yeast strain from Saccharomyces cerevisiae, and it was haploid.
#They grew them in microarrays with PCR products. The main article does not specify temperature or time, but the supplementary methods section lists times that vary for each of the conditions. The time is as low as 20 minutes for certain conditions (namely the moderately hypertonic condition) and as high as 14 hours (namely the filamentation-inducing condition). It does not specify the temperature for most of the conditions, except for the elevated temperature condition, for which it specifies that growth begins at 30 degrees Celsius and is shifted to 37 degrees Celsius.
#The control they used was an unenriched microarray to compare with the immunoprecipitated samples.
#They ran each program 50 times on a randomly selected set of sequences.
#The study conducted its genome-wide location analysis by cross-linking the proteins to the DNA, which then created precipitate which separated the DNA from the protein. These precipitates then went through PCR procedures to hybridize them to a microarray of spotted PCR products, each representing a different location of the yeast genome. Such locations were used to compare the probabilities of binding interactions.
#They used an Axon 200B scanner to scan the microarrays, and they compared the immunoprecipitated sample with the unenriched sample. They found the median of each channel to calculate a normalization factor. They then calculated the log ratio of the intensity of the test channel to the control channel. The log ratios were normalized by subtracting the average log ratio of every spot across all arrays. Finally, they calculated an error model by calculating the significance of enrichment on each chip, and combining the data for all replicates to calculate an average ratio and significance of enrichment for every region in the genome.
#Their supplementary tables are available to the public for download on nature.com, and these tables have the results that they were able to calculate from their data, but their raw data and calculations are not publicly available.
#The list of figures from the article
#*Figure 1 has 2 parts. Part a essentially states that the conclusions from this study, regarding the identification of transcription factor binding site specificities, could only be reached when using the three kinds of data they have. They had their genome-wide location data, their phylogenetic sequence conservation data, and other previous work. Part b shows the sequence specificities of some of the regulators. There are 2 columns: one displays sequences that had already been discovered and were rediscovered with this study, and the other shows sequences that were newly discovered by this study. Each of the letters in the sequence has a size proportional to the product of its frequency and its information content.
#*Figure 2 has three parts. Part a displays the different chromosomes, as well as certain genes located on said chromosomes. It also shows the locations of certain DNA sequences that are bound by transcriptional regulators. They obtained this information by mapping onto the yeast genome sequence the motifs that they found to be bound by regulators at high confidence that were also conserved. The functions of the specific transcriptional regulators had already been previously established. Part b of the figure combines binding data with sequence conservation data. This part of the figure is in 3 parts: the first shows all sequence matches to DNA binding specificities, the second shows all of the sequence matches to conserved sequences, and the third shows all sequences that match with conserved sequences that are bound by regulators. Part c of the figure is a graph that shows the frequency of binding sites in relation to the distance from the translational start site on the DNA sequence. The x axis is the distance from the translational start site, and the y axis is the number of binding sites.
#*Figure 3 shows the different promoter architectures: the first is single regulator, the second is repetitive motifs, the third is multiple regulators, and the fourth is co-occurring regulators. They display this information by putting a different color box for each regulator on different lines representing different binding site sequences. They obtained this information through their microarray experiments in this study.
#*Figure 4 displays the environment-specific use of transcriptional regulatory code. It shows four different patterns of binding behavior in four different rows. The different patterns being Condition invariant, condition enabled, condition expanded, and condition altered. The regulators are represented by colored circles and shown above and below the genes/promoters, and there are lines connecting them to the genes that display their binding nature depending on their environments. To the right of these charts, there are 2 lines for each binding pattern representing different environments. The regulators are displayed near the genes, and they are shown in circles or colored boxes to show whether they are binding or not depending on the environment.
#*Supplementary figure 1 is a graph in which the x axis is the regulator under testing (1 through 203), and the y axis is the number of promoter regions bound to said regulator. There are 2 lines on the graph, one is in blue which is unadjusted per the number of conditions the regulator was profiled under, and another line is in pink in which it averages the distributions for the same set of p values among regulators and promoter regions. This information was also obtained by their microarray data.
#*Supplementary figure 2 represents the data calculations conducted in this study. First all of their motifs were identified by using a variety of methods as listed in the figure, which were then filtered to determine which were significant, and then clustered based upon representative motifs, they then used conservation data to identify which motifs had the highest confidence rate. The final step is the statistical test from specificity databases to assign a specific motif to each regulator.
#*Supplementary figure 3 is a photo that displays the binding of Cin5 to two different sequences. It shows 15 different lanes to demonstrate the different binding results of the protein with different sequences. The first lane shows it with no competitor, the lanes 2-8 show it binding with a competitor sequence found by one of the discovered motifs in this study, and lanes 9-15 show it binding with a previously established binding site for the regulator. And the concentration of the regulator was 27 times higher in the motif discovered in this study as opposed to the previously established sequence, meaning the results of the study were able to predict the binding capacity for this protein better than previously published literature.
#*Supplementary figure 4 is a bar graph in which the x axis is the regulators, and the y axis is the number of promoter regions bound for each of those regulators. This figure compares the number of promoter regions bound for each regulator depending on the environment they're growing in. The two different environments they compared in this figure are the rich medium environment and the amino acid starvation environment.
#*Supplementary figure 5 is a bar graph in which the x axis is the percent of maximum matching sites, which essentially translates to the quality of each matching site, determined relative to the best-matching sequence for the Gcn4 binding specificity. The y axis is the frequency of matches of that quality. The bars are clustered by the conditions the cells were grown in.
#This study does incorporate methods from previous studies. Previous studies have used sequence conservation data, but that alone did not allow them to predict the binding sites and environments for each of the regulators with the confidence that this study does.
#The authors could take the future direction of testing such transcription factors in higher eukaryotes. If they are able to predict such binding mechanisms for yeast cells, they can likely do something similar for a higher organism. Perhaps they will not be able to test as many regulators or achieve as high a confidence, because I would assume it is far more difficult to carry out such tests on a higher organism: there are likely far more regulators, and the genome may be much larger.
#I believe the authors of this study supported their conclusions well with the data acquired, but I do not believe the data was well presented or explained in the article. Much of the material necessary to understand their methods or to know important details of their experiment (temperature, time, conditions) was not in the article itself and had to be found in the supplementary section. Even the supplementary section was very dense and difficult to understand for anyone who is not a well-established expert on the topic. In addition, the paper has no defined discussion or conclusion section. The final section of the article is the methods section, which is rather unconventional.

==Acknowledgements==
I have been in contact with my group members this week about the presentation and questions. We worked together in class and texted about the presentation. I also visited my professor, Dr. Dahlquist, during her office hours to ask for help in interpreting this week's article. Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

==References==
*Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004 Sep 2;431(7004):99-104. doi: 10.1038/nature02800. PMID: 15343339; PMCID: PMC3006441.

{{Template:MSymond1}}
[[Category:Team Project]]

MSymond1 Week 12

2024-04-11T01:14:16Z

Msymond1: started acknowledgements and references

==Individual Journal Page==
#A list of biological terms from the paper I did not know the definitions for when I first read the article
#*transcription regulator activity: A molecular function that controls the rate, timing and/or magnitude of gene transcription. The function of transcriptional regulators is to modulate gene expression at the transcription step so that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Genes are transcriptional units, and include bacterial operons (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0140110
#*transcription cis-regulatory region binding: Binding to a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA. The transcribed region might be described as a gene, cistron, or operon (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0000976
#*respiratory electron transport chain: A process in which a series of electron carriers operate together to transfer electrons from donors such as NADH and FADH2 to any of several different terminal electron acceptors to generate a transmembrane electrochemical gradient (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0022904
#*Lysis: The disintegration or rupture of the cell membrane, resulting in the release of cell contents or the subsequent death of the cell (Biology Online, 2024). https://www.biologyonline.com/dictionary/lysis
#*immunoprecipitate: the precipitate formed in an antigen‐antibody reaction (Oxford Reference, 2006). https://www.oxfordreference.com/display/10.1093/acref/9780198529170.001.0001/acref-9780198529170-e-9850?rskey=ZIIfXn&result=1
#*DNA ligation: The re-formation of a broken phosphodiester bond in the DNA backbone, carried out by DNA ligase (Gene Ontology, 2024). https://amigo.geneontology.org/amigo/term/GO:0006266
#*Phylogenetics: the scientific study of phylogeny. It studies evolutionary relationships among various groups of organisms based on evolutionary history, similarities, and differences. It makes use of molecular sequencing data (such as homologous sequences, protein sequences, nucleotide sequences, etc.) and morphological data matrices to understand and analyze the protein and gene evolutions of genetically-related groups of organisms (Biology Online, 2024). https://www.biologyonline.com/dictionary/phylogenetics
#*PCR: A laboratory method used to make many copies of a specific piece of DNA from a sample that contains very tiny amounts of that DNA. PCR allows these pieces of DNA to be amplified so they can be detected. PCR may be used to look for certain changes in a gene or chromosome, which may help find and diagnose a genetic condition or a disease, such as cancer. It may also be used to look at pieces of the DNA of certain bacteria, viruses, or other microorganisms to help diagnose an infection. Also called polymerase chain reaction (National Cancer Institute, 2024). https://www.cancer.gov/publications/dictionaries/cancer-terms/def/pcr
#*Epitope: That part of an antigenic molecule to which the T-cell receptor responds, a site on a large molecule against which an antibody will be produced and to which it will bind (Biology Online, 2024). https://www.biologyonline.com/dictionary/epitope
#The main finding of the paper is that the architecture of the promoter, meaning the arrangement of the DNA binding sites, changes depending on environmental conditions, and that the binding arrangement can be predicted with confidence from the promoter and the environmental conditions (Harbison et al., 2004).
#The significance of these findings is that they combine genome-wide location data with phylogenetic conservation data. Using both types of data allowed the authors to cluster all significant results from the genome-wide location data based upon their conservation data.
#The limitation of prior studies is that the location of the recognition sites of transcriptional regulators cannot be determined with phylogenetic sequence data alone, or with any other prior knowledge from previous studies. The fact that the sequences have been conserved through evolution indicates that they can be regulated, but it does not reveal information about the binding process, the conditions, or the architecture of such binding.
#They treated the yeast cells by using PCR, and they printed about 6000 DNA fragments to represent nearly all regions in the yeast genome.
#They used the W303 strain of ''Saccharomyces cerevisiae'', and it was haploid.
#They grew them in microarrays with PCR products. The article itself does not specify temperature or time. The supplementary methods section lists times that vary by condition, from as low as 20 minutes (the moderately hypertonic condition) to as high as 14 hours (the filamentation-inducing condition). It does not specify the temperature for most of the conditions, except for the elevated temperature condition, which begins at 30 degrees Celsius and is shifted to 37 degrees Celsius.
#The control they used was an unenriched sample, which was compared with the immunoprecipitated samples on the microarray.
#They ran each program 50 times on a randomly selected set of sequences.
#The study conducted its genome-wide location analysis by cross-linking the proteins to the DNA and then immunoprecipitating the complexes, which separated the DNA bound by each regulator from the rest. The precipitated DNA then went through PCR procedures and was hybridized to a microarray of spotted PCR products, each representing a different location in the yeast genome. These locations were used to compare the probabilities of binding interactions.
#They used an Axon 200B scanner to scan the microarrays and compared the immunoprecipitated sample with the unenriched sample. They found the median of each channel to calculate a normalization factor. They then calculated the log ratio of the intensity of the test channel to the control channel. The log ratios were normalized by subtracting the average log ratio of every spot across all arrays. Finally, they applied an error model to calculate the significance of enrichment on each chip and combined the data for all replicates to obtain an average ratio and significance of enrichment for every region in the genome.
#Their supplementary tables are available to the public for download on nature.com, and these tables contain the results they calculated from their data, but their raw data and calculations are not publicly available.
#The list of figures from the article
#*Figure 1 has two parts. Part a essentially states that the conclusions of this study regarding transcription factor binding site specificities could only be reached by using three kinds of data: genome-wide location data, phylogenetic sequence conservation data, and previous work. Part b shows the sequence specificities of some of the regulators. There are two columns: one displays sequences that had already been discovered and were rediscovered in this study, and the other shows sequences that were newly discovered by this study. Each letter in a sequence has a size proportional to the product of its frequency and its information content.
#*Figure 2 has three parts. Part a displays the chromosomes and certain genes located on them, along with the locations of DNA sequences that are bound by transcriptional regulators. The authors obtained this information by mapping onto the yeast genome the motifs that were both conserved and bound by regulators at high confidence. The functions of the specific transcriptional regulators had already been established. Part b combines binding data with sequence conservation data and is itself in three parts: the first shows all sequence matches to DNA binding specificities, the second shows all sequence matches to conserved sequences, and the third shows all matches to conserved sequences that are bound by regulators. Part c is a graph of the frequency of binding sites in relation to the distance from the translational start site. The x axis is the distance from the translational start site, and the y axis is the number of binding sites.
#*Figure 3 shows the different promoter architectures: single regulator, repetitive motifs, multiple regulators, and co-occurring regulators. The figure represents each regulator with a box of a different color, placed on lines that represent different binding site sequences. They obtained this information through the microarray experiments in this study.
#*Figure 4 displays the environment-specific use of the transcriptional regulatory code. It shows four patterns of binding behavior in four rows: condition invariant, condition enabled, condition expanded, and condition altered. The regulators are represented by colored circles shown above and below the genes/promoters, with lines connecting them to the genes to display how their binding depends on the environment. To the right of these charts, there are two lines for each binding pattern representing different environments, with the regulators drawn near the genes as circles or colored boxes to show whether or not they bind in that environment.
#*Supplementary figure 1 is a graph in which the x axis is the regulator under testing (1 through 203) and the y axis is the number of promoter regions bound by that regulator. There are two lines on the graph: the blue line is not adjusted for the number of conditions the regulator was profiled under, and the pink line averages the distributions for the same set of p values among regulators and promoter regions. This information was also obtained from their microarray data.
#*Supplementary figure 2 represents the data calculations conducted in this study. First, motifs were identified using the variety of methods listed in the figure. These motifs were then filtered to determine which were significant and clustered based on representative motifs. The authors then used conservation data to identify which motifs had the highest confidence. The final step is a statistical test that uses specificity databases to assign a specific motif to each regulator.
#*Supplementary figure 3 is a photo that displays the binding of Cin5 to two different sequences. It shows 15 lanes to demonstrate the binding results of the protein with the different sequences. Lane 1 shows binding with no competitor, lanes 2-8 show binding in the presence of a competitor sequence based on one of the motifs discovered in this study, and lanes 9-15 show binding in the presence of a previously established binding site for the regulator. The concentration of the regulator was 27 times higher for the motif discovered in this study than for the previously established sequence, meaning the results of the study predicted the binding capacity for this protein better than previously published literature.
#*Supplementary figure 4 is a bar graph in which the x axis is the regulators, and the y axis is the number of promoter regions bound for each of those regulators. This figure compares the number of promoter regions bound for each regulator depending on the environment they're growing in. The two different environments they compared in this figure are the rich medium environment and the amino acid starvation environment.
#*Supplementary figure 5 is a bar graph in which the x axis is the percent of maximum matching sites, which essentially translates to the quality of each matching site, determined relative to the best-matching sequence for the Gcn4 binding specificity. The y axis is the frequency of matches of that quality. The bars are clustered by the conditions the cells were grown in.
#This study does incorporate methods from previous studies. Previous studies have used sequence conservation data, but that alone did not allow them to predict the binding sites and environments for each of the regulators with the confidence that this study does.
#The authors could take the future direction of testing such transcription factors in higher eukaryotes. If they are able to predict such binding mechanisms for yeast cells, they can likely do something similar for a higher organism. Perhaps they will not be able to test as many regulators or achieve as high a confidence, because I would assume it is far more difficult to carry out such tests on a higher organism: there are likely far more regulators, and the genome may be much larger.
#I believe the authors of this study supported their conclusions well with the data acquired, but I do not believe the data was well presented or explained in the article. Much of the material necessary to understand their methods or to know important details of their experiment (temperature, time, conditions) was not in the article itself and had to be found in the supplementary section. Even the supplementary section was very dense and difficult to understand for anyone who is not a well-established expert on the topic. In addition, the paper has no defined discussion or conclusion section. The final section of the article is the methods section, which is rather unconventional.
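
The microarray normalization steps summarized in the list above (median-based normalization factor, log ratio of test to control channel, then subtracting the average log ratio across all arrays) can be sketched roughly as follows. This is only an illustration under my own assumptions, not the authors' code: the function names are hypothetical, the log base (base 2) is assumed because the article summary does not state one, and the error model used for significance is left out.

```python
import math
from statistics import mean, median


def normalized_log_ratios(ip, control):
    """Log ratios for one array (hypothetical helper).

    ip and control are spot intensities for the immunoprecipitated (test)
    channel and the unenriched (control) channel, in the same spot order.
    """
    # Normalization factor from the channel medians: after scaling,
    # the control channel has the same median as the test channel.
    factor = median(ip) / median(control)
    # Base-2 logarithm is an assumption; the summary only says "log ratio".
    return [math.log2(i / (c * factor)) for i, c in zip(ip, control)]


def center_across_arrays(arrays):
    """Subtract the average log ratio of every spot across all arrays."""
    grand_mean = mean(r for array in arrays for r in array)
    return [[r - grand_mean for r in array] for array in arrays]
```

With made-up intensities such as <code>normalized_log_ratios([100, 400, 1600], [100, 100, 100])</code>, the middle spot sits at the median and gets a log ratio of 0, while the other two come out at -2 and 2.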

==Acknowledgements==
I have been in contact with my group members this week about the presentation and questions. We worked together in class and texted about the presentation. I also visited my professor, Dr. Dahlquist, during her office hours to ask for help in interpreting this week's article.

==References==
*Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004 Sep 2;431(7004):99-104. doi: 10.1038/nature02800. PMID: 15343339; PMCID: PMC3006441.

{{Template:MSymond1}}
[[Category:Team Project]]