Difference between revisions of "Asandle1 Week 13"

From LMU BioDB 2024
(Adding my week 13 journal which will be notes & methods (electronic notebook))
 
(Initial Things I notice: adding *)

Revision as of 12:12, 13 April 2024

Biological Databases P-val by Gene Initial Notes

Harbison Paper

Initial Things I notice

  • The document is called Pval by Gene, which makes me think we are looking at the p-values for each gene. That tracks, considering what the paper is based on.
  • The document has 6,231 rows, which aligns with the number of genes in the paper.
  • The document has 206 columns, 203 of which hold values. This makes me think there is one column for each transcriptional regulator.
  • Some cells just say NaN. Not sure how to deal with these.
  • We have the gene names in column 1.
  • We have the gene tags that humans can actually read in column 2.
  • We have plain English descriptions of what everything is in column 3.
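The counts above were read off the file by eye. A minimal sketch of how they could be double-checked in Python is below. It assumes the file is tab-delimited with one header row and three descriptive columns before the values; none of that is confirmed yet. The sample text, the IDs, and the `summarize` function are made up for illustration.

```python
import csv
import io

# Tiny made-up stand-in for the Pval by Gene file. The real layout
# (tab-delimited, one header row, 3 descriptive columns) is an assumption.
SAMPLE = (
    "gene_id\treadable_name\tdescription\tCOL_A\tCOL_B\n"
    "ID0001\tNAME1\tmade-up description\t0.5\tNaN\n"
    "ID0002\tNAME2\tmade-up description\tNaN\t0.001\n"
)

def summarize(text, n_meta=3):
    """Count data rows, total columns, value columns, and NaN cells."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    # Only look for NaN in the value columns, not in the descriptive ones.
    nan_cells = sum(
        1
        for row in data
        for cell in row[n_meta:]
        if cell.strip().lower() == "nan"
    )
    return {
        "rows": len(data),
        "columns": len(header),
        "value_columns": len(header) - n_meta,
        "nan_cells": nan_cells,
    }

summary = summarize(SAMPLE)
print(summary)
```

On the real file, `SAMPLE` would be replaced by the exported text, and the row and column counts could then be compared with the 6,231 and 206 noted above.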

Thought Process:

Layout Assumptions: (Maybe incorrect)

  • The primary key for the data should be the Gene ID column

Questions:

  • Data preprocessing?
    • Can we worry about the NaN entries once everything has been added to Access, or do we need to figure out how to remove them first without removing every entry that contains one?
 - Do we want to import by 

  • We want to be able to view by Gene or by environmental difference. Does this mean making an Access entry for the Genes and another for the different experimental conditions?
  • How do we make sure the database for the Harbison paper also works with all the others? I think this probably comes down to the primary key, which means we can really only organize across the Gene IDs, because that is what will be common across the other experiments.
  • Is there anything I am missing?
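One possible answer to the NaN and primary key questions, sketched in Python with SQLite standing in for Access (this is not the Access setup itself). It assumes NaN cells can be stored as empty (NULL) values instead of being removed, and that the wide sheet is split into one table keyed by Gene ID plus one long table of values. The table names, column names, and sample rows are all made up.

```python
import math
import sqlite3

# Made-up wide rows: (gene_id, readable_name, description, {column: value}).
WIDE_ROWS = [
    ("ID0001", "NAME1", "made-up description", {"COL_A": "0.5", "COL_B": "NaN"}),
    ("ID0002", "NAME2", "made-up description", {"COL_A": "NaN", "COL_B": "0.001"}),
]

def to_value(cell):
    """Turn a text cell into a float, or None (SQL NULL) when it is NaN."""
    number = float(cell)
    return None if math.isnan(number) else number

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE gene (
        gene_id TEXT PRIMARY KEY,      -- the key shared with other datasets
        readable_name TEXT,
        description TEXT
    );
    CREATE TABLE pvalue (
        gene_id TEXT NOT NULL REFERENCES gene(gene_id),
        source_column TEXT NOT NULL,   -- header of the column the value came from
        p_value REAL,                  -- NULL where the sheet said NaN
        PRIMARY KEY (gene_id, source_column)
    );
""")

for gene_id, name, desc, cells in WIDE_ROWS:
    conn.execute("INSERT INTO gene VALUES (?, ?, ?)", (gene_id, name, desc))
    conn.executemany(
        "INSERT INTO pvalue VALUES (?, ?, ?)",
        [(gene_id, col, to_value(cell)) for col, cell in cells.items()],
    )

# View by gene: every value recorded for one Gene ID, NULLs included.
by_gene = conn.execute(
    "SELECT source_column, p_value FROM pvalue "
    "WHERE gene_id = ? ORDER BY source_column",
    ("ID0001",),
).fetchall()

# View by column: every gene with a real value in one column, NULLs skipped.
by_column = conn.execute(
    "SELECT gene_id, p_value FROM pvalue "
    "WHERE source_column = ? AND p_value IS NOT NULL ORDER BY gene_id",
    ("COL_B",),
).fetchall()

print(by_gene)
print(by_column)
```

With this layout no row has to be dropped because of a NaN, and both views come from the same two tables. Whether this matches how the other papers' data will be stored is still an open question.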