Asandle1 Week 13
Revision as of 13:21, 16 April 2024
Biological Databases P-val by Gene Initial Notes
Harbison Paper
Initial Things I Notice
- The document is called "Pval by Gene," which suggests that we are looking at the p-values for each gene. That tracks with what the paper is based on.
- The document has 6,231 rows, which matches the number of genes in the paper.
- The document has 206 columns, 203 of which contain values. This suggests that each value column corresponds to a transcriptional regulator.
- Some cells just say NaN ("not a number," i.e., a missing value). I'm not sure how to deal with these yet.
- Column 1 contains the gene names.
- Column 2 contains the human-readable gene tags.
- Column 3 contains plain-English descriptions of each gene.
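The observations above can be checked mechanically once the sheet is loaded. The following minimal sketch rests on several assumptions that the notes do not state: the sheet has been exported as tab-delimited text with a header row, the first three columns are the annotation columns described above, and both "NaN" and blank cells mean a missing p-value. The names `load_pval_table` and `parse_pval` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed layout (hypothetical, based on the notes): 3 annotation columns
# (gene name, readable tag, description) followed by one p-value column per
# transcriptional regulator.
N_ANNOTATION_COLS = 3


def parse_pval(raw):
    """Return a float p-value, or None for a missing cell ("NaN" or blank)."""
    text = raw.strip()
    if text == "" or text.lower() == "nan":
        return None
    return float(text)


def load_pval_table(text, delimiter="\t"):
    """Split each data row into its annotation fields and parsed p-values."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    regulators = header[N_ANNOTATION_COLS:]
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        annotations = fields[:N_ANNOTATION_COLS]
        pvals = [parse_pval(raw) for raw in fields[N_ANNOTATION_COLS:]]
        rows.append((annotations, pvals))
    return regulators, rows


# Made-up sample rows; a real export could be read with open(path, newline="").read().
SAMPLE = (
    "gene\ttag\tdescription\tREG_1\tREG_2\n"
    "GENE_A\tgeneA\tmade-up description A\t0.001\tNaN\n"
    "GENE_B\tgeneB\tmade-up description B\tNaN\t0.25\n"
    "GENE_C\tgeneC\tmade-up description C\t0.5\t0.04\n"
)

regulators, rows = load_pval_table(SAMPLE)
n_missing = sum(p is None for _, pvals in rows for p in pvals)
print(len(rows), "genes,", len(regulators), "value columns,", n_missing, "missing")
# prints: 3 genes, 2 value columns, 2 missing
```

If the real file has the layout described above, `len(rows)` and `len(regulators)` should come out as 6,231 and 203. Mapping NaN to `None` keeps every row, so no gene has to be discarded just because some of its values are missing.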
Thought Process:
Layout Assumptions (maybe incorrect):
- The primary key for the data should be the Gene ID column.
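A primary key works for this data only if every Gene ID is unique. The sketch below illustrates that rule. It uses SQLite from the Python standard library as a stand-in for Access. The table and column names are hypothetical, and it assumes that the column-1 gene names are what these notes call the Gene ID.

```python
import sqlite3

# SQLite stands in for Access here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE genes (
        gene_id     TEXT PRIMARY KEY NOT NULL,  -- column 1 of the sheet
        gene_tag    TEXT,                       -- readable tag, column 2
        description TEXT                        -- plain-English text, column 3
    )
    """
)
conn.execute(
    "INSERT INTO genes VALUES (?, ?, ?)",
    ("GENE_A", "geneA", "made-up description"),
)

# The primary key must be unique, so a second row with the same Gene ID fails.
try:
    conn.execute(
        "INSERT INTO genes VALUES (?, ?, ?)",
        ("GENE_A", "duplicate", "should be rejected"),
    )
except sqlite3.IntegrityError as err:
    print("rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM genes").fetchone()[0])  # prints: 1
```

`NOT NULL` is written out explicitly because SQLite, unlike most databases, allows NULL in a non-integer `PRIMARY KEY` column unless told otherwise.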
Questions:
- Data preprocessing?
  - Can we deal with the NaN entries after everything has been imported into Access, or do we need to remove them first without throwing out every row that contains one?
- What should we organize the import by?
  - We want to be able to view the data by gene or by environmental condition. Does this mean making separate Access tables for the genes and for the different experimental conditions?
- How do we make sure the database for the Harbison paper also works with all the others? I think this comes down to the primary key: the only thing we can really organize across is the Gene ID, because that is what the other experiments will have in common.
- Is there anything I am missing?
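One possible answer to these questions is a "long" layout with one row per (gene, value column) pair, sketched below. This is a hypothetical design that uses SQLite as a stand-in for Access, not something the assignment specifies. In this layout, the same table can be read by gene or by column. Missing values are stored as NULL instead of deleting rows. Another experiment's data can be joined through the shared Gene ID. The `regulator` column stands for whatever each of the 203 value-column headers identifies (a regulator, possibly paired with an environmental condition). The `other_experiment` table, its `measurement` column, and all sample values are made up.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for Access.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE harbison_pvals (
        gene_id   TEXT NOT NULL,
        regulator TEXT NOT NULL,   -- header of one value column
        pval      REAL,            -- NULL where the sheet says NaN
        PRIMARY KEY (gene_id, regulator)
    );
    CREATE TABLE other_experiment (  -- placeholder for another paper's data
        gene_id     TEXT PRIMARY KEY NOT NULL,
        measurement REAL
    );
    """
)
conn.executemany(
    "INSERT INTO harbison_pvals VALUES (?, ?, ?)",
    [
        ("GENE_A", "REG_1", 0.001),
        ("GENE_A", "REG_2", None),
        ("GENE_B", "REG_1", None),
        ("GENE_B", "REG_2", 0.25),
    ],
)
conn.executemany(
    "INSERT INTO other_experiment VALUES (?, ?)",
    [("GENE_A", 1.7), ("GENE_C", -0.4)],
)


def pvals_for_gene(gene_id):
    """View by gene: every value column's p-value for one gene."""
    return conn.execute(
        "SELECT regulator, pval FROM harbison_pvals "
        "WHERE gene_id = ? ORDER BY regulator",
        (gene_id,),
    ).fetchall()


def pvals_for_regulator(regulator):
    """View by column: every gene's non-missing p-value for one column."""
    return conn.execute(
        "SELECT gene_id, pval FROM harbison_pvals "
        "WHERE regulator = ? AND pval IS NOT NULL ORDER BY gene_id",
        (regulator,),
    ).fetchall()


def join_on_gene_id():
    """Combine the two experiments wherever they share a Gene ID."""
    return conn.execute(
        "SELECT h.gene_id, h.regulator, h.pval, o.measurement "
        "FROM harbison_pvals AS h JOIN other_experiment AS o "
        "ON h.gene_id = o.gene_id ORDER BY h.gene_id, h.regulator"
    ).fetchall()


print(pvals_for_gene("GENE_A"))      # [('REG_1', 0.001), ('REG_2', None)]
print(pvals_for_regulator("REG_1"))  # [('GENE_A', 0.001)]
print(join_on_gene_id())
# [('GENE_A', 'REG_1', 0.001, 1.7), ('GENE_A', 'REG_2', None, 1.7)]
```

The inner join keeps only genes present in both tables (GENE_A here). That is why a shared, consistently formatted Gene ID matters when several papers' data go into one database.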