QLanners Week 14
From LMU BioDB 2017
We used the gene HSF1, which encodes a transcription factor, as a test case to determine which fields should be pulled from each database.
General info we want about each gene:
- Gene ID from each database
- Description/Function (Ensembl)
- DNA Sequence (Ensembl)
- Protein Sequence (UniProt)
- Locus tag (NCBI)
- Also Known As (NCBI)
- Consensus Sequence (JASPAR)
- Regulation (SGD)
- Interaction (SGD)
- Similar Proteins (UniProt)
- Gene Ontology (SGD; check whether it is also available from UniProt)
We decided that from JASPAR we will pull:
- Gene ID (this will be the matrix ID)
- Sequence Logo
- Frequency Matrix
- Class and Family
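Since the general list asks for a consensus sequence from JASPAR while the JASPAR list pulls a frequency matrix, one option is to derive the consensus from the matrix ourselves. The block below is only a minimal sketch of that idea: the function name `consensus_from_pfm` and the counts in `EXAMPLE_PFM` are made up for illustration and are not the real HSF1 matrix, and the tie-breaking rule is an assumption rather than anything JASPAR specifies.

```python
# Hypothetical position frequency matrix: one list of counts per base,
# one entry per position in the motif. These numbers are invented for
# illustration and are NOT the HSF1 matrix from JASPAR.
EXAMPLE_PFM = {
    "A": [10, 0, 1, 8],
    "C": [1, 0, 12, 2],
    "G": [2, 15, 1, 3],
    "T": [2, 0, 1, 2],
}


def consensus_from_pfm(pfm):
    """Return the most frequent base at each position of the matrix.

    Ties go to the alphabetically first base (an arbitrary choice made
    for this sketch).
    """
    bases = sorted(pfm)
    length = len(pfm[bases[0]])
    if any(len(pfm[base]) != length for base in bases):
        raise ValueError("every base must have the same number of positions")
    return "".join(
        max(bases, key=lambda base: pfm[base][position])
        for position in range(length)
    )


print(consensus_from_pfm(EXAMPLE_PFM))
```

A simple majority rule like this ignores degenerate positions, so the sequence logo would still be worth showing alongside it.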
Breakdown of what we want from all other databases:
NCBI:
- Gene ID
- Locus Tag
- Also Known As
- RefSeq IDs for the chromosome, mRNA, and protein
Ensembl:
- Gene ID
- Description/Function
- DNA Sequence
- Also get the chromosomal location and the "About this gene" information
UniProt:
- Gene ID (Note that for UniProt, it will be a protein ID, and that there are two different ones that you need to get.) -- Kdahlquist (talk) 15:36, 30 November 2017 (PST)
- Protein Sequence
- Similar Proteins
- Protein Type/Name
- Please get the species from UniProt as well, even though we know it is going to be yeast.
SGD:
- Gene ID
- Standard Name, e.g., HSF1
- Systematic Name, e.g., YGL073W
- SGD ID, e.g., S000003041
- Regulation
- Interaction
- Gene Ontology
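The per-database breakdown above could be written down as a single lookup table so that the code pulling the data and the code displaying it agree on the fields. This is just a sketch of one possible layout: the names `FIELDS_BY_DATABASE` and `empty_gene_record` are hypothetical, and the field labels are copied from the lists above rather than from any database's actual response format.

```python
# Fields we decided to pull, grouped by source database.
# Labels mirror the notes above; they are not real API field names.
FIELDS_BY_DATABASE = {
    "JASPAR": ["Gene ID (matrix ID)", "Sequence Logo", "Frequency Matrix",
               "Class", "Family"],
    "NCBI": ["Gene ID", "Locus Tag", "Also Known As",
             "RefSeq Chromosome ID", "RefSeq mRNA ID", "RefSeq Protein ID"],
    "Ensembl": ["Gene ID", "Description/Function", "DNA Sequence",
                "Chromosomal Location", "About This Gene"],
    "UniProt": ["Gene ID (protein IDs)", "Protein Sequence",
                "Similar Proteins", "Protein Type/Name", "Species"],
    "SGD": ["Gene ID", "Standard Name", "Systematic Name", "SGD ID",
            "Regulation", "Interaction", "Gene Ontology"],
}


def empty_gene_record():
    """Build a blank record with every field set to None until it is filled."""
    return {
        database: {field: None for field in fields}
        for database, fields in FIELDS_BY_DATABASE.items()
    }


record = empty_gene_record()
record["SGD"]["Standard Name"] = "HSF1"
record["SGD"]["Systematic Name"] = "YGL073W"
record["SGD"]["SGD ID"] = "S000003041"
print(record["SGD"])
```

Keeping unfilled fields as `None` makes it easy to spot which values a given database did not return for a gene.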