Difference between revisions of "Week 5"

From LMU BioDB 2013
(Reflect: pasted in notes from BioQuest 2011 workshop KD attended)
(Correct Spelling of Lauren)
 
(18 intermediate revisions by one user not shown)
Line 1: Line 1:
{{Under Construction}}
 
 
 
'''The individual journal entry (UniProt exercise) is due on Friday, September 27, at midnight PDT.''' ''(Thursday night/Friday morning)''
 
'''The individual journal entry (UniProt exercise) is due on Friday, September 27, at midnight PDT.''' ''(Thursday night/Friday morning)''
  
Line 6: Line 4:
  
 
''A note on the grading for this assignment:''
 
''A note on the grading for this assignment:''
* The individual journal entry, and shared journal entries are worth a total of 10 points.  Students will be graded on an individual basis for this portion of the assignment.
+
* The individual journal entry and shared journal entries are worth a total of 10 points.  Students will be graded on an individual basis for this portion of the assignment.
* The database wiki page and presentation is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.
+
* The database wiki page is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.
 +
* The presentation is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.
  
 
{{Individual Journal Instructions|week=5}}
 
{{Individual Journal Instructions|week=5}}
Line 13: Line 12:
 
=== UniProt Exercise ===
 
=== UniProt Exercise ===
  
For this exercise, you will read and follow the links in [[Media:BioinformaticsForDummies_Ch4_2003_edited.pdf | Chapter 4: Using Protein and Specialized Sequence Databases of the book ''Bioinformatics for Dummies'']].
+
For this exercise, you will read and follow the links in [https://mylmuconnect.blackboard.com/webapps/portal/frameset.jsp?tab_tab_group_id=_2_1&url=%2Fwebapps%2Fblackboard%2Fexecute%2Flauncher%3Ftype%3DCourse%26id%3D_35506_1%26url%3D Chapter 4: Using Protein and Specialized Sequence Databases of the book ''Bioinformatics for Dummies'' (on MyLMU Connect)]. We are delving quite deeply into UniProt in particular because the gene databases that you will generate later in the semester in your final project are going to be derived from UniProt.
 +
 
 +
'''For this assignment, you will keep an electronic laboratory notebook on your individual journal page that records the steps you carried out in exploring the UniProtKB.'''
  
 
* Since the publication of this book in 2003, the SWISS-PROT database has become the UniProt Knowledgebase.  The underlying data are the same, but the scope and user interface for the database have been updated.  Thus, some of the exact instructions of the chapter have to be changed to reflect the change to UniProt.  These changes are noted below by page number.
 
* Since the publication of this book in 2003, the SWISS-PROT database has become the UniProt Knowledgebase.  The underlying data are the same, but the scope and user interface for the database have been updated.  Thus, some of the exact instructions of the chapter have to be changed to reflect the change to UniProt.  These changes are noted below by page number.
 
* Page 123:
 
* Page 123:
*# The URL for the SWISS-PROT/UniProt server is [http://www.expasy.org/sprot/ http://www.expasy.org/sprot/].
+
*# The URL for the UniProtKB/SWISS-PROT server is now [http://www.uniprot.org http://www.uniprot.org].
*# The Quick Search field is now found at the upper right of the page.
+
*# The Quick Search field is now found at the middle top of the page.
*# Choose "UniProtKB" from the drop-down menu (it is the default), and click the "GO" button.
+
*#* Alternately, you can go directly to [http://www.uniprot.org http://www.uniprot.org], the search field is in the top middle of the page.
+
 
** The information described in subsequent pages can all be found, but will be in a different order on the page.  There is a set of navigation links near the top of the page to help you jump to each section.
 
** The information described in subsequent pages can all be found, but will be in a different order on the page.  There is a set of navigation links near the top of the page to help you jump to each section.
 
* General information about the entry (bottom of page 123):
 
* General information about the entry (bottom of page 123):
Line 27: Line 26:
 
* The References (page 126) are near the middle of the page.
 
* The References (page 126) are near the middle of the page.
 
* The Comments (page 126) is now known as "General annotation (comments)".
 
* The Comments (page 126) is now known as "General annotation (comments)".
* The Cross-Refernces (page 128) are even more extensive and are organized by sub-categories of databases.
+
* The Cross-References (page 128) are even more extensive and are organized by sub-categories of databases.
 
** In particular, click on a sample cross-reference link for each of the following databases, and for each, state what type of information is found there:
 
** In particular, click on a sample cross-reference link for each of the following databases, and for each, state what type of information is found there:
 
*** EMBL
 
*** EMBL
Line 46: Line 45:
  
 
* '''Write a one-paragraph summary of what you have learned about the human EGFR protein from this exercise.'''
 
* '''Write a one-paragraph summary of what you have learned about the human EGFR protein from this exercise.'''
* '''Reflect and answer the following questions:
+
* '''Reflect and answer the following questions on your ''individual'' journal page:
 
*# '''What was the purpose of this exercise?
 
*# '''What was the purpose of this exercise?
 
*# '''What did I learn from this exercise?
 
*# '''What did I learn from this exercise?
Line 53: Line 52:
 
==== Additional UniProt Resources ====
 
==== Additional UniProt Resources ====
  
* [http://nar.oxfordjournals.org/content/38/suppl_1/D142.full UniProt NAR Database Issue 2010 article]
+
* [http://nar.oxfordjournals.org/content/41/D1/D43.full UniProt NAR Database Issue 2013 article]
* [http://www.youtube.com/watch?v=TCF3qWn7siI YouTube video tutorial about UniProt (8:17 minutes)]
+
* [http://www.ebi.ac.uk/training/online/course/uniprot-quick-tour EBI Training: UniProt Quick Tour]
* [http://www.ebi.ac.uk/2can/tutorials/protein/blast5.html EBI guide to interpreting an UniProt record]
+
* [http://www.uniprot.org/demos/diabetes UniProt demo from UniProt itself]
+
  
 
== NAR Exercise and Presentation ==
 
== NAR Exercise and Presentation ==
  
 
Each year, the journal ''Nucleic Acids Research'' (''NAR'') devotes the first issue in January to biological databases.  The goal of this assignment is to dive into the deep end of the pool and experience the breadth and depth of biological databases available on the Web:
 
Each year, the journal ''Nucleic Acids Research'' (''NAR'') devotes the first issue in January to biological databases.  The goal of this assignment is to dive into the deep end of the pool and experience the breadth and depth of biological databases available on the Web:
* Read (if you haven't already done so): [http://nar.oxfordjournals.org/content/41/D1/D1.full Introduction to ''NAR'' Database Issue]
+
* Read (if you haven't already done so):
 +
** [http://nar.oxfordjournals.org/content/41/D1/D1.full Introduction to ''NAR'' Database Issue]
 +
** Slides from [http://dataone.org DataONE.org]
 +
*** [http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx DataONE: Why Data Management]
 +
*** [http://www.dataone.org/sites/all/documents/L02_DataSharing.pptx DataONE: Data Sharing]
 
* Choose your database:
 
* Choose your database:
 
** [http://nar.oxfordjournals.org/content/41/D1.toc ''Nucleic Acids Research'' Database Issue Table of Contents 2013]
 
** [http://nar.oxfordjournals.org/content/41/D1.toc ''Nucleic Acids Research'' Database Issue Table of Contents 2013]
Line 68: Line 69:
  
 
For this exercise, you will work with an assigned buddy.  Choose a database from this issue and answer the following questions about that database.  Each pair should choose a different database to profile.  So, to claim your first choice, go to the [[Class Journal Week 5]] page and stake your claim to a database.  When you are choosing your database, look at the other students' entries to make sure you are not doing the same one. The buddy assignments are:
 
For this exercise, you will work with an assigned buddy.  Choose a database from this issue and answer the following questions about that database.  Each pair should choose a different database to profile.  So, to claim your first choice, go to the [[Class Journal Week 5]] page and stake your claim to a database.  When you are choosing your database, look at the other students' entries to make sure you are not doing the same one. The buddy assignments are:
* ''To be determined''
+
 
 +
* Hilda - Mitchell
 +
* Kurt - Kevin Meilek
 +
* Lena - Miles - Tauras
 +
* Viktoria - Kevin McGee
 +
* Gabriel - Katrina
 +
* Stephen - Alina
 +
* Lauren - Dillon
  
 
=== Database Wiki Page ===
 
=== Database Wiki Page ===
Line 115: Line 123:
 
* '''''Your PowerPoint slides must be uploaded to the wiki page you created for your database, by midnight Monday/Tuesday, even if your group is scheduled to present on Thursday.'''''
 
* '''''Your PowerPoint slides must be uploaded to the wiki page you created for your database, by midnight Monday/Tuesday, even if your group is scheduled to present on Thursday.'''''
 
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
 
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
* Your presentation (both the slides and the oral presentation) will be evalutated by the instructors using the [[Media:PresentationCritiques.pdf‎ | guidelines shown here]].
+
* Your presentation (both the slides and the oral presentation) will be evalutated by the instructors using the [[Some_Topics_to_Consider_When_Critiquing_Talks | guidelines shown here]].
 
* Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:
 
* Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:
 
*# What is the speaker's take-home message (one short sentence)?
 
*# What is the speaker's take-home message (one short sentence)?
Line 125: Line 133:
 
=== Reflect ===
 
=== Reflect ===
  
''After'' completing the both exercises, answer the following questions on the shared [[Class Journal Week 5]] page:
+
The following is a list of core competencies for ''scientific data literacy''.  ''After'' completing the all of the exercises in this assignment, answer the following questions on the shared [[Class Journal Week 5]] page:
 +
# Which of these core competencies (if any) were you familiar with ''before'' taking this class?  How did you become familiar with them?
 +
# Which of these core competencies (if any) did you gain a deeper understanding of by doing this exercise?  What about the exercise taught you about them?
 +
# Which of these core competencies (if any) do you want to know more about?  Why?
 +
 
 +
==== Scientific Data Literacy Core Competencies ====
 +
 
 +
# Databases and Data Formats
 +
#* Understand how to query relational databases, and be familiar with data types and formats for the discipline.
 +
# Discovery and Acquisition of Data
 +
#* Locate and utilize disciplinary data repositories, and identify appropriate data sources
 +
# Data Management and Organization
 +
#* Understand the lifecycle of data, and use data management plans to track subsets of processed data.
 +
# Data Conversion and Interoperability
 +
#* Migrate data from one format to another, and understand the benefits of standard data formats.
 +
# Quality Assurance
 +
#* Use metadata and screening procedures to recognize artifacts, incompletion, or corruption of data sets.
 +
# Metadata
 +
#* Interpret metadata from external sources, and annotate data so it can be used by external users.
 +
# Data Curation and Re-use
 +
#* Recognize the role of curation throughout the data lifecycle in its value in effective reuse of data.
 +
# Cultures of Practice
 +
#* Know the practices, values, and norms of discipline as they relate to managing, sharing, and curating data.
 +
# Data Preservation
 +
#* Understand the technology, resource, and organizational components of preserving data.
 +
# Data Analysis
 +
#* Understand the basic analysis tools of their discipline including workflow management tools.
 +
# Data Visualization
 +
#* Use visualization tools of discipline, and understand the advantages of the different types of visualization.
 +
# Ethics, including citation of data
 +
#* Understand intellectual property, privacy, and the ethos of the discipline around sharing and citing data.
 +
 
 +
 
 +
<!--Old questions for this exercise
 
# What was the most beneficial aspect of working with a buddy on this assignment (other than what you answered last week)?
 
# What was the most beneficial aspect of working with a buddy on this assignment (other than what you answered last week)?
 
# What was the most challenging aspect of working with a buddy on this assignment (other than what you answered last week)?
 
# What was the most challenging aspect of working with a buddy on this assignment (other than what you answered last week)?
 
# What was most interesting to you in this week's exercise (SWISS-PROT/UniProt or NAR)?  Why?
 
# What was most interesting to you in this week's exercise (SWISS-PROT/UniProt or NAR)?  Why?
 
# What was least interesting?  Why?
 
# What was least interesting?  Why?
 
<!--
 
Databases and Data Formats
 
Understand how to query relational databases, and be familiar with data types and formats for the discipline.
 
 
• In Phase 2 of the course, students learn how to query the PostgreSQL relational database from the command line, practicing with the Netflix movie database (fields, records, keys, select queries)
 
• In Phase 3 of the course, students use XMLPipeDB to load data into a PostgreSQL database and then export it to an MS Access formatted database.  They use queries of the PostgreSQL database for quality assurance.
 
• Data types:  UniProt XML, GOA (tab-delimited text), GO XML, microarray data (numerical, tab-delimited text)
 
• Formats: .txt, .xml, .zip, .doc, .pdf, .jpg, .gdb, .gex, .mapp, .xls, .exe, .jar (maybe more)
 
 
Discovery and Acquisition of Data
 
Locate and utilize disciplinary data repositories, and identify appropriate data sources
 
• Phase 2: do journal club and database exploration exercise based on a biological database from the Nucleic Acids Research annual database issue from the previous January.
 
• Phase 3:  microarray databases (ArrayExpress, GEO, Stanford Microarray Database), Integr8, UniProt, GO
 
 
Data Management and Organization
 
Understand the lifecycle of data, and use data management plans to track subsets of processed data.
 
• Phase 3, need to track versioning of original data sources for database (UniProt XML, GO, GOA), need to track versioning of processed microarray data.  Groups are asked to come up with a file naming convention and create a page on their group wiki for the files to be stored and linked to.
 
 
Data Conversion and Interoperability
 
Migrate data from one format to another, and understand the benefits of standard data formats.
 
• This is what XMLPipeDB is for!
 
 
Quality Assurance
 
Use metadata and screening procedures to recognize artifacts, incompletion, or corruption of data sets.
 
• Phase 3:  One team member is designated the quality assurance officer and is responsible for showing that all of the data from the input files was found in the PostgreSQL intermediate database and then in the GenMAPP Gene Database.  S/he uses XMLPipeDB match on the command line for the XML data, queries of PostgreSQL database, and visual inspection of MS Access-formatted .gdb.  The QA person also has to compare the data to an outside resource (not UniProt, usually the model organism database) and determine what was in this other resource that was not in UniProt and vice versa.  They can also compare to what was on the microarray.
 
 
Metadata
 
Interpret metadata from external sources, and annotate data so it can be used by external users.
 
• Phase 2:  when students do the NAR exercise, they might be evaluating some of this, but could be more explicit.
 
• We don’t do annotation, but we talk about it; there is probably some way we could fit it in.
 
 
Data Curation and Re-use
 
Recognize the role of curation throughout the data lifecycle in its value in effective reuse of data.
 
• Phase 2:  when students do the NAR exercise, they evaluate this for the database they report on
 
• Phase 3:  inevitably, we encounter problems when trying to use the microarray data, we could foreground this there
 
 
Cultures of Practice
 
Know the practices, values, and norms of discipline as they relate to managing, sharing, and curating data.
 
• Phase 2:  introduced when introduced to the NAR exercise
 
• Could be made more explicit than just telling them about it.
 
 
Data Preservation
 
Understand the technology, resource, and organizational components of preserving data.
 
• Introduced a little with the NAR exercise, could be made more explicit
 
 
Data Analysis
 
Understand the basic analysis tools of their discipline including workflow management tools.
 
• It would be great to add some workflow management tools, right now it is in just a simple diagram for the flow of the overall project (see below).  It is badly needed for the microarray analysis portion.
 
 
 
Data Visualization
 
Use visualization tools of discipline, and understand the advantages of the different types of visualization.
 
• GenMAPP and MAPPFinder are used to visualize the microarray data.  Could possibly show some other methods with the microarray data, but things are really jam-packed already.
 
 
Ethics, including citation of data
 
Understand intellectual property, privacy, and the ethos of the discipline around sharing and citing data.
 
• Phase 2/3:  They are introduced to open source licensing and we ask them to evaluate this in the NAR exercise.  We could really use a case study here like the one I developed for the Hwang stem cell fraud case.
 
 
-->
 
-->

Latest revision as of 02:36, 21 September 2013

The individual journal entry (UniProt exercise) is due on Friday, September 27, at midnight PDT. (Thursday night/Friday morning)

The shared journal entry, database wiki page, and PowerPoint slides for your presentation are due on Tuesday, October 1, at midnight PDT. (Monday night/Tuesday morning)

A note on the grading for this assignment:

  • The individual journal entry and shared journal entries are worth a total of 10 points. Students will be graded on an individual basis for this portion of the assignment.
  • The database wiki page is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.
  • The presentation is worth a total of 20 points; each member of the group will receive the same grade for this portion of the assignment.

Individual Journal Assignment

  • Store this journal entry as "username Week 5" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.

UniProt Exercise

For this exercise, you will read and follow the links in Chapter 4: Using Protein and Specialized Sequence Databases of the book Bioinformatics for Dummies (on MyLMU Connect). We are delving quite deeply into UniProt in particular because the gene databases that you will generate later in the semester in your final project are going to be derived from UniProt.

For this assignment, you will keep an electronic laboratory notebook on your individual journal page that records the steps you carried out in exploring the UniProtKB.

  • Since the publication of this book in 2003, the SWISS-PROT database has become the UniProt Knowledgebase. The underlying data are the same, but the scope and user interface for the database have been updated. Thus, some of the exact instructions of the chapter have to be changed to reflect the change to UniProt. These changes are noted below by page number.
  • Page 123:
    1. The URL for the UniProtKB/SWISS-PROT server is now http://www.uniprot.org.
    2. The Quick Search field is now found at the middle top of the page.
    • The information described in subsequent pages can all be found, but will be in a different order on the page. There is a set of navigation links near the top of the page to help you jump to each section.
  • General information about the entry (bottom of page 123):
    • This information is found under the header "Entry information" and is near the bottom of the web page, instead of the top.
  • Name and origin of the protein (page 124) is near the top of the page.
  • The References (page 126) are near the middle of the page.
  • The Comments section (page 126) is now known as "General annotation (comments)".
  • The Cross-References (page 128) are even more extensive and are organized by sub-categories of databases.
    • In particular, click on a sample cross-reference link for each of the following databases, and for each, state what type of information is found there:
      • EMBL
      • InterPro
      • PDB
      • Pfam
      • RefSeq
      • GeneID
  • The Keywords (page 130) are now found listed under "Ontologies".
  • The Features (page 131) are now listed as "Sequence annotation (Features)".
  • In the section "Finding Out More about Your Protein" (pages 135-139), some of the databases are defunct, highlighting how biological databases are a moving target (this book was first published in 2003).
  • A new feature of the UniProt interface is that you can view the data in several different formats. Click on the buttons on the top-right of the page to view the data as:
    • TXT: flat file text data, the original format of the SWISS-PROT data (even before it was put in a relational database)
    • XML: text data structured with tags (like you practiced with for last week's assignment)
    • RDF/XML: a semantic web format
    • GFF: a specialized format for genomic information
    • FASTA: a basic text format for sequence information
  • Write a one-paragraph summary of what you have learned about the human EGFR protein from this exercise.
  • Reflect and answer the following questions on your individual journal page:
    1. What was the purpose of this exercise?
    2. What did I learn from this exercise?
    3. What did I not understand (yet) about this exercise?
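
To make the FASTA format listed above more concrete, here is a minimal Python sketch of how such a file can be read: a line starting with ">" is a header, and the lines that follow it are pieces of one sequence. The record below is invented for illustration (the accession X00000, the entry name DEMO_HUMAN, and the sequence are all hypothetical and are not the real EGFR entry); the function name parse_fasta is likewise an assumption, not part of any UniProt tool.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, made up for this example only.
SAMPLE_FASTA = """>sp|X00000|DEMO_HUMAN Hypothetical demo protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(SAMPLE_FASTA)
for header, sequence in records:
    print(header, len(sequence))
```

To try this on a real entry, you could paste the text shown by the FASTA button on a UniProt entry page in place of SAMPLE_FASTA.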

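Additional UniProt Resources

  • UniProt NAR Database Issue 2013 article (http://nar.oxfordjournals.org/content/41/D1/D43.full)
  • EBI Training: UniProt Quick Tour (http://www.ebi.ac.uk/training/online/course/uniprot-quick-tour)
  • UniProt demo from UniProt itself (http://www.uniprot.org/demos/diabetes)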

NAR Exercise and Presentation

Each year, the journal Nucleic Acids Research (NAR) devotes the first issue in January to biological databases. The goal of this assignment is to dive into the deep end of the pool and experience the breadth and depth of biological databases available on the Web:

For this exercise, you will work with an assigned buddy. Choose a database from this issue and answer the following questions about that database. Each pair should choose a different database to profile. So, to claim your first choice, go to the Class Journal Week 5 page and stake your claim to a database. When you are choosing your database, look at the other students' entries to make sure you are not doing the same one. The buddy assignments are:

  • Hilda - Mitchell
  • Kurt - Kevin Meilek
  • Lena - Miles - Tauras
  • Viktoria - Kevin McGee
  • Gabriel - Katrina
  • Stephen - Alina
  • Lauren - Dillon

Database Wiki Page

For your assignment, create a new wiki page to profile your database. There will be one page per group; both partners will contribute to the same page.

  • Link to your database page from the Class Journal Week 5 page. These pages will be a resource for the class as we move forward with this unit of the course.
  • Link to your database page from your user page.
  • Link from your database page to the Class Journal Week 5 page.
  • Link from your database page to your user pages.

Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. When you answer the questions below, provide a hyperlink to the page that you got the information from.

  1. What database did you access? (link to the home page of the database)
  2. What is the purpose of the database?
  3. What biological information does it contain?
  4. What species are covered in the database?
  5. What biological questions can it be used to answer?
  6. What type (or types) of database is it (sequence, structure, model organism, or specialty [what?]; primary or “meta”; curated electronically, manually [in-house], manually [community])?
  7. What individual or organization maintains the database?
  8. What are their funding sources?
  9. Is there a license agreement or any restrictions on access to the database?
  10. How often is the database updated? When was the last update?
  11. Are there links to other databases?
  12. Can the information be downloaded?
    • In what file formats?
  13. Evaluate the “user-friendliness” of the database.
    • Is the Web site well-organized?
    • Does it have a help section or tutorial?
    • Run a sample query. Do the results make sense?

Some Definitions

  • Electronic curation occurs when someone writes a program to add information to a database record from another database.
  • Manual curation occurs when a human reviews the information being added to a record to validate it as true.
    • In-house is when the human works for the database organization.
    • Community is when the database allows members of the scientific community who don't work for the database organization to add information to the record.
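
As a rough illustration of the electronic curation defined above, the following Python sketch shows a program copying annotations into a record from another database without any human review. Everything in it is hypothetical: the function name electronically_curate, the dictionary layout, and the accession X00001 are assumptions made for the example and do not describe how any real database implements curation.

```python
def electronically_curate(record, source_db):
    """Copy annotations for this record's accession from another database.

    Fields already present on the record are left untouched; no human
    checks the copied values, which is what makes this "electronic".
    """
    updated = dict(record)  # work on a copy so the input is not modified
    extra = source_db.get(record["accession"], {})
    for field, value in extra.items():
        updated.setdefault(field, value)
    updated["curation"] = "electronic"
    return updated


# Hypothetical "other database", keyed by accession.
other_database = {
    "X00001": {"keywords": ["kinase"], "name": "Name from other database"},
}
record = {"accession": "X00001", "name": "Demo kinase"}

curated = electronically_curate(record, other_database)
print(curated)
```

Manual curation would add a step in which a person reviews each copied value before it is accepted into the record.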

PowerPoint Presentation

Each group will prepare and give a 10-15 minute PowerPoint presentation based on their chosen database.

  • Four groups will present on Tuesday, 10/1, and three groups will present on Thursday, 10/3. The order of presentations will be determined in class on Thursday, 9/26.
  • Please follow the Presentation Guidelines for how to format your slides.
  • You will need to prepare ~10-15 slides (assume 1 slide per minute of presentation).
  • You need to present the information you gathered about your database that you listed in your wiki above, but organized as a presentation.
  • You may give a live demo of the database if you wish, but practice carefully so that you can do the presentation in 15 minutes.
    • Alternatively, you may choose to show screen shots instead of the live demo.
  • Your PowerPoint slides must be uploaded to the wiki page you created for your database, by midnight Monday/Tuesday, even if your group is scheduled to present on Thursday.
    • You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.
  • Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the guidelines shown here.
  • Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:
    1. What is the speaker's take-home message (one short sentence)?
    2. What are the best points about the presentation's content, organization, clarity of visuals, and presentation style? Please give at least 2 specific examples.
    3. What points need improvement? How would you improve them? Please give at least 2 specific examples.

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 5 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

Reflect

The following is a list of core competencies for scientific data literacy. After completing all of the exercises in this assignment, answer the following questions on the shared Class Journal Week 5 page:

  1. Which of these core competencies (if any) were you familiar with before taking this class? How did you become familiar with them?
  2. Which of these core competencies (if any) did you gain a deeper understanding of by doing this exercise? What aspect of the exercise taught you about them?
  3. Which of these core competencies (if any) do you want to know more about? Why?

Scientific Data Literacy Core Competencies

  1. Databases and Data Formats
    • Understand how to query relational databases, and be familiar with data types and formats for the discipline.
  2. Discovery and Acquisition of Data
    • Locate and utilize disciplinary data repositories, and identify appropriate data sources.
  3. Data Management and Organization
    • Understand the lifecycle of data, and use data management plans to track subsets of processed data.
  4. Data Conversion and Interoperability
    • Migrate data from one format to another, and understand the benefits of standard data formats.
  5. Quality Assurance
    • Use metadata and screening procedures to recognize artifacts, incompletion, or corruption of data sets.
  6. Metadata
    • Interpret metadata from external sources, and annotate data so it can be used by external users.
  7. Data Curation and Re-use
    • Recognize the role of curation throughout the data lifecycle and its value in the effective reuse of data.
  8. Cultures of Practice
    • Know the practices, values, and norms of the discipline as they relate to managing, sharing, and curating data.
  9. Data Preservation
    • Understand the technology, resource, and organizational components of preserving data.
  10. Data Analysis
    • Understand the basic analysis tools of the discipline, including workflow management tools.
  11. Data Visualization
    • Use visualization tools of the discipline, and understand the advantages of the different types of visualization.
  12. Ethics, including citation of data
    • Understand intellectual property, privacy, and the ethos of the discipline around sharing and citing data.
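
For the first competency in the list above (querying relational databases), a small sketch may help show what such a query looks like in practice. It uses SQLite only because it ships with Python; this is a substitution for convenience and is not meant to imply which database engine any particular resource uses. The table name protein, its columns, and all rows are hypothetical.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT, species TEXT)"
)
# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [
        ("X00001", "Demo kinase", "Homo sapiens"),
        ("X00002", "Demo receptor", "Homo sapiens"),
        ("X00003", "Demo transporter", "Saccharomyces cerevisiae"),
    ],
)

# A select query: choose fields (columns) from records (rows) matching a condition.
rows = conn.execute(
    "SELECT accession, name FROM protein WHERE species = ? ORDER BY accession",
    ("Homo sapiens",),
).fetchall()
print(rows)
conn.close()
```

The accession column acts as the primary key here, meaning each record can be identified uniquely by it.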

