How to Read JSON Files

From LMU BioDB 2017
Revision as of 19:46, 24 October 2017 by Dondi

To this point, you have been working with what are called “plain text” files and information — that is, information that is viewed as a simple sequence of symbols or characters (letters, numbers, punctuation, spaces, etc.), without any additional structure.

There are, however, other text “formats” that impose a structure on the data they contain. One such format is JSON (short for JavaScript Object Notation), and this page introduces you to reading it.

Overall Concept

The core idea behind JSON data is that the information inside it can be thought of as an outline or tree. Our own wiki pages have outlines, in the form of either sections or bulleted lists:

  • Level 1, item 1
    • Level 2, item 1 (of level 1, item 1)
    • Level 2, item 2 (of level 1, item 1)
  • Level 1, item 2
  • Level 1, item 3
    • Level 2, item 1 (of level 1, item 3)
    • Level 2, item 2 (of level 1, item 3)
    • Level 2, item 3 (of level 1, item 3)
    • Level 2, item 4 (of level 1, item 3)

JSON also captures an outline; it just looks different. Here’s an example:

{
  "organism": {
    "key": "2",
    "name": {
      "type": "scientific",
      "text": "Vibrio cholerae"
    },
    "dbReference": {
      "type": "NCBI Taxonomy",
      "key": "3",
      "id": "666"
    },
    "lineage": [
      { "taxon": "Bacteria" },
      { "taxon": "Proteobacteria" },
      { "taxon": "Gammaproteobacteria" },
      { "taxon": "Vibrionales" },
      { "taxon": "Vibrionaceae" },
      { "taxon": "Vibrio" }
    ]
  }
}

This piece of JSON breaks down, roughly, to this outline:

  • The JSON is for a single object, represented by braces { }, that has one property, organism, which itself is an object
    • The organism object has a key property whose value is "2"
    • The name property is another object whose type is "scientific" and text is "Vibrio cholerae"
    • The dbReference property is an object whose type is "NCBI Taxonomy", key is "3", and id is "666"
    • The lineage property is a list of objects, indicated by brackets [ ] rather than braces { }, where each object has a single taxon property…
      • taxon: "Bacteria"
      • taxon: "Proteobacteria"
      • taxon: "Gammaproteobacteria"
      • taxon: "Vibrionales"
      • taxon: "Vibrionaceae"
      • taxon: "Vibrio"
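
One way to see that the JSON and the outline carry the same information is to have a program produce the outline. The following is a minimal sketch, assuming Python 3 and its standard json module (this page does not prescribe a language); the function name outline is hypothetical, chosen only for this illustration. The JSON text inside it writes every property name in double quotes, which strict JSON requires.

```python
import json

# The organism example, with every property name in double quotes
# so that it is strict JSON.
EXAMPLE = """
{
  "organism": {
    "key": "2",
    "name": {"type": "scientific", "text": "Vibrio cholerae"},
    "dbReference": {"type": "NCBI Taxonomy", "key": "3", "id": "666"},
    "lineage": [
      {"taxon": "Bacteria"},
      {"taxon": "Proteobacteria"},
      {"taxon": "Gammaproteobacteria"},
      {"taxon": "Vibrionales"},
      {"taxon": "Vibrionaceae"},
      {"taxon": "Vibrio"}
    ]
  }
}
"""


def outline(value, depth=0):
    """Turn a parsed JSON value into a list of indented outline lines.

    - An object (dict) gives one line per property; a property whose value
      is itself an object or list gets its own nested level.
    - A list gives one entry per element, at the list's own level.
    - Plain values (strings, numbers, true/false/null) are written back
      out in JSON notation, e.g. "Bacteria" with its quotes.
    """
    indent = "  " * depth
    lines = []
    if isinstance(value, dict):
        for name, child in value.items():
            if isinstance(child, (dict, list)):
                lines.append(f"{indent}- {name}")
                lines.extend(outline(child, depth + 1))
            else:
                lines.append(f"{indent}- {name}: {json.dumps(child)}")
    elif isinstance(value, list):
        for child in value:
            if isinstance(child, (dict, list)):
                lines.extend(outline(child, depth))
            else:
                lines.append(f"{indent}- {json.dumps(child)}")
    else:
        lines.append(f"{indent}- {json.dumps(value)}")
    return lines


data = json.loads(EXAMPLE)  # JSON text -> nested dicts and lists
print("\n".join(outline(data)))
```

Run as-is, it prints an outline with the same shape as the bulleted one: it starts with "- organism", then "  - key: "2"", then "  - name", and ends with the six taxon lines nested under "  - lineage".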

Even now, you might already be seeing a pattern in how the JSON looks and what outline it represents. That is one of the intentions of JSON: it is meant to strike a balance between human readability and machine readability. The “human readability” part shows up in recognizable words (“name,” “lineage,” “taxon”), while the “machine readability” part comes from a small set of special symbols (braces, brackets, colons, commas, and double quotes) and strict rules about where they may appear.

Specific Parts

A JSON file consists of three primary parts, each expressed in a very specific manner.

Objects

Objects represent distinct, self-contained items or records of data. An object begins with a left brace { followed by its properties: name/value pairs, where each name is separated from its value by a colon and the pairs are separated from one another by commas. Humans can usually figure out where a piece of data starts and ends, but computers need more help, so every opening { must have a matching closing }.
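
To illustrate, here is a small sketch assuming Python 3's standard json module (our choice for illustration, not something this page requires). A JSON object is read into a Python dictionary whose properties can be looked up by name, and text with a missing closing brace is rejected with an error rather than guessed at. The sample text is a trimmed piece of the organism example; the helper name try_parse is hypothetical.

```python
import json

# A complete object: the opening { has its matching }.
complete = '{"type": "scientific", "text": "Vibrio cholerae"}'
name = json.loads(complete)  # a JSON object becomes a Python dict
print(name["text"])          # look up a property by its name

# The same object with its closing brace missing.
unfinished = '{"type": "scientific", "text": "Vibrio cholerae"'


def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where the parser gave up.
        print(f"Not valid JSON: {err.msg} (position {err.pos})")
        return None


print(try_parse(unfinished))  # reports the error, then prints None
```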

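Computers ignore spaces and line breaks between the parts of JSON (spaces inside quoted text, such as the one in "Vibrio cholerae", do count), but humans can read JSON a lot more easily when it is laid out well. For human consumption, an object’s opening brace typically ends a line and its closing brace sits on a line of its own (as above), with the properties indented a couple of spaces further than the braces.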
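
The sketch below, again assuming Python 3's standard json module, writes the same object twice: once with no extra spacing at all, and once laid out for people with two-space indentation. Reading either version back gives identical data, which is the sense in which computers do not care about the spacing.

```python
import json

record = {"type": "NCBI Taxonomy", "key": "3", "id": "666"}

# Machine-oriented: no spaces or line breaks between the parts.
compact = json.dumps(record, separators=(",", ":"))

# Human-oriented: each property on its own line, indented two spaces,
# with the closing brace on a line of its own.
readable = json.dumps(record, indent=2)

print(compact)
print(readable)

# The layout differs, but the data read back is the same.
assert json.loads(compact) == json.loads(readable) == record
```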

Lists

Lists or arrays represent collections of objects