HTML microdata allows machine-readable data to be embedded in HTML documents in the form of nested groups of name-value pairs.

Schema.org provides a collection of shared vocabularies in microdata format that webmasters can use to mark up their pages in ways that can be understood by the major search engines.

Schema.org has a generic Table item. JSON-stat proposes a specific type of table for statistics based on the JSON-stat vocabulary: the StatisticalTable.

This specification is in its early stages.

Are you looking for the JSON-stat JSON Schema? Please visit the JSON-stat format section.

StatisticalTable Documentation

A statistical HTML table (Website)

This could be a table in a national statistical office website:

Population in Tuvalu in 2002. By sex
Sex Persons
Men 4729
Women 4832
Total 9561

This format is mainly aimed at humans.

A JSON-stat dataset response (API)

This is the same information expressed in the JSON-stat format:

{
   "version" : "2.0",
   "class" : "dataset",
   "label" : "Population in Tuvalu in 2002. By sex",
   "value" : [4729, 4832, 9561],
   "id" : ["metric", "time", "geo", "sex"],
   "size" : [1, 1, 1, 3],
   "dimension" : {
      "sex" : {
         "label" : "sex",
         "category" : {
            "index" : {
              "M" : 0,
              "F" : 1,
              "T" : 2
            },
            "label" : {
              "M" : "men",
              "F" : "women",
              "T" : "total"
            }
         }
      }
   }
}

It is aimed at machines.
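
As a minimal sketch of how a client might read the response above, the following Python snippet pairs each category label of the sex dimension with its value. It assumes that the three single-category dimensions (metric, time, geo) do not change the position of a value, so the category index can be used directly as the offset in the value array. The helper name sex_breakdown is hypothetical and not part of JSON-stat.

```python
import json

# The response shown above, reproduced as a string for this sketch.
RESPONSE = """
{
   "version" : "2.0",
   "class" : "dataset",
   "label" : "Population in Tuvalu in 2002. By sex",
   "value" : [4729, 4832, 9561],
   "id" : ["metric", "time", "geo", "sex"],
   "size" : [1, 1, 1, 3],
   "dimension" : {
      "sex" : {
         "label" : "sex",
         "category" : {
            "index" : { "M" : 0, "F" : 1, "T" : 2 },
            "label" : { "M" : "men", "F" : "women", "T" : "total" }
         }
      }
   }
}
"""


def sex_breakdown(dataset):
    """Map each category label of the sex dimension to its value.

    Hypothetical helper: it relies on every other dimension having
    size 1, so the category index is also the offset in "value".
    """
    category = dataset["dimension"]["sex"]["category"]
    return {
        category["label"][code]: dataset["value"][position]
        for code, position in category["index"].items()
    }


dataset = json.loads(RESPONSE)
print(sex_breakdown(dataset))  # {'men': 4729, 'women': 4832, 'total': 9561}
```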

The StatisticalTable schema (proposal) tries to reconcile both views: it’s a microdata schema (see https://schema.org) that allows you to enrich your HTML tables with JSON-stat semantics.

The next section shows the actual HTML code of the table shown above.

Simple semantically-enriched HTML statistical table

This is how the JSON-stat semantics could be embedded in a webpage using the StatisticalTable schema:

<table itemscope itemtype="https://schema.org/Table/Stats">
   <caption itemprop="label">
      <span itemprop="metric">Population</span> in
      <span itemprop="geo" itemscope itemtype="https://schema.org/Place">
         <span itemprop="name">Tuvalu</span>
         <meta itemprop="latitude" content="-8.516667" />
         <meta itemprop="longitude" content="179.216667" />
      </span>
      in <time datetime="2002-12-31">2002</time>. By sex
   </caption>
   <thead>
      <tr>
         <th itemprop="dimension">Sex</th>
         <th itemprop="units">Persons</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <th>Men</th>
         <td>4729</td>
      </tr>
      <tr>
         <th>Women</th>
         <td>4832</td>
      </tr>
      <tr>
         <th>Total</th>
         <td>9561</td>
      </tr>
   </tbody>
</table>

In the example, no standard name is provided for dimensions (sex) and categories (men, women, total). A web-to-JSON converter should be able to parse the HTML code and create on-the-fly names (dim1) and indexes (0, 1, 2) for dimensions and categories.
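
What such a converter might look like can be sketched with Python's standard html.parser module. This is only an illustration under several assumptions: it handles a single row dimension, ignores the caption and its metric, geo and time information, and expects every data cell to hold an integer. The names StatisticalTableParser and table_to_dataset are hypothetical and not part of the proposal.

```python
from html.parser import HTMLParser


class StatisticalTableParser(HTMLParser):
    """Collect row headers and data cells from the body of a table."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.current = None    # tag of the cell being read: "th" or "td"
        self.categories = []   # row header texts, in document order
        self.values = []       # data cell values, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif self.in_tbody and tag in ("th", "td"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag in ("th", "td"):
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return
        if self.current == "th":
            self.categories.append(text)
        else:
            self.values.append(int(text))  # assumption: integer cells only


def table_to_dataset(html):
    """Build a minimal JSON-stat-like dataset from a one-dimension table."""
    parser = StatisticalTableParser()
    parser.feed(html)
    # No standard codes in the markup: invent a dimension name ("dim1")
    # and use row positions as category identifiers.
    ids = [str(i) for i in range(len(parser.categories))]
    return {
        "version": "2.0",
        "class": "dataset",
        "value": parser.values,
        "id": ["dim1"],
        "size": [len(ids)],
        "dimension": {
            "dim1": {
                "category": {
                    "index": ids,
                    "label": dict(zip(ids, parser.categories)),
                }
            }
        },
    }


# A trimmed copy of the table above (caption omitted).
SAMPLE = """
<table itemscope itemtype="https://schema.org/Table/Stats">
   <thead>
      <tr><th itemprop="dimension">Sex</th><th itemprop="units">Persons</th></tr>
   </thead>
   <tbody>
      <tr><th>Men</th><td>4729</td></tr>
      <tr><th>Women</th><td>4832</td></tr>
      <tr><th>Total</th><td>9561</td></tr>
   </tbody>
</table>
"""

dataset = table_to_dataset(SAMPLE)
print(dataset["value"])  # [4729, 4832, 9561]
```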

Of course, for compatibility between different responses, and to benefit from existing classifications, it is better to embed standard codes in the table (even though they might not be displayed on the webpage). The next section shows how this can be done.

If the label itemprop is not present, the caption tag should probably be assumed to contain such information.

Semantically-enriched HTML statistical table with standard codes

The itemid attribute is the general mechanism for attaching a code to an element (for example, the classification used in the categories of the geo dimension).

<table itemscope itemtype="https://schema.org/Table/Stats">
   <caption itemprop="label">
      <span itemprop="metric">Population</span> in
      <span itemprop="geo" itemscope itemtype="https://schema.org/Place">
         <span itemprop="name">Tuvalu</span>
         <meta itemprop="latitude" content="-8.516667" />
         <meta itemprop="longitude" content="179.216667" />
      </span>
      in <time datetime="2002-12-31">2002</time>. By sex
   </caption>
   <thead>
      <tr>
         <th
             itemprop="dimension"
             itemid="https://jsonstat.dataprovider.org/dimension/sex.json"
         >
            Sex
         </th>
         <th itemprop="units">Persons</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <th itemid="https://jsonstat.dataprovider.org/dimension/sex.json#M">
            Men
         </th>
         <td>4729</td>
      </tr>
      <tr>
         <th itemid="https://jsonstat.dataprovider.org/dimension/sex.json#F">
            Women
         </th>
         <td>4832</td>
      </tr>
      <tr>
         <th itemid="https://jsonstat.dataprovider.org/dimension/sex.json#T">
            Total
         </th>
         <td>9561</td>
      </tr>
   </tbody>
</table>
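
A converter could recover the codes from these itemid values along the following lines. The sketch assumes that the URL fragment (the part after #) is the category code and that the rest of the URL identifies the classification. This matches the example above but is not stated as a rule by the proposal. The helper name split_itemid is hypothetical.

```python
from urllib.parse import urldefrag


def split_itemid(itemid):
    """Return (classification URL, category code or None) for an itemid."""
    url, fragment = urldefrag(itemid)
    return url, fragment or None


# itemid values of the row headers, in document order.
ROW_ITEMIDS = [
    "https://jsonstat.dataprovider.org/dimension/sex.json#M",
    "https://jsonstat.dataprovider.org/dimension/sex.json#F",
    "https://jsonstat.dataprovider.org/dimension/sex.json#T",
]

# Category index in the shape used by the JSON-stat response: code -> position.
index = {split_itemid(itemid)[1]: pos for pos, itemid in enumerate(ROW_ITEMIDS)}
print(index)  # {'M': 0, 'F': 1, 'T': 2}
```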

Several dimensions and order

JSON-stat uses a single array for data dissemination. When a table has only one dimension, this array is built by parsing the data cells sequentially. When more than one dimension is present, the data array is built using the same procedure, but an order has to be attached to the dimensions to make sense of the data. The meta tag is used for that purpose.

<th
   itemprop="dimension"
   itemid="https://jsonstat.dataprovider.org/dimension/sex.json"
>
   <meta itemprop="index" content="3" />
   Sex
</th>
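
To illustrate why the order matters, the sketch below sorts dimensions by their index value and computes the offset of a cell in the value array. JSON-stat stores values in row-major order, so the last dimension in id varies fastest. Only the index of sex (3) comes from the example above. The index values given to metric, time and geo are assumptions chosen to match the id array of the JSON-stat response, and the function name flat_index is hypothetical.

```python
def flat_index(position, size):
    """Row-major offset of a cell.

    position: category position in each dimension, in "id" order.
    size: number of categories in each dimension, in "id" order.
    """
    offset = 0
    for pos, n in zip(position, size):
        if not 0 <= pos < n:
            raise IndexError(f"position {pos} out of range for size {n}")
        offset = offset * n + pos
    return offset


# Dimensions as they might be discovered in the markup, with the value of
# their <meta itemprop="index"> tag (assumed, except for "sex").
found = {"sex": 3, "geo": 2, "metric": 0, "time": 1}
ids = sorted(found, key=found.get)
print(ids)  # ['metric', 'time', 'geo', 'sex']

size = [1, 1, 1, 3]
value = [4729, 4832, 9561]

# Women is the second category (position 1) of the sex dimension.
print(value[flat_index([0, 0, 0, 1], size)])  # 4832
```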

Use cases

Use cases of the StatisticalTable schema can probably be divided into two broad groups according to the type of direct user: humans and agents. Humans can benefit directly from the schema by using StatisticalTable-aware browsers (and agents in general). Browsers can support the StatisticalTable schema natively or via extensions and bookmarklets. Agents (for example, web crawlers) can benefit directly by making sense of webpage content and transforming it. Some obvious cases that come to mind are:

  1. Users visiting a statistical website could use their browsers to dynamically build a data visualization based on tabular information displayed on the webpage. For example, a bookmarklet or extension could parse the StatisticalTable microdata, notice that a table includes several time series, and draw a line chart to help make sense of the raw figures.
  2. Users visiting a statistical website could use their browsers to import table contents and transform them into several formats (not necessarily tabular ones), as the browsers would be able to derive meaning from the data (this is space, this is time, this is a classification variable, etc.).
  3. Search engines would be able to answer questions like “How many women live in Tuvalu?”
  4. An agent could act as a middleman transforming webpage requests into API responses for the benefit, for example, of a third party application.