Retrieve Oxford Dictionaries API Dictionary Data as XML with XQuery and BaseX

My notes from the Vanderbilt University XQuery Working Group

In this session, we retrieved dictionary data (definitions, example sentences, and pronunciations) from the Oxford Dictionaries application programming interface (API) and returned it as Extensible Markup Language (XML) using XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial walks through how to do the same.

The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.

Retrieve dictionary data

Using XQuery to retrieve dictionary entry data as XML from the Oxford Dictionaries API is simple, and we can get started with the code template od-api.xquery. First, we import an XQuery dictionary library module, od-api-basex.xquery, and assign it the namespace od-api. Second, we replace myId and myKey with our own Oxford Dictionaries API credentials. Third, we set options, such as $source-lang and $dictionary-filters, following the “Dictionary entries” section of the Oxford Dictionaries API documentation, to retrieve dictionary data, such as definitions, example sentences, and pronunciations. Finally, we request dictionary data by calling $dictionary(). Sample XQuery code for retrieving dictionary data as XML for the word ‘ace’ follows:

xquery version "3.1" encoding "UTF-8";

import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";

let $id := "myId"
let $key := "myKey"

let $source-lang := "en"
let $dictionary-filters := ""

let $dictionary := od-api:dictionary($source-lang, ?, $dictionary-filters, $id, $key)

return $dictionary("ace")

In the XQuery code above, return $dictionary("ace") triggers our request to the Oxford Dictionaries API by way of partial function application. The expression od-api:dictionary($source-lang, ?, $dictionary-filters, $id, $key) fixes every argument except the word; the placeholder ? leaves that argument open, so $dictionary is a function of one argument. Calling $dictionary("ace") supplies the missing value and invokes the library module’s od-api:dictionary() function, which sends the actual request. The library module automatically normalizes the argument of $dictionary() by replacing spaces with underscores, forcing lower-case characters, and encoding reserved characters. For example, $dictionary("étouffée") sends the word_id parameter %C3%A9touff%C3%A9e to the Oxford Dictionaries API, and $dictionary("Vancouver Island") sends the word_id parameter vancouver_island. Running our XQuery code above with BaseX returns the following result:

<dictionary input="ace" language="en">
  <metadata>
    <provider>Oxford University Press</provider>
    <date>Thu, 16 Feb 2017 18:00:00 GMT</date>
  </metadata>
  <results>
    <result>
      <id>ace</id>
      <language>en</language>
      <type>headword</type>
      <word>ace</word>
      <lexicalEntries>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Adjective</lexicalCategory>
          <text>ace</text>
          <pronunciations>
            <pronunciation>
              <audioFile>http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3</audioFile>
              <dialects>
                <dialect>British English</dialect>
              </dialects>
              <phoneticNotation>IPA</phoneticNotation>
              <phoneticSpelling>eɪs</phoneticSpelling>
            </pronunciation>
          </pronunciations>
          <entries>
            <entry>
              <grammaticalFeatures>
                <grammaticalFeature>
                  <text>Positive</text>
                  <type>Degree</type>
                </grammaticalFeature>
              </grammaticalFeatures>
              <homographNumber>001</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>very good:</definition>
                  </definitions>
                  <examples>
                    <example>
                      <text>Ace! You've done it!</text>
                    </example>
                    <example>
                      <text>an ace swimmer</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.006</id>
                  <registers>
                    <register>informal</register>
                  </registers>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Noun</lexicalCategory>
          <text>ace</text>
          <pronunciations>
            <pronunciation>
              <audioFile>http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3</audioFile>
              <dialects>
                <dialect>British English</dialect>
              </dialects>
              <phoneticNotation>IPA</phoneticNotation>
              <phoneticSpelling>eɪs</phoneticSpelling>
            </pronunciation>
          </pronunciations>
          <entries>
            <entry>
              <etymologies>
                <etymology>Middle English (denoting the ‘one’ on dice): via Old French from Latin as unity, a unit</etymology>
              </etymologies>
              <grammaticalFeatures>
                <grammaticalFeature>
                  <text>Singular</text>
                  <type>Number</type>
                </grammaticalFeature>
              </grammaticalFeatures>
              <homographNumber>000</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>a playing card with a single spot on it, ranked as the highest card in its suit in most card games:</definition>
                  </definitions>
                  <domains>
                    <domain>Cards</domain>
                  </domains>
                  <examples>
                    <example>
                      <registers>
                        <register>figurative</register>
                      </registers>
                      <text>life had started dealing him aces again</text>
                    </example>
                    <example>
                      <text>the ace of diamonds</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.001</id>
                </sense>
                <sense>
                  <definitions>
                    <definition>a person who excels at a particular sport or other activity:</definition>
                  </definitions>
                  <domains>
                    <domain>Sport</domain>
                  </domains>
                  <examples>
                    <example>
                      <text>a motorcycle ace</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.002</id>
                  <registers>
                    <register>informal</register>
                  </registers>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>a pilot who has shot down many enemy aircraft:</definition>
                      </definitions>
                      <domains>
                        <domain>Air Force</domain>
                      </domains>
                      <examples>
                        <example>
                          <text>a Battle of Britain ace</text>
                        </example>
                      </examples>
                      <id>m_en_gb0004640.003</id>
                      <registers>
                        <register>informal</register>
                      </registers>
                    </subsense>
                  </subsenses>
                </sense>
                <sense>
                  <definitions>
                    <definition>(in tennis and similar games) a service that an opponent is unable to return and thus wins a point:</definition>
                  </definitions>
                  <domains>
                    <domain>Tennis</domain>
                  </domains>
                  <examples>
                    <example>
                      <text>Nadal banged down eight aces in the set</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.004</id>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>a hole in one:</definition>
                      </definitions>
                      <domains>
                        <domain>Golf</domain>
                      </domains>
                      <examples>
                        <example>
                          <text>his hole in one at the 15th was Senior's second ace as a professional</text>
                        </example>
                      </examples>
                      <id>m_en_gb0004640.005</id>
                      <registers>
                        <register>informal</register>
                      </registers>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Verb</lexicalCategory>
          <text>ace</text>
          <pronunciations>
            <pronunciation>
              <audioFile>http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3</audioFile>
              <dialects>
                <dialect>British English</dialect>
              </dialects>
              <phoneticNotation>IPA</phoneticNotation>
              <phoneticSpelling>eɪs</phoneticSpelling>
            </pronunciation>
          </pronunciations>
          <entries>
            <entry>
              <grammaticalFeatures>
                <grammaticalFeature>
                  <text>Transitive</text>
                  <type>Subcategorization</type>
                </grammaticalFeature>
                <grammaticalFeature>
                  <text>Present</text>
                  <type>Tense</type>
                </grammaticalFeature>
              </grammaticalFeatures>
              <homographNumber>002</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>(in tennis and similar games) serve an ace against (an opponent):</definition>
                  </definitions>
                  <domains>
                    <domain>Tennis</domain>
                  </domains>
                  <examples>
                    <example>
                      <text>he can ace opponents with serves of no more than 62 mph</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.007</id>
                  <registers>
                    <register>informal</register>
                  </registers>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>score an ace on (a hole) or with (a shot):</definition>
                      </definitions>
                      <domains>
                        <domain>Golf</domain>
                      </domains>
                      <examples>
                        <example>
                          <text>there was a prize for the first player to ace the hole</text>
                        </example>
                      </examples>
                      <id>m_en_gb0004640.008</id>
                    </subsense>
                  </subsenses>
                </sense>
                <sense>
                  <definitions>
                    <definition>achieve high marks in (a test or exam):</definition>
                  </definitions>
                  <examples>
                    <example>
                      <text>I aced my grammar test</text>
                    </example>
                  </examples>
                  <id>m_en_gb0004640.009</id>
                  <regions>
                    <region>North American</region>
                  </regions>
                  <registers>
                    <register>informal</register>
                  </registers>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>outdo someone in a competitive situation:</definition>
                      </definitions>
                      <examples>
                        <example>
                          <text>the magazine won an award, acing out its rivals</text>
                        </example>
                      </examples>
                      <id>m_en_gb0004640.010</id>
                      <regions>
                        <region>North American</region>
                      </regions>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
      </lexicalEntries>
    </result>
  </results>
</dictionary>
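
Before moving on to filters, the partial function application and the word normalization described earlier can be tried in isolation, without credentials or a network connection. The sketch below is only an illustration: local:word-id() and local:dictionary() are hypothetical stand-ins that I made up for this example, and the real od-api-basex.xquery module may implement both steps differently. It uses only standard XQuery 3.1 functions.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for the library module's normalization:
   spaces become underscores, letters are lower-cased, and
   reserved characters are percent-encoded. :)
declare function local:word-id($word as xs:string) as xs:string {
  encode-for-uri(lower-case(translate($word, " ", "_")))
};

(: Hypothetical stand-in for od-api:dictionary(). It only describes
   the request it would make instead of sending one. :)
declare function local:dictionary(
  $source-lang as xs:string,
  $word as xs:string,
  $filters as xs:string
) as element(request) {
  <request language="{ $source-lang }"
           word_id="{ local:word-id($word) }"
           filters="{ $filters }"/>
};

(: The ? placeholder leaves the word open, so $lookup is a
   function that takes exactly one argument. :)
let $lookup := local:dictionary("en", ?, "definitions")
return (
  $lookup("étouffée"),
  $lookup("Vancouver Island")
)
```

The two <request> elements returned carry the word_id values %C3%A9touff%C3%A9e and vancouver_island, matching the examples given earlier. Binding the language and filters once and leaving only the word open makes it easy to look up many words with the same settings.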

Setting a filter with the XQuery code let $dictionary-filters := "lexicalCategory=noun" returns only noun-related data:

<dictionary input="ace" language="en">
  <metadata>
    <provider>Oxford University Press</provider>
    <date>Thu, 16 Feb 2017 18:00:00 GMT</date>
  </metadata>
  <results>
    <result>
      <id>ace</id>
      <language>en</language>
      <type>headword</type>
      <word>ace</word>
      <lexicalEntries>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Noun</lexicalCategory>
          <text>ace</text>
        </lexicalEntry>
      </lexicalEntries>
    </result>
  </results>
</dictionary>

Similarly, we can assign $dictionary-filters a value of definitions, examples, or pronunciations to return only that specific type of data. For example, let $dictionary-filters := "definitions" returns only definition-related data:

<dictionary input="ace" language="en">
  <metadata>
    <provider>Oxford University Press</provider>
    <date>Thu, 16 Feb 2017 18:00:00 GMT</date>
  </metadata>
  <results>
    <result>
      <id>ace</id>
      <language>en</language>
      <type>headword</type>
      <word>ace</word>
      <lexicalEntries>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Adjective</lexicalCategory>
          <text>ace</text>
          <entries>
            <entry>
              <homographNumber>001</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>very good:</definition>
                  </definitions>
                  <id>m_en_gb0004640.006</id>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Noun</lexicalCategory>
          <text>ace</text>
          <entries>
            <entry>
              <homographNumber>000</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>a playing card with a single spot on it, ranked as the highest card in its suit in most card games:</definition>
                  </definitions>
                  <id>m_en_gb0004640.001</id>
                </sense>
                <sense>
                  <definitions>
                    <definition>a person who excels at a particular sport or other activity:</definition>
                  </definitions>
                  <id>m_en_gb0004640.002</id>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>a pilot who has shot down many enemy aircraft:</definition>
                      </definitions>
                      <id>m_en_gb0004640.003</id>
                    </subsense>
                  </subsenses>
                </sense>
                <sense>
                  <definitions>
                    <definition>(in tennis and similar games) a service that an opponent is unable to return and thus wins a point:</definition>
                  </definitions>
                  <id>m_en_gb0004640.004</id>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>a hole in one:</definition>
                      </definitions>
                      <id>m_en_gb0004640.005</id>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Verb</lexicalCategory>
          <text>ace</text>
          <entries>
            <entry>
              <homographNumber>002</homographNumber>
              <senses>
                <sense>
                  <definitions>
                    <definition>(in tennis and similar games) serve an ace against (an opponent):</definition>
                  </definitions>
                  <id>m_en_gb0004640.007</id>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>score an ace on (a hole) or with (a shot):</definition>
                      </definitions>
                      <id>m_en_gb0004640.008</id>
                    </subsense>
                  </subsenses>
                </sense>
                <sense>
                  <definitions>
                    <definition>achieve high marks in (a test or exam):</definition>
                  </definitions>
                  <id>m_en_gb0004640.009</id>
                  <subsenses>
                    <subsense>
                      <definitions>
                        <definition>outdo someone in a competitive situation:</definition>
                      </definitions>
                      <id>m_en_gb0004640.010</id>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
      </lexicalEntries>
    </result>
  </results>
</dictionary>

I’ve preserved the original structure of the JSON response as much as possible, and have made minimal alterations beyond those described in the section “Dictionary library module” below. One change is returning <dictionary> as the root XML element, which has the attributes input and language. The input attribute is assigned a value matching the ‘word’ sent to the Oxford Dictionaries API, and is equivalent to the API’s word_id parameter. Likewise, the language attribute is assigned a value equivalent to the API’s source_lang parameter. As a result, <dictionary input="ace" language="en"> allows us to create efficient XML queries on cached English dictionary data related to the string ‘ace’. Another addition is a <date> element within the <metadata> element. Its text content comes from the response header received from the Oxford Dictionaries API. This is useful for deciding whether to update cached dictionary data by making a new request for a given word_id to the Oxford Dictionaries API.
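
As a rough illustration of how the input and language attributes and the <date> element might be used together, the sketch below queries a trimmed, inline copy of the result for ‘ace’ and checks its age. The 30-day threshold and the <report> wrapper are my own assumptions for the example, not part of the library module. The standard XQuery 3.1 function parse-ieee-date-time() reads the HTTP-style date.

```xquery
xquery version "3.1";

(: A trimmed copy of a cached result; real cached data would
   hold the full entry returned by $dictionary("ace"). :)
declare variable $cache :=
  <dictionary input="ace" language="en">
    <metadata>
      <provider>Oxford University Press</provider>
      <date>Thu, 16 Feb 2017 18:00:00 GMT</date>
    </metadata>
    <results>
      <result>
        <lexicalEntries>
          <lexicalEntry>
            <lexicalCategory>Adjective</lexicalCategory>
            <entries><entry><senses><sense>
              <definitions><definition>very good:</definition></definitions>
            </sense></senses></entry></entries>
          </lexicalEntry>
        </lexicalEntries>
      </result>
    </results>
  </dictionary>;

(: Assumed threshold: treat anything older than 30 days as stale. :)
declare variable $max-age := xs:dayTimeDuration("P30D");

(: Select the cached entry by the two attributes on the root element. :)
let $entry := $cache[@input = "ace"][@language = "en"]
(: Convert the header-style date into an xs:dateTime. :)
let $fetched := parse-ieee-date-time($entry/metadata/date)
return
  <report>
    <definitions>{ $entry//definition }</definitions>
    <stale>{ current-dateTime() - $fetched gt $max-age }</stale>
  </report>
```

With the 2017 date shown here, <stale> evaluates to true, which would be the cue to request fresh data for that word_id.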

Dictionary library module

The dictionary module od-api-basex.xquery contains XQuery functions related to dictionary data, and can be used with the XQuery code below:

import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";

Using this library module helps create clean, verbose XML structures that follow the models found in the “Dictionary entries” section of the Oxford Dictionaries API documentation. Consider that lexical entry data have the following JSON model schema:

{
  "lexicalEntries": [
    {
      "entries": [
        {
          "etymologies": [
            "string"
          ],
          "grammaticalFeatures": [
            {
              "text": "string",
              "type": "string"
            }
          ],
          "homographNumber": "string",
          "pronunciations": [
            {
              "audioFile": "string",
              "dialects": [
                "string"
              ],
              "phoneticNotation": "string",
              "phoneticSpelling": "string",
              "regions": [
                "string"
              ]
            }
          ],
          "senses": [
            {
              "crossReferenceMarkers": [
                "string"
              ],
              "crossReferences": [
                {
                  "id": "string",
                  "text": "string",
                  "type": "string"
                }
              ],
              "definitions": [
                "string"
              ],
              "domains": [
                "string"
              ],
              "examples": [
                {
                  "definitions": [
                    "string"
                  ],
                  "domains": [
                    "string"
                  ],
                  "regions": [
                    "string"
                  ],
                  "registers": [
                    "string"
                  ],
                  "senseIds": [
                    "string"
                  ],
                  "text": "string",
                  "translations": [
                    {
                      "domains": [
                        "string"
                      ],
                      "grammaticalFeatures": [
                        {
                          "text": "string",
                          "type": "string"
                        }
                      ],
                      "language": "string",
                      "regions": [
                        "string"
                      ],
                      "registers": [
                        "string"
                      ],
                      "text": "string"
                    }
                  ]
                }
              ],
              "id": "string",
              "pronunciations": [
                {
                  "audioFile": "string",
                  "dialects": [
                    "string"
                  ],
                  "phoneticNotation": "string",
                  "phoneticSpelling": "string",
                  "regions": [
                    "string"
                  ]
                }
              ],
              "regions": [
                "string"
              ],
              "registers": [
                "string"
              ],
              "subsenses": [
                {}
              ],
              "translations": [
                {
                  "domains": [
                    "string"
                  ],
                  "grammaticalFeatures": [
                    {
                      "text": "string",
                      "type": "string"
                    }
                  ],
                  "language": "string",
                  "regions": [
                    "string"
                  ],
                  "registers": [
                    "string"
                  ],
                  "text": "string"
                }
              ],
              "variantForms": [
                {
                  "regions": [
                    "string"
                  ],
                  "text": "string"
                }
              ]
            }
          ],
          "variantForms": [
            {
              "regions": [
                "string"
              ],
              "text": "string"
            }
          ]
        }
      ],
      "grammaticalFeatures": [
        {
          "text": "string",
          "type": "string"
        }
      ],
      "language": "string",
      "lexicalCategory": "string",
      "pronunciations": [
        {
          "audioFile": "string",
          "dialects": [
            "string"
          ],
          "phoneticNotation": "string",
          "phoneticSpelling": "string",
          "regions": [
            "string"
          ]
        }
      ],
      "text": "string",
      "variantForms": [
        {
          "regions": [
            "string"
          ],
          "text": "string"
        }
      ]
    }
  ]
}

Requesting dictionary data for the word ‘ace’ returns a JSON response including:

{
  "lexicalEntries": [
    {
      "entries": [
        {
          "grammaticalFeatures": [
            {
              "text": "Positive",
              "type": "Degree"
            }
          ],
          "homographNumber": "001",
          "senses": [
            {
              "definitions": [
                "very good:"
              ],
              "examples": [
                {
                  "text": "Ace! You've done it!"
                },
                {
                  "text": "an ace swimmer"
                }
              ],
              "id": "m_en_gb0004640.006",
              "registers": [
                "informal"
              ]
            }
          ]
        }
      ],
      "language": "en",
      "lexicalCategory": "Adjective",
      "pronunciations": [
        {
          "audioFile": "http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3",
          "dialects": [
            "British English"
          ],
          "phoneticNotation": "IPA",
          "phoneticSpelling": "eɪs"
        }
      ],
      "text": "ace"
    }
  ]
}

By default, converting the response above from JSON to XML with BaseX yields an undesired XML structure:

<lexicalEntries type="array">
  <_ type="object">
    <entries type="array">
      <_ type="object">
        <grammaticalFeatures type="array">
          <_ type="object">
            <text>Positive</text>
            <type>Degree</type>
          </_>
        </grammaticalFeatures>
        <homographNumber>001</homographNumber>
        <senses type="array">
          <_ type="object">
            <definitions type="array">
              <_>very good:</_>
            </definitions>
            <examples type="array">
              <_ type="object">
                <text>Ace! You've done it!</text>
              </_>
              <_ type="object">
                <text>an ace swimmer</text>
              </_>
            </examples>
            <id>m_en_gb0004640.006</id>
            <registers type="array">
              <_>informal</_>
            </registers>
          </_>
        </senses>
      </_>
    </entries>
    <language>en</language>
    <lexicalCategory>Adjective</lexicalCategory>
    <pronunciations type="array">
      <_ type="object">
        <audioFile>http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3</audioFile>
        <dialects type="array">
          <_>British English</_>
        </dialects>
        <phoneticNotation>IPA</phoneticNotation>
        <phoneticSpelling>eɪs</phoneticSpelling>
      </_>
    </pronunciations>
    <text>ace</text>
  </_>
</lexicalEntries>

We can use XQuery to assign names to the <_> elements above. In the case of elements with plural names, such as <definitions>, functions within the library module create child elements with singular names, such as <definition>:

<lexicalEntries>
  <lexicalEntry>
    <language>en</language>
    <lexicalCategory>Adjective</lexicalCategory>
    <text>ace</text>
    <pronunciations>
      <pronunciation>
        <audioFile>http://audio.oxforddictionaries.com/en/mp3/ace_gb_1.mp3</audioFile>
        <dialects>
          <dialect>British English</dialect>
        </dialects>
        <phoneticNotation>IPA</phoneticNotation>
        <phoneticSpelling>eɪs</phoneticSpelling>
      </pronunciation>
    </pronunciations>
    <entries>
      <entry>
        <grammaticalFeatures>
          <grammaticalFeature>
            <text>Positive</text>
            <type>Degree</type>
          </grammaticalFeature>
        </grammaticalFeatures>
        <homographNumber>001</homographNumber>
        <senses>
          <sense>
            <definitions>
              <definition>very good:</definition>
            </definitions>
            <examples>
              <example>
                <text>Ace! You've done it!</text>
              </example>
              <example>
                <text>an ace swimmer</text>
              </example>
            </examples>
            <id>m_en_gb0004640.006</id>
            <registers>
              <register>informal</register>
            </registers>
          </sense>
        </senses>
      </entry>
    </entries>
  </lexicalEntry>
</lexicalEntries>
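
The renaming step can be sketched in a few lines of standard XQuery. This is a minimal illustration of the idea, not the code in od-api-basex.xquery: local:singular() and local:rename() are hypothetical helpers, the singular form is derived with a naive rule (a trailing "ies" becomes "y", otherwise a trailing "s" is dropped), and the sketch does not reorder child elements as the output above does.

```xquery
xquery version "3.1";

(: Hypothetical helper: derive a singular element name from a plural one. :)
declare function local:singular($name as xs:string) as xs:string {
  if (ends-with($name, "ies")) then replace($name, "ies$", "y")
  else replace($name, "s$", "")
};

(: Recursively copy a node, renaming each <_> to the singular of its
   parent's name. Attributes such as type="array" are not copied. :)
declare function local:rename(
  $node as node(),
  $parent-name as xs:string
) as node() {
  typeswitch ($node)
    case element() return
      let $name :=
        if (local-name($node) = "_") then local:singular($parent-name)
        else local-name($node)
      return element { $name } {
        for $child in $node/node()
        return local:rename($child, local-name($node))
      }
    default return $node
};

(: A fragment shaped like the automatic JSON-to-XML conversion. :)
let $converted :=
  <senses type="array">
    <_ type="object">
      <definitions type="array">
        <_>very good:</_>
      </definitions>
      <id>m_en_gb0004640.006</id>
      <registers type="array">
        <_>informal</_>
      </registers>
    </_>
  </senses>
return local:rename($converted, "")
```

Here the anonymous elements come back as <sense>, <definition>, and <register>. A naming rule this simple happens to cover the plural names in the model schema above, but an explicit lookup table would be safer for irregular plurals.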

The majority of the XQuery code in the library module od-api-basex.xquery is not specific to BaseX; however, it does use the BaseX HTTP Module, so changes to the library module may be required when using a different XQuery processor.
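
To give a sense of the processor-specific part, the sketch below builds (but does not send) a request element of the kind the HTTP Module accepts. Everything specific in it is an assumption on my part and should be checked against the Oxford Dictionaries API documentation: the base URL, the path layout with filters as a trailing segment, and the app_id and app_key header names. local:request() is a hypothetical helper, not a function from the library module.

```xquery
xquery version "3.1";

declare namespace http = "http://expath.org/ns/http-client";

(: Hypothetical helper: assemble an HTTP request element.
   The URL pattern and header names are assumptions. :)
declare function local:request(
  $source-lang as xs:string,
  $word-id as xs:string,
  $filters as xs:string,
  $id as xs:string,
  $key as xs:string
) as element(http:request) {
  let $base := "https://od-api.oxforddictionaries.com/api/v1/entries"
  (: Skip the filters segment when no filter is set. :)
  let $path := string-join(($source-lang, $word-id, $filters[. ne ""]), "/")
  return
    <http:request method="get" href="{ $base }/{ $path }">
      <http:header name="app_id" value="{ $id }"/>
      <http:header name="app_key" value="{ $key }"/>
    </http:request>
};

(: Build, but do not send, a request for the definitions of 'ace'. :)
local:request("en", "ace", "definitions", "myId", "myKey")
```

In BaseX, passing such an element to http:send-request() sends the request. The first item returned is an <http:response> element whose <http:header> children hold the response headers, which is the kind of place a value for <date> can be read from. A processor without this module would need a different way to make the request.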

What we learned

Thanks to this session of the Vanderbilt University XQuery Working Group, we can now retrieve dictionary data from the Oxford Dictionaries API as XML with XQuery and BaseX.

Thank you for reading, and have fun coding.