Retrieve Oxford Dictionaries API Thesaurus Data as XML with XQuery and BaseX

My notes from the Vanderbilt University XQuery Working Group

We retrieved thesaurus data from the Oxford Dictionaries application programming interface (API) and returned Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial illustrates how to retrieve thesaurus data—synonyms and antonyms—as XML from the Oxford Dictionaries API with XQuery and BaseX.

The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.

Retrieve thesaurus data

Retrieving thesaurus data from the Oxford Dictionaries API as XML with XQuery can be easy, and we’ll use od-api.xquery as a template to help us get started. First, we import an XQuery thesaurus library module, od-api-basex.xquery, and bind it to the namespace prefix od-api. Second, we replace myId and myKey with our own Oxford Dictionaries API credentials. Third, we set options, such as $source-lang and $thesaurus-operation, following the “Thesaurus” section of the Oxford Dictionaries API documentation, to retrieve synonyms (i.e., words similar in meaning), antonyms (i.e., words opposite in meaning), or both. Finally, we request thesaurus data using $thesaurus(). Sample XQuery code for retrieving thesaurus data as XML for the word ‘ace’ follows:

xquery version "3.1" encoding "UTF-8";

import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";

let $id := "myId"
let $key := "myKey"

let $source-lang := "en"
let $thesaurus-operation := "synonyms;antonyms"

let $thesaurus := od-api:thesaurus($source-lang, ?, $thesaurus-operation, $id, $key)

return $thesaurus("ace")

In the XQuery code above, we use return $thesaurus("ace") to trigger our request to the Oxford Dictionaries API. This works through partial function application: calling the library module’s od-api:thesaurus() function with the placeholder ? in place of the word does not send a request, but binds a new one-argument function to $thesaurus. The actual request is sent when $thesaurus("ace") supplies a value for ?. The library module automatically alters that value by replacing spaces with underscores, forcing lowercase characters, and encoding reserved characters. For example, returning $thesaurus("pâté") sends the word_id parameter p%C3%A2t%C3%A9 to the Oxford Dictionaries API. Similarly, returning $thesaurus("United States of America") sends the word_id parameter united_states_of_america. Running our XQuery code above with BaseX returns the following result:

<thesaurus input="ace" language="en">
  <metadata>
    <provider>Oxford University Press</provider>
    <date>Thu, 09 Feb 2017 18:00:00 GMT</date>
  </metadata>
  <results>
    <result>
      <id>ace</id>
      <language>en</language>
      <type>headword</type>
      <word>ace</word>
      <lexicalEntries>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Noun</lexicalCategory>
          <text>ace</text>
          <entries>
            <entry>
              <homographNumber>000</homographNumber>
              <senses>
                <sense>
                  <examples>
                    <example>
                      <text>a rowing ace</text>
                    </example>
                  </examples>
                  <id>t_en_gb0000173.001</id>
                  <registers>
                    <register>informal</register>
                  </registers>
                  <synonyms>
                    <synonym>
                      <id>adept</id>
                      <language>en</language>
                      <text>adept</text>
                    </synonym>
                    <synonym>
                      <id>champion</id>
                      <language>en</language>
                      <text>champion</text>
                    </synonym>
                    <synonym>
                      <id>doyen</id>
                      <language>en</language>
                      <text>doyen</text>
                    </synonym>
                    <synonym>
                      <id>expert</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>expert</text>
                    </synonym>
                    <synonym>
                      <id>genius</id>
                      <language>en</language>
                      <text>genius</text>
                    </synonym>
                    <synonym>
                      <id>maestro</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>maestro</text>
                    </synonym>
                    <synonym>
                      <id>master</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>master</text>
                    </synonym>
                    <synonym>
                      <id>past_master</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>past master</text>
                    </synonym>
                    <synonym>
                      <id>professional</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>professional</text>
                    </synonym>
                    <synonym>
                      <id>star</id>
                      <language>en</language>
                      <text>star</text>
                    </synonym>
                    <synonym>
                      <id>virtuoso</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>virtuoso</text>
                    </synonym>
                    <synonym>
                      <id>winner</id>
                      <language>en</language>
                      <text>winner</text>
                    </synonym>
                  </synonyms>
                  <antonyms>
                    <antonym>
                      <id>amateur</id>
                      <language>en</language>
                      <text>amateur</text>
                    </antonym>
                  </antonyms>
                  <subsenses>
                    <subsense>
                      <id>genID_d82564e26385</id>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>wunderkind</id>
                          <language>en</language>
                          <text>wunderkind</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26392</id>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>demon</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>demon</text>
                        </synonym>
                        <synonym>
                          <id>hotshot</id>
                          <language>en</language>
                          <text>hotshot</text>
                        </synonym>
                        <synonym>
                          <id>ninja</id>
                          <language>en</language>
                          <text>ninja</text>
                        </synonym>
                        <synonym>
                          <id>pro</id>
                          <language>en</language>
                          <text>pro</text>
                        </synonym>
                        <synonym>
                          <id>whizz</id>
                          <language>en</language>
                          <text>whizz</text>
                        </synonym>
                        <synonym>
                          <id>wiz</id>
                          <language>en</language>
                          <text>wiz</text>
                        </synonym>
                        <synonym>
                          <id>wizard</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>wizard</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26411</id>
                      <regions>
                        <region>British</region>
                      </regions>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>dab_hand</id>
                          <language>en</language>
                          <text>dab hand</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26420</id>
                      <regions>
                        <region>North American</region>
                      </regions>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>crackerjack</id>
                          <language>en</language>
                          <text>crackerjack</text>
                        </synonym>
                        <synonym>
                          <id>maven</id>
                          <language>en</language>
                          <text>maven</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
        <lexicalEntry>
          <language>en</language>
          <lexicalCategory>Adjective</lexicalCategory>
          <text>ace</text>
          <entries>
            <entry>
              <homographNumber>001</homographNumber>
              <senses>
                <sense>
                  <examples>
                    <example>
                      <text>an ace tennis player</text>
                    </example>
                  </examples>
                  <id>t_en_gb0000173.002-se1-2</id>
                  <registers>
                    <register>informal</register>
                  </registers>
                  <synonyms>
                    <synonym>
                      <id>adept</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>adept</text>
                    </synonym>
                    <synonym>
                      <id>champion</id>
                      <language>en</language>
                      <text>champion</text>
                    </synonym>
                    <synonym>
                      <id>consummate</id>
                      <language>en</language>
                      <text>consummate</text>
                    </synonym>
                    <synonym>
                      <id>excellent</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>excellent</text>
                    </synonym>
                    <synonym>
                      <id>expert</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>expert</text>
                    </synonym>
                    <synonym>
                      <id>fine</id>
                      <language>en</language>
                      <text>fine</text>
                    </synonym>
                    <synonym>
                      <id>first-class</id>
                      <language>en</language>
                      <text>first-class</text>
                    </synonym>
                    <synonym>
                      <id>first-rate</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>first-rate</text>
                    </synonym>
                    <synonym>
                      <id>formidable</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>formidable</text>
                    </synonym>
                    <synonym>
                      <id>magnificent</id>
                      <language>en</language>
                      <text>magnificent</text>
                    </synonym>
                    <synonym>
                      <id>marvellous</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>marvellous</text>
                    </synonym>
                    <synonym>
                      <id>masterly</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>masterly</text>
                    </synonym>
                    <synonym>
                      <id>outstanding</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>outstanding</text>
                    </synonym>
                    <synonym>
                      <id>skilful</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>skilful</text>
                    </synonym>
                    <synonym>
                      <id>superlative</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>superlative</text>
                    </synonym>
                    <synonym>
                      <id>very_good</id>
                      <language>en</language>
                      <text>very good</text>
                    </synonym>
                    <synonym>
                      <id>virtuoso</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>virtuoso</text>
                    </synonym>
                    <synonym>
                      <id>wonderful</id>
                      <language>en</language>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <text>wonderful</text>
                    </synonym>
                  </synonyms>
                  <antonyms>
                    <antonym>
                      <id>mediocre</id>
                      <language>en</language>
                      <text>mediocre</text>
                    </antonym>
                    <antonym>
                      <id>amateurish</id>
                      <language>en</language>
                      <text>amateurish</text>
                    </antonym>
                  </antonyms>
                  <subsenses>
                    <subsense>
                      <id>genID_d82564e26515</id>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>a1</id>
                          <language>en</language>
                          <text>A1</text>
                        </synonym>
                        <synonym>
                          <id>awesome</id>
                          <language>en</language>
                          <text>awesome</text>
                        </synonym>
                        <synonym>
                          <id>crack</id>
                          <language>en</language>
                          <text>crack</text>
                        </synonym>
                        <synonym>
                          <id>demon</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>demon</text>
                        </synonym>
                        <synonym>
                          <id>fab</id>
                          <language>en</language>
                          <text>fab</text>
                        </synonym>
                        <synonym>
                          <id>fabulous</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>fabulous</text>
                        </synonym>
                        <synonym>
                          <id>fantastic</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>fantastic</text>
                        </synonym>
                        <synonym>
                          <id>great</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>great</text>
                        </synonym>
                        <synonym>
                          <id>hotshot</id>
                          <language>en</language>
                          <text>hotshot</text>
                        </synonym>
                        <synonym>
                          <id>magic</id>
                          <language>en</language>
                          <text>magic</text>
                        </synonym>
                        <synonym>
                          <id>mean</id>
                          <language>en</language>
                          <text>mean</text>
                        </synonym>
                        <synonym>
                          <id>sensational</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>sensational</text>
                        </synonym>
                        <synonym>
                          <id>smashing</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>smashing</text>
                        </synonym>
                        <synonym>
                          <id>stellar</id>
                          <language>en</language>
                          <text>stellar</text>
                        </synonym>
                        <synonym>
                          <id>superb</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>superb</text>
                        </synonym>
                        <synonym>
                          <id>terrific</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>terrific</text>
                        </synonym>
                        <synonym>
                          <id>tip-top</id>
                          <language>en</language>
                          <text>tip-top</text>
                        </synonym>
                        <synonym>
                          <id>top-notch</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>top-notch</text>
                        </synonym>
                        <synonym>
                          <id>tremendous</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>tremendous</text>
                        </synonym>
                        <synonym>
                          <id>wicked</id>
                          <language>en</language>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>wicked</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26581</id>
                      <registers>
                        <register>vulgar slang</register>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>shit-hot</id>
                          <language>en</language>
                          <text>shit-hot</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26561</id>
                      <regions>
                        <region>British</region>
                      </regions>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>brill</id>
                          <language>en</language>
                          <text>brill</text>
                        </synonym>
                        <synonym>
                          <id>brilliant</id>
                          <language>en</language>
                          <regions>
                            <region>British</region>
                          </regions>
                          <registers>
                            <register>informal</register>
                          </registers>
                          <text>brilliant</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                    <subsense>
                      <id>genID_d82564e26572</id>
                      <regions>
                        <region>North American</region>
                      </regions>
                      <registers>
                        <register>informal</register>
                      </registers>
                      <synonyms>
                        <synonym>
                          <id>badass</id>
                          <language>en</language>
                          <text>badass</text>
                        </synonym>
                      </synonyms>
                    </subsense>
                  </subsenses>
                </sense>
              </senses>
            </entry>
          </entries>
        </lexicalEntry>
      </lexicalEntries>
    </result>
  </results>
</thesaurus>
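
The word normalization and partial function application described before this result can be tried without API credentials. What follows is a minimal sketch, not the library module’s own code: local:word-id() and local:request() are hypothetical names, and local:request() only describes a request as an XML element instead of sending one.

```xquery
xquery version "3.1";

(: Hypothetical helper that mimics the normalization described in this tutorial:
   spaces become underscores, characters are lowercased, and anything outside
   the unreserved URI characters is percent-encoded. :)
declare function local:word-id($word as xs:string) as xs:string {
  encode-for-uri(lower-case(translate($word, " ", "_")))
};

(: Hypothetical stand-in for od-api:thesaurus(). It returns a description of
   the request rather than contacting the Oxford Dictionaries API. :)
declare function local:request(
  $source-lang as xs:string,
  $word as xs:string,
  $operation as xs:string
) as element(request) {
  <request source_lang="{$source-lang}"
           word_id="{local:word-id($word)}"
           operation="{$operation}"/>
};

(: The ? placeholder leaves the word open, so $request is a function
   that expects exactly one argument. :)
let $request := local:request("en", ?, "synonyms;antonyms")
return (
  $request("ace"),
  $request("pâté"),
  $request("United States of America")
)
```

If the sketch behaves as intended, the three word_id attributes should read ace, p%C3%A2t%C3%A9, and united_states_of_america, matching the examples given earlier. Lowercasing happens before encoding so that the hexadecimal digits produced by encode-for-uri() stay uppercase.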

I’ve preserved the original structure of the JSON response as much as possible, and have made minimal alterations beyond those described in the section “Thesaurus library module” below. One change is returning <thesaurus> as the root XML element, which has the attributes input and language. The input attribute is assigned a value matching the ‘word’ sent to the Oxford Dictionaries API, and is equivalent to the API’s word_id parameter. Likewise, the language attribute is assigned a value equivalent to the API’s source_lang parameter. As a result, <thesaurus input="ace" language="en"> allows us to create efficient XML queries on cached thesaurus data from English related to the string ‘ace’. Another change is the addition of a <date> element within the <metadata> element. Text content within <date> is from the response header received from the Oxford Dictionaries API. This is useful for evaluating whether or not to update cached thesaurus data by making a new request for a given word_id to the Oxford Dictionaries API.
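
To make the last two points concrete, here is a small sketch of querying a cached result and deciding whether it is stale. It makes several assumptions: the $cached variable holds an abridged, hand-made copy of the result above rather than a real cache, the 30-day threshold in $max-age is an arbitrary policy chosen for illustration, and the <summary> output format is invented for this example.

```xquery
xquery version "3.1";

(: Abridged sample standing in for a cached result; a real cache would
   hold the full <thesaurus> element returned by the query above. :)
declare variable $cached :=
  <thesaurus input="ace" language="en">
    <metadata>
      <provider>Oxford University Press</provider>
      <date>Thu, 09 Feb 2017 18:00:00 GMT</date>
    </metadata>
    <results>
      <result>
        <lexicalEntries>
          <lexicalEntry>
            <lexicalCategory>Noun</lexicalCategory>
            <entries><entry><senses><sense>
              <synonyms>
                <synonym><id>adept</id><language>en</language><text>adept</text></synonym>
                <synonym><id>champion</id><language>en</language><text>champion</text></synonym>
              </synonyms>
              <antonyms>
                <antonym><id>amateur</id><language>en</language><text>amateur</text></antonym>
              </antonyms>
            </sense></senses></entry></entries>
          </lexicalEntry>
        </lexicalEntries>
      </result>
    </results>
  </thesaurus>;

(: Assumed refresh policy: anything older than 30 days counts as stale. :)
declare variable $max-age := xs:dayTimeDuration("P30D");

(: The input and language attributes let us select the cached entry directly. :)
let $entry := $cached[@input = "ace"][@language = "en"]
(: fn:parse-ietf-date() understands the HTTP date format stored in <date>. :)
let $retrieved := parse-ietf-date($entry/metadata/date)
return
  <summary word="{$entry/@input}">
    <synonyms>{ string-join($entry//synonym/text, ", ") }</synonyms>
    <antonyms>{ string-join($entry//antonym/text, ", ") }</antonyms>
    <stale>{ current-dateTime() - $retrieved gt $max-age }</stale>
  </summary>
```

When <stale> is true, a new request for the same word_id would refresh the cached data; otherwise the cached XML can be queried as-is.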

Thesaurus library module

The library module od-api-basex.xquery contains XQuery functions related to thesaurus data, and can be used with the XQuery code below:

import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";

Using this library module helps create clean, verbose XML structures that follow the models found in the “Thesaurus” section of the Oxford Dictionaries API documentation. Consider that synonym data have the following JSON model schema:

{
  "synonyms": [
    {
      "domains": [
        "string"
      ],
      "id": "string",
      "language": "string",
      "regions": [
        "string"
      ],
      "registers": [
        "string"
      ],
      "text": "string"
    }
  ]
}

The Oxford Dictionaries API returns arrays of synonym data. Requesting thesaurus data for the word ‘ace’ returns a JSON response including:

{
  "synonyms": [
    {
      "id": "expert",
      "language": "en",
      "registers": [
        "informal"
      ],
      "text": "expert"
    },
    {
      "id": "master",
      "language": "en",
      "registers": [
        "informal"
      ],
      "text": "master"
    }
  ]
}

By default, converting the response above from JSON to XML with BaseX yields an undesired XML structure:

<synonyms type="array">
  <_ type="object">
    <id>expert</id>
    <language>en</language>
    <registers type="array">
      <_>informal</_>
    </registers>
    <text>expert</text>
  </_>
  <_ type="object">
    <id>master</id>
    <language>en</language>
    <registers type="array">
      <_>informal</_>
    </registers>
    <text>master</text>
  </_>
</synonyms>
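
This default conversion can be reproduced locally, without calling the API, along the following lines. The sketch assumes BaseX, where the json prefix is bound to the BaseX JSON Module, and it uses a hand-typed copy of the response fragment above rather than a live response.

```xquery
xquery version "3.1";

(: BaseX-specific sketch: json:parse() belongs to the BaseX JSON Module,
   so this query is not expected to run unchanged in other processors. :)
let $response := '{
  "synonyms": [
    { "id": "expert", "language": "en", "registers": ["informal"], "text": "expert" },
    { "id": "master", "language": "en", "registers": ["informal"], "text": "master" }
  ]
}'
(: With default options, arrays and objects are marked with a type attribute
   and array members are wrapped in elements named "_". :)
return json:parse($response)//synonyms
```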

We can use XQuery to assign names to the <_> elements above. In the case of elements with plural names, such as <synonyms>, functions within the library module create child elements with singular names, such as <synonym>:

<synonyms>
  <synonym>
    <id>expert</id>
    <language>en</language>
    <registers>
      <register>informal</register>
    </registers>
    <text>expert</text>
  </synonym>
  <synonym>
    <id>master</id>
    <language>en</language>
    <registers>
      <register>informal</register>
    </registers>
    <text>master</text>
  </synonym>
</synonyms>
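
One way such a renaming could work is sketched below. This is an illustration under stated assumptions, not the code in od-api-basex.xquery: local:singular() and local:rename() are hypothetical names, the singular form is derived with a naive rule that only covers names like those in this response, and every <_> element is assumed to have a named parent.

```xquery
xquery version "3.1";

(: Hypothetical helper: naive singular form of an element name,
   e.g. "entries" becomes "entry" and "synonyms" becomes "synonym". :)
declare function local:singular($name as xs:string) as xs:string {
  if (ends-with($name, "ies"))
  then replace($name, "ies$", "y")
  else replace($name, "s$", "")
};

(: Recursively copies a node. Elements named "_" are renamed after the
   singular form of their parent; attributes such as type="array" are
   dropped because only child nodes are copied. :)
declare function local:rename($node as node()) as node() {
  typeswitch ($node)
    case element() return
      element {
        if (local-name($node) eq "_")
        then local:singular(local-name($node/..))
        else local-name($node)
      } {
        $node/node() ! local:rename(.)
      }
    default return $node
};

local:rename(
  <synonyms type="array">
    <_ type="object">
      <id>expert</id>
      <language>en</language>
      <registers type="array">
        <_>informal</_>
      </registers>
      <text>expert</text>
    </_>
  </synonyms>
)
```

The parent’s name is read from the original tree, so nested arrays such as <registers> are handled by the same rule as <synonyms>.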

The majority of the XQuery code in the library module od-api-basex.xquery is not specific to BaseX; however, I did use the BaseX HTTP Module. Changes to the library module may be required when using a different XQuery processor.
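
As one possible direction for such changes, the JSON-to-XML step could rely on the standard XQuery 3.1 function fn:json-to-xml() instead of processor-specific conversion. The sketch below is an assumption about how that might look, not part of the library module: local:singular() and local:convert() are hypothetical names, the root element name "json" is arbitrary, and JSON keys are assumed to be valid XML element names. The HTTP request itself would still need a replacement suited to the chosen processor.

```xquery
xquery version "3.1";

(: Hypothetical helper: naive singular form of an element name. :)
declare function local:singular($name as xs:string) as xs:string {
  if (ends-with($name, "ies"))
  then replace($name, "ies$", "y")
  else replace($name, "s$", "")
};

(: Rebuilds one element of fn:json-to-xml() output as an element called $name.
   Object members take their key as name; array members take the singular
   form of the array's name; other values become text content. :)
declare function local:convert(
  $node as element(),
  $name as xs:string
) as element() {
  element { $name } {
    typeswitch ($node)
      case element(fn:map) return
        for $member in $node/*
        return local:convert($member, string($member/@key))
      case element(fn:array) return
        for $item in $node/*
        return local:convert($item, local:singular($name))
      default return string($node)
  }
};

let $response := '{
  "synonyms": [
    { "id": "expert", "language": "en", "registers": ["informal"], "text": "expert" }
  ]
}'
(: fn:json-to-xml() returns a document node; its root element is an fn:map. :)
return local:convert(json-to-xml($response)/*, "json")/synonyms
```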

What we learned

Thanks to this session of the Vanderbilt University XQuery Working Group, we can now:

Thank you for reading, and have fun coding.