We retrieved thesaurus data from the Oxford Dictionaries application programming interface (API) and returned Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial illustrates how to retrieve thesaurus data—synonyms and antonyms—as XML from the Oxford Dictionaries API with XQuery and BaseX.
The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.
Retrieve thesaurus data
Retrieving thesaurus data from the Oxford Dictionaries API as XML with XQuery can be easy, and we’ll use od-api.xquery as a template to help us get started. First, we import an XQuery thesaurus library module, od-api-basex.xquery, and assign it the namespace prefix od-api. Second, we replace myId and myKey with our own Oxford Dictionaries API credentials. Third, we set options, such as $source-lang and $thesaurus-operation, following the “Thesaurus” section of the Oxford Dictionaries API documentation, to retrieve synonyms (i.e., words similar in meaning), antonyms (i.e., words opposite in meaning), or both. Finally, we request thesaurus data using $thesaurus(). Sample XQuery code for retrieving thesaurus data as XML for the word ‘ace’ follows:
xquery version "3.1" encoding "UTF-8";
import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";
let $id := "myId"
let $key := "myKey"
let $source-lang := "en"
let $thesaurus-operation := "synonyms;antonyms"
let $thesaurus := od-api:thesaurus($source-lang, ?, $thesaurus-operation, $id, $key)
return $thesaurus("ace")
In the XQuery code above, we use return $thesaurus("ace") to trigger our request to the Oxford Dictionaries API. The library module automatically alters the argument of $thesaurus() by replacing spaces with underscores, forcing lowercase characters, and encoding reserved characters. For example, returning $thesaurus("pâté") sends the word_id parameter p%C3%A2t%C3%A9 to the Oxford Dictionaries API. Similarly, returning $thesaurus("United States of America") sends the word_id parameter united_states_of_america.
The request works through partial function application. In od-api:thesaurus($source-lang, ?, $thesaurus-operation, $id, $key), the ? is an argument placeholder, so the expression does not send a request; it returns a one-argument function, which we bind to $thesaurus. Calling $thesaurus("ace") supplies the missing argument, and the library module’s od-api:thesaurus() function then sends the actual request. Running our XQuery code above with BaseX returns the following result:
<thesaurus input="ace" language="en">
<metadata>
<provider>Oxford University Press</provider>
<date>Thu, 09 Feb 2017 18:00:00 GMT</date>
</metadata>
<results>
<result>
<id>ace</id>
<language>en</language>
<type>headword</type>
<word>ace</word>
<lexicalEntries>
<lexicalEntry>
<language>en</language>
<lexicalCategory>Noun</lexicalCategory>
<text>ace</text>
<entries>
<entry>
<homographNumber>000</homographNumber>
<senses>
<sense>
<examples>
<example>
<text>a rowing ace</text>
</example>
</examples>
<id>t_en_gb0000173.001</id>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>adept</id>
<language>en</language>
<text>adept</text>
</synonym>
<synonym>
<id>champion</id>
<language>en</language>
<text>champion</text>
</synonym>
<synonym>
<id>doyen</id>
<language>en</language>
<text>doyen</text>
</synonym>
<synonym>
<id>expert</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>expert</text>
</synonym>
<synonym>
<id>genius</id>
<language>en</language>
<text>genius</text>
</synonym>
<synonym>
<id>maestro</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>maestro</text>
</synonym>
<synonym>
<id>master</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>master</text>
</synonym>
<synonym>
<id>past_master</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>past master</text>
</synonym>
<synonym>
<id>professional</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>professional</text>
</synonym>
<synonym>
<id>star</id>
<language>en</language>
<text>star</text>
</synonym>
<synonym>
<id>virtuoso</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>virtuoso</text>
</synonym>
<synonym>
<id>winner</id>
<language>en</language>
<text>winner</text>
</synonym>
</synonyms>
<antonyms>
<antonym>
<id>amateur</id>
<language>en</language>
<text>amateur</text>
</antonym>
</antonyms>
<subsenses>
<subsense>
<id>genID_d82564e26385</id>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>wunderkind</id>
<language>en</language>
<text>wunderkind</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26392</id>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>demon</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>demon</text>
</synonym>
<synonym>
<id>hotshot</id>
<language>en</language>
<text>hotshot</text>
</synonym>
<synonym>
<id>ninja</id>
<language>en</language>
<text>ninja</text>
</synonym>
<synonym>
<id>pro</id>
<language>en</language>
<text>pro</text>
</synonym>
<synonym>
<id>whizz</id>
<language>en</language>
<text>whizz</text>
</synonym>
<synonym>
<id>wiz</id>
<language>en</language>
<text>wiz</text>
</synonym>
<synonym>
<id>wizard</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>wizard</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26411</id>
<regions>
<region>British</region>
</regions>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>dab_hand</id>
<language>en</language>
<text>dab hand</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26420</id>
<regions>
<region>North American</region>
</regions>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>crackerjack</id>
<language>en</language>
<text>crackerjack</text>
</synonym>
<synonym>
<id>maven</id>
<language>en</language>
<text>maven</text>
</synonym>
</synonyms>
</subsense>
</subsenses>
</sense>
</senses>
</entry>
</entries>
</lexicalEntry>
<lexicalEntry>
<language>en</language>
<lexicalCategory>Adjective</lexicalCategory>
<text>ace</text>
<entries>
<entry>
<homographNumber>001</homographNumber>
<senses>
<sense>
<examples>
<example>
<text>an ace tennis player</text>
</example>
</examples>
<id>t_en_gb0000173.002-se1-2</id>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>adept</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>adept</text>
</synonym>
<synonym>
<id>champion</id>
<language>en</language>
<text>champion</text>
</synonym>
<synonym>
<id>consummate</id>
<language>en</language>
<text>consummate</text>
</synonym>
<synonym>
<id>excellent</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>excellent</text>
</synonym>
<synonym>
<id>expert</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>expert</text>
</synonym>
<synonym>
<id>fine</id>
<language>en</language>
<text>fine</text>
</synonym>
<synonym>
<id>first-class</id>
<language>en</language>
<text>first-class</text>
</synonym>
<synonym>
<id>first-rate</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>first-rate</text>
</synonym>
<synonym>
<id>formidable</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>formidable</text>
</synonym>
<synonym>
<id>magnificent</id>
<language>en</language>
<text>magnificent</text>
</synonym>
<synonym>
<id>marvellous</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>marvellous</text>
</synonym>
<synonym>
<id>masterly</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>masterly</text>
</synonym>
<synonym>
<id>outstanding</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>outstanding</text>
</synonym>
<synonym>
<id>skilful</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>skilful</text>
</synonym>
<synonym>
<id>superlative</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>superlative</text>
</synonym>
<synonym>
<id>very_good</id>
<language>en</language>
<text>very good</text>
</synonym>
<synonym>
<id>virtuoso</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>virtuoso</text>
</synonym>
<synonym>
<id>wonderful</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>wonderful</text>
</synonym>
</synonyms>
<antonyms>
<antonym>
<id>mediocre</id>
<language>en</language>
<text>mediocre</text>
</antonym>
<antonym>
<id>amateurish</id>
<language>en</language>
<text>amateurish</text>
</antonym>
</antonyms>
<subsenses>
<subsense>
<id>genID_d82564e26515</id>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>a1</id>
<language>en</language>
<text>A1</text>
</synonym>
<synonym>
<id>awesome</id>
<language>en</language>
<text>awesome</text>
</synonym>
<synonym>
<id>crack</id>
<language>en</language>
<text>crack</text>
</synonym>
<synonym>
<id>demon</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>demon</text>
</synonym>
<synonym>
<id>fab</id>
<language>en</language>
<text>fab</text>
</synonym>
<synonym>
<id>fabulous</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>fabulous</text>
</synonym>
<synonym>
<id>fantastic</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>fantastic</text>
</synonym>
<synonym>
<id>great</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>great</text>
</synonym>
<synonym>
<id>hotshot</id>
<language>en</language>
<text>hotshot</text>
</synonym>
<synonym>
<id>magic</id>
<language>en</language>
<text>magic</text>
</synonym>
<synonym>
<id>mean</id>
<language>en</language>
<text>mean</text>
</synonym>
<synonym>
<id>sensational</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>sensational</text>
</synonym>
<synonym>
<id>smashing</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>smashing</text>
</synonym>
<synonym>
<id>stellar</id>
<language>en</language>
<text>stellar</text>
</synonym>
<synonym>
<id>superb</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>superb</text>
</synonym>
<synonym>
<id>terrific</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>terrific</text>
</synonym>
<synonym>
<id>tip-top</id>
<language>en</language>
<text>tip-top</text>
</synonym>
<synonym>
<id>top-notch</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>top-notch</text>
</synonym>
<synonym>
<id>tremendous</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>tremendous</text>
</synonym>
<synonym>
<id>wicked</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>wicked</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26581</id>
<registers>
<register>vulgar slang</register>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>shit-hot</id>
<language>en</language>
<text>shit-hot</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26561</id>
<regions>
<region>British</region>
</regions>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>brill</id>
<language>en</language>
<text>brill</text>
</synonym>
<synonym>
<id>brilliant</id>
<language>en</language>
<regions>
<region>British</region>
</regions>
<registers>
<register>informal</register>
</registers>
<text>brilliant</text>
</synonym>
</synonyms>
</subsense>
<subsense>
<id>genID_d82564e26572</id>
<regions>
<region>North American</region>
</regions>
<registers>
<register>informal</register>
</registers>
<synonyms>
<synonym>
<id>badass</id>
<language>en</language>
<text>badass</text>
</synonym>
</synonyms>
</subsense>
</subsenses>
</sense>
</senses>
</entry>
</entries>
</lexicalEntry>
</lexicalEntries>
</result>
</results>
</thesaurus>
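The request mechanics described before the result can be sketched in a few lines of standard XQuery 3.1. This sketch is not the library module's code: local:word-id() and local:thesaurus() are hypothetical stand-ins, and the latter returns a map of what would be sent rather than contacting the API. It assumes the normalization runs in the order spaces, case, then percent-encoding, which is consistent with the word_id values shown earlier.

```xquery
xquery version "3.1" encoding "UTF-8";

(: Hypothetical helper (not from od-api-basex.xquery): builds a word_id by
   replacing spaces with underscores, lowercasing, then percent-encoding. :)
declare function local:word-id($word as xs:string) as xs:string {
  encode-for-uri(lower-case(translate($word, " ", "_")))
};

(: Hypothetical stand-in for od-api:thesaurus(): it sends nothing and only
   reports the parameters a real request would carry. :)
declare function local:thesaurus(
  $source-lang as xs:string,
  $word as xs:string,
  $operation as xs:string
) as map(xs:string, xs:string) {
  map {
    "source_lang": $source-lang,
    "word_id": local:word-id($word),
    "operation": $operation
  }
};

(: The ? placeholder leaves the word unbound, so $lookup is a function
   that expects exactly one argument. :)
let $lookup := local:thesaurus("en", ?, "synonyms;antonyms")
return (
  $lookup("pâté")?word_id,                     (: p%C3%A2t%C3%A9 :)
  $lookup("United States of America")?word_id  (: united_states_of_america :)
)
```

Lowercasing before encoding matters here: encode-for-uri() emits uppercase hexadecimal digits, so applying lower-case() afterwards would produce p%c3%a2t%c3%a9 instead.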
I’ve preserved the original structure of the JSON response as much as possible, and have made minimal alterations beyond those described in the “Thesaurus library module” section below. One change is returning <thesaurus> as the root XML element, which has the attributes input and language. The input attribute is assigned a value matching the ‘word’ sent to the Oxford Dictionaries API, and is equivalent to the API’s word_id parameter. Likewise, the language attribute is assigned a value equivalent to the API’s source_lang parameter. As a result, <thesaurus input="ace" language="en"> allows us to write efficient XML queries on cached English thesaurus data related to the string ‘ace’. Another change is the addition of a <date> element within the <metadata> element. The text content of <date> comes from the response header received from the Oxford Dictionaries API. This is useful for deciding whether to update cached thesaurus data by making a new request for a given word_id to the Oxford Dictionaries API.
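As a rough illustration of how the input, language, and <date> values might be used together, the sketch below selects a cached entry by its attributes and checks its age. Everything here is an assumption made for the example: the <cache> wrapper element, the local:is-stale() function, the 30-day threshold, and the fixed "current" time are hypothetical. It also assumes the XQuery processor implements the XQuery 3.1 function fn:parse-ietf-date.

```xquery
xquery version "3.1";

(: Hypothetical cache: a wrapper element holding trimmed-down results. :)
declare variable $cache :=
  <cache>
    <thesaurus input="ace" language="en">
      <metadata>
        <provider>Oxford University Press</provider>
        <date>Thu, 09 Feb 2017 18:00:00 GMT</date>
      </metadata>
    </thesaurus>
  </cache>;

(: Hypothetical check: true when the entry was retrieved longer ago than
   $max-age, measured against the supplied $now. :)
declare function local:is-stale(
  $entry as element(thesaurus),
  $now as xs:dateTime,
  $max-age as xs:dayTimeDuration
) as xs:boolean {
  (: The <date> text is an HTTP-style date, which parse-ietf-date() reads. :)
  let $retrieved := parse-ietf-date($entry/metadata/date)
  return ($now - $retrieved) gt $max-age
};

(: Select the cached entry by the two attributes on the root element. :)
let $entry := $cache/thesaurus[@input = "ace"][@language = "en"]
return local:is-stale(
  $entry,
  xs:dateTime("2017-03-15T00:00:00Z"),  (: fixed time, for a repeatable example :)
  xs:dayTimeDuration("P30D")            (: arbitrary threshold for illustration :)
)
```

In practice, current-dateTime() would replace the fixed time, and the threshold would depend on how fresh the cached data needs to be.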
Thesaurus library module
The library module od-api-basex.xquery contains XQuery functions related to thesaurus data, and can be imported with the XQuery code below:
import module namespace od-api="od-api-basex" at "https://raw.githubusercontent.com/AdamSteffanick/od-api-xquery/master/od-api-basex.xquery";
Using this library module helps create clean, verbose XML structures that follow the models found in the “Thesaurus” section of the Oxford Dictionaries API documentation. Consider that synonym data have the following JSON model schema:
{
"synonyms": [
{
"domains": [
"string"
],
"id": "string",
"language": "string",
"regions": [
"string"
],
"registers": [
"string"
],
"text": "string"
}
]
}
The Oxford Dictionaries API returns arrays of synonym data. Requesting thesaurus data for the word ‘ace’ returns a JSON response including:
{
"synonyms": [
{
"id": "expert",
"language": "en",
"registers": [
"informal"
],
"text": "expert"
},
{
"id": "master",
"language": "en",
"registers": [
"informal"
],
"text": "master"
}
]
}
By default, converting the response above from JSON to XML with BaseX yields an undesired XML structure:
<synonyms type="array">
<_ type="object">
<id>expert</id>
<language>en</language>
<registers type="array">
<_>informal</_>
</registers>
<text>expert</text>
</_>
<_ type="object">
<id>master</id>
<language>en</language>
<registers type="array">
<_>informal</_>
</registers>
<text>master</text>
</_>
</synonyms>
We can use XQuery to assign names to the <_> elements above. In the case of elements with plural names, such as <synonyms>, functions within the library module create child elements with singular names, such as <synonym>:
<synonyms>
<synonym>
<id>expert</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>expert</text>
</synonym>
<synonym>
<id>master</id>
<language>en</language>
<registers>
<register>informal</register>
</registers>
<text>master</text>
</synonym>
</synonyms>
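One way to perform this kind of renaming is a recursive function that rebuilds the tree. The sketch below is written in standard XQuery 3.1 and is not the library module's implementation: local:singular() and local:name-elements() are hypothetical names, and the singularization rule only covers the plural patterns that appear in this tutorial.

```xquery
xquery version "3.1";

(: Hypothetical helper: derives a singular element name from a plural one.
   It only covers the patterns seen in this tutorial ("-ies" and "-s"). :)
declare function local:singular($plural as xs:string) as xs:string {
  if (ends-with($plural, "ies"))
  then replace($plural, "ies$", "y")
  else replace($plural, "s$", "")
};

(: Rebuilds a converted tree: each <_> element takes the singular form of its
   parent's name. Attributes (here, the type hints added by the conversion)
   are dropped because only child nodes are copied. :)
declare function local:name-elements(
  $node as node(),
  $parent-name as xs:string?
) as node() {
  typeswitch ($node)
    case element() return
      let $name :=
        if (local-name($node) = "_" and exists($parent-name))
        then local:singular($parent-name)
        else local-name($node)
      return element { $name } {
        for $child in $node/node()
        return local:name-elements($child, $name)
      }
    default return $node
};

let $converted :=
  <synonyms type="array">
    <_ type="object">
      <id>expert</id>
      <language>en</language>
      <registers type="array">
        <_>informal</_>
      </registers>
      <text>expert</text>
    </_>
  </synonyms>
return local:name-elements($converted, ())
```

Passing the already-renamed parent name down the recursion lets nested arrays, such as <registers> inside a <synonym>, resolve to <register> without any lookup table.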
Most of the XQuery code in the library module od-api-basex.xquery is not specific to BaseX; however, I did use the BaseX HTTP Module, so changes to the library module may be required when using a different XQuery processor.
What we learned
Thanks to this session of the Vanderbilt University XQuery Working Group, we can now:
- retrieve thesaurus data as XML with XQuery
- use an XQuery thesaurus library module
Thank you for reading, and have fun coding.