Convert CSV to XML with XQuery and BaseX

Convert CSV to XML with XQuery and BaseX

My notes from the Vanderbilt University XQuery Working Group

We sorted data after converting it from Comma Separated Values (CSV) to Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This guide covers the basics of how to convert data from CSV to XML and sort the result with XQuery and BaseX.

Clifford Anderson provided us with a GitHub gist of CSV formatted data and XQuery code to convert our CSV to XML. This guide references a forked GitHub gist to illustrate each step of our process.

Step 1: Locate a CSV data file

Our CSV data file, books.csv, contains five values (i.e., fields) per row (i.e., record). The delimiter , separates the fields within each record, and fields containing a comma begin and end with ". The first record is a header, which names the fields within each record:

  1. Author
  2. Title
  3. ISBN
  4. Binding
  5. Year Published

Each subsequent record composes our data:

Author,Title,ISBN,Binding,Year Published
Jeannette Walls,The Glass Castle,074324754X,Paperback,2006
James Surowiecki,The Wisdom of Crowds,385721706,Paperback,2005
Lawrence Lessig,The Future of Ideas,375726446,Paperback,2002
"Larry Bossidy, Ram Charan, Charles Burck",Execution,609610570,Hardcover,2002
Kurt Vonnegut,Slaughterhouse-Five,385333846,Paperback,1999
Haruki Murakami,After the Quake,375713271,Paperback,2003
Nassim Nicholas Taleb,Fooled by Randomness,812975219,Paperback,2005
Robert A. Dahl,On Democracy (Yale Nota Bene),300084552,Paperback,2000
Daniel Defoe,A Journal of the Plague Year,375757899,Paperback,2001
Philip K. Dick,Do Androids Dream of Electric Sheep?,345404475,Paperback,1996
Janna Levin,A Madman Dreams of Turing Machines,1400040302,Hardcover,2006
"Alan Moore, David Lloyd",V for Vendetta,930289528,Paperback,1995
Edward W. Said,Orientalism,039474067X,Paperback,1979
Gary Shteyngart,The Russian Debutante's Handbook,1573229881,Paperback,2003
Raymond Chandler,"Farewell, My Lovely",394758277,Paperback,1992
Stefan Zweig,Beware of Pity,1590172000,Paperback,2006
James Blish,A Case of Conscience,345438353,Paperback,2000
Alan Moore,Watchmen,930289234,Paperback,1995
Marilynne Robinson,Gilead,374153892,Hardcover,2004

Step 2: Convert CSV to XML with XQuery

Converting our CSV to XML with XQuery is a two-part process. First, we fetch our data from books.csv and bind it to the variable $csv. Second, we return our data in XML by parsing $csv and mapping the fields in each record to our header. The XQuery code to convert our CSV to XML follows in fetch-books.xquery:

xquery version "3.1";

let $csv := fetch:text("https://gist.githubusercontent.com/AdamSteffanick/ecf1240e182fe7d86fd4/raw/690b90b8b8aaa5dd9f29ca1949af12e9c9fa996e/books.csv")
return csv:parse($csv, map { "header": true() })

Our code in fetch-books.xquery converts the fields within each record to XML nodes within a <record> parent node. Running fetch-books.xquery with BaseX returns the following result in XML:

<csv>
  <record>
    <Author>Jeannette Walls</Author>
    <Title>The Glass Castle</Title>
    <ISBN>074324754X</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2006</Year_Published>
  </record>
  <record>
    <Author>James Surowiecki</Author>
    <Title>The Wisdom of Crowds</Title>
    <ISBN>385721706</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2005</Year_Published>
  </record>
  <record>
    <Author>Lawrence Lessig</Author>
    <Title>The Future of Ideas</Title>
    <ISBN>375726446</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2002</Year_Published>
  </record>
  <record>
    <Author>Larry Bossidy, Ram Charan, Charles Burck</Author>
    <Title>Execution</Title>
    <ISBN>609610570</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2002</Year_Published>
  </record>
  <record>
    <Author>Kurt Vonnegut</Author>
    <Title>Slaughterhouse-Five</Title>
    <ISBN>385333846</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1999</Year_Published>
  </record>
  <record>
    <Author>Haruki Murakami</Author>
    <Title>After the Quake</Title>
    <ISBN>375713271</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2003</Year_Published>
  </record>
  <record>
    <Author>Nassim Nicholas Taleb</Author>
    <Title>Fooled by Randomness</Title>
    <ISBN>812975219</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2005</Year_Published>
  </record>
  <record>
    <Author>Robert A. Dahl</Author>
    <Title>On Democracy (Yale Nota Bene)</Title>
    <ISBN>300084552</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2000</Year_Published>
  </record>
  <record>
    <Author>Daniel Defoe</Author>
    <Title>A Journal of the Plague Year</Title>
    <ISBN>375757899</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2001</Year_Published>
  </record>
  <record>
    <Author>Philip K. Dick</Author>
    <Title>Do Androids Dream of Electric Sheep?</Title>
    <ISBN>345404475</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1996</Year_Published>
  </record>
  <record>
    <Author>Janna Levin</Author>
    <Title>A Madman Dreams of Turing Machines</Title>
    <ISBN>1400040302</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2006</Year_Published>
  </record>
  <record>
    <Author>Alan Moore, David Lloyd</Author>
    <Title>V for Vendetta</Title>
    <ISBN>930289528</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1995</Year_Published>
  </record>
  <record>
    <Author>Edward W. Said</Author>
    <Title>Orientalism</Title>
    <ISBN>039474067X</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1979</Year_Published>
  </record>
  <record>
    <Author>Gary Shteyngart</Author>
    <Title>The Russian Debutante's Handbook</Title>
    <ISBN>1573229881</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2003</Year_Published>
  </record>
  <record>
    <Author>Raymond Chandler</Author>
    <Title>Farewell, My Lovely</Title>
    <ISBN>394758277</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1992</Year_Published>
  </record>
  <record>
    <Author>Stefan Zweig</Author>
    <Title>Beware of Pity</Title>
    <ISBN>1590172000</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2006</Year_Published>
  </record>
  <record>
    <Author>James Blish</Author>
    <Title>A Case of Conscience</Title>
    <ISBN>345438353</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2000</Year_Published>
  </record>
  <record>
    <Author>Alan Moore</Author>
    <Title>Watchmen</Title>
    <ISBN>930289234</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1995</Year_Published>
  </record>
  <record>
    <Author>Marilynne Robinson</Author>
    <Title>Gilead</Title>
    <ISBN>374153892</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2004</Year_Published>
  </record>
</csv>

Step 3: Sort XML data with XQuery

Sort XML data by a value within each record

After we had converted our CSV to XML, Clifford Anderson challenged us to sort books.csv alphabetically by author. To accomplish this, we bind the <record> nodes within the result from fetch-books.xquery to the variable $books. From here, we create a stack by looping through $books to order each <record> node by its <Author> child node and adding it to the variable $book. After our loop ends, we return $book to get our sorted data in XML. The changes to fetch-books.xquery are added in fetch-books-and-order-by-author.xquery:

xquery version "3.1";

let $csv := fetch:text("https://gist.githubusercontent.com/AdamSteffanick/ecf1240e182fe7d86fd4/raw/690b90b8b8aaa5dd9f29ca1949af12e9c9fa996e/books.csv")
let $books := csv:parse($csv, map { "header": true() })/csv/record
for $book in $books
order by $book/Author (: Sort books alphabetically by author :)
return $book

Note that our code in fetch-books-and-order-by-author.xquery orders <record> nodes by the full content of <Author> nodes, not by surname. Running fetch-books-and-order-by-author.xquery with BaseX returns the following result in XML:

<record>
  <Author>Alan Moore</Author>
  <Title>Watchmen</Title>
  <ISBN>930289234</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1995</Year_Published>
</record>
<record>
  <Author>Alan Moore, David Lloyd</Author>
  <Title>V for Vendetta</Title>
  <ISBN>930289528</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1995</Year_Published>
</record>
<record>
  <Author>Daniel Defoe</Author>
  <Title>A Journal of the Plague Year</Title>
  <ISBN>375757899</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2001</Year_Published>
</record>
<record>
  <Author>Edward W. Said</Author>
  <Title>Orientalism</Title>
  <ISBN>039474067X</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1979</Year_Published>
</record>
<record>
  <Author>Gary Shteyngart</Author>
  <Title>The Russian Debutante's Handbook</Title>
  <ISBN>1573229881</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2003</Year_Published>
</record>
<record>
  <Author>Haruki Murakami</Author>
  <Title>After the Quake</Title>
  <ISBN>375713271</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2003</Year_Published>
</record>
<record>
  <Author>James Blish</Author>
  <Title>A Case of Conscience</Title>
  <ISBN>345438353</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2000</Year_Published>
</record>
<record>
  <Author>James Surowiecki</Author>
  <Title>The Wisdom of Crowds</Title>
  <ISBN>385721706</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2005</Year_Published>
</record>
<record>
  <Author>Janna Levin</Author>
  <Title>A Madman Dreams of Turing Machines</Title>
  <ISBN>1400040302</ISBN>
  <Binding>Hardcover</Binding>
  <Year_Published>2006</Year_Published>
</record>
<record>
  <Author>Jeannette Walls</Author>
  <Title>The Glass Castle</Title>
  <ISBN>074324754X</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2006</Year_Published>
</record>
<record>
  <Author>Kurt Vonnegut</Author>
  <Title>Slaughterhouse-Five</Title>
  <ISBN>385333846</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1999</Year_Published>
</record>
<record>
  <Author>Larry Bossidy, Ram Charan, Charles Burck</Author>
  <Title>Execution</Title>
  <ISBN>609610570</ISBN>
  <Binding>Hardcover</Binding>
  <Year_Published>2002</Year_Published>
</record>
<record>
  <Author>Lawrence Lessig</Author>
  <Title>The Future of Ideas</Title>
  <ISBN>375726446</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2002</Year_Published>
</record>
<record>
  <Author>Marilynne Robinson</Author>
  <Title>Gilead</Title>
  <ISBN>374153892</ISBN>
  <Binding>Hardcover</Binding>
  <Year_Published>2004</Year_Published>
</record>
<record>
  <Author>Nassim Nicholas Taleb</Author>
  <Title>Fooled by Randomness</Title>
  <ISBN>812975219</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2005</Year_Published>
</record>
<record>
  <Author>Philip K. Dick</Author>
  <Title>Do Androids Dream of Electric Sheep?</Title>
  <ISBN>345404475</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1996</Year_Published>
</record>
<record>
  <Author>Raymond Chandler</Author>
  <Title>Farewell, My Lovely</Title>
  <ISBN>394758277</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>1992</Year_Published>
</record>
<record>
  <Author>Robert A. Dahl</Author>
  <Title>On Democracy (Yale Nota Bene)</Title>
  <ISBN>300084552</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2000</Year_Published>
</record>
<record>
  <Author>Stefan Zweig</Author>
  <Title>Beware of Pity</Title>
  <ISBN>1590172000</ISBN>
  <Binding>Paperback</Binding>
  <Year_Published>2006</Year_Published>
</record>

Group XML data by a value within each record

Before concluding our meeting, we thought it would be fun to separate our books into groups by cover type: paperback and hardcover. For this, all we need to do is take our stack of records from fetch-books-and-order-by-author.xquery and group it by the values found within <Binding> nodes. To start, we bind the <Binding> nodes to the variable $binding. Then, we use the values stored in $binding to group our stack of records. Finally, we return $binding and $book as elements to get our grouped and sorted XML. Our final changes are added in fetch-books-order-by-author-and-group-by-cover.xquery:

xquery version "3.1";

let $csv := fetch:text("https://gist.githubusercontent.com/AdamSteffanick/ecf1240e182fe7d86fd4/raw/690b90b8b8aaa5dd9f29ca1949af12e9c9fa996e/books.csv")
let $books := csv:parse($csv, map { "header": true() })/csv/record
for $book in $books
order by $book/Author (: Sort books alphabetically by author :)
let $binding := $book/Binding
group by $binding (: Group books by binding :)
return element {$binding} {$book}

Our new return statement creates parent nodes from each unique value stored in $binding, and these parent nodes contain records stored in our grouped stack of <record> nodes: $book. In this case, it creates the node <Paperback> as the parent of the first <record> group and <Hardcover> as the parent of the second <record> group. Running fetch-books-order-by-author-and-group-by-cover.xquery with BaseX returns the following result in XML:

<Paperback>
  <record>
    <Author>Alan Moore</Author>
    <Title>Watchmen</Title>
    <ISBN>930289234</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1995</Year_Published>
  </record>
  <record>
    <Author>Alan Moore, David Lloyd</Author>
    <Title>V for Vendetta</Title>
    <ISBN>930289528</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1995</Year_Published>
  </record>
  <record>
    <Author>Daniel Defoe</Author>
    <Title>A Journal of the Plague Year</Title>
    <ISBN>375757899</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2001</Year_Published>
  </record>
  <record>
    <Author>Edward W. Said</Author>
    <Title>Orientalism</Title>
    <ISBN>039474067X</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1979</Year_Published>
  </record>
  <record>
    <Author>Gary Shteyngart</Author>
    <Title>The Russian Debutante's Handbook</Title>
    <ISBN>1573229881</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2003</Year_Published>
  </record>
  <record>
    <Author>Haruki Murakami</Author>
    <Title>After the Quake</Title>
    <ISBN>375713271</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2003</Year_Published>
  </record>
  <record>
    <Author>James Blish</Author>
    <Title>A Case of Conscience</Title>
    <ISBN>345438353</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2000</Year_Published>
  </record>
  <record>
    <Author>James Surowiecki</Author>
    <Title>The Wisdom of Crowds</Title>
    <ISBN>385721706</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2005</Year_Published>
  </record>
  <record>
    <Author>Jeannette Walls</Author>
    <Title>The Glass Castle</Title>
    <ISBN>074324754X</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2006</Year_Published>
  </record>
  <record>
    <Author>Kurt Vonnegut</Author>
    <Title>Slaughterhouse-Five</Title>
    <ISBN>385333846</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1999</Year_Published>
  </record>
  <record>
    <Author>Lawrence Lessig</Author>
    <Title>The Future of Ideas</Title>
    <ISBN>375726446</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2002</Year_Published>
  </record>
  <record>
    <Author>Nassim Nicholas Taleb</Author>
    <Title>Fooled by Randomness</Title>
    <ISBN>812975219</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2005</Year_Published>
  </record>
  <record>
    <Author>Philip K. Dick</Author>
    <Title>Do Androids Dream of Electric Sheep?</Title>
    <ISBN>345404475</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1996</Year_Published>
  </record>
  <record>
    <Author>Raymond Chandler</Author>
    <Title>Farewell, My Lovely</Title>
    <ISBN>394758277</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>1992</Year_Published>
  </record>
  <record>
    <Author>Robert A. Dahl</Author>
    <Title>On Democracy (Yale Nota Bene)</Title>
    <ISBN>300084552</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2000</Year_Published>
  </record>
  <record>
    <Author>Stefan Zweig</Author>
    <Title>Beware of Pity</Title>
    <ISBN>1590172000</ISBN>
    <Binding>Paperback</Binding>
    <Year_Published>2006</Year_Published>
  </record>
</Paperback>
<Hardcover>
  <record>
    <Author>Janna Levin</Author>
    <Title>A Madman Dreams of Turing Machines</Title>
    <ISBN>1400040302</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2006</Year_Published>
  </record>
  <record>
    <Author>Larry Bossidy, Ram Charan, Charles Burck</Author>
    <Title>Execution</Title>
    <ISBN>609610570</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2002</Year_Published>
  </record>
  <record>
    <Author>Marilynne Robinson</Author>
    <Title>Gilead</Title>
    <ISBN>374153892</ISBN>
    <Binding>Hardcover</Binding>
    <Year_Published>2004</Year_Published>
  </record>
</Hardcover>

What we learned

Thanks to this session of the Vanderbilt University XQuery Working Group, we can now:

  • convert CSV to XML with XQuery
  • sort XML data by a value within each record
  • group XML data by a value within each record

Thank you for reading, and have fun coding.