The CITE Architecture: technology-independent, machine-actionable citation of scholarly resources

A gist for converting OHCO2 text archives

The abstract OHCO2 model of citable text can be equivalently represented in many formats, but in practical terms, local XML files are a convenient representation for human editors.

This gist from Neel Smith provides a single command that converts a set of local XML files into two equivalent representations: tabular structures in delimited text files, and an RDF graph expressed in Turtle (TTL) notation. The script, cts.groovy, is written in Groovy, so you’ll need Groovy installed to run it. It resolves and retrieves its library dependencies dynamically, so you’ll also need to be connected to the internet. Finally, since it creates new output files, you’ll need to run it in a directory where you have write permission.
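Two of the prerequisites above (Groovy being installed and the working directory being writable) can be checked before you run anything. The following Python sketch is not part of the gist; the function name `preflight` is hypothetical, and it assumes the Groovy launcher is called `groovy` on your PATH. It does not test the internet connection.

```python
import os
import shutil
from pathlib import Path


def preflight(workdir=".", groovy_cmd="groovy"):
    """Return a list of problems found; an empty list means none were detected.

    `preflight` and its parameters are illustrative names, not part of cts.groovy.
    """
    problems = []
    # The script is started with the Groovy launcher, so it must be on the PATH.
    if shutil.which(groovy_cmd) is None:
        problems.append(f"'{groovy_cmd}' was not found on the PATH")
    # The script writes its output into the directory it is run from.
    wd = Path(workdir)
    if not wd.is_dir():
        problems.append(f"{wd} is not a directory")
    elif not os.access(wd, os.W_OK):
        problems.append(f"no write permission in {wd}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```

If the script prints nothing, neither check found a problem. That is not a guarantee that the conversion itself will succeed.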

The script takes three pieces of information as input, in this order: a TextInventory file documenting your texts and their citation schemes; the root directory of the files where your XML editions are stored; and a RELAX NG schema for validating the TextInventory. As the prefatory comment in the script shows, you can run it with

groovy cts.groovy INVENTORYFILE ARCHIVEDIRECTORY SCHEMAFILE

and you’ll get as output a new directory, cts-tabulated, containing one delimited text file for each text, and a new file, cts.ttl, expressing your entire archive in a single RDF graph.
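If you want to drive the conversion from another program and confirm that the expected output appeared, a wrapper along these lines may help. It is only a sketch: `build_command`, `summarize_outputs`, and the file name `run_cts.py` are hypothetical, and it assumes that `groovy` is on your PATH and that cts.groovy sits in the current directory. It makes no assumption about the names or the delimiter of the tabulated files; it only lists whatever is found in cts-tabulated.

```python
import subprocess
import sys
from pathlib import Path


def build_command(inventory, archive_dir, schema, script="cts.groovy"):
    """Build the argument list in the order shown in the command line above."""
    return ["groovy", script, str(inventory), str(archive_dir), str(schema)]


def summarize_outputs(workdir="."):
    """Return (sorted file names in cts-tabulated, whether cts.ttl exists)."""
    wd = Path(workdir)
    tab_dir = wd / "cts-tabulated"
    if tab_dir.is_dir():
        tabulated = sorted(p.name for p in tab_dir.iterdir() if p.is_file())
    else:
        tabulated = []
    return tabulated, (wd / "cts.ttl").is_file()


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: python run_cts.py INVENTORYFILE ARCHIVEDIRECTORY SCHEMAFILE")
    # check=True raises CalledProcessError if groovy exits with a non-zero status.
    subprocess.run(build_command(*sys.argv[1:]), check=True)
    files, has_ttl = summarize_outputs()
    print(f"{len(files)} tabulated file(s); cts.ttl present: {has_ttl}")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.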