The semantics of the CTS URN is based on a model of citable text as an “Ordered Hierarchy of Citation Objects.”
In the OHCO2 model, citable text is a set of citable nodes, each with four properties:
The OHCO2 model can be implemented as:
Each model has advantages for different applications, and in the libraries we have developed, we use all three models.
For manually editing texts, the tree model of an XML document is useful; for many kinds of analysis, a tabular structure is simpler to navigate and manipulate; and for integrating with other kinds of information, we use directed graphs expressed in RDF. We can demonstrate the equivalence of these representations under OHCO2 by showing that each node preserves the four properties listed above across all of them.
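The equivalence argument can be sketched as a round trip between representations. The following is a minimal illustration, not the API of any of the libraries mentioned here. It assumes that a citable node can be reduced to three fields: a CTS URN (which carries both the work it belongs to and its place in the citation hierarchy), a sequence number, and its text content. The `CitableNode` class, the helper functions, the predicate names `hasSequence` and `hasTextContent`, and the `demo` URNs are all hypothetical, and the URN parsing ignores ranges and subreferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitableNode:
    urn: str       # CTS URN identifying the node (hypothetical examples below)
    sequence: int  # position of the node in document order
    text: str      # text content of the node


def split_urn(urn):
    """Split a simple CTS URN into (work, passage) component tuples."""
    parts = urn.split(":")
    # Expected shape: urn:cts:<namespace>:<work hierarchy>:<passage hierarchy>
    if len(parts) != 5 or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a simple CTS URN: {urn}")
    return tuple(parts[3].split(".")), tuple(parts[4].split("."))


def to_tree(table):
    """Nest nodes by passage components (assumes all leaves share one depth)."""
    tree = {}
    for node in table:
        _, passage = split_urn(node.urn)
        branch = tree
        for key in passage[:-1]:
            branch = branch.setdefault(key, {})
        branch[passage[-1]] = node
    return tree


def from_tree(tree):
    """Flatten the tree back into a table ordered by sequence."""
    leaves = []
    stack = [tree]
    while stack:
        for child in stack.pop().values():
            if isinstance(child, dict):
                stack.append(child)
            else:
                leaves.append(child)
    return sorted(leaves, key=lambda n: n.sequence)


def to_triples(table):
    """Express each node as (subject, predicate, object) statements."""
    triples = set()
    for node in table:
        # Predicate names are placeholders, not a published RDF vocabulary.
        triples.add((node.urn, "hasSequence", node.sequence))
        triples.add((node.urn, "hasTextContent", node.text))
    return triples


def from_triples(triples):
    """Rebuild the table from the triples, ordered by sequence."""
    seqs = {s: o for s, p, o in triples if p == "hasSequence"}
    texts = {s: o for s, p, o in triples if p == "hasTextContent"}
    nodes = [CitableNode(u, seqs[u], texts[u]) for u in seqs]
    return sorted(nodes, key=lambda n: n.sequence)


table = [
    CitableNode("urn:cts:demo:group.work.ed:1.1", 1, "first line"),
    CitableNode("urn:cts:demo:group.work.ed:1.2", 2, "second line"),
    CitableNode("urn:cts:demo:group.work.ed:2.1", 3, "third line"),
]

# Each representation yields the same nodes with the same properties.
assert from_tree(to_tree(table)) == table
assert from_triples(to_triples(table)) == table
```

Because the URN, the sequence, and the text content survive each conversion unchanged, the table, the tree, and the set of triples describe the same citable text in this sketch.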
The generality of the model is well illustrated by the fact that there are currently implementations of the CTS protocol for retrieving texts identified by CTS URNs in the OHCO2 model using:
The name OHCO2 echoes the 1990 paper by DeRose et al.[1] proposing that the true structure of text was an Ordered Hierarchy of Content Objects (OHCO). When the majority of the co-authors returned to the subject in 1993,[2] they reached a kind of Socratic aporia: since a text can be analyzed in an unlimited number of ways, and any analysis can impose its own, overlapping hierarchy on the content of the text, there can be no single, true OHCO.
Smith and Weaver asked how we are able to discuss the same text from multiple perspectives, and argued that canonical citation serves as a kind of “interchange format” for identifying content within differing analytical hierarchies.[3] Hence, their model is called OHCO2, for Ordered Hierarchy of Citation Objects.
Originally proposed by Smith and Weaver (2009), “Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture”, Dartmouth Computer Science Technical Report TR2009-649.