Release status.
This document specifies version 3.0.1 of the CITE Exchange format. Version numbers follow semantic versioning guidelines.
The CITE Exchange format is a plain-text, line-oriented data format for serializing citable content following the models of the CITE Architecture. Distinct types of content are grouped in separate labelled blocks, as specified below, so that a single CEX source can integrate any content citable in the CITE Architecture.
Blocks are optional (although some blocks may require the presence of one or more other blocks: see details below). Authors may limit a CEX serialization to include only those kinds of citable content they choose. A null string or empty text file is a syntactically valid, although empty, CEX data serialization.
Valid block labels are:
#!cexversion
#!citelibrary
#!ctsdata
#!ctscatalog
#!citecollections
#!citeproperties
#!citedata
#!imagedata
#!relations
#!datamodels
Within a block, the block label is followed by an ordered sequence of lines. That is, while the appearance of blocks in a CEX source is not ordered, lines are ordered within each block.
Empty (zero-length) lines are allowed but are ignored. Lines beginning with the string // are comments and are ignored. Other lines are treated as the block contents.
The syntax of block contents is specific to the type of the block.
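The block structure described above can be sketched in Python. This is a minimal, non-authoritative reader: the function name split_blocks and the decision to discard lines that precede the first block label are assumptions, not part of the specification. Because a label such as #!citedata may occur more than once, each label maps to a list of blocks.

```python
def split_blocks(cex_text):
    """Group the content lines of a CEX source by block label.

    Returns a dict mapping each label (without the leading "#!") to a list
    of blocks; each block is the ordered list of its content lines.
    """
    blocks = {}
    current = None  # content lines of the block being read, if any
    for line in cex_text.splitlines():
        if line == "" or line.startswith("//"):
            continue  # empty lines and comment lines are ignored
        if line.startswith("#!"):
            current = []
            blocks.setdefault(line[2:].strip(), []).append(current)
        elif current is not None:
            current.append(line)
        # Assumption: content before the first block label is discarded.
    return blocks


sample = "\n".join([
    "#!cexversion",
    "// note: currently using version 3.0",
    "",
    "3.0",
    "#!ctsdata",
    "urn:cts:greekLit:tlg5026.msA.hmt:1.2.lemma#μῆνις",
])
print(split_blocks(sample))
```

An empty string yields an empty dict, which matches the statement that an empty source is a valid, although empty, serialization.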
cexversion
The cexversion block contains a single content line with a string identifying the version of the CITE Exchange format followed in this CEX source.
Example: The following example is a valid cexversion block. It includes an empty line and a comment line, but only one content line, specifying the version of the CEX data format.
#!cexversion
// note: currently using version 3.0
3.0
citelibrary
The citelibrary block must include three content lines with metadata about the entire library of material serialized in this CEX file. The three items are formatted as key-value pairs. Each key and value is separated by a string delimiter that does not otherwise appear in content lines of the block. The required strings for keys and their meanings are:
name: a human-readable name or label for this data set
urn: a CITE2 URN uniquely identifying this library
license: a licensing statement describing rights to use the entire library, as a unit. Individual components may have more permissive licenses.
The citelibrary block may optionally include one or more lines associating CITE namespace abbreviations with URIs. Each such line is formatted as three delimited values: the labelling string namespace, a string value used as a namespace abbreviation in the library's CTS or CITE2 URNs, and a URI value.
Example: The following example is a valid citelibrary block, using # for its delimiting string. It includes empty lines and comment lines, in addition to the three required key-value pairs, and definitions for two namespace URIs.
#!citelibrary
// Metadata applying to the entire library.
// Note that throughout this source, the string "#" is used as
// the column delimiter.
name#Iliadic Metrical Summaries
urn#urn:cite2:hmt:cex.2017_1:metsumm
license#Creative Commons Attribution, Non-Commercial 4.0 License <https://creativecommons.org/licenses/by-nc/4.0/>.
// Optional namespace definitions
namespace#hmt#http://www.homermultitext.org/citens/hmt
namespace#greekLit#http://chs.harvard.edu/ctsns/greekLit
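As a rough illustration, the content lines of a citelibrary block could be read as follows. The function name parse_citelibrary and its return shape are hypothetical, and the sketch assumes comment and empty lines have already been removed.

```python
REQUIRED_KEYS = {"name", "urn", "license"}


def parse_citelibrary(content_lines, delimiter="#"):
    """Return (metadata, namespaces) from the content lines of a citelibrary block."""
    metadata = {}
    namespaces = {}
    for line in content_lines:
        parts = line.split(delimiter)
        if parts[0] == "namespace":
            # Optional line: namespace, abbreviation, URI
            if len(parts) != 3:
                raise ValueError(f"malformed namespace line: {line!r}")
            namespaces[parts[1]] = parts[2]
        elif len(parts) == 2:
            metadata[parts[0]] = parts[1]
        else:
            raise ValueError(f"malformed key-value line: {line!r}")
    missing = REQUIRED_KEYS - set(metadata)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return metadata, namespaces


library_lines = [
    "name#Iliadic Metrical Summaries",
    "urn#urn:cite2:hmt:cex.2017_1:metsumm",
    "license#Creative Commons Attribution, Non-Commercial 4.0 License <https://creativecommons.org/licenses/by-nc/4.0/>.",
    "namespace#hmt#http://www.homermultitext.org/citens/hmt",
    "namespace#greekLit#http://chs.harvard.edu/ctsns/greekLit",
]
metadata, namespaces = parse_citelibrary(library_lines)
print(metadata["name"], sorted(namespaces))
```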
ctscatalog
The ctscatalog block contains a table with minimal cataloging data about one or more citable texts. The table is represented as eight columns of delimited text, with columns separated by a string delimiter that does not otherwise appear in content lines of the block. The first content line is a header line with labels for each column. Subsequent content lines document citable versions or exemplars of a text. The eight columns may have any String labels in the header row, but in all rows columns must follow this sequence:
the CTS URN of the version or exemplar
the citation scheme, giving labels for the tiers of the citation hierarchy
the name of the text group
the title of the work
a label for the version
a label for the exemplar (empty when the row catalogs a version)
whether the text is online: true or false
the language of the text
Note that it is possible to catalog texts that are not online. Within a CEX serialization, cataloging a work as online means that citable text nodes for this CTS URN must be available in the ctsdata block of the CEX.
Example: The following example is a valid ctscatalog block, using # for its delimiting string. The tiers of the citation hierarchy are separated using , as the delimiting string. It defines a version of a text: note that the column for exemplar label is empty.
#!ctscatalog
// Complete catalog for a single citable text: an edition
// of scholia in a manuscript of the *Iliad*. There is no
// specific exemplar of this edition.
urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang
urn:cts:greekLit:tlg5026.msA.hmt:#book,comment,section#Scholia Vetera in Iliadem#Main scholia to Venetus A#Homer Multitext##true#grc
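A catalog row like the one above can be read positionally. The following is a sketch only: the CatalogEntry type and its field names are hypothetical and simply mirror the header labels of the example, and the column order is taken from that example.

```python
from typing import List, NamedTuple


class CatalogEntry(NamedTuple):
    urn: str
    citation_scheme: List[str]
    group_name: str
    work_title: str
    version_label: str
    exemplar_label: str
    online: bool
    lang: str


def parse_ctscatalog(content_lines, delimiter="#", tier_delimiter=","):
    """Parse ctscatalog content lines; the first line is the header row."""
    entries = []
    for row in content_lines[1:]:
        columns = row.split(delimiter)
        if len(columns) != 8:
            raise ValueError(f"expected 8 columns, found {len(columns)}: {row!r}")
        if columns[6] not in ("true", "false"):
            raise ValueError(f"online must be true or false: {columns[6]!r}")
        entries.append(CatalogEntry(
            urn=columns[0],
            citation_scheme=columns[1].split(tier_delimiter),
            group_name=columns[2],
            work_title=columns[3],
            version_label=columns[4],
            exemplar_label=columns[5],
            online=(columns[6] == "true"),
            lang=columns[7],
        ))
    return entries


catalog_lines = [
    "urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang",
    "urn:cts:greekLit:tlg5026.msA.hmt:#book,comment,section#Scholia Vetera in Iliadem"
    "#Main scholia to Venetus A#Homer Multitext##true#grc",
]
entry = parse_ctscatalog(catalog_lines)[0]
print(entry.citation_scheme, entry.online)
```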
ctsdata
The ctsdata block contains a two-column representation of a citable text in the OHCO2 model. Columns are separated by a string delimiter that does not otherwise appear in content lines of the block. The first column gives the CTS URN for a citable node; the second column gives its text contents. Within a given citable version or exemplar, nodes must be in document order.
Example: The following example is a valid ctsdata block, using # for its delimiting string. It defines two citable nodes of text.
#!ctsdata
// CTS data: "#" is the column delimiter.
urn:cts:greekLit:tlg5026.msA.hmt:1.2.lemma#μῆνις
urn:cts:greekLit:tlg5026.msA.hmt:1.2.comment#παρὰ τὸ μένω μῆνις ὡς ἐνὸς ἦνις· οἱ δὲ περὶ Γλαύκωνα τὸν Ταρσέα ἠξίουν ὀξύνειν τὸ ὄνομα οὐκ ὀρθῶς.
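The two-column structure can be read into an ordered list of (URN, text) pairs, as in the sketch below. The function names are hypothetical. The has_nodes helper uses a simple string-prefix test to ask whether any node belongs to a cataloged version; that is a simplifying assumption, not full CTS URN containment logic.

```python
def parse_ctsdata(content_lines, delimiter="#"):
    """Return citable nodes as (urn, text) pairs, preserving document order."""
    nodes = []
    for line in content_lines:
        columns = line.split(delimiter)
        if len(columns) != 2:
            raise ValueError(f"expected 2 columns: {line!r}")
        urn, text = columns
        if not urn.startswith("urn:cts:"):
            raise ValueError(f"not a CTS URN: {urn!r}")
        nodes.append((urn, text))
    return nodes


def has_nodes(version_urn, nodes):
    # Assumption: a node belongs to a version if its URN starts with the version URN.
    return any(urn.startswith(version_urn) for urn, _ in nodes)


data_lines = [
    "urn:cts:greekLit:tlg5026.msA.hmt:1.2.lemma#μῆνις",
    "urn:cts:greekLit:tlg5026.msA.hmt:1.2.comment#παρὰ τὸ μένω μῆνις ὡς ἐνὸς ἦνις",
]
nodes = parse_ctsdata(data_lines)
print(len(nodes), has_nodes("urn:cts:greekLit:tlg5026.msA.hmt:", nodes))
```

A check like has_nodes could back the requirement that a text cataloged as online has citable nodes in the ctsdata block.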
citecollections
The citecollections block documents one or more citable collections of data in five columns of delimited text data. These give, in order:
the CITE2 URN of the collection
a description of the collection
the property used as a label for objects in the collection
the property used to order the collection (empty for an unordered collection)
a license for the collection
The first content line of the citecollections block is a labelling header line and is ignored in processing.
Example: The following example is a valid citecollections block, using # for its delimiting string. It defines an unordered collection.
#!citecollections
URN#Description#Labelling property#Ordering property#License
urn:cite2:hmt:vaimg.v1:#Images of the Venetus A manuscript#urn:cite2:hmt:vaimg.v1.caption:##CC-attribution-share-alike
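The five-column layout can be sketched as below. The function name, the dictionary keys, and the rule that an empty ordering column marks an unordered collection are assumptions made for illustration; the sample row keeps the ordering column empty so that the row has five columns.

```python
def parse_citecollections(content_lines, delimiter="#"):
    """Parse citecollections content lines; the first line is a header and is skipped."""
    collections = []
    for row in content_lines[1:]:
        columns = row.split(delimiter)
        if len(columns) != 5:
            raise ValueError(f"expected 5 columns, found {len(columns)}: {row!r}")
        urn, description, labelling, ordering, license_text = columns
        collections.append({
            "urn": urn,
            "description": description,
            "labelling_property": labelling,
            "ordering_property": ordering or None,
            "ordered": ordering != "",  # assumption: empty column means unordered
            "license": license_text,
        })
    return collections


collection_lines = [
    "URN#Description#Labelling property#Ordering property#License",
    "urn:cite2:hmt:vaimg.v1:#Images of the Venetus A manuscript"
    "#urn:cite2:hmt:vaimg.v1.caption:##CC-attribution-share-alike",
]
collection = parse_citecollections(collection_lines)[0]
print(collection["urn"], collection["ordered"])
```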
citeproperties
The citeproperties block defines properties for one or more collections in four further columns of delimited text data. These give, in order:
the URN of the property
a human-readable label for the property
the type of the property: one of String, CtsUrn, Cite2Urn, Number and Boolean
optionally, a list of controlled vocabulary values for a property of String type. These are separated by a secondary delimiter that does not occur elsewhere in the value of this property.
Every collection must have one property with property identifier urn; its type must be Cite2Urn.
The first content line of the citeproperties block is a labelling header line and is ignored in processing.
Example: The following example is a valid citeproperties block, using # for its delimiting string, and , for its secondary delimiter for controlled vocabulary lists. It defines a collection with three properties. In addition to the required urn property, there are two properties of String type. The string property labelled “License for binary image data” has a controlled vocabulary list with two comma-delimited items.
#!citeproperties
Property#Label#Type#Authority list
urn:cite2:hmt:msA.v1.urn:#Image URN#Cite2Urn#
urn:cite2:hmt:msA.v1.caption:#Caption#String#
urn:cite2:hmt:msA.v1.rights:#License for binary image data#String#CC-attribution-share-alike,public domain
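A sketch of reading property definitions, including the controlled vocabulary list, follows. The function names are hypothetical. The property_id helper assumes that the property identifier is the final dot-separated segment of the property URN, as the example rows suggest; the check for the required urn property assumes the rows passed in belong to a single collection.

```python
VALID_TYPES = {"String", "CtsUrn", "Cite2Urn", "Number", "Boolean"}


def property_id(property_urn):
    # Assumption: "urn:cite2:hmt:msA.v1.urn:" has the property identifier "urn".
    return property_urn.rstrip(":").split(":")[-1].split(".")[-1]


def parse_citeproperties(content_lines, delimiter="#", vocab_delimiter=","):
    """Parse citeproperties content lines; the first line is a header and is skipped."""
    properties = []
    for row in content_lines[1:]:
        columns = row.split(delimiter)
        if len(columns) != 4:
            raise ValueError(f"expected 4 columns: {row!r}")
        urn, label, ptype, vocab = columns
        if ptype not in VALID_TYPES:
            raise ValueError(f"unknown property type: {ptype!r}")
        properties.append({
            "urn": urn,
            "id": property_id(urn),
            "label": label,
            "type": ptype,
            "vocabulary": vocab.split(vocab_delimiter) if vocab else [],
        })
    return properties


def has_required_urn_property(properties):
    """True if exactly one property has identifier urn and its type is Cite2Urn."""
    matches = [p for p in properties if p["id"] == "urn"]
    return len(matches) == 1 and matches[0]["type"] == "Cite2Urn"


property_lines = [
    "Property#Label#Type#Authority list",
    "urn:cite2:hmt:msA.v1.urn:#Image URN#Cite2Urn#",
    "urn:cite2:hmt:msA.v1.caption:#Caption#String#",
    "urn:cite2:hmt:msA.v1.rights:#License for binary image data#String#CC-attribution-share-alike,public domain",
]
properties = parse_citeproperties(property_lines)
print([p["id"] for p in properties], has_required_urn_property(properties))
```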
citedata
The citedata block contains delimited text records for objects in a single CITE Collection. A CEX source documenting multiple CITE Collections will therefore have one citedata block for each collection. The collection represented by the citedata block must be documented by content in a citecollections and citeproperties block defining its structure. (On the other hand, the citecollections and citeproperties blocks may define the structure of collections with no data in a citedata block.)
The first content line of the citedata block is a header line. Each column header identifies the property, defined in the citeproperties block, that is represented in that column. The property identifier in the header line must match the property identifier of the catalog’s property URN, in a case-insensitive match. Any of the headers URN, Urn or urn will match the required urn property of the collection, for example.
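The case-insensitive header match can be sketched as follows. The function name match_headers is hypothetical, and the property identifiers are assumed to have been extracted already from the property URNs defined for the collection.

```python
def match_headers(header_line, property_ids, delimiter="#"):
    """Map each column header to the property identifier it matches, ignoring case."""
    by_lowercase = {pid.lower(): pid for pid in property_ids}
    matched = []
    for header in header_line.split(delimiter):
        key = header.lower()
        if key not in by_lowercase:
            raise ValueError(f"header {header!r} matches no defined property")
        matched.append(by_lowercase[key])
    return matched


print(match_headers("URN#Caption#Rights", ["urn", "caption", "rights"]))
```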
Subsequent content lines give data values for a single object. Data values in each delimited column must be correct string serializations of that data type, as follows:
String data type: The value may be any string not containing the delimiting string used in this CEX source, unless the property is defined with a controlled vocabulary list. In that case, the value must be one of the string values defined in the catalog.
CtsUrn data type: The value must be a valid string representation of a CTS URN.
Cite2Urn data type: The value must be a valid string representation of a CITE2 URN. (See this introduction to the CITE2 URN type.)
Boolean data type: The value must be one of the strings true or false.
Number data type: For integers, the value must be a string of digit characters representing a base-ten integer. For non-integral values, the text representation must begin with a string of digit characters representing a base-ten integer or 0, followed by a decimal point ., followed by a string of digit characters representing decimal digits.
Values for the required urn property must be unique within a collection.
Example: The following example is a valid citedata block, using # for its delimiting string. It defines a single object with three properties. The values in each column are valid for the property definitions in the example citecollections block above.
#!citedata
// Images of the Venetus A manuscript:
URN#Caption#Rights
urn:cite2:hmt:vaimg.v1:IMG1#Folio 1r of the Venetus A, photographed in natural light#CC-attribution-share-alike
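The type rules for data values can be approximated with a small validator. This is a sketch under stated assumptions: the function name is hypothetical, and the two URN types are checked only by prefix, which is far weaker than full URN validation. The Number pattern accepts digits, optionally followed by a decimal point and more digits, with no sign.

```python
import re

# Digits, optionally followed by "." and more digits; ASCII digits only.
NUMBER_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_valid_value(value, ptype, vocabulary=()):
    """Check a single column value against a property type."""
    if ptype == "String":
        # With a controlled vocabulary, the value must be one of its items.
        return value in vocabulary if vocabulary else True
    if ptype == "Boolean":
        return value in ("true", "false")
    if ptype == "Number":
        return NUMBER_PATTERN.fullmatch(value) is not None
    if ptype == "CtsUrn":
        return value.startswith("urn:cts:")  # prefix check only (assumption)
    if ptype == "Cite2Urn":
        return value.startswith("urn:cite2:")  # prefix check only (assumption)
    raise ValueError(f"unknown property type: {ptype!r}")


print(is_valid_value("3.14", "Number"), is_valid_value("True", "Boolean"))
```

A String value cannot contain the column delimiter, but that rule is enforced implicitly when the line is split into columns, so it is not rechecked here.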
imagedata
The CITE Image extension extends a collection to support working with binary image data. A CEX source including an imagedata block must therefore also include citecollections, citeproperties and citedata blocks. Content lines of the imagedata block are composed of four columns:
the CITE2 URN of the image collection
a string identifying the protocol for working with binary image data, one of:
    CITE for the REST API of the CITE Image service
    iiifString for the API of the International Image Interoperability Framework
    iipImageString for the IIP Image Protocol
    localJpegString for JPG files on the local file system
    localDzString for image data stored as DeepZoom files in a local file system
a base location, depending on the protocol:
    for the CITE image protocol, a base URL to which CITE Image API requests can be appended
    for the iiifString protocol, the URL of the context definition of the service
    for the iipImageString protocol, the URL of the context definition of the service
    for the localDzString and localJpegString protocols, a relative URL to a base directory for a file system tree
the property of the collection with licensing data for binary images accessed through this protocol
Note that it is possible to extend a single image collection with multiple protocols, each represented by a single line in the imagedata block. Since the CEX structure identifies a property with licensing data for each protocol, it is equally possible to have the same license apply to all forms of binary access, or to document protocol-specific licensing for each image.
Example: The following example is a valid imagedata block, using # for its delimiting string. It extends a single CITE Collection with data about images for use with three different protocols for using binary image data. The same licensing information is applied to all three forms of binary data access.
#!imagedata
// Lines are structured as:
// collection#protocol#base URL
urn:cite2:hmt:vaimg.v1:#CITE image#http://www.homermultitext.org/hmtdigital/images?
urn:cite2:hmt:vaimg.v1:#IIIF#http://www.homermultitext.org/image2/context.json
urn:cite2:hmt:vaimg.v1:#localDZ
relations
Content lines in the relations block relate two citable objects in S-V-O statements. The S-V-O statement is represented as three columns of delimited text, with columns separated by a string delimiter that does not otherwise appear in content lines of the block.
The subject and object elements of each statement must be URN values (either CITE2 URNs, or CTS URNs). The verb of each statement must be a CITE2 URN. The CEX source must therefore include a citecollections, citeproperties and citedata block. If the relations block includes statements about CTS URNs, it must also include ctscatalog and ctsdata blocks.
Example: The following example is a valid relations block. Its two content lines are symmetrical statements describing the relations between two objects, a text passage and a page of a manuscript.
#!relations
urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25#urn:cite2:dse:verbs.v1:appearsOn:#urn:cite2:hmt:msA.v1:12r
urn:cite2:hmt:msA.v1:12r#urn:cite2:dse:verbs.v1:hasOnit:#urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25
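The S-V-O statements can be read as triples, as in the following sketch. The function names are hypothetical, and URNs are recognized only by their prefixes, which is an assumption rather than full validation. The second helper reports whether any statement involves a CTS URN, in which case ctscatalog and ctsdata blocks are also required.

```python
URN_PREFIXES = ("urn:cite2:", "urn:cts:")


def parse_relations(content_lines, delimiter="#"):
    """Return (subject, verb, object) triples from relations content lines."""
    triples = []
    for line in content_lines:
        columns = line.split(delimiter)
        if len(columns) != 3:
            raise ValueError(f"expected 3 columns: {line!r}")
        subject, verb, obj = columns
        if not verb.startswith("urn:cite2:"):
            raise ValueError(f"verb must be a CITE2 URN: {verb!r}")
        for urn in (subject, obj):
            if not urn.startswith(URN_PREFIXES):
                raise ValueError(f"not a CITE2 or CTS URN: {urn!r}")
        triples.append((subject, verb, obj))
    return triples


def needs_cts_blocks(triples):
    """True if any subject or object is a CTS URN."""
    return any(urn.startswith("urn:cts:") for s, _, o in triples for urn in (s, o))


relation_lines = [
    "urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25#urn:cite2:dse:verbs.v1:appearsOn:#urn:cite2:hmt:msA.v1:12r",
    "urn:cite2:hmt:msA.v1:12r#urn:cite2:dse:verbs.v1:hasOnit:#urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25",
]
triples = parse_relations(relation_lines)
print(len(triples), needs_cts_blocks(triples))
```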
datamodels
Content lines in the datamodels block associate CITE Collections with externally defined data models. Collections are identified by a CITE2 URN; data models are identified by a CITE2 URN, and are described by both a brief label, and a fuller statement that may include references to sources of further information about the data model. These four items are represented as columns of delimited text, with columns separated by a string delimiter that does not otherwise appear in content lines of the block.
The first content line of the datamodels block is a labelling header line and is ignored in processing.
Collections identified in the datamodels block must be cataloged in citecollections and citeproperties blocks and instantiated in a citedata block. The collection associated with a data model could be a collection of verbs relating objects in further collections in a relations block.
Example: The following example is a valid datamodels block.
#!datamodels
Collection#Model#Label#Description
urn:cite2:dse:verbs.v1:#urn:cite2:dse:verbs.v1:#DSE model#Diplomatic Scholarly Edition (DSE) model. See documentation at <https://github.com/cite-architecture/dse>.