package ohco2
Provides classes for working with a repository of citable texts.
Overview
The highest-level class is the TextRepository. It contains a Catalog with metadata about texts and a Corpus of textual data. The Corpus is simply a Vector of CitableNode objects.
The library supports reading from and serializing to CEX format.
In the JVM branch, the TextRepositorySource object has functions for creating a repository from a cataloged set of texts in local files.
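The containment described above can be sketched with stand-in types. This is a minimal illustration, not the library's code: the case classes below are simplified stand-ins defined locally (the real classes take a CtsUrn rather than a String, and CatalogEntry has more fields), and the version identifier `demo` in the URN is a made-up value.

```scala
object RepositoryModelSketch {
  // Simplified stand-ins for the classes documented on this page.
  // URNs are plain Strings here; the library itself uses CtsUrn.
  final case class CitableNode(urn: String, text: String)
  final case class Corpus(nodes: Vector[CitableNode])
  final case class CatalogEntry(urn: String, workTitle: String)
  final case class Catalog(texts: Vector[CatalogEntry])
  final case class TextRepository(corpus: Corpus, catalog: Catalog)

  // Hypothetical version-level URN used only for this example.
  val version = "urn:cts:greekLit:tlg0012.tlg001.demo:"

  // A Corpus is a Vector of citable nodes in document order.
  val corpus = Corpus(Vector(
    CitableNode(version + "1.1", "First citable line of the text"),
    CitableNode(version + "1.2", "Second citable line of the text")
  ))

  // The Catalog holds metadata about the texts found in the Corpus.
  val catalog = Catalog(Vector(CatalogEntry(version, "Iliad")))

  // The repository pairs the two.
  val repository = TextRepository(corpus, catalog)
}
```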
Type Members
-
case class
Catalog(texts: Vector[CatalogEntry]) extends LogSupport with Product with Serializable
Catalog for an entire text repository.
- texts
Set of catalog entries.
- Annotations
- @JSExportAll()
-
case class
CatalogEntry(urn: CtsUrn, citationScheme: String, lang: String, groupName: String, workTitle: String, versionLabel: Option[String], exemplarLabel: Option[String] = None, online: Boolean = true) extends Product with Serializable
Entry for a single concrete version of a text.
- urn
URN for the version.
- citationScheme
Label for citation scheme, with levels separated by a delimiter.
- lang
ISO 639-2 three-letter language code.
- groupName
Label for text group.
- workTitle
Title of notional work.
- versionLabel
Label for edition or translation.
- exemplarLabel
Label for optional exemplar, or None.
- online
True if the text is present in the cataloged Corpus.
- Annotations
- @JSExportAll()
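As an illustration of the parameters listed above, the following sketch builds one entry and relies on the two documented defaults. The case class is a local stand-in that copies the documented signature but uses a String for the URN; the `/` delimiter in the citation scheme, the sample labels, and the `label` helper are assumptions made for the example, not part of the library.

```scala
object CatalogEntrySketch {
  // Stand-in with the documented parameter list (urn simplified to String).
  final case class CatalogEntry(
    urn: String,
    citationScheme: String,
    lang: String,
    groupName: String,
    workTitle: String,
    versionLabel: Option[String],
    exemplarLabel: Option[String] = None,
    online: Boolean = true)

  // exemplarLabel and online are omitted, so the defaults apply.
  val entry = CatalogEntry(
    urn = "urn:cts:greekLit:tlg0012.tlg001.demo:", // hypothetical version URN
    citationScheme = "book/line",                  // delimiter "/" is an assumption
    lang = "grc",                                  // ISO 639-2 code for Ancient Greek
    groupName = "Homeric poetry",
    workTitle = "Iliad",
    versionLabel = Some("Demonstration edition"))

  // Hypothetical helper: join whichever labels are present.
  def label(e: CatalogEntry): String =
    (List(e.groupName, e.workTitle) ++ e.versionLabel.toList ++ e.exemplarLabel.toList)
      .mkString(", ")
}
```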
-
case class
CitableNode(urn: CtsUrn, text: String) extends LogSupport with Product with Serializable
The smallest canonically citable unit of a text.
- urn
URN identifying the node.
- text
Text contents of the node.
- Annotations
- @JSExportAll()
-
case class
CitationLabel(urn: CtsUrn, citationScheme: String) extends Product with Serializable
- urn
URN for the version.
- Annotations
- @JSExportAll()
-
case class
Corpus(nodes: Vector[CitableNode]) extends LogSupport with Product with Serializable
A corpus of citable texts.
- nodes
Contents of the citable corpus.
- Annotations
- @JSExportAll()
-
sealed
trait
DocumentFormat extends AnyRef
Set of recognized formats implementing OHCO2 model of citable texts.
-
case class
LabelCatalog(texts: Vector[CitationLabel]) extends Product with Serializable
Catalog for an entire text repository.
- texts
Set of catalog entries.
- Annotations
- @JSExportAll()
-
case class
LabelledCtsUrn(urn: CtsUrn, label: String) extends Product with Serializable
Association of a CtsUrn with labelling information from a text catalog.
- urn
The CtsUrn.
- label
A label for it.
-
case class
Ohco2Exception(message: String = "", cause: Option[Throwable] = None) extends Exception with Product with Serializable
Exception thrown by a class in this library.
-
case class
OnlineDocument(urn: CtsUrn, format: DocumentFormat, docName: String, namespaces: Option[Map[String, String]] = None, xpathTemplate: Option[String] = None) extends Product with Serializable
Metadata about a citable text in a local file.
- urn
CtsUrn for the text.
- format
OHCO2 format of the file.
- docName
Relative or absolute path to the document.
- namespaces
Map of prefix abbreviations to URIs for XML namespaces, or None for non-XML document formats.
- xpathTemplate
XPath-like string defining mapping of citation scheme to XML markup, or None for non-XML document formats.
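The relation between the format and the two optional parameters can be sketched as follows. The types are local stand-ins modeled on the signatures listed on this page (URN simplified to a String); the file names, the sample XPath template, and the `needsXmlConfig` helper are assumptions made for illustration.

```scala
object OnlineDocumentSketch {
  // Stand-ins for the sealed DocumentFormat trait and its three objects.
  sealed trait DocumentFormat
  case object Markdown extends DocumentFormat
  case object Two_Column extends DocumentFormat
  case object Wf_Xml extends DocumentFormat

  // Stand-in with the documented parameter list (urn simplified to String).
  final case class OnlineDocument(
    urn: String,
    format: DocumentFormat,
    docName: String,
    namespaces: Option[Map[String, String]] = None,
    xpathTemplate: Option[String] = None)

  // An XML document supplies both a namespace map and an XPath template.
  val xmlDoc = OnlineDocument(
    "urn:cts:greekLit:tlg0012.tlg001.demo:", // hypothetical URN
    Wf_Xml,
    "iliad.xml",                             // hypothetical file name
    Some(Map("tei" -> "http://www.tei-c.org/ns/1.0")),
    Some("/tei:TEI/tei:text/tei:body/tei:div[@n = '?']/tei:l[@n = '?']"))

  // A non-XML document leaves both options at their None defaults.
  val plainDoc = OnlineDocument(
    "urn:cts:greekLit:tlg0012.tlg001.demo:", Two_Column, "iliad.txt")

  // Hypothetical helper: because the trait is sealed, the match is exhaustive.
  def needsXmlConfig(f: DocumentFormat): Boolean = f match {
    case Wf_Xml                => true
    case Markdown | Two_Column => false
  }
}
```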
-
case class
RangeIndex(a: Int, b: Int) extends Product with Serializable
A range of nodes in a corpus, identified by the indices of its first and last nodes.
- a
Index of first node in range.
- b
Index of last node in range.
- Annotations
- @JSExportAll()
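Since both `a` and `b` name nodes that belong to the range, the range is inclusive at both ends. A minimal sketch of using such a pair to select from a Vector follows; the zero-based indexing and the `select` helper are assumptions for illustration, and RangeIndex is a local stand-in.

```scala
object RangeIndexSketch {
  // Stand-in with the documented parameter list.
  final case class RangeIndex(a: Int, b: Int)

  // Hypothetical helper: take the elements from index a through index b,
  // inclusive. Vector.slice excludes its upper bound, hence b + 1.
  def select[A](nodes: Vector[A], r: RangeIndex): Vector[A] =
    nodes.slice(r.a, r.b + 1)

  val lines = Vector("line 1", "line 2", "line 3", "line 4")
  val middle = select(lines, RangeIndex(1, 2))
}
```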
-
case class
StringCount(s: String, count: Int) extends Product with Serializable
Number of occurrences of a String in a corpus.
- s
String
- count
Number of occurrences in a corpus.
- Annotations
- @JSExportAll()
-
case class
StringHistogram(histogram: Vector[StringCount]) extends Product with Serializable
Counts of occurrences of strings.
- histogram
String counts, created in descending order by Corpus object's histogram builders.
- Annotations
- @JSExportAll()
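One way to produce counts in descending order is sketched below. This is not the Corpus object's histogram builder, whose code is not shown on this page; `build` is a hypothetical function, the two case classes are local stand-ins, and breaking ties alphabetically is a choice made here only to keep the output deterministic.

```scala
object HistogramSketch {
  // Stand-ins with the documented parameter lists.
  final case class StringCount(s: String, count: Int)
  final case class StringHistogram(histogram: Vector[StringCount])

  // Hypothetical builder: count each distinct string, then sort by
  // descending count, with ties in alphabetical order.
  def build(tokens: Vector[String]): StringHistogram = {
    val counts = tokens
      .groupBy(identity)
      .map { case (s, occurrences) => StringCount(s, occurrences.size) }
      .toVector
    StringHistogram(counts.sortBy(sc => (-sc.count, sc.s)))
  }

  val example = build(Vector("a", "b", "a", "c", "b", "a"))
}
```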
-
case class
TextRepository(corpus: Corpus, catalog: Catalog) extends Product with Serializable
A cataloged corpus of texts.
- corpus
The text contents.
- catalog
The catalog.
- Annotations
- @JSExportAll()
-
trait
Token extends AnyRef
Tokenization is a language- and corpus-specific operation; the results should share these functions.
- Annotations
- @JSExportAll()
-
case class
XPathTemplate(s: String) extends Product with Serializable
Class for working with XPathTemplates. An XPathTemplate is an XPath-like String identifying elements in an XML hierarchy. Elements may or may not be qualified by a prefix. Elements carrying a citation value on an attribute may have an attribute template expression of the form [@ATTRIBUTE = '?'].
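To make the `[@ATTRIBUTE = '?']` form concrete, the sketch below fills each `'?'` placeholder, left to right, with one citation value. The `fill` function and the sample template are hypothetical; they illustrate the notation only and do not reproduce the methods of XPathTemplate, which are not listed on this page.

```scala
import java.util.regex.{Matcher, Pattern}

object XPathTemplateSketch {
  // Sample template with prefixed elements and two attribute expressions.
  val template = "/tei:TEI/tei:text/tei:body/tei:div[@n = '?']/tei:l[@n = '?']"

  // Hypothetical helper: replace the placeholders one at a time, in order.
  // Pattern.quote and Matcher.quoteReplacement keep '?' and the values
  // from being interpreted as regular-expression syntax.
  def fill(t: String, values: Vector[String]): String =
    values.foldLeft(t) { (acc, v) =>
      acc.replaceFirst(Pattern.quote("'?'"), Matcher.quoteReplacement("'" + v + "'"))
    }

  // Citation values for book 1, line 2.
  val filled = fill(template, Vector("1", "2"))
}
```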
-
case class
XfRow(urn: String, nxt: String, prv: String, txt: String) extends Product with Serializable
A citable node in the model of the 82XF format.
- urn
string value of the node's URN.
- prv
string value of the preceding node's URN, or an empty string if this node is the first node of a text.
- txt
text contents of the node.
Value Members
-
def
fileName(dir: String, f: String): String
Format directory plus file as a path String.
- dir
Directory name.
- f
File name.
-
val
punctuationListRE: Regex
Regular expression defining recognized punctuation characters. The list includes all characters with the Unicode Punctuation property plus a selection of other non-alphabetic characters known to be used as punctuation values in some digital editions.
-
def
twoColumnsFrom82xf(xf: String, inputDelimiter: String = "#", outputDelimiter: String = "#"): String
Create two-column OHCO2 representation from a string in 82XF form. Note that 82XF input must already be ordered.
- xf
String in 82XF form.
- inputDelimiter
String delimiting columns of 82XF input.
- outputDelimiter
String to use to delimit columns in two-column output.
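The shape of the conversion can be sketched from already-parsed rows. This is a simplified illustration, not the function's actual code: it assumes the two output columns are the node's URN and its text, starts from XfRow values instead of parsing an 82XF string (whose column layout is not described on this page), and uses made-up URNs. XfRow is a local stand-in, and `toTwoColumns` is a hypothetical name.

```scala
object TwoColumnSketch {
  // Stand-in with the documented parameter list.
  final case class XfRow(urn: String, nxt: String, prv: String, txt: String)

  // Hypothetical helper: keep only URN and text, one node per line.
  // The rows are emitted in the order given, which is why the input
  // must already be in document order.
  def toTwoColumns(rows: Vector[XfRow], outputDelimiter: String = "#"): String =
    rows.map(r => r.urn + outputDelimiter + r.txt).mkString("\n")

  val first = "urn:cts:greekLit:tlg0012.tlg001.demo:1.1"  // hypothetical URNs
  val second = "urn:cts:greekLit:tlg0012.tlg001.demo:1.2"

  val rows = Vector(
    XfRow(first, second, "", "First line"),   // prv is empty for the first node
    XfRow(second, "", first, "Second line")   // empty nxt for the last node is an assumption
  )

  val output = toTwoColumns(rows)
}
```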
-
def
twoColumnsFromHocusPocus(hpString: String, inputDelimiter: String = "#", outputDelimiter: String = "#"): String
Create two-column OHCO2 representation from a string formatted in the tabular format of the hocuspocus class. Note that input must already be ordered.
- inputDelimiter
String delimiting columns of hocuspocus input.
- outputDelimiter
String to use to delimit columns in two-column output.
-
object
Catalog extends Serializable
Factory for making catalogs from text sources.
-
object
CitableNode extends Serializable
Factory for citable nodes with punctuation stripped out.
-
object
Corpus extends LogSupport
Factory for edu.holycross.shot.ohco2.Corpus instances.
-
object
CorpusSource extends LogSupport
A utility class for creating Corpus objects from various kinds of concrete sources available in the JVM, and serializing Corpus objects to various kinds of output.
-
object
Markdown extends DocumentFormat with Product with Serializable
Markdown, adhering to OHCO2 requirements for hierarchical consistency.
-
object
OnlineDocument extends Serializable
-
object
SimpleTabulator
Factory for building edu.holycross.shot.ohco2.Corpus instances from cataloged XML source files.
-
object
TextPassageTopology extends Enumeration
Enumeration of possible relations of two text passages.
- Annotations
- @JSExportAll()
-
object
TextRepository extends Serializable
Factory for constructing a TextRepository from source data in CEX format.
-
object
TextRepositorySource
Factory for TextRepository objects and string representations of repositories in .cex format.
-
object
Two_Column extends DocumentFormat with Product with Serializable
Two-column format, with lines representing citable nodes in document order.
-
object
Wf_Xml extends DocumentFormat with Product with Serializable
Well-formed XML with citation encoded as specified in citation configuration.
-
object
XPathTemplate extends Serializable