package ohco2

Provides classes for working with a repository of citable texts.

Overview

The highest-level class is the TextRepository. It contains a Catalog with metadata about texts and a Corpus of textual data. The Corpus is simply a Vector of CitableNode objects.
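
The relationship among these classes can be sketched with simplified stand-ins. This is a minimal illustration, not the library's own code: the case classes below are hypothetical simplifications that use a plain String where the library uses CtsUrn and keep only a few of the CatalogEntry fields. The URNs, sample text, and the `nodesFor` helper are invented for the example.

```scala
// Simplified stand-ins for the ohco2 types; NOT the library's definitions.
// Assumption: URNs are plain Strings here, where the library uses CtsUrn.
object DataModelSketch {
  final case class CitableNode(urn: String, text: String)
  final case class Corpus(nodes: Vector[CitableNode])
  // Only a subset of the fields listed for the library's CatalogEntry.
  final case class CatalogEntry(urn: String, groupName: String, workTitle: String)
  final case class Catalog(texts: Vector[CatalogEntry])
  final case class TextRepository(corpus: Corpus, catalog: Catalog)

  // Illustrative data only.
  val corpus: Corpus = Corpus(Vector(
    CitableNode("urn:cts:demo:group.work.ed:1.1", "First citable passage."),
    CitableNode("urn:cts:demo:group.work.ed:1.2", "Second citable passage.")
  ))
  val catalog: Catalog = Catalog(Vector(
    CatalogEntry("urn:cts:demo:group.work.ed:", "Demo group", "Demo work")
  ))
  val repo: TextRepository = TextRepository(corpus, catalog)

  // Hypothetical helper: nodes whose URN begins with a cataloged version URN.
  def nodesFor(entry: CatalogEntry): Vector[CitableNode] =
    repo.corpus.nodes.filter(_.urn.startsWith(entry.urn))
}
```

In this sketch the catalog carries only metadata, while the corpus holds the ordered nodes; matching the two by URN prefix is a simplification of whatever URN comparison the library itself performs.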

The library supports reading from and serializing to CEX format.

In the JVM branch, the TextRepositorySource object has functions for creating a repository from a cataloged set of texts in local files.


Type Members

  1. case class Catalog(texts: Vector[CatalogEntry]) extends LogSupport with Product with Serializable

    Catalog for an entire text repository.

    texts

    Set of catalog entries.

    Annotations
    @JSExportAll()
  2. case class CatalogEntry(urn: CtsUrn, citationScheme: String, lang: String, groupName: String, workTitle: String, versionLabel: Option[String], exemplarLabel: Option[String] = None, online: Boolean = true) extends Product with Serializable

    Entry for a single concrete version of a text.

    urn

    URN for the version.

    citationScheme

    Label for citation scheme, with levels separated by a delimiter.

    lang

    ISO 639-2 three-letter language code.

    groupName

    Label for text group.

    workTitle

    Title of notional work.

    versionLabel

    Label for edition or translation.

    exemplarLabel

    Label for optional exemplar, or None.

    online

    True if the text is present in the cataloged Corpus.

    Annotations
    @JSExportAll()
  3. case class CitableNode(urn: CtsUrn, text: String) extends LogSupport with Product with Serializable

    The smallest canonically citable unit of a text.

    urn

    URN identifying the node.

    text

    Text contents of the node.

    Annotations
    @JSExportAll()
  4. case class CitationLabel(urn: CtsUrn, citationScheme: String) extends Product with Serializable

    urn

    URN for the version.

    Annotations
    @JSExportAll()
  5. case class Corpus(nodes: Vector[CitableNode]) extends LogSupport with Product with Serializable

    A corpus of citable texts.

    nodes

    Contents of the citable corpus.

    Annotations
    @JSExportAll()
  6. sealed trait DocumentFormat extends AnyRef

    Set of recognized formats implementing OHCO2 model of citable texts.

  7. case class LabelCatalog(texts: Vector[CitationLabel]) extends Product with Serializable

    Catalog for an entire text repository.

    texts

    Set of catalog entries.

    Annotations
    @JSExportAll()
  8. case class LabelledCtsUrn(urn: CtsUrn, label: String) extends Product with Serializable

    Association of a CtsUrn with labelling information from a text catalog.

    urn

    The CtsUrn.

    label

    A label for it.

  9. case class Ohco2Exception(message: String = "", cause: Option[Throwable] = None) extends Exception with Product with Serializable

    Exception thrown by a class in this library.

  10. case class OnlineDocument(urn: CtsUrn, format: DocumentFormat, docName: String, namespaces: Option[Map[String, String]] = None, xpathTemplate: Option[String] = None) extends Product with Serializable

    Metadata about a citable text in a local file.

    urn

    CtsUrn for the text.

    format

    OHCO2 format of the file.

    docName

    Relative or absolute path to the document.

    namespaces

    Map of prefix abbreviations to URIs for XML namespaces, or None for non-XML document formats.

    xpathTemplate

    XPath-like string defining mapping of citation scheme to XML markup, or None for non-XML document formats.

  11. case class RangeIndex(a: Int, b: Int) extends Product with Serializable

    A range of nodes, identified by the indices of its first and last nodes.

    a

    Index of first node in range.

    b

    Index of last node in range.

    Annotations
    @JSExportAll()
  12. case class StringCount(s: String, count: Int) extends Product with Serializable

    Number of occurrences of a String in a corpus.

    s

    The String being counted.

    count

    Number of occurrences in a corpus.

    Annotations
    @JSExportAll()
  13. case class StringHistogram(histogram: Vector[StringCount]) extends Product with Serializable

    Counts of occurrences of strings.

    histogram

    String counts, created in descending order by Corpus object's histogram builders.

    Annotations
    @JSExportAll()
  14. case class TextRepository(corpus: Corpus, catalog: Catalog) extends Product with Serializable

    A cataloged corpus of texts.

    corpus

    The text contents.

    catalog

    The catalog.

    Annotations
    @JSExportAll()
  15. trait Token extends AnyRef

    Tokenization is a language- and corpus-specific operation, but its results should share these functions.

    Annotations
    @JSExportAll()
  16. case class XPathTemplate(s: String) extends Product with Serializable

    Class for working with XPathTemplates. An XPathTemplate is an XPath-like String identifying elements in an XML hierarchy. Elements may or may not be qualified by a prefix. Elements carrying a citation value on an attribute may have an attribute template expression of the form [@ATTRIBUTE = '?'].

  17. case class XfRow(urn: String, nxt: String, prv: String, txt: String) extends Product with Serializable

    A citable node in the model of the 82XF format.

    urn

    string value of the node's URN.

    nxt

    string value of the following node's URN, or an empty string if this node is the last node of a text.

    prv

    string value of the preceding node's URN, or an empty string if this node is the first node of a text.

    txt

    text contents of the node.
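
The XfRow model above, and the conversion from 82XF to two-column form, can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: it assumes each 82XF line has exactly four delimited columns in the same order as the XfRow fields (urn, nxt, prv, txt) and that there is no header line. The names `XfSketch`, `parseRow`, and `twoColumns` are hypothetical.

```scala
object XfSketch {
  // Stand-in mirroring the fields listed for XfRow; not the library's class.
  final case class XfRow(urn: String, nxt: String, prv: String, txt: String)

  // Assumption: four delimited columns in the order urn, nxt, prv, txt.
  def parseRow(line: String, delimiter: String = "#"): Option[XfRow] =
    // Pattern.quote treats the delimiter literally; limit -1 keeps empty columns.
    line.split(java.util.regex.Pattern.quote(delimiter), -1) match {
      case Array(urn, nxt, prv, txt) => Some(XfRow(urn, nxt, prv, txt))
      case _                         => None // malformed rows are skipped
    }

  // Keep only the urn and txt columns, preserving the input order.
  def twoColumns(
      xf: String,
      inputDelimiter: String = "#",
      outputDelimiter: String = "#"
  ): String =
    xf.split("\n").toVector
      .filter(_.trim.nonEmpty)
      .flatMap(line => parseRow(line, inputDelimiter))
      .map(row => row.urn + outputDelimiter + row.txt)
      .mkString("\n")
}
```

Because the sketch never reorders rows, it relies on the input already being in document order. It also silently drops any row that does not split into exactly four columns, which a real implementation might prefer to report as an error.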

Value Members

  1. def fileName(dir: String, f: String): String

    Format directory plus file as a path String.

    dir

    Directory name.

    f

    File name.

  2. val punctuationListRE: Regex

    Regular expression defining recognized punctuation characters. The list includes all characters with the Unicode Punctuation property plus a selection of other non-alphabetic characters known to be used as punctuation values in some digital editions.

  3. def twoColumnsFrom82xf(xf: String, inputDelimiter: String = "#", outputDelimiter: String = "#"): String

    Create two-column OHCO2 representation from a string in 82XF form. Note that 82XF input must already be ordered.

    xf

    String in 82XF form.

    inputDelimiter

    String delimiting columns of 82XF input.

    outputDelimiter

    String to use to delimit columns in two-column output.

  4. def twoColumnsFromHocusPocus(hpString: String, inputDelimiter: String = "#", outputDelimiter: String = "#"): String

    Create two-column OHCO2 representation from a string formatted in the tabular format of the hocuspocus class. Note that input must already be ordered.

    inputDelimiter

    String delimiting columns of hocuspocus input.

    outputDelimiter

    String to use to delimit columns in two-column output.

  5. object Catalog extends Serializable

    Factory for making catalogs from text sources.

  6. object CitableNode extends Serializable

    Factory for citable nodes with punctuation stripped out.

  7. object Corpus extends LogSupport

    Factory for edu.holycross.shot.ohco2.Corpus instances.

  8. object CorpusSource extends LogSupport

    A utility class for creating Corpus objects from various kinds of concrete sources available in the JVM, and serializing Corpus objects to various kinds of output.

  9. object Markdown extends DocumentFormat with Product with Serializable

    Markdown, adhering to OHCO2 requirements for hierarchical consistency.

  10. object OnlineDocument extends Serializable
  11. object SimpleTabulator

    Factory for building edu.holycross.shot.ohco2.Corpus instances from cataloged XML source files.

  12. object TextPassageTopology extends Enumeration

    Enumeration of possible relations of two text passages.

    Annotations
    @JSExportAll()
  13. object TextRepository extends Serializable

    Factory for constructing a TextRepository from source data in CEX format.

  14. object TextRepositorySource

    Factory for TextRepository objects and string representations of repositories in .cex format.

  15. object Two_Column extends DocumentFormat with Product with Serializable

    Two-column format, with lines representing citable nodes in document order.

  16. object Wf_Xml extends DocumentFormat with Product with Serializable

    Well-formed XML with citation encoded as specified in citation configuration.

  17. object XPathTemplate extends Serializable
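
To illustrate the Two_Column format and the string-count types listed above, the following sketch reads two-column text into nodes and tallies word occurrences in descending order. It is a hedged example, not the library's API: the case classes are stand-ins with String URNs, the "#" delimiter is borrowed from the default shown for twoColumnsFrom82xf, whitespace tokenization is an assumption, and `TwoColumnSketch`, `parse`, and `histogram` are hypothetical names.

```scala
object TwoColumnSketch {
  // Stand-ins for the library's types; URNs are plain Strings here.
  final case class CitableNode(urn: String, text: String)
  final case class StringCount(s: String, count: Int)

  // One node per non-empty line, in document order.
  // Limit 2 keeps any delimiter characters that occur inside the text column.
  def parse(twoCol: String, delimiter: String = "#"): Vector[CitableNode] =
    twoCol.split("\n").toVector.filter(_.trim.nonEmpty).flatMap { line =>
      line.split(java.util.regex.Pattern.quote(delimiter), 2) match {
        case Array(urn, text) => Some(CitableNode(urn, text))
        case _                => None // lines without a delimiter are skipped
      }
    }

  // Assumption: tokens are whitespace-separated; real tokenization is
  // language- and corpus-specific.
  def histogram(nodes: Vector[CitableNode]): Vector[StringCount] =
    nodes
      .flatMap(node => node.text.split("\\s+").filter(_.nonEmpty).toVector)
      .groupBy(identity)
      .map { case (s, occurrences) => StringCount(s, occurrences.size) }
      .toVector
      .sortBy(sc => (-sc.count, sc.s)) // descending count, ties broken alphabetically
}
```

Sorting by negated count puts the most frequent strings first; the alphabetical tie-break is a choice made for this sketch so that the output is deterministic.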
