final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

def ++(corpus2: Corpus): Corpus

Create a new corpus by adding a second corpus to this one.

corpus2: second corpus with contents to be added.

def --(corpus2: Corpus): Corpus

Create a new corpus by subtracting a second corpus from this one.

corpus2: second corpus with contents to be removed from this one.

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def >=(urn: CtsUrn): Corpus

Create a new corpus of nodes that are contained by a given URN.

urn: CtsUrn to use in filtering the corpus.

final def asInstanceOf[T0]: T0

Definition Classes: Any

def cex(delimiter: String = "#"): String

Two-column serialization of this Corpus, formatted for CEX.

delimiter: String value to separate two columns.

def chunkByCitation(drop: Int = 1): Vector[Corpus]

Split a Corpus into a Vector[Corpus] by citation (will first chunk by text).

drop: How many levels of the passage-hierarchy, from the right, to drop when grouping
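
The role of drop in grouping can be illustrated with a minimal sketch. It assumes passage references are dot-separated strings such as "1.2" and that adjacent nodes sharing the same shortened reference fall into one chunk; ChunkSketch and chunk are hypothetical names, and the sketch leaves out the preliminary chunking by text.

```scala
// Hypothetical sketch, not the library's implementation.
object ChunkSketch {
  // Group adjacent passage references that share the same key once
  // `drop` components have been removed from the right.
  def chunk(refs: Vector[String], drop: Int = 1): Vector[Vector[String]] = {
    val grouped = refs.foldLeft(Vector.empty[(String, Vector[String])]) { (acc, ref) =>
      val key = ref.split('.').dropRight(drop).mkString(".")
      acc.lastOption match {
        // Same key as the previous reference: extend the current chunk.
        case Some((k, group)) if k == key => acc.init :+ (k -> (group :+ ref))
        // Different key (or first reference): start a new chunk.
        case _ => acc :+ (key -> Vector(ref))
      }
    }
    grouped.map(_._2)
  }
}
```

With references "1.1", "1.2" and "2.1" and the default drop of 1, this sketch yields two chunks: one for "1.1" and "1.2", and one for "2.1".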

def chunkByText: Vector[Corpus]

Split a Corpus into a Vector[Corpus] by distinct text (versions and exemplars).

def citedWorks: Vector[CtsUrn]

List all versions or exemplars cited in a corpus.

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native() @HotSpotIntrinsicCandidate()

def compressReff(urns: Vector[CtsUrn]): Vector[CtsUrn]

Given a Vector[CtsUrn], compress it so that any sequences of URNs that can be expressed as ranges are expressed as ranges.

urns: URNs to compress.

def concrete(urn: CtsUrn): Set[CtsUrn]

Find list of all concrete texts for a given URN.

urn: URN to find concrete texts for.

def concreteMap: Map[CtsUrn, Corpus]

Map each concrete text's URN to a Corpus of its citable nodes.

def containedNodes(u: CtsUrn): Corpus

Create a new corpus comprising nodes contained by a given URN.

u: A CtsUrn at either version or exemplar level.

def contents: Vector[String]

Project text contents of the corpus to a vector of Strings.

macro def debug(message: Any, cause: Throwable): Unit

Attributes: protected
Definition Classes: LoggingMethods

macro def debug(message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

val dupes: Iterable[CtsUrn]

Erroneously duplicated URN values.

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

macro def error(message: Any, cause: Throwable): Unit

Attributes: protected
Definition Classes: LoggingMethods

macro def error(message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

def exemplarToVersion(newVersionId: String): Corpus

Creates a new corpus by reducing exemplar-level URNs to version-level URNs. Order of exemplar-level nodes is maintained in the flattened, version-level corpus.

newVersionId: Value for version identifier of newly generated version.

def exemplars(urn: CtsUrn): Set[CtsUrn]

Find the set of exemplars in the present corpus matching a given URN.

urn: URN to find exemplars for.

def find(v: Vector[String]): Corpus

Create a new corpus containing citable nodes with content matching all of a list of strings. This is equivalent to successively filtering from a given corpus for nodes matching each string. E.g., corpus.find(Vector(s1, s2)) is equivalent to corpus.find(s1).find(s2).

v: Strings to search for.
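
The equivalence between the vector form of find and a chain of single-string filters can be sketched as a fold. This is a minimal sketch with hypothetical stand-in types (Node, MiniCorpus), and it assumes that "matching" means plain substring containment, which the documentation above does not confirm.

```scala
// Hypothetical stand-ins for CitableNode and Corpus; not the library's classes.
object FindSketch {
  final case class Node(urn: String, text: String)

  final case class MiniCorpus(nodes: Vector[Node]) {
    // Keep only nodes whose text contains the search string (assumed semantics).
    def find(str: String): MiniCorpus =
      MiniCorpus(nodes.filter(_.text.contains(str)))

    // Apply the single-string filter once per search string, narrowing each time.
    def find(v: Vector[String]): MiniCorpus =
      v.foldLeft(this)((corpus, s) => corpus.find(s))
  }
}
```

Under these assumptions, find(Vector(s1, s2)) and find(s1).find(s2) select the same nodes, because each step only narrows the result of the previous one.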

def find(v: Vector[String], currentCorpus: Corpus): Corpus

Create a new corpus containing citable nodes with content matching all strings in a given list by recursively finding matches for the first string in the list.

v: Strings to search for.
currentCorpus: Corpus to search in.

def find(str: String): Corpus

Create a new corpus containing citable nodes with content matching a given string.

str: String to search for.
returns: A Corpus object.

def findToken(t: String, omitPunctuation: Boolean = true): Corpus

Create a new corpus containing citable nodes with content matching a whitespace-delimited token. Optionally, ignore punctuation characters.

omitPunctuation: True if punctuation should be ignored.

def findTokens(v: Vector[String], currentCorpus: Corpus, omitPunctuation: Boolean = true): Corpus

Create a new corpus with nodes containing all tokens in a given list by recursively finding matches for the first token in the list. Optionally omit or include punctuation in token definition.

v: Tokens to search for.
currentCorpus: Corpus to search in.
omitPunctuation: True if punctuation should be omitted from tokens.

def findTokensWithin(v: Vector[String], distance: Int, omitPunctuation: Boolean = true): Corpus

Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens within a given number of words of each other.

v: Vector of tokens.
distance: Maximum length of the run of consecutive tokens within which all tokens in v must fall.
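
One way to read the distance parameter is as the width of a sliding window over a node's tokens. The sketch below works on a single string and assumes exact token equality with no punctuation handling; WithinSketch and allWithin are hypothetical names, not part of the library.

```scala
// Hypothetical sketch of a proximity test over one node's text.
object WithinSketch {
  // True if every wanted token occurs inside some run of `distance` consecutive tokens.
  def allWithin(text: String, wanted: Vector[String], distance: Int): Boolean = {
    require(distance > 0, "distance must be positive")
    val tokens = text.split("\\s+").filter(_.nonEmpty).toVector
    // sliding yields one shorter window when the text has fewer than `distance` tokens.
    tokens.sliding(distance).exists(window => wanted.forall(t => window.contains(t)))
  }
}
```

For the text "a b c d e" and a distance of 3, the tokens "a" and "c" satisfy the test, while "a" and "e" do not.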

def findWhiteSpaceTokens(v: Vector[String]): Corpus

Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens. This is equivalent to successively filtering from a given corpus for nodes matching each token. E.g., corpus.findWhiteSpaceTokens(Vector(s1, s2)) is equivalent to filtering the corpus for s1 and then filtering the result for s2.

v: Strings to search for.

def findWordTokens(v: Vector[String]): Corpus

Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens, ignoring punctuation ("word" tokens). This is equivalent to successively filtering from a given corpus for nodes matching each token. E.g., corpus.findWordTokens(Vector(s1, s2)) is equivalent to filtering the corpus for s1 and then filtering the result for s2.

v: Strings to search for.

def first: CitableNode

Find first citable node in the corpus. It is an exception if the corpus does not include at least one citable node.

def firstNode(filterUrn: CtsUrn): CitableNode

Find first citable node in a passage. It is an exception if the passage does not include at least one citable node.

filterUrn: URN identifying the passage.

def firstNodeIndex(urn: CtsUrn): Option[Int]

Find index in this corpus of a URN's first node. If urn is a leaf node, it's simply the index of the node, but for a containing node, it's the first contained leaf node.

urn: First node of a range.

def firstNodeOption(filterUrn: CtsUrn): Option[CitableNode]

Find first citable node in a passage. Option is None if no citable nodes are found for the requested passage.

filterUrn: URN identifying the passage.

def flattenTriple(v: Vector[(String, CitableNode, Int)], newVersion: String): (Int, CitableNode)

Pairs a CitableNode with a sequential index number for that node.

v: Vector of triples, comprised of passage identifier (a String value), a citable node, and a sequence number within the passage node.
newVersion: Version identifier for the new node.

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native() @HotSpotIntrinsicCandidate()

macro def info(message: Any, cause: Throwable): Unit

Attributes: protected
Definition Classes: LoggingMethods

macro def info(message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

def isEmpty: Boolean

True if citable nodes vector is empty.

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

def last: CitableNode

Find the last citable node in the corpus. It is an exception if the corpus does not include at least one citable node.

def lastNode(filterUrn: CtsUrn): CitableNode

Find the last citable node in a passage. It is an exception if the passage does not include at least one citable node.

filterUrn: URN identifying the passage.

def lastNodeIndex(urn: CtsUrn): Option[Int]

Find index in this corpus of a URN's last node. If urn is a leaf node, it's simply the index of the node, but for a containing node, it's the last contained leaf node.

urn: Last node of a range.
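
The leaf-versus-container rule described for firstNodeIndex and lastNodeIndex can be sketched over plain passage references. This assumes dot-separated references and treats "contained by" as a simple prefix test, which is only a stand-in for real CtsUrn containment; IndexSketch and its members are hypothetical names.

```scala
// Hypothetical sketch over passage references in document order.
object IndexSketch {
  // A reference matches the target if it is the same node or lies beneath it.
  private def within(target: String, ref: String): Boolean =
    ref == target || ref.startsWith(target + ".")

  // Index of the first node that is the target or is contained by it.
  def firstIndex(refs: Vector[String], target: String): Option[Int] = {
    val i = refs.indexWhere(r => within(target, r))
    if (i < 0) None else Some(i)
  }

  // Index of the last node that is the target or is contained by it.
  def lastIndex(refs: Vector[String], target: String): Option[Int] = {
    val i = refs.lastIndexWhere(r => within(target, r))
    if (i < 0) None else Some(i)
  }
}
```

For references "1.1", "1.2", "2.1", the container "1" gives a first index of 0 and a last index of 1, while the leaf "1.2" gives 1 for both.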

def lastNodeOption(filterUrn: CtsUrn): Option[CitableNode]

Find the last citable node in a passage. Option is None if no citable nodes are found for the requested passage.

filterUrn: URN identifying the passage.

macro def logAt(logLevel: LogLevel, message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

lazy val logger: Logger

Attributes: protected[this]
Definition Classes: LazyLogger

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def next(filterUrn: CtsUrn): Vector[CitableNode]

Find nodes following a passage. The number of nodes will equal the number of nodes in the passage unless fewer than that number of nodes follow the passage. In that case, all following nodes will be returned. If no nodes follow the passage, an empty vector is returned.

filterUrn: Passage to find nodes after.

def nextUrn(filterUrn: CtsUrn): Option[CtsUrn]

Find URN for nodes following a passage.

filterUrn: Passage to find nodes after.

def ngramHisto(str: String, n: Int, threshhold: Int, dropPunctuation: Boolean): StringHistogram

Create a histogram of ngrams of size n, occurring more than threshold times, and including a specified string.

str: String that must be part of indexed ngram.
n: size of ngram desired
threshhold: only include ngrams that occur more than threshhold times. (A value of 0 therefore collects all ngrams of the given size.)
dropPunctuation: true if punctuation should be omitted from ngrams
returns: a vector of word+count pairs sorted from high to low

def ngramHisto(n: Int, threshhold: Int = 0, dropPunctuation: Boolean = true): StringHistogram

Create a histogram of ngrams of size n, occurring more than threshold times.

n: size of ngram desired
threshhold: only include ngrams that occur more than threshhold times. (Default value of 0 therefore collects all ngrams of the given size.)
dropPunctuation: true if punctuation should be omitted from ngrams
returns: a vector of word+count pairs sorted from high to low
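
The counting described for ngramHisto can be sketched with sliding windows over whitespace-delimited words. This sketch assumes ngrams are counted within each passage rather than across passage boundaries, and it returns plain (String, Int) pairs instead of a StringHistogram; NgramSketch and ngramCounts are hypothetical names.

```scala
// Hypothetical sketch of n-gram counting; not the library's implementation.
object NgramSketch {
  def ngramCounts(texts: Vector[String], n: Int, threshold: Int = 0): Vector[(String, Int)] = {
    require(n > 0, "n must be positive")
    val grams = texts.flatMap { t =>
      val words = t.split("\\s+").filter(_.nonEmpty).toVector
      // sliding yields one short window when there are fewer than n words; drop it.
      words.sliding(n).filter(_.size == n).map(_.mkString(" ")).toVector
    }
    grams.groupBy(identity).toVector
      .map { case (gram, hits) => (gram, hits.size) }
      // Keep only ngrams occurring more than `threshold` times.
      .filter { case (_, count) => count > threshold }
      // Sort from high to low count, breaking ties alphabetically.
      .sortBy { case (gram, count) => (-count, gram) }
  }
}
```

With a threshold of 0 every ngram of the requested size is kept, since each occurs at least once.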

val nodes: Vector[CitableNode]

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native() @HotSpotIntrinsicCandidate()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native() @HotSpotIntrinsicCandidate()

def passageVersions(urn: CtsUrn): Vector[CtsUrn]

Find all versions of a given CtsUrn in this corpus.

urn: URN to find versions for

def passagesToWords(skipPunct: Boolean = true): Vector[Vector[String]]

Convert strings to vectors of words, tokenizing on whitespace. Optionally, omit punctuation characters from result.

skipPunct: true if punctuation should be omitted.
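
A minimal sketch of whitespace tokenization with optional punctuation removal follows. It assumes punctuation means the ASCII characters matched by the Java regex class \p{Punct}; the library's own definition of punctuation may differ. WordsSketch and toWords are hypothetical names.

```scala
// Hypothetical sketch; the set of punctuation characters is an assumption.
object WordsSketch {
  def toWords(passages: Vector[String], skipPunct: Boolean = true): Vector[Vector[String]] =
    passages.map { p =>
      // Strip ASCII punctuation first when requested, then split on whitespace.
      val cleaned = if (skipPunct) p.replaceAll("\\p{Punct}", "") else p
      cleaned.split("\\s+").filter(_.nonEmpty).toVector
    }
}
```

Each input string produces one inner vector, so the result keeps the passages separate.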

def pointIndex(urn: CtsUrn): Int

Find index in nodes of a given CtsUrn.

def prev(filterUrn: CtsUrn): Vector[CitableNode]

Find nodes preceding a passage. The number of nodes will equal the number of nodes in the passage unless fewer than that number of nodes precede the passage. In that case, all preceding nodes will be returned. If no nodes precede the passage, an empty vector is returned.

filterUrn: passage to find nodes before
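
The sizing rule shared by next and prev (as many nodes as the passage has, or fewer at the edges of the corpus) can be sketched with index arithmetic. The sketch assumes the passage has already been resolved to inclusive start and end indices; NeighborSketch and its methods are hypothetical names.

```scala
// Hypothetical sketch; `start` and `end` are inclusive indices of the passage in `nodes`.
object NeighborSketch {
  // Up to passage-size nodes immediately after the passage.
  def next[A](nodes: Vector[A], start: Int, end: Int): Vector[A] = {
    val passageSize = end - start + 1
    nodes.slice(end + 1, end + 1 + passageSize)
  }

  // Up to passage-size nodes immediately before the passage.
  def prev[A](nodes: Vector[A], start: Int, end: Int): Vector[A] = {
    val passageSize = end - start + 1
    // slice clamps a negative lower bound to 0, so fewer nodes come back near the start.
    nodes.slice(start - passageSize, start)
  }
}
```

For five nodes a through e and a passage covering b and c, next returns d and e, while prev returns only a because just one node precedes the passage.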

def prevUrn(filterUrn: CtsUrn): Option[CtsUrn]

Find URN for nodes preceding a passage.

filterUrn: Passage to find nodes before.

def rangeExtract(urn: CtsUrn): Corpus

Create a new corpus from a single URN identifying a range. The given URN must refer to a concrete text.

urn: Range URN identifying corpus to extract.

def rangeIndex(urn: CtsUrn): RangeIndex

Find beginning and end index in this corpus of a given range URN. Beginning and end references of ranges may either be node references or containing references.

def relation(u1: CtsUrn, u2: CtsUrn): TextPassageTopology.Value

Computes topological relation of passage components of two CtsUrns.

u1: First CtsUrn to compare.
u2: Second CtsUrn to compare.

def size: Int

Number of citable nodes in the corpus.

def sortPassages(passages: Iterable[CtsUrn]): Vector[CtsUrn]

Given an Iterable[CtsUrn] return a Vector[CtsUrn] sorted by document order according to the order in the Corpus. If any URNs in the parameter Iterable are range-URNs, this expands them to leaf-nodes before sorting.

def sumCorpora(corpora: Vector[Corpus], sumCorpus: Corpus): Corpus

Create a single Corpus by summing up the contents of a vector of corpora.

corpora: Corpus instances to concatenate.

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes: AnyRef

def textContents(filter: String, connector: String = "\n"): String

Format text contents of passages matching a given string as a single string.

connector: String value separating citable nodes in the resulting string.

def to2colString(delimiter: String): String

Represent the Corpus in two-column delimited-text format.

delimiter: String value used to separate URN strings from text contents.

def to82xfString(delimiter: String): String

Represent the Corpus in 82XF format.

delimiter: String value to use as a column separator.

def to82xfVector: Vector[XfRow]

Create a vector of edu.holycross.shot.ohco2.XfRow instances equivalent to the present corpus.

macro def trace(message: Any, cause: Throwable): Unit

Attributes: protected
Definition Classes: LoggingMethods

macro def trace(message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

def urns: Vector[CtsUrn]

Project all URNs in the corpus to a vector.

def urnsForNGram(gram: String, threshhold: Int = 2, dropPunctuation: Boolean = true): Vector[CtsUrn]

Find passages, identified by URN, where a given ngram occurs. The value of n is derived from the number of whitespace-delimited tokens in gram.

gram: The desired ngram, with white space separating tokens.
dropPunctuation: True if punctuation should be omitted.

def validReff(urn: CtsUrn): Vector[CtsUrn]

Extract all URNs for all citable nodes identified by a given URN. Note that it is not an error if the resulting Vector is empty.

urn: URN identifying passage for which to find node URNs.

def validReff2(filterUrn: CtsUrn): Vector[CtsUrn]

def versions(urn: CtsUrn): Set[CtsUrn]

Find the set of versions in the present corpus matching a given URN.

urn: URN to find versions for.

final def wait(arg0: Long, arg1: Int): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long): Unit

Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

final def wait(): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

macro def warn(message: Any, cause: Throwable): Unit

Attributes: protected
Definition Classes: LoggingMethods

macro def warn(message: Any): Unit

Attributes: protected
Definition Classes: LoggingMethods

def ~=(filterUrn: CtsUrn): Corpus

Create a new corpus of nodes that are URN-similar to a given CtsUrn, limited to a given Version or Exemplar. Collect all texts where this URN is cited, then collect citable nodes for the cited version. Note that chaining these filters therefore successively filters the corpus and can be thought of as filtering by logically ANDing the URNs.

filterUrn: URN identifying a set of nodes to select from this corpus.

def ~~(urnV: Vector[CtsUrn], resultCorpus: Corpus): Corpus

Recursively add to a given corpus all nodes in the present corpus that are URN-similar to the first URN in a given vector of URNs. When all URNs in the vector have been applied, the result is the final accumulation of all added nodes.

urnV: vector of URNs to use in filtering the corpus.

def ~~(urnV: Vector[CtsUrn]): Corpus

Create a new corpus of nodes that are URN-similar to any CtsUrn in a given vector of CtsUrns. Note that this can be thought of as filtering by logically ORing the CtsUrns in the Vector.

urnV: vector of URNs to use in filtering the corpus.

def ~~(filterUrn: CtsUrn): Corpus

Create a new corpus of nodes that are URN-similar to a given CtsUrn. Collect all texts where this URN is cited, then collect citable nodes for the cited version. Note that chaining these filters therefore successively filters the corpus and can be thought of as filtering by logically ANDing the URNs.

filterUrn: URN identifying a set of nodes to select from this corpus.
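
The difference between chaining single-URN filters (logical AND) and passing a vector of URNs (logical OR) can be sketched with simplified types. Here URN similarity is reduced to a prefix test on dot-separated passage references, which is an assumption made for illustration only; Node, MiniCorpus and matches are hypothetical, and the vector form in this sketch keeps document order rather than accumulating recursively.

```scala
// Hypothetical stand-ins; real URN similarity on CtsUrn is richer than this prefix test.
object TwiddleSketch {
  final case class Node(ref: String, text: String)

  final case class MiniCorpus(nodes: Vector[Node]) {
    // Simplified similarity: the same reference, or a reference contained by the filter.
    private def matches(ref: String, filter: String): Boolean =
      ref == filter || ref.startsWith(filter + ".")

    // Single filter: chaining calls narrows the corpus each time (AND).
    def ~~(filter: String): MiniCorpus =
      MiniCorpus(nodes.filter(n => matches(n.ref, filter)))

    // Vector of filters: a node is kept if any filter matches it (OR).
    def ~~(filters: Vector[String]): MiniCorpus =
      MiniCorpus(nodes.filter(n => filters.exists(f => matches(n.ref, f))))
  }
}
```

Given nodes "1.1", "1.2" and "2.1", the chained expression corpus ~~ "1" ~~ "1.2" keeps only "1.2", whereas corpus ~~ Vector("1.1", "2") keeps "1.1" and "2.1".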

case class Corpus(nodes: Vector[CitableNode]) extends LogSupport with Product with Serializable