case class Corpus(nodes: Vector[CitableNode]) extends LogSupport with Product with Serializable
A corpus of citable texts.
- nodes
Contents of the citable corpus
- Annotations
- @JSExportAll()
Instance Constructors
-
new
Corpus(nodes: Vector[CitableNode])
Create a new corpus with a vector of CitableNode objects.
- nodes
Contents of the citable corpus
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
def
++(corpus2: Corpus): Corpus
Create a new corpus by adding a second corpus to this one.
- corpus2
second corpus with contents to be added.
-
def
--(corpus2: Corpus): Corpus
Create a new corpus by subtracting a second corpus from this one.
- corpus2
second corpus with contents to be removed from this one.
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
>=(urn: CtsUrn): Corpus
Create a new corpus of nodes that are contained by a given URN.
- urn
CtsUrn to use in filtering the corpus.
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
cex(delimiter: String = "#"): String
Two-column serialization of this Corpus, formatted for CEX serialization.
- delimiter
String value to separate two columns.
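A minimal sketch of a two-column serialization, assuming each node is written as one line holding its URN, the delimiter, and its text. `Node` and `twoColumns` are hypothetical stand-ins, not the library's own types, and the real `cex` output may differ in detail.

```scala
object TwoColumnSketch {
  // Hypothetical stand-in for a citable node: a URN string plus its text.
  final case class Node(urn: String, text: String)

  // Write one line per node: URN, delimiter, text content.
  def twoColumns(nodes: Vector[Node], delimiter: String = "#"): String =
    nodes.map(n => n.urn + delimiter + n.text).mkString("\n")
}
```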
-
def
chunkByCitation(drop: Int = 1): Vector[Corpus]
Split a Corpus into a Vector[Corpus] by citation (will first chunk by text).
- drop
How many levels of the passage-hierarchy, from the right, to drop when grouping
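The effect of `drop` can be sketched on plain dotted passage references. This is an illustration under assumptions, not the library's implementation: `Node`, `parentRef`, and `chunk` are hypothetical names, and nodes are assumed to belong to a single text already.

```scala
object ChunkSketch {
  // Hypothetical stand-in: a node identified only by a dotted passage reference.
  final case class Node(passage: String, text: String)

  // Drop `drop` levels from the right of a dotted reference, e.g. "1.2" -> "1".
  def parentRef(passage: String, drop: Int): String =
    passage.split('.').dropRight(drop).mkString(".")

  // Group consecutive nodes that share the same truncated reference,
  // preserving document order.
  def chunk(nodes: Vector[Node], drop: Int = 1): Vector[Vector[Node]] =
    nodes.foldLeft(Vector.empty[Vector[Node]]) { (acc, n) =>
      acc.lastOption match {
        case Some(last) if parentRef(last.head.passage, drop) == parentRef(n.passage, drop) =>
          acc.init :+ (last :+ n)
        case _ =>
          acc :+ Vector(n)
      }
    }
}
```

With references 1.1, 1.2, and 2.1 and the default `drop` of 1, the first two nodes fall into one chunk and the third into another.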
-
def
chunkByText: Vector[Corpus]
Split a Corpus into a Vector[Corpus] by distinct text (versions and exemplars).
-
def
citedWorks: Vector[CtsUrn]
List all versions or exemplars cited in a corpus.
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
compressReff(urns: Vector[CtsUrn]): Vector[CtsUrn]
Given a Vector[CtsUrn] compress it so that any sequences of URNs that can be expressed as ranges are expressed as ranges.
- urns
Vector[CtsUrn]
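One way to picture the compression, sketched on plain string references rather than CtsUrn values: references whose positions are consecutive in document order collapse into a single `first-last` range. `compress` and `corpusOrder` are hypothetical names, and the selected references are assumed to be in document order and present in the corpus.

```scala
object CompressSketch {
  // corpusOrder: every leaf reference in document order (hypothetical stand-in).
  // refs: selected references, assumed in document order and present in corpusOrder.
  def compress(corpusOrder: Vector[String], refs: Vector[String]): Vector[String] = {
    val index = corpusOrder.zipWithIndex.toMap
    // Build runs of references whose corpus positions are consecutive.
    val runs = refs.foldLeft(Vector.empty[Vector[String]]) { (acc, r) =>
      acc.lastOption match {
        case Some(run) if index(r) == index(run.last) + 1 => acc.init :+ (run :+ r)
        case _ => acc :+ Vector(r)
      }
    }
    // A run of one stays a single reference; longer runs become "first-last".
    runs.map(run => if (run.size == 1) run.head else run.head + "-" + run.last)
  }
}
```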
-
def
concrete(urn: CtsUrn): Set[CtsUrn]
Find list of all concrete texts for a given URN.
- urn
URN to find concrete texts for.
-
def
concreteMap: Map[CtsUrn, Corpus]
Map each concrete text's URN to a Corpus containing that text's citable nodes.
-
def
containedNodes(u: CtsUrn): Corpus
Create a new corpus comprising nodes contained by a given URN.
- u
A CtsUrn at either version or exemplar level.
-
def
contents: Vector[String]
Project text contents of the corpus to a vector of Strings.
-
macro
def
debug(message: Any, cause: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
macro
def
debug(message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
val
dupes: Iterable[CtsUrn]
Erroneously duplicated URN values.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
macro
def
error(message: Any, cause: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
macro
def
error(message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
def
exemplarToVersion(newVersionId: String): Corpus
Creates a new corpus by reducing exemplar-level URNs to version-level URNs. Order of exemplar-level nodes is maintained in the flattened, version-level corpus.
- newVersionId
Value for version identifier of newly generated version.
-
def
exemplars(urn: CtsUrn): Set[CtsUrn]
Find the set of exemplars in the present corpus matching a given URN.
- urn
URN to find exemplars for.
-
def
find(v: Vector[String]): Corpus
Create a new corpus containing citable nodes with content matching all of a list of strings. This is equivalent to successively filtering from a given corpus for nodes matching each string. E.g., corpus.find(Vector(s1, s2)) is equivalent to corpus.find(s1).find(s2).
- v
Strings to search for.
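The equivalence described above amounts to a left fold over the search strings. A minimal sketch, assuming a match means plain substring containment; `Node`, `find`, and `findAll` here are hypothetical stand-ins, not the library's methods.

```scala
object FindSketch {
  // Hypothetical stand-in for a citable node.
  final case class Node(urn: String, text: String)

  // Keep nodes whose text contains the given string.
  def find(nodes: Vector[Node], s: String): Vector[Node] =
    nodes.filter(_.text.contains(s))

  // Apply each string as a successive filter: the result must match all of them.
  def findAll(nodes: Vector[Node], strs: Vector[String]): Vector[Node] =
    strs.foldLeft(nodes)((remaining, s) => find(remaining, s))
}
```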
-
def
find(v: Vector[String], currentCorpus: Corpus): Corpus
Create a new corpus containing citable nodes with content matching all strings in a given list by recursively finding matches for the first string in the list.
- v
Strings to search for.
- currentCorpus
Corpus to search in.
-
def
find(str: String): Corpus
Create a new corpus containing citable nodes with content matching a given string.
- str
String to search for.
- returns
A Corpus object.
-
def
findToken(t: String, omitPunctuation: Boolean = true): Corpus
Create a new corpus containing citable nodes with content matching a white-space delimited token. Optionally, ignore punctuation characters.
- omitPunctuation
True if punctuation should be ignored.
-
def
findTokens(v: Vector[String], currentCorpus: Corpus, omitPunctuation: Boolean = true): Corpus
Create a new corpus with nodes containing all tokens in a given list by recursively finding matches for the first token in the list. Optionally omit or include punctuation in token definition.
- v
Tokens to search for.
- currentCorpus
Corpus to search in.
- omitPunctuation
True if punctuation should be omitted from tokens.
-
def
findTokensWithin(v: Vector[String], distance: Int, omitPunctuation: Boolean = true): Corpus
Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens within a given number of words of each other.
- v
Vector of tokens.
- distance
Maximum span, in consecutive tokens, within which all tokens in v must fall.
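A sketch of one plausible reading of `distance`: a node matches when some window of `distance` consecutive whitespace-delimited tokens contains every wanted token. The names `tokens` and `within` are hypothetical, punctuation handling is left out, and the library's exact matching rule may differ.

```scala
object WithinSketch {
  // Split text on runs of whitespace, discarding empty pieces.
  def tokens(text: String): Vector[String] =
    text.split("\\s+").filter(_.nonEmpty).toVector

  // True if some window of `distance` consecutive tokens contains all wanted tokens.
  // Assumes distance >= 1.
  def within(text: String, wanted: Vector[String], distance: Int): Boolean =
    tokens(text).sliding(distance).exists(window => wanted.forall(w => window.contains(w)))
}
```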
-
def
findWhiteSpaceTokens(v: Vector[String]): Corpus
Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens. This is equivalent to successively filtering from a given corpus for nodes matching each token. E.g., corpus.findTokens(Vector(s1, s2)) is equivalent to corpus.findTokens(s1).findTokens(s2).
- v
Strings to search for.
-
def
findWordTokens(v: Vector[String]): Corpus
Create a new corpus containing citable nodes with content matching all of a list of whitespace-delimited tokens, ignoring punctuation ("word" tokens). This is equivalent to successively filtering from a given corpus for nodes matching each token. E.g., corpus.findTokens(Vector(s1, s2)) is equivalent to corpus.findTokens(s1).findTokens(s2).
- v
Strings to search for.
-
def
first: CitableNode
Find first citable node in the corpus. It is an exception if the passage does not include at least one citable node.
-
def
firstNode(filterUrn: CtsUrn): CitableNode
Find first citable node in a passage. It is an exception if the passage does not include at least one citable node.
- filterUrn
URN identifying the passage.
-
def
firstNodeIndex(urn: CtsUrn): Option[Int]
Find index in this corpus of a URN's first node. If urn is a leaf node, it's simply the index of the node, but for a containing node, it's the first contained leaf node.
- urn
First node of a range.
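The leaf-versus-container distinction can be sketched on dotted references: a leaf resolves to its own position, a container to the position of its first (or last) contained leaf. `firstIndex` and `lastIndex` are hypothetical helpers, and prefix matching on dotted strings is a simplification of URN containment.

```scala
object IndexSketch {
  // Simplified containment: a reference matches itself or anything beneath it.
  private def matches(ref: String, target: String): Boolean =
    ref == target || ref.startsWith(target + ".")

  // Position of the first leaf that is, or is contained by, the target.
  def firstIndex(refs: Vector[String], target: String): Option[Int] = {
    val i = refs.indexWhere(r => matches(r, target))
    if (i >= 0) Some(i) else None
  }

  // Position of the last leaf that is, or is contained by, the target.
  def lastIndex(refs: Vector[String], target: String): Option[Int] = {
    val i = refs.lastIndexWhere(r => matches(r, target))
    if (i >= 0) Some(i) else None
  }
}
```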
-
def
firstNodeOption(filterUrn: CtsUrn): Option[CitableNode]
Find first citable node in a passage. Option is None if no citable nodes are found for the requested passage.
- filterUrn
URN identifying the passage.
-
def
flattenTriple(v: Vector[(String, CitableNode, Int)], newVersion: String): (Int, CitableNode)
Pairs a CitableNode with a sequential index number for that node.
- v
Vector of triples, comprised of passage identifier (a String value), a citable node, and a sequence number within the passage node.
- newVersion
Version identifier for the new node.
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
macro
def
info(message: Any, cause: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
macro
def
info(message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
def
isEmpty: Boolean
True if citable nodes vector is empty.
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
last: CitableNode
Find the last citable node in the corpus. It is an exception if the passage does not include at least one citable node.
-
def
lastNode(filterUrn: CtsUrn): CitableNode
Find the last citable node in a passage. It is an exception if the passage does not include at least one citable node.
- filterUrn
URN identifying the passage.
-
def
lastNodeIndex(urn: CtsUrn): Option[Int]
Find index in this corpus of a URN's last node. If urn is a leaf node, it's simply the index of the node, but for a containing node, it's the last contained leaf node.
- urn
Last node of a range.
-
def
lastNodeOption(filterUrn: CtsUrn): Option[CitableNode]
Find the last citable node in a passage. Option is None if no citable nodes are found for the requested passage.
- filterUrn
URN identifying the passage.
-
macro
def
logAt(logLevel: LogLevel, message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
lazy val
logger: Logger
- Attributes
- protected[this]
- Definition Classes
- LazyLogger
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
next(filterUrn: CtsUrn): Vector[CitableNode]
Find nodes following a passage. The number of nodes will equal the number of nodes in the passage unless fewer than that number of nodes follow the passage. In that case, all following nodes will be returned. If no nodes follow the passage, an empty vector is returned.
- filterUrn
Passage to find nodes after.
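The sizing rule for following (and, symmetrically, preceding) nodes can be sketched with index arithmetic. This assumes the passage has already been resolved to the positions of its first and last nodes; `next` and `prev` here are hypothetical helpers over a plain vector, not the library's methods.

```scala
object NeighborSketch {
  // Nodes after the passage occupying positions first..last: as many as the
  // passage holds, or fewer when the corpus ends sooner.
  def next[A](nodes: Vector[A], first: Int, last: Int): Vector[A] = {
    val count = last - first + 1
    nodes.slice(last + 1, last + 1 + count)
  }

  // Nodes before the passage, by the same rule in the other direction.
  def prev[A](nodes: Vector[A], first: Int, last: Int): Vector[A] = {
    val count = last - first + 1
    nodes.slice(math.max(0, first - count), first)
  }
}
```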
-
def
nextUrn(filterUrn: CtsUrn): Option[CtsUrn]
Find URN for nodes following a passage.
- filterUrn
Passage to find nodes after.
-
def
ngramHisto(str: String, n: Int, threshhold: Int, dropPunctuation: Boolean): StringHistogram
Create a histogram of ngrams of size n, occurring more than threshold times, and including a specified string.
- str
String that must be part of indexed ngram.
- n
size of ngram desired
- threshhold
only include ngrams that occur more than threshhold times. (A value of 0 therefore collects all ngrams of the given size.)
- dropPunctuation
true if punctuation should be omitted from ngrams
- returns
a vector of word+count pairs sorted from high to low
-
def
ngramHisto(n: Int, threshhold: Int = 0, dropPunctuation: Boolean = true): StringHistogram
Create a histogram of ngrams of size n, occurring more than threshold times.
- n
size of ngram desired
- threshhold
only include ngrams that occur more than threshhold times. (Default value of 0 therefore collects all ngrams of the given size.)
- dropPunctuation
true if punctuation should be omitted from ngrams
- returns
a vector of word+count pairs sorted from high to low
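The counting behind an ngram histogram can be sketched with standard collections. This is an illustration only: `ngramCounts` is a hypothetical function returning plain pairs rather than a StringHistogram, ngrams are assumed not to cross node boundaries, and punctuation handling is omitted.

```scala
object NgramSketch {
  // Count ngrams of size n across texts, keep those occurring more than
  // `threshold` times, and sort from most to least frequent (ties by ngram).
  def ngramCounts(texts: Vector[String], n: Int, threshold: Int = 0): Vector[(String, Int)] = {
    val grams = texts.flatMap { t =>
      t.split("\\s+").filter(_.nonEmpty).toVector
        .sliding(n)
        .filter(_.size == n) // skip a short window when a text has fewer than n tokens
        .map(_.mkString(" "))
    }
    grams.groupBy(identity).toVector
      .map { case (gram, occurrences) => (gram, occurrences.size) }
      .filter { case (_, count) => count > threshold }
      .sortBy { case (gram, count) => (-count, gram) }
  }
}
```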
-
val
nodes: Vector[CitableNode]
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
passageVersions(urn: CtsUrn): Vector[CtsUrn]
Find all versions of a given CtsUrn in this corpus.
- urn
URN to find versions for
-
def
passagesToWords(skipPunct: Boolean = true): Vector[Vector[String]]
Convert strings to vectors of words, tokenizing on whitespace. Optionally, omit punctuation characters from result.
- skipPunct
true if punctuation should be omitted.
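A minimal sketch of whitespace tokenization with optional punctuation removal. `words` is a hypothetical helper, and treating punctuation as the regex class `\p{Punct}` (ASCII punctuation) is an assumption; the library may define punctuation differently.

```scala
object WordsSketch {
  // Split one string into whitespace-delimited tokens, optionally stripping
  // ASCII punctuation first (an assumed definition of "punctuation").
  def words(text: String, skipPunct: Boolean = true): Vector[String] = {
    val cleaned = if (skipPunct) text.replaceAll("\\p{Punct}", "") else text
    cleaned.split("\\s+").filter(_.nonEmpty).toVector
  }
}
```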
-
def
pointIndex(urn: CtsUrn): Int
Find index in nodes of a given CtsUrn.
-
def
prev(filterUrn: CtsUrn): Vector[CitableNode]
Find nodes preceding a passage. The number of nodes will equal the number of nodes in the passage unless fewer than that number of nodes precede the passage. In that case, all preceding nodes will be returned. If no nodes precede the passage, an empty vector is returned.
- filterUrn
passage to find nodes before
-
def
prevUrn(filterUrn: CtsUrn): Option[CtsUrn]
Find URN for nodes preceding a passage.
- filterUrn
Passage to find nodes before.
-
def
rangeExtract(urn: CtsUrn): Corpus
Create a new corpus from a single URN identifying a range. The given URN must refer to a concrete text.
- urn
Range URN identifying corpus to extract.
-
def
rangeIndex(urn: CtsUrn): RangeIndex
Find beginning and end index in this corpus of a given range URN. Beginning and end references of ranges may either be node references or containing references.
-
def
relation(u1: CtsUrn, u2: CtsUrn): TextPassageTopology.Value
Computes topological relation of passage components of two CtsUrns.
- u1
First CtsUrn to compare.
- u2
Second CtsUrn to compare.
-
def
size: Int
Number of citable nodes in the corpus.
-
def
sortPassages(passages: Iterable[CtsUrn]): Vector[CtsUrn]
Given an Iterable[CtsUrn] return a Vector[CtsUrn] sorted by document order according to the order in the Corpus. If any URNs in the parameter Iterable are range-URNs, this expands them to leaf-nodes before sorting.
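Sorting by document order reduces to sorting by position in the corpus. A sketch on plain string references, with hypothetical names; it assumes ranges have already been expanded to leaf references and silently drops references that are not in the corpus.

```scala
object SortSketch {
  // Order passages by their position in corpusOrder (leaf references in document order).
  def sortByDocumentOrder(corpusOrder: Vector[String], passages: Iterable[String]): Vector[String] = {
    val index = corpusOrder.zipWithIndex.toMap
    passages.toVector
      .filter(p => index.contains(p)) // ignore references absent from the corpus
      .sortBy(p => index(p))
  }
}
```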
-
def
sumCorpora(corpora: Vector[Corpus], sumCorpus: Corpus): Corpus
Create a single Corpus by summing up the contents of a vector of corpora.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
textContents(filter: String, connector: String = "\n"): String
Format text contents of passages matching a given string as a single string.
- connector
String value separating citable nodes in the resulting string.
-
def
to2colString(delimiter: String): String
Represent the Corpus in two-column delimited-text format.
- delimiter
String value to use to separate URN strings from text contents.
-
def
to82xfString(delimiter: String): String
Represent the Corpus in 82XF format.
- delimiter
String value to use as a column separator.
-
def
to82xfVector: Vector[XfRow]
Create a vector of edu.holycross.shot.ohco2.XfRow instances equivalent to the present corpus.
-
macro
def
trace(message: Any, cause: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
macro
def
trace(message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
def
urns: Vector[CtsUrn]
Project all URNs in the corpus to a vector.
-
def
urnsForNGram(gram: String, threshhold: Int = 2, dropPunctuation: Boolean = true): Vector[CtsUrn]
Find passages, identified by URN, where a given ngram occurs. The value of n is derived from the number of whitespace-delimited tokens in gram.
- gram
The desired ngram, with white space separating tokens.
- dropPunctuation
True if punctuation should be omitted.
-
def
validReff(urn: CtsUrn): Vector[CtsUrn]
Extract all URNs for all citable nodes identified by a given URN. Note that it is not an error if the resulting Vector is empty.
- urn
URN identifying passage for which to find node URNs.
-
def
validReff2(filterUrn: CtsUrn): Vector[CtsUrn]
-
def
versions(urn: CtsUrn): Set[CtsUrn]
Find the set of versions in the present corpus matching a given URN.
- urn
URN to find versions for.
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
macro
def
warn(message: Any, cause: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
macro
def
warn(message: Any): Unit
- Attributes
- protected
- Definition Classes
- LoggingMethods
-
def
~=(filterUrn: CtsUrn): Corpus
Create a new corpus of nodes that are URN-similar to a given CtsUrn, limited to a given Version or Exemplar. Collect all texts where this URN is cited, then collect citable nodes for the cited version. Note that chaining these filters therefore successively filters the corpus and can be thought of as filtering by logically ANDing the URNs.
- filterUrn
URN identifying a set of nodes to select from this corpus.
-
def
~~(urnV: Vector[CtsUrn], resultCorpus: Corpus): Corpus
Recursively add to a given corpus all nodes in the present corpus that are URN-similar to the first URN in a given vector of URNs. When all nodes in the vector have been applied, the result is the final accumulation of all added nodes.
- urnV
vector of URNs to use in filtering the corpus.
-
def
~~(urnV: Vector[CtsUrn]): Corpus
Create a new corpus of nodes that are URN-similar to any CtsUrn in a given vector of CtsUrns. Note that this can be thought of as filtering by logically ORing the CtsUrns in the Vector.
- urnV
vector of URNs to use in filtering the corpus.
-
def
~~(filterUrn: CtsUrn): Corpus
Create a new corpus of nodes that are URN-similar to a given CtsUrn. Collect all texts where this URN is cited, then collect citable nodes for the cited version. Note that chaining these filters therefore successively filters the corpus and can be thought of as filtering by logically ANDing the URNs.
- filterUrn
URN identifying a set of nodes to select from this corpus.
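The AND and OR readings of these filters can be sketched on dotted passage references. This is a simplified illustration, not the library's URN-similarity test: `Node`, `similar`, `twiddle`, and `twiddleAny` are hypothetical, and similarity is reduced to "equal to the filter or contained by it".

```scala
object TwiddleSketch {
  // Hypothetical stand-in for a citable node identified by a dotted reference.
  final case class Node(ref: String, text: String)

  // Simplified similarity: the node's reference equals the filter or lies beneath it.
  def similar(ref: String, filter: String): Boolean =
    ref == filter || ref.startsWith(filter + ".")

  // Single filter: chaining calls narrows the result each time (logical AND).
  def twiddle(nodes: Vector[Node], filter: String): Vector[Node] =
    nodes.filter(n => similar(n.ref, filter))

  // Vector of filters: a node is kept if it matches any of them (logical OR).
  def twiddleAny(nodes: Vector[Node], filters: Vector[String]): Vector[Node] =
    nodes.filter(n => filters.exists(f => similar(n.ref, f)))
}
```

Chaining `twiddle` with "1" and then "1.2" keeps only node 1.2, while `twiddleAny` with both filters keeps every node under 1.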
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
- Deprecated
(Since version ) see corresponding Javadoc for more information.