Citable collections
Summary
The task: We want to create a type for working with a collection of the citable books we developed on the previous page. We should be able to filter the collection by applying URN logic to the identifiers for our books. We should be able to write our collection to plain-text format and re-instantiate it from the plain-text representation. And we should be able to apply any Julia functions for working with iterable content to our book list.
The implementation:
- define a new type for a collection of citable books, the ReadingList type
- identify it as a citable collection (the CitableCollectionTrait)
- implement filtering the collection using URN logic (the UrnComparisonTrait)
- implement round-trip serialization (the CexTrait)
- make the collection available to all Julia functions working with iterable content (Iterators)
Defining the ReadingList
Our model for a reading list is simple: it's just a Vector of citable publications. We'll annotate our vector as containing subtypes of the abstract CitablePublication we previously defined, even though in this example we'll only use our one concrete implementation, the CitableBook. As with our other custom types, we'll override Base.show.
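struct ReadingList
    publications::Vector{<: CitablePublication}
end
# Import show so that the method below extends Base.show.
import Base: show
# Summarize the list by its size instead of printing every entry.
function show(io::IO, readingList::ReadingList)
    print(io, "ReadingList with ", length(readingList.publications), " items")
end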
show (generic function with 288 methods)
Let's see an example.
books = [distantbook, enumerationsbook, wrongbook, qibook]
rl = ReadingList(books)
ReadingList with 4 items
The publications field is just a normal Julia Vector.
rl.publications[4]
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
What will make it different from other Vectors is that it will support a series of CITE traits.
Implementing the CitableCollectionTrait
We first want to identify our new type as fulfilling the requirements of a citable collection with the CitableCollectionTrait. We'll repeat the pattern:
- define a singleton type for the trait value.
- override the function identifying the trait value for our new type. Here the function is named citablecollectiontrait, and we'll define it to return the concrete value CitableReadingList for the type ReadingList.
struct CitableReadingList <: CitableCollectionTrait end
import CitableBase: citablecollectiontrait
function citablecollectiontrait(::Type{ReadingList})
    CitableReadingList()
end
citablecollectiontrait (generic function with 2 methods)
citablecollectiontrait(typeof(rl))
Main.CitableReadingList()
Use the citablecollection function to test if a specific object is a citable collection.
citablecollection(rl)
true
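The two steps above follow a general Julia trait pattern. As a rough, self-contained sketch of how a trait function and a boolean test can fit together, the example below imitates the idea without loading CitableBase. All of the names in it (CollectionTrait, NotCollection, IsCollection, collectiontrait, iscollection, Shelf) are invented for illustration, and CitableBase's own internals may well differ.

```julia
# Hypothetical stand-ins for a trait hierarchy; none of these names come from CitableBase.
abstract type CollectionTrait end
struct NotCollection <: CollectionTrait end   # default trait value
struct IsCollection <: CollectionTrait end    # trait value for types that opt in

# Default: a type is not a collection unless it says otherwise.
collectiontrait(::Type) = NotCollection()

# Boolean test derived from the trait value of the object's type.
iscollection(x) = !(collectiontrait(typeof(x)) isa NotCollection)

# A toy type opts in by overriding the trait function for its own type.
struct Shelf
    items::Vector{String}
end
collectiontrait(::Type{Shelf}) = IsCollection()

println(iscollection(Shelf(["a", "b"])))  # a Shelf has opted in
println(iscollection(42))                 # an Int has not
```

The design choice worth noticing is that opting in requires no change to the type definition itself: one extra method on the trait function is enough.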
Like citable objects, citable collections should report the type of URN they use for citation.
import CitableBase: urntype
function urntype(readingList::ReadingList)
    Isbn10Urn
end
urntype (generic function with 4 methods)
urntype(rl)
Main.Isbn10Urn
The promise we now need to fulfill is that our collection will implement three further traits for URN comparison, serialization and iteration.
Implementing the UrnComparisonTrait
We have previously implemented the UrnComparisonTrait for an identifier type (the Isbn10Urn) and for a citable object type (the CitableBook). In both of those cases, we compared two objects of the same type, and returned a boolean result of comparing them on URN logic.
For our citable collection, we will implement the same suite of functions, but with a different signature and result type. This time, our first parameter will be a URN which we will use to filter the collection given in the second parameter. The result will be a (possibly empty) list of content in our citable collection – in this example, a list of CitableBooks.
We mark our ReadingList type as urn-comparable exactly as we did for Isbn10Urns and CitableBooks.
struct ReadingListComparable <: UrnComparisonTrait end
function urncomparisontrait(::Type{ReadingList})
    ReadingListComparable()
end
urncomparisontrait (generic function with 6 methods)
urncomparable(rl)
true
Implementing the required functions urnequals, urncontains and urnsimilar
To implement the required functions, we'll just lean on the work we've already done: we'll use the boolean version of those functions to filter our collections.
# Each function keeps only the publications whose URN satisfies
# the boolean comparison with the URN given as the first parameter.
function urnequals(urn::Isbn10Urn, reading::ReadingList)
    filter(item -> urnequals(item.urn, urn), reading.publications)
end
function urncontains(urn::Isbn10Urn, reading::ReadingList)
    filter(item -> urncontains(item.urn, urn), reading.publications)
end
function urnsimilar(urn::Isbn10Urn, reading::ReadingList)
    filter(item -> urnsimilar(item.urn, urn), reading.publications)
end
urnsimilar (generic function with 6 methods)
If your collection does not allow duplicate identifiers, urnequals should return a list of 0 or 1 item.
urnequals(distanthorizons, rl)
1-element Vector{Main.CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Three of the books in our list are published in the English-language zone, and therefore will satisfy urnsimilar when compared to Distant Horizons.
urnsimilar(distanthorizons, rl)
3-element Vector{Main.CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
But only two are published in the same ISBN area code as Distant Horizons:
urncontains(distanthorizons, rl)
2-element Vector{Main.CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
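To make the three filters easier to compare side by side, here is a small self-contained sketch that reproduces the same counts using plain strings instead of Isbn10Urn values. The comparison rules in it are assumptions inferred from the results shown above (same initial digit for containment, initial digit 0 or 1 for the English-language zone), and the names IsbnBook, group, sameisbn, samegroup, bothenglish and selectbooks are hypothetical, not part of CitableBase or of the previous page.

```julia
# Toy model: ISBN-10 values held as plain strings. The rules below are
# assumptions inferred from the results shown above, not the tutorial's own code.
struct IsbnBook
    isbn::String
    title::String
end

group(isbn::AbstractString) = first(isbn)   # initial digit of the ISBN
sameisbn(a, b) = a == b
samegroup(a, b) = group(a) == group(b)
bothenglish(a, b) = group(a) in ('0', '1') && group(b) in ('0', '1')

# Generic filter: keep the books whose ISBN satisfies predicate `f` against `isbn`.
selectbooks(f, isbn, books) = filter(bk -> f(bk.isbn, isbn), books)

shelf = [
    IsbnBook("022661283X", "Distant Horizons"),
    IsbnBook("022656875X", "Enumerations"),
    IsbnBook("1108922036", "Can We Be Wrong?"),
    IsbnBook("3030234133", "Quantitative Intertextuality"),
]

println(length(selectbooks(sameisbn, "022661283X", shelf)))     # 1
println(length(selectbooks(samegroup, "022661283X", shelf)))    # 2
println(length(selectbooks(bothenglish, "022661283X", shelf)))  # 3
```

Passing the predicate as an argument shows why the three collection-level functions look so alike: only the boolean comparison changes.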
Implementing the CexTrait
As we did with citable objects, we want to ensure that we can round-trip an entire collection to and from delimited-text format. We'll make our new ReadingList type implement CexTrait in the same way as CitableBook.
struct ReadingListCex <: CexTrait end
function cextrait(::Type{ReadingList})
    ReadingListCex()
end
cextrait (generic function with 5 methods)
cexserializable(rl)
true
Implementing the required functions cex and fromcex
We will serialize our collection with a header line identifying it as a citecollection block, followed by one line for each book in our list. We can format the books' data by mapping each book to an invocation of the cex function that we previously wrote for CitableBooks.
function cex(reading::ReadingList; delimiter = "|")
    header = "#!citecollection\n"
    strings = map(ref -> cex(ref, delimiter=delimiter), reading.publications)
    header * join(strings, "\n")
end
cex (generic function with 5 methods)
cexoutput = cex(rl)
println(cexoutput)
#!citecollection
urn:isbn10:022661283X|Distant Horizons: Digital Evidence and Literary Change|Ted Underwood
urn:isbn10:022656875X|Enumerations: Data and Literary Study|Andrew Piper
urn:isbn10:1108922036|Can We Be Wrong? The Problem of Textual Evidence in a Time of Data|Andrew Piper
urn:isbn10:3030234133|Quantitative Intertextuality: Analyzing the Markers of Information Reuse|Christopher W. Forstall and Walter J. Scheirer
Recall from our experience implementing CEX serialization for CitableBooks that we will need to expose three mandatory parameters for fromcex: the trait value, the CEX data and the Julia type we want to instantiate.
function fromcex(trait::ReadingListCex, cexsrc::AbstractString, T;
    delimiter = "|", configuration = nothing, strict = true)
    # Drop empty lines before looking for the block header.
    lines = split(cexsrc, "\n")
    datalines = filter(ln -> !isempty(ln), lines)
    books = CitableBook[]
    inblock = false
    for ln in datalines
        if ln == "#!citecollection"
            # Every line after this header is treated as book data.
            inblock = true
        elseif inblock
            bk = fromcex(ln, CitableBook)
            push!(books, bk)
        end
    end
    ReadingList(books)
end
fromcex (generic function with 9 methods)
To keep this example brief and avoid introducing other packages, our implementation of fromcex naively assumes cexsrc will contain a single CEX block introduced by the #!citecollection heading. This would break on real-world CEX data sources: in a real application, we would instead use the CiteEXchange package to parse and extract appropriate blocks. See the documentation of CiteEXchange, or look at how a package like CitableCorpus uses CiteEXchange in its implementation of fromcex for a different data type.
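To illustrate what handling more than one block involves, the following self-contained sketch gathers the data lines belonging to a named block from text that contains several blocks. It is not the CiteEXchange API, which is not shown on this page: the function name blocklines and the sample text are invented for illustration, and the sketch ignores details a real parser would need to handle (comment lines, for example).

```julia
# Collect the data lines of every block with a given label from CEX-like text.
# A block starts at a line beginning with "#!" and runs until the next such line.
# `blocklines` is a hypothetical helper, not a CiteEXchange function.
function blocklines(src::AbstractString, label::AbstractString)
    found = String[]
    current = ""                       # label of the block we are currently in
    for ln in split(src, "\n")
        stripped = strip(ln)
        if isempty(stripped)
            continue                   # skip blank lines
        elseif startswith(stripped, "#!")
            current = stripped[3:end]  # remember the new block label
        elseif current == label
            push!(found, String(stripped))
        end
    end
    found
end

# Made-up sample with three blocks; only the middle one holds book records.
sample = """
#!cexversion
3.0

#!citecollection
urn:isbn10:022661283X|Distant Horizons: Digital Evidence and Literary Change|Ted Underwood
urn:isbn10:022656875X|Enumerations: Data and Literary Study|Andrew Piper

#!citelibrary
name|demo
"""

records = blocklines(sample, "citecollection")
println(length(records))  # number of data lines found in the citecollection block
```

With a helper along these lines, a fromcex implementation could parse only the lines of the block it cares about instead of treating everything after the first header as book data.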
Once again, we can now invoke fromcex with just the parameters for the CEX data and desired Julia type to create, and CitableBase will find our implementation.
fromcex(cexoutput, ReadingList)
ReadingList with 4 items
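The summary line only tells us how many items came back. If you want to confirm that a round trip preserves the content, you need a way to compare two collections. The self-contained sketch below uses toy stand-ins (ToyBook, ToyList, toline, fromline, serialize and deserialize are all hypothetical names, not CitableBase functions) to show one point that is easy to miss: a struct wrapping a Vector needs its own == method, because the default comparison falls back to object identity.

```julia
# Toy stand-ins; the tutorial's CitableBook and ReadingList carry richer types.
struct ToyBook
    urn::String
    title::String
    author::String
end

struct ToyList
    publications::Vector{ToyBook}
end

# Without this method, == on ToyList falls back to ===, which is false for
# two lists holding distinct (even if equal) Vectors.
Base.:(==)(a::ToyList, b::ToyList) = a.publications == b.publications

# Minimal serializer and parser: one pipe-delimited record per line.
toline(bk::ToyBook) = join([bk.urn, bk.title, bk.author], "|")
function fromline(ln::AbstractString)
    urn, title, author = split(ln, "|")
    ToyBook(urn, title, author)
end

serialize(list::ToyList) = "#!citecollection\n" * join(map(toline, list.publications), "\n")
function deserialize(src::AbstractString)
    lines = filter(!isempty, split(src, "\n"))
    ToyList(map(fromline, lines[2:end]))   # naive: skip the single header line
end

original = ToyList([
    ToyBook("urn:isbn10:022661283X", "Distant Horizons: Digital Evidence and Literary Change", "Ted Underwood"),
    ToyBook("urn:isbn10:022656875X", "Enumerations: Data and Literary Study", "Andrew Piper"),
])
println(deserialize(serialize(original)) == original)  # round trip preserves content
```

The toy parser assumes that no field contains the delimiter; a title with a pipe character in it would need escaping or a different delimiter.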
Free bonus!
CitableBase optionally allows you to include a third parameter to the fromcex function naming the type of reader to apply to the first string parameter. Valid values are StringReader, FileReader or UrlReader. The previous example relied on the default value of StringReader. The following examples use the file RL/test/data/dataset.cex in this repository; its contents are the output of cex(rl) above.
fname = joinpath(root, "RL", "test", "data", "dataset.cex")
fileRL = fromcex(fname, ReadingList, FileReader)
ReadingList with 4 items
url = "https://raw.githubusercontent.com/cite-architecture/CitableBase.jl/dev/RL/test/data/dataset.cex"
urlRL = fromcex(url, ReadingList, UrlReader)
ReadingList with 4 items
Implementing required and optional functions from Base.Iterators
The Iterators module in Julia Base was one of the first traits or interfaces in Julia. It allows you to apply the same functions to many types of iterable collections. We need to import the Base.iterate function, and implement two versions of it for our new type: one with a single parameter for the collection, and one with a second parameter maintaining some kind of state information. Both of them have the same return type: either nothing, or a Tuple pairing one item in the collection with state information.
Since our reading list keeps its books in a Vector internally, we can use the state parameter to pass along an index into the Vector. In the single-parameter version of iterate, we'll return the first item in the list, and set the "state" value to 2. In the two-parameter version, we'll return the item indexed by the state count, and increment the count by one. Both versions return nothing when there is no item left to return.
import Base: iterate
function iterate(rlist::ReadingList)
    isempty(rlist.publications) ? nothing : (rlist.publications[1], 2)
end
function iterate(rlist::ReadingList, state)
    state > length(rlist.publications) ? nothing : (rlist.publications[state], state + 1)
end
iterate (generic function with 254 methods)
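It may help to see how Julia uses these two methods. A for loop is shorthand for repeated calls to iterate, stopping when nothing comes back. The self-contained sketch below spells that out on a toy type; Playlist, manualloop and forloop are invented names for illustration, and the iterate methods mirror the index-as-state approach used for the ReadingList.

```julia
# Toy iterable wrapping a Vector; `Playlist` is an invented name for illustration.
struct Playlist
    tracks::Vector{String}
end

# Same index-as-state approach as the ReadingList methods above.
Base.iterate(p::Playlist) = isempty(p.tracks) ? nothing : (p.tracks[1], 2)
Base.iterate(p::Playlist, state) =
    state > length(p.tracks) ? nothing : (p.tracks[state], state + 1)

# What a for loop does behind the scenes: call iterate until it returns nothing.
function manualloop(itr)
    seen = String[]
    next = iterate(itr)
    while next !== nothing
        item, state = next
        push!(seen, item)
        next = iterate(itr, state)
    end
    seen
end

# The equivalent for loop.
function forloop(itr)
    seen = String[]
    for item in itr
        push!(seen, item)
    end
    seen
end

p = Playlist(["intro", "theme", "coda"])
println(manualloop(p) == forloop(p))  # both visit the same items in the same order
```

Note that the toy type defines only iterate. That is enough for a for loop, but functions such as collect also expect length by default, which is one reason the optional methods are worth adding.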
It is also useful (and trivial) to implement the optional methods for the length and element type of the collection.
import Base: length
function length(readingList::ReadingList)
    length(readingList.publications)
end
import Base: eltype
# Julia's iteration interface defines eltype on the type;
# the generic fallback then makes eltype(rl) work on instances too.
function eltype(::Type{ReadingList})
    CitablePublication
end
eltype (generic function with 80 methods)
length(rl)
4
eltype(rl)
Main.CitablePublication
Now our ReadingList type is usable with all the richness of the Julia interface for iterators. Just a few examples:
- for loops
for item in rl
    println(item)
end
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
- checking for presence of an item
distantbook in rl
true
- collect contents without having to know anything about the internal structure of the type
collect(rl)
4-element Vector{Main.CitablePublication}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
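The same three methods unlock many more functions from Base than the ones shown above. As a self-contained illustration, the sketch below defines a toy iterable (WordBag is an invented name) and applies a handful of standard functions to it. It also uses a default value for the state argument, a compact alternative to writing two separate iterate methods.

```julia
# Toy iterable; `WordBag` is a hypothetical type used only for this sketch.
struct WordBag
    words::Vector{String}
end

# One method with a default state covers both forms of iterate.
Base.iterate(b::WordBag, state = 1) =
    state > length(b.words) ? nothing : (b.words[state], state + 1)
Base.length(b::WordBag) = length(b.words)
Base.eltype(::Type{WordBag}) = String

bag = WordBag(["urn", "cite", "collection", "trait"])

println(first(bag))                                      # "urn"
println(count(w -> length(w) > 4, bag))                  # 2 words longer than 4 characters
println(any(w -> startswith(w, "c"), bag))               # true
println(collect(enumerate(bag))[2])                      # (2, "cite")
println(maximum(length, bag))                            # 10
println(collect(Iterators.filter(w -> 'i' in w, bag)))   # words containing the letter i
```

None of these functions needs to know that the data lives in a field named words, which is the point of implementing the interface.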
More free stuff!
The slidingwindow function does what its name suggests: it creates a Vector of Vectors by sliding a window along a collection.
titles = map(bk -> bk.title, rl)
slidingwindow(titles)
3-element Vector{SubArray{String, 1, Vector{String}, Tuple{UnitRange{Int64}}, true}}:
["Distant Horizons: Digital Evidence and Literary Change", "Enumerations: Data and Literary Study"]
["Enumerations: Data and Literary Study", "Can We Be Wrong? The Problem of Textual Evidence in a Time of Data"]
["Can We Be Wrong? The Problem of Textual Evidence in a Time of Data", "Quantitative Intertextuality: Analyzing the Markers of Information Reuse"]
It can also work directly on a citable collection.
slidingwindow(rl)
3-element Vector{SubArray{Main.CitablePublication, 1, Vector{Main.CitablePublication}, Tuple{UnitRange{Int64}}, true}}:
[Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X), Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)]
[Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X), Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)]
[Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036), Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)]
The partitionvect function partitions a Vector into a series of Vectors of a given size. In contrast to slidingwindow, the elements in the new Vectors do not overlap.
partitionvect can work on any generic Vector.
v = collect(1:10)
partitionvect(v)
5-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
[1, 2]
[3, 4]
[5, 6]
[7, 8]
[9, 10]
It also works on any citable collection.
partitionvect(rl)
2-element Vector{SubArray{Main.CitablePublication, 1, Vector{Main.CitablePublication}, Tuple{UnitRange{Int64}}, true}}:
[Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X), Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)]
[Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036), Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)]
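For readers curious about what these two operations amount to, here is a minimal sketch of the same ideas using only Julia Base. It is not CitableBase's implementation: the function names windows and chunks are invented for illustration, and the default size of 2 is an assumption based on the outputs shown above.

```julia
# Overlapping windows of `n` consecutive elements, returned as views into `v`.
windows(v::AbstractVector, n::Int = 2) =
    [view(v, i:i+n-1) for i in 1:(length(v) - n + 1)]

# Non-overlapping chunks of up to `n` elements, using Base.Iterators.partition.
chunks(v::AbstractVector, n::Int = 2) = collect(Iterators.partition(v, n))

v = collect(1:5)
println(windows(v))   # contents: [1, 2], [2, 3], [3, 4], [4, 5]
println(chunks(v))    # contents: [1, 2], [3, 4], [5]
```

Both helpers return views rather than copies, which matches the SubArray element type in the outputs above. This sketch keeps a short final chunk when the length is not a multiple of n; how partitionvect treats that case is not shown on this page.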
Recap: citable collections
On this page, we wrapped a citable collection type, the ReadingList, around a Vector of CitableBooks. We made the type identifiable as a citable collection. We implemented filtering of the collection on URN logic with the UrnComparisonTrait, and serialization with the CexTrait. You can test for these traits with boolean functions.
citablecollection(rl)
true
urncomparable(rl)
true
cexserializable(rl)
true
In addition, we made the ReadingList implement Julia's Iterators behavior.