An archival corpus is made up of a set of text files, and an inventory documenting the citable structure of each document.
We can use this TextInventory file with files in this root directory to construct a Corpus.
When serialized as XML, the inventory validates against a Relax NG schema.
For a given corpus, we can determine which files and URNs its inventory documents.
In the example corpus defined above, the inventory contains entries for 3 files.
The file names are:
Index | File path |
---|---|
0 | Iliad-A.xml |
1 | Iliad-Butler.xml |
2 | tier2/Iliad-B.xml |
Their URNs are:
Index | URN |
---|---|
0 | urn:cts:greekLit:tlg0012.tlg001.butler: |
1 | urn:cts:greekLit:tlg0012.tlg001.msA: |
2 | urn:cts:greekLit:tlg0012.tlg001.msB: |
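The URNs above follow the CTS URN pattern, in which a namespace is followed by a dot-separated hierarchy (text group, work, version) and an optional passage reference. The following Python sketch shows one way such a URN could be split into its parts. It is for illustration only: the function name `parse_cts_urn` and the dictionary keys are hypothetical and are not part of the library described here.

```python
def parse_cts_urn(urn):
    """Split a CTS URN into named components (hypothetical helper, not a library API)."""
    parts = urn.split(":")
    # Expected shape: urn:cts:<namespace>:<work hierarchy>[:<passage>]
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_hierarchy = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else ""
    result = {"namespace": namespace, "passage": passage or None}
    # The work hierarchy may stop at any level, so zip only names what is present.
    levels = ["textgroup", "work", "version", "exemplar"]
    result.update(zip(levels, work_hierarchy.split(".")))
    return result


info = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:")
print(info["namespace"], info["textgroup"], info["work"], info["version"])
```

Because the example URNs end in a colon with no passage reference, the sketch reports the passage as `None`.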
For a given corpus, we can also determine which files are present in the file system.
In the example corpus defined above, 3 XML files are found in the file system:
Index | File path |
---|---|
0 | Iliad-A.xml |
1 | Iliad-Butler.xml |
2 | tier2/Iliad-B.xml |
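A listing like the one above can be produced by walking the corpus root directory and collecting every XML file, with paths expressed relative to the root. The following is a minimal Python sketch of that idea, not the library's own implementation. The function name `xml_files_on_disk` is hypothetical, and the temporary directory stands in for a real archive, with `notes.txt` added only to show that non-XML files are skipped.

```python
import tempfile
from pathlib import Path


def xml_files_on_disk(root):
    """Return sorted, root-relative paths of all .xml files found under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.xml")
        if p.is_file()
    )


# Build a throwaway directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Iliad-A.xml", "Iliad-Butler.xml", "tier2/Iliad-B.xml", "notes.txt"]:
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("<TEI/>")
    found = xml_files_on_disk(tmp)

for index, name in enumerate(found):
    print(index, name)
```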
We can determine whether the files listed in the inventory have a one-to-one relation to the XML files in the directory hierarchy. We can also get the names of documents identified in the inventory but not found on disk, and the names of files found on disk but not identified in the inventory.
One-to-one match. In the example corpus defined above, the files and inventory do match (have a one-to-one correspondence).
Files on disk missing from inventory. If we use this TextInventory file with the same set of archival files, we can construct a valid Corpus, even though it contains only 1 entry for an online file. We can verify that the files listed in the inventory and the files on disk do not match, and can determine that 2 files in the file system do not appear in the inventory, and that the first item (item 0) in the list of missing files is Iliad-Butler.xml.
Files in inventory not found in file system. If, with the same set of archival files, we use a TextInventory listing additional files as online, we can still construct a valid Corpus, even though it contains 3 entries. We can verify that the files listed in the inventory and the files on disk do not match, and can determine that 1 file listed in the inventory is not found in the file system, and that the first item in the list of missing files is Iliad-C.xml.
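The three cases above amount to comparing two sets of file names and taking the difference in each direction. The following Python sketch illustrates that comparison under stated assumptions. The function `compare_inventory_to_disk` and its result keys are hypothetical names. The contents of the two alternative inventories are also assumed for illustration, since the text gives only the counts and the first missing item.

```python
def compare_inventory_to_disk(inventory_files, disk_files):
    """Compare file names listed in an inventory with file names found on disk."""
    inventory, disk = set(inventory_files), set(disk_files)
    return {
        "matches": inventory == disk,
        # On disk but not listed in the inventory.
        "missing_from_inventory": sorted(disk - inventory),
        # Listed in the inventory but not found on disk.
        "missing_from_disk": sorted(inventory - disk),
    }


on_disk = ["Iliad-A.xml", "Iliad-Butler.xml", "tier2/Iliad-B.xml"]

# Case 1: inventory and file system correspond one-to-one.
full = compare_inventory_to_disk(on_disk, on_disk)

# Case 2: inventory with a single online entry (assumed here to be Iliad-A.xml).
partial = compare_inventory_to_disk(["Iliad-A.xml"], on_disk)

# Case 3: inventory that also lists Iliad-C.xml (assumed contents), absent from disk.
extra = compare_inventory_to_disk(on_disk + ["Iliad-C.xml"], on_disk)

print(full["matches"], partial["missing_from_inventory"], extra["missing_from_disk"])
```

Sorting the differences keeps the "first missing item" stable from run to run, which a plain set would not guarantee.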