new dataversion 0.8
dirkroorda committed May 20, 2021
1 parent b1824be commit 126869d
Showing 46 changed files with 27,505,014 additions and 6 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -17,6 +17,10 @@ Status

This is **work in progress!**

+* 2021-05-20 A new TF version (0.8) has been delivered.
+When multiple letters occur on one page, the words of the first line
+of some letters are not contained in line nodes.
+This has been corrected.
* 2021-01-30 A new TF version (0.7) has been delivered.
This version has a major encoding difference: whereas in version 0.6 footnote material ended
up in the values of a feature, now footnotes are treated like text material.
5 changes: 2 additions & 3 deletions docs/transcription.md
@@ -108,7 +108,7 @@ There are also node types for other entities, such as volume, letter, page, line
as listed in the *otype* feature and documented below.

All non-slot nodes have a type and are linked to a subset of slots.
-The linkage is stored in the *oslots* feature, which is an edge feature: it specifies a edges between
+The linkage is stored in the *oslots* feature, which is an edge feature: it specifies an edge between
each non-slot node and the slots that belong to it.
This feature is hardly ever used directly, because the Text-Fabric API has functions to
move from containers to containees, the so-called
@@ -142,8 +142,7 @@ All punctuation, including spaces, is stored on the slot of the preceding word,
in the feature `punc`.

Whitespace will be normalized to single spaces or newlines.
-Only the original letter contents and the editorial remarks are stored word by word.
-The footnotes are stored one by one, as values of the feature `fnote`, see below.
+The original letter contents and the editorial remarks and the footnotes are stored word by word.

Next to the `trans` and `punc` features, there are the `transo`, `punco` and `transr`, `puncr`
and `transn` and `puncn` feature pairs.
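
As a rough illustration of the storage scheme described in this hunk (not code from this repository), the sketch below rebuilds running text from per-slot `trans` and `punc` values. The two dictionaries and their slot numbers are made-up sample data, and `render` is a hypothetical helper; in a real session these values would come from the dataset's features.

```python
# Hypothetical sample data: slot number -> feature value.
# In the real dataset these would be the `trans` and `punc` features.
trans = {1: "Mijn", 2: "heer", 3: "ik", 4: "schrijf"}
punc = {1: " ", 2: ", ", 3: " ", 4: ".\n"}


def render(slots, trans, punc):
    """Concatenate each word with the punctuation stored on its own slot."""
    return "".join(trans[s] + punc.get(s, "") for s in slots)


text = render(sorted(trans), trans, punc)
print(text, end="")
```

Because spaces and newlines live on the slot of the preceding word, plain concatenation is enough and no separate whitespace handling is needed.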
6 changes: 6 additions & 0 deletions programs/lib.py
@@ -65,6 +65,12 @@ def readYaml(fileName):
""".strip().split()
)

+ENSURE_LB_ELEMENTS = set(
+"""
+head
+""".strip().split()
+)
+
A2Z = "abcdefghijklmnopqrstuvwxyz"


12 changes: 10 additions & 2 deletions programs/tfFromTrim.py
@@ -15,6 +15,7 @@
META_DECL,
WHITE_RE,
ADD_LB_ELEMENTS,
+ENSURE_LB_ELEMENTS,
A2Z,
parseArgs,
initTree,
@@ -37,7 +38,7 @@
--help: print this text and exit
"load": loads the generated TF; if missing this step is not performed
-"loadOnly": does not generate TF; loads previously generated TF
+"loadonly": does not generate TF; loads previously generated TF
volume: only process this volume; default: all volumes
page : only process letter that starts at this page; default: all letters
"""
@@ -108,6 +109,7 @@
isremark
isfolio
isnote
+isorig
isspecial
issub
issuper
@@ -506,7 +508,13 @@ def walkNode(cv, doc, node, cur):
cur[tag] = None
warnings[f"nested: {tag}"].add(doc)

-if tag in BREAKS:
+if tag in ENSURE_LB_ELEMENTS:
+    curLine = cur.get("line", None)
+    if not curLine:
+        cur["line"] = cv.node("line")
+        cur["ln"] += 1
+        cv.feature(cur["line"], n=cur["ln"])
+elif tag in BREAKS:
     curLine = cur.get("line", None)
     if curLine:
         linkIfEmpty(cv, curLine)
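
The new branch in this hunk can be tried in isolation with the minimal sketch below. It assumes that the three statements after `if not curLine:` all belong to that condition. `StubCV` is a hypothetical stand-in that only records calls, and `ensureLine` is an illustrative wrapper that does not exist in `tfFromTrim.py`.

```python
ENSURE_LB_ELEMENTS = {"head"}


class StubCV:
    """Hypothetical stand-in for the converter object; it only records calls."""

    def __init__(self):
        self.nodes = []
        self.features = {}

    def node(self, nodeType):
        # Hand out a new node identified by (type, sequence number).
        n = (nodeType, len(self.nodes) + 1)
        self.nodes.append(n)
        return n

    def feature(self, node, **values):
        # Remember the feature values assigned to a node.
        self.features.setdefault(node, {}).update(values)


def ensureLine(cv, tag, cur):
    """Open a line node for `tag` if one is required and none is open yet."""
    if tag in ENSURE_LB_ELEMENTS and not cur.get("line"):
        cur["line"] = cv.node("line")
        cur["ln"] += 1
        cv.feature(cur["line"], n=cur["ln"])


cv = StubCV()
cur = {"line": None, "ln": 0}
ensureLine(cv, "head", cur)  # no line open: a line node is created
ensureLine(cv, "head", cur)  # a line is already open: nothing happens
ensureLine(cv, "p", cur)     # tag not in ENSURE_LB_ELEMENTS: nothing happens
print(cv.nodes, cv.features)
```

The point of the guard is that a `head` element at the start of a letter always ends up inside a line node, without opening a second line when one is already current.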
607 changes: 607 additions & 0 deletions tf/0.8/author.tf

Large diffs are not rendered by default.

607 changes: 607 additions & 0 deletions tf/0.8/authorFull.tf

Large diffs are not rendered by default.
