- Three months in, with additions from the end of the project -
Data modelling in the Semantic Web always means striking a balance between the specific use case and a community-wide standard. All vocabularies (in my experience at least) contain concepts that offer room for interpretation, which can create uncertainty – but also offers the wiggle room we sometimes need to fit our concepts into someone else’s vocabulary. Thus, it cannot hurt to explain the specific use one makes of these vocabularies. This pertains to the use of single concepts as well as to how these concepts are combined with others to express specific content. For example: Using FRBRoo (or, by now, LRMoo) concepts does not mean that all the necessary documentation is already covered by the FRBRoo documentation. Oftentimes, we use only a small part of what a certain vocabulary has to offer, and this selection is not self-explanatory. SemanticKraus makes use of FRBRoo, but is far from utilising all its possibilities. For example: While using the F22 Self-Contained Expression and the F24 Publication Expression to model texts, it makes no use of the F1 Work at all – a class one might consider central to bibliographic information. SemanticKraus also uses classes from CIDOC CRM, but only a small set; some of them are rather broad terms that are used in a slightly narrower sense, which might in some cases warrant the creation of a custom subclass – a practice that we did not pursue, because our ambition was to create an ontology consisting entirely of reused classes.
These and other project-specific idiosyncrasies create the need for documenting project ontologies even if all classes are re-used from other vocabularies.
SemanticKraus’s data model wholly consists of classes from other ontologies; here are the most important ones (a list can also be found under https://semantickraus.acdh.oeaw.ac.at/resource/app:Model):
FRBRoo, the first one, serves for the representation of bibliographical data (its class names start with an “F”), providing classes for bibliographic items as well as the surrounding events, like publication events (to include text metadata). INTRO, the third one, is used for the modelling of intertextual relations and person mentions in texts (for this domain, INTRO provides classes for text passages as well as text features; these classes start with “INT”). Both are aligned to CIDOC CRM, which provides the concepts that, for the most part, are used to model real-world entities and events that surround the bibliographic and textual data – like events or persons (which occur as persons mentioned in texts or authors of texts; CIDOC CRM classes start with an “E”). The graphical representation of the SemanticKraus data model gives an overview of the most important modules; however, for reasons of legibility, some aspects are omitted: This concerns not only the classes that provide the events mentioned above with time and place, but also a large amount of metadata: identifiers, IDs, page numbers. Also, provenance data was added to individual entities as well as to project graphs (which is described in more detail in a 2025 paper in ZfdG) [1].
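The prefix convention described above can be turned into a small lookup. The following is only an illustrative sketch in Python: the function name `vocabulary_of` and the mapping are assumptions made for this example and are not part of the project code.

```python
import re

# Prefix conventions taken from the text; the mapping itself is only an illustration.
PREFIX_TO_VOCABULARY = {"F": "FRBRoo", "INT": "INTRO", "E": "CIDOC CRM"}


def vocabulary_of(class_label: str) -> str:
    """Return the source vocabulary for a label such as 'F24 Publication Expression'."""
    # A class label starts with capital letters followed by a number, e.g. "INT16".
    match = re.match(r"([A-Z]+)\d+\b", class_label)
    if match is None or match.group(1) not in PREFIX_TO_VOCABULARY:
        raise ValueError(f"unrecognised class label: {class_label!r}")
    return PREFIX_TO_VOCABULARY[match.group(1)]


for label in ("F24 Publication Expression", "INT16 Segment", "E21 Person"):
    print(label, "->", vocabulary_of(label))
```

Property labels such as “P165 incorporates” or “R5 has component” are deliberately rejected by this sketch, since the text only states the convention for class names.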
In the upper right part, biographical data, it is easy to see how a person, modelled as a CIDOC CRM E21 Person, is connected to their birth and death events, providing the ‘semantic glue’ for the usual biographical info consisting of the date and place of each of these events. E74 Groups are used – somewhat narrowing their scope – to represent political parties, which are an important factor in the biographical data of persons, especially those from the Dritte Walpurgisnacht project: Since it is focussed on the year 1933 – the year of the Nazi takeover in Germany and of the creation of the text in question –, political affiliations were considered essential biographical information in the project context. Another piece of information contained in this module of the model is the person’s profession; it is modelled as an F51 Pursuit.
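To make the biographical module more tangible, here is a minimal sketch in Python that writes it down as plain (subject, predicate, object) tuples. Several things are assumptions on my part and not confirmed by the description above: all `ex:` identifiers are hypothetical, E67 Birth and E69 Death are assumed as the event classes, and the linking properties (P98, P100, P4, P7, P107, P14) are the standard CIDOC CRM candidates, which the project may or may not use in exactly this way.

```python
def biography_triples(person, birth, death, party=None, pursuit=None):
    """Return triples for one person; all identifiers are hypothetical."""
    triples = [
        (person, "rdf:type", "crm:E21_Person"),
        # Assumption: birth and death are modelled as E67 Birth / E69 Death.
        (birth, "rdf:type", "crm:E67_Birth"),
        (birth, "crm:P98_brought_into_life", person),
        (death, "rdf:type", "crm:E69_Death"),
        (death, "crm:P100_was_death_of", person),
    ]
    # Each event carries the date and place of the usual biographical info.
    for event in (birth, death):
        triples.append((event, "crm:P4_has_time-span", f"{event}/time-span"))
        triples.append((event, "crm:P7_took_place_at", f"{event}/place"))
    if party is not None:
        # Assumption: party membership is expressed via P107.
        triples.append((party, "rdf:type", "crm:E74_Group"))
        triples.append((party, "crm:P107_has_current_or_former_member", person))
    if pursuit is not None:
        # Assumption: the profession (F51 Pursuit) is tied to the person via P14.
        triples.append((pursuit, "rdf:type", "frbroo:F51_Pursuit"))
        triples.append((pursuit, "crm:P14_carried_out_by", person))
    return triples


triples = biography_triples(
    "ex:person-1",
    "ex:person-1/birth",
    "ex:person-1/death",
    party="ex:party-1",
    pursuit="ex:person-1/pursuit",
)
for triple in triples:
    print(triple)
```

The events sit between the person and the dates and places, which is the ‘semantic glue’ mentioned above: no date is attached to the person directly.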
Below this module is the one that’s probably the most populated in the data model: It contains bibliographical data; as one can see here, FRBRoo, just like CIDOC CRM which it is an extension of, makes use of event classes to connect texts to their bibliographical data: An F30 Publication Event is linked to its result, the published text – an F24 Publication Expression (a class that’s merged into F3 Manifestation in the new LRMoo); not in the visualisation, but part of the model are the E52 Time-Span and the E53 Place connected to this event, which allow for adding standard bibliographical information on date and place of publication. In some cases – among them the most prominent –, the F24 Publication Expression is contained in another F24 Publication Expression (via R5 has component), where the latter represents a magazine, journal, or other kind of periodical, and the former its single issues. This is the case with Die Fackel, where one Publication Expression is linked to its 415 published issues via R5 has component. The text as a published entity – the result of the publication process – contains the text as the self-contained intellectual product of the author’s work as represented in the F22 Self-Contained Expression; in many cases, this P165 incorporates other entities of the same class, representing, e. g., the issue vs. the texts it contains, or a collection vs. the poems it contains, or the posthumously edited volume vs. the edited text. While the F28 Expression Creations are linked to the responsible E21 Persons – the authors –, there is no link between the F30 Publication Event and a person or other actor like the publishing house. Modelling this information is of course possible, but was not required by our data. One addition to these classes is the INT16 Segment.
It represents the text as a part of an F24 Publication Expression, or more precisely: the part of an F24 Publication Expression (in LRMoo: the F3 Manifestation) that is taken by the F22 Self-Contained Expression. This is necessary to provide the single texts in, e. g., published issues with basic bibliographical information like page numbers. However, it plays an even more important role in connection to the text passage (which I will come to later on). Some texts were originally provided with creation dates, others with dates of first publication. Others again – plays – were provided only with dates of first performances (which is a common scholarly practice). The diversity of our data thus required us not only to provide E52 Time-Spans for the text’s creation and its publication, but also to introduce the F31 Performance (specified with an E55 Type as a ‘first performance’), to connect this performance date to the text in question.
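The containment chain described above (periodical, issue, text) can be sketched as follows in Python. This is a simplified illustration, not the project's data: all `ex:` identifiers are hypothetical, and the use of R24 created to link the publication event to its result is my assumption based on FRBRoo, since the text does not name that property.

```python
R5 = "frbroo:R5_has_component"
P165 = "crm:P165_incorporates"
F24 = "frbroo:F24_Publication_Expression"

# Hypothetical sample data: one periodical, one issue, one text in that issue.
triples = {
    ("ex:fackel", "rdf:type", F24),
    ("ex:fackel/issue-1", "rdf:type", F24),
    ("ex:fackel", R5, "ex:fackel/issue-1"),
    ("ex:pub-event-1", "rdf:type", "frbroo:F30_Publication_Event"),
    ("ex:pub-event-1", "frbroo:R24_created", "ex:fackel/issue-1"),  # assumption
    ("ex:pub-event-1", "crm:P4_has_time-span", "ex:pub-event-1/time-span"),
    ("ex:pub-event-1", "crm:P7_took_place_at", "ex:pub-event-1/place"),
    ("ex:text-a", "rdf:type", "frbroo:F22_Self-Contained_Expression"),
    ("ex:fackel/issue-1", P165, "ex:text-a"),
}


def containers_of(node, triples):
    """Follow R5 and P165 upwards, e.g. text -> issue -> periodical."""
    parents = {}
    for subject, predicate, obj in triples:
        if predicate in (R5, P165):
            parents.setdefault(obj, []).append(subject)
    chain, frontier, seen = [], [node], {node}
    while frontier:
        current = frontier.pop()
        for parent in sorted(parents.get(current, [])):
            if parent not in seen:
                seen.add(parent)
                chain.append(parent)
                frontier.append(parent)
    return chain


print(containers_of("ex:text-a", triples))
```

Walking upwards like this is what lets a single text inherit the date and place of publication from the event attached to its issue.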
A different kind of textual entity is modelled in the upper left: It concerns legal files, which are not so much defined by bibliographical metadata (or archival metadata), but more by the respective legal cases. Consequently, we decided to create these events as the context for the respective legal files. Again, the text is represented twofold: by its informational content, modelled as an E73 Information Object, and by its carrier, an F4 Manifestation Singleton; both P12i were present at the E5 Event representing the legal case. Persons play a double role in this regard: On the one hand, they are the creators of the files in question; on the other hand, they are participants in legal cases (as plaintiffs or defendants). This participation was modelled as an E7 Activity that the E5 Event P10i contains and that was P14 carried out by the respective person.
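The legal-case pattern can be written down as a small triple builder. Again a hedged sketch in Python: the `ex:` identifiers and the function name are hypothetical, and recording the role (plaintiff or defendant) via P2 has type is an assumption of mine, as the text does not say how the role is stored.

```python
def legal_case_triples(case, content, carrier, participations):
    """Build triples for one legal case.

    participations: iterable of (person, role) pairs, where role is free
    text such as 'plaintiff' or 'defendant'.
    """
    triples = [
        (case, "rdf:type", "crm:E5_Event"),
        (content, "rdf:type", "crm:E73_Information_Object"),
        (carrier, "rdf:type", "frbroo:F4_Manifestation_Singleton"),
        # Both the content and its carrier were present at the case.
        (content, "crm:P12i_was_present_at", case),
        (carrier, "crm:P12i_was_present_at", case),
    ]
    for index, (person, role) in enumerate(participations, start=1):
        activity = f"{case}/participation-{index}"
        triples += [
            (activity, "rdf:type", "crm:E7_Activity"),
            (case, "crm:P10i_contains", activity),
            (activity, "crm:P14_carried_out_by", person),
            # Assumption: the role is expressed as a type on the activity.
            (activity, "crm:P2_has_type", f"ex:type/{role}"),
        ]
    return triples


case_triples = legal_case_triples(
    "ex:case-1",
    "ex:file-1/content",
    "ex:file-1/carrier",
    [("ex:person-1", "plaintiff"), ("ex:person-2", "defendant")],
)
for triple in case_triples:
    print(triple)
```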
Nearly all of the texts in question do not only play a role in this sphere of biographical and bibliographical events. Their significance also lies in their contents, meaning: in the references to other texts or persons they contain. Talking about these references means talking about the text passages that contain them. That’s what we have to model. Below, to the left, there are two closely related modules centred around the representation of the text passage; on the left there is the more concrete sphere, containing texts as published (ergo quotable) entities or as material legal files; to the right, there is the ‘informational’ counterpart: these texts/files as intellectual products. Parts of the latter – INT1 Text Passages – are not necessarily linked to any particular, say, edition of a text, but are merely parts of the text as an expression which is manifested in many editions of the same text. They do not have page numbers (which turns out to be an important ontological litmus test to differentiate between levels of ‘concreteness’ when talking about texts); in some cases, they can be located by act or scene numbers or the like, since these units are already part of a text at this stage: Goethe’s Faust has no page numbers, but act, scene, or verse numbers; page numbers enter the scene (no pun intended) as soon as we talk about, say, the Reclam edition on my bookshelf. The INT1 Text Passage’s counterpart is the INT16 Segment, a specific part of a published text or file that, as opposed to the INT1 Text Passage, can be located on or within a certain edition of a text – and thus provides the text passages, the references to persons and texts, with a page number.
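The page-number litmus test can be phrased as a tiny rule of thumb in Python. This is only a sketch of the heuristic as described above; the `Locator` type and the function name are hypothetical and not part of INTRO or the project code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locator:
    """How a stretch of text is located; every field is optional."""
    page: Optional[int] = None
    act: Optional[int] = None
    scene: Optional[int] = None
    verse: Optional[int] = None


def suggested_class(locator: Locator) -> str:
    """A page number implies a concrete edition, hence a Segment."""
    if locator.page is not None:
        return "INT16 Segment"
    # Act, scene, or verse numbers belong to the text itself, not to an edition.
    return "INT1 Text Passage"


print(suggested_class(Locator(act=1, scene=4)))
print(suggested_class(Locator(page=37)))
```

A locator that has both a page and a scene number still counts as a Segment here, because the page number alone ties it to a particular edition.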
While the INT16 Segments provide the link to concrete reality, the INT1 Text Passages – as informational objects – can contain meaningful content, like references. The mentions of persons throughout the texts we modelled are represented as INT2 Actualizations of Feature on these text passages – a textual feature that is identified on the text passage and specified through the feature it ‘actualizes’: an INT18 Reference. This Reference in turn points to the representation of the real-life entity, the E21 Person.
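The chain from text passage to person can be followed in three hops. The sketch below, in Python, uses placeholder predicate names (`ex:hasActualization`, `ex:actualizes`, `ex:refersTo`) because the text above does not name the INTRO properties involved; they are hypothetical and should be replaced by the actual property identifiers.

```python
# Placeholder predicates: hypothetical names, not actual INTRO identifiers.
HAS_ACTUALIZATION = "ex:hasActualization"  # INT1 Text Passage -> INT2 Actualization
ACTUALIZES = "ex:actualizes"               # INT2 Actualization -> INT18 Reference
REFERS_TO = "ex:refersTo"                  # INT18 Reference -> E21 Person

# Hypothetical sample data: one passage mentioning two persons.
mention_triples = {
    ("ex:passage-1", HAS_ACTUALIZATION, "ex:actualization-1"),
    ("ex:passage-1", HAS_ACTUALIZATION, "ex:actualization-2"),
    ("ex:actualization-1", ACTUALIZES, "ex:reference-1"),
    ("ex:actualization-2", ACTUALIZES, "ex:reference-2"),
    ("ex:reference-1", REFERS_TO, "ex:person-1"),
    ("ex:reference-2", REFERS_TO, "ex:person-2"),
}


def objects(triples, subject, predicate):
    """All objects of the given subject and predicate, in sorted order."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def persons_mentioned(triples, passage):
    """Follow passage -> actualization -> reference -> person."""
    persons = []
    for actualization in objects(triples, passage, HAS_ACTUALIZATION):
        for reference in objects(triples, actualization, ACTUALIZES):
            persons.extend(objects(triples, reference, REFERS_TO))
    return persons


print(persons_mentioned(mention_triples, "ex:passage-1"))
```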
The reference to another text, i. e. the intertextual reference – a feature Kraus’s texts are rich in – is modelled as an INT3 Intertextual Relationship, which is in turn linked to the referring text via R13 has referring entity and to the referred-to text via R12 has referred to entity. As can easily be seen, these two properties allow for giving the ‘direction’ of the quote, the allusion, etc., and link texts as well as, more granularly, text passages.
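How the two properties yield a direction can be shown by collapsing each INT3 Intertextual Relationship into a directed pair. A minimal sketch in Python, with hypothetical `ex:` identifiers and an `intro:` prefix chosen only for this example:

```python
R13 = "intro:R13_has_referring_entity"
R12 = "intro:R12_has_referred_to_entity"

# Hypothetical sample data: one relationship from a text passage to another text.
relation_triples = {
    ("ex:relation-1", "rdf:type", "intro:INT3_Intertextual_Relationship"),
    ("ex:relation-1", R13, "ex:passage-1"),
    ("ex:relation-1", R12, "ex:text-b"),
}


def directed_references(triples):
    """Return sorted (referring, referred-to) pairs, one per complete relationship."""
    referring, referred = {}, {}
    for subject, predicate, obj in triples:
        if predicate == R13:
            referring[subject] = obj
        elif predicate == R12:
            referred[subject] = obj
    # Only relationships that have both ends yield a directed pair.
    return sorted(
        (referring[relation], referred[relation])
        for relation in referring
        if relation in referred
    )


print(directed_references(relation_triples))
```

Keeping the relationship as a node of its own, rather than a direct link between two texts, is what leaves room to describe the reference itself, for instance as a quote or an allusion.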
SemanticKraus is completed by now; possible changes will likely only be caused by future additions to the data that might demand extensions of the data model. Among the possible additional modules, I think the most promising would be one integrating scholarly discourse into the graph – by modelling Kraus research and its connections to certain texts in his oeuvre. This would require an extension of the model only in so far as the relation of one text ‘discussing’ another would need to be added – while for bibliographical and biographical modelling, current modules could be re-purposed. So far, we had two focus points: creating the vocabulary to represent our data as best we can, and putting that vocabulary to use. Another entry of this project blog will, however, tackle a third aspect of a project data model: querying, or, to be more specific, querying in the context of creating a web representation to provide a comfortable user interface for data exploration.
[1] Bernhard Oberreither: Karl Kraus im Semantic Web. Zur Integration von Editionsdaten in einen gemeinsamen Wissensgraphen. In: Zeitschrift für digitale Geisteswissenschaften 10 (2025). 26.03.2025. DOI: 10.17175/2025_003