From FollowTheScore
Revision as of 21:13, 25 March 2008

Semantic MediaWiki with Property Clusters

or

Semantic MediaWiki with Personal Classes

 

Dr. Gero Scholz, February 2008
http://semeb.com/dpldemo/SMWpc



Abstract

The paper presents an enhancement of Semantic MediaWiki. While SMW currently has an atomic view of properties, we define clusters of properties which eventually resemble well-known concepts of object orientation. Using SMW properties we define a meta model which supports classes, relations and inheritance. As a proof of concept we provide an implementation which is based on a set of templates and a few existing MediaWiki extensions. The application model is described by special meta properties; a graphical visualisation is provided automatically. We describe various approaches for editing, ranging from classical inline annotation through intensive use of templates to an integration with the screen forms of 'Semantic Forms'. The article describes a set of improvements and API functions which could become an integral part of SMW in a future release. A demo website (http://semeb.com/dpldemo/SMWpc) is available.

Keywords

Semantic MediaWiki, Semantic Forms, Class, Relation, Inheritance, Meta Model, Proof of Concept, Demo Implementation

Summary

Wiki Technology is rapidly evolving. While there is still a primary focus on easy editing of shared articles with free text, free links and a flexible scheme of categories there are several efforts to add structural support for advanced concepts of knowledge management. Semantic MediaWiki (SMW) is a MediaWiki extension which provides basic concepts for semantic annotation of MediaWiki articles. SMW uses 'typed properties' which can be attached to wiki pages. It also offers semantic browsing aids and a query engine. While this is certainly an important step towards knowledge representation there are still significant deficits.

Currently in SMW every possible combination of properties can be assigned to every article. It is possible to assign multiple values for the same property to the same article. The difference between 'relations' and 'values' which was part of the SMW concept in older versions has been dropped in favor of more generalized 'properties' in the latest SMW release. All this leads to a fairly universal, generic concept. In short, SMW offers a concept of "weak typing" expressed by arbitrary bundles of properties taken from an 'ocean' of all possible attributes which might be useful for annotation.

Practical experience shows that people do not primarily perceive objects as conglomerates of attributes. Instead they classify objects and use well defined names for these classes. Classes in essence are named clusters of properties. In this article we show how a concept of 'strong typing' can be built on top of SMW. We call that concept Semantic MediaWiki with property clustering (SMWpc). One might also call it 'Semantic MediaWiki with personal classes'. This interpretation would emphasize the fact that the process of defining classes depends on the personal perspective of the author and the purpose he has in mind. A bicycle and a football would probably in many cases belong to different classes but for a sports shop they might both be classified as 'sellable goods'.

'SMWpc' is a proof of concept which is already usable for small wikis. It is based on SMW, a few other MW extensions and some tricky MW templates. To improve performance and robustness, a more professional implementation should be made by extending the current PHP source code of SMW. Such an implementation could also offer enhanced navigation and querying based on the concepts of classes, relations and inheritance.

Idea and Concept

The majority of all concepts in software development deal with classification and abstraction, starting from ancient Algol up to entity-relationship modelling and all flavours of object orientation. All these proven and well-established concepts (including inheritance hierarchies and polymorphism) should also be available in a truly semantic wiki. There is no need to reinvent the wheel on this conceptual level -- RDF and OWL (with different levels of detail) point in the same direction.

The question is how these powerful concepts can be integrated into a 'simple' wiki without losing the primary value of wikis: ease of use and a low threshold for novice users. After all, a wiki contributor is an 'author' and not a clerk who enters values into an IT system with several hundred screen forms. The freedom to start with plain text and to add more structure only when you feel it is needed should be preserved. Templates can play a useful role in keeping the balance between reliable structure and authors' freedom.

Normal wiki authors have no education in IT. Wiki administrators may have such an education but their focus often lies on technical issues like installation, user administration and database maintenance. What is needed for a 'semantic wiki' is an information engineer who understands the knowledge domain, has a good feeling for the 'evolution of structures' and who has internalized the above-mentioned concepts of abstraction and classification. Let's call him the wiki knowledge engineer.

The knowledge engineer typically would start with a small, weakly structured collection of articles which have some commonalities. His approach would be "Let us use as much structural support as possible - but let us decide when and how much of it we are going to use".

Once a wiki grows and there is a lot of information the knowledge engineer will create a 'model'. A semantic wiki should allow him to document such a model in a formal way and use it to support queries and (maybe) the entry of more data. The knowledge engineer must constantly monitor the ratio between the size of the 'information model' and the total amount of information in a wiki. Encyclopedic wikis will have a lower ratio than specialized wikis with a narrower scope and more elaborate relationships between the articles.

SMWpc (Semantic wiki with personal classes) is a proof of concept for enhanced wiki knowledge engineering. It demonstrates how classification concepts can be implemented on top of the current MediaWiki infrastructure, using normal wiki templates and some of the more popular extensions (Semantic MediaWiki, Dynamic Page List, Wiki Graph, Variables, simple forms).

SMWpc introduces the concept of nestable 'Classes'. Classes correspond to traditional MediaWiki categories. Properties are tied to Classes. MediaWiki articles are seen as 'instances of a class' (objects). Of course they are still 'wikitext with chapters and hyperlinks and category links'. They still are 'wikitext with associated property values'. But the focus in SMWpc is on the semantic model. The model defines which properties make sense for instances of certain classes. It also makes a difference between properties which hold a value ('attributes') and properties which point to another object. In the latter case the model can state that the link target must be an object which belongs to a certain class.

The graph below describes the general idea; there is also a more elaborate graph of the meta model.

SMWpc MetaModel.png

Focus

The main focus of SMWpc is on small and medium-size wikis (fewer than 10,000 pages) which have a dedicated focus. Their user communities agree on a common scheme for classification of articles and they want better support for collecting highly structured information (using properties). An example could be a wiki in the area of molecular genetics but it could also be a wiki about pets where you have classes like species, food, disease etc. It is quite clear that a property named 'symptom' belongs to class 'disease' and not to 'food' or 'species'. With SMWpc there is a way to express this. While it may make a lot of sense to have multiple values for the 'symptoms' of a 'disease', there should only be a single value for the property 'maximum age' of class 'species'. The property 'likes' must contain a reference to an instance of 'food' and not to a 'disease'. With SMWpc you can express this and much more.

SMWpc Focus.png


Model and Meta Model

As said before, classes are named clusters of properties. Articles are seen as 'instances of' classes. The properties of a class -- and thus of its instances -- may be mandatory or optional. It may be possible to assign a set of values for a given property to an object ('multi-value properties') or it may be required that at most one value is assigned ('single-value properties'). There may be restrictions on the value set of a property. SMW offers the concept of 'Types' for this. On a semantic level it may not be enough to declare a property to be of type 'Page'. In SMW this means that the value of that property is the name of another wiki page. We want to be able to restrict the values to pages which are instances of specific classes.

All this can of course be expressed in narrative text. But it is much better to use SMW's capabilities for that purpose. So SMW is used to describe enhancements of SMW. This reflective way of usage is typical for 'meta models'. SMW already contains some reserved words for built-in meta properties (e.g. 'has type'). The power (and complexity) of a semantic description framework lies in the structure of its meta model. Based on the meta model the "real models" (application domain models) can be built.

Before going into details we have to consider possible name clashes between the meta level and the application level.

Naming Rules and Design Principles

leading '.' for meta level

When describing classes or properties we use normal SMW properties. This could cause conflicts with the 'real' properties of the application model. We therefore use the convention to start all property names which are related to the meta model with a leading dot ('.').

Use of templates

It can be very useful to use templates for the assignment of SMW properties. This leads to a 'decoupling layer' which normally is very thin but can be used to add additional concepts (you will see examples of this later).

It was therefore decided to make all property assignments through templates. The templates use the same naming convention. Sometimes there is a 1:1 correspondence between template and property, sometimes a template is used to assign more than one property. In some (rare) cases we have two different templates which can be used to assign the same property. Of course there must be good reasons for this because it makes the model less transparent.

Examples

  • The fact that one class can be a refinement of another class is expressed by a property named '.class extends'
  • There is a template called 'Template:.class definition' which takes (among others) a parameter called extends. This template assigns the value of this parameter (i.e. the ancestor's class name) to the property .class extends of the Class description article.
  • In the same way this template assigns values to properties like '.class icon' or '.class color'
  • There is a property called .prop describes which ties a property to a class. Note that the same property may be useful for several classes.
  • This property is assigned by 'Template:.prop describes'. This template contains just one line of code.
  • Consequently, the fact that a certain template is used to assign a value to a certain property is also expressed by a property: '.prop assigned by'.

Common prefix for clustering

As you see from the examples, the naming rules do not only require a leading dot but also set up conventions for clustering meta properties (and meta templates) by a common prefix like '.class ***' or '.prop ***'.

Currently we use the following prefixes:

.obj
for templates that apply to objects
.class
for Class-related templates and properties
.cat
for templates which describe categories
.instances
for templates dealing with all instances of a class
.prop
for Property-related templates and properties
.type
for Type-related templates and properties
.model
for templates which are used to generate the model graph
.form
for templates which are used to support the creation of whole objects
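
The prefix convention can be illustrated with a small sketch. It is not part of SMWpc (which is implemented with wiki templates); the function name `cluster_by_prefix` and the sample names are assumptions made for illustration only. The sketch groups meta-level names by their first word and ignores application-level names that do not carry one of the prefixes listed above.

```python
from collections import defaultdict

# Prefixes as listed in the text above.
PREFIXES = (".obj", ".class", ".cat", ".instances",
            ".prop", ".type", ".model", ".form")

def cluster_by_prefix(names):
    """Group meta-level names by their first word (the cluster prefix)."""
    clusters = defaultdict(list)
    for name in names:
        prefix = name.split(" ", 1)[0]
        if prefix in PREFIXES:          # application-level names are skipped
            clusters[prefix].append(name)
    return dict(clusters)

clusters = cluster_by_prefix(
    [".class color", ".prop describes", ".class extends", "likes"]
)
```

Here 'likes' is an application-level property and therefore does not show up in any cluster.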

Access to the meta model

We chose to implement the SMWpc meta model by using standard mechanisms of SMW. This makes it possible to write queries which operate on the meta model in much the same way as regular queries on the application level. The capability of "reflection" is of great value; it is an integral part of many languages and environments (Smalltalk, Java, relational databases). Using this reflection API we are able to create generic object browsers as you will see later.

Consequences for SMW

We propose that the same conventions are also applied to future versions of SMW. Property names on the meta level like 'has type', 'imported from' or 'display units' should no longer be used. These words or phrases are far too common; they may be useful on application level and they are not recognizable as parts of the meta model.

We also suggest that future versions of SMW allow full access to the meta model. Queries on the meta level should work in the same way as normal queries. The analogy to a database system is obvious: you can use SQL for queries on user tables and on the meta tables (catalogue tables) as well.
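
The database analogy can be made concrete with a minimal sketch. SQLite is used here only because it ships with Python; the table and the sample row are made up. The same SQL syntax is used once against a user table and once against SQLite's catalogue table `sqlite_master`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, university TEXT)")
conn.execute("INSERT INTO student VALUES ('Alice', 'Example University')")

# Query on application level: the user table.
rows = conn.execute("SELECT name FROM student").fetchall()

# Query on meta level: the catalogue table, using the same SQL syntax.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

In the same spirit, a query for all properties which 'describe' a class would look just like a query for all students who attend a university.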

Classes

Current functionality of SMWpc

Ideally a separate namespace 'Class' should be used for articles which describe classes. The author tried this approach first but ran into some problems with the current implementation of SMW 1.0. So we use normal pages with a certain naming convention (class description articles must start with the word 'Class'). The most important aspect of a class is the list of its properties. Technically this is implemented by a meta property '.prop describes' which states that a certain property can be used in conjunction with instances of a certain class. You will find this meta property in the definition of the respective Properties, not in the class definition. When viewing a class definition article you will of course see the list of its properties. That list is generated on the fly from this meta property.
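
The 'on the fly' generation of the property list is essentially a reverse lookup. The following sketch shows the idea on plain data; in SMWpc itself this is done by a query inside a template. The dictionary `prop_describes` and the function `properties_of` are hypothetical names, and the sample data is taken from the pets example above.

```python
# Hypothetical meta data: for each property, the classes it 'describes'.
prop_describes = {
    "symptom": ["Disease"],
    "maximum age": ["Species"],
    "likes": ["Species"],
}

def properties_of(class_name, prop_describes):
    """Collect all properties whose '.prop describes' value names the class."""
    return sorted(
        prop for prop, classes in prop_describes.items()
        if class_name in classes
    )

species_props = properties_of("Species", prop_describes)
```

Because the assignment is stored with the property, adding a new property to a class never requires editing the class article.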

In short, the following rules apply to SMWpc classes:

Defining a Class

  • Class definitions are wiki articles in the main namespace which start with 'Class ', followed by the class name.
  • Class names start with a capital, e.g. 'Class Foo'.
  • A class is described by calling a template called '.class definition'. This template expects
    • .. the class name ('Foo')
    • .. an optional color in #rrggbb notation; this color can be used to support a coloring scheme which corresponds to the classes
    • .. an optional icon file name; can be used to show icons instead of the class name where it seems appropriate (e.g. in the headline of class instances)
    • .. an optional base class name
    • .. a short descriptive text
    • .. a pass-through parameter which acts as a filter when displaying a list of instances (selection); you may assign a default value here
    • .. a pass-through parameter which acts as a view definition when displaying a list of instances (projection); you may assign a default value here
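
To summarize the parameters listed above, here is a sketch of the information a class definition carries. The real '.class definition' is a wiki template; the Python field names (apart from `extends`, which is the parameter name mentioned earlier) are assumptions, and the two checks merely restate the rules given in the text.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClassDefinition:
    name: str                       # e.g. 'Foo' for the article 'Class Foo'
    doc: str                        # short descriptive text
    color: Optional[str] = None     # optional color in #rrggbb notation
    icon: Optional[str] = None      # optional icon file name
    extends: Optional[str] = None   # optional base class name
    selection: str = ""             # default filter for the instance list
    projection: str = ""            # default view for the instance list

    def __post_init__(self):
        if not self.name[:1].isupper():
            raise ValueError("class names start with a capital")
        if self.color is not None and not re.fullmatch(
            r"#[0-9a-fA-F]{6}", self.color
        ):
            raise ValueError("color must use #rrggbb notation")

    @property
    def article_title(self) -> str:
        """Title of the class description article in the main namespace."""
        return f"Class {self.name}"

student = ClassDefinition(
    name="Student",
    doc="A person enrolled at a university",
    color="#eeffee",
    extends="Person",
)
```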

Behind the scenes

The class definition template ...

  • sets the corresponding SMWpc meta properties (.class color, .class icon, .class extends, .class doc)
  • generates a navigation menu which offers the complete list of classes as navigation targets; the background color of the menu corresponds to the color of the current class
  • assigns the class article to a wiki category named 'Class'
  • produces a descriptive summary of the class which is returned as output to the user
  • produces a list of instances of the current class; this can be done in a generic way due to the reflection principles used; the filtering criteria (selection and projection) specified by the user are applied here

What you will see

The summary view produced by the '.class definition' template will contain:

  • class selection box
    • list of available classes
    • link to the category which corresponds to the class
    • link to the application model
    • links to create new properties and/or new classes
    • link to the meta model and to SMWpc documentation
    • class icon (if there is one defined)
  • instance list
    • list of selected instances
    • using the columns from the defined view
    • a small form where you can change selection and view
  • class description
    • parent hierarchy
    • all direct subclasses
    • list of all properties (with name, type, description etc.)
    • a list of properties (of other classes) which can serve as references to instances of the current class ('inbound pointers')
  • various links to ..
    • create a new instance (in classic mode and/or forms mode)
    • create an initial version of the category article which corresponds to the current class
    • object data template
    • object form
    • object lister

Example

see Class Student

Possible extensions and enhancements

The following concepts are not implemented so far but could easily be added:

  • A meta property for the plural inflection of the class name. This looks almost unnecessary in English but there are also other languages on earth.
  • A male / female variant of the class name ('actor' / 'actress') could be defined. Ideally SMWpc would be aware of a meta property '.class sex' and could apply the correct variant where appropriate.
  • Classes without a parent could automatically point to a common master class (perhaps called 'Class Class') to create a single-rooted tree of all classes.
  • A class might be declared as virtual. In this case you could have no instances of such a class. If you wanted to use classes to model certain aspects of things ('Class Perishable') such a concept might be useful.
  • Also the contrary is conceivable: A class could be declared to be 'final' which would mean that you could not derive subclasses from it.
  • The list of 'incoming references' is currently calculated without consideration of inheritance.
  • It should be possible to add meta attributes (like author, number of recent views, date of last edit) to the instance list.

Properties

Current functionality of SMWpc

Property definitions are wiki articles which live in the reserved namespace 'Property:'. If they belong to the meta model their name will start with a dot.

SMWpc properties are described as follows:

Property Scope

The scope of a property allows us to make a distinction between:

  • native SMW-internal properties like 'has type' ("smw")
  • properties that relate to Semantic Forms ("sf")
  • properties that relate to Semantic MediaWiki with personal classes ("smwpc")
  • "normal" properties that relate to the user domain ("user")

The scope of a property is defined by a meta property named .prop scope.

Property Definition

A Property is defined by using 'Template:.prop definition' which takes the property name, an optional color, an optional icon file name and a descriptive text. The template assigns meta properties named '.prop color' and '.prop icon' and '.prop doc'. Sometimes it may be useful to use small icons instead of property names. Care should be taken with colors as it is normally more than enough to establish a modest coloring scheme based on classes (and not on properties).

Assignment to a Class

Each Property should be assigned to at least one class. You can have Properties without such an assignment; but these are not recognized by SMWpc. Assigning a property to a class is done by using 'Template:.prop describes' which assigns the specified class name as value to the meta property '.prop describes'. It is possible to assign a property to several different classes. It does not make sense to assign the same property to classes where one class is an extension of the other one.

Assignment of values by templates

As said before, all SMW properties are assigned by the use of templates. Normally this will be one template with the property's name. Use the template '.prop assigned by' to specify the name of the template(s) which is (are) used to assign a value for this property. Using more than one template makes it possible to have alias names for properties. Thus a student could be said to 'attend' a certain university or to 'visit' it or to 'study at' the university.
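
The alias mechanism amounts to a many-to-one mapping from templates to properties. A minimal sketch, with made-up template and property names based on the student example, inverts the '.prop assigned by' information so that each template name leads to the property it sets:

```python
# Hypothetical '.prop assigned by' data: property -> templates that set it.
assigned_by = {
    "attends": ["attends", "visits", "studies at"],
    "plays instrument": ["plays"],
}

# Invert the mapping: which property does a given template assign?
template_to_property = {
    template: prop
    for prop, templates in assigned_by.items()
    for template in templates
}

prop_for_alias = template_to_property["studies at"]
```

Note that this simple inversion assumes each template assigns exactly one property, which is the normal case.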

Mandatory and optional Properties

A Property may be declared to be 'mandatory'. This means that for every class where it is applicable all instances must set a value for this property. The current implementation does not check this. The default is that a property can be used together with a class which it 'describes' but it need not be used. If a property can be used with more than one class it must either always be mandatory or never.

Uniqueness of Properties

A Property may be declared to be 'unique'. This means that there must be at most one value of this property for each instance. The current implementation checks this (although in a rather inefficient way). If a property can be used with more than one class it must either always be unique or never.
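
The two rules can be stated as a small check. This is a sketch only: SMWpc performs its uniqueness check inside templates and, as said above, does not check mandatory properties at all. The function name, the data layout and the sample model are assumptions.

```python
def check_instance(values, model):
    """Return a list of violations of the 'mandatory' and 'unique' rules.

    values: property name -> list of values assigned to one article
    model:  property name -> {'mandatory': bool, 'unique': bool}
    """
    problems = []
    for prop, rules in model.items():
        assigned = values.get(prop, [])
        if rules.get("mandatory") and not assigned:
            problems.append(f"missing mandatory property '{prop}'")
        if rules.get("unique") and len(assigned) > 1:
            problems.append(f"property '{prop}' has more than one value")
    return problems

# Hypothetical model for class 'Species' from the pets example.
model = {
    "maximum age": {"mandatory": True, "unique": True},
    "likes": {"mandatory": False, "unique": False},
}

ok = check_instance({"maximum age": ["15"], "likes": ["carrots", "hay"]}, model)
bad = check_instance({"maximum age": ["15", "20"]}, model)
missing = check_instance({}, model)
```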

Deprecation of SMW Property Inheritance

Using SMW's features for Property inheritance (built-in 'Property:subproperty of') is discouraged. Inheritance between single properties is conceptually a little bit strange. Using SMWpc's class inheritance makes it unnecessary to use this feature.

Reference Properties

A Property can be declared to create a reference to other class instances ('Reference Properties': '.prop references'). The value of such a property is the name of an article which is an instance of the specified class. Assigning a reference property more than once to an article creates a set of references (if the model allows for that -- see 'uniqueness of properties'). It is possible to specify several different classes as possible reference targets of a property. Although this may look a little strange it sometimes may be useful.

It is possible to define a dedicated name for the inverse relation using 'Template:.prop reverse' (which sets the meta property 'Property:.prop reverse'). Such names are quite useful when generating output in query results. Assuming that you have a Property named 'teaches' in your model which points from an instructor to a student, you can create a list which explains that certain students 'are taught by' certain instructors.

Polymorphic References

Sometimes you want to use the same verb to assign different properties. A person might 'play' chess and 'play' the violin. But in your information model you will have two different properties for this (like 'plays instrument' and 'plays game'). Both property definitions will state that they are assigned by 'Template:plays'. The template in such a case has the responsibility to check the class of the referenced object and to set the correct property based on this decision (see example).
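
The decision logic of such a template can be sketched as a simple dispatch on the class of the referenced object. The lookup tables and the function `plays` below are hypothetical stand-ins for what 'Template:plays' would do with wiki means.

```python
# Hypothetical model data: the class of each referenced article, and the
# property that 'Template:plays' should set for each target class.
class_of = {"Chess": "Game", "Violin": "Instrument"}
plays_dispatch = {"Game": "plays game", "Instrument": "plays instrument"}

def plays(subject, target):
    """Choose the property to set based on the class of the target."""
    target_class = class_of.get(target)
    prop = plays_dispatch.get(target_class)
    if prop is None:
        raise ValueError(f"'{target}' is neither a game nor an instrument")
    return (subject, prop, target)

chess_fact = plays("Anna", "Chess")
violin_fact = plays("Anna", "Violin")
```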

Algorithmic Redundancy

A Property can be declared to be algorithmically derived from other properties. The corresponding functionality may either be built into the query engine or the redundant properties may explicitly exist in the database. In the latter case they will typically be calculated by the template which is used to set the basic property from which the other one is derived. For example a predicate like 'is adult' could be derived from the day of birth and the current date. Algorithmic redundancy is expressed by 'Template:.prop derived from' (which sets 'Property:.prop derived from').
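
The 'is adult' example can be written down as follows. This is a sketch of the derivation only; the age threshold of 18 is an assumption, and the current date is passed in explicitly so that the result is reproducible.

```python
from datetime import date

def is_adult(day_of_birth: date, today: date, adult_age: int = 18) -> bool:
    """Derive 'is adult' from the day of birth and the current date."""
    # Subtract one year if this year's birthday has not been reached yet.
    birthday_pending = (today.month, today.day) < (
        day_of_birth.month, day_of_birth.day
    )
    age = today.year - day_of_birth.year - int(birthday_pending)
    return age >= adult_age

day_before = is_adult(date(1990, 5, 17), today=date(2008, 5, 16))
on_birthday = is_adult(date(1990, 5, 17), today=date(2008, 5, 17))
```

Since the result depends on the current date, a value stored explicitly in the database would have to be refreshed from time to time; deriving it inside the query engine avoids this.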

Possible extensions and enhancements

  • A check could make sure that a property is not said to 'describe' two classes where one class is an extension of the other one.
  • For properties which are used as a reference to other class instances the target class conformance should be checked. The inheritance tree must be considered. If the target class were, e.g., 'Person', a link might point to an instance of class 'Student' (assuming that Student is a subclass of Person).
  • Checking for mandatory properties should be implemented.
  • Checking for uniqueness of property assignments should be implemented in a more efficient way.
  • In some cases it might be useful to define the target class for reference properties in a generic way (like 'the same class as the source'). This may or may not include classes which are derived from the current class. What would that be good for? For instance you could define properties named 'comparable' or 'competes with' in that way. Now imagine that you write something like: Johannes Brahms is a _classical composer_. He was often _compared_ to _Anton Bruckner_. The inference engine should now suppose that 'Anton Bruckner' (although there may be no information on this object elsewhere in the wiki) is a 'classical composer'.
  • One could describe correlations between properties. Let us assume we have a property called 'Profession' and another property 'plays instrument'. We might want to have a way to express that 'conductors' often 'play an instrument' as well. In essence the SMWpc concept of classes is the description of property clusters (or correlations) but it currently only states that a property may be applicable to instances of a class or not. It does not say anything about the correlation of values. Currently it does not even allow one to state that a certain optional property of a class becomes mandatory if another (optional) property has been set or if another property has some specific value.
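
The target class conformance check proposed in the list above could work along the following lines. This is a sketch of a possible enhancement, not of existing SMWpc functionality; all names and the sample data are assumptions. Since SMWpc uses single inheritance, it is sufficient to walk up the chain of '.class extends' values.

```python
def is_a(cls, target, extends):
    """Walk up the single-inheritance chain from cls looking for target."""
    seen = set()                      # guards against accidental cycles
    while cls is not None and cls not in seen:
        if cls == target:
            return True
        seen.add(cls)
        cls = extends.get(cls)
    return False

def check_reference(article, allowed_classes, class_of, extends):
    """True if the article is an instance of an allowed class or a subclass."""
    cls = class_of.get(article)
    return any(is_a(cls, allowed, extends) for allowed in allowed_classes)

# Hypothetical model: Student extends Person.
extends = {"Student": "Person", "Person": None, "Food": None}
class_of = {"Alice": "Student", "Carrot": "Food"}

alice_ok = check_reference("Alice", ["Person"], class_of, extends)
carrot_ok = check_reference("Carrot", ["Person"], class_of, extends)
```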

Objects

Declaring an article to be an instance of an SMWpc class

SMWpc offers a template which declares a MediaWiki article to be an instance of an SMWpc class.

Manipulation of Properties

SMWpc offers a small API which is used to get and set property values. The 'set' methods do some basic checking of property assignments against the class model.

Inbound references

There is a template which returns a list of references to the current object. This is not the same as the classical "what links here" as we use the semantics of the class model instead of plain MW hyperlinks to calculate that list.
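
The difference to "what links here" can be shown on a few sample statements. In this sketch (all data and names are made up) only statements made with a reference property count as inbound references; a plain value which happens to match the page name does not.

```python
# Hypothetical annotations as (subject, property, value) triples.
triples = [
    ("Mr. Smith", "teaches", "Alice"),
    ("Alice", "attends", "Example University"),
    ("Bob", "attends", "Example University"),
    ("Carol", "favourite phrase", "Example University"),  # plain value
]

# Properties which the class model declares to be reference properties.
reference_props = {"teaches", "attends"}

def inbound_references(target, triples, reference_props):
    """List (subject, property) pairs that refer to the target object."""
    return [
        (subject, prop)
        for subject, prop, value in triples
        if value == target and prop in reference_props
    ]

inbound = inbound_references("Example University", triples, reference_props)
```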

Types

SMWpc uses the SMW Property Types as they are. The only thing we do is add an SMWpc property called '.type built-in' which is set to 'true' if the Type belongs to the basic types of SMW.

Categories

Current functionality of SMWpc

Defining a category

SMWpc offers a one-click way to create a category for each class and it assigns each instance of the class automatically to that category. The name of the category must be the same as the class. So for 'Class Foo' you will have a 'Category:Foo'. SMWpc provides a template called '.cat definition' which is used in the category article. This template simply expects the name of the category; this is needed for technical reasons although it looks redundant; you must not use {{PAGENAME}} here.

Behind the scenes

The template displays the class description and assigns the category to a common super category called 'Class'. If the corresponding class has a parent class the category will be assigned to a corresponding super category, too. Thus the category tree will stay in sync with the inheritance structure.
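
The rule for keeping the category tree in sync can be sketched in a few lines. The function name and the sample data are assumptions; in SMWpc the assignment is made by 'Template:.cat definition'.

```python
def super_categories(class_name, extends):
    """Return the super categories for the category of the given class."""
    categories = ["Class"]            # common super category of all classes
    parent = extends.get(class_name)
    if parent:
        categories.append(parent)     # mirrors the '.class extends' relation
    return categories

# Hypothetical model: Student extends Person.
extends = {"Student": "Person", "Person": None}

student_cats = super_categories("Student", extends)
person_cats = super_categories("Person", extends)
```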

What you will see

The category page will contain:

  • a statement saying that articles in this category are 'instances of' a certain SMWpc class.
  • the class description text

And, of course, you will see the standard MediaWiki list of articles pertaining to the category.

Possible extensions and enhancements

The current design of SMWpc leaves categories mainly as they are. One could replicate certain parts of the Class article in the category article - assuming that novice users will primarily use the concept of categories. Only a simple change in 'Template:.cat definition' would be needed.

Meta Model

Current functionality of SMWpc

The meta model shows how existing concepts of MediaWiki and SMW are extended by SMWpc. Black color is used for MW, blue stands for SMW, red/brown color shows the additions introduced by SMWpc.

{{#wgraph:name=class meta model|thumb=80|svg|

nodetype legend { bordercolor white color white level 0 font helvR10 }
nodetype SMWpc { color #ffeeee textcolor darkred bordercolor #ff6666}
nodetype SMW   { color #eeeeff textcolor blue    bordercolor #6666ff}
nodetype SF    { color #ffccff textcolor magenta bordercolor #ff66ff}
nodetype wiki  { color #eeeeee textcolor black   bordercolor #666666}
nodetype user  { color #eeffee textcolor black   bordercolor #66ff66 }
nodetype *     { font helvO12  align left }
edgetype SMWpc { textcolor darkred   color red        }
edgetype SMW   { textcolor blue      color blue       }
edgetype user  { textcolor darkred   color darkgreen  }
edgetype wiki  { textcolor black     color black      }
edgetype *     { font helvO10 textwidth 25}
node Property {label 'Property
* display units
* imported from
* provides service
\f09* .prop scope
* .prop reverse
* .prop doc
* .prop mandatory
* .prop unique
* .prop icon
* .prop color' href 'Special:Properties' textwidth 30 type SMW} node Form {label 'Form' href 'Special:Forms' type SF} node Field {label 'Field' type SF} node Type {label 'Type
* allows value
* corresponds to
\f09* .type built-in' href 'Special:Types' textwidth 30 type SMW} node Template {horizontal_order 10 type wiki} node ObjForm {label 'Object Template\n(~data template of SF)' horizontal_order 15 type SMWpc} node Category {horizontal_order 20 level 2 type wiki } node Class {label 'Class
* .class color
* .class icon
+ .class doc' horizontal_order 5 type SMWpc} node Article {label 'Article
\f01* equivalent URI' color lightgreen textwidth 30 type user} edge Class Category {label 'corresponds to' linestyle dotted type SMWpc}
backedge Property Template {label '.prop assigned by
(1..n)' type SMWpc } backedge Property Class {label '.prop refers to
(0..1)' type SMWpc } backedge Property Class {label '.prop describes
(1..n)' type SMWpc } backedge Property Property {label '.prop derived from
(0..1)' type SMWpc } backedge Property Property {label 'subproperty of' type SMW } edge Category Form {label 'has default form' type SF } edge ObjForm Form {label 'works together with' type SMWpc } nearedge Field Form {label 'is part of' type SF } nearedge Field Property {label 'represents' type SF } backedge Class Class {label '.class extends
(0..1)' type SMWpc } edge Property Type {label 'has type' type SMW } edge Article Template {label 'uses templates to assign\nproperties individually' linestyle dotted type user } edge Article ObjForm {label 'uses template to\ndescribe the whole object' linestyle dotted type user } nearedge ObjForm Template {label 'uses template to assign properties' type SMWpc } nearedge ObjForm Category {label 'assigns article to\na category' type SMWpc } edge Article Class {label '.obj is a
(0..1)' type user } edge Article Category {label 'automatically becomes member\nof a category which corresponds\nto the Class' linestyle dotted type user } edge Category Category {label 'is part of' type wiki } edge Article Property {label 'is annotated with properties according to the templates' type SMW }
splines yes

}} This is the common meta model for Semantic MediaWiki (SMW), Semantic Forms (SF) and Semantic MediaWiki with Property Clusters (SMWpc). Blue attributes and relations are part of SMW; red/brown attributes and relations belong to SMWpc. Green stands for the user's document and magenta is the color for SF.

Possible extensions and enhancements

Multiple inheritance could be allowed in principle, but it would make things considerably more complicated, so this concept was intentionally left out.


Application Model

Current functionality of SMWpc

As applications grow over time it is important to always have a consistent view of the current state. SMWpc uses a graph generator to automatically produce a diagram of the Classes and their Properties. In addition it produces a table view.

In analogy to UML we use the following layout conventions:

  • Classes are represented by rectangular boxes.
    • The class name is in black.
    • Normal properties are listed inside the box and have a '+' as prefix.
  • Properties which constitute a reference to other Classes are shown as lines pointing to them.
    • The template name used to assign the underlying Property is in black.
    • The Property name itself is in brown.
    • The reverse name of the Property is in green.
  • Inheritance is shown as dotted blue lines.

A sample model might look like this:

{{#wgraph:name=class model|thumb=60|svg|

layout_algorithm = dfs
nodetype legend { bordercolor white color white level 0 font helvR10 }
nodetype * { font helvO12 color lightyellow bordercolor darkyellow align left }
edgetype * { font helvO10 textwidth 25 }

{{#replace:

 node 'Freshman' { label "Freshman\f13" href 'Class Freshman' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Freshman.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Game' { label "Game\f13" href 'Class Game' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Game.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Location' { label "Location\f13" href 'Class Location' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Location.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Musical Instrument' { label "Musical Instrument\f13" href 'Class Musical Instrument' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Musical Instrument.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Person' { label "Person\f13" href 'Class Person' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Person.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Student' { label "Student\f13" href 'Class Student' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Student.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'Subject' { label "Subject\f13" href 'Class Subject' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class Subject.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'test' { label "test\f13" href 'Class test' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class test.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'vehicle' { label "vehicle\f13" href 'Class vehicle' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class vehicle.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }
 node 'car' { label "car\f13" href 'Class car' color #{{#vardefine:x|{{#replace:{{#ask:format=list|Class car.class color::*}}|/^[^\]]*\]\] */|}}}}{{#var:x}} }

{{#ask:format=template|link=none|template=.model extends| .class extends::+ .class extends::*}}

{{#ask:format=template|template=.model prop| .prop describes::+ .prop describes::* .prop refers to::* .prop reverse::*.prop assigned by::*}} |/(..SMW::o[fn]f?..)/|}}

node legend { type legend label "class model for\nfollowthescore.org/dpldemo" }

}} {{#ask:mainlabel=Property|sort=.prop describes|.prop describes::+.prop describes::*.prop refers to::*.prop reverse::*}}

Possible improvements:

  • mandatory properties should have a '+', optional properties should have a '?' as a symbolic hint
  • consequently, multiple properties should have twin symbols ('++' or '??')

Paradigms for writing articles

Standard MediaWiki uses 'free style' text articles plus untyped links and flexible categories to represent knowledge. SMW adds the concept of typed Properties. SMWpc adds (on top of both) the concept of 'Classes' including typed relations and inheritance. This reduces the freedom of authors, but it also has a benefit: it can be used to assist them in creating 'valid' articles, i.e. articles ('objects') which use the right set of Properties with regard to their Class.

Inline annotations

Normally we write a wiki text and wherever we feel it appropriate we add annotations. We use some 'inline syntax' which adds a certain overhead to the normal flow of text. But still we have full control where we add annotations and how many of them we regard as useful. We call this way of information representation inline annotation.

If an author wants to add a set of standardized descriptive attributes to his text he will create a template and use the attribute values as parameters. The template will insert these values into his text, typically as a nice little table. And (optionally) it may also set Properties which correspond to the template parameters.

Object Data Templates

The concept of Classes makes it possible to go one step further in that direction -- using object-specific templates which cover the whole object. We call these templates 'object data templates (ODT)'. The idea is that all information for an object is technically given in the form of a call to the ODT. So your article will just call the ODT and you put all contents into the parameter list. This may sound very strange at first glance but practice shows that there are situations where this way of gathering information has advantages over the traditional wiki way.

This is especially true for 'small' objects with a high number of formally defined Properties. In such cases the user can be guided by the ODT, i.e. the empty template shows him what kind of information is expected. SMWpc offers ODTs as an additional, alternative way to collect information. One should keep in mind that we are not talking about GUI forms. Instead we are talking about a pre-defined document structure which is aligned with the Class definition. The idea of ODTs is much closer to XML trees (DOM) than to screen forms.

Using an ODT does not mean that every parameter value must become a property value of the object (although you could design an ODT like that). While some of your parameters may be quite simple (a word, a number, a sentence, a link) others may consist of several text paragraphs including headlines on various levels and images. So some parameters may have a corresponding Class Property and others are simply a means of structuring your text.

The concept of ODTs only makes sense if there is an appropriate structure which will be accepted by the authors of a wiki because it is considered to be helpful for a certain knowledge domain. A typical wiki may have 70% articles in traditional form and 30% of the articles using ODTs.

The ODT has to take care of a pleasant visual presentation of your information. So, very similar to XML, we gain back some separation between content and layout. But this does not mean, on the other hand, that it is forbidden to add some layout hints to the ODT parameter values. It is a question of style, design and habit how much layout shall be allowed. The best way to think of it is that ODTs hide the 'master layout' from the editor's eyes while he is still able to apply incremental, marginal layout to his work.

A simple example for an ODT can be found in the demo articles which come together with SMWpc (open the text in edit mode!).

Comparison between Inline Annotations and Object Data Templates (ODTs)

First it is important to understand that SMW and SMWpc are an optional offer to the users of a MediaWiki. You can have 'normal' articles, articles with native SMW annotations and SMWpc class instances in one wiki peacefully side by side. You can assign Properties directly with the '::' syntax or you can use small templates for that purpose (as is suggested by SMWpc). If you want, you can put your whole article text into one template call using the idea of 'object data templates'.

So there is more choice - more possibilities to apply elegant ways of knowledge representation -- and more possibilities to create a real mess...

It would be nice to have some criteria when to use which representation. Especially because this makes a (minor) difference to the authors and, of course, because it makes a major difference for the designer of the wiki.

The following table gives a summary:

{| border="1" cellpadding="4"
! Aspect !! Standard Wiki !! SMWpc with Object Data Templates
|-
| Paradigm 1
| a collection of stories
| a collection of fact sheets
|-
| Paradigm 2
| things are somehow connected to one another by 'free association'
| objects have distinct typed references between each other
|-
| When to use
| Broad range of topics, weakly structured text, no common scheme applicable
| High degree of structural similarity between certain instances of your knowledge domain. Commonly agreed 'reasonable' scheme on how to present information
|-
| Output / appearance
| heterogeneous, totally left to the user (apart from the sporadic use of templates which produce some standardized pieces of text)
| homogeneous, standardized scheme how information is presented; there may be areas where "stories" are embedded, but they have their fixed place in the overall schema design.
|-
| Navigation
| The author puts hyperlinks where he feels it makes sense. The kind of relationship which goes along with a link can only be derived from careful reading of the text portion around the link.
| The system expects references at some pre-defined positions and assigns a semantic meaning to them. The reader will always find such references in the same place and can traverse them backwards specifically. Even reports are possible.
|-
| Burden for the average article writer
|
* low because
** .. he can start with an empty page
* high because
** .. he must master the whole topic mentally and he must find a logical way to present the contents
** .. he must think about navigation within his text and recognize which links to other articles are desirable
|
* low because
** .. he is confronted with a pre-designed structure which (hopefully) covers all relevant aspects
* high because
** .. he must understand that structure and accept it even if he had deliberately taken a much simpler approach to note his 'statements'.
|-
| Burden for the wiki designer
|
EX POST approach:
* low effort because
** .. he can wait for things to happen
* high effort because
** .. he must invent proper categories and assign them to existing articles
** .. he must invent and apply templates after having recognized similarities in certain articles
** .. systematic changes must be done manually
|
EX ANTE approach:
* low effort because
** .. once the schema is there the quality of contents and navigation will normally be satisfying
** .. systematic changes can be applied by scripts or template changes
* high effort because
** .. he must understand the knowledge domain before the majority of articles are written
** .. he must care for appealing optical presentation, suitable navigation and reports, based on a sufficiently stable meta model
|-
| Import / Export
| The contents can technically be exported as XML but the content is opaque, i.e. it is nothing more than a sequence of characters in the XML scheme.
| The text can be exported as semantically structured XML or as CSV with named columns.
|}

Screen Forms for data entry

Readers who know Semantic Forms may notice a conceptual parallel. The Semantic Forms extension (SF) has a concept which is very close to our object data template. SF allows the designer to define a screen form whose fields correspond to Properties of the article. This screen form works together with a 'data template' which sets the property values and takes care of a pretty representation of the article. The data template is also able to 'show incoming relations' based on one or more properties which the designer can freely choose.

With a few minor changes we can also use a data template generated by SF as an object data template in SMWpc.

  • The main point is that we must set the meta property '.obj is a' within the template.
  • Then we must set the SF meta property 'has default form' in the category corresponding to our class so that it points to the ODT.
  • And third we should add a link to SF 'Special:AddPage' so that the user can use the form driven data entry dialogue.

See our little Flute example:

  • The 'group' Property is displayed in the fact box at the right side
  • The flute players are shown there as well (based on the Property 'plays instrument')
  • The 'edit with form' tab is available when browsing Flute.
  • Template:Musical Instrument was generated by SF and only slightly modified (adding .obj is a)
  • Category:Musical Instrument contains the 'has default form' reference to 'Form:Musical Instrument'.
  • Class Musical Instrument offers a link to create a new instrument using Semantic Forms.
  • Note that during form based editing of a musical instrument article the 'invented in' Property is based on an auto completion list of 'Locations'.

Maybe a future release of the SF form generator could read the SMWpc meta model and generate a proposal for the form and the corresponding data template. Even attributes like 'mandatory' or 'multiple' could be taken from the SMWpc model. For the user (= designer of the semantic model) such an integration could make a lot of sense.

Whether one likes screen forms or not is at least partially a matter of taste. While SMWpc can be used to cooperate with SF it also offers support for the more traditional way of editing wiki text. There are several templates which make it easy to create an ODT for a given SMWpc class. Editing is supported by an 'edit intro page' which explains the parameters of the template. Try to create a new Student in our example to see how it works.

Conceptual issues of Editing

Where should a piece of information be given?

Normally all statements in an article affect the 'current object'. But sometimes you want to say something about a closely related, different object. Although this is not very 'canonical' it is how many people think and write. So we want a syntax for changing the 'current object', maybe with a stack engine where you say: We have now been talking a lot about X. Hold on for a moment, I must just tell you something about Y [push Y]... Now let's return to our main subject [pop]. (Context stack).

Let us assume that we have a model where a 'Person' can 'teach' other 'Person's. A canonical way to enter property values for this relation would be to write statements like

Peter _teaches_ _Lucie_ in chess.

This statement would have to appear in a document named 'Peter'.

If we defined 'taught by' as the reverse relationship we could also write

_Lucie_ was _taught by_ him for two years.

This statement (still within the article named 'Peter') would create the same kind of relationship.

It would be helpful if we could state the same fact also in a document about Lucie.

She was _taught by_ _Peter_ playing the violin when she was 19.

We might want to add that _Peter_ was a renowned Hungarian composer. As long as there is no page on 'Peter' this might be better than nothing. Once somebody is going to create a page about _Peter_ he should see all existing fact statements about _Peter_ (which may be scattered over existing articles).

Sometimes it is not obvious where to put a piece of information. Offering alternative ways would be a real win. Of course SMW would have to recognize such variants and detect potential contradictions and help users to avoid redundancies. But maybe users would even like to keep some redundancies as long as a tool helps them to deal with them efficiently. A query output could then point to several places where a piece of information comes from.

Note: The current implementation of SMWpc does not offer the above features. One reason for this is that SMW does not support context switches at the moment.
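
The idea of stating one fact from either page can be illustrated with a small Python sketch. It is not part of SMW or SMWpc; the names `REVERSE`, `normalize` and `collect` are assumptions made only for this illustration. Statements given with a reverse property are turned into their forward form, and for each fact we remember on which pages it was stated, so that a query output could point to all of its sources.

```python
from collections import defaultdict

# Hypothetical mapping from a reverse property name to its forward name.
REVERSE = {"taught by": "teaches"}

def normalize(subject, prop, obj):
    """Return the fact in its canonical (forward) direction."""
    if prop in REVERSE:
        return (obj, REVERSE[prop], subject)
    return (subject, prop, obj)

def collect(statements):
    """Map each canonical fact to the set of pages that state it.

    `statements` is an iterable of (page, subject, property, object).
    """
    sources = defaultdict(set)
    for page, subject, prop, obj in statements:
        sources[normalize(subject, prop, obj)].add(page)
    return dict(sources)

facts = collect([
    ("Peter", "Peter", "teaches", "Lucie"),
    ("Lucie", "Lucie", "taught by", "Peter"),
])
```

Both statements collapse into the single fact `("Peter", "teaches", "Lucie")`, recorded with the two pages it came from. Detecting contradictions would need additional rules which are not shown here.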


Database driven editing suggestions

When using screen forms as a means of editing we may be able to show sets of plausible values for some properties. There would have to be an empirical knowledge base behind such a mechanism, probably based on histograms and correlations of property values. Currently Semantic Forms can use page lists based on categories for auto completion. A more intelligent procedure might be able to sort values by popularity (hit list) or offer values for property 2 based on the currently selected value for property 1. For example once you have said that a person's profession is 'conductor' the 'plays instrument' property would provide a list of instruments which are typical for conductors (like piano).

In a similar way the editing system may be able to detect combinations of property values which look 'exotic' (i.e. which have been never or rarely used before).

If a wiki uses properties but does not use SMWpc classes so far such an analysis might even be able to show "clusters of properties" (hence the name of SMWpc) which typically occur together and which therefore should be considered to be candidates for class definitions.
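
A minimal Python sketch of such a suggestion mechanism is given below. It assumes that the property values of all objects are available as plain dictionaries; the function `suggest` and the sample data are hypothetical and only illustrate the 'hit list' and the 'value for property 2 depending on property 1' ideas.

```python
from collections import Counter

def suggest(objects, target, given=None, limit=10):
    """Rank values of `target` by frequency.

    `objects` is a list of dicts mapping property names to lists of values.
    If `given` is a (property, value) pair, only objects carrying that
    pair are counted.
    """
    counts = Counter()
    for props in objects:
        if given is not None:
            prop, value = given
            if value not in props.get(prop, []):
                continue
        counts.update(props.get(target, []))
    return [value for value, _ in counts.most_common(limit)]

# Assumed sample data, not taken from the demo wiki.
people = [
    {"profession": ["conductor"], "plays instrument": ["piano"]},
    {"profession": ["conductor"], "plays instrument": ["piano", "violin"]},
    {"profession": ["teacher"], "plays instrument": ["flute"]},
]
```

Calling `suggest(people, "plays instrument", given=("profession", "conductor"))` ranks only the instruments played by conductors. A real implementation would have to read these counts from the SMW database instead of an in-memory list.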

Flexibility of notation

The current approach of SMW uses the :: syntax to assign a property to the 'current object'. This frequently leads to a repetition of the verb in a sentence:

 Susan studies [[studies::Philosophy]]. 

An alternate approach would treat the verb as a separate SMW token. Then you could write

 Susan [[studies::]] [[::Philosophy]]. 

This would allow text to be inserted between the verb and the object:

 Susan [[studies::]] several subjects, but mainly she focuses on [[::Philosophy]]. 

One could use an alternate syntax for this as well:

 Susan [[studies::→]] several subjects, but mainly she focuses on [[←::Philosophy]]. 

You could also add more objects easily:

 Susan [[studies::]] several subjects, but mainly she focuses on [[::Philosophy]] and [[::Sociology]]. 

In theory such an approach would also allow 'left-associative' properties, i.e. the author first builds a stack of values and then tells us to which property they belong:

 [[→::Snooker]] and [[→::badminton]] are Peter's greatest [[hobbies::←]].
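
How a parser could treat the verb as a separate token can be sketched in Python. This is only an illustration of the proposal, not existing SMW behaviour; `TOKEN` and `parse` are names invented here. The sketch handles the plain forms `[[prop::]]`, `[[::value]]` and the classical `[[prop::value]]`. The arrow variants and the left-associative form are not covered.

```python
import re

# Matches [[prop::]], [[::value]] and [[prop::value]].
TOKEN = re.compile(r"\[\[([^\[\]:]*)::([^\[\]:]*)\]\]")

def parse(text):
    """Return (property, value) pairs using the proposed split syntax."""
    active = None
    pairs = []
    for prop, value in TOKEN.findall(text):
        if prop:
            active = prop  # remember the most recently named property
        if value:
            if active is None:
                raise ValueError("value %r given before any property" % value)
            pairs.append((active, value))
    return pairs

pairs = parse("Susan [[studies::]] several subjects, but mainly she "
              "focuses on [[::Philosophy]] and [[::Sociology]].")
```

Here `pairs` holds two assignments to 'studies', one for each object token.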

A rudimentary API for editing SMWpc instances

As mentioned before SMWpc supports three ways of editing an article:

  1. classical inline editing
  2. use of an object data template (ODT)
  3. use of an object data template together with a screen form.

All three variants offer a 'semantic what links here' feature, i.e. they allow you to identify incoming references to the current object based on the SMWpc class model.

Inline editing

First a page calls '.obj is a' and thereby states that the current article describes an instance of a SMWpc class. The class name is given as an argument to the template call. Afterwards the 'value setting templates' can be called. These templates mainly call '.obj set' to set the property. They may perform additional tasks like handling of polymorphism.

For details see the example:

1  {{.obj is a|Peter|Student}}.
2  Peter was born in {{born in|1989|Berlin}}. He studies {{studies|Music|Philosophy}}.
3  He {{-|plays}} {{+|bassoon}}, {{+|piano}} and {{+|chess}}; some years ago he also used to play the {{+|trombone}}.
4  Peter was born {{born in|1990|Paris}}.
5  Peter and {{has team mate|Lucie}} play together in a football team.
6  {{Coordinates|33°10'N; 1°00'E}}
7  And here is the full story about Peter..

Note the following:

  1. The first line declares 'Peter' to be an instance of class 'Student'. The user will see a green box (green is the color for 'Students') and an icon for Students. The text says 'Peter is a Student' and contains a link to 'Class Student' and a link to a page which shows all relations pointing to 'Peter'.
  2. The 'born in' template takes year and place of birth; it assigns two properties and two derived properties ('age' and 'is adult')
  3. 'Studies' demonstrates a group assignment. The parameters are output with an 'and' inserted.
  4. The third line separates verb and objects. It sets an 'active property' (called 'plays'); afterwards four values are assigned to that property
  5. 'plays' is polymorphic; it assigns 'plays instrument' or 'plays game' depending on the class of the link target. If the target of 'plays' is a musical instrument the user will see a small pictogram which is associated with the property 'plays instrument'.
  6. Line 4 obviously contradicts line 2. This is detected as the property 'born in' is marked as unique in the SMWpc class model.
  7. Line 5 establishes a link to another student. When viewing that link from Lucie's page the reverse name 'is team mate of' will be shown.
  8. Line 6 tries to assign a property which is not valid for a 'Student'. The author will see an error message.
  9. Line 7 is plain text.
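
The two checks mentioned in the notes (uniqueness and validity with regard to the class) can be sketched in Python as follows. SMWpc itself implements them with templates; the dictionary `STUDENT`, the function `check` and the message texts are assumptions for this illustration, and each template call is simplified to a single (line, property, value) triple.

```python
# Hypothetical excerpt of a class model: property name -> is it unique?
STUDENT = {"born in": True, "studies": False, "plays": False,
           "has team mate": False}

def check(assignments, model):
    """Return error messages for a list of (line, property, value)."""
    errors = []
    seen = {}
    for line, prop, value in assignments:
        if prop not in model:
            errors.append("line %d: '%s' is not valid for this class"
                          % (line, prop))
        elif model[prop] and prop in seen and seen[prop][1] != value:
            errors.append("line %d: '%s' contradicts line %d"
                          % (line, prop, seen[prop][0]))
        else:
            # Remember the first line on which the property was set.
            seen.setdefault(prop, (line, value))
    return errors

errors = check([
    (2, "born in", "1989"),
    (2, "studies", "Music"),
    (4, "born in", "1990"),
    (6, "Coordinates", "33°10'N; 1°00'E"),
], STUDENT)
```

For the sample article this reports the contradiction in line 4 and the invalid property in line 6.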

ODT based editing

This method of editing uses one single template (the object data template) to assign all values and describe additional details of an object. When creating a new article the user will get an 'intro page' which explains all properties of the class. It could also describe additional parameters which do not have corresponding properties. A default version of that intro page can be generated automatically (see the links near the bottom of the respective Class page). It contains a description of all properties of a class, including inherited properties.

The empty page for the new article is preloaded with a text which invokes the ODT. This text shows all available parameters. It may also set default values. A standard version of the preload text could be generated from the application model (but currently is not).

The ODT itself uses standard templates to create a uniform appearance of the output. A generic version of the ODT could be generated from the application model (currently we only generate a frame with dummy property names).

For details see the example document and the sample ODT (use edit mode!). Try to create a new Student.

Lisa:

{{Student|
        ID = Lisa|
   born in = 1990|
   born at = Munich|
   studies = Philosophy|
}}

Template:Student

{|class=formtable width=100% border=0 cellspacing=0 cellpadding=0
  {{.form header |obj={{{ID|{{PAGENAME}}}}}|class=Student|width=120}}
  {{.form field  |born at|value={{{born at|}}}}}
  {{.form field  |born in|value={{{born in|}}}}}
  {{.form field  |studies|value={{{studies|}}}}}
  {{.form field  |studies in|value={{{studies in|}}}}}
  {{.form field  |has team mate|value={{{has team mate|}}}}}
|}
<noinclude>[[:Category:Student]]</noinclude>
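
As noted above, the preload text is currently not generated from the application model. A minimal Python sketch of how this could be done is shown here. The function `preload_text` is hypothetical, and the list of property names is assumed to come from the class model (including inherited properties).

```python
def preload_text(class_name, properties, defaults=None):
    """Build the ODT call that preloads a new article of `class_name`.

    `properties` must be a non-empty list of parameter names.
    """
    defaults = defaults or {}
    width = max(len(name) for name in properties)
    lines = ["{{%s|" % class_name]
    for name in properties:
        # Right-align the names so that the '=' signs line up.
        lines.append("   %s = %s|" % (name.rjust(width), defaults.get(name, "")))
    lines.append("}}")
    return "\n".join(lines)

text = preload_text("Student", ["ID", "born in", "born at", "studies"])
```

The result has the same shape as the 'Lisa' example, with empty values which the author fills in.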

Forms based editing

As described above you can also use Semantic Forms for editing. Generate a form for your class (using SF) and set the '.obj is a' property in the corresponding data template. Add the 'has default form' property to the category.

For details see the example document (try the 'edit with form' button) and the sample form (use edit mode!) and the sample data template.

Reporting and Exports

There are two simple but useful features for reporting:

  1. SMWpc can generate a link to the MediaWiki export facility which contains all objects of a class.
  2. SMWpc can produce a single document which contains all instances of a certain class.

It would be useful to have some statistics about the use of properties. As SMW currently lacks an interface for this it is not part of SMWpc.

Various suggestions to improve SMW

During the design of SMWpc a number of ideas came up for small useful improvements of SMW. Although the current proof of concept for SMWpc lives well without these features some of them would allow a more elegant and more efficient implementation of SMWpc. And, apart from that, they might be useful in other contexts, too.

Value list for a property

We would like to create a set (unique sorted list) of used values for a property. We would also like to (optionally) see the number of occurrences (frequencies). This would, for example, make it possible to construct a combobox in a user dialogue which holds the ten most frequent values, or it could be used to assist in auto completion editing for non-page-type properties.
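
What such a value list could return is sketched below in Python. The function `value_list` is hypothetical and works on a plain list of values; in SMW the values would have to come from the property store.

```python
from collections import Counter

def value_list(values, top=None):
    """Return (value, frequency) pairs.

    Without `top` the result is the sorted set of distinct values;
    with `top` it holds the `top` most frequent values.
    """
    counts = Counter(values)
    if top is None:
        return sorted(counts.items())
    return counts.most_common(top)

# Assumed sample values of a property such as 'born at'.
used = ["Berlin", "Paris", "Berlin", "Munich", "Berlin", "Paris"]
```

`value_list(used)` gives the sorted set with frequencies, while `value_list(used, top=10)` would be the input for the combobox mentioned above.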

Allow navigation on meta model with #ask

SMWpc should be built in a way which allows complete access to internal properties via the standard query mechanisms.

Use naming convention for meta properties

As said before some convention should be used to separate namespaces of SMW properties and user properties.

Syntax support for setting multiple properties

The current syntax allows the same value to be assigned to several properties in one statement. This is a rather exotic situation. It would be much more important to be able to assign several values to one property in one statement (i.e. assignment of a value set).

Support for separation of verb and object

SMW should store the name of the property which was most recently set. A further assignment of a value could then refer to the 'last' (latest) property. SMWpc uses '+' and '-' for that purpose. A better way might look like this: Here we set some [[property::]] to [[::value 1]] and [[::value 2]].

Support for context switches

Something like [[Another Object:::]] could be used to make 'Another Object' the current object. All property assignments occurring after that statement would describe that object. Something like [[PAGENAME:::]] could be used to return to the object which is defined by the current article.
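
The effect of such context switches can be sketched in Python. The `:::` syntax is only a suggestion and is not supported by SMW; the names `TOKEN` and `annotate` are invented for this illustration, and only the classical `[[prop::value]]` assignment is recognized besides the switch.

```python
import re

# Either a context switch [[Name:::]] or an assignment [[prop::value]].
TOKEN = re.compile(r"\[\[([^\[\]:]+):::\]\]|\[\[([^\[\]:]+)::([^\[\]:]+)\]\]")

def annotate(page, text):
    """Return (subject, property, value) triples for `text` on `page`."""
    current = page
    triples = []
    for switch, prop, value in TOKEN.findall(text):
        if switch:
            # [[PAGENAME:::]] returns to the object of the article itself.
            current = page if switch == "PAGENAME" else switch
        else:
            triples.append((current, prop, value))
    return triples

triples = annotate(
    "Lucie",
    "She [[plays::violin]]. [[Peter:::]] He was a [[is::composer]]. "
    "[[PAGENAME:::]] She [[studies::Music]].",
)
```

The assignment between the two switches is attributed to 'Peter', all others to 'Lucie'.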

Plain text output in #ask

Sometimes it is very useful to get a property value as a simple text string even if it is of type 'page'. There should be an #ask option for this.

Access MediaWiki meta properties in #ask

Meta data about author, article view count, last edit date etc. should be accessible via #ask.

Repeat output line for value sets in #ask

Currently #ask prints a row for each object. Multiple values of a property cling together within one field. There should be an option which produces a separate line for each value in a value set. This is mainly useful in combination with user templates that are called by the #ask processor.
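
The requested behaviour corresponds to a simple expansion of the result rows, sketched here in Python. `expand_rows` is a hypothetical helper and not an #ask option; the sample data is assumed.

```python
def expand_rows(rows):
    """Yield one (object, value) row per value instead of one per object."""
    for obj, values in rows:
        for value in values:
            yield (obj, value)

# One row per object, as #ask currently delivers it.
rows = [("Peter", ["bassoon", "piano"]), ("Lucie", ["violin"])]
expanded = list(expand_rows(rows))
```

A user template called by #ask would then be invoked once for each of the expanded rows.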

SMW meta model

'Has type' should be made unique. Assigning multiple types to one property looks a bit strange. If a concept of complex types or variable types is needed it should be implemented via a combination of types and given a separate name which then can be used as 'the' type of a property.

Output format of #ask main title column

Currently the 'maintitle' column is always implicitly output as the first column. It should be possible to change this. Especially if one sorts by a certain column one might want to see that column as the first one. The main title column should appear at the position where it is within the #ask statement. And there should be an easy way to leave it out completely.

Alias names

SMW should support alias names for properties (male/female forms, singular/plural, past tense/ present tense). Maybe it could even catch an idea of the difference between these variants and deduce some additional (internal) property values from them. While such a concept could be implemented with templates it would preferably become an integral part of SMW.

Polymorphic Properties

SMW should handle properties which point to pages of different classes. For example, a property like 'is' could have the following meanings:

_Peter_ _is_ _conductor_.  (profession)
_Peter_ _is_ _German_. (nationality)
_Peter_ _is_ _sick_. (health state)
_Peter_ _is_ _moslem_. (religion)

SMW could detect the class of the target object and assign a more specific property as a consequence. If the class of the target object is unknown, SMW should print a note and maybe even offer the list of possible classes.

_Peter_ _is_ _jewish_.
--> SMW: Could not resolve 'jewish'. Is it a profession, a nationality, a health state or a religion?
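
A minimal Python sketch of this resolution step follows. The lookup tables, the specific property names ('has profession' etc.) and the function `resolve` are assumptions for this illustration; in a wiki the class of the target would be read from its '.obj is a' property.

```python
# Hypothetical lookup tables: class of each target page and the
# specific property that 'is' stands for with a target of that class.
CLASS_OF = {"conductor": "Profession", "German": "Nationality",
            "sick": "Health state", "moslem": "Religion"}
SPECIFIC = {"Profession": "has profession", "Nationality": "has nationality",
            "Health state": "has health state", "Religion": "has religion"}

def resolve(target):
    """Return the specific property for 'is', or raise with a hint."""
    cls = CLASS_OF.get(target)
    if cls is None:
        options = ", ".join(sorted(SPECIFIC))
        raise LookupError("Could not resolve '%s'. Possible classes: %s"
                          % (target, options))
    return SPECIFIC[cls]
```

A target with a known class is mapped to its specific property; an unknown target such as 'jewish' raises an error which lists the possible classes.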

Handle derived properties

Properties which have a value that can be derived from other property values should be handled inside SMW. SMW should be able to call a certain (user exit) template whenever a certain property is assigned.
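
The 'user exit' idea can be sketched in Python as a table of hooks which are called on assignment. This is an illustration only; `HOOKS`, `on_assign` and `assign` are invented names, and the age limit of 18 for 'is adult' is an assumption. The derived properties 'age' and 'is adult' are those mentioned for the 'born in' template.

```python
HOOKS = {}

def on_assign(prop):
    """Register a function that derives further properties from `prop`."""
    def register(func):
        HOOKS[prop] = func
        return func
    return register

@on_assign("born in")
def derive_age(value, today_year):
    age = today_year - int(value)
    return {"age": age, "is adult": age >= 18}  # 18 is an assumed limit

def assign(store, prop, value, today_year):
    """Store `prop` and any properties derived from it."""
    store[prop] = value
    if prop in HOOKS:
        store.update(HOOKS[prop](value, today_year))
    return store

facts = assign({}, "born in", "1989", today_year=2008)
```

Assigning 'born in' here also stores 'age' and 'is adult', while properties without a hook are stored unchanged.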


Tighter synchronisation between wiki text and SMW database

When you use templates to set SMW properties it occurs quite regularly that you change a template. MediaWiki will detect this and the next time you open one of the affected pages (i.e. articles which use that template) the page will reflect the changes. MW uses a background process for this. The SMW database, however, will only be updated if you manually edit all the affected articles and save them again. This is really annoying and causes a lot of trouble when developing a template framework which uses SMW.

Outlook

'SMWpc' shows that powerful concepts can be built on top of the SMW platform. We hope that the idea of SMWpc will be adopted by the Semantic MediaWiki community. Integration of SMWpc concepts into SMW would certainly help as it could lead to a more robust solution with better performance. Adding SMWpc concepts to SMW would enlarge the scope of SMW significantly. As it would be a pure add-on no current functionality would be lost. Changing some details (e.g. introducing naming conventions for meta properties) would make it necessary to offer migration scripts and (maybe) a downward compatibility switch.

SMWpc is a first step in the direction of true object oriented semantic modeling. There are lots of features which can be improved and added in future.

Technical Annex

A detailed description of the API and a list of all templates, forms, categories, types, classes, properties and sample articles can be found in a separate document.