Semantic MediaWiki with Property Clusters 1

From FollowTheScore
Revision as of 22:00, 26 March 2008 by Gero (talk | contribs) (Consequences for SMW)

Idea and Concept

Most concepts in software development deal with classification and abstraction, from ancient Algol through entity-relationship modelling to all flavours of object orientation. All these proven and well-established concepts (including inheritance hierarchies and polymorphism) should also be available in a truly semantic wiki. There is no need to reinvent the wheel on this conceptual level -- RDF and OWL (with different levels of detail) point in the same direction.

The question is how these powerful concepts can be integrated into a 'simple' wiki without losing the primary value of wikis: ease of use and a low threshold for novice users. After all, a wiki contributor is an 'author' and not a clerk who enters values into an IT system with several hundred screen forms. The freedom to start with plain text and to add more structure only when you feel it is needed should be preserved. Templates can play a useful role in keeping the balance between reliable structure and authors' freedom.

Ordinary wiki authors usually have no education in IT. Wiki administrators may have such an education, but their focus often lies on technical issues like installation, user administration and database maintenance. What is needed for a 'semantic wiki' is an information engineer who understands the knowledge domain, has a good feeling for the 'evolution of structures' and has internalized the above-mentioned concepts of abstraction and classification. Let's call him the wiki knowledge engineer.

The knowledge engineer typically would start with a small, weakly structured collection of articles which have some commonalities. His approach would be "Let us use as much structural support as possible - but let us decide when and how much of it we are going to use".

Once a wiki has grown and holds a lot of information, the knowledge engineer will create a 'model'. A semantic wiki should allow him to document such a model in a formal way and use it to support queries and (maybe) the entry of more data. The knowledge engineer must constantly monitor the ratio between the size of the 'information model' and the total amount of information in the wiki. Encyclopedic wikis will have a lower ratio than specialized wikis with a narrower scope and more elaborate relationships between the articles.

SMWpc (Semantic MediaWiki with Property Clusters) is a proof of concept for enhanced wiki knowledge engineering. It demonstrates how classification concepts can be implemented on top of the current MediaWiki infrastructure, using normal wiki templates and some of the more popular extensions (Semantic MediaWiki, Dynamic Page List, Wiki Graph, Variables, simple forms).

SMWpc introduces the concept of nestable 'Classes'. Classes correspond to traditional MediaWiki categories. Properties are tied to Classes. MediaWiki articles are seen as 'instances of a class' (objects). Of course they are still 'wikitext with chapters and hyperlinks and category links', and they are still 'wikitext with associated property values'. But the focus in SMWpc is on the semantic model. The model defines which properties make sense for instances of certain classes. It also distinguishes between properties which hold a value ('attributes') and properties which point to another object. In the latter case the model can state that the link target must be an object which belongs to a certain class.
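The relationship between classes, properties and link targets can be sketched as a small data structure. This is only an illustration, not SMWpc's implementation: the names `WikiClass`, `PropertyDef` and `properties_of` are hypothetical, and the rule that a nested class also receives the properties of the classes it refines is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WikiClass:
    """A class in the model; corresponds to a MediaWiki category."""
    name: str
    parent: Optional["WikiClass"] = None  # nesting: the class this one refines

    def ancestors(self):
        """Return this class followed by every class it refines, nearest first."""
        chain, current = [], self
        while current is not None:
            chain.append(current)
            current = current.parent
        return chain


@dataclass
class PropertyDef:
    """A property tied to a class."""
    name: str
    describes: WikiClass                # the class the property is tied to
    target: Optional[WikiClass] = None  # set only if the property points to another object

    @property
    def is_attribute(self):
        """True for properties that hold a plain value rather than a link."""
        return self.target is None


def properties_of(cls, all_props):
    """Properties that make sense for instances of cls.

    Assumption: a nested class also gets the properties of its ancestors.
    """
    chain = cls.ancestors()
    return [p for p in all_props if any(p.describes is c for c in chain)]


# Hypothetical example classes and properties.
thing = WikiClass("living thing")
species = WikiClass("species", parent=thing)
food = WikiClass("food")
props = [
    PropertyDef("name", thing),
    PropertyDef("maximum age", species),
    PropertyDef("likes", species, target=food),
]
names = [p.name for p in properties_of(species, props)]
```

Here `names` contains the property tied to the ancestor class as well as the two tied to `species` itself, and only 'likes' is a link rather than an attribute.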

The graph below describes the general idea; there is also a more elaborate graph of the meta model.

[Figure: SMWpc MetaModel.png]

Focus

The main focus of SMWpc is on small and medium-sized wikis (fewer than 10,000 pages) which have a dedicated focus. Their user communities agree on a common scheme for the classification of articles and want better support for collecting highly structured information (using properties). An example could be a wiki in the area of molecular genetics, but it could also be a wiki about pets with classes like species, food, disease etc. It is quite clear that a property named 'symptom' belongs to class 'disease' and not to 'food' or 'species'. With SMWpc there is a way to express this. While it may make a lot of sense to have multiple values for the 'symptoms' of a 'disease', there should only be a single value for the property 'maximum age' of class 'species'. The property 'likes' must contain a reference to an instance of 'food' and not to a 'disease'. With SMWpc you can express this and much more.

[Figure: SMWpc Focus.png]


Model and Meta Model

As said before, classes are named clusters of properties. Articles are seen as 'instances of' classes. The properties of a class -- and thus of its instances -- may be mandatory or optional. A property may accept a set of values for one object ('multi-value properties') or it may be restricted to at most one value ('single-value properties'). There may also be restrictions on the value set of a property. SMW offers the concept of 'Types' for this. On a semantic level, however, it may not be enough to declare a property to be of type 'Page'. In SMW this only means that the value of that property is the name of another wiki page. We want to be able to restrict the values to pages which are instances of specific classes.
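The three kinds of constraint described above (mandatory or optional, single-value or multi-value, restricted link target) can be sketched as a small checker, using the pet wiki as the example. This is an illustration under stated assumptions, not SMWpc code: `PropertyRule`, `validate` and the sample pages are hypothetical, and flagging a property that the class does not define is one possible policy among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyRule:
    """Constraints the model places on one property of a class."""
    name: str
    mandatory: bool = False
    multi_value: bool = True
    target_class: Optional[str] = None  # required class of the linked page, if any


def validate(values, rules, class_of):
    """Return a list of violations for one article.

    values   -- dict: property name -> list of assigned values
    rules    -- list of PropertyRule for the article's class
    class_of -- dict: page name -> class name (used to check link targets)
    """
    problems = []
    known = {rule.name for rule in rules}
    for name in values:
        if name not in known:
            problems.append(f"'{name}' is not defined for this class")
    for rule in rules:
        assigned = values.get(rule.name, [])
        if rule.mandatory and not assigned:
            problems.append(f"'{rule.name}' is mandatory but missing")
        if not rule.multi_value and len(assigned) > 1:
            problems.append(f"'{rule.name}' allows at most one value")
        if rule.target_class is not None:
            for page in assigned:
                if class_of.get(page) != rule.target_class:
                    problems.append(
                        f"'{rule.name}' must point to a {rule.target_class}: {page}"
                    )
    return problems


# Hypothetical rules for class 'species' and a deliberately faulty article.
species_rules = [
    PropertyRule("maximum age", multi_value=False),
    PropertyRule("likes", target_class="food"),
]
class_of = {"Carrot": "food", "Myxomatosis": "disease"}
rabbit = {
    "maximum age": ["9", "12"],
    "likes": ["Carrot", "Myxomatosis"],
    "symptom": ["sneezing"],
}
problems = validate(rabbit, species_rules, class_of)
```

For this article the checker reports three problems: 'symptom' does not belong to 'species', 'maximum age' has two values, and 'likes' points to a disease instead of a food.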

All this can of course be expressed in narrative text. But it is much better to use SMW's capabilities for that purpose. So SMW is used to describe enhancements of SMW. This reflective way of usage is typical for 'meta models'. SMW already contains some reserved words for built-in meta properties (e.g. 'has type'). The power (and complexity) of a semantic description framework lies in the structure of its meta model. Based on the meta model, the "real models" (application domain models) can be built.

Before going into details, we have to consider possible name clashes between the meta level and the application level.

Naming Rules and Design Principles

Leading '.' for meta level

When describing classes or properties we use normal SMW properties. This could cause conflicts with the 'real' properties of the application model. We therefore adopt the convention that all property names related to the meta model start with a leading dot ('.').
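A minimal sketch of how the leading dot keeps the two levels apart: the function `split_by_level` and the sample assignments are hypothetical and only illustrate the convention.

```python
def split_by_level(annotations):
    """Separate meta-level from application-level property assignments.

    annotations -- dict: property name -> value
    Returns (meta, application), two dicts with the same value types.
    """
    meta = {k: v for k, v in annotations.items() if k.startswith(".")}
    application = {k: v for k, v in annotations.items() if not k.startswith(".")}
    return meta, application


# An application property called 'class extends' cannot clash with the
# meta property '.class extends', because only the latter starts with a dot.
meta, application = split_by_level({
    ".class extends": "living thing",
    "class extends": "some application-level value",
    "likes": "Carrot",
})
```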

Use of templates

It can be very useful to use templates for the assignment of SMW properties. This leads to a 'decoupling layer' which is normally very thin but can be used to add further concepts (examples follow later).

It was therefore decided to make all property assignments through templates. The templates use the same naming convention. Sometimes there is a 1:1 correspondence between template and property; sometimes a template is used to assign more than one property. In some (rare) cases we have two different templates which can be used to assign the same property. Of course there must be good reasons for this, because it reduces transparency.

Examples

  • The fact that one class can be a refinement of another class is expressed by a property named '.class extends'
  • There is a template called 'Template:.class definition' which takes (among others) a parameter called extends. This template assigns the value of this parameter (i.e. the ancestor's class name) to the property .class extends of the Class description article.
  • In the same way this template assigns values to properties like '.class icon' or '.class color'
  • There is a property called .prop describes which ties a property to a class. Note that the same property may be useful for several classes.
  • This property is assigned by 'Template:.prop describes'. This template contains just one line of code.
  • Consequently, the fact that a certain template is used to assign a value to a certain property is also expressed by a property: '.prop assigned by'.
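The examples above can be mimicked in a few lines. The real templates are written in wikitext and their source is not shown here, so this is only a stand-in: the parameter names `icon` and `color` are assumptions (only `extends` is named in the text), and the output uses SMW's inline annotation form `[[property::value]]`.

```python
def class_definition(extends=None, icon=None, color=None):
    """Stand-in for 'Template:.class definition': one template, several properties.

    Returns the SMW annotations the template might emit on the Class
    description article. Parameters that are not supplied produce nothing.
    """
    params = {
        ".class extends": extends,
        ".class icon": icon,
        ".class color": color,
    }
    return [f"[[{prop}::{value}]]" for prop, value in params.items() if value is not None]


def prop_describes(class_name):
    """Stand-in for the one-line 'Template:.prop describes'."""
    return f"[[.prop describes::{class_name}]]"


# Which template assigns which property is itself recorded as a property.
ASSIGNED_BY = {
    ".class extends": "Template:.class definition",
    ".prop describes": "Template:.prop describes",
}
assigned_by = [f"[[.prop assigned by::{tpl}]]" for tpl in ASSIGNED_BY.values()]

annotations = class_definition(extends="living thing", color="green")
```

With these inputs `annotations` holds one annotation for '.class extends' and one for '.class color'; no annotation is produced for the icon because that parameter was left out.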

Common prefix for clustering

As you can see from the examples, the naming rules not only require a leading dot but also establish conventions for clustering meta properties (and meta templates) by a common prefix like '.class ***' or '.prop ***'.

Currently we use the following prefixes:

  • .obj: for templates that apply to objects
  • .class: for Class-related templates and properties
  • .cat: for templates which describe categories
  • .instances: for templates dealing with all instances of a class
  • .prop: for Property-related templates and properties
  • .type: for Type-related templates and properties
  • .model: for templates which are used to generate the model graph
  • .form: for templates which are used to support the creation of whole objects
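Because the prefix is simply the first word of the name, meta properties and templates can be grouped mechanically. The following is a minimal sketch with hypothetical helper names (`cluster_of`, `group_by_cluster`); it assumes the prefix is separated from the rest of the name by a space, as in '.class extends'.

```python
PREFIXES = (".obj", ".class", ".cat", ".instances", ".prop", ".type", ".model", ".form")


def cluster_of(name):
    """Return the cluster prefix of a meta-level name, or None for other names."""
    head = name.split(" ", 1)[0]
    return head if head in PREFIXES else None


def group_by_cluster(names):
    """Group names by their cluster prefix; application-level names go under None."""
    groups = {}
    for name in names:
        groups.setdefault(cluster_of(name), []).append(name)
    return groups


groups = group_by_cluster([".class extends", ".prop describes", ".class icon", "likes"])
```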

Access to the meta model

We chose to implement the SMWpc meta model by using standard mechanisms of SMW. This makes it possible to write queries which operate on the meta model in very much the same way as regular queries on the application level. The capability of "reflection" is of great value; it is an integral part of many languages and environments (Smalltalk, Java, relational databases). Using this reflection API we are able to create generic object browsers, as you will see later.
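The idea of a generic object browser built on reflection can be sketched with a toy store in which meta-level and application-level statements share one representation. This is plain Python, not SMW's query language; the triple layout, the function names and the sample data are assumptions made for illustration.

```python
# Every statement is a (subject, property, value) triple, whether it belongs
# to the meta model (leading dot) or to the application model.
TRIPLES = [
    ("likes", ".prop describes", "species"),
    ("maximum age", ".prop describes", "species"),
    ("symptom", ".prop describes", "disease"),
    ("Rabbit", "likes", "Carrot"),
    ("Rabbit", "maximum age", "9"),
]


def query(triples, subject=None, prop=None, value=None):
    """Return all triples matching the given fields; None matches anything."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


def browse(triples, page, cls):
    """Generic object browser: ask the meta model which properties the class
    has, then fetch the page's values for each of them with the same query."""
    rows = {}
    for prop_name, _, _ in query(triples, prop=".prop describes", value=cls):
        rows[prop_name] = [v for _, _, v in query(triples, subject=page, prop=prop_name)]
    return rows


rabbit_view = browse(TRIPLES, "Rabbit", "species")
```

The browser contains no knowledge about 'species'; everything it displays is discovered through a query on the meta level.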

Consequences for SMW

We propose that the same conventions also be applied to future versions of SMW. Property names on the meta level like 'has type', 'imported from' or 'display units' should no longer be used. These words and phrases are far too common; they may be useful at the application level, and they are not recognizable as parts of the meta model.

We also suggest that future versions of SMW allow full access to the meta model. Queries on the meta level should work in the same way as normal queries. The analogy to a database system is obvious: you can use SQL for queries on user tables and on the meta tables (catalogue tables) as well.
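The database analogy can be tried out with SQLite, where the catalogue is exposed as the table `sqlite_master`. The table `species` and its row are made-up sample data; the point is only that the same SELECT mechanism serves both levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (name TEXT, maximum_age INTEGER)")
conn.execute("INSERT INTO species VALUES ('Rabbit', 9)")

# Query on a user table (application level).
user_rows = conn.execute("SELECT name, maximum_age FROM species").fetchall()

# The same SQL, applied to SQLite's catalogue table (meta level).
meta_rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

conn.close()
```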