Semantic MediaWiki with Property Clusters 3

From FollowTheScore

Paradigms for writing articles

Standard MediaWiki uses 'free style' text articles plus untyped links and flexible categories to represent knowledge. SMW adds the concept of typed Properties. SMWpc adds (on top of both) the concept of 'Classes', including typed relations and inheritance. This reduces the freedom of authors, but it can also help them create 'valid' articles, i.e. articles ('objects') which use the right set of Properties with regard to their Class.

Inline annotations

Normally we write a wiki text and add annotations wherever we feel it appropriate. We use some 'inline syntax' which adds a certain overhead to the normal flow of text. But we still have full control over where we add annotations and how many of them we regard as useful. We call this way of representing information inline annotation.

If an author wants to add a set of standardized descriptive attributes to his text he will create a template and use the attribute values as parameters. The template will insert these values into his text, typically as a nice little table. And (optionally) it may also set Properties which correspond to the template parameters.

Object Data Templates

The concept of Classes makes it possible to go one step further in that direction -- using object-specific templates which cover the whole object. We call these templates 'object data templates (ODT)'. The idea is that all information for an object is technically given in the form of a call to the ODT. So your article will just call the ODT and you put all contents into the parameter list. This may sound very strange at first glance but practice shows that there are situations where this way of gathering information has advantages over the traditional wiki way.

This is especially true for 'small' objects with a high number of formally defined Properties. In such cases the user can be guided by the ODT, i.e. the empty template shows him what kind of information is expected. SMWpc offers ODTs as an additional, alternative way to collect information. Keep in mind that we are not talking about GUI forms. Instead we are talking about a pre-defined document structure which is aligned with the Class definition. The idea of ODTs is much closer to XML trees (DOM) than to screen forms.

Using an ODT does not mean that every parameter value must become a property value of the object (although you could design an ODT like that). While some of your parameters may be quite simple (a word, a number, a sentence, a link) others may consist of several text paragraphs including headlines on various levels and images. So some parameters may have a corresponding Class Property and others are simply a means of structuring your text.

The concept of ODTs only makes sense if there is an appropriate structure which will be accepted by the authors of a wiki because it is considered to be helpful for a certain knowledge domain. A typical wiki may have 70% articles in traditional form and 30% of the articles using ODTs.

The ODT has to take care of a clean and attractive visual presentation of your information. So, very similar to XML, we gain back some separation between content and layout. But this does not mean, on the other hand, that it is forbidden to add some layout hints to the ODT parameter values. It is a question of style, design and habit how much layout shall be allowed. The best way to think of it is that ODTs hide the 'master layout' from the editor's eyes while he is still able to apply incremental, marginal layout to his work.

A simple example for an ODT can be found in the demo articles which come together with SMWpc (open the text in edit mode!).

Comparison between Inline Annotations and Object Data Templates (ODTs)

First it is important to understand that SMW and SMWpc are an optional offer to the users of a MediaWiki. You can have 'normal' articles, articles with native SMW annotations and SMWpc class instances in one wiki peacefully side by side. You can assign Properties directly with the '::' syntax or you can use small templates for that purpose (as is suggested by SMWpc). If you want, you can put your whole article text into one template call using the idea of 'object data templates'.

So there is more choice: more possibilities to apply elegant ways of knowledge representation, and more possibilities to create a real mess...

It would be nice to have some criteria for when to use which representation, especially because the choice makes a (minor) difference to the authors and, of course, a major difference to the designer of the wiki.

The following table gives a summary:

Paradigm 1
  • Standard Wiki: a collection of stories
  • SMWpc with Object Data Templates: a collection of fact sheets

Paradigm 2
  • Standard Wiki: things are somehow connected to one another by 'free association'
  • SMWpc with Object Data Templates: objects have distinct typed references between each other

When to use
  • Standard Wiki: broad range of topics, weakly structured text, no common scheme applicable
  • SMWpc with Object Data Templates: high degree of structural similarity between certain instances of your knowledge domain; commonly agreed 'reasonable' scheme on how to present information

Output / appearance
  • Standard Wiki: heterogeneous, totally left to the user (apart from the sporadic use of templates which produce some standardized pieces of text)
  • SMWpc with Object Data Templates: homogeneous, standardized scheme for how information is presented; there may be areas where "stories" are embedded, but they have their fixed place in the overall schema design

Navigation
  • Standard Wiki: The author puts hyperlinks where he feels it makes sense. The kind of relationship which goes along with a link can only be derived from carefully reading the text around the link.
  • SMWpc with Object Data Templates: The system expects references at some pre-defined positions and assigns a semantic meaning to them. The reader will always find such references in the same place and can traverse them backwards specifically. Even reports are possible.

Burden for the average article writer

Standard Wiki:

  • low because
    • .. he can start with an empty page
  • high because
    • .. he must master the whole topic mentally and he must find a logical way to present the contents
    • .. he must think about navigation within his text and recognize which links to other articles are desirable

SMWpc with Object Data Templates:

  • low because
    • .. he is confronted with a pre-designed structure which (hopefully) covers all relevant aspects
  • high because
    • .. he must understand that structure and accept it even if he would have chosen a much simpler approach to note his 'statements'

Burden for the wiki designer

Standard Wiki (EX POST approach):

  • low effort because
    • .. he can wait for things to happen
  • high effort because
    • .. he must invent proper categories and assign them to existing articles
    • .. he must invent and apply templates after having recognized similarities in certain articles
    • .. systematic changes must be done manually

SMWpc with Object Data Templates (EX ANTE approach):

  • low effort because
    • .. once the schema is there the quality of contents and navigation will normally be satisfying
    • .. systematic changes can be applied by scripts or template changes
  • high effort because
    • .. he must understand the knowledge domain before the majority of articles are written
    • .. he must take care of appealing visual presentation, suitable navigation and reports, based on a sufficiently stable meta model

Import / Export
  • Standard Wiki: The contents can technically be exported as XML, but the content is opaque, i.e. it is nothing more than a sequence of characters in the XML scheme.
  • SMWpc with Object Data Templates: The text can be exported as semantically structured XML or as a CSV with named columns.

Screen Forms for data entry

People who know Semantic Forms may note a conceptual parallel. The Semantic Forms extension (SF) has a concept which is very close to our object data template. SF lets you define a screen form where the fields correspond to Properties of the article. This screen form works together with a 'data template' which sets the property values and takes care of a clean presentation of the article. The data template is also able to 'show incoming relations' based on one or more properties which the designer can freely choose.

With a few minor changes we can also use a data template generated by SF as an object data template in SMWpc.

  • The main point is that we must set the meta property '.obj is a' within the template.
  • Then we must set the SF meta property 'has default form' in the category corresponding to our class so that it points to the ODT.
  • And third we should add a link to SF 'Special:AddPage' so that the user can use the form driven data entry dialogue.

See our little Flute example:

  • The 'group' Property is displayed in the fact box at the right side
  • The flute players are shown there as well (based on the Property 'plays instrument')
  • The 'edit with form' tab is available when browsing Flute.
  • Template:Musical Instrument was generated by SF and only slightly modified (adding .obj is a)
  • Category:Musical Instrument contains the 'has default form' reference to 'Form:Musical Instrument'.
  • Class Musical Instrument offers a link to create a new instrument using Semantic Forms.
  • Note that during form based editing of a musical instrument article the 'invented in' Property is based on an auto completion list of 'Locations'.

Maybe a future release of the SF form generator could read the SMWpc meta model and generate a proposal for the form and the corresponding data template. Even attributes like 'mandatory' or 'multiple' could be taken from the SMWpc model. For the user (= designer of the semantic model) such an integration could make a lot of sense.

Whether one likes screen forms or not is at least partially a matter of taste. While SMWpc can be used to cooperate with SF it also offers support for the more traditional way of editing wiki text. There are several templates which make it easy to create an ODT for a given SMWpc class. Editing is supported by an 'edit intro page' which explains the parameters of the template. Try to create a new Student in our example to see how it works.

Conceptual issues of Editing

Where should a piece of information be given?

Normally all statements in an article affect the 'current object'. But sometimes you want to say something about a closely related, different object. Although this is not very 'canonical' it is how many people think and write. So we want a syntax for changing the 'current object', maybe with a stack engine where you say: We have now been talking a lot about X. Hold on for a moment, I must just tell you something about Y [push Y].... Now let's return to our main subject [pop]. (Context stack).

Let us assume that we have a model where a 'Person' can 'teach' other 'Person's. A canonical way to enter property values for this relation would be to write statements like

Peter _teaches_ _Lucie_ in chess.

This statement would have to appear in a document named 'Peter'.

If we defined 'taught by' as the reverse relationship we could also write

_Lucie_ was _taught by_ him for two years.

This statement (still within the article named 'Peter') would create the same kind of relationship.

It would be helpful if we could state the same fact also in a document about Lucie.

She was _taught by_ _Peter_ playing the violin when she was 19.

We might want to add that _Peter_ was a renowned Hungarian composer. As long as there is no page on 'Peter' this might be better than nothing. Once somebody creates a page about _Peter_ he should see all existing fact statements about _Peter_ (which may be scattered over existing articles).

Sometimes it is not obvious where to put a piece of information. Offering alternative ways would be a real win. Of course SMW would have to recognize such variants and detect potential contradictions and help users to avoid redundancies. But maybe users would even like to keep some redundancies as long as a tool helps them to deal with them efficiently. A query output could then point to several places where a piece of information comes from.

Note: The current implementation of SMWpc does not offer the above features. One reason for this is that SMW does not support context switches at the moment.
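
The context stack idea described above can be illustrated with a minimal sketch. It is not part of SMWpc or SMW; the class name `ContextStack` and the property names 'nationality' and 'profession' are assumptions made only for this illustration. The sketch shows how statements written in the article about Lucie could temporarily describe Peter.

```python
class ContextStack:
    """Tracks which object the following property statements refer to."""

    def __init__(self, page_name):
        # The article's own object is the bottom of the stack and is never popped.
        self._stack = [page_name]
        self.facts = []  # collected (subject, property, value) triples

    @property
    def current(self):
        return self._stack[-1]

    def push(self, other_object):
        # "Hold on for a moment, I must tell you something about Y."
        self._stack.append(other_object)

    def pop(self):
        # "Now let's return to our main subject."
        if len(self._stack) == 1:
            raise ValueError("cannot pop the article's own object")
        return self._stack.pop()

    def set_property(self, name, value):
        self.facts.append((self.current, name, value))


# Hypothetical statements inside the article 'Lucie'.
ctx = ContextStack("Lucie")
ctx.push("Peter")
ctx.set_property("nationality", "Hungarian")  # property name is an assumption
ctx.set_property("profession", "composer")    # property name is an assumption
ctx.pop()
ctx.set_property("taught by", "Peter")
```

Because every fact carries its subject, a later query could list all statements about Peter together with the article they came from, even before a page on Peter exists.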


Database driven editing suggestions

When using screen forms as a means of editing we may be able to show sets of plausible values for some properties. There would have to be an empirical knowledge base behind such a mechanism, probably based on histograms and correlations of property values. Currently Semantic Forms can use page lists based on categories for auto completion. A more intelligent procedure might be able to sort values by popularity (hit list) or offer values for property 2 based on the currently selected value for property 1. For example, once you have said that a person's profession is 'conductor' the 'plays instrument' property would provide a list of instruments which are typical for conductors (like piano).

In a similar way the editing system may be able to detect combinations of property values which look 'exotic' (i.e. which have never or rarely been used before).

If a wiki uses properties but does not use SMWpc classes so far such an analysis might even be able to show "clusters of properties" (hence the name of SMWpc) which typically occur together and which therefore should be considered to be candidates for class definitions.
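
A rough sketch of such a knowledge base, assuming property values are already available as plain dictionaries (SMW does not expose them in this form; the data below is invented sample data and the function names are hypothetical):

```python
from collections import Counter, defaultdict


def build_suggestions(objects, given, target):
    """Count how often each `target` value occurs together with each `given` value."""
    table = defaultdict(Counter)
    for props in objects:
        for given_value in props.get(given, []):
            for target_value in props.get(target, []):
                table[given_value][target_value] += 1
    return table


def suggest(table, given_value, limit=5):
    """Return target values for `given_value`, most frequent first."""
    return [value for value, _ in table[given_value].most_common(limit)]


def is_exotic(table, given_value, target_value, threshold=1):
    """Treat a combination as 'exotic' if it was seen fewer than `threshold` times."""
    return table[given_value][target_value] < threshold


# Invented sample data: each object maps property names to lists of values.
objects = [
    {"profession": ["conductor"], "plays instrument": ["piano"]},
    {"profession": ["conductor"], "plays instrument": ["piano", "violin"]},
    {"profession": ["teacher"], "plays instrument": ["flute"]},
]

table = build_suggestions(objects, "profession", "plays instrument")
typical = suggest(table, "conductor")
unusual = is_exotic(table, "conductor", "flute")
```

The same co-occurrence counts could serve as a starting point for spotting clusters of properties, although a real analysis would need more than pairwise counting.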

Flexibility of notation

The current approach of SMW uses the :: syntax to assign a property to the 'current object'. This frequently leads to a repetition of the verb in a sentence:

 Susan studies [[studies::Philosophy]]. 

An alternate approach would treat the verb as a separate SMW token. Then you could write

 Susan [[studies::]] [[::Philosophy]]. 

This would allow text to be inserted between the verb and the object:

 Susan [[studies::]] several subjects, but mainly she focuses on [[::Philosophy]]. 

One could use an alternate syntax for this as well:

 Susan [[studies::→]] several subjects, but mainly she focuses on [[←::Philosophy]]. 

You could also add more objects easily:

 Susan [[studies::]] several subjects, but mainly she focuses on [[::Philosophy]] and [[::Sociology]]. 

In theory such an approach would also allow 'left-associative' properties, i.e. the author first builds a stack of values and then tells us to which property they belong:

 [[→::Snooker]] and [[→::badminton]] are Peter´s greatest [[hobbies::←]].
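
To make the right-associative variant concrete, here is a small sketch of how a parser might treat the verb as a separate token. It is only an illustration of the proposal, not how SMW parses annotations; the arrow variants and left-associative properties are deliberately not handled, and the function names are hypothetical.

```python
import re

# Matches [[property::value]], [[property::]] and [[::value]].
TOKEN = re.compile(r"\[\[([^\[\]:]*)::([^\[\]]*)\]\]")


def extract_facts(text):
    """Return (property, value) pairs; a bare verb token stays active for later values."""
    facts = []
    active = None
    for prop, value in TOKEN.findall(text):
        prop, value = prop.strip(), value.strip()
        if prop:
            active = prop  # a named property becomes the active one
        if value:
            if active is None:
                raise ValueError(f"value {value!r} has no preceding property")
            facts.append((active, value))
    return facts


def render(text):
    """Strip the annotation markup and keep the readable sentence."""
    return TOKEN.sub(lambda m: m.group(2) or m.group(1), text)


sentence = (
    "Susan [[studies::]] several subjects, but mainly she focuses on "
    "[[::Philosophy]] and [[::Sociology]]."
)
facts = extract_facts(sentence)
plain = render(sentence)
```

The classic form `[[studies::Philosophy]]` passes through the same code path, so both notations could coexist in one article.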

A rudimentary API for editing SMWpc instances

As mentioned before SMWpc supports three ways of editing an article:

  1. classical inline editing
  2. use of an object data template (ODT)
  3. use of an object data template together with a screen form.

All three variants offer a 'semantic what links here' feature, i.e. they allow you to identify incoming references to the current object based on the SMWpc class model.

Inline editing

First a page calls '.obj is a' and thereby states that the current article describes an instance of an SMWpc class. The class name is given as an argument to the template call. Afterwards the 'value setting templates' can be called. These templates mainly call '.obj set' to set the property. They may perform additional tasks like handling of polymorphism.

For details see the example:

1  {{.obj is a|Peter|Student}}.
2  Peter was born in {{born in|1989|Berlin}}. He studies {{studies|Music|Philosophy}}.
3  He {{-|plays}} {{+|bassoon}}, {{+|piano}} and {{+|chess}}; some years ago he also used to play the {{+|trombone}}.
4  Peter was born {{born in|1990|Paris}}.
5  Peter and {{has team mate|Lucie}} play together in a football team.
6  {{Coordinates|33°10'N; 1°00'E}}
7  And here is the full story about Peter..

Note the following:

  1. The first line declares 'Peter' to be an instance of class 'Student'. The user will see a green box (green is the color for 'Students') and an icon for Students. The text says 'Peter is a Student' and contains a link to 'Class Student' and a link to a page which shows all relations pointing to 'Peter'.
  2. The 'born in' template takes year and place of birth; it assigns two properties and two derived properties ('age' and 'is adult').
  3. 'Studies' demonstrates a group assignment. The parameters are output with an 'and' inserted.
  4. The third line separates verb and objects. It sets an 'active property' (called 'plays'); afterwards four values are assigned to that property.
  5. 'plays' is polymorphic; it assigns 'plays instrument' or 'plays game' depending on the class of the link target. If the target of 'plays' is a musical instrument the user will see a small pictogram which is associated with the property 'plays instrument'.
  6. Line 4 obviously contradicts line 2. This is detected because the property 'born in' is marked as unique in the SMWpc class model.
  7. Line 5 establishes a link to another student. When viewing that link from Lucie's page the reverse name 'is team mate of' will be shown.
  8. Line 6 tries to assign a property which is not valid for a 'Student'. The author will see an error message.
  9. Line 7 is plain text.
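
The two checks mentioned in the notes (a unique property set twice, and a property that is not valid for the class) can be sketched as follows. SMWpc implements them with templates; this Python version only illustrates the logic, and `ClassModel` and `check_assignments` are hypothetical names. For brevity each 'born in' call is reduced to its year.

```python
class ClassModel:
    """A very small stand-in for an SMWpc class definition."""

    def __init__(self, name, properties, unique=()):
        self.name = name
        self.properties = set(properties)
        self.unique = set(unique)


def check_assignments(model, assignments):
    """Return error messages for a sequence of (property, value) assignments."""
    errors = []
    seen = {}
    for prop, value in assignments:
        if prop not in model.properties:
            errors.append(f"'{prop}' is not a valid property for class '{model.name}'")
            continue
        if prop in model.unique and prop in seen and seen[prop] != value:
            errors.append(f"'{prop}' is unique but was set to {seen[prop]!r} and {value!r}")
            continue
        seen.setdefault(prop, value)
    return errors


student = ClassModel(
    "Student",
    properties=["born in", "born at", "studies", "plays", "has team mate"],
    unique=["born in"],
)

errors = check_assignments(
    student,
    [
        ("born in", "1989"),
        ("studies", "Music"),
        ("born in", "1990"),                  # contradicts the first assignment
        ("Coordinates", "33°10'N; 1°00'E"),   # not a property of 'Student'
    ],
)
```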

ODT based editing

This method of editing uses one single template (the object data template) to assign all values and describe additional details of an object. When creating a new article the user will get an 'intro page' which explains all properties of the class. It could also describe additional parameters which do not have corresponding properties. A default version of that intro page can be generated automatically (see the links near the bottom of the respective Class page). It contains a description of all properties of a class, including inherited properties.

The empty page for the new article is preloaded with a text which invokes the ODT. This text shows all available parameters. It may also set default values. A standard version of the preload text could be generated from the application model (but currently is not).

The ODT itself uses standard templates to create a uniform appearance of the output. A generic version of the ODT could be generated from the application model (currently we only generate a frame with dummy property names).
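
Since the preload text is not yet generated from the application model, the following is only a sketch of how such a generator might look. The dictionary layout of the class model and the assumption that 'Student' inherits from 'Person' are invented for this illustration; SMWpc stores its model in wiki pages, not in Python dictionaries.

```python
def all_properties(classes, class_name):
    """Collect the properties of a class, inherited ones first."""
    spec = classes[class_name]
    parent = spec.get("parent")
    inherited = all_properties(classes, parent) if parent else []
    own = [p for p in spec["properties"] if p not in inherited]
    return inherited + own


def preload_text(classes, class_name, defaults=None):
    """Build an ODT call that lists every parameter, with optional default values."""
    defaults = defaults or {}
    params = ["ID"] + all_properties(classes, class_name)
    width = max(len(p) for p in params)
    lines = ["{{" + class_name + "|"]
    for p in params:
        lines.append(f"{p.rjust(width)} = {defaults.get(p, '')}|")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical class model; the inheritance relation is an assumption.
classes = {
    "Person": {"parent": None, "properties": ["born in", "born at"]},
    "Student": {
        "parent": "Person",
        "properties": ["studies", "studies in", "has team mate"],
    },
}

text = preload_text(classes, "Student", defaults={"born at": "Munich"})
```

Printing `text` yields a template call with one line per parameter, which an author could then fill in.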

For details see the example document and the sample ODT (use edit mode!). Try to create a new Student.

Lisa:

{{Student|
        ID = Lisa|
   born in = 1990|
   born at = Munich|
   studies = Philosophy|
}}

Template:Student

{|class=formtable width=100% border=0 cellspacing=0 cellpadding=0
  {{.form header |obj={{{ID|{{PAGENAME}}}}}|class=Student|width=120}}
  {{.form field  |born at|value={{{born at|}}}}}
  {{.form field  |born in|value={{{born in|}}}}}
  {{.form field  |studies|value={{{studies|}}}}}
  {{.form field  |studies in|value={{{studies in|}}}}}
  {{.form field  |has team mate|value={{{has team mate|}}}}}
|}
<noinclude>[[:Category:Student]]</noinclude>

Forms based editing

As described above you can also use Semantic Forms for editing. Generate a form for your class (using SF) and set the '.obj is a' property in the corresponding data template. Add the 'has default form' property to the category.

For details see the example document (try the 'edit with form' button) and the sample form (use edit mode!) and the sample data template.

Reporting and Exports

There are two simple but useful features for reporting:

  1. SMWpc can generate a link to the MediaWiki export facility which contains all objects of a class.
  2. SMWpc can produce a single document which contains all instances of a certain class.

It would be useful to have some statistics about the use of properties. As SMW currently lacks an interface for this it is not part of SMWpc.

Various suggestions to improve SMW

During the design of SMWpc a number of ideas came up for small useful improvements of SMW. Although the current proof of concept for SMWpc lives well without these features some of them would allow a more elegant and more efficient implementation of SMWpc. And, apart from that, they might be useful in other contexts, too.

Value list for a property

We would like to create a set (unique sorted list) of used values for a property. We would also like to (optionally) see the number of occurrences (frequencies). This would, for example, make it possible to construct a combo box in a user dialogue which holds the ten most frequent values, or it could be used to assist in auto completion editing for non-page-type properties.
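
What such a value list could return is sketched below, assuming the assignments were available as simple (property, value) pairs. SMW offers no such interface according to this page, so the input format and function names are hypothetical and the data is invented.

```python
from collections import Counter


def value_list(assignments, prop):
    """Unique values of `prop` in sorted order, each with its number of occurrences."""
    counts = Counter(value for p, value in assignments if p == prop)
    return sorted(counts.items())


def most_frequent(assignments, prop, limit=10):
    """The `limit` most frequent values of `prop`, e.g. to fill a combo box."""
    counts = Counter(value for p, value in assignments if p == prop)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [value for value, _ in ranked[:limit]]


# Invented sample data.
assignments = [
    ("studies", "Philosophy"),
    ("studies", "Music"),
    ("studies", "Philosophy"),
    ("born at", "Munich"),
]

values = value_list(assignments, "studies")
top = most_frequent(assignments, "studies", limit=1)
```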

Allow navigation on meta model with #ask

SMWpc should be built in a way which allows complete access to internal properties via the standard query mechanisms.

Use naming convention for meta properties

As said before some convention should be used to separate namespaces of SMW properties and user properties.

Syntax support for setting multiple properties

The current syntax allows the same value to be assigned to several properties in one statement. This is a rather exotic situation. It would be much more important to be able to assign several values to one property in one statement (i.e. assignment of a value set).

Support for separation of verb and object

SMW should store the name of the property which was most recently set. A further assignment of a value could then refer to the 'last' (latest) property. SMWpc uses '+' and '-' for that purpose. A better way might look like this: Here we set some [[property::]] to [[::value 1]] and [[::value 2]]

Support for context switches

Something like [[Another Object:::]] could be used to make 'Another Object' the current object. All property assignments occurring after that statement would describe that object. Something like [[PAGENAME:::]] could be used to return to the object which is defined by the current article.

Plain text output in #ask

Sometimes it is very useful to get a property value as a simple text string even if it is of type 'page'. There should be an #ask option for this.

Access MediaWiki meta properties in #ask

Meta data about author, article view count, last edit date etc. should be accessible via #ask.

Repeat output line for value sets in #ask

Currently #ask prints a row for each object. Multiple values of a property cling together within one field. There should be an option which produces a separate line for each occurrence of a value in a value set. This is mainly useful in combination with user templates that are called by the #ask processor.
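
The difference between the two output modes can be shown on a toy result set. The dictionary below is invented sample data and does not reflect how #ask represents results internally; the function names are hypothetical.

```python
def rows_per_object(results, prop):
    """Current behaviour: one row per object, all values joined in one field."""
    return [(obj, ", ".join(props.get(prop, []))) for obj, props in results.items()]


def rows_per_value(results, prop):
    """Proposed option: one row for every single value of the property."""
    return [
        (obj, value)
        for obj, props in results.items()
        for value in props.get(prop, [])
    ]


# Invented sample data.
results = {
    "Peter": {"plays": ["bassoon", "piano"]},
    "Lucie": {"plays": ["violin"]},
}

joined = rows_per_object(results, "plays")
separate = rows_per_value(results, "plays")
```

With one row per value, a user template called for each row receives exactly one value and needs no splitting logic of its own.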

SMW meta model

'Has type' should be made unique. Assigning multiple types to one property looks a bit strange. If a concept of complex types or variable types is needed it should be implemented via a combination of types and given a separate name which then can be used as 'the' type of a property.

Output format of #ask main title column

Currently the 'maintitle' column is always implicitly output as the first column. It should be possible to change this. Especially if one sorts by a certain column one might want to see that column as the first one. The main title column should appear at the position where it is within the #ask statement. And there should be an easy way to leave it out completely.

Alias names

SMW should support alias names for properties (male/female forms, singular/plural, past tense/ present tense). Maybe it could even catch an idea of the difference between these variants and deduce some additional (internal) property values from them. While such a concept could be implemented with templates it would preferably become an integral part of SMW.

Polymorphic Properties

SMW should handle properties which point to pages of different classes. I.e. a property like 'is' could have the following meanings:

_Peter_ _is_ _conductor_.  (profession)
_Peter_ _is_ _German_. (nationality)
_Peter_ _is_ _sick_. (health state)
_Peter_ _is_ _Muslim_. (religion)

SMW could detect the class of the target object and assign a more specific property as a consequence. If the class of the target object is unknown, SMW should print a note and maybe even offer the list of possible classes.

_Peter_ _is_ _Jewish_.
--> SMW: Could not resolve 'jewish'. Is it a profession, a nationality, a health state or a religion?
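
The resolution step can be sketched with the 'plays' property that SMWpc already treats this way in its inline editing templates. The lookup tables below are assumptions for the illustration (in particular the class name 'Game'); in a real wiki the class of a target would come from the target page itself.

```python
# Generic property -> {class of target: specific property}. 'Game' is an assumed class name.
SPECIFIC = {
    "plays": {
        "Musical Instrument": "plays instrument",
        "Game": "plays game",
    },
}

# Hypothetical class membership of some target pages.
CLASS_OF = {
    "bassoon": "Musical Instrument",
    "piano": "Musical Instrument",
    "chess": "Game",
}


def resolve(generic, target):
    """Pick the specific property for `generic` based on the class of `target`."""
    candidates = SPECIFIC[generic]
    target_class = CLASS_OF.get(target)
    if target_class not in candidates:
        options = ", ".join(sorted(candidates))
        raise LookupError(
            f"Could not resolve '{target}' for '{generic}'. Expected one of: {options}"
        )
    return candidates[target_class]


instrument_property = resolve("plays", "bassoon")
game_property = resolve("plays", "chess")
```

A target whose class is unknown raises an error that lists the possible classes, mirroring the note suggested above.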

Handle derived properties

Properties which have a value that can be derived from other property values should be handled inside SMW. SMW should be able to call a certain (user exit) template whenever a certain property is assigned.
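
A sketch of such a user exit, using the derived properties 'age' and 'is adult' that the SMWpc 'born in' template already sets. The `PropertyStore` class, the fixed reference year and the adult age of 18 are assumptions made for this illustration, and the age is only approximated from the year of birth.

```python
class PropertyStore:
    """Holds property values and calls registered hooks on every assignment."""

    def __init__(self):
        self.values = {}
        self._hooks = {}

    def on_assign(self, prop, hook):
        # Register a 'user exit' that runs whenever `prop` is assigned.
        self._hooks.setdefault(prop, []).append(hook)

    def set(self, prop, value):
        self.values[prop] = value
        for hook in self._hooks.get(prop, []):
            hook(self, value)


def make_age_hook(reference_year, adult_age=18):
    """Build a hook that derives 'age' and 'is adult' from a year of birth."""

    def hook(store, birth_year):
        age = reference_year - int(birth_year)  # approximation: ignores the birthday
        store.set("age", age)
        store.set("is adult", age >= adult_age)

    return hook


store = PropertyStore()
store.on_assign("born in", make_age_hook(reference_year=2010))  # year chosen arbitrarily
store.set("born in", "1989")
```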


Tighter synchronisation between wiki text and SMW database

When you use templates to set SMW properties, you will change those templates quite regularly. MediaWiki detects this, and the next time you open one of the affected pages (i.e. articles which use that template) the page reflects the changes; MW uses a background process for this. The SMW database, however, is only updated if you manually edit all the affected articles and save them again. This is really annoying and causes a lot of trouble when developing a template framework which uses SMW.