DocTypes

From FollowTheScore

DocTypes: A light-weight approach to semantics

MediaWiki uses text articles to represent knowledge. It lets you assign articles to categories and supports links between articles. Templates can be used to add a set of parameters to an article.

{{#wgraph: name=DocTypes_1| svg|thumb=50|

node p1 {label "Page 1" type page }
node p2 {label "Page 2" type page }
node p3 {label "Page 3" type page }
node cA {label "Cat A" type cat}
node cB {label "Cat B" type cat}
node cC {label "Cat C" type cat}
node Tx {label "Template X" type tpl }
edge p1 cA { type is_a }
edge p2 cB { type is_a }
edge p3 cB { type is_a }
edge p3 cC { type is_a }
edge p3 Tx { type uses }
edge p1 p2 { type link }
nodetype page { color lightgray bordercolor darkgray }
nodetype cat  { shape ellipse color lightred  bordercolor red      }
nodetype tpl  { color lightgreen bordercolor green      }
edgetype is_a { color darkred }
edgetype uses { color darkgreen }
edgetype link { color black kind near}

}}

Sometimes this is not sufficient. To represent knowledge in a more structured way, a typing concept is needed. Instead of articles you want semantically meaningful entities like Person, Trip or Location. Instead of links you want semantically meaningful relationships like takes part in or starts at. A conceptual scheme for such an example might look like this:

{{#wgraph: name=DocTypes_2| svg|thumb=60|

node Trip           { type Type shape hexagon         }
node Person         { type Type shape lparallelogram  }
node Location       { type Type shape trapezoid       }
nodetype Type       { color lightyellow bordercolor darkyellow }
edge Person Trip    { type reference label takes_part_in }
edge Trip Location  { type reference label starts_at }
edgetype reference  { color black }
orientation left_to_right

}}

While the above diagram operates on the type level ('class level', 'relation type level'), the real pieces of knowledge are instances (objects) of these types, and they are related by instances (relations) of these relation types.

So on the level of individual instances ('objects', 'relations') we would see the following:

{{#wgraph: name=DocTypes_3| svg|thumb=60|

node Trip_4711      { type Trip;Object     }
node Henry          { type Person;Object   }
node Paris          { type Location;Object }
node Trip_8552      { type Trip;Object     }
node Maria          { type Person;Object   }
node Rome           { type Location;Object }
nodetype Trip       { shape hexagon        }
nodetype Person     { shape lparallelogram }
nodetype Location   { shape trapezoid      }
nodetype Object     { color lightmagenta bordercolor darkmagenta }
edge Henry Trip_4711   { type takes_part_in;reference }
edge Maria Trip_8552   { type takes_part_in;reference }
edge Trip_4711 Paris   { type starts_at;reference }
edge Trip_8552 Rome    { type starts_at;reference }
edgetype takes_part_in { color blue label takes_part_in}
edgetype starts_at     { color blue label starts_at}
edgetype reference     { color black }
orientation left_to_right

}}

How does it map to MediaWiki?

DocTypes is a simple and conservative approach to representing semantically meaningful objects and relations within the world of MediaWiki.

  • Objects are defined by calling a MediaWiki template which is named after the Type. There is also a help page for the user which explains the semantics of the Type.
  • Apart from that, there are some other Type-related templates which take care of XML export and reporting.
  • Articles are seen as containers which store one or more objects (usually of the same class).
  • As soon as an article contains an object of some class, the article becomes part of a category which has the same name as the template used to define the object.
  • Relations are basically links between pages, but they point directly to objects, using the object ID as the link target.

That's all.
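
The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names (`id`, `takes_part_in`), the page title `Trips` and the `[[Page#ObjectID]]` link form are hypothetical and are not taken from the actual DocTypes templates.

```python
def template_call(type_name, object_id, properties):
    """Render an object as a call to the template named after its Type.

    The `id` parameter and the property names are hypothetical.
    """
    lines = ["{{" + type_name, f"|id={object_id}"]
    lines += [f"|{name}={value}" for name, value in properties.items()]
    lines.append("}}")
    return "\n".join(lines)


def category_for(type_name):
    """An article holding an object joins the category named like the template."""
    return f"Category:{type_name}"


def object_link(page_title, object_id):
    """Assumed link form: the object ID is used as the anchor within the page."""
    return f"[[{page_title}#{object_id}]]"


# Henry takes part in Trip_4711, which is stored on a (hypothetical) page "Trips".
henry = template_call(
    "Person", "Henry", {"takes_part_in": object_link("Trips", "Trip_4711")}
)
print(henry)
print(category_for("Person"))
```

The point of the sketch is that nothing beyond ordinary template calls, categories and links is involved: the Type name is reused as template name and category name, and the reference is an ordinary link whose target is the object rather than the whole page.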

{{#wgraph: name=DocTypes_4| svg|thumb=60|

node dTrip          { type Document label Document(s)\ncontaining\nobjects\nof_Type\nTrip }
node dPerson        { type Document label Document(s)\ncontaining\nobjects\nof_Type\nPerson }
node dLocation      { type Document label Document(s)\ncontaining\nobjects\nof_Type\nLocation }
node Trip_4711      { type Trip;Object     }
node Henry          { type Person;Object   }
node Paris          { type Location;Object }
node Trip_8552      { type Trip;Object     }
node Maria          { type Person;Object   }
node Rome           { type Location;Object }
node tTrip          { type Template label Template\nTrip }
node tPerson        { type Template label Template\nPerson }
node tLocation      { type Template label Template\nLocation }
node cTrip          { type Category label Category\nTrip }
node cPerson        { type Category label Category\nPerson }
node cLocation      { type Category label Category\nLocation }
nodetype Category   { shape ellipse color lightred  bordercolor red      }
nodetype Template   { color lightgreen bordercolor green      }
 
nodetype Trip       { shape hexagon        }
nodetype Person     { shape lparallelogram }
nodetype Location   { shape trapezoid      }
nodetype Document   { color lightgray bordercolor darkgray }
nodetype Object     { color lightmagenta bordercolor darkmagenta }
edge Henry Trip_4711   { type takes_part_in;reference }
edge Maria Trip_8552   { type takes_part_in;reference }
edge Trip_4711 Paris   { type starts_at;reference }
edge Trip_8552 Rome    { type starts_at;reference }
edge dPerson dTrip     { type takes_part_in;dummy}
edge dTrip   dLocation { type starts_at;dummy}
edge tPerson tTrip     { type takes_part_in;dummy}
edge tTrip   tLocation { type starts_at;dummy}
edge cPerson cTrip     { type takes_part_in;dummy}
edge cTrip   cLocation { type starts_at;dummy}
edgetype dummy         { textcolor white color white }
edgetype takes_part_in { color blue label takes_part_in}
edgetype starts_at     { color blue label starts_at}
edgetype reference     { color black }
orientation left_to_right

}}

How about OWL, RDF, Semantic Wiki etc.?

DocTypes is somewhat less abstract and less generic than these concepts. It does not introduce ontologies and annotations and there is no general abstract query language for traversing relations. DocTypes is based on the idea of semantic triples but it does not put them in the foreground.

Instead, DocTypes is straightforward and rather easy to use for the average MediaWiki user, as there is nothing new to learn. There is no additional syntax and no need to qualify relationships while writing documents. Instead, the author fills his text into the parameter list of a template. He is thus essentially guided by a 'form' but still has the full power of expressing himself with rich text and embedded media.

Note that we are not talking about a traditional screen form. This would be too rigid and would put too much burden on the DocTypes designer. Rather, we are talking about creating a template, which essentially means listing the attributes that will make up an object.

In general, you should not expect the full power of semantic modelling (OWL/RDF etc.) from DocTypes, but you may be astonished how much can be done. The biggest benefit of DocTypes is probably its simplicity.


Comparison between the traditional way and DocTypes

Today a wiki author basically uses rich text when writing. If he wants to add a set of standardized descriptive attributes to his text, he will create a template and pass the attribute values as parameters. The template will insert these values into his text, typically as a nice little table.

The core idea of DocTypes is to reverse that principle. Using DocTypes you put your whole piece of knowledge into a template call. While some of your parameters may be quite simple (a word, a number, a sentence, a link) others may consist of several text paragraphs including headlines on various levels and images.

Of course this only makes sense if there is an appropriate structure which the authors will accept because it is considered helpful for a certain knowledge domain. A typical wiki may have 70% of its articles in traditional form and 30% containing DocTypes.

The good thing is that it doesn't make a difference to the authors. But, of course, it makes a difference for the designer of the wiki.

The following table gives a summary:

Aspect | Standard Wiki | DocTypes
Paradigm 1 | a collection of stories | a collection of fact sheets
Paradigm 2 | things are somehow connected to each other by 'free association' | objects have distinct typed references between each other
When to use | Broad range of topics, weakly structured text, no common scheme applicable | High degree of structural similarity between certain instances of your knowledge domain; a commonly agreed 'reasonable' scheme for presenting information
Output / appearance | heterogeneous, totally left to the user (apart from the sporadic use of templates which produce some standardized pieces of text) | homogeneous, standardized scheme for how information is presented; there may be areas where "stories" are embedded, but they have their fixed place in the overall schema design
Navigation | The author puts hyperlinks where he feels it makes sense. The kind of relationship that goes along with a link can only be derived from carefully reading the text around the link. | The system expects references at some pre-defined positions and assigns a semantic meaning to them. The reader will always find such references in the same place and can traverse them backwards in a targeted way. Even reports are possible.

Burden for the average article writer
Standard Wiki:
  • low because
    • .. he can start with an empty page
  • high because
    • .. he must master the whole topic mentally and he must find a logical way to present the contents
    • .. he must think about navigation within his text and recognize which links to other articles are desirable
DocTypes:
  • low because
    • .. he is confronted with a pre-designed structure which (hopefully) covers all relevant aspects
  • high because
    • .. he must understand that structure and accept it even if he would deliberately have taken a much simpler approach to noting his 'statements'

Burden for the wiki designer
Standard Wiki (EX POST approach):
  • low effort because
    • .. he can wait for things to happen
  • high effort because
    • .. he must invent proper categories and assign them to existing articles
    • .. he must invent and apply templates after having recognized similarities in certain articles
    • .. systematic changes must be done manually
DocTypes (EX ANTE approach):
  • low effort because
    • .. once the schema is there, the quality of contents and navigation will normally be satisfying
    • .. systematic changes can be applied by scripts or template changes
  • high effort because
    • .. he must understand the knowledge domain before the majority of articles are written
    • .. he must take care of an appealing visual presentation, suitable navigation and reports, based on a sufficiently stable meta model

Import / Export | The contents can technically be exported as XML, but they are opaque, i.e. nothing more than a sequence of characters within the XML schema. | The text can be exported as semantically structured XML or as CSV with named columns.
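
The Import / Export difference can be illustrated with a small Python sketch. It does not show how DocTypes performs the export (in the wiki this is done by Type-related templates); it only shows what 'semantically structured XML' and 'CSV with named columns' could look like once every property has a name. The element names, the `id` attribute and the sample field names are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical objects of Type "Trip"; the field names are illustrative only.
trips = [
    {"id": "Trip_4711", "starts_at": "Paris"},
    {"id": "Trip_8552", "starts_at": "Rome"},
]


def to_xml(type_name, objects):
    """One element per object, one child element per named property."""
    root = ET.Element("objects", {"type": type_name})
    for obj in objects:
        node = ET.SubElement(root, type_name, {"id": obj["id"]})
        for name, value in obj.items():
            if name != "id":
                ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(objects):
    """One row per object, with the property names as column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(objects[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(objects)
    return buffer.getvalue()


print(to_xml("Trip", trips))
print(to_csv(trips))
```

In a standard wiki export, by contrast, the whole article text would sit in a single text element, so a consumer would have to parse the prose to find out where a trip starts.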

Glossary

Before we show an example and give more details, we need a short definition of the DocTypes terminology:

Page
A page (article) in your wiki which is designed in alignment with DocTypes principles. Pages contain one or more Objects of a certain Type.
Type
A definition of common Properties for all Objects (Instances) belonging to that Type.
Object
A piece of knowledge contained in a Page which has a certain Type.
Property
An attribute of an Object; it can be a plain value, a complex value (consisting of Instances of other Types) or a Reference to another Object.
Reference
A Property which points to another Object.
ReferenceInfo
Some text which can go along with a Reference; it explains more about the kind of relationship.
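
The glossary terms can be read as a small data model. The following Python sketch is one possible reading, not the DocTypes implementation: the class and field names are made up for illustration, and the ReferenceInfo text is invented. It also shows why typed References can be traversed backwards.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """A Property that points to another Object."""
    name: str        # relation name, e.g. "takes_part_in"
    target_id: str   # ID of the referenced Object
    info: str = ""   # ReferenceInfo: free text about the relationship


@dataclass
class Obj:
    """An Object of a given Type; a Page would hold one or more of these."""
    object_id: str
    type_name: str
    values: dict = field(default_factory=dict)       # plain Properties
    references: list = field(default_factory=list)   # Reference Properties


def referenced_by(objects, target_id, name):
    """Traverse a Reference backwards: who points at target_id via `name`?"""
    return [
        obj.object_id
        for obj in objects
        for ref in obj.references
        if ref.name == name and ref.target_id == target_id
    ]


objects = [
    # "as organiser" is an invented ReferenceInfo text.
    Obj("Henry", "Person",
        references=[Reference("takes_part_in", "Trip_4711", "as organiser")]),
    Obj("Maria", "Person",
        references=[Reference("takes_part_in", "Trip_8552")]),
    Obj("Trip_4711", "Trip", references=[Reference("starts_at", "Paris")]),
    Obj("Trip_8552", "Trip", references=[Reference("starts_at", "Rome")]),
]

print(referenced_by(objects, "Trip_4711", "takes_part_in"))  # ['Henry']
```

Because every Reference carries a relation name, the question "who takes part in Trip_4711?" becomes a simple filter. With untyped links the same question would require reading the text around each link.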


Technology

DocTypes is basically a series of clever templates which use standard MediaWiki features and some existing MediaWiki extensions like DPL and Variables. DocTypes is more a particular way of using existing MediaWiki technology than a new technology.


Continue reading with DocTypes Design or DocTypes Example.

Access DocTypes defined in this wiki: Category:DocType

Look at the template scripts which implement DocTypes: Category:DocTypeScript


Other Approaches

If you are interested in Semantic MediaWiki, you can play around with it in this wiki, too. See SMW Demo.