User talk:Capmo

From FollowTheScore

Hello Carlos, what a nice homepage!

At the moment I am working on a caching system which would allow DPL results to be stored and retrieved. I hope to improve performance so that DPL could be used on really large encyclopedias like Wikipedia. Would you like to help with testing? Gero 05:52, 19 May 2009 (UTC)

Hi Gero, and thanks for the invite! This would be a really nice and useful feature; you may count on me to help test it. Post here whatever code you want tested (or a link to the page where I can find it) and I'll try to return with the results and any bug reports in a timely manner. Regards, Capmo 17:17, 19 May 2009 (UTC)

DPL Cache - ideas & question

  • Definitions:
    • A wiki page which contains one or more invocations of DPL is called a DPL-page in the following text.
    • A page containing pre-fabricated DPL results is called a cache page.
    • A page which is part of the DPL result (because its name or content appears within the DPL result) is called a data page.
  • Cache Levels:
    • Closest to the user there is the browser cache on the client side. It depends on expiration hints given in the HTTP header (which in turn is generated by the MW core system on the server side). At the moment it is not clear to me how MW decides which expiration hint it puts into the HTTP header.
    • Next comes the MW parser cache on the server side. By default this feature is enabled on newer MW installations. Unless it is explicitly disabled within the PHP code of the DPL extension, the parser cache uses cached content to retrieve a page, which means that the DPL extension is not invoked at all. Note that if the DPL extension is not invoked it cannot disable the parser cache.
    • Third comes the cache level I am thinking about; let us call it the DPL cache. If the DPL extension is invoked it uses its own strategy to decide whether it should generate output by querying the database or whether it can instead use older results which have been stored in the MW database (i.e. as content of a normal MW page which serves as a cache page).

Basically my idea is based on the fact that DPL internally produces regular wikitext in a first step.

  1. Normally this wikitext is returned by DPL to the parser and thereafter it is transformed to HTML together with all the other wiki text from the DPL page, from included templates etc.
  2. I want to offer a new option (dplcache=<articlename>) which will save the wikitext portion generated by DPL to an arbitrary wiki cache page (in parallel to returning this output to the parser). The user can specify the name of the cache page according to his own ideas; typically it should be a page in the Template namespace; its name could be derived from the DPL page. For a DPL page named Xyz the cache page might be named Template:Extension DPL/Xyz. If a DPL page contains more than one invocation of DPL (which can happen in rare cases) the user must specify different names because we need a separate cache page for each DPL invocation.
  3. Whenever the DPL page is viewed, DPL checks if the corresponding cache page exists. If it exists DPL will NOT do its normal work; instead it will simply generate a template inclusion for the cache page. Thus the parser will take care of including the previously generated DPL result from the cache page.
  4. The user can specify an expiration time for the cache page (dplcachetime=<timespan in seconds>). If the cache page is too old, it will be updated when the page containing the DPL statement is viewed after the dplcachetime has passed.
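
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not the extension's PHP code: the `pages` dict stands in for the wiki database, and `CachePage`, `render_dpl` and `run_query` are hypothetical names. It also assumes that `dplcache` holds the full page name including the namespace.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CachePage:
    wikitext: str    # DPL output stored as ordinary wikitext
    saved_at: float  # time of the last cache update, in seconds


def render_dpl(
    pages: Dict[str, CachePage],      # hypothetical stand-in for the wiki database
    run_query: Callable[[], str],     # hypothetical: runs the real DPL query
    dplcache: str,                    # full name of the cache page (assumed)
    dplcachetime: Optional[float] = None,
    now: Optional[float] = None,
) -> str:
    """Return the wikitext that DPL hands back to the parser."""
    now = time.time() if now is None else now
    cached = pages.get(dplcache)
    fresh = cached is not None and (
        dplcachetime is None or now - cached.saved_at <= dplcachetime
    )
    if fresh:
        # Step 3: skip the query and let the parser transclude the cache page.
        return "{{" + dplcache + "}}"
    # Steps 1, 2 and 4: run the query, store the result, return it to the parser.
    wikitext = run_query()
    pages[dplcache] = CachePage(wikitext, now)
    return wikitext
```

Without `dplcachetime` an existing cache page never expires in this sketch, which matches step 3; with it, the first view after the time span has passed regenerates the cache page.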

I have a prototype which "in principle" works as described above. But there are a lot of problems still to be solved. One of them is the following:

  • If somebody changes the DPL page we do not know if the DPL statement was changed or some other code on the DPL page. To be on the safe side we can always ignore the cache. BUT: HOW do we know that the page is being EDITED, PREVIEWED or simply SHOWN? If the page is only shown we want the cache to be active, if it is being saved we want to update the cache, if it is previewed in edit mode we simply want to ignore the cache.

Do you know how the PHP code of a MW extension can find out in which mode (DISPLAY, EDIT/SAVE, EDIT/PREVIEW) it is being called?
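
One possible way to express the desired behaviour, sketched in Python rather than PHP: the three modes are mapped to a cache policy based on the request parameters. The names `CachePolicy` and `cache_policy` are hypothetical. The sketch assumes that the edit form posts `action=submit` together with `wpPreview` for a preview or `wpSave` for a save; this should be verified against the MW version in use, as should the way an extension gets access to the request.

```python
from enum import Enum
from typing import Mapping


class CachePolicy(Enum):
    USE = "use cached result"
    UPDATE = "regenerate and store"
    IGNORE = "regenerate, do not store"


def cache_policy(params: Mapping[str, str]) -> CachePolicy:
    """Map request parameters to the cache behaviour described above."""
    action = params.get("action", "view")
    if action == "submit":
        # Assumption: preview and save both arrive as action=submit and are
        # told apart by the name of the button that was pressed.
        if "wpPreview" in params:
            return CachePolicy.IGNORE
        if "wpSave" in params:
            return CachePolicy.UPDATE
        return CachePolicy.IGNORE
    if action == "view":
        # Plain display of the page: the cache should be active.
        return CachePolicy.USE
    # Any other action (edit form, history, ...) is treated conservatively.
    return CachePolicy.IGNORE
```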

Another problem is the following:

  • Even if the cache time has expired it might not be necessary to regenerate the DPL result, because the result logically depends on the existence and content of the data pages.

A third problem:

  • A user who changed one of the data pages might want to explicitly trigger a cache update. This means that he should be made aware of the connection between the DPL page and the data page he is editing.

Gero 19:39, 19 May 2009 (UTC)

Hi Gero, good that you added the cache levels, I was just going to ask you about the differences between your cache system and the parser cache. I suppose that when we enable the option "allowcachedresults" we're talking of the parser cache, right?
  • In relation to the cache pages being in the Template namespace, couldn't that be a potential problem? What if someone changes their contents? Then when the cache page is transcluded, it will display whatever was typed in there, and not necessarily the last result from the DPL query. Is it intended to be so? I'd suggest that the extension create a new namespace "DPLCache:" and store all cache pages there; it would also create an exclusive permission level so that only the extension would be able to change pages in this namespace. (Of course the access level could be tweaked by an admin via LocalSettings, but in theory the cache pages would be much more "protected" from accidental edits.)
  • About your question #1, sorry I can't help with the edit/preview/show issue; all I know is that the "action" parameter passed in the URL changes in each case.
  • I couldn't understand your question #2: how will the DPL code know if the data pages have changed or not, if it doesn't execute the query first to check for any changes?
  • For question #3, I think that your solution should accept the parameter "action=purge" in order to be compatible with the general wiki behaviour. So, if someone changed one of the data pages, if s/he goes to the results page and adds the "&action=purge" to the URL, somehow your extension would have to catch it and rerun the queries. Since the results are being transcluded from the cache page which is static, I'm not sure if this is viable.
Capmo 17:03, 20 May 2009 (UTC)
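
The "action=purge" suggestion could look roughly like the following sketch, again in Python rather than PHP. `handle_purge` and the `cache_pages` dict are hypothetical stand-ins for the extension code and the wiki database. The idea is simply to discard the cache page on a purge request, so that the next view finds no cache page and reruns the query.

```python
from typing import Dict, Mapping


def handle_purge(
    params: Mapping[str, str],    # request parameters, e.g. from the URL
    cache_pages: Dict[str, str],  # hypothetical: cache page name -> wikitext
    dplcache: str,                # name of the cache page for this DPL call
) -> bool:
    """Drop the cache page when the request carries action=purge."""
    if params.get("action") != "purge":
        return False
    # Removing the page forces a fresh query on the next view.
    cache_pages.pop(dplcache, None)
    return True
```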