Talk:Allowing true caching by DPL dev page

From FollowTheScore

status update?

I am quite interested in this concept, as we have a high-traffic wiki with over 70k pages and would like to use DPL on every page (several times over) to display news, images, etc. Caching and database load are significant issues for us.

Any chance there is a status update? Anything I can do to help?

Maybe there is a simpler (temporary) solution?

--Jb 03:38, 18 February 2009 (UTC)

Reply

As mentioned several times, the current lack of caching is the main obstacle to using DPL on really big sites. I would love to see a caching feature and I will support you with advice - but I cannot afford the time to create that feature myself.

DPL is technically very stable but of course it can slow down your response times significantly if used without care. I think in many cases it would be sufficient to have a nightly rebuild of pages containing DPL. A more sophisticated solution might want to analyse dependencies and work asynchronously in the background - but I think these two things can be separated.

My basic idea is as follows:

  1. we could add a time pattern to DPL syntax which specifies the point in time where DPL must recalculate its output in the normal way (this could be done in absolute time or relative time compared to the last processing)
  2. if a DPL statement contains such a pattern, DPL will check the time stamp of another page (which contains the result of the last DPL execution)
  3. if the result is still considered to be acceptable (according to the time pattern) DPL will just include the content of that page instead of its normal processing.
  4. if not, DPL will behave normally (i.e. calculate its output) AND in addition it will save its output (as wiki text) to the cache page.
  5. Ideally the cache page would have versioning disabled (otherwise its history would create useless amounts of data in the wiki database).
  6. I think it would be acceptable to require that a page using that cache mechanism must only have ONE DPL statement in its code - so the name of the original page plus some generated suffix could serve as a page name for the cache.
  7. deleting the cache file manually could be a simple way to enforce a rebuild if you explicitly want the rebuild to be done right after you made changes to dependent pages (of course this requires the special right to delete a page)
  8. The page containing the DPL statement could put its own revision number into the output it saves to the cache file. Thus it could know when the DPL statement itself has been changed since the last invocation. In this case it would be enough to make a "null edit" on the page with the DPL statement to invalidate the cache.
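
The steps above can be sketched in a few lines. This is only an illustration in Python, not DPL or MediaWiki code: the names `CacheEntry`, `cache_page_name`, `render_with_cache` and the "/DPLCache" suffix are hypothetical, the time pattern is simplified to a maximum age in seconds, and a plain dictionary stands in for the cache pages in the wiki.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    saved_at: float       # time stamp of the cache page (step 2)
    source_revision: int  # revision of the page holding the DPL statement (step 8)
    wikitext: str         # DPL output saved as wiki text (step 4)


def cache_page_name(page_title: str, suffix: str = "/DPLCache") -> str:
    # Step 6: original page name plus a generated suffix (suffix is an assumption).
    return page_title + suffix


def render_with_cache(
    page_title: str,
    page_revision: int,
    max_age_seconds: float,
    cache_pages: Dict[str, CacheEntry],
    run_dpl: Callable[[], str],
    now: Optional[float] = None,
) -> str:
    now = time.time() if now is None else now
    name = cache_page_name(page_title)
    entry = cache_pages.get(name)  # None if the cache page was deleted (step 7)

    fresh = (
        entry is not None
        and now - entry.saved_at <= max_age_seconds   # step 3: still acceptable
        and entry.source_revision == page_revision    # step 8: statement unchanged
    )
    if fresh:
        return entry.wikitext

    # Step 4: behave normally and save the output to the cache page.
    output = run_dpl()
    cache_pages[name] = CacheEntry(now, page_revision, output)
    return output
```

In this sketch, deleting the entry from `cache_pages` or passing a higher `page_revision` (as a null edit would produce) both force a rebuild on the next call.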

I know that the current implementation of DPL is not easy to understand - so I suggest the following implementation strategy:

  1. Build a new extension named "ExtensionCache"
  2. let this extension receive the time pattern for cache expiration
  3. pass the literal "DPL" (or the symbol under which DPL is registered as a parser function) to that extension as a second parameter
  4. pass the whole set of DPL parameters as a third parameter to ExtensionCache
  5. teach ExtensionCache to make a call-back to DPL in case it wants to rebuild the cached content - with DPL returning its output to "ExtensionCache"!
  6. let ExtensionCache include the cached content or the freshly generated content into the output and let it take care of saving freshly generated content to the cache page.
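
To make the call-back idea concrete, here is a rough Python sketch of the strategy. It does not use the MediaWiki parser API: `register_renderer`, the `ExtensionCache` class and its `render` method are hypothetical names, the time pattern is reduced to a maximum age in seconds, and an in-memory dictionary keyed by extension name and parameters stands in for the cache page.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Registry of extensions that can be called back, keyed by the name
# passed as the second parameter (e.g. "DPL"). Hypothetical mechanism.
RENDERERS: Dict[str, Callable[[str], str]] = {}


def register_renderer(name: str, callback: Callable[[str], str]) -> None:
    RENDERERS[name] = callback


class ExtensionCache:
    def __init__(self) -> None:
        # Stand-in for cache pages: (extension, params) -> (saved_at, output)
        self._store: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def render(
        self,
        max_age_seconds: float,   # first parameter: cache expiration
        extension_name: str,      # second parameter: which extension to call back
        params: str,              # third parameter: passed through unchanged
        now: Optional[float] = None,
    ) -> str:
        now = time.time() if now is None else now
        key = (extension_name, params)
        cached = self._store.get(key)
        if cached is not None and now - cached[0] <= max_age_seconds:
            return cached[1]      # include the cached content
        # Call back into the wrapped extension, which returns its output to us.
        output = RENDERERS[extension_name](params)
        self._store[key] = (now, output)  # save freshly generated content
        return output
```

The wrapped extension only needs to expose one function that takes its parameters and returns its output, so the caching logic can be developed and tested without touching DPL internals.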

Got the idea?

For testing one could even write a "dummyDPL" extension which returns a fixed string (maybe containing a time stamp) to demonstrate the mechanism.
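
A stand-in of that kind could be as small as the following Python sketch. The function name `dummy_dpl`, its output format and the injectable `clock` argument are assumptions made for illustration.

```python
import time
from typing import Callable


def dummy_dpl(params: str, clock: Callable[[], float] = time.time) -> str:
    # Return a fixed string plus the generation time, so a reader can
    # tell whether the text was freshly generated or served from a cache.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
    return "dummyDPL(%s) generated at %s UTC" % (params, stamp)
```

If a page shows the same time stamp on two consecutive views, the cached copy was used; a newer time stamp means the call-back was actually executed.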

If such a solution existed I would integrate the callback interface into DPL and write integration tests.

What do you think of this idea? Could you put some effort in that?

Gero 23:23, 22 February 2009 (UTC)