Issue: Caching
Description: DPL Pages Load Slowly
Extension / Version: DPL / ?
Type / Status: Change Request / open
Problem
Would like a way to cache pages. I was thinking of a bot that runs the DPL query and writes the static result into the actual article. Then you could schedule the bot to refresh the pages as often as necessary.
I am adding this as a feature request; I am not smart enough to write the bot.
postjl@milwaukee.k12.wi.us
Reply
Yes. This would be a really useful enhancement and I have thought about this several times. But I never found time to develop that feature. The point is, however, that your article might look quite ugly if the output of a DPL query were contained as wiki text. The best thing would be to have the expanded DPL query(queries) for an article in a page with identical name but in a separate namespace. Then DPL could look for that page, check its date (maybe even update it ad-hoc if it were expired) and include the result, possibly with a small note that the content was taken from a cache.
I would be willing to support someone with ideas and advice on the DPL source code if you could find someone to do the programming.
I am especially excited about this as it might make it possible to use DPL on really large sites like wikipedia...
Gero 16:58, 1 July 2008 (CEST)
Hmm... I think something like this already works (but with the downside of the DPL query output showing in the wiki text):
Suppose you want ArticleA to show cached results of {{#dpl:category=baz|someparam={{sometemplate}}}}.
Create Template:QueryA with this content: {{subst<includeonly></includeonly>:#dpl:category=baz|someparam={{subst<includeonly></includeonly>:sometemplate}}}}.
Whenever you want to update, write {{subst:QueryA}} into ArticleA.
--Rezyk 20:48, 3 July 2008 (CEST)
Also, I am presuming that "allowcachedresults=true" does not suit your purposes for some reason (either the pages used get updated too often, constantly outdating the automatic cache... or you want updates that will pick up newly categorized pages). If you haven't considered that option already, it should be tried. --Rezyk 20:59, 3 July 2008 (CEST)
ReReply
I would like to add my voice to this feature request; it would really change things for DPL. I'm currently moving all the data on my website, and since basically all the content of the website will rely on DPL, caching will be totally necessary. The allowcachedresults option is unfortunately a little bit too slow to update. I may start to look at it, but it will likely result in very dirty code. What are you thinking of? Creating new tables in the SQL database and storing the final output in them, only modifying it when we detect that pages of a given category are being added or removed? --EmuWikiAdmin1 July 22nd 2008
Comment
I wonder what the structure of such tables should look like ... The output of DPL is highly dynamic (hence the name) and the template author has lots of options to influence layout etc.
The only way I could think of is to store the wiki code produced by a DPL statement as a blob in a table. To make sure that the code becomes invalidated whenever the source of your document or one of the templates (potentially) used within the DPL statement changes you would have to attach an array of the revisions of all documents on which the DPL result depends.
You might be able to create a new extension which can wrap some other extension (like for instance DPL) by creating that dependency list and storing the wikicode result of the embedded extension. Designing the whole thing this way would allow a high degree of decoupling. The "umbrella extension" would probably have to provide an API which has to be used by the "inner" extension. The API would be called by the inner extension (say DPL) just before it returns. It would only contain the generated code and the dependency list (document names would be enough as we could assume that the most recent version of each document has just been used).
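The umbrella API described above can be sketched in a few lines. This is only an illustration of the idea, not real MediaWiki code: the names `CacheStore`, `register_result`, and `lookup` are invented for this sketch, and the dependency list is just the document names the inner extension reports, as suggested above.

```python
import time


class CacheStore:
    """Hypothetical umbrella cache for wiki code produced by an inner
    extension such as DPL. The inner extension calls register_result()
    just before it returns, handing over the generated wiki code and
    the names of the documents the result depends on."""

    def __init__(self):
        # key -> (wiki_code, dependency document names, store timestamp)
        self._entries = {}

    def register_result(self, key, wiki_code, dependencies):
        # We can assume the most recent revision of each named document
        # was just used, so storing the names is enough at this point.
        self._entries[key] = (wiki_code, list(dependencies), time.time())

    def lookup(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[0]


store = CacheStore()
store.register_result("ArticleA/query1",
                      "* [[Afoo]]\n* [[Bfoo]]",
                      ["Afoo", "Bfoo", "Template:SomeTemplate"])
print(store.lookup("ArticleA/query1"))
```

The key point of the design is the decoupling: the umbrella never parses the query itself, it only stores opaque wiki code plus a dependency list supplied by the inner extension.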
I hope I made my thoughts clear enough?
The main problem I see is how to make sure that you do not MISS documents which were added after the query was last run and which would occur in the result set if you re-executed the query now... Say you look for all documents belonging to category X and ending with "foo". You execute the query and get a list like {Afoo, Bfoo, Cfoo}. You store the resulting wiki code and show the result. The next time somebody calls the document with the above query, you would have to make sure that no document ending in "foo" has meanwhile been added to category X. Finding this out requires you to run the query! So there is no gain in using a cache.
What would really be needed is something like "rerun that query whenever a document is added to or deleted from category X". I think it would be enough to rerun the query regardless of additional constraints like the name ending in "foo". The point is that the umbrella extension would have to understand something like "this cached DPL result depends on all documents belonging to category X". A background task could then refresh the contents after adding documents to cat X (or removing documents from cat X). The pity is that DPL allows you to use many other query conditions besides categories, and I can't see a generic way to catch all these parameters and conditions.
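The category-based invalidation rule described here is essentially a reverse index from categories to cached queries. A minimal sketch (function and variable names are invented; in a real implementation this would hang off MediaWiki's save hooks):

```python
# Reverse index: category name -> set of cached query keys depending on it.
deps_by_category = {}


def register_dependency(category, query_key):
    """Record that a cached query result depends on a category's membership."""
    deps_by_category.setdefault(category, set()).add(query_key)


def on_category_change(category, invalidated):
    """Called whenever a page is added to or removed from `category`.
    Every cached query that declared a dependency on it becomes stale,
    regardless of extra constraints like name patterns."""
    for key in deps_by_category.get(category, ()):
        invalidated.add(key)


register_dependency("X", "ArticleA/query1")

stale = set()
on_category_change("X", stale)  # a page was added to category X
on_category_change("Y", stale)  # no cached query depends on Y
print(stale)                    # {'ArticleA/query1'}
```

A background task could then drain the `stale` set and rerun only those queries, which is the "rerun on membership change" behaviour the comment asks for.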
On the other hand you could deliver a cached result and offer a "refresh" button to the reader in case he suspects that the result might be outdated. But how should (s)he know whether the result might be outdated?
So many questions ...
Gero 17:17, 23 July 2008 (CEST)
Reply to comment
Yeah, I totally got your idea; this is exactly where I was going. As a matter of fact I had already concluded that it would be difficult to wrap all the functionality of DPL, and I already knew my first attempt would be to allow caching only for category-selected pages (sorry for the others...).
I don't see the problem that you state at the beginning of your comment as being so big: as you say, the table would contain straight wikitext in one field, plus probably 2 or 3 other fields that would allow us to identify which DPL invocation the cache belongs to, and all the category-related information so we know which categories this DPL invocation depends on.
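A cache table along those lines might look like the following. This is a sketch using SQLite for illustration only; the table and column names are invented, not DPL's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dpl_cache (
        invocation_hash TEXT PRIMARY KEY,  -- identifies the DPL invocation
        page_title      TEXT NOT NULL,     -- article containing the call
        categories      TEXT NOT NULL,     -- categories the query selects on
        wiki_text       TEXT NOT NULL,     -- cached query output (the blob)
        updated_at      INTEGER NOT NULL   -- unix timestamp of last refresh
    )
""")

# Store one cached result and read it back.
conn.execute(
    "INSERT INTO dpl_cache VALUES (?, ?, ?, ?, ?)",
    ("abc123", "ArticleA", "baz", "* [[Afoo]]", 1216684800),
)
row = conn.execute(
    "SELECT wiki_text FROM dpl_cache WHERE invocation_hash = ?",
    ("abc123",),
).fetchone()
print(row[0])
```

The `categories` column is what the save hooks would match against to find which cached rows to invalidate when a category's membership changes.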
I don't understand what you say about storing revisions. Why would we need to store all revisions of an article? All we need to do is hook into ArticleSave and similar events so that whenever an article of a category is touched (edited, created, no matter what), we rerun the DPL query and update the content in the cache.
Actually, the more I look at it, the more I think it will be pretty easy.
I really like your idea of an umbrella extension that would allow storing anything... However, I'm really, really not good at programming and this may be beyond my scope. But when you think about it, it's doable: an extension that says "give me any content, I'll store it; give me the update rules, and we're done!"
Basically, I want to transfer the computational burden to when people save articles rather than when they view it.
Reply
Indeed, you got my ideas. The reason I mentioned revisions was the following: within a DPL statement you can call another template, and if the version of that template changes, the query must also be rerun, because the DPL statement is likely to produce a different result based on the modified sub-template...
I think the umbrella solution could also do this: it would have to take the following parameters:
- the expanded wiki code (based on the up-to-date situation)
- a list of categories which will be considered to have an effect on that wiki code (i.e. the umbrella application must update its cache if there was a change in membership within at least one of these categories since the wiki code was produced)
- a list of revisions of arbitrary wiki documents which are considered to invalidate the cache contents (regardless of their membership within any categories).
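The three parameters above can be combined into a single staleness check. This is a sketch under the assumptions stated in the list: integer revision ids stand in for MediaWiki revisions, and a cache entry remembers both the category membership and the document revisions it was built from.

```python
def is_stale(entry, current_category_members, current_revisions):
    """Check the two dependency rules from the list above.
    entry["categories"]: category name -> member set at cache time
    entry["revisions"]:  document name -> revision id at cache time
    """
    # Rule: membership changed in any watched category?
    for cat, members_then in entry["categories"].items():
        if current_category_members.get(cat, set()) != members_then:
            return True
    # Rule: any watched document has a newer revision?
    for doc, rev_then in entry["revisions"].items():
        if current_revisions.get(doc) != rev_then:
            return True
    return False


entry = {
    "categories": {"X": {"Afoo", "Bfoo"}},
    "revisions": {"Template:Foo": 10},
}

# Nothing changed: the cached wiki code may be served.
print(is_stale(entry, {"X": {"Afoo", "Bfoo"}}, {"Template:Foo": 10}))        # False
# A page joined category X: the cache must be rebuilt.
print(is_stale(entry, {"X": {"Afoo", "Bfoo", "Cfoo"}}, {"Template:Foo": 10}))  # True
```

Note the category check only compares membership sets, deliberately ignoring extra query constraints such as the "ends with foo" pattern, exactly as proposed earlier in the thread.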
The update strategy of the umbrella extension (let's call it "make" or "ant") should be configurable:
- it could update the cache whenever it detects a violation of one of the above dependency rules
- it could collect such violations over some configurable amount of time and then do the update. This would be useful because sometimes many similar changes might be made within a short time period (e.g. removing 10 documents from a category and putting them into a different one). It would avoid unnecessary updates in such cases.
- it could do it on demand (i.e. if the page containing the cached statement is to be displayed)
- it could do it on explicit demand (leaving the update strategy to another component)
- it might even decide to show the old contents and offer an explicit "refresh" button to the user if it knows that a dependency is violated. This could be useful in special situations where the user knows what he is doing (and this behaviour should not be the default case).
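The strategy list above can be expressed as one small policy function. The strategy names and the `window` parameter are invented for this sketch; "debounce" plays the role of collecting violations over a configurable time period before updating.

```python
import time


def refresh_policy(strategy, *, stale, last_violation_at=None, window=300,
                   explicit_request=False, now=None):
    """Decide whether the cache should be refreshed right now.
    Strategies mirror the list above:
      'immediate'  - refresh as soon as a dependency rule is violated
      'debounce'   - wait until changes have settled for `window` seconds,
                     avoiding rework during bursts (e.g. recategorizing
                     10 documents in a row)
      'on_view'    - refresh when the page is displayed
      'on_demand'  - refresh only on an explicit request (refresh button)
    """
    now = time.time() if now is None else now
    if not stale:
        return False
    if strategy == "immediate":
        return True
    if strategy == "debounce":
        return last_violation_at is not None and now - last_violation_at >= window
    if strategy in ("on_view", "on_demand"):
        return explicit_request
    raise ValueError(f"unknown strategy: {strategy}")


print(refresh_policy("immediate", stale=True))                               # True
print(refresh_policy("debounce", stale=True, last_violation_at=0, now=100))  # False
print(refresh_policy("debounce", stale=True, last_violation_at=0, now=400))  # True
```

Keeping the policy separate from the staleness detection is what makes the update strategy configurable per wiki, as the list suggests.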
After all I think it can be done, and I really would very much like to see it done in a separate extension. I could help with advice, reviews, discussions etc., but currently I do not have time to do the implementation.
Gero 12:54, 24 July 2008 (CEST)
Starting this
Ok, I have already started working on this. The cache system is done; I chose to work directly in the DPL code because, as I said previously, I just don't have enough knowledge to create a complete umbrella extension. I decided to start a page on your wiki to discuss the caching system instead of coming here in the bug section. I need your help on one point. The cache system is great right now and the speed improvement is wonderful; all that's left to do is the update rules. Here is the new Allowing true caching by DPL dev page.
EmuWikiAdmin1 9:33, July 26th 2008 (CEST)