Issue:Caching

Description: DPL Pages Load Slowly
Extension / Version: DPL / ?
Type / Status: Change Request / open

Problem

I would like a way to cache pages. I was thinking of a bot that runs the DPL query and writes the static result into the actual article. You could then schedule the bot to refresh the pages as often as necessary.

I am adding this as a feature request. I am not smart enough to write the bot myself.

postjl@milwaukee.k12.wi.us

Reply

Yes, this would be a really useful enhancement, and I have thought about it several times, but I never found time to develop the feature. The point is, however, that your article might look quite ugly if the output of a DPL query were contained in it as wiki text. The best approach would be to keep the expanded DPL query (or queries) for an article in a page with an identical name but in a separate namespace. DPL could then look for that page, check its date (maybe even update it ad hoc if it had expired), and include the result, possibly with a small note that the content was taken from a cache.

I would be willing to support someone with ideas and advice on the DPL source code if you could find someone to do the programming.

I am especially excited about this, as it might make it possible to use DPL on really large sites like Wikipedia...

Gero 16:58, 1 July 2008 (CEST)

Hmm... I think something like this already works (though with the downside of the DPL query output showing up in the wiki text):

Suppose you want ArticleA to show cached results of {{#dpl:category=baz|someparam={{sometemplate}}}}.

Create Template:QueryA with this content: {{subst<includeonly></includeonly>:#dpl:category=baz|someparam={{subst<includeonly></includeonly>:sometemplate}}}}.

Whenever you want to update, write {{subst:QueryA}} into ArticleA.

--Rezyk 20:48, 3 July 2008 (CEST)

Also... I am presuming that "allowcachedresults=true" does not suit your purposes for some reason (either the pages used get updated too often, constantly outdating the automatic cache, or you want updates that will pick up newly categorized pages). If you haven't considered that option already, it should be tried. --Rezyk 20:59, 3 July 2008 (CEST)


ReReply

I would like to add my voice to this feature request; it would really change things for DPL. I'm currently moving all the data on my website, and basically all of its content will rely on DPL, so caching will be absolutely necessary. The allowcachedresults option is unfortunately a little too slow to update. I may start to look at it, but it will likely result in very dirty code. What are you thinking of? Creating new tables in the SQL database and storing the final output in them, modifying them only when we detect that pages of a given category are added or removed? --EmuWikiAdmin1 July 22nd 2008
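
To make the table idea concrete, here is a minimal sketch of a possible schema and write path using MediaWiki's database layer. The dpl_cache table and all dc_* column names are invented for illustration; only wfGetDB(), Database::replace() and wfTimestampNow() are existing MediaWiki calls.

 <?php
 // Hypothetical schema (illustrative only, not part of DPL):
 //   dpl_cache ( dc_hash       VARCHAR(32) PRIMARY KEY,  -- md5 of the DPL invocation text
 //               dc_page       INT,                      -- page the invocation lives on
 //               dc_categories BLOB,                     -- serialized list of categories the query selects on
 //               dc_output     MEDIUMBLOB,               -- cached wikitext produced by DPL
 //               dc_touched    BINARY(14) )              -- timestamp of the last refresh

 // Store (or overwrite) the cached output of one DPL invocation.
 function dplCacheStore( $invocation, $pageId, $categories, $output ) {
     $dbw = wfGetDB( DB_MASTER );
     $dbw->replace( 'dpl_cache', array( 'dc_hash' ),
         array(
             'dc_hash'       => md5( $invocation ),
             'dc_page'       => $pageId,
             'dc_categories' => serialize( $categories ),
             'dc_output'     => $output,
             'dc_touched'    => wfTimestampNow(),
         )
     );
 }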

Comment

I wonder what the structure of such tables should look like... The output of DPL is highly dynamic (hence the name), and the template author has lots of options to influence layout etc.

The only way I can think of is to store the wiki code produced by a DPL statement as a blob in a table. To make sure that the code is invalidated whenever the source of your document, or one of the templates (potentially) used within the DPL statement, changes, you would have to attach an array of the revisions of all documents on which the DPL result depends.
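
As a sketch, such a validity check could look like this, assuming the cache row carries a serialized map of page title => revision id in a hypothetical dc_dependencies field (Title::getLatestRevID() is a real MediaWiki method):

 <?php
 // Check whether a cached blob is still valid (field names are hypothetical).
 function dplCacheIsFresh( $row ) {
     $revisions = unserialize( $row->dc_dependencies );
     foreach ( $revisions as $titleText => $revId ) {
         $title = Title::newFromText( $titleText );
         // A deleted page or a newer revision invalidates the cached blob.
         if ( !$title || $title->getLatestRevID() != $revId ) {
             return false;
         }
     }
     return true;
 }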

You might be able to create a new extension which can wrap some other extension (like, for instance, DPL) by creating that dependency list and storing the wikicode result of the embedded extension. Designing the whole thing this way would allow a high degree of decoupling. The "umbrella extension" would have to provide an API to be used by the "inner" extension; the inner extension (say, DPL) would call the API just before it returns, passing only the generated code and the dependency list (document names would be enough, as we could assume that the most recent version of each document has just been used).
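
A sketch of what that API might look like; wfCacheExtensionResult() and the ext_result_cache table are invented names, and the inner extension (say, DPL) would call it just before returning:

 <?php
 // Umbrella API: stores the generated wikitext together with the current
 // revision of every page it depends on (cf. dplCacheIsFresh above).
 function wfCacheExtensionResult( $cacheKey, $wikitext, $dependencies ) {
     $revisions = array();
     foreach ( $dependencies as $titleText ) {
         $title = Title::newFromText( $titleText );
         if ( $title ) {
             // The most recent revision has just been used, so record it.
             $revisions[$titleText] = $title->getLatestRevID();
         }
     }
     $dbw = wfGetDB( DB_MASTER );
     $dbw->replace( 'ext_result_cache', array( 'erc_key' ),
         array(
             'erc_key'          => $cacheKey,
             'erc_output'       => $wikitext,
             'erc_dependencies' => serialize( $revisions ),
         )
     );
 }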

I hope I have made my thoughts clear enough.

The main problem I see is how to make sure that you do not MISS documents which were added since the query was last run and which would occur in the result set if you re-executed the query now. Say you look for all documents belonging to category X and ending with "foo". You execute the query and get a list like {Afoo, Bfoo, Cfoo}. You store the resulting wiki code and show the result. The next time somebody calls the document with the above query, you would have to make sure that no document ending in "foo" has meanwhile been added to category X. Finding this out requires you to run the query! So there is no gain in using a cache.

What would really be needed is something like "rerun that query whenever a document is added to category X or deleted from category X". I think it would be enough to rerun the query regardless of additional constraints like the name ending in "foo". The point is that the umbrella extension would have to understand something like "this cached DPL result depends on all documents belonging to category X". A background task could then refresh the contents after documents are added to (or removed from) cat X. The pity is that DPL lets you use many other query conditions besides categories, and I can't see a generic way to catch all these parameters and conditions.
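
That dependency rule could be wired up roughly as follows, using the real ArticleSaveComplete hook; the dpl_cache table is the hypothetical one sketched above, and the stored category names are assumed to use the same "Category:X" format that Title::getParentCategories() returns:

 <?php
 $wgHooks['ArticleSaveComplete'][] = 'dplCacheOnSave';

 // Invalidate every cached DPL result that selects on a category of the
 // page that was just saved. A real implementation would index the cache
 // table by category instead of scanning it.
 function dplCacheOnSave( &$article, &$user, $text ) {
     $categories = array_keys( $article->getTitle()->getParentCategories() );
     if ( $categories ) {
         $dbw = wfGetDB( DB_MASTER );
         $res = $dbw->select( 'dpl_cache', array( 'dc_hash', 'dc_categories' ) );
         while ( $row = $dbw->fetchObject( $res ) ) {
             if ( array_intersect( $categories, unserialize( $row->dc_categories ) ) ) {
                 // Drop the entry; a background task (or the next view)
                 // could re-run the query and refill it instead.
                 $dbw->delete( 'dpl_cache', array( 'dc_hash' => $row->dc_hash ) );
             }
         }
     }
     return true;
 }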

On the other hand, you could deliver a cached result and offer a "refresh" button to the reader in case he suspects that the result might be outdated. But how should he or she know if the result might be outdated?

So many questions ...

Gero 17:17, 23 July 2008 (CEST)

Reply to comment

Yeah, I totally got your idea; this is exactly where I was going. As a matter of fact, I had already concluded that it would be difficult to wrap all the functionality of DPL, and I already knew my first attempt would be to allow caching only for category-selected pages (sorry for the others...).

I don't see the problem that you state at the beginning of your comment as being so big: like you say, the table would contain straight wikitext in one field, plus probably two or three other fields that would let us identify which DPL invocation the cache belongs to, along with the category-related information so we know which categories the DPL invocation selects on.

I don't understand what you say about storing revisions. Why would we need to store all revisions of an article? All we need to do is hook into ArticleSave and similar events so that whenever an article of a category is touched (edited, created, no matter what), we rerun the DPL query and update the content in the cache.

Actually, the more I look at it, the more I think it will be pretty easy.

I really like your idea of an umbrella extension that would allow storing anything... However, I'm really reallllly not good at programming, and this may be out of my scope. But yeah, when you think about it, it's doable: an extension that says, give me any content, I'll store it; give me the update rules, and we're done!

Basically, I want to transfer the computational burden to when people save articles rather than when they view them.
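
Under that design the read path becomes trivial; a sketch, again against the hypothetical dpl_cache table from above:

 <?php
 // At view time DPL only fetches the precomputed blob; the query itself
 // already ran when some page in the category was saved.
 function dplCacheFetch( $invocation ) {
     $dbr = wfGetDB( DB_SLAVE );
     $row = $dbr->selectRow( 'dpl_cache', 'dc_output',
         array( 'dc_hash' => md5( $invocation ) ) );
     // On a miss the caller falls back to running the query and priming the cache.
     return $row ? $row->dc_output : false;
 }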