Talk:Cache API


First Feedback

Hello Jean-Francois,

I decided to call your API immediately before executing my own SQL, that is near line 2418 in DPLMain.php. I used the following code to call your interface:


		// update dependencies to CacheAPI whenever the page containing the DPL query is edited
		
		if ($wgRequest->getVal('action','view')=='submit') {
			CacheAPI::remDependencies ( $wgArticle->getID()); 
			foreach ($aIncludeCategories as $categoryGroup) {
				foreach ($categoryGroup as $category) {
					$title = Title::makeTitle(14, $category);
					$catID = $title->getArticleID();
					CacheAPI::addDependencies ( $wgArticle->getID(), 1, $catID, ''); 
					// die ("adding to DEPENDENCIES: ".$title->getArticleID());
				}
			}
		}
  • you must add a "global $wgArticle;" somewhere above this code.
Ok, tried it. Your version didn't work because you were using numbers (page ids) for categories, whereas I was using text. See discussion below.
  • Is it correct that you expect the pageID of the page containing the DPL statement as the first argument?
Yes.
  • Would it make sense to add a second index to your table to speed up the search when dependant articles are changed?
Yes, all columns will be indexed. The SQL query makes them all indexes. And since we're shifting to category IDs (see below), we will be able to index the column in which we search.
  • What is the & separator good for? I have groups of OR-wired cats. All groups are AND-wired....
Aren't we forced to store the AND relations between the categories? For example, when a DPL invocation looks for articles in categories AA & BB, we don't want to outdate the cache of the DPL page if an article that is only in category AA is touched. We want it to be purged only if an article in AA _AND_ BB is touched. The way you used addDependencies should be used only for the OR relation, that is, to add separate entries for separate categories. When the AND relation is present (specified by the user or implicit), you should pass an array of category ids as the 3rd argument (see below).
  • I would like to use symbolic constants instead of pure numbers for the types (1,2,3)
Ok. In the next version, we will use CACHETYPE_CATEGORY, CACHETYPE_TEMPLATE and CACHETYPE_LINKSTO.
  • I had to delete the final php delimiter at the end of your source code because there seems to be some invisible UTF code after it.
No idea how it happened! Probably my text editor/uploader.
  • I created a small article containing the following query:
         {{#time:Y-m-d h:i:s}}
         {{#dpl:
         | category = Test¦Fictitious country
         }}
  • I used the following parameter in the LocalSettings.php (because this would be a typical configuration for a huge wiki I guess):
 ExtDynamicPageList::$respectParserCache = true;
  • Instead of doing this I could have written in the DPL query
| allowcachedresults=true
  • The #time statement is very useful - so you can see that the time does NOT change if you reload the query page multiple times.
  • After editing the query page I saw two entries in the new table, correctly pointing to the two categories (Test , Fictitious country)
The only reason it worked for you is that all your categories are true pages that are in fact created. This is not the case for all wikis and probably not for Wikipedia. We have to find another way to work with categories than the makeTitle strategy using page numbers to identify them. See discussion below.
  • Multiple reloading of the query page showed the same date/time each time (which is correct because the ParserCache is enabled)
  • Then, in a different browser window, I modified Nigunda Test, which is one of the articles occurring in the query result.
I will send you a modified version of both my extension and your DPLMain.php that work.
  • I expected the cache now to be invalidated. So when I pressed F5 on the window with the query page I should have gotten a new time stamp - but this did not happen.
  • I have no idea how the ParserCache works but the whole idea is:
    1. use the ParserCache - even for pages containing DPL queries (by default DPL switches the cache off, so one must pay attention here)
    2. change a page contained in the result (regardless what kind of change, I assume at the moment)
    3. refresh the DPL page
    4. ... and watch that the DPL page is no longer delivered from cache but recalculated.

I think it is best if you set up a similar configuration (article names and cat names may differ) and then try to make the simple example with one or two categories work on your pc. If you have to change the lines in DPLMain.php which I used to call your API: go ahead and tell me what you have done.


Note: I observed that your table changes also if a dependant page is being edited. In my case another entry was added for 'Nigunda Test'. I do not understand why this is necessary and I am afraid it will slow down regular editing.

This is weird. Was there a DPL invocation in Nigunda Test ? I've never seen this behavior. We'll look at this when you try the corrected versions.


Happy testing!

Gero


I'll take the time to review each of your remarks, thanks a lot for testing. However, I think there's an error in the way you use addDependencies, and I'd like you to check this. The third argument seems to be the problem: I was expecting a string containing the conditional statement with '&' as separators. For example, if you want page 605 (which contains the DPL statement) to be dependent on the editing of articles that are in categories AA & BB, I would expect a call like this:

CacheAPI::addDependencies ( $wgArticle->getID(), 1, 'AA&BB', '');

or if it's just dependent on 1 category like AA :

CacheAPI::addDependencies ( $wgArticle->getID(), 1, 'AA', ''); - outdated, the 3rd argument will now be an array of category numbers


Actually the way you did it might just be better, I might switch to using page numbers instead of the names of the categories... I don't know what happens in MediaWiki when category articles are deleted and recreated. I'll have to look at this; we don't want category numbers stored in the CacheAPI to become meaningless whenever a category is changed, or deleted and then recreated. For this reason I used the category names. I'd still like to see if it works for your test if you call addDependencies as described above. EmuWikiAdmin- 02:14, 1 July 2009 (UTC) - outdated, now using cat IDs

I just tested it. Did you know that Title::makeTitle(14, $category) does not work if the category page has not been created? This is because a category can exist without its category page having been made. That's a big problem because lots of categories are likely to exist without a category page. Maybe we should go back to storing the dependencies in the form of strings like AA&BB, or do you see another possibility? Maybe we should work with cat_ids instead of page_ids. Look at the Category table: it might be easier to index, it would speed up the search, and it wouldn't have the same problems that makeTitle has. EmuWikiAdmin- 04:43, 1 July 2009 (UTC)

Here's a working version of the Cache API : http://www.emuwiki.com/testinstall/extensions/CacheAPI_current.txt . The difference with this version is that CacheAPI::addDependencies now takes an array of category IDs as 3rd argument. Here's what you will need to use in DPLMain.php :


		// update dependencies to CacheAPI whenever the page containing the DPL query is edited
		
		if ($wgRequest->getVal('action','view')=='submit') {
			CacheAPI::remDependencies ( $wgArticle->getID()); 
			$categorylist = array();
			foreach ($aIncludeCategories as $categorygroup) {
				foreach ($categorygroup as $category) {
					$catobj = Category::newFromName( $category );
					array_push ( $categorylist , $catobj->getID() );
				}
			}
			CacheAPI::addDependencies ( $wgArticle->getID(), 1, $categorylist, ''); 
		}

I'm not sure I totally understood the structure of $aIncludeCategories, so I just included everything as if it were all AND-related. Maybe I'm wrong.

EmuWikiAdmin- 04:43, 1 July 2009 (UTC)


You can try for yourself: it works if you test it the way you tested before (don't try linksto and template, but category works). Right now I need no more input from you. I'll go on and improve the system; since I know where you call the API in DPL, I might also change DPLMain.php itself if I have problems. I'll be back with more functionality and a better manual to document it. Thanks for your test! EmuWikiAdmin- 04:53, 1 July 2009 (UTC)


Next iteration

Fine, now it works for the simple example!

Some tests I made showed the following behaviour of MW:

As soon as you use a category, MW will create an entry in its category table - regardless of whether there is a page for this category or not. The entry in the category table will even survive if you delete the last article of that category. In fact the category table of dpldemo contains several such zombie entries. So the cat_id is a stable identifier.

For DPL it would be _easier_ to pass literals for the categories. Whether the CacheAPI internally uses cat-Ids or literals could be left to the CacheAPI. (Do you get the cat-IDs when you are called back after editing?)


Regarding dependency checking there is no real difference between "and" and "or" for categories - as long as the CacheAPI is not very bright. From a logical point of view passing ANDed groups of OR-wired categories leaves all doors open. For the beginning the CacheAPI could create a single dependency to each of the categories, however.

Example: conditions are

A|B
C|D|E

You have three pages

(1) A,C,D
(2) A,B,E
(3) A,B,C,D,E

Invalidation of the cache is NOT necessary if a page's categories change to the following (page 4 being a new page):

(1) A,B,C,D
(1) B,C
(1) B,D
(2) A,C
(3) B,D
(4) A
(4) C,D,E

and so on. So there are many constellations in which we could keep the cache. Case (4) may be of special interest - a new page will only invalidate the query if it belongs to at least one cat of both groups.
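The rule in this example can be sketched roughly as follows. This is only an illustration, not code from DPL or the CacheAPI: the function names `matchesAllGroups` and `needsInvalidation` are hypothetical, and the sketch assumes the query result depends only on whether a page belongs to the result set (it ignores queries that also display page contents).

```php
<?php
// Hypothetical helpers, not part of DPL or the CacheAPI.

// $groups: outer array is AND-wired, each inner array holds OR-wired category names.
// $pageCategories: the categories a page currently belongs to.
function matchesAllGroups(array $groups, array $pageCategories): bool
{
    foreach ($groups as $group) {
        // The page must share at least one category with every group.
        if (count(array_intersect($group, $pageCategories)) === 0) {
            return false;
        }
    }
    return true;
}

// Assumption: the cache only needs to go when a page enters or leaves the result set.
// For a new page, pass an empty array as $before.
function needsInvalidation(array $groups, array $before, array $after): bool
{
    return matchesAllGroups($groups, $before) !== matchesAllGroups($groups, $after);
}

$groups = [['A', 'B'], ['C', 'D', 'E']];

var_dump(needsInvalidation($groups, ['A', 'C', 'D'], ['B', 'C'])); // page (1) stays in the result: bool(false)
var_dump(needsInvalidation($groups, [], ['A']));                   // new page (4) with only A: bool(false)
var_dump(needsInvalidation($groups, [], ['A', 'C']));              // new page in both groups: bool(true)
```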

So I suggest passing an array of arrays; the inner elements of the outer array are OR-wired literals of cat names; but let us not get too ambitious in the first release...


There is another point: DPL allows a lot of things which cannot be caught with a simple approach like "template", "category" or "link" (think of regular expression, text paragraph inclusions etc.). For ALL these cases we could invent a fourth type named "GENERAL" dependency. Setting this dependency would mean that the cache of the DPL page is invalidated whenever a dependant page is changed (regardless what is changed on that page).

If I move the call of your API to a place AFTER my own query has been executed I will be able to pass an array of page_ids which are considered to influence the DPL result - just because they show up in the result.


A last point:

I should call your CacheAPI only if the DPL page is subject to the MW ParserCache. If the DPL page is already dynamic by itself we can avoid useless calls. So I will add

       if ($bAllowCachedResults) {
          ...
       }

to the call of the CacheAPI.



An error which I noticed when playing around:

from within function "Database::insert". MySQL returned error "1048: Column 'first' cannot be null (localhost)".
Retrieved from "http://gs-nb/dpldemo/index.php/Nigunda_Test"

It occurred when I edited two of the four pages which were part of the DPL result ('Somango' and 'Nigunda Test').


P.S.:

I still don't understand why an entry shows up in the dependencies table for "Nigunda Test" in my example. I see no need to add entries to the table when dependant pages are changed ...

-- Good luck !

Gero 16:12, 1 July 2009 (UTC)

Next iteration reply

Concerning "and" and "or", the CacheAPI will be intelligent and it will keep the cache, considering all the possible factors. For now it clears the cache only if the edited article is a member of categories A & B & C & etc. This is already an implementation of the AND relation. I'll be working on making it even more picky by adding the possibility of NOT rules (for example, outdate the cache when an article in categories A & B & C but not in category D is edited). I don't know if it's the case for Wikipedia, but for EmuWiki.com it would be useless to have a cache system that adds each AND-linked category as an independent dependency. For example, this DPL invocation lists all emulators that are in categories EMULATORS & NES & PC (DOS/Windows) (Host)... but I have about 4000 other pages listing other emulators; for example, this page lists all emulators that are in categories EMULATORS & Amstrad CPC & PC (DOS/Windows) (Host). If we say this page is dependent on any change to EMULATORS, on any change to Amstrad CPC, and on any change to PC (DOS/Windows) (Host), then every time an emulator is added, all my 4000 pages will get purged just because they are marked as dependent on the EMULATORS category. That's why it is important for me to make an intelligent cache system, not one that purges too aggressively. I will do it such that ANY function of DPL has its own representation in the cache.
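As a rough sketch of the AND rule with an optional NOT list described above (the function name `shouldPurge` and its parameters are made up for illustration and are not the CacheAPI's actual interface; categories are compared by name here, while the real table may store cat IDs):

```php
<?php
// Hypothetical check, not the CacheAPI implementation.
// $required: categories the edited article must ALL belong to (AND relation).
// $excluded: categories the edited article must NOT belong to (NOT rule).
function shouldPurge(array $required, array $excluded, array $articleCategories): bool
{
    // Every required category is present when nothing is left after the difference.
    $hasAllRequired = count(array_diff($required, $articleCategories)) === 0;
    // No excluded category may be present.
    $hasNoExcluded = count(array_intersect($excluded, $articleCategories)) === 0;
    return $hasAllRequired && $hasNoExcluded;
}

$nesPageDependency = ['EMULATORS', 'NES', 'PC (DOS/Windows) (Host)'];

// An Amstrad CPC emulator is edited: the NES listing keeps its cache.
var_dump(shouldPurge(
    $nesPageDependency,
    [],
    ['EMULATORS', 'Amstrad CPC', 'PC (DOS/Windows) (Host)']
)); // bool(false)

// A NES emulator is edited: the NES listing is purged.
var_dump(shouldPurge(
    $nesPageDependency,
    [],
    ['EMULATORS', 'NES', 'PC (DOS/Windows) (Host)']
)); // bool(true)
```

With one independent dependency per category, both edits would have purged the NES listing, which is the aggressive behaviour to avoid.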

If I move the call of your API to a place AFTER my own query has been executed I will be able to pass an array of page_ids which are considered to influence the DPL result - just because they show up in the result.

This is a possible workaround for some special cases, but we should not rely on it too strongly. The problem is that it allows updating the cache only when existing articles are edited, but not when new articles that would fit in the result set are added.
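The limitation can be illustrated with a small sketch. It assumes a hypothetical stored list of result page_ids per DPL page; the function name `isResultDependencyHit` and the IDs are invented for the example.

```php
<?php
// Hypothetical check: is the edited page one of the pages recorded
// in the DPL result when the query page was last rendered?
function isResultDependencyHit(array $resultPageIds, int $editedPageId): bool
{
    return in_array($editedPageId, $resultPageIds, true);
}

// page_ids that showed up in the result at render time (made-up values)
$resultPageIds = [101, 102, 103];

var_dump(isResultDependencyHit($resultPageIds, 102)); // existing result page edited: bool(true)
var_dump(isResultDependencyHit($resultPageIds, 250)); // newly created page: bool(false)
```

The second call shows the gap: a new page that would match the query is not in the stored list, so the cache would not be outdated.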

There is another point: DPL allows a lot of things which cannot be caught with a simple approach like "template", "category" or "link"

I'll be working on this. I want almost any feature of DPL to be covered by the cache API.

I should call your CacheAPI only if the DPL page is subject to the MW ParserCache. If the DPL page is already dynamic by itself we can avoid useless calls. So I will add

I agree

An error which I noticed when playing around: For some reason I don't understand yet, when you play with Nigunda_Test, it calls the function addDependencies. This is weird because at first glance Nigunda Test does not seem to contain a DPL call. Anyway, this is also partly due to improper input checking by addDependencies: when the condition statement is empty, I should just quit the function without doing anything. That's the kind of thing I'll be improving. Eventually we'll have to understand why Nigunda Test calls addDependencies; this is something I don't get on my test server.

I'll be back in a couple of days with an improved cache API. EmuWikiAdmin- 17:17, 1 July 2009 (UTC)

Progress?

I don't want to press - but are you making some progress? Is there a preliminary version I could try to integrate? Gero 04:56, 16 July 2009 (UTC)

No, it's great that you are pressing, I need to kick my own ass and finish this. I just had plumbing and lots of other stuff to do in my house lately, so I had less time. I'll try to advance next weekend. However, right now you can use the current version: it will integrate and it will add your dependencies to the database. The only thing is that for dependency types beyond the first 3 (template, category, title), it will not outdate the cache. You can still prepare the integration by calling addDependencies as described in Cache API. EmuWikiAdmin- 07:03, 20 July 2009 (UTC)
Fine! I tried to call your API and it seems to work :-). However, the way you combine AND and OR is "orthogonal" to the way it is done within DPL currently. DPL can handle groups of conditions where the elements of each group are OR-wired and all the groups are ANDed. As far as I can see from your spec you expect it exactly the other way round. Would it be a problem for you to design the two-dimensional array in a way that the inner elements are OR-wired and the outer elements are AND-wired?
 If we had                            I would like to pass
(1) cat=A&B                              ( (A), (B) )
(2) cat=A|B                              ((A,B))
(3) cat=A
(3) cat=B                                ( (A), (B) )  -- same as (1)
(4) cat=A|B
(4) cat=C&D
(4) cat=E|F                              ( (A,B), (C), (D), (E,F) )

--Gero 14:28, 20 July 2009 (UTC)
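The mapping in the table above could be produced along these lines. This is a sketch only: the function name `toAndOfOrGroups` is hypothetical, and it assumes that a single category value uses either '&' or '|' but never both.

```php
<?php
// Hypothetical converter from DPL category parameter values to an
// AND-wired outer array of OR-wired inner arrays.
// Assumption: one value never mixes '&' and '|'.
function toAndOfOrGroups(array $categoryParams): array
{
    $groups = [];
    foreach ($categoryParams as $value) {
        if (strpos($value, '&') !== false) {
            // 'C&D' is shorthand for two separate conditions: (C), (D)
            foreach (explode('&', $value) as $name) {
                $groups[] = [trim($name)];
            }
        } else {
            // 'A|B' becomes one OR-wired group: (A,B); a plain 'A' becomes (A)
            $groups[] = array_map('trim', explode('|', $value));
        }
    }
    return $groups;
}

// Case (4) from the table: cat=A|B, cat=C&D, cat=E|F
print_r(toAndOfOrGroups(['A|B', 'C&D', 'E|F']));
// yields ( (A,B), (C), (D), (E,F) )
```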

I currently expect 2 arrays: an array of dependency types, and a two-dimensional array of dependencies (category names, template names, etc...). I have no problem reviewing the way it's done, but I do not think that you completely understood the way it currently works. Currently, there is nothing in the Cache API that understands OR statements - so it's not that I made it the other way around, it's that I didn't consider ORs at all. What I would have expected is that DPL generates by itself all possible combinations based on ORs. I gave an example with ORs on the page. I can change that if you want, but we won't end up with a 2-dimension array, we will probably end up with 2 2-dimension arrays. Basically, all possible combinations have to be generated, either on the DPL side or on my side. For example, for something like:
  • cat=A&B
  • template=C|D|E
then we have to make 3 entries in the database : cat=A&B&template=C, cat=A&B&template=D, and cat=A&B&template=E . So you tell me if it's too hard for you to calculate these in DPL, I can add this to the Cache API, but it's gonna be an additional argument or 2. EmuWikiAdmin- 05:03, 21 July 2009 (UTC)
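Generating every combination could look roughly like this. The function name `expandCombinations` and the string form of the conditions are assumptions made for the example; they are not the format the Cache API actually stores.

```php
<?php
// Hypothetical expansion of AND-wired groups of OR-wired alternatives
// into the full list of pure-AND combinations (a Cartesian product).
function expandCombinations(array $groups): array
{
    $combinations = [[]]; // start with one empty combination
    foreach ($groups as $group) {
        $next = [];
        foreach ($combinations as $partial) {
            foreach ($group as $alternative) {
                // extend every partial combination by each alternative of this group
                $next[] = array_merge($partial, [$alternative]);
            }
        }
        $combinations = $next;
    }
    return $combinations;
}

// cat=A&B and template=C|D|E, written as AND-wired groups
$groups = [['cat=A'], ['cat=B'], ['template=C', 'template=D', 'template=E']];

foreach (expandCombinations($groups) as $combination) {
    echo implode('&', $combination), "\n";
}
// prints three lines, one per template alternative:
// cat=A&cat=B&template=C
// cat=A&cat=B&template=D
// cat=A&cat=B&template=E
```

Note that the number of entries is the product of the group sizes, so several large OR groups in one query would multiply quickly.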

-- o.k.. I see. Let me put it like this: If I solve the problem by creating the "cartesian product" of all combinations which involve an OR then it will work. If you do it: it will work as well. If I do it, it will only be solved for this use of your component. If you do it, you can offer the solution to others as well ... --Gero 19:34, 24 July 2009 (UTC)


No problem I'm currently working on this. It's a little bit harder than I thought. Just to be sure that I'm doing it correctly, is it correct that all groups of OR-wired elements are linked together by an assumed AND relation ? EmuWikiAdmin- 18:52, 26 July 2009 (UTC)
Yes, exactly. The implicit AND is assumed for ALL conditions (not only category specifications) given in a DPL statement. This is the deeper reason why it does not make a difference whether you say 'category=a|category=b' or 'category=a&b'. The latter is only a shorthand which is internally translated to the first form... Gero 20:06, 26 July 2009 (UTC)