Allowing true caching by DPL dev page

From FollowTheScore
News: '''We have a working system.'''
=What's done=
*Implemented only for Parser Function invocation.
*Automatically works when allowcachedresults is set to true for a given DPL invocation.
*Uses the MediaWiki caching system (which is good: whether you use memcached, database storage, or file storage, with or without Squid, it will work).
*Some things were already in place: every time the article containing a DPL invocation is touched, its cache is already purged, so I did not have to deal with that. Likewise, every time a template used on the DPL invocation page is modified, the cache for that page is purged, so we don't need to invent a whole new system of template tracking; MediaWiki already does it.
*A table named dpldependencies is created in the MediaWiki database. It contains two important fields: cacheids and categorydeps. cacheids contains a ':'-separated list of numbers, the IDs of articles that need to be purged when an article whose categories match the associated categorydeps entry is parsed (so this is aggressive outdating of the content: it fires when pages in the given categories are touched, purged, or resaved, or simply when the 24-hour cache lifetime has elapsed and someone then refreshes the page). categorydeps contains strings such as 1&2&NewCategory&Kitchen Stuff&Spoons. The | (or) logic is not stored in the database; alternatives are stored as completely separate entries. Only the & (and) logic is stored.
*In addition to the update rules already present in MediaWiki, I added one rule: when an article is rendered (parsed), look up its categories, compute all the possible combinations of those categories, look in the database for matching update rules, and purge the cacheids that need purging.
*Works with categories set as variables, because it operates at the InternalParseBeforeLinks level (for example, a template page that expands to <nowiki>[[Category:{{{VariableCategory}}}]]</nowiki>, etc.).
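The combination matching described in the bullets above can be sketched in Python (not the extension's actual PHP; the table contents, the function name, and the assumption that categorydeps strings store category names in a canonical sorted order are all invented for illustration):

```python
from itertools import combinations

# Hypothetical in-memory stand-in for the dpldependencies table: each row
# maps a categorydeps string ("A&B&C", AND logic only; OR alternatives are
# separate rows) to a ':'-separated list of cache IDs to purge.
dpl_dependencies = {
    "NewCategory": "12",
    "NewCategory&Spoons": "7:31",
    "Kitchen Stuff": "5",
}

def cache_ids_to_purge(article_categories):
    """Compute every '&'-combination of the parsed article's categories
    and collect the cache IDs of all matching dependency rows."""
    to_purge = set()
    for size in range(1, len(article_categories) + 1):
        # assumes categorydeps entries list categories in sorted order
        for combo in combinations(sorted(article_categories), size):
            key = "&".join(combo)
            if key in dpl_dependencies:
                to_purge.update(dpl_dependencies[key].split(":"))
    return sorted(to_purge, key=int)
```

With these invented rows, parsing an article that is in both NewCategory and Spoons would purge caches 7, 12, and 31: the single-category rule for NewCategory matches, and so does the two-category AND rule.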
=What's left=
*At this point I would like a review, if someone is available. I won't be able to improve the code's performance myself; I'm a very bad coder, as you will see.
*Implement the same mechanism for the HTML-style <nowiki><DPL></nowiki> tags.
*An update rule for deletion of an article (right now, if you work with template content you don't need this, because the regular MediaWiki template-outdating rule also applies to the cache; but if, for example, you work with chapter content and the templates don't link the pages together, then we need that rule).
*Clean the code (I can't do this one)
*There is currently a problem with cache purging: if you stay logged in, you won't see the page refresh. However, if a non-logged-in user comes to the page (you or anyone else), the purge did work. So there is some kind of logged-in-user-specific caching that I don't know about; we have to dig in, find it, and purge it as well.
*Maybe other update rules, such as for article moves.
*Find a way to keep the dpldependencies table clean (right now it does not delete the IDs associated with categories when, for example, a DPL invocation is removed from a page, so it would grow without bound on a site with lots of changes, moves, and deletes...).
=Code=
This is the current state of the code. If you want to see only the differences from the original DPL code, search for //EmuWikiAdmin. This is alpha-quality code. Do not use it on a production server:
[http://www.emuwiki.com/cachecode.txt cachecode.txt] -- Updated August 23rd, 2008 - 11:50 PM (New York time)
=Problems to report, suggestions, etc=
OK, I found out that there is a fundamental problem with working at the parser level rather than, for example, the article-save level: there is too much expiry. Sometimes it is worse: it falls into an infinite loop, where the parser outdates the page, the page gets parsed, the parser outdates the page again... We need to find a way to look at the categories at the ArticleSave level. This may involve parsing the wikitext ourselves to some stage where templates are expanded...
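To make the loop above concrete, here is a toy Python model of the feedback (names and structure are invented; it only illustrates why purge-at-parse-time never settles when a page's own categories match one of its dependency rules):

```python
# Toy model of the purge-at-parse-time feedback loop: a parse hook purges
# dependent pages, and in this simplified world every purge immediately
# causes the purged page to be reparsed.
parse_count = 0

def on_parse(page, purge_rules, limit=10):
    """Runs on every parse; purging a dependent triggers its reparse."""
    global parse_count
    parse_count += 1
    if parse_count >= limit:          # safety valve; the real loop has none
        return
    for dependent in purge_rules.get(page, []):
        on_parse(dependent, purge_rules, limit)

# Page "A" matches a dependency rule that purges "A" itself, so parsing
# outdates the page, which gets parsed, which outdates it again...
on_parse("A", {"A": ["A"]})
```

Hooking at article-save time instead would break the cycle, because saving is not retriggered by a purge.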
:This is a real problem. Sorry, I wanted to say: this is a real challenge ;-))
=Archive of the old project=
<pre>
 
Hello, my name is EmuWikiAdmin1, and I realised lately that my website will need DPL to cache its output if I want to have high traffic and not overload the server with too many DPL requests. I thus decided to start this page here and start modifying the DPL code so we have a caching system for DPL. Eventually this code could be included in the official DPL releases if that's what Gero wants, or we can just keep it separated and used only by those who really need it, it will be the decision of Gero.
 
=Changed my mind=
I just realised that MediaWiki already has everything we want for a cache system in its objectcache table. I'm now modifying my project so that instead of developing our own caching system, we just use the general MediaWiki caching system (which we already use when we choose the option allowcachedresults=true), but modify a few things to keep that cache up to date and to make the appropriate items expire. The rest of the page is about the old project; the part about dpldependencies will be kept, but we won't use a dplcache table.
:This sounds very interesting. Are you making progress? Is there something to discuss at the moment? [[User:Gero|Gero]] 17:05, 8 August 2008 (CEST)
:Nope, nothing to discuss. I have now moved the whole system to this new method; just a couple of update rules to adjust and I'll be ready for you to look at it... It's muuuuuuuch more lightweight. -- EmuWikiAdmin1, 10th August 2008
  
 
=What's been done=
 
  
*A table named dplcache is added to the MediaWiki database on DPL installation (if not already created). It contains a cacheID field (mediumint) and an output field (longtext) which holds the DPL output.
*Another table named dpldependencies is added to the MediaWiki database. It contains four fields: cacheids, titledeps, categorydeps, and templatedeps. cacheids contains a ':'-separated list of cache IDs that need to be updated either when an article whose title matches titledeps is updated, or when an article in a category matching categorydeps is deleted, created, or edited.
  
 
*A new parameter is added to DPL: |CacheID=. The user can either specify a six-digit cache ID (e.g. 924924) or just put CacheID=true. When the user specifies a cache ID, DPL tries to find the cached content associated with this ID. The advantage: if, for example, you have multiple pages with the same DPL invocation, you can give them all the same cacheID, avoiding having to maintain multiple caches that would actually contain the same thing. When the parameter is CacheID=true, DPL chooses a random available cache ID and actually replaces the CacheID=true string with CacheID=xxxxxx in the wikitext.
  
 
*The render function of the DPL parser function is modified: before rendering, check whether the CacheID parameter is used. If it is, don't refresh the DPL content; just output the cached content from the database. If the CacheID parameter is present but the database has no cached content for this cacheID, refresh the DPL output, store it in the cache, and display the refreshed content to the user. So basically, the cache always looks for the content in the database and creates it if it's not there. Our expiration system is therefore very simple: to make an entry expire, just delete the field values associated with its cacheID in the database. The next time a user visits a page with the DPL invocation, it will be refreshed.
*Hooks have been added to delete content from the cache when necessary. The hooks watch for: category changes, article deletion in categories, article creation or editing in categories, and the article title (right now this one stores only the title of the article in which the DPL invocation is present). templatedeps is not used right now, but could contain a list of templates whose modification would affect the DPL invocation.
  
 
=What's left=
 
 
*A special page that would allow administrators to empty the whole DPL cache.
  
*What's left would be an Article Move hook. Unfortunately, the article-move hooks in MediaWiki are not passed the &$article object; we just get a title, with no access to the content. So unless MediaWiki decides to pass this parameter, we would be stuck keeping huge article-title lists in the tables and scanning for those titles whenever an article is moved. There might also be a possibility to start from the title and detect the categories using different classes, but I don't know how right now.
  
*Unfortunately, the way I implemented it for now, if people edit or save articles containing Category:{{{VariableCategory}}}, category detection will work but category removal won't. We need to find a way to get the &$article object's information at the InternalParseBeforeLinks hook level if we want to extract the last revision's content and compare it with the new revision's content. Deletion of a variable category will not work either, for the same reason.
 
*Maybe a user-controllable button that would make the content of the cache expire.
*Detect templates that could affect the DPL invocation and add these template names to templatedeps.
*Add support for the logical & and | in the category relations of the DPL invocation. Right now, say your DPL invocation uses Category = 1&2&3: the caching system records that this cacheID depends on category 1, on category 2, and on category 3 separately. So if someone edits an article that is in category 1 only, the cache gets purged. That is not good: we only want to purge the cache when an article in categories 1 AND 2 AND 3 is edited.
*I added $parser->disableCache(); at the beginning of the parser function because I had problems with some leftover caching in MediaWiki. I don't know if that breaks anything; tell me what you think. It will certainly slow things down for people who do not use the cache, so if you have an alternative, tell me!
*Clean the code.
*I tried to just copy-paste your version check for backward compatibility of the parser function. It didn't work; it breaks the output in 1.12. I don't know what I've done wrong. If you want to have a look at it, that would be nice.
*Extend the functionality to the <nowiki><DPL></nowiki> tags; right now it's limited to the parser function.
*Create a cron job that would clean the cache?
*Find a way to keep the dpldependencies table clean. There is no problem with the dplcache table, because its content gets cleaned often: as soon as there is a change in the page containing the DPL invocation, or in the categories concerned, the entry gets deleted. But the dpldependencies table, as it is right now, is an accumulator that will grow without bound.
  
  
  
 
=Code=
 
'''I'm now at a point where I will need some review, because this is only my 3rd PHP program ever and you will find the code is probably very dirty. Go ahead and test it if you have the time, and don't hesitate to clean it up or give me comments about what to do.'''
  
 
This is the current state of the code. Improvements are welcome. If you want to see only the differences from the original DPL code, search for //EmuWikiAdmin. This is alpha-quality code. Do not use it on a production server:
  
[http://www.emuwiki.com/cachecode.txt cachecode.txt] -- Updated July 28th 2008 - 11:30 AM (New York time)
  
 
= Note =
 
 
Great that you started this!
 
# <s>check if the MW12 hack ("parser must be coaxed ..") is really needed. I think you can just return the cached content.</s> Yep, really needed on 1.12. The output actually seems OK, except that if your DPL invocation calls a parser function, the parser function refuses to output.
#* I was thinking about downward compatibility. We must make sure that your solution also runs with MW 1.9, 1.10, and 1.11. [[User:Gero|Gero]] 00:22, 27 July 2008 (CEST) - OK, I'll just put in the same version check that was in the original DPL.
 
# <s>Please use a new revision number, I recommend 1.8.0</s> Done
 
 
# <s>I don't think the ArticleSave hook will do the job. Try ParserAfterTidy or something similar. You must get access to the parser, and you need access to data structures which tell you to which categories the article being edited belongs.</s> Great; I finally realised that ParserAfterTidy does not have category information. I used InternalParseBeforeLinks instead.
# you must also catch the "delete article" operation.
+
# <s>you must also catch the "delete article" operation.</s> Done. Article save, article edit, and change between last revision and current revision have been implemented too.
  
 
Good luck!
 
  
 
[[User:Gero|Gero]] 18:44, 26 July 2008 (CEST)
 
</pre>

Latest revision as of 07:45, 29 August 2008
