Difference between revisions of "Allowing true caching by DPL dev page"


Revision as of 17:24, 27 July 2008

Hello, my name is EmuWikiAdmin1. I recently realised that my website will need DPL to cache its output if it is to handle high traffic without overloading the server with DPL requests. I therefore started this page and began modifying the DPL code to give DPL a caching system. Eventually this code could be included in the official DPL releases, or it could stay separate and be used only by those who really need it; that decision is Gero's.

What's been done

  • A table named dplcache is added to the MediaWiki database when DPL is installed (if it does not already exist). It contains a cacheID (mediumInt) and an output field (long text) that holds the DPL output.
  • Another table named dpldependencies is added to the MediaWiki database. It contains four fields: cacheids, titledeps, categorydeps, and templatedeps. cacheids holds a ':'-separated list of cache IDs that need to be refreshed either when an article whose title matches titledeps is updated, or when an article in a category matching categorydeps is deleted, created, or edited.
  • A new parameter is added to DPL: |CacheID=. The user can either specify a six-digit cache ID (e.g. 924924) or simply write CacheID=true. When a cache ID is specified, DPL tries to find the cached content associated with that ID. One advantage: if several pages contain the same DPL invocation, you can give them the same cacheID and avoid maintaining multiple caches that would hold identical content. When the parameter is CacheID=true, DPL chooses a random available cache ID and replaces the CacheID=true string with CacheID=xxxxxx in the wikitext.
  • The render function of the DPL parser function is modified: before rendering, it checks whether the CacheID parameter is used. If it is, the DPL content is not refreshed; the cached content is simply output from the database. If the CacheID parameter is present but the database has no cached content for that cacheID, DPL is refreshed, the result is stored in the cache, and the refreshed content is displayed to the user. In short, the cache always looks for the content in the database and creates it if it is not there. The expiration system is therefore very simple: to expire an entry, just delete the field values associated with its cacheID in the database. The next time a user visits a page with that DPL invocation, the content is refreshed.
  • Hooks have been added to delete content from the cache when necessary. They watch for: category changes, article deletion in categories, article creation or edits in categories, and the title of the article (for now this stores only the title of the article that contains the DPL invocation). templatedeps is not used yet, but could hold a list of templates whose modification would affect the DPL invocation.
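
The lookup-or-render flow and the hook-driven expiration described in this list can be sketched roughly as follows. This is only a minimal illustration in Python, not the extension's PHP and not taken from cachecode.txt: in-memory dictionaries stand in for the dplcache and dpldependencies tables, and every name (DplCache, allocate_id, render, purge_for_category, purge_for_title, render_dpl) is hypothetical.

```python
import random


class DplCache:
    """In-memory stand-in for the dplcache and dpldependencies tables."""

    def __init__(self):
        self.output = {}         # cache ID -> stored DPL output (dplcache)
        self.category_deps = {}  # category name -> set of dependent cache IDs
        self.title_deps = {}     # article title -> set of dependent cache IDs

    def allocate_id(self):
        """Pick a random unused six-digit ID, as CacheID=true is described to do."""
        while True:
            cache_id = random.randint(100000, 999999)
            if cache_id not in self.output:
                return cache_id

    def render(self, cache_id, render_dpl, categories=(), title=None):
        """Return the cached output; on a miss, render, store, and return it."""
        if cache_id in self.output:
            return self.output[cache_id]
        text = render_dpl()
        self.output[cache_id] = text
        # Record what this entry depends on so hooks can expire it later.
        for category in categories:
            self.category_deps.setdefault(category, set()).add(cache_id)
        if title is not None:
            self.title_deps.setdefault(title, set()).add(cache_id)
        return text

    def purge_for_category(self, category):
        """Expire every entry that depends on the given category."""
        for cache_id in self.category_deps.get(category, set()):
            self.output.pop(cache_id, None)

    def purge_for_title(self, title):
        """Expire every entry that depends on the given article title."""
        for cache_id in self.title_deps.get(title, set()):
            self.output.pop(cache_id, None)


# Usage: the second call is served from the cache, the third is re-rendered
# because a (simulated) hook expired the entry in between.
calls = []


def render_dpl():
    calls.append(1)
    return "rendered #%d" % len(calls)


cache = DplCache()
cache_id = cache.allocate_id()
first = cache.render(cache_id, render_dpl, categories=["Games"], title="Main Page")
second = cache.render(cache_id, render_dpl)
cache.purge_for_category("Games")
third = cache.render(cache_id, render_dpl)
```

Expiring by deletion keeps the read path trivial: a reader never needs to know why an entry disappeared, only that a miss means "render and store".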

What's left

  • A special page that would allow administrators to empty the entire DPL cache.
  • Article Move hook. Unfortunately, MediaWiki's article-move hooks are not passed the &$article object; we only get a title and have no access to the content. Unless MediaWiki decides to pass this parameter, we would have to keep huge lists of article titles in the tables and scan them whenever an article is moved. It might also be possible to start from the title and detect the categories using other classes, but I don't know how yet.
  • Unfortunately, with the current implementation, if people edit or save articles containing Category:{{{VariableCategory}}}, category detection works but category removal does not. We need a way to get the &$article object information at the InternalParseBeforeLinks hook level in order to extract the last revision's content and compare it with the new revision's content. For the same reason, it does not work for deletion of a variable category either.
  • Maybe a user-controllable button that would make the content of the cache expire.
  • Detect templates that could affect the DPL invocation and add their names to templatedeps.
  • Add support for the logical & and | category relations in the DPL invocation. Right now, if your DPL invocation uses Category = 1&2&3, the caching system treats this cacheID as dependent on category 1, on category 2, and on category 3 separately. So if someone edits an article that is only in category 1, the cache gets purged. That is not good: we only want to purge the cache when an article in categories 1 AND 2 AND 3 is edited.
  • I added $parser->disableCache(); at the beginning of the parser function because I had problems with some leftover MediaWiki caching. I don't know whether that breaks anything; tell me what you think. It will certainly slow things down for people who do not use the cache, so if you have an alternative, tell me!
  • Clean up the code.
  • I tried to simply copy and paste your version check for backward compatibility of the parser function. It didn't work: it breaks the output in 1.12. I don't know what I did wrong; if you could have a look at it, that would be nice.
  • Extend the functionality to the <DPL> tags; right now it is limited to the parser function.
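
Two of the open items in this list lend themselves to a short illustration: purging only when an edited article satisfies the whole category expression, and detecting removed categories by comparing two revisions. The Python sketch below is a rough, language-neutral illustration, not the extension's PHP. It assumes that a dependency is stored as a list of alternatives (the | relation), each of which is a set of categories that must all be present (the & relation), and that categories appear as literal [[Category:Name]] links, so variable categories such as {{{VariableCategory}}} are not resolved. All function names are hypothetical.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sortkey]], capturing Name.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def categories_in(wikitext):
    """Return the set of literal category names linked from the wikitext."""
    return {name.strip() for name in CATEGORY_LINK.findall(wikitext)}


def removed_categories(old_text, new_text):
    """Categories present in the last revision but missing from the new one."""
    return categories_in(old_text) - categories_in(new_text)


def matches(expression, article_categories):
    """True if the article is in ALL categories of at least one alternative."""
    return any(group <= article_categories for group in expression)


def edit_affects(expression, old_text, new_text):
    """Purge only if the article matched before the edit or matches after it."""
    return (matches(expression, categories_in(old_text))
            or matches(expression, categories_in(new_text)))


# Usage: Category = 1&2&3 becomes a single alternative with three members.
expression = [{"1", "2", "3"}]

old = "Some text [[Category:1]] [[Category:2]] [[Category:3|sort key]]"
new = "Some text [[Category:1]] [[Category:2]]"

removed = removed_categories(old, new)
purge_on_removal = edit_affects(expression, old, new)
purge_on_unrelated = edit_affects(expression, "[[Category:1]]", "[[Category:1]] edited")
```

Checking both the old and the new revision matters here: an article that leaves category 3 no longer matches 1&2&3, but the cached list still contains it, so the entry has to expire.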


Discussion

Add your input here.


Code

I'm now at a point where I need some review, because this is only my third PHP program ever and you will probably find the code very dirty. Go ahead and test it if you have the time, and don't hesitate to clean it up or give me comments about what to do.

This is the current state of the code. Improvements are welcome. If you want to see only the differences from the original DPL code, search for //EmuWikiAdmin. This is alpha-quality code. Do not use it on a production server.

cachecode.txt (http://www.emuwiki.com/cachecode.txt) -- Updated July 28th 2008 - 11:30 AM (New York time)

Note

Great that you started this!

  1. Check whether the MW12 hack ("parser must be coaxed ..") is really needed. I think you can just return the cached content. Yep, it is really needed on 1.12. The output actually seems OK, except that if your DPL invocation calls a parser function, that parser function refuses to output.
    • I was thinking about backward compatibility. We must make sure that your solution also runs with MW 1.9, 1.10, and 1.11. Gero 00:22, 27 July 2008 (CEST) - OK, I'll just put in the same version check that was in the original DPL.
  2. Please use a new revision number; I recommend 1.8.0. Done.
  3. I don't think the ArticleSave hook will do the job. Try ParserAfterTidy or something similar. You must get access to the parser, and you need access to the data structures that tell you which categories the article being edited belongs to. Great, I finally realised that ParserAfterTidy does not have category information, so I used InternalParseBeforeLinks instead.
  4. You must also catch the "delete article" operation. Done. Article save, article edit, and detection of changes between the last revision and the current revision have been implemented too.

Good luck!

Gero 18:44, 26 July 2008 (CEST)