Difference between revisions of "Issue:Modifiedby and createdby produce inefficient SQL"

From FollowTheScore
Jump to: navigation, search
(Problem)
Line 8: Line 8:
  
 
== Problem ==
 
== Problem ==
 +
 +
This ticket is about inefficient SQL queries in DPL that take very long to run. I provide some new queries that are up to 1000x faster.
  
 
The following DPL to list articles modified by a user takes over 10 seconds on our wiki:
 
The following DPL to list articles modified by a user takes over 10 seconds on our wiki:

Revision as of 17:46, 15 March 2012

Description: modifiedby and createdby produce inefficient SQL
Extension / Version: DPL   /   2.0
Type / Status: Bug   /   open

Problem

This ticket is about inefficient SQL queries in DPL that take very long to run. I provide some new queries that are up to 1000x faster.

The following DPL to list articles modified by a user takes over 10 seconds on our wiki:

modifiedby=Smith
nottitlematch=
namespace=

Its query produces the following SQL (obtained using "debug=6"):

SELECT DISTINCT
 `vpw_page`.page_namespace as page_namespace
,`vpw_page`.page_title as page_title
,`vpw_page`.page_id as page_id
,`vpw_page`.page_title as sortkey
FROM
 `vpw_page`
WHERE
 1=1
 AND `vpw_page`.page_is_redirect=0
 AND 'Smith' in (
  select rev_user_text
  from `vpw_revision`
  where `vpw_revision`.rev_page=page_id)
ORDER BY page_title ASC
LIMIT 0, 1000

Here is a better query for modified articles that runs in 0.02 seconds:

SELECT DISTINCT
 `vpw_page`.page_namespace as page_namespace
,`vpw_page`.page_title as page_title
,`vpw_page`.page_id as page_id
,`vpw_page`.page_title as sortkey
FROM
 `vpw_page`
 INNER JOIN `vpw_revision` ON (`vpw_page`.page_id = `vpw_revision`.rev_page)
WHERE
 1=1
 AND `vpw_page`.page_is_redirect=0
 AND `vpw_revision`.rev_user_text = 'Smith'
ORDER BY page_title ASC
LIMIT 0, 1000

Here is an even worse DPL query to get created articles by a user. It takes 2 minutes and 40 seconds to run on our wiki:

<dpl>
createdby=Smith
namespace=
</dpl>

And here is better SQL that runs in 0.05 seconds:

SELECT DISTINCT
 `vpw_page`.page_namespace as page_namespace
 ,`vpw_page`.page_title as page_title
 ,`vpw_page`.page_id as page_id
 , `vpw_page`.page_title as sortkey
FROM
 `vpw_page`
 INNER JOIN `vpw_revision` ON (`vpw_page`.page_id = `vpw_revision`.rev_page)
WHERE
 1=1
 AND `vpw_page`.page_namespace IN ('0')
 AND `vpw_page`.page_is_redirect=0
 AND `vpw_revision`.rev_user_text = 'Kolson'
 AND `vpw_revision`.rev_parent_id = 0
ORDER BY page_title ASC
LIMIT 0, 1000

Reply