Difference between revisions of "Issue:Modifiedby and createdby produce inefficient SQL"

From FollowTheScore
Edit summary: createdby: no DISTINCT needed for pages created (a page is created only once)

Revision as of 15:54, 20 March 2012

Description: modifiedby and createdby produce inefficient SQL
Extension / Version: DPL   /   2.0
Type / Status: Bug   /   open

Problem

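This ticket is about the SQL that DPL generates for modifiedby and createdby, which takes very long to run. I propose replacement queries that are hundreds to thousands of times faster on our wiki (roughly 500x for modifiedby and 3200x for createdby, going by the timings quoted below).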

modifiedby

The following DPL to list articles modified by a user takes over 10 seconds on our wiki:

<dpl>
modifiedby=Smith
namespace=
</dpl>

It produces the following SQL (obtained using "debug=6"):

SELECT DISTINCT
 `vpw_page`.page_namespace as page_namespace
,`vpw_page`.page_title as page_title
,`vpw_page`.page_id as page_id
,`vpw_page`.page_title as sortkey
FROM
 `vpw_page`
WHERE
 1=1
 AND `vpw_page`.page_namespace IN ('0')
 AND `vpw_page`.page_is_redirect=0
 AND 'Smith' in (
  select rev_user_text
  from `vpw_revision`
  where `vpw_revision`.rev_page=page_id)
ORDER BY page_title ASC
LIMIT 0, 1000

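Here is a better query for modified articles that runs in 0.02 seconds. It replaces the per-page subquery with an INNER JOIN on the revision table. DISTINCT stays, because a user may have edited the same page several times: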

SELECT DISTINCT
 `vpw_page`.page_namespace as page_namespace
,`vpw_page`.page_title as page_title
,`vpw_page`.page_id as page_id
,`vpw_page`.page_title as sortkey
FROM
 `vpw_page`
 INNER JOIN `vpw_revision` ON (`vpw_page`.page_id = `vpw_revision`.rev_page)
WHERE
 1=1
 AND `vpw_page`.page_namespace IN ('0')
 AND `vpw_page`.page_is_redirect=0
 AND `vpw_revision`.rev_user_text = 'Smith'
ORDER BY page_title ASC
LIMIT 0, 1000
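
To check that the two modifiedby queries really select the same pages, here is a minimal sketch that runs trimmed versions of both on a toy database. It assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL, and the rows are made up for illustration, so it says nothing about the timings above. It only compares the result sets and shows why the JOIN form needs DISTINCT.

```python
import sqlite3

# Toy stand-in for the wiki database: SQLite instead of MySQL, hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vpw_page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                       page_title TEXT, page_is_redirect INTEGER);
CREATE TABLE vpw_revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_user_text TEXT, rev_parent_id INTEGER);
INSERT INTO vpw_page VALUES
  (1, 0, 'Alpha', 0), (2, 0, 'Beta', 0), (3, 0, 'Gamma', 0), (4, 0, 'Delta', 1);
INSERT INTO vpw_revision VALUES
  (10, 1, 'Smith', 0), (11, 1, 'Smith', 10),   -- Smith edited Alpha twice
  (12, 2, 'Jones', 0), (13, 2, 'Smith', 12),   -- Smith edited Beta once
  (14, 3, 'Jones', 0),                         -- Smith never touched Gamma
  (15, 4, 'Smith', 0);                         -- Delta is a redirect
""")

def titles(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Shape of the SQL that DPL generates: one subquery per candidate page.
subquery_rows = titles("""
SELECT DISTINCT page_title FROM vpw_page
WHERE page_namespace IN (0) AND page_is_redirect = 0
  AND 'Smith' IN (SELECT rev_user_text FROM vpw_revision
                  WHERE vpw_revision.rev_page = page_id)
ORDER BY page_title ASC
""")

# Shape of the proposed replacement: a single join against the revision table.
join_sql = """
SELECT {distinct} page_title FROM vpw_page
INNER JOIN vpw_revision ON (vpw_page.page_id = vpw_revision.rev_page)
WHERE page_namespace IN (0) AND page_is_redirect = 0
  AND vpw_revision.rev_user_text = 'Smith'
ORDER BY page_title ASC
"""
join_rows = titles(join_sql.format(distinct="DISTINCT"))
join_rows_no_distinct = titles(join_sql.format(distinct=""))

print(subquery_rows)
print(join_rows)
print(join_rows_no_distinct)  # without DISTINCT, a page repeats once per matching revision
```

On this toy data both forms return the same list of titles, while the JOIN without DISTINCT returns one row per matching revision.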

createdby

The DPL query for articles created by a user is even worse. It takes 2 minutes and 40 seconds to run on our wiki:

<dpl>
createdby=Smith
namespace=
</dpl>

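And here is better SQL that runs in 0.05 seconds and produces the same results. The condition rev_parent_id = 0 is meant to pick out a page's first revision, so each page matches at most once and DISTINCT is not needed: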

SELECT
 `vpw_page`.page_namespace as page_namespace
 ,`vpw_page`.page_title as page_title
 ,`vpw_page`.page_id as page_id
 ,`vpw_page`.page_title as sortkey
FROM
 `vpw_page`
 INNER JOIN `vpw_revision` ON (`vpw_page`.page_id = `vpw_revision`.rev_page)
WHERE
 1=1
 AND `vpw_page`.page_namespace IN ('0')
 AND `vpw_page`.page_is_redirect=0
 AND `vpw_revision`.rev_user_text = 'Smith'
 AND `vpw_revision`.rev_parent_id = 0
ORDER BY page_title ASC
LIMIT 0, 1000
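
The same kind of check can be sketched for createdby. It assumes SQLite as a stand-in for MySQL, made-up rows, and toy data in which every first revision has rev_parent_id = 0. The cross-check query (earliest revision by MIN(rev_id)) is my own independent definition of "created by" and is not the SQL that DPL generates.

```python
import sqlite3

# Toy stand-in for the wiki database: SQLite instead of MySQL, hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vpw_page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                       page_title TEXT, page_is_redirect INTEGER);
CREATE TABLE vpw_revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_user_text TEXT, rev_parent_id INTEGER);
INSERT INTO vpw_page VALUES (1, 0, 'Alpha', 0), (2, 0, 'Beta', 0), (3, 0, 'Gamma', 0);
INSERT INTO vpw_revision VALUES
  (10, 1, 'Smith', 0), (11, 1, 'Smith', 10), (12, 1, 'Jones', 11),  -- Smith created Alpha
  (13, 2, 'Jones', 0), (14, 2, 'Smith', 13),                        -- Jones created Beta
  (15, 3, 'Smith', 0);                                              -- Smith created Gamma
""")

def titles(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Proposed createdby query, trimmed: no DISTINCT, first revision via rev_parent_id = 0.
created = titles("""
SELECT page_title FROM vpw_page
INNER JOIN vpw_revision ON (vpw_page.page_id = vpw_revision.rev_page)
WHERE page_namespace IN (0) AND page_is_redirect = 0
  AND vpw_revision.rev_user_text = 'Smith'
  AND vpw_revision.rev_parent_id = 0
ORDER BY page_title ASC
""")

# Hypothetical cross-check: treat the revision with the smallest rev_id as the first one.
cross_check = titles("""
SELECT p.page_title FROM vpw_page p
INNER JOIN vpw_revision r ON (p.page_id = r.rev_page)
WHERE p.page_namespace IN (0) AND p.page_is_redirect = 0
  AND r.rev_user_text = 'Smith'
  AND r.rev_id = (SELECT MIN(rev_id) FROM vpw_revision WHERE rev_page = p.page_id)
ORDER BY p.page_title ASC
""")

print(created)
print(cross_check)
```

Both lists agree on this data and contain no duplicate titles, which is the reason DISTINCT can be dropped. On a real wiki the agreement depends on rev_parent_id being populated for every revision.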
