Getting mySQL to use indexes, Part 2
The next piece of my struggle to get the Drupal page /admin/content/node to display quickly was analyzing what happens when you ask it to limit the nodes returned to node of a certain type, for example "where node_type='business_listing_premium'".
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid WHERE n.type = 'business_listing_free' ORDER BY n.changed DESC LIMIT 0, 50
Running EXPLAIN on the new query showed that the mySQL optimizer was completely ignoring the index on the column node_type. The reason for this is that while we have 19 million business listings in our database, there are only a handful of node types, so the cardinality of this index is, by definition, low. So the query optimizer (that is to say, its human designers) decided that there isn't a lot to be gained by using an index with only a few unique values compared to the actual size of the table. This may be true in general cases, but not when you are also performing a join with another table and you know that this index will a priori limit the number of rows from the first table in the join to a handful.
In order to get the desired behavior, I had to find out about the mySQL system variable max_seeks_for_key. The blog Racker Hacker explains, the mySQL query optimizer will use an available index whenever unconditionally whenever its cardinality exceeds the value of max_seeks_for_key. He goes on to say that you can safely set it to 1 without doing any damage. So I set it to 1 and restarted the mysqld daemon to see what would happen. Indeed, loading /admin/content/node and filtering for node_type = 'business_listing_premium' became much faster.
For completeness, I did another search where the filter is set to node_type = 'business_listing_free' for which there are over 19 million nodes. As expected, an index on the column node_type will be of little help in this case because the query spends most of its time sorting the nodes by a column, changed, a timestamp value, which is not indexed. In fact, a query like this run by an unsuspecting administrator can bring the site to its knees as I found out when I logged out and tried to log in again and found that the page just wouldn't refresh anymore. So the only sensible thing to do in this case is to go into the mysql console and kill the query.
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid WHERE n.type = 'business_listing_free' ORDER BY n.changed DESC LIMIT 0, 50
Running EXPLAIN on the new query showed that the mySQL optimizer was completely ignoring the index on the column node_type. The reason for this is that while we have 19 million business listings in our database, there are only a handful of node types, so the cardinality of this index is, by definition, low. So the query optimizer (that is to say, its human designers) decided that there isn't a lot to be gained by using an index with only a few unique values compared to the actual size of the table. This may be true in general cases, but not when you are also performing a join with another table and you know that this index will a priori limit the number of rows from the first table in the join to a handful.
In order to get the desired behavior, I had to find out about the mySQL system variable max_seeks_for_key. The blog Racker Hacker explains, the mySQL query optimizer will use an available index whenever unconditionally whenever its cardinality exceeds the value of max_seeks_for_key. He goes on to say that you can safely set it to 1 without doing any damage. So I set it to 1 and restarted the mysqld daemon to see what would happen. Indeed, loading /admin/content/node and filtering for node_type = 'business_listing_premium' became much faster.
For completeness, I did another search where the filter is set to node_type = 'business_listing_free' for which there are over 19 million nodes. As expected, an index on the column node_type will be of little help in this case because the query spends most of its time sorting the nodes by a column, changed, a timestamp value, which is not indexed. In fact, a query like this run by an unsuspecting administrator can bring the site to its knees as I found out when I logged out and tried to log in again and found that the page just wouldn't refresh anymore. So the only sensible thing to do in this case is to go into the mysql console and kill the query.