So you want to run a PIVOT query in MySQL

Here’s a stored procedure which generates PIVOT (or crosstab) queries in MySQL, sort of like what you used to be able to do in Microsoft Access, before everything moved to The Cloud.

prcPivot

It’s called prcPivot, because that has the word PIVOT in it, and it’s a stored procedure .

Arguments

prcPivot has a single JSON_OBJECT argument, because I think that’s going to be a bit easier to use than multiple NULLable parameters, and it makes it look a bit more like an actual TRANSFORM statement.

The JSON_OBJECT provides the following key/values:

Key

Description

TRANSFORM

The cells inside the table. This must be an aggregate function (e.g. SUM, COUNT, MAX etc).

The aggregate is applied to both the row and column criteria.

It’s probably easiest to look at the examples than explain that here.

SELECT

The columns on the left hand side of the table. Is a comma-separated list of expressions.

Don’t include any ‘AS’ aliases on your columns, because that won’t work. If you do need to alias columns, you can do that in the ‘FROM’ option; see the examples below.

FROM

The source for the query

WHERE

(Optional) a WHERE clause. You could put this in the FROM if you want.

ORDER BY

(Optional) an ORDER BY clause. If omitted, will order by the columns in the SELECT option.

PIVOT

An expression to generate the column headers

PIVOT IN

(Optional) A comma separated list of column headers to restrict to

Because the headers are driven off the data, you could also add WHERE clauses to restrict which columns are included.

PIVOT ORDER BY

(Optional) A way to order the column headers

FORMAT

(Optional) Either the string “RESULT” (the default) or “SQL” ( to show the SQL used to generate the result )

Examples

To give a few random examples, here’s some CALL statements based on the sakila sample database, which depicts the internal systems of a video rental store, which is a type of store that used to exist.

So this one shows the number of rentals per month for each customer (pivotting on the month):

CALL prcPivot(JSON_OBJECT(
 'TRANSFORM', 'COUNT ( R.rental_id ) ',
 'SELECT', 'C.email, R.customer_id',
 'FROM', 'rental R
   INNER JOIN customer C ON R.customer_id = C.customer_id',
 'WHERE', 'C.email < "ALF"',
 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")'
));

You can use the FORMAT parameter to show the SQL used to generate that:

CALL prcPivot(JSON_OBJECT(
 'TRANSFORM', 'COUNT ( R.rental_id ) ',
 'SELECT', 'C.email, R.customer_id',
 'FROM', 'rental R
   INNER JOIN customer C ON R.customer_id = C.customer_id',
 'WHERE', 'C.email < "ALF"',
 'PIVOT', 'DATE_FORMAT(R.rental_date, "%Y-%m")',
 'FORMAT', 'SQL'
));

... which produces:

SELECT C.email, R.customer_id,
  COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-05",  R.rental_id , NULL)) AS "2005-05",
  COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-06",  R.rental_id , NULL)) AS "2005-06",
  COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-07",  R.rental_id , NULL)) AS "2005-07",
  COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2005-08",  R.rental_id , NULL)) AS "2005-08",
  COUNT(IF(DATE_FORMAT(R.rental_date, "%Y-%m") = "2006-02",  R.rental_id , NULL)) AS "2006-02"
 FROM rental R
   INNER JOIN customer C ON R.customer_id = C.customer_id 
 WHERE C.email < "ALF"
 GROUP BY C.email, R.customer_id
 ORDER BY C.email, R.customer_id

which can be useful if the generated SQL has a syntax error, fails at runtime for some reason, you want to modify it manually, or feed it into the explainerator.

More examples

Same sort of thing, pivotted on the other axis, by swapping the SELECT and PIVOT arguments.

Note we can only have a single expression in the PIVOT so I've removed the customer_id.

CALL prcPivot(JSON_OBJECT(
 'TRANSFORM', 'COUNT ( R.rental_id ) ',
 'SELECT', 'DATE_FORMAT(R.rental_date, "%Y-%m")',
 'FROM', 'rental R
   INNER JOIN customer C ON R.customer_id = C.customer_id',
 'WHERE', 'C.email < "ALF"',
 'PIVOT', 'C.email'
));

Which looks a bit ugly because that DATE_FORMAT is appearing in the column header, so you could move that into the FROM instead and give it an alias; e.g.

CALL prcPivot(JSON_OBJECT(
 'TRANSFORM', 'COUNT ( T.rental_id ) ',
 'SELECT', 'rental_month',
 'FROM', '( SELECT R.rental_id, C.email, DATE_FORMAT(R.rental_date, "%Y-%m") AS rental_month 
   FROM rental R
   INNER JOIN customer C ON R.customer_id = C.customer_id) AS T',
 'WHERE', 'T.email < "ALF"',
 'PIVOT', 'T.email'
));

Prior art

The guts of this was cribbed off this post, but I've added a few things that were missing:

  • user-defined aggregate function (not just 'SUM')
  • arguably slightly nicer syntax
  • better error handling
  • better handling of pivot columns containing quotes and newlines
  • ordering of the pivot columns ( PIVOT ORDER BY )
  • subsetting the pivot columns ( PIVOT IN )
  • ability to return the SQL instead of the result set ( FORMAT )

and removed the rollup columns and rows, which I wasn't a huge fan of.

Source me

Here you go:

DELIMITER $$
DROP PROCEDURE IF EXISTS prcPivot $$
CREATE PROCEDURE prcPivot(
    IN jsonArguments  JSON
    )
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
    -- The usual claptrap

    /* BSD 2-Clause License
    
    Copyright (c) 2023, randomnoun
    All rights reserved.
    
    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are met:
    
    * Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.
    
    * Redistributions in binary form must reproduce the above copyright notice,
      this list of conditions and the following disclaimer in the documentation
      and/or other materials provided with the distribution.
    
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
    AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */

    -- inputs are supplied as entries in the jsonArguments object:

    DECLARE strTransform      VARCHAR(10000); -- TRANSFORM: aggregate values returned in table rows
    DECLARE strSelectColsCsv  VARCHAR(10000); -- SELECT:    column(s) on the left, separated by commas
    DECLARE strFrom           VARCHAR(10000); -- FROM:      table name, db.tbl or query
    DECLARE strWhere          VARCHAR(10000); -- WHERE:     empty string or where clause
    DECLARE strOrderBy        VARCHAR(10000); -- ORDER BY:  empty string or order by clause; defaults to strSelectColsCsv
    DECLARE strPivotCol       VARCHAR(10000); -- PIVOT:     name of column to put across the top
    DECLARE strPivotIn        VARCHAR(10000); -- PIVOT IN:  subset of columns
    DECLARE strPivotOrderBy   VARCHAR(10000); -- PIVOT ORDER BY: order of columns (defaults to strPivotCol )

    DECLARE strFormat         VARCHAR(100);   -- FORMAT:    either 'SQL' or 'RESULT' ( default = RESULT )

    -- TODO "PIVOT IN", "PIVOT ORDER BY"

    -- temp variables
    DECLARE strTransformRegex VARCHAR(100);
    DECLARE strTransformFunc  VARCHAR(10000);
    DECLARE strTransformDistinct VARCHAR(10000);
    DECLARE strTransformVal   VARCHAR(10000);
    DECLARE strErrorMessage   VARCHAR(128);

    DECLARE strSqlSubQuery    VARCHAR(10000);
    DECLARE strSqlQuoteVal    VARCHAR(10000);
    DECLARE strSqlConcat      VARCHAR(10000);
    DECLARE strSqlSelect1     MEDIUMTEXT;
    DECLARE strSqlSelect2     MEDIUMTEXT;

    -- bit more breathing room constructing the SQL
    SET SESSION group_concat_max_len = 100000;

    SET strTransform = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.TRANSFORM'));
    SET strSelectColsCsv = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.SELECT'));
    SET strFrom = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FROM'));
    SET strWhere = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.WHERE')), '');
    SET strOrderBy = COALESCE(
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."ORDER BY"')),
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.ORDERBY')),
      CONCAT(strSelectColsCsv));
    SET strPivotCol = JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOT'));
    SET strPivotIn = COALESCE(
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT IN"')), -- JSON array ? maybe not
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTIN')),
      '');
    SET strPivotOrderBy = COALESCE(
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$."PIVOT ORDER BY"')),
      JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.PIVOTORDERBY')),
      CONCAT(strPivotCol));

    SET strFormat = COALESCE(JSON_UNQUOTE(JSON_EXTRACT(jsonArguments, '$.FORMAT')), 'RESULT');


    -- for debugging, uncomment the SELECTs of various @variables below

    -- check mandatory parameters exist
    IF strTransform IS NULL THEN -- etc
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing TRANSFORM'; -- has error code 1644, not 45000
    END IF;
    IF strSelectColsCsv IS NULL THEN -- etc
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing SELECT';
    END IF;
    IF strFrom IS NULL THEN -- etc
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing FROM';
    END IF;
    IF strPivotCol IS NULL THEN -- etc
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Missing PIVOT';
    END IF;

    -- validate parameters
    IF NOT (strTransform REGEXP '^\\s*([a-zA-Z]+)\\s*\\((.*)\\)\\s*$') THEN -- etc
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'TRANSFORM must be in an aggregate function; e.g. SUM(1)';
    END IF;

    -- SUM(1)            -> SUM, null, 1
    -- COUNT(DISTINCT n) -> COUNT, DISTINCT, n
    SET strTransformRegex    = '^\\s*([a-zA-Z]+)\\s*\\((\\s*DISTINCT\\s*)?(.*)\\)\\s*$';
    SET strTransformFunc     =     REGEXP_REPLACE(strTransform, strTransformRegex , '$1');
    SET strTransformDistinct = IF (REGEXP_REPLACE(strTransform, strTransformRegex, '$2') = "", "", "DISTINCT ");
    SET strTransformVal      =     REGEXP_REPLACE(strTransform, strTransformRegex, '$3');
    -- select @strTransformFunc, @strTransformDistinct, @strTransformVal;

    -- all the aggregates from https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html
    IF NOT (strTransformFunc REGEXP '^(AVG|BIT_AND|BIT_OR|BIT_XOR|COUNT|'
        'GROUP_CONCAT|JSON_ARRAYAGG|JSON_OBJECTAGG|'
        'MAX|MIN|STD|STDDEV|STDDEV_POP|STDDEV_SAMP|'
        'SUM|VAR_POP|VAR_SAMP|VARIANCE)$') THEN
       -- MESSAGE_TEXT cannot exceed 128 chars
       -- see https://dev.mysql.com/doc/refman/8.0/en/signal.html#signal-condition-information-items
       SET strErrorMessage = CONCAT('Unknown TRANSFORM aggregate "', strTransformFunc,
         '"; expected aggregate e.g. SUM, MIN, MAX, COUNT, GROUP_CONCAT');
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage;
    END IF;
    IF NOT (strFormat REGEXP '^(SQL|RESULT)$') THEN
       SET strErrorMessage = CONCAT('Unknown FORMAT "', strFormat, '"; expected SQL or RESULT');
       SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = strErrorMessage;
    END IF;


    -- Find the column headings
    SET strSqlSubQuery = CONCAT(
        'SELECT DISTINCT ', strPivotCol, ' AS val ',
        ' FROM ', strFrom, ' ',
        IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)),
        ' ORDER BY ', strPivotOrderBy);
    -- select @subq;

    -- escape quotes, backslashes inside val
    SET strSqlQuoteVal = "'\"', REGEXP_REPLACE(val, '([\"\\\\\\\\])', '\\\\\\\\$1'), '\"'";
    SET strSqlConcat = CONCAT (
      "CONCAT('",
      strTransformFunc, "(", strTransformDistinct,
      "IF(", strPivotCol, " = ', ", strSqlQuoteVal, ", ', ", strTransformVal, ", NULL)) AS ', ", strSqlQuoteVal,
     ")");

    -- select strSqlQuoteVal, strSqlConcat;

    SET @strSqlSelect1 = CONCAT(
        'SELECT GROUP_CONCAT(', strSqlConcat, ' SEPARATOR ",\n") INTO @strPivotSelectColsCsv',
        ' FROM ( ',strSqlSubQuery, ' ) AS top',
        IF(strPivotIn = '', '', CONCAT('\n WHERE val IN (', strPivotIn, ')'))
    );
    -- select @strSqlSelect1;

    PREPARE _sql FROM @strSqlSelect1;   -- seems to require a session var here
    EXECUTE _sql;                       -- intermediate step: build SQL for columns
    DEALLOCATE PREPARE _sql;

    -- construct the query and perform it
    SET @strSqlSelect2 = CONCAT(
        'SELECT ',
            strSelectColsCsv, ',\n',
            @strPivotSelectColsCsv,
        '\n FROM ', strFrom, ' ',
        IF(strWhere = '', '', CONCAT('\n WHERE ', strWhere)),
        '\n GROUP BY ', strSelectColsCsv, -- @TODO remove any 'AS' aliases
        IF(strOrderBy = '', '', CONCAT('\n ORDER BY ', strOrderBy)),
        '\n'
    );

    IF strFormat = 'SQL' THEN
      SELECT @strSqlSelect2 AS `sql`;    -- the statement that generates the result
    ELSE
      PREPARE _sql FROM @strSqlSelect2;
      EXECUTE _sql;                     -- the resulting pivot table ouput
      DEALLOCATE PREPARE _sql;
    END IF;

END;
$$


    And because Microsoft products used hungarian notation for everything, and I cut my teeth on Microsoft Access.
    if you don't include IFPS, or Paradox for DOS.
    Which means I also give tables names like tblTableName instead of TABLE_NAME, which I continue to insist is more readable.
    Unlike these footnotes.
    Which I've just created a small jquery plugin around for no good reason.





Add a Comment

Your email address will not be published. Required fields are marked *