Back to Top

Monday, April 06, 2009

NoCOUG SQL challenge

2724771924_3bd95fe601_o NoCOUG (which stands for Northen California Oracle Users Group) published an SQL challenge [PDF]: using SQL determine the probability of achieving a given number by throwing a non-balanced dice N times.

Being a PostgreSQL fanboy that I am, I’ve given a try with PG. Here are the results:

To create the table and populate it (note the 1.0 notation, otherwise it does integer arithmetic):


CREATE TABLE die
(
  face_id integer NOT NULL,
  face_value integer NOT NULL,
  probability double precision NOT NULL,
  CONSTRAINT die_pkey PRIMARY KEY (face_id)
);

INSERT INTO die(face_id, face_value, probability) VALUES
	(1, 1, 1.0/6 + 1.0/12),
	(2, 3, 1.0/6 + 1.0/12),
	(3, 4, 1.0/6 + 1.0/12),
	(4, 5, 1.0/6 - 1.0/12),
	(5, 6, 1.0/6 - 1.0/12),
	(6, 8, 1.0/6 - 1.0/12);

And here is a stored procedure written in plpgsql to calculate the probabilities:


CREATE OR REPLACE FUNCTION calc_probs(depth integer) RETURNS SETOF die AS $$
BEGIN
  IF (depth <= 1) THEN
    RETURN QUERY SELECT * FROM die;
  ELSE
    RETURN QUERY
      SELECT 
          MIN(A.face_id) AS face_id,
          A.face_value + B.face_value AS face_value, 
          SUM(A.probability * B.probability) AS probability 
        FROM die A, (SELECT * FROM calc_probs(depth - 1)) B
        GROUP BY 2;
  END IF;
END;
$$ LANGUAGE plpgsql;
SELECT face_value, probability FROM calc_probs(100) ORDER BY face_value;

I didn’t do any benchmarks, but it should be quite fast. One optimization you could do for such functions in production is to declare it STABLE (or an unsafe optimization: declare it IMMUTABLE if the underlying table changes very infrequently). From the documentation:

STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc.

Finally, here is a solution using the CTE (Common Table Expression) feature from the upcoming 8.4 release (you can think of the CTE’s like dynamically defined VIEWS – for more details about them you can start at the following links: CTEReadme, Common Table Expressions (WITH and WITH RECURSIVE) and Waiting for 8.4 - Common Table Expressions (WITH queries)):


WITH RECURSIVE p AS (
    SELECT 1 AS throws, face_value, probability FROM die
UNION ALL
    SELECT B.throws + 1 AS throws,
        A.face_value + B.face_value AS face_value,
        A.probability * B.probability AS probability
        FROM die A, p B
        WHERE B.throws < 3
) SELECT face_value, SUM(probability) FROM p
    WHERE throws = 2 GROUP BY face_value ORDER BY face_value;

This solution is a little less optimal because it does the GROUPing at the end, but I wasn’t able to include it in the inner select (it kept giving me an error saying that grouping is not supported on the recursive part of the queries). It also makes you repeat the number of dice throws twice. It is possible that it can be written better (this is the first time I’ve experimented with the particular feature) or that it just isn’t suitable for these types of queries.

Toying around with this challenge was fun and it certainly shows that PostgreSQL is on par with most of Oracle’s features.

Picture taken from TooFarNorth's photostream with permission.

2 comments:

  1. Oracle does not yet have recursive CTEs so in that respect, PostgreSQL is ahead of Oracle. Oracle does have “table functions” though; that is, functions that return multiple rows of data. It is believed that Oracle will introduce recursive CTEs in version 11.2--a.k.a. 11gR2--which isexpected to be announced at Oracle OpenWorld in October of this year.

    PostgreSQL was not included in the NoCOUG SQL challenge in order to ease the burden on the judges but your recursive solution is excellent and easily translated into Oracle PL/SQL. Congratulations on noticing that grouping can be performed after each iteration in order to keep the calculations manageable. But PostgreSQL will probably balk when the recursion gets too deep. I believe that Oracle does not support more than 50 levels of recursion.

    If you're like to do some benchmarking, here are some results for the case of an unbiased conventionally numbered 20-sided die, each face having probability 0.05. Only the mode is shown; that is, the sum that has the highest probability.

    Throws Mode Probabilitity
    128 1344 0.006107973793216340000000000000
    256 2688 0.004321538266140720000000000000
    512 5376 0.003056689677902580000000000000
    1024 10752 0.002161724362372430000000000000
    2048 21504 0.001528682501663810000000000000
    4096 43008 0.001080981552024420000000000000
    8192 86016 0.000764383452856994000000000000

    Regards,

    Iggy Fernandez

    ReplyDelete
  2. The Second International NoCOUG SQL Challenge has been published in the February 2011 issue of the NoCOUG Journal. The Journal can be downloaded from http://bit.ly/gVNZsW. SQL commands for creating the required data are available at http://bit.ly/g58WVn.

    ReplyDelete