This is a classic problem but I was surprised to see that there’s not a great deal of info on the web on how to do a proper daily report – one that doesn’t have any gaps when data is missing. In this post I’ll guide you through building such a report using MySQL and PHP for presentation. Obviously, you will need to adapt the code for your needs, but the concepts should be valid for a wide range of similar needs.
So, one sunny morning your boss comes in and says “The system we currently have for handling online transactions has a big shortcoming; it has no reporting capabilities. I want to see how our widget sales are doing. Can I have it done by noon?”
Looking at the database, you will most likely have something similar to the sample
orders table below with (some of) the columns & associated data looking like this:
If you don’t have any table even remotely similar to this, but wish to follow this article, you can create your own:
CREATE TABLE orders ( id INT PRIMARY KEY AUTO_INCREMENT, order_date DATETIME, product_id INT, quantity INT, customer_id INT ); INSERT INTO orders (order_date, product_id, quality, customer_id) VALUES ('2009-08-15 12:20:20', '1', '2', '123'), ('2009-08-15 12:20:20', '2', '2', '123'), ('2009-08-17 16:43:09', '1', '1', '456'), ('2009-08-18 09:21:43', '1', '5', '789'), ('2009-08-18 14:23:11', '3', '7', '123'), ('2009-08-21 08:34:21', '1', '1', '456');
The simple query
Obtaining the total sales for each day is very easy:
SELECT DATE(order_date) AS date, SUM(quantity) AS total_sales FROM orders GROUP BY date
The result (based on the table above, would be
As you prepare to run to your boss and tell him that the problem is solved, you might notice the gap between some of the dates. What’s happening? Well, in this particular example, there is no data for the date
2009-08-16; on the other hand, you have to show that for some dates there were zero sales, so what do you do?
Instinctively, the answer seems to be generating a range of values (dates) on the fly via something like
GENERATE_RANGE(start_value, end_value) and
JOIN in with the dates in the table. I know quite many people new to MySQL think this way. Let me give you the bad new first: there is no such function in MySQL. There are ways to do this in MSSQL and Oracle, but even then it’s not straightforward.
The good news is that you don’t have to generate the dates on the fly – in fact you shouldn’t even want to. Think this way: why generate the same range of dates each time you run the query? Cache. Reuse. Create a calendar table. Not a temporary one; a clean honest calendar table. The benefit is that not only it’s significantly faster, but you can also add additional information to it – for example you could add a bit for holidays and some notes, like the holiday names. You can go further and mark the dates where you started special discounts to see if they had any impact on the sales. Endless possibilities.
Creating and filling the calendar table
In this article we’ll create a calendar table with just one column – the date, but as I’ve said, there’s nothing stopping you from adding more information to it.
CREATE TABLE calendar (datefield DATE)
We now need a way to fill the calendar. For this, you can use a simple stored procedure:
DELIMITER | CREATE PROCEDURE fill_calendar(start_date DATE, end_date DATE) BEGIN DECLARE crt_date DATE; SET crt_date=start_date; WHILE crt_date < end_date DO INSERT INTO calendar VALUES(crt_date); SET crt_date = ADDDATE(crt_date, INTERVAL 1 DAY); END WHILE; END | DELIMITER ;
The code is very simple. I’ve seen other (faster but less intuitive) approaches. Considering that you’ll run this sproc just once or maybe once every couple of years, saving a few miliseconds is not a big deal.
To call it, just enter the date ranges you want created:
CALL fill_calendar('2008-01-01', '2011-12-31');
Now we can redo our
JOIN-ing the dates in the
orders with the ones from the
The complete query
SELECT calendar.datefield AS DATE, SUM(orders.quantity) AS total_sales FROM orders RIGHT JOIN calendar ON (DATE(orders.order_date) = calendar.datefield) GROUP BY DATE
If you run this query, you’ll notice two flaws:
- it includes all dates in the calendar, so even those in the future will be included;
- for dates not present in
orders, the query will show
This is very easy to fix, but as usual I wanted to show you why I’m building a query in a particular way:
SELECT calendar.datefield AS DATE, IFNULL(SUM(orders.quantity),0) AS total_sales FROM orders RIGHT JOIN calendar ON (DATE(orders.order_date) = calendar.datefield) WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(order_date)) FROM orders) AND (SELECT MAX(DATE(order_date)) FROM orders)) GROUP BY DATE
So, we’re using
IFNULL function to replace
NULLs with zeroes and two subselects to restrict the interval.
The result is:
Don’t make views
At this point, if you are coming from a MSSQL, Oracle or even Access background, you might be tempted into turning the query above into a view. It makes sense to aggregate some info into a view (toal sales, total income, sales for each widget) and then query the view to focus on particular areas.
Views are horribly slow in MySQL. Without going in detail, views in MySQL offer nowhere near the sophistication you might be used to from other databases so it’s best to just avoid them.
Charting the results
At this point, you can display the aggregated data in a nice format. Ideally you’d use a flash charting solution, which has the benefit of looking great (always a bonus for the boss) and being very efficient (you just output some XML from PHP and the flash chart does the rest).
However, in this tutorial I’ll show you how to generate a cahrt using plain html. It’s not very efficient, but it works.
First, lets obtain the data:
$db_connect = mysql_connect($DB_SERVER, $DB_USERNAME, $DB_PASSWORD) or die("Could not connect");
mysql_select_db($DB_DBName) or die("Could not select database");
$dailySalesQuery = "SELECT calendar.datefield AS date,
IFNULL(SUM(orders.quantity),0) AS total_sales
FROM orders RIGHT JOIN calendar ON (DATE(orders.order_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(order_date)) FROM orders) AND (SELECT MAX(DATE(order_date)) FROM orders))
GROUP BY date";
$dailySalesResult = mysql_query($dailySalesQuery, $db_connect) or die(mysql_error());
Now that you have the data, it’s time to build the chart:
|<? if ($v>0) echo “$v|
You will need a simple colored graphic file for the bar (in this example
If you want to output the dates below each bar, you’ll have use
mysql_data_seek($dailySalesResult, 0) to move the pointer of the results back to the beginning and output another table row. It’s not very efficient, but I couldn’t find a better solution with tables.
Below you can see a real-world example of the chart: