Articles by Peter

You are currently browsing Peter’s articles.

Another thing that happened to me since 2009 was home ownership: bought a place and put my name on a serious mortgage contract.  As I go about doing my daily business, I often wonder: how much of this home do we actually own, and how much do we still owe??

What came immediately to mind is the fundraiser tracker.  You’ve probably seen something like this in conjunction with your alma mater, local animal shelter, or kid’s day care asking you for money:



I wanted to create a visualization like that, in Excel, next to my family’s finances.  Like a true engineer, I absolutely hate manually updating anything, so this had to be automatable.  How do I do it?



Let me spare you of the gruesome details of my experimental process, and just say that you can draw pictures with bar charts.  Wanna draw a circle with an absolutely rectangular bar chart?  Let’s do it visually:

Step 1 – Create a stacked bar chart.  The number of stacks that you need depends… and trust me for now that you need 4 for a circle:


Step 2 – Make sure you have enough bars.  100 is always a good starting point in my opinion.


Step 3 – Set the gap between bars to 0%.  Now this chart looks like some European country’s flag.  Still quite rectangular:


Step 4 – Adjust the height of these 100 bars so they look like this.  Note how some stacks are much taller than others:


Step 5 – Resize your chart into a square.  Clean up the colors and update the axes ranges, and…


Ta-da!  Here’s the circle!  Don’t like how pixelated it is?  Add more bars!



The key here is to think of a picture as a rectangle with w-by-h pixels.  Then you can replicate it with w stacked bars each having a maximum height of h.  How many stacks you need depends on how complicated your graph is.  Is it possible in theory to draw a Mickey Mouse?  Sure, but I intend to stick with simpler images that can be abstracted into a few layers of colors.  You’d need to either set up formulas or a lookup table for the height of these stacks on each bar.  For a circle centered on (a,b), remember the formula is (x-a)^2 + (y-b)^2 = 0.


Back to my original goal of drawing a target tracking graphic.  I decided to go with a house complete with a chimney, even though our home looks nothing like it.  It involves a handful of vertical, horizontal, and sloped lines:



The shape of my house, including where the chimney is, is entirely parameter-ized.  So by updating a few cells I can get to alternate versions like these:



Lastly, I needed to fill it in with the progress tracker, as follows.  Note that my blue bars and white bars within the house are two separate data series, with heights adjusted dynamically to the input value:




Just to illustrate how flexible this setup is, if you don’t like my vertical split, the complete/remaining portions can be divided horizontally as well:



This way, I’ve got a bar-based graphics that updates automatically.  The only thing left to do is actually paying off that mortgage!

Only because my soon-to-be-former coworker Monica asked about this topic, I decided to dust off this blog (finding that login was no easy task!).  It’s hard to believe but the date stamp on the most recent post was… six years ago!  It’s not that I’ve forgotten about this blog, but I’m borderline at a point in life where you can’t teach an old dog new tricks.  And when I don’t learn new tricks, it’s hard to blog about them.

Well, I did learn one thing more recently than 2009: bar charts.  Remember when we all first learned about these fundamental expressions of statistics, like, in fourth grade?  Drawing the axes with a pencil and measuring out the height of each bar with a ruler was among the most painful part of any math class that I can remember… and yet, this two-post series is about those.  A steroids-injected version of those boring bar charts.

In its first part, we will examine how to create waterfall charts with Excel’s native capabilities.  A waterfall chart is one where floating bars are used to illustrate the progression of a number.  For example, I used one in my recent guest blog entry on Hungry for Points, to outline how my points & miles balance changed between two points in time:

Added some, subtracted some, and a new balance!


On the surface, it seems impossible to draw floating bars with Excel’s charting tool.  That may be true, but the trick is to craft an optical illusion with white bars.  The same idea can be applied to both overlapping clustered bars and stacked bars.  Let’s see how the former works:

Step 1 – Determine the step-wise cumulative value, and the increment/decrement amount:


Step 2 – Create a clustered bar chart where each step has two bars: for increments, bar 1 should be the previous cumulative amount and bar 2 should be the new cumulative amount; for decrements, it’s the opposite.  For the first and last steps representing the balances, bar 1 should be 0:

“Bar 1” is orange here.

Step 3 – Make sure bar 1 is in front and set the overlap to 100%:


Step 4 – Fill bar 1 white, or whatever your background color is:


Step 5 – Color the individual bars:


Step 6 – Add data labels to explain your chart.  These can easily be formulaic and linked to your numbers:



Voila! This may look exotic, but it really isn’t!

Stay tuned for the next article in this series where we take this technique to the next level!


For those of you in the business of peer reviewing other people’s data entries, the ability to minimize tedious checks is essential.  Today we’ll look at how =SUM(ABS(…)) is a convenient tool to reduce a simple 3-step process to a simpler 1-step process.

Your intern has just copied a large table of numbers from Worksheet1 to Worksheet2, and you want to make sure nothing went wrong in the process.  In best practice, you’d want to check not only the numbers but also the relative position of the numbers… did the intern switch some rows or columns around?  A good system of lookup formulas should be employed to ensure the integrity of this data transfer.  However, for now let’s just focus on the most basic case where all we care is that the two tables match identically.

It’s a huge table of 100 rows and 20 columns, so you really don’t want to look at each number with your bare eyes.  A typical way to check is to set up a third table of the same dimentions, and subtract the corresponding cells in the two tables:

(Worksheet3!A1) =Worksheet1!A1-Worksheet2!A1
(Worksheet3!A2) =Worksheet1!A2-Worksheet2!A2

After that, a quick glance for nonzero values will help you find places where discrepancies exist.  This is not difficult or time consuming by any means, but you still have to (1) type in a formula, (2) copy formula to the correct range, and (3) check for nonzero values.  Instead, try this array formula:

{=SUM(ABS(Worksheet1!A1:T100-Worksheet2!A1:T100))}     (Don’t forget to press CTRL+ALT+Enter)

This formula does an element-wise subtraction of the second table from the first table, and totals the absolute differences into a single number.  If the result of this formula is nonzero, then we’ve got a discrepancy problem worth further investigation – otherwise the job is done!  The ABS() function here serves an important role at making sure positive discrepancies don’t happen to offset negative discrepancies.

This article demonstrates a simple VBA method to utilize the Goal Seek tool automatically, similar to a native Excel formula.

Goal Seek is an Excel built-in tool that uses numeric approximation methods to “back solve” for an input value to a system of formulas when given a desired output value.  If you are not familiar with Goal Seek, many tutorials online can guide you to using this tool.  I hate it, and try to avoid it at all costs… but like everything else that I hate, there come times when I just have to face it like a man.

The problem?  For one, it’s not transparent, and we can’t exactly tell what Excel did in the background.  But more importantly, it’s a pain to apply and to udpate.  While all other formulas can automatically re-calculate upon changes to the input values, Goal Seek requires you to click on the menu to bring up the dialog box, then fill out three input boxes manually.

See the spreadsheet above.  Everything in blue are input values to this dynamic model, which can affect the results.  We want to set E14 to a value specified in D3, so these two cells correspond to the first two parameters of Goal Seek (“Set cell” and “To value“).  As an alternative, we put the difference between the two in F6, and set this cell to 0 (x=y is equivalent to x-y=0).  This approach will slightly simplify the coding later on, and is an easier check for the user.

Download the Sample File

The “Necessary Adjustment” in D6 plays an accomondating role in adjusting each of the values so that the final calculation reaches target.  For Goal Seek, it is the third parameter “By changing cell“.  Initially we have no idea what this adjustment should be, and leave it at 0.  At this time, F6 is showing 0.8%, indicating that the necessary criteria haven’t been fulfilled.

Let’s do a Goal Seek now – set F6 to 0 by changing D6.  If you record this into a macro, say Macro1, you can quickly repeat it by using an assigned shortcut key stroke, or hook it up with a command button for an easy UI.  A lot of people do exactly this as an enhancement to the otherwise awkward model, but it’s not good enough.  Let’s automate it.

Press Ctrl+F11 to bring up the Visual Basic Editor to see the embedding code to Macro1, and you’ll see this line of code:

Range(“F6”).GoalSeek Goal:=0, ChangingCell:=Range(“D6”)

Fair enough.  It’s almost in plain English, and shouldn’t take an VBA expert to understand.  The next step is to copy and paste this line out of Macro1, to somewhere more automatically executable.

Browse to “Sheet1” under “Microsoft Excel Objects“.  Either type or use the drop-down box to create the following VBA event:

 Private Sub Worksheet_Calculate()
 End Sub

This subroutine will be called every time the “calculate” event is triggered on this worksheet.  Depending on whether you set the calculation to Automatic or Manual in Tools->Options, this happens when any formula-based cell changes value to a new input value.  The idea is to paste the Goal Seek code here so it will be called whenever F6 may change; however, let’s keep it in a separate function for the sake of good programming practice:

Private Sub Worksheet_Calculate()
End Sub

Private Sub CheckGoalSeek()
   Range(“F6”).GoalSeek Goal:=0, ChangingCell:=Range(“D6”)
End Sub

In the Visual Basic editor, it should look like this:

Now give it a try!  Change any of the blue input values and you’ll see Goal Seek working by itself to come up with the new adjustment values.  Awesome!  We’re done now in terms of automating a tedious task.

Note: Always save your work before you test out the macro.  depending on the version of your Excel, the repetitive call to Goal Seek may result in a run-time error.  However, the next part of this code will solve the problem.

Have you noticed how slow this enhancement has made your model?  Maybe not, if you have a relatively new computer or a small workbook like the sample file.  But if you add some temporary debugging code to the function, you’ll notice how many times your function is (unnecessarily) being triggered when you change any of the input values:

Private Sub CheckGoalSeek()
   Range(“F3”).Value = Range(“F3”).Value + 1
   Range(“F6”).GoalSeek Goal:=0, ChangingCell:=Range(“D6”)
End Sub

Give it another run and, hey:

You just changed one input, and Goal Seek ran close to 2,000 times!  Why???

The subroutine Worksheet_Calculate is triggered every time any function calculates a new value.  There are six cells (in column E as well as cell F6) that depend on the changing cell D6.  As Goal Seek cycles through possible values as the input, all these six cells have to update – and in turn call up more instances of Goal Seek.

Imagine you want to paint a wall red.  Every time you see that the wall isn’t entirely red, you call up a painter to do the job.  As soon as the first painter gets to work, you realize that it’s still not entirely red, so you call up another one.  Even when you’ve got 30 painters to simultaneously working on the wall (if that’s physically possible), there might still be a white spot remaining and you make yet another phone call… chances are, the job would be complete before the last five painters arrive on site.  This algorithm gets the job done, but heck, is super inefficient.  Add a little check to the process and avoid ever calling more than one painter!

Private Sub CheckGoalSeek()
   Static isWorking As Boolean

   If Round(Range(“F6”).Value, 6) <> 0 And Not isWorking Then
      isWorking = True
      Range(“F3”).Value = Range(“F3”).Value + 1
      Range(“F6”).GoalSeek Goal:=0, ChangingCell:=Range(“D6”)
      isWorking = False
   End If
End Sub

Adding in the static variable helps flagging whether the Goal Seek function has been employed.  The first time this function is called, it raises the flag and will not release it until the full iteration of Goal Seek completes.  During this time, this subroutine would still be called 2,000 instances, but it would simply bypass all the code and exit.  Try again and F3 increments only once per change of input values, indicating that the Goal Seek isn’t running wild to waste your computer’s resources.  That’s what we wanted, and now we can remove that debugging code.

The first part of the If statement checks whether the solution already exists – i.e. if Worksheet_Calculate is being triggered by some formula updates that are completely irrelevant to this system.  If so, there’s no need to call Goal Seek at all.

Here’s another catch: if you want a robust model, you’ve gotta consider any potential for error.  How about an invalid input value?  Keeping the target overall raise at 5% and changing the max raise to 0% is for sure an impossible scenario, and Goal Seek goes crazy as follows:

We realize that wasn’t very smart and change the max back to 6%… and geez it doesn’t work!  The adjustment factor had been thrown too far off anything reasonable, and now Goal Seek doesn’t know what to do with such an awkward initial guess.  Our model is stuck, and that ain’t cool.  A quick way to fix this is to reset the initial guess every time we call Goal Seek:

Private Sub CheckGoalSeek()
   Static isWorking As Boolean

   If Round(Range(“F6”).Value, 6) <> 0 And Not isWorking Then
      isWorking = True
      Range(“D6”).Value = 0
      Range(“F6”).GoalSeek Goal:=0, ChangingCell:=Range(“D6”)
      isWorking = False
   End If
End Sub

It’s a slight waste of resources if educated and reasonable input values can be expected at all times, but definitely a worthy investment if you want to prevent ever getting “stuck”.

Lastly, it’s dangerous to use absolute cell references in macros.  If columns or rows are added or removed, the key cells would have changed locations (from D6 to D7, for example) and the macro will get confused!  The best practice is to name the cells and refer to them by name, like the following:

Private Sub CheckGoalSeek()
   Static isWorking As Boolean

   If Round(Range(“GoalSeekCell“).Value, 6) <> 0 And Not isWorking Then
      isWorking = True
      Range(“ByChangingCell“).Value = 0
      Range(“GoalSeekCell“).GoalSeek Goal:=0, ChangingCell:=.Range(“ByChangingCell“)
      isWorking = False
   End If
End Sub

Now we have a robust macro that automatically updates Goal Seek!  Enjoy!

Download the Sample File


With the debates heating up more than ever, we step closer to the General Election each day.  The question of the day is – will you vote?

I love democracy.  I’m also a statistician.  Mixing those two and you get the following conclusion: allowing every single citizen to vote is an unnecessary waste of the nation’s resources, which is therefore, unpatriotic.

Let’s make some simplifying assumptions regarding the election process:

  • Popular vote is what determines the outcome – let’s assume no electoral college here.  It’s how election should be done, anyway.
  • There are only two candidates, and every voter will vote for one or the other.  This is almost true in the United States, with less than 1% of the popular votes going toward a non-Republican non-Democrat in 2004.

Now, let’s see the following scenario.  121 million citizens voted in the 2004 presidential election and mattered (no offense to those voting for Nader, Badnarik, etc… but they really didn’t count for much).  51.2% of them voted for Bush, who continued to become our president for the next four years.  The remaining 48.8%, of course, wishfully voted for Kerry.  Without going far into the statistical details, we know that if we drew a random sample of 2.6 million voters, we can be 99.9% confident that their voting outcome will be within 0.1% of the entire country’s voting results.

Download the Sample File

I made a simple Sample Size Calculator based on the formulas from this website.  It takes four parameters regarding the population size, estimate of the result, and the required confidence level, and gives you the necessary sample size.  The noteworthy formula used is the inverse normal function:

=NORMINV(probability, mean, standard_dev)

It gives you the value below which the distribution is expected p% of the time.  Use 0 for mean and 1 for standard deviation to get the normalized z value.  Do take caution to adjust the two-tailed confidence level to one-tailed.

The variables and formulas for this problem include:

N = size of the population (number of eligible voters)
p = expected % outcome (% of the population voting for Democrat or Republican); assume 50% for the “worse case” scenario
c = confidence interval expressed as a %
CL = confidence level, as a %
Z = NORMINV(1-(1-CL)/2, 0, 1) = the number of standard deviations that we are CL% confident the result is within
ss = (Z² * (p) * (1-p)) / c² = the sample size needed for an infinite population
ssf = ss / (1 + (ss-1) / N) = the sample size needed for a finite population with size N

Another thing to point out is that the statistical significance heavily depends on the expected outcome p.  The closer p is to the mid-point (50%), the less confidence we have and the larger the sample size we require.  As historically observed, the U.S. voters are split fairly evenly down the middle.  However, that’s frequently not the case when you examine voting results by state or county.  When a locality is as heavily as 80% leaning toward red or blue, we’d only need a very small sample of voters to be very confident about the results.

If you aren’t familiar with the concept of confidence intervals, this article probably isn’t making much sense to you.  Otherwise, it’s proving a practical political point – in an age where everything needs to go green and at a time when everyone is on a tight budget, let’s not waste all the applications, ballots, and time by allowing every citizen to vote!  Instead, we should randomly select 2.2% of the eligible voters in the country and rest assured that their votes are nearly 100% representative of ALL Americans!

 Download the Sample File


Hours after Ron bought his new favorite car, he applied at the dealership to become a salesperson.  “I built a great car selection tool, now I can use it to make good business.”

But he knows something was not quite enough – a good interactive user interface should further remove the user from the raw data.  And he is right.  As his experienced dealer superior recalls, if you give too many detailed choices to the average customers, they will get too stressed out and freak out.

So Ron decides to build a new UI – one that allows the user to select among the car options and displays the specs (and price) for the car of interest, one at a time.

Download the Sample File

First, he makes a list of cars currently available at the dealership.  With ActiveX Control’s Option Button (Alt L > I > O), the basic selector is created as follows:

In this case, like before, he links each of the Option Buttons to a cell to the right of the car name.  Note that this could be a tedious process once the list gets long.  Consider using VBA to automate the task.

To develop a good UI, it’s important to have an interactive way to distinguish the selection.  Note how your browser changes the color of a link that you click on?  Ron decides to first format this entire table to the de-selected look.  The column of TRUE/FALSE values might confuse the customer, so he hides it by changing the font color to white.  Note that the graying out of fonts could add contrast to enhance your selection, but it’s a matter of personal style.

Now, he adds conditional formatting to the table.  In this case, one set of rules is applied to each of the three columns.  Excel 2007 gives you this really neat Rules Manager for conditional formats, but the same effect can be done in earlier versions of Excel.

This is what the result looks like.  When the customer clicks on a car of desire, the entire selection lights up!

Lastly, he uses a combination of lookup functions to create a detailed table of specs to display on the right:

Download the Sample File

With such an interactive UI, obviously, Ron’s car dealer business is taking off!


I’m dedicating this post to the first person to compliment on the Excel Workbook blog…

Some time has passed since Ron‘s Beijing Olympics post.  It’s now 2020 and Laser Tag has become an official Olympics sport.  The globally famed Laser Tag champion Wang Y. blew her way through the games with amazingly acrobatic shooting, docking, and sniping skills.  After the thrilling final round where all athletes gave their 101%, you as the game admin recorded the scores as follows:

Download the Sample File

Having studied Ron‘s medal comparison model for the past three Olympics, you now have full appreciation for the neat user interfaces that Excel allows you to create.  Now you want to use that knowledge to create an entirely new and personalized score reporting system.  Here’s what you have in mind:

  • Scores will be presented in a bar graph
  • Each participating athlete will receive a personalized score card
  • Each athlete will be listed first on her score card with name spelled out on the graph, in honor of her excellent sportsmanship
  • All the other athletes will be listed in alphabetic order by country code; their names will not be shown

For example, the score card for China’s Wang Y. should look like this – her name, country, and score are listed on top, followed by the other countries in order: Australia, Brazil, (skipping China), Germany, and so on.

While this task could be done fairly quickly by hand, it could get very tedious.  You are a busy game admin after all, and don’t have the time to personalize 10 score cards manually.  You already know how to locate the chosen athlete, but how do you re-order the remaining list?

The INDEX() function comes in handy.  It returns the value of an element in a table or an array, selected by the row and column number indexes.  It often complements the MATCH() function, and can be considered as an inverse function to MATCH().  Note how it’s accomplishing the same task as the OFFSET() function used for Beijing Olympics.  Now you write:

          [score array],
          CurrentItemPosition – (CurrentItemPosition<=ChosenItemPosition)*1

The second argument determines whether to shift the reference by one row.  When China is selected (position 3), the 1st and 2nd countries need to be shfited into positions 2 and 3, but all other countries in the list will stay put for positions > 3.

Now you have a neat user interface that generates a unique score card as easily as a drop-down menu selection.  Many people resort to writing macros for this type of tasks, but they can more often than not be done with a relatively simple set of formulas.

The following is the score card for the USA athlete, who won silver.

Congratulations to all!

Download the Sample File

While arrays are great for grouping a system of formulas, their real advantage is to enable complex subtotals and lookups that ordinary Excel functions fail to support, through a technique known as the array sum or the conditional sum.

The general syntax is:

     [condition 1]*
     [condition 2]*…
     [value 1]*…

Note the brackets enclosing the formula.  To enter the formula as an array, press CTRL+ALT+Enter.

Each of the conditions and values is an array of corresponding size (equal size most of the time, but not necessarily).  The enclosing SUM() function and the array brackets serve as a SUMPRODUCT().  Note that

=SUMPRODUCT([array 1], [array 2], …)

is equivalent to

{=SUM([array 1]*[array 2]*…)}

The array sum multiplies the value array by one or more condition arrays, which contain 0’s and 1’s (FALSE and TRUE).  These condition arrays serve as filters (or masks) to zero-out values not corresponding to the specified conditions.  By picking up the remaining values this way, we can take SUMIF(), COUNTIF(), VLOOKUP() and other functions beyond one condition and one dimension.

Below is an example of box office sales records.  To make sense of this data, we’d often need to analyze subsets of the records.

Download the Sample File

With SUMIF() and COUNTIF() functions, we can easily obtain a subtotal for all sales in July.  We’ll practice using an array sum instead:

     ([month column]=”July”)*
     ([sales column])

Note that by omitting the [sales column], we’d be counting the number of records instead of summing their values.

Moving on, let’s get a subtotal for all June sales in Roanoke.  SUMIF() and COUNTIF() fail to be useful here.

     ([month column]=”June”)*
     ([city column]=”Roanoke”)*
     ([sales column])

Similarly, we can add a third condition (or in fact, as many as necessary):

     ([month column]=”August”)*
     ([city column]=”New York”)*
     ([movie column]=”Transformers”)*
     ([sales column])

In the last example, only one record satisfies all conditions.  The result is similar to that of a VLOOKUP(), except of course, we have a lot more freedom by using an array sum.  Sometimes, you might find yourself needing to specify more than one condition in a lookup and not wanting to double count any potential duplication of records.  What you can do is to use a MAX() function in place of the SUM() to limit the result to one record.

Here is a typical data snapshot summarizing the subtotals in a two-dimensional table:

Lastly, let’s see how an array sum can obtain subtotals (or single-value lookups) in such a two dimensional array.  The trick is to line up the dimensions of the condition and value arrays:

     ([3×1 month column]=”July”)*
     ([1×3 movie row]=”Batman”)*
     ([3×3 sales table array])

This article provides an introduction to the more advanced methods of using Excel formulas.  Once you get familiar with these methods, you will find a much more satisfying experience analyzing data in Excel.

Download the Sample File


Since many of this blog’s posts will be about array-based tricks, it might be a good idea to have a proper introduction to Excel’s array formulas.

By default, Excel’s functions and formulas take one or more input values and produce a single output value. Think of the typical multiplication and SUMPRODUCT() function.  When solving a more complex system of problems, however, it is often desired to process an array of similar values concurrently.  Hence the array formulas.

Download the Sample File

Let’s start with the real basics.  You’re the manager of a video game store, and for recordkeeping, you’ve just created a spreadsheet listing the prices of your inventory of video game consoles.  Now you want the identical price listing to appear on another part of the spreadsheet, and here’s what you can do:

  1. Select an area identical in size to the source data array (in this case, 3 rows x 1 column)
  2. Type the “=” to get a formula started, then drag to select the source data array
  3. Press Ctrl+Shift+Enter to complete your first array formua

Note that, in the formula bar, a pair of brackets {} enclose your formula.  This tells you that it’s recognized as an array formula.

Note: if you pressed the Enter key without holding down Ctrl and Shift, the brackets would not appear and only the first element in the array would be recognized.  Additionally, you cannot type in those brackets.

An array does not have to be a one-dimensional vector.  You can also reproduce an entire table by employing a two-dimentional array (3 rows x 2 columns):

So what now?  Pretty boring and useless, huh?  For this type of tasks you would typically use a simple “=” formula and copy it across the table.  In fact, we’d usually recommend that as well.  But this approach has its advantages – because the entire array (1D or 2D) is produced by one formula, you cannot change any of its elements (try typing over any of the cells… Excel won’t let you).  When processing large amounts of data, it’s not always easy to check for human errors, i.e. if your coworker’s son messed up one of your 5,000 rows of identical formulas.  Array formulas help ensure data integrity.

Moving on… now it’s the end of the month and your intern compiled the sales records for the store.  But he listed the consoles horizontally on three rows instead of vertically!  To fix this, you use the TRANSPOSE() function to “rotate” the table… turning rows into columns and columns into rows.  This is a very handy function that’s strictly used with arrays.

To turn this 2×4 table into a 4×2 table

Use the TRANSPOSE() function… don’t forget, Ctrl+Shift+Enter!

Okay, before we wrap up this post, let’s look at something actually useful, i.e. more than simply duplicating values.  Given your price list (a 3×1 array) and your intern’s sales records (a 1×3 array), you now want to calculate the total revenue.  SUMPRODUCT() comes to mind since you want to multiply the price by unit for each product, and add them all together.

However, SUMPRODUCT() requires all of its input parameters to be of identical size.  If you simply entered in the row array and the column array, you’d get a #VALUE! error.  You could TRANSPOSE() one of the two arrays to match the dimension of the other, or you could simply use TRANSPOSE() within your SUMPRODUCT():

{=SUM([array1], TRANSPOSE([array2]))}

Note the brackets – don’t forget to press Ctrl+Shift+Enter.  This calculation gives you the total revenue by first turning the two arrays into the same dimension, and then apply the SUMPRODUCT() function.

Download the Sample File

In the next article, we’ll talk about array sums, which lead into a whole wide world of possibilities.

This article discusses an issue on rounding error, specifically related to breaking down a total amount with a trend/inflation rate.  It’s more of a math problem than an Excel problem, and if you work in the healthcare industry, this problem probably will look familiar to you.

Note: “Trend” here is assumed to be a compounding annual cost inflation.  When “trending” an amount from one period to another, the length of time is measured between the mid-points of the two periods.

Suppose you made a projection for your client’s healthcare spendings to be $12 million in the upcoming calendar year, assuming 10% annual trend.  Now the client wants to break that down to monthly figures.

The answer is not $1 million for each of the 12 months.  The force of trend makes this spending an increasing function over time, i.e. February should cost slightly more than January, and so on.  The industry-standard formula is to apply the 10% trend, from the mid-point of the year to the mid-point of each month, i.e.

Monthly Amount = Annual Amount / 12 * (1 + Trend) ^ ([mid month] – [mid year])

While this is a perfectly valid and defendable approach, the results always have a problem – the “rounding error”.  As seen in the table below, adding the 12 months of breakdown back together produces a slightly higher sum than anticipated.  This error is usually small ($4,511 in this case, < 0.1%) and people tend to overlook such insignificant imperfection.

Numbers don’t add up

Note: This type of difference is widely known in the industry as a “rounding error”, but technically it isn’t.  Rather, it is a methodological inaccuracy.

But not everybody can tolerate imperfections – certain demanding bosses/clients could get very upset if your basic numbers don’t add up!  To correct this situation, I’ve seen people throw that $4,511 in one of the months, break it out evenly across the 12 months, or use some GoalSeek magic to add a margin to each number until the total comes out right.  None of these methods is pretty.

Take a step back and think like a Calculus student again… the trending cost can be represented as the area under curve of an exponential function, and the aforementioned approach to monthly amounts is a discrete numerical approximation of the integral!

The pink bars are the results of the previous method, which closely but not perfectly approximate the area under the blue (exponential) curve.

Now, if you can take a basic anti-derivative from point a to point b (one month to another), you’ll have a solution that’s better than a good approximation:

Monthly = Annual / Trend * ((1 + Trend)^(i/12) – (1 + Trend)^((i-1)/12))

The resulting formula is difficult to understand when you look at it for the first time.  However, once you understand the concepts, it’s very easy to explain.  The monthly breakdown sum up beautifully to the exact $12 million, so you won’t have to do any unsightly tweaking!

Numbers add up perfectly!

Download the Sample File

The sample file contains a hand-written derivation of the formula.


« Older entries