Array Sums

You are currently browsing the archive for the Array Sums category.

For those of you in the business of peer reviewing other people’s data entries, the ability to minimize tedious checks is essential.  Today we’ll look at how =SUM(ABS(…)) is a convenient tool to reduce a simple 3-step process to a simpler 1-step process.

Your intern has just copied a large table of numbers from Worksheet1 to Worksheet2, and you want to make sure nothing went wrong in the process.  In best practice, you’d want to check not only the numbers but also the relative position of the numbers… did the intern switch some rows or columns around?  A good system of lookup formulas should be employed to ensure the integrity of this data transfer.  However, for now let’s just focus on the most basic case where all we care is that the two tables match identically.

It’s a huge table of 100 rows and 20 columns, so you really don’t want to look at each number with your bare eyes.  A typical way to check is to set up a third table of the same dimentions, and subtract the corresponding cells in the two tables:

(Worksheet3!A1) =Worksheet1!A1-Worksheet2!A1
(Worksheet3!A2) =Worksheet1!A2-Worksheet2!A2

After that, a quick glance for nonzero values will help you find places where discrepancies exist.  This is not difficult or time consuming by any means, but you still have to (1) type in a formula, (2) copy formula to the correct range, and (3) check for nonzero values.  Instead, try this array formula:

{=SUM(ABS(Worksheet1!A1:T100-Worksheet2!A1:T100))}     (Don’t forget to press CTRL+ALT+Enter)

This formula does an element-wise subtraction of the second table from the first table, and totals the absolute differences into a single number.  If the result of this formula is nonzero, then we’ve got a discrepancy problem worth further investigation – otherwise the job is done!  The ABS() function here serves an important role at making sure positive discrepancies don’t happen to offset negative discrepancies.

While arrays are great for grouping a system of formulas, their real advantage is to enable complex subtotals and lookups that ordinary Excel functions fail to support, through a technique known as the array sum or the conditional sum.

The general syntax is:

     [condition 1]*
     [condition 2]*…
     [value 1]*…

Note the brackets enclosing the formula.  To enter the formula as an array, press CTRL+ALT+Enter.

Each of the conditions and values is an array of corresponding size (equal size most of the time, but not necessarily).  The enclosing SUM() function and the array brackets serve as a SUMPRODUCT().  Note that

=SUMPRODUCT([array 1], [array 2], …)

is equivalent to

{=SUM([array 1]*[array 2]*…)}

The array sum multiplies the value array by one or more condition arrays, which contain 0’s and 1’s (FALSE and TRUE).  These condition arrays serve as filters (or masks) to zero-out values not corresponding to the specified conditions.  By picking up the remaining values this way, we can take SUMIF(), COUNTIF(), VLOOKUP() and other functions beyond one condition and one dimension.

Below is an example of box office sales records.  To make sense of this data, we’d often need to analyze subsets of the records.

Download the Sample File

With SUMIF() and COUNTIF() functions, we can easily obtain a subtotal for all sales in July.  We’ll practice using an array sum instead:

     ([month column]=”July”)*
     ([sales column])

Note that by omitting the [sales column], we’d be counting the number of records instead of summing their values.

Moving on, let’s get a subtotal for all June sales in Roanoke.  SUMIF() and COUNTIF() fail to be useful here.

     ([month column]=”June”)*
     ([city column]=”Roanoke”)*
     ([sales column])

Similarly, we can add a third condition (or in fact, as many as necessary):

     ([month column]=”August”)*
     ([city column]=”New York”)*
     ([movie column]=”Transformers”)*
     ([sales column])

In the last example, only one record satisfies all conditions.  The result is similar to that of a VLOOKUP(), except of course, we have a lot more freedom by using an array sum.  Sometimes, you might find yourself needing to specify more than one condition in a lookup and not wanting to double count any potential duplication of records.  What you can do is to use a MAX() function in place of the SUM() to limit the result to one record.

Here is a typical data snapshot summarizing the subtotals in a two-dimensional table:

Lastly, let’s see how an array sum can obtain subtotals (or single-value lookups) in such a two dimensional array.  The trick is to line up the dimensions of the condition and value arrays:

     ([3×1 month column]=”July”)*
     ([1×3 movie row]=”Batman”)*
     ([3×3 sales table array])

This article provides an introduction to the more advanced methods of using Excel formulas.  Once you get familiar with these methods, you will find a much more satisfying experience analyzing data in Excel.

Download the Sample File


Since many of this blog’s posts will be about array-based tricks, it might be a good idea to have a proper introduction to Excel’s array formulas.

By default, Excel’s functions and formulas take one or more input values and produce a single output value. Think of the typical multiplication and SUMPRODUCT() function.  When solving a more complex system of problems, however, it is often desired to process an array of similar values concurrently.  Hence the array formulas.

Download the Sample File

Let’s start with the real basics.  You’re the manager of a video game store, and for recordkeeping, you’ve just created a spreadsheet listing the prices of your inventory of video game consoles.  Now you want the identical price listing to appear on another part of the spreadsheet, and here’s what you can do:

  1. Select an area identical in size to the source data array (in this case, 3 rows x 1 column)
  2. Type the “=” to get a formula started, then drag to select the source data array
  3. Press Ctrl+Shift+Enter to complete your first array formua

Note that, in the formula bar, a pair of brackets {} enclose your formula.  This tells you that it’s recognized as an array formula.

Note: if you pressed the Enter key without holding down Ctrl and Shift, the brackets would not appear and only the first element in the array would be recognized.  Additionally, you cannot type in those brackets.

An array does not have to be a one-dimensional vector.  You can also reproduce an entire table by employing a two-dimentional array (3 rows x 2 columns):

So what now?  Pretty boring and useless, huh?  For this type of tasks you would typically use a simple “=” formula and copy it across the table.  In fact, we’d usually recommend that as well.  But this approach has its advantages – because the entire array (1D or 2D) is produced by one formula, you cannot change any of its elements (try typing over any of the cells… Excel won’t let you).  When processing large amounts of data, it’s not always easy to check for human errors, i.e. if your coworker’s son messed up one of your 5,000 rows of identical formulas.  Array formulas help ensure data integrity.

Moving on… now it’s the end of the month and your intern compiled the sales records for the store.  But he listed the consoles horizontally on three rows instead of vertically!  To fix this, you use the TRANSPOSE() function to “rotate” the table… turning rows into columns and columns into rows.  This is a very handy function that’s strictly used with arrays.

To turn this 2×4 table into a 4×2 table

Use the TRANSPOSE() function… don’t forget, Ctrl+Shift+Enter!

Okay, before we wrap up this post, let’s look at something actually useful, i.e. more than simply duplicating values.  Given your price list (a 3×1 array) and your intern’s sales records (a 1×3 array), you now want to calculate the total revenue.  SUMPRODUCT() comes to mind since you want to multiply the price by unit for each product, and add them all together.

However, SUMPRODUCT() requires all of its input parameters to be of identical size.  If you simply entered in the row array and the column array, you’d get a #VALUE! error.  You could TRANSPOSE() one of the two arrays to match the dimension of the other, or you could simply use TRANSPOSE() within your SUMPRODUCT():

{=SUM([array1], TRANSPOSE([array2]))}

Note the brackets – don’t forget to press Ctrl+Shift+Enter.  This calculation gives you the total revenue by first turning the two arrays into the same dimension, and then apply the SUMPRODUCT() function.

Download the Sample File

In the next article, we’ll talk about array sums, which lead into a whole wide world of possibilities.

How do you use an array formula to sum up element-wise minimum values?

Suppose you plan to buy several computer parts from either BestBuy or Circuit City.  The two stores have different prices for each product.  If you buy from one store exclusively, you won’t get the best deals possible.  To optimize the value of your hard-earned dollar, you want to find the least amount of money necessary to buy one of each part.

99.5% of the time, you’d set up a spreadsheet similar to the one shown below.  By using a MIN() function, you create a third column listing the lower price of each item.  Then it’s a no-brainer to add up the column for the total.

Download the Sample File

Occassionally, you may want/need to calculate this number in one single step.  Array formulas came to mind and it seems intuitive to write {=SUM(MIN([array 1], [array 2]))}.  However, that doesn’t work because MIN([array 1], [array 2]) does not give you an array of element-wise minimum values… instead, it gives you a single minimum value between the two arrays.

To fix this problem, you can employ the IF() function within your array formula.  This is short, clean, and easy to understand:

{=SUM(IF([array 1]>[array 2], [array 2], [array 1]))}

Or if you prefer to use the mask/filter approach, your formula could look like this:

{=SUM([array 1]*([array 1]<[array 2]) + [array 2]*([array 1]>=[array 2]))}

The ([array 1]<[array 2]) part creates a mask of 0’s and 1’s, which is used in a similar manner to the typical SUMIF-type of array formulas.  The two approaches use the same logic, and the only difference is the appearance.

A problem is always simple when you already have the solution, but believe it or not, I came up with the most convoluted approach and a super long formula when this problem was first presented to me.  I’ve included that formula in the sample spreadsheet – see if you can figure out what I was trying to do!

Download the Sample File