A few months ago, our main copy editor asked me why we publish subjective reviews and objective measurements, but don’t try to make precise correlations between what the measurements indicate and what the reviewer heard. His question is a good one, particularly as audio magazines past and present have attempted to do just that. I explained to him the various reasons why trying to do this is so difficult, and why so many have failed -- which is why we don’t do it. Because the issue is complex, and not nearly as straightforward as it at first seems, he suggested that it might be a good idea for an editorial.

Sonus Faber Olympica III in NRC’s anechoic chamber

My experiences with measuring loudspeakers began in 2000, when I was putting together our measurement program with Canada’s National Research Council (NRC), probably the best-known and most reputable body in the world for conducting such tests. All of our testing is done by them. The NRC is equipped with an excellent anechoic chamber, proper measuring equipment, and highly trained staff -- all necessary for the running of such tests.

The NRC was also home to Dr. Floyd Toole, a senior researcher in the physics division from 1965 to 1991. The groundbreaking research work that Toole began in the 1970s, and that culminated in the mid-’80s, succeeded, to a significant degree, in correlating measurements with listening impressions. Although Toole left in the 1990s to work at Harman International, some of the NRC staff he worked with remained and helped us set up our testing program.

When we were designing the program, I consulted with experts at the NRC, as well as with numerous speaker designers well versed in measuring techniques. They all told me the same thing: Even though we were putting together a great measurement program that would be superior in quality and quantity to what any other publication was presenting, and even though the measurements we would be taking would tell us some useful things about the performance of speakers, we still would not be producing nearly enough measurements to tell us exactly how a speaker would sound to a listener in his room -- or even come close. Even Toole’s own work, which involved many more measurements than we were proposing, couldn’t be used to precisely predict sound quality; in fact, the companies that adopted Toole’s recommendations, based on his research, used a combination of rigorous listening tests and measurements in their design processes.

To support what they were saying, some of these designers showed me the extent of their own programs, which, among other tests, often included a measurement of sound power, aka total radiated response -- a metric that actually is a good indicator of a speaker’s tonal balance in-room. To measure a speaker’s sound power, the speaker must be measured horizontally and vertically -- front, sides, above, below, and rear -- at over 50 intervals. Yet as revealing as measuring a speaker’s sound power can be, I know of no magazine that does it: the setup is too complex, it takes too much time, and if you’re renting an anechoic chamber -- the only proper way to do it -- it’s all just too expensive.

Paradigm Prestige 95F in the chamber

The designers also showed me many other measurements, including some proprietary ones. All in all, it was eye-opening to see how much data some manufacturers collect during their design processes. In fact, if you added together all of the speaker measurements published by all audio magazines, the total wouldn’t come close to the number of measurements and the plethora of graphs produced by these designers. Yet even with what they showed me, these designers don’t rely on measurements alone to predict the sound of a speaker -- for every one of them, listening was and remains a crucial part of the design process, to ultimately determine what a speaker sounds like.

Admittedly, when I began to learn about why we shouldn’t try to read too much into all this information, I was surprised. I was still relatively new to audio reviewing (the SoundStage! Network began in late 1995), but I’d already read many reviews in which strong conclusions about the sound were drawn from measurements. What’s more, these conclusions often seemed to be arrived at on the basis of very few graphs. Were these reviewers mistaken? From what these designers and engineers were telling me, and from what I’ve experienced since then, yes -- the data used for analysis were insufficient to glean the kind of information these reviewers were presenting to their readers.

Drawing conclusions from too few data isn’t restricted to the audio industry. The other week, I watched a TV commentator negatively opining on a movie she had yet to see -- she felt she already knew enough about it to make actually watching it unnecessary. What could be more foolish? How often have you watched the news and seen people attempting to decipher the stock market by looking at only one or two facts? I see it every day. Suffice it to say, the world is full of people willing to draw conclusions -- sometimes very important ones -- based on insufficient data, and there are plenty more people willing to gobble it all up. After all, it’s easier than thinking. I’ve never wanted to be part of that; hence my position on predicting the sound of speakers based on too few data.

I learned something else in 2000 that has since proven true: If you’ve got the right setup, as we do at the NRC, taking measurements is a relatively straightforward task. But even if you take enough measurements to better correlate what you hear with what you’ve measured, as Toole did, properly interpreting those measurements and correlating them accurately with what people hear requires loads of experience and knowledge to draw on -- far more than most reviewers have. It’s like interpreting X-rays and CAT scans: Almost anyone can draw conclusions from the images produced, but you have to have specialized knowledge and long experience to see anything of real medical significance in such images.

How does one obtain such knowledge? In high-end audio, I know of no courses or books that can teach you everything about measurements and how they correlate with what we hear; instead, I’ve found that this knowledge is usually gained through experimentation -- the kind of experimentation done by speaker designers, who work directly with this stuff every day, correlating the results of what they and others measure with what they and others hear. Such knowledge is not learned overnight.

Paul Barton (left) and Doug Schneider

Besides Toole, two of the best interpreters of measurements I know of are Andrew Jones and Paul Barton. Jones, currently with Pioneer and TAD, also worked for KEF for many years, where he was deeply involved in measurement- and listening-based experimentation. Barton, of PSB, began measuring speakers at NRC in the early 1970s, and worked with Toole on his research; to this day, the two still talk with each other about listening and measurements. Jones and Barton have built their careers on attempting to correlate listening with measuring, and I enjoy sitting down and talking with them about it whenever I can. When I do, I realize that even though I’ve learned a lot about measurements -- Barton once told me that I can interpret measurements as well as or better than any other reviewer, which I take as a high compliment -- what I’ve learned is still only a small fraction of what they know and understand. Guys like these keep me in line -- and reinforce my position that trying to draw too many conclusions about how a speaker might sound based only on our measurements is the wrong way to go.

If, by now, you’re wondering whether the measurements we present on the SoundStage! Network group of sites have any use at all, the answer is yes. Here’s why.

Sonus Faber Olympica III on- and off-axis frequency responses

The various frequency-response charts we produce for a speaker’s behavior on and off its reference axis (which is usually directly in front of the tweeter, or directly in front of the midrange or midrange-woofer and tweeter) are more exhaustive than those in any other publication I know of, and can give some indication -- though no more than that -- of the speaker’s sound. For example, the speaker’s frequency bandwidth can be determined from the upper- and lower-frequency limits of the curves, while the overall timbre of the speaker’s sound -- its tonal balance -- can be inferred from the curves’ overall trends, provided you know what to look for. If you see a midrange that’s elevated in relation to the lows and highs, you can assume that voices and instruments, whose bandwidths tend to fall within the midband, will probably sound a little forward. If you see that the higher frequencies are much lower in level than the mids and lows, then the speaker will probably sound a bit dull in the highs. Conversely, if the highs are raised significantly, there’s a good chance that it will sound bright. If the lows don’t extend very deep, or their level is low in relation to the mids and highs, then the speaker’s sound could be thin and light. That said, I wouldn’t try to predict much more than that -- and what I’ve mentioned isn’t much. Unless, to augment the graphs we provide, you have a sound-power measurement -- a far more accurate representation of a speaker’s in-room tonal balance -- everything I’ve listed here still doesn’t add up to much more than a guess. It’s simply based on too few data.

Sonus Faber Olympica III total harmonic distortion + noise at 90dB

There’s still not much agreement on how much distortion is acceptable, and at which frequencies it’s most audible, so there’s little about the sound that can be inferred from our harmonic-distortion tests. But distortion tests, along with our test of frequency-response linearity, indicate a speaker’s behavior at high sound-pressure levels (SPLs), which can help us know how loudly a speaker can cleanly and safely play. For distortion, we first measure the speaker at a distance of 2 meters as it reproduces a swept sinewave at an SPL of 90dB, and assess the level of distortion it produces based on what we see in the graph. (The speaker’s frequency response is shown in the upper part of the graph, the distortion components toward the bottom.) If the speaker does well on this test -- that is, if the test reveals a reasonably low level of distortion, which is determined based on our past experience of running tests on many other speakers -- and it doesn’t look as if the extra power required for a higher SPL will damage the drivers (again, based on our accumulated experience of testing speakers), we run the test again, this time at 95dB. Significant harmonic distortion at either of these SPLs indicates that the speaker is being stressed under these test circumstances, and that pushing it to even higher SPLs will likely cause greater problems: more distortion, or even damage.
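
For readers who want to put a number on what they see in such a graph, here is a minimal sketch in Python of the standard conversion from a level difference in decibels to a percentage. It assumes you have read two values off the chart at the same frequency: the level of the frequency-response trace and the level of the distortion trace. The function name and the example readings are hypothetical; they are not taken from our test software or from any speaker we have measured.

```python
def distortion_percent(fundamental_db: float, distortion_db: float) -> float:
    """Express a distortion level as a percentage of the fundamental.

    Both arguments are SPLs in dB read at the same frequency.
    """
    # The gap between the two traces is the distortion level relative
    # to the fundamental (negative when distortion is lower).
    relative_db = distortion_db - fundamental_db
    # SPL is a pressure (amplitude) quantity, so 20dB is a factor of 10.
    return 100.0 * 10.0 ** (relative_db / 20.0)


# Hypothetical readings for illustration only.
for fundamental, distortion in [(90.0, 50.0), (90.0, 60.0)]:
    pct = distortion_percent(fundamental, distortion)
    print(f"{fundamental:.0f}dB response, {distortion:.0f}dB distortion: {pct:.2f}%")
```

A distortion trace sitting 40dB below the response works out to 1%. As noted above, a percentage alone still says little about how audible that distortion is.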

Paradigm Prestige 95F deviation from linearity at 90dB (versus 70dB)

Likewise, to assess whether or not a speaker’s output rises linearly as the SPL increases -- that is, whether the level of all frequencies the speaker reproduces increases uniformly -- we first measure its reproduction of a swept sinewave at an SPL of 70dB at a distance of 2m, to get a baseline frequency-response measurement. After the baseline is established, we measure it again, this time at 90dB, and then compare the two readings, to see if the level of all frequencies within the audioband did increase by 20dB. If it did, the speaker’s output is still reasonably linear. If we determined in the distortion tests that it was safe to push the speaker to 95dB, we run the test again at that level. If the speaker’s output is not linear at SPLs of 90 or 95dB, this indicates that the speaker is being stressed at those higher outputs, perhaps thermally and/or mechanically.
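
The comparison described above amounts to simple subtraction, which can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the NRC's actual procedure or software; the function name, the three frequencies, and the SPL readings are all made up for the example.

```python
def deviation_from_linearity(baseline_db, high_db, step_db=20.0):
    """Return, per frequency, how far the measured rise departs from step_db.

    baseline_db and high_db are SPL readings (in dB) taken at the same
    frequencies, e.g. from the 70dB and 90dB sweeps. A result of 0.0 means
    the level rose by exactly step_db; a negative value means it rose less.
    """
    if len(baseline_db) != len(high_db):
        raise ValueError("both sweeps must cover the same frequencies")
    return [(hi - lo) - step_db for lo, hi in zip(baseline_db, high_db)]


# Hypothetical readings, not data from any speaker we have tested.
freqs_hz = [50, 1000, 10000]
baseline = [68.0, 70.0, 69.5]   # sweep at a nominal 70dB
high = [87.0, 90.0, 89.5]       # sweep at a nominal 90dB

deviations = deviation_from_linearity(baseline, high)
for freq, dev in zip(freqs_hz, deviations):
    print(f"{freq}Hz: {dev:+.1f}dB")
```

In this invented example the output at 50Hz rises by only 19dB instead of 20dB, a deviation of -1dB, which is the kind of shortfall that would suggest the speaker is beginning to be stressed in the bass.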

Paradigm Prestige 95F impedance

Our measurements of a speaker’s impedance and electrical phase can tell you how difficult the speaker is to drive. For example, an 8-ohm load is a fairly typical and easy load for most amplifiers, tubed or solid-state. But if the impedance hovers at around 4 ohms, and particularly if it dips below that, then you’ll need an amp that puts out quite a bit of current -- which rules out most tubed amps. Our sensitivity measurements tell you the output for a given input at a certain distance from the speaker; we use 2.83V and 1m, respectively, which are industry standards. The average sensitivity of all the speakers we’ve measured over the last 15 years is about 87dB; this means that, for a 2.83V input (equivalent to 1W if the speaker presents an 8-ohm load, 2W if 4 ohms), the average speaker delivers an SPL of 87dB when measured at 1m. If the speaker’s sensitivity is lower than 87dB, it will require a higher voltage -- and, therefore, more power -- to deliver the same output level. If you understand how SPL relates to power -- every 3dB increase in SPL requires a doubling of amplifier power -- then you can use these figures to better determine how powerful an amplifier needs to be to drive your speakers to the SPLs you desire.
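
The arithmetic in the paragraph above can be sketched in Python. This is a rough estimate under simplifying assumptions that are mine, not part of our measurements: the speaker is treated as a purely resistive load at its nominal impedance, the listener is at 1m, and room effects and power compression are ignored. The function names are hypothetical.

```python
def power_watts(voltage_v: float, impedance_ohms: float) -> float:
    """Power delivered into a resistive load: P = V^2 / R."""
    return voltage_v ** 2 / impedance_ohms


def required_power_watts(target_spl_db: float, sensitivity_db: float,
                         reference_power_w: float = 1.0) -> float:
    """Estimate the power needed to reach target_spl_db at 1m.

    sensitivity_db is the SPL produced at 1m for reference_power_w.
    Every 10dB is a tenfold change in power, so 3dB is roughly double.
    """
    return reference_power_w * 10.0 ** ((target_spl_db - sensitivity_db) / 10.0)


# 2.83V into 8 and 4 ohms: about 1W and 2W, respectively.
print(f"8 ohms: {power_watts(2.83, 8.0):.2f}W")
print(f"4 ohms: {power_watts(2.83, 4.0):.2f}W")

# An 87dB-sensitive, 8-ohm speaker asked to play at 90dB, 96dB, and 102dB.
for target in (90.0, 96.0, 102.0):
    print(f"{target:.0f}dB: {required_power_watts(target, 87.0):.1f}W")
```

For a 4-ohm speaker, the 2.83V sensitivity figure already corresponds to about 2W, so `reference_power_w` would be set to 2.0 rather than 1.0. Real listening distances are greater than 1m and real rooms add reflections, so treat the results as ballpark figures only.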

Last but not least, measurements can tell you whether or not something has been competently designed. I could point to many examples of this, but one stands out: We measured a dynamic-cone loudspeaker whose impedance dipped to well below 1 ohm at several points in the audioband. No loudspeaker should present so difficult a load, which could actually damage an amplifier -- an amp “sees” such a speaker as, basically, a short circuit. When we informed the manufacturer of what we’d found, they said it was news to them -- which indicated to me that their designers hadn’t even measured what they were designing (although many companies measure competently, many others don’t), and probably had no idea what they were doing. They’ve since gone out of business.

Our measurements do indicate, in a limited way, certain performance characteristics; general qualities of bandwidth, timbre, and output capability can be inferred from them. Our measurements can also indicate how compatible a speaker will be with your amplifier, as well as a speaker’s overall design competence. These things aren’t insignificant, which is why we continue to measure speakers, power amplifiers, and headphones -- our measurements help us give you better purchase recommendations. Unfortunately, what they can’t do is what many readers seem to want them to do: tell you exactly how something will sound in your listening room. For that, for now and for the foreseeable future, the test instruments you need are your two ears.

. . . Doug Schneider