THE ARBITRON PPM VERSUS THE NIELSEN METER/DIARY
Which is right? No, which is better?
By Erwin Ephron
If contradictory answers to straightforward questions are an absurdity, then audience research is Theater Of The Absurd.
Its most celebrated performance came in the 1970s, when concurrent Simmons and MRI magazine studies consistently showed readership differences by title ranging from 15 to 60 percent. This was finally set straight by changing the Simmons measurement.
Now there's a new show opening in Philadelphia: Arbitron versus NSI. Arbitron PPM data report TV viewing levels that average 46 percent higher than NSI Meter/Diary levels.
Differences of this magnitude send accountants to jail. In research they raise the familiar question, "Which numbers are right?" The dialogue is straight out of Fiddler on the Roof. "Arbitron's right." "Nielsen's right." "How can both be right?" "You know, you're right too."
THE PPM REPORTS VIEWING LEVELS
46% HIGHER THAN THE METER/DIARY
Buyers and sellers can argue and even agree on which set of numbers they prefer, as happened in Print research. But that has little to do with "Truth," because there is no "Truth" in audience measurement. There is only validity, bias, sample size, economics and judgment.
What is "a viewer"?
To get into this Talmudic argument we need to ask, "What do we mean by a viewer?" The answer seems obvious: a viewer is someone who watches a TV program. But we can't use this description of a viewer as a plan for measuring viewing, because we can't sit in the room and count people watching programs.
For measurement we need an "operational definition" of viewing: one that doesn't describe what we're trying to measure but instead tells us how we plan to measure it.
When we look closely at how TV, Magazines and Radio measure viewers, readers and listeners, we find it is very different from how we describe them.
A viewer is operationally defined by NTI as "someone who pushes a button on the Nielsen Peoplemeter remote when the set is tuned to the program." Nielsen uses a one-in-20,000 sample, so each respondent stands for 20,000 people, who are "counted" as viewing a program literally at the push of a button.
Although respondents are instructed when to push, the measurable qualifying act, pushing a Peoplemeter button, is far removed from how we would describe "viewing."
A magazine issue reader is operationally defined as someone who answers "yes" to the MRI question, "Did you happen to read or look into any of these publications (the weeklies screened in) in the past week…"
This "yes" is equally far removed from how we would describe the act of "reading."
A radio station listener is operationally defined as "someone who enters a station's call letters (or dial position) in an Arbitron diary with a start and stop time." Certainly not what we think of as "listening." 
In the top 60 markets NSI defines a TV viewer as "someone who enters a station's call letters, channel number or program in a diary with a start and stop time" integrated with an independent record of household set tuning. A fairly complicated process to measure a simple act.
In each case, since we cannot measure audience directly, we measure it by measuring something else.
Researchers refer to the fit between what we want to measure and what the research technique actually measures as "validity." Validity is the challenge in constructing an appropriate operational definition for the abstraction called Audience.
The Arbitron PPM is a good example. Its operational definition of a viewer is "someone within earshot of a station signal coming from a TV set." No surprise this technique will produce numbers substantially different from those produced by the Meter/Diary or the Peoplemeter. The PPM is measuring something else.
Now it gets dicey. Since viewing cannot be measured directly, no "true" count of viewers is possible. And by extension no one set of numbers is "right" in an absolute sense. But there is a lifeline in this sea of relativity. It's called judgment. While we cannot prove which set of numbers is right, we can determine which system is better for making decisions about using television for advertising.
Here are four things to consider (there are others):
Validity is the common-sense question: which operational definition best fits what we think of as "viewing"? Certainly the set-meter tuning record and the diary claim to have "viewed" come closer to our idea of viewing than "within range of an audible signal" does. And because the PPM uses this more inclusive definition of viewing, a PPM audience estimate counts as "viewers" more people who did not see the commercial. This is not as good a fit as the Meter/Diary definition of viewing.
But when we ask the next question, "Are there distortions in the measurement that would lead us to bad decisions?" the PPM does far better than the Meter/Diary, because of problems with the diary. In spite of instructions, the diary keeper tends to over-report more familiar, more frequently viewed channels at the expense of those viewed occasionally. The technique favors high-rated shows over low-rated ones and broadcast over cable. That bias is a major defect when the data is used for TV planning and buying.
In addition, the cost of separate diary surveys to obtain viewer data prevents continuous measurement and creates the "sweeps problem," which is actually a bias in the measurement because it does not obtain a representative sample across the year. 
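The planning consequence of the diary's familiarity bias can be shown with a toy simulation. The channel shares and the misreporting rate below are invented for illustration, not taken from any survey; the point is only the direction of the distortion:

```python
import random

def simulate_shares(n=100_000, big_share=0.60, misreport=0.30, seed=1):
    """Toy model of diary familiarity bias (all parameters invented).

    Each respondent actually watches either a familiar 'big' channel
    (probability big_share) or an occasionally viewed 'small' channel.
    A diary keeper who watched the small channel mistakenly credits
    the familiar big channel with probability `misreport`.
    """
    random.seed(seed)
    true_big = diary_big = 0
    for _ in range(n):
        if random.random() < big_share:     # actually watched big channel
            true_big += 1
            diary_big += 1                  # and recorded it correctly
        elif random.random() < misreport:   # watched small, wrote down big
            diary_big += 1
    return true_big / n, diary_big / n

true_share, diary_share = simulate_shares()
print(f"true big-channel share:  {true_share:.3f}")
print(f"diary big-channel share: {diary_share:.3f}")
```

Even with most diaries filled in correctly, every misattribution moves share from the occasional channel to the familiar one, which is exactly the broadcast-over-cable tilt described above.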
In contrast, the PPM, when carried, is free of human error. It is an electronic measurement that captures everything it hears. All exposures from all sources receive equal treatment. And because it is a continuing panel, there is continuous measurement.
So although the PPM counts as viewing behavior we would not consider viewing, the station shares it reports are more accurate because they are not filtered through the respondent's judgment or memory. And that is the essential trade-off between active and passive measures. Active measures, like the diary, can use the respondent's judgment for a better measure of viewing, but passive measures like the PPM capture shares better because they eliminate the respondent's judgment. Both levels and shares are important, but I would argue that shares are the more critical for a transactional currency.
The third issue is cost: what does each technique cost, and how does that affect sample size and relative error? There is little empirical data available, but the Meter/Diary system's need for both a panel and surveys, compared to the PPM's single panel, suggests the PPM could field larger samples and produce more reliable data per dollar. But this is only informed speculation.
The PPM's single-panel format would also increase effective sample size over time, and the data would be more stable because it uses essentially the same sample each week.
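As a rough back-of-envelope, a rating behaves like a simple proportion, so its sampling error shrinks with the square root of panel size. The panel sizes below are hypothetical, and the binomial formula ignores design effects that would widen the error in a real panel:

```python
import math

def rating_standard_error(rating, sample_size):
    """Approximate one-standard-error bound on a rating, in rating
    points, using the binomial formula SE = sqrt(p * (1 - p) / n).
    This ignores panel design effects, which widen the interval."""
    p = rating / 100.0
    return 100.0 * math.sqrt(p * (1 - p) / sample_size)

# A 10 rating measured with hypothetical panels of increasing size:
for n in (1_000, 2_000, 4_000):
    se = rating_standard_error(10, n)
    print(f"n={n:>5}: 10.0 +/- {se:.2f} rating points (1 SE)")
```

Quadrupling the sample only halves the error, which is why data-per-dollar, not raw cost, is the relevant comparison between the two systems.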
Potential is a tasty mix of scope and economics. The question here is: "Can this measurement system be used for other media as well?" If it can be, there is the possibility of creating a large single-source database for cross-media planning.
Here the PPM's weakness becomes an advantage. It is an ambient measure of exposure, substituting the idea of proximity (in the vicinity of an ad) for encounter (eyes open, facing the vehicle carrying the ad). An ambient definition of exposure can be applied to any medium, including TV, radio, print, outdoor and the Internet. It is a "lowest common denominator" measurement.
And since the PPM is passive, it is theoretically possible to measure all media exposures, continuously and simultaneously, in a single panel without stressing respondents.
This means the PPM technique has the potential to provide direct estimates of between-media duplication, reach/frequency and frequency distributions, the media-mix stuff that now requires data fusion or modeling. So single-source audience, targeting and reach/frequency across media could be the trade-off for the PPM's less precise definition of "exposure."
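What those direct estimates buy can be seen in a few lines. The five-person panel and its exposure counts below are invented for illustration, and the "modeled" figure uses the classic random-duplication assumption that single-source data would make unnecessary:

```python
from collections import Counter

# Hypothetical single-source records: each respondent's exposure counts
# to a TV schedule and a radio schedule (invented illustrative data).
panel = [
    {"tv": 2, "radio": 0},
    {"tv": 0, "radio": 1},
    {"tv": 3, "radio": 2},
    {"tv": 0, "radio": 0},
    {"tv": 1, "radio": 1},
]

totals = [p["tv"] + p["radio"] for p in panel]
reached = [t for t in totals if t > 0]

reach = len(reached) / len(panel)            # measured directly: 4 of 5
avg_frequency = sum(reached) / len(reached)  # exposures per person reached
freq_dist = Counter(totals)                  # full frequency distribution

# Without single-source data, combined reach must be modeled, e.g. by
# assuming random duplication between the two media:
tv_reach = sum(p["tv"] > 0 for p in panel) / len(panel)
radio_reach = sum(p["radio"] > 0 for p in panel) / len(panel)
modeled_reach = 1 - (1 - tv_reach) * (1 - radio_reach)

print(f"measured combined reach: {reach:.0%}, avg frequency: {avg_frequency:.1f}")
print(f"modeled (random-duplication) reach: {modeled_reach:.0%}")
```

Even in this tiny panel the direct count and the model disagree; a single-source panel replaces the duplication assumption with measurement.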
Not which is right; which is better?
Questions that ask, "Which numbers are right?" show our semantic confusion. There is no physical thing called "Audience" we can carefully count to "get it right." But we can estimate the size of an audience by counting other things related to it. These surrogates (diary claims of viewing, meter-tuning patterns, encoded audio signals) will each produce a different audience.
The question isn't which count is right. The question is which count is better.
"Better" can only be determined by looking past the numbers to judge how well each system does all of the jobs we want it to do. By that standard a PPM-type measurement seems very promising.
Nielsen instructs panel members to record themselves as viewing a television set "whenever they are watching or listening to that set." Arbitron diary-keepers receive similar instructions related to "hearing" a radio station. But the surrogate act measured for viewing is "pushing a button" or "marking a diary."
Although stations prefer "hearing," "listening" is closer to what advertisers think they are buying.
"The sweeps problem" is created by the station (and network) practice of running unusually strong programming during the short diary measurement periods.
The Meter/Diary technique benefits less because the diary samples change.
For Print, Outdoor and the Internet, the PPM technique would have to be modified to use a silent signal. Ambient electronic measures of Print have been proposed by Weinblatt/Douglas. Arbitron, Nielsen and Bloom/Stewart have proposed electronic ambient measures of Outdoor for the US and overseas.
For media-mix planning, PPM data would be improved by the use of media exposure factors to account for the greater (and lesser) probability of proximity to an ad resulting in actual exposure to an ad. This would have to come from other research.
- October 11, 2002 -