Polls – Behind the Numbers

You might recall the flap last year over the differences between the various public polls on the Mayoral Election and the actual results. Freddie Ferrer supporters claimed that inaccurate polls doomed his candidacy. This wasn’t the first and won’t be the last time candidates will complain about these polls.

What many have missed is that there is a major difference between the NY1, Quinnipiac and Marist polls and the private polls conducted by political campaigns, a difference that often explains why so many election results are quite different from the poll projections.

There are two basic methods for selecting who is polled: random digit dialing of phone numbers or calling from voter lists. The college and almost all media-sponsored polls use random digit dialing (RDD). Polls paid for by candidates, political parties and interest groups use voter lists. There are advantages and disadvantages to each method. The critical issue is how well each method achieves the goal of the sample: covering only people who can be expected to vote in the election in question.

Only RDD can generate any possible phone number (including unlisted numbers), so every phone in the area has a chance to get picked.  This is an important part of the theory of probability that underlies statistics.  Voter list-based samples can only provide phone numbers for those voters whose phones are listed somewhere – on voter registration forms, with the phone company, or other places where voter file vendors can purchase the phone numbers.
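To make the idea concrete, here is a minimal sketch of how an RDD sample might be drawn. The function name, the area code and the exchange are made up for illustration; a real RDD frame is built from the working telephone exchanges in the survey area, which this sketch does not attempt to model.

```python
import random


def rdd_sample(area_code, exchanges, n, seed=None):
    """Draw n phone numbers by random digit dialing.

    area_code and exchanges are illustrative inputs (assumptions),
    not data from any real sampling frame.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(n):
        exchange = rng.choice(exchanges)
        # Every four-digit line number from 0000 to 9999 is equally
        # likely, so listed and unlisted phones have the same chance.
        line = rng.randrange(10000)
        numbers.append(f"{area_code}-{exchange}-{line:04d}")
    return numbers


sample = rdd_sample("212", ["555"], 5, seed=1)
for number in sample:
    print(number)
```

Note that nothing in the output says who will answer, which is exactly the limitation discussed next.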

From here on, everything favors the voter file-based sample (I must admit my bias here: part of my business at Prime New York is selling voter list-based samples). An interviewer using RDD knows absolutely nothing about the person contacted before speaking to whoever answers the phone, whereas a person contacted from a voter sample comes with a lot of information already provided. That means the RDD-based poll has to rely entirely on the answers people give to the interviewer.

First comes the question of who is a registered voter. No one is going to vote in the election if they are not registered, but RDD is going to generate a lot of numbers that don’t go to registered voters. Will the respondent tell the truth when asked? Studies have shown that people don’t like to admit they aren’t registered because it marks them as a “bad citizen” or carries some other social stigma (it’s just like high school peer pressure, but much more diffuse). If the poll is for a primary election, things get even worse. A voter sample for a Democratic Primary lists only Democrats, while the RDD interviewer again has to ask a question to find out whether the person is a Democrat.

Are they likely to vote? Every pollster has a series of questions they use to determine if the person is likely to vote. But just as people often lie about whether they are registered, they also do so about how often they vote (and for the same reasons). Survey after survey done after an election has shown a much larger number of people saying they voted than actually turned out for the election. Proponents of RDD correctly say its strength lies in the ability to reach everybody, but they don’t tell you that RDD has a great deal of difficulty finding the right people to ask about elections, especially low-turnout elections like primaries.

Voter list samples have an advantage in telling pollsters the right people to survey for elections. They list who is registered and have information on exactly which elections each person has voted in. Since past voting is a strong predictor of future turnout, this helps pollsters identify likely voters. In fact, most polls for campaigns draw the sample from people with a certain history of voting. The most sophisticated use of voting history (Predicted Electorate Random Sampling) relies on voting history to create a sample that reflects the likely electorate rather than on the often-misleading answers of survey participants. If the election is a low-turnout one, getting the electorate right is critical to getting the poll right.
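As a rough illustration of drawing a sample from people with a certain history of voting, here is a minimal sketch. The records, field names and the two-primary cutoff are all invented for the example, and this is a simple filter followed by a random draw, not the Predicted Electorate Random Sampling method itself.

```python
import random

# Hypothetical voter file records; the field names are assumptions.
voter_file = [
    {"id": 1, "party": "DEM", "phone": "212-555-0101", "primaries_voted": 3},
    {"id": 2, "party": "DEM", "phone": None, "primaries_voted": 2},
    {"id": 3, "party": "REP", "phone": "212-555-0103", "primaries_voted": 4},
    {"id": 4, "party": "DEM", "phone": "212-555-0104", "primaries_voted": 0},
    {"id": 5, "party": "DEM", "phone": "212-555-0105", "primaries_voted": 2},
]


def list_sample(voters, party, min_primaries, n, seed=None):
    """Randomly sample up to n voters who match the party, have a phone
    number on file, and voted in at least min_primaries past primaries."""
    eligible = [
        v for v in voters
        if v["party"] == party
        and v["phone"] is not None
        and v["primaries_voted"] >= min_primaries
    ]
    rng = random.Random(seed)
    return rng.sample(eligible, min(n, len(eligible)))


picked = list_sample(voter_file, "DEM", min_primaries=2, n=2, seed=1)
for voter in picked:
    print(voter["id"], voter["phone"])
```

Voter 2 drops out despite a good voting history because no phone number is on file, which is the coverage gap of list-based samples described earlier.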

In the end, RDD works best for higher-turnout elections or when there are problems with the voter records. That’s why most RDD polls have had a good track record in predicting winners of Presidential Elections and other major elections. But in low-turnout contests, the list of sure winners in pre-election RDD polls who turned into losers on Election Day (remember Governor Koch) is a long one.

However, studies have shown that voter list-based samples can do just as well as RDD – and often better – when they use past voting history to create the sample.  An article coming out in the next edition of the academic journal Public Opinion Quarterly shows that tests of an early version of Predicted Electorate Random Sampling (then called Registration Based Sampling) outperformed RDD in several elections, including the NY 2002 Statewide General Election.  When the right voters get harder to find – as in low-turnout elections like Primaries – the advantages of voter list-based samples grow even larger, something to keep in mind between now and this September’s New York Primaries.