Friday, October 26, 2012

Trends

Since I've been posting these averages for 30 days now, I thought it would be worthwhile to take another look at the polling trends over the last month. The chart below shows the trends for the RCP Average, Rasmussen's daily track, and my 2008, 2010, D+3, and Rasmussen Party ID models.

[Chart: one-month trend lines for the RCP Average, Rasmussen's daily track, and the 2008, 2010, D+3, and Rasmussen Party ID models]

There are a couple of interesting things I'd like to point out. First of all, notice that the tracks for my models are remarkably stable, especially from 10/13 through 10/22. The tracks had a blip on the 22nd and 23rd because I added Gallup into the mix on the 22nd and RCP cycled 10 polls over those two days. But note how quickly the tracks returned to their steady state.

The two lines that aren't steady during this period are Rasmussen's daily track and the RCP average itself. The RCP average wanders all over the place, while the reweighted models hold very steady horizontals.

Second, Obama's campaign was doing very well on October 1st. Romney's support had been degrading up to that point, and the trend line was looking bad. The race changed for good on October 6th, when the debate performance began showing up in the polls. Romney's support hasn't tapered off since. His lead ticked up slightly after the VP debate, but it settled back to where it stood as of the 6th.

Finally, you can see from the graph that the RCP average works out to about a D+5 advantage across all of the polls. That is 2 points better for Obama than the best-case scenario discussed by any pundit this election season. There are going to be a lot of shocked people on November 7th.




34 comments:

  1. That is a lovely graph (even without Christina on it). :-)

  2. Just for the sake of completeness, is there a reason you didn't show 2004 turnout on the graph? I am guessing it would produce a similar trend line, but it struck me that it was missing.

    1. It is so close to the 2010 trend line that they overlap. I didn't want the graph to be too hard to read.

  3. With all respect for the effort that went into producing this blog, I am quite positive, though, that this is the kind of thing that can only be produced by living in what Bill Maher calls the GOP bubble.

    On the surface, your equations make it look like a no-brainer Romney victory. But the model you have constructed to reach that conclusion is deeply flawed. Here are a few ways that is true:

    1) From the outset of your initiative to today, you have added new data sources and tweaked your model, so your trend lines are not a true like-for-like comparison.

    2) You have, by your own admission, thrown out partisan splits from poll data. You can't just take the data you like and discard the rest. You need to take the whole of it - the baby AND the bathwater.

    3) You have not accounted or weighted for the known "house bias" of pollsters. Rasmussen, for example, has a notorious lean of roughly R+3.

    4) You have not accounted or weighted for the historical accuracy of pollsters. Gallup, for example, tends to become an increasing outlier in national polling the deeper you get into an election cycle.

    5) The models you have chosen to extrapolate unknowns like Ohio party ID are not particularly well proven.

    6) Worst of all, you have a clear political bias, which, as any credible statistician will point out, tends to leave you with an observer bias that will more likely than not lead you to confirm what you think you already believe or know.

    I could go on - but I am fairly sure this will be incendiary enough.

    I do applaud the desire to get deeper into the numbers. I just hold you are doing so in a way that adds value only by being highly self-selective in your model.

    Cheers.

    .chris daly

    1. I'm not even going to bother. I'll just say that none of your statements is factual.

    2. Your comments are welcome, but sound distinctly more partisan than an attempt at informed debate.

      You don't seem to understand what's being done here or the principles underlying it. The core of your argument is points 2&4:

      I would recommend you read my previous explanation, which covers this more generally than what follows: http://tinyurl.com/96fdrfw


      We desire to reduce the potential solution space for the election outcome. Polling is nothing but sampling on an unknown, high-dimensional manifold. Dave is using the various pollsters as a pseudo-random sampling algorithm that samples a sparse subset of this volume -- which is why we want the pollsters to differ slightly. All we are after is an efficient way to sample an unknown underlying population and produce reliable data about that population's values. Noise is mostly irrelevant, as it can be filtered.

      'All' Dave is doing is using the embedded information that exists in stable temporal correlations across different time slices to reduce the entropy in the calculation and help narrow down the potential solution space of this election.

      This is the same thing done by nearly every modern video compression codec; think P-frame macroblocks.

      So, to reiterate, the pollsters individually use correlations between time slices to reduce entropy and, hopefully, guide their sparse sampling toward a more efficient representation of the whole space.

      Dave is doing this in another way, by transforming his data into a partisan-affiliation space and taking advantage of the correlations that exist across time. It has nothing to do with sampling -- the same sampled numbers act as inputs, and they fluctuate just as the pollsters find them in the real world.

      The rationale for doing this is especially strong, IMHO, when you compare the short timeframe over which the sampling periods occur with the slow time evolution of many of these variables: we change ideologies much, much more slowly than we poll, so they are stable.
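
      As a concrete toy version of the "noise can be filtered" point -- just an illustration with invented daily numbers, not Dave's actual procedure -- a short moving average is enough to damp day-to-day sampling jitter while leaving the underlying level alone:

      ```python
      # Toy smoothing of a noisy daily series of reweighted margins.
      # The numbers are invented for illustration only.

      daily_margins = [1.8, 2.4, 1.1, 2.9, 1.6, 2.2, 2.7, 1.4, 2.0, 2.5]

      def moving_average(series, window=3):
          """Trailing moving average: each point is the mean of the last `window` values."""
          smoothed = []
          for i in range(len(series)):
              chunk = series[max(0, i - window + 1): i + 1]
              smoothed.append(sum(chunk) / len(chunk))
          return smoothed

      print([round(x, 2) for x in moving_average(daily_margins)])
      ```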

    3. Furthermore, he speaks of a "House Effect", which interests me. I'd love to have a discussion about this type of filtering.

      Consider,

      Keeping with my earlier example, imagine a square of pixels whose average color you want to determine. You can only sample a single point in this square. Each pollster is a bit different, guesses which spot is best, and picks a pixel in a different area.

      Ok, the 'election' comes, all the pixels report, and the average color is found out. A guy named Nate Silver then comes along and measures the difference between each predicted color and the actual one. He calls this the 'House Effect' and adjusts all future predicted colors by that difference.

      In effect, this converges the random sampling pattern from all the different pollsters onto the single point in the square that had the average color. What was once several pollsters providing wide, sparse coverage of the unknown square is now a query of a single pixel. Your random sampling pattern is now point sampling.


      Four years later, the next square comes around and the pixel colors are somewhat different. The pollsters go back and each try to pick a representative pixel, but Silver is now basically taking all their different insights into which colors sit in which regions of the square and adjusting them so that he's sampling the whole square at the single point that best predicted the last square.

      Ugh. Even if we assume you get some pseudo-random jitter from the pollsters because they've updated which pixel they sample, he's still bounding the search area with artificial constraints. We can argue about the logic of stored information in prediction, what types of landscapes it makes sense on, which sampling patterns are best suited, etc. But these are all established fields for which Silver shows no regard.
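
      For what it's worth, here is a crude caricature in code of the correction being described -- invented numbers, and emphatically not Silver's actual model -- just to make the mechanism concrete:

      ```python
      # Caricature of a "house effect" correction: measure each pollster's miss
      # against the last actual result, then shift their new numbers by that miss.
      # All figures are invented for illustration.

      last_actual = 7.0   # D-R margin in the last "election" (invented)

      # pollster: (final margin last cycle, current margin this cycle)
      pollsters = {
          "Pollster A": (9.0, 3.0),
          "Pollster B": (5.5, -1.0),
          "Pollster C": (7.5, 1.0),
      }

      for name, (final_last, current) in pollsters.items():
          house_effect = final_last - last_actual   # how far off they were last time
          adjusted = current - house_effect         # shift this cycle's number by that miss
          print(f"{name}: raw {current:+.1f}, house effect {house_effect:+.1f}, "
                f"adjusted {adjusted:+.1f}")
      ```

      Note how the adjusted numbers all collapse toward one another -- the point-sampling effect described above.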

    4. This comment has been removed by a blog administrator.

  4. Thanks for this and all your work Dave.

    The RCP average is subject to spurious variability as a result of picking up and dropping polls with different party advantages. This is neatly illustrated by your chart whatever the merits of the rest of your analysis.
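
    A toy numerical version of that point, with invented margins: swap one heavily D-weighted poll out of a five-poll average and the headline number moves even though no voter changed their mind.

    ```python
    # Toy illustration: a simple poll-of-polls average shifts when one poll
    # with a heavier D sample rotates out and a lighter one rotates in.
    # All margins are invented.

    week1 = [3.0, 1.0, 4.0, 2.0, 5.0]    # Obama margins; the +5 poll has a D+9 sample
    week2 = [3.0, 1.0, 4.0, 2.0, -1.0]   # same polls, but the D+9 poll cycled out
                                         # and a D+3-sample poll at -1 cycled in

    avg1 = sum(week1) / len(week1)
    avg2 = sum(week2) / len(week2)
    print(f"Week 1 average: {avg1:+.1f}")                         # +3.0
    print(f"Week 2 average: {avg2:+.1f}")                         # +1.8
    print(f"Swing from poll rotation alone: {avg2 - avg1:+.1f}")  # -1.2
    ```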

    FWIW I do not think that you are guilty of leaving out polls that fail to confirm your preferred outcomes. Including the PPP poll, for instance, would make very little difference to your numbers.

    NJH

    1. Correct. If the poll is in the RCP average, it goes in my average. I don't cherry-pick. If RCP doesn't include a poll, neither do I.

  5. "House Bias" - har har har. Nate Silver, is that you???

    1. For the record, I don't "house bias" anything. Every poll gets the exact same weight, same as is done by RCP. Even when it is bad for Romney, like the CBS poll.

    2. This comment has been removed by a blog administrator.

  6. I do sound like him, don't I? Lucky for him, he is much smarter than I am. And house bias is no laughing matter at all.

    1. Nate is smart? Okay then. He's also a fraud who weights polls based on his subjective feelings. Find a new god.

    2. There is nothing godly about him. That's the joy of good math. And the "subjective feelings" he weights by are based on the actual historical accuracy of polls against their election outcomes. How does that make you feel, my man? Subjectively, of course.

    3. You are aware that Nate's vaunted accuracy was based on adjusting his model using internal polling given to him right before the election by the Obama campaign in 2008? Right? So yes, he's a fraud and math has nothing to do with it. He should have stuck with baseball.

      My feelings are that you are a dem troll. But we both already knew that.

    4. Smells like a Dem troll to me, too. (sigh)

    5. And yet Silver's vaunted accuracy was in fact... incredibly accurate. More later; work calls.

    6. You are no longer welcome. Unlike other sites, I do not tolerate trolling. Don't bother posting anything regarding Nate Silver; I will only delete it.

    7. I am not trying to troll so much as trying to engage in a conversation about methodology. Let's talk about other methods as well.

    8. Nate Silver doesn't have a methodology. If you want to discuss other methodologies, start your own blog. But your first post here was to state six lies about me. Don't expect me to welcome you with open arms.

    9. This comment has been removed by a blog administrator.

    10. This comment has been removed by a blog administrator.

  7. If NumbersMuncher is correct, Rasmussen has changed his party advantage to about D+6 today in order to keep Obama within 3 points.

    I think this supports your general approach. Rasmussen may be interested in controlling volatility to reduce reputational risk but it is not obvious that reversing out his changes gives a less true picture of the state of play.

    NJH

    1. Yeah, it is hard to dig out Rasmussen's splits. I'm still carrying him as D+3 until I hear differently. Gallup is even worse.

    2. People argue that your sort of approach is naive because party ID is itself a "fluid" thing, but if the pollsters themselves are controlling outcomes by fixing the party advantage it would be more naive to go with their headline numbers.

      Keep at it.

      NJH

    3. My argument about that is that Party ID *is* fluid. But when you self-identify with a party, the great majority of the time you also decide you will support its candidates. If you want to be non-partisan, you self-identify as an Independent. So if a poll tells me how many self-identified Ds, Rs, and Is are in it, I can adjust the results to match any turnout model I want and be pretty accurate. The only real variable is which turnout model we will get.
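
      A rough sketch of that adjustment in code, with invented crosstab and turnout numbers purely for illustration (the real inputs come from each poll's published party-ID splits):

      ```python
      # Toy reweighting of one poll to a chosen turnout model.
      # All numbers are invented; real values come from the poll's crosstabs.

      # Candidate support among self-identified Ds, Rs, and Is (from the poll).
      support = {
          "D": {"Obama": 0.92, "Romney": 0.06},
          "R": {"Obama": 0.05, "Romney": 0.93},
          "I": {"Obama": 0.42, "Romney": 0.52},
      }

      # The turnout model to impose, e.g. a D+3 electorate (shares sum to 1).
      turnout_d3 = {"D": 0.37, "R": 0.34, "I": 0.29}

      def reweight(support, turnout):
          """Weight each party group's support by its assumed share of the electorate."""
          return {
              cand: sum(turnout[party] * support[party][cand] for party in turnout)
              for cand in ("Obama", "Romney")
          }

      topline = reweight(support, turnout_d3)
      print(f"Obama {topline['Obama']:.1%}, Romney {topline['Romney']:.1%}")
      ```

      Swap in a different turnout dictionary (2008, 2010, Rasmussen's split) and the same crosstabs give you the topline under that scenario.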

    4. This morning's Ras number appears in Dave's average tomorrow, btw.

      Could be a real move. I'd look into NM's claim (but yeah, that's work).

  8. I gotta say, I take real delight in contemplating the lobbying/recriminations/threats Nate Silver is, was, and will be getting in the current version of Journolist. Oh, what I wouldn't pay to read those emails.

  9. You guys are way smarter than me. Could one of you explain to me what Dave is measuring so I can understand? It seems to be a reweighting of the polls to show that Romney is really leading, but that's all I've gathered.

    (I'm not trying to troll. I really am curious.)

    1. I've also looked at Nate Silver's methodology, and it is not sound.

  10. He's really just doing election scenario analysis, using all known data as the basis for his model assumptions. All known data includes all of the internal polling numbers from 2012 and the actual election numbers from prior years. The math is actually quite simple and the methodology is very reasonable. He then lists a range of outcomes under various turnout scenarios. Pick the one you think is most likely and go with those results. Dave argues, based on 2012 Gallup surveys, that the actual election result will land somewhere between the D+3 and 2010 scenarios.
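
    For anyone who wants the mechanics spelled out, here is a compact sketch of that kind of scenario analysis -- invented numbers again, not Dave's actual spreadsheet:

    ```python
    # Toy scenario analysis: the same party-ID reweighting applied under several
    # turnout models to produce a range of outcomes. All numbers are invented.

    support = {  # candidate support within each party-ID group, from the polls
        "D": {"Obama": 0.92, "Romney": 0.06},
        "R": {"Obama": 0.05, "Romney": 0.93},
        "I": {"Obama": 0.42, "Romney": 0.52},
    }

    scenarios = {  # assumed electorate composition under each turnout model
        "2008-style (D+7)":  {"D": 0.39, "R": 0.32, "I": 0.29},
        "D+3":               {"D": 0.37, "R": 0.34, "I": 0.29},
        "2010-style (even)": {"D": 0.35, "R": 0.35, "I": 0.30},
    }

    for name, turnout in scenarios.items():
        obama = sum(turnout[p] * support[p]["Obama"] for p in turnout)
        romney = sum(turnout[p] * support[p]["Romney"] for p in turnout)
        print(f"{name}: Romney margin {romney - obama:+.1%}")
    ```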

    1. Which means a lot of polls this year - the ones with D+7 or D+9 or worse samples... I think there were one or two D+11/+12 ones - are just, well, ridiculous.
