Tuesday, February 03, 2009

How to do a quality-adjusted housing index

The new REBGV numbers are in, and the benchmark value is up in January 2009 relative to December 2008. This runs counter to the anecdotes and to what I've been seeing around town, so I'm sure many people will be wondering what the heck is going on.

I really doubt that they have been 'messing' with the numbers. It's possible, but it's not the first conclusion I would reach. Before everyone jumps to conspiracy theories, let's try to understand the methodology.

I just noticed that the REBGV has a page with some details on their HPI methodology. I was always a bit curious about this, and I'm glad to see this explanation. It's pretty much what I expected. They likely use a hedonic regression to adjust for quality.

Here's how it works. Suppose you have data on the five units sold in a certain area/class--say, downtown apartments--and that what you observe for each unit is the sale price and the square footage. In practice you could know more than square footage--number of rooms, amenities, etc.--but let's assume for simplicity that all you know is the SF and the price.

Here are the data:

 Price    SF
250000   400
375000   550
400000   700
700000  1200
850000  1500

If you run a simple regression of price on SF, you get the following equation:

Price = 47552 + 537.3*SF
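
If you want to check the arithmetic, here's a quick sketch in Python with numpy. This is purely my own illustration of the fit above; it has nothing to do with REBGV's actual software.

    import numpy as np

    sf = np.array([400, 550, 700, 1200, 1500])
    price = np.array([250000, 375000, 400000, 700000, 850000])

    # Ordinary least squares fit of price = intercept + slope*SF
    slope, intercept = np.polyfit(sf, price, 1)
    print(intercept, slope)  # roughly 47552 and 537.3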

Now, imagine that the typical benchmark unit for the downtown condos you're interested in valuing has 800 SF. The way you figure out the benchmark price is to plug SF=800 into the equation.

price = 47552 + 537.3*800 ≈ $477,389.

This is your benchmark value. Next month, you repeat the exercise with the sales you see in that month. You then pump in SF=800 and compare the new benchmark price to the previous month's. Presto: you have your time series of benchmark values.
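
In code, the monthly exercise might look something like this. Again my own sketch, and the second month's sales here are invented purely for illustration:

    import numpy as np

    def benchmark_value(sf, price, benchmark_sf=800):
        # Refit the hedonic regression on this month's sales, then
        # predict the price of the 800 SF benchmark unit.
        slope, intercept = np.polyfit(np.asarray(sf), np.asarray(price), 1)
        return intercept + slope * benchmark_sf

    month1 = benchmark_value([400, 550, 700, 1200, 1500],
                             [250000, 375000, 400000, 700000, 850000])
    month2 = benchmark_value([450, 600, 900, 1100],   # invented sales
                             [270000, 380000, 520000, 640000])
    print(round(month1), round(month2), f"{month2 / month1 - 1:+.1%}")

The benchmark's month-over-month change is then just the ratio of the two fitted values.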

Now, how could this go wrong?

What if you had more high-value (or low-value) sales in a given month--a change in the sales mix? As we know, this can skew the median or mean sales price up or down.

In principle, the HPI can account for this. Even if there are only a few observations at the low end, we can still estimate the regression. So long as the true relationship is linear, we are still good to go.

In fact, what if ALL we have is high-end sales? We can still calculate the benchmark. Imagine that the first three sales in the dataset weren't there--all that sold was the two high-end units. Our regression equation would now be estimated from just those two observations. The equation is:

price = 100000 + 500*SF

We can still pump in our 800 SF benchmark, and we'll get a value of $500,000. Note that this is different from the $477,389 we got above. Why? Because the relationship between the characteristic (SF) and the price was not exactly the same among the high-end units as among all the units. Note that this could go either way; it's not necessarily biased up or down.
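
Easy to verify with the same sketch as above:

    import numpy as np

    # Only the two high-end sales remain in the month's data.
    slope, intercept = np.polyfit([1200, 1500], [700000, 850000], 1)
    print(intercept + slope * 800)  # 500000.0, vs ~477,389 with all five sales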

So, a weird sales mix (like you get in the slowest sales month in the middle of the biggest housing bust evahhhh) can lead to a weird HPI value. It's not that the index is inherently biased whenever the sales mix is atypical. Instead, a bias can creep in when the relationship between price and characteristics among the units that actually sold differs from that relationship at the benchmark unit's characteristics. Subtle, perhaps.

But, at the end of the day, I expect the January HPI figure is not much more than an anomaly; we'll see the resumption of price declines over the rest of the spring. With MOI (months of inventory) at 20, I can't see anything else as likely.

5 comments:

jesse said...

Thanks for this, VHB. A few notes about the HPI.

First, we have seen in San Diego that different tiers (low-, mid-, and high-priced properties) declined at different rates on a PSF basis. Until now, Vancouver has bucked that trend.

Second, the HPI does not account for quality adjustments. For example, if all the properties sold had major high-quality renos or new construction, the price PSF would be higher. This is not captured in the HPI, based on the definitions on the website. I agree with you that what they are describing is hedonic regression, but apparently only for more tangible variables like lot size, structure size, and number of rooms, not necessarily quality.

Note the Case-Shiller index tries to eliminate major renos by excluding properties with obviously extreme appreciation since their previous sale. Such appreciation is deemed to indicate a new structure or a massive renovation. These sales can be excluded because the "non-renovated" units still make up a contiguous and statistically significant data set.
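
Roughly, the idea is a filter on repeat-sale pairs like the sketch below. This is my own illustration; the 50%/year cutoff is an invented threshold, not Case-Shiller's actual rule, which is more involved.

    def keep_pair(price_then, price_now, years_between, max_annual_gain=0.50):
        # Drop repeat-sale pairs whose annualized appreciation is implausibly
        # large -- these likely indicate a rebuild or major renovation.
        annual_gain = (price_now / price_then) ** (1 / years_between) - 1
        return annual_gain <= max_annual_gain

    print(keep_pair(300000, 360000, 2))  # True: ~9.5%/yr, plausible market move
    print(keep_pair(300000, 900000, 2))  # False: ~73%/yr, likely a reno/rebuild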

Still, some buyers are finding value at these prices. Not many of them. And that's the important takeaway.

Road Rager said...

It may also be that the accuracy of any statistical calculation decreases as the sample size shrinks. Since the sample size here is the number of sales, which is falling, expect this number to become less and less relevant.

jesse said...

There were still 700+ sales--more than enough to keep the confidence intervals about the same. In fact, REBGV gives a decent indication of this with the "price range" column in their data. For all of Vancouver, this number hasn't moved all that much for January.

Likely there was either a true levelling off or, as VHB implies, a systematic skew in the data.
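
For the curious, that "price range" presumably reflects something like the usual OLS uncertainty in the fitted benchmark value. Here's a sketch of the textbook standard-error formula for the predicted value at the benchmark's SF; this is my own construction, not REBGV's published method:

    import numpy as np

    def benchmark_se(sf, price, benchmark_sf=800):
        # Standard error of the fitted value at the benchmark, using the
        # textbook OLS formula: s * sqrt(1/n + (x0 - xbar)^2 / Sxx).
        # It grows as the number of sales n shrinks.
        sf, price = np.asarray(sf, float), np.asarray(price, float)
        n = len(sf)
        slope, intercept = np.polyfit(sf, price, 1)
        resid = price - (intercept + slope * sf)
        s2 = (resid ** 2).sum() / (n - 2)
        sxx = ((sf - sf.mean()) ** 2).sum()
        return np.sqrt(s2 * (1 / n + (benchmark_sf - sf.mean()) ** 2 / sxx))

    print(benchmark_se([400, 550, 700, 1200, 1500],
                       [250000, 375000, 400000, 700000, 850000]))  # ~11,000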

Panda said...

If you look at the regions with the most sales and ignore areas with few sales (hello, Port Moody, with 2 sales and detached +170k), you get the feeling that the numbers are mixed, with no clear direction:

all detached
van west (46 sales): down 30k
van east (42 sales): up 13k
maple ridge (38 sales): up 14k
richmond (30 sales): down 37k
burnaby (29 sales): up 11k

So I don't see much significance in the benchmark "increase".

mohican said...

Good work VHB.

It seems normal to me to have some statistical fluctuations in the benchmark from month to month. Over longer periods of time, these fluctuations work themselves out through larger or smaller price changes in earlier or later months.