I really doubt that they have been 'messing' with the numbers. I mean, it could be, but that's not the first conclusion I would jump to. Before everyone starts jumping to conspiracy theories, let's try to understand their methodology.
I just noticed that the REBGV has a page with some details on their HPI methodology. I was always a bit curious about this, and I'm glad to see this explanation. It's pretty much what I expected. They likely use a hedonic regression to adjust for quality.
Here's how it works. Say that you have data on the five units sold in a certain area/class--say downtown apartments. Say that the data you have on them is the sales price and the square footage (SF). Now, you could know more than square footage--you could know the number of rooms, amenities, etc. But let's assume for simplicity that all you know is the SF and the price.
Here are the data
If you run a simple regression of price on SF, you get the following equation:
Price = 47552 + 537*SF
Now, imagine that you thought the typical benchmark for downtown condos that you are interested in valuing has 800sf. The way you figure out the benchmark price is to plug in SF=800 to the equation.
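price = 47552 + 537*800 = $477,389 (the coefficients shown are presumably rounded; plugging them in exactly as written gives $477,152).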
This is your benchmark value. Next month, you repeat the exercise given the sales that you see in that month. You then pump in SF=800 and compare the benchmark price to the previous month. Presto, you have your time series of benchmark values.
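To make the month-by-month procedure concrete, here is a minimal sketch in Python. The sales figures are made up for illustration (they are not the data from this post), and `fit_line` and `benchmark_price` are hypothetical helper names. I have no idea what software the REBGV actually uses; this is just the simple one-variable version described above.

```python
def fit_line(sf, price):
    """Ordinary least squares for price = a + b * sf; returns (a, b)."""
    n = len(sf)
    mean_sf = sum(sf) / n
    mean_price = sum(price) / n
    sxx = sum((x - mean_sf) ** 2 for x in sf)
    sxy = sum((x - mean_sf) * (y - mean_price) for x, y in zip(sf, price))
    b = sxy / sxx
    a = mean_price - b * mean_sf
    return a, b


def benchmark_price(sales, benchmark_sf=800):
    """Fit this month's sales, then price the benchmark unit off the fitted line."""
    sf = [s for s, _ in sales]
    price = [p for _, p in sales]
    a, b = fit_line(sf, price)
    return a + b * benchmark_sf


# Hypothetical (SF, price) pairs for two months -- invented for illustration.
sales_by_month = {
    "month 1": [(550, 340_000), (700, 420_000), (850, 510_000),
                (1000, 600_000), (1200, 700_000)],
    "month 2": [(600, 350_000), (750, 430_000), (900, 505_000),
                (1100, 610_000)],
}

# One benchmark value per month gives the time series.
benchmark_series = {m: benchmark_price(s) for m, s in sales_by_month.items()}
for month, value in benchmark_series.items():
    print(f"{month}: ${value:,.0f}")
```

Note that the two months have different units sold (and even a different number of sales), but the benchmark is always priced at the same 800 SF.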
Now, how could this go wrong?
What if you had more high-value (or low-value) sales in a given month, i.e. a change in the sales mix? As we know, this can skew the median or mean sales price up or down.
In principle, the HPI can account for this. Even if there are only a few observations at the low end, we still can estimate the HPI. So long as the estimated relationship is truly linear, we are still good to go.
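Here is a small sketch of that claim, under the assumption that prices follow an exactly linear rule. The rule `50000 + 530*SF` and both sets of sales are invented for illustration. If the relationship really is linear, a balanced mix and a mix that is almost all high-end units give the same benchmark.

```python
def fit_line(sf, price):
    """Ordinary least squares for price = a + b * sf; returns (a, b)."""
    n = len(sf)
    mean_sf = sum(sf) / n
    mean_price = sum(price) / n
    sxx = sum((x - mean_sf) ** 2 for x in sf)
    sxy = sum((x - mean_sf) * (y - mean_price) for x, y in zip(sf, price))
    b = sxy / sxx
    return mean_price - b * mean_sf, b


def true_price(sf):
    # Hypothetical, exactly linear pricing rule (an assumption for this sketch).
    return 50_000 + 530 * sf


def benchmark_from(sizes, benchmark_sf=800):
    """Benchmark price implied by a regression on units of the given sizes."""
    a, b = fit_line(sizes, [true_price(s) for s in sizes])
    return a + b * benchmark_sf


balanced_mix = [600, 700, 800, 900, 1000]
high_end_mix = [600, 1300, 1400, 1500, 1600]  # only one low-end sale

bench_balanced = benchmark_from(balanced_mix)
bench_high_end = benchmark_from(high_end_mix)
print(bench_balanced, bench_high_end)
```

Both numbers come out equal (up to floating-point noise) to `true_price(800)`, even though the average sale in the second mix is much pricier.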
In fact, what if ALL we have is high end sales? We can still calculate the benchmark like this. Imagine that the first 3 sales in the dataset weren't there--all that sold was the two high end units. Our regression equation would now be estimated just based on those two observations. The equation is:
price = 100000 + 500*SF
We can still pump in our 800SF benchmark and we'll get a value of $500,000. Note that this is different than the 477,389 we got above. Why? Because the relationship between the characteristics (SF) and the price was not exactly the same among the high end units as among all the units. Note that this could go either way; it's not necessarily biased up or down.
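With only two sales, the regression line is just the straight line through the two points. Here is a quick sketch with two hypothetical high-end sales that I picked so that they land on the equation above (they are not the actual data).

```python
# Two hypothetical high-end sales (SF, price), chosen to lie on
# price = 100000 + 500*SF. They are NOT the real observations.
high_end = [(1000, 600_000), (1200, 700_000)]
(sf1, p1), (sf2, p2) = high_end

slope = (p2 - p1) / (sf2 - sf1)   # dollars per extra square foot
intercept = p1 - slope * sf1      # where the line crosses SF = 0

benchmark = intercept + slope * 800
print(slope, intercept, benchmark)  # 500.0 100000.0 500000.0
```

In this made-up version, both sales are bigger than 800 SF, so the benchmark is an extrapolation below anything that actually sold.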
So, a weird sales mix (like you have in the slowest sales month in the middle of the biggest housing bust evahhhh) can lead to a weird HPI value, but not because the HPI is inherently biased when the sales mix is atypical. Instead, a bias can happen if the relationship between prices and characteristics among the units that actually sold is different from the relationship that holds for units like the benchmark. Subtle, perhaps.
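Here is a sketch of that subtle point, with everything invented for illustration. I assume a "true" pricing rule that is curved (each extra square foot adds less in bigger units), then fit a straight line to two different sets of sales: one clustered around the 800 SF benchmark, and one made up only of big units.

```python
def fit_line(sf, price):
    """Ordinary least squares for price = a + b * sf; returns (a, b)."""
    n = len(sf)
    mean_sf = sum(sf) / n
    mean_price = sum(price) / n
    sxx = sum((x - mean_sf) ** 2 for x in sf)
    sxy = sum((x - mean_sf) * (y - mean_price) for x, y in zip(sf, price))
    b = sxy / sxx
    return mean_price - b * mean_sf, b


def true_price(sf):
    # Hypothetical curved pricing rule (an assumption, not an estimate).
    return 100_000 + 600 * sf - 0.1 * sf ** 2


def benchmark_from(sizes, benchmark_sf=800):
    """Benchmark price implied by a straight-line fit to units of these sizes."""
    a, b = fit_line(sizes, [true_price(s) for s in sizes])
    return a + b * benchmark_sf


near_benchmark = [600, 800, 1000]    # sales that look like the benchmark unit
high_end_only = [1200, 1400, 1600]   # an atypical, all-high-end month

truth = true_price(800)
bench_near = benchmark_from(near_benchmark)
bench_high = benchmark_from(high_end_only)

print(f"true value at 800 SF:     ${truth:,.0f}")
print(f"fit on units near 800 SF: ${bench_near:,.0f}")
print(f"fit on high-end only:     ${bench_high:,.0f}")
```

The fit on the high-end units misses the true 800 SF value by a lot more than the fit on units near the benchmark. In this example the miss happens to be on the high side, but flip the curvature and it goes the other way.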
But, at the end of the day, I expect the January HPI figure is not much more than an anomaly; we'll see the resumption of price declines over the rest of the spring. With months of inventory (MOI) at 20, I can't see anything else as likely.