Last week, we received our AVM test results from one of the largest banks in the U.S. To their credit, they are one of the few banks that periodically conduct their own AVM performance analysis in order to better align their AVM cascade, rather than depending on vendor-reported AVM performance results or relying on third-party results.
The bank actually conducts its tests on two populations of loans: a population of recent purchase loans closed by the bank, and a population of closed refinance loans where the appraisal value is used as the benchmark. They include the refinance population because they only use AVM values to support their refinance activity. This is also very clever because it provides some visibility into which vendors may be gaming the third-party testing process by way of MLS data or access to recent sales data on the subject property.
It is widely accepted among AVM providers that third-party tests can be easily gamed, because the test population distributed by the testing firm consists of addresses of properties that have recently sold, and vendors may have access to MLS data or known sales data for those properties. In fact, third-party testers rely on the AVM vendors not to return valuation estimates when the actual sales price or listing price is known within their own database. Unfortunately, this methodology places far too much reliance on the integrity of the vendors, resulting in large discrepancies between the reported performance and the actual performance experienced during deployment in the real world. This behavior is very easy to observe in the test results we received last week.
Generally, automated valuation models are built using sales comp data around the subject property. Consequently, the accuracy should not vary by much, and should not depend on whether the subject property is listed for sale or not. In other words, a model should return the same valuation estimate on a property before or after a listing, and should not be influenced by the listing price of the subject property.
When we look at the test results, however, the wide disparity among some vendors between the accuracy achieved on the purchase loan population and on the refi loan population is quite suspicious. This is especially evident among the models that score very high on the purchase loan population while scoring average or below average on the refi population. In fact, 7 out of the 23 vendors had a discrepancy of 20 points or more between the two test populations. As an example, I’m observing models that scored 81% within 10% on the purchase population (that is, 81% of estimates fell within 10% of the benchmark value), but only 46% within 10% on the refi population. Another scored 81% vs. 54% within 10%. (As an aside, you should question any AVM that is scoring near 80% within 10%, since most AVMs score between 50% and 60% within 10%. This is comparable to finishing a 100 meter race in under nine seconds: very suspicious!) Overall, although our AVM did not finish first on either test, I was pleased to see that we performed above average and very similarly on both.
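For readers who want to run the same comparison on their own test file, here is a minimal sketch of the "percent within 10%" score and the 20-point discrepancy check. The function names, the vendor labels, and the third vendor's scores are hypothetical, and the 20-point threshold is simply the cutoff I used above; this is not the bank's actual procedure.

```python
def pct_within(estimates, benchmarks, tolerance=0.10):
    """Percentage of AVM estimates that fall within +/- tolerance of the benchmark value."""
    pairs = list(zip(estimates, benchmarks))
    if not pairs:
        raise ValueError("no estimate/benchmark pairs supplied")
    hits = sum(1 for est, bench in pairs if abs(est - bench) <= tolerance * bench)
    return 100.0 * hits / len(pairs)


def flag_discrepancies(results, threshold=20.0):
    """Return vendors whose purchase score exceeds their refi score by at least `threshold` points.

    `results` maps a (masked) vendor label to a (purchase_pct, refi_pct) tuple.
    """
    return {
        vendor: purchase - refi
        for vendor, (purchase, refi) in results.items()
        if purchase - refi >= threshold
    }


# The two vendor examples quoted above, plus one hypothetical consistent vendor.
results = {
    "Vendor A": (81.0, 46.0),
    "Vendor B": (81.0, 54.0),
    "Vendor C (hypothetical)": (62.0, 58.0),
}
flagged = flag_discrepancies(results)
```

A vendor that lands in `flagged` is not proven to be gaming the test; the gap is only a signal that the purchase score deserves a closer look.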
I’m not getting the “suspicious” activity. If an AVM vendor’s data sources include legally and compliantly sourced MLS comparable sales for a subject property, and the result is higher accuracy in the resulting AVMs, why is this outcome ‘suspect’ simply because purchase loan accuracy is higher? The use of “asking price” to get a valuation seems more than just ‘suspect’ (that seems way outside acceptable methodology). Are you saying this is all they are using for the purchase side?
Does your bank customer change its mix of vendors based on the study? Do they ‘mask’ who the other vendors are, or do you get to see the names?
Thanks for your comment. I do not disagree that MLS comparable sales can be legitimately used in constructing AVMs. My main concern is when they are used in validation testing. Obviously, AVM vendors have access to actual sales prices, and sometimes they have the sales price of the subject property being tested, especially when they have access to the MLS data. It’s up to the vendor not to return valuations on addresses for which they already know the sales price (that is, to avoid target leak). In a truly blind test, one would expect a model to perform similarly on properties that were listed on the MLS and on those that were not. What surprised me when I saw the results of the refinance test (compared against appraisal values) vs. the resale test (compared against sales prices) is that some models scored significantly higher in the resale test than in the refinance test (around 85% within 10% in resale compared to 50% within 10% in refi). I’m just wondering if this perceived lift in accuracy is real.
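To make the target leak point concrete, here is a rough sketch of the kind of guard a vendor would need in order to stay blind during a test. Everything in it is an assumption for illustration: the `blind_estimate` name, the `known_events` lookup, and the 365-day lookback window do not describe any particular vendor's system.

```python
from datetime import date, timedelta


def blind_estimate(address, request_date, model, known_events, lookback_days=365):
    """Return model(address), or None when a recent sale or listing of the subject is on file.

    `known_events` maps an address to the dates of sales or listings the vendor
    already holds for that property. Events dated after `request_date` also count,
    because a retrospective test file can contain properties that have since sold.
    """
    cutoff = request_date - timedelta(days=lookback_days)
    for event_date in known_events.get(address, []):
        if event_date >= cutoff:
            return None  # withhold the estimate rather than risk target leak
    return model(address)


# Hypothetical data: one property with a recent sale on file, one with an old sale.
known_events = {
    "1 Main St": [date(2010, 3, 1)],
    "3 Elm Rd": [date(2005, 1, 15)],
}
flat_model = lambda address: 250_000  # stand-in for a real valuation model
```

The tester has no way to verify that a guard like this is actually applied, which is why the result depends so heavily on the vendor's integrity.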
As for the banks, I think they are getting smarter, and more and more are starting to do their own validation testing rather than rely on third-party metrics. I’m not sure if they are changing the mix of vendors. The vendor names are masked when we receive the test results.
Thanks again for your comment.
We are presently looking at our AVM implementation and your comments provide a good perspective to consider. Thanks.
One note: I think a testing fallacy is created when appraised values are used as the benchmark on refinance transactions. It is known in the industry that refinance appraisals tend to be overstated. Thus, when they are used as a benchmark, the comparison is not clean. This would easily explain the large difference between the testing results.
In my mind, I figured the best way to calibrate AVM models would be against sales transactions, as described. Then, we would use the AVM to benchmark appraisers on refinance transactions.
Thanks for your comment. I agree with you to some extent about refinance appraisals having a tendency to be inflated (in fact, we observed this bias especially on cash-out refis from 2005 to the early part of 2007), but in the current environment appraisals are put through a much stricter review process, almost to a fault, whether for refi or purchase. Appraisers are also generally much more conservative now. In fact, there are numerous stories of buyers having to increase the down payment to lower the LTV because the appraisal value was below the agreed-upon purchase price, and of refinances being turned down due to lack of value. Overall, in today’s conservative environment, we observe that the delta between refi appraisals and purchase appraisals has narrowed. Consequently, when I observe some AVMs scoring 20 to 30 points better in purchase tests than in refinance tests, it does leave me wondering what would cause this. My suspicion is that purchase tests are prone to target leak, which could inflate the testing accuracy compared to what you would experience in the real world.
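One way to probe the inflated-appraisal explanation on your own data is to look at the direction of the error, not just its size. The sketch below is an illustration under my own assumptions (the function name and inputs are hypothetical): if refi appraisals were overstated, AVM estimates would tend to sit below the benchmark, and the median signed error on the refi population would be clearly negative.

```python
from statistics import median


def median_signed_error(estimates, benchmarks):
    """Median of (estimate - benchmark) / benchmark, expressed as a percentage.

    A clearly negative value means the estimates sit below the benchmark more
    often than above it, which is what an inflated benchmark would produce.
    """
    errors = [100.0 * (est - bench) / bench for est, bench in zip(estimates, benchmarks)]
    if not errors:
        raise ValueError("no estimate/benchmark pairs supplied")
    return median(errors)
```

A median near zero on the refi population, combined with a much lower hit rate than on the purchase population, would be harder to attribute to appraisal inflation alone.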
Ideally, I think the best method of AVM testing is to compare the AVM results to values that you believe in (appraisals that have been reviewed and accepted, for both purchase and refi) with a method that eliminates target leak as much as possible (requesting AVM estimates before a sale transaction actually occurs). More and more, we are seeing banks become more sophisticated, take this approach to testing, and hold AVM providers more accountable. Thanks again for your comment, and I wish you the best with your implementation.
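As a rough sketch of that approach from the tester's side, assuming the bank logs the date of each AVM request and the closing date of each loan, the scoring file can be limited to estimates that were requested before the transaction closed. The record layouts and the `pre_sale_pairs` name are hypothetical.

```python
from datetime import date


def pre_sale_pairs(estimates, benchmarks):
    """Pair each AVM estimate with its benchmark only if it was requested before closing.

    `estimates` is a list of (property_id, request_date, value) tuples.
    `benchmarks` maps property_id to (close_date, reviewed_value), where
    reviewed_value is a reviewed and accepted appraisal.
    """
    pairs = []
    for property_id, request_date, value in estimates:
        if property_id not in benchmarks:
            continue  # no trusted benchmark for this property
        close_date, reviewed_value = benchmarks[property_id]
        if request_date < close_date:
            pairs.append((value, reviewed_value))
    return pairs


# Hypothetical records: the second estimate was requested after closing and is dropped.
estimates = [
    ("P1", date(2010, 1, 5), 310_000),
    ("P2", date(2010, 3, 20), 198_000),
    ("P3", date(2010, 2, 1), 455_000),
]
benchmarks = {
    "P1": (date(2010, 2, 10), 300_000),
    "P2": (date(2010, 3, 1), 200_000),
}
scored = pre_sale_pairs(estimates, benchmarks)
```

The surviving pairs can then be scored with whatever accuracy measure the bank already uses, such as percent within 10%.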