Last week, we received our AVM test results from one of the largest banks in the U.S. To their credit, they are one of the few banks that periodically conduct their own AVM performance analysis in order to better align their AVM cascade, rather than depending on vendor-reported AVM performance results or relying on third-party results.
The bank actually conducts their tests on two populations of loans: a population of recent purchase loans closed by the bank, and a population of closed refinance loans where the appraisal value is used as the benchmark. They do this because they only use AVM values to support their refinance activity. This is also very clever because it provides some visibility into which vendors may be gaming the third-party testing process by way of MLS data or access to recent sales data on the subject property.
It is widely accepted among AVM providers that third-party tests can be easily gamed: the test population distributed by the testing firm consists of addresses of properties that have recently sold, so any vendor with access to MLS data or known sales data on those properties can simply echo the prices back. In fact, third-party testers rely on AVM vendors not to return valuation estimates when the actual sales price or listing price of a test property is known within the vendor's own database. Unfortunately, this methodology places far too much reliance on the integrity of the vendors, resulting in large discrepancies between reported performance and the actual performance experienced during real-world deployment. This behavior is easily observed in the test results we received last week.
Generally, automated valuation models are built using sales comp data around the subject property. Consequently, their accuracy should not vary much, and should not depend on whether the subject property is listed for sale. In other words, a model should return the same valuation estimate on a property before or after a listing, and should not be influenced by the listing price of the subject property.
When we examine the test results, however, the wide disparity some vendors show between their accuracy on the purchase loan population and their accuracy on the refi loan population is quite suspicious. This is especially evident among models that perform very well on the purchase loan population while performing average or below average on the refi population. In fact, 7 of the 23 vendors had a discrepancy of 20 points or more between the two test populations. As an example, I'm observing models that scored 81% within 10% of the benchmark on the purchase population, but only 46% within 10% on the refi population. Another came in at 81% versus 54% within 10%. (As an aside, you should question any AVM that performs near 80% within 10%, since most AVMs come in at 50 to 60% within 10%. That is statistically equivalent to finishing a 100-meter race in under nine seconds – very suspicious!) Overall, although our AVM was not first in either test, I was pleased to see that we performed above average, and very similarly, on both.
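For readers unfamiliar with the "percent within 10%" metric, here is a minimal sketch of how such a comparison could be computed. The function name, the toy values, and the 20-point flag threshold are my own illustrative assumptions, not the bank's actual methodology or data.

```python
def pct_within_10(estimates, benchmarks):
    """Share of AVM estimates within +/-10% of the benchmark value.

    This is an illustrative implementation, not the bank's actual code.
    """
    hits = sum(
        1 for est, bench in zip(estimates, benchmarks)
        if abs(est - bench) / bench <= 0.10
    )
    return hits / len(estimates)

# Toy populations: (AVM estimate, benchmark value) pairs. Purchase loans
# benchmark against the sale price; refis against the appraisal value.
purchase = [(310_000, 300_000), (198_000, 200_000), (455_000, 400_000)]
refi     = [(260_000, 300_000), (205_000, 200_000), (520_000, 400_000)]

p_score = pct_within_10(*zip(*purchase))
r_score = pct_within_10(*zip(*refi))

# A gap of 20 or more percentage points between the two populations is
# the kind of red flag described above (an assumed threshold).
suspicious = (p_score - r_score) >= 0.20
print(f"purchase={p_score:.0%}, refi={r_score:.0%}, flag={suspicious}")
# → purchase=67%, refi=33%, flag=True
```

A vendor that is genuinely listing-blind should produce roughly the same score on both populations; a large positive gap on the purchase side is consistent with the gaming behavior discussed above.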