1. Introduction

Developing forecasts and making predictions for better decision-making can prove challenging in an environment of constantly changing information.

Forecasts based on information available at one point in time will only support the same decisions or actions at another point if the underlying information has remained virtually unchanged. However, available information is prone to change. By this logic, the best forecasts and consequent decisions are those that can account for what is not yet known.

Property market data, specifically settled sales data, are often the basis of the analysis on which policy and decision-making rely. However, information on property market transactions is received only at settlement rather than at contract signing. This delay is known as the settlement lag. Leaving aside off-the-plan transactions, which take two to three years to settle, the period between contract signing and settlement can run to weeks, if not months. The first release of settled sales data for a given contract period is therefore only a partial view of the contracts signed in that period, and it reflects only the transactions with the shortest settlement periods. From our calculations using the CoreLogic settled sales data, on average only about 60 per cent of transactions for a given contract period are received in the first release.
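As a minimal sketch of how a first-release share of this kind could be computed, the Python example below counts, for each contract month, the fraction of transactions that arrive in the earliest month in which any transaction for that contract month is received. The record layout, the monthly release frequency, the definition of "first release" and the sample values are all assumptions made for illustration; they are not taken from the CoreLogic data or from the calculations in this paper.

```python
from collections import defaultdict

# Hypothetical records: (contract_month, month_first_received) as "YYYY-MM".
# Illustrative values only, not actual settled sales data.
records = [
    ("2020-01", "2020-02"),
    ("2020-01", "2020-02"),
    ("2020-01", "2020-02"),
    ("2020-01", "2020-03"),
    ("2020-01", "2020-05"),
    ("2020-02", "2020-03"),
    ("2020-02", "2020-04"),
]


def first_release_share(records):
    """Share of each contract month's transactions present in its first release.

    Assumption: the first release for a contract month is the earliest
    month in which any of its transactions was received.
    """
    received = defaultdict(list)
    for contract_month, received_month in records:
        received[contract_month].append(received_month)

    shares = {}
    for contract_month, months in received.items():
        first = min(months)  # "YYYY-MM" strings sort chronologically
        shares[contract_month] = sum(m == first for m in months) / len(months)
    return shares


shares = first_release_share(records)
```

Averaging the values in `shares` across contract months would give a single summary figure comparable in spirit to the average share quoted above.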

To our knowledge, few studies have investigated the impact of settlement lag. It is unclear to what extent analyses differ between the first data release and the final data release, when all transaction data for a given contract period have been received. It is also not known which segments of the property market dominate the early data releases and therefore have a greater influence on inferences from early analyses and on decision-making.

This paper complements the work of Anenberg and Laufer (2017), which constructs a house price index based on house values at the contract date, when the sale price is determined, rather than at the closing or settlement date, when the property is transferred. The focus of this paper is to interrogate various data vintages to understand the impact of settlement lags and data revisions on house price forecasts. It first attempts to identify segments of the market that make up a larger share of earlier data releases.

The location of properties (i.e. inner, metropolitan, outer rings of Melbourne or regional Victoria) is first explored to examine whether settled properties in a certain region make up a larger share of the early data releases. This is followed by an analysis of whether one type of property (i.e. houses or units) settles earlier than the other. In the third exploration, the price distribution of each data release is considered by analysing its median contract price, to determine whether early data releases include more properties with a higher or lower contract price.

Finally, the impact of each data release is explored through the changes observed in a hedonic price index. This is done because price indices are often used in policy and decision-making to quantify and capture price changes and turning points in the property market. As the timing and information content of each data release matters, the analysis uses a real-time data framework. Data releases are also known as vintages and the reference period refers to the period during which contract dates fall.
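To make the vintage and reference period terminology concrete, the sketch below tracks how a summary statistic for one reference period changes across successive vintages. It uses the median contract price as a simple stand-in for the hedonic price index, and it assumes that each vintage is cumulative, holding every transaction received up to and including that month. The tuple layout, the function name and the prices are hypothetical and chosen only for illustration.

```python
from statistics import median

# Hypothetical transactions: (contract_month, received_month, contract_price).
# Illustrative values only.
sales = [
    ("2020-01", "2020-02", 900_000),
    ("2020-01", "2020-02", 700_000),
    ("2020-01", "2020-03", 600_000),
    ("2020-01", "2020-04", 500_000),
    ("2020-01", "2020-04", 650_000),
]


def median_by_vintage(sales, reference_period):
    """Median contract price for one reference period, as seen in each vintage.

    Assumption: a vintage is cumulative, so it contains every transaction
    received up to and including that month.
    """
    rows = [(r, p) for c, r, p in sales if c == reference_period]
    vintages = sorted({r for r, _ in rows})
    return {v: median(p for r, p in rows if r <= v) for v in vintages}


revisions = median_by_vintage(sales, "2020-01")
```

In this made-up example the median for the same reference period is revised downwards as later vintages arrive, which is the kind of revision pattern a real-time data framework is designed to expose.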

The next section provides a short discussion about property data and the settlement lag. Section 2 provides a succinct overview of the economic literature and discusses common issues in the property market data space. Section 3 describes the data and approach taken, while Section 4 describes the results from the data investigations. Section 5 then provides concluding remarks and suggests avenues for further research.
