Freddie Mac Loan-Level Dataset
Freddie Mac’s Single Family Loan-Level Dataset seems worth looking into:
The user guide contains an R script for importing the data!
Possible uses
- Compare distributions of interest rates on investor vs. owner-occupier loans (by year) – do investors pay higher interest?
- First look at the data: yes, they pay higher rates
- Another thing I could do: regress the interest rate on loan-level controls, quarter fixed effects, and an investor dummy interacted with each quarter/year – does the “investor spread” change over the course of the boom? Could implement this in the sample data first…
- Compare default/delinquency rates on investor vs. owner-occupier loans during the bust – did investors default more? This would require using the very large monthly performance files.
- Compare LTV distributions during the boom
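A minimal sketch of the first comparison above (raw investor vs. owner-occupier rate gap by year), using only the Python standard library. The record layout `(year, occupancy, rate)` and the occupancy codes "I" (investor) and "P" (owner-occupier) are assumptions for illustration and should be checked against the user guide; the toy records are made up, not real data. This gives only an unconditional spread, with none of the controls the regression idea would add.

```python
from collections import defaultdict
from statistics import mean


def investor_spread_by_year(loans):
    """Return {year: mean investor rate - mean owner-occupier rate}.

    `loans` is an iterable of (year, occupancy, rate) tuples, where
    occupancy is "I" for investor and "P" for owner-occupier (assumed codes).
    Years that lack either group are skipped.
    """
    rates = defaultdict(lambda: {"I": [], "P": []})
    for year, occupancy, rate in loans:
        if occupancy in ("I", "P"):  # ignore other codes, e.g. second homes
            rates[year][occupancy].append(rate)
    return {
        year: mean(groups["I"]) - mean(groups["P"])
        for year, groups in sorted(rates.items())
        if groups["I"] and groups["P"]
    }


# Made-up toy records, not real Freddie Mac data.
toy_loans = [
    (2004, "P", 5.75), (2004, "P", 6.00), (2004, "I", 6.25),
    (2005, "P", 5.50), (2005, "I", 6.00), (2005, "I", 6.50),
    (2006, "S", 6.10),
]
spreads = investor_spread_by_year(toy_loans)
print(spreads)
```

A positive value for a year means investors paid more on average in that year; plotting these values over time would be a first pass at whether the spread moved during the boom.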
Overview
The data are quarterly and contain information on fully-amortizing 15-, 20-, and 30-year fixed-rate mortgages with full documentation that were purchased or guaranteed by Freddie Mac from 1999 to 2018. So the sample is much smaller than HMDA, and restricted to low-risk loans (i.e. a subset of conforming loans).
For each quarter, there is one file containing loan origination data and one file containing monthly performance data for each loan in the origination data file. Thus, for a loan originated in 1999Q1, the performance file tracks this loan, on a monthly basis, until it is terminated (or sold?). Accordingly, the performance files are huge!
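Because the performance files are so large, one option is to stream them row by row rather than load them into memory. The sketch below flags loans that were ever at least three months delinquent. It assumes a pipe-delimited file without a header row, and the column positions (`id_col`, `status_col`) and the numeric delinquency coding are hypothetical placeholders; the real layout should be taken from the user guide. The toy input is made up.

```python
import csv
import io


def ever_delinquent(lines, id_col=0, status_col=2, min_months=3):
    """Return the set of loan IDs ever at least `min_months` delinquent.

    `lines` is any iterable of text lines (e.g. an open file), read one row
    at a time, so memory use depends on the number of flagged loans rather
    than on file size. Column positions are assumptions, not the real layout.
    """
    flagged = set()
    for row in csv.reader(lines, delimiter="|"):
        if len(row) <= max(id_col, status_col):
            continue  # skip blank or truncated rows
        status = row[status_col]
        # Non-numeric status codes are ignored in this sketch.
        if status.isdigit() and int(status) >= min_months:
            flagged.add(row[id_col])
    return flagged


# Made-up toy rows: loan ID | reporting month | delinquency status.
toy_file = io.StringIO(
    "A001|200701|0\n"
    "A001|200702|1\n"
    "A001|200703|3\n"
    "B002|200701|0\n"
    "B002|200702|R\n"
    "C003|200701|2\n"
)
flagged = ever_delinquent(toy_file)
print(flagged)
```

The resulting set of loan IDs could then be matched against the origination file to compare delinquency shares for investor and owner-occupier loans.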
The origination data alone are very rich. They include loan amount, interest rate, credit score, MSA, investor/second-home flag, LTV, CLTV, DTI, and a few other things.
Malin Hu used the origination data in her JMP (Arlene Wong may have used the performance data too?).