Accredited official statistics

Technical annex

Published 17 July 2025

Applies to England

For general technical information regarding caveats in this report, please see the technical report.

For a detailed glossary of terms, please see the glossary.

1. Outcomes of interest

Binary logistic regression was used in this report to examine which personal, household, and property-related characteristics were statistically associated with various housing-related outcomes. These included:

  • being a first-time buyer
  • property ownership expectations (expecting to buy a home in the future)
  • housing conditions (living in a non-decent home, living with damp, or in an over-occupied household)
  • security of tenure (positive subjective assessment)
  • wellbeing (positive subjective assessment)
  • considering or making complaints about repairs or safety issues in the property

Subgroups of interest

The analysis of wellbeing included all households, while the other outcomes were analysed for specific sub-samples:

  • being a first-time buyer: property owners
  • property ownership expectations: renters
  • living in a non-decent home and living with damp: renters
  • living in an over-occupied household: renters, excluding one-person households
  • security of tenure: private renters
  • considering or making complaints about repairs or safety issues in the property: private renters and social renters, each modelled separately

Security of tenure outcome

Security of tenure was derived using responses from four variables, each measuring agreement or disagreement with the following statements on a 5-point Likert scale:

  • I currently feel safe from eviction
  • My housing situation is secure enough for me to feel confident making long term decisions about my life
  • My housing situation is secure enough for me to feel invested in my community
  • My housing situation is secure enough that where I live feels like home

These variables were combined into a single security of tenure index. The Cronbach’s alpha for the index was 0.878, indicating good reliability. For the regression analysis, those with a score equal to or above the median were considered to have average or higher feelings of security of tenure, reflecting a generally positive assessment of their tenure’s security.
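As an illustration of the reliability statistic quoted above, the following is a minimal Python sketch of Cronbach’s alpha for a set of items. It is not the code used for this report: the function name `cronbach_alpha` and the example responses are hypothetical, and the sketch ignores survey weights.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(items[0])
    # Sample variance of each item across respondents
    item_variances = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of each respondent's total score across the k items
    total_variance = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to four 5-point Likert items (illustration only)
responses = [
    [5, 4, 5, 5],
    [2, 2, 1, 2],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
    [1, 1, 2, 1],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items move together and can reasonably be combined into one index.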

Well-being outcome

The well-being outcome was produced using responses from four variables, each measured on a ten-point scale:

  • Satisfaction with life
  • The extent to which things done in life are worthwhile
  • Happiness experienced the previous day
  • Anxiety felt the previous day (with this variable reverse-coded, so that higher values indicate lower anxiety)

These variables were combined into a single well-being index. The Cronbach’s alpha for the index was 0.764, indicating good reliability.

For the regression analysis, those with a score equal to or above the median were considered to have average or higher well-being, reflecting a generally positive assessment of their overall well-being.
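The reverse-coding and median split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the report’s code: the scale end-points (1 and 10), the use of a simple mean to combine the four items, and all function names are assumptions, and survey weights are ignored.

```python
from statistics import mean, median


def reverse_code(score, low=1, high=10):
    """Flip a score so that the lowest value becomes the highest and vice versa.

    The end-points 1 and 10 are assumed for illustration.
    """
    return low + high - score


def wellbeing_index(satisfaction, worthwhile, happiness, anxiety):
    """Combine the four items into one index (a simple mean is assumed here)."""
    return mean([satisfaction, worthwhile, happiness, reverse_code(anxiety)])


def median_split(scores):
    """Return 1 for scores at or above the median, otherwise 0."""
    cut = median(scores)
    return [1 if s >= cut else 0 for s in scores]
```

For example, `median_split([3, 5, 7])` places the respondents scoring 5 and 7 in the "average or higher" group, because both are at or above the median of 5.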

2. Model predictors

Excluding and including predictors

The analysis aimed to include all relevant predictors of interest in the model to provide a comprehensive analysis. We excluded predictors that were strongly correlated with other predictors to maintain the model’s reliability and clarity. This is because multicollinearity (when two or more predictors are highly correlated) can make it difficult to determine the individual effect of each predictor. Multicollinearity can also lead to unstable estimates and inflated standard errors, meaning the results might be unreliable and give misleading conclusions.
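The report does not state how correlation between predictors was measured or what cut-off was applied. As one possible screening step, the Python sketch below flags pairs of indicator variables whose Pearson correlation exceeds a threshold. The threshold of 0.7, the variable names and the data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(predictors, threshold=0.7):
    """List predictor pairs whose absolute correlation is at or above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical 0/1 indicators for eight households (illustration only)
predictors = {
    "high_income": [1, 1, 1, 1, 0, 0, 0, 0],
    "professional": [1, 1, 1, 0, 0, 0, 0, 0],
    "london": [1, 0, 1, 0, 1, 0, 1, 0],
}
flagged_pairs = flag_collinear(predictors)
```

In this made-up example only the income and occupation indicators are flagged, so an analyst would consider keeping just one of them in the model.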

We focused particularly on removing predictors that introduced multicollinearity or those that showed no meaningful relationship with the outcome in the bivariate models. These predictors were excluded because they were correlated with other variables and also did not appear to have a significant effect on the outcome when considered individually.

Once multicollinearity was addressed, the analysis tested the statistical significance of each predictor in the full model. Predictors with a p-value of 0.05 or lower were retained in the final model, as these were considered to have a statistically significant association with the outcome. Non-significant predictors were excluded to refine the model and focus on the most relevant variables.

The initial, full model included all possible predictors of interest, while the final models presented in the annexed tables and discussed in the report only include predictors that significantly contributed to the model. Wald’s test was used to assess whether each predictor significantly contributed to the model.

Predictors included in the models

The following categorical variables were entered in all models: age of the household reference person (HRP), ethnicity of HRP, gender of HRP, socio-economic group of the HRP, household income, and whether anyone in the household had a long-term illness or disability.

Categorical variables for household type and region were entered in all models, excluding models where not applicable (living with damp).

A categorical variable for the type of dwelling was entered in all models, excluding models where not applicable (being a first-time buyer, property ownership expectations and living in an over-occupied household).

A categorical variable for type of tenure was included in the models for property ownership expectations, housing conditions, and wellbeing, because other models were fitted for a sub-sample of property owners (being a first-time buyer) or private renters (security of tenure, considering or making complaints) or social renters only (considering or making complaints).

There were also the following variable selections, reflecting interest in exploring associations between specific factors:

  • A categorical variable for length of residence at the current dwelling was included in the models for security of tenure, wellbeing and considering or making complaints about repairs or safety issues in the property.
  • Two categorical variables for living in a non-decent home and living with damp were only included in the models for wellbeing and considering or making complaints about repairs or safety issues in the property.
  • Weekly rent was included in the models for living in a non-decent home and living with damp only.
  • The following categorical variables were entered in the model for first-time buyers only: whether the property is freehold or leasehold, energy efficiency rating band (SAP 2012) for the property, and flags for using savings, inheritance, or help from family to buy the property.

Reference groups

All explanatory variables were treated as categorical and entered into the model as dummy variables. For each categorical variable, one group was selected as the reference category (typically the largest group or a policy-relevant category such as London). Coefficients for other categories reflect differences relative to this reference group. Reference categories are listed in the annex tables accompanying each model.
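To illustrate dummy coding against a reference category, the Python sketch below turns one categorical variable into 0/1 indicators and omits the reference group. The function name `dummy_code` and the example values are hypothetical and this is not the report’s code.

```python
def dummy_code(values, reference):
    """Return one dict of 0/1 indicators per observation, omitting the reference."""
    levels = sorted(set(values) - {reference})
    return [{level: int(value == level) for level in levels} for value in values]


# Hypothetical region values with London as the reference category
regions = ["London", "North East", "South West", "London"]
dummies = dummy_code(regions, reference="London")
```

A household in the reference group has 0 on every indicator, which is why the coefficients for the other categories are read as differences from that group.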

3. Logistic regression

Binary logistic regression is a type of regression model used when the outcome variable is binary, meaning that it has only two possible values. The model uses a logistic function, which ensures that predicted probabilities fall between 0 and 1, to assess how different factors influence the likelihood of the outcome. For example, these models can show how the probability of being a first-time buyer or living in a non-decent home varies by individual or household characteristics.
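The role of the logistic function can be sketched in Python as follows. The function names, coefficient values and predictor names are hypothetical and serve only to show how a linear combination of predictors is mapped to a probability between 0 and 1.

```python
from math import exp


def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1 / (1 + exp(-z))


def predicted_probability(intercept, coefficients, dummies):
    """Predicted probability for one household from its 0/1 indicators."""
    z = intercept + sum(coefficients[name] * value for name, value in dummies.items())
    return logistic(z)


# Hypothetical coefficients on the log-odds scale (illustration only)
coefficients = {"aged_25_to_34": 0.8, "income_top_quintile": 0.5}
household = {"aged_25_to_34": 1, "income_top_quintile": 0}
p = predicted_probability(-1.0, coefficients, household)
```

However large or small the linear combination becomes, the result stays inside the 0 to 1 range, which an ordinary linear regression would not guarantee.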

By controlling for other variables, binary logistic regression helps ensure that the estimated effect of one predictor is not confounded by the influence of other variables in the model. For instance, controlling for age and household type means that any observed relationship between income and the likelihood of being a first-time buyer is not simply a reflection of differences in age or household type. This helps prevent over- or under-estimation of the effect of each factor.

While logistic regression identifies associations, it does not imply causation. Therefore, results should be interpreted as indicative rather than definitive.

The svyglm function from the R survey package was used for the regression analysis to account for the complex survey design.

4. Odds ratios

In this report, we use odds ratios to examine how various factors influence the likelihood of specific outcomes.

Odds represent the ratio of the probability of an event occurring to the probability of it not occurring. The odds ratio compares the odds of an outcome occurring for one group relative to another. In a binary logistic regression model, the odds ratios compare the group of interest to the reference group for each predictor, measuring the strength of the relationship between the predictor and the outcome.

  • An odds ratio greater than 1 suggests the predictor increases the odds of the outcome, compared with the reference group.
  • An odds ratio less than 1 indicates the predictor decreases the odds of the outcome, compared with the reference group.
  • An odds ratio of 1 means there is no effect of the predictor on the outcome, or the odds are the same as for the reference group.

For example, an odds ratio of 1.5 means the odds of the outcome are 1.5 times as high for one group as for the reference group.

On the other hand, an odds ratio of 0.5 means the odds of the outcome are half as high for that group compared to the reference group.

Odds are not the same as probability. While probability expresses the chance of something happening out of all possible outcomes, odds compare the chance of something happening to the chance of it not happening. For example, a 75% probability (3 out of 4) means the odds are 3 to 1.
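The distinction between probability, odds and odds ratios can be checked with a short Python sketch. The function names and the example probabilities are hypothetical.

```python
def odds(probability):
    """Convert a probability (between 0 and 1) to odds."""
    return probability / (1 - probability)


def probability(odds_value):
    """Convert odds back to a probability."""
    return odds_value / (1 + odds_value)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in one group relative to the reference group."""
    return odds(p_group) / odds(p_reference)


# A 75% probability corresponds to odds of 3 to 1
three_to_one = odds(0.75)

# The same odds ratio of 1.5 implies different probabilities,
# depending on how common the outcome is in the reference group
p_when_reference_is_10_percent = probability(1.5 * odds(0.10))
p_when_reference_is_50_percent = probability(1.5 * odds(0.50))
```

With a reference probability of 10% the group’s probability rises to about 14%, while with a reference probability of 50% it rises to 60%, even though the odds ratio is 1.5 in both cases.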

Similarly, odds ratios should not be confused with likelihood or probability, even though they are related. An odds ratio shows how the odds of an outcome differ between groups, not the overall chance that the outcome will occur. It tells us about the strength of association, not how likely the outcome is for a particular group.

5. Forest plots

Forest plots are included in the report to help illustrate how different factors affect outcomes of interest, such as being a recent first-time buyer or living with damp or in an overcrowded household. The plots were produced as follows. After each logistic regression was run, the coefficients were exponentiated to obtain odds ratios and their corresponding 95% confidence intervals were calculated. These values, along with the reference category for each predictor (which has an odds ratio of 1), were copied into an Excel workbook for plotting.
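The exponentiation step can be sketched in Python as follows. This is an illustration rather than the report’s code: it assumes a normal-approximation critical value of 1.96, whereas the survey software may use a different reference distribution, and the function name and inputs are hypothetical.

```python
from math import exp, log

Z_95 = 1.96  # assumed critical value for a 95% interval (normal approximation)


def odds_ratio_with_ci(coefficient, standard_error, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) from a log-odds coefficient."""
    lower = exp(coefficient - z * standard_error)
    upper = exp(coefficient + z * standard_error)
    return exp(coefficient), lower, upper


# Hypothetical coefficient and standard error (illustration only)
estimate, lower, upper = odds_ratio_with_ci(log(2), 0.2)
```

Because the interval is calculated on the log-odds scale and then exponentiated, it is not symmetric around the odds ratio.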

First, a scatter chart was created with predictor categories on the vertical axis and odds ratios on the horizontal axis. Each odds ratio was plotted as a dot for its category. Excel’s horizontal error bars were then added to each dot to show the lower and upper confidence limits for the odds ratios. Finally, a vertical line was placed at an odds ratio of one, allowing the reader to see whether a confidence interval includes 1. This is important because if a confidence interval includes 1, the factor’s effect is not statistically significant.
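Custom error bars in a spreadsheet chart are usually specified as distances from the plotted point rather than as the limits themselves. The Python sketch below shows that conversion, together with the check for whether an interval includes 1. The function names and values are hypothetical, and the exact steps used to build the report’s charts are not documented here.

```python
def error_bar_lengths(odds_ratio, lower, upper):
    """Return (negative length, positive length) measured from the odds ratio."""
    return odds_ratio - lower, upper - odds_ratio


def interval_includes_one(lower, upper):
    """True if the confidence interval spans an odds ratio of 1."""
    return lower <= 1.0 <= upper


# Hypothetical odds ratio of 1.5 with limits 1.1 and 2.0 (illustration only)
minus_length, plus_length = error_bar_lengths(1.5, 1.1, 2.0)
crosses_reference_line = interval_includes_one(1.1, 2.0)
```

In this example the interval lies entirely above 1, so the bar would not touch the vertical reference line.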

How to interpret the plots

In every resulting plot, the dots represent odds ratios. An odds ratio tells us how much higher (or lower) the odds of the outcome are for each group, compared with the reference group for that factor.

  • Odds ratios above 1 mean the group is more likely to have the outcome. For example, an odds ratio of 2 means their odds are twice as high as the reference group’s.
  • Odds ratios below 1 mean the group is less likely to have the outcome.

Each odds ratio has a confidence interval, represented by a line around the point estimate. This shows the range of values that we are 95% confident the true value in the population lies within. If the confidence interval crosses the vertical line (which represents 1), it means we cannot be certain the factor has a real impact: its effect is not statistically significant.

6. Wald’s test

In addition to the odds ratios produced by the logistic regression models, we used Wald’s test to assess whether each explanatory variable, as a whole, significantly contributes to the model. This test evaluates the null hypothesis that all coefficients associated with a specific regression term (e.g. all dummy variables for a categorical predictor) are equal to zero.

While the odds ratios highlight group-level differences relative to the reference category, the Wald test provides a broader view of a variable’s overall impact on the outcome. We also used Wald test values to rank the explanatory variables by the strength of their association with the outcome.

The regTermTest function from the R survey package was used for Wald’s test to account for the complex survey design.
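The idea behind a joint Wald test can be sketched in Python: the statistic combines all coefficients for one term with their covariance matrix, and larger values indicate stronger evidence against the null hypothesis that every coefficient is zero. This is an illustration only. The function names and numbers are hypothetical, and the sketch does not reproduce the reference distribution or any design-based degrees-of-freedom adjustment applied by regTermTest.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def wald_statistic(coefficients, covariance):
    """Quadratic form b' V^-1 b for the coefficients of one regression term."""
    v_inverse_b = solve(covariance, coefficients)
    return sum(b * w for b, w in zip(coefficients, v_inverse_b))


# Hypothetical coefficients for two dummy variables and their covariance matrix
w = wald_statistic([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
```

Ranking variables by such a statistic, as described above, orders them by how strongly the data contradict the hypothesis of no association.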

7. Statistical significance

The statistical significance of a regression coefficient indicates the reliability of the estimate. Coefficients with a p-value of 0.05 or lower were considered statistically significant and are highlighted in the relevant tables. Coefficients with p-values greater than 0.05 are considered statistically non-significant and should be interpreted with caution.

8. Weighting

The regression analyses were carried out on weighted data.
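One way to see how weights enter a logistic regression is through the weighted log-likelihood, in which each household’s contribution is multiplied by its survey weight. The Python sketch below illustrates this for point estimates only. The function name and values are hypothetical, and the sketch does not cover the design-based standard errors that survey software also calculates.

```python
from math import log


def weighted_log_likelihood(outcomes, probabilities, weights):
    """Sum of each observation's log-likelihood contribution times its weight."""
    return sum(
        w * (y * log(p) + (1 - y) * log(1 - p))
        for y, p, w in zip(outcomes, probabilities, weights)
    )


# Hypothetical outcomes, predicted probabilities and weights (illustration only)
outcomes = [1, 0, 1]
probabilities = [0.8, 0.3, 0.6]
weights = [1200.0, 800.0, 1500.0]
value = weighted_log_likelihood(outcomes, probabilities, weights)
```

Households with larger weights represent more of the population and therefore have more influence on the fitted coefficients.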