MLE for Linear Standard Deviation in Python

The Setting

At work we’ve been looking at the variance of the differences between sale prices of neighboring houses, and how it depends on the distance between the sales in space and time.

After about a minute of searching the web, I didn’t find an implementation of a solution to the following problem:

The Problem

Given the model
\varepsilon_i \sim \mathcal{N}(0, \langle x_i, a\rangle^2),
and samples (x_i, \varepsilon_i)_{i=1}^n, what is the maximum likelihood estimate of the vector a?
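
Explicitly, assuming the samples are independent (as the product in the next section does), we’re looking for
\hat{a} = \arg\max_a \prod_{i=1}^n p(\varepsilon_i | a, x_i)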

The Log-Likelihood

Using the Gaussian density:
\begin{aligned} \log{p(\varepsilon | a,x)} &= \log{\left( \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\langle x_i, a\rangle} \exp\left(-\frac{1}{2}\frac{\varepsilon_i^2}{\langle x_i, a\rangle^2}\right) \right)} \\ &= \sum_{i=1}^n \left(-\log{\langle x_i, a\rangle} - \frac{1}{2}\frac{\varepsilon_i^2}{\langle x_i, a\rangle^2} \right) - \frac{n}{2}\log{2\pi} \end{aligned}
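
As a quick numerical sanity check of this expression (a hypothetical check, not part of the original derivation), it can be compared against scipy.stats.norm.logpdf, which also includes the constant term:

import numpy as np
import scipy.stats

rng = np.random.RandomState(0)
x = np.abs(rng.randn(4, 3))   # 4 samples, 3 features
a = np.abs(rng.randn(3))
sigma = np.dot(x, a)          # per-sample standard deviation <x_i, a>
e = rng.randn(4) * sigma      # samples from N(0, <x_i, a>^2)

manual = np.sum(-np.log(sigma) - 0.5 * e**2 / sigma**2) - len(e) / 2 * np.log(2 * np.pi)
reference = np.sum(scipy.stats.norm.logpdf(e, loc=0, scale=sigma))
print(np.allclose(manual, reference))  # True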

The Jacobian

We’re going to maximize the log-likelihood by minimizing its negation with scipy’s minimize (which picks L-BFGS-B once bounds are supplied), so we’ll also need the Jacobian. Differentiating the negated expression above:
\begin{aligned} \frac{\partial\left(-\log{p(\varepsilon | a,x)}\right)}{\partial a_j} &= \sum_{i=1}^n \left(\frac{x_{ij}}{\langle x_i, a\rangle} - \frac{\varepsilon_i^2 x_{ij}}{\langle x_i, a\rangle^3} \right) \end{aligned}

The Code

From here it’s just a matter of writing correct numpy broadcasting code and calling scipy’s minimize. We also tack on numba to speed up the objective.

from numba import njit
import numpy as np
import scipy.optimize

@njit
def linear_std_f_jac(a, x, e2):
    """
    Compute the negative log-likelihood of `e` given the model $e~N(0, <x,a>^2)$
    (up to a fixed additive constant) and its Jacobian with respect to `a`.
    `e2` holds the squared errors $e^2$.
    """
    dots = 1/np.dot(x, a)          # reciprocals 1 / <x_i, a>
    e2dots2 = e2 * dots**2         # e_i^2 / <x_i, a>^2
    # negative log-likelihood: sum_i log<x_i, a> + e_i^2 / (2 <x_i, a>^2)
    f = np.sum(e2dots2 / 2 - np.log(dots))
    # its gradient: sum_i x_ij * (1/<x_i, a> - e_i^2 / <x_i, a>^3)
    j = np.sum(x * (dots - e2dots2 * dots).reshape(-1, 1), axis=0)
    return f, j

def mle_linear_std(x, *, e=None, e2=None, a0=None, bounds=None, **kwargs):
    """
    Find the MLE for `a` given the model $e~N(0, <x,a>^2)$
    
    If you pass `e2`, the squared errors, then they won't be recomputed.
    If you don't pass `e2`, then you must pass `e`.
    
    `a0`, `bounds` and `kwargs` are passed to `scipy.optimize.minimize`.
    """
    x = np.asanyarray(x)
    assert x.ndim == 2
    n, l = x.shape
    if a0 is None:
        a0 = np.ones(l)
    else:
        a0 = np.asanyarray(a0)
    if e2 is None:
        assert e is not None
        e = np.asanyarray(e)
        assert e.ndim == 1
        e2 = e**2
    else:
        e2 = np.asanyarray(e2)
        assert e2.ndim == 1
    if bounds is None:
        # default: constrain each coefficient to be nonnegative
        bounds = [(0, None)] * l
    return scipy.optimize.minimize(
        linear_std_f_jac, a0,
        args=(x, e2), jac=True, bounds=bounds, **kwargs
    )
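
Before trusting the analytic Jacobian, one can compare it against finite differences on a small problem using scipy.optimize.check_grad; a minimal sketch (hypothetical, not part of the original code), reusing linear_std_f_jac and the imports above:

rng = np.random.RandomState(0)
x_small = np.abs(rng.randn(100, 3))
a_true = np.abs(rng.randn(3))
e2_small = (rng.randn(100) * np.dot(x_small, a_true))**2

# check_grad returns the norm of the difference between the analytic
# gradient and a finite-difference estimate at the given point
err = scipy.optimize.check_grad(
    lambda a: linear_std_f_jac(a, x_small, e2_small)[0],
    lambda a: linear_std_f_jac(a, x_small, e2_small)[1],
    np.ones(3),
)
print(err)  # should be small relative to the gradient's norm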

The Example

n = 10_000_000
l = 5

random = np.random.RandomState(1729)
x = np.c_[np.ones(n), np.abs(random.randn(n, l - 1))]  # constant first feature, nonnegative rest
a = np.abs(random.randn(l))
# use the exact standard deviation <x_i, a> as the "error": a noiseless
# sanity check, so the optimum should recover `a` almost exactly
e = np.dot(x, a)

import time
t1 = time.time()
res = mle_linear_std(x, e2=e**2)
t2 = time.time()

print("time:", t2 - t1)
print("correct:", a, "\n")
print(res)

Which outputs:

time: 7.45331597328186
correct: [1.70512861 0.28985998 1.05376116 1.58281698 1.39634357] 

      fun: 21011319.054358717
 hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 5.7129652 ,  2.83110926,  4.06507689, 10.56198964, -2.42801207])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 13
      nit: 12
     njev: 13
   status: 0
  success: True
        x: array([1.70513432, 0.28985536, 1.0537623 , 1.5828478 , 1.39631855])

Looks ok.
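
The example above feeds in the exact standard deviations rather than sampled noise. As a rougher follow-up (a hypothetical extension, not in the original post), one can sample e from the model and check that the estimate lands near a up to sampling error:

e_noisy = np.dot(x, a) * random.randn(n)  # sample e_i ~ N(0, <x_i, a>^2)
res_noisy = mle_linear_std(x, e=e_noisy)
print(res_noisy.x)  # expected to land near `a`, up to sampling error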