Press "Enter" to skip to content

Gaussians and matrix completion

\[ \begin{pmatrix} 6 & 10 & 13 & * & * & * \\ 0 & 4 & -2 & 0 & * & * \\ 5 & 7 & 4 & -3 & * & * \\ * & -1 & 2 & 1 & -1 & * \\ * & * & * & 6 & 2 & * \\ * & * & * & * & * & 5 \end{pmatrix} \overset{?}{\rightarrow} \begin{pmatrix} 6 & 10 & 13 & 4 & 5 & 4 \\ 0 & 4 & -2 & 0 & 2 & 6 \\ 5 & 7 & 4 & -3 & 5 & 1 \\ -2 & -1 & 2 & 1 & -1 & 2 \\ 2 & 5 & -1 & 6 & 2 & 8 \\ -3 & 1 & 2 & 0 & 0 & 5 \end{pmatrix} \]

If \( {Z} \) is a Gaussian random vector on \( {\mathbb{R}^n} \) with covariance matrix \( {K} \), then for every non empty \( {I\subset\{1,\ldots,n\}} \), the random vector \( {Z_I:=(Z_i:i\in I)} \) on \( {\mathbb{R}^I} \) is Gaussian, with covariance matrix \( {K_{I,I}} \).
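
As a quick sanity check, here is a minimal numpy sketch (the covariance matrix and the subset \( {I} \) are arbitrary choices for illustration): the empirical covariance of \( {Z_I} \) matches \( {K_{I,I}} \).

    import numpy as np

    rng = np.random.default_rng(0)

    # An arbitrary covariance matrix K = A A^T (positive semidefinite).
    A = rng.standard_normal((4, 4))
    K = A @ A.T

    # Sample many Gaussian vectors Z with covariance K.
    Z = rng.multivariate_normal(np.zeros(4), K, size=200_000)

    # The sub-vector Z_I is Gaussian with covariance K_{I,I}.
    I = [0, 2]
    print(np.cov(Z[:, I], rowvar=False))  # empirical covariance of Z_I
    print(K[np.ix_(I, I)])                # K_{I,I}: the two should be close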

Conversely, suppose that we prescribe the Gaussian marginal distributions of \( {Z_{I_1},\ldots,Z_{I_r}} \) for non empty subsets \( {I_1,\ldots,I_r\subset\{1,\ldots,n\}} \) with the natural compatibility conditions in case of overlaps. Can we find a Gaussian random vector \( {Z} \) on \( {\mathbb{R}^n} \) with such prescribed marginals? The problem reduces to the construction of a covariance matrix with prescribed diagonal blocks with possible overlaps: we prescribe the entries with index \( {(i,j)\in (I_1\times I_1)\cup\cdots\cup(I_r\times I_r)} \).

In linear algebra, this is a matrix completion problem, more precisely a positive definite matrix completion problem. Among all possible solutions, we may try to maximize the entropy. Recall that the Boltzmann-Shannon entropy \( {H(f)} \) of a probability density function \( {f} \) on \( {\mathbb{R}^n} \) is

\[ H(f):=-\int\!f(x)\,\log(f(x))\,dx. \]

For a Gaussian distribution, we obtain

\[ H(\mathcal{N}(m,K))=\frac{n}{2}\log(2\pi e)+\frac{1}{2}\log(\det(K)). \]
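
This identity is easy to check numerically; here is a small sketch assuming scipy is available (the covariance matrix below is an arbitrary choice):

    import numpy as np
    from scipy.stats import multivariate_normal

    n = 3
    A = np.random.default_rng(1).standard_normal((n, n))
    K = A @ A.T + n * np.eye(n)  # an arbitrary positive definite covariance

    # Closed form: (n/2) log(2 pi e) + (1/2) log det(K).
    formula = 0.5 * n * np.log(2 * np.pi * np.e) + 0.5 * np.linalg.slogdet(K)[1]

    # scipy's differential entropy of N(0, K); both values should agree.
    print(formula, multivariate_normal(np.zeros(n), K).entropy())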

Therefore, looking for a maximum entropy solution for our Gaussian problem with marginal constraints reduces to looking for a maximum-determinant matrix completion.

Let \( {\mathcal{S}_n^+} \) be the set of \( {n\times n} \) symmetric positive definite matrices. A very well known inequality attributed to Hadamard states that for every \( {K\in\mathcal{S}_n^+} \),

\[ \det(K)\leq K_{1,1}\cdots K_{n,n} \]

with equality if and only if \( {K} \) is diagonal. This can be rephrased as the solution of a matrix completion problem: among the elements of \( {\mathcal{S}_n^+} \) with prescribed diagonal, the diagonal matrix is the unique maximizer of the determinant. More generally, if we prescribe a subset of the \( {n(n+1)/2} \) entries, can we complete the remaining entries in order to obtain an element of \( {\mathcal{S}_n^+} \)? Is there a unique solution with maximum determinant? How can we compute it?
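
Here is a quick numerical illustration of Hadamard's inequality (a numpy sketch with an arbitrarily generated positive definite matrix):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 5))
    K = A @ A.T + np.eye(5)  # an arbitrary positive definite matrix

    # Hadamard: det(K) <= product of the diagonal entries.
    print(np.linalg.det(K), np.prod(np.diag(K)))

    # The purely diagonal completion attains the bound.
    print(np.linalg.det(np.diag(np.diag(K))))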

The prescribed entries can be thought of as a weighted graph \( {G=(V,E,w)} \), in other words as a map \( {w:E\rightarrow\mathbb{R}} \) where \( {E\subset\{\{i,j\}:i,j\in V\}} \) and \( {V=\{1,\ldots,n\}} \). We may now define the set of matrix completions as

\[ \mathcal{C}_G:=\{K\in\mathcal{S}_n^+:K_{i,j}=w_{\{i,j\}}\ \forall \{i,j\}\in E\}. \]

Each element of \( {\mathcal{C}_G} \) is a complete weighted graph extending \( {(V,E,w)} \). Note that \( {\{i\}\in E} \) means that \( {G} \) prescribes the diagonal entry \( {(i,i)} \) of the matrix.

Theorem 1 (Max-det matrix completions) If \( {\{i\}\in E} \) for all \( {i\in V} \) and if \( {\mathcal{C}_G} \) is not empty, then there exists a unique \( {K_*\in\mathcal{C}_G} \) such that

\[ \det(K_*)= \max_{K\in\mathcal{C}_G}\det(K). \]

Moreover, \( {K_*} \) is the unique element of \( {\mathcal{C}_G} \) such that

\[ (K_*^{-1})_{i,j}=0 \quad\text{for all}\quad \{i,j\}\not\in E. \]

Proof: The non empty convex set \( {\mathcal{C}_G} \) is compact since \( {\{i\}\in E} \) for all \( {i\in V} \): the prescribed diagonal bounds all the entries, because \( {|K_{i,j}|\leq\sqrt{K_{i,i}K_{j,j}}} \) for every \( {K\in\mathcal{S}_n^+} \). Now the first statement follows from the strict concavity of the functional \( {K\in\mathcal{S}_n^+\mapsto\log(\det(K))} \). The second statement follows from the first order conditions of optimality:

\[ K^{-1}=(\nabla\log\det)(K) \in \mathrm{span}\{e^{(i,j)}:\{i,j\}\in E\} \]

where \( {e^{(i,j)}} \) is the \( {n\times n} \) symmetric matrix with \( {1} \) at positions \( {(i,j)} \) and \( {(j,i)} \) and \( {0} \) elsewhere. The gradient is relative to the Hilbert space of symmetric \( {n\times n} \) matrices equipped with the scalar product \( {A\cdot B:=\mathrm{Tr}(AB)} \). ☐
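
As an illustration of theorem 1, here is a minimal numerical sketch using cvxpy, assuming it is installed; the graph and the weights below are arbitrary choices:

    import cvxpy as cp
    import numpy as np

    n = 4
    # Prescribed entries (0-indexed): the full diagonal plus a path of
    # off-diagonal entries. Unprescribed: (0,2), (0,3), (1,3).
    prescribed = {(0, 0): 2.0, (1, 1): 2.0, (2, 2): 2.0, (3, 3): 2.0,
                  (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}

    K = cp.Variable((n, n), symmetric=True)
    constraints = [K[i, j] == v for (i, j), v in prescribed.items()]
    # log_det is -infinity outside the positive definite cone, so the
    # positive definiteness constraint is implicit in the objective.
    cp.Problem(cp.Maximize(cp.log_det(K)), constraints).solve()

    # The inverse of the max-det completion vanishes at the unprescribed
    # positions, as stated in theorem 1.
    P = np.linalg.inv(K.value)
    print(P[0, 2], P[0, 3], P[1, 3])  # all close to 0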

If there exists \( {i\in V} \) with \( {\{i\}\not\in E} \) and if \( {\mathcal{C}_G} \) is not empty, then one can show that \( {\mathcal{C}_G} \) is not compact and the function \( {\det} \) is unbounded on \( {\mathcal{C}_G} \). However, one can force compactness by adding the constraint \( {\mathrm{Tr}(K)\leq \tau} \) or the constraint \( {\mathrm{Tr}(K^2)\leq\tau} \) for some fixed real number \( {\tau>0} \), and derive a result similar to theorem 1, see Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124.
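
To see the non-compactness concretely, take \( {n=2} \) and prescribe only \( {K_{1,2}=1} \): then \( {\mathcal{C}_G} \) contains every \( {K=\begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix}} \) with \( {ab>1} \), and \( {\det(K)=ab-1} \) is unbounded; under the additional constraint \( {\mathrm{Tr}(K)=a+b\leq\tau} \) with \( {\tau>2} \), the maximum \( {\tau^2/4-1} \) is attained at \( {a=b=\tau/2} \).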

Theorem 1 does not provide a condition on \( {(V,E,w)} \) ensuring that \( {\mathcal{C}_G} \) is not empty. An answer is given by the following result. We recall that a cycle of length \( {r\geq1} \) is a sequence \( {i_1,\ldots,i_{r+1}\in V} \) such that \( {\{i_k,i_{k+1}\}\in E} \) for all \( {1\leq k\leq r} \) and \( {i_{r+1}=i_1} \). A cycle is irreducible (or chordless) if it does not contain a cycle of smaller length.

Theorem 2 (Existence of matrix completions) Suppose that for every clique of \( {(V,E)} \) the symmetric matrix associated to this clique and constructed from \( {w} \) is positive definite. If \( {(V,E)} \) does not have irreducible cycles of length \( {>3} \), then \( {\mathcal{C}_G} \) is not empty. Conversely, if \( {(V,E)} \) has an irreducible cycle of length \( {>3} \), then one can construct a weight \( {w} \) satisfying the clique condition above and such that \( {\mathcal{C}_G} \) is empty.

One can find a proof of theorem 2 in Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124. The remarkable feature of theorem 2 is that it reduces the existence of a matrix completion to a purely graphical condition. Graphs without irreducible cycles of length \( {>3} \) are known as chordal graphs or triangulated graphs. Two examples: unions of complete graphs (i.e. the adjacency matrix is full block diagonal up to relabeling of the vertices), and band diagonal graphs (i.e. the adjacency matrix is full band diagonal up to relabeling of the vertices).
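
Chordality is easy to test in practice; here is a small sketch assuming networkx is available:

    import networkx as nx

    # A 4-cycle contains an irreducible (chordless) cycle of length 4.
    G = nx.cycle_graph(4)
    print(nx.is_chordal(G))  # False

    # Adding a chord triangulates it.
    G.add_edge(0, 2)
    print(nx.is_chordal(G))  # True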

An ordering of the vertices is a bijection \( {\ell:V\rightarrow V} \). We say that \( {\ell} \) is a perfect elimination ordering of \( {(V,E)} \) when for every \( {i\in V} \), the set

\[ \{j\in V:\{i,j\}\in E,\ell(i)<\ell(j)\} \]

is a clique of \( {(V,E)} \). The existence of a perfect elimination ordering is equivalent to the chordality of \( {(V,E)} \). This leads to a recursive algorithm similar to the Gaussian elimination process for the construction of the solution \( {K_*} \) in theorem 1, see e.g. Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124 and Dahl, Vandenberghe, Roychowdhury, Covariance selection for nonchordal graphs via chordal embedding, Optim. Methods Softw. 23 (2008), no. 4, 501-520. For more references on matrix completions and max-det problems, see e.g. Vandenberghe, Boyd, Wu, Determinant maximization with linear matrix inequality constraints, SIAM J. Matrix Anal. Appl. 19 (1998), no. 2, 499-533.
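
When the graph is chordal, a perfect elimination ordering can be computed by maximum cardinality search; here is a plain Python sketch (the adjacency structure below, a triangulated square, is an arbitrary example):

    # Maximum cardinality search: repeatedly visit the vertex with the most
    # already-visited neighbors; the reverse visiting order is a perfect
    # elimination ordering iff the graph is chordal.
    def mcs(adj):
        unvisited, weight, order = set(adj), {v: 0 for v in adj}, []
        while unvisited:
            v = max(unvisited, key=lambda u: weight[u])
            order.append(v)
            unvisited.remove(v)
            for u in adj[v]:
                if u in unvisited:
                    weight[u] += 1
        return order

    def is_peo(adj, label):
        # For each vertex, its neighbors with larger label must form a clique.
        for i in adj:
            later = [j for j in adj[i] if label[j] > label[i]]
            if any(b not in adj[a] for a in later for b in later if a != b):
                return False
        return True

    # The 4-cycle 0-1-2-3 with the chord {0, 2}, as an adjacency dict.
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
    order = mcs(adj)
    label = {v: len(adj) - k for k, v in enumerate(order)}  # reverse the order
    print(is_peo(adj, label))  # True, since the graph is chordal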

For a Gaussian random vector \( {Z} \) with covariance matrix \( {K\in\mathcal{S}_n^+} \) and for every \( {i\neq j\in\{1,\ldots,n\}} \), we have \( {(K^{-1})_{i,j}=0} \) if and only if \( {Z_i} \) and \( {Z_j} \) are independent conditionally on the remaining components of \( {Z} \). This is easily seen from the Schur complement formula

\[ (K^{-1})_{I,I}=(K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I})^{-1} \]

for every partition \( {\{1,\ldots,n\}=I\cup J} \) into non empty disjoint sets. The Gaussian formulation of theorem 1 is a rather old result of statistics, see e.g. Dempster, Covariance selection, Biometrics 28 (1972), no. 1, pp. 157-175. It constitutes a basic element of graphical models, see e.g. the books Whittaker, Graphical models in applied multivariate statistics, Wiley, 1990 and Lauritzen, Graphical models, Oxford University Press, 1996. The algorithm for the computation of \( {K_*} \) can be seen as a flavor of the Iterated (or Iterative) Proportional Fitting (IPF) algorithm, originally developed for contingency tables. The IPF for Gaussian distributions is studied for instance in Cramer, Conditional iterative proportional fitting for Gaussian distributions, J. Multivariate Anal. 65 (1998), no. 2, 261-276.
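
The Schur complement formula itself can be checked numerically (a numpy sketch with an arbitrary positive definite \( {K} \) and, in 0-indexed notation, the partition \( {I=\{0,1\}} \), \( {J=\{2,3\}} \)):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    K = A @ A.T + np.eye(4)  # an arbitrary positive definite covariance

    I, J = [0, 1], [2, 3]    # a partition of the index set
    KII, KIJ, KJJ = K[np.ix_(I, I)], K[np.ix_(I, J)], K[np.ix_(J, J)]

    lhs = np.linalg.inv(K)[np.ix_(I, I)]
    rhs = np.linalg.inv(KII - KIJ @ np.linalg.inv(KJJ) @ KIJ.T)
    print(np.allclose(lhs, rhs))  # True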
