Block-Householder QR factorization

As usual, the key to a BLAS-3 version is to use deferred updates, in this case applied to the Householder QR factorization. Let Qi be the ith partial product for the H-reflectors and suppose it can be represented using two matrices (not vectors) I − VI UIT:
Qi = Hi ... H2 H1 = I − VI UIT
The recursion defining the next Q is
 Qi+1 = Hi+1 Qi
         = (I − wi+1wi+1T)(I − ViUiT)
         = Hi+1 − Hi+1 Vi UiT
         = I − wiwiT − Hi+1 Vi UiT
         = I − [Hi+1 Vi wi] [ Ui wii]T
         = I − [Vi wi+1] [ Ui QiTwi+1]

Let ν (that is a "nu", not "vee") be the chosen block size. So a block of transforms can be built up for a block column C of A by:


C = [c1, c2, ... , cν] 
U = [ ], V = [ ]
for j = 1:ν Find Householder wj s.t. Hj cj is zero in j+1:n V = [ Hj*V, wj] U = [ U, wj] if (j < ν), cj+1 = (I − VUT)cj+1.

A complete algorithm for the case where the block size ν evenly divides n is:
A = [A1, A2, ... , Ap], where p*ν = n.
for k = 1:p
    Set i  = (k−1)*ν + 1
    Find U, V for C = A(i:m,i:i+ν−1)
    A = A(i:m,i+ν:1) = (I − VUT) A(i:m,i+ν:1)

The way the block algorithm is stated, it requires two temporary arrays U and V, which can be of size up to m×ν. In practice only one of the two needs to be allocated and the other is stored in the corresponding part of A. Also, how should the block size ν be chosen? The same as the value that would give maximal computational rate in matrix*matrix multiply, or something else? A professional implementation could take into account the decreasing sizes of the arrays being used as the factorization proceeds and use a small ν at the start, then increase it as the columns in U and V become shorter. What is the maximal performance boost that could be expected from doing that? If small, then why bother? If large, how much time should be allocated to coding that more dynamic version?

All of those questions can be answered with the tools you now have:

The first two require no coding at all, and you should be able to knock those off in an hour. Their information also will not depend on which normalization or storage scheme used for the Housholder reflectors.


Next: Singular value factorization

  • Started: Mon 23 Aug 2010, 06:00 PM
  • Modified: Tue 09 Nov 2010, 12:48 PM to add in the Q_1 versus U_1 note
  • Last Modified: Mon 04 Nov 2019, 07:19 AM