Tensor reshaping

In multilinear algebra, a reshaping of tensors is any bijection between the set of indices of an order-M {\displaystyle M} tensor and the set of indices of an order-L {\displaystyle L} tensor, where L < M {\displaystyle L<M}. The use of indices presupposes tensors in coordinate representation with respect to a basis. The coordinate representation of a tensor can be regarded as a multi-dimensional array, and a bijection from one set of indices to another therefore amounts to a rearrangement of the array elements into an array of a different shape. Such a rearrangement constitutes a particular kind of linear map between the vector space of order-M {\displaystyle M} tensors and the vector space of order-L {\displaystyle L} tensors.

Definition

Given a positive integer M {\displaystyle M}, the notation [ M ] {\displaystyle [M]} refers to the set { 1 , … , M } {\displaystyle \{1,\dots ,M\}} of the first M positive integers.

For each integer m {\displaystyle m} where 1 ≤ m ≤ M {\displaystyle 1\leq m\leq M} for a positive integer M {\displaystyle M}, let V m {\displaystyle V_{m}} denote an I m {\displaystyle I_{m}}-dimensional vector space over a field F {\displaystyle F}. Then there are vector space isomorphisms (linear maps)

V 1 ⊗ ⋯ ⊗ V M ≃ F I 1 ⊗ ⋯ ⊗ F I M ≃ F I π 1 ⊗ ⋯ ⊗ F I π M ≃ F I π 1 I π 2 ⊗ F I π 3 ⊗ ⋯ ⊗ F I π M ≃ F I π 1 I π 3 ⊗ F I π 2 ⊗ F I π 4 ⊗ ⋯ ⊗ F I π M ⋮ ≃ F I 1 I 2 … I M , {\displaystyle {\begin{aligned}V_{1}\otimes \cdots \otimes V_{M}&\simeq F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}\\&\simeq F^{I_{\pi _{1}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\simeq F^{I_{\pi _{1}}I_{\pi _{2}}}\otimes F^{I_{\pi _{3}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\simeq F^{I_{\pi _{1}}I_{\pi _{3}}}\otimes F^{I_{\pi _{2}}}\otimes F^{I_{\pi _{4}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\,\,\,\vdots \\&\simeq F^{I_{1}I_{2}\ldots I_{M}},\end{aligned}}}

where π ∈ S M {\displaystyle \pi \in {\mathfrak {S}}_{M}} is any permutation and S M {\displaystyle {\mathfrak {S}}_{M}} is the symmetric group on M {\displaystyle M} elements. Via these (and other) vector space isomorphisms, a tensor can be interpreted in several ways as an order-L {\displaystyle L} tensor where L ≤ M {\displaystyle L\leq M}.

Coordinate representation

The first vector space isomorphism on the list above, V 1 ⊗ ⋯ ⊗ V M ≃ F I 1 ⊗ ⋯ ⊗ F I M {\displaystyle V_{1}\otimes \cdots \otimes V_{M}\simeq F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}}, gives the coordinate representation of an abstract tensor. Assume that each of the M {\displaystyle M} vector spaces V m {\displaystyle V_{m}} has a basis { v 1 m , v 2 m , … , v I m m } {\displaystyle \{v_{1}^{m},v_{2}^{m},\ldots ,v_{I_{m}}^{m}\}}. The expression of a tensor with respect to this basis has the form A = ∑ i 1 = 1 I 1 … ∑ i M = 1 I M a i 1 , i 2 , … , i M v i 1 1 ⊗ v i 2 2 ⊗ ⋯ ⊗ v i M M , {\displaystyle {\mathcal {A}}=\sum _{i_{1}=1}^{I_{1}}\ldots \sum _{i_{M}=1}^{I_{M}}a_{i_{1},i_{2},\ldots ,i_{M}}v_{i_{1}}^{1}\otimes v_{i_{2}}^{2}\otimes \cdots \otimes v_{i_{M}}^{M},} where the coefficients a i 1 , i 2 , … , i M {\displaystyle a_{i_{1},i_{2},\ldots ,i_{M}}} are elements of F {\displaystyle F}. The coordinate representation of A {\displaystyle {\mathcal {A}}} is ∑ i 1 = 1 I 1 … ∑ i M = 1 I M a i 1 , i 2 , … , i M e i 1 1 ⊗ e i 2 2 ⊗ ⋯ ⊗ e i M M , {\displaystyle \sum _{i_{1}=1}^{I_{1}}\ldots \sum _{i_{M}=1}^{I_{M}}a_{i_{1},i_{2},\ldots ,i_{M}}\mathbf {e} _{i_{1}}^{1}\otimes \mathbf {e} _{i_{2}}^{2}\otimes \cdots \otimes \mathbf {e} _{i_{M}}^{M},}where e i m {\displaystyle \mathbf {e} _{i}^{m}} is the i th {\displaystyle i^{\text{th}}} standard basis vector of F I m {\displaystyle F^{I_{m}}}. This can be regarded as a M-way array whose elements are the coefficients a i 1 , i 2 , … , i M {\displaystyle a_{i_{1},i_{2},\ldots ,i_{M}}}.

General flattenings

For any permutation π ∈ S M {\displaystyle \pi \in {\mathfrak {S}}_{M}} there is a canonical isomorphism between the two tensor products of vector spaces V 1 ⊗ V 2 ⊗ ⋯ ⊗ V M {\displaystyle V_{1}\otimes V_{2}\otimes \cdots \otimes V_{M}} and V π ( 1 ) ⊗ V π ( 2 ) ⊗ ⋯ ⊗ V π ( M ) {\displaystyle V_{\pi (1)}\otimes V_{\pi (2)}\otimes \cdots \otimes V_{\pi (M)}}. Parentheses are usually omitted from such products due to the natural isomorphism between V i ⊗ ( V j ⊗ V k ) {\displaystyle V_{i}\otimes (V_{j}\otimes V_{k})} and ( V i ⊗ V j ) ⊗ V k {\displaystyle (V_{i}\otimes V_{j})\otimes V_{k}}, but may, of course, be reintroduced to emphasize a particular grouping of factors. In the grouping, ( V π ( 1 ) ⊗ ⋯ ⊗ V π ( r 1 ) ) ⊗ ( V π ( r 1 + 1 ) ⊗ ⋯ ⊗ V π ( r 2 ) ) ⊗ ⋯ ⊗ ( V π ( r L − 1 + 1 ) ⊗ ⋯ ⊗ V π ( r L ) ) , {\displaystyle (V_{\pi (1)}\otimes \cdots \otimes V_{\pi (r_{1})})\otimes (V_{\pi (r_{1}+1)}\otimes \cdots \otimes V_{\pi (r_{2})})\otimes \cdots \otimes (V_{\pi (r_{L-1}+1)}\otimes \cdots \otimes V_{\pi (r_{L})}),} there are L {\displaystyle L} groups with r l − r l − 1 {\displaystyle r_{l}-r_{l-1}} factors in the l th {\displaystyle l^{\text{th}}} group (where r 0 = 0 {\displaystyle r_{0}=0} and r L = M {\displaystyle r_{L}=M}).

Letting S l = ( π ( r l − 1 + 1 ) , π ( r l − 1 + 2 ) , … , π ( r l ) ) {\displaystyle S_{l}=(\pi (r_{l-1}+1),\pi (r_{l-1}+2),\ldots ,\pi (r_{l}))} for each l {\displaystyle l} satisfying 1 ≤ l ≤ L {\displaystyle 1\leq l\leq L}, an ( S 1 , S 2 , … , S L ) {\displaystyle (S_{1},S_{2},\ldots ,S_{L})}-flattening of a tensor A {\displaystyle {\mathcal {A}}}, denoted A ( S 1 , S 2 , … , S L ) {\displaystyle {\mathcal {A}}_{(S_{1},S_{2},\ldots ,S_{L})}}, is obtained by applying the two processes above within each of the L {\displaystyle L} groups of factors. That is, the coordinate representation of the l th {\displaystyle l^{\text{th}}} group of factors is obtained using the isomorphism ( V π ( r l − 1 + 1 ) ⊗ V π ( r l − 1 + 2 ) ⊗ ⋯ ⊗ V π ( r l ) ) ≃ ( F I π ( r l − 1 + 1 ) ⊗ F I π ( r l − 1 + 2 ) ⊗ ⋯ ⊗ F I π ( r l ) ) {\displaystyle (V_{\pi (r_{l-1}+1)}\otimes V_{\pi (r_{l-1}+2)}\otimes \cdots \otimes V_{\pi (r_{l})})\simeq (F^{I_{\pi (r_{l-1}+1)}}\otimes F^{I_{\pi (r_{l-1}+2)}}\otimes \cdots \otimes F^{I_{\pi (r_{l})}})}, which requires specifying bases for all of the vector spaces V k {\displaystyle V_{k}}. The result is then vectorized using a bijection μ l : [ I π ( r l − 1 + 1 ) ] × [ I π ( r l − 1 + 2 ) ] × ⋯ × [ I π ( r l ) ] → [ I S l ] {\displaystyle \mu _{l}:[I_{\pi (r_{l-1}+1)}]\times [I_{\pi (r_{l-1}+2)}]\times \cdots \times [I_{\pi (r_{l})}]\to [I_{S_{l}}]} to obtain an element of F I S l {\displaystyle F^{I_{S_{l}}}}, where I S l := ∏ i = r l − 1 + 1 r l I π ( i ) {\textstyle I_{S_{l}}:=\prod _{i=r_{l-1}+1}^{r_{l}}I_{\pi (i)}}, the product of the dimensions of the vector spaces in the l th {\displaystyle l^{\text{th}}} group of factors. The result of applying these isomorphisms within each group of factors is an element of F I S 1 ⊗ ⋯ ⊗ F I S L {\displaystyle F^{I_{S_{1}}}\otimes \cdots \otimes F^{I_{S_{L}}}}, which is a tensor of order L {\displaystyle L}.

Vectorization

By means of a bijective map μ : [ I 1 ] × ⋯ × [ I M ] → [ I 1 ⋯ I M ] {\displaystyle \mu :[I_{1}]\times \cdots \times [I_{M}]\to [I_{1}\cdots I_{M}]}, a vector space isomorphism between F I 1 ⊗ ⋯ ⊗ F I M {\displaystyle F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}} and F I 1 ⋯ I M {\displaystyle F^{I_{1}\cdots I_{M}}} is constructed via the mapping e i 1 1 ⊗ ⋯ e i m m ⊗ ⋯ ⊗ e i M M ↦ e μ ( i 1 , i 2 , … , i M ) , {\displaystyle \mathbf {e} _{i_{1}}^{1}\otimes \cdots \mathbf {e} _{i_{m}}^{m}\otimes \cdots \otimes \mathbf {e} _{i_{M}}^{M}\mapsto \mathbf {e} _{\mu (i_{1},i_{2},\ldots ,i_{M})},} where for every natural number i {\displaystyle i} such that 1 ≤ i ≤ I 1 ⋯ I M {\displaystyle 1\leq i\leq I_{1}\cdots I_{M}}, the vector e i {\displaystyle \mathbf {e} _{i}} denotes the ith standard basis vector of F i 1 ⋯ i M {\displaystyle F^{i_{1}\cdots i_{M}}}. In such a reshaping, the tensor is simply interpreted as a vector in F I 1 ⋯ I M {\displaystyle F^{I_{1}\cdots I_{M}}}. This is known as vectorization, and is analogous to vectorization of matrices. A standard choice of bijection μ {\displaystyle \mu } is such that

vec ⁡ ( A ) = [ a 1 , 1 , … , 1 a 2 , 1 , … , 1 ⋯ a n 1 , 1 , … , 1 a 1 , 2 , 1 , … , 1 ⋯ a I 1 , I 2 , … , I M ] T , {\displaystyle \operatorname {vec} ({\mathcal {A}})={\begin{bmatrix}a_{1,1,\ldots ,1}&a_{2,1,\ldots ,1}&\cdots &a_{n_{1},1,\ldots ,1}&a_{1,2,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{M}}\end{bmatrix}}^{T},}

which is consistent with the way in which the colon operator in Matlab and GNU Octave reshapes a higher-order tensor into a vector. In general, the vectorization of A {\displaystyle {\mathcal {A}}} is the vector [ a μ − 1 ( i ) ] i = 1 I 1 ⋯ I M {\displaystyle [a_{\mu ^{-1}(i)}]_{i=1}^{I_{1}\cdots I_{M}}}.

The vectorization of A {\displaystyle {\mathcal {A}}} denoted with v e c ( A ) {\displaystyle vec({\mathcal {A}})} or A [ : ] {\displaystyle {\mathcal {A}}_{[:]}} is an [ S 1 , S 2 ] {\displaystyle [S_{1},S_{2}]}-reshaping where S 1 = ( 1 , 2 , … , M ) {\displaystyle S_{1}=(1,2,\ldots ,M)} and S 2 = ∅ {\displaystyle S_{2}=\emptyset }.

Mode- m Flattening / Mode- m Matrixization

Let A ∈ F I 1 ⊗ F I 2 ⊗ ⋯ ⊗ F I M {\displaystyle {\mathcal {A}}\in F^{I_{1}}\otimes F^{I_{2}}\otimes \cdots \otimes F^{I_{M}}} be the coordinate representation of an abstract tensor with respect to a basis. Mode-m matrixizing (a.k.a. flattening) of A {\displaystyle {\mathcal {A}}} is an [ S 1 , S 2 ] {\displaystyle [S_{1},S_{2}]}-reshaping in which S 1 = ( m ) {\displaystyle S_{1}=(m)} and S 2 = ( 1 , 2 , … , m − 1 , m + 1 , … , M ) {\displaystyle S_{2}=(1,2,\ldots ,m-1,m+1,\ldots ,M)}. Usually, a standard matrixizing is denoted by

A [ m ] = A [ S 1 , S 2 ] {\displaystyle {\mathbf {A} }_{[m]}={\mathcal {A}}_{[S_{1},S_{2}]}}

This reshaping is sometimes called matrixizing, matricizing, flattening or unfolding in the literature. A standard choice for the bijections μ 1 , μ 2 {\displaystyle \mu _{1},\ \mu _{2}} is the one that is consistent with the reshape function in Matlab and GNU Octave, namely

A [ m ] := [ a 1 , 1 , … , 1 , 1 , 1 , … , 1 a 2 , 1 , … , 1 , 1 , 1 , … , 1 ⋯ a I 1 , I 2 , … , I m − 1 , 1 , I m + 1 , … , I M a 1 , 1 , … , 1 , 2 , 1 , … , 1 a 2 , 1 , … , 1 , 2 , 1 , … , 1 ⋯ a I 1 , I 2 , … , I m − 1 , 2 , I m + 1 , … , I M ⋮ ⋮ ⋮ a 1 , 1 , … , 1 , I m , 1 , … , 1 a 2 , 1 , … , 1 , I m , 1 , … , 1 ⋯ a I 1 , I 2 , … , I m − 1 , I m , I m + 1 , … , I M ] {\displaystyle {\mathbf {A} }_{[m]}:={\begin{bmatrix}a_{1,1,\ldots ,1,1,1,\ldots ,1}&a_{2,1,\ldots ,1,1,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},1,I_{m+1},\ldots ,I_{M}}\\a_{1,1,\ldots ,1,2,1,\ldots ,1}&a_{2,1,\ldots ,1,2,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},2,I_{m+1},\ldots ,I_{M}}\\\vdots &\vdots &&\vdots \\a_{1,1,\ldots ,1,I_{m},1,\ldots ,1}&a_{2,1,\ldots ,1,I_{m},1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},I_{m},I_{m+1},\ldots ,I_{M}}\end{bmatrix}}}

Definition Mode-m Matrixizing: [ A [ m ] ] j k = a i 1 … i m … i M , where j = i m and k = 1 + ∑ n = 0 n ≠ m M ( i n − 1 ) ∏ l = 0 l ≠ m n − 1 I l . {\displaystyle [{\mathbf {A} }_{[m]}]_{jk}=a_{i_{1}\dots i_{m}\dots i_{M}},\;\;{\text{ where }}j=i_{m}{\text{ and }}k=1+\sum _{n=0 \atop n\neq m}^{M}(i_{n}-1)\prod _{l=0 \atop l\neq m}^{n-1}I_{l}.} The mode-m matrixizing of a tensor A ∈ F I 1 × . . . I M , {\displaystyle {\mathcal {A}}\in F^{I_{1}\times ...I_{M}},} is defined as the matrix A [ m ] ∈ F I m × ( I 1 … I m − 1 I m + 1 … I M ) {\displaystyle {\mathbf {A} }_{[m]}\in F^{I_{m}\times (I_{1}\dots I_{m-1}I_{m+1}\dots I_{M})}}. As the parenthetical ordering indicates, the mode-m column vectors are arranged by sweeping all the other mode indices through their ranges, with smaller mode indexes varying more rapidly than larger ones; thus