
Neurocomputing 74 (2011) 1970–1984

PARAFAC algorithms for large-scale problems

Anh Huy Phan^a,*, Andrzej Cichocki^b

a Lab for Advanced Brain Signal Processing, BSI, RIKEN, Wakoshi, Saitama, Japan
b Systems Research Institute, Polish Academy of Science, Poland

* Corresponding author. E-mail addresses: phan@[Link] (A.H. Phan), cia@[Link] (A. Cichocki).

Article info

Available online 19 February 2011

Keywords: Canonical polyadic decomposition (CP); Tensor factorization; PARAFAC; Large-scale dataset; Multiway classification; Parallel computing; Alternating least squares; Hierarchical ALS

Abstract

Parallel factor analysis (PARAFAC) is a tensor (multiway array) factorization method which makes it possible to find hidden factors (component matrices) in multidimensional data. Most of the existing algorithms for PARAFAC, especially the alternating least squares (ALS) algorithm, need to compute Khatri–Rao products of tall factors and multiplications of large matrices; they therefore require high computational cost and large memory, and are not suitable for very large-scale problems. Hence, PARAFAC for large-scale data tensors is still a challenging problem. In this paper, we propose a new approach based on a modified ALS algorithm which computes Hadamard products, instead of Khatri–Rao products, and employs relatively small matrices. The new algorithms are able to process extremely large-scale tensors with billions of entries. Extensive experiments confirm the validity and high performance of the developed algorithms in comparison with other well-known algorithms.

1. Introduction and problem statement

Tensor factorizations are important techniques for data mining, dimensionality reduction, pattern recognition, object detection, classification, gene clustering, sparse nonnegative representation and coding, and blind source separation (BSS) [1,2]. For example, PARAFAC, also known as the canonical polyadic decomposition (CP), has already found a wide spectrum of applications in positron emission tomography (PET), spectroscopy, chemometrics and environmental science [1].

In this paper, we present a PARAFAC factorization suitable for large-scale problems and fast parallel implementation. The proposed model and algorithms solve a till now intractable problem for tensors of arbitrarily high dimension and scale. We divide a given data tensor into a grid of multiple sub-tensors (see Fig. 2). We factorize all the sub-tensors independently by using efficient PARAFAC algorithms in parallel mode and next integrate the partial results to estimate the desired factors for the whole tensor.

The developed model is referred to here as the grid-tensor factorization (gTF). The simplest case of the gTF is the factorization of a large-scale tensor partitioned into only two sub-tensors. If one sub-tensor is considered as newly arriving data expanding the current (actual) data along a single mode (usually time), the problem is referred to as dynamic tensor analysis (DTA) or dynamic tensor factorization [3,4]. This problem also often arises in the factorization of a training dataset.

Tensor factorizations can be used as dimensionality reduction methods in multiway classification, where the factors are bases onto which a data tensor is projected to obtain its feature subspace. In practice the training data is often augmented with new samples, hence the basis factors for the expanded training tensor need to be updated. A simple way is to factorize the whole new training tensor again. However, this approach demands high computational cost. A more convenient way is to update the old bases with the factors of the newly arriving data.

The proposed algorithm calculates Hadamard products and multiplications of small matrices, and avoids Khatri–Rao products. In particular, this new algorithm opens new perspectives for finding nonnegative factors of very large-scale nonnegative tensors, which is potentially useful in many applications, such as those in neuroscience and data mining. For specific data, such as spectral data, images or chemical concentrations, sparsity and nonnegativity constraints are often imposed on the hidden factors in order to provide a meaningful representation and a clear physical interpretation. The extracted basis components are then part-based representations of the nonnegative data. However, the constrained nonnegative PARAFAC, also called Nonnegative Tensor Factorization (NTF), is not guaranteed to converge to a stationary point. Hence, the NTF often needs to be analyzed over many trials and with different regularization values in order to choose the desired solutions. This demands high computational cost due to frequent access to the input data. In practice, a large data tensor can often be well explained by PARAFAC factors; it is then more efficient to store the hidden factors and process them instead of the whole data tensor.


The ALS PARAFAC algorithm is a "work-horse" algorithm which is able to factorize a data tensor with a high fitness [2,5,6]. However, it is not suitable for very large-scale problems. By using the factors estimated by the modified ALS, we develop a very fast algorithm to retrieve the nonnegative factors hidden in a data tensor.

An important issue for large-scale factorization is the evaluation of stopping criteria to control convergence. In this paper, we use the cost function value or the fit error. However, for a large-scale tensor, an explicit computation of the cost function value is impossible because of the memory required to build up the approximate tensor \hat{Y}. We derive a simple low-complexity formula for stopping criteria applied to the grid-tensor factorizations.

The paper is organized as follows. After a brief overview of the notation and PARAFAC tensor factorizations, dynamic tensor factorization is presented for a three-dimensional tensor in Section 2. Section 3 is devoted to generalizing the dynamic tensor factorization to grid PARAFAC. We present the ALS algorithm for grid PARAFAC, and its pseudo-code which can be implemented in a parallel processing system. In Section 4, a conversion of the PARAFAC model to its restricted model with nonnegative constraints is introduced. The new stopping criterion is introduced in Section 5. Communication cost and practical implementation of the grid algorithms are discussed in Section 6. The grid PARAFAC is extended to complex-valued tensors in Section 7. Experiments with large synthetic and real-world datasets confirm the validity and efficiency of the proposed algorithms in Section 8. Finally, Section 9 concludes the paper.

Fig. 1. Illustration of the dynamic tensor factorization for two sub-tensors. The purpose is to find the common factors for the concatenated tensor Y constructed from two sub-tensors Y^{(1)} and Y^{(2)}.

1.1. Tensor notations and basic models

We shall denote a tensor by underlined bold capital letters, e.g. Y, matrices by bold capital letters, e.g. A, and vectors by bold italic lowercase letters, e.g. d. For a three-dimensional tensor Y \in R^{I \times J \times K}, its frontal, lateral and horizontal slices are denoted, respectively, by Y_k = Y_{::k}, Y_{:j:} and Y_{i::}. A tube (vector) at position (i,j) along mode-3 is denoted by y_{ij:}, and the corresponding tubes along mode-2 and mode-1 are y_{i:k} and y_{:jk}. The mode-n matricized version of a tensor Y is denoted by Y_{(n)}, and is intuitively defined for a 3-D tensor as follows:

Y_{(1)} = [Y_1 \; Y_2 \; \cdots \; Y_K] \in R^{I \times (JK)},   (1)
Y_{(2)} = [Y_1^T \; Y_2^T \; \cdots \; Y_K^T] \in R^{J \times (IK)},   (2)
Y_{(3)} = [vec(Y_1) \; \cdots \; vec(Y_K)]^T \in R^{K \times (IJ)}.   (3)

The Khatri–Rao, Kronecker and Hadamard products and element-wise division are denoted, respectively, by \odot, \otimes, \circledast, \oslash [1] (for more detail see Appendix B).

The product of a tensor and a matrix along mode-n is denoted as
Y = G \times_n A, or Y_{(n)} = A G_{(n)}.   (4)

Multiplication in all possible modes (n = 1,2,\ldots,N) of a tensor G and a set of matrices A^{(n)} is denoted as
G \times \{A\} = G \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},   (5)
[G \times \{A\}]_{(n)} = A^{(n)} G_{(n)} (A^{(N)} \otimes \cdots \otimes A^{(n+1)} \otimes A^{(n-1)} \otimes \cdots \otimes A^{(1)})^T.   (6)
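As a concrete illustration of the matricizations (1)–(3), the following minimal MATLAB sketch builds the three unfoldings of a dense I x J x K array Y (the reshape/permute calls are one possible realization, not taken from the authors' code):

  % Mode-n matricizations (1)-(3) of a dense I x J x K array Y.
  [I,J,K] = size(Y);
  Y1 = reshape(Y, I, J*K);                    % Y_(1) = [Y1 Y2 ... YK],       I x (JK)
  Y2 = reshape(permute(Y,[2 1 3]), J, I*K);   % Y_(2) = [Y1' Y2' ... YK'],    J x (IK)
  Y3 = reshape(Y, I*J, K).';                  % Y_(3) = [vec(Y1)...vec(YK)]', K x (IJ)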

Fig. 2. Illustration of the standard PARAFAC (dashed arrow) and grid PARAFAC for large-scale tensors (solid arrows) in two stages. [Figure: the standard route applies PARAFAC to the full tensor and requires Khatri–Rao products of tall factors; the grid route applies PARAFAC to the sub-tensors with fast learning rules and parallel computing, followed by factor reconstruction.]

Multiplication of a tensor in all but one mode is denoted as
G \times_{-n} \{A\} = G \times_1 A^{(1)} \cdots \times_{n-1} A^{(n-1)} \times_{n+1} A^{(n+1)} \cdots \times_N A^{(N)}.   (7)

The above notation is adopted from [2]. Mode-n multiplication of a tensor Y \in R^{I_1 \times I_2 \times \cdots \times I_N} by a vector a \in R^{I_n} is denoted by
Z = Y \bar{\times}_n a \in R^{I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N},   (8)
and the tensor–vector product of a tensor Y with a set of N column vectors \{a\} = [a^{(1)}, a^{(2)}, \ldots, a^{(N)}] is given by
Y \bar{\times} \{a\} = Y \bar{\times}_1 a^{(1)} \bar{\times}_2 a^{(2)} \cdots \bar{\times}_N a^{(N)}.   (9)

PARAFAC [7] can be formulated as follows: "Factorize a given N-th order data tensor Y \in R^{I_1 \times I_2 \times \cdots \times I_N} into a set of N component matrices (factors) A^{(n)} = [a_1^{(n)}, a_2^{(n)}, \ldots, a_J^{(n)}] \in R^{I_n \times J}, (n = 1,2,\ldots,N), representing the common (loading) factors", that is,
Y \approx \sum_{j=1}^{J} a_j^{(1)} \circ a_j^{(2)} \circ \cdots \circ a_j^{(N)} = [[A^{(1)}, A^{(2)}, \ldots, A^{(N)}]] = [[\{A\}]] = \hat{Y},   (10)
where the symbol "\circ" means outer product, and we assume unit-length components \|a_j^{(n)}\|_p = 1 for n = 1,2,\ldots,N-1, j = 1,2,\ldots,J, and p = 1,2 (see Fig. 2). The tensor \hat{Y} is an approximation of the data tensor Y. Alternatively, the PARAFAC model can be described using tensor notation as
Y = I \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},   (11)
where I is an identity tensor. The mode-n matricization of the tensor (11) can be represented by a set of matrix factorizations:
Y_{(n)} = A^{(n)} (A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})^T = A^{(n)} (\{A\}^{\odot_{-n}})^T, (n = 1,2,\ldots,N).   (12)

The well-known PARAFAC algorithm is the alternating least squares (ALS) algorithm [2], which sequentially minimizes the squared Euclidean distance (Frobenius norm)
D(Y \| \hat{Y}) = \tfrac{1}{2}\|Y - \hat{Y}\|_F^2   (13)
by using the update rules for the factors A^{(n)} given by
A^{(n)} \leftarrow Y_{(n)} \{A\}^{\odot_{-n}} (\{A\}^{\odot_{-n}T}\{A\}^{\odot_{-n}})^{-1} = Y_{(n)} \{A\}^{\odot_{-n}} (\{A^T A\}^{\circledast_{-n}})^{-1}, (n = 1,2,\ldots,N),   (14)
where Y_{(n)} is the mode-n matricized version of the tensor Y and, for a set of matrices, the Hadamard product over all but the n-th member is given by
\{A\}^{\circledast_{-n}} = A^{(N)} \circledast \cdots \circledast A^{(n+1)} \circledast A^{(n-1)} \circledast \cdots \circledast A^{(1)}.

For a large-scale tensor, the mode-n matricized version Y_{(n)} of size I_n \times (\prod_{k \ne n} I_k) is often a very large matrix, and the Khatri–Rao product \{A\}^{\odot_{-n}} returns a tall matrix of size (\prod_{k \ne n} I_k) \times J in each iteration step. Hence, this learning rule demands high computational cost and, of course, large memory for large-scale data. The ALS algorithm cannot decompose a dense data tensor having billions of entries. To deal with such large datasets, instead of computing with the full tensor in (14), we can reduce the number of columns in Y_{(n)} to some sampled vectors (tubes) along each mode-n which satisfy specific criteria [1,8]. Recently, the CUR decomposition [9] gave a fast approximation of a raw matrix based on some sampled rows, columns, and their intersection entries. This method was also extended to multiway arrays to select tubes along each mode. However, both the block-wise and the CUR approaches have not completely resolved the problem for very large data, since they still have to compute a Khatri–Rao product.
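To make the cost of (14) concrete, here is a minimal MATLAB sketch of one standard ALS sweep for a 3-D tensor with R components, assuming the unfoldings Y1, Y2, Y3 from the earlier sketch and current factors A (I x R), B (J x R), C (K x R); khatrirao denotes the column-wise Khatri–Rao product (e.g. from the Tensor Toolbox, or an explicit kron loop over columns):

  % One ALS sweep (14). The Khatri-Rao products are tall matrices
  % (JK x R, IK x R, IJ x R) -- exactly the cost the grid algorithm avoids --
  % while the Gram matrices inside the inverses are only R x R.
  A = Y1 * khatrirao(C,B) / ((C'*C).*(B'*B));
  B = Y2 * khatrirao(C,A) / ((C'*C).*(A'*A));
  C = Y3 * khatrirao(B,A) / ((B'*B).*(A'*A));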
2. Dynamic tensor factorization

For simplicity of explanation of our basic concept, we shall first consider the grid PARAFAC for a three-dimensional tensor partitioned into only two sub-tensors. This model can also be considered as a Dynamic Tensor Analysis (Factorization) in which data is augmented along a single mode.

Problem 1 (Dynamic Tensor Factorization). Consider two data tensors Y^{(1)} \in R^{I \times J \times K_1}, Y^{(2)} \in R^{I \times J \times K_2} with their approximated PARAFAC models (see Fig. 1):
Y^{(k)} \approx I \times_1 A^{(k)} \times_2 B^{(k)} \times_3 C^{(k)}, k = 1,2,   (15)
where A^{(k)} \in R^{I \times R_k}, B^{(k)} \in R^{J \times R_k} and C^{(k)} \in R^{K_k \times R_k}. The problem is to find PARAFAC factors for the concatenated tensor Y \in R^{I \times J \times K} (with K = K_1 + K_2):
Y \approx I \times_1 A \times_2 B \times_3 C,   (16)
without repeating all the computation from the beginning, but using the already known partial results (factors).

The easiest way is to factorize the concatenated tensor again. This method provides accurate results. However, for large-scale data this is not a suitable solution due to the high computational cost. In this section, we present three methods to deal with this problem. The final learning rules of all the methods are identical.

1. By minimizing the cost function for the concatenated tensor.
2. By adapting or modifying the ALS algorithm for the concatenated tensor represented by block matrices.
3. By concatenating factors of sub-tensors and then reducing the number of components.

2.1. Approach 1: By directly minimizing a cost function

Assume that the concatenated tensor Y can be factorized into R components: A \in R^{I \times R}, B \in R^{J \times R} and C \in R^{K \times R}. If we split the factor C into two matrices C = [C_{(1)}^T \; C_{(2)}^T]^T, C_{(1)} \in R^{K_1 \times R} and C_{(2)} \in R^{K_2 \times R}, the sub-tensors Y^{(1)} and Y^{(2)} are now approximated via their common factors as
Y^{(1)} = I \times_1 A \times_2 B \times_3 C_{(1)},   (17)
Y^{(2)} = I \times_1 A \times_2 B \times_3 C_{(2)}.   (18)
Generally, a tensor factorization can be solved by minimizing a cost function of the concatenated tensor and its approximation, that is,
D = \tfrac{1}{2}\|Y - I \times_1 A \times_2 B \times_3 C\|_F^2 = \tfrac{1}{2}\sum_{k=1}^{2}\|Y^{(k)} - I \times_1 A \times_2 B \times_3 C_{(k)}\|_F^2 = \tfrac{1}{2}\sum_{k=1}^{2}\|Y_{(1)}^{(k)} - A(C_{(k)} \odot B)^T\|_F^2.   (19)
In order to find the learning rule for factor A, we formulate the cost function for the mode-1 matricization (19), and compute its gradient with respect to A
\nabla_A D = \sum_{k=1}^{2}\big(-Y_{(1)}^{(k)}(C_{(k)} \odot B) + A(C_{(k)} \odot B)^T(C_{(k)} \odot B)\big)
          = \sum_{k=1}^{2}\big(-Y_{(1)}^{(k)}(C_{(k)} \odot B) + A(C_{(k)}^T C_{(k)}) \circledast (B^T B)\big)
          = -\sum_{k=1}^{2} Y_{(1)}^{(k)}(C_{(k)} \odot B) + A(C^T C) \circledast (B^T B).   (20)

By replacing Y^{(1)} and Y^{(2)} in (20) by their approximations in (15), and setting (20) to zero, we obtain an update rule for A

A \leftarrow \Big(\sum_{k=1}^{2} Y_{(1)}^{(k)}(C_{(k)} \odot B)\Big)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \Big(\sum_{k=1}^{2} A^{(k)}(C^{(k)} \odot B^{(k)})^T(C_{(k)} \odot B)\Big)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \Big(\sum_{k=1}^{2} A^{(k)}\big((C^{(k)T} C_{(k)}) \circledast (B^{(k)T} B)\big)\Big)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \Big(\sum_{k=1}^{2} A^{(k)}\big(P^{(k)} \oslash (A^{(k)T} A)\big)\Big)\big(Q \oslash (A^T A)\big)^{-1},   (21)

where
P^{(k)} = (A^{(k)T} A) \circledast (B^{(k)T} B) \circledast (C^{(k)T} C_{(k)}),   (22)
Q = (A^T A) \circledast (B^T B) \circledast (C^T C).   (23)

Similarly, the update rule for B is given by
B \leftarrow \Big(\sum_{k=1}^{2} B^{(k)}\big(P^{(k)} \oslash (B^{(k)T} B)\big)\Big)\big(Q \oslash (B^T B)\big)^{-1}.   (24)

Factor C is estimated by partially updating C_{(1)} and C_{(2)}. The ALS update rule for C_{(k)} from the tensor Y^{(k)} is given by
C_{(k)} \leftarrow Y_{(3)}^{(k)}(B \odot A)\big((B^T B) \circledast (A^T A)\big)^{-1}
  = C^{(k)}(B^{(k)} \odot A^{(k)})^T(B \odot A)\big((B^T B) \circledast (A^T A)\big)^{-1}
  = C^{(k)}\big((B^{(k)T} B) \circledast (A^{(k)T} A)\big)\big((B^T B) \circledast (A^T A)\big)^{-1}
  = C^{(k)}\big(P^{(k)} \oslash (C^{(k)T} C_{(k)})\big)\big(Q \oslash (C^T C)\big)^{-1}.   (25)

We note that the matrices P^{(k)} \in R^{R_k \times R} and Q \in R^{R \times R} are of relatively small size. Hence, the learning rules (21), (24) and (25) have low computational cost.

2.2. Approach 2: By modifying the ALS learning rule

Factors A, B and C can be estimated by using the ALS rule (14) for the concatenated tensor Y
A \leftarrow Y_{(1)}(C \odot B)\big((C^T C) \circledast (B^T B)\big)^{-1}.   (26)
By using the following relation, we can convert this ALS learning rule into the learning rule given in (21)
Y_{(1)}(C \odot B) = [Y_{(1)}^{(1)} \; Y_{(1)}^{(2)}]\left(\begin{bmatrix} C_{(1)} \\ C_{(2)} \end{bmatrix} \odot B\right) = [Y_{(1)}^{(1)} \; Y_{(1)}^{(2)}]\begin{bmatrix} C_{(1)} \odot B \\ C_{(2)} \odot B \end{bmatrix}
 = \sum_{k=1}^{2} Y_{(1)}^{(k)}(C_{(k)} \odot B) = \sum_{k=1}^{2} A^{(k)}(C^{(k)} \odot B^{(k)})^T(C_{(k)} \odot B) = \sum_{k=1}^{2} A^{(k)}\big((C^{(k)T} C_{(k)}) \circledast (B^{(k)T} B)\big).   (27)
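As a hedged illustration of Approach 1, the sketch below implements the A-update (21) with the auxiliary matrices (22)–(23) in MATLAB; A1, B1, C1 and A2, B2, C2 are the sub-tensor factors, A, B are the current common factors, and Cp1, Cp2 (illustrative names, not from the paper) are the two blocks of C = [Cp1; Cp2]. Only R_k x R and R x R matrices appear, and no Khatri–Rao product is formed:

  C  = [Cp1; Cp2];
  Q  = (A'*A) .* (B'*B) .* (C'*C);                % Eq. (23), R x R
  P1 = (A1'*A) .* (B1'*B) .* (C1'*Cp1);           % Eq. (22), k = 1
  P2 = (A2'*A) .* (B2'*B) .* (C2'*Cp2);           % Eq. (22), k = 2
  A  = ( A1*(P1./(A1'*A)) + A2*(P2./(A2'*A)) ) / ( Q./(A'*A) );   % Eq. (21)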
2.3. Approach 3: By concatenating sub-factors

Another method to find factors of the concatenated tensor Y is to build up a factorization from concatenated factors. The following lemma shows this factorization.

Lemma 1 (Concatenation of factors). The tensor Y is factorized by factors of \tilde{R} = R_1 + R_2 components
\tilde{A} = [A^{(1)} \; A^{(2)}],   (28)
\tilde{B} = [B^{(1)} \; B^{(2)}],   (29)
\tilde{C} = \begin{bmatrix} C^{(1)} & 0 \\ 0 & C^{(2)} \end{bmatrix},   (30)
that is,
Y = I \times_1 \tilde{A} \times_2 \tilde{B} \times_3 \tilde{C}.   (31)

Proof. We consider the mode-1 matricized version of the concatenated tensor Y
Y_{(1)} = [Y_{(1)}^{(1)} \; Y_{(1)}^{(2)}] = [A^{(1)}(C^{(1)} \odot B^{(1)})^T \; A^{(2)}(C^{(2)} \odot B^{(2)})^T]
 = [A^{(1)} \; A^{(2)}]\begin{bmatrix} (C^{(1)} \odot B^{(1)})^T \\ (C^{(2)} \odot B^{(2)})^T \end{bmatrix}
 = [A^{(1)} \; A^{(2)}]\left(\begin{bmatrix} C^{(1)} & 0 \\ 0 & C^{(2)} \end{bmatrix} \odot [B^{(1)} \; B^{(2)}]\right)^T = \tilde{A}(\tilde{C} \odot \tilde{B})^T.   (32)
This leads to a tensor factorization of Y with the factors \tilde{A}, \tilde{B} and \tilde{C} given by
Y = I \times_1 \tilde{A} \times_2 \tilde{B} \times_3 \tilde{C}.   (33)
This completes the proof. \square

Lemma 1 ensures that any large-scale tensor can be factorized by concatenating the factors of its sub-tensors. However, the combined factors often have a higher rank than we need. The next update rule helps us to find the R componential factors from the factors \tilde{A}, \tilde{B}, \tilde{C}.

Factor A can be updated by using the ALS learning rule (14) for the tensor Y as
A \leftarrow Y_{(1)}(C \odot B)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \tilde{A}(\tilde{C} \odot \tilde{B})^T(C \odot B)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \tilde{A}\big((\tilde{C}^T C) \circledast (\tilde{B}^T B)\big)\big((C^T C) \circledast (B^T B)\big)^{-1}
  = \tilde{A}\big(P \oslash (\tilde{A}^T A)\big)\big(Q \oslash (A^T A)\big)^{-1},   (34)
where
P = (\tilde{A}^T A) \circledast (\tilde{B}^T B) \circledast (\tilde{C}^T C).   (35)

We note that the learning rules (21) and (34) are identical due to the following relations
\tilde{C}^T C = \begin{bmatrix} C^{(1)} & 0 \\ 0 & C^{(2)} \end{bmatrix}^T \begin{bmatrix} C_{(1)} \\ C_{(2)} \end{bmatrix} = \begin{bmatrix} C^{(1)T} C_{(1)} \\ C^{(2)T} C_{(2)} \end{bmatrix},
\tilde{B}^T B = [B^{(1)} \; B^{(2)}]^T B = \begin{bmatrix} B^{(1)T} B \\ B^{(2)T} B \end{bmatrix},
and
\tilde{A}\big((\tilde{C}^T C) \circledast (\tilde{B}^T B)\big) = \sum_{k=1}^{2} A^{(k)}\big((C^{(k)T} C_{(k)}) \circledast (B^{(k)T} B)\big).

3. Grid PARAFAC

We consider a tensor Y divided into a grid of multiple sub-tensors Y^{(k)} of size I_{k_1} \times I_{k_2} \times \cdots \times I_{k_N}, with \sum_{k_n=1}^{K_n} I_{k_n} = I_n, where the vector k = [k_1, k_2, \ldots, k_N] indicates the sub-tensor index, 1 \le k_n \le K_n, and K_n is the number of sub-tensors along mode-n (see Fig. 2). We factorize all the sub-tensors into PARAFAC sub-factors U^{(n)}_{(k)} in parallel mode. Finally, the full factors A^{(n)} for the whole tensor are estimated from these sub-factors with fast learning rules and parallel computing.

3.1. ALS algorithm for grid PARAFAC

Assuming that the tensor Y can be approximated by N factors A^{(n)}, we split each factor A^{(n)} into K_n parts
A^{(n)} = [A^{(n)T}_{(1)} \; A^{(n)T}_{(2)} \; \ldots \; A^{(n)T}_{(K_n)}]^T,   (36)
where the sub-factor A^{(n)}_{(k_n)} \in R^{I_{k_n} \times J}, k_n = 1,2,\ldots,K_n.

Lemma 2. A sub-tensor Y^{(k)}, k = [k_1, k_2, \ldots, k_N], can be factorized by a set of N sub-factors \{A_{(k)}\} = \{A^{(1)}_{(k_1)}, A^{(2)}_{(k_2)}, \ldots, A^{(N)}_{(k_N)}\}:
Y^{(k)} = I \times_1 A^{(1)}_{(k_1)} \times_2 A^{(2)}_{(k_2)} \cdots \times_N A^{(N)}_{(k_N)} = [[\{A_{(k)}\}]].   (37)

Proof. The proof of this lemma is derived directly from the definition of the PARAFAC model for each entry y_i, i = [i_1, i_2, \ldots, i_N], whose index i_n falls in the k_n-th block along every mode n:
y_i = \sum_j a^{(1)}_{i_1 j}\, a^{(2)}_{i_2 j} \cdots a^{(N)}_{i_N j}. \square   (38)

The ALS algorithm for grid PARAFAC minimizes the standard Euclidean distance over all the sub-tensors
D = \tfrac{1}{2}\|Y - I \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)}\|_F^2   (39)
  = \tfrac{1}{2}\sum_{k_1=1}^{K_1}\cdots\sum_{k_N=1}^{K_N}\|Y^{(k)} - [[\{A_{(k)}\}]]\|_F^2 = \tfrac{1}{2}\sum_{k}\|Y^{(k)}_{(n)} - A^{(n)}_{(k_n)}(\{A_{(k)}\}^{\odot_{-n}})^T\|_F^2,   (40)
whose gradient components with respect to the sub-factor A^{(n)}_{(k_n)} are given by
\nabla_{A^{(n)}_{(k_n)}} D = \sum_{k:\,k_n\ \mathrm{fixed}}\big(-Y^{(k)}_{(n)}\{A_{(k)}\}^{\odot_{-n}} + A^{(n)}_{(k_n)}(\{A_{(k)}\}^{\odot_{-n}})^T\{A_{(k)}\}^{\odot_{-n}}\big)
 = \sum_{k:\,k_n\ \mathrm{fixed}}\big(-Y^{(k)}_{(n)}\{A_{(k)}\}^{\odot_{-n}} + A^{(n)}_{(k_n)}\{A_{(k)}^T A_{(k)}\}^{\circledast_{-n}}\big).   (41)
This leads to the learning rule for the sub-factor A^{(n)}_{(k_n)}
A^{(n)}_{(k_n)} \leftarrow \Big(\sum_{k:\,k_n\ \mathrm{fixed}} Y^{(k)}_{(n)}\{A_{(k)}\}^{\odot_{-n}}\Big)\Big(\sum_{k:\,k_n\ \mathrm{fixed}}\{A_{(k)}^T A_{(k)}\}^{\circledast_{-n}}\Big)^{-1}.   (42)

Due to the relatively small sizes of the sub-tensors, the computation of Y^{(k)}_{(n)}\{A_{(k)}\}^{\odot_{-n}} and \{A_{(k)}^T A_{(k)}\}^{\circledast_{-n}} can be quickly executed on parallel workers (labs) or sequentially on a single computer. Moreover, we can restrict the sub-tensors involved in the estimation of the sub-factors A^{(n)}_{(k_n)} to those built up from tubes sampled by the CUR decomposition. We note that the sub-tensors in the grid model do not need to have consecutive tubes. We do not provide a detailed description because this research is beyond the scope of this paper. The next section presents an optimized algorithm which avoids the Khatri–Rao products \{A_{(k)}\}^{\odot_{-n}} in (42).

3.2. Optimized ALS learning rules

For a sub-tensor Y^{(k)}, we factorize this tensor using the ALS algorithm (14) or any other algorithm for PARAFAC [6,19,27,28] with J_k components
Y^{(k)} \approx I \times_1 U^{(1)}_{(k)} \times_2 U^{(2)}_{(k)} \cdots \times_N U^{(N)}_{(k)}.   (43)
The number of rank-one tensors J_k should be chosen so that the factors U^{(n)}_{(k)} explain the sub-tensor Y^{(k)} as well as possible. Because the sub-tensor Y^{(k)} is small, this factorization can easily achieve a high fitness. For a sub-tensor, we have
Y^{(k)}_{(n)}\{A_{(k)}\}^{\odot_{-n}} \approx U^{(n)}_{(k)}(\{U_{(k)}\}^{\odot_{-n}})^T\{A_{(k)}\}^{\odot_{-n}} = U^{(n)}_{(k)}\{U_{(k)}^T A_{(k)}\}^{\circledast_{-n}} = U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big),   (44)
where P_{(k)} = \{U_{(k)}^T A_{(k)}\}^{\circledast} \in R^{J_k \times J} and Q_{(k)} = \{A_{(k)}^T A_{(k)}\}^{\circledast} \in R^{J \times J}. From (42) and (44), we obtain the fast update rule for the sub-factors A^{(n)}_{(k_n)}
A^{(n)}_{(k_n)} \leftarrow \Big(\sum_{k:\,k_n\ \mathrm{fixed}} U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big)\Big)\Big(\sum_{k:\,k_n\ \mathrm{fixed}} Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)})\Big)^{-1}   (45)
  = T S^{-1},   (46)
where the matrices T and S are computed for the specific sub-factor A^{(n)}_{(k_n)} as
T = \sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big),   (47)
S = \sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)}).   (48)

Each term of the two summations in (47) and (48) involves Hadamard divisions and performs all operations on small matrices, instead of Khatri–Rao products of tall matrices. The matrices P_{(k)} \in R^{J_k \times J} and Q_{(k)} \in R^{J \times J} need to be calculated only once, and can be quickly updated after estimating the sub-factors A^{(n)}_{(k_n)}. Moreover, the matrices T and S can be calculated through a parallel loop.

The pseudo-code of the new ALS algorithm is given in Algorithm 1. The normalization of components to unit-length vectors is not explicitly displayed in Algorithm 1. The parallel FOR-loop, denoted by "parfor", is available in the MATLAB Parallel Computing Toolbox. [Algorithm 1: ALS algorithm for grid PARAFAC (pseudo-code box).]
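The following MATLAB sketch illustrates the optimized update (45)–(48) in the simplest grid configuration: a 3-D tensor split only along mode 3 into K sub-tensors, so that A and B are shared across sub-tensors and C is split into blocks C{k}; U1{k}, U2{k}, U3{k} hold the PARAFAC factors of sub-tensor k with Jk components. Variable names and the cell-array layout are illustrative assumptions, not the authors' Algorithm 1:

  T = 0;  S = 0;
  for k = 1:K                                         % in practice a parfor loop
    Pk = (U1{k}'*A) .* (U2{k}'*B) .* (U3{k}'*C{k});   % P_(k), Jk x J
    Qk = (A'*A) .* (B'*B) .* (C{k}'*C{k});            % Q_(k), J x J
    T  = T + U1{k} * (Pk ./ (U1{k}'*A));              % Eq. (47)
    S  = S + Qk ./ (A'*A);                            % Eq. (48)
  end
  A = T / S;                                          % Eqs. (45)-(46): A <- T*inv(S)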

4. Grid PARAFAC with nonnegative constraints

Factorization of nonnegative tensors, such as spectral data, images, data in social networks or email surveillance, requires nonnegative factors in order to provide meaningful components and a physical interpretation. Many efficient algorithms have been proposed or generalized from algorithms for nonnegative matrix factorization (NMF). The ALS [1], HALS [8,10] or multiplicative LS (least squares) algorithms [1,11] can be directly derived from the Frobenius cost function (40). However, most efficient algorithms for nonnegative tensor factorization cannot deal with very large-scale data. In this section, we extend grid PARAFAC to nonnegative factors, and propose algorithms based on minimizing the same cost function (40). The sub-tensors are factorized by PARAFAC models, and the nonnegative factors for the full tensor are then estimated from the factors of the sub-tensors.

4.1. ALS algorithm for grid NTF

A simple extension is to enforce nonnegativity, or strictly speaking positivity, constraints on the factors updated by the ALS rule (45)
A^{(n)}_{(k_n)} \leftarrow \Big[\Big(\sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big)\Big)\Big(\sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)})\Big)^{-1}\Big]_+,   (49)
where [x]_+ = \max\{\varepsilon, x\} is a half-wave rectifying nonlinear projection and \varepsilon is a small positive constant. The ALS rules (42), (49) exploit not only gradient but also Hessian information, which gives some insight into why ALS algorithms have a good convergence rate. Unfortunately, the standard ALS algorithm suffers from unstable convergence properties, demands high computational cost due to matrix inversion, and often returns suboptimal solutions. Moreover, it is quite sensitive with respect to noise, and can be relatively slow in the special case when the data are nearly collinear. Some extended ALS algorithms apply the Levenberg–Marquardt approach with a regularization parameter \lambda
A^{(n)}_{(k_n)} \leftarrow [T(S + \lambda I)^{-1}]_+.   (50)
However, for real-world data, a proper selection of the parameter \lambda decides the performance of the final result.

4.2. Multiplicative algorithm for grid NTF

From the gradients (41), we apply a method similar to the derivation of the multiplicative LS algorithm for NMF [12–16] to obtain the update rule
A^{(n)}_{(k_n)} \leftarrow A^{(n)}_{(k_n)} \circledast \Big[\sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big)\Big]_+ \oslash \Big(A^{(n)}_{(k_n)}\sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N}\big(Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)})\big)\Big),   (51)
or
A^{(n)}_{(k_n)} \leftarrow A^{(n)}_{(k_n)} \circledast [T]_+ \oslash (A^{(n)}_{(k_n)} S),   (52)
where the matrices T and S are defined in (47) and (48), and are computed through a parallel loop in Steps 13 and 15 of Algorithm 1. Replacing Step 16 by the expression in (52) gives us the pseudo-code of the multiplicative algorithm for grid NTF.

The multiplicative algorithms have a relatively low complexity, but they are characterized by rather slow convergence and they sometimes converge to spurious local minima.
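Reusing the matrices T and S from the previous sketch, the two nonnegativity-constrained steps can be written in MATLAB as follows (a hedged sketch; eps plays the role of the small constant epsilon):

  A_als  = max(eps, T / S);              % projected ALS step, Eq. (49)
  A_mult = A .* max(eps, T) ./ (A * S);  % multiplicative LS step, Eq. (52)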
4.3. Hierarchical ALS algorithm for grid NTF

If the tensor Y in (42) is approximated by a rank-one tensor, that is, the factors A^{(n)} have only one component, the matrix inversion simplifies into a division by a scalar, and the ALS rule (42) to update part k_n of factor n becomes
a^{(n)}_{(k_n)} \leftarrow \frac{\sum_{k:\,k_n\ \mathrm{fixed}} Y^{(k)}_{(n)}\{a_{(k)}\}^{\odot_{-n}}}{\sum_{k:\,k_n\ \mathrm{fixed}} \{a_{(k)}^T a_{(k)}\}^{\circledast_{-n}}}.   (53)
Therefore, if we formulate an ALS update rule for one component of the sub-factor A^{(n)}_{(k_n)}, this learning rule does not require matrix inversion. Hence, it reduces the computational cost and is more stable.

Assume that we estimate the r-th component of the sub-factor A^{(n)}_{(k_n)}, denoted by [A^{(n)}_{(k_n)}]_r = a^{(k_n)}_r, n = 1,2,\ldots,N, k_n = 1,2,\ldots,K_n, r = 1,2,\ldots,R. This component appears only in the factorizations of the sub-tensors Y^{(k)} whose index has k_n as its n-th entry, k = [j_1,\ldots,j_{n-1},k_n,j_{n+1},\ldots,j_N], j_m = 1,\ldots,K_m. We split the approximation of the sub-tensors Y^{(k)}, k_n fixed, into two parts: one consists of all rank-one tensors in which the specific component a^{(k_n)}_r is not involved
Y^{(k)}_{(-r)} = \sum_{j \ne r} a^{(k_1)}_j \circ a^{(k_2)}_j \circ \cdots \circ a^{(k_N)}_j,   (54)
and the other is a rank-one tensor built up from a^{(k_n)}_r
Y^{(k)}_{(+r)} = a^{(k_1)}_r \circ a^{(k_2)}_r \circ \cdots \circ a^{(k_N)}_r.   (55)
The approximation of each sub-tensor Y^{(k)} is now rewritten as
Y^{(k)} = Y^{(k)}_{(-r)} + Y^{(k)}_{(+r)} + E.   (56)
To exploit the learning rule (53), we define a new residual tensor which is approximated by the rank-one tensor Y^{(k)}_{(+r)}
\tilde{Y}^{(k)}_{(+r)} = Y^{(k)} - Y^{(k)}_{(-r)} = a^{(k_1)}_r \circ a^{(k_2)}_r \circ \cdots \circ a^{(k_N)}_r + E.   (57)
The learning rule to update the component a^{(k_n)}_r is given by
a^{(k_n)}_r \leftarrow \frac{\sum_{k:\,k_n\ \mathrm{fixed}} \tilde{Y}^{(k)}_{(+r)} \bar{\times}_{-n} \{a^{(k)}_r\}}{\sum_{k:\,k_n\ \mathrm{fixed}} \{a^{(k)T}_r a^{(k)}_r\}^{\circledast_{-n}}},   (58)
where \{a^{(k)}_r\} = \{a^{(k_1)}_r, a^{(k_2)}_r, \ldots, a^{(k_N)}_r\}.
Each term in the numerator of (58) can be rewritten as
\tilde{Y}^{(k)}_{(+r)} \bar{\times}_{-n} \{a^{(k)}_r\} = (Y^{(k)} - \hat{Y}^{(k)} + Y^{(k)}_{(+r)}) \bar{\times}_{-n} \{a^{(k)}_r\}
 = U^{(n)}_{(k)}\big[\{U_{(k)}^T A_{(k)}\}^{\circledast_{-n}}\big]_r - A^{(n)}_{(k_n)}\big[\{A_{(k)}^T A_{(k)}\}^{\circledast_{-n}}\big]_r + a^{(k_n)}_r \{a^{(k)T}_r a^{(k)}_r\}^{\circledast_{-n}}
 = U^{(n)}_{(k)}\big[P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big]_r - A^{(n)}_{(k_n)}\big[Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)})\big]_r + a^{(k_n)}_r \{a^{(k)T}_r a^{(k)}_r\}^{\circledast_{-n}},   (59)
where [A]_r = a_r denotes the r-th column vector of the matrix A.

The learning rule (58) can be expressed in the compact form
a^{(k_n)}_r \leftarrow a^{(k_n)}_r + \frac{1}{w_{k_n}}\sum_{k:\,k_n\ \mathrm{fixed}} U^{(n)}_{(k)}\big[P_{(k)} \oslash (U^{(n)T}_{(k)} A^{(n)}_{(k_n)})\big]_r - \frac{1}{w_{k_n}}\sum_{k:\,k_n\ \mathrm{fixed}} A^{(n)}_{(k_n)}\big[Q_{(k)} \oslash (A^{(n)T}_{(k_n)} A^{(n)}_{(k_n)})\big]_r = a^{(k_n)}_r + \frac{1}{w_{k_n}} t_r - \frac{1}{w_{k_n}} A^{(n)}_{(k_n)} s_r,   (60)
where the scalar weight is w_{k_n} = \sum_{k:\,k_n\ \mathrm{fixed}} \{a^{(k)T}_r a^{(k)}_r\}^{\circledast_{-n}}, and t_r and s_r are the r-th columns of the matrices T and S defined in (47) and (48), respectively.

Finally, equation (60) is the update rule for one component of a sub-factor A^{(n)}_{(k_n)}. By replacing Step 16 in Algorithm 1 by (60), we obtain the pseudo-code of the Hierarchical ALS algorithm for grid NTF.

5. Stopping criterion

The stopping criterion plays an important role in identifying the convergence of a factorization. For simplicity, the cost function value (13) is usually used as the stopping criterion. The FIT rate (FIT(%) = 1 - \|Y - \hat{Y}\|_F^2 / \|Y\|_F^2) can also be used; since its computation is similar, we discuss only the Frobenius norm. For a large tensor, an explicit computation of the cost function value (13) is impossible because of the memory required to build up the approximate tensor \hat{Y}. In this section, we derive a fast computation of the stopping criterion applied to the grid PARAFAC. The Frobenius norm of the difference between a raw sub-tensor and its approximation is given by
D^{(k)} = \|Y^{(k)} - \hat{Y}^{(k)}\|_F^2 = \|Y^{(k)}\|_F^2 + \|\hat{Y}^{(k)}\|_F^2 - 2\langle Y^{(k)}, \hat{Y}^{(k)}\rangle,   (61)
where \langle Y, \hat{Y}\rangle is the inner product of two same-sized tensors
\langle Y^{(k)}, \hat{Y}^{(k)}\rangle = \sum_{i_1}\cdots\sum_{i_N} y^{(k)}_{i_1 \cdots i_N}\, \hat{y}^{(k)}_{i_1 \cdots i_N} = vec(Y^{(k)}_{(N)})^T\, vec(\hat{Y}^{(k)}_{(N)}).   (62)
Each term in the expression (61) can be computed as follows:
\|\hat{Y}^{(k)}\|_F^2 = \|vec(\hat{Y}^{(k)})\|_2^2 = \|\{A_{(k)}\}^{\odot}\, 1\|_2^2 = 1^T \{A_{(k)}^T A_{(k)}\}^{\circledast}\, 1 = 1^T Q_{(k)}\, 1,   (63)
\langle Y^{(k)}, \hat{Y}^{(k)}\rangle = 1^T (\{U_{(k)}\}^{\odot})^T\{A_{(k)}\}^{\odot}\, 1 = 1^T \{U_{(k)}^T A_{(k)}\}^{\circledast}\, 1 = 1^T P_{(k)}\, 1.   (64)
From (61), (63) and (64), we obtain a convenient and fast computation of the cost function
D = \tfrac{1}{2}\sum_{k} D^{(k)} = \tfrac{1}{2}\sum_{k}\big(\|Y^{(k)}\|_F^2 + 1^T Q_{(k)} 1 - 2\cdot 1^T P_{(k)} 1\big) = \tfrac{1}{2}\|Y\|_F^2 + \tfrac{1}{2}\sum_{k}\big(1^T Q_{(k)} 1 - 2\cdot 1^T P_{(k)} 1\big).   (65)
The first term \|Y\|_F^2 is constant and can hence be neglected. The remaining terms are sums of all the entries of the matrices Q_{(k)} and P_{(k)}.
6. Communication cost and practical considerations

This section analyzes the communication cost of the grid algorithms, and discusses some practical issues in the implementation of the grid factorization. Techniques are introduced to deal with small sub-tensors, or with a large number of sub-tensors.

6.1. Communication cost

The communication cost of the algorithms for grid PARAFAC and grid NTF can be evaluated as the total data transferred between the labs and the server in a parallel system during the factorization. For illustration, we compute the communication cost of the ALS algorithm. Its value can be roughly measured as the total data received and sent in all the steps of Algorithm 1.

For all the grid algorithms, the communication cost is mostly concentrated in computing the matrices T and S that update the sub-factors A^{(n)}_{(k_n)}. At Step 12, to update a matrix P_{(k)}, a lab requires the J_k \times J matrix P_{(k)}, the I_{k_n} \times J_k sub-factor U^{(n)}_{(k)} and the I_{k_n} \times J sub-factor A^{(n)}_{(k_n)}, and then sends J_k \times J data to the server. Through Steps 12–15, the total transferred data is
I_{k_n}(J_k + 3J) + 2 J_k J + 4 J^2,
or
4 I_{k_n} J + 6 J^2
under the assumption that the sub-tensors are approximated with the same number of components as the full tensor, that is, J = J_k. Over all the sub-tensors, the total data transferred in the "parfor" loop of Step 11 is
\Big(4J\sum_n \frac{I_n}{K_n} + 6J^2\Big)\prod_n K_n,
or
4JK^{N-1}\sum_n I_n + 6J^2 K^N
under the assumption that K_1 = K_2 = \cdots = K_N = K. Hence, the total data transferred inside the "repeat" loop (Step 7) is
6JK^{N-1}\sum_n I_n + 10 J^2 K^N.
For a specific data tensor, the communication cost of the ALS rule (45) increases as the number of sub-tensors K^N increases. The smaller the number of sub-tensors, the lower the communication cost. However, the sub-tensor size is limited by the configuration of its working lab, such as the memory and the operating system. For a 32-bit system, the tensor size should not exceed 3 GB. Relatively small sub-tensors can easily be factorized with high fitness. When the numbers of sub-tensors K_n along mode-n equal the tensor sizes, that is K_n = I_n for n = 3,\ldots,N, the sub-tensors simplify into matrices. The factorizations of these sub-tensors become Singular Value Decompositions, which can always explain 100% of the data. However, the reconstruction of the full factors then demands a very high communication cost due to the large number of sub-tensors. As a consequence, a large-scale tensor should be divided into a minimum number of sub-tensors.

Dividing the large-scale tensor into small sub-tensors can increase the risk of nearly collinear factors, which cannot be approximated well by the standard ALS algorithm (14). In order to cope with this problem, we can use modifications such as Compression [17] and Line Search [18,19]. The compression technique compresses the data into a smaller tensor using the TUCKER model. Next, a PARAFAC model is fitted to the core tensor of the TUCKER model. The PARAFAC factors for the original data are then represented as products of the estimated factors.

6.2. Multistage reconstruction for grid decomposition

Reconstruction of the full factors from all sub-factors would demand a communication cost of the order O(K^N). This section is devoted to speeding up the reconstruction. A multistage reconstruction is recommended for this problem. At each stage, we construct factors for concatenated tensors which are built up from neighboring sub-tensors; this reduces the number of sub-tensors for the next step.

Fig. 3. Illustration of multistage reconstruction for grid PARAFAC.

The (N–1)-stage paradigm illustrated in Fig. 3(a) is an example. Assume the tensor is split into a K_1 \times K_2 \times \cdots \times K_N grid of sub-tensors. At the first stage, we estimate the first two factors from all sub-factors along mode-1 and mode-2. That means the grid ALS algorithm is employed with a grid of K_1 \times K_2 \times 1 \times \cdots \times 1. This reconstruction demands a cost of O(K^2), and there are in total K_3 \cdots K_N separate factorizations processed in parallel. At the second stage, the data can be considered as a concatenation of 1 \times 1 \times K_3 \times \cdots \times K_N sub-tensors. We reconstruct the third factor using the grid ALS algorithm with a grid of 1 \times 1 \times K_3 \times 1 \times \cdots \times 1. The next stage estimates the fourth factor, and so on, until the last mode. This paradigm requires N–1 stages, and dramatically reduces the communication cost. However, this technique faces a trade-off between accuracy and processing speed. For a 3-D tensor, we need two reconstruction stages, as displayed in Fig. 3(a). Another multistage paradigm is illustrated in Fig. 3(b).

7. Grid PARAFAC for complex-valued tensors

PARAFAC for complex-valued tensors was proposed and investigated in some real applications, such as DS-CDMA signals [20–22] and MIMO systems [23]. This section extends the proposed algorithms to complex-valued data tensors. The ALS algorithm for the complex PARAFAC model is extended from the learning rule (14) and given by
A^{(n)} \leftarrow Y_{(n)} \{A^*\}^{\odot_{-n}} (\{A^H A\}^{\circledast_{-n}})^{-1}, (n = 1,2,\ldots,N),   (66)
where the symbol "*" denotes the complex conjugate operator, and H is the Hermitian transpose. For a 3-D complex-valued tensor, this learning rule simplifies into the COMplex parallel FACtor analysis (COMFAC) approach [20].

The ALS algorithm for the grid PARAFAC to update a sub-factor A^{(n)}_{(k_n)} is given by
A^{(n)}_{(k_n)} \leftarrow T S^{-1},   (67)
where the matrices T and S are computed for the specific sub-factor A^{(n)}_{(k_n)} as
T = \sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} U^{(n)}_{(k)}\big(P_{(k)} \oslash (U^{(n)H}_{(k)} A^{(n)}_{(k_n)})\big),   (68)
S = \sum_{k_1,\ldots,k_{n-1},k_{n+1},\ldots,k_N} Q_{(k)} \oslash (A^{(n)H}_{(k_n)} A^{(n)}_{(k_n)}),   (69)
based on the matrices
P_{(k)} = \{U_{(k)}^H A_{(k)}\}^{\circledast} \in C^{J_k \times J},   (70)
Q_{(k)} = \{A_{(k)}^H A_{(k)}\}^{\circledast} \in C^{J \times J}.   (71)

8. Experiments

8.1. Grid decomposition with different grid sizes

In the first set of simulations, we considered a 1000 x 1000 x 1000 dimensional tensor composed from three factors A^{(n)} \in R^{1000 \times 10} which were randomly selected from basic sparse and smooth signals such as half-wave rectified sine, cosine, chirp and sawtooth waveforms, as illustrated in Fig. 4(a). This tensor consists of 10^9 entries, and could consume 8 GB of RAM. We split the tensor into a grid of K x K x K sub-tensors, for K = 2, 4, 5, 8, 10. That means the sub-tensors have 500, 250, 200, 125, 100 entries along each mode, respectively. To evaluate the performance, we employ the Signal-to-Interference Ratio (SIR (dB)) between the estimated and original components. The grid PARAFAC and grid NTF achieved almost perfect performance (> 40 dB), as shown in Table 1.

Fig. 4. Illustration of Example 1: (a) components used to build up the tensor; (b) comparison of SIR indices obtained by grid PARAFAC, grid NTF, multistage grid factorizations, and the conversion model of PARAFAC to NTF for different grid sizes. The tensor was degraded by Gaussian noise with SNR = 0 dB.

The results were averaged over 100 runs. The performance almost did not change when varying the grid size.

This tensor was next degraded by additive Gaussian noise at 10 dB. The grid PARAFAC achieved good performance with SIR of about 35 dB for the different grid sizes K = 2, 4, 5, 8, 10, while the grid NTF achieved SIR of about 46 dB. We cannot see any effect of the grid size on the accuracy.

The noise intensity was then increased so that SNR = 0 dB. Both the grid PARAFAC and grid NTF still obtained good performance for the small grid sizes K = 2, 4. However, the performance decreased as the sub-tensor size decreased. The algorithms achieved only 20 dB for the grid size K = 20, i.e. a sub-tensor size of 50 x 50 x 50. Small sub-tensors could not capture the hidden components because their structures were distorted by heavy noise. The comparison of SIR indices for this kind of data is illustrated in Fig. 4(b). We also analyzed the multistage reconstructions for the grid PARAFAC and grid NTF, and the conversion model of PARAFAC to NTF factors. Multistage reconstruction achieved slightly lower performance. For example, with a grid size of K = 5, the grid PARAFAC achieved 29.39 dB, and its multistage factorization achieved 26.85 dB. The grid NTF and its multistage variant obtained 35.11 and 32.97 dB, respectively. The performance of the multistage paradigm can be degraded by up to 2 dB, but this technique speeds up the processing time. The comparison of processing speed between the models is given in Table 2. With the same grid size, the multistage reconstructions were always much faster than the grid models. For example, the grid PARAFAC approximated the full factors from 512 = 8^3 sub-tensors in 1,131 s and achieved SIR = 25.08 dB, whereas the reconstruction with three stages took 172 s and achieved 23.95 dB.

As analyzed in Section 6.1, the communication cost of the reconstruction strongly depends on the total number of sub-tensors, i.e. on the grid size. The larger the number of sub-tensors, the slower the processing speed. The grid PARAFAC completed the reconstruction from 4 sub-tensors in 18.65 s, in 249 s for 125 sub-tensors (K = 5), and in 29,310 s for 8000 sub-tensors (K = 20). A similar effect of the grid size on the processing speed holds for the grid NTF and the multistage grid factorizations.

Table 1
Comparison of SIR (dB) indices for grid-tensor factorization with different grid sizes for the synthetic tensor 1000 x 1000 x 1000.

Algorithms      Grid size (K x K x K)
                2       4       5       8       10      20
Clean tensors
  gridCP        41.81   42.54   42.02   42.53   42.21
  gridNTF       75.30   76.72   77.77   77.27   76.60
10 dB Gaussian noise
  gridCP        34.85   35.11   35.75   35.42   35.12
  gridNTF       46.82   46.52   46.92   46.35   46.50
  mgridCP       33.75   32.63   35.72   32.78   34.45
0 dB Gaussian noise
  gridCP        31.10   25.11   29.39   25.08   23.90   21.00
  gridNTF       35.99   35.92   35.11   29.84   27.70   23.71
  mgridCP       27.31   25.08   26.85   23.95   22.44   19.07
  mgridNTF      33.67   30.97   32.97   29.03   27.03   23.25
  CP2NTF        25.15   25.16   24.83   23.52   22.82   20.27

mgridCP — multistage grid PARAFAC; mgridNTF — multistage grid NTF; CP2NTF — conversion of PARAFAC factors to NTF factors.

Table 2
Comparison of running time (s) of the algorithms for Example 1 (0 dB Gaussian noise).

Algorithms      Grid size (K x K x K)
                2       4       5       8       10      20
Approximation   208     176     536     754     1,335   8,160
gridCP          18.65   71.12   249     1,131   2,606   29,310
gridNTF         18.05   98.07   169     592     1,102   8,113
mgridCP         14.67   29.39   52.64   172     362     3,437
mgridNTF        14.08   33.54   57.69   178     370     3,456
CP2NTF          0.27    0.20    0.23    0.23    0.29    0.23

We replicated this experiment for a 4-D synthetic tensor X \in R^{1000 \times 1000 \times 1000 \times 1000} composed from similar components. The tensor was also degraded by Gaussian noise with SNR = 10 dB. The noisy dense tensor consisted of 10^{12} entries, and could consume 4 TB of RAM in single precision format. The grid PARAFAC and grid NTF factorized the noisy tensor with a 7 x 7 x 7 x 7 grid of sub-tensors, that is, 2401 sub-tensors in total. Factorization of all the sub-tensors took 19,156 s in a parallel system consisting of 4 nodes and 16 cores. The approximation time decreases when more nodes are used. The reconstruction of the factors took 842 s using the grid PARAFAC, and 3,911 s using the grid NTF. By using the (N–2)-stage paradigm presented in Section 6.2 and illustrated in Fig. 3(b), the grid PARAFAC and grid NTF completed the reconstructions in 167 and 165 s, respectively. The performance of the experiment is shown in Table 3. All the algorithms achieved very good performance (> 40 dB).

Table 3
SIR indices and running time of the grid algorithms for the 4-D noisy tensor in Example 1.

Model                      SIR (dB)   Running time (s)
Sub-tensor approximation   -          19,156
gridCP                     75.15      842
gridNTF                    87.72      3,911
mgridCP                    77.01      167.02
mgridNTF                   78.29      164.94
CP2NTF                     43.00      0.06

8.2. Synthetic benchmark

In the next example, we factorized the synthetic tensor Y \in R^{5000 \times 5000 \times 5000} built up from 15 nonnegative components (shown in Fig. 5(a)) and degraded by additive Gaussian noise with SNR = 0 dB. The standard deviation of the noise was quickly calculated from the synthetic factors based on Lemma 3 in the Appendix. A partial view of the data with 200 samples along each dimension is shown in Fig. 5(b); the 45-th slice of this noisy tensor is given in Fig. 6(a). This dense tensor with 125 billion entries could consume 500 GB of memory. Factorizing such a tensor using existing PARAFAC algorithms is impossible because of the tensor size. However, the proposed algorithm can quickly deal with this problem. In the approximation step, we divided this tensor into 8000 sub-tensors of size 250 x 250 x 250, and simultaneously factorized them with J_a = 25 PARAFAC components in a parallel system with 16 labs to obtain 8000 sets of sub-factors U^{(n)}_{(k)} \in R^{250 \times 25}, n = 1,2,3, k = [k_1,k_2,k_3], k_n = 1,\ldots,20. The experiment was run on MATLAB ver. 2008b with its Distributed Computing Server and Parallel Computing toolboxes.

The full factors were estimated in two stages to reduce inter-communication between labs. In the first stage, 16 groups of 500 consecutive sets of sub-factors, corresponding to sub-tensors of size 1250 x 1250 x 5000, were used to simultaneously estimate 16 sets of sub-factors. Then, from these sub-factors, we built up the full factors for the 5000 x 5000 x 5000 tensor. The whole second step took 56.51 s.

With these estimated PARAFAC factors, we can quickly retrieve the nonnegative factors underlying the data by applying the NTF algorithms. This step took only 2.78 s. The 15 estimated factors achieved high SIR indices in the range [43.64, 54.97] dB, and are depicted in Fig. 5. Fig. 6(b) shows the reconstruction of the noisy slice in Fig. 6(a).

Fig. 5. Illustration for Example 2 with a noisy dense tensor of size 5000 x 5000 x 5000 (only 200 samples for each dimension are shown). (a) 15 components. (b) Noisy tensor. (c) Reconstructed tensor.

Fig. 6. Noisy slice and its reconstruction for Example 2 (only 200 samples for each dimension are shown). (a) Noisy slice. (b) Reconstructed slice.

Fig. 7. Illustration of six nonnegative components estimated from 10 PARAFAC components for the Graz benchmark. (a) Spectral components (frequency, Hz). (b) Temporal components (time, s).

8.3. Graz EEG dataset for BCI

The Graz dataset [24] contains EEG signals involving left hand, right hand, foot and tongue imagery movements acquired from 60 channels (with a sampling frequency of 250 Hz) over a duration of 7 s (4 s after trigger). The dataset was recorded from three subjects, and had 840 trials. All the EEG signals were transformed into the time-frequency domain using the complex Morlet wavelet, to form a spectral tensor of 60 channels x 25 frequency bins (6–30 Hz) x 250 time frames x 840 trials. For a meaningful factorization, the hidden factors of this EEG spectral tensor require nonnegativity constraints, and are considered as useful features for successful EEG classification [11]. Therefore, we first estimated the PARAFAC factors of this tensor, and then extracted the nonnegative factors. This dense tensor has a total of 315 million entries, and consumes 1.26 GB of memory. Factorization of the full tensor with 10 components took 3,900 s on a quad-core computer (2.67 GHz, 8 GB memory), and achieved FIT = 78.95%. However, the grid ALS algorithm for a grid of 16 sub-tensors, divided from the EEG tensor along the 4-th dimension (trials), took only 112 s to extract the same number of components with FIT = 78.55%. The nonnegative factors derived from both approaches took only 0.41 s and explain, respectively, 77.08% and 77.07% of the raw tensor for the full and grid processing. The components of the spectral and temporal factors are shown in Fig. 7. The estimated factors can be used as inputs for EEG classification, which is out of the scope of this paper. In Table 4, we compare the performances of the multiplicative LS, KL, HALS and grid multiplicative LS algorithms.

Table 4
Comparison of fitness and running time for the LS, KL, HALS, and fast grid multiplicative LS algorithms.

Algorithms   Example 3               Example 4
             Fit (%)   Time (s)      Fit (%)   Time (s)
LS           76.93     20,599        79.40     1,237.4
KL           76.53     29,595        79.30     2,823.7
HALS         76.71     4,820         79.44     417.3
gLS          77.08     112.2         79.37     68.3

Fig. 8. Illustration of topographic maps of six spatial components for Example 4 extracted by the grid multiplicative algorithm.

8.4. Visual and auditory EEG signals

We illustrate an example with the EEG benchmark EEG_AV_stimuli [1], recorded during three stimuli: an auditory stimulus; a visual stimulus; and both the auditory and the visual stimuli simultaneously. EEG signals were recorded from 61 channels during 1.5 s after stimulus presentation at a sampling rate of 1 kHz. The observed tensor Y consists of the EEG spectra in the time-frequency domain, obtained using the complex Morlet wavelet, of size 61 channels x 31 frequency bins (10–40 Hz) x 126 time frames (0–500 ms) x 25 trials x 3 classes. The multiplicative LS, KL and HALS algorithms [1,8] factorized this tensor with six nonnegative components in 1,237.4, 2,823.7 and 417.3 s with fitness values of 79.40%, 79.30% and 79.44%, respectively.

We applied the grid NTF to 75 sub-tensors of size 61 x 31 x 126 divided from the full data along the 4-th and 5-th dimensions. The approximation step took 18 s, and the reconstruction step took 50.3 s. The running time of the whole factorization was only 68.3 s. The full factors explain 79.37% of the original variance. In Fig. 8, we display topographic maps of the 6 spatial components. Component 1 is related to the visual stimulus, whereas components 3 and 4 reflect activations by the auditory stimulus. A comparison of the performance of the algorithms is given in Table 4.

8.5. Classification of handwritten digits

We factorized and classified the MNIST dataset of images of handwritten digits (0–9). This dataset is composed of 60,000 training images and 10,000 testing images. Each image is a 28 x 28 grayscale (0–255) labeled representation of an individual digit. Some samples of this dataset are displayed in Fig. 9.

Fig. 9. Illustration of some samples from the handwritten digit database.

In the training step, we simultaneously factorized 10 sub-tensors corresponding to the 10 classes (digits) into J_k = 28 PARAFAC components. This step took 185.42 s. Then the full nonnegative factors were built from the 10 sets of sub-factors. The two factors A^{(1)}, A^{(2)} were used as basis vectors to extract the features of an image Y_n:
f_n = vec(Y_n)^T (A^{(2)} \odot A^{(1)})\big((A^{(2)T} A^{(2)}) \circledast (A^{(1)T} A^{(1)})\big)^{-1}.
The number of features is exactly the number of NTF components J.

An example for the 36027-th sample (digit six) in the training database is illustrated in Fig. 10. The training feature of this digit is a vector which consists of J entries. A pair of basis vectors a^{(1)}_j, a^{(2)}_j forms a basis image a^{(1)}_j a^{(2)T}_j. A reconstructed sample of a digit is a linear composition of basis images in which the scaling coefficients are the features. In Fig. 10(e), we display the basis images ranked in descending order of the 20 largest features. The feature coefficients are annotated on their corresponding bases. Superpositions of 10, 15 and 20 basis images result in the reconstructed images of this digit shown in Figs. 10(b)–(d), respectively.

Fig. 10. Feature extraction for a digit 6 using PARAFAC features: (a) training digit; (b)–(d) samples reconstructed by using 10, 15 and 20 features with the corresponding bases in (e).

We used the k-nearest neighbor classifier (k = 3) for this experiment, and verified the classification accuracy with different numbers of features: J = 20, 21, ..., 28. The two classifications based on gNTF and NTF returned quite similar accuracies, given in Table 5. However, the gNTF-based method significantly reduces the running time. In particular, this method provides a convenient and fast way for Monte Carlo analysis. To build the nine sets of nonnegative factors in this experiment, the total running time for gNTF is 185.42 + 232.59 = 418.01 s, whereas that for NTF is 3340.68 s.

Table 5
Comparison of grid NTF and NTF for the digit dataset.

No. features   Running time (s)       Accuracy (%)
               gNTF      NTF          gNTF    NTF
20             17.38     394.31       95.60   95.86
21             25.26     400.67       96.12   96.00
22             26.28     313.37       96.02   96.40
23             19.71     357.08       96.43   96.18
24             24.91     479.83       96.24   96.21
25             38.96     520.47       96.51   96.56
26             24.03     271.90       96.74   96.78
27             29.95     349.11       96.73   96.84
28             26.11     253.94       96.95   96.62
Total          418.01    3340.68

From the extracted features, the dataset is visualized via two t-SNE components [25] in Fig. 11. For this benchmark, another classification method using the Tucker decomposition can be found in [26,29].

Fig. 11. Visualization of the 10 digit classes using two t-SNE components [29].
coefficients were features. In Fig. 10(e), we displayed basis images
which were ranked in the descending order of 20 largest features. 8.6. Estimation of system MIMO responses using the fourth-order
Feature coefficients were annotated in their corresponding bases. statistics
Superimposes of 10, 15, 20 basis images resulted reconstruction
images of this digit shown in Figs. 10(b)–(d), respectively. This experiment was replicated from the original one intro-
We used the k-nearest neighbor classifier (k¼ 3) for this duced in [23].
experiment, and verified the classification accuracy with 10 We considered an No  Ni MIMO system with Ni ¼ 20 inputs
different numbers of features: J ¼ 20,21, . . . ,28. Both two classi- and No ¼20 outputs. The system output equals:
fications based on gNTF and NTF returned quite similar accuracy
L1
given in Table 5. However, the gNTF-based method significantly X l’
X¼ Hl S þ W, ð72Þ
reduces the running time. Especially, this method provides a
l¼0
convenient and fast way for Monte Carlo analysis. To build nine
sets of nonnegative factors in this experiment, the total running where Hl A CNo Ni , for l ¼ 0,1, . . . ,L1 is the MIMO system impulse
time for gNTF is 185.42 + 232.59¼418.01 s, whereas that for NTF response matrix, S A CNi T contains
l’
the input signals, X A CNi T is
is 3340.68 s. a given output data matrix, S denotes the l positions (columns)

Fig. 12. Illustration of the frequency responses of the 20 x 20 MIMO system for Example 6. (a)–(c) magnitude responses of the original and estimated impulse functions obtained by the 5-D PARAFAC, the grid PARAFAC and the method in [23]. (d)–(f) phase responses of the original and estimated impulse functions for the different methods. For each plot, the x-axis denotes the frequency points, and the y-axis represents the intensity. Only some responses for the first three inputs and outputs are displayed.

S^{\leftarrow l} denotes the operator shifting the columns of S by l positions to the left, with the columns shifted in from outside the matrix set to zero, and S^{\leftarrow 0} = S. W is the observation noise. The inputs s_j, j = 1,2,\ldots,N_i, were taken to be i.i.d. BPSK signals. The channel length was L = 20. The purpose is to estimate \tilde{H}_{i,j,:} (an N_o \times N_i set of responses), the N_f-point Discrete Fourier Transform (DFT) of H_{i,j,:}, with N_f = 128, and then to recover the impulse responses.

We computed the discretized 4-th order cross-spectra, defined as the (K–1)-dimensional DFT of the 4th-order cross cumulants. Concatenation of all the cross-spectrum tensors forms a 5-D tensor \tilde{C} \in C^{N_o \times N_o \times N_o \times N_o \times N_f}. Factorization of this tensor with N_o components allows us to retrieve the system frequency response \tilde{H}
\tilde{C} \approx I \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)} \times_5 A^{(5)}   (73)
with the condition A^{(1)} = A^{(2)} = A^{(3)}. The tensor of frequency responses \tilde{H} is computed as
\tilde{H} = I \times_1 A^{(1)} \times_3 A^{(5)}.   (74)

The method proposed in [23] in fact factorizes only one of the cross-spectrum tensors by a 4-D PARAFAC model. Hence, for large-scale systems with large numbers of inputs and outputs, this technique cannot provide a good solution. Experiments were presented in [23] for very small MIMO systems with few outputs and inputs, such as 2, 3 or 4. In this section, we emphasize the grid PARAFAC for this kind of application with a large system. The performance index used here is the overall normalized mean-square error (ONMSE) [23]. For N_o = 20 and N_f = 128, the observed tensor consists of 20.48 million entries. Factorization of the whole tensor to find the 5 factors took 59,665 s, and achieved an ONMSE of 0.1810. We applied the grid PARAFAC with a grid of 128 x 1 x 1 x 1 x 1, which means that all the cross-spectrum tensors were independently factorized in a parallel system. The reconstruction of the factors took only 247 s, and achieved a performance of ONMSE = 0.1792. Although the method in [23] processed the data in only 346 s, its estimated impulse responses were distorted from the original responses, as illustrated in Figs. 12(c) and (f). This method provided an ONMSE of 0.4485. The grid PARAFAC significantly outperformed the other methods.

In Fig. 12, we display some selected responses for the first few inputs and outputs. The phase, constant permutation and scaling ambiguities were corrected to match the original responses. In each plot, the original magnitude or phase responses are represented by dotted lines, and the estimated responses are shown by dashed lines.

9. Conclusions

We have presented new fast and robust ALS algorithms for large-scale PARAFAC. The validity and high performance of the proposed algorithms have been confirmed even for noisy data, and also for the large-scale BCI benchmark and image classification. A new fast stopping criterion has been proposed for this algorithm. Variations of the ALS algorithm for PARAFAC with regularization terms such as total variation, sparsity, smoothness, nonnegativity or orthogonality constraints can be applied to the grid PARAFAC with some modifications of the learning rule (45). The strategy for the grid division of a tensor can affect the performance of the factorization and the running time of the parallel computation. Basically, the sub-tensor sizes should satisfy the uniqueness conditions of PARAFAC [2]. Moreover, a sub-tensor should have the maximum possible number of entries in its working lab.

9. Conclusions

We present new fast and robust ALS algorithms for large-scale PARAFAC. The validity and high performance of the proposed algorithms have been confirmed even for noisy data, and also for the large-scale BCI benchmark and for image classification. A new fast stopping criterion is proposed for this algorithm. Variations of the ALS algorithm for PARAFAC with regularization terms such as total variation, sparsity, smoothness, nonnegativity or orthogonality constraints can be applied to the grid PARAFAC with some modifications of the learning rule (45). The strategy for the grid division of a tensor can affect the performance of the factorization and the running time of the parallel computation. Basically, the sub-tensors' sizes should satisfy the uniqueness conditions of PARAFAC [2]. Moreover, each sub-tensor should have the maximum possible number of entries in its working lab (parallel worker). Increasing the number of sub-tensors derived from the original tensor increases the communication cost of the algorithms. To deal with this, we can estimate the full factors in multiple stages, as illustrated in Examples 1, 2, and 5.

Appendix A. Basic statistics for a synthetic tensor

This section presents some basic statistics for a synthetic tensor $\mathcal{Y}$. In Example 1, we built up a dense and very large-scale tensor. This tensor was then degraded by additive Gaussian noise. To generate the random noise, we needed the standard deviation of the vector $\mathrm{vec}(\mathcal{Y})$. The following lemma helps us to compute this statistic quickly, instead of manipulating the samples $y_{i_1 \ldots i_N}$.

Lemma 3. The following properties hold for an $N$-dimensional tensor $\mathcal{Y}$ built up from $N$ factors $\mathbf{A}^{(n)} \in \mathbb{R}^{I_n \times R}$,

$\mathcal{Y} = \mathcal{I} \times_1 \mathbf{A}^{(1)} \times_2 \cdots \times_N \mathbf{A}^{(N)}$:    (A.1)

1. $\sum_{i_1, i_2, \ldots, i_N} y_{i_1 i_2 \cdots i_N} = \{\mathbf{1}^T \mathbf{A}\}^{\circledast}\, \mathbf{1}$,    (A.2)

2. $\sum_{i_1, i_2, \ldots, i_N} y^2_{i_1 i_2 \cdots i_N} = \mathbf{1}^T \{\mathbf{A}^T \mathbf{A}\}^{\circledast}\, \mathbf{1}$,    (A.3)

where $\{\mathbf{1}^T \mathbf{A}\}^{\circledast} = (\mathbf{1}^T \mathbf{A}^{(N)}) \circledast \cdots \circledast (\mathbf{1}^T \mathbf{A}^{(1)})$ and $\{\mathbf{A}^T \mathbf{A}\}^{\circledast} = (\mathbf{A}^{(N)T} \mathbf{A}^{(N)}) \circledast \cdots \circledast (\mathbf{A}^{(1)T} \mathbf{A}^{(1)})$ denote Hadamard products taken over all the factors.
Proof. The summation of all the entries of the tensor $\mathcal{Y}$ is given by

$\sum_{i_1, i_2, \ldots, i_N} y_{i_1 i_2 \cdots i_N} = \mathbf{1}^T \mathrm{vec}(\mathcal{Y}) = (\mathbf{1}_{I_N} \otimes \mathbf{1}_{I_{N-1}} \otimes \cdots \otimes \mathbf{1}_{I_1})^T (\mathbf{A}^{(N)} \odot \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)})\, \mathbf{1} = ((\mathbf{1}^T \mathbf{A}^{(N)}) \circledast (\mathbf{1}^T \mathbf{A}^{(N-1)}) \circledast \cdots \circledast (\mathbf{1}^T \mathbf{A}^{(1)}))\, \mathbf{1}$.    (A.4)

The squared Frobenius norm of the tensor $\mathcal{Y}$ is given by

$\sum_{i_1, i_2, \ldots, i_N} y^2_{i_1 i_2 \cdots i_N} = \mathrm{vec}(\mathcal{Y})^T \mathrm{vec}(\mathcal{Y}) = \mathbf{1}^T \{\mathbf{A}\}^{\odot T} \{\mathbf{A}\}^{\odot}\, \mathbf{1} = \mathbf{1}^T \{\mathbf{A}^T \mathbf{A}\}^{\circledast}\, \mathbf{1}$,    (A.5)

where $\{\mathbf{A}\}^{\odot} = \mathbf{A}^{(N)} \odot \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}$ is the Khatri–Rao product of all the factors. $\square$
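Lemma 3 can be checked numerically with a short MATLAB sketch such as the one below. The factor sizes are arbitrary and serve only to illustrate that both statistics are obtained from Hadamard products of small $1 \times R$ and $R \times R$ matrices, without ever manipulating the full tensor; the sketch is not part of the proposed algorithms.

% Illustrative check of Lemma 3 for a small third-order Kruskal tensor.
I = [5 6 7]; R = 3; N = numel(I);                       % toy dimensions
A = arrayfun(@(n) randn(I(n), R), 1:N, 'UniformOutput', false);

% Build the full tensor explicitly: Y(i,j,k) = sum_r A1(i,r) A2(j,r) A3(k,r)
Y = zeros(I);
for r = 1:R
    Y = Y + reshape(kron(A{3}(:, r), kron(A{2}(:, r), A{1}(:, r))), I);
end
lhs_sum  = sum(Y(:));                                   % left-hand side of (A.2)
lhs_norm = sum(Y(:).^2);                                % left-hand side of (A.3)

% Right-hand sides of (A.2) and (A.3): only small Hadamard products are needed
u = ones(1, R); G = ones(R, R);
for n = 1:N
    u = u .* (ones(1, I(n)) * A{n});                    % accumulate (1^T A^(n))
    G = G .* (A{n}' * A{n});                            % accumulate (A^(n)^T A^(n))
end
rhs_sum  = u * ones(R, 1);
rhs_norm = ones(1, R) * G * ones(R, 1);
disp([lhs_sum rhs_sum; lhs_norm rhs_norm]);             % rows agree up to round-off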
Appendix B. Outer, Kronecker, Khatri–Rao and Hadamard products

Several special matrix products are important for the representation of tensor factorizations and decompositions.

B.1. Outer product

The outer product of the tensors $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and $\mathcal{X} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_M}$ is given by

$\mathcal{Z} = \mathcal{Y} \circ \mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N \times J_1 \times J_2 \times \cdots \times J_M}$,    (B.1)

where

$z_{i_1, i_2, \ldots, i_N, j_1, j_2, \ldots, j_M} = y_{i_1, i_2, \ldots, i_N}\, x_{j_1, j_2, \ldots, j_M}$.    (B.2)

Observe that the tensor $\mathcal{Z}$ contains all the possible combinations of pair-wise products between the elements of $\mathcal{Y}$ and $\mathcal{X}$.

As special cases, the outer product of two vectors $\mathbf{a} \in \mathbb{R}^I$ and $\mathbf{b} \in \mathbb{R}^J$ yields a rank-one matrix

$\mathbf{A} = \mathbf{a} \circ \mathbf{b} = \mathbf{a} \mathbf{b}^T \in \mathbb{R}^{I \times J}$,    (B.3)

and the outer product of three vectors $\mathbf{a} \in \mathbb{R}^I$, $\mathbf{b} \in \mathbb{R}^J$ and $\mathbf{c} \in \mathbb{R}^Q$ yields a third-order rank-one tensor

$\mathcal{Z} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c} \in \mathbb{R}^{I \times J \times Q}$,    (B.4)

where

$z_{ijq} = a_i b_j c_q$.    (B.5)
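As a toy illustration of the special cases (B.3)-(B.5), the following MATLAB lines (with arbitrary small sizes) build a rank-one matrix and a rank-one third-order tensor from random vectors.

% Illustrative sketch of Eqs. (B.3)-(B.5) with arbitrary small sizes.
a = randn(4, 1); b = randn(5, 1); c = randn(3, 1);

A = a * b.';                             % rank-one matrix a o b, Eq. (B.3)

Z = zeros(4, 5, 3);                      % rank-one tensor a o b o c, Eq. (B.4)
for q = 1:3
    Z(:, :, q) = c(q) * A;               % each frontal slice is a scaled copy of a*b'
end
disp(Z(2, 3, 1) - a(2)*b(3)*c(1));       % ~ 0, matching z_ijq = a_i b_j c_q in (B.5)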
B.2. Kronecker product

The Kronecker product of two matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{T \times R}$ is a matrix denoted as $\mathbf{A} \otimes \mathbf{B} \in \mathbb{R}^{IT \times JR}$ and defined as (see the MATLAB function kron):

$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1J}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2J}\mathbf{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & a_{I2}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix}$    (B.6)

$\mathbf{A} \otimes \mathbf{B} = [\,\mathbf{a}_1 \otimes \mathbf{b}_1 \;\; \mathbf{a}_1 \otimes \mathbf{b}_2 \;\; \cdots \;\; \mathbf{a}_J \otimes \mathbf{b}_{R-1} \;\; \mathbf{a}_J \otimes \mathbf{b}_R\,]$.    (B.7)

It should be mentioned that, in general, the outer product of vectors yields a tensor, whereas the Kronecker product gives a vector. For example, for the three vectors $\mathbf{a} \in \mathbb{R}^I$, $\mathbf{b} \in \mathbb{R}^T$ and $\mathbf{c} \in \mathbb{R}^Q$, their three-way outer product $\mathcal{Y} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c} \in \mathbb{R}^{I \times T \times Q}$ is a third-order tensor with the entries $y_{itq} = a_i b_t c_q$, while the three-way Kronecker product of the same vectors is a vector, $\mathrm{vec}(\mathcal{Y}) = \mathbf{c} \otimes \mathbf{b} \otimes \mathbf{a} \in \mathbb{R}^{ITQ}$.
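A short MATLAB sketch (with arbitrary sizes) illustrates the column structure of (B.7) using the built-in function kron, and contrasts the Kronecker product of vectors with the outer product of B.1.

% Illustrative check of Eq. (B.7): column (j-1)*R + r of kron(A,B) equals a_j (x) b_r.
A = randn(3, 2); B = randn(4, 5);                       % I x J and T x R, arbitrary sizes
K = kron(A, B);                                         % (I*T) x (J*R) matrix
j = 2; r = 3;
disp(norm(K(:, (j-1)*size(B, 2) + r) - kron(A(:, j), B(:, r))));   % ~ 0

% For vectors, kron returns a long vector, whereas the outer product of the
% same vectors is a third-order tensor: vec(a o b o c) = c (x) b (x) a.
a = randn(4, 1); b = randn(5, 1); c = randn(3, 1);
Z = zeros(4, 5, 3);
for q = 1:3, Z(:, :, q) = c(q) * (a * b.'); end         % outer product a o b o c
disp(norm(Z(:) - kron(c, kron(b, a))));                 % ~ 0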
B.3. Hadamard product

The Hadamard product of two equal-size matrices is the element-wise product, denoted by $\circledast$ (or ".*" in MATLAB notation), and defined as

$\mathbf{A} \circledast \mathbf{B} = \begin{bmatrix} a_{11} b_{11} & a_{12} b_{12} & \cdots & a_{1J} b_{1J} \\ a_{21} b_{21} & a_{22} b_{22} & \cdots & a_{2J} b_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} b_{I1} & a_{I2} b_{I2} & \cdots & a_{IJ} b_{IJ} \end{bmatrix}$.    (B.8)

B.4. Khatri–Rao product

For two matrices $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_J] \in \mathbb{R}^{I \times J}$ and $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_J] \in \mathbb{R}^{T \times J}$ with the same number of columns $J$, their Khatri–Rao product, denoted by $\odot$, performs the following operation:

$\mathbf{A} \odot \mathbf{B} = [\,\mathbf{a}_1 \otimes \mathbf{b}_1 \;\; \mathbf{a}_2 \otimes \mathbf{b}_2 \;\; \cdots \;\; \mathbf{a}_J \otimes \mathbf{b}_J\,]$    (B.9)

$\mathbf{A} \odot \mathbf{B} = [\,\mathrm{vec}(\mathbf{b}_1 \mathbf{a}_1^T) \;\; \mathrm{vec}(\mathbf{b}_2 \mathbf{a}_2^T) \;\; \cdots \;\; \mathrm{vec}(\mathbf{b}_J \mathbf{a}_J^T)\,] \in \mathbb{R}^{IT \times J}$.    (B.10)

The following properties of the Khatri–Rao product are often employed in the paper:

$(\mathbf{A} \odot \mathbf{B})^T (\mathbf{A} \odot \mathbf{B}) = (\mathbf{A}^T \mathbf{A}) \circledast (\mathbf{B}^T \mathbf{B})$,    (B.11)

and the Moore–Penrose pseudo-inverse can be expressed as

$(\mathbf{A} \odot \mathbf{B})^{\dagger} = [(\mathbf{A}^T \mathbf{A}) \circledast (\mathbf{B}^T \mathbf{B})]^{-1} (\mathbf{A} \odot \mathbf{B})^T$.    (B.12)

These properties also hold for complex-valued matrices $\mathbf{A} \in \mathbb{C}^{I \times J}$ and $\mathbf{B} \in \mathbb{C}^{T \times J}$; in this case, the transpose operators are replaced by the Hermitian transposes.
References

[1] A. Cichocki, R. Zdunek, A.-H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley, Chichester, 2009.
[2] T. Kolda, B. Bader, Tensor decompositions and applications, SIAM Review 51 (3) (2009) 455–500.
[3] J. Sun, D. Tao, C. Faloutsos, Beyond streams and graphs: dynamic tensor analysis, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 374–383.
[4] J. Sun, Incremental pattern discovery on streams, graphs and tensors, Ph.D. Thesis, CMU-CS-07-149, 2007.
[5] J. Carroll, J. Chang, Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart–Young decomposition, Psychometrika 35 (3) (1970) 283–319.
[6] G. Tomasi, R. Bro, A comparison of algorithms for fitting the PARAFAC model, Computational Statistics and Data Analysis 50 (7) (2006) 1700–1734.
[7] R. Harshman, Foundations of the PARAFAC procedure: models and conditions for an explanatory multimodal factor analysis, UCLA Working Papers in Phonetics 16 (1970) 1–84.
[8] A. Cichocki, A.-H. Phan, Fast local algorithms for large scale nonnegative matrix and tensor factorizations, IEICE Transactions on Fundamentals E92-A (3) (2009) 708–721 (invited paper).
[9] S. Goreinov, E. Tyrtyshnikov, N. Zamarashkin, A theory of pseudo-skeleton approximations, Linear Algebra and its Applications 261 (1997) 1–21.
[10] A.-H. Phan, A. Cichocki, Multi-way nonnegative tensor factorization using fast hierarchical alternating least squares algorithm (HALS), in: Proceedings of the 2008 International Symposium on Nonlinear Theory and its Applications, Budapest, Hungary, 2008.
[11] M. Mørup, L. Hansen, J. Parnas, S. Arnfred, Decomposing the time-frequency representation of EEG using non-negative matrix and multi-way factorization, Technical Report, 2006. URL: [Link]
[12] M. Daube-Witherspoon, G. Muehllehner, An iterative image space reconstruction algorithm suitable for volume ECT, IEEE Transactions on Medical Imaging 5 (1986) 61–66.
[13] A. De Pierro, On the relation between the ISRA and the EM algorithm for positron emission tomography, IEEE Transactions on Medical Imaging 12 (2) (1993) 328–333.
[14] H. Lantéri, R. Soummer, C. Aime, Comparison between ISRA and RLA algorithms: use of a Wiener filter based stopping criterion, Astronomy and Astrophysics Supplement Series 140 (1999) 235–246.
[15] P. Eggermont, V. LaRiccia, Maximum smoothed likelihood density estimation for inverse problems, Annals of Statistics 23 (1) (1995) 199–220.
[16] D. Lee, H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[17] C. Andersson, R. Bro, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics and Intelligent Laboratory Systems 42 (1998) 93–103.
[18] R. Bro, Multi-way analysis in the food industry: models, algorithms, and applications, Ph.D. Thesis, University of Amsterdam, Holland, 1998.
[19] M. Rajih, P. Comon, Enhanced line search: a novel method to accelerate PARAFAC, SIAM Journal on Matrix Analysis and Applications 30 (3) (2008) 1128–1147.
[20] N. Sidiropoulos, G. Giannakis, R. Bro, Blind PARAFAC receivers for DS-CDMA systems, IEEE Transactions on Signal Processing 48 (3) (2000) 810–823.
[21] D. Nion, L. De Lathauwer, A tensor-based blind DS-CDMA receiver using simultaneous matrix diagonalization, in: Proceedings of SPAWC 2007, IEEE Workshop on Signal Processing Advances in Wireless Communications, Helsinki, Finland, 2007. URL: [Link]
[22] E. Acar, C. Bingol, H. Bingol, R. Bro, B. Yener, Computational analysis of epileptic focus localization, in: Proceedings of the Fourth IASTED International Conference on Biomedical Engineering BioMED2006, Innsbruck, Austria, 2006, pp. 317–322.
[23] Y. Yu, A.P. Petropulu, PARAFAC-based blind estimation of possibly underdetermined convolutive MIMO systems, IEEE Transactions on Signal Processing 56 (2007) 111–124.
[24] C. Brunner, R. Leeb, G.R. Müller-Putz, A. Schlögl, BCI competition 2008 - Graz data set A, 2008.
[25] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.
[26] B. Savas, L. Eldén, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition 40 (3) (2007) 993–1003. doi: [Link]
[27] P. Paatero, A weighted non-negative least squares algorithm for three-way PARAFAC factor analysis, Chemometrics and Intelligent Laboratory Systems 38 (2) (1997) 223–242.
[28] P. Tichavský, Z. Koldovský, Simultaneous search for all modes in multilinear models, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), 2010.
[29] A.-H. Phan, A. Cichocki, Tensor decompositions for feature extraction and classification of high dimensional datasets, Nonlinear Theory and Its Applications, IEICE 1 (1) (2010) 37–68 (invited paper).

Anh Huy Phan received the B.E. and master's degrees in the area of Electronic Engineering from the Ho Chi Minh City University of Technology. He worked as Deputy Head of the Research and Development Department, Broadcast Research and Application Center, Vietnam Television, and he is doing research towards his Ph.D. degree under the supervision of Professor Cichocki. He is the co-author of the book Nonnegative Matrix and Tensor Factorizations (J. Wiley, 2009). His interest is in tensor decompositions and their applications.

Andrzej Cichocki was born in Poland. He received his M.Sc. (with honors), Ph.D. and Habilitate Doctorate (Dr.Sc.) degrees, all in electrical engineering, from the Warsaw University of Technology (Poland). He is the co-author of four international books and monographs (two of them translated to Chinese): Nonnegative Matrix and Tensor Factorizations (J. Wiley, 2009), Adaptive Blind Signal and Image Processing (J. Wiley, 2002), MOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems (Springer-Verlag, 1989) and Neural Networks for Optimization and Signal Processing (J. Wiley and Teubner Verlag, 1993/94), and author or co-author of more than two hundred papers. He is Editor-in-Chief of the journal Computational Intelligence and Neuroscience. Currently, he is the head of the Laboratory for Advanced Brain Signal Processing at the Brain Science Institute, RIKEN, Japan.
