Minimax-Optimal Privacy-Preserving Sparse PCA in Distributed Systems

Jason Ge (Princeton University), Zhaoran Wang (Northwestern University), Mengdi Wang (Princeton University), Han Liu (Tencent AI Lab)
Abstract

This paper proposes a distributed privacy-preserving sparse PCA (DPS-PCA) algorithm that generates a minimax-optimal sparse PCA estimator under differential privacy constraints. In a distributed optimization framework, data providers can use this algorithm to collaboratively analyze the union of their data sets while limiting the disclosure of their private information. DPS-PCA can recover the leading eigenspace of the population covariance at a geometric convergence rate, and simultaneously achieves the optimal minimax statistical error for high-dimensional data. Our algorithm provides fine-tuned control over the tradeoff between estimation accuracy and privacy preservation. Numerical simulations demonstrate that DPS-PCA significantly outperforms other privacy-preserving PCA methods in terms of estimation accuracy and computational efficiency.
1 Introduction

Principal component analysis (PCA) is one of the most widely used tools for data analysis and dimension reduction. Let $x_1, x_2, \cdots, x_n \in \mathbb{R}^d$ be samples drawn from some underlying distribution with covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, where $d$ is the dimension and $n$ is the data size. Principal component analysis estimates the leading $k$ ($k \le d$) eigenvectors of $\Sigma$ based on the given samples. Projecting the samples onto the $k$-dimensional principal subspace spanned by these eigenvectors provides a low-dimensional representation of the high-dimensional data.
We focus on a distributed setting in which multiple data providers interact with one another. Suppose that the data providers, who may come from different interest groups, aim to calculate the principal components
of the union of their data sets. For example, the data providers can be healthcare providers holding medical records related to certain types of drugs or procedures, or research institutes with genome-wide association (GWA) genetic study results for certain diseases. Even if the data providers wish to collaborate on data analysis, they are often unwilling to directly share data with others, because inappropriate data sharing may lead to severe privacy breaches [22, 32].
In practical contexts where privacy matters, there has been a surging demand for collaborative principal component analysis applications, including clinical inference [13] and genome population structure modeling [12]. This motivates us to redesign principal component analysis algorithms so that data providers can collaboratively analyze their data sets without physically merging them. It is desired that the new algorithm achieve strong privacy guarantees, making it nearly impossible to infer an individual's exact state in the data set from the algorithm's output. We adopt the notion of differential privacy introduced by [10], a cryptographically inspired guarantee that is resistant to many forms of privacy attacks (see [14]). Intuitively, differential privacy of an algorithm means that a small change in the input has no significant impact on the output of the algorithm (a rigorous definition is provided later).
High dimensionality poses another critical challenge to modern data applications such as population genome studies and brain voxel estimation. It is desired that the new algorithm produce estimates achieving the optimal or nearly optimal statistical accuracy. However, in the typical high-dimensional setting where $d \gg n$, singular value decomposition fails, producing inconsistent estimates of the principal components [23, 31]. To overcome this problem, we adopt sparsity regularization for high-dimensional PCA, which has been extensively studied in [1, 3, 5, 7, 8, 23, 24, 27, 29, 31, 36, 37, 40, 41, 43, 45, 49].
The goal of this paper is to provide an algorithmic solution to distributed principal component analysis that leverages the sparsity of high-dimensional data while achieving sufficient privacy preservation. In what follows, we briefly review some earlier works on privacy-preserving PCA, sparsity regularization, and distributed optimization, and discuss their relevance to the current paper.
Related works. One line of existing work is on differentially private algorithms for model-free PCA, such as the Sub-Linear Queries (SuLQ) method by [4], the Private PCA (PPCA) and modified SuLQ (MOD-SuLQ) methods by [6], and Analyze Gauss in [11]. MOD-SuLQ and Analyze Gauss both add noise to the sample covariance matrix before applying the singular value decomposition. PPCA applies the exponential mechanism [30] to perform private selection of the leading eigenspace. However, model-free methods cannot be applied successfully to high-dimensional data. It is demonstrated in [6] that the sample size has to be on the same order as the dimension ($n \asymp d$) for a differentially private algorithm to achieve a targeted level of estimation accuracy. The utility gap in Analyze Gauss [11] (Theorem 3) is on the order of $O(\sqrt{d})$. This is quite discouraging for high-dimensional scenarios.
Another line of related work is on local differential privacy for PCA. In this setting, each data sample is obscured by some random transformation to ensure privacy. One of the earliest among these works is [48], which proposed applying a random linear transformation to each sample. Other works such as [9, 42, 44] studied the theoretical tradeoff between the minimax statistical rate and the local privacy level imposed on the initial data. In contrast to these works, we consider the interactive setting in which each data provider releases information by answering a series of statistical queries through a designed protocol, as opposed to simply obscuring all the samples once and for all. For this reason, our approach is more task-specific and very different from the local differential privacy studied earlier.
A third line of related work is on high-dimensional PCA, e.g., [16-18], which proved dimension-free statistical results under an incoherence assumption on the sample covariance matrix. The power method for sparse PCA was studied in [43] and shown to achieve the statistical minimax rate in polynomial time. In this paper, we follow their technique of imposing sparsity by adding a thresholding step to the power iteration. A similar technique was also used in the truncated power method of [47]. We also note that distributed versions of PCA have been studied in [25, 28, 34]; the focus there is the tradeoff between communication efficiency and approximation accuracy. In contrast, our current focus is the privacy preservation feature of distributed PCA rather than its communication efficiency.
Distributed privacy-preserving analysis has also been studied in works such as [46] and [19]. [19] focused on Bayesian inference and is not directly comparable. [46] assumed that a non-distributed version of the privacy-preserving procedure is already available and proposed a multiparty computation protocol. In the context of sparse PCA estimation, [46] is not applicable because even the non-distributed version of a privacy-preserving sparse PCA estimation algorithm had not been studied before in the literature.
Our Contributions. In this paper, we study distributed PCA with a focus on the privacy preservation of high-dimensional data. From the algorithmic perspective, we are interested in designing efficient algorithms with fast convergence rates and reasonable complexity. From the theoretical perspective, we are interested in the tradeoff between estimation accuracy and the level of privacy preservation, especially in the sparse high-dimensional setting. The main contributions of this paper are summarized as follows:

(i) We show that efficient estimation of the high-dimensional sparse principal subspace can be achieved under privacy preservation constraints.

(ii) We propose a privacy-preserving algorithmic framework which obtains an efficient estimate converging at a geometric rate to the minimax-optimal statistical error.

(iii) Numerical simulations show that our method significantly outperforms current state-of-the-art privacy-preserving PCA methods such as MOD-SuLQ, PPCA [6], and Analyze Gauss [11] in terms of estimation accuracy and computational efficiency.

To the best knowledge of the authors, this is the first work on a privacy-preserving method for high-dimensional sparse PCA in a distributed optimization framework. This is also the first work showing that the minimax statistical rate of sparse PCA can be achieved under differential privacy constraints in polynomial time.
2 Collaborative PCA in Distributed Systems
Let us consider a distributed computing setting with a number of data providers. The data providers aim to collaborate with one another in order to compute the principal components of the union of all their local data sets. Each data set may contain sensitive information that cannot be released to the public. In this section, we introduce a distributed collaborative framework that enables privacy-preserving PCA, and describe the modeling assumptions for high-dimensional data.
We propose a distributed algorithmic framework for privacy-preserving PCA, as illustrated in Figure 1. Let $S_i$ ($i = 1, 2, \cdots, N$) be the sample set held by the $i$th data provider. A central server coordinates the distributed computation process, but it is not trusted by the data providers. Each data provider is able to communicate with the central server. This framework allows an iterative algorithm in which the central server repeatedly updates the global estimates while communicating with all the data providers.
In each iteration of the algorithm, the central server sends the current global estimate $Q^{(t)}$ to all the participating providers, where $Q^{(t)} \in \mathbb{R}^{d \times k}$ is an orthogonal basis matrix of the estimated leading eigenspace of the covariance matrix in the underlying statistical model. Upon receiving the current estimate $Q^{(t)}$, each data provider computes a local intermediate result $H_i(S_i, Q^{(t)}) \in \mathbb{R}^{d \times k}$, which is a $d \times k$ matrix contaminated with intentional noise. Each data provider then sends $H_i(S_i, Q^{(t)})$ back to the central server, which combines all the messages $H_1, H_2, \cdots, H_N$ to update the global estimate.
[Figure 1 depicts the message flow: the central server broadcasts $Q^{(t)}$ to data providers $1, \ldots, N$; each provider $i$ returns $H_i(S_i, Q^{(t)})$; the initial estimate $Q^{(0)}$ enters the loop and the algorithm outputs $Q_{dp}$.]

Figure 1: Distributed Private Sparse PCA Algorithm Structure
The proposed framework allows individual data providers to add noise to outgoing messages at their own discretion. This provides significant flexibility for collaborative analysis. Note that, due to the existence of noise, any algorithm under this framework is a randomized algorithm. We use the following notion of differential privacy for randomized algorithms, which has been studied in [26] and [18]. It says that an algorithm is differentially private if its output is insensitive to small input perturbations.
Definition 2.1 (Differential Privacy of Randomized Algorithms). Let $S = \{x_1, \ldots, x_j, \ldots, x_n\}$ be the set of all samples. A randomized algorithm $M: \mathcal{S} \to \mathcal{R}$ is $(\epsilon, \delta)$-differentially private ($\epsilon, \delta > 0$) if for all pairs of neighboring data sets $S, S'$ differing in any individual data item, $S' = \{x_1, \ldots, x'_j, \ldots, x_n\}$, and for all $R \subseteq \mathcal{R}$, the algorithm satisfies
$$\Pr(M(S) \in R) \le e^{\epsilon} \cdot \Pr(M(S') \in R) + \delta.$$
What privacy guarantee can we promise the data providers under this definition? From the perspective of the $i$-th data provider, first, the central server cannot determine (to some extent) the presence or absence of any individual data item in the sample set $S_i$, given the information $H_i(S_i, Q^{(t)})$ ($t = 1, 2, \ldots, T$) collected during the computation. Second, the change of any individual data item in the sample set $S_i$ does not have a significant impact on the output $Q_{dp}$ of the algorithm. The extent of privacy is determined by $\epsilon > 0$ and $\delta > 0$, with smaller $\epsilon$ and $\delta$ implying less disclosure of private information related to an individual data item.
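For intuition about how noise can be calibrated to meet Definition 2.1, recall the standard Gaussian mechanism from the differential privacy literature (a general fact, not the specific calibration used in this paper): for a vector-valued query $f$ with $\ell_2$-sensitivity $\Delta = \max_{S, S'} \|f(S) - f(S')\|_2$ over neighboring data sets, releasing
$$f(S) + \mathcal{N}(0, \sigma^2 I), \qquad \sigma \ge \frac{\Delta \sqrt{2\log(1.25/\delta)}}{\epsilon},$$
is $(\epsilon, \delta)$-differentially private for $\epsilon \in (0, 1)$.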
In order to provide sharp estimates for high-dimensional data, we adopt the following notion of subspace sparsity.

Definition 2.2 (Subspace Sparsity). A linear subspace is $s$-sparse if the number of non-zero diagonal entries of the projection matrix onto the subspace is equal to $s$.
The notion of subspace sparsity has been introduced in the recent literature on sparse PCA, e.g., [40, 41]. Being invariant to rotation, the sparsity level of the subspace spanned by the columns of $Q$ can be defined as the number of non-zero entries on the diagonal of the projection matrix $\Pi = QQ^\top$. Note that $\Pi_{i,i} = \sum_{j=1}^{k} (Q_{i,j})^2$, so the sparsity level $s = |\mathrm{supp}[\mathrm{diag}(\Pi)]|$ is exactly the number of non-zero rows of $Q$. Thus we have
$$s = |\mathrm{supp}[\mathrm{diag}(\Pi)]| = \|Q\|_{2,0} = \big\|\big(\|Q_{1,*}\|_2, \|Q_{2,*}\|_2, \ldots, \|Q_{d,*}\|_2\big)\big\|_0.$$
Now let us describe the modeling assumptions about the data.

Assumption 2.3. The data samples held by the data providers are independent and identically distributed copies of a bounded random variable $X$ in a model class $\mathcal{M}(s^*, k)$ for integers $k, s^*$ with $0 < k < s^* < d$. For any $X \in \mathcal{M}(s^*, k)$, the following three assumptions are made.

1. The random variable $X = \Sigma^{1/2} Z$, in which $Z \in \mathbb{R}^d$ is a zero-mean bounded random variable with variance proxy less than one and identity covariance matrix. The matrix $\Sigma \in \mathbb{R}^{d \times d}$ is positive semidefinite.

2. The $k$-dimensional principal eigenspace of $\Sigma$ is $s^*$-sparse.

3. The $k$-th eigengap is strictly positive, i.e., $\lambda_k - \lambda_{k+1} > 0$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$.
Let $\hat{\Sigma}$ be the sample covariance matrix for the union of all the samples $S_1 \cup S_2 \cup \cdots \cup S_N$. Our mathematical problem is to approximate the $k$-dimensional leading principal subspace estimator under sparsity and differential privacy constraints,
$$\hat{Q} = \operatorname*{argmin}_{Q \in \mathbb{R}^{d \times k}} \; -\mathrm{tr}\big(Q^\top \hat{\Sigma} Q\big), \quad \text{subject to } Q^\top Q = I_k, \ \|Q\|_{2,0} \le \hat{s}, \qquad (2.1)$$
and the algorithm has to be differentially private. According to our framework of distributed algorithms, the overall covariance matrix $\hat{\Sigma}$ is known neither to the central server nor to any data provider. In what follows, we present such an algorithm that iteratively computes an approximate solution to (2.1) based on the locally held sample sets $S_1, S_2, \ldots, S_N$.
3 Distributed Privacy-Preserving Sparse PCA Algorithm

In this section, we present the details of the privacy preservation mechanism under the distributed framework of collaborative PCA; see Algorithm 1. The algorithm can be viewed as a noisy truncated power iteration, a variant of the classical power iteration for eigenvector computation [15]. Specifically, the matrix-matrix multiplications in the power step are corrupted by i.i.d. Gaussian noise matrices. For the estimated principal subspace basis matrix in each iteration, a thresholding procedure sets to zero every row whose $\ell_2$-norm rank (in decreasing order among all rows) is larger than $\hat{s}$. Sparsity is thus explicitly imposed by thresholding.
Algorithm 1 Distributed Privacy-Preserving Sparse PCA (DPS-PCA)

Input: Sample sets $S_i$, $i = 1, 2, \ldots, N$, held by the $N$ data providers respectively; the number of samples held by the $i$th provider is $n_i = |S_i|$; initialization $Q^{(0)} \in \mathbb{R}^{d \times k}$
Parameters: Sparsity parameter $\hat{s}$; maximum number of iterations $T$; differential privacy parameters $\epsilon, \delta$ with $0 < \epsilon < 1$ and $0 < \delta < 1/2$
Output: $Q_{dp} \leftarrow \mathrm{DPS\text{-}PCA}(\hat{\Sigma}, Q^{(0)})$

1: $\tilde{Q}^{(0)} \leftarrow \mathrm{Threshold}(Q^{(0)}, \hat{s})$
2: $Q^{(1)}, R \leftarrow \mathrm{Thin\_QR}(\tilde{Q}^{(0)})$
3: Let $\sigma(\epsilon, \delta) \leftarrow 2T\epsilon^{-1}\sqrt{2\log(2T/\delta)}$
4: for $i = 1, 2, \ldots, N$ do
5:     $\hat{\Sigma}_i \leftarrow \sum_{x_j \in S_i} x_j x_j^\top / n_i$
6: end for
7: for $t = 1, 2, \ldots, T$ do
8:     Generate an entrywise i.i.d. Gaussian matrix $G^{(t)} \sim N(0, \sigma^2)^{d \times k}$
9:     for $i = 1, 2, \ldots, N$ do
10:        $H_i^{(t)} \leftarrow \hat{\Sigma}_i Q^{(t)} + G^{(t)}/n_i$    // Privacy Preservation Step
11:    end for
12:    $K^{(t)} \leftarrow \sum_{i=1}^{N} n_i H_i^{(t)} \big/ \sum_{i=1}^{N} n_i$
13:    $V^{(t)}, R_1 \leftarrow \mathrm{Thin\_QR}(K^{(t)})$
14:    $\tilde{Q}^{(t)} \leftarrow \mathrm{Threshold}(V^{(t)}, \hat{s})$
15:    $Q^{(t+1)}, R_2 \leftarrow \mathrm{Thin\_QR}(\tilde{Q}^{(t)})$
16: end for
17: $Q_{dp} \leftarrow Q^{(T+1)}$
The Thin_QR step in Algorithm 1 is the thin QR decomposition in [15]. For any matrix $A \in \mathbb{R}^{m \times n}$ with full column rank, the Thin_QR factorization $A = QR$ is unique, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular with positive diagonal entries. Effectively, the Thin_QR step orthogonalizes the $k$ basis vectors of the $k$-dimensional leading eigenspace in each iteration of the algorithm.

Algorithm 2 Thresholding procedure in Algorithm 1

Input: Matrix $V \in \mathbb{R}^{d \times k}$; sparsity parameter $\hat{s}$
Output: $\tilde{V} \leftarrow \mathrm{Threshold}(V, \hat{s})$
1: Sort the rows of $V$ in Euclidean norm and let $S$ be the set of row indices corresponding to the top $\hat{s}$ largest rows of $V$ in Euclidean norm
2: $\tilde{V}$ takes zeros on the rows not indexed by $S$; the rows of $\tilde{V}$ indexed by $S$ take the same values as the corresponding rows of $V$

Algorithm 3 Thin_QR decomposition in Algorithm 1

Input: Matrix $A \in \mathbb{R}^{d \times k}$
Output: $Q, R \leftarrow \mathrm{Thin\_QR}(A)$, where $Q \in \mathbb{R}^{d \times k}$ has orthonormal columns, the upper triangular matrix $R \in \mathbb{R}^{k \times k}$ has positive diagonal entries, and $A = QR$; here we use the Householder QR method described in [15]
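For concreteness, the following NumPy sketch implements the noisy truncated power iteration of Algorithm 1. It is a minimal illustration under our reconstruction, not the paper's reference implementation: the calibration of sigma mirrors line 3 of Algorithm 1 as reconstructed above, each provider draws its own Gaussian matrix (one reading of line 8), and all helper names are ours.

    import numpy as np

    def threshold_rows(V, s_hat):
        # Algorithm 2: keep the s_hat rows with largest l2 norm, zero out the rest
        norms = np.linalg.norm(V, axis=1)
        keep = np.argsort(norms)[-s_hat:]
        out = np.zeros_like(V)
        out[keep] = V[keep]
        return out

    def thin_qr(A):
        # Algorithm 3: thin QR, with the diagonal of R made positive
        Q, R = np.linalg.qr(A)
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        return Q * s, s[:, None] * R

    def dps_pca(samples, Q0, s_hat, T, eps, delta, seed=0):
        # samples: list of (n_i, d) arrays, one per data provider (s_hat >= k assumed)
        rng = np.random.default_rng(seed)
        d, k = Q0.shape
        sigma = 2.0 * T / eps * np.sqrt(2.0 * np.log(2.0 * T / delta))  # assumed calibration
        n = np.array([len(X) for X in samples], dtype=float)
        covs = [X.T @ X / ni for X, ni in zip(samples, n)]   # local covariances Sigma_i
        Q, _ = thin_qr(threshold_rows(Q0, s_hat))
        for _ in range(T):
            # each provider perturbs its local power step with Gaussian noise
            H = [c @ Q + rng.normal(0.0, sigma, (d, k)) / ni for c, ni in zip(covs, n)]
            K = sum(ni * Hi for ni, Hi in zip(n, H)) / n.sum()  # server-side merge
            V, _ = thin_qr(K)
            Q, _ = thin_qr(threshold_rows(V, s_hat))
        return Q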
Algorithm 1 takes $O(k \cdot d^2)$ time per iteration with a naive implementation. Note that the privacy preservation step requires $O(k \cdot d^2)$ operations for the matrix multiplication. The Householder procedure for the Thin_QR decomposition has time complexity $O(d \cdot k^2)$. Besides, the thresholding procedure has time complexity $O(d \log d)$ in the sorting part and $O(d \cdot k)$ in the looping part. If we store $Q^{(t)}$ as a row-sparse matrix, the matrix multiplication takes far fewer operations, and Algorithm 1 can be as efficient as $O(k \cdot \hat{s} \cdot d)$ time per iteration, as sketched below.
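A sketch of the row-sparse multiplication just described (our illustration): because $Q^{(t)}$ has at most $\hat{s}$ non-zero rows, $\hat{\Sigma}\,Q^{(t)}$ only touches the corresponding $\hat{s}$ columns of $\hat{\Sigma}$:

    import numpy as np

    def sparse_power_step(cov, Q, support):
        # cov: (d, d); Q: (d, k) with non-zero rows only at `support`, |support| <= s_hat
        # cost O(d * s_hat * k) instead of O(d^2 * k)
        return cov[:, support] @ Q[support]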
4 Analysis of Accuracy-Privacy Tradeoff

In this section we state the differential privacy guarantee of Algorithm 1 and then analyze the minimax-optimal estimation error attained by Algorithm 1 in Theorem 4.3. Some technical proofs are deferred to the Appendix.
Theorem 4.1 (Privacy Preservation of Algorithm 1). If $0 < \epsilon < 1$ and $0 < \delta < 1/2$, the principal subspace estimator $Q_{dp}$ of Algorithm 1 is $(\epsilon, \delta)$-differentially private. The information collected by the central server related to the $i$th data provider,
$$H_i(S_i, Q^{(1)}), H_i(S_i, Q^{(2)}), \ldots, H_i(S_i, Q^{(T)}),$$
is also $(\epsilon, \delta)$-differentially private.

Proof. See Appendix §A for a detailed proof.
Recall that the samples are generated by a sub-Gaussian random variable with covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, and we want to recover the top-$k$ leading eigenspace of $\Sigma$. The eigenvalues are sorted in decreasing order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > \lambda_{k+1} \ge \cdots \ge \lambda_d \ge 0$. We use the scalar $\gamma$ as a metric of the eigengap between $\lambda_k$ and $\lambda_{k+1}$:
$$\gamma = \frac{3\lambda_{k+1} + \lambda_k}{3\lambda_k + \lambda_{k+1}} < 1.$$
A smaller $\gamma$ implies a larger eigengap ratio $\lambda_k/\lambda_{k+1}$, which makes it easier to recover the top-$k$ eigenspace from the observed samples.
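As a concrete instance (our own numeric example): with $\lambda_k = 2$ and $\lambda_{k+1} = 1$,
$$\gamma = \frac{3 \cdot 1 + 2}{3 \cdot 2 + 1} = \frac{5}{7} \approx 0.71;$$
as $\lambda_k/\lambda_{k+1} \to \infty$ we have $\gamma \to 1/3$, and as $\lambda_{k+1} \to \lambda_k$ we have $\gamma \to 1$.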
After fixing the noise level $\sigma(\epsilon, \delta)$, the eigenvalue $\lambda_k$, the sample size $n$, the number of data providers $N$, the subspace dimension $k$, and the thresholding parameter $\hat{s}$, assume that there exist parameters $\phi > 0$ and $\psi > 0$ such that
$$2k\big(\sqrt{\hat{s}} + \sqrt{k} + \psi\big) \le \frac{\phi \, n \, \lambda_k}{N \, \sigma(\epsilon, \delta)}.$$
Note that when the average sample size $n/N$ or the signal-to-noise ratio $\lambda_k/\sigma(\epsilon, \delta)$ is sufficiently large, the existence of $\phi > 0$ and $\psi > 0$ is easily justified. Moreover, the existence of a smaller $\phi$ implies a larger signal-to-noise ratio $\lambda_k/\sigma(\epsilon, \delta)$, and, as we will show later, the parameter $\psi$ is related to the tradeoff between privacy and the success probability of the algorithm. We denote
$$\kappa = \frac{\gamma}{1 - \phi}$$
as the effective eigengap, which characterizes the rate of convergence in our analysis.
The tradeoff analysis is presented under reasonable assumptions on the parameters $\epsilon$, $\delta$, the sample size $n$, the sample dimension $d$, the number of data providers $N$, the thresholding parameter $\hat{s}$, the true sparsity level $s^*$ of the $k$ leading eigenspace of the covariance matrix, and the quality of the initialization $Q^{(0)}$. Please refer to Appendix §B.2 for the assumptions and their justifications.
The estimation error is analyzed in terms of the subspace distance, which is defined as follows.

Definition 4.2. Let $U \in \mathbb{R}^{d \times k}$ and $V \in \mathbb{R}^{d \times k}$ be matrices with orthonormal columns that form bases of two $k$-dimensional subspaces $\mathcal{U}$ and $\mathcal{V}$ of the $d$-dimensional space. Let $U^\perp \in \mathbb{R}^{d \times (d-k)}$ and $V^\perp \in \mathbb{R}^{d \times (d-k)}$ be matrices whose orthonormal columns span the subspaces $\mathcal{U}^\perp$ and $\mathcal{V}^\perp$ perpendicular to $\mathcal{U}$ and $\mathcal{V}$ respectively. We define the subspace distance between $\mathcal{U}$ and $\mathcal{V}$ as
$$D[\mathcal{U}, \mathcal{V}] = \big\|U^\top V^\perp\big\|_F = \big\|V^\top U^\perp\big\|_F.$$

This definition is related to the canonical angles between subspaces. Let $\Pi_u \in \mathbb{R}^{d \times d}$ and $\Pi_v \in \mathbb{R}^{d \times d}$ be the orthogonal projection matrices onto $\mathcal{U}$ and $\mathcal{V}$ respectively. It will be explained in Appendix §B.4 that the singular values of $\Pi_{u^\perp} \Pi_v$ are
$$s_1, s_2, \ldots, s_k, 0, 0, \cdots, 0.$$
The canonical angles between $\mathcal{U}$ and $\mathcal{V}$ are defined as
$$\theta_i(\mathcal{U}, \mathcal{V}) = \arcsin(s_i), \quad i = 1, 2, \ldots, k.$$
Let $\Theta(\mathcal{U}, \mathcal{V}) = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_k)$. The distance between $\mathcal{U}$ and $\mathcal{V}$ can then be characterized as
$$D[\mathcal{U}, \mathcal{V}] = \|\sin\Theta(\mathcal{U}, \mathcal{V})\|_F = \big\|U^\top V^\perp\big\|_F = \big\|V^\top U^\perp\big\|_F.$$
In the special case of $k = 1$, the subspace distance between two unit-norm eigenvectors $u \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$ is
$$D[\mathcal{U}, \mathcal{V}] = \sin\theta(u, v), \quad \theta(u, v) = \arccos\langle u, v\rangle, \quad \|u\|_2 = \|v\|_2 = 1.$$
Please see Lemma B.5 in Appendix §B.4 for a detailed discussion of Definition 4.2.
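The subspace distance of Definition 4.2 is straightforward to compute from the singular values of $U^\top V$ (a minimal sketch, ours):

    import numpy as np

    def subspace_distance(U, V):
        # U, V: (d, k) bases with orthonormal columns
        cosines = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of canonical angles
        sines_sq = np.clip(1.0 - cosines**2, 0.0, None)
        return np.sqrt(sines_sq.sum())  # = ||sin Theta||_F = ||U^T V_perp||_F

This uses the identity $\|U^\top V^\perp\|_F^2 = k - \|U^\top V\|_F^2$, which follows since $[V \ V^\perp]$ is an orthogonal matrix.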
Theorem 4.3 (Convergence rate and estimation error of Algorithm 1). Let Assumption B.1 hold. The sequence $\{Q^{(t)}\}_{t \ge 1}$ generated by Algorithm 1 satisfies
$$\big\|Q^{(t)\top} Q^{*\perp}\big\|_F \le \mathrm{stats\_err} + \mathrm{priv\_err} + \mathrm{opt\_err}(t),$$
for all $t = 1, 2, \ldots, T$, with probability at least
$$1 - 2Te^{-\psi^2/2} - \frac{4}{n-1} - \frac{1}{d} - \frac{6\log n}{n} - \frac{1}{n}, \qquad (4.1)$$
where $T$ is the total number of iterations. Here the near-optimal minimax statistical error is
$$\mathrm{stats\_err} = \frac{C_1 \sqrt{k}}{1 - \gamma^{1/4}} \cdot \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + \log d)}{n}}, \quad C_1 > 0 \text{ a constant}, \qquad (4.2)$$
the error induced by the privacy constraint is
$$\mathrm{priv\_err} = \frac{1}{\sqrt{3}\,\big(1 - \gamma^{1/4}\big)} \left(1 + 2\sqrt{\frac{k s^*}{\hat{s}}}\right) \psi^{-1}, \qquad (4.3)$$
and the optimization error decays at a geometric rate,
$$\mathrm{opt\_err}(t) = \kappa^{(t-1)/4} \cdot \min\left\{\frac{\kappa^{1/2}}{\sqrt{\psi}\,\big(1 - \kappa^{1/2}\big)^2}, \ \frac{1}{4}\right\}. \qquad (4.4)$$

Proof. Please see Appendix §B for a detailed proof.
Theorem 4.3 precisely characterizes the accuracy-privacy tradeoff of our algorithm.
Privacy for free. The statistical and privacy errors scale as
$$\mathrm{stat\_err} = C_0 \sqrt{\log d}, \ \text{as } d \to \infty, \qquad \mathrm{priv\_err} < C_0'(\gamma)\,\psi^{-1}, \ \text{as } \psi^{-1} \to 0,$$
where $C_0, C_0'(\gamma) > 0$ are dimension-free parameters determined by the eigenvalues, the sparsity parameter, and the differential privacy parameters. When the parameter $\psi^{-1}$ is sufficiently small and the dimension $d$ is very large, the error induced by the privacy constraints is dominated by the statistical error, $\mathrm{priv\_err} < \mathrm{stat\_err}$. In other words, we can enjoy the differential privacy guarantee with an induced error even smaller than the inherent statistical error, as shown in Figure 2.
Figure 2: Illustration of the theoretical privacy-accuracy tradeoff. We fix the differential privacy parameter $\delta$ at 0.01 and let $\epsilon$ range from 0.2 to 2. With smaller $\epsilon$, more privacy is preserved at the cost of lower estimation accuracy.
Privacy and success probability. The claims of Theorem 4.3 hold with probability at least
$$1 - 2Te^{-\psi^2/2} - \frac{4}{n-1} - \frac{1}{d} - \frac{6\log n}{n} - \frac{1}{n},$$
where, in the second term, the parameter $\psi > 0$ also depends on the differential privacy parameters in the following way:
$$\psi \le \frac{\phi \, n \, \lambda_k}{2kN\,\sigma(\epsilon, \delta)} - \sqrt{\hat{s}} - \sqrt{k}.$$
If $\sigma(\epsilon, \delta)$ is large due to a stringent privacy constraint, or the sample size is very small, then $\psi$ in the above inequality has to be relatively small, and thus the term $2Te^{-\psi^2/2}$ could undermine the success probability of the algorithm.
Minimax-optimal statistical error. The statistical error $\mathrm{stat\_err}$ attained by our estimator achieves the lowest error (up to a constant) among all possible estimators under the worst-case generating distribution:
$$\mathrm{stat\_err} = C'' \inf_{\tilde{Q}} \sup_{P \in \mathcal{M}} \mathbb{E}_P \big\|\tilde{Q}^\top Q^{*\perp}\big\|_F,$$
where $P$ is any generating distribution in the model class $\mathcal{M} = \mathcal{M}(s^*, k)$ described in Assumption 2.3, $Q^*$ spans the eigenspace of the covariance matrix of $P$, and $\tilde{Q}$ is any estimator of $Q^*$ based on $n$ samples from $P$. We further assume that the eigengap of the population covariance matrix satisfies $\lambda_k - \lambda_{k+1} > \theta \lambda_{k+1}$ for some constant $\theta > 0$. Under this model class, the minimax lower bound is
$$\inf_{\tilde{Q}} \sup_{P \in \mathcal{M}} \mathbb{E}_P \big\|\tilde{Q}^\top Q^{*\perp}\big\|_F = C_2 \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + 1/4 \cdot \log d)}{n}},$$
where $C_2$ is a positive constant [41]. Under the eigengap condition, $\gamma$ is bounded away from 1, so the factor $1 - \gamma^{1/4}$ can be treated as a constant for a fixed $\theta$. Hence the statistical error
$$\mathrm{stat\_err} = \frac{C_1 \sqrt{k}}{1 - \gamma^{1/4}} \cdot \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + \log d)}{n}}$$
matches the minimax-optimal lower bound up to constants.
Privacy and rate of convergence. The optimization error $\mathrm{opt\_err}$ decays at the geometric rate $\kappa = \gamma/(1 - \phi) < 1$. More stringent privacy requirements force a smaller margin $1 - \phi$ and thus a much slower rate of convergence.
5 Numerical Results

In this section we present synthetic numerical experiments that validate our theory of the accuracy-privacy tradeoff. Comparisons with existing state-of-the-art private PCA methods are also conducted. For any targeted privacy level, our algorithm is superior in both estimation accuracy and computational efficiency.

5.1 Empirical Privacy-Accuracy Tradeoff
We choose $d = 10^3$, $n = 10^5$, $k = 5$, and $s^* = 10$ in the experiment, and assume that there is only one data owner in the system for fair comparison with the other, non-distributed, private PCA methods. Data samples are generated independently from a multivariate Gaussian distribution with population covariance $\Sigma$ and mean $0$, where $\Sigma = \sum_{j=1}^{d} \lambda_j u_j (u_j)^\top = U \Lambda U^\top$ and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_k, \ldots, \lambda_d\}$ with
$$\lambda_i = 100, \ i = 1, 2, \ldots, k; \qquad \lambda_i \sim \mathrm{Uniform}[0, 10], \ i = k+1, k+2, \ldots, d.$$
We generate an $s^* \times k$ matrix of i.i.d. standard normal entries and orthogonalize it into a matrix $L$ with orthonormal columns. After concatenation with a $(d - s^*) \times k$ all-zero matrix, it becomes the $d \times k$ matrix $Q^* = [L^\top \ 0]^\top$ whose columns span the $k$ leading eigenspace. The remaining eigenvectors $Q^{*\perp}$ are generated as follows. We project a $d \times (d - k)$ matrix $R$ with i.i.d. standard normal entries onto the linear space orthogonal to the span of $Q^*$:
$$R^\perp = (I_{d \times d} - Q^* Q^{*\top}) R, \qquad R \sim N(0, I)_{d \times (d-k)}.$$
The columns of $R^\perp$ are then orthogonalized into the matrix $Q^{*\perp}$. Finally, by concatenating the columns of $Q^*$ and $Q^{*\perp}$ we obtain $U = [Q^* \ Q^{*\perp}]$ and thus $\Sigma = U \Lambda U^\top$. The parameters in Algorithm 1 are chosen as $T = 10$ and $\hat{s} = 50$.
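The construction above can be reproduced with a few lines of NumPy (a sketch with the dimensions shrunk so it runs quickly; variable names are ours):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, s_star = 100, 5, 10
    lam = np.concatenate([np.full(k, 100.0), rng.uniform(0, 10, d - k)])

    L = np.linalg.qr(rng.standard_normal((s_star, k)))[0]   # s* x k orthonormal
    Q_star = np.vstack([L, np.zeros((d - s_star, k))])      # sparse leading basis Q*

    R = rng.standard_normal((d, d - k))
    R_perp = (np.eye(d) - Q_star @ Q_star.T) @ R            # project out span(Q*)
    Q_perp = np.linalg.qr(R_perp)[0]                        # orthonormalize into Q*perp

    U = np.hstack([Q_star, Q_perp])
    Sigma = U @ np.diag(lam) @ U.T                          # population covariance
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=2000)  # Gaussian samples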
Fixing the differential privacy parameter $\delta$ at 0.3, we plot the trajectory of the estimation error $\|Q^{(t)\top} Q^{*\perp}\|_F$ for $t = 1, 2, \ldots, T$ under different values of $\epsilon$ in Figure 3(a). Recall that $\|Q^{(t)\top} Q^{*\perp}\|_F$ is the subspace distance between the subspace spanned by $Q^{(t)}$ and the true eigenspace of $\Sigma$ spanned by $Q^*$.
We also fix $\epsilon$ to be 1.0 and plot the trajectory of the error for $t = 1, 2, \ldots, T$ under different values of $\delta$, obtaining Figure 3(b). It is clear from Figure 3 that more stringent privacy constraints cause a slower rate of convergence and a larger approximation error in the output $Q^{(T)}$.
5.2 Comparisons With Existing Methods

First, we compare the accuracy of the estimated eigenspace at a fixed privacy level across three different methods: MOD-SULQ, PPCA, and our DPS-PCA method described in Algorithm 1. The SuLQ method was proposed in [4] and later modified in [6] as MOD-SULQ. SuLQ contaminates the sample covariance matrix with a random matrix of independent zero-mean Gaussian entries, whose variance is adjusted according to the differential privacy parameters $\epsilon$ and $\delta$. The PPCA method [6] privately selects leading eigenvectors by drawing from a matrix Bingham distribution via a Gibbs sampling procedure [21]; privacy is preserved thanks to the randomized noise in the Gibbs sampling. There are several other methods, such as [11, 16, 18], but they follow different definitions of differential privacy in terms of the granularity of privacy, which prevents direct comparisons.
5.2.1 Comparisons of Estimation Accuracy

We run simulations under the same high-dimensional model described in §5.1, where the dimension is $d = 10^3$ and the sample size is $n = 10^5$. The differential privacy parameter $\delta$ is fixed at 0.3. It is clear from Figure 4(a) that MOD-SULQ cannot produce a meaningful estimate of the leading eigenvectors in the high-dimensional setting: the standard deviation of the Gaussian noise used by MOD-SULQ is on the order of $O(d \log d)$, which becomes so large in high dimensions that the estimation accuracy is severely undermined. We take 1000 iterations of Gibbs sampling for the PPCA method, but it is hard to determine whether or not the sampling process has reached the stationary distribution. Since the Gibbs sampling only approximates the matrix Bingham distribution, $\epsilon$-differential privacy is not guaranteed by PPCA in practice.
5.2.2 Comparisons of Scalability

In low-dimensional settings, our DPS-PCA method also outperforms the other methods. If we fix the sample size at $n = 10^5$ and change the dimension from 40 to 140, it can be observed in Figure 4(b) that MOD-SULQ and PPCA perform reasonably well in low dimensions. The differential privacy parameters are set as $\epsilon = 1.0$ and $\delta = 0.1$. The simulations are carried out on the same statistical model as §5.1, but with the parameters changed to $k = 3$ and $s^* = 10$.
5.2.3 Comparisons of Computational Efficiency

Our DPS-PCA method excels in terms of computation time thanks to its geometric convergence rate and the $O(d^2)$ time complexity of each iteration. In contrast, the Gibbs sampling procedure [21] used in PPCA takes $O(d^3)$ time per iteration, and the number of iterations (burn-in time) can be on the order of $10^3$-$10^4$ before the sampling reaches the stationary distribution [6]. For ease of comparison, we take only $T = 100$ Gibbs sampling iterations for the PPCA method, using the R package rstiefel [20] as suggested in [6]. The number of iterations for DPS-PCA is set to $T = 10$. Calculation of the leading $k$ eigenvectors in MOD-SULQ uses the R package rARPACK [33]. We carry out simulations on the same statistical model described in §5.1 with the dimension $d$ changed in each case. All simulations are conducted on an Intel Xeon 3.4GHz CPU.
Table 1: CPU time of privacy-preserving PCA methods under different dimensions. DPS-PCA shows significant advantages in terms of computational complexity because it converges at a geometric rate and takes only $O(d^2)$ time per iteration. PPCA takes $O(d^3)$ time per iteration, and the number of Gibbs sampling iterations can be as high as $10^4$-$10^5$, as in [6]. MOD-SULQ scales well computationally with dimension, but its estimation accuracy quickly degenerates as $d$ becomes large.

Method      d = 200    d = 400    d = 800
DPS-PCA     <1 ms      <1 ms      <1 ms
MOD-SULQ    8 ms       21 ms      97 ms
PPCA        9.82 s     37.46 s    173.78 s
6 Conclusion

Our paper proposes an effective distributed optimization algorithm that produces a minimax-optimal sparse PCA estimator subject to differential privacy constraints.
[Figure 3 shows two panels plotting accuracy (subspace distance) against iterations 0-10; the legend of panel (a) varies $\epsilon \in \{0.5, 1.0, 1.5, 2.0\}$ and that of panel (b) varies $\delta \in \{0.01, 0.05, 0.10, 0.50\}$.]

Figure 3: Geometric convergence of the estimation accuracy (subspace distance) $\|Q^{(t)\top} Q^{*\perp}\|_F$ over the first $T = 10$ iterations. We fix $\delta = 0.3$ and vary $\epsilon$ in (a), and fix $\epsilon = 2.0$ and vary $\delta$ in (b). More privacy is preserved for smaller differential privacy parameters $\epsilon$ or $\delta$, at the cost of a slower convergence rate and lower final estimation accuracy.
[Figure 4 shows two panels: (a) accuracy (subspace distance) versus $\epsilon$ from 0.8 to 2.0; (b) accuracy versus dimension from 40 to 160; both compare DPS-PCA, PPCA, and MOD-SULQ.]

Figure 4: Comparisons of privacy-preserving PCA methods. In (a) we plot the estimation accuracy of the methods under different differential privacy constraints, with the sample dimension and sample size fixed at $d = 10^3$ and $n = 10^5$. In (b), the differential privacy parameters are set as $\epsilon = 1.0$ and $\delta = 0.1$, and the sample size is fixed at $n = 10^5$; we let the sample dimension range from 40 to 160 and show that the performance of MOD-SULQ quickly deteriorates as $d$ becomes large. Our DPS-PCA method outperforms in all scenarios.
We analyze the tradeoff between privacy constraints and estimation accuracy, and validate the theory and the empirical performance of the algorithm through simulation results.
References
[1] Arash Amini and Martin J Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. Annals of Statistics, 37:2877-2921, 2009.
[2] Rajendra Bhatia. Matrix Analysis, volume 169. Springer Science & Business Media, 2013.
[3] Aharon Birnbaum, Iain M Johnstone, Boaz Nadler, and Debashis Paul. Minimax bounds for sparse PCA with noisy high-dimensional data. Annals of Statistics, 41(3):1055, 2013.
[4] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ framework. In Proceedings of the ACM Symposium on Principles of Database Systems, pages 128-138. ACM, 2005.
[5] T Tony Cai, Zongming Ma, and Yihong Wu. Sparse PCA: Optimal rates and adaptive estimation. Annals of Statistics, 41(6):3074-3110, 2013.
[6] Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In Advances in Neural Information Processing Systems, pages 989-997, 2012.
[7] Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9:1269-1294, 2008.
[8] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I Jordan, and Gert RG Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434-448, 2007.
[9] John C Duchi, Michael Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In IEEE Annual Symposium on Foundations of Computer Science, pages 429-438. IEEE, 2013.
[10] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265-284. Springer, 2006.
[11] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In Proceedings of the ACM Symposium on Theory of Computing, pages 11-20. ACM, 2014.
[12] Barbara E Engelhardt and Matthew Stephens. Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis. PLoS Genetics, 6(9):e1001117, 2010.
[13] PA Federolf, KA Boyer, and TP Andriacchi. Application of principal component analysis in clinical gait research: Identification of systematic differences between healthy and medial knee-osteoarthritic gait. Journal of Biomechanics, 46(13):2173-2178, 2013.
[14] Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and auxiliary information in data privacy. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 265-273. ACM, 2008.
[15] Gene H Golub and Charles F Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
[16] Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pages 1255-1268. ACM, 2012.
[17] Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation. In Proceedings of the Annual ACM Symposium on Theory of Computing, pages 331-340. ACM, 2013.
[18] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. Advances in Neural Information Processing Systems, pages 2861-2869, 2014.
[19] Mikko Heikkilä, Yusuke Okimoto, Samuel Kaski, Kana Shimizu, and Antti Honkela. Differentially private Bayesian learning on distributed data. arXiv preprint arXiv:1703.01106, 2017.
[20] Peter Hoff. rstiefel: Random orthonormal matrix generation on the Stiefel manifold, 2014. R package version 0.10.
[21] Peter D Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2), 2009.
[22] Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V Pearson, Dietrich A Stephan, Stanley F Nelson, and David W Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8):e1000167, 2008.
[23] Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486), 2009.
[24] Ian T Jolliffe, Nickolay T Trendafilov, and Mudassir Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531-547, 2003.
[25] Ravi Kannan, Santosh Vempala, and David P Woodruff. Principal component analysis and higher correlations for distributed data. International Conference on Machine Learning, pages 1040-1057, 2014.
[26] Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1395-1414. SIAM, 2013.
[27] Robert Krauthgamer, Boaz Nadler, and Dan Vilenchik. Do semidefinite relaxations really solve sparse PCA? arXiv preprint arXiv:1306.3690, 2013.
[28] Yingyu Liang, Maria-Florina Balcan, Vandana Kanchanapally, and David P Woodruff. Improved distributed principal component analysis. Advances in Neural Information Processing Systems, pages 3113-3121, 2014.
[29] Karim Lounici. Sparse principal component analysis with missing observations. In High Dimensional Probability VI, pages 327-356. Springer, 2013.
[30] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, pages 94-103. IEEE Computer Society, 2007.
[31] Boaz Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. Annals of Statistics, 41(2):2791-2817, 2008.
[32] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy, pages 111-125, 2008.
[33] Yixuan Qiu, Jiali Mei, and authors of the ARPACK library (see file AUTHORS for details). rARPACK: R wrapper of ARPACK for large scale eigenvalue/vector problems, on both dense and sparse matrices, 2014. R package version 0.7-0.
[34] Yongming Qu, George Ostrouchov, Nagiza Samatova, and Al Geist. Principal component analysis for dimension reduction in massive distributed data sets. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2002.
[35] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. arXiv.org, March 2010.
[36] Dan Shen, Haipeng Shen, and JS Marron. Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115:317-333, 2013.
[37] Haipeng Shen and Jianhua Z Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015-1034, 2008.
[38] Gilbert W Stewart. Perturbation Theory for the Singular Value Decomposition. Elsevier, 1990.
[39] Gilbert W Stewart and Ji-guang Sun. Matrix Perturbation Theory, 1990.
[40] Vincent Q Vu and Jing Lei. Minimax rates of estimation for sparse PCA in high dimensions. International Conference on Artificial Intelligence and Statistics, pages 1278-1286, 2012.
[41] Vincent Q Vu and Jing Lei. Minimax sparse principal subspace estimation in high dimensions. Annals of Statistics, 41(6):2905-2947, 2013.
[42] Martin J Wainwright, Michael I Jordan, and John C Duchi. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430-1438, 2012.
[43] Zhaoran Wang, Huanran Lu, and Han Liu. Tighten after relax: Minimax-optimal sparse PCA in polynomial time. Advances in Neural Information Processing Systems, pages 3383-3391, 2014.
[44] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375-389, 2010.
[45] Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, page kxp008, 2009.
[46] Genqiang Wu, Yeping He, Jingzheng Wu, and Xianyao Xia. Inherit differential privacy in distributed setting: Multiparty randomized function computation. arXiv preprint arXiv:1604.03001, 2016.
[47] Xiao-Tong Yuan and Tong Zhang. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14(1):899-925, 2013.
[48] Shuheng Zhou, Katrina Ligett, and Larry Wasserman. Differential privacy with compression. In IEEE International Symposium on Information Theory, pages 2718-2722. IEEE, 2009.
[49] Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.