Minimax-Optimal Privacy-Preserving Sparse PCA in Distributed Systems

Jason Ge (Princeton University), Zhaoran Wang (Northwestern University), Mengdi Wang (Princeton University), Han Liu (Tencent AI Lab)
Abstract

This paper proposes a distributed privacy-preserving sparse PCA (DPS-PCA) algorithm that generates a minimax-optimal sparse PCA estimator under differential privacy constraints. In a distributed optimization framework, data providers can use this algorithm to collaboratively analyze the union of their data sets while limiting the disclosure of their private information. DPS-PCA can recover the leading eigenspace of the population covariance at a geometric convergence rate, and simultaneously achieves the optimal minimax statistical error for high-dimensional data. Our algorithm provides fine-tuned control over the tradeoff between estimation accuracy and privacy preservation. Numerical simulations demonstrate that DPS-PCA significantly outperforms other privacy-preserving PCA methods in terms of estimation accuracy and computational efficiency.
1 Introduction

Principal component analysis (PCA) is one of the most widely used tools for data analysis and dimension reduction. Let $x_1, x_2, \cdots, x_n \in \mathbb{R}^d$ be samples drawn from some underlying distribution with covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, where $d$ is the dimension and $n$ is the data size. Principal component analysis estimates the leading $k$ ($k \le d$) eigenvectors of $\Sigma$ based on the given samples. Projecting the samples onto the $k$-dimensional principal subspace spanned by these eigenvectors provides a low-dimensional representation of the high-dimensional data.
We focus on a distributed setting in which multiple data providers interact with one another. Suppose that the data providers, who may come from different interest groups, aim to calculate the principal components
of the union of their data sets. For example, the data providers can be healthcare providers holding medical records related to certain types of drugs or procedures, or research institutes with genome-wide association (GWA) genetic study results for certain diseases. Even if the data providers wish to collaborate on data analysis, they are often unwilling to directly share data with others, because inappropriate data sharing may lead to severe privacy breaches [22, 32].
In practical contexts where privacy matters, there has been a surging demand for collaborative principal component analysis applications, including clinical inference [13] and genome population structure modeling [12]. This motivates us to redesign principal component analysis algorithms so that data providers can collaboratively analyze their data sets without physically merging them. It is desired that the new algorithm achieve strong privacy guarantees, making it nearly impossible to infer an individual's exact state in the data set from the algorithm's output. We adopt the notion of differential privacy introduced by [10], a cryptographically inspired guarantee that is resistant to many forms of privacy attacks (see [14]). Intuitively, differential privacy of an algorithm means that a small change in the input has no significant impact on the output of the algorithm (a rigorous definition is provided later).
High dimensionality poses another critical challenge to modern data applications such as population genome studies and brain voxel estimation. It is desired that the new algorithm produce estimates achieving the optimal or nearly optimal statistical accuracy. However, in the typical high-dimensional setting where $d \gg n$, singular value decomposition fails, producing inconsistent estimates of the principal components [23, 31]. To overcome this problem, we adopt sparsity regularization for high-dimensional PCA, which has been extensively studied in [1, 3, 5, 7, 8, 23, 24, 27, 29, 31, 36, 37, 40, 41, 43, 45, 49].
The goal of this paper is to provide an algorithmic solution to distributed principal component analysis that leverages the sparsity of high-dimensional data while achieving sufficient privacy preservation. In what follows, we briefly review some earlier works on privacy-preserving PCA, sparsity regularization, and distributed optimization, and discuss their relevance to the current paper.
Related works. One line of existing work is on differentially private algorithms for model-free PCA, such as the Sub-Linear Queries (SuLQ) method by [4], the Private PCA (PPCA) and modified SuLQ (MOD-SuLQ) methods by [6], and Analyze Gauss in [11]. MOD-SuLQ and Analyze Gauss both add noise to the sample covariance matrix before applying the singular value decomposition. PPCA applies the exponential mechanism [30] to perform private selection of the leading eigenspace. However, model-free methods cannot be applied successfully to high-dimensional data. It is demonstrated in [6] that the sample size has to be on the same order as the dimension ($n \asymp d$) for a differentially private algorithm to achieve a targeted level of estimation accuracy. The utility gap in Analyze Gauss [11] (Theorem 3) is on the order of $O(\sqrt{d})$. This is quite discouraging for high-dimensional scenarios.
Another line of related work is on local differential privacy for PCA. In this setting, each data sample is obscured by some random transformation to ensure privacy. One of the earliest among these works is [48], which proposed applying a random linear transformation to each sample. Other works such as [9, 42, 44] studied the theoretical tradeoff between the minimax statistical rate and the local privacy level imposed on the initial data. In contrast to these works, we consider the interactive setting in which each data provider releases information by answering a series of statistical queries through a designed protocol, as opposed to simply obscuring all the samples once and for all. For this reason, our approach is more task-specific and very different from the local differential privacy studied earlier.
A third line of related work is on high-dimensional PCA, e.g., [16-18], which proved dimension-free statistical results under an incoherence assumption on the sample covariance matrix. The power method for sparse PCA was studied in [43] and shown to achieve the statistical minimax rate in polynomial time. In this paper, we follow their technique of imposing sparsity by adding a thresholding step to the power iteration. A similar technique was also used in the truncated power method of [47]. We also note that distributed versions of PCA have been studied in [25, 28, 34]; the focus there is the tradeoff between communication efficiency and approximation accuracy. In contrast, our current focus is the privacy preservation feature of distributed PCA rather than its communication efficiency.
Distributed privacy-preserving analysis has also been studied in works such as [46] and [19]. [19] focused on Bayesian inference and is not directly comparable. [46] assumed that a non-distributed version of the privacy-preserving procedure is already available and proposed a multiparty computation protocol. In the context of sparse PCA estimation, [46] is not applicable because even the non-distributed version of a privacy-preserving sparse PCA estimation algorithm had not been studied before in the literature.
Our Contributions. In this paper, we study distributed PCA with a focus on the privacy preservation of high-dimensional data. From the algorithmic perspective, we are interested in designing efficient algorithms with fast convergence rates and reasonable complexity. From the theoretical perspective, we are interested in the tradeoff between estimation accuracy and the level of privacy preservation, especially in the sparse high-dimensional setting. The main contributions of this paper are summarized as follows:

(i) We show that efficient estimation of the high-dimensional sparse principal subspace can be achieved under privacy preservation constraints.

(ii) We propose a privacy-preserving algorithmic framework which obtains an efficient estimate converging at a geometric rate to the minimax-optimal statistical error.

(iii) Numerical simulations show that our method significantly outperforms current state-of-the-art privacy-preserving PCA methods such as MOD-SuLQ, PPCA [6], and Analyze Gauss [11] in terms of estimation accuracy and computational efficiency.

To the best knowledge of the authors, this is the first work on a privacy-preserving method for high-dimensional sparse PCA in a distributed optimization framework. This is also the first work showing that the minimax statistical rate of sparse PCA can be achieved under differential privacy constraints in polynomial time.
2 Collaborative PCA in Distributed Systems
Let us consider a distributed computing setting with a number of data providers. The data providers aim to collaborate with one another in order to compute the principal components of the union of all their local data sets. Each data set may contain sensitive information that cannot be released to the public. In this section, we introduce a distributed collaborative framework that enables privacy-preserving PCA, and describe the modeling assumptions for high-dimensional data.
We propose a distributed algorithmic framework for privacy-preserving PCA, as illustrated in Figure 1. Let $S_i$ ($i = 1, 2, \cdots, N$) be the sample set held by the $i$th data provider. A central server coordinates the distributed computation process, but it is not trusted by the data providers. Each data provider is able to communicate with the central server. This framework allows an iterative algorithm in which the central server repeatedly updates the global estimates while communicating with all the data providers.
In each iteration of the algorithm, the central server sends the current global estimate $Q^{(t)}$ to all the participating providers, where $Q^{(t)} \in \mathbb{R}^{d \times k}$ is an orthogonal basis matrix of the estimated leading eigenspace of the covariance matrix in the underlying statistical model. Upon receiving the current estimate $Q^{(t)}$, each data provider computes a local intermediate result $H_i(S_i, Q^{(t)}) \in \mathbb{R}^{d \times k}$, which is a $d \times k$ matrix contaminated with intentional noise. Each data provider then sends $H_i(S_i, Q^{(t)})$ back to the central server, which combines all the messages $H_1, H_2, \cdots, H_N$ to update the global estimate.
[Figure 1 depicts the message flow: the central server broadcasts $Q^{(t)}$ to data providers $1, \ldots, N$; each provider $i$ returns $H_i(S_i, Q^{(t)})$; the initial estimate $Q^{(0)}$ enters the loop and the algorithm outputs $Q_{dp}$.]

Figure 1: Distributed Private Sparse PCA Algorithm Structure
The proposed framework allows individual data providers to add noise to outgoing messages at their own discretion. This provides significant flexibility for collaborative analysis. Note that, due to the existence of noise, any algorithm under this framework is a randomized algorithm. We use the following notion of differential privacy for randomized algorithms, which has been studied in [26] and [18]. It says that an algorithm is differentially private if its output is insensitive to small input perturbations.
Definition 2.1 (Differential Privacy of Randomized Algorithms). Let $S = \{x_1, \ldots, x_j, \ldots, x_n\}$ be the set of all samples. A randomized algorithm $M: \mathcal{S} \to \mathcal{R}$ is $(\epsilon, \delta)$-differentially private ($\epsilon, \delta > 0$) if for all pairs of neighboring data sets $S, S'$ differing in any individual data item, $S' = \{x_1, \ldots, x'_j, \ldots, x_n\}$, and for all $R \subseteq \mathcal{R}$, the algorithm satisfies
$$\Pr(M(S) \in R) \le e^{\epsilon} \cdot \Pr(M(S') \in R) + \delta.$$
What privacy guarantee can we promise the data providers under this definition? From the perspective of the $i$-th data provider, first, the central server cannot determine (to some extent) the presence or absence of any individual data item in the sample set $S_i$, given the information $H_i(S_i, Q^{(t)})$ ($t = 1, 2, \ldots, T$) collected during the computation. Second, the change of any individual data item in the sample set $S_i$ does not have a significant impact on the output $Q_{dp}$ of the algorithm. The extent of privacy is determined by $\epsilon > 0$ and $\delta > 0$, with smaller $\epsilon$ and $\delta$ implying less disclosure of private information related to an individual data item.
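For intuition about how noise can be calibrated to meet Definition 2.1, recall the standard Gaussian mechanism from the differential privacy literature (a general fact, not the specific calibration used in this paper): for a vector-valued query $f$ with $\ell_2$-sensitivity $\Delta = \max_{S, S'} \|f(S) - f(S')\|_2$ over neighboring data sets, releasing
$$f(S) + \mathcal{N}(0, \sigma^2 I), \qquad \sigma \ge \frac{\Delta \sqrt{2\log(1.25/\delta)}}{\epsilon},$$
is $(\epsilon, \delta)$-differentially private for $\epsilon \in (0, 1)$.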
In order to provide sharp estimates for high-dimensional data, we adopt the following notion of subspace sparsity.

Definition 2.2 (Subspace Sparsity). A linear subspace is $s$-sparse if the number of non-zero diagonal entries of the projection matrix onto the subspace is equal to $s$.
The notion of subspace sparsity has been introduced in the recent literature on sparse PCA, e.g., [40, 41]. Being invariant to rotation, the sparsity level of the subspace spanned by the columns of $Q$ can be defined as the number of non-zero entries on the diagonal of the projection matrix $\Pi = QQ^\top$. Note that $\Pi_{i,i} = \sum_{j=1}^{k} (Q_{i,j})^2$, so the sparsity level $s = |\mathrm{supp}[\mathrm{diag}(\Pi)]|$ is exactly the number of non-zero rows of $Q$. Thus we have
$$s = |\mathrm{supp}[\mathrm{diag}(\Pi)]| = \|Q\|_{2,0} = \big\|\big(\|Q_{1,*}\|_2, \|Q_{2,*}\|_2, \ldots, \|Q_{d,*}\|_2\big)\big\|_0.$$
Now let us describe the modeling assumptions about the data.

Assumption 2.3. The data samples held by the data providers are independent and identically distributed copies of a bounded random variable $X$ in a model class $\mathcal{M}(s^*, k)$ for integers $k, s^*$ with $0 < k < s^* < d$. For any $X \in \mathcal{M}(s^*, k)$, the following three assumptions are made.

1. The random variable $X = \Sigma^{1/2} Z$, in which $Z \in \mathbb{R}^d$ is a zero-mean bounded random variable with variance proxy less than one and identity covariance matrix. The matrix $\Sigma \in \mathbb{R}^{d \times d}$ is positive semidefinite.

2. The $k$-dimensional principal eigenspace of $\Sigma$ is $s^*$-sparse.

3. The $k$-th eigengap is strictly positive, i.e., $\lambda_k - \lambda_{k+1} > 0$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$.
Let $\hat{\Sigma}$ be the sample covariance matrix for the union of all the samples $S_1 \cup S_2 \cup \cdots \cup S_N$. Our mathematical problem is to approximate the $k$-dimensional leading principal subspace estimator under sparsity and differential privacy constraints,
$$\hat{Q} = \operatorname*{argmin}_{Q \in \mathbb{R}^{d \times k}} \; -\mathrm{tr}\big(Q^\top \hat{\Sigma} Q\big), \quad \text{subject to } Q^\top Q = I_k, \ \|Q\|_{2,0} \le \hat{s}, \qquad (2.1)$$
and the algorithm has to be differentially private. According to our framework of distributed algorithms, the overall covariance matrix $\hat{\Sigma}$ is known neither to the central server nor to any data provider. In what follows, we present such an algorithm that iteratively computes an approximate solution to (2.1) based on the locally held sample sets $S_1, S_2, \ldots, S_N$.
3 Distributed Privacy-Preserving Sparse PCA Algorithm

In this section, we present the details of the privacy preservation mechanism under the distributed framework of collaborative PCA; see Algorithm 1. The algorithm can be viewed as a noisy truncated power iteration, a variant of the classical power iteration for eigenvector computation [15]. Specifically, the matrix-matrix multiplications in the power step are corrupted by i.i.d. Gaussian noise matrices. For the estimated principal subspace basis matrix in each iteration, a thresholding procedure sets to zero every row whose $\ell_2$-norm rank (in decreasing order among all rows) is larger than $\hat{s}$. Sparsity is thus explicitly imposed by thresholding.
Algorithm 1 Distributed Privacy-Preserving Sparse PCA (DPS-PCA)

Input: Sample sets $S_i$, $i = 1, 2, \ldots, N$, held by the $N$ data providers respectively; the number of samples held by the $i$th provider is $n_i = |S_i|$; initialization $Q^{(0)} \in \mathbb{R}^{d \times k}$
Parameters: Sparsity parameter $\hat{s}$; maximum number of iterations $T$; differential privacy parameters $\epsilon, \delta$ with $0 < \epsilon < 1$ and $0 < \delta < 1/2$
Output: $Q_{dp} \leftarrow \mathrm{DPS\text{-}PCA}(\hat{\Sigma}, Q^{(0)})$

1: $\tilde{Q}^{(0)} \leftarrow \mathrm{Threshold}(Q^{(0)}, \hat{s})$
2: $Q^{(1)}, R \leftarrow \mathrm{Thin\_QR}(\tilde{Q}^{(0)})$
3: Let $\sigma(\epsilon, \delta) \leftarrow 2T\epsilon^{-1}\sqrt{2\log(2T/\delta)}$
4: for $i = 1, 2, \ldots, N$ do
5:     $\hat{\Sigma}_i \leftarrow \sum_{x_j \in S_i} x_j x_j^\top / n_i$
6: end for
7: for $t = 1, 2, \ldots, T$ do
8:     Generate an entrywise i.i.d. Gaussian matrix $G^{(t)} \sim N(0, \sigma^2)^{d \times k}$
9:     for $i = 1, 2, \ldots, N$ do
10:        $H_i^{(t)} \leftarrow \hat{\Sigma}_i Q^{(t)} + G^{(t)}/n_i$    // Privacy Preservation Step
11:    end for
12:    $K^{(t)} \leftarrow \sum_{i=1}^{N} n_i H_i^{(t)} \big/ \sum_{i=1}^{N} n_i$
13:    $V^{(t)}, R_1 \leftarrow \mathrm{Thin\_QR}(K^{(t)})$
14:    $\tilde{Q}^{(t)} \leftarrow \mathrm{Threshold}(V^{(t)}, \hat{s})$
15:    $Q^{(t+1)}, R_2 \leftarrow \mathrm{Thin\_QR}(\tilde{Q}^{(t)})$
16: end for
17: $Q_{dp} \leftarrow Q^{(T+1)}$
The Thin_QR step in Algorithm 1 is the thin QR decomposition in [15]. For any matrix $A \in \mathbb{R}^{m \times n}$ with full column rank, the Thin_QR factorization $A = QR$ is unique, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular with positive diagonal entries. Effectively, the Thin_QR step orthogonalizes the $k$ basis vectors of the $k$-dimensional leading eigenspace in each iteration of the algorithm.

Algorithm 2 Thresholding procedure in Algorithm 1

Input: Matrix $V \in \mathbb{R}^{d \times k}$; sparsity parameter $\hat{s}$
Output: $\tilde{V} \leftarrow \mathrm{Threshold}(V, \hat{s})$
1: Sort the rows of $V$ in Euclidean norm and let $S$ be the set of row indices corresponding to the top $\hat{s}$ largest rows of $V$ in Euclidean norm
2: $\tilde{V}$ takes zeros on the rows not indexed by $S$; the rows of $\tilde{V}$ indexed by $S$ take the same values as the corresponding rows of $V$

Algorithm 3 Thin_QR decomposition in Algorithm 1

Input: Matrix $A \in \mathbb{R}^{d \times k}$
Output: $Q, R \leftarrow \mathrm{Thin\_QR}(A)$, where $Q \in \mathbb{R}^{d \times k}$ has orthonormal columns, the upper triangular matrix $R \in \mathbb{R}^{k \times k}$ has positive diagonal entries, and $A = QR$; here we use the Householder QR method described in [15]
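For concreteness, the following NumPy sketch implements the noisy truncated power iteration of Algorithm 1. It is a minimal illustration under our reconstruction, not the paper's reference implementation: the calibration of sigma mirrors line 3 of Algorithm 1 as reconstructed above, each provider draws its own Gaussian matrix (one reading of line 8), and all helper names are ours.

    import numpy as np

    def threshold_rows(V, s_hat):
        # Algorithm 2: keep the s_hat rows with largest l2 norm, zero out the rest
        norms = np.linalg.norm(V, axis=1)
        keep = np.argsort(norms)[-s_hat:]
        out = np.zeros_like(V)
        out[keep] = V[keep]
        return out

    def thin_qr(A):
        # Algorithm 3: thin QR, with the diagonal of R made positive
        Q, R = np.linalg.qr(A)
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        return Q * s, s[:, None] * R

    def dps_pca(samples, Q0, s_hat, T, eps, delta, seed=0):
        # samples: list of (n_i, d) arrays, one per data provider (s_hat >= k assumed)
        rng = np.random.default_rng(seed)
        d, k = Q0.shape
        sigma = 2.0 * T / eps * np.sqrt(2.0 * np.log(2.0 * T / delta))  # assumed calibration
        n = np.array([len(X) for X in samples], dtype=float)
        covs = [X.T @ X / ni for X, ni in zip(samples, n)]   # local covariances Sigma_i
        Q, _ = thin_qr(threshold_rows(Q0, s_hat))
        for _ in range(T):
            # each provider perturbs its local power step with Gaussian noise
            H = [c @ Q + rng.normal(0.0, sigma, (d, k)) / ni for c, ni in zip(covs, n)]
            K = sum(ni * Hi for ni, Hi in zip(n, H)) / n.sum()  # server-side merge
            V, _ = thin_qr(K)
            Q, _ = thin_qr(threshold_rows(V, s_hat))
        return Q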
Algorithm 1 takes $O(k \cdot d^2)$ time per iteration with a naive implementation. Note that the privacy preservation step requires $O(k \cdot d^2)$ operations for the matrix multiplication. The Householder procedure for the Thin_QR decomposition has time complexity $O(d \cdot k^2)$. Besides, the thresholding procedure has time complexity $O(d \log d)$ in the sorting part and $O(d \cdot k)$ in the looping part. If we store $Q^{(t)}$ as a row-sparse matrix, the matrix multiplication takes far fewer operations, and Algorithm 1 can be as efficient as $O(k \cdot \hat{s} \cdot d)$ time per iteration, as sketched below.
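A sketch of the row-sparse multiplication just described (our illustration): because $Q^{(t)}$ has at most $\hat{s}$ non-zero rows, $\hat{\Sigma}\,Q^{(t)}$ only touches the corresponding $\hat{s}$ columns of $\hat{\Sigma}$:

    import numpy as np

    def sparse_power_step(cov, Q, support):
        # cov: (d, d); Q: (d, k) with non-zero rows only at `support`, |support| <= s_hat
        # cost O(d * s_hat * k) instead of O(d^2 * k)
        return cov[:, support] @ Q[support]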
4 Analysis of Accuracy-Privacy Tradeoff

In this section we state the differential privacy guarantee of Algorithm 1 and then analyze the minimax-optimal estimation error attained by Algorithm 1 in Theorem 4.3. Some technical proofs are deferred to the Appendix.
Theorem 4.1 (Privacy Preservation of Algorithm 1). If $0 < \epsilon < 1$ and $0 < \delta < 1/2$, the principal subspace estimator $Q_{dp}$ of Algorithm 1 is $(\epsilon, \delta)$-differentially private. The information collected by the central server related to the $i$th data provider,
$$H_i(S_i, Q^{(1)}), H_i(S_i, Q^{(2)}), \ldots, H_i(S_i, Q^{(T)}),$$
is also $(\epsilon, \delta)$-differentially private.

Proof. See Appendix §A for a detailed proof.
Recall that the samples are generated by a sub-Gaussian random variable with covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, and we want to recover the top-$k$ leading eigenspace of $\Sigma$. The eigenvalues are sorted in decreasing order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > \lambda_{k+1} \ge \cdots \ge \lambda_d \ge 0$. We use the scalar $\gamma$ as a metric of the eigengap between $\lambda_k$ and $\lambda_{k+1}$:
$$\gamma = \frac{3\lambda_{k+1} + \lambda_k}{3\lambda_k + \lambda_{k+1}} < 1.$$
A smaller $\gamma$ implies a larger eigengap ratio $\lambda_k/\lambda_{k+1}$, which makes it easier to recover the top-$k$ eigenspace from the observed samples.
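As a concrete instance (our own numeric example): with $\lambda_k = 2$ and $\lambda_{k+1} = 1$,
$$\gamma = \frac{3 \cdot 1 + 2}{3 \cdot 2 + 1} = \frac{5}{7} \approx 0.71;$$
as $\lambda_k/\lambda_{k+1} \to \infty$ we have $\gamma \to 1/3$, and as $\lambda_{k+1} \to \lambda_k$ we have $\gamma \to 1$.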
After fixing the noise level $\sigma(\epsilon, \delta)$, the eigenvalue $\lambda_k$, the sample size $n$, the number of data providers $N$, the subspace dimension $k$, and the thresholding parameter $\hat{s}$, assume that there exist parameters $\phi > 0$ and $\psi > 0$ such that
$$2k\big(\sqrt{\hat{s}} + \sqrt{k} + \psi\big) \le \frac{\phi \, n \, \lambda_k}{N \, \sigma(\epsilon, \delta)}.$$
Note that when the average sample size $n/N$ or the signal-to-noise ratio $\lambda_k/\sigma(\epsilon, \delta)$ is sufficiently large, the existence of $\phi > 0$ and $\psi > 0$ is easily justified. Moreover, the existence of a smaller $\phi$ implies a larger signal-to-noise ratio $\lambda_k/\sigma(\epsilon, \delta)$, and, as we will show later, the parameter $\psi$ is related to the tradeoff between privacy and the success probability of the algorithm. We denote
$$\kappa = \frac{\gamma}{1 - \phi}$$
as the effective eigengap, which characterizes the rate of convergence in our analysis.
The tradeoff analysis is presented under reasonable assumptions on the parameters $\epsilon$, $\delta$, the sample size $n$, the sample dimension $d$, the number of data providers $N$, the thresholding parameter $\hat{s}$, the true sparsity level $s^*$ of the $k$ leading eigenspace of the covariance matrix, and the quality of the initialization $Q^{(0)}$. Please refer to Appendix §B.2 for the assumptions and their justifications.
The estimation error is analyzed in terms of the subspace distance, which is defined as follows.

Definition 4.2. Let $U \in \mathbb{R}^{d \times k}$ and $V \in \mathbb{R}^{d \times k}$ be matrices with orthonormal columns that form bases of two $k$-dimensional subspaces $\mathcal{U}$ and $\mathcal{V}$ of the $d$-dimensional space. Let $U^\perp \in \mathbb{R}^{d \times (d-k)}$ and $V^\perp \in \mathbb{R}^{d \times (d-k)}$ be matrices whose orthonormal columns span the subspaces $\mathcal{U}^\perp$ and $\mathcal{V}^\perp$ perpendicular to $\mathcal{U}$ and $\mathcal{V}$ respectively. We define the subspace distance between $\mathcal{U}$ and $\mathcal{V}$ as
$$D[\mathcal{U}, \mathcal{V}] = \big\|U^\top V^\perp\big\|_F = \big\|V^\top U^\perp\big\|_F.$$

This definition is related to the canonical angles between subspaces. Let $\Pi_u \in \mathbb{R}^{d \times d}$ and $\Pi_v \in \mathbb{R}^{d \times d}$ be the orthogonal projection matrices onto $\mathcal{U}$ and $\mathcal{V}$ respectively. It will be explained in Appendix §B.4 that the singular values of $\Pi_{u^\perp} \Pi_v$ are
$$s_1, s_2, \ldots, s_k, 0, 0, \cdots, 0.$$
The canonical angles between $\mathcal{U}$ and $\mathcal{V}$ are defined as
$$\theta_i(\mathcal{U}, \mathcal{V}) = \arcsin(s_i), \quad i = 1, 2, \ldots, k.$$
Let $\Theta(\mathcal{U}, \mathcal{V}) = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_k)$. The distance between $\mathcal{U}$ and $\mathcal{V}$ can then be characterized as
$$D[\mathcal{U}, \mathcal{V}] = \|\sin\Theta(\mathcal{U}, \mathcal{V})\|_F = \big\|U^\top V^\perp\big\|_F = \big\|V^\top U^\perp\big\|_F.$$
In the special case of $k = 1$, the subspace distance between two unit-norm eigenvectors $u \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$ is
$$D[\mathcal{U}, \mathcal{V}] = \sin\theta(u, v), \quad \theta(u, v) = \arccos\langle u, v\rangle, \quad \|u\|_2 = \|v\|_2 = 1.$$
Please see Lemma B.5 in Appendix §B.4 for a detailed discussion of Definition 4.2.
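The subspace distance of Definition 4.2 is straightforward to compute from the singular values of $U^\top V$ (a minimal sketch, ours):

    import numpy as np

    def subspace_distance(U, V):
        # U, V: (d, k) bases with orthonormal columns
        cosines = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of canonical angles
        sines_sq = np.clip(1.0 - cosines**2, 0.0, None)
        return np.sqrt(sines_sq.sum())  # = ||sin Theta||_F = ||U^T V_perp||_F

This uses the identity $\|U^\top V^\perp\|_F^2 = k - \|U^\top V\|_F^2$, which follows since $[V \ V^\perp]$ is an orthogonal matrix.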
Theorem 4.3 (Convergence rate and estimation error of Algorithm 1). Let Assumption B.1 hold. The sequence $\{Q^{(t)}\}_{t \ge 1}$ generated by Algorithm 1 satisfies
$$\big\|Q^{(t)\top} Q^{*\perp}\big\|_F \le \mathrm{stats\_err} + \mathrm{priv\_err} + \mathrm{opt\_err}(t),$$
for all $t = 1, 2, \ldots, T$, with probability at least
$$1 - 2Te^{-\psi^2/2} - \frac{4}{n-1} - \frac{1}{d} - \frac{6\log n}{n} - \frac{1}{n}, \qquad (4.1)$$
where $T$ is the total number of iterations. Here the near-optimal minimax statistical error is
$$\mathrm{stats\_err} = \frac{C_1 \sqrt{k}}{1 - \gamma^{1/4}} \cdot \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + \log d)}{n}}, \quad C_1 > 0 \text{ a constant}, \qquad (4.2)$$
the error induced by the privacy constraint is
$$\mathrm{priv\_err} = \frac{1}{\sqrt{3}\,\big(1 - \gamma^{1/4}\big)} \left(1 + 2\sqrt{\frac{k s^*}{\hat{s}}}\right) \psi^{-1}, \qquad (4.3)$$
and the optimization error decays at a geometric rate,
$$\mathrm{opt\_err}(t) = \kappa^{(t-1)/4} \cdot \min\left\{\frac{\kappa^{1/2}}{\sqrt{\psi}\,\big(1 - \kappa^{1/2}\big)^2}, \ \frac{1}{4}\right\}. \qquad (4.4)$$

Proof. Please see Appendix §B for a detailed proof.
Theorem 4.3 precisely characterizes the accuracy-privacy tradeoff of our algorithm.
Privacy for free. The statistical and privacy errors scale as
$$\mathrm{stat\_err} = C_0 \sqrt{\log d}, \ \text{as } d \to \infty, \qquad \mathrm{priv\_err} < C_0'(\gamma)\,\psi^{-1}, \ \text{as } \psi^{-1} \to 0,$$
where $C_0, C_0'(\gamma) > 0$ are dimension-free parameters determined by the eigenvalues, the sparsity parameter, and the differential privacy parameters. When the parameter $\psi^{-1}$ is sufficiently small and the dimension $d$ is very large, the error induced by the privacy constraints is dominated by the statistical error, $\mathrm{priv\_err} < \mathrm{stat\_err}$. In other words, we can enjoy the differential privacy guarantee with an induced error even smaller than the inherent statistical error, as shown in Figure 2.
Figure 2: Illustration of the theoretical privacy-accuracy tradeoff. We fix the differential privacy parameter $\delta$ at 0.01 and let $\epsilon$ range from 0.2 to 2. With smaller $\epsilon$, more privacy is preserved at the cost of lower estimation accuracy.
Privacy and success probability. The claims of Theorem 4.3 hold with probability at least
$$1 - 2Te^{-\psi^2/2} - \frac{4}{n-1} - \frac{1}{d} - \frac{6\log n}{n} - \frac{1}{n},$$
where, in the second term, the parameter $\psi > 0$ also depends on the differential privacy parameters in the following way:
$$\psi \le \frac{\phi \, n \, \lambda_k}{2kN\,\sigma(\epsilon, \delta)} - \sqrt{\hat{s}} - \sqrt{k}.$$
If $\sigma(\epsilon, \delta)$ is large due to a stringent privacy constraint, or the sample size is very small, then $\psi$ in the above inequality has to be relatively small, and thus the term $2Te^{-\psi^2/2}$ could undermine the success probability of the algorithm.
Minimax-optimal statistical error. The statistical error $\mathrm{stat\_err}$ attained by our estimator achieves the lowest error (up to a constant) among all possible estimators under the worst-case generating distribution:
$$\mathrm{stat\_err} = C'' \inf_{\tilde{Q}} \sup_{P \in \mathcal{M}} \mathbb{E}_P \big\|\tilde{Q}^\top Q^{*\perp}\big\|_F,$$
where $P$ is any generating distribution in the model class $\mathcal{M} = \mathcal{M}(s^*, k)$ described in Assumption 2.3, $Q^*$ spans the eigenspace of the covariance matrix of $P$, and $\tilde{Q}$ is any estimator of $Q^*$ based on $n$ samples from $P$. We further assume that the eigengap of the population covariance matrix satisfies $\lambda_k - \lambda_{k+1} > \theta \lambda_{k+1}$ for some constant $\theta > 0$. Under this model class, the minimax lower bound is
$$\inf_{\tilde{Q}} \sup_{P \in \mathcal{M}} \mathbb{E}_P \big\|\tilde{Q}^\top Q^{*\perp}\big\|_F = C_2 \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + 1/4 \cdot \log d)}{n}},$$
where $C_2$ is a positive constant [41]. Under the eigengap condition, $\gamma$ is bounded away from 1, so the factor $1 - \gamma^{1/4}$ can be treated as a constant for a fixed $\theta$. Hence the statistical error
$$\mathrm{stat\_err} = \frac{C_1 \sqrt{k}}{1 - \gamma^{1/4}} \cdot \frac{\sqrt{\lambda_1 \lambda_{k+1}}}{\lambda_k - \lambda_{k+1}} \sqrt{\frac{s^*(k + \log d)}{n}}$$
matches the minimax-optimal lower bound up to constants.
Privacy and rate of convergence. The optimization error $\mathrm{opt\_err}$ decays at the geometric rate $\kappa = \gamma/(1 - \phi) < 1$. More stringent privacy requirements force a smaller margin $1 - \phi$ and thus a much slower rate of convergence.
5 Numerical Results

In this section we present synthetic numerical experiments that validate our theory of the accuracy-privacy tradeoff. Comparisons with existing state-of-the-art private PCA methods are also conducted. For any targeted privacy level, our algorithm is superior in both estimation accuracy and computational efficiency.

5.1 Empirical Privacy-Accuracy Tradeoff
We choose $d = 10^3$, $n = 10^5$, $k = 5$, and $s^* = 10$ in the experiment, and assume that there is only one data owner in the system for fair comparison with the other, non-distributed, private PCA methods. Data samples are generated independently from a multivariate Gaussian distribution with population covariance $\Sigma$ and mean $0$, where $\Sigma = \sum_{j=1}^{d} \lambda_j u_j (u_j)^\top = U \Lambda U^\top$ and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_k, \ldots, \lambda_d\}$ with
$$\lambda_i = 100, \ i = 1, 2, \ldots, k; \qquad \lambda_i \sim \mathrm{Uniform}[0, 10], \ i = k+1, k+2, \ldots, d.$$
We generate an $s^* \times k$ matrix of i.i.d. standard normal entries and orthogonalize it into a matrix $L$ with orthonormal columns. After concatenation with a $(d - s^*) \times k$ all-zero matrix, it becomes the $d \times k$ matrix $Q^* = [L^\top \ 0]^\top$ whose columns span the $k$ leading eigenspace. The remaining eigenvectors $Q^{*\perp}$ are generated as follows. We project a $d \times (d - k)$ matrix $R$ with i.i.d. standard normal entries onto the linear space orthogonal to the span of $Q^*$:
$$R^\perp = (I_{d \times d} - Q^* Q^{*\top}) R, \qquad R \sim N(0, I)_{d \times (d-k)}.$$
The columns of $R^\perp$ are then orthogonalized into the matrix $Q^{*\perp}$. Finally, by concatenating the columns of $Q^*$ and $Q^{*\perp}$ we obtain $U = [Q^* \ Q^{*\perp}]$ and thus $\Sigma = U \Lambda U^\top$. The parameters in Algorithm 1 are chosen as $T = 10$ and $\hat{s} = 50$.
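The construction above can be reproduced with a few lines of NumPy (a sketch with the dimensions shrunk so it runs quickly; variable names are ours):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, s_star = 100, 5, 10
    lam = np.concatenate([np.full(k, 100.0), rng.uniform(0, 10, d - k)])

    L = np.linalg.qr(rng.standard_normal((s_star, k)))[0]   # s* x k orthonormal
    Q_star = np.vstack([L, np.zeros((d - s_star, k))])      # sparse leading basis Q*

    R = rng.standard_normal((d, d - k))
    R_perp = (np.eye(d) - Q_star @ Q_star.T) @ R            # project out span(Q*)
    Q_perp = np.linalg.qr(R_perp)[0]                        # orthonormalize into Q*perp

    U = np.hstack([Q_star, Q_perp])
    Sigma = U @ np.diag(lam) @ U.T                          # population covariance
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=2000)  # Gaussian samples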
Fixing the differential privacy parameter $\delta$ at 0.3, we plot the trajectory of the estimation error $\|Q^{(t)\top} Q^{*\perp}\|_F$ for $t = 1, 2, \ldots, T$ under different values of $\epsilon$ in Figure 3(a). Recall that $\|Q^{(t)\top} Q^{*\perp}\|_F$ is the subspace distance between the subspace spanned by $Q^{(t)}$ and the true eigenspace of $\Sigma$ spanned by $Q^*$.
We also fix $\epsilon$ to be 1.0 and plot the trajectory of the error for $t = 1, 2, \ldots, T$ under different values of $\delta$, obtaining Figure 3(b). It is clear from Figure 3 that more stringent privacy constraints cause a slower rate of convergence and a larger approximation error in the output $Q^{(T)}$.
5.2 Comparisons With Existing Methods

First, we compare the accuracy of the estimated eigenspace at a fixed privacy level across three different methods: MOD-SULQ, PPCA, and our DPS-PCA method described in Algorithm 1. The SuLQ method was proposed in [4] and later modified in [6] as MOD-SULQ. SuLQ contaminates the sample covariance matrix with a random matrix of independent zero-mean Gaussian entries, whose variance is adjusted according to the differential privacy parameters $\epsilon$ and $\delta$. The PPCA method [6] privately selects leading eigenvectors by drawing from a matrix Bingham distribution via a Gibbs sampling procedure [21]; privacy is preserved thanks to the randomized noise in the Gibbs sampling. There are several other methods, such as [11, 16, 18], but they follow different definitions of differential privacy in terms of the granularity of privacy, which prevents direct comparisons.
5.2.1 Comparisons of Estimation Accuracy

We run simulations under the same high-dimensional model described in §5.1, where the dimension is $d = 10^3$ and the sample size is $n = 10^5$. The differential privacy parameter $\delta$ is fixed at 0.3. It is clear from Figure 4(a) that MOD-SULQ cannot produce a meaningful estimate of the leading eigenvectors in the high-dimensional setting: the standard deviation of the Gaussian noise used by MOD-SULQ is on the order of $O(d \log d)$, which becomes so large in high dimensions that the estimation accuracy is severely undermined. We take 1000 iterations of Gibbs sampling for the PPCA method, but it is hard to determine whether or not the sampling process has reached the stationary distribution. Since the Gibbs sampling only approximates the matrix Bingham distribution, $\epsilon$-differential privacy is not guaranteed by PPCA in practice.
5.2.2 Comparisons of Scalability

In low-dimensional settings, our DPS-PCA method also outperforms the other methods. If we fix the sample size at $n = 10^5$ and change the dimension from 40 to 140, it can be observed in Figure 4(b) that MOD-SULQ and PPCA perform reasonably well in low dimensions. The differential privacy parameters are set as $\epsilon = 1.0$ and $\delta = 0.1$. The simulations are carried out on the same statistical model as §5.1, but with the parameters changed to $k = 3$ and $s^* = 10$.
5.2.3 Comparisons of Computational Efficiency

Our DPS-PCA method excels in terms of computation time thanks to its geometric convergence rate and the $O(d^2)$ time complexity of each iteration. In contrast, the Gibbs sampling procedure [21] used in PPCA takes $O(d^3)$ time per iteration, and the number of iterations (burn-in time) can be on the order of $10^3$-$10^4$ before the sampling reaches the stationary distribution [6]. For ease of comparison, we take only $T = 100$ Gibbs sampling iterations for the PPCA method, using the R package rstiefel [20] as suggested in [6]. The number of iterations for DPS-PCA is set to $T = 10$. Calculation of the leading $k$ eigenvectors in MOD-SULQ uses the R package rARPACK [33]. We carry out simulations on the same statistical model described in §5.1 with the dimension $d$ changed in each case. All simulations are conducted on an Intel Xeon 3.4GHz CPU.
Table 1: CPU time of privacy-preserving PCA methods under different dimensions. DPS-PCA shows significant advantages in terms of computational complexity because it converges at a geometric rate and takes only $O(d^2)$ time per iteration. PPCA takes $O(d^3)$ time per iteration, and the number of Gibbs sampling iterations can be as high as $10^4$-$10^5$, as in [6]. MOD-SULQ scales well computationally with dimension, but its estimation accuracy quickly degenerates as $d$ becomes large.

Method      d = 200    d = 400    d = 800
DPS-PCA     <1 ms      <1 ms      <1 ms
MOD-SULQ    8 ms       21 ms      97 ms
PPCA        9.82 s     37.46 s    173.78 s
6 Conclusion

Our paper proposes an effective distributed optimization algorithm that produces a minimax-optimal sparse PCA estimator subject to differential privacy constraints.
[Figure 3 shows two panels plotting accuracy (subspace distance) against iterations 0-10; the legend of panel (a) varies $\epsilon \in \{0.5, 1.0, 1.5, 2.0\}$ and that of panel (b) varies $\delta \in \{0.01, 0.05, 0.10, 0.50\}$.]

Figure 3: Geometric convergence of the estimation accuracy (subspace distance) $\|Q^{(t)\top} Q^{*\perp}\|_F$ over the first $T = 10$ iterations. We fix $\delta = 0.3$ and vary $\epsilon$ in (a), and fix $\epsilon = 2.0$ and vary $\delta$ in (b). More privacy is preserved for smaller differential privacy parameters $\epsilon$ or $\delta$, at the cost of a slower convergence rate and lower final estimation accuracy.
[Figure 4 shows two panels: (a) accuracy (subspace distance) versus $\epsilon$ from 0.8 to 2.0; (b) accuracy versus dimension from 40 to 160; both compare DPS-PCA, PPCA, and MOD-SULQ.]

Figure 4: Comparisons of privacy-preserving PCA methods. In (a) we plot the estimation accuracy of the methods under different differential privacy constraints, with the sample dimension and sample size fixed at $d = 10^3$ and $n = 10^5$. In (b), the differential privacy parameters are set as $\epsilon = 1.0$ and $\delta = 0.1$, and the sample size is fixed at $n = 10^5$; we let the sample dimension range from 40 to 160 and show that the performance of MOD-SULQ quickly deteriorates as $d$ becomes large. Our DPS-PCA method outperforms in all scenarios.
We analyze the tradeoff between privacy constraints and estimation accuracy, and validate the theory and the empirical performance of the algorithm through simulation results.
References
[1] Arash Amini and Martin J Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. Annals of Statistics, 37:2877-2921, 2009.
[2] Rajendra Bhatia. Matrix Analysis, volume 169. Springer Science & Business Media, 2013.
[3] Aharon Birnbaum, Iain M Johnstone, Boaz Nadler, and Debashis Paul. Minimax bounds for sparse PCA with noisy high-dimensional data. Annals of Statistics, 41(3):1055, 2013.
[4] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ framework. In Proceedings of the ACM Symposium on Principles of Database Systems, pages 128-138. ACM, 2005.
[5] T Tony Cai, Zongming Ma, and Yihong Wu. Sparse PCA: Optimal rates and adaptive estimation. Annals of Statistics, 41(6):3074-3110, 2013.
[6] Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In Advances in Neural Information Processing Systems, pages 989-997, 2012.
[7] Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9:1269-1294, 2008.
[8] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I Jordan, and Gert RG Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434-448, 2007.
[9] John C Duchi, Michael Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In IEEE Annual Symposium on Foundations of Computer Science, pages 429-438. IEEE, 2013.
[10] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265-284. Springer, 2006.
[11] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In Proceedings of the ACM Symposium on Theory of Computing, pages 11-20. ACM, 2014.
[12] Barbara E Engelhardt and Matthew Stephens. Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis. PLoS Genetics, 6(9):e1001117, 2010.
[13] PA Federolf, KA Boyer, and TP Andriacchi. Application of principal component analysis in clinical gait research: Identification of systematic differences between healthy and medial knee-osteoarthritic gait. Journal of Biomechanics, 46(13):2173-2178, 2013.
[14] Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and auxiliary information in data privacy. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 265-273. ACM, 2008.
[15] Gene H Golub and Charles F Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
[16] Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pages 1255-1268. ACM, 2012.
[17] Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation. In Proceedings of the Annual ACM Symposium on Theory of Computing, pages 331-340. ACM, 2013.
[18] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. Advances in Neural Information Processing Systems, pages 2861-2869, 2014.
[19] Mikko Heikkilä, Yusuke Okimoto, Samuel Kaski, Kana Shimizu, and Antti Honkela. Differentially private Bayesian learning on distributed data. arXiv preprint arXiv:1703.01106, 2017.
[20] Peter Hoff. rstiefel: Random orthonormal matrix generation on the Stiefel manifold, 2014. R package version 0.10.
[21] Peter D Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2), 2009.
[22] Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V Pearson, Dietrich A Stephan, Stanley F Nelson, and David W Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8):e1000167, 2008.
[23] Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486), 2009.
[24] Ian T Jolliffe, Nickolay T Trendafilov, and Mudassir Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531-547, 2003.
[25] Ravi Kannan, Santosh Vempala, and David P Woodruff. Principal component analysis and higher correlations for distributed data. International Conference on Machine Learning, pages 1040-1057, 2014.
[26] Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1395-1414. SIAM, 2013.
[27] Robert Krauthgamer, Boaz Nadler, and Dan Vilenchik. Do semidefinite relaxations really solve sparse PCA? arXiv preprint arXiv:1306.3690, 2013.
[28] Yingyu Liang, Maria-Florina Balcan, Vandana Kanchanapally, and David P Woodruff. Improved distributed principal component analysis. Advances in Neural Information Processing Systems, pages 3113-3121, 2014.
[29] Karim Lounici. Sparse principal component analysis with missing observations. In High Dimensional Probability VI, pages 327-356. Springer, 2013.
[30] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, pages 94-103. IEEE Computer Society, 2007.
[31] Boaz Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. Annals of Statistics, 41(2):2791-2817, 2008.
[32] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy, pages 111-125, 2008.
[33] Yixuan Qiu, Jiali Mei, and authors of the ARPACK library (see file AUTHORS for details). rARPACK: R wrapper of ARPACK for large scale eigenvalue/vector problems, on both dense and sparse matrices, 2014. R package version 0.7-0.
[34] Yongming Qu, George Ostrouchov, Nagiza Samatova, and Al Geist. Principal component analysis for dimension reduction in massive distributed data sets. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2002.
[35] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. arXiv.org, March 2010.
[36] Dan Shen, Haipeng Shen, and JS Marron. Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115:317-333, 2013.
[37] Haipeng Shen and Jianhua Z Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015-1034, 2008.
[38] Gilbert W Stewart. Perturbation Theory for the Singular Value Decomposition. Elsevier, 1990.
[39] Gilbert W Stewart and Ji-guang Sun. Matrix Perturbation Theory, 1990.
[40] Vincent Q Vu and Jing Lei. Minimax rates of estimation for sparse PCA in high dimensions. International Conference on Artificial Intelligence and Statistics, pages 1278-1286, 2012.
[41] Vincent Q Vu and Jing Lei. Minimax sparse principal subspace estimation in high dimensions. Annals of Statistics, 41(6):2905-2947, 2013.
[42] Martin J Wainwright, Michael I Jordan, and John C Duchi. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430-1438, 2012.
[43] Zhaoran Wang, Huanran Lu, and Han Liu. Tighten after relax: Minimax-optimal sparse PCA in polynomial time. Advances in Neural Information Processing Systems, pages 3383-3391, 2014.
[44] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375-389, 2010.
[45] Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, page kxp008, 2009.
[46] Genqiang Wu, Yeping He, Jingzheng Wu, and Xianyao Xia. Inherit differential privacy in distributed setting: Multiparty randomized function computation. arXiv preprint arXiv:1604.03001, 2016.
[47] Xiao-Tong Yuan and Tong Zhang. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14(1):899-925, 2013.
[48] Shuheng Zhou, Katrina Ligett, and Larry Wasserman. Differential privacy with compression. In IEEE International Symposium on Information Theory, pages 2718-2722. IEEE, 2009.
[49] Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.