Journal of Machine Learning Research 14 (2013) 2905-2943 Submitted 6/12; Revised 5/13; Published 9/13
A Near-Optimal Algorithm for Differentially-Private Principal
Components
Kamalika Chaudhuri KCHAUDHURI@UCSD.EDU
Department of Computer Science and Engineering
University of California, San Diego
9500 Gilman Drive MC 0404
La Jolla, CA, 92093-0404, USA
Anand D. Sarwate ASARWATE@ALUM.MIT.EDU
Toyota Technological Institute at Chicago
6045 S. Kenwood Ave
Chicago, IL 60637, USA
Kaushik Sinha KAUSHIK.SINHA@WICHITA.EDU
Department of Electrical Engineering and Computer Science
Wichita State University
1845 Fairmount (Campus Box 83)
Wichita, KS 67260-0083, USA
Editor: Gabor Lugosi
Abstract
The principal components analysis (PCA) algorithm is a standard tool for identifying good low-
dimensional approximations to high-dimensional data. Many data sets of interest contain private or
sensitive information about individuals. Algorithms which operate on such data should be sensitive
to the privacy risks in publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and
empirical performance of differentially private approximations to PCA and propose a new method
which explicitly optimizes the utility of the output. We show that the sample complexity of the
proposed method differs from the existing procedure in the scaling with the data dimension, and
that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results,
showing that on real data there is a large performance gap between the existing method and our
method.
Keywords: differential privacy, principal components analysis, dimension reduction
1. Introduction
Dimensionality reduction is a fundamental tool for understanding complex data sets that arise in
contemporary machine learning and data mining applications. Even though a single data point
can be represented by hundreds or even thousands of features, the phenomena of interest are often
intrinsically low-dimensional. By reducing the “extrinsic” dimension of the data to its “intrinsic” di-
mension, analysts can discover important structural relationships between features, more efficiently
use the transformed data for learning tasks such as classification or regression, and greatly reduce
the space required to store the data. (A preliminary version of this work appeared at the Neural
Information Processing Systems conference (Chaudhuri et al., 2012); this full version contains more
experimental details, full proofs, and additional discussion.) One of the oldest and most classical methods for dimensionality
reduction is principal components analysis (PCA), which computes a low-rank approximation to the
second moment matrix A of a set of points in R^d. The rank k of the approximation is chosen to be
the intrinsic dimension of the data. We view this procedure as specifying a k-dimensional subspace
of R^d.
Much of today’s machine-learning is performed on the vast amounts of personal information
collected by private companies and government agencies about individuals: examples include user
or customer behaviors, demographic surveys, and test results from experimental subjects or pa-
tients. These data sets contain sensitive information about individuals and typically involve a large
number of features. It is therefore important to design machine-learning algorithms which discover
important structural relationships in the data while taking into account its sensitive nature. We study
approximations to PCA which guarantee differential privacy, a cryptographically motivated defini-
tion of privacy (Dwork et al., 2006b) that has gained significant attention over the past few years in
the machine-learning and data-mining communities (Machanavajjhala et al., 2008; McSherry and
Mironov, 2009; McSherry, 2009; Friedman and Schuster, 2010; Mohammed et al., 2011). Differen-
tial privacy measures privacy risk by a parameter ε_p that bounds the log-likelihood ratio of the output
of a (private) algorithm under two databases differing in a single individual.
There are many general tools for providing differential privacy. The sensitivity method due to
Dwork et al. (2006b) computes the desired algorithm (in our case, PCA) on the data and then adds
noise proportional to the maximum change that can be induced by changing a single point in the
data set. The PCA algorithm is very sensitive in this sense because the top eigenvector can change
by 90° by changing one point in the data set. Relaxations such as smoothed sensitivity (Nissim
et al., 2007) are difficult to compute in this setting as well. The Sub-Linear Queries (SULQ) method
of Blum et al. (2005) adds noise to the second moment matrix and then runs PCA on the noisy
matrix. As our experiments show, the noise level required by SULQ may severely impact the quality
of approximation, making it impractical for data sets of moderate size.
The goal of this paper is to characterize the problem of differentially private PCA. We assume
that the algorithm is given n data points and a target dimension k and must produce a k-dimensional
subspace that approximates that produced by the standard PCA problem. We propose a new algo-
rithm, PPCA, which is an instance of the exponential mechanism of McSherry and Talwar (2007).
Unlike SULQ, PPCA explicitly takes into account the quality of approximation—it outputs a k-
dimensional subspace which is biased towards subspaces close to the output of PCA. In our case,
the method corresponds to sampling from the matrix Bingham distribution. We implement PPCA
using a Markov Chain Monte Carlo (MCMC) procedure due to Hoff (2009); simulations show that
the subspace produced by PPCA captures more of the variance of A than SULQ. When the MCMC
procedure converges, the algorithm provides differential privacy.
In order to understand the performance gap, we prove sample complexity bounds for the case
of k = 1 for SULQ and PPCA, as well as a general lower bound on the sample complexity for
any differentially private algorithm. We show that the sample complexity scales as Ω(d^{3/2} √(log d))
for SULQ and as O(d) for PPCA. Furthermore, we show that any differentially private algorithm
requires Ω(d) samples. Therefore PPCA is nearly optimal in terms of sample complexity as a
function of data dimension. These theoretical results suggest that our experiments demonstrate the
limit of how well ε_p-differentially private algorithms can perform, and our experiments show that
this gap should persist for general k. The result seems pessimistic for many applications, because
the sample complexity depends on the extrinsic dimension d rather than the intrinsic dimension k.
However, we believe this is a consequence of the fact that we make minimal assumptions on the
data; our results imply that, absent additional limitations on the data set, the sample complexity of
differentially private PCA must grow linearly with the data dimension.
There are several interesting open questions suggested by this work. One set of issues is com-
putational. Differential privacy is a mathematical definition, but algorithms must be implemented
using finite precision machines. Privacy and computation interact in many places, including pseu-
dorandomness, numerical stability, optimization, and in the MCMC procedure we use to implement
PPCA; investigating the impact of approximate sampling is an avenue for future work. A second set
of issues is theoretical—while the privacy guarantees of PPCA hold for all k, our theoretical anal-
ysis of sample complexity applies only to k = 1 in which the distance and angles between vectors
are related. An interesting direction is to develop theoretical bounds for general k; challenges here
are providing the right notion of approximation of PCA, and extending the theory using packings of
Grassmann or Stiefel manifolds. Finally, in this work we assume k is given to the algorithm, but in
many applications k is chosen after looking at the data. Under differential privacy, the selection of
k itself must be done in a differentially private manner.
1.1 Related Work
Differential privacy was first proposed by Dwork et al. (2006b). There has been an extensive liter-
ature following this work in the computer science theory, machine learning, and databases commu-
nities. A survey of some of the theoretical work can be found in Dwork and Smith
(2009). Differential privacy has been shown to have strong semantic guarantees (Dwork et al.,
2006b; Kasiviswanathan and Smith, 2008) and is resistant to many attacks (Ganta et al., 2008) that
succeed against alternative definitions of privacy. In particular, so-called syntactic definitions of
privacy (Sweeney, 2002; Machanavajjhala et al., 2006; Li et al., 2010) may be susceptible to attacks
based on side-information about individuals in the database.
There are several general approaches to constructing differentially private approximations to
some desired algorithm or computation. Input perturbation (Blum et al., 2005) adds noise to the
data prior to performing the desired computation, whereas output perturbation (Dwork et al., 2006b)
adds noise to the output of the desired computation. The exponential mechanism (McSherry and
Talwar, 2007) can be used to perform differentially private selection based on a score function that
measures the quality of different outputs. Objective perturbation (Chaudhuri et al., 2011) adds noise
to the objective function for algorithms which are convex optimizations. These approaches and
related ideas such as Nissim et al. (2007) and Dwork and Lei (2009) have been used to approximate
a variety of statistical, machine learning, and data mining tasks under differential privacy (Barak
et al., 2007; Wasserman and Zhou, 2010; Smith, 2011; McSherry and Mironov, 2009; Williams and
McSherry, 2010; Chaudhuri et al., 2011; Rubinstein et al., 2012; Nissim et al., 2007; Blum et al.,
2008; McSherry and Talwar, 2007; Friedman and Schuster, 2010; Hardt and Roth, 2012).
This paper deals with the problem of differentially private approximations to PCA. Prior to our
work, the only proposed method for PCA was the Sub-Linear Queries (SULQ) method of Blum
et al. (2005). This approach adds noise to the second moment matrix of the data before calcu-
lating the singular value decomposition. By contrast, our algorithm, PPCA, uses the exponential
mechanism (McSherry and Talwar, 2007) to choose a k-dimensional subspace biased toward those
which capture more of the “energy” of the matrix. Subsequent to our work, Kapralov and Talwar (2013)
have proposed a dynamic programming algorithm for differentially private low rank matrix approx-
imation which involves sampling from a distribution induced by the exponential mechanism. The
running time of their algorithm is O(d^6), where d is the data dimension, and it is unclear how this
may affect its implementation. Hardt and Roth (Hardt and Roth, 2012, 2013) have studied low-rank
matrix approximation under additional incoherence assumptions on the data. In particular, Hardt
and Roth (2012) consider the problem of differentially-private low-rank matrix reconstruction for
applications to sparse matrices; provided certain coherence conditions hold, they provide an algo-
rithm for constructing a rank-2k approximation B to a matrix A such that ‖A − B‖_F is O(‖A − A_k‖)
plus some additional terms which depend on d, k and n; here A_k is the best rank-k approximation
to A. Hardt and Roth (2013) show a method for guaranteeing (ε, δ)-differential privacy under an
entry-wise neighborhood condition using the power method for calculating singular values. They,
like Kapralov and Talwar (2013), also prove bounds under spectral norm perturbations, and their
algorithm achieves the same error rates but with running time that is nearly linear in the number of
non-zeros in the data.
In addition to these works, other researchers have examined the interplay between projections
and differential privacy. Zhou et al. (2009) analyze a differentially private data release scheme
where a random linear transformation is applied to data to preserve differential privacy, and then
measures how much this transformation affects the utility of a PCA of the data. One example of a
random linear transformation is random projection, popularized by the Johnson-Lindenstrauss (JL)
transform. Blocki et al. (2012) show that the JL transform of the data preserves differential privacy
provided the minimum singular value of the data matrix is large. Kenthapadi et al. (2013) study
the problem of estimating the distance between data points with differential privacy using a random
projection of the data points.
There has been significant work on other notions of privacy based on manipulating entries within
the database (Sweeney, 2002; Machanavajjhala et al., 2006; Li et al., 2010), for example by reducing
the resolution of certain features to create ambiguities. For more details on these and other alter-
native notions of privacy see Fung et al. (2010) for a survey with more references. An alternative
line of privacy-preserving data-mining work (Zhan and Matwin, 2007) is in the Secure Multiparty
Computation setting; one work (Han et al., 2009) studies privacy-preserving singular value decom-
position in this model. Finally, dimension reduction through random projection has been considered
as a technique for sanitizing data prior to publication (Liu et al., 2006); our work differs from this
line of work in that we offer differential privacy guarantees, and we only release the PCA subspace,
not actual data.
2. Preliminaries
The data given to our algorithm is a set of n vectors D = {x_1, x_2, ..., x_n} where each x_i corresponds
to the private value of one individual, x_i ∈ R^d, and ‖x_i‖ ≤ 1 for all i. Let X = [x_1, ..., x_n] be the
matrix whose columns are the data vectors {x_i}. Let A = (1/n) XX^T denote the d × d second moment
matrix of the data. The matrix A is positive semidefinite, and has Frobenius norm ‖A‖_F at most 1.
The problem of dimensionality reduction is to find a “good” low-rank approximation to A. A
popular solution is to compute a rank-k matrix Â which minimizes the norm ‖A − Â‖_F, where k is
much lower than the data dimension d. The Schmidt approximation theorem (Stewart, 1993) shows
that the minimizer is given by the singular value decomposition, also known as the PCA algorithm
in some areas of computer science.
Definition 1 Suppose A is a positive semidefinite matrix whose first k eigenvalues are distinct. Let
the eigenvalues of A be λ_1(A) ≥ λ_2(A) ≥ ··· ≥ λ_d(A) ≥ 0 and let Λ be a diagonal matrix with
Λ_ii = λ_i(A). The matrix A decomposes as

    A = V Λ V^T,    (1)

where V is an orthonormal matrix of eigenvectors. The top-k PCA subspace of A is the matrix

    V_k(A) = [v_1 v_2 ··· v_k],    (2)

where v_i is the i-th column of V in (1). The k-th eigengap is Δ_k = λ_k − λ_{k+1}.
Given the top-k subspace and the eigenvalue matrix Λ, we can form an approximation A^(k) =
V_k(A) Λ_k V_k(A)^T to A, where Λ_k contains the k largest eigenvalues in Λ. In the special case k = 1
we have A^(1) = λ_1(A) v_1 v_1^T, where v_1 is the eigenvector corresponding to λ_1(A). We refer to v_1 as
the top eigenvector of the data, and Δ = Δ_1 is the eigengap. For a d × k matrix V̂ with orthonormal
columns, the quality of V̂ in approximating V_k(A) can be measured by

    q_F(V̂) = tr(V̂^T A V̂).    (3)

The V̂ which maximizes q_F(V̂) has columns equal to {v_i : i ∈ [k]}, corresponding to the top-k eigen-
vectors of A.
Our theoretical results on the utility of our PCA approximation apply to the special case k = 1.
We prove results about the inner product between the output vector v̂_1 and the true top eigenvector
v_1:

    q_A(v̂_1) = |⟨v̂_1, v_1⟩|.    (4)

The utility in (4) is related to (3). If we write v̂_1 in the basis spanned by {v_i}, then

    q_F(v̂_1) = λ_1 q_A(v̂_1)^2 + ∑_{i=2}^{d} λ_i ⟨v̂_1, v_i⟩^2.

Our proof techniques use the geometric properties of q_A(·).
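To make these quantities concrete, the following Python sketch (using numpy; the toy data and the value of k are made up for illustration) computes the second moment matrix A, the top-k subspace V_k(A), the k-th eigengap, and the utilities q_F and q_A from (3) and (4).

```python
import numpy as np

def second_moment(X):
    """A = (1/n) X X^T for a d x n data matrix X whose columns have norm <= 1."""
    d, n = X.shape
    return X @ X.T / n

def top_k_subspace(A, k):
    """Return V_k(A) (d x k, orthonormal columns) and the k-th eigengap."""
    eigvals, eigvecs = np.linalg.eigh(A)                 # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    return eigvecs[:, :k], eigvals[k - 1] - eigvals[k]

def q_F(V_hat, A):
    """q_F(V_hat) = tr(V_hat^T A V_hat), equation (3)."""
    return np.trace(V_hat.T @ A @ V_hat)

def q_A(v_hat, v1):
    """q_A(v_hat) = |<v_hat, v_1>|, equation (4), for the k = 1 case."""
    return abs(float(v_hat.ravel() @ v1.ravel()))

# Toy example: n = 500 points in d = 10 dimensions, rescaled so every ||x_i|| <= 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))
X /= np.maximum(1.0, np.linalg.norm(X, axis=0))
A = second_moment(X)
Vk, gap = top_k_subspace(A, k=2)
print(q_F(Vk, A), gap)
```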
Definition 2 A randomized algorithm A(·) is an (ρ, η)-close approximation to the top eigenvector
if for all data sets D of n points we have

    P( q_A(A(D)) ≥ ρ ) ≥ 1 − η,

where the probability is taken over A(·).

We study approximations to PCA that preserve the privacy of the underlying data. The
notion of privacy that we use is differential privacy, which quantifies the privacy guaranteed by a
randomized algorithm A applied to a data set D.
Definition 3 An algorithm A(B) taking values in a set T provides ε_p-differential privacy if

    sup_S sup_{D,D'} ( µ(S | B = D) / µ(S | B = D') ) ≤ e^{ε_p},

where the first supremum is over all measurable S ⊆ T , the second is over all data sets D and D'
differing in a single entry, and µ(·|B) is the conditional distribution (measure) on T induced by
the output A(B) given a data set B. The ratio is interpreted to be 1 whenever the numerator and
denominator are both 0.

Definition 4 An algorithm A(B) taking values in a set T provides (ε_p, δ)-differential privacy if

    P( A(D) ∈ S ) ≤ e^{ε_p} P( A(D') ∈ S ) + δ,

for all measurable S ⊆ T and all data sets D and D' differing in a single entry.

Here ε_p and δ are privacy parameters, where low ε_p and δ ensure more privacy (Dwork et al.,
2006b; Wasserman and Zhou, 2010; Dwork et al., 2006a). The second privacy guarantee is weaker;
the parameter δ bounds the probability of failure, and is typically chosen to be quite small. In our
experiments we chose small but constant δ—Ganta et al. (2008) suggest δ < 1/n^2 is more appropriate.
In this paper we are interested in proving results on the sample complexity of differentially
private algorithms that approximate PCA. That is, for a given ε_p and ρ, how large must the number
of individuals n in the data set be such that the algorithm is both ε_p-differentially private and a
(ρ, η)-close approximation to PCA? It is well known that as the number of individuals n grows,
it is easier to guarantee the same level of privacy with relatively less noise or perturbation, and
therefore the utility of the approximation also improves. Our results characterize how the privacy
ε_p and utility ρ scale with n and the tradeoff between them for fixed n. We show that the sample
complexity depends on the eigengap Δ.
3. Algorithms and Results
In this section we describe differentially private techniques for approximating (2). The first is a
modified version of the Sub-Linear Queries (SULQ) method (Blum et al., 2005). Our new algo-
rithm for differentially-private PCA, PPCA, is an instantiation of the exponential mechanism due
to McSherry and Talwar (2007). Both procedures are differentially private approximations to the
top-k subspace: SULQ guarantees (ε_p, δ)-differential privacy and PPCA guarantees ε_p-differential
privacy.
3.1 Input Perturbation
The only differentially-private approximation to PCA prior to this work is the SULQ method (Blum
et al., 2005). The SULQ method perturbs each entry of the empirical second moment matrix A
to ensure differential privacy and releases the top-k eigenvectors of this perturbed matrix. More
specifically, SULQ recommends adding a matrix N of i.i.d. Gaussian noise of variance
8d^2 log^2(d/δ) / (n^2 ε_p^2) and applies the PCA algorithm to A + N. This guarantees a weaker
privacy definition known as (ε_p, δ)-differential privacy. One problem with this approach is that with
probability 1 the matrix A + N is not symmetric, so the largest eigenvalue may not be real and the
entries of the corresponding eigenvector may be complex. Thus the SULQ algorithm, as written, is
not a good candidate for approximating PCA.
It is easy to modify SULQ to produce an eigenvector with real entries that guarantees (ε_p, δ)-
differential privacy. In Algorithm 1, instead of adding an asymmetric Gaussian matrix, we add a
symmetric matrix with i.i.d. Gaussian entries N. That is, for 1 ≤ i ≤ j ≤ d, the variable N_ij is an
independent Gaussian random variable with variance β^2. Note that this matrix is symmetric but
not necessarily positive semidefinite, so some eigenvalues may be negative but the eigenvectors
are all real. A derivation for the noise variance in (5) of Algorithm 1 is given in Theorem 5. An
alternative is to add Laplace noise of an appropriate variance to each entry—this would guarantee
ε_p-differential privacy.
Algorithm 1: Algorithm MOD-SULQ (input perturbation)
inputs: d × n data matrix X, privacy parameter ε_p, parameter δ
outputs: d × k matrix V̂_k = [v̂_1 v̂_2 ··· v̂_k] with orthonormal columns
1. Set A = (1/n) XX^T.
2. Set

       β = ((d + 1)/(n ε_p)) √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 1/(n √ε_p).    (5)

   Generate a d × d symmetric random matrix N whose entries are i.i.d. drawn from N(0, β^2).
3. Compute V̂_k = V_k(A + N) according to (2).
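As an illustration, a minimal numpy sketch of MOD-SULQ is given below. It assumes X is a d × n matrix with column norms at most 1 and uses the noise scale β from equation (5) as written above; it is meant to convey the structure of the algorithm, not to serve as a vetted privacy-preserving implementation (for example, floating-point sampling issues are ignored).

```python
import numpy as np

def mod_sulq(X, k, eps_p, delta, rng=None):
    """MOD-SULQ sketch: perturb A with a symmetric Gaussian matrix and return
    the top-k eigenvectors of the noisy matrix (Algorithm 1)."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    A = X @ X.T / n
    # Noise standard deviation from equation (5).
    beta = ((d + 1) / (n * eps_p)) * np.sqrt(
        2.0 * np.log((d * d + d) / (2.0 * delta * np.sqrt(2.0 * np.pi)))
    ) + 1.0 / (n * np.sqrt(eps_p))
    # Symmetric noise: draw the upper triangle (including the diagonal) and mirror it.
    upper = np.triu(rng.normal(scale=beta, size=(d, d)))
    N = upper + upper.T - np.diag(np.diag(upper))
    eigvals, eigvecs = np.linalg.eigh(A + N)
    return eigvecs[:, ::-1][:, :k]            # top-k eigenvectors of A + N
```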
3.2 Exponential Mechanism
Our new method, PPCA, randomly samples a k-dimensional subspace from a distribution that en-
sures differential privacy and is biased towards high utility. The distribution from which our re-
leased subspace is sampled is known in the statistics literature as the matrix Bingham distribution
(Chikuse, 2003), which we denote by BMF_k(B). The algorithm and its privacy properties apply to
general k < d but our theoretical results on the utility focus on the special case k = 1. The matrix
Bingham distribution takes values on the set of all k-dimensional subspaces of R^d and has a density
equal to

    f(V) = ( 1 / ₁F₁(k/2, d/2, B) ) exp( tr(V^T B V) ),    (6)

where V is a d × k matrix whose columns are orthonormal and ₁F₁(k/2, d/2, B) is a confluent hyper-
geometric function (Chikuse, 2003, p. 33).
By combining results on the exponential mechanism along with properties of the PCA algorithm,
we can show that this procedure is differentially private. In many cases, sampling from the distri-
bution specified by the exponential mechanism may be expensive computationally, especially for
continuous-valued outputs. We implement PPCA using a recently-proposed Gibbs sampler due to
Hoff (2009). Gibbs sampling is a popular Markov Chain Monte Carlo (MCMC) technique in which
samples are generated according to a Markov chain whose stationary distribution is the density in
Algorithm 2: Algorithm PPCA (exponential mechanism)
inputs: d × n data matrix X, privacy parameter ε_p, dimension k
outputs: d × k matrix V̂_k = [v̂_1 v̂_2 ··· v̂_k] with orthonormal columns
1. Set A = (1/n) XX^T.
2. Sample V̂_k ∼ BMF( n (ε_p/2) A ).
(6). Assessing the “burn-in time” and other factors for this procedure is an interesting question in
its own right; further details are in Section 6.2.
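For intuition about what PPCA samples, note that for k = 1 the density (6) with B = n(ε_p/2)A is proportional to exp((nε_p/2) v^T A v) over unit vectors v. The sketch below runs a simple random-walk Metropolis sampler for this k = 1 case; the perturb-and-renormalize proposal is treated as approximately symmetric, so this is only a rough illustration of the target distribution rather than the Hoff (2009) Gibbs sampler used in our experiments, and its burn-in would need the same kind of empirical assessment discussed in Section 6.2.

```python
import numpy as np

def ppca_top_direction(A, n, eps_p, iters=50000, step=0.05, rng=None):
    """Random-walk Metropolis sketch for the k = 1 PPCA output: a unit vector v
    with density proportional to exp((n * eps_p / 2) * v^T A v)."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    score = lambda u: 0.5 * n * eps_p * (u @ A @ u)
    current = score(v)
    for _ in range(iters):
        prop = v + step * rng.normal(size=d)   # perturb and project back to the sphere
        prop /= np.linalg.norm(prop)
        cand = score(prop)
        if np.log(rng.random()) < cand - current:
            v, current = prop, cand
    return v
```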
3.3 Other Approaches
There are other general algorithmic strategies for guaranteeing differential privacy. The sensitivity
method (Dwork et al., 2006b) adds noise proportional to the maximum change that can be induced
by changing a single point in the data set. Consider a data set D with m + 1 copies of a unit
vector u and m copies of a unit vector u' with u ⊥ u', and let D' have m copies of u and m + 1
copies of u'. Then v_1(D) = u but v_1(D') = u', so ‖v_1(D) − v_1(D')‖ = √2. Thus the global
sensitivity does not scale with the number of data points, so as n increases the variance of the
noise required by the sensitivity method will not decrease. An alternative to global sensitivity is
smooth sensitivity (Nissim et al., 2007). Except for special cases, such as the sample median,
smooth sensitivity is difficult to compute for general functions. A third method for computing
private, approximate solutions to high-dimensional optimization problems is objective perturbation
(Chaudhuri et al., 2011); to apply this method, we require the optimization problems to have certain
properties (namely, strong convexity and bounded norms of gradients), which do not apply to PCA.
3.4 Main Results
Our theoretical results are sample complexity bounds for PPCA and MOD-SULQ as well as a general
lower bound on the sample complexity for any ε_p-differentially private algorithm. These results
show that PPCA is nearly optimal in terms of the scaling of the sample complexity with respect
to the data dimension d, privacy parameter ε_p, and eigengap Δ. We further show that MOD-SULQ
requires more samples as a function of d, despite having a slightly weaker privacy guarantee. Proofs
are presented in Sections 4 and 5.
Even though both algorithms can output the top-k PCA subspace for general k ≤ d, we prove
results for the case k = 1. Finding the scaling behavior of the sample complexity with k is an
interesting open problem that we leave for future work; challenges here are finding the right notion
of approximation of the PCA, and extending the theory using packings of Grassmann or Stiefel
manifolds.
Theorem 5 For the β in (5), Algorithm MOD-SULQ is (ε_p, δ)-differentially private.

Theorem 6 Algorithm PPCA is ε_p-differentially private.
The fact that these two algorithms are differentially private follows from some simple calcu-
lations. Our first sample complexity result provides an upper bound on the number of samples
required by PPCA to guarantee a certain level of privacy and accuracy. The sample complexity of
PPCA grows linearly with the dimension d, inversely with ε_p, and inversely with the correlation gap
(1 − ρ) and eigenvalue gap Δ. These sample complexity results hold for k = 1.
Theorem 7 (Sample complexity of PPCA) If

    n > ( d / (ε_p (1 − ρ) Δ) ) ( (4 log(1/η))/d + 2 log( 8 λ_1 / ((1 − ρ^2) Δ) ) ),

then the top PCA direction v_1 and the output v̂_1 of PPCA with privacy parameter ε_p satisfy

    Pr( |⟨v_1, v̂_1⟩| > ρ ) ≥ 1 − η.

That is, PPCA is a (ρ, η)-close approximation to PCA.
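As a quick numerical illustration of Theorem 7, the snippet below evaluates the sample complexity bound exactly as stated above; all parameter values are made up for the example.

```python
import numpy as np

def ppca_sample_bound(d, eps_p, rho, eta, lambda1, gap):
    """Right-hand side of the Theorem 7 sample complexity bound (gap = eigengap)."""
    return (d / (eps_p * (1.0 - rho) * gap)) * (
        4.0 * np.log(1.0 / eta) / d
        + 2.0 * np.log(8.0 * lambda1 / ((1.0 - rho ** 2) * gap))
    )

# Illustrative values: d = 100, eps_p = 0.1, target rho = 0.9, eta = 0.05,
# top eigenvalue 0.5 and eigengap 0.2.
print(ppca_sample_bound(d=100, eps_p=0.1, rho=0.9, eta=0.05, lambda1=0.5, gap=0.2))
```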
Our second result shows a lower bound on the number of samples required by any ε_p-differentially-
private algorithm to guarantee a certain level of accuracy for a large class of data sets, and uses proof
techniques in Chaudhuri and Hsu (2011, 2012).

Theorem 8 (Sample complexity lower bound) Fix d ≥ 3, ε_p, and Δ ≤ 1/2, and let
1 − φ = exp( −2 · (ln 8 + ln(1 + exp(d)))/(d − 2) ). For any ρ ≥ 1 − (1 − φ)/16, no ε_p-differentially
private algorithm A can approximate PCA with expected utility greater than ρ on all databases with
n points in dimension d having eigenvalue gap Δ, where

    n < ( d / (ε_p Δ) ) · max{ 1, √( (1 − φ) / (80(1 − ρ)) ) }.
Theorem 7 shows that if n scales like (d/(ε_p (1 − ρ) Δ)) log( 1/(1 − ρ^2) ) then PPCA produces an
approximation v̂_1 that has correlation ρ with v_1, whereas Theorem 8 shows that n must scale like
d/(ε_p Δ √(1 − ρ)) for any ε_p-differentially private algorithm. In terms of scaling with d, ε_p and Δ,
the upper and lower bounds match, and they also match up to square-root factors with respect to the
correlation. By contrast, the following lower bound on the number of samples required by MOD-SULQ
to ensure a certain level of accuracy shows that MOD-SULQ has a less favorable scaling with dimension.
Theorem 9 (Sample complexity lower bound for MOD-SULQ) There are constants c and c' such
that if

    n < c ( d^{3/2} √(log(d/δ)) / ε_p ) (1 − c'(1 − ρ)),

then there is a data set of size n in dimension d such that the top PCA direction v_1 and the output v̂_1
of MOD-SULQ satisfy E[ |⟨v̂_1, v_1⟩| ] ≤ ρ.

Notice that the dependence on the dimension grows as d^{3/2} in SULQ as opposed to d in PPCA.
Dimensionality reduction via PCA is often used in applications where the data points occupy a low
dimensional space but are presented in high dimensions. These bounds suggest that PPCA is better
suited to such applications than MOD-SULQ.
4. Analysis of PPCA
In this section we provide theoretical guarantees on the performance of PPCA. The proof of The-
orem 6 follows from the results on the exponential mechanism (McSherry and Talwar, 2007). To
find the sample complexity of PPCA we bound the density of the Bingham distribution, leading to
a sample complexity for k = 1 that depends on the gap λ_1 − λ_2 between the top two eigenvalues.
We also prove a general lower bound on the sample complexity that holds for any ε_p-differentially
private algorithm. The lower bound matches our upper bound up to log factors, showing that PPCA
is nearly optimal in terms of the scaling with dimension, privacy ε_p, and utility q_A(·).
4.1 Privacy Guarantee
We first give a proof of Theorem 6.
Proof Let X be a data matrix whose i-th column is x_i and A = (1/n) XX^T. The PPCA algorithm is
the exponential mechanism of McSherry and Talwar (2007) applied to the score function n · v^T A v.
Consider X' = [x_1 x_2 ··· x_{n−1} x'_n] differing from X in a single column and let A' = (1/n) X' X'^T.
We have

    max_{v ∈ S^{d−1}} | n · v^T A' v − n · v^T A v | = max_{v ∈ S^{d−1}} | v^T ( x'_n x'^T_n − x_n x^T_n ) v |
                                                     = max_{v ∈ S^{d−1}} | (v^T x'_n)^2 − (v^T x_n)^2 |
                                                     ≤ 1.

The last step follows because ‖x_i‖ ≤ 1 for all i. The result now follows immediately from McSherry
and Talwar (2007, Theorem 6).
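The key fact used above is that the score n · v^T A v changes by at most 1 when a single data point is replaced. The following small Monte Carlo check (random unit vectors and a random pair of neighboring data sets; purely illustrative) probes this bound numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 200

def rand_unit(size):
    u = rng.normal(size=size)
    return u / np.linalg.norm(u, axis=0)

X = rand_unit((d, n))                 # columns with norm exactly 1 (so <= 1)
X_prime = X.copy()
X_prime[:, -1] = rand_unit((d,))      # neighboring data set: last column replaced

A, A_prime = X @ X.T / n, X_prime @ X_prime.T / n
worst = 0.0
for _ in range(10000):
    v = rand_unit((d,))
    worst = max(worst, abs(n * v @ A_prime @ v - n * v @ A @ v))
print(worst)   # stays below 1, consistent with the sensitivity bound
```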
4.2 Upper Bound on Utility
The results on the exponential mechanism bound the gap between the value of the function q_F(v̂_1) =
n · v̂_1^T A v̂_1 evaluated at the output v̂_1 of the mechanism and the optimal value q_F(v_1) = n · λ_1. We
derive a bound on the correlation q_A(v̂_1) = |⟨v̂_1, v_1⟩| via geometric arguments.

Lemma 10 (Lemmas 2.2 and 2.3 of Ball (1997)) Let µ be the uniform measure on the unit sphere
S^{d−1}. For any x ∈ S^{d−1} and 0 ≤ c < 1 the following bounds hold:

    (1/2) exp( −((d − 1)/2) log( 2/(1 − c) ) ) ≤ µ( { v ∈ S^{d−1} : ⟨v, x⟩ ≥ c } ) ≤ exp( −d c^2 / 2 ).
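A quick Monte Carlo check of the two bounds in Lemma 10, for illustrative values of d and c, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, trials = 50, 0.3, 200000

# Uniform points on the unit sphere: normalize Gaussian vectors.
V = rng.normal(size=(trials, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
x = np.zeros(d); x[0] = 1.0

cap_measure = np.mean(V @ x >= c)                        # estimate of mu({v : <v,x> >= c})
lower = 0.5 * np.exp(-((d - 1) / 2.0) * np.log(2.0 / (1.0 - c)))
upper = np.exp(-d * c ** 2 / 2.0)
print(lower, cap_measure, upper)   # lower <= estimate <= upper, up to Monte Carlo error
```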
We are now ready to provide a proof of Theorem 7.
Proof Fix a privacy level ε_p, target correlation ρ, and probability η. Let X be the data matrix,
B = (ε_p/2) XX^T, and let

    U_ρ = { u : |⟨u, v_1⟩| ≥ ρ }

be the union of the two spherical caps centered at ±v_1. Let Ū_ρ denote the complement of U_ρ in
S^{d−1}. An output vector v̂_1 is “good” if it is in U_ρ. We first give some bounds on the score function
q_F(u) on the boundary between U_ρ and Ū_ρ, where ⟨u, v_1⟩ = ±ρ. On this boundary, the function
q_F(u) is maximized when u is a linear combination of v_1 and v_2, the top two eigenvectors of A. It
is minimized when u is a linear combination of v_1 and v_d. Therefore

    q_F(u) ≤ (nε_p/2) ( ρ^2 λ_1 + (1 − ρ^2) λ_2 ),   u ∈ Ū_ρ,    (7)
    q_F(u) ≥ (nε_p/2) ( ρ^2 λ_1 + (1 − ρ^2) λ_d ),   u ∈ U_ρ.    (8)

Let µ(·) denote the uniform measure on the unit sphere. Then, fixing a 0 ≤ σ < 1 and using (7),
(8), and the fact that λ_d ≥ 0,

    P(Ū_ρ) ≤ P(Ū_ρ) / P(U_σ)
           = [ (1/₁F₁(k/2, d/2, B)) ∫_{Ū_ρ} exp(u^T B u) dµ ] / [ (1/₁F₁(k/2, d/2, B)) ∫_{U_σ} exp(u^T B u) dµ ]
           ≤ [ exp( n(ε_p/2)(ρ^2 λ_1 + (1 − ρ^2) λ_2) ) · µ(Ū_ρ) ] / [ exp( n(ε_p/2)(σ^2 λ_1 + (1 − σ^2) λ_d) ) · µ(U_σ) ]
           ≤ exp( −(nε_p/2)( σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) ) ) · µ(Ū_ρ)/µ(U_σ).    (9)

Applying the lower bound from Lemma 10 to the denominator of (9) and the upper bound µ(Ū_ρ) ≤ 1
yields

    P(Ū_ρ) ≤ exp( −(nε_p/2)( σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) ) ) · exp( ((d − 1)/2) log( 2/(1 − σ) ) ).    (10)

We must choose a σ^2 > ρ^2 to make the upper bound smaller than 1. More precisely,

    σ^2 > ρ^2 + (1 − ρ^2) λ_2/λ_1,
    1 − σ^2 < (1 − ρ^2)( 1 − λ_2/λ_1 ).

For simplicity, choose

    1 − σ^2 = (1/2)(1 − ρ^2)( 1 − λ_2/λ_1 ),

so that

    σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) = (1 − ρ^2) λ_1 − (1 − σ^2) λ_1 − (1 − ρ^2) λ_2
                                        = (1 − ρ^2)( λ_1 − (1/2)(λ_1 − λ_2) − λ_2 )
                                        = (1/2)(1 − ρ^2)(λ_1 − λ_2)

and

    log( 2/(1 − σ) ) < log( 4/(1 − σ^2) ) = log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ).

Setting the right hand side of (10) less than η yields

    (nε_p/4)(1 − ρ^2)(λ_1 − λ_2) > log(1/η) + ((d − 1)/2) log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ).

Because 1 − ρ < 1 − ρ^2, if we choose

    n > ( d / (ε_p (1 − ρ)(λ_1 − λ_2)) ) ( (4 log(1/η))/d + 2 log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ) ),

then the output of PPCA will produce a v̂_1 such that P( |⟨v̂_1, v_1⟩| < ρ ) < η.
4.3 Lower Bound on Utility
We now turn to a general lower bound on the sample complexity for any differentially private ap-
proximation to PCA. We construct K databases which differ in a small number of points whose top
eigenvectors are not too far from each other. For such a collection, Lemma 12 shows that for any
differentially private mechanism, the average correlation over the collection cannot be too large.
That is, any ε_p-differentially private mechanism cannot have high utility on all K data sets. The
remainder of the argument is to construct these K data sets.
The proof uses some simple eigenvalue and eigenvector computations. A 2 × 2 matrix of positive
entries

    A = [ a  b ]
        [ b  c ]    (11)

has characteristic polynomial

    det(A − λI) = λ^2 − (a + c)λ + (ac − b^2)

and eigenvalues

    λ = (1/2)(a + c) ± (1/2)√( (a + c)^2 − 4(ac − b^2) ) = (1/2)(a + c) ± (1/2)√( (a − c)^2 + 4b^2 ).

The eigenvectors are in the directions (b, λ − a)^T.
We will also need the following Lemma, which is proved in the Appendix.
Lemma 11 (Simple packing set) For φ ∈ [(2πd)^{−1/2}, 1), there exists a set of

    K = (1/8) exp( (d − 1) log( 1/√(1 − φ^2) ) )    (12)

vectors C in S^{d−1} such that for any pair µ, ν ∈ C, the inner product between them is upper bounded
by φ: |⟨µ, ν⟩| ≤ φ.
The following Lemma gives a lower bound on the expected utility averaged over a set of
databases which differ in a “small” number of elements.

Lemma 12 Let D_1, D_2, ..., D_K be K databases which differ in the value of at most ln(K − 1)/ε_p points,
and let u_1, ..., u_K be the top eigenvectors of D_1, D_2, ..., D_K. If A is any ε_p-differentially private
algorithm, then

    ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ) ).

Proof Let

    t = min_{i≠j} min( ‖u_i − u_j‖, ‖u_i + u_j‖ ),

and let G_i be the “double cap” around ±u_i of radius t/2:

    G_i = { u : ‖u − u_i‖ < t/2 } ∪ { u : ‖u + u_i‖ < t/2 }.

We claim that

    ∑_{i=1}^{K} P_A( A(D_i) ∉ G_i ) ≥ (1/2)(K − 1).    (13)

The proof is by contradiction. Suppose the claim is false. Because all of the caps G_i are disjoint,
and applying the definition of differential privacy,

    (1/2)(K − 1) > ∑_{i=1}^{K} P_A( A(D_i) ∉ G_i )
                 ≥ ∑_{i=1}^{K} ∑_{i'≠i} P_A( A(D_i) ∈ G_{i'} )
                 ≥ ∑_{i=1}^{K} ∑_{i'≠i} e^{−ε_p · ln(K−1)/ε_p} P_A( A(D_{i'}) ∈ G_{i'} )
                 = (K − 1) · (1/(K − 1)) · ∑_{i'=1}^{K} P_A( A(D_{i'}) ∈ G_{i'} )
                 ≥ K − (1/2)(K − 1),

which is a contradiction, so (13) holds. Therefore by the Markov inequality

    ∑_{i=1}^{K} E_A[ min( ‖A(D_i) − u_i‖^2, ‖A(D_i) + u_i‖^2 ) ] ≥ ∑_{i=1}^{K} P( A(D_i) ∉ G_i ) · (t^2/4)
                                                                 ≥ (1/8)(K − 1) t^2.

Rewriting the norms in terms of inner products shows

    2K − 2 ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≥ (1/8)(K − 1)( 2 − 2 max_{i≠j} ⟨u_i, u_j⟩ ),

so

    ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/8) · ((K − 1)/K) · (1 − max_{i≠j} ⟨u_i, u_j⟩) )
                                      ≤ K ( 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ) ).
We can now prove Theorem 8.
Proof From Lemma 12, given a set of K databases differing in ln(K − 1)/ε_p points with top eigenvectors
{u_i : i = 1, 2, ..., K}, for at least one database i,

    E_A[ |⟨A(D_i), u_i⟩| ] ≤ 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ )

for any ε_p-differentially private algorithm A. Setting the left side equal to some target ρ,

    1 − ρ ≥ (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ).    (14)

So our goal is to construct these databases such that the inner product between their top eigenvectors
is small.
Let y = e_d, the d-th coordinate vector, and let φ ∈ ((2πd)^{−1/2}, 1). Lemma 11 shows that there
exists a packing W = {w_1, w_2, ..., w_K} of the sphere S^{d−2} spanned by the first d − 1 elementary
vectors {e_1, e_2, ..., e_{d−1}} such that max_{i≠j} |⟨w_i, w_j⟩| ≤ φ, where

    K = (1/8)(1 − φ)^{−(d−2)/2}.

Choose φ such that ln(K − 1) = d. This means

    1 − φ = exp( −2 · (ln 8 + ln(1 + exp(d)))/(d − 2) ).

The right side is minimized for d = 3, but this leads to a weak lower bound 1 − φ > 3.5 × 10^{−5}. By
contrast, for d = 100, the bound is 1 − φ > 0.12. In all cases, 1 − φ is at least a constant value.
We construct a database with n points for each w_i. Let β = d/(nε_p). For now, we assume that
β ≤ Δ ≤ 1/2. The other case, when β > Δ, will be considered later. Because β ≤ Δ, we have n > d/(ε_p Δ).
The construction uses a parameter 0 ≤ m ≤ 1 that will be set as a function of the eigenvalue gap Δ.
We will derive conditions on n based on the requirements on d, ε_p, ρ, and Δ. For i = 1, 2, ..., K let
the data set D_i contain

    n(1 − β) copies of √m · y,
    nβ copies of z_i = (1/√2) y + (1/√2) w_i.

Thus data sets D_i and D_j differ in the values of nβ = ln(K − 1)/ε_p individuals. The second moment
matrix A_i of D_i is

    A_i = ( (1 − β)m + (1/2)β ) y y^T + (1/2)β ( w_i y^T + y w_i^T ) + (1/2)β w_i w_i^T.

By choosing a basis containing y and w_i, we can write this as

    A_i = [ (1 − β)m + (1/2)β   (1/2)β   0 ]
          [ (1/2)β              (1/2)β   0 ]
          [ 0                   0        0 ].

This is in the form (11), with a = (1 − β)m + (1/2)β, b = (1/2)β, and c = (1/2)β.
The matrix A_i has two nonzero eigenvalues given by

    λ = (1/2)(a + c) + (1/2)√( (a − c)^2 + 4b^2 ),    (15)
    λ' = (1/2)(a + c) − (1/2)√( (a − c)^2 + 4b^2 ).

The gap between the top two eigenvalues is

    Δ = √( (a − c)^2 + 4b^2 ) = √( m^2 (1 − β)^2 + β^2 ).

We can thus set m in the construction to ensure an eigengap of Δ:

    m = √( Δ^2 − β^2 ) / (1 − β).    (16)

The top eigenvector of A_i is given by

    u_i = ( b / √( b^2 + (a − λ)^2 ) ) y + ( (a − λ) / √( b^2 + (a − λ)^2 ) ) w_i,

where λ is given by (15). Therefore

    max_{i≠j} ⟨u_i, u_j⟩ ≤ b^2 / (b^2 + (a − λ)^2) + ( (a − λ)^2 / (b^2 + (a − λ)^2) ) max_{i≠j} ⟨w_i, w_j⟩
                         ≤ 1 − ( (a − λ)^2 / (b^2 + (a − λ)^2) )(1 − φ).    (17)

To obtain an upper bound on max_{i≠j} ⟨u_i, u_j⟩ we must lower bound (a − λ)^2 / (b^2 + (a − λ)^2).
Since x/(ν + x) is monotonically increasing in x when ν > 0, we will find a lower bound on
(a − λ)^2. Observe that from (15),

    λ − a = b^2 / (λ − c).

So to lower bound λ − a we need to upper bound λ − c. We have

    λ − c = (1/2)(a − c) + (1/2)Δ = (1/2)( (1 − β)m + Δ ).

Because b = β/2,

    (λ − a)^2 > ( β^2 / (2((1 − β)m + Δ)) )^2 = β^4 / ( 4((1 − β)m + Δ)^2 ).

Now,

    (a − λ)^2 / (b^2 + (a − λ)^2) > β^4 / ( β^2 ((1 − β)m + Δ)^2 + β^4 )
                                  = β^2 / ( β^2 + ((1 − β)m + Δ)^2 )
                                  > β^2 / (5Δ^2),    (18)

where the last step follows by plugging in m from (16) and using the fact that β ≤ Δ. Putting it all
together, we have from (14), (17), and (18), and using the fact that φ is such that ln(K − 1) = d and
β = d/(nε_p),

    1 − ρ ≥ (1/16) · ( (a − λ)^2 / (b^2 + (a − λ)^2) )(1 − φ) > ((1 − φ)/80) · β^2/Δ^2 = ((1 − φ)/80) · d^2/(n^2 ε_p^2 Δ^2),

which implies

    n > ( d / (ε_p Δ) ) √( (1 − φ) / (80(1 − ρ)) ).

Thus for β ≤ Δ ≤ 1/2, any ε_p-differentially private algorithm needs Ω( d / (ε_p Δ √(1 − ρ)) ) points to get
expected inner product ρ on all data sets with eigengap Δ.
We now consider the case where β > Δ. We choose a slightly different construction here. The
i-th database now consists of n(1 − β) copies of the 0 vector, and nβ copies of √(Δ/β) · w_i. Thus, every
pair of databases differ in the values of nβ = ln(K − 1)/ε_p people, and the gap between the top two
eigenvalues is β · (Δ/β) = Δ.
As the top eigenvector of the i-th database is u_i = w_i,

    max_{i≠j} |⟨u_i, u_j⟩| = max_{i≠j} |⟨w_i, w_j⟩| ≤ φ.

Combining this with (14), we obtain

    1 − ρ ≥ (1/16)(1 − φ),

which provides the additional condition in the Theorem.
5. Analysis of MOD-SULQ
In this section we provide theoretical guarantees on the performance of the MOD-SULQ algorithm.
Theorem 5 shows that MOD-SULQ is (ε_p, δ)-differentially private. Theorem 15 provides a lower
bound on the distance between the vector released by MOD-SULQ and the true top eigenvector in
terms of the privacy parameters ε_p and δ and the number of points n in the data set. This im-
plicitly gives a lower bound on the sample complexity of MOD-SULQ. We provide some graphical
illustration of this tradeoff.
The following upper bound will be useful for future calculations: for two unit vectors x and y,

    ∑_{1≤i≤j≤d} ( x_i x_j − y_i y_j )^2 ≤ 2.    (19)
Note that this upper bound is achievable by setting x and y to be orthogonal elementary vectors.
5.1 Privacy Guarantee
We first justify the choice of β^2 in the MOD-SULQ algorithm by proving Theorem 5.
Proof Let B and B̂ be two independent symmetric random matrices where {B_ij : 1 ≤ i ≤ j ≤ d} and
{B̂_ij : 1 ≤ i ≤ j ≤ d} are each sets of i.i.d. Gaussian random variables with mean 0 and variance β^2.
Consider two data sets D = {x_i : i = 1, 2, ..., n} and D̂ = D ∪ {x̂_n} \ {x_n} and let A and Â denote
their second moment matrices. Let G = A + B and Ĝ = Â + B̂. We first calculate the log ratio of the
densities of G and Ĝ at a point H:

    log( f_G(H) / f_Ĝ(H) )
        = ∑_{1≤i≤j≤d} ( −(1/(2β^2))(H_ij − A_ij)^2 + (1/(2β^2))(H_ij − Â_ij)^2 )
        = (1/(2β^2)) ∑_{1≤i≤j≤d} ( (2/n)(H_ij − A_ij)(x_{n,i} x_{n,j} − x̂_{n,i} x̂_{n,j}) + (1/n^2)(x̂_{n,i} x̂_{n,j} − x_{n,i} x_{n,j})^2 ).

From (19) the last term is upper bounded by 2/n^2. To upper bound the first term,

    ∑_{1≤i≤j≤d} | x̂_{n,i} x̂_{n,j} − x_{n,i} x_{n,j} | ≤ 2 max_{a : ‖a‖ ≤ 1} ∑_{1≤i≤j≤d} a_i a_j ≤ 2 · (1/2)(d^2 + d) · (1/d) = d + 1.

Note that this bound is not too loose—by taking x̂ = d^{−1/2} 1 and x = (1, 0, ..., 0)^T, this term is still
linear in d.
Then for any measurable set S of matrices,

    P(G ∈ S) ≤ exp( (1/(2β^2))( (2/n)(d + 1)γ + 3/n^2 ) ) P(Ĝ ∈ S) + P( B_ij > γ for some i, j ).    (20)

To handle the last term, use a union bound over the (d^2 + d)/2 variables {B_ij} together with the
tail bound, which holds for γ > β:

    P( B_ij > γ ) ≤ (1/√(2π)) e^{−γ^2/(2β^2)}.

Thus setting P( B_ij > γ for some i, j ) = δ yields the condition

    δ = ( (d^2 + d)/(2√(2π)) ) e^{−γ^2/(2β^2)}.

Rearranging to solve for γ gives

    γ = max( β, β √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) ) = β √( 2 log( (d^2 + d)/(2δ√(2π)) ) )

for d > 1 and δ < 3/√(2πe). This then gives an expression for ε_p to make (20) imply (ε_p, δ)-differ-
ential privacy:

    ε_p = (1/(2β^2))( (2/n)(d + 1)γ + 2/n^2 )
        = (1/(2β^2))( (2/n)(d + 1) β √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 2/n^2 ).

Solving for β using the quadratic formula yields the particularly messy expression in (5):

    β = ( (d + 1)/(2nε_p) ) √( 2 log( (d^2 + d)/(2δ√(2π)) ) )
        + ( 1/(2nε_p) ) ( 2(d + 1)^2 log( (d^2 + d)/(2δ√(2π)) ) + 4ε_p )^{1/2}
      ≤ ( (d + 1)/(nε_p) ) √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 1/(n√ε_p).
5.2 Proof of Theorem 9
The main tool in our lower bound on the utility of MOD-SULQ is a generalization by Yu (1997) of an
information-theoretic inequality due to Fano.
Theorem 13 (Fano’s inequality (Yu, 1997)) Let R be a set and Θ be a parameter space with a
pseudo-metric d(·,·). Let F be a set of r densities {f_1, ..., f_r} on R corresponding to parameter
values {θ_1, ..., θ_r} in Θ. Let X have distribution f ∈ F with corresponding parameter θ and let
θ̂(X) be an estimate of θ. If, for all i and j,

    d(θ_i, θ_j) ≥ τ    and    KL( f_i ‖ f_j ) ≤ γ,

then

    max_j E_j[ d(θ̂, θ_j) ] ≥ (τ/2)( 1 − (γ + log 2)/log r ),

where E_j[·] denotes the expectation with respect to distribution f_j.
To use this inequality, we will construct a set of densities on the set of covariance matrices
corresponding to the distribution of the random matrix in the MOD-SULQ algorithm under different
inputs. These inputs will be chosen using a set of unit vectors which form a packing of the surface
of the unit sphere.
Lemma 14 Let Σ be a positive definite matrix and let f denote the density N(a, Σ) and g denote
the density N(b, Σ). Then KL( f ‖ g ) = (1/2)(a − b)^T Σ^{−1} (a − b).

Proof This is a simple calculation:

    KL( f ‖ g ) = E_{x∼f}[ −(1/2)(x − a)^T Σ^{−1}(x − a) + (1/2)(x − b)^T Σ^{−1}(x − b) ]
                = (1/2)( a^T Σ^{−1} a − a^T Σ^{−1} b − b^T Σ^{−1} a + b^T Σ^{−1} b )
                = (1/2)(a − b)^T Σ^{−1}(a − b).
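A short numerical sanity check of Lemma 14, comparing the closed form with a Monte Carlo estimate of the divergence for a small random instance (all values illustrative), is given below.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
M = rng.normal(size=(p, p))
Sigma = M @ M.T + p * np.eye(p)          # a positive definite covariance
a, b = rng.normal(size=p), rng.normal(size=p)

Sigma_inv = np.linalg.inv(Sigma)
closed_form = 0.5 * (a - b) @ Sigma_inv @ (a - b)

# Monte Carlo estimate of E_{x ~ N(a, Sigma)}[log f(x) - log g(x)].
L = np.linalg.cholesky(Sigma)
x = a + rng.normal(size=(200000, p)) @ L.T
def log_density(x, mean):
    diff = x - mean
    return -0.5 * np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)
mc = np.mean(log_density(x, a) - log_density(x, b))
print(closed_form, mc)   # the two values agree up to Monte Carlo error
```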
The next theorem is a lower bound on the expected distance between the vector output by MOD-
SULQ and the true top eigenvector. In order to get this lower bound, we construct a class of data
sets and use Theorem 13 to derive a bound on the minimax error over the class.
Theorem 15 (Utility bound for MOD-SULQ) Let d, n, and ε_p > 0 be given and let β be given by
Algorithm 1 so that the output of MOD-SULQ is (ε_p, δ)-differentially private for all data sets in R^d
with n elements. Then there exists a data set with n elements such that if v̂_1 denotes the output of
MOD-SULQ and v_1 is the top eigenvector of the empirical covariance matrix of the data set, the
expected correlation ⟨v̂_1, v_1⟩ is upper bounded:

    E[ |⟨v̂_1, v_1⟩| ] ≤ min_{φ∈Φ} ( 1 − ((1 − φ)/4)( 1 − (1/β^2 + log 2) / ( (d − 1) log( 1/√(1 − φ^2) ) − log 8 ) )^2 ),    (21)

where

    Φ = [ max{ 1/√(2πd), √( 1 − exp( −2 log(8d)/(d − 1) ) ), √( 1 − exp( −(2/β^2 + log 256)/(d − 1) ) ) }, 1 ).    (22)

Proof For φ ∈ [(2πd)^{−1/2}, 1), Lemma 11 shows there exists a set of K unit vectors C such that for
µ, ν ∈ C, the inner product between them satisfies |⟨µ, ν⟩| ≤ φ, where K is given by (12). Note that
for small φ this setting of K is loose, but any orthonormal basis provides d unit vectors which are
orthogonal, so setting K = d and solving for φ yields

    φ ≥ ( 1 − exp( −2 log(8d)/(d − 1) ) )^{1/2}.

Setting the lower bound on φ to the maximum of these two yields the set of φ and K which we will
consider in (22).
For any unit vector µ, let

    A(µ) = µ µ^T + N,

where N is a d × d symmetric random matrix such that {N_ij : 1 ≤ i ≤ j ≤ d} are i.i.d. N(0, β^2),
where β^2 is the noise variance used in the MOD-SULQ algorithm. Due to symmetry, the matrix A(µ)
can be thought of as a jointly Gaussian random vector on the d(d + 1)/2 variables {A_ij(µ) : 1 ≤ i ≤ j ≤ d}.
The mean of this vector is

    µ̄ = ( µ_1^2, µ_2^2, ..., µ_d^2, µ_1 µ_2, µ_1 µ_3, ..., µ_{d−1} µ_d )^T,

and the covariance is β^2 I_{d(d+1)/2}. Let f_µ denote the density of this vector.
For µ, ν ∈ C, the divergence between f_µ and f_ν can be calculated using Lemma 14:

    KL( f_µ ‖ f_ν ) = (1/2)(µ̄ − ν̄)^T Σ^{−1}(µ̄ − ν̄) = (1/(2β^2)) ‖µ̄ − ν̄‖^2 ≤ 1/β^2.    (23)

The last line follows from the fact that the vectors in C are unit norm.
For any two vectors µ, ν ∈ C, lower bound the Euclidean distance between them using the upper
bound on the inner product:

    ‖µ − ν‖ ≥ √( 2(1 − φ) ).    (24)

Let Θ = S^{d−1} with the Euclidean norm and R be the set of distributions {A(µ) : µ ∈ Θ}. From
(24) and (23), the set C satisfies the conditions of Theorem 13 with F = {f_µ : µ ∈ C}, r = K,
τ = √(2(1 − φ)), and γ = 1/β^2. The conclusion of the Theorem shows that for MOD-SULQ,

    max_{µ∈C} E_{f_µ}[ ‖v̂ − µ‖ ] ≥ ( √(2(1 − φ)) / 2 )( 1 − (1/β^2 + log 2)/log K ).    (25)

This lower bound is vacuous when the term inside the parenthesis is negative, which imposes further
conditions on φ. Setting log K = 1/β^2 + log 2, we can solve to find another lower bound on φ:

    φ ≥ √( 1 − exp( −(2/β^2 + log 256)/(d − 1) ) ).

This yields the third term in (22). Note that for larger n this term will dominate the others.
Using Jensen’s inequality on the left side of (25):

    max_{µ∈C} E_{f_µ}[ 2(1 − |⟨v̂, µ⟩|) ] ≥ ((1 − φ)/2)( 1 − (1/β^2 + log 2)/log K )^2.

So there exists a µ ∈ C such that

    E_{f_µ}[ |⟨v̂, µ⟩| ] ≤ 1 − ((1 − φ)/4)( 1 − (1/β^2 + log 2)/log K )^2.    (26)

Consider the data set consisting of n copies of µ. The corresponding covariance matrix is µµ^T with
top eigenvector v_1 = µ. The output of the algorithm MOD-SULQ applied to this data set is an esti-
mator of µ and hence satisfies (26). Minimizing over φ gives the desired bound.
The minimization over φ in (21) does not lead to analytically pretty results, so we plotted the
results in Figure 1 in order to get a sense of the bounds. Figure 1 shows the upper bound on the
expected correlation E[ |⟨v̂_1, v_1⟩| ] as a function of the number of data points (given on a logarithmic
scale). Each panel shows a different dimension, from d = 50 to d = 1000, and plots are given for
different values of ε_p ranging from 0.01 to 2. In all experiments we set δ = 0.01. In high dimension,
the bound shows that the expected performance of MOD-SULQ is poor when there are a small
number of data points. This limitation may be particularly acute when the data lies in a very low
dimensional subspace but is presented in very high dimension. In such “sparse” settings, perturbing
the input as in MOD-SULQ is not a good approach. However, in lower dimensions and data-rich
regimes, the performance may be more favorable.
A little calculation yields the sample complexity bound in Theorem 9.
Proof Suppose E[ |⟨v̂_1, v_1⟩| ] = ρ. Then a little algebra shows

    2√(1 − ρ) ≥ min_{φ∈Φ} √(1 − φ) ( 1 − (1/β^2 + log 2) / ( (d − 1) log( 1/√(1 − φ^2) ) − log 8 ) ).
Figure 1: Upper bound from Theorem 15 on the expected correlation between the true top eigenvec-
tor and the v̂_1 produced by MOD-SULQ. The horizontal axis is log_10(n) and the vertical
axis shows the bound in (21). The four panels correspond to different values of the
dimension d, from 50 to 1000. Each panel contains plots of the bound for different values
of ε_p.
Setting φ such that (d − 1) log( 1/√(1 − φ^2) ) − log 8 = 2(1/β^2 + log 2), we have

    4√(1 − ρ) ≥ √(1 − φ).

Since we are concerned with the scaling behavior for large d and n, this implies

    log( 1/√(1 − φ^2) ) = Θ( 1/(β^2 d) ),

so

    φ = √( 1 − exp( −Θ(1/(β^2 d)) ) ) = Θ( √( 1/(β^2 d) ) ).

From Algorithm 1, for some constant c_1 we have the following lower bound on β:

    β^2 > c_1 ( d^2 / (n^2 ε_p^2) ) log(d/δ).

Substituting, we get for some constants c_2 and c_3 that

    ( 1 − c_2(1 − ρ) ) ≤ c_3 ( n^2 ε_p^2 ) / ( d^3 log(d/δ) ).

Now solving for n shows

    n ≥ c ( d^{3/2} √(log(d/δ)) / ε_p )( 1 − c'(1 − ρ) ).
6. Experiments
We next turn to validating our theoretical results on real data. We implemented MOD-SULQ and
PPCA in order to test our theoretical bounds. Implementing PPCA involved using a Gibbs sampling
procedure (Hoff, 2009). A crucial parameter in MCMC procedures is the burn-in time, which is
how long the chain must be run for it to reach its stationary distribution. Theoretically, chains reach
their stationary distribution only in the limit; however, in practice MCMC users must sample after
some finite time. In order to use this procedure appropriately, we determined a burn-in time using
our data sets. The interaction of MCMC procedures and differential privacy is a rich area for future
research.
6.1 Data and Preprocessing
We report on the performance of our algorithm on some real data sets. We chose four data sets
from four different domains—kddcup99 (Bache and Lichman, 2013), which includes features of
494,021 network connections, census (Bache and Lichman, 2013), a demographic data set on
199,523 individuals, localization (Kaluža et al., 2010), a medical data set with 164,860 instances
of sensor readings on individuals engaged in different activities, and insurance (van der Putten and
van Someren, 2000), a data set on product usage and demographics of 9,822 individuals.
These data sets contain a mix of continuous and categorical features. We preprocessed each
data set by converting a feature with q discrete values to a vector in {0,1}^q; after preprocessing, the
data sets kddcup99, census, localization and insurance have dimensions 116, 513, 44 and 150,
respectively. We also normalized each row so that each entry has maximum value 1, and normalized
each column such that the maximum (Euclidean) column norm is 1. We chose k = 4 for kddcup,
k = 8 for census, k = 10 for localization and k = 11 for insurance; in each case, the utility
q_F(V_k) of the top-k PCA subspace of the data matrix accounts for at least 80% of ‖A‖_F. Thus, all
four data sets, although fairly high dimensional, have good low-dimensional representations. The
properties of each data set are summarized in Table 1.
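A sketch of this preprocessing (one-hot encoding of categorical features, scaling each feature so its entries are at most 1 in magnitude, and scaling the points so every column norm is at most 1) is shown below; the toy input and all names are made up, and the scripts used for the experiments may differ in details.

```python
import numpy as np

def one_hot(column):
    """Map a categorical column (length n) to an n x q array of 0/1 indicators."""
    values = sorted(set(column))
    return np.array([[1.0 if v == u else 0.0 for u in values] for v in column])

def preprocess(numeric, categorical):
    """numeric: n x p array; categorical: list of length-n label columns.
    Returns a d x n matrix X with scaled features and column norms <= 1."""
    blocks = [np.asarray(numeric, dtype=float)] + [one_hot(c) for c in categorical]
    Z = np.concatenate(blocks, axis=1)              # n x d, one row per individual
    # Scale each feature so its largest absolute entry is 1.
    Z = Z / np.maximum(np.abs(Z).max(axis=0), 1e-12)
    # Scale every point by the largest point norm, so all column norms are <= 1.
    Z = Z / max(np.linalg.norm(Z, axis=1).max(), 1e-12)
    return Z.T                                      # d x n, columns are data points

# Toy example with two numeric features and one categorical feature.
X = preprocess(numeric=[[0.5, 3.0], [1.5, 1.0], [2.5, 2.0]],
               categorical=[["tcp", "udp", "tcp"]])
print(X.shape, np.linalg.norm(X, axis=0).max())     # column norms are at most 1
```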
6.2 Implementation of Gibbs Sampling
The theoretical analysis of PPCA uses properties of the Bingham distribution BMF_k(·) given in (6).
To implement this algorithm for experiments we use a Gibbs sampler due to Hoff (2009). The Gibbs
sampling scheme induces a Markov Chain, the stationary distribution of which is the density in (6).
Data Set        #instances   #dimensions   k    q_F(V_k)   q_F(V_k)/‖A‖_F
kddcup          494,021      116           4    0.6587     0.96
census          199,523      513           8    0.7321     0.81
localization    164,860      44            10   0.5672     0.81
insurance       9,822        150           11   0.5118     0.81

Table 1: Parameters of each data set. The third column is the number of dimensions after prepro-
cessing, k is the dimensionality of the PCA, the fifth column contains q_F(V_k), where V_k is
the top-k PCA subspace, and the last column is the normalized utility q_F(V_k)/‖A‖_F.
Gibbs sampling and other MCMC procedures are widely used in statistics, scientific modeling, and
machine learning to estimate properties of complex distributions (Brooks, 1998).
Finding the speed of convergence of MCMC methods is still an open area of research. There has
been much theoretical work on estimating convergence times (Jones and Hobart, 2004; Douc et al.,
2004; Jones and Hobart, 2001; Roberts, 1999; Roberts and Sahu, 2001; Rosenthal, 1995; Kolassa,
1999, 2000), but unfortunately, most theoretical guarantees are available only in special cases and
are often too weak for practical use. In lieu of theoretical guarantees, users of MCMC methods
empirically estimate the burn-in time, or the number of iterations after which the chain is sufficiently
close to its stationary distribution. Statisticians employ a range of diagnostic methods and statistical
tests to empirically determine if the Markov chain is close to stationarity (Cowles and Carlin, 1996;
Brooks and Roberts, 1998; Brooks and Gelman, 1998; El Adlouni et al., 2006). These tests do not
provide a sufficient guarantee of stationarity, and there is no “best test” to use. In practice, the
convergence of derived statistics is used to estimate an appropriate burn-in time. In the case of the
Bingham distribution, Hoff (2009) performs qualitative measures of convergence. Developing a better
characterization of the convergence of this Gibbs sampler is an important question for future work.
Because the MCMC procedure of Hoff (2009) does not come with convergence-time guaran-
tees, for our experiments we had to choose an appropriate burn-in time. The “ideal” execution of
PPCA provides ε_p-differential privacy, but because our implementation only approximates sampling
from the Bingham distribution, we cannot guarantee that this implementation provides the privacy
guarantee. As noted by Mironov (2012), even current implementations of floating-point arithmetic
may suffer from privacy problems, so there is still significant work to do between theory and imple-
mentation. For this paper we tried to find a burn-in time that was sufficiently long so that we could
be confident that the empirical performance of PPCA was not affected by the initial conditions of
the sampler.
In order to choose an appropriate burn-in time, we examined the time series trace of the Markov
Chain. We ran l copies, or traces, of the chain, starting from l different initial locations drawn
uniformly from the set of all d × k matrices with orthonormal columns. Let X_i(t) be the output of
the i-th copy at iteration t, and let U be the top-k PCA subspace of the data. We used the following
statistic as a function of iteration T :
    F_k^i(T) = (1/k) ‖ (1/T) ∑_{t=1}^{T} X_i(t) ‖_F,

where ‖·‖_F is the Frobenius norm. The matrix Bingham distribution has mean 0, and hence with
increasing T, the statistic F_k^i(T) should converge to 0.
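A sketch of this convergence diagnostic is shown below; it assumes gibbs_samples is the list of d × k matrices X_i(t) produced by one trace of the sampler, and the fake trace at the end is only there to exercise the function.

```python
import numpy as np

def convergence_statistic(gibbs_samples, k):
    """F_k^i(T) = (1/k) * || (1/T) * sum_{t=1}^T X_i(t) ||_F for T = 1, ..., len(samples)."""
    running_sum = np.zeros_like(gibbs_samples[0])
    stats = []
    for T, sample in enumerate(gibbs_samples, start=1):
        running_sum += sample
        stats.append(np.linalg.norm(running_sum / T, ord='fro') / k)
    return np.array(stats)

# Fake "samples": for mean-zero noise the statistic decays toward 0 as T grows.
rng = np.random.default_rng(4)
fake_trace = [rng.normal(size=(44, 10)) for _ in range(1000)]
print(convergence_statistic(fake_trace, k=10)[[0, 99, 999]])
```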
Figure 2: Plots of log F_k^i(T) for five different traces (values of i) on two different data sets. Figure
2(a) shows log F_k^i(T) for k = 4 as a function of iteration T for 40,000 steps of the
Gibbs sampler on the kddcup data set. Figure 2(b) shows the same for the insurance
data set.
Figure 2 illustrates the behavior of the Gibbs sampler. The plots show the value of log F_k^i(T) as
a function of the Markov chain iteration for 5 different restarts of the MCMC procedure for two data
sets, kddcup and insurance. The initial starting points were chosen uniformly from the set of all
d × k matrices with orthonormal columns. The plots show that F_k^i(T) decreases rapidly after a few
thousand iterations, and is less than 0.01 after T = 20,000 in both cases. log F_k^i(T) also appears to
have a larger variance for kddcup than for insurance; this is explained by the fact that the kddcup
data set has a much larger number of samples, which makes its stationary distribution farther from
the initial distribution of the sampler. Based on these and other simulations, we observed that
the Gibbs sampler converges to F_k(t) < 0.01 at t = 20,000 when run on data with a few hundred
dimensions and with k between 5 and 10; we thus chose to run the Gibbs sampler for T = 20,000
time steps for all the data sets.
Our simulations indicate that the chains converge fairly rapidly, particularly when ‖A − A_k‖_F is
small so that A_k is a good approximation to A. Convergence is slower for larger n when the initial
state is chosen from the uniform distribution over all d × k matrices with orthonormal columns; this
is explained by the fact that for larger n, the stationary distribution is farther in variation distance
from the starting distribution, which results in a longer convergence time.
6.3 Scaling with Data Set Size
We ran three algorithms on these data sets: standard (non-private) PCA, MOD-SULQ, and PPCA.
As a sanity check, we also tried a uniformly generated random projection; since this projection is
data-independent we would expect it to have low utility. We measured the utility q_F(U), where U is
the k-dimensional subspace output by the algorithm; q_F(U) is maximized when U is the top-k PCA
subspace, and thus this reflects how close the output subspace is to the true PCA subspace in terms
of representing the data. Although our theoretical results hold for q_A(·), the “energy” q_F(·) is more
relevant in practice for larger k.
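As a rough sketch of how this comparison can be carried out, the snippet below scores a candidate subspace against the non-private optimum and a random baseline. It assumes the captured-energy form q_F(U) = tr(U^T A U) with A the second moment matrix of the data; this matches the description of q_F as the energy captured by U, but the precise definition is the one given earlier in the paper, and the helper names here are illustrative only.

import numpy as np

def q_F(U, A):
    # Assumed captured-energy utility: trace(U^T A U) for a d x k matrix U
    # with orthonormal columns and second moment matrix A.
    return np.trace(U.T @ A @ U)

def top_k_subspace(A, k):
    # Non-private PCA baseline: eigenvectors for the k largest eigenvalues of A.
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def random_subspace(d, k, rng):
    # Data-independent baseline: a uniformly random k-dimensional subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

# Example use on a data matrix X (one row per individual):
#   A = X.T @ X / len(X)
#   rng = np.random.default_rng(0)
#   print(q_F(top_k_subspace(A, k), A), q_F(random_subspace(X.shape[1], k, rng), A))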
To investigate how well these different algorithms performed on real data, for each data set
we subsampled data sets of different sizes n uniformly and ran the algorithms on the subsets. We
chose ε_p = 0.1 for this experiment, and for MOD-SULQ we used δ = 0.01. We averaged over 5
such subsets and over several instances of the randomized algorithms (10 restarts for PPCA and 100
for MOD-SULQ and random projections). For each subset and instance we calculated the resulting
utility q_F(·) of the output subspace.
Figures 3(a), 3(b), 4(a), and 4(b) show q_F(U) as a function of the subsampled data set sizes.
The bars indicate the standard deviation over the restarts (from subsampling the data and random
sampling for privacy). The non-private algorithm achieved q_F(V_k) for nearly all subset sizes (see
Table 1 for the values). These plots illustrate how additional data can improve the utility of the
output for a fixed privacy level ε_p. As n increases, the dashed blue line indicating the utility of
PPCA begins to approach q_F(V_k), the utility of the optimal subspace.
These experiments also show that the performance of PPCA is significantly better than that of
MOD-SULQ, and MOD-SULQ produces subspaces whose utility is on par with randomly choosing a
subspace. The only exception to this latter point is localization; we believe this is because d is
much lower for this data set (d = 44), which shows that for low dimension and large n, MOD-SULQ
may produce subspaces with reasonable utility. Furthermore, MOD-SULQ is simpler and hence runs
faster than PPCA, which requires running the Gibbs sampler past the burn-in time. Our theoretical
results suggest that the performance of differentially private PCA cannot be significantly improved
over the performance of PPCA, but since those results hold for k = 1 they do not immediately apply
here.
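For concreteness, the following is a hedged sketch of the input-perturbation approach that MOD-SULQ exemplifies: perturb the second moment matrix with symmetric Gaussian noise and return its top-k eigenvectors. The noise scale beta is left as a placeholder; the calibrated value used by MOD-SULQ, which depends on ε_p, δ, d, and n, is specified earlier in the paper and is not reproduced here.

import numpy as np

def perturbed_pca_subspace(X, k, beta, rng):
    # Input-perturbation sketch in the spirit of MOD-SULQ; beta is a
    # placeholder noise scale, not the calibrated value from the paper.
    n, d = X.shape
    A = X.T @ X / n                       # second moment matrix of the data
    G = rng.standard_normal((d, d)) * beta
    G = (G + G.T) / 2                     # symmetrize so the noisy matrix stays symmetric
    eigvals, eigvecs = np.linalg.eigh(A + G)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]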
6.4 Effect of Privacy on Classification
A common use of a dimension reduction algorithm is as a precursor to classification or clustering;
to evaluate the effectiveness of the different algorithms, we projected the data onto the subspace
output by the algorithms, and measured the classification accuracy using the projected data. The
classification results are summarized in Table 2. We chose the normal vs. all classification task in
kddcup99, and the falling vs. all classification task in localization.¹ We used a linear SVM for
all classification tasks, which is implemented by libSVM (Chang and Lin, 2011).
For the classification experiments, we used half of the data as a holdout set for computing a
projection subspace. We projected the classification data onto the subspace computed based on the
holdout set; 10% of this data was used for training and parameter-tuning, and the rest for testing. We
repeated the classification process 5 times for 5 different (random) projections for each algorithm,
and then ran the entire procedure over 5 random permutations of the data. Each value in the figure
is thus an average over 5 × 5 = 25 rounds of classification.
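A minimal sketch of this protocol, using scikit-learn's LinearSVC in place of libSVM and treating the subspace U, the data X, and the labels y as inputs computed elsewhere, might look as follows.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def classify_in_subspace(X, y, U, seed=0):
    # Project onto the k-dimensional subspace U (a d x k matrix computed from
    # a separate holdout set), train a linear SVM on 10% of the projected
    # data, and report test accuracy on the rest.
    Z = X @ U
    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, train_size=0.1, random_state=seed)
    clf = LinearSVC().fit(Z_train, y_train)
    return clf.score(Z_test, y_test)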
The classification results show that our algorithm performs almost as well as non-private PCA
for classification in the top-k PCA subspace, while the performance of MOD-SULQ and random
projections is a little worse. The classification accuracy while using MOD-SULQ and random pro-
jections also appears to have higher variance compared to our algorithm and non-private PCA. This
is because the projections tend to be farther from the top-k PCA subspace, making the classification
error more variable.

1. For the other two data sets, census and insurance, the classification accuracy of linear SVM after (non-private)
PCA is as low as always predicting the majority label.
[Figure 3 about here: utility versus sample size; panel (a) census (k = 8), panel (b) kddcup (k = 4).]

Figure 3: Plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random
subsets of the data and randomness in the algorithms. The bars are at one standard de-
viation about the mean. The top red line is the PCA algorithm without privacy con-
straints. The dashed line in blue is the utility for PPCA. The green and purple dashed
lines are nearly indistinguishable and represent the utility from random projections and
MOD-SULQ, respectively. In these plots ε_p = 0.1 and δ = 0.01.
[Figure 4 about here: utility versus sample size; panel (a) localization (k = 10), panel (b) insurance (k = 11).]

Figure 4: Plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random
subsets of the data and randomness in the algorithms. The bars are at one standard devi-
ation about the mean. The top red line is the PCA algorithm without privacy constraints.
The dashed line in blue is the utility for PPCA. The green and purple dashed lines are
nearly indistinguishable for insurance but diverge for localization; they represent
the utility from random projections and MOD-SULQ, respectively. In these plots ε_p = 0.1
and δ = 0.01.
6.5 Effect of the Privacy Requirement
How to choose ε_p is an important open question for many applications. We wanted to understand
the impact of varying ε_p on the utility of the subspace. We did this via a synthetic data set:
we generated n = 5,000 points drawn from a Gaussian distribution in d = 10 dimensions with mean 0 and
covariance matrix with eigenvalues

{0.5, 0.30, 0.04, 0.03, 0.02, 0.01, 0.004, 0.003, 0.001, 0.001}. (27)
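A sketch of how such a synthetic data set can be generated is shown below. The orientation of the eigenvectors is not specified in our description, so the sketch simply picks a random orthonormal basis; this choice is an assumption for illustration only.

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000
eigenvalues = np.array([0.5, 0.30, 0.04, 0.03, 0.02, 0.01,
                        0.004, 0.003, 0.001, 0.001])

# Build a covariance matrix with the prescribed spectrum in a random basis.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(eigenvalues) @ Q.T

# Draw n points from N(0, Sigma); the top two eigenvectors carry most of the energy.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)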
                      kddcup99          localization
Non-private PCA       98.97 ± 0.05      100 ± 0
PPCA                  98.95 ± 0.05      100 ± 0
MOD-SULQ              98.18 ± 0.65      97.06 ± 2.17
Random Projections    98.23 ± 0.49      96.28 ± 2.34

Table 2: Classification accuracy (in percent) in the k-dimensional subspaces reported by the different
algorithms for kddcup99 (k = 4) and localization (k = 10).
In this case the space spanned by the top two eigenvectors has most of the energy, so we chose k = 2
and plotted the utility q_F(·) for non-private PCA, MOD-SULQ with δ = 0.05, and PPCA with a burn-
in time of T = 1000. We drew 100 samples from each privacy-preserving algorithm and the plot of
the average utility versus ε_p is shown in Figure 5. The privacy requirement relaxes as ε_p increases,
and both MOD-SULQ and PPCA approach the utility of PCA without privacy constraints. However,
for moderate ε_p PPCA still captures most of the utility, whereas the gap between MOD-SULQ and
PPCA becomes quite large.
[Figure 5 about here: utility q_F(U) versus the privacy parameter ε_p for Non-Private PCA, SULQ, and PPCA with a burn-in of 1,000 steps.]

Figure 5: Plot of q_F(U) versus ε_p for a synthetic data set with n = 5,000, d = 10, and k = 2. The
data has a Gaussian distribution whose covariance matrix has eigenvalues given by (27).
7. Conclusion
In this paper we investigated the theoretical and empirical performance of differentially private
approximations to PCA. Empirically, we showed that MOD-SULQ and PPCA differ markedly in
how well they approximate the top-k subspace of the data. The reason for this, theoretically, is that
the sample complexity of MOD-SULQ scales as d^{3/2} log d whereas PPCA scales as d. Because
PPCA uses the exponential mechanism with q_F(·) as the utility function, it is not surprising that it
performs well. However, MOD-SULQ often had performance comparable to random projections,
indicating that the real data sets had too few points for it to be effective. We furthermore showed
that PPCA is nearly optimal, in that any differentially private approximation to PCA must use Ω(d)
samples.
Our investigation brought up many interesting issues to consider for future work. The description
of differentially private algorithms assumes an ideal model of computation: real systems require
additional security assumptions that have to be verified. The difference between truly random noise
and pseudorandomness, and the effects of finite precision, can lead to a gap between the theoretical
ideal and practice. Numerical optimization methods used in some privacy methods (Chaudhuri et al.,
2011) can only produce approximate solutions; they may also have complex termination conditions
unaccounted for in the theoretical analysis. MCMC sampling is similar: if we can guarantee that
the sampler's distribution has total variation distance δ from the Bingham distribution, then the sampler
can guarantee (ε_p, δ)-differential privacy. However, we do not yet have such analytical bounds on
the convergence rate; we must determine the Gibbs sampler's convergence empirically. Accounting
for these effects is an interesting avenue for future work that can bring theory and practice together.
For PCA more specifically, it would be interesting to extend the sample complexity results to
general k > 1. For k = 1 the utility functions q_F(·) and q_A(·) are related, but for larger k it is not
immediately clear what metric best captures the idea of “approximating” the top-k PCA subspace.
For minimax lower bounds, it may be possible to construct a packing with respect to a general utility
metric. For example, Kapralov and Talwar (2013) use properties of packings on the Grassmann
manifold. Upper bounds on the sample complexity for PPCA may be possible by performing a more
careful analysis of the Bingham distribution or by finding better approximations for its normalizing
constant. Developing a framework for analyzing general approximations to PCA may be of interest
more broadly in machine learning.
Acknowledgments
The authors would like to thank the reviewers for their detailed comments, which greatly improved
the quality and readability of the manuscript, and the action editor, Gabor Lugosi, for his patience
during the revision process. KC and KS would like to thank NIH for research support under U54-
HL108460. The experimental results were made possible by support from the UCSD FWGrid
Project, NSF Research Infrastructure Grant Number EIA-0303622. ADS was supported in part by
the California Institute for Telecommunications and Information Technology (CALIT2) at UC San
Diego.
Appendix A. A Packing Lemma
The proof of this lemma is relatively straightforward. The following is a slight refinement of a
lemma due to Csiszár and Narayan (1988, 1991).
Lemma 16 Let Z_1, Z_2, ..., Z_N be arbitrary random variables and let f_i(Z_1, ..., Z_i) be arbitrary
with 0 ≤ f_i ≤ 1, i = 1, 2, ..., N. Then the condition

\[ E\left[ f_i(Z_1, \ldots, Z_i) \mid Z_1, \ldots, Z_{i-1} \right] \le a_i \quad \text{a.s.}, \quad i = 1, 2, \ldots, N \]
implies that

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + \sum_{i=1}^{N} a_i \right). \]
Proof First apply Markov's inequality:

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) = P\left( 2^{\sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i)} > 2^{Nt} \right) \]
\[ \le 2^{-Nt} \, E\left[ 2^{\sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i)} \right] \]
\[ = 2^{-Nt} \, E\left[ 2^{\sum_{i=1}^{N-1} f_i(Z_1, \ldots, Z_i)} \, E\left[ 2^{f_N(Z_1, \ldots, Z_N)} \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \right]. \]
Now note that for b ∈ [0, 1] we have 2^b ≤ 1 + b ≤ e^b, so

\[ E\left[ 2^{f_N(Z_1, \ldots, Z_N)} \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \le E\left[ 1 + f_N(Z_1, \ldots, Z_N) \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \le (1 + a_N) \le \exp(a_N). \]
Therefore

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + a_N \right) E\left[ 2^{\sum_{i=1}^{N-1} f_i(Z_1, \ldots, Z_i)} \right]. \]
Continuing in the same way yields

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + \sum_{i=1}^{N} a_i \right). \]
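As an aside, the bound in Lemma 16 is easy to check numerically in a simple special case; the sketch below takes the Z_i to be independent Bernoulli(p) variables and f_i(Z_1, ..., Z_i) = Z_i, so that a_i = p, and compares the empirical tail probability with exp(-Nt log 2 + Np). This check is purely illustrative and not part of the original argument.

import numpy as np

rng = np.random.default_rng(2)
N, p, t, runs = 50, 0.1, 0.3, 100000

# Z_i ~ Bernoulli(p) independent, f_i = Z_i, so E[f_i | past] = p = a_i.
Z = rng.random((runs, N)) < p
empirical = np.mean(Z.mean(axis=1) > t)
bound = np.exp(-N * t * np.log(2) + N * p)
print(f"empirical = {empirical:.5f}, Lemma 16 bound = {bound:.5f}")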
The second technical lemma (Csiszár and Narayan, 1991, Lemma 2) is a basic result about the
distribution of the inner product between a randomly chosen unit vector and any other fixed vector. It
is a consequence of a result of Shannon (Shannon, 1959) on the distribution of the angle between a
uniformly distributed unit vector and a fixed unit vector.

Lemma 17 (Lemma 2 of Csiszár and Narayan (1991)) Let U be a random vector distributed uni-
formly on the unit sphere S^{d-1} in R^d. Then for every unit vector u on this sphere and any φ ∈
[(2πd)^{-1/2}, 1), the following inequality holds:

\[ P\left( \langle U, u \rangle \ge \phi \right) \le (1 - \phi^2)^{(d-1)/2}. \]
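As a quick numerical sanity check (not part of the original analysis), one can compare the empirical tail probability of the inner product against this bound for a moderate dimension; the parameter values below are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
d, phi, trials = 20, 0.5, 200000

# Uniform unit vectors via normalized Gaussians; u may be any fixed unit vector.
U = rng.standard_normal((trials, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
u = np.zeros(d)
u[0] = 1.0

empirical = np.mean(U @ u >= phi)
bound = (1 - phi ** 2) ** ((d - 1) / 2)
print(f"empirical P(<U,u> >= {phi}) = {empirical:.4f}, bound = {bound:.4f}")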
Lemma 18 (Packing set on the unit sphere) Let the dimension d and parameter φ ∈ [(2πd)^{-1/2}, 1)
be given. For N and t satisfying

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} < 0 \]  (28)
there exists a set of K = (1 - t)N unit vectors C such that for all distinct pairs µ, ν ∈ C,

\[ |\langle \mu, \nu \rangle| < \phi. \]  (29)
Proof The goal is to generate a set of N unit vectors on the surface of the sphere S^{d-1} such that
they have large pairwise distances or, equivalently, small pairwise inner products. To that end, define
i.i.d. random variables Z_1, Z_2, ..., Z_N uniformly distributed on S^{d-1} and functions

\[ f_i(Z_1, \ldots, Z_i) = \mathbf{1}\left( |\langle Z_i, Z_j \rangle| > \phi \ \text{for some } j < i \right). \]

That is, f_i = 1 if Z_i has large inner product with any Z_j for j < i. The conditional expectation, by a
union bound and Lemma 17, is

\[ E\left[ f_i(Z_1, \ldots, Z_i) \mid Z_1, \ldots, Z_{i-1} \right] \le 2(i-1)(1 - \phi^2)^{(d-1)/2}. \]

Let a_i = 2(i-1)(1 - φ^2)^{(d-1)/2}. Then

\[ \sum_{i=1}^{N} a_i = N(N-1)(1 - \phi^2)^{(d-1)/2}. \]

Then Lemma 16 shows

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} \right). \]
This inequality implies that as long as

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} < 0, \]

then with positive probability {Z_i} contains a subset of size (1 - t)N such that
|⟨Z_i, Z_j⟩| < φ for all pairs (i, j) within the subset. Therefore such a set exists.
A simple setting of the parameters gives the packing in Lemma 11.

Proof Applying Lemma 18 yields a set of K vectors C satisfying (28) and (29). To get a simple
bound that is easy to work with, we can set

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} = 0, \]

and find an N close to this. Setting ψ = (1 - φ^2)^{(d-1)/2}, and solving for N, we see

\[ N = 1 + \frac{t \log 2}{\psi} > \frac{t}{2\psi}. \]

Now setting K = t(1-t)/(2ψ) and t = 1/2 gives (12). So there exists a set of K vectors on S^{d-1} whose
pairwise inner products are smaller than φ.
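The probabilistic construction behind Lemma 18 is easy to simulate. The sketch below draws N random unit vectors, greedily discards any vector whose inner product with an already kept vector is too large (a simple variant of the pruning implicit in the proof), and compares the surviving set against the K = t(1 - t)/(2ψ) guarantee with t = 1/2; the function name and parameter values are illustrative only.

import numpy as np

def random_packing(d, phi, N, rng):
    # Draw N uniform unit vectors in R^d and keep a subset whose pairwise
    # inner products are all smaller than phi in magnitude (greedy pruning).
    Z = rng.standard_normal((N, d))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    kept = []
    for z in Z:
        if all(abs(z @ c) < phi for c in kept):
            kept.append(z)
    return np.array(kept)

d, phi, t = 100, 0.3, 0.5
psi = (1 - phi ** 2) ** ((d - 1) / 2)
N = int(1 + t * np.log(2) / psi)
guaranteed_K = t * (1 - t) / (2 * psi)
C = random_packing(d, phi, N, np.random.default_rng(0))
print(f"kept {len(C)} of N = {N}; lemma guarantees roughly {guaranteed_K:.0f}")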
The maximum number of points that can be selected on a sphere of dimension d such that their
pairwise inner products are bounded by φ is an open question. These sets are sometimes referred to
as spherical codes (Conway and Sloane, 1998) because they correspond to a set of signaling points
of dimension d that can be perfectly decoded over a channel with bounded noise. The bounds here
are from a probabilistic construction and can be tightened for smaller d. However, in terms of
scaling with d this construction is essentially optimal (Shannon, 1959).
References
Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. SIGMOD Record,
29(2):439–450, 2000. ISSN 0163-5808. doi: 10.1145/335191.335438. URL
http://dx.doi.
org/10.1145/335191.335438
.
Kevin Bache and Moshe Lichman. UCI machine learning repository, 2013. URL
http://archive.
ics.uci.edu/ml
.
Keith M. Ball. An elementary introduction to modern convex geometry. In S. Levy, editor, Flavors
of Geometry, volume 31 of Mathematical Sciences Research Institute Publications, pages 1–58.
Cambridge University Press, 1997.
Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Tal-
war. Privacy, accuracy, and consistency too: a holistic solution to contingency table release.
In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems (PODS ’07), pages 273–282, New York, NY, USA, 2007. ACM. doi:
10.1145/1265530.1265569. URL
http://dx.doi.org/10.1145/1265530.1265569
.
Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss Trans-
form itself preserves differential privacy. In IEEE 53rd Annual Symposium on Foundations of
Computer Science (FOCS), pages 410–419, October 2012. doi: 10.1109/FOCS.2012.67. URL
http://dx.doi.org/10.1109/FOCS.2012.67
.
Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ
framework. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Sympo-
sium on Principles of Database Systems (PODS ’05), pages 128–138, New York, NY, USA,
2005. ACM. doi: 10.1145/1065167.1065184. URL
http://dx.doi.org/10.1145/1065167.
1065184
.
Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to non-interactive
database privacy. In R. E. Ladner and C. Dwork, editors, Proceedings of the 40th Annual
ACM Symposium on Theory of Computing (STOC ’08), pages 609–618, New York, NY, USA,
2008. ACM. doi: 10.1145/1374376.1374464. URL
http://dx.doi.org/10.1145/1374376.
1374464
.
Stephen P. Brooks. Markov chain Monte Carlo method and its application. Journal of the Royal
Statistical Society. Series D (The Statistician), 47(1):69–100, April 1998. ISSN 00390526. doi:
10.1111/1467-9884.00117. URL
http://dx.doi.org/10.1111/1467-9884.00117
.
Stephen P. Brooks and Andrew Gelman. General methods for monitoring convergence of iterative
simulations. Journal of Computational and Graphical Statistics, 7(4):434–455, December 1998.
doi: 10.2307/1390675. URL
http://dx.doi.org/10.2307/1390675
.
Stephen P. Brooks and Gareth O. Roberts. Convergence assessment techniques for Markov chain
Monte Carlo. Statistics and Computing, 8(4):319–335, December 1998. doi: 10.1023/A:
1008820505350. URL
http://dx.doi.org/10.1023/A:1008820505350
.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning.
In Sham Kakade and Ulrike von Luxburg, editors, Proceedings of the 24th Annual Conference
on Learning Theory (COLT ’11), volume 19 of JMLR Workshop and Conference Proceedings,
pages 155–186, Budapest, Hungary, June 2011. URL
http://www.jmlr.org/proceedings/
papers/v19/chaudhuri11a/chaudhuri11a.pdf
.
Kamalika Chaudhuri and Daniel Hsu. Convergence rates for differentially private statistical estima-
tion. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Confer-
ence on Machine Learning (ICML-12), ICML ’12, pages 1327–1334, New York, NY, USA, July
2012. Omnipress. URL
http://icml.cc/2012/papers/663.pdf
.
Kamalika Chaudhuri and Nina Mishra. When random sampling preserves privacy. In Cynthia
Dwork, editor, Advances in Cryptology - CRYPTO 2006, volume 4117 of Lecture Notes in
Computer Science, pages 198–213, Berlin, Heidelberg, August 2006. Springer-Verlag. doi:
10.1007/11818175_12. URL
http://dx.doi.org/10.1007/11818175_12
.
Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical
risk minimization. Journal of Machine Learning Research, 12:1069–1109, March 2011. URL
http://jmlr.csail.mit.edu/papers/v12/chaudhuri11a.html
.
Kamalika Chaudhuri, Anand D. Sarwate, and Kaushik Sinha. Near-optimal differentially private
principal components. In P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q.
Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 998–1006,
2012. URL
http://books.nips.cc/papers/files/nips25/NIPS2012_0482.pdf
.
Yasuko Chikuse. Statistics on Special Manifolds. Number 174 in Lecture Notes in Statistics.
Springer, New York, 2003.
John H. Conway and Neil J. A. Sloane. Sphere Packing, Lattices and Groups. Springer-Verlag, New
York, 1998.
Mary Kathryn Cowles and Bradley P. Carlin. Markov Chain Monte Carlo convergence diagnostics:
A comparative review. Journal of the American Statistical Association, 91(434):883, June 1996.
ISSN 01621459. doi: 10.2307/2291683. URL
http://dx.doi.org/10.2307/2291683
.
Imre Csiszár and Prakash Narayan. The capacity of the arbitrarily varying channel revisited:
Positivity, constraints. IEEE Transactions on Information Theor y, 34(2):181–193, 1988. doi:
10.1109/18.2627. URL
http://dx.doi.org/10.1109/18.2627
.
Imre Csiszár and Prakash Narayan. Capacity of the Gaussian arbitrarily varying channel. IEEE
Transactions on Information Theory, 37(1):18–26, 1991. doi: 10.1109/18.61125. URL
http:
//dx.doi.org/10.1109/18.61125
.
Randal Douc, Eric Moulines, and Jeffrey S. Rosenthal. Quantitative bounds on convergence of time-
inhomogeneous Markov chains. The Annals of Applied Probability, 14(4):1643–1665, November
2004. ISSN 1050-5164. doi: 10.1214/105051604000000620. URL
http://dx.doi.org/10.
1214/105051604000000620
.
Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the
41st Annual ACM Symposium on Theory of Computing (STOC ’09), pages 371–380, New York,
NY, USA, 2009. ACM. doi: 10.1145/1536414.1536466. URL
http://dx.doi.org/10.1145/
1536414.1536466
.
Cynthia Dwork and Adam Smith. Differential privacy for statistics: What we know and what
we want to learn. Journal of Privacy and Confidentiality, 1(2):135–154, 2009. URL
http:
//repository.cmu.edu/jpc/vol1/iss2/2
.
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our
data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor, Advances
in Cryptology - EUROCRYPT 2006, volume 4004 of Lecture Notes in Computer Science, pages
486–503, Berlin, Heidelberg, 2006a. Springer-Verlag. doi: 10.1007/11761679_29. URL http:
http:
//dx.doi.org/10.1007/11761679_29
.
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in
private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, volume 3876
of Lecture Notes in Computer Science, pages 265–284, Berlin, Heidelberg, March 4–7 2006b.
Springer. doi: 10.1007/11681878_14. URL
http://dx.doi.org/10.1007/11681878_14
.
Salaheddine El Adlouni, Anne-Catherine Favre, and Bernard Bobée. Comparison of methodologies
to assess the convergence of Markov chain Monte Carlo methods. Computational Statistics &
Data Analysis, 50(10):2685–2701, June 2006. ISSN 01679473. doi: 10.1016/j.csda.2005.04.018.
URL
http://dx.doi.org/10.1016/j.csda.2005.04.018
.
Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in
privacy preserving data mining. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-
SIGART Symposium on Principles of Database Systems (PODS), pages 211–222, 2003. doi:
10.1145/773153.773174. URL
http://dx.doi.org/10.1145/773153.773174
.
Arik Friedman and Assaf Schuster. Data mining with differential privacy. In Proceedings of the
16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD
’10), pages 493–502, New York, NY, USA, 2010. ACM. doi: 10.1145/1835804.1835868. URL
http://dx.doi.org/10.1145/1835804.1835868
.
Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. Privacy-preserving data publishing:
A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, June 2010. doi:
10.1145/1749603.1749605. URL
http://dx.doi.org/10.1145/1749603.1749605
.
Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and
auxiliary information in data privacy. In P roceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD ’08), pages 265–273, New York,
NY, USA, 2008. ACM. doi: 10.1145/1401890.1401926. URL
http://dx.doi.org/10.1145/
1401890.1401926
.
Shuguo Han, Wee Keong Ng, and P.S. Yu. Privacy-preserving singular value decomposition. In
Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE), pages
1267–1270, 2009. doi: 10.1109/ICDE.2009.217. URL
http://dx.doi.org/10.1109/ICDE.
2009.217
.
Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In Proceedings
of the 44th Annual ACM Symposium on Theory of Computing (STOC ’12), pages 1255–1268,
New York, NY, USA, 2012. ACM. doi: 10.1145/2213977.2214088. URL
http://dx.doi.org/
10.1145/2213977.2214088
.
Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation.
In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC ’13), pages
331–340, New York, NY, USA, June 2013. ACM. doi: 10.1145/2488608.2488650. URL
http:
//dx.doi.org/10.1145/2488608.2488650
.
Michael Hay, Chao Li, Gerome Miklau, and David Jensen. Accurate estimation of the degree
distribution of private networks. In 2009 Ninth IEEE International Conference on Data Mining
(ICDM ’09), pages 169–178, 2009. doi: 10.1109/ICDM.2009.11. URL
http://dx.doi.org/
10.1109/ICDM.2009.11
.
Peter D. Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications
to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2):
438–456, 2009. ISSN 1061-8600. doi: 10.1198/jcgs.2009.07177. URL
http://dx.doi.org/
10.1198/jcgs.2009.07177
.
Galin L. Jones and James P. Hobert. Honest exploration of intractable probability distributions
via Markov Chain Monte Carlo. Statistical Science, 16(4):312–334, 2001. doi: 10.1214/ss/
1015346317. URL
http://dx.doi.org/10.1214/ss/1015346317
.
Galin L. Jones and James P. Hobert. Sufficient burn-in for Gibbs samplers for a hierarchical
random effects model. The Annals of Statistics, 32(2):784–817, April 2004. doi: 10.1214/
009053604000000184. URL
http://dx.doi.org/10.1214/009053604000000184
.
Boštjan Kaluža, Violeta Mirchevska, Erik Dovgan, Mitja Luštrek, and Matjaž Gams. An agent-
based approach to care in independent living. In B. de Ruyter et al., editor, International
Joint Conference on Ambient Intelligence (AmI-10), volume 6439/2010 of Lecture Notes in
Computer Science, pages 177–186. Springer-Verlag, Berlin Heidelberg, 2010. doi: 10.1007/
978-3-642-16917-5_18. URL
http://dx.doi.org/10.1007/978-3-642-16917-5_18
.
Mikhail Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceed-
ings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’13),
pages 1395–1414, New Orleans, LA, USA, January 2013.
Shiva Prasad Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance
to arbitrary side information. Technical Report arXiv:0803.3946v1 [cs.CR], ArXiV, March 2008.
URL
http://arxiv.org/abs/0803.3946
.
Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra. Privacy via the
Johnson-Lindenstrauss transform. Journal of Privacy and Confidentiality, 5(1):39–71, 2013.
URL
http://repository.cmu.edu/jpc/vol5/iss1/2
.
John E. Kolassa. Convergence and accuracy of Gibbs sampling for conditional distributions in
generalized linear models. The Annals of Statistics, 27(1):129–142, 1999. doi: 10.1214/aos/
1018031104. URL
http://dx.doi.org/10.1214/aos/1018031104
.
John E. Kolassa. Explicit bounds for geometric convergence of Markov chains. Journal of Applied
Probability, 37(3):642–651, 2000. doi: 10.1239/jap/1014842825. URL
http://dx.doi.org/
10.1239/jap/1014842825
.
Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. Closeness: A new privacy measure for
data publishing. IEEE Transactions on Knowledge and Data Engineering, 22(7):943–956, 2010.
doi: 10.1109/TKDE.2009.139. URL
http://dx.doi.org/10.1109/TKDE.2009.139
.
Kun Liu, Hillol Kargupta, and Jessica Ryan. Random projection-based multiplicative data perturba-
tion for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data
Engineering, 18(1):92–106, 2006. doi: 10.1109/TKDE.2006.14. URL
http://dx.doi.org/
10.1109/TKDE.2006.14
.
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubra-
maniam. l-diversity: Privacy beyond k-anonymity. In Proceedings of the 22nd IEEE Interna-
tional Conference on Data Engineering (ICDE), page 24, 2006. doi: 10.1109/ICDE.2006.1.
URL
http://dx.doi.org/10.1109/ICDE.2006.1
.
Ashwin Machanavajjhala, Daniel Kifer, John M. Abowd, Johannes Gehrke, and Lars Vilhuber.
Privacy: Theory meets practice on the map. In IEEE 24th International Conference on Data
Engineering (ICDE), pages 277–286, April 2008. doi: 10.1109/ICDE.2008.4497436. URL
http://dx.doi.org/10.1109/ICDE.2008.4497436
.
Frank McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data
analysis. In SIGMOD Conference, pages 19–30, 2009. doi: 10.1145/1559845.1559850. URL
http://dx.doi.org/10.1145/1559845.1559850
.
Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building pri-
vacy into the netflix prize contenders. In Proceedings of the 15th ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data (KDD), pages 627–636, 2009. doi:
10.1145/1557019.1557090. URL
http://dx.doi.org/10.1145/1557019.1557090
.
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE
Symposium on Foundations of Computer Science (FOCS ’07), pages 94–103, October 2007. doi:
10.1109/FOCS.2007.41. URL
http://dx.doi.org/10.1109/FOCS.2007.41
.
Ilya Mironov. On significance of the least significant bits for differential privacy. In Proceedings
of the ACM Conference on Computer and Communications Security (CCS ’12), pages 650–661,
New York, NY, USA, 2012. ACM. doi: 10.1145/2382196.2382264. URL
http://dx.doi.org/
10.1145/2382196.2382264
.
Noman Mohammed, Rui Chen, Benjamin C. M. Fung, and Philip S. Yu. Differentially private data
release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD ’11), pages 493–501, New York, NY, USA,
2011. ACM. doi: 10.1145/2020408.2020487. URL
http://dx.doi.org/10.1145/2020408.
2020487
.
Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in pri-
vate data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of
Computing (STOC ’07), pages 75–84, New York, NY, USA, 2007. ACM. doi: 10.1145/1250790.
1250803. URL
http://dx.doi.org/10.1145/1250790.1250803
.
Gareth O. Roberts. Bounds on regeneration times and convergence rates for Markov chains.
Stochastic Processes and their Applications, 80(2):211–229, April 1999. ISSN 03044149.
doi: 10.1016/S0304-4149(98)00085-4. URL
http://dx.doi.org/10.1016/S0304-4149(98)
00085-4
.
Gareth O. Roberts and Sujit K. Sahu. Approximate predetermined convergence properties of the
Gibbs sampler. Journal of Computational and Graphical Statistics, 10(2):216–229, June 2001.
ISSN 1061-8600. doi: 10.1198/10618600152627915. URL
http://dx.doi.org/10.1198/
10618600152627915
.
Jeffrey S. Rosenthal. Minorization conditions and convergence rates for Markov Chain Monte Carlo.
Journal of the American Statistical Association, 90(430):558–566, June 1995. ISSN 01621459.
doi: 10.2307/2291067. URL
http://dx.doi.org/10.2307/2291067
.
Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft. Learning in a large function
space: Privacy-preserving mechanisms for SVM learning. Journal of Privacy and Confidentiality,
4(1):65–100, 2012. URL
http://repository.cmu.edu/jpc/vol4/iss1/4/
.
Claude E. Shannon. Probability of error for optimal codes in a Gaussian channel. Bell System
Technical Journal, 38:611–656, 1959.
Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Pro-
ceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC ’11), pages
813–822, New York, NY, USA, 2011. ACM. doi: 10.1145/1993636.1993743. URL
http:
//dx.doi.org/10.1145/1993636.1993743
.
Gilbert W. Stewart. On the early history of the singular value decomposition. SIAM Review, 35(4):
551–566, December 1993.
Latanya Sweeney. k-anonymity: a model for protecting privacy. International Journal on
Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, October 2002. doi:
10.1142/S0218488502001648. URL
http://dx.doi.org/10.1142/S0218488502001648
.
Peter van der Putten and Maarten van Someren. CoIL Challenge 2000: The Insurance Company
Case, 2000. URL
http://www.liacs.nl/~putten/library/cc2000/. Leiden Institute of
Advanced Computer Science Technical Report 2000-09.
Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of
the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651.
URL
http://dx.doi.org/10.1198/jasa.2009.tm08651
.
Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In J. Lafferty,
C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural
Information Processing Systems 23, pages 2451–245, 2010. URL
http://books.nips.cc/
papers/files/nips23/NIPS2010_1276.pdf
.
Bin Yu. Assouad, Fano, and Le Cam. In David Pollard, Erik Torgersen, and Grace L. Yang, editors,
Festschrift for Lucien Le Cam, Research Papers in Probability and Statistics, chapter 29, pages
423–425. Springer-Verlag, 1997.
Justin Z. Zhan and Stan Matwin. Privacy-preserving support vector machine classification. Inter-
national Journal of Intelligent Information and Database Systems, 1(3/4):356–385, 2007. doi:
10.1504/IJIIDS.2007.016686.
Shuheng Zhou, Katrina Ligett, and Larry Wasserman. Differential privacy with compression. In
Proceedings of the 2009 International Symposium on Information Theory (ISIT), pages 2718–
2722, Seoul, South Korea, June–July 2009. doi: 10.1109/ISIT.2009.5205863.