Journal of Machine Learning Research 14 (2013) 2905-2943 Submitted 6/12; Revised 5/13; Published 9/13
A Near-Optimal Algorithm for Differentially-Private Principal
Components
Kamalika Chaudhuri KCHAUDHURI@UCSD.EDU
Department of Computer Science and Engineering
University of California, San Diego
9500 Gilman Drive MC 0404
La Jolla, CA, 92093-0404, USA
Anand D. Sarwate ASARWATE@ALUM.MIT.EDU
Toyota Technological Institute at Chicago
6045 S. Kenwood Ave
Chicago, IL 60637, USA
Kaushik Sinha KAUSHIK.SINHA@WICHITA.EDU
Department of Electrical Engineering and Computer Science
Wichita State University
1845 Fairmount (Campus Box 83)
Wichita, KS 67260-0083, USA
Editor: Gabor Lugosi
Abstract
The principal components analysis (PCA) algorithm is a standard tool for identifying good low-
dimensional approximations to high-dimensional data. Many data sets of interest contain private or
sensitive information about individuals. Algorithms which operate on such data should be sensitive
to the privacy risks in publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and
empirical performance of differentially private approximations to PCA and propose a new method
which explicitly optimizes the utility of the output. We show that the sample complexity of the
proposed method differs from the existing procedure in the scaling with the data dimension, and
that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results,
showing that on real data there is a large performance gap between the existing method and our
method.
Keywords: differential privacy, principal components analysis, dimension reduction
1. Introduction
Dimensionality reduction is a fundamental tool for understanding complex data sets that arise in
contemporary machine learning and data mining applications. Even though a single data point
can be represented by hundreds or even thousands of features, the phenomena of interest are often
intrinsically low-dimensional. By reducing the “extrinsic” dimension of the data to its “intrinsic” di-
mension, analysts can discover important structural relationships between features, more efficiently
use the transformed data for learning tasks such as classification or regression, and greatly reduce
the space required to store the data. (A preliminary version of this work appeared at the Neural
Information Processing Systems conference (Chaudhuri et al., 2012); this full version contains more
experimental details, full proofs, and additional discussion.) One of the oldest and most classical methods for dimensionality
reduction is principal components analysis (PCA), which computes a low-rank approximation to the
second moment matrix A of a set of points in R^d. The rank k of the approximation is chosen to be
the intrinsic dimension of the data. We view this procedure as specifying a k-dimensional subspace
of R^d.
Much of today’s machine-learning is performed on the vast amounts of personal information
collected by private companies and government agencies about individuals: examples include user
or customer behaviors, demographic surveys, and test results from experimental subjects or pa-
tients. These data sets contain sensitive information about individuals and typically involve a large
number of features. It is therefore important to design machine-learning algorithms which discover
important structural relationships in the data while taking into account its sensitive nature. We study
approximations to PCA which guarantee differential privacy, a cryptographically motivated defini-
tion of privacy (Dwork et al., 2006b) that has gained significant attention over the past few years in
the machine-learning and data-mining communities (Machanavajjhala et al., 2008; McSherry and
Mironov, 2009; McSherry, 2009; Friedman and Schuster, 2010; Mohammed et al., 2011). Differen-
tial privacy measures privacy risk by a parameter ε_p that bounds the log-likelihood ratio of the output
of a (private) algorithm under two databases differing in a single individual.
There are many general tools for providing differential privacy. The sensitivity method due to
Dwork et al. (2006b) computes the desired algorithm (in our case, PCA) on the data and then adds
noise proportional to the maximum change that can be induced by changing a single point in the
data set. The PCA algorithm is very sensitive in this sense because the top eigenvector can change
by 90° by changing one point in the data set. Relaxations such as smoothed sensitivity (Nissim
et al., 2007) are difficult to compute in this setting as well. The Sub-Linear Queries (SULQ) method
of Blum et al. (2005) adds noise to the second moment matrix and then runs PCA on the noisy
matrix. As our experiments show, the noise level required by SULQ may severely impact the quality
of approximation, making it impractical for data sets of moderate size.
The goal of this paper is to characterize the problem of differentially private PCA. We assume
that the algorithm is given n data points and a target dimension k and must produce a k-dimensional
subspace that approximates that produced by the standard PCA problem. We propose a new algo-
rithm, PPCA, which is an instance of the exponential mechanism of McSherry and Talwar (2007).
Unlike SULQ, PPCA explicitly takes into account the quality of approximation—it outputs a k-
dimensional subspace which is biased towards subspaces close to the output of PCA. In our case,
the method corresponds to sampling from the matrix Bingham distribution. We implement PPCA
using a Markov Chain Monte Carlo (MCMC) procedure due to Hoff (2009); simulations show that
the subspace produced by PPCA captures more of the variance of A than SULQ. When the MCMC
procedure converges, the algorithm provides differential privacy.
In order to understand the performance gap, we prove sample complexity bounds for the case
of k = 1 for SULQ and PPCA, as well as a general lower bound on the sample complexity for
any differentially private algorithm. We show that the sample complexity scales as Ω(d^{3/2} √(log d))
for SULQ and as O(d) for PPCA. Furthermore, we show that any differentially private algorithm
requires Ω(d) samples. Therefore PPCA is nearly optimal in terms of sample complexity as a
function of data dimension. These theoretical results suggest that our experiments demonstrate the
limit of how well ε_p-differentially private algorithms can perform, and our experiments show that
this gap should persist for general k. The result seems pessimistic for many applications, because
the sample complexity depends on the extrinsic dimension d rather than the intrinsic dimension k.
However, we believe this is a consequence of the fact that we make minimal assumptions on the
data; our results imply that, absent additional limitations on the data set, the sample complexity of
differentially private PCA must grow linearly with the data dimension.
There are several interesting open questions suggested by this work. One set of issues is com-
putational. Differential privacy is a mathematical definition, but algorithms must be implemented
using finite precision machines. Privacy and computation interact in many places, including pseu-
dorandomness, numerical stability, optimization, and in the MCMC procedure we use to implement
PPCA; investigating the impact of approximate sampling is an avenue for future work. A second set
of issues is theoretical—while the privacy guarantees of PPCA hold for all k, our theoretical anal-
ysis of sample complexity applies only to k = 1 in which the distance and angles between vectors
are related. An interesting direction is to develop theoretical bounds for general k; challenges here
are providing the right notion of approximation of PCA, and extending the theory using packings of
Grassmann or Stiefel manifolds. Finally, in this work we assume k is given to the algorithm, but in
many applications k is chosen after looking at the data. Under differential privacy, the selection of
k itself must be done in a differentially private manner.
1.1 Related Work
Differential privacy was first proposed by Dwork et al. (2006b). There has been an extensive liter-
ature following this work in the computer science theory, machine learning, and databases commu-
nities. A survey of some of the theoretical work can be found in Dwork and Smith
(2009). Differential privacy has been shown to have strong semantic guarantees (Dwork et al.,
2006b; Kasiviswanathan and Smith, 2008) and is resistant to many attacks (Ganta et al., 2008) that
succeed against alternative definitions of privacy. In particular, so-called syntactic definitions of
privacy (Sweeney, 2002; Machanavajjhala et al., 2006; Li et al., 2010) may be susceptible to attacks
based on side-information about individuals in the database.
There are several general approaches to constructing differentially private approximations to
some desired algorithm or computation. Input perturbation (Blum et al., 2005) adds noise to the
data prior to performing the desired computation, whereas output perturbation (Dwork et al., 2006b)
adds noise to the output of the desired computation. The exponential mechanism (McSherry and
Talwar, 2007) can be used to perform differentially private selection based on a score function that
measures the quality of different outputs. Objective perturbation (Chaudhuri et al., 2011) adds noise
to the objective function for algorithms which are convex optimizations. These approaches and
related ideas such as Nissim et al. (2007) and Dwork and Lei (2009) have been used to approximate
a variety of statistical, machine learning, and data mining tasks under differential privacy (Barak
et al., 2007; Wasserman and Zhou, 2010; Smith, 2011; McSherry and Mironov, 2009; Williams and
McSherry, 2010; Chaudhuri et al., 2011; Rubinstein et al., 2012; Nissim et al., 2007; Blum et al.,
2008; McSherry and Talwar, 2007; Friedman and Schuster, 2010; Hardt and Roth, 2012).
This paper deals with the problem of differentially private approximations to PCA. Prior to our
work, the only proposed method for PCA was the Sub-Linear Queries (SULQ) method of Blum
et al. (2005). This approach adds noise to the second moment matrix of the data before calcu-
lating the singular value decomposition. By contrast, our algorithm, PPCA, uses the exponential
mechanism (McSherry and Talwar, 2007) to choose a k-dimensional subspace biased toward those
which capture more of the “energy” of the matrix. Subsequent to our work, Kapralov and Talwar (2013)
have proposed a dynamic programming algorithm for differentially private low rank matrix approx-
imation which involves sampling from a distribution induced by the exponential mechanism. The
running time of their algorithm is O(d^6), where d is the data dimension, and it is unclear how this
may affect its implementation. Hardt and Roth (Hardt and Roth, 2012, 2013) have studied low-rank
matrix approximation under additional incoherence assumptions on the data. In particular, Hardt
and Roth (2012) consider the problem of differentially-private low-rank matrix reconstruction for
applications to sparse matrices; provided certain coherence conditions hold, they provide an algo-
rithm for constructing a rank-2k approximation B to a matrix A such that ‖A − B‖_F is O(‖A − A_k‖)
plus some additional terms which depend on d, k and n; here A_k is the best rank-k approximation
to A. Hardt and Roth (2013) show a method for guaranteeing (ε, δ)-differential privacy under an
entry-wise neighborhood condition using the power method for calculating singular values. They,
like Kapralov and Talwar (2013), also prove bounds under spectral norm perturbations, and their
algorithm achieves the same error rates but with running time that is nearly linear in the number of
non-zeros in the data.
In addition to these works, other researchers have examined the interplay between projections
and differential privacy. Zhou et al. (2009) analyze a differentially private data release scheme
where a random linear transformation is applied to data to preserve differential privacy, and then
measures how much this transformation affects the utility of a PCA of the data. One example of a
random linear transformation is random projection, popularized by the Johnson-Lindenstrauss (JL)
transform. Blocki et al. (2012) show that the JL transform of the data preserves differential privacy
provided the minimum singular value of the data matrix is large. Kenthapadi et al. (2013) study
the problem of estimating the distance between data points with differential privacy using a random
projection of the data points.
There has been significant work on other notions of privacy based on manipulating entries within
the database (Sweeney, 2002; Machanavajjhala et al., 2006; Li et al., 2010), for example by reducing
the resolution of certain features to create ambiguities. For more details on these and other alter-
native notions of privacy see Fung et al. (2010) for a survey with more references. An alternative
line of privacy-preserving data-mining work (Zhan and Matwin, 2007) is in the Secure Multiparty
Computation setting; one work (Han et al., 2009) studies privacy-preserving singular value decom-
position in this model. Finally, dimension reduction through random projection has been considered
as a technique for sanitizing data prior to publication (Liu et al., 2006); our work differs from this
line of work in that we offer differential privacy guarantees, and we only release the PCA subspace,
not actual data.
2. Preliminaries
The data given to our algorithm is a set of n vectors D = {x_1, x_2, ..., x_n} where each x_i corresponds
to the private value of one individual, x_i ∈ R^d, and ‖x_i‖ ≤ 1 for all i. Let X = [x_1, ..., x_n] be the
matrix whose columns are the data vectors {x_i}. Let A = (1/n) XX^T denote the d × d second moment
matrix of the data. The matrix A is positive semidefinite, and has Frobenius norm ‖A‖_F at most 1.
The problem of dimensionality reduction is to find a “good” low-rank approximation to A. A
popular solution is to compute a rank-k matrix Â which minimizes the norm ‖A − Â‖_F, where k is
much lower than the data dimension d. The Schmidt approximation theorem (Stewart, 1993) shows
that the minimizer is given by the singular value decomposition, also known as the PCA algorithm
in some areas of computer science.
Definition 1 Suppose A is a positive semidefinite matrix whose first k eigenvalues are distinct. Let
the eigenvalues of A be λ_1(A) ≥ λ_2(A) ≥ ··· ≥ λ_d(A) ≥ 0 and let Λ be a diagonal matrix with
Λ_ii = λ_i(A). The matrix A decomposes as

    A = V Λ V^T,    (1)

where V is an orthonormal matrix of eigenvectors. The top-k PCA subspace of A is the matrix

    V_k(A) = [v_1 v_2 ··· v_k],    (2)

where v_i is the i-th column of V in (1). The k-th eigengap is Δ_k = λ_k − λ_{k+1}.
Given the top-k subspace and the eigenvalue matrix Λ, we can form an approximation A^(k) =
V_k(A) Λ_k V_k(A)^T to A, where Λ_k contains the k largest eigenvalues in Λ. In the special case k = 1
we have A^(1) = λ_1(A) v_1 v_1^T, where v_1 is the eigenvector corresponding to λ_1(A). We refer to v_1 as
the top eigenvector of the data, and Δ = Δ_1 is the eigengap. For a d × k matrix V̂ with orthonormal
columns, the quality of V̂ in approximating V_k(A) can be measured by

    q_F(V̂) = tr(V̂^T A V̂).    (3)

The V̂ which maximizes q_F(V̂) has columns equal to {v_i : i ∈ [k]}, corresponding to the top-k eigen-
vectors of A.
Our theoretical results on the utility of our PCA approximation apply to the special case k = 1.
We prove results about the inner product between the output vector v̂_1 and the true top eigenvector
v_1:

    q_A(v̂_1) = |⟨v̂_1, v_1⟩|.    (4)

The utility in (4) is related to (3). If we write v̂_1 in the basis spanned by {v_i}, then

    q_F(v̂_1) = λ_1 q_A(v̂_1)^2 + ∑_{i=2}^{d} λ_i ⟨v̂_1, v_i⟩^2.

Our proof techniques use the geometric properties of q_A(·).
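To make these quantities concrete, the following Python sketch (using numpy; the toy data and the value of k are made up for illustration) computes the second moment matrix A, the top-k subspace V_k(A), the k-th eigengap, and the utilities q_F and q_A from (3) and (4).

```python
import numpy as np

def second_moment(X):
    """A = (1/n) X X^T for a d x n data matrix X whose columns have norm <= 1."""
    d, n = X.shape
    return X @ X.T / n

def top_k_subspace(A, k):
    """Return V_k(A) (d x k, orthonormal columns) and the k-th eigengap."""
    eigvals, eigvecs = np.linalg.eigh(A)                 # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    return eigvecs[:, :k], eigvals[k - 1] - eigvals[k]

def q_F(V_hat, A):
    """q_F(V_hat) = tr(V_hat^T A V_hat), equation (3)."""
    return np.trace(V_hat.T @ A @ V_hat)

def q_A(v_hat, v1):
    """q_A(v_hat) = |<v_hat, v_1>|, equation (4), for the k = 1 case."""
    return abs(float(v_hat.ravel() @ v1.ravel()))

# Toy example: n = 500 points in d = 10 dimensions, rescaled so every ||x_i|| <= 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))
X /= np.maximum(1.0, np.linalg.norm(X, axis=0))
A = second_moment(X)
Vk, gap = top_k_subspace(A, k=2)
print(q_F(Vk, A), gap)
```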
Definition 2 A randomized algorithm A(·) is an (ρ, η)-close approximation to the top eigenvector
if for all data sets D of n points we have

    P( q_A(A(D)) ≥ ρ ) ≥ 1 − η,

where the probability is taken over A(·).

We study approximations to PCA that preserve the privacy of the underlying data. The
notion of privacy that we use is differential privacy, which quantifies the privacy guaranteed by a
randomized algorithm A applied to a data set D.
Definition 3 An algorithm A(B) taking values in a set T provides ε_p-differential privacy if

    sup_S sup_{D,D'} ( µ(S | B = D) / µ(S | B = D') ) ≤ e^{ε_p},

where the first supremum is over all measurable S ⊆ T , the second is over all data sets D and D'
differing in a single entry, and µ(·|B) is the conditional distribution (measure) on T induced by
the output A(B) given a data set B. The ratio is interpreted to be 1 whenever the numerator and
denominator are both 0.

Definition 4 An algorithm A(B) taking values in a set T provides (ε_p, δ)-differential privacy if

    P( A(D) ∈ S ) ≤ e^{ε_p} P( A(D') ∈ S ) + δ,

for all measurable S ⊆ T and all data sets D and D' differing in a single entry.

Here ε_p and δ are privacy parameters, where low ε_p and δ ensure more privacy (Dwork et al.,
2006b; Wasserman and Zhou, 2010; Dwork et al., 2006a). The second privacy guarantee is weaker;
the parameter δ bounds the probability of failure, and is typically chosen to be quite small. In our
experiments we chose small but constant δ—Ganta et al. (2008) suggest δ < 1/n^2 is more appropriate.
In this paper we are interested in proving results on the sample complexity of differentially
private algorithms that approximate PCA. That is, for a given ε_p and ρ, how large must the number
of individuals n in the data set be such that the algorithm is both ε_p-differentially private and a
(ρ, η)-close approximation to PCA? It is well known that as the number of individuals n grows,
it is easier to guarantee the same level of privacy with relatively less noise or perturbation, and
therefore the utility of the approximation also improves. Our results characterize how the privacy
ε_p and utility ρ scale with n and the tradeoff between them for fixed n. We show that the sample
complexity depends on the eigengap Δ.
3. Algorithms and Results
In this section we describe differentially private techniques for approximating (2). The first is a
modified version of the Sub-Linear Queries (SULQ) method (Blum et al., 2005). Our new algo-
rithm for differentially-private PCA, PPCA, is an instantiation of the exponential mechanism due
to McSherry and Talwar (2007). Both procedures are differentially private approximations to the
top-k subspace: SULQ guarantees (ε_p, δ)-differential privacy and PPCA guarantees ε_p-differential
privacy.
3.1 Input Perturbation
The only differentially-private approximation to PCA prior to this work is the SULQ method (Blum
et al., 2005). The SULQ method perturbs each entry of the empirical second moment matrix A
to ensure differential privacy and releases the top-k eigenvectors of this perturbed matrix. More
specifically, SULQ recommends adding a matrix N of i.i.d. Gaussian noise of variance
8d^2 log^2(d/δ) / (n^2 ε_p^2) and applies the PCA algorithm to A + N. This guarantees a weaker
privacy definition known as (ε_p, δ)-differential privacy. One problem with this approach is that with
probability 1 the matrix A + N is not symmetric, so the largest eigenvalue may not be real and the
entries of the corresponding eigenvector may be complex. Thus the SULQ algorithm, as written, is
not a good candidate for approximating PCA.
It is easy to modify SULQ to produce an eigenvector with real entries that guarantees (ε_p, δ)-
differential privacy. In Algorithm 1, instead of adding an asymmetric Gaussian matrix, we add a
symmetric matrix with i.i.d. Gaussian entries N. That is, for 1 ≤ i ≤ j ≤ d, the variable N_ij is an
independent Gaussian random variable with variance β^2. Note that this matrix is symmetric but
not necessarily positive semidefinite, so some eigenvalues may be negative but the eigenvectors
are all real. A derivation for the noise variance in (5) of Algorithm 1 is given in Theorem 5. An
alternative is to add Laplace noise of an appropriate variance to each entry—this would guarantee
ε_p-differential privacy.
Algorithm 1: Algorithm MOD-SULQ (input perturbation)
inputs: d × n data matrix X, privacy parameter ε_p, parameter δ
outputs: d × k matrix V̂_k = [v̂_1 v̂_2 ··· v̂_k] with orthonormal columns
1. Set A = (1/n) XX^T.
2. Set

       β = ((d + 1)/(n ε_p)) √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 1/(n √ε_p).    (5)

   Generate a d × d symmetric random matrix N whose entries are i.i.d. drawn from N(0, β^2).
3. Compute V̂_k = V_k(A + N) according to (2).
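As an illustration, a minimal numpy sketch of MOD-SULQ is given below. It assumes X is a d × n matrix with column norms at most 1 and uses the noise scale β from equation (5) as written above; it is meant to convey the structure of the algorithm, not to serve as a vetted privacy-preserving implementation (for example, floating-point sampling issues are ignored).

```python
import numpy as np

def mod_sulq(X, k, eps_p, delta, rng=None):
    """MOD-SULQ sketch: perturb A with a symmetric Gaussian matrix and return
    the top-k eigenvectors of the noisy matrix (Algorithm 1)."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    A = X @ X.T / n
    # Noise standard deviation from equation (5).
    beta = ((d + 1) / (n * eps_p)) * np.sqrt(
        2.0 * np.log((d * d + d) / (2.0 * delta * np.sqrt(2.0 * np.pi)))
    ) + 1.0 / (n * np.sqrt(eps_p))
    # Symmetric noise: draw the upper triangle (including the diagonal) and mirror it.
    upper = np.triu(rng.normal(scale=beta, size=(d, d)))
    N = upper + upper.T - np.diag(np.diag(upper))
    eigvals, eigvecs = np.linalg.eigh(A + N)
    return eigvecs[:, ::-1][:, :k]            # top-k eigenvectors of A + N
```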
3.2 Exponential Mechanism
Our new method, PPCA, randomly samples a k-dimensional subspace from a distribution that en-
sures differential privacy and is biased towards high utility. The distribution from which our re-
leased subspace is sampled is known in the statistics literature as the matrix Bingham distribution
(Chikuse, 2003), which we denote by BMF_k(B). The algorithm and its privacy properties apply to
general k < d but our theoretical results on the utility focus on the special case k = 1. The matrix
Bingham distribution takes values on the set of all k-dimensional subspaces of R^d and has a density
equal to

    f(V) = ( 1 / ₁F₁(k/2, d/2, B) ) exp( tr(V^T B V) ),    (6)

where V is a d × k matrix whose columns are orthonormal and ₁F₁(k/2, d/2, B) is a confluent hyper-
geometric function (Chikuse, 2003, p. 33).
By combining results on the exponential mechanism along with properties of the PCA algorithm,
we can show that this procedure is differentially private. In many cases, sampling from the distri-
bution specified by the exponential mechanism may be expensive computationally, especially for
continuous-valued outputs. We implement PPCA using a recently-proposed Gibbs sampler due to
Hoff (2009). Gibbs sampling is a popular Markov Chain Monte Carlo (MCMC) technique in which
samples are generated according to a Markov chain whose stationary distribution is the density in
Algorithm 2: Algorithm PPCA (exponential mechanism)
inputs: d × n data matrix X, privacy parameter ε_p, dimension k
outputs: d × k matrix V̂_k = [v̂_1 v̂_2 ··· v̂_k] with orthonormal columns
1. Set A = (1/n) XX^T.
2. Sample V̂_k ∼ BMF( n (ε_p/2) A ).
(6). Assessing the “burn-in time” and other factors for this procedure is an interesting question in
its own right; further details are in Section 6.2.
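For intuition about what PPCA samples, note that for k = 1 the density (6) with B = n(ε_p/2)A is proportional to exp((nε_p/2) v^T A v) over unit vectors v. The sketch below runs a simple random-walk Metropolis sampler for this k = 1 case; the perturb-and-renormalize proposal is treated as approximately symmetric, so this is only a rough illustration of the target distribution rather than the Hoff (2009) Gibbs sampler used in our experiments, and its burn-in would need the same kind of empirical assessment discussed in Section 6.2.

```python
import numpy as np

def ppca_top_direction(A, n, eps_p, iters=50000, step=0.05, rng=None):
    """Random-walk Metropolis sketch for the k = 1 PPCA output: a unit vector v
    with density proportional to exp((n * eps_p / 2) * v^T A v)."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    score = lambda u: 0.5 * n * eps_p * (u @ A @ u)
    current = score(v)
    for _ in range(iters):
        prop = v + step * rng.normal(size=d)   # perturb and project back to the sphere
        prop /= np.linalg.norm(prop)
        cand = score(prop)
        if np.log(rng.random()) < cand - current:
            v, current = prop, cand
    return v
```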
3.3 Other Approaches
There are other general algorithmic strategies for guaranteeing differential privacy. The sensitivity
method (Dwork et al., 2006b) adds noise proportional to the maximum change that can be induced
by changing a single point in the data set. Consider a data set D with m + 1 copies of a unit
vector u and m copies of a unit vector u' with u ⊥ u', and let D' have m copies of u and m + 1
copies of u'. Then v_1(D) = u but v_1(D') = u', so ‖v_1(D) − v_1(D')‖ = √2. Thus the global
sensitivity does not scale with the number of data points, so as n increases the variance of the
noise required by the sensitivity method will not decrease. An alternative to global sensitivity is
smooth sensitivity (Nissim et al., 2007). Except for special cases, such as the sample median,
smooth sensitivity is difficult to compute for general functions. A third method for computing
private, approximate solutions to high-dimensional optimization problems is objective perturbation
(Chaudhuri et al., 2011); to apply this method, we require the optimization problems to have certain
properties (namely, strong convexity and bounded norms of gradients), which do not apply to PCA.
3.4 Main Results
Our theoretical results are sample complexity bounds for PPCA and MOD-SULQ as well as a general
lower bound on the sample complexity for any ε_p-differentially private algorithm. These results
show that PPCA is nearly optimal in terms of the scaling of the sample complexity with respect
to the data dimension d, privacy parameter ε_p, and eigengap Δ. We further show that MOD-SULQ
requires more samples as a function of d, despite having a slightly weaker privacy guarantee. Proofs
are presented in Sections 4 and 5.
Even though both algorithms can output the top-k PCA subspace for general k ≤ d, we prove
results for the case k = 1. Finding the scaling behavior of the sample complexity with k is an
interesting open problem that we leave for future work; challenges here are finding the right notion
of approximation of the PCA, and extending the theory using packings of Grassmann or Stiefel
manifolds.
Theorem 5 For the β in (5), Algorithm MOD-SULQ is (ε_p, δ)-differentially private.

Theorem 6 Algorithm PPCA is ε_p-differentially private.
The fact that these two algorithms are differentially private follows from some simple calcu-
lations. Our first sample complexity result provides an upper bound on the number of samples
required by PPCA to guarantee a certain level of privacy and accuracy. The sample complexity of
PPCA grows linearly with the dimension d, inversely with ε_p, and inversely with the correlation gap
(1 − ρ) and eigenvalue gap Δ. These sample complexity results hold for k = 1.
Theorem 7 (Sample complexity of PPCA) If

    n > ( d / (ε_p (1 − ρ) Δ) ) ( (4 log(1/η))/d + 2 log( 8 λ_1 / ((1 − ρ^2) Δ) ) ),

then the top PCA direction v_1 and the output v̂_1 of PPCA with privacy parameter ε_p satisfy

    Pr( |⟨v_1, v̂_1⟩| > ρ ) ≥ 1 − η.

That is, PPCA is a (ρ, η)-close approximation to PCA.
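As a quick numerical illustration of Theorem 7, the snippet below evaluates the sample complexity bound exactly as stated above; all parameter values are made up for the example.

```python
import numpy as np

def ppca_sample_bound(d, eps_p, rho, eta, lambda1, gap):
    """Right-hand side of the Theorem 7 sample complexity bound (gap = eigengap)."""
    return (d / (eps_p * (1.0 - rho) * gap)) * (
        4.0 * np.log(1.0 / eta) / d
        + 2.0 * np.log(8.0 * lambda1 / ((1.0 - rho ** 2) * gap))
    )

# Illustrative values: d = 100, eps_p = 0.1, target rho = 0.9, eta = 0.05,
# top eigenvalue 0.5 and eigengap 0.2.
print(ppca_sample_bound(d=100, eps_p=0.1, rho=0.9, eta=0.05, lambda1=0.5, gap=0.2))
```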
Our second result shows a lower bound on the number of samples required by any ε_p-differentially-
private algorithm to guarantee a certain level of accuracy for a large class of data sets, and uses proof
techniques in Chaudhuri and Hsu (2011, 2012).

Theorem 8 (Sample complexity lower bound) Fix d ≥ 3, ε_p, and Δ ≤ 1/2, and let
1 − φ = exp( −2 · (ln 8 + ln(1 + exp(d)))/(d − 2) ). For any ρ ≥ 1 − (1 − φ)/16, no ε_p-differentially
private algorithm A can approximate PCA with expected utility greater than ρ on all databases with
n points in dimension d having eigenvalue gap Δ, where

    n < ( d / (ε_p Δ) ) · max{ 1, √( (1 − φ) / (80(1 − ρ)) ) }.
Theorem 7 shows that if n scales like (d/(ε_p (1 − ρ) Δ)) log( 1/(1 − ρ^2) ) then PPCA produces an
approximation v̂_1 that has correlation ρ with v_1, whereas Theorem 8 shows that n must scale like
d/(ε_p Δ √(1 − ρ)) for any ε_p-differentially private algorithm. In terms of scaling with d, ε_p and Δ,
the upper and lower bounds match, and they also match up to square-root factors with respect to the
correlation. By contrast, the following lower bound on the number of samples required by MOD-SULQ
to ensure a certain level of accuracy shows that MOD-SULQ has a less favorable scaling with dimension.
Theorem 9 (Sample complexity lower bound for MOD-SULQ) There are constants c and c' such
that if

    n < c ( d^{3/2} √(log(d/δ)) / ε_p ) (1 − c'(1 − ρ)),

then there is a data set of size n in dimension d such that the top PCA direction v_1 and the output v̂_1
of MOD-SULQ satisfy E[ |⟨v̂_1, v_1⟩| ] ≤ ρ.

Notice that the dependence on the dimension grows as d^{3/2} in SULQ as opposed to d in PPCA.
Dimensionality reduction via PCA is often used in applications where the data points occupy a low
dimensional space but are presented in high dimensions. These bounds suggest that PPCA is better
suited to such applications than MOD-SULQ.
4. Analysis of PPCA
In this section we provide theoretical guarantees on the performance of PPCA. The proof of The-
orem 6 follows from the results on the exponential mechanism (McSherry and Talwar, 2007). To
find the sample complexity of PPCA we bound the density of the Bingham distribution, leading to
a sample complexity for k = 1 that depends on the gap λ_1 − λ_2 between the top two eigenvalues.
We also prove a general lower bound on the sample complexity that holds for any ε_p-differentially
private algorithm. The lower bound matches our upper bound up to log factors, showing that PPCA
is nearly optimal in terms of the scaling with dimension, privacy ε_p, and utility q_A(·).
4.1 Privacy Guarantee
We first give a proof of Theorem 6.
Proof Let X be a data matrix whose i-th column is x_i and A = (1/n) XX^T. The PPCA algorithm is
the exponential mechanism of McSherry and Talwar (2007) applied to the score function n · v^T A v.
Consider X' = [x_1 x_2 ··· x_{n−1} x'_n] differing from X in a single column and let A' = (1/n) X' X'^T.
We have

    max_{v ∈ S^{d−1}} | n · v^T A' v − n · v^T A v | = max_{v ∈ S^{d−1}} | v^T ( x'_n x'^T_n − x_n x^T_n ) v |
                                                     = max_{v ∈ S^{d−1}} | (v^T x'_n)^2 − (v^T x_n)^2 |
                                                     ≤ 1.

The last step follows because ‖x_i‖ ≤ 1 for all i. The result now follows immediately from McSherry
and Talwar (2007, Theorem 6).
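The key fact used above is that the score n · v^T A v changes by at most 1 when a single data point is replaced. The following small Monte Carlo check (random unit vectors and a random pair of neighboring data sets; purely illustrative) probes this bound numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 200

def rand_unit(size):
    u = rng.normal(size=size)
    return u / np.linalg.norm(u, axis=0)

X = rand_unit((d, n))                 # columns with norm exactly 1 (so <= 1)
X_prime = X.copy()
X_prime[:, -1] = rand_unit((d,))      # neighboring data set: last column replaced

A, A_prime = X @ X.T / n, X_prime @ X_prime.T / n
worst = 0.0
for _ in range(10000):
    v = rand_unit((d,))
    worst = max(worst, abs(n * v @ A_prime @ v - n * v @ A @ v))
print(worst)   # stays below 1, consistent with the sensitivity bound
```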
4.2 Upper Bound on Utility
The results on the exponential mechanism bound the gap between the value of the function q_F(v̂_1) =
n · v̂_1^T A v̂_1 evaluated at the output v̂_1 of the mechanism and the optimal value q_F(v_1) = n · λ_1. We
derive a bound on the correlation q_A(v̂_1) = |⟨v̂_1, v_1⟩| via geometric arguments.

Lemma 10 (Lemmas 2.2 and 2.3 of Ball (1997)) Let µ be the uniform measure on the unit sphere
S^{d−1}. For any x ∈ S^{d−1} and 0 ≤ c < 1 the following bounds hold:

    (1/2) exp( −((d − 1)/2) log( 2/(1 − c) ) ) ≤ µ( { v ∈ S^{d−1} : ⟨v, x⟩ ≥ c } ) ≤ exp( −d c^2 / 2 ).
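A quick Monte Carlo check of the two bounds in Lemma 10, for illustrative values of d and c, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, trials = 50, 0.3, 200000

# Uniform points on the unit sphere: normalize Gaussian vectors.
V = rng.normal(size=(trials, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
x = np.zeros(d); x[0] = 1.0

cap_measure = np.mean(V @ x >= c)                        # estimate of mu({v : <v,x> >= c})
lower = 0.5 * np.exp(-((d - 1) / 2.0) * np.log(2.0 / (1.0 - c)))
upper = np.exp(-d * c ** 2 / 2.0)
print(lower, cap_measure, upper)   # lower <= estimate <= upper, up to Monte Carlo error
```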
We are now ready to provide a proof of Theorem 7.
Proof Fix a privacy level ε_p, target correlation ρ, and probability η. Let X be the data matrix,
B = (ε_p/2) XX^T, and let

    U_ρ = { u : |⟨u, v_1⟩| ≥ ρ }

be the union of the two spherical caps centered at ±v_1. Let Ū_ρ denote the complement of U_ρ in
S^{d−1}. An output vector v̂_1 is “good” if it is in U_ρ. We first give some bounds on the score function
q_F(u) on the boundary between U_ρ and Ū_ρ, where ⟨u, v_1⟩ = ±ρ. On this boundary, the function
q_F(u) is maximized when u is a linear combination of v_1 and v_2, the top two eigenvectors of A. It
is minimized when u is a linear combination of v_1 and v_d. Therefore

    q_F(u) ≤ (nε_p/2) ( ρ^2 λ_1 + (1 − ρ^2) λ_2 ),   u ∈ Ū_ρ,    (7)
    q_F(u) ≥ (nε_p/2) ( ρ^2 λ_1 + (1 − ρ^2) λ_d ),   u ∈ U_ρ.    (8)

Let µ(·) denote the uniform measure on the unit sphere. Then, fixing a 0 ≤ σ < 1 and using (7),
(8), and the fact that λ_d ≥ 0,

    P(Ū_ρ) ≤ P(Ū_ρ) / P(U_σ)
           = [ (1/₁F₁(k/2, d/2, B)) ∫_{Ū_ρ} exp(u^T B u) dµ ] / [ (1/₁F₁(k/2, d/2, B)) ∫_{U_σ} exp(u^T B u) dµ ]
           ≤ [ exp( n(ε_p/2)(ρ^2 λ_1 + (1 − ρ^2) λ_2) ) · µ(Ū_ρ) ] / [ exp( n(ε_p/2)(σ^2 λ_1 + (1 − σ^2) λ_d) ) · µ(U_σ) ]
           ≤ exp( −(nε_p/2)( σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) ) ) · µ(Ū_ρ)/µ(U_σ).    (9)

Applying the lower bound from Lemma 10 to the denominator of (9) and the upper bound µ(Ū_ρ) ≤ 1
yields

    P(Ū_ρ) ≤ exp( −(nε_p/2)( σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) ) ) · exp( ((d − 1)/2) log( 2/(1 − σ) ) ).    (10)

We must choose a σ^2 > ρ^2 to make the upper bound smaller than 1. More precisely,

    σ^2 > ρ^2 + (1 − ρ^2) λ_2/λ_1,
    1 − σ^2 < (1 − ρ^2)( 1 − λ_2/λ_1 ).

For simplicity, choose

    1 − σ^2 = (1/2)(1 − ρ^2)( 1 − λ_2/λ_1 ),

so that

    σ^2 λ_1 − (ρ^2 λ_1 + (1 − ρ^2) λ_2) = (1 − ρ^2) λ_1 − (1 − σ^2) λ_1 − (1 − ρ^2) λ_2
                                        = (1 − ρ^2)( λ_1 − (1/2)(λ_1 − λ_2) − λ_2 )
                                        = (1/2)(1 − ρ^2)(λ_1 − λ_2)

and

    log( 2/(1 − σ) ) < log( 4/(1 − σ^2) ) = log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ).

Setting the right hand side of (10) less than η yields

    (nε_p/4)(1 − ρ^2)(λ_1 − λ_2) > log(1/η) + ((d − 1)/2) log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ).

Because 1 − ρ < 1 − ρ^2, if we choose

    n > ( d / (ε_p (1 − ρ)(λ_1 − λ_2)) ) ( (4 log(1/η))/d + 2 log( 8 λ_1 / ((1 − ρ^2)(λ_1 − λ_2)) ) ),

then the output of PPCA will produce a v̂_1 such that P( |⟨v̂_1, v_1⟩| < ρ ) < η.
4.3 Lower Bound on Utility
We now turn to a general lower bound on the sample complexity for any differentially private ap-
proximation to PCA. We construct K databases which differ in a small number of points whose top
eigenvectors are not too far from each other. For such a collection, Lemma 12 shows that for any
differentially private mechanism, the average correlation over the collection cannot be too large.
That is, any ε_p-differentially private mechanism cannot have high utility on all K data sets. The
remainder of the argument is to construct these K data sets.
The proof uses some simple eigenvalue and eigenvector computations. A 2 × 2 matrix of positive
entries

    A = [ a  b ]
        [ b  c ]    (11)

has characteristic polynomial

    det(A − λI) = λ^2 − (a + c)λ + (ac − b^2)

and eigenvalues

    λ = (1/2)(a + c) ± (1/2)√( (a + c)^2 − 4(ac − b^2) ) = (1/2)(a + c) ± (1/2)√( (a − c)^2 + 4b^2 ).

The eigenvectors are in the directions (b, λ − a)^T.
We will also need the following Lemma, which is proved in the Appendix.
Lemma 11 (Simple packing set) For φ ∈ [(2πd)^{−1/2}, 1), there exists a set of

    K = (1/8) exp( (d − 1) log( 1/√(1 − φ^2) ) )    (12)

vectors C in S^{d−1} such that for any pair µ, ν ∈ C, the inner product between them is upper bounded
by φ: |⟨µ, ν⟩| ≤ φ.
The following Lemma gives a lower bound on the expected utility averaged over a set of
databases which differ in a “small” number of elements.

Lemma 12 Let D_1, D_2, ..., D_K be K databases which differ in the value of at most ln(K − 1)/ε_p points,
and let u_1, ..., u_K be the top eigenvectors of D_1, D_2, ..., D_K. If A is any ε_p-differentially private
algorithm, then

    ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ) ).

Proof Let

    t = min_{i≠j} min( ‖u_i − u_j‖, ‖u_i + u_j‖ ),

and let G_i be the “double cap” around ±u_i of radius t/2:

    G_i = { u : ‖u − u_i‖ < t/2 } ∪ { u : ‖u + u_i‖ < t/2 }.

We claim that

    ∑_{i=1}^{K} P_A( A(D_i) ∉ G_i ) ≥ (1/2)(K − 1).    (13)

The proof is by contradiction. Suppose the claim is false. Because all of the caps G_i are disjoint,
and applying the definition of differential privacy,

    (1/2)(K − 1) > ∑_{i=1}^{K} P_A( A(D_i) ∉ G_i )
                 ≥ ∑_{i=1}^{K} ∑_{i'≠i} P_A( A(D_i) ∈ G_{i'} )
                 ≥ ∑_{i=1}^{K} ∑_{i'≠i} e^{−ε_p · ln(K−1)/ε_p} P_A( A(D_{i'}) ∈ G_{i'} )
                 = (K − 1) · (1/(K − 1)) · ∑_{i'=1}^{K} P_A( A(D_{i'}) ∈ G_{i'} )
                 ≥ K − (1/2)(K − 1),

which is a contradiction, so (13) holds. Therefore by the Markov inequality

    ∑_{i=1}^{K} E_A[ min( ‖A(D_i) − u_i‖^2, ‖A(D_i) + u_i‖^2 ) ] ≥ ∑_{i=1}^{K} P( A(D_i) ∉ G_i ) · (t^2/4)
                                                                 ≥ (1/8)(K − 1) t^2.

Rewriting the norms in terms of inner products shows

    2K − 2 ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≥ (1/8)(K − 1)( 2 − 2 max_{i≠j} ⟨u_i, u_j⟩ ),

so

    ∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/8) · ((K − 1)/K) · (1 − max_{i≠j} ⟨u_i, u_j⟩) )
                                      ≤ K ( 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ) ).
We can now prove Theorem 8.
Proof From Lemma 12, given a set of K databases differing in ln(K − 1)/ε_p points with top eigenvectors
{u_i : i = 1, 2, ..., K}, for at least one database i,

    E_A[ |⟨A(D_i), u_i⟩| ] ≤ 1 − (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ )

for any ε_p-differentially private algorithm A. Setting the left side equal to some target ρ,

    1 − ρ ≥ (1/16)( 1 − max_{i≠j} ⟨u_i, u_j⟩ ).    (14)

So our goal is to construct these databases such that the inner product between their top eigenvectors
is small.
Let y = e_d, the d-th coordinate vector, and let φ ∈ ((2πd)^{−1/2}, 1). Lemma 11 shows that there
exists a packing W = {w_1, w_2, ..., w_K} of the sphere S^{d−2} spanned by the first d − 1 elementary
vectors {e_1, e_2, ..., e_{d−1}} such that max_{i≠j} |⟨w_i, w_j⟩| ≤ φ, where

    K = (1/8)(1 − φ)^{−(d−2)/2}.

Choose φ such that ln(K − 1) = d. This means

    1 − φ = exp( −2 · (ln 8 + ln(1 + exp(d)))/(d − 2) ).

The right side is minimized for d = 3, but this leads to a weak lower bound 1 − φ > 3.5 × 10^{−5}. By
contrast, for d = 100, the bound is 1 − φ > 0.12. In all cases, 1 − φ is at least a constant value.
We construct a database with n points for each w_i. Let β = d/(nε_p). For now, we assume that
β ≤ Δ ≤ 1/2. The other case, when β > Δ, will be considered later. Because β ≤ Δ, we have n > d/(ε_p Δ).
The construction uses a parameter 0 ≤ m ≤ 1 that will be set as a function of the eigenvalue gap Δ.
We will derive conditions on n based on the requirements on d, ε_p, ρ, and Δ. For i = 1, 2, ..., K let
the data set D_i contain

    n(1 − β) copies of √m · y,
    nβ copies of z_i = (1/√2) y + (1/√2) w_i.

Thus data sets D_i and D_j differ in the values of nβ = ln(K − 1)/ε_p individuals. The second moment
matrix A_i of D_i is

    A_i = ( (1 − β)m + (1/2)β ) y y^T + (1/2)β ( w_i y^T + y w_i^T ) + (1/2)β w_i w_i^T.

By choosing a basis containing y and w_i, we can write this as

    A_i = [ (1 − β)m + (1/2)β   (1/2)β   0 ]
          [ (1/2)β              (1/2)β   0 ]
          [ 0                   0        0 ].

This is in the form (11), with a = (1 − β)m + (1/2)β, b = (1/2)β, and c = (1/2)β.
The matrix A_i has two nonzero eigenvalues given by

    λ = (1/2)(a + c) + (1/2)√( (a − c)^2 + 4b^2 ),    (15)
    λ' = (1/2)(a + c) − (1/2)√( (a − c)^2 + 4b^2 ).

The gap between the top two eigenvalues is

    Δ = √( (a − c)^2 + 4b^2 ) = √( m^2 (1 − β)^2 + β^2 ).

We can thus set m in the construction to ensure an eigengap of Δ:

    m = √( Δ^2 − β^2 ) / (1 − β).    (16)

The top eigenvector of A_i is given by

    u_i = ( b / √( b^2 + (a − λ)^2 ) ) y + ( (a − λ) / √( b^2 + (a − λ)^2 ) ) w_i,

where λ is given by (15). Therefore

    max_{i≠j} ⟨u_i, u_j⟩ ≤ b^2 / (b^2 + (a − λ)^2) + ( (a − λ)^2 / (b^2 + (a − λ)^2) ) max_{i≠j} ⟨w_i, w_j⟩
                         ≤ 1 − ( (a − λ)^2 / (b^2 + (a − λ)^2) )(1 − φ).    (17)

To obtain an upper bound on max_{i≠j} ⟨u_i, u_j⟩ we must lower bound (a − λ)^2 / (b^2 + (a − λ)^2).
Since x/(ν + x) is monotonically increasing in x when ν > 0, we will find a lower bound on
(a − λ)^2. Observe that from (15),

    λ − a = b^2 / (λ − c).

So to lower bound λ − a we need to upper bound λ − c. We have

    λ − c = (1/2)(a − c) + (1/2)Δ = (1/2)( (1 − β)m + Δ ).

Because b = β/2,

    (λ − a)^2 > ( β^2 / (2((1 − β)m + Δ)) )^2 = β^4 / ( 4((1 − β)m + Δ)^2 ).

Now,

    (a − λ)^2 / (b^2 + (a − λ)^2) > β^4 / ( β^2 ((1 − β)m + Δ)^2 + β^4 )
                                  = β^2 / ( β^2 + ((1 − β)m + Δ)^2 )
                                  > β^2 / (5Δ^2),    (18)

where the last step follows by plugging in m from (16) and using the fact that β ≤ Δ. Putting it all
together, we have from (14), (17), and (18), and using the fact that φ is such that ln(K − 1) = d and
β = d/(nε_p),

    1 − ρ ≥ (1/16) · ( (a − λ)^2 / (b^2 + (a − λ)^2) )(1 − φ) > ((1 − φ)/80) · β^2/Δ^2 = ((1 − φ)/80) · d^2/(n^2 ε_p^2 Δ^2),

which implies

    n > ( d / (ε_p Δ) ) √( (1 − φ) / (80(1 − ρ)) ).

Thus for β ≤ Δ ≤ 1/2, any ε_p-differentially private algorithm needs Ω( d / (ε_p Δ √(1 − ρ)) ) points to get
expected inner product ρ on all data sets with eigengap Δ.
We now consider the case where β > Δ. We choose a slightly different construction here. The
i-th database now consists of n(1 − β) copies of the 0 vector, and nβ copies of √(Δ/β) · w_i. Thus, every
pair of databases differ in the values of nβ = ln(K − 1)/ε_p people, and the gap between the top two
eigenvalues is β · (Δ/β) = Δ.
As the top eigenvector of the i-th database is u_i = w_i,

    max_{i≠j} |⟨u_i, u_j⟩| = max_{i≠j} |⟨w_i, w_j⟩| ≤ φ.

Combining this with (14), we obtain

    1 − ρ ≥ (1/16)(1 − φ),

which provides the additional condition in the Theorem.
5. Analysis of MOD-SULQ
In this section we provide theoretical guarantees on the performance of the MOD-SULQ algorithm.
Theorem 5 shows that MOD-SULQ is (ε_p, δ)-differentially private. Theorem 15 provides a lower
bound on the distance between the vector released by MOD-SULQ and the true top eigenvector in
terms of the privacy parameters ε_p and δ and the number of points n in the data set. This im-
plicitly gives a lower bound on the sample complexity of MOD-SULQ. We provide some graphical
illustration of this tradeoff.
The following upper bound will be useful for future calculations: for two unit vectors x and y,

    ∑_{1≤i≤j≤d} ( x_i x_j − y_i y_j )^2 ≤ 2.    (19)
Note that this upper bound is achievable by setting x and y to be orthogonal elementary vectors.
5.1 Privacy Guarantee
We first justify the choice of β^2 in the MOD-SULQ algorithm by proving Theorem 5.
Proof Let B and B̂ be two independent symmetric random matrices where {B_ij : 1 ≤ i ≤ j ≤ d} and
{B̂_ij : 1 ≤ i ≤ j ≤ d} are each sets of i.i.d. Gaussian random variables with mean 0 and variance β^2.
Consider two data sets D = {x_i : i = 1, 2, ..., n} and D̂ = D ∪ {x̂_n} \ {x_n} and let A and Â denote
their second moment matrices. Let G = A + B and Ĝ = Â + B̂. We first calculate the log ratio of the
densities of G and Ĝ at a point H:

    log( f_G(H) / f_Ĝ(H) )
        = ∑_{1≤i≤j≤d} ( −(1/(2β^2))(H_ij − A_ij)^2 + (1/(2β^2))(H_ij − Â_ij)^2 )
        = (1/(2β^2)) ∑_{1≤i≤j≤d} ( (2/n)(H_ij − A_ij)(x_{n,i} x_{n,j} − x̂_{n,i} x̂_{n,j}) + (1/n^2)(x̂_{n,i} x̂_{n,j} − x_{n,i} x_{n,j})^2 ).

From (19) the last term is upper bounded by 2/n^2. To upper bound the first term,

    ∑_{1≤i≤j≤d} | x̂_{n,i} x̂_{n,j} − x_{n,i} x_{n,j} | ≤ 2 max_{a : ‖a‖ ≤ 1} ∑_{1≤i≤j≤d} a_i a_j ≤ 2 · (1/2)(d^2 + d) · (1/d) = d + 1.

Note that this bound is not too loose—by taking x̂ = d^{−1/2} 1 and x = (1, 0, ..., 0)^T, this term is still
linear in d.
Then for any measurable set S of matrices,

    P(G ∈ S) ≤ exp( (1/(2β^2))( (2/n)(d + 1)γ + 3/n^2 ) ) P(Ĝ ∈ S) + P( B_ij > γ for some i, j ).    (20)

To handle the last term, use a union bound over the (d^2 + d)/2 variables {B_ij} together with the
tail bound, which holds for γ > β:

    P( B_ij > γ ) ≤ (1/√(2π)) e^{−γ^2/(2β^2)}.

Thus setting P( B_ij > γ for some i, j ) = δ yields the condition

    δ = ( (d^2 + d)/(2√(2π)) ) e^{−γ^2/(2β^2)}.

Rearranging to solve for γ gives

    γ = max( β, β √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) ) = β √( 2 log( (d^2 + d)/(2δ√(2π)) ) )

for d > 1 and δ < 3/√(2πe). This then gives an expression for ε_p to make (20) imply (ε_p, δ)-differ-
ential privacy:

    ε_p = (1/(2β^2))( (2/n)(d + 1)γ + 2/n^2 )
        = (1/(2β^2))( (2/n)(d + 1) β √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 2/n^2 ).

Solving for β using the quadratic formula yields the particularly messy expression in (5):

    β = ( (d + 1)/(2nε_p) ) √( 2 log( (d^2 + d)/(2δ√(2π)) ) )
        + ( 1/(2nε_p) ) ( 2(d + 1)^2 log( (d^2 + d)/(2δ√(2π)) ) + 4ε_p )^{1/2}
      ≤ ( (d + 1)/(nε_p) ) √( 2 log( (d^2 + d)/(2δ√(2π)) ) ) + 1/(n√ε_p).
5.2 Proof of Theorem 9
The main tool in our lower bound on the utility of MOD-SULQ is a generalization by Yu (1997) of an
information-theoretic inequality due to Fano.
Theorem 13 (Fano’s inequality (Yu, 1997)) Let R be a set and Θ be a parameter space with a
pseudo-metric d(·,·). Let F be a set of r densities {f_1, ..., f_r} on R corresponding to parameter
values {θ_1, ..., θ_r} in Θ. Let X have distribution f ∈ F with corresponding parameter θ and let
θ̂(X) be an estimate of θ. If, for all i and j,

    d(θ_i, θ_j) ≥ τ    and    KL( f_i ‖ f_j ) ≤ γ,

then

    max_j E_j[ d(θ̂, θ_j) ] ≥ (τ/2)( 1 − (γ + log 2)/log r ),

where E_j[·] denotes the expectation with respect to distribution f_j.
To use this inequality, we will construct a set of densities on the set of covariance matrices
corresponding to the distribution of the random matrix in the MOD-SULQ algorithm under different
inputs. These inputs will be chosen using a set of unit vectors which form a packing of the surface
of the unit sphere.
Lemma 14 Let Σ be a positive definite matrix and let f denote the density N(a, Σ) and g denote
the density N(b, Σ). Then KL( f ‖ g ) = (1/2)(a − b)^T Σ^{−1} (a − b).

Proof This is a simple calculation:

    KL( f ‖ g ) = E_{x∼f}[ −(1/2)(x − a)^T Σ^{−1}(x − a) + (1/2)(x − b)^T Σ^{−1}(x − b) ]
                = (1/2)( a^T Σ^{−1} a − a^T Σ^{−1} b − b^T Σ^{−1} a + b^T Σ^{−1} b )
                = (1/2)(a − b)^T Σ^{−1}(a − b).
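A short numerical sanity check of Lemma 14, comparing the closed form with a Monte Carlo estimate of the divergence for a small random instance (all values illustrative), is given below.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
M = rng.normal(size=(p, p))
Sigma = M @ M.T + p * np.eye(p)          # a positive definite covariance
a, b = rng.normal(size=p), rng.normal(size=p)

Sigma_inv = np.linalg.inv(Sigma)
closed_form = 0.5 * (a - b) @ Sigma_inv @ (a - b)

# Monte Carlo estimate of E_{x ~ N(a, Sigma)}[log f(x) - log g(x)].
L = np.linalg.cholesky(Sigma)
x = a + rng.normal(size=(200000, p)) @ L.T
def log_density(x, mean):
    diff = x - mean
    return -0.5 * np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)
mc = np.mean(log_density(x, a) - log_density(x, b))
print(closed_form, mc)   # the two values agree up to Monte Carlo error
```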
The next theorem is a lower bound on the expected distance between the vector output by MOD-
SULQ and the true top eigenvector. In order to get this lower bound, we construct a class of data
sets and use Theorem 13 to derive a bound on the minimax error over the class.
Theorem 15 (Utility bound for MOD-SULQ) Let d, n, and ε_p > 0 be given and let β be given by
Algorithm 1 so that the output of MOD-SULQ is (ε_p, δ)-differentially private for all data sets in R^d
with n elements. Then there exists a data set with n elements such that if v̂_1 denotes the output of
MOD-SULQ and v_1 is the top eigenvector of the empirical covariance matrix of the data set, the
expected correlation ⟨v̂_1, v_1⟩ is upper bounded:

    E[ |⟨v̂_1, v_1⟩| ] ≤ min_{φ∈Φ} ( 1 − ((1 − φ)/4)( 1 − (1/β^2 + log 2) / ( (d − 1) log( 1/√(1 − φ^2) ) − log 8 ) )^2 ),    (21)

where

    Φ = [ max{ 1/√(2πd), √( 1 − exp( −2 log(8d)/(d − 1) ) ), √( 1 − exp( −(2/β^2 + log 256)/(d − 1) ) ) }, 1 ).    (22)

Proof For φ ∈ [(2πd)^{−1/2}, 1), Lemma 11 shows there exists a set of K unit vectors C such that for
µ, ν ∈ C, the inner product between them satisfies |⟨µ, ν⟩| ≤ φ, where K is given by (12). Note that
for small φ this setting of K is loose, but any orthonormal basis provides d unit vectors which are
orthogonal, so setting K = d and solving for φ yields

    φ ≥ ( 1 − exp( −2 log(8d)/(d − 1) ) )^{1/2}.

Setting the lower bound on φ to the maximum of these two yields the set of φ and K which we will
consider in (22).
For any unit vector µ, let

    A(µ) = µ µ^T + N,

where N is a d × d symmetric random matrix such that {N_ij : 1 ≤ i ≤ j ≤ d} are i.i.d. N(0, β^2),
where β^2 is the noise variance used in the MOD-SULQ algorithm. Due to symmetry, the matrix A(µ)
can be thought of as a jointly Gaussian random vector on the d(d + 1)/2 variables {A_ij(µ) : 1 ≤ i ≤ j ≤ d}.
The mean of this vector is

    µ̄ = ( µ_1^2, µ_2^2, ..., µ_d^2, µ_1 µ_2, µ_1 µ_3, ..., µ_{d−1} µ_d )^T,

and the covariance is β^2 I_{d(d+1)/2}. Let f_µ denote the density of this vector.
For µ, ν ∈ C, the divergence between f_µ and f_ν can be calculated using Lemma 14:

    KL( f_µ ‖ f_ν ) = (1/2)(µ̄ − ν̄)^T Σ^{−1}(µ̄ − ν̄) = (1/(2β^2)) ‖µ̄ − ν̄‖^2 ≤ 1/β^2.    (23)

The last line follows from the fact that the vectors in C are unit norm.
For any two vectors µ, ν ∈ C, lower bound the Euclidean distance between them using the upper
bound on the inner product:

    ‖µ − ν‖ ≥ √( 2(1 − φ) ).    (24)

Let Θ = S^{d−1} with the Euclidean norm and R be the set of distributions {A(µ) : µ ∈ Θ}. From
(24) and (23), the set C satisfies the conditions of Theorem 13 with F = {f_µ : µ ∈ C}, r = K,
τ = √(2(1 − φ)), and γ = 1/β^2. The conclusion of the Theorem shows that for MOD-SULQ,

    max_{µ∈C} E_{f_µ}[ ‖v̂ − µ‖ ] ≥ ( √(2(1 − φ)) / 2 )( 1 − (1/β^2 + log 2)/log K ).    (25)

This lower bound is vacuous when the term inside the parenthesis is negative, which imposes further
conditions on φ. Setting log K = 1/β^2 + log 2, we can solve to find another lower bound on φ:

    φ ≥ √( 1 − exp( −(2/β^2 + log 256)/(d − 1) ) ).

This yields the third term in (22). Note that for larger n this term will dominate the others.
Using Jensen’s inequality on the left side of (25):

    max_{µ∈C} E_{f_µ}[ 2(1 − |⟨v̂, µ⟩|) ] ≥ ((1 − φ)/2)( 1 − (1/β^2 + log 2)/log K )^2.

So there exists a µ ∈ C such that

    E_{f_µ}[ |⟨v̂, µ⟩| ] ≤ 1 − ((1 − φ)/4)( 1 − (1/β^2 + log 2)/log K )^2.    (26)

Consider the data set consisting of n copies of µ. The corresponding covariance matrix is µµ^T with
top eigenvector v_1 = µ. The output of the algorithm MOD-SULQ applied to this data set is an esti-
mator of µ and hence satisfies (26). Minimizing over φ gives the desired bound.
The minimization over φ in (21) does not lead to analytically pretty results, so we plotted the
results in Figure 1 in order to get a sense of the bounds. Figure 1 shows the upper bound on the
expected correlation E[ |⟨v̂_1, v_1⟩| ] as a function of the number of data points (given on a logarithmic
scale). Each panel shows a different dimension, from d = 50 to d = 1000, and plots are given for
different values of ε_p ranging from 0.01 to 2. In all experiments we set δ = 0.01. In high dimension,
the bound shows that the expected performance of MOD-SULQ is poor when there are a small
number of data points. This limitation may be particularly acute when the data lies in a very low
dimensional subspace but is presented in very high dimension. In such “sparse” settings, perturbing
the input as in MOD-SULQ is not a good approach. However, in lower dimensions and data-rich
regimes, the performance may be more favorable.
A little calculation yields the sample complexity bound in Theorem 9.
Proof Suppose E[ |⟨v̂_1, v_1⟩| ] = ρ. Then a little algebra shows

    2√(1 − ρ) ≥ min_{φ∈Φ} √(1 − φ) ( 1 − (1/β^2 + log 2) / ( (d − 1) log( 1/√(1 − φ^2) ) − log 8 ) ).
Figure 1: Upper bound from Theorem 15 on the expected correlation between the true top eigenvec-
tor and the v̂_1 produced by MOD-SULQ. The horizontal axis is log_10(n) and the vertical
axis shows the bound in (21). The four panels correspond to different values of the
dimension d, from 50 to 1000. Each panel contains plots of the bound for different values
of ε_p.
Setting φ such that (d − 1) log( 1/√(1 − φ^2) ) − log 8 = 2(1/β^2 + log 2), we have

    4√(1 − ρ) ≥ √(1 − φ).

Since we are concerned with the scaling behavior for large d and n, this implies

    log( 1/√(1 − φ^2) ) = Θ( 1/(β^2 d) ),

so

    φ = √( 1 − exp( −Θ(1/(β^2 d)) ) ) = Θ( √( 1/(β^2 d) ) ).

From Algorithm 1, for some constant c_1 we have the following lower bound on β:

    β^2 > c_1 ( d^2 / (n^2 ε_p^2) ) log(d/δ).

Substituting, we get for some constants c_2 and c_3 that

    ( 1 − c_2(1 − ρ) ) ≤ c_3 ( n^2 ε_p^2 ) / ( d^3 log(d/δ) ).

Now solving for n shows

    n ≥ c ( d^{3/2} √(log(d/δ)) / ε_p )( 1 − c'(1 − ρ) ).
6. Experiments
We next turn to validating our theoretical results on real data. We implemented MOD-SULQ and
PPCA in order to test our theoretical bounds. Implementing PPCA involved using a Gibbs sampling
procedure (Hoff, 2009). A crucial parameter in MCMC procedures is the burn-in time, which is
how long the chain must be run for it to reach its stationary distribution. Theoretically, chains reach
their stationary distribution only in the limit; however, in practice MCMC users must sample after
some finite time. In order to use this procedure appropriately, we determined a burn-in time using
our data sets. The interaction of MCMC procedures and differential privacy is a rich area for future
research.
6.1 Data and Preprocessing
We report on the performance of our algorithm on some real data sets. We chose four data sets
from four different domains—kddcup99 (Bache and Lichman, 2013), which includes features of
494,021 network connections, census (Bache and Lichman, 2013), a demographic data set on
199,523 individuals, localization (Kaluža et al., 2010), a medical data set with 164,860 instances
of sensor readings on individuals engaged in different activities, and insurance (van der Putten and
van Someren, 2000), a data set on product usage and demographics of 9,822 individuals.
These data sets contain a mix of continuous and categorical features. We preprocessed each
data set by converting a feature with q discrete values to a vector in {0,1}^q; after preprocessing, the
data sets kddcup99, census, localization and insurance have dimensions 116, 513, 44 and 150,
respectively. We also normalized each row so that each entry has maximum value 1, and normalized
each column such that the maximum (Euclidean) column norm is 1. We chose k = 4 for kddcup,
k = 8 for census, k = 10 for localization and k = 11 for insurance; in each case, the utility
q_F(V_k) of the top-k PCA subspace of the data matrix accounts for at least 80% of ‖A‖_F. Thus, all
four data sets, although fairly high dimensional, have good low-dimensional representations. The
properties of each data set are summarized in Table 1.
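A sketch of this preprocessing (one-hot encoding of categorical features, scaling each feature so its entries are at most 1 in magnitude, and scaling the points so every column norm is at most 1) is shown below; the toy input and all names are made up, and the scripts used for the experiments may differ in details.

```python
import numpy as np

def one_hot(column):
    """Map a categorical column (length n) to an n x q array of 0/1 indicators."""
    values = sorted(set(column))
    return np.array([[1.0 if v == u else 0.0 for u in values] for v in column])

def preprocess(numeric, categorical):
    """numeric: n x p array; categorical: list of length-n label columns.
    Returns a d x n matrix X with scaled features and column norms <= 1."""
    blocks = [np.asarray(numeric, dtype=float)] + [one_hot(c) for c in categorical]
    Z = np.concatenate(blocks, axis=1)              # n x d, one row per individual
    # Scale each feature so its largest absolute entry is 1.
    Z = Z / np.maximum(np.abs(Z).max(axis=0), 1e-12)
    # Scale every point by the largest point norm, so all column norms are <= 1.
    Z = Z / max(np.linalg.norm(Z, axis=1).max(), 1e-12)
    return Z.T                                      # d x n, columns are data points

# Toy example with two numeric features and one categorical feature.
X = preprocess(numeric=[[0.5, 3.0], [1.5, 1.0], [2.5, 2.0]],
               categorical=[["tcp", "udp", "tcp"]])
print(X.shape, np.linalg.norm(X, axis=0).max())     # column norms are at most 1
```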
6.2 Implementation of Gibbs Sampling
The theoretical analysis of PPCA uses properties of the Bingham distribution BMF_k(·) given in (6).
To implement this algorithm for experiments we use a Gibbs sampler due to Hoff (2009). The Gibbs
sampling scheme induces a Markov Chain, the stationary distribution of which is the density in (6).
Data Set        #instances   #dimensions   k    q_F(V_k)   q_F(V_k)/‖A‖_F
kddcup          494,021      116           4    0.6587     0.96
census          199,523      513           8    0.7321     0.81
localization    164,860      44            10   0.5672     0.81
insurance       9,822        150           11   0.5118     0.81

Table 1: Parameters of each data set. The third column is the number of dimensions after prepro-
cessing, k is the dimensionality of the PCA, the fifth column contains q_F(V_k), where V_k is
the top-k PCA subspace, and the last column is the normalized utility q_F(V_k)/‖A‖_F.
Gibbs sampling and other MCMC procedures are widely used in statistics, scientific modeling, and
machine learning to estimate properties of complex distributions (Brooks, 1998).
Finding the speed of convergence of MCMC methods is still an open area of research. There has
been much theoretical work on estimating convergence times (Jones and Hobart, 2004; Douc et al.,
2004; Jones and Hobart, 2001; Roberts, 1999; Roberts and Sahu, 2001; Rosenthal, 1995; Kolassa,
1999, 2000), but unfortunately, most theoretical guarantees are available only in special cases and
are often too weak for practical use. In lieu of theoretical guarantees, users of MCMC methods
empirically estimate the burn-in time, or the number of iterations after which the chain is sufficiently
close to its stationary distribution. Statisticians employ a range of diagnostic methods and statistical
tests to empirically determine if the Markov chain is close to stationarity (Cowles and Carlin, 1996;
Brooks and Roberts, 1998; Brooks and Gelman, 1998; El Adlouni et al., 2006). These tests do not
provide a sufficient guarantee of stationarity, and there is no “best test” to use. In practice, the
convergence of derived statistics is used to estimate an appropriate burn-in time. In the case of the
Bingham distribution, Hoff (2009) performs qualitative measures of convergence. Developing a better
characterization of the convergence of this Gibbs sampler is an important question for future work.
Because the MCMC procedure of Hoff (2009) does not come with convergence-time guaran-
tees, for our experiments we had to choose an appropriate burn-in time. The “ideal” execution of
PPCA provides ε_p-differential privacy, but because our implementation only approximates sampling
from the Bingham distribution, we cannot guarantee that this implementation provides the privacy
guarantee. As noted by Mironov (2012), even current implementations of floating-point arithmetic
may suffer from privacy problems, so there is still significant work to do between theory and imple-
mentation. For this paper we tried to find a burn-in time that was sufficiently long so that we could
be confident that the empirical performance of PPCA was not affected by the initial conditions of
the sampler.
In order to choose an appropriate burn-in time, we examined the time series trace of the Markov
Chain. We ran l copies, or traces, of the chain, starting from l different initial locations drawn
uniformly from the set of all d × k matrices with orthonormal columns. Let X_i(t) be the output of
the i-th copy at iteration t, and let U be the top-k PCA subspace of the data. We used the following
statistic as a function of iteration T :
    F_k^i(T) = (1/k) ‖ (1/T) ∑_{t=1}^{T} X_i(t) ‖_F,

where ‖·‖_F is the Frobenius norm. The matrix Bingham distribution has mean 0, and hence with
increasing T, the statistic F_k^i(T) should converge to 0.
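A sketch of this convergence diagnostic is shown below; it assumes gibbs_samples is the list of d × k matrices X_i(t) produced by one trace of the sampler, and the fake trace at the end is only there to exercise the function.

```python
import numpy as np

def convergence_statistic(gibbs_samples, k):
    """F_k^i(T) = (1/k) * || (1/T) * sum_{t=1}^T X_i(t) ||_F for T = 1, ..., len(samples)."""
    running_sum = np.zeros_like(gibbs_samples[0])
    stats = []
    for T, sample in enumerate(gibbs_samples, start=1):
        running_sum += sample
        stats.append(np.linalg.norm(running_sum / T, ord='fro') / k)
    return np.array(stats)

# Fake "samples": for mean-zero noise the statistic decays toward 0 as T grows.
rng = np.random.default_rng(4)
fake_trace = [rng.normal(size=(44, 10)) for _ in range(1000)]
print(convergence_statistic(fake_trace, k=10)[[0, 99, 999]])
```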
Figure 2: Plots of log F_k^i(T) for five different traces (values of i) on two different data sets. Figure
2(a) shows log F_k^i(T) for k = 4 as a function of iteration T for 40,000 steps of the
Gibbs sampler on the kddcup data set. Figure 2(b) shows the same for the insurance
data set.
Figure 2 illustrates the behavior of the Gibbs sampler. The plots show the value of log F_k^i(T) as
a function of the Markov chain iteration for 5 different restarts of the MCMC procedure for two data
sets, kddcup and insurance. The initial starting points were chosen uniformly from the set of all
d × k matrices with orthonormal columns. The plots show that F_k^i(T) decreases rapidly after a few
thousand iterations, and is less than 0.01 after T = 20,000 in both cases. log F_k^i(T) also appears to
have a larger variance for kddcup than for insurance; this is explained by the fact that the kddcup
data set has a much larger number of samples, which makes its stationary distribution farther from
the initial distribution of the sampler. Based on these and other simulations, we observed that
the Gibbs sampler converges to F_k(t) < 0.01 at t = 20,000 when run on data with a few hundred
dimensions and with k between 5 and 10; we thus chose to run the Gibbs sampler for T = 20,000
time steps for all the data sets.
Our simulations indicate that the chains converge fairly rapidly, particularly when ‖A − A_k‖_F is
small so that A_k is a good approximation to A. Convergence is slower for larger n when the initial
state is chosen from the uniform distribution over all d × k matrices with orthonormal columns; this
is explained by the fact that for larger n, the stationary distribution is farther in variation distance
from the starting distribution, which results in a longer convergence time.
6.3 Scaling with Data Set Size
We ran three algorithms on these data sets: standard (non-private) PCA, MOD-SULQ, and PPCA.
As a sanity check, we also tried a uniformly generated random projection; since this projection is
data-independent we would expect it to have low utility. We measured the utility q_F(U), where U is
the k-dimensional subspace output by the algorithm; q_F(U) is maximized when U is the top-k PCA
subspace, and thus this reflects how close the output subspace is to the true PCA subspace in terms
of representing the data. Although our theoretical results hold for q_A(·), the “energy” q_F(·) is more
relevant in practice for larger k.
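As a rough sketch of how this comparison can be carried out, the snippet below scores a candidate subspace against the non-private optimum and a random baseline. It assumes the captured-energy form q_F(U) = tr(U^T A U) with A the second moment matrix of the data; this matches the description of q_F as the energy captured by U, but the precise definition is the one given earlier in the paper, and the helper names here are illustrative only.

import numpy as np

def q_F(U, A):
    # Assumed captured-energy utility: trace(U^T A U) for a d x k matrix U
    # with orthonormal columns and second moment matrix A.
    return np.trace(U.T @ A @ U)

def top_k_subspace(A, k):
    # Non-private PCA baseline: eigenvectors for the k largest eigenvalues of A.
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def random_subspace(d, k, rng):
    # Data-independent baseline: a uniformly random k-dimensional subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

# Example use on a data matrix X (one row per individual):
#   A = X.T @ X / len(X)
#   rng = np.random.default_rng(0)
#   print(q_F(top_k_subspace(A, k), A), q_F(random_subspace(X.shape[1], k, rng), A))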
To investigate how well these different algorithms performed on real data, for each data set
we subsampled data sets of different sizes n uniformly and ran the algorithms on the subsets. We
chose ε_p = 0.1 for this experiment, and for MOD-SULQ we used δ = 0.01. We averaged over 5
such subsets and over several instances of the randomized algorithms (10 restarts for PPCA and 100
for MOD-SULQ and random projections). For each subset and instance we calculated the resulting
utility q_F(·) of the output subspace.
Figures 3(a), 3(b), 4(a), and 4(b) show q_F(U) as a function of the subsampled data set sizes.
The bars indicate the standard deviation over the restarts (from subsampling the data and random
sampling for privacy). The non-private algorithm achieved q_F(V_k) for nearly all subset sizes (see
Table 1 for the values). These plots illustrate how additional data can improve the utility of the
output for a fixed privacy level ε_p. As n increases, the dashed blue line indicating the utility of
PPCA begins to approach q_F(V_k), the utility of the optimal subspace.
These experiments also show that the performance of PPCA is significantly better than that of
MOD-SULQ, and MOD-SULQ produces subspaces whose utility is on par with randomly choosing a
subspace. The only exception to this latter point is localization; we believe this is because d is
much lower for this data set (d = 44), which shows that for low dimension and large n, MOD-SULQ
may produce subspaces with reasonable utility. Furthermore, MOD-SULQ is simpler and hence runs
faster than PPCA, which requires running the Gibbs sampler past the burn-in time. Our theoretical
results suggest that the performance of differentially private PCA cannot be significantly improved
over the performance of PPCA, but since those results hold for k = 1 they do not immediately apply
here.
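For concreteness, the following is a hedged sketch of the input-perturbation approach that MOD-SULQ exemplifies: perturb the second moment matrix with symmetric Gaussian noise and return its top-k eigenvectors. The noise scale beta is left as a placeholder; the calibrated value used by MOD-SULQ, which depends on ε_p, δ, d, and n, is specified earlier in the paper and is not reproduced here.

import numpy as np

def perturbed_pca_subspace(X, k, beta, rng):
    # Input-perturbation sketch in the spirit of MOD-SULQ; beta is a
    # placeholder noise scale, not the calibrated value from the paper.
    n, d = X.shape
    A = X.T @ X / n                       # second moment matrix of the data
    G = rng.standard_normal((d, d)) * beta
    G = (G + G.T) / 2                     # symmetrize so the noisy matrix stays symmetric
    eigvals, eigvecs = np.linalg.eigh(A + G)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]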
6.4 Effect of Privacy on Classification
A common use of a dimension reduction algorithm is as a precursor to classification or clustering;
to evaluate the effectiveness of the different algorithms, we projected the data onto the subspace
output by the algorithms, and measured the classification accuracy using the projected data. The
classification results are summarized in Table 2. We chose the normal vs. all classification task in
kddcup99, and the falling vs. all classification task in localization.¹ We used a linear SVM for
all classification tasks, which is implemented by libSVM (Chang and Lin, 2011).
For the classification experiments, we used half of the data as a holdout set for computing a
projection subspace. We projected the classification data onto the subspace computed based on the
holdout set; 10% of this data was used for training and parameter-tuning, and the rest for testing. We
repeated the classification process 5 times for 5 different (random) projections for each algorithm,
and then ran the entire procedure over 5 random permutations of the data. Each value in the figure
is thus an average over 5 × 5 = 25 rounds of classification.
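A minimal sketch of this protocol, using scikit-learn's LinearSVC in place of libSVM and treating the subspace U, the data X, and the labels y as inputs computed elsewhere, might look as follows.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def classify_in_subspace(X, y, U, seed=0):
    # Project onto the k-dimensional subspace U (a d x k matrix computed from
    # a separate holdout set), train a linear SVM on 10% of the projected
    # data, and report test accuracy on the rest.
    Z = X @ U
    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, train_size=0.1, random_state=seed)
    clf = LinearSVC().fit(Z_train, y_train)
    return clf.score(Z_test, y_test)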
The classification results show that our algorithm performs almost as well as non-private PCA
for classification in the top-k PCA subspace, while the performance of MOD-SULQ and random
projections is a little worse. The classification accuracy while using MOD-SULQ and random pro-
jections also appears to have higher variance compared to our algorithm and non-private PCA. This
is because the projections tend to be farther from the top-k PCA subspace, making the classification
error more variable.

1. For the other two data sets, census and insurance, the classification accuracy of linear SVM after (non-private)
PCA is as low as always predicting the majority label.
[Figure 3 about here: utility versus sample size; panel (a) census (k = 8), panel (b) kddcup (k = 4).]

Figure 3: Plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random
subsets of the data and randomness in the algorithms. The bars are at one standard de-
viation about the mean. The top red line is the PCA algorithm without privacy con-
straints. The dashed line in blue is the utility for PPCA. The green and purple dashed
lines are nearly indistinguishable and represent the utility from random projections and
MOD-SULQ, respectively. In these plots ε_p = 0.1 and δ = 0.01.
[Figure 4 about here: utility versus sample size; panel (a) localization (k = 10), panel (b) insurance (k = 11).]

Figure 4: Plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random
subsets of the data and randomness in the algorithms. The bars are at one standard devi-
ation about the mean. The top red line is the PCA algorithm without privacy constraints.
The dashed line in blue is the utility for PPCA. The green and purple dashed lines are
nearly indistinguishable for insurance but diverge for localization; they represent
the utility from random projections and MOD-SULQ, respectively. In these plots ε_p = 0.1
and δ = 0.01.
6.5 Effect of the Privacy Requirement
How to choose ε_p is an important open question for many applications. We wanted to understand
the impact of varying ε_p on the utility of the subspace. We did this via a synthetic data set:
we generated n = 5,000 points drawn from a Gaussian distribution in d = 10 dimensions with mean 0 and
covariance matrix with eigenvalues

{0.5, 0.30, 0.04, 0.03, 0.02, 0.01, 0.004, 0.003, 0.001, 0.001}. (27)
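A sketch of how such a synthetic data set can be generated is shown below. The orientation of the eigenvectors is not specified in our description, so the sketch simply picks a random orthonormal basis; this choice is an assumption for illustration only.

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000
eigenvalues = np.array([0.5, 0.30, 0.04, 0.03, 0.02, 0.01,
                        0.004, 0.003, 0.001, 0.001])

# Build a covariance matrix with the prescribed spectrum in a random basis.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(eigenvalues) @ Q.T

# Draw n points from N(0, Sigma); the top two eigenvectors carry most of the energy.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)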
                      kddcup99          localization
Non-private PCA       98.97 ± 0.05      100 ± 0
PPCA                  98.95 ± 0.05      100 ± 0
MOD-SULQ              98.18 ± 0.65      97.06 ± 2.17
Random Projections    98.23 ± 0.49      96.28 ± 2.34

Table 2: Classification accuracy (in percent) in the k-dimensional subspaces reported by the different
algorithms for kddcup99 (k = 4) and localization (k = 10).
In this case the space spanned by the top two eigenvectors has most of the energy, so we chose k = 2
and plotted the utility q_F(·) for non-private PCA, MOD-SULQ with δ = 0.05, and PPCA with a burn-
in time of T = 1000. We drew 100 samples from each privacy-preserving algorithm and the plot of
the average utility versus ε_p is shown in Figure 5. The privacy requirement relaxes as ε_p increases,
and both MOD-SULQ and PPCA approach the utility of PCA without privacy constraints. However,
for moderate ε_p PPCA still captures most of the utility, whereas the gap between MOD-SULQ and
PPCA becomes quite large.
[Figure 5 about here: utility q_F(U) versus the privacy parameter ε_p for Non-Private PCA, SULQ, and PPCA with a burn-in of 1,000 steps.]

Figure 5: Plot of q_F(U) versus ε_p for a synthetic data set with n = 5,000, d = 10, and k = 2. The
data has a Gaussian distribution whose covariance matrix has eigenvalues given by (27).
7. Conclusion
In this paper we investigated the theoretical and empirical performance of differentially private
approximations to PCA. Empirically, we showed that MOD-SULQ and PPCA differ markedly in
how well they approximate the top-k subspace of the data. The reason for this, theoretically, is that
the sample complexity of MOD-SULQ scales as d^{3/2} log d whereas PPCA scales as d. Because
PPCA uses the exponential mechanism with q_F(·) as the utility function, it is not surprising that it
performs well. However, MOD-SULQ often had performance comparable to random projections,
indicating that the real data sets had too few points for it to be effective. We furthermore showed
that PPCA is nearly optimal, in that any differentially private approximation to PCA must use Ω(d)
samples.
Our investigation brought up many interesting issues to consider for future work. The description
of differentially private algorithms assumes an ideal model of computation: real systems require
additional security assumptions that have to be verified. The difference between truly random noise
and pseudorandomness, and the effects of finite precision, can lead to a gap between the theoretical
ideal and practice. Numerical optimization methods used in some privacy methods (Chaudhuri et al.,
2011) can only produce approximate solutions; they may also have complex termination conditions
unaccounted for in the theoretical analysis. MCMC sampling is similar: if we can guarantee that
the sampler's distribution has total variation distance δ from the Bingham distribution, then the sampler
can guarantee (ε_p, δ)-differential privacy. However, we do not yet have such analytical bounds on
the convergence rate; we must determine the Gibbs sampler's convergence empirically. Accounting
for these effects is an interesting avenue for future work that can bring theory and practice together.
For PCA more specifically, it would be interesting to extend the sample complexity results to
general k > 1. For k = 1 the utility functions q_F(·) and q_A(·) are related, but for larger k it is not
immediately clear what metric best captures the idea of “approximating” the top-k PCA subspace.
For minimax lower bounds, it may be possible to construct a packing with respect to a general utility
metric. For example, Kapralov and Talwar (2013) use properties of packings on the Grassmann
manifold. Upper bounds on the sample complexity for PPCA may be possible by performing a more
careful analysis of the Bingham distribution or by finding better approximations for its normalizing
constant. Developing a framework for analyzing general approximations to PCA may be of interest
more broadly in machine learning.
Acknowledgments
The authors would like to thank the reviewers for their detailed comments, which greatly improved
the quality and readability of the manuscript, and the action editor, Gabor Lugosi, for his patience
during the revision process. KC and KS would like to thank NIH for research support under U54-
HL108460. The experimental results were made possible by support from the UCSD FWGrid
Project, NSF Research Infrastructure Grant Number EIA-0303622. ADS was supported in part by
the California Institute for Telecommunications and Information Technology (CALIT2) at UC San
Diego.
Appendix A. A Packing Lemma
The proof of this lemma is relatively straightforward. The following is a slight refinement of a
lemma due to Csiszár and Narayan (1988, 1991).
Lemma 16 Let Z_1, Z_2, ..., Z_N be arbitrary random variables and let f_i(Z_1, ..., Z_i) be arbitrary
with 0 ≤ f_i ≤ 1, i = 1, 2, ..., N. Then the condition

\[ E\left[ f_i(Z_1, \ldots, Z_i) \mid Z_1, \ldots, Z_{i-1} \right] \le a_i \quad \text{a.s.}, \quad i = 1, 2, \ldots, N \]
implies that

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + \sum_{i=1}^{N} a_i \right). \]
Proof First apply Markov's inequality:

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) = P\left( 2^{\sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i)} > 2^{Nt} \right) \]
\[ \le 2^{-Nt} \, E\left[ 2^{\sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i)} \right] \]
\[ = 2^{-Nt} \, E\left[ 2^{\sum_{i=1}^{N-1} f_i(Z_1, \ldots, Z_i)} \, E\left[ 2^{f_N(Z_1, \ldots, Z_N)} \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \right]. \]
Now note that for b ∈ [0, 1] we have 2^b ≤ 1 + b ≤ e^b, so

\[ E\left[ 2^{f_N(Z_1, \ldots, Z_N)} \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \le E\left[ 1 + f_N(Z_1, \ldots, Z_N) \,\middle|\, Z_1, \ldots, Z_{N-1} \right] \le (1 + a_N) \le \exp(a_N). \]
Therefore

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + a_N \right) E\left[ 2^{\sum_{i=1}^{N-1} f_i(Z_1, \ldots, Z_i)} \right]. \]
Continuing in the same way yields

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + \sum_{i=1}^{N} a_i \right). \]
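As an aside, the bound in Lemma 16 is easy to check numerically in a simple special case; the sketch below takes the Z_i to be independent Bernoulli(p) variables and f_i(Z_1, ..., Z_i) = Z_i, so that a_i = p, and compares the empirical tail probability with exp(-Nt log 2 + Np). This check is purely illustrative and not part of the original argument.

import numpy as np

rng = np.random.default_rng(2)
N, p, t, runs = 50, 0.1, 0.3, 100000

# Z_i ~ Bernoulli(p) independent, f_i = Z_i, so E[f_i | past] = p = a_i.
Z = rng.random((runs, N)) < p
empirical = np.mean(Z.mean(axis=1) > t)
bound = np.exp(-N * t * np.log(2) + N * p)
print(f"empirical = {empirical:.5f}, Lemma 16 bound = {bound:.5f}")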
The second technical lemma (Csiszár and Narayan, 1991, Lemma 2) is a basic result about the
distribution of the inner product between a randomly chosen unit vector and any other fixed vector. It
is a consequence of a result of Shannon (Shannon, 1959) on the distribution of the angle between a
uniformly distributed unit vector and a fixed unit vector.

Lemma 17 (Lemma 2 of Csiszár and Narayan (1991)) Let U be a random vector distributed uni-
formly on the unit sphere S^{d-1} in R^d. Then for every unit vector u on this sphere and any φ ∈
[(2πd)^{-1/2}, 1), the following inequality holds:

\[ P\left( \langle U, u \rangle \ge \phi \right) \le (1 - \phi^2)^{(d-1)/2}. \]
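As a quick numerical sanity check (not part of the original analysis), one can compare the empirical tail probability of the inner product against this bound for a moderate dimension; the parameter values below are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
d, phi, trials = 20, 0.5, 200000

# Uniform unit vectors via normalized Gaussians; u may be any fixed unit vector.
U = rng.standard_normal((trials, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
u = np.zeros(d)
u[0] = 1.0

empirical = np.mean(U @ u >= phi)
bound = (1 - phi ** 2) ** ((d - 1) / 2)
print(f"empirical P(<U,u> >= {phi}) = {empirical:.4f}, bound = {bound:.4f}")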
Lemma 18 (Packing set on the unit sphere) Let the dimension d and parameter φ ∈ [(2πd)^{-1/2}, 1)
be given. For N and t satisfying

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} < 0 \]  (28)
there exists a set of K = (1 - t)N unit vectors C such that for all distinct pairs µ, ν ∈ C,

\[ |\langle \mu, \nu \rangle| < \phi. \]  (29)
Proof The goal is to generate a set of N unit vectors on the surface of the sphere S^{d-1} such that
they have large pairwise distances or, equivalently, small pairwise inner products. To that end, define
i.i.d. random variables Z_1, Z_2, ..., Z_N uniformly distributed on S^{d-1} and functions

\[ f_i(Z_1, \ldots, Z_i) = \mathbf{1}\left( |\langle Z_i, Z_j \rangle| > \phi \ \text{for some } j < i \right). \]

That is, f_i = 1 if Z_i has large inner product with any Z_j for j < i. The conditional expectation, by a
union bound and Lemma 17, is

\[ E\left[ f_i(Z_1, \ldots, Z_i) \mid Z_1, \ldots, Z_{i-1} \right] \le 2(i-1)(1 - \phi^2)^{(d-1)/2}. \]

Let a_i = 2(i-1)(1 - φ^2)^{(d-1)/2}. Then

\[ \sum_{i=1}^{N} a_i = N(N-1)(1 - \phi^2)^{(d-1)/2}. \]

Then Lemma 16 shows

\[ P\left( \frac{1}{N} \sum_{i=1}^{N} f_i(Z_1, \ldots, Z_i) > t \right) \le \exp\left( -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} \right). \]
This inequality implies that as long as

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} < 0, \]

then with positive probability {Z_i} contains a subset of size (1 - t)N such that
|⟨Z_i, Z_j⟩| < φ for all pairs (i, j) within the subset. Therefore such a set exists.
A simple setting of the parameters gives the packing in Lemma 11.

Proof Applying Lemma 18 yields a set of K vectors C satisfying (28) and (29). To get a simple
bound that is easy to work with, we can set

\[ -Nt \log 2 + N(N-1)(1 - \phi^2)^{(d-1)/2} = 0, \]

and find an N close to this. Setting ψ = (1 - φ^2)^{(d-1)/2}, and solving for N, we see

\[ N = 1 + \frac{t \log 2}{\psi} > \frac{t}{2\psi}. \]

Now setting K = t(1-t)/(2ψ) and t = 1/2 gives (12). So there exists a set of K vectors on S^{d-1} whose
pairwise inner products are smaller than φ.
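The probabilistic construction behind Lemma 18 is easy to simulate. The sketch below draws N random unit vectors, greedily discards any vector whose inner product with an already kept vector is too large (a simple variant of the pruning implicit in the proof), and compares the surviving set against the K = t(1 - t)/(2ψ) guarantee with t = 1/2; the function name and parameter values are illustrative only.

import numpy as np

def random_packing(d, phi, N, rng):
    # Draw N uniform unit vectors in R^d and keep a subset whose pairwise
    # inner products are all smaller than phi in magnitude (greedy pruning).
    Z = rng.standard_normal((N, d))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    kept = []
    for z in Z:
        if all(abs(z @ c) < phi for c in kept):
            kept.append(z)
    return np.array(kept)

d, phi, t = 100, 0.3, 0.5
psi = (1 - phi ** 2) ** ((d - 1) / 2)
N = int(1 + t * np.log(2) / psi)
guaranteed_K = t * (1 - t) / (2 * psi)
C = random_packing(d, phi, N, np.random.default_rng(0))
print(f"kept {len(C)} of N = {N}; lemma guarantees roughly {guaranteed_K:.0f}")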
The maximum number of points that can be selected on a sphere of dimension d such that their
pairwise inner products are bounded by φ is an open question. These sets are sometimes referred to
as spherical codes (Conway and Sloane, 1998) because they correspond to a set of signaling points
of dimension d that can be perfectly decoded over a channel with bounded noise. The bounds here
are from a probabilistic construction and can be tightened for smaller d. However, in terms of
scaling with d this construction is essentially optimal (Shannon, 1959).
References
Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. SIGMOD Record,
29(2):439–450, 2000. ISSN 0163-5808. doi: 10.1145/335191.335438. URL
http://dx.doi.
org/10.1145/335191.335438
.
Kevin Bache and Moshe Lichman. UCI machine learning repository, 2013. URL
http://archive.
ics.uci.edu/ml
.
Keith M. Ball. An elementary introduction to modern convex geometry. In S. Levy, editor, Flavors
of Geometry, volume 31 of Mathematical Sciences Research Institute Publications, pages 1–58.
Cambridge University Press, 1997.
Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Tal-
war. Privacy, accuracy, and consistency too: a holistic solution to contingency table release.
In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems (PODS ’07), pages 273–282, New York, NY, USA, 2007. ACM. doi:
10.1145/1265530.1265569. URL
http://dx.doi.org/10.1145/1265530.1265569
.
Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss Trans-
form itself preserves differential privacy. In IEEE 53rd Annual Symposium on Foundations of
Computer Science (FOCS), pages 410–419, October 2012. doi: 10.1109/FOCS.2012.67. URL
http://dx.doi.org/10.1109/FOCS.2012.67
.
Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ
framework. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Sympo-
sium on Principles of Database Systems (PODS ’05), pages 128–138, New York, NY, USA,
2005. ACM. doi: 10.1145/1065167.1065184. URL
http://dx.doi.org/10.1145/1065167.
1065184
.
Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to non-interactive
database privacy. In R. E. Ladner and C. Dwork, editors, Proceedings of the 40th Annual
ACM Symposium on Theory of Computing (STOC ’08), pages 609–618, New York, NY, USA,
2008. ACM. doi: 10.1145/1374376.1374464. URL
http://dx.doi.org/10.1145/1374376.
1374464
.
Stephen P. Brooks. Markov chain Monte Carlo method and its application. Journal of the Royal
Statistical Society. Series D (The Statistician), 47(1):69–100, April 1998. ISSN 00390526. doi:
10.1111/1467-9884.00117. URL
http://dx.doi.org/10.1111/1467-9884.00117
.
Stephen P. Brooks and Andrew Gelman. General methods for monitoring convergence of iterative
simulations. Journal of Computational and Graphical Statistics, 7(4):434–455, December 1998.
doi: 10.2307/1390675. URL
http://dx.doi.org/10.2307/1390675
.
Stephen P. Brooks and Gareth O. Roberts. Convergence assessment techniques for Markov chain
Monte Carlo. Statistics and Computing, 8(4):319–335, December 1998. doi: 10.1023/A:
1008820505350. URL
http://dx.doi.org/10.1023/A:1008820505350
.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning.
In Sham Kakade and Ulrike von Luxburg, editors, Proceedings of the 24th Annual Conference
on Learning Theory (COLT ’11), volume 19 of JMLR Workshop and Conference Proceedings,
pages 155–186, Budapest, Hungary, June 2011. URL
http://www.jmlr.org/proceedings/
papers/v19/chaudhuri11a/chaudhuri11a.pdf
.
Kamalika Chaudhuri and Daniel Hsu. Convergence rates for differentially private statistical estima-
tion. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Confer-
ence on Machine Learning (ICML-12), ICML ’12, pages 1327–1334, New York, NY, USA, July
2012. Omnipress. URL
http://icml.cc/2012/papers/663.pdf
.
Kamalika Chaudhuri and Nina Mishra. When random sampling preserves privacy. In Cynthia
Dwork, editor, Advances in Cryptology - CRYPTO 2006, volume 4117 of Lecture Notes in
Computer Science, pages 198–213, Berlin, Heidelberg, August 2006. Springer-Verlag. doi:
10.1007/11818175_12. URL
http://dx.doi.org/10.1007/11818175_12
.
Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical
risk minimization. Journal of Machine Learning Research, 12:1069–1109, March 2011. URL
http://jmlr.csail.mit.edu/papers/v12/chaudhuri11a.html
.
Kamalika Chaudhuri, Anand D. Sarwate, and Kaushik Sinha. Near-optimal differentially private
principal components. In P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q.
Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 998–1006,
2012. URL
http://books.nips.cc/papers/files/nips25/NIPS2012_0482.pdf
.
Yasuko Chikuse. Statistics on Special Manifolds. Number 174 in Lecture Notes in Statistics.
Springer, New York, 2003.
John H. Conway and Neil J. A. Sloane. Sphere Packing, Lattices and Groups. Springer-Verlag, New
York, 1998.
Mary Kathryn Cowles and Bradley P. Carlin. Markov Chain Monte Carlo convergence diagnostics:
A comparative review. Journal of the American Statistical Association, 91(434):883, June 1996.
ISSN 01621459. doi: 10.2307/2291683. URL
http://dx.doi.org/10.2307/2291683
.
Imre Csiszár and Prakash Narayan. The capacity of the arbitrarily varying channel revisited:
Positivity, constraints. IEEE Transactions on Information Theor y, 34(2):181–193, 1988. doi:
10.1109/18.2627. URL
http://dx.doi.org/10.1109/18.2627
.
Imre Csiszár and Prakash Narayan. Capacity of the Gaussian arbitrarily varying channel. IEEE
Transactions on Information Theory, 37(1):18–26, 1991. doi: 10.1109/18.61125. URL
http:
//dx.doi.org/10.1109/18.61125
.
Randal Douc, Eric Moulines, and Jeffrey S. Rosenthal. Quantitative bounds on convergence of time-
inhomogeneous Markov chains. The Annals of Applied Probability, 14(4):1643–1665, November
2004. ISSN 1050-5164. doi: 10.1214/105051604000000620. URL
http://dx.doi.org/10.
1214/105051604000000620
.
Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the
41st Annual ACM Symposium on Theory of Computing (STOC ’09), pages 371–380, New York,
NY, USA, 2009. ACM. doi: 10.1145/1536414.1536466. URL
http://dx.doi.org/10.1145/
1536414.1536466
.
Cynthia Dwork and Adam Smith. Differential privacy for statistics: What we know and what
we want to learn. Journal of Privacy and Confidentiality, 1(2):135–154, 2009. URL
http:
//repository.cmu.edu/jpc/vol1/iss2/2
.
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our
data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor, Advances
in Cryptology - EUROCRYPT 2006, volume 4004 of Lecture Notes in Computer Science, pages
486–503, Berlin, Heidelberg, 2006a. Springer-Verlag. doi: 10.1007/11761679_29. URL http:
http:
//dx.doi.org/10.1007/11761679_29
.
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in
private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, volume 3876
of Lecture Notes in Computer Science, pages 265–284, Berlin, Heidelberg, March 4–7 2006b.
Springer. doi: 10.1007/11681878_14. URL
http://dx.doi.org/10.1007/11681878_14
.
Salaheddine El Adlouni, Anne-Catherine Favre, and Bernard Bobée. Comparison of methodologies
to assess the convergence of Markov chain Monte Carlo methods. Computational Statistics &
Data Analysis, 50(10):2685–2701, June 2006. ISSN 01679473. doi: 10.1016/j.csda.2005.04.018.
URL
http://dx.doi.org/10.1016/j.csda.2005.04.018
.
Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in
privacy preserving data mining. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-
SIGART Symposium on Principles of Database Systems (PODS), pages 211–222, 2003. doi:
10.1145/773153.773174. URL
http://dx.doi.org/10.1145/773153.773174
.
Arik Friedman and Assaf Schuster. Data mining with differential privacy. In Proceedings of the
16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD
’10), pages 493–502, New York, NY, USA, 2010. ACM. doi: 10.1145/1835804.1835868. URL
http://dx.doi.org/10.1145/1835804.1835868
.
Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. Privacy-preserving data publishing:
A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, June 2010. doi:
10.1145/1749603.1749605. URL
http://dx.doi.org/10.1145/1749603.1749605
.
Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and
auxiliary information in data privacy. In P roceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD ’08), pages 265–273, New York,
NY, USA, 2008. ACM. doi: 10.1145/1401890.1401926. URL
http://dx.doi.org/10.1145/
1401890.1401926
.
Shuguo Han, Wee Keong Ng, and P.S. Yu. Privacy-preserving singular value decomposition. In
Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE), pages
1267–1270, 2009. doi: 10.1109/ICDE.2009.217. URL
http://dx.doi.org/10.1109/ICDE.
2009.217
.
Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In Proceedings
of the 44th Annual ACM Symposium on Theory of Computing (STOC ’12), pages 1255–1268,
New York, NY, USA, 2012. ACM. doi: 10.1145/2213977.2214088. URL
http://dx.doi.org/
10.1145/2213977.2214088
.
Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation.
In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC ’13), pages
331–340, New York, NY, USA, June 2013. ACM. doi: 10.1145/2488608.2488650. URL
http:
//dx.doi.org/10.1145/2488608.2488650
.
Michael Hay, Chao Li, Gerome Miklau, and David Jensen. Accurate estimation of the degree
distribution of private networks. In 2009 Ninth IEEE International Conference on Data Mining
(ICDM ’09), pages 169–178, 2009. doi: 10.1109/ICDM.2009.11. URL
http://dx.doi.org/
10.1109/ICDM.2009.11
.
Peter D. Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications
to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2):
438–456, 2009. ISSN 1061-8600. doi: 10.1198/jcgs.2009.07177. URL
http://dx.doi.org/
10.1198/jcgs.2009.07177
.
Galin L. Jones and James P. Hobert. Honest exploration of intractable probability distributions
via Markov Chain Monte Carlo. Statistical Science, 16(4):312–334, 2001. doi: 10.1214/ss/
1015346317. URL
http://dx.doi.org/10.1214/ss/1015346317
.
Galin L. Jones and James P. Hobert. Sufficient burn-in for Gibbs samplers for a hierarchical
random effects model. The Annals of Statistics, 32(2):784–817, April 2004. doi: 10.1214/
009053604000000184. URL
http://dx.doi.org/10.1214/009053604000000184
.
Boštjan Kaluža, Violeta Mirchevska, Erik Dovgan, Mitja Luštrek, and Matjaž Gams. An agent-
based approach to care in independent living. In B. de Ruyter et al., editor, International
Joint Conference on Ambient Intelligence (AmI-10), volume 6439/2010 of Lecture Notes in
Computer Science, pages 177–186. Springer-Verlag, Berlin Heidelberg, 2010. doi: 10.1007/
978-3-642-16917-5_18. URL
http://dx.doi.org/10.1007/978-3-642-16917-5_18
.
Mikhail Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceed-
ings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’13),
pages 1395–1414, New Orleans, LA, USA, January 2013.
Shiva Prasad Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance
to arbitrary side information. Technical Report arXiv:0803.3946v1 [cs.CR], ArXiV, March 2008.
URL
http://arxiv.org/abs/0803.3946
.
Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra. Privacy via the
Johnson-Lindenstrauss transform. Journal of Privacy and Confidentiality, 5(1):39–71, 2013.
URL
http://repository.cmu.edu/jpc/vol5/iss1/2
.
John E. Kolassa. Convergence and accuracy of Gibbs sampling for conditional distributions in
generalized linear models. The Annals of Statistics, 27(1):129–142, 1999. doi: 10.1214/aos/
1018031104. URL
http://dx.doi.org/10.1214/aos/1018031104
.
John E. Kolassa. Explicit bounds for geometric convergence of Markov chains. Journal of Applied
Probability, 37(3):642–651, 2000. doi: 10.1239/jap/1014842825. URL
http://dx.doi.org/
10.1239/jap/1014842825
.
Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. Closeness: A new privacy measure for
data publishing. IEEE Transactions on Knowledge and Data Engineering, 22(7):943–956, 2010.
doi: 10.1109/TKDE.2009.139. URL
http://dx.doi.org/10.1109/TKDE.2009.139
.
Kun Liu, Hillol Kargupta, and Jessica Ryan. Random projection-based multiplicative data perturba-
tion for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data
Engineering, 18(1):92–106, 2006. doi: 10.1109/TKDE.2006.14. URL
http://dx.doi.org/
10.1109/TKDE.2006.14
.
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubra-
maniam. l-diversity: Privacy beyond k-anonymity. In Proceedings of the 22nd IEEE Interna-
tional Conference on Data Engineering (ICDE), page 24, 2006. doi: 10.1109/ICDE.2006.1.
URL
http://dx.doi.org/10.1109/ICDE.2006.1
.
Ashwin Machanavajjhala, Daniel Kifer, John M. Abowd, Johannes Gehrke, and Lars Vilhuber.
Privacy: Theory meets practice on the map. In IEEE 24th International Conference on Data
Engineering (ICDE), pages 277–286, April 2008. doi: 10.1109/ICDE.2008.4497436. URL
http://dx.doi.org/10.1109/ICDE.2008.4497436
.
Frank McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data
analysis. In SIGMOD Conference, pages 19–30, 2009. doi: 10.1145/1559845.1559850. URL
http://dx.doi.org/10.1145/1559845.1559850
.
Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building pri-
vacy into the netflix prize contenders. In Proceedings of the 15th ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data (KDD), pages 627–636, 2009. doi:
10.1145/1557019.1557090. URL
http://dx.doi.org/10.1145/1557019.1557090
.
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE
Symposium on Foundations of Computer Science (FOCS ’07), pages 94–103, October 2007. doi:
10.1109/FOCS.2007.41. URL
http://dx.doi.org/10.1109/FOCS.2007.41
.
Ilya Mironov. On significance of the least significant bits for differential privacy. In Proceedings
of the ACM Conference on Computer and Communications Security (CCS ’12), pages 650–661,
New York, NY, USA, 2012. ACM. doi: 10.1145/2382196.2382264. URL
http://dx.doi.org/
10.1145/2382196.2382264
.
Noman Mohammed, Rui Chen, Benjamin C. M. Fung, and Philip S. Yu. Differentially private data
release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD ’11), pages 493–501, New York, NY, USA,
2011. ACM. doi: 10.1145/2020408.2020487. URL
http://dx.doi.org/10.1145/2020408.
2020487
.
Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in pri-
vate data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of
Computing (STOC ’07), pages 75–84, New York, NY, USA, 2007. ACM. doi: 10.1145/1250790.
1250803. URL
http://dx.doi.org/10.1145/1250790.1250803
.
Gareth O. Roberts. Bounds on regeneration times and convergence rates for Markov chains.
Stochastic Processes and their Applications, 80(2):211–229, April 1999. ISSN 03044149.
doi: 10.1016/S0304-4149(98)00085-4. URL
http://dx.doi.org/10.1016/S0304-4149(98)
00085-4
.
Gareth O. Roberts and Sujit K. Sahu. Approximate predetermined convergence properties of the
Gibbs sampler. Journal of Computational and Graphical Statistics, 10(2):216–229, June 2001.
ISSN 1061-8600. doi: 10.1198/10618600152627915. URL
http://dx.doi.org/10.1198/
10618600152627915
.
Jeffrey S. Rosenthal. Minorization conditions and convergence rates for Markov Chain Monte Carlo.
Journal of the American Statistical Association, 90(430):558–566, June 1995. ISSN 01621459.
doi: 10.2307/2291067. URL
http://dx.doi.org/10.2307/2291067
.
Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft. Learning in a large function
space: Privacy-preserving mechanisms for SVM learning. Journal of Privacy and Confidentiality,
4(1):65–100, 2012. URL
http://repository.cmu.edu/jpc/vol4/iss1/4/
.
Claude E. Shannon. Probability of error for optimal codes in a Gaussian channel. Bell System
Technical Journal, 38:611–656, 1959.
Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Pro-
ceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC ’11), pages
813–822, New York, NY, USA, 2011. ACM. doi: 10.1145/1993636.1993743. URL
http:
//dx.doi.org/10.1145/1993636.1993743
.
Gilbert W. Stewart. On the early history of the singular value decomposition. SIAM Review, 35(4):
551–566, December 1993.
Latanya Sweeney. k-anonymity: a model for protecting privacy. International Journal on
Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, October 2002. doi:
10.1142/S0218488502001648. URL
http://dx.doi.org/10.1142/S0218488502001648
.
Peter van der Putten and Maarten van Someren. CoIL Challenge 2000: The Insurance Company
Case, 2000. URL
http://www.liacs.nl/~putten/library/cc2000/. Leiden Institute of
Advanced Computer Science Technical Report 2000-09.
Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of
the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651.
URL
http://dx.doi.org/10.1198/jasa.2009.tm08651
.
Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In J. Lafferty,
C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural
Information Processing Systems 23, pages 2451–245, 2010. URL
http://books.nips.cc/
papers/files/nips23/NIPS2010_1276.pdf
.
Bin Yu. Assouad, Fano, and Le Cam. In David Pollard, Erik Torgersen, and Grace L. Yang, editors,
Festschrift for Lucien Le Cam, Research Papers in Probability and Statistics, chapter 29, pages
423–425. Springer-Verlag, 1997.
Justin Z. Zhan and Stan Matwin. Privacy-preserving support vector machine classification. Inter-
national Journal of Intelligent Information and Database Systems, 1(3/4):356–385, 2007. doi:
10.1504/IJIIDS.2007.016686.
Shuheng Zhou, Katrina Ligett, and Larry Wasserman. Differential privacy with compression. In
Proceedings of the 2009 International Symposium on Information Theory (ISIT), pages 2718–
2722, Seoul, South Korea, June–July 2009. doi: 10.1109/ISIT.2009.5205863.