On Improving the Utility for Data Analyses Under Differential Privacy.

Record type: Bibliographic - electronic resource : Monograph/item
Title: On Improving the Utility for Data Analyses Under Differential Privacy.
Author: Wang, Yue.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2019
Description: 211 p.
Note: Source: Dissertations Abstracts International, Volume: 80-12, Section: B.
Contained by: Dissertations Abstracts International, 80-12B.
Subject: Information Technology.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13917968
ISBN: 9781392319017
Wang, Yue. On Improving the Utility for Data Analyses Under Differential Privacy. Ann Arbor : ProQuest Dissertations & Theses, 2019. 211 p.
Source: Dissertations Abstracts International, Volume: 80-12, Section: B.
Thesis (Ph.D.)--The Pennsylvania State University, 2019.
Differential privacy has been widely used to protect sensitive data. For statistical hypothesis testing under differential privacy, earlier approaches either added too much noise, causing a significant loss of power, or added a small amount of noise but failed to adjust the test to account for it, yielding unreliable results. We conducted tests of independence, tests of sample proportions, and goodness-of-fit tests on tabular data that avoid these drawbacks while providing valid results. Using an asymptotic regime better suited to privacy-preserving hypothesis testing, we showed a modified equivalence between the chi-squared and likelihood-ratio tests and applied these tests in the three settings.

More generally, we studied the sampling distributions of statistics computed from data. In the non-private setting, when such statistics are used for hypothesis tests or confidence intervals, their true sampling distributions are often replaced by approximating distributions that are easier to work with (e.g., the Gaussian approximation justified by the Central Limit Theorem). When data are perturbed for differential privacy, the approximating distributions must be modified accordingly to account for the privacy noise. Prior works proposed various competing methods for creating such approximating distributions; these worked well empirically but lacked formal justification. We solved this problem by introducing a general asymptotic recipe for constructing approximating distributions for differentially private statistics, providing finite-sample guarantees on the quality of the approximations as well as degradation results under postprocessing of the statistics.

Beyond statistical analyses carried out on sensitive data, we also aimed to quantify the uncertainty of models trained on such data.
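The adjustment described above can be illustrated with a minimal sketch: release histogram counts with Laplace noise, then calibrate the chi-squared-style goodness-of-fit test by Monte Carlo under a null that includes both the sampling randomness and the privacy noise. This is an illustrative toy, not the dissertation's actual procedure; the function name, the sensitivity-2 Laplace mechanism, and the assumption of a public sample size are all simplifications chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_goodness_of_fit(counts, probs, epsilon, n_sim=5000):
    """Goodness-of-fit test on Laplace-perturbed counts.

    The null distribution of the statistic is recalibrated by Monte Carlo
    so that the privacy noise is accounted for, instead of comparing the
    noisy statistic to the usual chi-squared reference distribution.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum()  # assumed public here, for simplicity
    # Release counts with Laplace noise (sensitivity 2 for a histogram
    # under a common neighboring-dataset convention).
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=counts.shape)
    expected = n * probs
    stat = np.sum((noisy - expected) ** 2 / expected)
    # Monte Carlo null: resample BOTH the data and the privacy noise.
    sims = np.empty(n_sim)
    for i in range(n_sim):
        c = rng.multinomial(int(n), probs).astype(float)
        c += rng.laplace(scale=2.0 / epsilon, size=c.shape)
        sims[i] = np.sum((c - expected) ** 2 / expected)
    p_value = np.mean(sims >= stat)
    return stat, p_value

stat, p = dp_goodness_of_fit([48, 52, 100], [0.25, 0.25, 0.5], epsilon=1.0)
```

Comparing the noisy statistic against this noise-aware null, rather than the classical chi-squared distribution, is what keeps the test's type-I error close to its nominal level.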
When data are protected by differential privacy, there are two sources of randomness: the (non-private) data-sampling process and the privacy-preserving mechanism. We proposed a general framework for constructing confidence intervals for the model parameters of a variety of differentially private machine learning models that accounts for both sources of randomness. Specifically, we provided algorithms for models trained with objective perturbation and with output perturbation; the algorithms work for both ε-differential privacy and ρ-zCDP.

In another work, instead of focusing on application-specific adjustments for privacy enforcement, we optimized an estimate computed from the perturbed data so that more accurate inference can be drawn from it. Prior works showed that the accuracy of many queries can be improved by postprocessing the perturbed data to enforce consistency constraints known to hold for the original data, commonly formulated as a least-squares minimization. Such methods, however, do not exploit the distribution of the noise used to perturb the data. We applied constrained maximum likelihood estimation to further improve performance, and proposed a general framework based on the alternating direction method of multipliers (ADMM) to solve such formulations efficiently; it also allows reuse of existing efficient solvers for the least-squares approach.

We tested the proposed methods on a variety of datasets with extensive experiments and discussed their strengths as well as their limitations.
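The least-squares postprocessing baseline mentioned above can be sketched for the simplest case: noisy leaf counts plus a noisy total, projected onto the constraint "leaves sum to the total". This one-level closed form (via a Lagrange multiplier) is an illustration of the consistency-enforcement idea only, not the dissertation's constrained-MLE/ADMM framework; the function name and the numbers are hypothetical.

```python
import numpy as np

def consistent_counts(noisy_leaves, noisy_total):
    """Least-squares postprocessing for a one-level hierarchy.

    Finds leaf counts x and total t minimizing
        ||x - noisy_leaves||^2 + (t - noisy_total)^2
    subject to sum(x) = t.  The KKT conditions give
        x_i = noisy_i - lam  and  t = noisy_total + lam,
    and the constraint pins down lam in closed form.
    """
    a = np.asarray(noisy_leaves, dtype=float)
    k = a.size
    lam = (a.sum() - noisy_total) / (k + 1)
    return a - lam, noisy_total + lam

leaves, total = consistent_counts([10.4, 19.8, 30.5], 60.0)
```

After projection the released answers are mutually consistent, and because projection onto a convex set is postprocessing, it costs no additional privacy budget. Replacing the squared-error objective with the noise distribution's log-likelihood (still under the same constraints) is the constrained-MLE refinement the abstract describes.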
ISBN: 9781392319017
Subjects--Topical Terms: Information Technology.
LDR    04563nmm a2200289 4500
001    2207770
005    20190920102419.5
008    201008s2019 ||||||||||||||||| ||eng d
020    $a 9781392319017
035    $a (MiAaPQ)AAI13917968
035    $a AAI13917968
040    $a MiAaPQ $c MiAaPQ
100 1  $a Wang, Yue. $3 1908045
245 10 $a On Improving the Utility for Data Analyses Under Differential Privacy.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2019
300    $a 211 p.
500    $a Source: Dissertations Abstracts International, Volume: 80-12, Section: B.
500    $a Publisher info.: Dissertation/Thesis.
500    $a Advisor: Kifer, Daniel.
502    $a Thesis (Ph.D.)--The Pennsylvania State University, 2019.
520    $a [Abstract: see above.]
590    $a School code: 0176.
650  4 $a Information Technology. $3 1030799
690    $a 0489
710 2  $a The Pennsylvania State University. $b Computer Science and Engineering. $3 2095963
773 0  $t Dissertations Abstracts International $g 80-12B.
790    $a 0176
791    $a Ph.D.
792    $a 2019
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13917968
Holdings (1 item):
Barcode: W9384319; Location: Electronic resources; Circulation category: 11. Online Reading_V; Material type: E-book; Call number: EB; Use type: Normal; Loan status: On shelf; Holds: 0
1
多媒體
評論
新增評論
分享你的心得
Export
取書館
處理中
...
變更密碼
登入