DIFFERENTIALLY PRIVATE REGRESSION TREE AND FOREST

Quốc Hoàng Vũ, Đình Thúc Nguyễn


Abstract

Data modeling is a central problem in data analysis and machine learning. Among the many data modeling approaches, the regression tree offers several advantages over other regression methods. Beyond the accuracy and interpretability of the resulting model, protecting the privacy of the training dataset is also important and urgent, especially when the data are sensitive or personal. This paper proposes basic methods and algorithms for building privacy-preserving regression trees based on differential privacy techniques. The experimental results indicate that the proposed methods are feasible and also raise challenges that merit further study.
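
The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of one standard building block that a differentially private regression tree could use: releasing a leaf's predicted value with the Laplace mechanism. Everything in it is an assumption made for illustration, not the authors' method: the function names `laplace_noise` and `dp_leaf_mean`, the known target bounds `lo` and `hi`, the add-or-remove-one-record notion of neighbouring datasets, and the equal split of the privacy budget `epsilon` between the noisy sum and the noisy count.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_leaf_mean(targets, lo, hi, epsilon, rng):
    """Hypothetical helper: noisy mean of the targets that fall in one leaf.

    Assumes every target is clipped to the public range [lo, hi] and that
    neighbouring datasets differ by adding or removing one record.
    """
    if not lo < hi or epsilon <= 0:
        raise ValueError("need lo < hi and epsilon > 0")

    # Clipping bounds the influence of any single record on the sum.
    clipped = [min(max(y, lo), hi) for y in targets]

    # Assumption: split the budget evenly between the sum and the count.
    eps_each = epsilon / 2.0
    sum_sensitivity = max(abs(lo), abs(hi))  # one record changes the sum by at most this
    count_sensitivity = 1.0                  # one record changes the count by exactly 1

    noisy_sum = sum(clipped) + laplace_noise(sum_sensitivity / eps_each, rng)
    noisy_count = len(clipped) + laplace_noise(count_sensitivity / eps_each, rng)

    # Post-processing (division and clipping) does not consume privacy budget.
    estimate = noisy_sum / max(noisy_count, 1.0)
    return min(max(estimate, lo), hi)


if __name__ == "__main__":
    rng = random.Random(0)
    leaf_targets = [1.0, 2.0, 3.0, 4.0]
    print(dp_leaf_mean(leaf_targets, lo=0.0, hi=5.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier leaf value, which is one way to see the trade-off between privacy and accuracy that the abstract alludes to. A complete tree would also need a private way to choose splits, which this sketch does not cover.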

 

