Valeria Nikolaenko : 2013 Security Workshop


Monday, April 15, 2013
Location: Fisher Conference Center, Arrillaga Alumni Center

"Data Mining on Gigabytes of Encrypted Data"


Data mining, also known as knowledge discovery, attempts to extract meaningful information from large volumes of data. In the information age, data mining is a rapidly developing field that connects Artificial Intelligence, Statistics and Databases. In current implementations, data mining algorithms must see the data in the clear, and a key problem that arises is confidentiality. In this talk we address the question of whether certain data mining algorithms can operate without the data in the clear, thereby allowing users to retain control of their data. For medical data, this allows computation to be carried out without affecting user privacy. For book and movie preferences, letting users keep control of their data reduces the risk of future unexpected embarrassment in case of a data breach at the service provider. We consider the concrete case of building a privacy-preserving algorithm for ridge regression.

Ridge regression is an algorithm that takes as input a large number of data points and finds the best-fit linear curve through these points. The algorithm is a building block for many machine-learning operations. Our system outputs the best-fit curve in the clear but exposes no other information about the input data.
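
For reference, ridge regression on data in the clear can be sketched as below. This is a minimal plain-Python illustration of the standard closed-form solution w = (X^T X + lam*I)^(-1) X^T y, not the privacy-preserving system described in the talk. The function name `ridge_fit`, the parameter `lam`, and the sample points are assumptions chosen for illustration, and for simplicity the sketch also penalizes the intercept term.

```python
def ridge_fit(X, y, lam):
    """Return weights w minimizing sum((x_i . w - y_i)^2) + lam * ||w||^2.

    X is a list of feature rows, y a list of targets, lam >= 0 the penalty.
    (Hypothetical helper for illustration; operates on plaintext data.)
    """
    d = len(X[0])
    # Build the normal equations: A = X^T X + lam * I, b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]

    # Solve A w = b by Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]

    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - s) / A[i][i]
    return w


# Sample points lying exactly on y = 2x + 1; the constant 1 is the intercept feature.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_plain = ridge_fit(X, y, lam=0.0)    # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)   # penalty shrinks the weights toward zero
```

With `lam=0.0` the fit recovers the slope and intercept of the sample line; a positive `lam` trades some fit accuracy for smaller weights, which is what distinguishes ridge regression from ordinary least squares.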


Valeria Nikolaenko is a PhD student in Computer Science advised by Prof. Boneh. Her research focuses on computations on encrypted data. Her recent work on privacy-preserving data mining algorithms was carried out in collaboration with the Technicolor research lab.