Responsible Innovation

Conference Responsible Innovation, 18-19 April 2011 - Debate: Data mining without discrimination

Data mining without discrimination

Dr Toon Calders, main researcher
Eindhoven University of Technology

Dr Sunil Choenni, member of the valorisation panel
WODC

www.nwo.nl/mvi/projects/custers

Data mining is a technology that extracts useful information, such as patterns and trends, from large amounts of data. Both the privacy-sensitive input data and the output, which is often used to select people, deserve protection against abuse. The aim of this project is to investigate to what extent legal and ethical rules can be integrated into data mining algorithms to prevent such abuse. For this purpose, the project uses data sets from the domain of public security, made available by police and justice departments.

The focus is on preventing selection rules from discriminating against particular groups of people. Key questions are how existing legal and ethical rules and principles can be translated into formats that computers can process, and how these rules can be used to guide the data mining process. In turn, the technological possibilities serve as feedback for formulating concrete directives and recommendations for formalising legislation. This will further clarify how existing ethical and legal principles should be applied to new technologies and, where necessary, which new ethical and legal principles need to be developed.
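One way to make a principle such as equal treatment understandable for a computer is to turn it into a measurable check on what a selection rule actually selects. The sketch below assumes one such formalisation: the gap between the selection rates of two groups, compared against a tolerance `max_gap`. The function names, the 0.05 default, and the toy data are hypothetical illustrations, not the project's formalisation of any legal rule.

```python
from typing import Sequence


def selection_rate(selected: Sequence[bool], group: Sequence[str], value: str) -> float:
    """Fraction of the records in group `value` that the selection rule selected."""
    members = [s for s, g in zip(selected, group) if g == value]
    if not members:
        raise ValueError(f"no records for group {value!r}")
    return sum(members) / len(members)


def rate_gap(selected: Sequence[bool], group: Sequence[str], a: str, b: str) -> float:
    """Absolute difference between the selection rates of groups a and b."""
    return abs(selection_rate(selected, group, a) - selection_rate(selected, group, b))


def complies(selected: Sequence[bool], group: Sequence[str], a: str, b: str,
             max_gap: float = 0.05) -> bool:
    """Hypothetical machine-checkable rule: the selection rates of the two
    groups may differ by at most max_gap (0.05 is an arbitrary assumption)."""
    return rate_gap(selected, group, a, b) <= max_gap


# Toy output of some selection rule for eight people in two groups.
selected = [True, True, False, False, True, False, False, False]
group = ["x", "x", "x", "x", "y", "y", "y", "y"]

print(rate_gap(selected, group, "x", "y"))  # 0.25
print(complies(selected, group, "x", "y"))  # False
```

Because a check like this inspects a model's output rather than restricting access to its input, it fits an a posteriori approach based on responsibility and transparency.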

Unlike previous attempts to protect privacy in data mining, this research will not focus on (a priori) measures that limit access to input data, but on (a posteriori) responsibility and transparency. Limiting access to data is increasingly hard to enforce in a world of automated and interlinked databases and information networks, so the research instead emphasises the question of how data can and may be used.

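  • Removing sensitive information such as gender and ethnicity from databases used in data mining is insufficient to guarantee non-discriminatory models, because other attributes that remain in the data can be correlated with the removed ones and act as proxies for them.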
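The claim above can be illustrated with a minimal sketch, assuming a hypothetical synthetic data set in which a non-sensitive attribute (a made-up `district` field) is correlated with a sensitive group attribute, and the historical labels are biased against one group. The learner is a deliberately simple majority-label rule per district, not the project's method. It never sees the group attribute, yet its selections still differ sharply between the groups, because the district acts as a proxy for group membership.

```python
from collections import Counter, defaultdict

# Hypothetical synthetic records: (group, district, historically_flagged).
# Group "a" lives mostly in "north", group "b" mostly in "south", and the
# historical flags depend only on group membership (i.e. they are biased).
records = (
    [("a", "north", 0)] * 8 + [("a", "south", 0)] * 2
    + [("b", "north", 1)] * 2 + [("b", "south", 1)] * 8
)


def train_majority_by_district(rows):
    """Learn one rule per district: predict the most common historical label.

    The sensitive attribute (group) is deliberately NOT used as an input.
    """
    counts = defaultdict(Counter)
    for _group, district, label in rows:
        counts[district][label] += 1
    return {d: c.most_common(1)[0][0] for d, c in counts.items()}


def selection_rate_by_group(rows, model):
    """Fraction of each group that the district-only model flags."""
    flagged, total = Counter(), Counter()
    for group, district, _label in rows:
        total[group] += 1
        flagged[group] += model[district]
    return {g: flagged[g] / total[g] for g in total}


model = train_majority_by_district(records)
rates = selection_rate_by_group(records, model)
print(model)  # {'north': 0, 'south': 1}
print(rates)  # {'a': 0.2, 'b': 0.8}
```

Dropping `district` as well would remove this particular proxy, but in real data several attributes can jointly stand in for a sensitive one, which is why removal alone gives no guarantee.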