Data science ethics & privacy-preserving analytics

Dino Pedreschi, Professor of Computer Science, University of Pisa, Italy

Data science has created unprecedented opportunities but also new risks. Data science techniques might expose sensitive traits of individuals and invade their privacy, and this information could be used to discriminate against people based on their presumed characteristics, or profiles. Sophisticated data-driven machine learning algorithms yield classification and prediction models of behavioral traits of individuals, such as credit score, insurance risk, health status, and personal preferences and orientations, on the basis of personal data disseminated in the digital environment by users, with or sometimes without their awareness. Such automated decision-making systems are often "black boxes", mapping a user's features into a class label or a ranking value without exposing the reasons.

This is worrying not only for the lack of transparency, which undermines the trust of stakeholders, but also for possible social biases and prejudices hidden in the training data and learned by the algorithms, which may lead to discriminatory decisions or unfair actions. Gartner predicts that, by 2018, half of business ethics violations will occur through improper use of big data analytics.
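One common way to make such bias measurable is to compare a classifier's positive-decision rates across protected groups (the so-called demographic parity criterion). The sketch below is purely illustrative: the function name and the toy loan data are invented for this example, not drawn from the lecture.

```python
# Hypothetical sketch: quantifying bias via the demographic parity
# difference, i.e. the gap in positive-decision rates between groups.
# All data below is invented toy data.

def demographic_parity_difference(decisions, groups):
    """Absolute difference in positive-decision rates between two groups."""
    rates = {}
    for g in set(groups):
        outcomes = [d for d, grp in zip(decisions, groups) if grp == g]
        rates[g] = sum(outcomes) / len(outcomes)
    a, b = rates.values()  # assumes exactly two groups
    return abs(a - b)

# 1 = loan approved, 0 = denied; "A" and "B" are protected-attribute groups
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(demographic_parity_difference(decisions, groups))  # 0.5: A approved 75%, B 25%
```

A value of 0 would mean both groups receive positive decisions at the same rate; large gaps like the 0.5 above are the kind of disparity that discrimination-aware data mining aims to detect and mitigate.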

Often, the achievements of data science result from re-interpreting available data for analysis goals that differ from the reasons originally motivating data collection. Examples include mobile phone call records, originally collected by telecom operators for billing and operations, later used for accurate and timely demography and human mobility analysis at country or regional scale. This re-purposing of data clearly shows the importance of legal compliance and of data ethics technologies and safeguards that protect privacy and anonymity, secure data, engage users, avoid discrimination and misuse, and account for transparency and fair use, so as to seize the opportunities of data science while controlling the associated risks. This is the focus of my lecture.
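A classic safeguard from privacy-preserving data mining is k-anonymity: a released table should contain, for every combination of quasi-identifiers (attributes like age band or ZIP prefix that could be linked to external sources), at least k matching records, so no individual can be singled out. A minimal sketch, with invented record fields and toy data:

```python
# Hypothetical sketch of a k-anonymity check. A dataset is k-anonymous
# when every quasi-identifier combination appears at least k times.
# Field names and records below are invented for illustration.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

records = [
    {"age_band": "30-39", "zip_prefix": "561", "diagnosis": "flu"},
    {"age_band": "30-39", "zip_prefix": "561", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip_prefix": "562", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["age_band", "zip_prefix"], k=2))  # False
```

The check fails here because the 40-49/562 combination is unique: anonymization tools would generalize or suppress such records before release. Privacy-by-design goes further, building this kind of guarantee into the data collection and analysis pipeline rather than patching it on afterwards.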

• Fairness, Accountability, Confidentiality, Accuracy: the ethical challenges of data science
• Privacy-preserving data mining
• Privacy-by-design and data-driven risk assessment
• Democratizing data science: centralised vs. user-centric analytics
• Personal data analytics, collective awareness
• Algorithmic bias and ethical challenges of machine learning
• Discrimination-aware data mining