Monday, 29 October 2018

Customizable ethics checklists for Big Data researchers

Deon is a project to create automated "ethics checklists" for data science projects. By default, running the code creates a comprehensive checklist covering data collection, storage, modeling and deployment. The checklist items aren't specific actions; they're "meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity."

The lists can be customized for your own purposes, and if you think the default list needs revising, there's a democratic process for amending it.
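To make the customization idea concrete, here is a rough Python sketch of a checklist you can trim and annotate before rendering it as Markdown. This is not Deon's actual code or API: the `Item`, `Section`, `customize` and `render` names are invented for illustration, and the prompts are placeholders rather than the project's wording. Only the four section names come from the description above.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Item:
    prompt: str
    not_applicable: bool = False


@dataclass(frozen=True)
class Section:
    title: str
    items: Tuple[Item, ...] = ()


# Hypothetical default list: placeholder prompts, not Deon's real items.
DEFAULT: List[Section] = [
    Section("Data Collection", (Item("Have we discussed informed consent?"),)),
    Section("Data Storage", (Item("Have we discussed how long data is kept?"),)),
    Section("Modeling", (Item("Have we discussed proxies for protected traits?"),)),
    Section("Deployment", (Item("Have we discussed how to roll back?"),)),
]


def customize(
    sections: Iterable[Section],
    drop_sections: Iterable[str] = (),
    mark_na: Iterable[str] = (),
) -> List[Section]:
    """Return a new list with some sections removed and some items marked N/A."""
    drop, na = set(drop_sections), set(mark_na)
    result = []
    for section in sections:
        if section.title in drop:
            continue
        items = tuple(
            replace(item, not_applicable=True) if item.prompt in na else item
            for item in section.items
        )
        result.append(replace(section, items=items))
    return result


def render(title: str, sections: Iterable[Section]) -> str:
    """Render the checklist as Markdown with unchecked boxes."""
    lines = [f"# {title}"]
    for section in sections:
        lines.append("")
        lines.append(f"## {section.title}")
        for item in section.items:
            suffix = " (N/A)" if item.not_applicable else ""
            lines.append(f"- [ ] {item.prompt}{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    custom = customize(
        DEFAULT,
        drop_sections=["Deployment"],
        mark_na=["Have we discussed how long data is kept?"],
    )
    print(render("Ethics checklist", custom))
```

Because the sketch returns a new list instead of editing the default in place, a team could keep the shared default untouched and store its own variant alongside the project.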

Checklists are a powerful way to ensure that important steps are not missed. The rise of surgical checklists made an enormous positive change in patient outcomes. Hilariously, though surgeons often chafe at having to refer to checklists while doing a procedure they've done a hundred times, they overwhelmingly say that they would prefer to have checklists in the mix any time they are the patients.

First and foremost, our goal is not to be arbitrators of what ethical concerns merit inclusion. We have a process for changing the default checklist, but we believe that many domain-specific concerns are not included and teams will benefit from developing custom checklists. Not every checklist item will be relevant. We encourage teams to remove items, sections, or mark items as N/A as the concerns of their projects dictate.

Second, we built our initial list from a set of proposed items on multiple checklists that we referenced. This checklist was heavily inspired by an article written by Mike Loukides, Hilary Mason, and DJ Patil and published by O'Reilly: "Of Oaths and Checklists". We owe a great debt to the thinking that preceded this, and we look forward to thoughtful engagement with the ongoing discussion about checklists for data science ethics.

Third, we believe in the power of examples to bring the principles of data ethics to bear on human experience. This repository includes a list of real-world examples connected with each item in the default checklist. We encourage you to contribute relevant use cases that you believe can benefit the community by their example. In addition, if you have a topic, idea, or comment that doesn't seem right for the documentation, please add it to the wiki page for this project!

Fourth, it's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. This checklist is designed to provoke conversations around issues where data scientists have particular responsibility and perspective. This conversation should be part of a larger organizational commitment to doing what is right.

Fifth, we believe the primary benefit of a checklist is ensuring that we don't overlook important work. Sometimes it is difficult with pressing deadlines and a demand to multitask to make sure we do the hard work to think about the big picture. This package is meant to help ensure that those discussions happen, even in fast-moving environments. Ethics is hard, and we expect some of the conversations that arise from this checklist may also be hard.

Sixth, we are working at a level of abstraction that cannot concretely recommend a specific action (e.g., "remove variable X from your model"). Nearly all of the items on the checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity.

Seventh, we can't define exhaustively every term that appears in the checklist. Some of these terms are open to interpretation or mean different things in different contexts. We recommend that when relevant, users create their own glossary for reference.

Eighth, we want to avoid any items that strictly fall into the realm of statistical best practices. Instead, we want to highlight the areas where we need to pay particular attention above and beyond best practices.

Ninth, we want all the checklist items to be as simple as possible (but no simpler), and to be actionable.

An ethics checklist for data scientists [Deon]

(via Four Short Links)
