Data Harm Record


Updated August 2020

Joanna Redden, Jessica Brand and Vanesa Terzieva                                                                                                              

The aim of this document is to provide a running record of ‘data harms’: harms that have been caused by uses of algorithmic systems. The goal is to document these harms so that we can learn from where things have gone wrong and, ideally, work together toward redressing harms and preventing further harm. The document compiles examples of harms that have been detailed in previous research and publications. Each listed example contains a link to the original source and often also related information.

The Data Harm Record pulls together concrete examples of harm that have been referenced in previous work so that we might gain a better ‘big picture’ appreciation of how people have already been negatively affected by uses of algorithmic systems. A survey of harms also suggests where things may go wrong in the future and, ideally, stimulates more debate and intervention where we may want to change course. The idea is that we can learn a lot by paying attention to where things have gone wrong and by considering data harms in relation to each other.

The Data Harm Record was first published in 2017. Over the last year we have attempted to update it with recent examples. We have tried to capture a wide range of examples, but there are gaps in what we have been able to identify and list here due to time, resource and language limitations.



People working in business, government, politics and for non-profit organizations are all developing new ways to make use of algorithmic systems. These bodies have always collected and analysed data, but what has changed is the size and scope of the data and the methods used to analyse it. The digitization of nearly everything, along with major computing advances, means that it is now possible to combine data at sizes and of types previously unimaginable, and then to analyse these staggering datasets in new ways to find patterns and make predictions.

There is an abundance of enthusiasm and optimism about how automated, predictive and AI data systems can be used for good. Optimism persists for good reason: there is a lot of good that can be done through new uses of data systems.[1] There is also growing consensus that with these new algorithmic systems come risks to individuals and society. Previous work has detailed how data analytics can be used in ways that threaten privacy and security, and that increase inequality and discrimination. The danger with automated and predictive decision support systems is that harms can be caused both unintentionally and intentionally.

As argued by Cathy O’Neil, this is important to keep in mind, as in many cases the algorithmic systems that are leading to harm were developed with very good intentions. The problem is that new algorithmic tools present new ways to sort, profile, exclude, exploit, and discriminate. The complexity, opacity, and proprietary nature of many datafied systems mean that often we don’t know things have gone wrong until after large numbers of people have been affected. Another problem is that few people have the skills needed to interrogate and challenge these new automated and predictive systems. What recourse do citizens have if they have been wrongfully targeted, profiled, excluded or exploited? Government agencies, civil society organizations and researchers across disciplines are drawing attention to these risks.


Defining data harms

Dictionary definitions of harm link it to physical and material injuries, but also to potential injuries, damages and adverse effects.[2] Solove and Citron argue that harm can be understood as ‘the impairment, or set back, of a person, entity, or society’s interests. People or entities suffer harm if they are in worse shape than they would be had the activity not occurred’.[3]

Building on these definitions, one way to understand data harms is as the adverse effects caused by uses of data that may impair, injure, or set back a person, entity or society’s interests. While this definition is a start, clearly it is insufficient and will need to be developed given the increasing ubiquity of datafied practices all around us.

Our legal and political systems are struggling to come to terms with data harms. Across nations it is becoming easier for corporate and government bodies to share data internally and externally. New data about us is being generated by us and collected by others through new systems. Consider for example the range of data that can be generated and collected through the Internet of Things and also the range of harms that can be caused if the wrong people hack into industrial systems. Increasingly, our digital selves and the digitization of services affect the kind of lives we lead, the opportunities afforded to us, the services we can access and the ways we are treated. All of these developments present new types of risk and harm. For all of these reasons we need to develop a more complex understanding and appreciation of data harms and a means to assess current and future harms, from the perspective of people who are and may be negatively affected by these harms.


Data violence

The harms can be so significant that researchers like Anna Lauren Hoffmann are arguing that we need to go further and recognize that in many cases we are dealing with ‘data violence’.[4] One example of data violence is when people are wrongly denied access to essential services and resources.

We know that algorithms and automated systems are increasingly being used in decision-making: for job recruitment, risk assessment, credit decisions and bail hearings in the US, among other areas. Research is documenting the ways these systems can embed bias. The implementation of algorithmic systems in areas that link people to essential services means that the bias and errors introduced via these algorithms can cause significant harm. Research demonstrates that the already marginalized are far more likely to be negatively affected. To quote Virginia Eubanks: “These systems impact all of us, but they don’t impact us all equally”.



Commercial uses of data – Exploitation

Targeting based on perceived vulnerability

Some have drawn attention to how new tools make it possible to discriminate and socially sort with increasing precision. By combining multiple data sets, a great deal can be learned about individuals.[5] Newman calls this ‘algorithmic profiling’ and raises concerns about how much of this profiling is invisible, as citizens are unaware of how data is collected about them across searches, transactions, site visits, movements, etc. This data can be used to profile and sort people into marketing categories, some of them highly problematic. For example, data brokers combine data sets to identify specific groups. Much of this sorting goes under the radar, and some of it raises serious concerns. In her testimony to Congress, World Privacy Forum’s Pam Dixon reported finding brokers selling lists of rape victims, addresses of domestic violence shelters, sufferers of genetic diseases, sufferers of addiction and more.

In another example, in 2015 the U.S. Federal Trade Commission ‘charged a data broker operation with illegally selling payday loan applicants’ financial information to a scam operation that took millions from consumers by debiting their bank accounts and charging their credit cards without their consent’.[6]


When your personal information gets used against you

Concerns have been raised about how credit card companies are using personal details, like where someone shops or whether or not they have paid for marriage counselling, to set rates and limits.[7] This has been called ‘personalization’, ‘behavioural analysis’ or ‘behavioural scoring’, and refers to companies tailoring things to people based on what is known about them. Croll notes that American Express used purchase history to adjust credit limits based on where customers shopped. Croll, as well as Hurley and Adebayo, describe the case of one man who found his credit limit reduced from $10,800 to $3,800 in 2008 because American Express determined that ‘other customers who ha[d] used their card at establishments where [he] recently shopped have a poor repayment history with American Express’.[8] This was an early example of ‘creditworthiness by association’ and is linked to ongoing practices of determining value or trustworthiness by drawing on ‘big data’ to make predictions about people.[9]


Discrimination – skin colour, ethnicity, class or religion

Credit Scoring

As companies responsible for credit scoring, background checks, and hiring make more use of automated data systems, an individual’s appearance, background, personal details, social network, or socio-economic status may influence their ability to get housing, insurance, access education, or a job.

There are new start-up companies that make use of a range of ‘alternative’ data points to make predictions about consumers and provide people with credit scores. In addition, traditional credit scoring agencies are making use of data and machine learning to develop profiles. While the argument is that these tools could open up the potential for some not served by traditional credit scoring systems to receive credit, there are a range of concerns about how algorithmic scoring may discriminate. For example, a consumer’s purchase history could be used, intentionally or unintentionally, as a proxy for ethnicity or religion. If an algorithmic system ends up penalizing one group more than others, it may be hard to figure this out given the access issues, opacity and complexity of algorithmic processes. While there are laws in place that allow people to review conventional credit scores, there are not yet measures in place for people to interrogate these newly generated scores.

In relation to all of these examples, researchers have raised concerns about how new data-driven processes reproduce illegal redlining practices. Historically, redlining was used to discriminate against certain groups of people by denying them access, or granting only more expensive access, to housing or insurance, often by ‘redlining’ certain communities on a map. The issue is that where someone lives is often associated with ethnicity and class; in this way, location facilitates racism and inequality. Critics are concerned about how new automated and predictive data tools can be used to ‘redline’ given the amount of detail that can be determined about us through our data. Previous research has demonstrated the potential to accurately determine our age, gender, sexuality, ethnicity, religion and political views through the data that can be collected and combined about us.
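The mechanism behind this kind of proxy discrimination can be sketched in a few lines of code. The example below is a minimal illustration using entirely invented data and a hypothetical approval rule: a decision that looks only at postcode, never at group membership, can still reproduce a stark group disparity when postcode and group are correlated.

```python
# Illustrative sketch of proxy discrimination (all data and numbers invented):
# a scoring rule that never sees a protected attribute can still reproduce
# group disparities via a correlated proxy such as postcode.
from collections import defaultdict

# Each applicant: (postcode, group). In this toy data, postcode "A" is
# predominantly group1 and postcode "B" predominantly group2.
applicants = (
    [("A", "group1")] * 80 + [("A", "group2")] * 20 +
    [("B", "group1")] * 20 + [("B", "group2")] * 80
)

def approve(postcode):
    """A 'neutral' rule that only looks at postcode, not group membership."""
    return postcode == "A"   # e.g. postcode A has better historical repayment data

approved, total = defaultdict(int), defaultdict(int)
for postcode, group in applicants:
    total[group] += 1
    if approve(postcode):
        approved[group] += 1

for group in sorted(total):
    print(f"{group}: approval rate {approved[group] / total[group]:.0%}")
# group1 is approved 80% of the time, group2 only 20%, even though the
# rule never references group membership.
```

This is why opacity matters: from the outside, a rule like `approve` looks neutral, and the disparity only becomes visible with access to outcomes broken down by group.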

Relatedly, groups are raising concerns about how new data-driven processes may facilitate ‘reverse redlining’. This is when a particular group of people is targeted, as was done with sub-prime mortgages. Newman argues that big data was central to the subprime financial crash of 2007, as it played a key role in the manipulation of markets and in the subprime mortgage industry. Online advertising and data collected about people online were used to direct and target them for sub-prime loans. In 2012 the US Department of Justice reached a settlement with Wells Fargo Bank concerning allegations that it had ‘engaged in a pattern or practice of discrimination against qualified African-American and Hispanic borrowers in its mortgage lending from 2004 through 2009’ by pushing these borrowers into more costly sub-prime loans. In the settlement the bank agreed to provide $184 million in compensation.

The practice of targeting low-income groups continues in the payday loan industry. A U.S. Senate Investigation reports that data brokers have been found selling lists that focus on citizen financial vulnerability. For example, data brokers have compiled the following lists to sell to those interested in targeting such groups: ‘Rural and Barely Making It’, ‘Ethnic Second-City Strugglers’, ‘Retiring on Empty: Singles’, ‘Tough Start: Young Single Parents’. One company was found selling a marketing tool to ‘identify and more effectively market to under-banked consumers’.[10]

As argued by Madden et al., the fact that those on low incomes are less likely to take privacy protection measures online, and rely more on their mobile phones for online access, places them at greater risk than others of online targeting and exploitation.[11] In fact, ‘opting out’ of being tracked becomes increasingly difficult as technologies become more sophisticated. New tools that make cross-device tracking possible, or that are embedded in the Internet of Things, mean that the objects we use every day make more of our lives ‘knowable’ and trackable, and make ‘opting out’ even harder.[12] Newman raises concerns about how, in this age of datafication, information inequality is transferred into economic inequality, as companies have more information about citizens that can be used to target and exploit them to their disadvantage.[13]

Citron and Pasquale note that ‘evidence suggests that credit scoring does indeed have a negative disparate impact on traditionally disadvantaged groups’. They provide a number of examples in their article; just one is the case of Allstate, which was challenged in court and agreed to a multi-million dollar settlement over a scoring procedure that plaintiffs argued ‘resulted in discriminatory action against approximately five million African-American and Hispanic customers’.[14] They also raise concerns about how scoring systems and predictive tools may actually create the situations they claim to indicate and take on a life of their own, for example by labelling someone a poor candidate or unemployable.[15]

In 2015, Christian Haigh, a Harvard undergraduate, discovered that the prices for The Princeton Review’s online SAT tutoring packages offered to high school students varied depending on where customers lived. Julia Angwin and Jeff Larson of ProPublica investigated Haigh’s findings and found that the highest prices were being offered to ZIP codes with large Asian populations and high median incomes. The Princeton Review said that the price difference was not intentional but, as noted by ProPublica, the pricing algorithm clearly did discriminate. Angwin and Larson note that it is significant that in the United States ‘unintentional racial discrimination is illegal in housing and employment under the legal doctrine known as ‘disparate impact’ which prohibits inadvertent actions that hurt people in a protected class’. However, this doctrine does not extend to the online world, making it difficult in that country (and others) to take legal action against ‘adverse impact’ caused by unintentional algorithmic bias.

In 2012, a Wall Street Journal investigation found that Staples Inc.’s website displayed ‘different prices to people after estimating their locations’ and that, in what appeared to be an ‘unintended side effect’, Staples tended to show discounted prices to areas with higher average incomes and higher prices to areas with lower average incomes.[16]

A 2017 investigation by ProPublica and Consumer Reports showed that minority neighborhoods pay more for car insurance than white neighborhoods with the same risk levels. The study, which compared premiums and payouts in California, Illinois, Texas and Missouri, showed that minority neighborhoods paid ‘as much as 30 percent more than other areas with similar accident costs’.


In 2015 Facebook suspended the accounts of Native Americans because its algorithm did not recognize their names as real.[17] The “real name” policy left hundreds of Native Americans with suspended accounts, and they had to prove their identity in order to use their accounts again. Dana Lone Hill was one of the Native Americans who had to produce multiple ID documents to prove her identity and have her profile reinstated. Her case generated a lot of media attention, and Facebook reviewed its policy to reduce the potential for discrimination.


Recognition technologies

There are numerous reports of the biases embedded in facial recognition systems: they have problems identifying people with darker skin and problems with gender classification. Algorithms used to focus smartphone cameras, secure borders and target advertisements sometimes fail to detect, or misidentify, people of colour. It has been reported that the facial recognition algorithms used across various systems have been trained on datasets consisting mostly of white male faces, that these algorithms have not been exposed to enough diversity, and that this problem is connected to the fact that many of these systems are developed and tested largely by white men. As argued by Joy Buolamwini, the issue of bias and inaccuracy becomes increasingly important as facial recognition tools are adopted by police and security systems.



Examples of problems include the New Zealand case where one man’s passport photograph was rejected when a facial recognition program mistakenly identified him as having closed eyes. People have posted reviews online raising questions about the ability of Microsoft’s Kinect facial recognition feature to recognize people with darker skin and of HP’s tracking webcams ‘to see Black people’. Recent work by Buolamwini and Gebru, which tested commercial facial analysis tools, found “darker skinned females to be the most misclassified group.”

One of the biggest developers of facial recognition software is Amazon, whose tool ‘Rekognition’ has been at the centre of debates about machine bias and racial discrimination in AI technologies. Amazon’s Rekognition has been designed to cross-reference photos of unknown suspects against a database of mugshots from jails across the country.[18] When the software was tested by the American Civil Liberties Union of Northern California, it was found that people of colour were disproportionately misidentified. The ACLU cross-referenced photos of members of Congress against the mugshot database using Amazon’s system: 28 members of Congress were falsely matched with people in the mugshot database, and over 40% of the false matches were people of colour, even though people of colour make up only around 20% of Congress, a misidentification rate roughly twice that of white members.[19]
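The reported disparity can be checked with simple arithmetic. The sketch below takes only the rounded figures quoted above (28 false matches, roughly 40% of them people of colour, against a roughly 20% share of Congress) and computes the over-representation ratio:

```python
# Rounded figures as quoted from the ACLU's Rekognition test of Congress.
false_matches_total = 28            # members of Congress falsely matched
share_matches_of_colour = 0.40      # "over 40%" of false matches
share_congress_of_colour = 0.20     # roughly 20% of Congress

# Implied number of falsely matched members of colour (about 11 of 28).
implied_matches_of_colour = round(false_matches_total * share_matches_of_colour)

# Over-representation of people of colour among the false matches.
overrepresentation = share_matches_of_colour / share_congress_of_colour
print(f"over-representation among false matches: {overrepresentation:.1f}x")
# With these rounded figures the ratio is about 2x, matching the claim that
# the misidentification rate was roughly twice that of white members.
```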

The ACLU argues that, due to the high risk of misidentification and discrimination, Amazon’s software is unfit and dangerous for use. The technology has nonetheless been sold to the American government and police forces.[20]



In August 2019 an American Federal court agreed with a group of Illinois Facebook users that Facebook’s use of face recognition technology on their photographs without their knowledge or consent violated their privacy rights. The ACLU had supported their case, arguing that ‘performing a scan of an individual’s face without disclosing how that information will be stored, used, or destroyed, and without properly obtaining written consent, creates an actionable privacy harm. Notice and informed consent empower individuals to protect their privacy and are central to privacy laws in the United States, generally, and to BIPA, specifically.’

Sasha Costanza-Chock’s work details how “norms, values, assumptions – are encoded in and reproduced through the design of sociotechnical data-driven systems.” Their essay on the politics of border security systems illustrates this, as well as the harm of confronting the normative politics of these systems first-hand. The injustice and harm caused by normative security systems has also been stressed by Shadi Petosky, as has the fact that there are alternatives that are being ignored.




Discrimination – gender and ethnicity

A study of Google ads found that men and women are being shown different job adverts, with men receiving ads for higher paying jobs more often.[21] The study, which used a tool called AdFisher to set up hundreds of simulated user profiles, was designed to investigate the operation of Google’s ad settings. Although researchers could determine that men and women are being shown different ads, they could not determine why this is happening. Doing so would require access to more information that would need to be provided by advertisers about who they were targeting and by Google about how their system works.

Facebook allows advertisers to target people based on race, ethnicity and gender. A ProPublica investigation revealed that third-party companies can target ads to reach people by gender, ethnicity and race, and can also hide ads from people based on these kinds of classifications.[22] ProPublica also identified that Facebook’s ads software gives advertising companies the option to exclude men or women from their ad demographic. Job openings for Uber and truck drivers, police officers and the military were all shown to predominantly male audiences, while job vacancies for nurses, medical assistants and caretakers targeted women almost exclusively.



A complaint by the American Civil Liberties Union (ACLU) submitted to the US Equal Employment Opportunity Commission (EEOC) cites three women from Ohio and Illinois who were not shown job advertisements for positions in traditionally male-dominated fields, which the ACLU argues violates federal law on gender discrimination.[23] The ACLU argued that by targeting only men in already predominantly male fields, women are denied the opportunity to break into particular industries.

Some landlords have been found to selectively target different groups of users with housing ads, excluding people from “redlined” neighbourhoods and inner-city areas with high proportions of Black and Latino residents.[24] The housing adverts run by some American landlords were devised to selectively target certain communities and exclude people of Hispanic or African-American origin, or citizens with bad credit scores. The ads they publish are invisible to the “excluded” people, as their ethnicity deems them an “undesirable” demographic. Discriminating on sensitive factors such as ethnicity and nationality is prohibited in the United States by the Fair Housing Act of 1968, and housing ads that target based on those characteristics are in fact illegal, but they nonetheless persist.[25]

Discrimination – health

Cathy O’Neil has produced a great deal of work demonstrating how unfair and biased algorithmic processes can be. In one example, she tells the story of Kyle Behm, a high achieving university student who noticed that he was repeatedly not getting the minimum wage jobs he was applying for. All of these job applications required him to take personality tests which included questions about mental health. Although healthy when looking for work, Behm did suffer from bipolar disorder and had taken time out previously to get treatment. Behm’s father is a lawyer and he became suspicious of the fairness of these tests for hiring. He decided to investigate and found that a lot of companies were using personality tests, like the Kronos test. These tests are used as part of automated systems to sort through applications and in this process decide which applicants proceed and which are ‘red-lighted’ or discarded. As O’Neil details, these tests are often highly complex, with ‘certain patterns of responses’ disqualifying people. This example raises a number of ethical questions about the use of health information in automated systems but also about the uses of automated systems in hiring more generally, particularly as it is unlikely that those who have been ‘red-lighted’ will ever know they were subject to an automated system. O’Neil argues that the increasing use of automated systems to sort and whittle down job applications creates more unfairness as those who know or can pay for help to ensure their applications get to the top of the pile have an advantage.


Loss of privacy

Loss of privacy can happen unintentionally, for example when attempts to release data anonymously do not work. Big data makes anonymity difficult because it is possible to re-identify supposedly anonymized data by combining multiple data points.
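A minimal sketch of this kind of ‘linkage attack’, using wholly invented records: an anonymized release keeps quasi-identifiers (postcode, birth year, gender), and joining those against a public dataset that carries names is enough to re-identify individuals. Every record, field and name below is hypothetical.

```python
# Sketch of a linkage (re-identification) attack on invented data.

# "Anonymized" release: names removed, quasi-identifiers kept.
anonymized = [
    {"postcode": "CF10", "birth_year": 1958, "gender": "F", "diagnosis": "bipolar"},
    {"postcode": "CF24", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
]

# Public dataset (e.g. an electoral roll) with names attached.
public = [
    {"name": "T. Arnold", "postcode": "CF10", "birth_year": 1958, "gender": "F"},
    {"name": "J. Smith",  "postcode": "CF24", "birth_year": 1990, "gender": "M"},
]

QUASI_IDS = ("postcode", "birth_year", "gender")

def link(anon_rows, public_rows):
    """Join the two datasets on the shared quasi-identifiers."""
    index = {tuple(p[k] for k in QUASI_IDS): p["name"] for p in public_rows}
    matches = []
    for row in anon_rows:
        key = tuple(row[k] for k in QUASI_IDS)
        if key in index:  # the combination is unique enough to pin down a person
            matches.append({"name": index[key], **row})
    return matches

for match in link(anonymized, public):
    print(f"{match['name']} -> {match['diagnosis']}")
```

The point is that no single field is identifying on its own; it is the combination of ordinary attributes that singles people out, which is exactly what happened in the AOL and Netflix cases below.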



As detailed by Paul Ohm, in 2006 America Online (AOL) launched ‘AOL Research’ to ‘embrace the vision of an open research community’. The initiative involved publicly releasing twenty million search queries from 650,000 users of AOL’s search engine. The data, which represented three months of activity, was posted to a public website. Although the data was anonymized, once it was posted some users demonstrated that it was possible to work out people’s identities, as the search queries often contained details such as names, ages and addresses.

Two New York Times reporters, Michael Barbaro and Tom Zeller Jr., cross-linked data to identify Thelma Arnold, a sixty-two-year-old widow from Lilburn, Georgia. Her case demonstrates the problems with ‘anonymisation’ in an age of big data, but also the danger of reading too much into search queries. As Barbaro and Zeller note, Ms Arnold’s search queries ‘hand tremors’, ‘nicotine effects on the body’, ‘dry mouth’ and ‘bipolar’ could lead someone to think she suffered from a range of health issues. Such a conclusion could have negative effects if the organization making it was her insurance provider. In fact, when they interviewed Arnold, Barbaro and Zeller found that she often did searches on behalf of friends she wanted to help.

In 2006 Netflix publicly released one hundred million records detailing the film ratings of 500,000 of its users between December 1999 and December 2005. As Ohm reports, the objective was to launch a competition in which those competing would use this data to improve Netflix’s recommendation algorithm.[26] Netflix anonymized the data by assigning users a unique identifier. Not long after the release, researchers from the University of Texas demonstrated how relatively easy it was for people to be re-identified from the data.[27] This led to a court case in which a plaintiff known as Jane Doe argued that the data revealed her interest in gay and lesbian themed films and so could out her, a lesbian mother, against her wishes, damaging herself and her family.[28] The court case was covered by Wired in 2009.


Fitness Trackers

Some employers are gaining highly personal information about their employees through their use of fitness trackers. Employers now collect health and biometric data about their employees through the use of wearable tech and health tracking apps.[29] Through performance monitoring, employers can now collect regular reports about staff activity. Concerns are being raised about companies that encourage their female employees to use family planning apps that give employers and other corporate entities access to details about their employees’ private lives, health, hopes and fears.[30]

Fitness apps and trackers allow their users to monitor calorie intake, physical activity and vital signs such as heart rate and blood pressure. In 2018 a data breach involving MyFitnessPal exposed around 150 million user accounts.[31] User names, email addresses and hashed passwords were stolen from the parent sportswear company Under Armour.[32] Access to fitness accounts means access to information that can be used to track individuals, view their location in real time, predict behaviour and activities, or share sensitive health information with third-party organisations such as private health clinics, insurance companies and even employers.[33]

In 2018, Strava demonstrated that fitness app data can reveal highly sensitive location information. Heat map visualisations released by Strava showed activity captured by the app, lighting up user routes; the mapping drew on more than 3 trillion GPS data points.[34] A problem is that the app is used by military personnel, and by releasing these ‘anonymous’ heat maps Strava was revealing the location of military activity. While the information on the heat maps is an aggregate of all user activities, the Strava website allows users to track running routes in detail and eventually connect them to usernames and the individuals behind them, which could endanger military personnel on missions overseas.[35]

Another fitness tracking app, Polar Flow, also exposed the geolocation of its users through a tool called “Explore map”. An investigation by De Correspondent and Bellingcat revealed that the app made it possible to explore sensitive locations and locate individual users and their exercise routines. Tracing the individual behind a username turned out to be fairly easy, and the names and addresses of personnel from intelligence agencies such as the NSA, the US Secret Service and MI6 could be uncovered.



Smart microchips

Some tech companies have found even more intrusive ways to monitor their employees. They have started using smart microchips that can be implanted under the skin, the same technology that the US justice department uses on prisoners on probation instead of ankle monitors. The chips work similarly to the access key cards many firms use, except that key cards do not track workers’ physical and physiological condition. A Chinese mining company has even introduced helmets that can read the brainwaves of workers and detect feelings such as fatigue, distraction and even anger.[36] In 2017, the American company Three Square Market began putting microchips the size of a rice grain in its workers’ hands; from 6-7 chipped employees initially, the programme expanded to include more than 100, with plans to extend the chips to all of its 10,000 employees.

The chips can be used instead of physical IDs to open doors, log into computers and pay at the vending machines on site, and can be paired with the GPS on employees’ smartphones to show the exact location of every employee at any point throughout the day.[37]

But unlike fitness tracking devices and key cards, which can be taken off at the end of the work day, chips are worn constantly, giving the employer the ability to track its employees at all times. This invasion of personal space has expanded substantially: BioTeq, a UK-based firm, is one of the new businesses that offer implants to companies and individuals, and it has implanted more than 150 chips for various firms across the UK.[38] Experts and researchers warn that microchip implanting carries many dangers, especially for employees, as it can completely erode their right to privacy.


Smart devices- smart spies

As of 2019, over 100 million Amazon Echo devices had been sold. Amazon employs thousands of people to listen to what Alexa has recorded from users.[39] The recordings are transcribed and, it is argued, used to close gaps in Alexa’s understanding and to introduce new accents and words. Alongside the conversations people have with their Alexa devices, the devices pick up other audio in the home. Access to device recordings has been requested by judges in court proceedings, and there is increasing concern about how the use of such devices can compromise privacy.

Concerns are also being raised about smart TVs.[40] In 2017 WikiLeaks published documents describing an alleged CIA operation, Weeping Angel, which involved using smart TV microphones for surveillance.[41] Consumer Reports has also compiled information and advice for the owners of smart TVs about how to avoid tracking.


Companies have been facing increasing complaints for collecting data about children and sharing or using this data to target them with advertisements or with more content designed to keep them online for longer. In 2019, the American Federal Trade Commission ordered Google, including its subsidiary YouTube, to pay a record $170 million to settle allegations that YouTube had illegally collected data about children without their parents’ consent.

Concerns are also being raised about YouTube’s site for children and about the YouTube recommendation algorithm, which privileges the kind of sensational content that keeps children online for longer. Critics argue that by privileging such content the algorithm has led to the promotion of violent and suggestive material to children.

The Children’s Commissioner for England published a report highlighting the range of data that is being collected about children. The report draws attention to the fact that while there is increasing attention to privacy infringements related to online platforms, there is also a need to consider the privacy issues and potential for harm introduced by smart internet-connected devices, such as monitors and toys. Consumer groups have called for some smart toys to be recalled after learning that they could be hacked to enable strangers to talk to children, and parents have reported that their baby monitors were hacked.


Identity theft, blackmail, reputation damage, distress


Data breaches

Although data breaches are listed under corporate uses of data, they could also be listed under government uses of data, as breaches have happened in both sectors. Solove and Citron argue that ‘harm’ in relation to data breaches relates to ‘a risk of future injury, such as identity theft, fraud, or damaged reputations’ and also to a current injury, as people experience anxiety about this future risk. They note that the anxiety and emotional distress created about future risk is a harm that people experience ‘in the here and now’. Identity theft is a major problem, particularly for those on low incomes who lack the resources to pay for legal representation and challenge mistakes due to identity fraud. Further, the sudden loss of income or errors that result from identity fraud can be disastrous for those living from pay cheque to pay cheque. Sarah Dranoff notes that in addition to financial loss, identity theft can lead to ‘wrongful arrests, loss of utility service, erroneous information on health records, improper child support garnishments, and harassment by collection agencies’.[42] A number of data breach examples are detailed by Solove and Citron: 1) the Office of Personnel Management breach leaked people’s fingerprints, background check information and analyses of security risks, 2) the Ashley Madison breach released information about people’s extramarital affairs, 3) the Target breach leaked credit card information, bank account numbers and other financial data, and 4) the Sony breach involved employee emails.

This regularly updated visualization by the Information is Beautiful team demonstrates how common major data breaches are:


Commercial Data Breaches

Dating sites

The Ashley Madison hack was one of the biggest data breaches in online dating site history. In 2015 the personal data of more than 37 million users of the site was stolen.[43] Personal details of site users were posted by an online hacking group called the Impact Team.[44] The site’s parent company, Avid Life Media (ALM), faced a class action in US court. As a result the corporation had to pay nearly a quarter of its revenue, $11.2 million, in settlement.[45] The hack also led to reputational damage, high-profile resignations of site users whose names were exposed, divorce filings and two reported suicides linked to the leak.[46]

Another online dating site, Adult FriendFinder, was hacked in 2015 and the highly personal data of almost 4 million users was leaked online.[47] Just hours after the data was posted on a dark web forum, victims of the hack began receiving spam and emails threatening to expose their private information. A year later the site suffered a second hack, this time exposing information from 412 million accounts.[48]


Security companies

In August 2019 the Guardian reported that researchers had discovered that Suprema’s Biostar 2 database was ‘unprotected and mostly unencrypted’. The researchers said they had access to millions of personal records which included fingerprint and facial recognition data and usernames and passwords. The system is used by government agencies, defence contractors and banks.


Government Database Breaches

Swedish Government Database

In 2016 the Swedish government suffered a massive data breach that endangered the identities of undercover operatives. The breach originated in the Swedish Transport Agency and exposed the personal information of millions of Swedish citizens and the identities of some military personnel. The agency had previously outsourced its operations to a private company, IBM, and the mishandling of data between the government agency and the company led to a massive leak of sensitive information.[49] The leak also exposed sensitive information about bridges, roads, ports, the capital’s subway system and other key infrastructure. The exposure of sensitive information in this case resulted from an absence of proper safeguards and protective measures between the government agency and IBM.[50]

British Government Database

In 2019 the Guardian reported that the fingerprints of “over 1 million people, as well as facial recognition information, unencrypted usernames and passwords, and personal information of employees, was discovered on a publicly accessible database for a company used by the likes of the UK Metropolitan police, defence contractors and banks.” The company involved was called Suprema and the breach involved their web-based Biostar 2 biometrics lock system.

India’s Aadhaar Data Breach

India’s national ID system, Aadhaar, has also suffered a data breach that exposed the identities of more than a billion people online. In 2018 Excel files and documents containing the names, addresses and phone numbers of Aadhaar holders were erroneously leaked by various government websites, compromising the data and giving unauthorised access to the personal information of Aadhaar ID holders.[51] A Tribune investigation revealed that the personal and biometric information of more than a billion Indian citizens was being sold online for as little as 500 rupees, or £6. Authorities denied the allegations and said that the leaked demographic data could not be misused without biometric information, which was kept safe and protected.[52]


Physical injury

Esther Kaplan’s investigation into the effects of workplace data monitoring revealed how the monitoring of employees in order to increase their productivity is leading to physical injury in some cases. She interviewed a UPS worker who noted that the physical demands of his job have increased since the company introduced a telematics system. The system monitors employees in real time through tracking devices that include ‘delivery information acquisition devices’ and sensors on delivery trucks. The pressure to do more work in less time is leading to injury as drivers do not have the time to lift and carry packages properly.[53]


Political uses of Data

Political Manipulation and social harm

The damage that can be done by fake news, bots and filter bubbles has generated much discussion recently. Uses of automated and algorithmic processes in these cases can lead to social and political harm as the information that informs citizens is manipulated, potentially spreading misinformation and undermining democratic and political processes as well as social well-being. A recent study by researchers at the Oxford Internet Institute details the diverse ways that people are trying to use social media to manipulate public opinion across nine countries. They note that this is a concern given the increasing role that social media play as a key information source for citizens, particularly young people, and that in many countries social media are fundamental to the sharing of political information. Civil society groups are ‘trying, but struggling, to protect themselves and respond to active misinformation campaigns’.

Woolley and Howard define computational propaganda as involving ‘learning from and mimicking real people so as to manipulate public opinion across a diverse range of platforms and device networks’. Bots, automated programs, are used to spread computational propaganda. While bots can serve legitimate functions, the Oxford Internet Institute study details how they can be used to spam, harass, silence opponents, ‘give the illusion of large-scale consensus’, sway votes, defame critics and spread disinformation campaigns. The authors argue that ‘computational propaganda is one of the most powerful new tools against democracy’.

Facebook-Cambridge Analytica Scandal

In 2018, through the reporting of Carole Cadwalladr, we learned how Facebook was implicated in political manipulation on a grand scale through its involvement with Cambridge Analytica and others. Whistleblower Christopher Wylie revealed how the company used the data of more than 80 million people to build a profiling system for political advertising.[54] The company allegedly used the psychological profiles for what a Cambridge Analytica intern called ‘psyops’: psychological operations that, much as in the military, are used to affect and change opinion. The use of ‘dark ads’ on Facebook has been linked to Brexit and to Trump’s election campaign in the United States.



But the Cambridge Analytica scandal was not only a data leak crisis; in fact, it can be argued that it was not a data breach at all, as Facebook is designed for exactly this: to collect data, analyse it and exploit it.


Government uses of Data

Exclusion and Error

Big data blacklisting and watch-lists in the U.S. have wrongfully identified individuals. As detailed by Margaret Hu, being wrongfully identified in this case can negatively affect employment, ability to travel, and in some cases lead to wrongful detention and deportation.[55]

Hu details the problems with the American E-Verify programme, which ‘attempts to “verify” the identity or citizenship of a worker based upon complex statistical algorithms and multiple databases’. Employers across states use the programme to determine whether a person is legally able to work in the U.S. Hu writes that employers appear to have wrongfully denied employment to thousands of people. Hu argues that E-Verify is problematic due to the unreliability of the data that informs the database screening protocol. The problems with the E-Verify programme have also been detailed by Upturn. A study by the American Civil Liberties Union demonstrates that errors are far more likely to affect foreign-born employees and citizens with foreign names. People with multiple surnames and women who change their names after marriage are also more likely to face errors. The harm is exacerbated by the difficulty of challenging or correcting E-Verify errors. As discussed by Alex Rosenblat and others: ‘[L]ow-wage, hourly workers, whether they are flagged for a spelling error or for other reasons, often lack the time, resources, or legal literacy required to navigate complex bureaucracies to correct misinformation about them in a national database’.

Hu also raises concerns about the Priority Enforcement Program (PEP), formerly the Secure Communities Program (S-COMM). This is a data-sharing programme between the Federal Bureau of Investigation (FBI), the DHS and local law enforcement agencies that requires local agencies to run fingerprints taken from suspects against federal fingerprint databases (ibid: 1770). The programme has made errors. For example, inaccurate database screening results wrongfully targeted 5,880 US citizens for potential detention and deportation, leading critics to question the reliability of PEP/S-COMM’s algorithms and data. Furthermore, by using the biometric data of arrestees contained in the S-COMM databases, Immigration and Customs Enforcement (ICE) reportedly may have wrongly apprehended approximately 3,600 US citizens due to faulty information feeding database screening protocols. As Hu points out, ‘error-prone’ databases and screening protocols ‘appear to facilitate the unlawful detention and deportation of US citizens’.

Hu argues that the big data systems underlying both E-Verify and S-COMM/PEP are causing harm by mistakenly targeting and assigning inferential guilt to individuals. Legally speaking, this kind of digitally generated suspicion is at odds with constitutional rights and there is a growing consensus, at least in the U.S, on the need for substantive and binding due process when it comes to big data governance.

In Arkansas, U.S., the government introduced an algorithm to determine how many hours of home care people were entitled to, a task previously performed by home care nurses. Under the new system, home care nurses helped people fill in a questionnaire of 260 questions, and the responses were processed by an algorithmic system that determined how many home care hours people were entitled to. The result for many was a major reduction in home care hours, which drastically limited people’s quality of life and, in some cases, their ability to stay in their own homes. As with other examples listed in this record, finding out how the algorithm worked proved very difficult.

Seven of those affected took the government to court with the help of Legal Aid of Arkansas. Six of those involved in this case had their home care hours reduced by more than 30 percent. There have been ongoing challenges to the use of this algorithm and its effects.



A similar situation has occurred in Idaho where the government started using a data system to determine home care costs which led to beneficiaries seeing their funds drastically reduced. Only after an ACLU lawsuit did it become clear how limited the data being used was and the need for system change.

A study published in Science magazine in 2019 found that an algorithmic system used to identify the follow-up health care needs of patients across the United States is biased against Black patients: the system dramatically underestimates the amount of care Black patients need compared to white patients.



Concerns are being raised in the United States about how data matching systems are being used as part of a wider strategy to disenfranchise African American and Latino voters. In one highly publicized example, data matching requirements in Georgia have been linked to voter suppression by civil rights activists and the Democratic nominee for governor. Under the ‘exact match’ legislation, a voter registration application would only be processed if the spelling of the applicant’s name and address matched existing government records exactly, across all documents. The changes were introduced by the office of Brian Kemp, who at the time was Georgia’s Secretary of State and the Republican candidate in the governor’s race. The new regulations resulted in 53,000 voter applications being put on hold over ‘misspellings of names’. The data inconsistencies and application suspensions mostly affected people with foreign names, people with more than one surname, those from minority groups, and people who had recently changed their surname (such as newly married women) or moved to a new address.
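The fragility of an exact-match rule can be illustrated with a short sketch. This is a hypothetical illustration, not Georgia's actual system: the names are invented and the normalization is deliberately simplified.

```python
import unicodedata

# Illustrative sketch (hypothetical names, not Georgia's actual system):
# how a character-for-character "exact match" rule rejects records that
# clearly refer to the same person.

def exact_match(registration: str, reference: str) -> bool:
    """Character-for-character comparison, as an exact-match rule requires."""
    return registration == reference

def normalized_match(registration: str, reference: str) -> bool:
    """A more forgiving comparison: ignores case, accents, hyphens and spacing."""
    def norm(s: str) -> str:
        decomposed = unicodedata.normalize("NFKD", s)
        no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
        return "".join(c for c in no_accents.lower() if c.isalnum())
    return norm(registration) == norm(reference)

# Hypothetical record pairs of the kinds reported to be affected.
pairs = [
    ("Maria Garcia-Lopez", "Maria Garcia Lopez"),  # hyphenated double surname
    ("José Alvarez", "Jose Alvarez"),              # accent dropped in one database
    ("Anne Smith Jones", "Anne Smith-Jones"),      # surname combined after marriage
]
for registration, reference in pairs:
    print(f"{registration!r}: exact={exact_match(registration, reference)}, "
          f"normalized={normalized_match(registration, reference)}")
```

Every pair fails the exact-match test while passing the normalized comparison, which is precisely the pattern that disproportionately affects people with hyphenated, accented or recently changed names.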



In Australia in 2019, after years of activism and advocacy, the federal government conceded that the automated debt recovery system it had introduced was flawed. Government communications suggested that anywhere from 600,000 to 900,000 ‘robo-debts’ that people had been told to repay would need to be reassessed. The program had by this point already been investigated by the Ombudsman and the Senate after numerous complaints of errors and the unfair targeting of vulnerable people. The system uses data matching and income averaging to determine whether people have been overpaid benefits, and the onus was placed on those receiving letters to prove an error had been made.
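The income-averaging step at the core of such a scheme can be sketched in a few lines. The figures below are hypothetical, chosen only to show why averaging annual income over equal fortnights misfires for people with seasonal or irregular work.

```python
# Hypothetical figures illustrating robo-debt-style income averaging.
# A seasonal worker earns their entire annual income in 13 of 26 fortnights
# and accurately reported zero income while receiving benefits.

FORTNIGHTS = 26
actual_fortnightly = [2_000.0] * 13 + [0.0] * 13   # what was really earned
annual_income = sum(actual_fortnightly)             # what the tax office sees

# The averaging shortcut assumes income was spread evenly across the year.
averaged = annual_income / FORTNIGHTS               # $1,000 per fortnight

# Fortnights the averaging flags as "under-reported", even though the
# person's fortnightly reports were entirely accurate.
flagged = [f for f, earned in enumerate(actual_fortnightly) if earned < averaged]

print(f"Averaged fortnightly income: ${averaged:,.0f}")
print(f"Fortnights wrongly flagged: {len(flagged)} of {FORTNIGHTS}")
```

Under these assumptions, half the fortnights are flagged as under-reporting even though every report was correct, which is why the onus of disproof placed on recipients was so damaging.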

Numerous accounts of errors were published in the press and calls for investigation were taken up by opposition politicians. One case involved a man who was repeatedly sent letters saying he owed the government repayment of $4,000. This turned out to be an error. The man, who suffers from depression and became suicidal, said he successfully convinced the government this was an error only to receive a similar letter a few months later. He again successfully proved this was an error. One of the ombudsman’s conclusions was that better project planning and risk management should have been done from the outset.

Cassandra Goldie, Chief Executive of the Australian Council of Social Service, was quoted in the Guardian as saying:

[R]obo-debt has issued thousands of debt notices in error to parents, people with disabilities, carers and those seeking paid work, resulting in people slapped with Centrelink debts they do not owe or debts higher than what they owe … It has been a devastating abuse of government power that has caused extensive harm, particularly among people who are the most vulnerable in our community.



In October 2019, Virginia Eubanks reported on a similar practice in the United States, where governments working with tech companies are sending debt notices to thousands of vulnerable people across the country alleging that they have been overpaid benefits. People who receive these letters have few options, particularly as challenging the details may require finding pay stubs or other documents that are decades old. These debts are being called zombie debts because of the devastating impact they are having on the families forced to repay them, who have little ability to challenge them. This is despite the fact that, much like in the Australian robo-debt scandal, people are finding errors and ‘miscalculations’ in these notices.

Social Exclusion

Social exclusion can be perpetuated by many factors, including identification systems. In a number of countries, ethnic groups are routinely excluded and labelled as different through the use of national IDs. Privacy International’s research into national identification systems raises concerns about how ID systems can be used in ways that lead to both intentional and unintentional exclusion. Such exclusion can cause great harm by affecting people’s survival, as ID cards are linked to the ability to access food, fuel, work and education.

In India, errors in the world’s biggest biometric identification system, Aadhaar, have been linked to deaths from starvation as people are left without access to food and other life essentials. In some cases this is because of data system errors, such as IDs not being matched to the right person or people’s fingerprints not registering. Aadhaar, India’s identification database, contains the names, addresses, phone numbers and biometric details (fingerprints, palm veins and prints, face and iris recognition, DNA, hand geometry, retina) of 80% of India’s population.[56] The Aadhaar ID system started as a completely voluntary ID card system run by the government on the private servers of HCL, but quickly became a vital aspect of identification in India, and more and more government services have made the use of Aadhaar mandatory. As of 2019, access to fuel, food, financial subsidies, health services, jobs and school scholarships is open almost exclusively to Aadhaar number holders.

Other examples of data failure include attempts to automate welfare services in the U.S. Virginia Eubanks details the system failures that devastated the lives of many in Indiana, Florida and Texas at great cost to taxpayers. The automated system errors led to people losing access to their Medicaid, food stamps and benefits. The changes made to the system led to crisis, hospitalization and as Eubanks reports, death. These states cancelled their contracts and were then sued.



Big data applications used by governments rely on combining multiple data sets. As noted by Logan and Ferguson, ‘small data (i.e. individual level discrete data points) … provides the building blocks for all data-driven systems’. The accuracy of big data applications will therefore be affected by the accuracy of the small data that feeds them. We already know there are issues with government data; to take just two examples: 1) in the United States, the Los Angeles Times reported in 2011 that nearly 1,500 people had been unlawfully arrested over the previous five years due to invalid warrants, and 2) in New York, a Legal Action Center study of rap sheet records ‘found that sixty-two percent contained at least one significant error and that thirty-two percent contained multiple errors’.[57]


Harms due to algorithm / machine bias

Research into predictive policing and predictive sentencing shows the potential to over-monitor and criminalize marginalized communities and the poor.[58]

Journalists working with ProPublica have been investigating algorithmic injustice. Their article titled ‘Machine Bias’ in particular has received a great deal of attention. Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner’s investigation was a response to concerns raised by various communities about judicial risk assessment: computer programs that produce scores predicting the likelihood that people charged with crimes will commit future crimes. These scores are being integrated throughout the US criminal justice system, influencing decisions about bond amounts and sentencing. The ProPublica journalists looked at the risk scores assigned to 7,000 people and checked how many were subsequently charged with new crimes. They found that the scores were ‘remarkably unreliable in forecasting violent crime’: only 61%, just over half, of those predicted to commit future crimes did so. But the bigger issue is bias. They found that the system was much more likely to flag Black defendants as future criminals, wrongly labelling them at twice the rate of white defendants, while white defendants were wrongly labelled as low risk more often than Black defendants. The challenge is that these risk scores, and the algorithm that determines them, are produced by a for-profit company, so the researchers were able to interrogate only the outcomes, not the algorithm. ProPublica reports that the software is one of the most widely used tools in the country.
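The group-wise error analysis behind findings like ProPublica's can be sketched with toy data. The records below are fabricated for illustration, not drawn from the actual COMPAS dataset, and the groups are unlabelled placeholders.

```python
# Toy example (fabricated records, not the COMPAS data): computing a
# per-group false positive rate, the measure at the heart of the
# "Machine Bias" analysis.
# Each record: (group, predicted_high_risk, reoffended)

records = [
    ("A", True,  False), ("A", True,  False), ("A", True,  True),
    ("A", False, False), ("A", False, True),
    ("B", True,  True),  ("B", False, False), ("B", False, False),
    ("B", True,  False), ("B", False, False),
]

def false_positive_rate(records, group):
    """Share of people in `group` who did NOT reoffend but were
    nonetheless labelled high risk."""
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    false_positives = [r for r in non_reoffenders if r[1]]
    return len(false_positives) / len(non_reoffenders)

for group in ("A", "B"):
    print(group, round(false_positive_rate(records, group), 2))
```

In this fabricated sample, group A's false positive rate is more than double group B's even though the overall accuracy looks similar, which is why auditors compare error rates per group rather than a single headline accuracy figure.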

Kristian Lum and William Isaac of the Human Rights Data Analysis Group published an article detailing bias in predictive policing. They note that because predictive policing tools rely on historical data, predictive policing should be understood as predicting where police are likely to make arrests, not necessarily where crime is happening. As Lum and Isaac, as well as O’Neil, note, if nuisance crimes like vagrancy are added to these models this further complicates matters: poor communities are over-policed, more arrests follow, and a feedback loop of injustice is created. Lum and Isaac used a range of non-criminal-justice, population-based data sources to produce an estimate of illicit drug use, which they then compared to police records. They found that while drug arrests tend to happen in areas with more BIPOC and low-income communities, drug use is fairly evenly distributed across all communities. Using one of the most popular predictive policing tools, they found that the tool targets Black people at twice the rate of white people, even though their data show that drug use is roughly equivalent across racial classifications. Similarly, they found that low-income households are targeted by police at much higher rates than higher-income households.

O’Neil describes how crime prediction software, as used by police in Pennsylvania, leads to a biased feedback loop. In this case the police include nuisance crimes, such as vagrancy, in their prediction model. The inclusion of nuisance crimes, or so-called antisocial behaviour, in a model that predicts where future crimes will occur distorts the analysis and ‘creates a pernicious feedback loop’ by drawing more police into the areas where vagrancy is likely. This leads to more punishment and more recorded crime in these poor areas. O’Neil draws attention to specific examples: the Pennsylvania police’s use of PredPol, the NYPD’s use of CompStat and the Philadelphia police’s use of HunchLab.[59]
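The feedback loop these researchers describe can be simulated in a few lines. This is a deliberately simplified model of our own, not PredPol's or any vendor's actual software: two districts have the same true crime rate, but one starts with more historical arrest records, so it receives more patrols and accumulates still more arrests.

```python
import random

# Simplified simulation (our assumptions, not any vendor's model) of a
# predictive-policing feedback loop: identical true crime rates, but
# district 0 is over-represented in the historical arrest data.

random.seed(0)
TRUE_CRIME_RATE = 0.1      # the same in both districts
arrests = [30, 10]         # historical arrest counts: district 0 vs district 1

for day in range(200):
    total = sum(arrests)
    # "Predictive" allocation: 10 patrols split in proportion to past arrests.
    patrols = [round(10 * a / total) for a in arrests]
    # Arrests are only recorded where police are present to observe crime.
    for district in (0, 1):
        for _ in range(patrols[district]):
            if random.random() < TRUE_CRIME_RATE:
                arrests[district] += 1

share = arrests[0] / sum(arrests)
print(f"Arrests after 200 days: {arrests}, district 0 share: {share:.2f}")
```

Because the model only "sees" crime where patrols are sent, the initial imbalance in the data is reinforced rather than corrected, even though the underlying crime rate is identical in both districts.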

Amnesty International has also carried out an investigation of predictive policing tools. It published a detailed report about the Gang Matrix, the London Metropolitan Police database, and its implications for marginalized communities. The Gang Matrix contains information about individuals suspected of gang membership in London. Created as a risk-management tool after the London riots of 2011, the database has proven inefficient and has been criticized for discriminating against young Black men, often on the basis of nothing more substantial than their cultural preferences. The database holds over 3,800 suspects and gathers intelligence about them from various online sources, using data such as the websites individuals visit, the songs they stream and the content they watch on YouTube, as well as more sensitive data such as ethnicity and nationality. In 2018 the mayor of London, Sadiq Khan, commissioned a review of the Metropolitan Police Service Gang Matrix which found a disproportionate number of Black men on the list.[60] 78% of the individuals on the list are young Black men aged under 25, and altogether 80% of the suspects on the list are Black. In reality, however, only 27% of the people actually responsible for gang crime are Black.



The highly controversial database perpetuates racial profiling and the unjust prosecution of people who have not committed any serious offences, and can have serious repercussions for individuals who are routinely marginalized.[61] Information on the database is shared with jobcentre and housing workers, head teachers and school principals, and representatives of local hospitals.

Gang labelling can affect not only the individuals listed but their families as well, and it can prevent young people from moving on with their lives. In 2012 the Metropolitan Police threatened to evict the family of a young Black man who was suspected of gang activity. The mother of the young man, who was studying at Cambridge University, received a threatening letter from the MPS saying that the family was going to lose their home because their son was involved in gang activities. Although this young man was no longer associated with the area where he used to live, the “gangster” label continued to follow him even after he tried to move on.[62] In 2013 another young Black man was expelled from college after the college authorities found that he had been listed on the Matrix.[63] In another case, Paul, a 21-year-old graduate, was denied a position because his name was still on the Matrix for an offence committed when he was 12 years old.[64]


How can harms be prevented?

Ultimately the goal of this Data Harm Record is to stimulate more debate and critical interrogation of how automated and predictive data systems are being used across sectors and areas of life.

The goal is to maintain the Data Harm Record as a running record. Please let us know of any cases you think we should add by sending a message here.

It is hoped that this work contributes to the work of others in this area, many referenced in this page, who are trying to help us gain a better appreciation of: a) how uses of automated and predictive systems are affecting people, b) the kind of datafied world we are creating and experiencing, c) the fact that datafication practices affect people differently, d) how datafication is political and may lead to practices that intentionally or unintentionally discriminate, are unfair and increase inequality, and e) how to challenge and redress data harms.

There are a range of individuals and groups coming together to develop ideas about how data harms can be prevented.[65] Researchers, civil society organizations, government bodies and activists have all, in different ways, identified the need for greater transparency, accountability, systems of oversight and due process, and the means for citizens to interrogate and intervene in the datafied processes that affect them. It is hoped that this record demonstrates the urgent need for more public debate and attention to developing systems of transparency, accountability, oversight and citizen intervention.

For example, O’Neil argues that auditing should be done across all stages of data projects and should cover: the integrity of the data; the terms being used; definitions of success; the accuracy of models; who the models fail; the long-term effects of the algorithms being used; and the feedback loops created through new big data applications. The Our Data Bodies team is based in marginalized communities and interrogates data practices from a human rights perspective. We at the Data Justice Lab are working on another project, Towards Democratic Auditing, to investigate how to increase citizen participation and intervention where these systems are being implemented. AI Now notes the need for greater involvement of civil society groups, particularly groups advocating for social justice, who have long-standing experience identifying and challenging the biases embedded in social systems. Researchers at AI Now have argued that government uses of automated and artificial intelligence systems in the delivery of core services in criminal justice, healthcare, welfare and education should stop until the risks and harms can be fully assessed and we can decide where, given the risks involved, there should be no-go areas for uses of automated systems because the risks are too great.




[1] For example see: a), b) Gangadharan, SP (2013) ‘How can big data be used for social good’, Guardian, 30 May, available:, c) Raghupathi, W and Raghupathi, V (2014) ‘Big data analytics in healthcare: promise and potential’ Health Information Science and Systems 2(3), available: PMC4341817/, d) Mayer-Schönberger, Viktor and Cukier, Kenneth. 2013. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt, e) Manyika, James, Chui, Michael, Brown, Brad, Bughin, Jacques, Dobbs, Richard, Roxburgh, Charles and Hung Byers, Angela. 2011. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, f) Armah, Nii Ayi. 2013. “Big Data Analysis: The Next Frontier.” Bank of Canada Review. Summer.

[2] Cambridge Dictionary ‘harm’, available:, Oxford Living Dictionaries ‘harm’, available:

[3] See Citron, D K and Pasquale, F (2014) The scored society: due process for automated Predictions. Washington Law Review, 89: 1-33.

[4] Medium (2018) Data Violence and How Bad Engineering Can Damage Society. [Online]. Available on:

[5] See Lyon, D (2015) Surveillance as Social Sorting: Privacy, Risk and Automated Discrimination, New York: Routledge.

[6] Federal Trade Commission (2015) FTC charges data brokers with helping scammer take more than $7 million from Consumers’ Accounts, 12 August, available:

[7] Andrews, Lori. 2013. I Know Who You Are and I Saw What You Did: Social Networks and the Death of Privacy, New York: Free Press.

[8] As cited in Hurley, M and Adebayo, J (2016) Credit scoring in the era of big data, Yale Journal of Law and Technology, 18(1), p.151.

[9] Ibid, p. 151

[10] Office of Oversight and Investigations Majority Staff (2013) A Review of the Data Broker Industry: Collection, Use, and Sale of Consumer Data for Marketing Purposes, Staff Report for Chairman Rockefeller, Dec. 18, available:

[11] Madden, M, Gilman, M, Levy, K and Marwick, A (2017) ‘Privacy, Poverty, and Big Data: A Matrix of Vulnerabilities for Poor Americans’, Washington University Law Review, 95(1)

[12] Whitener, M (2015) ‘Cookies are so yesterday; Cross-Device Tracking is In – Some Tips’, Privacy Advisor, 27 Jan. available:

[13] Newman, N (2014) ‘How big data enables economic harm to consumers, especially to low-income and other vulnerable sectors of the population’, Public Comments to FTC, available:


[14] As cited in Citron, D K and Pasquale, F (2014) The scored society: due process for automated predictions. Washington Law Review, 89, p. 15.

[15] Ibid.

[16] Valentino-DeVries, J, Singer-Vine, J., and Soltani, A (2012) ‘Watched: Websites vary prices, deals based on users’ information’, The Wall Street Journal, 24 Dec., A1

[17] The Guardian (2015) Facebook Still Suspending Native Americans Over ‘Real Name’ Policy. [Online]. Available on:

[18] The New York Times (2019) Amazon Faces Investor Pressure Over Facial Recognition. [Online]. Available on:

[19] The Guardian (2018) Amazon Face Recognition Falsely Matches 28 Lawmakers with Mugshots, ACLU says. [Online]. Available on:

[20] ACLU (2018) Amazon Teams up With Law Enforcement to Deploy Dangerous New Face Recognition Technology. [Online]. Available on:

[21] Datta, A, Tschantz, MC and Datta, A (2015) ‘Automated Experiments on Ad Privacy Settings’, Proceedings on Privacy Enhancing Technologies, available:

[22] ProPublica (2016) Facebook Lets Advertisers Exclude Users by Race. [Online]. Available on:

[23] BBC (2018) Facebook Accused of Job Ad Gender Discrimination. [Online]. Available on:

[24] Financial Times (2018) Facebook “Dark Ads” and Discrimination. [Online]. Available on:

[25] The Guardian (2019) Facebook Charged with Housing Discrimination in targeted Ads. [Online]. Available on:

[26] Ohm, P. (2010). “Broken Promises of Privacy: responding to the surprising failure of anonymization”, UCLA Law Review, vol 57 (2010) pp 1701–1777

[27] Arvind Narayanan & Vitaly Shmatikov (2008), How to Break the Anonymity of the Netflix Prize Dataset, available:

[28] Singel, R (2009) Netflix spilled your Brokeback Mountain secret, lawsuit claims, Wired, 17 December, available:

[29] The Washington Post (2019) With Fitness Trackers in the Workplace, Bosses Can Monitor Your Every Step – and Possibly More. [Online]. Available on:–and-possibly-more/2019/02/15/75ee0848-2a45-11e9-b011-d8500644dc98_story.html?utm_term=.b48be1cf9096

[30] The Guardian (2019) There’s a Dark Side to Women’s Health Apps: “Menstrual Surveillance”. [Online]. Available on:

[31] BBC (2018) MyFitnessPal Breach Affects Millions of Under Armour Users. [Online]. Available on:

[32] The Guardian (2018) Personal Data of a Billion Indians Sold Online for £6, Report Claims. [Online]. Available on:

[33] Reuters (2019) Your Health App Could be Sharing Your Medical Data. [Online]. Available on:

[34] The Guardian (2018) Fitness Tracking App Strava Gives Away Locations of Secret US Army Bases. [Online]. Available on:

[35] The Guardian (2018) Strava Suggests Military Users Opt Out of Heatmap as Row Deepens. [Online]. Available on:

[36] The Guardian (2018) Employers Are Monitoring Computers, Toilet Breaks – Even Emotions. Is Your Boss Watching You? [Online]. Available on:

[37] Ibid.

[38] The Guardian (2018) Alarm Over Talks to Implant UK Employees with Microchips. [Online]. Available on:

[39] Bloomberg (2019) Amazon Workers Are Listening to What You Tell Alexa. [Online]. Available on:

[40] BBC (2015) Not in Front of the Telly: Warning Over ‘Listening’ TV. [Online]. Available on:

[41] The Guardian (2017) Wikileaks Publishes ‘Biggest leak Ever of Secret CIA Documents’. [Online]. Available on:

[42] Dranoff, S (2014) ‘Identity Theft: A Low-Income Issue’, Dialogue, Winter, available: https://www.–a-lowincome-issue.html

[43] BBC (2015) Ashley Madison Infidelity Site Customer Data Leaked. [Online]. Available on:

[44] The Guardian (2015) Infidelity Site Ashley Madison Hacked as Attackers Demand Total Shutdown. [Online]. Available on:

[45] Reuters (2017) Ashley Madison Parent in $11.2 Million Settlement Over Data Breach. [Online]. Available on:

[46] BBC (2015) Ashley Madison: ‘Suicides’ Over Website Hacks. [Online]. Available on:

[47] The Guardian (2015) Dating Site Hackers Expose the Details of Millions of Users. [Online]. Available on:

[48] BBC (2016) Up To 400 Million Accounts in Adult Friend Finder Breach. [Online]. Available on:

[49] BBC (2017) Sweden Data Leak a ‘Disaster’, Says PM. [Online]. Available on:

[50] The New York Times (2017) Swedish Government Scrambles to Contain Damage From Data Breach. [Online]. Available on:

[51] BBC (2018) Aadhaar: “Leak” in World’s Biggest Database Worries Indians. [Online]. Available on:

[52] The Guardian (2018) Personal Data of a Billion Indians Sold Online for £6, Report Claims. [Online]. Available on:

[53] Kaplan, E (2015) ‘The Spy Who Fired Me’, Harper’s, March, available:


[54] The Guardian (2018) “I Made Steve Bannon’s Psychological Warfare Tool”: Meet the Data War Whistleblower. [Online]. Available on:

[55] Hu, M. (2015) ‘Big Data Blacklisting’, Florida Law Review, 67: 1735-1809.

[56] Dixon, P. (2017) A Failure to “Do No Harm” – India’s Aadhaar Biometric ID Program and its Inability to Protect Privacy in Relation to Measures in Europe and the U.S. Health Technol, 7(4): 539-567. [Online]. Available on:

[57] Logan, WA and Ferguson, AG (2016) ‘Policing Criminal Justice Data’, Minnesota Law Review 541, available:

[58] See: Sullivan, E and Greene, R (2015) States predict inmates’ future crimes with secretive surveys. AP, Feb. 24, available at:; Barocas, S and Selbst, A D (2016) Big data’s disparate impact. California Law Review 104: 671-732; Starr, S (2016) The odds of justice: actuarial risk prediction and the criminal justice system. Chance 29(1): 49-51.

[59] O’Neil, C (2016) Weapons of Math Destruction, London: Allen Lane, pp. 84-87.

[60] BBC (2018) Met Police “Gang Matrix” Requires Overhaul. [Online]. Available on:


[61] Evening Standard (2018) Sadiq Khan Calls for Overhaul of Scotland Yard’s Gang Matrix as 4 in 5 Names on it are Shown to be Black. [Online]. Available on:

[62] Amnesty (2018) Trapped in the Matrix: Secrecy, stigma, and bias in the Met’s Gangs Database. London: Amnesty International United Kingdom Section. Available on:

[63] StopWatch (2018) Being Matrixed: The (Over)policing of Gang Suspects in London. [Online]. Available on:

[64] The Guardian (2018) Met Gang Matrix May be Discriminatory, Review Finds. [Online]. Available on:

[65] Throughout the record the hyperlinks provided link to individuals and groups whose work raises concerns and also provides recommendations about how to reduce data harms. In addition to those links, some examples of others doing work in this area include those working as part of the FAT / ML Fairness, Accountability and Transparency in Machine Learning group and the Algorithmic Justice League.