2.2 What are the risks associated with data processing
Now that you have mapped the data you process, read through the following slides, take a short quiz and then continue to the next exercise.
Slide 1: Risks
Risk is the possibility of losing something of value.
Risk perception is an individual’s impression of the impact, likelihood and severity of a risk.
Risk assessment is a classification of the different risks by their impact and likelihood, weighed against the effort you are willing to invest to protect yourself against them. For example, if you identify ten risks, a risk assessment will help you prioritize which risks will have the highest impact and which actions you can reasonably take to prevent them.
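The prioritization described above is often reduced to a simple impact × likelihood score per risk. A minimal sketch, assuming a numeric 1–5 scale; the risk names and scores below are invented for illustration:

```python
# Minimal risk-assessment sketch: rank risks by impact x likelihood.
# All risk names and scores are invented examples, not real assessments.
risks = [
    {"name": "laptop theft",      "impact": 4, "likelihood": 2},
    {"name": "database leak",     "impact": 5, "likelihood": 3},
    {"name": "phishing of staff", "impact": 3, "likelihood": 4},
]

# Score each risk: higher score = higher priority.
for risk in risks:
    risk["score"] = risk["impact"] * risk["likelihood"]

# List risks from highest to lowest priority.
for risk in sorted(risks, key=lambda r: r["score"], reverse=True):
    print(f'{risk["name"]}: {risk["score"]}')
```

Real risk assessments weigh many more factors, but even this crude scoring makes the "which risks first" question explicit rather than intuitive.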
Slide 2: Who Is at Risk
Data processing can pose risks to:
- Individuals
- A group of individuals
- Your institution
- Society at large
Slide 3: Risks to Individuals
The processing, sharing and publishing of personal data can pose a risk to individuals, as third parties can re-identify a person and subsequently take action. Known cases of data misuse in the public domain include profiling, identity theft, bullying and extortion. These can lead to emotional or physical harm, reputational damage or financial damage.
After the release of anonymized taxi data in New York, data experts were able to identify the salaries of taxi drivers, the places of residence of individual passengers and the movement patterns of celebrities. This information could be used by a potential attacker to target individuals who appear in the dataset, for example through blackmail or by preparing a burglary.
Slide 4: Risks to a group
If the dataset that is processed, shared and published holds sensitive personal information, like race, ethnicity, gender, or religion, groups can be identified and subsequently excluded, marginalized or discriminated against.
Example: the Gangs Matrix in London
In London, the police use algorithmic decision-making to profile individuals who could be part of a gang. The data processing is heavily skewed towards ethnic minorities, and local civil society is calling for the discontinuation of the program on discriminatory grounds.
Slide 5: Risks to institutions
When institutions do not manage data responsibly, they run the risk of failing to comply with national data protection legislation, losing the trust of their citizens, suffering reputational damage, or even being exposed to financial damage.
Example: User data from Indian Railways
In May 2016, it was reported that the ticket-booking website of Indian Railways had been hacked and personal data of around 10 million customers was feared to have been stolen. It was reported that IRCTC officials also feared that personal details including phone numbers, date of birth and other such details of its customers had been sold on a CD for Rs 15,000.
Slide 6: Risks to society
Managing data insecurely or unethically can have many unintended consequences. The targeting of specific groups can lead to the manipulation of votes in elections and to the polarization of public debate through the personalization of online content on social media.
Example: Cambridge Analytica
The unethical data collection, profiling and targeting by Cambridge Analytica is not only tied to influencing referendums such as the Brexit vote and the 2016 presidential election in the USA. The scandal has also led to citizens' distrust of governments and of companies such as Facebook.
Slide 7: Intangibility of Risks
Even though more stories about the consequences of hacks, leaks and misuse of data are becoming public, it is often still a challenge to understand the risks associated with the processing of data. Risks remain intangible because:
- There is usually an extended time period between the collection of data and the potential negative consequences.
- Non-data-scientists find it difficult to move from a single data point to a dataset. One location data point might not seem harmful, but a dataset full of location data allows for the identification of patterns and behavior.
- It is unclear how specific groups can be identified and targeted in a dataset. Which groups are at risk of being identified depends on the local context, but think of journalists, women, ethnic minorities, people with a physical or mental disability, etc.
- Data that seems innocent today might become more sensitive over time. A regime change, data falling into the wrong hands, or data being linked with other datasets could increase the risk to certain individuals and groups.
- Individuals in open anonymized datasets can be re-identified by combining the open dataset with another dataset.
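The last point, re-identification by combining datasets, can be sketched in a few lines. The "anonymized" trip table below has no names, yet joining it with a public register on a quasi-identifier (here, postcode plus birth year) reveals who is who. All records and field names are invented for illustration:

```python
# Minimal linkage-attack sketch: re-identify people in an "anonymized"
# dataset by joining it with a public dataset on shared attributes.
# All records below are invented examples.
anonymized_trips = [
    {"postcode": "1011", "birth_year": 1980, "destination": "clinic"},
    {"postcode": "2044", "birth_year": 1975, "destination": "courthouse"},
]

public_register = [
    {"name": "A. Jansen",   "postcode": "1011", "birth_year": 1980},
    {"name": "B. de Vries", "postcode": "2044", "birth_year": 1975},
]

# Match trips to people on (postcode, birth_year) - the quasi-identifier.
reidentified = []
for trip in anonymized_trips:
    for person in public_register:
        if (person["postcode"], person["birth_year"]) == (trip["postcode"], trip["birth_year"]):
            reidentified.append((person["name"], trip["destination"]))

print(reidentified)
```

This is why removing names alone does not anonymize a dataset: any combination of attributes that is unique to a person, and also appears in some other available dataset, can serve as a key for re-identification.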