In this essay, I will systematically quantify the extent to which Big Data is a danger for democracy by establishing a comprehensive metric that lays out the cornerstones of modern democracy, and then elaborate on the undue damage done by Big Data in each of these spheres and the overall impact of its many applications. A prerequisite for assessing the extent to which Big Data is dangerous for democracy is a standardized metric that lays down the principles of democracy in the modern era, particularly those influenced by Big Data, together with a characteristic-based definition of Big Data.
A specific definition of Big Data goes against the vast, ever-expanding nature of Big Data; however, “it does depend on the size and heterogeneity of the data, and the scale and scope of analytic operations made possible by that size” (American Association for the Advancement of Science, 2014). The primary characteristics of Big Data can be encapsulated by the 3 ‘V’s, namely Volume, Velocity and Variety (Laney, 2001). Big Data represents the information assets characterized by such a high volume, velocity and variety as to require specific technology and analytical methods for their transformation into value (De Mauro, et al., 2016). At its core, Big Data refers to data sets characterized by huge amounts (volume) of continually updated data (velocity) in various formats, which may include images, videos, and textual or numeric data (variety), that can be analysed with the help of complex algorithms. In recent times, seemingly to add to the catchy 3 Vs, terms like veracity and variability have been thrown into the mix, referring to the consequences of dealing with data on such a large scale: incomplete, imperfect or error-prone datasets with unstandardized data. While these would greatly play into how Big Data erodes the foundation of democracy, they are not defining attributes of Big Data but rather relate more to data uses (Grimes, 2013). The streams of data obtained from the users of social media websites and mobile applications like Facebook, or the daily crime statistics of a metropolitan city, can easily be classified as Big Data under this definition, seeing that there is a constant influx of vast amounts of data in various forms, whether that is the zip code of the place where a mugging took place or a geotagged photo that a person posted, in addition to the readily available zettabytes of public or state records. The value of all this Big Data, however, lies in the outputs produced by the mathematical models and algorithms employed to crunch through all the data supplied.
Relative to Big Data, democracy is something far more deep-rooted in history, originating in Athens in the late sixth century B.C., and as the state has evolved since then, so has democracy. Therefore, to have a definition that encompasses the essence of democracy, the intrinsic traits that constitute the various forms of democracy must be established. By subsequently considering the impact of Big Data on each of these attributes, we can approach a conclusion that quantifies the danger for democracy that Big Data invites and its potential impact in the future.
The starting point of any democracy is ‘free and fair elections’. These four words imply that the authority of the government can only be derived from the will of the people, as determined by credible elections held at periodic intervals on the basis of universal, equal and secret suffrage. Big Data poisons this democratic process in two ways. Firstly, it shifts politics from the public sphere, where it ought to be, to a private affair, giving candidates the opportunity to engage in duplicity on a far broader scale. Secondly, it narrows an election down from the entire voting population of the country to merely a section of “persuadable voters”.
Big Data collected from social media applications serves as fuel for microtargeted advertisements. Elections should be a crucial part of public discourse. Since the days of Aaron Burr, when open campaigning was introduced, the formula for marketing a candidate to the masses has remained consistent: rallies, party manifestos, town halls and billboards and, later, televised debates and television advertisements. To an extent these methods were employed as an argumentum ad captandum; at the same time, however, they made the general policy and program of the candidate subject to public discussion and scrutiny by the media, thus involving citizens in the decision-making process.
However, with the advent of social media applications and the influx of data accompanying it, political strategy has shifted from trying to appeal to everyone to targeting the group of undecided, persuadable voters, thereby using the campaign budget more efficiently for a greater payoff. Within this group of voters exists a sea of political beliefs: some may be environmentalists, others proponents of gun ownership or of some other specific cause. Once the core beliefs of each individual persuadable voter have been algorithmically deduced, microtargeted advertisements can be delivered to them, presenting the candidate in the best possible light to win that individual’s vote. In a single sentence, it is about finding the one thing a person cares about and then pushing a message that validates that belief. The 2012 Obama campaign was notable in this regard, as it marked the entry of data science into politics, largely following the steps outlined above, which would later be honed by the 2016 Trump campaign (Bartlett, 2018).
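The targeting logic described above can be made concrete with a minimal sketch. It is not the method of any real campaign: the voters, the inferred interest scores, the messages and the names Voter, MESSAGES and pick_message are all hypothetical, chosen only to show how each persuadable voter ends up seeing a different pitch while everyone else sees nothing.

```python
# Hypothetical illustration only: voters, issues and messages are invented.
from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    persuadable: bool
    interest_scores: dict  # issue -> inferred strength of interest, 0..1


# One tailored message per issue (hypothetical wording).
MESSAGES = {
    "environment": "Candidate X will protect our rivers and forests.",
    "gun_ownership": "Candidate X will defend your right to own a gun.",
}


def pick_message(voter):
    """Return the message matching the voter's strongest inferred issue,
    or None if the campaign would not spend money on this voter."""
    if not voter.persuadable or not voter.interest_scores:
        return None
    top_issue = max(voter.interest_scores, key=voter.interest_scores.get)
    return MESSAGES.get(top_issue)


voters = [
    Voter("A", True, {"environment": 0.9, "gun_ownership": 0.1}),
    Voter("B", True, {"environment": 0.2, "gun_ownership": 0.8}),
    Voter("C", False, {"environment": 0.7, "gun_ownership": 0.3}),
]

# Each persuadable voter gets a different pitch; voter C gets none at all.
targeted = {v.name: pick_message(v) for v in voters}
```

In this toy run, voters A and B receive different messages from the same candidate, and voter C, who is not considered persuadable, is left out of the campaign entirely.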
The consequences of this type of hyper-personalized marketing spill over into the next principle of democracy, namely accountability. If this type of data-driven political demagoguery is left unchecked, it is plausible that every individual will receive a different campaign pledge and advertisement from the same candidate. How does one go about holding a politician responsible, and regulating such a system, when no two people receive the same advertisements? Unlike billboards and televised debates, which are the same whether viewed by an environmentalist or a gun rights supporter, the specifically curated content sent to a particular individual is not fact-checked or discussed by the media, and consequently politics gradually drains out of public conversation. Politicians are no longer accountable to the entire public, but only to the segment of the population that supports them.
Though the most prominent characteristic of a democracy is the right to vote, democracy extends beyond this to entitle each citizen to participate in the civil and political life of society without facing any form of discrimination based on personal characteristics. Only following the Civil Rights Movement in the US was legislation introduced to prevent discrimination against citizens based on race, ethnicity and gender. Granted, its effectiveness was limited at first, as the law translated into reality only inchmeal; over time, however, more comprehensive safeguards spread across a larger range of public settings such as employment, education and access to public services. Big Data inadvertently opens a gateway to a path that threatens to undo all this, allowing the architects of models that rely heavily on Big Data to camouflage their biases in “impartial” models that function like black boxes, and to sneak around established civil rights protections by using unfairly weighted proxies.
Every mathematical model attempts to produce a meaningful output by determining the relationship between the factors that culminate in that result. However, when trying to quantify an intangible concept, such as how responsible a driver is or how “good” a potential employee could be, there are no straightforward numbers that point directly at the answer. By their very nature, models feed on data that can be measured and counted. Therefore, proxies for incorporeal notions like responsibility or goodness are used. The room for error, and by extension a form of discrimination, enters at this stage. In the same way that a model trained on historical data from times when institutionalized racism was unquestionably prevalent is racist, a model in which the relevance of each factor is decided entirely by the modeler will inherit the biases of the modeler. Multiple models in different sectors, like insurance, education, human resources and banking, indirectly function together to form a pernicious feedback loop that punishes the poor and rewards the rich, leading to a greater divide.
Take the example of one of the world’s largest democracies, the United States of America. Partially as a result of the institutionalized racism that haunts America’s history, the socio-economic condition of African-Americans today is dismal. White families today have nearly 10 times the net worth of black families and more than eight times that of Hispanic families (Dettling, et al., 2017). In this circulus vitiosus, unemployment, credit scores, insurance, education, poverty and crime are all interconnected. For instance, a lack of decent schools could mean an inability to get a job, possibly leading to debt and a lower credit score, increasing the poverty rate of that neighbourhood and creating an impetus for crime, which in turn would reduce the quality of education for future generations. In this way, these models strengthen the correlation between race and neighbourhoods with worse socio-economic conditions. Thus, a seemingly unbiased factor in determining insurance, like a zip code, becomes a stand-in for race. The use of such proxies becomes a method of sidestepping established civil rights legislation. Further, the results of such an inherently flawed model cannot be questioned or regulated, since the majority of those affected are unaware of its internal workings. The outcome is that racial discrimination persists in subtler, unregulatable ways that are technically within the parameters of the law.
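To illustrate how a zip code can act as a stand-in for race, consider a minimal sketch. The data are synthetic and the pricing rule is invented; the applicants, the two zip codes, the surcharge table ZIP_SURCHARGE and the functions premium and average_premium are assumptions made purely for illustration and do not describe any real insurer. The pricing function never sees race, yet the average price still differs by group because the groups are unevenly distributed across zip codes.

```python
# Synthetic, hypothetical data: not drawn from any real insurer or census.
applicants = [
    # (zip_code, group); the group is never passed to the pricing function
    ("11111", "black"), ("11111", "black"), ("11111", "black"), ("11111", "white"),
    ("22222", "white"), ("22222", "white"), ("22222", "white"), ("22222", "black"),
]

# Hypothetical surcharge table keyed only on zip code.
ZIP_SURCHARGE = {"11111": 300, "22222": 0}
BASE_PREMIUM = 1000


def premium(zip_code):
    """Price a policy from the zip code alone; race is not an input."""
    return BASE_PREMIUM + ZIP_SURCHARGE[zip_code]


def average_premium(group):
    """Average price paid by a group, computed after the fact for auditing."""
    prices = [premium(z) for z, g in applicants if g == group]
    return sum(prices) / len(prices)


avg_black = average_premium("black")
avg_white = average_premium("white")
```

With these invented numbers the average premium comes to 1225 for black applicants and 1075 for white applicants, even though the model is formally "blind" to race. The gap is produced entirely by where people live, which is the sense in which the zip code works as a proxy.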
With the existence of rights, a system to enforce them and to ensure that they are not infringed upon by the government or any other individual is necessary, and thus the rule of law is the fourth principle of democracy. 2000 words cannot capture the whole essence of what this phrase means, much less a paragraph; hence I will focus on the characteristics of the rule of law pertinent to Big Data. One of the salient characteristics of the rule of law is that everyone is equal in the eyes of the law; Big Data, however, has perverted this aspect.
In twenty-four state prisons in the US (O’Neil, 2018), a Level of Service Inventory-Revised (LSI-R) model is implemented, as part of which prisoners have to fill in a questionnaire that includes questions about the circumstances of their birth and upbringing and the crime rate in their neighbourhood, which contribute towards calculating the risk of recidivism. These risk scores have been used by judges in some states, like Idaho and Colorado, to guide their sentencing (National Centre for State Courts, 2013). The scores are unfairly calculated on the basis of spurious correlations that have no bearing on an individual’s case. Sentencing a criminal for longer simply because of a high crime rate in their neighbourhood is the antithesis of fairness. Similar to how the earlier models sidestepped civil rights using proxies, the LSI-R model unquestionably brings racial bias into the sphere of justice, contradicting the rule of law. Justice, however, even in the age of technology, has largely remained in the hands of humans, given its nuanced nature and seeing that fairness cannot be quantified by a model.
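The mechanism at issue can be sketched with a toy additive risk score. This is emphatically not the LSI-R itself: the three questions, the WEIGHTS table and the risk_score function are hypothetical and were chosen only to show how answers about a person's circumstances, rather than their conduct, can separate two otherwise identical defendants.

```python
# Hypothetical weights and questions; NOT the real LSI-R items or scoring.
WEIGHTS = {
    "prior_convictions": 2.0,         # relates to the individual's own conduct
    "neighbourhood_crime_rate": 3.0,  # circumstance, scaled 0..1
    "family_member_convicted": 1.5,   # circumstance, 0 or 1
}


def risk_score(answers):
    """Weighted sum of questionnaire answers (a toy additive score)."""
    return sum(WEIGHTS[key] * answers[key] for key in WEIGHTS)


# Two defendants with identical conduct but different circumstances.
same_conduct = {"prior_convictions": 1}
defendant_a = {**same_conduct, "neighbourhood_crime_rate": 0.9,
               "family_member_convicted": 1}
defendant_b = {**same_conduct, "neighbourhood_crime_rate": 0.1,
               "family_member_convicted": 0}

score_a = risk_score(defendant_a)
score_b = risk_score(defendant_b)

# The whole gap comes from circumstances, since the conduct term is equal.
gap_from_circumstances = score_a - score_b
```

Under these assumed weights, defendant A receives a markedly higher score than defendant B although both have the same record, so any sentence guided by the score would differ for reasons neither of them controls.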
Through this essay, I have attempted to outline the manner in which Big Data has chipped away at the fundamental principles of democracy, namely free and fair elections, accountability, civil rights and the rule of law. Big Data undermines the votes of all citizens by serving as ammunition for microtargeted advertising that limits an election to a mere segment of the population (i.e. the “persuadable voters”); it shifts politics from a subject of public discourse to a private affair based on a game of numbers and data analytics; and, through the use of proxies, it empowers modelers to sidestep civil rights laws. The danger Big Data poses to democracy lies in the fact that it creates a pernicious feedback loop that increases inequality due to inherent prejudices, and taints the democratic process in such a discreet and unregulatable manner that the quality of democracy is gradually eroded.
References:
American Association for the Advancement of Science in conjunction with the FBI and UNICRI, 2014. National and Transnational Security Implications of Big Data in the Life Sciences. [Online]
Available at: http://www.aaas.org/sites/default/files/AAAS-FBI-UNICRI_Big_Data_Report_111014.pdf [Accessed July 27, 2019].
Laney, Doug, 2001. 3D Data Management: Controlling Data Volume, Velocity and Variety. [Online]
Available at: https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf [Accessed July 27, 2019].
De Mauro, Andrea; Greco, Marco; Grimaldi, Michele, 2016. “A Formal definition of Big Data based on its essential Features.” Library Review. 65 (3): 122–135.
Grimes, Seth, 2013. “Big Data: Avoid ‘Wanna V’ Confusion.” InformationWeek. [Online]
Available at: www.informationweek.com/big-data/big-data-analytics/big-data-avoid-wanna-v-confusion/d/d-id/1111077 [Accessed July 27, 2019].
Bartlett, Jamie, 2018. The People vs Tech: How the Internet is Killing Democracy (and How We Save It). Ebury Press.
Dettling, Lisa J, et al., 2017. “Recent Trends in Wealth-Holding by Race and Ethnicity: Evidence from the Survey of Consumer Finances.” Federal Reserve. [Online]
Available at: www.federalreserve.gov/econres/notes/feds-notes/recent-trends-in-wealth-holding-by-race-and-ethnicity-evidence-from-the-survey-of-consumer-finances-20170927.htm [Accessed July 30, 2019].
O’Neil, Cathy, 2018. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Penguin Books, pp. 222-223.
Centre for Sentencing Initiatives, Research Division, National Centre for State Courts, 2013. “Use of Risk and Needs Assessment Information at Sentencing: 7th Judicial District, Idaho.” [Online]
Available at: https://www.ncsc.org/~/media/Microsites/Files/CSI/RNA%20Brief%20-%207th%20Judicial%20District%20ID%20csi.ashx [Accessed July 30, 2019].