How Government Agencies Flex Their Data Science Muscle

Written by salil | Published 2019/12/07
Tech Story Tags: data-science | data-visualization | data | government | hackernoon-top-story | data-science-in-governance | discover-supercomputer-data | big-data-and-governance

TLDR Data science is being employed by the governments of every major country to inform policy, provide public services and, in some cases, surveil ordinary people. Data science underpins many of the public sector’s most important functions, whether we citizens are aware of it or not. Even the smallest, least obvious aspects of city management can be optimized by effective data collection, open sharing practices, and data analysis. The practice of data science is equally useful at the federal level, where agencies already had their own big data initiatives underway.

From NASA to the NSA, data science is being employed by the governments of every major country to inform policy, provide public services and, in some cases, surveil ordinary people. In the United States in particular, it underpins many of the public sector’s most important functions, whether we citizens are aware of it or not.
Take your own local government. Even the smallest, least obvious aspects of city management - keeping fire hydrants ready, monitoring the health of trees, maintaining health standards among restaurants - can be optimized by effective data collection, open sharing practices, and data analysis.
As an example, let's look at how housing issues are addressed in New York City. The city has around 200 building inspectors, charged with investigating complaints of illegally converted dwellings. The problem, of course, is that 200 inspectors hardly suffice for a population of over eight million people. How can the city deploy so few inspectors to the greatest effect?
By combining data from the Department of City Planning’s Information Technology Division with the databases maintained by the 311 service and the Fire Department, the city was able to build a model of which signals best indicate that a home might be illegally lived-in. Inspectors were then deployed to these locations with much more evidence to go on than before. On the tenants’ rights end of things, the NYC Tenant Harassment Prevention Task Force uses 311 complaints to distribute its inspectors more efficiently and to predict which buildings are most likely to lose rent-stabilized units.
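The city's actual model isn't published here, but a minimal sketch of the idea - risk-scoring buildings from complaint and incident features with a simple classifier, then sending inspectors to the highest-scoring addresses - might look like this. The feature names, labels, and data below are entirely hypothetical, stand-ins for the real 311 and Fire Department records.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical features: one row per building, built from 311 complaint
# records, Fire Department incident data, and property characteristics.
rng = np.random.default_rng(0)
n = 1_000
buildings = pd.DataFrame({
    "complaints_311_last_year": rng.poisson(2, n),
    "fdny_incidents_last_year": rng.poisson(0.3, n),
    "units_on_record": rng.integers(1, 20, n),
    "building_age": rng.integers(10, 120, n),
})

# Hypothetical labels: 1 if a past inspection confirmed an illegal conversion.
labels = (
    buildings["complaints_311_last_year"]
    + 3 * buildings["fdny_incidents_last_year"]
    + rng.normal(0, 1, n)
    > 4
).astype(int)

# Fit a simple classifier on past inspection outcomes.
model = LogisticRegression(max_iter=1000)
model.fit(buildings, labels)

# Score every building and send the limited pool of inspectors
# to the highest-risk addresses first.
buildings["risk_score"] = model.predict_proba(buildings)[:, 1]
top_priority = buildings.sort_values("risk_score", ascending=False).head(200)
print(top_priority.head())
```

The point is not the particular classifier but the workflow: merge datasets that already exist across agencies, learn which signals correlate with confirmed violations, and use the scores to ration a scarce resource (inspectors) instead of working complaints in the order they arrive.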
The practice of data science is equally useful at the federal level, where it began to take on a significantly larger role in the 2010s. On March 29, 2012, an official White House press release titled “Big Data is a Big Deal” announced the “Big Data Research and Development Initiative.” BDRDI was a $200 million investment, spread across six federal agencies, designed to improve “our ability to extract knowledge and insights from large and complex collections of digital data [. . .] to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning.”
BDRDI functioned as a statement of intent: that data science was valued, and would play a key role in the future of public sector research and development. It wasn’t much more than a statement of intent, though. Just about every major federal government agency already had its own big data initiatives underway long before 2012.
The Department of Energy, for example, created Mathematics for Analysis of Petascale Data (MAPD) to improve upon existing methods for analyzing massive datasets. By combining statistical analysis with machine learning and data-reduction techniques, MAPD produced methods with applications as far-reaching as cosmological research and electric grid maintenance.
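MAPD's specific methods aren't detailed in this article, but one common data-reduction technique for datasets too large to analyze in a single pass is incremental PCA, which builds a compact representation chunk by chunk. A rough sketch with scikit-learn, using synthetic data as a stand-in for real sensor or simulation output:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic stand-in for a dataset far too large to hold in memory:
# many observations, each with hundreds of raw measurements.
rng = np.random.default_rng(42)
n_features = 500
ipca = IncrementalPCA(n_components=10)

# Stream the data in chunks, updating the reduced representation as we go.
for _ in range(20):
    chunk = rng.normal(size=(2000, n_features))
    ipca.partial_fit(chunk)

# New data can now be projected into a 10-dimensional summary space
# for cheaper downstream statistics or machine-learning models.
reduced = ipca.transform(rng.normal(size=(1000, n_features)))
print(reduced.shape)  # (1000, 10)
```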
The Veterans Administration’s “ProWatch” program began developing algorithms to detect, measure, and track soldiers’ health, in order to better understand the health and safety conditions associated with military deployment.
In 2010, NASA formed its Center for Climate Simulation (NCCS). With $5.5 million in hand, it developed a multi-petabyte data archive, a management system for interacting with and accessing that data, and an analysis system for transforming that data into useful models and diagnostics. Most impressive was the “Discover” supercomputer - one of the most powerful machines in the world - containing 150,000 processors and capable of performing up to 160 trillion operations per second. These tools, supported by one of the largest contingents of earth scientists in the world, positioned NASA as an international research leader in the domains of weather and climate change.
Initiatives like these represent only a small fraction of the many uses of data science throughout the U.S. federal government. As a result of the widespread success of such programs, the White House announced an updated initiative in 2016, meant to expand on the goals of the 2012 BDRDI. These days, data science initiatives hardly require special announcement: they are implied, baked into how the government operates at a fundamental level. With easy access to quality public-sector data, private-sector organizations such as Visual Capitalist, Apollo Yard, and Our World in Data are turning this data into insights with the help of data science and data visualization, and sharing those insights with the general public.
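As an illustration of that workflow, here is a minimal sketch of turning a downloaded public dataset into a chart with pandas and matplotlib. The file name and column names are hypothetical; the point is the pattern - filter, reshape, plot - not any specific source.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical: a country-level CSV downloaded from a public data portal,
# with "country", "year", and "value" columns.
df = pd.read_csv("public_dataset.csv")

# Filter to a few countries of interest and reshape into one column per country.
subset = df[df["country"].isin(["United States", "Germany", "India"])]
pivoted = subset.pivot_table(index="year", columns="country", values="value")

# Plot the per-country trend and save it for sharing.
pivoted.plot(figsize=(8, 4), title="Trend by country (public-sector data)")
plt.xlabel("Year")
plt.ylabel("Value")
plt.tight_layout()
plt.savefig("trend.png")
```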
Unfortunately, for all this good, the power of data science has contributed equally to the federal government’s less-than-virtuous goals.
The Department of Defense, for example, has used big data toward both useful and potentially harmful ends. As an example of the former: the “CINDER” program aimed to boost national defense by fostering new detection techniques capable of identifying foreign espionage on government and military computer networks. On the other hand, “Mind’s Eye” was the name given to its effort to develop visual intelligence in machines - the kind of visual intelligence relevant to mass surveillance and advanced weaponry. A White House document outlines the stated goal of this research:
Whereas traditional study of machine vision has made progress in recognizing a wide range of objects and their properties—the nouns in the description of a scene—Mind's Eye seeks to add the perceptual and cognitive underpinnings needed for recognizing and reasoning about the verbs in those scenes. Together, these technologies could enable a more complete visual narrative.
What could cameras armed with the ability to construct a “narrative” do? Well, they might make AI-driven weaponry - drones, for example - less prone to error, and even more deadly than they otherwise were. They could also empower surveillance agencies to more effectively spy on citizens at scale.
As the White House acknowledged back in 2012, big data is a big deal. It is powerful and effective across nearly every sector and every level of government. Now that we all understand this, the question becomes: how much of this data, and how many of these techniques, will be put to less-than-moral ends?



Written by salil | Building ApolloYard.com to be the leading destination to find and showcase your data science work.
Published by HackerNoon on 2019/12/07