The public statements of the Berkeley Institute for Data Science (BIDS), as part of the larger Moore and Sloan Foundations' "bold new partnership...to harness the potential of data science and big data," articulate a vision for academic data science. According to that vision, BIDS is itself an experiment in designing a data science environment at a university. In this series of blog posts, I will outline reflexive data science, a research program that:
- can contribute to several of the stated goals of BIDS,
- has the potential to produce technology reusable by universities and transferable back to industry, and
- can address broader theoretical issues in social science.
In this post, I will present a broad overview of what I mean by "reflexive data science," specifically as a response to the interests of BIDS. In future posts, I will defend some of the assumptions I make here, elaborate on the methodology of reflexive data science, and argue for its broader theoretical implications.
I am writing these posts with the emerging UC Berkeley data science community in mind, but deliberately making them available for engagement with a broader audience. My hope is that this research will interest intellectually and institutionally diverse collaborators. I am also hoping to contribute to the conversation about what social data science—data science that analyzes social phenomena—should look like.
A constellation of goals
The inspiration for reflexive data science comes from the ambitions of BIDS, as articulated in the presentations by its co-PIs at the November 12th announcement sponsored by the White House Office of Science and Technology Policy (OSTP). Among its stated objectives are:
- Development of alternative metrics (alt-metrics) applicable to data scientists in the academic setting. This is an acknowledgment that the current incentive structure of the academic system, which puts a premium on first-author papers, does not reward the kinds of activities (e.g. software development) required of data scientists.
- Open source. BIDS has committed to open licensing and the open source model of software development in the interest of “rapid adoption both within and beyond the Moore/Sloan participating institutions,” as well as providing data for objective alternative metrics of impact.
- Reproducibility and open science. An emphasis on reproducible and accessible research reinforces the need for open source development and the availability of data for alternative metrics.
- Ethnography and evaluation of BIDS and its sister organizations to rigorously examine what it takes to build data science environments that achieve the partnership’s goals of meaningful interdisciplinary collaboration, sustainable career paths, and a robust ecosystem of analytical tools and research practices.
The plan for BIDS assigns these goals to separate working groups. Clearly, though, they are intertwined. Open source development and open science will make the working environment of BIDS scientists "data-rich" in the sense that digital traces of their daily communications and work will be readily available. This data, in addition to data about software downloads and code adoption, is available to those working on alt-metrics. In particular, it will be possible to apply the methods of data science to the data scientists themselves, both to measure them and to measure their response to being measured. How these alt-metrics, once introduced, affect the success of BIDS and academic data science more generally is an empirical question that can be addressed through mixed-methods (including ethnographic) research.
From metrics to incentives
I owe this insight to Thomas Maillart: an academic alt-metric is not just a tool for measurement; it becomes part of the incentive structure of the academic environment. The question of appropriate alt-metrics for data scientists is therefore a question suited to social scientific methods concerned with the systemic effects of incentives, such as economics.
Fernando Perez, one of the BIDS co-PIs, has argued that the metric used to evaluate most academic scientists—the publication of first-author papers in high-impact journals—has resulted in an environment that is toxic to collaboration. This undermines the reproducibility of science. The BIDS proposal expresses the intention to design and implement a better system in a moment when “data science” appears to be disrupting science as usual.
The wealth of data about practicing data scientists afforded by open source practices, combined with their proximity to computational and statistical methodologists, is a tremendous stroke of good fortune. In particular, it enables two possibilities that, while not explicitly mentioned in the BIDS presentation, are implied by its vision.
- The alt-metrics that incentivize scientific activity can leverage the sophistication of the methods of data science itself. In designing these metrics, we have at our disposal the tools of machine learning, network analysis, and natural language processing. And we can direct our attention not merely to the products of science (papers and completed software) but to its process. ("Metric" may prove to be a misnomer if criteria for evaluation are learned adaptively, but I will continue to use the term here.)
- We can scientifically test predictions about the effects of a chosen alt-metric on a community of researchers using the data they generate in real time. We can compare data before and after the intervention of introducing an alt-metric and associated incentives.
What would a research program that takes advantage of these possibilities look like? It could begin by identifying values. What do stakeholders in BIDS (the funders, the university, the scientists themselves) think a "data science environment that actually works" should be accomplishing? Then researchers could operationalize those goals in relation to the available data and use domain knowledge in the social sciences (such as mechanism design or social psychology) to design an appropriate alt-metric as an intervention. With the experiment designed and predictions made, the researchers can implement the system that computes the metric and present it as feedback to the data science environment, with associated real incentives as far as is institutionally possible. They conclude the experiment with a mixed-methods evaluation of the consequences of their intervention.
This is an ambitious research program for developing alt-metrics. One of its virtues is that it can be conducted iteratively. Rather than institutionally enshrining a static set of metrics up front, the evaluation of the data science environment can be empirical, adaptive, and open for agile redesign.
I think a good name for this proposed research program is reflexive data science. Here is a toy example of what it could look like in practice.
Example: Gratitude flows
The Harvard Business Review has been writing about the benefits of cultures of gratitude in the workplace. Suppose a reflexive data scientist takes this seriously and believes that it is valuable for people in the data science environment to express gratitude to each other. Furthermore, they have determined that being a recipient of gratitude is an indication that one is contributing positively to the data science environment as a whole. They propose an alt-metric: a linear combination of how often one is thanked and how often one thanks others.
The reflexive data scientist begins to collect data from the IPython mailing list, issue tracker, and pull request logs: every occurrence of the words "thank you" and equivalent expressions. For several months, the researcher collects the data in secret and plots a visualization of gratitude flow within the community. At last, as an intervention, the scientist reveals the visualization and facilitates a discussion of it at the IPython team meeting.
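To make the toy example a little more concrete, here is a minimal sketch of how such a metric might be computed. Everything in it is hypothetical: the message records, the list of gratitude phrases, and the weights are placeholders, and a real pipeline would parse actual mailing-list archives and GitHub issue and pull request logs.

```python
import re
from collections import Counter

# Hypothetical message records as (sender, recipient, text) tuples. A real
# pipeline would extract these from mailing-list and issue-tracker archives.
messages = [
    ("alice", "bob", "Thank you for the quick review!"),
    ("bob", "carol", "thanks, that fixed the build"),
    ("carol", "alice", "Merged. Much appreciated!"),
]

# Phrases treated as expressions of gratitude (an assumed, incomplete list).
GRATITUDE = re.compile(r"\b(thank you|thanks|much appreciated)\b", re.IGNORECASE)

thanks_given = Counter()     # how often each person thanks others
thanks_received = Counter()  # how often each person is thanked

for sender, recipient, text in messages:
    if GRATITUDE.search(text):
        thanks_given[sender] += 1
        thanks_received[recipient] += 1

# The proposed alt-metric: a linear combination of thanks received and thanks
# given. The weights are arbitrary placeholders, to be tuned in later iterations.
ALPHA, BETA = 1.0, 0.5

def gratitude_score(person):
    return ALPHA * thanks_received[person] + BETA * thanks_given[person]

for person in sorted(set(thanks_given) | set(thanks_received)):
    print(person, gratitude_score(person))
```

The sender-recipient pairs also give the raw material for the gratitude-flow visualization mentioned above: each "thanks" is an edge in a directed graph of who thanks whom.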
The reflexive data scientist predicts that informal social pressure will drive people towards expressing gratitude in order to avoid the social perception of being an ingrate. But the empirical consequences of their intervention are uncertain. Perhaps the intervention results in an explosion of expressed gratitude, an apparent success under the scientist's operationalization, which ethnographic methods show to be facetious or perfunctory. Maybe instead gratitude, once quantified, becomes a currency that is hoarded and traded strategically. Whatever the result, this research can inform the next iteration, which tunes the metric's parameters or introduces new inputs.
Thinking ahead
The above example is meant to be a simple illustration of how reflexive data science could work. Of course, in practice it should involve a more rigorous process of research design. But I hope the example shows how data science is possible in an open source and open science environment, and how it is applicable to the problem of choosing alt-metrics and evaluating their impact in data science environments like BIDS.
In future posts, I intend to defend the assumptions I’ve made here, elaborate on models and methodology in reflexive data science, and anticipate some of the theoretical implications of this kind of work in light of other academic literature. As this is an ongoing process of writing and thinking, I welcome conversation with others interested in related topics in comments or other blog posts. I also invite anyone interested in collaborating on this line of research to contact me, as I think there is a lot of exciting opportunity here.
Comments
The Devil's in the Details
Great post, Seb!
I think the chances of 'data science' positively influencing the practice of science increase as it becomes self-aware. So, I like this call for 'reflexive data science.' I suggest a little caution, though, when developing alt-metrics/sociometers. Identifying which concepts matter is hard. (Maybe gratitude matters for success outcomes less than you think. Or maybe it matters under some conditions and not others.) Operationalizing those concepts effectively into a measure is even harder. (Maybe face-to-face gratitude matters, but email gratitude does not. Or maybe it's the unexpected "thanks" that matters. But how do you get at that?) And preventing people from performing to your measures is hard, too, as you identify. I'm reminded of the Hawthorne effect, as well -- "a form of reactivity whereby subjects improve or modify an aspect of their behavior, which is being experimentally measured, in response to the fact that they know that they are being studied, not in response to any particular experimental manipulation."
Identifying theoretically important and distinct concepts and effectively operationalizing them is a big challenge in the social sciences -- bigger, I think, than in the harder sciences, where physical objects behave rather predictably.
These caveats entered, more power to you and your collaborators! Just remember that the devil is in the details.
"And if you win you get this shiny fiddle made of gold"
Thanks for this incisive response, Nick. You definitely hit on the major challenges of this approach. I'm going to tuck these thoughts away and try to incorporate them into future work on this.
By way of thinking out loud, I've got a few responses.
I agree that coming up with the right concepts and operationalizing them is very hard. That's why I didn't write about it in this blog post--I haven't done any of that work yet! But I think that these questions are precisely what makes this line of inquiry an opportunity for novel and academically interesting research as opposed to merely a management or business analytics task. It's going to take real scientific insight to explore this space of ideas. That's the part I personally am looking forward to most of all.
Great point about the Hawthorne effect. I guess reflexive data scientists should read up on that literature and try to build on it. I see it as a source of inspiration rather than a reason for caution. Testing whether scientists perform to the measures, or whether they work better knowing that they are being studied, is itself one of the research questions. And if scientists wind up working together better as a result, isn't that better for the academic system? Sounds like a win-win to me.
Your point about the unexpected "thanks" is particularly fascinating. My hunch is that operationalizing expectation and surprise is critical to understanding the scientific process. Thankfully, mathematical information theory, for example, provides a toolkit for formalizing ideas of regularity and surprise. I think that's where it gets really interesting.
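By way of thinking out loud again, here's a toy sketch of what I have in mind, with made-up counts: the self-information (surprisal) of a "thanks" from someone, estimated from how often they have thanked people so far. An unexpected "thanks" from a rare thanker carries more bits than a routine one.

```python
import math

# Hypothetical counts: how many of each person's messages so far contained thanks.
thanks_count = {"alice": 40, "bob": 2}
message_count = {"alice": 50, "bob": 50}

def surprisal(person):
    """Self-information, in bits, of observing a 'thanks' from this person,
    using their empirical thanking rate as the probability estimate."""
    p = thanks_count[person] / message_count[person]
    return -math.log2(p)

# A "thanks" from bob (rare) is far more surprising than one from alice.
for person in ("alice", "bob"):
    print(person, round(surprisal(person), 2), "bits")
```

That's only a first pass, of course -- a serious treatment would condition on context (who is thanked, for what, when), but it shows how "surprise" can be made quantitative.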