Our first collaboratory brought us together with a diverse group of genomic scientists working in academia, business, foundations and research institutes to discuss practical and conceptual issues about working with big DNA data. One objective of this collaboratory was to develop, experiment with and trial an interdisciplinary and cross-sectoral method for collaborative discussions across the social and biosciences. A second was to move away from generic claims about the ‘data deluge’ and the challenges of Big Data and instead attend to specific problems, risks and potentials within a practical setting.
On the first objective, we anticipated that it would be challenging for social scientists on the project team to follow and understand the concepts, terminology and knowledge practices of bioscientists – on this point we were not mistaken! From SNPs (single nucleotide polymorphisms) to exomes (the protein-coding portion of the genome rather than the whole genetic sequence), I found it difficult at times to keep up with the terminology and discussion. This comes as no surprise: large epistemic asymmetries and unequal distributions of skills and expertise of course exist and can hinder equal participation. Generating a ‘shared literacy’ is thus no easy task, and it becomes more difficult still when such discussions are extended to various publics.
That said, I was struck by a number of issues raised in several of the presentations that echo those that social scientists encounter when working with various Big Data sources. For instance, in genomic science, dealing with missing data or non-concordance across genetic sequencers is complicated in part because decisions are distributed across different sites, technologies and actors. Yes, the practices of bioscience have long involved distributed relations of roles, expertise and technologies and their accompanying judgments and decisions. But through the discussions it became clear to me that these relations are becoming more dispersed across scientific and commercial locations, people and technologies, with a blurring of roles and responsibilities that are not so easy to disentangle. Equally, perhaps, it is not easy to disentangle the influence of cost considerations, which seem to be significant drivers of decisions about how and what is sequenced and of the trade-offs made between data quality and quantity. One participant used the increasingly common description of this interdependence of decisions and actions as an ecosystem, leading to the federated organisation of data.
If this is correct, then how judgments, evaluations and decisions are accounted for, and with what consequences, becomes crucial, especially if interpretational moments are understood as possibly the most consequential in practices such as genetic sequence analysis. How can trade-offs, errors or uncertainty – however these might be described – be accounted for and communicated while at the same time generating actionable and trustworthy knowledge? At a time of openness and greater transparency, how can trust in data be cultivated?
Evelyn Ruppert, PI, Socialising Big Data