The previous posts in this series on data mining to inform policy have covered our initial technical framework and two of its further developments. In this post, I present some of the project management lessons we learned over the course of the data mining project, which are largely drawn from the two less successful case studies that we carried out. In particular, I’ll be looking at the impact of organizational context on the conduct of data mining projects and, ultimately, on their chances of success.
Briefly, the two case studies that proved most challenging aimed to assess the use and impacts of the 2014–2020 European Structural and Investment Funds, and the use and impacts of R&D tax incentives. Applying the same framework to all six cases and achieving very different levels of success prompted an in-depth reflection not only on the structure of the technical framework itself, but also on what broader contextual factors influenced its application and the ultimate success or failure of a project.
The model below displays a simplified conceptualization of the key dimensions we settled upon. While the leftmost bubble will look familiar to those who have taken Project Management 101 and are used to talking about the trade-offs between budget, timeline, and expectations (which include quality requirements), the other components and the interplay between them are more specific to the type of studies we endeavored to undertake in this project.
I’ll cover the organizational context in this post, whereas the project novelty, expectation/budget/timeline, and data assessment bubbles will be covered in a later post.
The organizational context underpins the rest of the points I’ll make in this post, as project implementation is situated within a specific setting, complete with its own cultural, operational and contractual realities. The challenges and opportunities we discuss throughout this series reflect our own specific organizational reality and those of the clients with whom we work, so the reader is encouraged to critically reflect on his or her own organizational setting rather than taking these points at face value (which is generally ill-advised when reading posts on the internet in the first place, though we’re all serial offenders in this regard!).
Our first hint of the importance of the organizational context arose while we were conducting the literature review at the start of the two-year project. During this phase, we analyzed grey and white literature to pull out relevant examples of how data mining and big data were being used in government and private-sector operations around the world. A recurring theme began to emerge: massive amounts of data are being produced, and this data can be harnessed for innovation, increased profit, or improved delivery of social services (depending on one’s end goal).
However, there was considerable variation in the extent to which novel data sources and analyses had been exploited. Most notably, private-sector firms tend to be able to hire highly skilled technical staff to work on these projects, while the public sector often struggles to recruit and retain the necessary people to carry out this type of work. Public-sector advances can also be constrained by IT infrastructure barriers and bureaucratic processes that do not always lend themselves to keeping pace with a rapidly changing external environment.
Although we were aware that people play a central role in the data mining process, and that this role is defined largely by the organization in which they operate, we still found it challenging to define exactly who the end-users of our initial framework would be and how they would go about using the tool in a practical sense. Was our technical framework supposed to be a tool primarily for policymakers? If so, was it meant to be able to substitute for the engagement of a data scientist? Entirely removing the need for a good data scientist would of course be a hopelessly ambitious objective.
In this vein, one of the key messages that arose from our first expert workshop in Brussels was that the target users needed to be clearly identified and the roles and responsibilities of each actor within the framework needed to be clarified. Following this workshop, we came up with an initial (overly simplistic, in retrospect) set of profiles for team composition: someone responsible for leading the overall project execution (project manager), someone with the background policy knowledge to ensure that the appropriate questions are being asked (policy analyst), and someone with the technical skills to operationalize these questions into a data mining project (data analyst). We kept this question about target users at the forefront of our minds while applying the framework in the second half of the project, eager for this practical experience to help us concretize these points.
We soon realized that defining the profiles of the team alone was insufficient, as we had been focusing solely on our own internal group, thereby omitting a key player: the client. To conceptualize our relationship with the European Commission, we drew upon Saner’s model for orientations of science advice, which situates client–contractor dynamics across two axes: temporal (series of distinct tasks or an iterative process) and spatial (embedded or sequestered). I won’t cover the dimensions in very much detail here, and I encourage you to read Saner’s original paper for more information.
As with many contracts to provide data and analysis to governments, ours landed in the bottom left-hand side of the figure, consisting of a commissioning model characterized by a relatively sequestered relationship. The sequestration had organizational and spatial dimensions, as we were outside both the organizational chart and professional work spaces of our clients. There was also a temporal aspect, whereby we had planned, semi-regular exchanges with the client as well as two workshops with external experts—defined, distinct tasks rather than an ongoing, interactive process.
This sort of relationship is quite typical of data and analysis provision for policy purposes: we were responding to policy questions laid out in the original request for proposals. These policy questions are not expected to evolve considerably during the course of the project, and many of the finer-grained methodological decisions that are needed to operationalize the question for study can be informed by a policy analyst with general experience in R&I policy. However, data mining brings with it a much greater flexibility in operationalizing the question—along with much greater uncertainty.
Conceptualizing the client–contractor relationship using Saner’s model was very helpful in framing our experience and answering the questions “Who will use the framework?” and “How will they use it?” The framework we were building was ultimately intended to facilitate the collaboration between experts in policy and experts in data mining. However, the traditional relationship of sequestered policy and technical staff, interacting through distinct tasks rather than an iterative process, was ill-suited to the iterative nature of data mining.
In this circumstance, formulating the policy question and operationalizing it into a data mining problem requires the direct involvement of the client, whose in-depth knowledge of operational needs within their specific organizational context cannot be replaced by the more general input of an external policy analyst. Accordingly, we recognized the need for an “Executive sponsor” role, fulfilled by the end-user within the client organization, who must be involved interactively with the project team, to guide the higher-level strategic orientation of the study as it evolves.
Because we were still working in a more traditional client–contractor relationship, the rigidity of the question formulation phase (and the absence of iteration with any direct users of the findings we were working towards in our case studies) meant that we struggled to orient the project’s methodological decisions so as to maximize the ultimate relevance of our findings. A data mining project is better driven by an operational need than by a request for specific information to address that need.
Science-Metrix’s final report for this data mining project is available from the Publications Office of the European Union.
Data Mining. Knowledge and technology flows in priority domains within the private sector and between the public and private sectors. (2017). Prepared by Science-Metrix for the European Commission. ISBN 978-92-79-68029-8; DOI 10.2777/089
All views expressed are those of the individual author and are not necessarily those of Science-Metrix or 1science.