This paper has explored the ways in which birth parents and mothers conceptualize trustworthy AI in the context of intrapartum care, filling a gap in the research on patient perspectives. A small number of qualitative studies have been published to date on patient perspectives regarding the introduction of AI tools in various parts of healthcare [55,56,57] but none, to our knowledge, that examines the issue of trust in the specific area of intrapartum care.
Promotion of public good, reliability, fairness, personalized and holistic care, human mediation, and empathy were all deemed necessary to the perceived trustworthiness of AI.
Our findings chime with those of a large five-country survey-based study regarding trust in AI and add to it by providing a more in-depth and nuanced perspective on these issues. They also raise questions about how human values can be translated into the trustworthy development, design, and implementation of AI. The remainder of this section explores some of these possibilities and the ethical challenges that may arise in the value-sensitive design process.
Institutions and public good
Patients and birth parents interviewed for this study expressed the view that public institutions such as universities and the NHS are more trustworthy than private companies because of their commitment to promoting public good, whereas private companies are driven by profit maximization. They attributed moral significance to the motivations and aims of these stakeholders, which they viewed as part of their moral character and perceived trustworthiness. This position is not unique to participants of this research; other studies have also demonstrated the public’s skepticism regarding private companies’ motivations in the context of health. Although there is philosophical disagreement about whether collective actors such as institutions and companies are moral actors and therefore warrant trust (or distrust), this reported attitude towards universities and the NHS chimes with theoretical approaches that understand public trust as the trust warranted towards public institutions that aim at providing some kind of public good or benefit. Motivation to serve a public good can be understood as an indication of an institution’s moral motivation and character, and thus indicate trustworthiness even if it cannot, in itself, guarantee trust.
However, research and development of medical products and devices, including medical AI, can be costly. Even if original research is led by universities using national healthcare system data, bringing these products to patients at scale often requires budgets and expertise found in the private sector. This raises an ethical challenge for public institutions about how to involve and collaborate with private companies for the development and deployment of medical technologies, including AI, whilst preserving their trustworthiness and promoting public good.
It is important to bear in mind that public trust operates on multiple levels. As participants revealed, although they perceive public research institutions such as universities to be more trustworthy, they also trust the NHS to introduce only those technologies that are effective on the ground and beneficial to patients. This is because they rely not only on its processes for checking and validating the efficiency and effectiveness of new technologies, but also on its solidaristic character for making decisions that benefit all patients.
Collaborations between public and private institutions to introduce AI in healthcare could be perceived as trustworthy not because the private company would develop an interest in public good, but because a trustworthy institution, such as an NHS Trust or a regulatory body that checks and approves the introduction of new technologies (e.g., MHRA), would be overseeing the technology and its implementation on the ground. A number of studies have tried to articulate criteria that would make such partnerships trustworthy. Horn and Kerasidou (2020) maintain that norms such as commitment to public good should be incorporated into these agreements. They suggest that trustworthy collaborations could be promoted by requirements such as preferential access to technologies developed using NHS patient data, limiting the use of patient data to for-public-benefit purposes, transparency and effective resolution of conflicts of interest, and the use of trusted research environments to manage data use outside the NHS. Graham (2021) points to transparency, accountability, representation, and ensuring social purpose, even if this might at times come at the expense of commercial gains, as ways of ensuring trust and confidence in public-private collaborations that aim at producing data-driven medical technologies like AI.
Data, reliability, and fairness
In the literature on CTG interpretation, reliability is presented as one of the main potential benefits of incorporating machine learning and AI. In this context, reliability refers to consistency (i.e., the same inputs producing the same outputs, unlike with humans, who are subjective in their CTG interpretation), as well as accuracy (i.e., more thorough and nuanced data/inputs to generate more precise outputs). A more reliable CTG would detect more adverse perinatal outcomes while also minimizing false positive rates and unnecessary interventions.
Reliability, in terms of accuracy and consistency of output, was important to participants of this study. However, they framed reliability primarily in relation to the concept of bias, thus linking reliability with the values of equity and fairness in the over- or under-representation of certain populations in datasets as well as output. In their view, a biased dataset, particularly one that did not include data from marginalized populations, such as BAME parents, was contrary to reliability. According to participants, in order for AI tools to be reliable, data used in their development should account for the heterogeneity of relevant patient populations so as not to reproduce existing social inequalities, such as worse maternal outcomes for BAME patients. With (health) equality being seen as essential for reliability, rather than a separate and additional value, the measurability of reliability is linked to justice, not consistency and accuracy alone. Furthermore, participants of this study pointed to another value, that of solidarity, which is less often mentioned in discussions regarding reliable AI [64, 65]. By framing reliable AI as something that preserves and promotes mutual support and equal access to benefits for all populations, participants in this study might be reflecting the solidaristic character of their national healthcare system, as well as more widespread conceptions of healthcare as a form of public good to which all should have access [66, 67].
There has been considerable attention to the issue of bias in the development and deployment of AI, including medical AI, and to how best to understand, interpret, and incorporate values such as fairness and equality in AI systems [68,69,70,71]. To address these issues, any weaknesses and bias in the dataset should be identified so its limitations can be understood. Then, developers can identify ways of overcoming these limitations. This might include continuous collection of more robust and inclusive data, so that the dataset itself is more representative of all population groups. Furthermore, there may be an argument for introducing fairness and equity considerations at the stage of product approval for medical use. One requirement might be that AI tools should produce and declare confidence or reliability scores for the different population groups to which these tools would be relevant. This way, those assessing the tools (e.g., MHRA, FDA) and, later on, those using them (e.g., healthcare professionals) might be in a better position to make decisions regarding the reliability and effectiveness of these new technologies for different patient groups. Of course, this solution raises its own set of questions that relate to the way that principles of justice, fairness, and solidarity should be understood and translated into practice. For example, if AI is only reliable for some groups, should it not be used on others? If it benefits white birth parents and mothers only, then the tool could exacerbate health outcome disparities. Should there be a reliability threshold that AI must pass for all relevant population groups in order to be implemented on the ground?
Although this research does not provide solutions to these challenges, it nevertheless raises the ethical questions that need to be addressed.
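To make the idea of declaring reliability scores per population group more concrete, the following is a minimal sketch in Python. It is an illustration only, not the method of any existing tool or regulator: the function name, the record layout, and the default thresholds are all hypothetical assumptions, and the choice of sensitivity and false positive rate as the reported measures simply follows the notion of reliability in the CTG literature described above.

```python
from collections import defaultdict


def per_group_reliability(records, min_sensitivity=0.8, max_fpr=0.2):
    """Summarize detection performance separately for each population group.

    `records` is an iterable of (group, predicted_adverse, actual_adverse)
    tuples, where the last two items are booleans. The record layout and the
    default thresholds are placeholders chosen for illustration only; they
    are not clinically or regulatorily derived values.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            key = "tp"  # adverse outcome correctly flagged
        elif predicted and not actual:
            key = "fp"  # flagged, but no adverse outcome occurred
        elif not predicted and actual:
            key = "fn"  # adverse outcome missed
        else:
            key = "tn"  # correctly not flagged
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A rate is left undefined (None) when a group has no cases to
        # estimate it from, rather than being reported as 0 or 1.
        sensitivity = c["tp"] / positives if positives else None
        fpr = c["fp"] / negatives if negatives else None
        meets = (
            sensitivity is not None
            and fpr is not None
            and sensitivity >= min_sensitivity
            and fpr <= max_fpr
        )
        report[group] = {
            "n": positives + negatives,
            "sensitivity": sensitivity,
            "false_positive_rate": fpr,
            "meets_threshold": meets,
        }
    return report
```

A report of this kind would not answer the questions raised above, such as what threshold is acceptable or what should follow when one group falls below it. It would only make the differences between groups, and the size of the sample behind each estimate, visible to those assessing and using the tool.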
Decisions and holistic, patient-centered care
Participants were unanimous in their preference for decisions regarding their care being made by humans, not autonomous machines. This point reiterates the need for decisions to be mediated by healthcare professionals, but fundamentally reveals participant perceptions regarding the limits of AI in considering them as persons rather than a collection of data, as well as in delivering patient-centered care. There are a few ways in which design features of AI might be able to address this desire for personalized care. For example, the ability to input individual risk factors is a potential design decision that could address the clinical relevance of assessments and recommendations. Additionally, while the ability to input and personalize clinical risk factors is the most obvious way that AI-based CTG can increase personalization through design, it should also be noted that patients may have values, preferences, and risk thresholds that differ from those of their healthcare professionals. There is an argument to be made, then, that AI also requires value flexibility to improve personalization capacities, not just flexibility in clinical inputs alone. In the case of AI-based CTG, this feature might look like adjusting for a patient’s risk threshold before an intervention is suggested.
However, although AI might be able to improve upon some aspects of personalization, the healthcare professional remains essential to making individual assessments and providing patient-centered care. As Taylor’s experience highlighted, if personalization is primarily grounded in data derived from population groups (e.g., diabetics, birth parents and mothers over a certain age, etc.), patients may still feel like they are being reduced to data points and not being treated individually. There is also an argument that patient preferences and values should be considered and incorporated separately from AI-based clinical assessments. In addition, patients also want the ability to communicate back with their healthcare professionals and be part of a shared decision-making process, which requires dialogue, communication, and empathy. Ultimately, then, it is important to focus not on AI design alone but also on how its implementation on the ground can enable healthcare professionals to perform more personalized, holistic, empathetic, and patient-centered medicine.
Participants’ understanding of AI as a tool at a healthcare professional’s disposal, at least in the case of AI-based CTG, aligns with its intended use case: as a decision-support tool, rather than as an autonomous decision maker. If AI is meant to be supportive rather than a replacement for healthcare professionals, guidelines should be put in place so that healthcare professionals do not over-rely on AI. Moreover, if healthcare professionals are meant to integrate this AI into their care, more research is needed on how clinical decisions are made so that this AI system can be seamlessly integrated into their practice to improve upon decision-making processes, including shared decision making with patients.
AI is often seen as a solution to inefficiency. It is thought of as something that can relieve time pressures on clinicians and alleviate the burden of staffing shortages. It is even imagined that as AI becomes more prominent in healthcare, the relationship between healthcare professionals and patients will become less important. However, despite the ever-increasing range of technologies used in healthcare, it is evident that the trusting relationship between healthcare professionals and patients is still relevant. One participant even said that they would always see AI as a medical supplement ‘to a HCP I trusted.’ As such, developing that trusting relationship between patient and clinician is still of utmost importance, and potentially of greater significance, when introducing new technology into the healthcare space.
For this reason, any time saved with AI should be reinvested in the healthcare professional-patient relationship. This is not only so there is time and space for empathy and human-centered care, but also for improved (more holistic and personalized) clinical decision making.
Implications for OxSys development
This research is part of a larger project to develop an AI-based CTG, OxSys. The results of this qualitative study were presented to the whole OxSys research team and discussed during group meetings. A workshop was also organized with members of the ethics group, patient involvement group, and development group to discuss the findings in more depth and consider further steps and take-home messages. During these meetings a number of action points were identified and discussed. In light of the findings presented here, the team discussed, firstly, potential strategies regarding the financing of the future development of OxSys and considered what kind of partnerships the project team should pursue. Secondly, the bias, fairness, and reliability points raised by the study participants led to a discussion regarding how the model can be continuously reevaluated and whether the model can correct for bias in the data. Furthermore, the possibility of including a confidence score for different populations as an add-on to the model was considered. Finally, and in relation to the third finding presented, the OxSys team was encouraged to see that their aim to build a decision-aid tool, rather than one that could be used to replace clinical expertise, chimed with the views of our participants. A fruitful discussion evolved from this point regarding how risk scores could be presented to facilitate decision-making. For example, we discussed whether to present risk scores using a traffic-light color scheme, how to allow for personalization of risk assessments, whether it is possible to codify personal risk perceptions regarding certain interventions (e.g., use of ventouse, forceps, or caesarean section), and how to build in a functionality to allow users to focus on specific parameters of the risk score analysis.
A number of action points to help refine the planned feasibility study were also drawn from our findings, including the need for appropriate training of healthcare professionals in using the tool.
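As a rough illustration of two of the options discussed at these meetings, a traffic-light presentation of risk scores and the codifying of personal risk thresholds, the following minimal Python sketch maps a risk score to a color band using cut-offs that can be adjusted per patient. It is a hypothetical example and does not describe how OxSys works: the names `RiskPreferences` and `traffic_light`, the 0 to 1 score scale, and the default cut-offs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPreferences:
    """Hypothetical per-patient cut-offs, agreed in conversation with a clinician.

    The default values are arbitrary placeholders, not clinical guidance.
    """

    amber_from: float = 0.3  # scores at or above this are shown as amber
    red_from: float = 0.6    # scores at or above this are shown as red

    def __post_init__(self):
        if not 0.0 <= self.amber_from < self.red_from <= 1.0:
            raise ValueError("require 0 <= amber_from < red_from <= 1")


def traffic_light(risk_score, prefs=RiskPreferences()):
    """Map a risk score in [0, 1] to 'green', 'amber', or 'red'.

    The band is a prompt for discussion between patient and healthcare
    professional, not a decision about care.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must lie between 0 and 1")
    if risk_score >= prefs.red_from:
        return "red"
    if risk_score >= prefs.amber_from:
        return "amber"
    return "green"
```

Under these assumptions, the same score can be shown differently for two patients: a score of 0.5 falls in the amber band with the default cut-offs, but in the red band for a patient whose preferences are recorded as `RiskPreferences(amber_from=0.2, red_from=0.45)`. Whether personal risk perceptions can meaningfully be reduced to numbers of this kind is exactly the open question raised in the discussion above.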
Although we found that we had reached data saturation after a preliminary thematic analysis of these seventeen interviews, we nevertheless recognize the limited size of our study. Further research with a greater sample size would make it easier to extrapolate themes and arguments about what birth parents and mothers (and patient groups more broadly) consider ethical and trustworthy AI. Additionally, this study did not collect personal data about participants, such as ethnicity, age, educational attainment, sexuality, and socioeconomic background. While abstaining from this kind of data collection protects participants’ privacy, it nevertheless limits the capacity for intersectional analysis. Future research may benefit from collecting this kind of data and exploring the ways in which people’s identities and experiences inform their values and world views, including where they might converge and diverge by population group (and within them, too).
Prospective participants were told that there was no need to have expertise in AI to participate in our research. However, it is worth acknowledging that research participants are a self-selecting group and may therefore be more interested in discussing the topic of our research, as well as more open to talking to university researchers. Nevertheless, many interviewees explicitly stated that they felt uninformed about AI and even silly sharing what first came to mind when they thought of it (one participant laughed at herself when answering, ‘little robots on wheels’); yet, even these participants were able to think through some of the ethical and practical issues that may arise with introducing new technology into maternity care. It is not possible to ascertain from our data whether our participant group was more informed or educated on issues that pertain to AI than the general population, and we concede that further research could explore the relationship between educational attainment and perceptions of AI, as well as birthing experiences more broadly. Moreover, although people who choose to participate in university research may be biased in favor of public institutions, the finding that people consider public institutions more trustworthy than private institutions, especially in a healthcare context, is corroborated by other research.
Finally, this research was premised on a speculative design scenario rather than lived experiences with the AI in development. In part, this was due to a practical limiting factor, namely that at the time of interviews, OxSys was not being trialed with birth parents and mothers. However, speculative research is also an important step in the process of ethical design; it enables values and perspectives of users to inform early iterations of the AI undergoing development, rather than being reduced to an afterthought. This type of approach has been referred to as ‘proactive orientation toward influencing design’. Nevertheless, further research that investigates people’s lived experiences with AI is also needed. Creating and implementing trustworthy AI is an iterative process that requires an understanding of the ethical challenges at every stage of its development; therefore, the collection and evaluation of users’ perspectives should be sustained and carried forward in future research.