A Collaborative Model for Integration of Artificial Intelligence in Primary Care

The cost of primary care is rapidly increasing in the developed world, and improving the accuracy of screening and diagnostic testing as well as other areas of primary care can be seen as an essential component in ensuring the long-term sustainability of the quality and efficiency of public health care systems. In this study, the authors propose a simple yet robust model of collaborative decision-making incorporating machine and human competences whereby the strengths and advantages of artificial intelligence methods can be harnessed to improve the overall accuracy of essential testing, diagnostics, screening, and other critical areas of patient care while addressing concerns and ensuring safety and complete human control over the course of diagnostics and treatment.


Introduction
The cost of primary care is rapidly increasing in the developed world, with the accuracy of screening and diagnostic testing being one of the essential factors in the overall cost of health care systems. The cost of misdiagnosis can be significant both in cases of undetected serious conditions, resulting in prolonged recovery and a higher cost of treatment, and in cases of a false positive diagnosis, leading to a higher cost of subsequent testing and a possible emotional impact on the patient and their families. Regarding the cost of direct consequences of misdiagnosis, "1 million added days in hospitals and $750 million in extra health-care spending" may be attributable to medical errors by doctors, hospitals, and pharmacists, according to the Canadian Institute for Health Information's (CIHI) examination of patient safety in Canada [1], while "improving patient safety in US Medicare hospitals is estimated to have saved $28 billion" [2]. The high cost of diagnostic errors to patients as well as the primary care system was highlighted in the World Health Organization's Technical Series on Safer Primary Care report on diagnostic errors [3].
The causes of misdiagnosis are complex, and while no perfect or simple solution has been found for this serious problem, it is clear that personal and environmental influences on the human operators in the field are one of the contributing factors. It is well known that the performance of even professional and highly trained personnel may vary in time and be affected by multiple factors such as physical condition, mood, fatigue, stress, and others. In particular, the burnout syndrome is well known among professionals whose work involves conditions of high and constant stress and responsibility for the life and well-being of other people, such as military personnel, aircraft pilots, medical professionals, teachers, social workers, and other essential care professions [4][5][6].
On the other hand, advances in Artificial Intelligence technologies over the past decades have brought the performance of machine systems in some areas to or above the level of human experts, as with the games of Chess and Go and in a number of applications in health care [7,8]. Unlike humans, machine systems offer stable performance not affected by personal and transient factors. These advancements provide opportunities to significantly improve the performance of critical diagnostic practices and procedures by incorporating effective and accurate machine intelligence systems into diagnostic and primary care applications [9][10][11][12].
However, the introduction of such complex systems into direct human care can bring serious challenges of its own, particularly in the areas of trust and confidence in the system that employs such components. The internal operation of complex machine systems such as deep learning neural networks used in high accuracy image analysis is not very well understood at the time of writing. Trusting them with an essential treatment decision can be seen as premature at this point, and it is unclear whether this will be achievable in the long term. As Raj Jena put it, "if you are a deep learning algorithm, when you fail you can often fail in a very unpredictable and spectacular way" [13]. Applications of machine intelligence systems therefore need to be robust and predictable, subject to comprehensive clinical validation [14], and explainable [15,16].
Taking into account these challenges and opportunities, we undertook this study to investigate possibilities of safe and efficient introduction of Artificial Intelligence methods in the operational practices of primary care and proposed a simple yet robust model whereby high accuracy machine methods can be harnessed to improve the accuracy of essential testing, diagnostics and other critical areas of health care without any compromise of safety, trust and confidence in the system.

The motivation for this study is:
• To investigate opportunities and models of incorporating high performance Artificial Intelligence methods into diagnostics practices to improve accuracy and cost efficiency of essential diagnostics without compromising safety, trust and confidence in the system, and
• To propose a general approach to incorporating machine intelligence methods and systems in the processes of primary care, including diagnostics, with the potential to measurably improve accuracy and performance while complying with the requirements of safety and full human control over the processes of diagnostics and treatment.

Challenges and Shortcomings of the Current Practice
In many health care systems and institutions, both private and public, the diagnosis following an essential test is performed by a single human practitioner and passed on to the next stage in the patient care chain, which often takes it as a given with no further feedback or analysis. This practice may create a single-link chain model (Figure 1) in which the accuracy of the entire chain is determined by that of its individual links or stages, with correct diagnostics playing a primary and sometimes critical role in the outcome of the treatment.
The logical consequence of this observation is that the efficiency of the chain cannot exceed that of any single link, and the error rate in the diagnostics phase may drive down the overall efficiency, both in terms of the patient outcome and the cost to the system.

Figure 1. Single-link diagnostics and treatment model
On the other hand, the ability to reduce the incidence of essential errors is limited by human factors, since human performance is affected by the practitioner's condition and environment, as well as by cost and resource limitations in the system that do not allow significant duplication of processes to reduce the overall error. For example, to reduce the error at each of the stages in Figure 1, the system would need a second opinion on every diagnostic test or decision, doubling the cost of the diagnostics system, a direction that is rarely acceptable for practical reasons.
The advances in machine intelligence methods and systems over the last decade can offer an avenue toward a solution of this complex and costly problem. The cost of operating a high accuracy, high-performance machine intelligence module pre-trained in a specific diagnostics area can be negligible compared to educating and hiring hundreds of human practitioners, and its performance is more stable and less affected by internal or environmental factors.
However, as mentioned earlier, any such development must be cautious and would have to deal with the issues of trust and confidence in machine-based decision-making systems that at this time cannot be taken for granted [6]. The challenge therefore lies in creating hybrid human-machine decision-making models that combine the accuracy, high performance and stability offered by machine intelligence systems with the trust and confidence of complete and uncompromised human control over the outcome of the diagnostics and treatment. Such an approach is investigated and proposed in this study, based on the observation that the strengths of machine and human intelligences are often complementary and that a collaborative approach taking advantage of both can be effective.

Human and Machine Intelligences
As was already mentioned, in many areas human and machine intelligences have complementary strengths, as illustrated in Figure 2, the more so due to the advances in machine intelligence methods over recent decades that brought the accuracy and confidence of decisions to the level of a regular human operator and, in a number of cases, a human expert. This observation offers an opportunity and a basis for the introduction of collaborative human-machine decision-making systems that harness the synergy of strengths of human and machine intelligent systems and, as a result, improve the quality and performance of decisions in multiple task domains, including critical applications such as medical diagnostics.

Figure 2. Human and machine intelligence: strengths and synergy
However, as discussed previously, introduction of decision-making systems based on, or with participation of machine intelligence methods and technologies needs to be cautious, based on comprehensively verified systems and technology as they deal with the issues of trust and confidence in machine-based technology by the general public that can be essential in the real operational practice [6].
The challenge therefore lies in creating collaborative human-machine intelligent decision-making methods, models and systems that are capable of combining the benefits and strengths of human and machine expertise, while minimizing their respective shortcomings and ensuring confidence and trust in the produced decisions.

Decision Functions: Cumulative and Conflict
Let us suppose that a decision-making system has multiple decision-making channels $C_1, \ldots, C_n$ and the final decision on an input $X$ is obtained from the partial decisions of the channels by a certain summation process that can be described by a "cumulative function" $A$ taking as input the partial decisions of the channels and producing the final decision $D(X)$:

$$D(X) = A(C_1(X), \ldots, C_n(X)).$$

In the simplest case, the channel decisions can have Boolean values of True (condition detected) or False (normal, no condition) and one of the simplest forms of the cumulative function $A$ on an input $X$ could be a logical operator of the channel decisions:

$$A_{\lor}(X) = C_1(X) \lor \ldots \lor C_n(X), \qquad A_{\land}(X) = C_1(X) \land \ldots \land C_n(X).$$

The interpretation of the above is, respectively: at least one channel detected the condition of interest; all channels detected the condition of interest. Certainly, other types of cumulative functions between these two options are possible as well.
In addition to the cumulative function, the "conflict function" $K(X)$ can be defined as the opposite perspective on the cumulative set of the decisions of the channels, indicating the number or the rate of conflicts between the decisions of individual channels. In the simplest form it can be defined as a logical sum of pair-wise comparisons of the channel decisions:

$$K(X) = \bigvee_{i < j} \big( C_i(X) \neq C_j(X) \big).$$

Thus, the meaning of the conflict function would be, "there is at least one conflict between the decisions of the channels".
The above definitions are summarized in Table 1.
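As a minimal sketch of these definitions, assuming Boolean channel decisions, the two simplest cumulative functions and the conflict function could be written as follows. The function names are illustrative and do not come from the original text.

```python
from itertools import combinations
from typing import Sequence


def cumulative_any(decisions: Sequence[bool]) -> bool:
    """Cumulative function, OR form: at least one channel detected the condition."""
    return any(decisions)


def cumulative_all(decisions: Sequence[bool]) -> bool:
    """Cumulative function, AND form: all channels detected the condition."""
    return all(decisions)


def conflict(decisions: Sequence[bool]) -> bool:
    """Conflict function K: True if at least one pair of channels disagrees."""
    return any(a != b for a, b in combinations(decisions, 2))
```

For two channels, `conflict([True, False])` is True while `conflict([True, True])` is False; whenever the channels agree, both cumulative forms return the same value.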

Decision Quality
We can now evaluate the accuracy of the decision functions based on decisions produced by individual channels. Suppose the mean accuracies of the two channels are $a_1$ and $a_2$, respectively. It easily follows from the definitions of decision and conflict functions in Table 1 that, for binary decisions and independent channel errors, the probabilities of an agreement (no conflict) and a conflict of the channels will be as follows:

$$P_{agree} = a_1 a_2 + (1 - a_1)(1 - a_2),$$

$$P_{conflict} = a_1 (1 - a_2) + a_2 (1 - a_1), \tag{3}$$

and obviously,

$$P_{agree} + P_{conflict} = 1.$$

We will now make a small number of assumptions justified by the discussion in the preceding sections. The first one is that the channels have a similar level of accuracy in practical operation. The second assumption is that the intelligences of the channels are complementary; that is, the relative area of inputs where both channels produce erroneous decisions is small compared to the overall number. This condition can be attributed, for example, to the independent learning process of the channels, or to a different process by which they operate. The final assumption is that the accuracy of the channels is sufficiently high, for example, significantly higher than that of a random decision.
An erroneous final decision can arise in two cases: the channels agree on an incorrect decision, or the channels conflict. The former of the two is suppressed quadratically under the assumption that the accuracy $a$ of the channels is sufficiently high and therefore $(1 - a)$ is small. The latter case (3) is that of a conflict of channels, where the channels are expected to "catch" the errors of each other, based on the assumption of independent learning.
To improve the probability of a correct decision in the case of a channel conflict, we introduce into the model with two parallel channels a third channel $C_3$, sequential to the channels $C_1$ and $C_2$, that takes the input of the channels as well as the values of the cumulative and conflict functions $A$ and $K$ and produces the final decision (Figure 3).
A further constraint imposed in this model is that the final "expert" channel $C_3$ will be involved in producing a decision on an input $X$ only if a conflict between the parallel channels has been detected, that is, if $K(C_1(X), C_2(X)) = \text{True}$. It will also be assumed that the accuracy of the expert channel, $a_3$, is superior to that of the parallel channels: $a_3 \gg a_1, a_2 \sim a$. It follows that the introduction of an expert channel under these assumptions results in a quadratic suppression of the error in the conflict case as well. As a result, a multi-channel system with an expert arbiter can be expected to produce a quadratically suppressed decision error, a significant or even massive reduction in comparison with a conventional single-link model.
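The argument can be made explicit in a short worked estimate. This is a sketch under the stated assumptions (binary decisions, independent channel errors, parallel channels of similar accuracy $a$, expert accuracy $a_3$); the symbol $E$ for the overall error and the numerical values are introduced here for illustration only.

$$E \approx \underbrace{(1 - a)^2}_{\text{agreement on a wrong decision}} + \underbrace{2a(1 - a)}_{\text{conflict}} \cdot \underbrace{(1 - a_3)}_{\text{expert error}}$$

Both terms are products of two small quantities. For example, with hypothetical values $a = 0.90$ and $a_3 = 0.95$, $E \approx 0.01 + 0.18 \times 0.05 = 0.019$, compared with an error of $0.10$ for a single channel of the same accuracy.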

A Practical Demonstration of a Multi-Channel Decision System
In this section we demonstrate a practical application of a collaborative decision model described in the previous section, based on realistic values of the current diagnostics accuracy reported in the literature.
In this analysis, following Liu et al. (2019) [5] and other reports, it will be assumed that in the diagnostics domain of interest the accuracy of the machine intelligence system has reached or approached the average accuracy of a qualified human practitioner, but not necessarily that of an expert. The model contained the following units and components (refer also to Figure 3):
1. A human practitioner channel producing a diagnostic decision.
2. A machine intelligence channel operating in parallel with the human channel.
3. A data collection and processing unit that combined the results of the channels, producing the cumulative and conflict outputs as described in the earlier sections.
4. An expert human practitioner called to make the final decision in the case of a conflict between the channels as described in the previous section.
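The way these units interact can be sketched in a few lines. This is an illustrative outline only: the callables `human`, `machine` and `expert` stand in for the respective channels and are hypothetical names, not part of the original model description.

```python
from typing import Callable, Tuple

# A channel maps a case (any representation) to a Boolean decision.
Channel = Callable[[object], bool]


def collaborative_decision(
    case: object,
    human: Channel,
    machine: Channel,
    expert: Channel,
) -> Tuple[bool, bool]:
    """Return (final_decision, expert_consulted) for one case."""
    d_human = human(case)
    d_machine = machine(case)
    if d_human == d_machine:
        # No conflict: the agreed decision is final and the expert is not involved.
        return d_human, False
    # Conflict: the expert human practitioner makes the final decision.
    return expert(case), True
```

The expert callable is invoked only on disagreement, which mirrors the constraint that expert resources are used only for conflicting cases.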
As before, our assumptions were:
• The accuracies of the human and machine channels were in the same range, supported by Liu et al. (2019) and The Guardian (2019) [5,6].
• The accuracy of the expert channel in the final stage of the model is higher than that of either the human or the machine channel in the parallel stage.
• Human and machine operators were trained independently, excluding significant correlation of systematic error.
For verification of the proposed model, several different diagnostics areas were selected based on availability of data on the accuracy of diagnostics procedures and the incidence of errors from comprehensive studies of diagnostics errors [8,13]:
(1) Internal conditions (such as COPD, rheumatoid arthritis) [9]: diagnostics error incidence 13%, not including false positive cases; adjusted to 20% to account for false positives.
(2) Asthma [8]: diagnostic error of up to 30% within a reasonable timeframe (includes no diagnosis).
(3) Mammography [13]: 10% and above.
(4) An average across multiple diagnostic areas [13]: 13-15% excluding false positives.
In Table 2, we show the results of application of the proposed model to the above conditions based on the discussed assumptions and analysis of the model accuracy in Section 4. The accuracy of the machine system has been assumed to be in the range of an average human practitioner and below that of a human expert. As can be observed from these results, based on reported incidence of diagnostic errors, an expected improvement in the accuracy of diagnostics resulting from introduction of a multi-channel decision-making system with an incorporated AI channel ranged from 8% to 13%, a significant to major improvement.
These results demonstrate clearly that incorporation of machine intelligence systems as a parallel source of opinion in the decision-making process with a human expert follow-up can significantly improve the accuracy of diagnostics in most reviewed areas with measurable potential benefits for the patients and for the primary care system. The reasons of such an improvement will be discussed in the next section.
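To make the mechanics of such an estimate concrete, the sketch below applies the two-channel-plus-expert error estimate to the error rates quoted above. It assumes independent channel errors and equal accuracy of the human and machine channels; the expert error rate of 5% is a hypothetical value chosen for illustration, so the output is not a reproduction of Table 2.

```python
def combined_error(channel_error: float, expert_error: float) -> float:
    """Expected error of two independent parallel channels with an expert arbiter."""
    both_wrong = channel_error ** 2                      # channels agree on a wrong decision
    conflict = 2 * channel_error * (1 - channel_error)   # exactly one channel is wrong
    return both_wrong + conflict * expert_error


# Channel error rates quoted in the text (lower bound used for the 13-15% average).
reported = {
    "internal conditions": 0.20,
    "asthma": 0.30,
    "mammography": 0.10,
    "average across areas": 0.13,
}
EXPERT_ERROR = 0.05  # hypothetical assumption, not taken from the source

for area, err in reported.items():
    new_err = combined_error(err, EXPERT_ERROR)
    print(f"{area}: single-link {err:.1%}, collaborative {new_err:.1%}")
```

Changing `EXPERT_ERROR` shows how strongly the result depends on the accuracy of the arbiter in the conflict case.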

Discussion
The results in the preceding section demonstrated that the accuracy of routine diagnostics can be significantly improved by harnessing the capabilities and advantages of machine intelligence systems as a parallel decision-making channel to that of a human practitioner, as in the standard practice of the day.
This conclusion, and the ensuing results are based on the assumption that the probability distributions of channel errors are primarily independent, as illustrated in Figure 4. In this case, the probability of a conflict between the channels can be estimated by (3).
We will attempt to justify this assumption as reasonable. Indeed, as has been pointed out by multiple studies, e.g. [4], human performance in critical tasks is often affected by factors of condition and environment on which machine systems depend little, if at all. Consequently, it can be expected that errors caused by these factors would not be correlated between the human and machine channels.

Figure 4. Specific and correlated systematic error in a parallel multi-channel system
Another possible cause of a correlation of erroneous decisions by channels can lie in the specifics of the education and experience of the human practitioner versus that of the machine system. Again, it can be noted that the machine system would likely be trained with much broader and larger sets of data, across a broader geographical and individual practice spectrum, reducing the likelihood of correlated systematic errors with a human practitioner. The opposite would apply as well: any systematic or system errors in the development and/or training of machine systems are less likely to be reflected in the education and practice of a human practitioner, reducing the likelihood of correlated errors. For these reasons, the authors believe that the assumption of independence of human and machine decision-making can be made as a good first approximation in evaluating the accuracy of hybrid decision-making systems with multiple parallel channels. An example of cases causing systematic correlated errors in the channels can be a subset of rare, nonstandard, novel cases, or cases deviating substantially from the norm in the diagnostics area, where neither an average human practitioner nor the machine system has acquired sufficient training or experience. While for the aforementioned reasons the authors consider the possibility of such errors reasonably low for an average diagnostics domain, it can certainly be an issue in some specific diagnostics areas.
One approach to address systematic issues of this type could be to trace the diagnostic decision to the eventual outcome of the treatment. Availability and analysis of such data would make it possible to identify, track and resolve this type of systematic error by adding such cases to the curriculum and practice of both human and machine diagnostics practitioners. Systems of the proposed type, incorporating essential components of automation and data processing, certainly allow for such positive feedback. Importantly, the model equally addresses both types of potential error in the single-chain scenario: false negative cases, which may cause deterioration of the patient's condition and prognosis due to an undetected condition, resulting in prolonged treatment and an increase in the overall cost of treatment; and false positive cases, which may lead to unnecessary further testing and treatment and cause emotional discomfort to the patient and their families. In either case, if a disagreement between the decisions of the channels has been detected, the case is brought to the attention of a qualified expert in the diagnostic area, with a strongly improved chance of a correct decision.
An operational system incorporating systems and methods of machine intelligence in the proposed architecture can have a number of essential advantages over the conventional single-chain practice described in Section 2. First, it would not introduce any significant overhead in time or effort, other than in the cases where it would be justified by the complexity of the case. If both human and machine channels agreed on the initial assessment, the expert channel would not be involved. Moreover, due to the high operational efficiency of the machine system and the fact that it can operate around the clock (24 × 365), in most cases its result would be ready for evaluation well before that of the human practitioner, whereas the time and the additional cost of combining the results of the channels in a modern computer system can be negligible.
Secondly, such a system makes it possible to reserve highly knowledgeable and high-demand expert resources for the most challenging cases, where a higher level of expertise is warranted. Such limited resources can be involved in a highly efficient distributed network on a regional or even national level with remote access to all necessary data, tests and case history.
Thirdly, as has been demonstrated in the practical application section, collaborative systems of the proposed type can offer substantial improvement in the overall accuracy of the diagnostics process by taking advantage of the complementary nature of human and machine expertise in a parallel decision-making process. This results in a measurable reduction of the overall incidence of errors in the diagnosis phase and, as a direct consequence noted in the aforementioned studies, improves the outcome as well as the cost efficiency of the entire treatment chain.
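The sensitivity of the model to correlated errors can be explored with a simple sketch. It assumes binary decisions, equal error rates in the two parallel channels, and a correlation coefficient `rho` between their error indicators; for two such events the joint error probability is e² + rho·e·(1 − e). The parameter names and numerical values are illustrative assumptions, not results from the study.

```python
def combined_error_correlated(channel_error: float, expert_error: float, rho: float) -> float:
    """Expected error with an expert arbiter when channel errors have correlation rho."""
    e = channel_error
    # Joint probability that both channels err (they then agree on a wrong decision).
    both_wrong = e ** 2 + rho * e * (1 - e)
    # Probability that exactly one channel errs, i.e. the channels conflict.
    conflict = 2 * (e - both_wrong)
    return both_wrong + conflict * expert_error


# Hypothetical 10% channel error and 5% expert error, for increasing correlation.
for rho in (0.0, 0.1, 0.3, 1.0):
    print(f"rho={rho:.1f}: overall error {combined_error_correlated(0.10, 0.05, rho):.3f}")
```

With `rho = 0` the estimate reduces to the independent case, and with `rho = 1` the channels always err together, so the collaborative model gives no gain over a single channel.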
Finally, it is worth mentioning that the incremental cost of deployment of a pre-trained and pre-tested in the given diagnostics area machine intelligence system can be minimal, comparable to that of a routine operation of installing or upgrading software packages thus offering a measurable addition of value and quality of care at a minimal cost.

Future: Superior AI?
With the advance of machine systems accelerating in a number of fields to the level of human experts, and in some cases above the best of human experts, as has been the case with Chess and Go [14,15], in the medium term the diagnostic accuracy of machine systems can be expected to improve further and eventually surpass not only the average but even the expert ability of humans. Such developments may suggest a potential for further gains in the accuracy and efficiency of diagnostics systems; however, the noted concerns about perception and human control in critical decisions cannot be discarded.
Once the accuracy of the machine channel significantly surpasses that of a human specialist, the effectiveness of the multi-channel model proposed in the preceding sections would begin to decline, due to higher number of conflicts attributed to higher rate of errors in the human channel. Although at the time of writing this criterion has not been met in many areas of diagnostics, it can be anticipated that it will happen at some point given the progress in the state of AI technology.
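This effect can be illustrated with a short sketch for two independent channels of unequal accuracy. The function name and the numerical accuracies are hypothetical and chosen only to show the trend.

```python
from typing import Tuple


def two_channel_stats(a_human: float, a_machine: float, a_expert: float) -> Tuple[float, float, float]:
    """Return (conflict_rate, share_of_conflicts_due_to_human_error, overall_error)."""
    both_wrong = (1 - a_human) * (1 - a_machine)
    human_only_wrong = a_machine * (1 - a_human)
    machine_only_wrong = a_human * (1 - a_machine)
    conflict = human_only_wrong + machine_only_wrong
    human_share = human_only_wrong / conflict if conflict > 0 else 0.0
    overall_error = both_wrong + conflict * (1 - a_expert)
    return conflict, human_share, overall_error


# Hypothetical accuracies: human 90%, machine 99%, expert 95%.
conflict_rate, human_share, error = two_channel_stats(0.90, 0.99, 0.95)
print(f"conflict rate {conflict_rate:.3f}, human share {human_share:.2f}, error {error:.4f}")
```

With these illustrative values roughly nine in ten conflicts originate from the human channel, so expert time is spent mostly on confirming the machine decision, and the overall error becomes limited by the accuracy of the expert rather than that of the machine channel.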
To address these concerns, the collaborative model can be modified to incorporate multiple independently trained machine channels, all with an accuracy exceeding that of an average human specialist, verified in operational practice (Figure 5).

Figure 5. Multi-channel AI decision system with expert arbiter
As before, the decisions of the channels are accumulated by cumulative and conflict functions, and an expert's role is involved in the case of a conflict between the channels. A simple calculation shows that with a three-channel system with an individual channel accuracy of 95% and a human expert channel with an accuracy of 98%, the resulting error rate of the human-AI collaborative model described above would be well below 1%, offering an outstanding improvement in both the quality and performance of modern diagnostics systems.
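The calculation can be sketched as follows, assuming binary decisions, independent channel errors, and an expert who is consulted whenever the channels do not all agree. The function name is illustrative.

```python
def multi_channel_error(n_channels: int, channel_accuracy: float, expert_accuracy: float) -> float:
    """Expected error of n independent channels with an expert arbiter on any conflict."""
    all_right = channel_accuracy ** n_channels
    all_wrong = (1 - channel_accuracy) ** n_channels   # unanimous wrong decision
    conflict = 1 - all_right - all_wrong                # at least one disagreement
    return all_wrong + conflict * (1 - expert_accuracy)


# Three channels at 95% accuracy and an expert at 98%, as in the text.
error = multi_channel_error(3, 0.95, 0.98)
print(f"overall error: {error:.4%}")
```

Under these assumptions the estimate comes to about 0.3%, consistent with the statement that the error rate is well below 1%.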

Conclusions
The realities of an aging population are driving the cost of public health care systems in the developed world ever upwards, calling for innovative approaches to increase the efficiency of the system while retaining and enhancing its reliability, quality of care, and safety. Such opportunities can be found in harnessing the benefits of machine intelligence in applications in primary patient care that can substantially improve the accuracy of the diagnostic systems while retaining full control over their operation. The proposed model combines human and machine expertise into a single synergetic operational system and offers a number of significant advantages over the traditional "single-chain" models:
• Demonstrated significant improvement in the overall accuracy of diagnostics, resulting in a reduction in unnecessary spending in the system, improved patient care and overall quality.
• Minimal incremental cost of development and deployment.
• Flexibility: the model can be easily adapted and transferred to different areas of primary care and diagnostics, as well as to decision making in other areas of application.
• No additional delay, due to the high performance of the parallel machine channels.
• Optimal use of the expert resources only in those cases that require their attention and involvement.
• Full compatibility with distributed, high performance and outstanding quality operational models of public service delivery.
• A combination of the strengths and advantages of human and machine expertise for a significant improvement over current practice.
• Complete and uncompromised human control over the process of diagnostics and treatment.
We fully expect that the development and introduction into operational practice of primary care of hybrid and synergetic human-machine service delivery models of the proposed type and ones similar to it in the near future will have the potential to make a significant improvement in the quality, reliability, safety, and efficiency of the primary care systems and may facilitate new ideas and approaches in further research, development, and improvements in operational practice in this field, which is essential and critical for the continuous well-being of society.

Data Availability Statement
Data sharing is not applicable to this article.

Funding
The author received no financial support for the research, authorship, and/or publication of this article.

Institutional Review Board Statement
Not applicable.