1 Introduction

The ethics guidelines and principles for AI issued by government agencies, industry associations, and companies are unified by the claim that AI should be trustworthy (Jobin et al., 2019). For example, the US government advances the “development and use of trustworthy AI” (National Artificial Intelligence Initiative Office, 2021). Likewise, the European Union has developed “ethics guidelines for trustworthy AI” (European Commission, 2019). Trustworthiness is considered necessary for AI to earn trust, which in turn “is needed for its fruitful, pervasive use in our daily lives” (IEEE 2017, p. 2). While this claim is intuitive from a normative viewpoint, it also rests on the empirical hypothesis that trust requires trustworthiness. However, there is little evidence for this hypothesis in the case of AI. The purpose of this study is to investigate how sensitive users are to the trustworthiness of learning, AI-powered moral advisors.

AI-powered algorithms have conquered areas such as personnel recruitment, the allocation of loans, penal sentencing, and autonomous driving (Rahwan et al., 2019). They make, and help us make, highly consequential decisions and have practically turned into ethical agents (Whitby, 2011; Voiklis et al., 2016). In particular, an algorithm can serve as a moral advisor to its human user, who still makes the decision and accounts for it. Human involvement in algorithmic decision-making enhances perceived control over the algorithm and has been found to increase trust (Dietvorst et al., 2015; Burton et al., 2020). The human in the loop is therefore considered a building block in creating trustworthy AI (European Commission, 2019). However, this argument assumes that the human user does not naïvely trust the algorithm regardless of how trustworthy it is but carefully checks its advice and makes his or her own decision if a red flag is raised.

In the case of learning AI, the transparency and integrity of the training data are minimum requirements for an algorithm to be trustworthy (IEEE, 2017; Lepri et al., 2018; European Commission, 2019). A major concern about algorithms, which transparency can help mitigate, is that they are biased (Mittelstadt et al., 2016; Jobin et al., 2019). We explore, first, whether users trust the moral advice of an algorithm if they know nothing about how it generates advice. Our benchmark is AI-generated advice that is based on the judgments of impartial human advisors. While human judgment is notoriously untransparent, the notion of the impartial advisor evokes the ideal observer and gives the advisee an idea of how the advice comes about (Jollimore, 2021). Hence, an algorithm is more transparent to its users if they know it was trained on judgments of impartial human advisors than if they know nothing about how it generates moral advice.

Second, we study whether users trust the algorithm’s advice when the integrity of its training data is dubious. Specifically, we assume that moral advice from convicted criminals is distrusted by many. Indeed, moral judgment is impaired by pathological traits (Campbell et al., 2009; Jonason et al., 2015; Blair, 2017), which are common among criminal offenders. There is also recent evidence that criminal offenders’ judgment is biased relative to the average population (Koenigs et al., 2011; Young et al., 2012; Lahat et al., 2015). Of course, crime does not necessarily arise from a lack of moral judgment. People often know what is right but do the wrong thing nonetheless. We still find it reasonable to assume that training data from convicts are perceived as biased, and discrimination in education, employment, and housing showcases the deep-seated distrust of convicts (Sokoloff & Schenck-Fontaine, 2017; Evans et al., 2019; Sugie et al., 2020).

We report three experiments to examine the acceptance of moral advice from an algorithm. In each experiment, the subjects were advised by an algorithm in choosing between two options in an ethical dilemma. In the first study, the algorithm was described as modeled on the judgments of impartial human advisors, and the subjects followed the algorithm’s advice just like advice from an impartial human advisor. In the second study, they were told that it was unknown what the algorithm’s advice was based on, but they followed it all the same. In the third study, we told them that the algorithm mimicked convicted criminals. The subjects now turned out to make their judgments independently of the human advisor (i.e., the convict who advised them), which confirms our assumption that advice from criminals is distrusted. By contrast, the algorithm said to imitate convicted criminals still influenced the advisees.

Our results show that users readily accept ethical advice from algorithms even when they know nothing about their training data or when these are presumably biased. This insight challenges the intuition that AI needs to be trustworthy to be trusted. In turn, it corresponds with evidence for algorithm appreciation from outside the moral domain, which suggests that people are more receptive to or, to put it negatively, unreflective of algorithms than one might expect (Logg et al., 2019). It is noteworthy how insensitive users are to the information provided in the human–machine interaction, which we manipulated. In summary, our findings provide initial evidence that algorithms are accepted as moral advisors, on the one hand. On the other hand, they suggest that we need to think about how to ensure that AI-powered algorithms are used responsibly—e.g., by improving digital literacy. Our study thus contributes to the literature on AI ethics.

2 Procedure

We ran our three experiments on CloudResearch in March 2021. CloudResearch is an online platform to recruit subjects and conduct studies (Litman et al., 2017). We recruited a total of 2,017 US residents from CloudResearch’s Prime Panels (Chandler et al., 2019). Online platforms such as CloudResearch have been found to provide reliable and valid results across a wide range of tasks and measures, and they are frequently used in the social sciences (Goodman et al., 2013; Hauser & Schwarz, 2016; Chandler et al., 2019). Prime Panels members must pass default screening questions to take part in a study, and they are more diverse and more representative of the US population than MTurk participants in terms of age, family background, religion, education, and political attitudes (Chandler et al., 2019). Each experiment took the subjects about five minutes to complete, and they were compensated with a fixed US$1.25.

We programmed our experiments in Qualtrics. The factorial design resulted in multiple experimental conditions. We used Qualtrics to randomly assign each subject to one condition and CloudResearch to preclude repeated participation. The experiment consisted of four parts. First, we obtained informed consent from the subjects, while we guaranteed confidentiality and voluntary participation (screens #1 and #2). Second, they answered the focal question about an ethical dilemma with the advice of an algorithm or human advisor, which we manipulated (screen #3). Third, we posed a question to probe the subjects’ understanding and attention (screen #4). Fourth, they were asked a series of post-experimental questions about their ethical mindset, their attitude to artificial intelligence, and demographic data (screens #5 to #11). The appendix includes screenshots and further details and technicalities.

We obtained ethical approval for the studies from the institutional review board of the German Association for Experimental Economic Research (https://www.gfew.de). Each study was pre-registered at AsPredicted (https://www.aspredicted.org), where we specified the experimental conditions, the key variables, and the planned analyses. Moreover, we determined that the analyses would be restricted to subjects who answered the comprehension question correctly, and we set the number of subjects per condition to fifty. Anticipating that some would fail to prove their comprehension, we recruited more subjects; we attained the planned number of subjects after excluding about 20%, who had answered the comprehension question incorrectly. The final sample totaled 1,593 subjects for the three studies. The URLs of the pre-registration documents are included in the appendix.
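For concreteness, the sample plan can be reconstructed from the condition counts and per-study figures reported in the results sections below. The following is a minimal back-of-the-envelope sketch (not the authors' own script); all numbers are taken from the text.

```python
# Back-of-the-envelope check of the sample plan, using only numbers reported
# in the text (an illustration, not the authors' analysis script).
conditions = {
    "study1": 2 * 2 * 3,  # advisor type x nature of advice x scenario
    "study2": 2 * 3,      # nature of advice x scenario (algorithm only)
    "study3": 2 * 2 * 3,  # advisor type x nature of advice x scenario
}
planned_per_condition = 50

planned_total = planned_per_condition * sum(conditions.values())
print(planned_total)                       # 1,500 subjects needed after exclusions

recruited = 825 + 396 + 796                # per-study recruitment reported below
retained = 633 + 309 + 651                 # subjects passing the comprehension check
print(recruited, retained)                 # 2,017 and 1,593
print(round(1 - retained / recruited, 2))  # ~0.21, i.e., about 20% excluded
```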

3 Study 1

Do decision-makers accept ethical advice from a machine? On the one hand, there is plenty of evidence for algorithm aversion. When offered a choice, people would rather take advice from a human than from an artificial advisor, even after seeing the latter outperform the former. They lose confidence in algorithms more easily, more persistently, and more than they reasonably should when these err (Dietvorst et al., 2015). They prefer that humans rather than machines make moral decisions (Bigman & Gray, 2018), would rather delegate decisions to humans, and prefer that others delegate decisions to humans as well (Gogoll & Uhl, 2018). Approaches to reduce algorithm aversion include digital literacy, behavioral design, and control by human involvement (Dietvorst et al., 2018; Burton et al., 2020). The concept of the human in the loop both illustrates skepticism about algorithms and is a prime example of a remedy to mitigate it.

On the other hand, there is also evidence that people trust artificial more than human advice. As a counterpart to Dietvorst et al.’s (2015) algorithm aversion, Logg et al. (2019) coined the term algorithm appreciation. In a series of experiments, they showed that people give more weight to an estimate by an algorithm than by another person when adjusting their own judgments in various domains, ranging from weight estimates to romantic matches, whether they are provided with one estimate or both. Likewise, news audiences were found to prefer news selected for them by algorithms over news selected by editors (Thurman et al., 2018). Despite their concerns about algorithms, people often prefer automated decisions to decisions by human experts, or they are indifferent between the two (Araujo et al., 2020). Moreover, they appear to trust algorithms to be as cooperative as human interaction partners (Karpus et al., 2021).

If people prefer human over artificial decision-makers in the moral domain (Bigman and Gray, 2018; Gogoll & Uhl, 2018), this does not imply that they will not accept advice from an algorithm when they receive it. Dietvorst et al.’s (2015) and Logg et al.’s (2019) studies resemble ours in that, in both, a human is advised by a machine. The two studies differ in whether the user has seen the algorithm err before being advised. Logg et al. note that, in Dietvorst et al.’s study, advisees follow artificial more than human advice until they have seen the algorithm err, and this case is relevant because many decisions are made without feedback on whether the advice was right. Moreover, it is harder to agree intersubjectively on whether moral advice was erroneous than on whether a factual prediction was. Neither study considers advice-taking in the moral domain, though, and the question of whether moral advice is accepted from algorithms thus remains open.

To answer this question, imagine a situation where someone needs to make a decision that is consequential for someone else and thus clearly has an ethical dimension (e.g., a recruiter who selects an applicant for a job). In particular, suppose that there are two options to choose from, which leave the decision-maker in a moral dilemma. Moreover, there is an AI-powered algorithm to advise the decision-maker, which either encourages or discourages the choice of one option over the other. If the decision-maker heeds the algorithm’s advice, we should see him or her tend more or less to choose one of the two options, depending on the algorithm’s advice. Conversely, if the decision-maker is averse to following the algorithm’s advice, he or she should disregard it, the decision will be made independently of the advice, and no association between the advice and the inclination to choose the option encouraged by the algorithm should be discerned.

While an association between the algorithm’s advice and the human decision would confirm the expectation that moral advice is accepted from an algorithm, it would not show either algorithm aversion or appreciation strictly speaking, which are defined relative to the effect of human advice (Dietvorst et al., 2015; Logg et al., 2019). To see whether the decision-maker heeds the algorithm’s advice more or less than human advice, or equally, provided that he or she heeds the advice in the first place, we need to consider the same setting with a human advisor instead. Naturally, human advice that encourages the choice of one option over the other should increase the inclination to choose that option, whereas discouraging advice should reduce it. Algorithm aversion would then result in a weaker effect of the algorithm’s relative to the otherwise identical human advice; algorithm appreciation would result in a relatively stronger effect.

3.1 Method

To explore how decision-makers respond to advice by a human-trained, AI-powered algorithm, we designed an experiment which required the subjects to make a decision that would benefit either a friend or a stranger. We wrote vignettes for three different scenarios, one each in the business, health, and legal domains, to preclude that our results would be driven by some specific situation. Each scenario featured the same trade-off between friendship and duty, and each subject was randomly assigned to one of those three scenarios. In the business scenario, for example, the subject was placed in the role of the recruiter of a company. That recruiter had two applicants shortlisted to fill a vacancy: a friend of hers and a stranger. She would then decide whom to hire. No further information about the two applicants was given, other than that the recruiter found the stranger more eligible, but that she also felt obligated to her friend.

The vignette further introduced an AI-powered algorithm to advise the recruiter. The algorithm was described to the recruiter as imitating human decisions that were based on the moral judgments of impartial human advisors in a situation like hers. Moreover, the recruiter was told that the applicants did not know about the algorithm, and that no one would ever learn whether she followed the algorithm’s advice. We thus prevented the recruiter from feeling controlled rather than advised. We manipulated the advice by either saying that it was ethically acceptable or unacceptable for the recruiter to hire her friend. Whether the advice was in favor of or against her friend was randomly determined by the experimental software for each recruiter. The recruiter was then asked to indicate how much she agreed with the statement that she would hire the stranger, not her friend, on a scale ranging from 0 (“fully disagree”) to 100 (“fully agree”).

To establish that the algorithm’s advice influences the recruiter in her decision, it is enough to show that the subjects’ decisions differ depending on the algorithm’s advice. However, we would naturally expect a similar, potentially larger effect of human advice. For comparison, we provided another set of subjects with the exact same vignette, except that the recruiter was advised by an impartial human advisor rather than by an algorithm whose advice was based on the judgments of impartial human advisors. By this description, we made the algorithm’s advice resemble the human advice to isolate the effect of the type of advisor. Like the algorithm, the human advisor would either tell the recruiter that it was acceptable or that it was unacceptable to hire her friend, and no one would ever learn how the recruiter decided. Whether the advice was in favor of or against the recruiter’s friend was again randomly determined by the software.

The law and health scenarios were written and administered in the same way as the business scenario, where the subject was put into the shoes of the recruiter. In the law scenario, the subject took the role of a prosecutor responsible for prosecuting money laundering in financial services firms. That prosecutor had to decide which of two suspicious firms to raid—one run by a friend; the other, by a stranger. In the health scenario, the subject was responsible for compiling the list of recipients of kidney donations, and she needed to choose whether to allot the next available position on the list either to a friend of hers or to a stranger. In both cases, we varied the scenario, but kept everything else equal. The screenshots containing the instructions for all three scenarios are reprinted in the appendix. The screens were identical for all possible cases except for screen #3, which varied between scenarios and treatment conditions.

3.2 Results

We collected data from a total of 825 subjects. To ensure that the subjects whose answers we were going to analyze had diligently read their vignette and made their decision, we asked a comprehension question to probe their attention and understanding. Our results are based on the 633 subjects who answered this question correctly. Of these 633 subjects, 38% indicated that they were male, and their age averaged 41.7 years, within a range from 18 to 93 years. Our experiment employs a 2 × 2 × 3 factorial design, in which either a human advisor or the algorithm advises that it is acceptable or unacceptable for the decision-maker to favor her friend in one of three scenarios. The subjects were randomly assigned to one of the twelve resulting conditions. Our main interest is in how much the subjects agreed, on a scale ranging from 0 to 100, that they would rather decide in favor of the stranger than the friend in each condition.

Figure 1 breaks the subjects’ answers down by scenario, type of advisor, and nature of advice. Qualitatively, the subjects would rather decide in favor of the friend if advised that this was acceptable than if they were advised this was unacceptable, whether the advisor was human or an algorithm. The size of the difference varies somewhat between the scenarios and the types of advisor. The influence of the advice is apparently smaller in the business than in the other scenarios. We are interested, first, in whether ethical advice—both by the human advisor and the algorithm—leads the subjects to decide in favor of the stranger; second, in whether this effect differs depending on the type of advisor. To test the differences for significance, we ran regressions of the subjects’ inclination to favor the stranger over the friend on the nature of advice and the type of advisor with scenario-specific random effects, as stated on AsPredicted.

Fig. 1 Inclination to favor the stranger over the friend on a scale from 0 (friend) to 100 (stranger) in response to human or an algorithm’s advice in favor of either the friend or the stranger, where the human advisor is an impartial observer and the algorithm is modeled on impartial observers. The figure depicts means and standard errors, broken down by scenario, type of advisor, and nature of advice

Columns 1 and 2 of Table 1 list the results of separate regressions for either type of advisor. They show that advice to favor the friend, both when the advisor is human and when it is an algorithm, reduces decision-makers’ inclination to decide in favor of the stranger, or makes them tend to decide in favor of the friend, across the three scenarios. Column 3 combines the data for both types of advisor to compare their effects. Again, the inclination to favor the stranger falls if the human advisor finds it acceptable to favor the friend. The coefficient on the interaction term between the algorithm as advisor and the advice to favor the friend is insignificant. Hence, there is no incremental effect of this type of advisor, and the effect of advice does not differ statistically between the human advisor and the algorithm. Likewise, the inclination does not differ depending on the type of advisor if the advice is against the friend.
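To make the specification concrete, the following is a minimal sketch of regressions of this kind, assuming a long-format data set with hypothetical column names: inclination (0–100), advice_friend (1 if the advice says that favoring the friend is acceptable), algorithm (1 for the algorithm, 0 for the human advisor), and scenario. It illustrates the random-intercept setup described above, not the authors' actual code.

```python
# Minimal sketch of the random-effects regressions (hypothetical column names).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study1_long.csv")  # hypothetical file, one row per subject

# Columns 1 and 2: separate regressions per advisor type,
# with scenario-specific random intercepts.
for advisor, sub in df.groupby("algorithm"):
    fit = smf.mixedlm("inclination ~ advice_friend", sub,
                      groups=sub["scenario"]).fit()
    print("algorithm =", advisor,
          "| effect of advice:", round(fit.params["advice_friend"], 2))

# Column 3: combined data with an interaction term; an insignificant
# interaction means the advice effect does not differ between advisor types.
combined = smf.mixedlm("inclination ~ advice_friend * algorithm", df,
                       groups=df["scenario"]).fit()
print(combined.summary())
```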

Table 1 Influence of advice by AI-powered moral algorithm on moral judgment

Having considered the effect of advice on decision-makers’ inclination to decide in favor of the friend, another interesting question is how it would affect their actual decision. With an answer scale ranging from 0 (favor friend) to 100 (favor stranger), we took a score of less than 50 to indicate that the advisee would actually decide in favor of the friend, and thus derived a binary variable that captures the decision rather than the inclination. We see the percentage of advisees who decide to favor the friend rise from 23 to 43% if the human advisor finds this acceptable (χ2 = 13.48, p < 0.01). Likewise, it rises from 20 to 34% if the algorithm advises so (χ2 = 6.57, p = 0.01). The percentage does not differ between the human advisor and the algorithm, whether the advice was in favor of (χ2 = 2.35, p = 0.13) or against the friend (χ2 = 0.09, p = 0.76). Hence, the results for the inclination to decide turn out to also hold for the decisions.
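The decision-level analysis can be sketched in the same way, again with hypothetical column names: scores below 50 are coded as a decision for the friend, and the resulting proportions are compared with a chi-square test of the kind reported above.

```python
# Minimal sketch of the decision-level comparison (hypothetical column names).
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("study1_long.csv")                  # hypothetical file
df["decides_friend"] = (df["inclination"] < 50).astype(int)

# Example: human-advisor conditions; the same test applies to the algorithm.
human = df[df["algorithm"] == 0]
table = pd.crosstab(human["advice_friend"], human["decides_friend"])
chi2, p, dof, _ = chi2_contingency(table)
print(chi2, p)  # the text reports chi2 = 13.48, p < 0.01 for the human advisor
```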

In summary, the algorithm’s advice has an impact on decision-makers’ inclination to favor the friend or stranger, and most likely on their actual decisions. Our decision-makers do not slavishly follow the algorithm’s or the human advice, but the change in their inclination to favor the stranger or the friend depending on the nature of advice shows that they are influenced by it. Our data indicate neither algorithm appreciation nor algorithm aversion in the sense of the terms used by Dietvorst et al. (2015, 2018) and Logg et al. (2019), both for the inclination and the presumably resulting decision. Put differently, it appears that decision-makers do care about moral advice, but they do not care whether this advice comes from an impartial human advisor or an algorithm that imitates impartial human advisors, and our results do not reject the hypothesis that the algorithm’s and the human advice have the same effect on the advisee.

4 Study 2

The outcomes of our first study show that decision-makers heed moral advice from an artificial and a human advisor alike. They do not suggest algorithm aversion or distrust of algorithms relative to humans. In light of recent mixed evidence, this finding might not seem overly surprising. That said, despite this mixed evidence, the “idea [of distrust of algorithms] is so prevalent that it has been adopted by popular culture and the business press” (Logg et al., 2019, p. 90). This idea also informs ethics guidelines issued by governmental and non-governmental agencies and bodies, business associations, and companies (for an overview, see Jobin et al., 2019). For example, both the US government and the European Union advocate and advance “trustworthy AI” because they are afraid that distrust might hinder the adoption and acceptance of AI (e.g., European Commission 2019, p. 4, Exec. Order No. 13960 2020, Sec. 1).

The link between trustworthiness and trust is not straightforward. We may trust or distrust others who then turn out worthy or unworthy of our trust, leaving us with situations where we properly place our trust or distrust in someone else or where we misplace our trust or distrust in them (Levine et al., 2018). Ability, benevolence, and integrity were identified as factors that lead us to expect that someone will turn out worthy of our trust and thus increase the likelihood that we trust them (Mayer et al., 1995). In prior research, trust in AI was built on the ability of AI, namely to make correct factual predictions (Dietvorst et al., 2015, 2018; Logg et al., 2019). Calls for transparent, explainable, or accountable AI, in turn, refer rather to Mayer et al.’s (1995) trustworthiness factor of integrity (Jobin et al., 2019). For example, IBM (2019) argues that “we don’t blindly trust those who can’t explain their reasoning” (p. 26).

Transparency is the most prevalent ethical principle in AI guidelines (Jobin et al., 2019). It is considered a logical antecedent of trustworthiness. For instance, the IEEE (2017) argues that “transparency … will allow a community to understand, predict, and appropriately trust the A/IS [autonomous and intelligent systems],” and that “transparency allows for trust to be maintained” (p. 44). Unlike explainability, which lawmakers propose but are reluctant to specify, transparency has been required by regulations for automated systems, like those scoring creditworthiness, since the 1970s (Wachter et al., 2017). While transparency is seen as a means to make algorithms more trustworthy, there is considerable variation in what ethics guidelines claim should be made transparent, including how AI is used in decision-making, the source code, the training data, the legal regulations, the limitations, or the potential impact (Jobin et al., 2019).

Despite those claims, algorithms often remain untransparent because their proprietors refuse to disclose their functionality, or because they are too complex for anyone but a few specialists to understand. Learning AI systems, in particular, are black boxes by design. Unlike rule-based AI systems, they develop their own rules to process information, which cannot meaningfully be interpreted by a human (Mittelstadt et al., 2016). Even so, algorithms should be made as transparent as possible and audited (Mittelstadt, 2016). For example, learning algorithms depend largely on their training data. Biases in those data will naturally affect the algorithm’s decisions (Mittelstadt, 2016). The disclosure of (information about) the training data enables responsible users to assess the advice properly and decide what to make of it. It is therefore considered instrumental in making learning AI systems transparent and trustworthy (European Commission, 2019).

If transparency makes an algorithm appear trustworthy and thus helps it earn its user’s trust, users should be suspicious of algorithms they know nothing about. In our first study, the algorithm was described as mimicking human decisions, based on the judgments of impartial human advisors. This description suggests that the algorithm’s advice resembles the advice of those human advisors. While it is hard to tell how human advisors make their judgments, this algorithm stands comparison with them. In this sense, the disclosure of information about the training data made the algorithm of our first study transparent. Conversely, the lack of any such information makes an algorithm less transparent, compared to this benchmark. If users distrust untransparent algorithms, they should be more reluctant to heed advice from such an algorithm relative to the algorithm from our first study and hence make their judgments more independently.

4.1 Method

We ran another experiment on CloudResearch to investigate the impact of information about how the algorithm formed its advice. The vignettes employed the same wording as those featuring the algorithm as an advisor in our first study. They differed only in stating that it was unknown what the algorithm’s advice was based on rather than that it was based on the judgments of impartial human advisors. Unlike in our first study, there is no meaningful parallel condition with an untransparent human advisor. Humans are notoriously untransparent, and therefore the decision-makers in our first study did not learn anything about the human advisor that could now be concealed. Instead, the untransparent algorithm is a viable substitute for the human advisor, and the advisees’ response to the human advice in our first study remains a valid benchmark to assess advisees’ response to the untransparent algorithm in the second.

4.2 Results

A total of 396 subjects took our survey, 309 of whom passed the comprehension test. Of these 309 subjects, 35% were male. The subjects’ age ranged from 17 to 91 years, with a mean of 43.9 years. The 2 × 3 factorial design, where the algorithm advised either in favor of or against the friend in one of three scenarios, created six experimental conditions. The subjects were randomly assigned to these conditions. As in our first study, we focus on how much they agreed with the statement that they would rather decide in favor of the stranger than the friend. If advisees care about transparency, they should be reluctant to follow advice from an opaque algorithm, and their agreement should not differ depending on the nature of advice. If it differs nonetheless, we are led to conclude that opaqueness does not bother them, and that the effect of the algorithm’s advice that we observed in our first study is therefore not driven by its relative transparency.

Figure 2 depicts the subjects’ answers, broken down by nature of advice and scenario. The results turn out to be qualitatively similar to those of our first study. As recruiters in the business scenario, the subjects leaned more toward hiring the stranger when advised so by the algorithm; as prosecutors in the law scenario, they tended more to raid the suspicious financial services firm run by the friend than the one run by the stranger in this case. Only for the health officials compiling the waiting list for donated kidneys in the health scenario did the algorithm’s advice make little difference. Their agreement with deciding in favor of the stranger was similar regardless of the nature of advice, which suggests that they were less susceptible to it. Taking the outcomes of the three scenarios together, however, advisees appear to heed the advice although the algorithm is opaque and they are therefore uncertain about what its advice is based on.

Fig. 2 Inclination to favor the stranger over the friend on a scale from 0 (friend) to 100 (stranger) in response to an algorithm’s advice in favor of either the friend or the stranger, where nothing is known about how the algorithm works. The figure depicts means and standard errors, broken down by scenario and nature of advice

To test for potential differences, we first ran a regression of the subjects’ inclination to favor the stranger over the friend on the nature of advice, again with scenario-specific random effects. The results in column 1 of Table 2 show that the algorithm’s advice to favor the friend reduces this inclination. This suggests a significant influence of the algorithm even though it is opaque. To complete the picture, the effect of the algorithm’s advice needs to be benchmarked against that of human advice—i.e., the response of decision-makers who were told that they were advised by an impartial human advisor in our first study. Column 2 reproduces column 2 of Table 1 for convenience. Combining the data, neither the coefficient on the interaction term nor that on the algorithm as advisor in column 3 differs significantly from zero. Hence, as in our first study, the effect of the algorithm’s advice does not statistically differ from that of the human advice.

Table 2 Influence of advice by AI-powered opaque algorithm on moral judgment

To complement our analysis, we derived, in the same way as in our first study, a binary variable that captures the advisees’ actual decisions rather than their inclination to decide in favor of or against the friend. Looking at this variable, the percentage of advisees who would decide in favor of the friend is 21% if the algorithm advises against the friend, as opposed to 32% if it advises in favor of the friend. This difference is marginally significant (χ2 = 3.60, p = 0.058), and, as in our first study, this result can be taken to indicate that the algorithm’s influence on decision-makers’ inclination to decide in favor of the stranger or the friend carries over to their actual decisions. For comparison, we recall that the percentages are similar in our first study, where 23 and 43% of the advisees would decide in favor of the friend, depending on what the human advice said, and 20 and 34%, depending on the transparent algorithm’s advice.

This outcome is not intuitive and it challenges the common belief that transparency creates trust. A potential explanation is motivated reasoning. People tend to collect and evaluate information in the light of the conclusions that they want to reach, and they disregard, overlook, or reinterpret conflicting information (Kunda, 1990; Gilovich, 1991). In our experiment, the subjects faced a moral dilemma that demanded a burdensome decision between friendship and duty. It was therefore convenient for them to follow the algorithm’s advice to relieve this burden. The vignette made the opaqueness of the algorithm salient, but the advisees still knew that the algorithm was there to “tell whether it [was] ethically acceptable to decide in favor of a friend in such situations.” It is conceivable that they filled the void and just assumed the algorithm was trustworthy although nothing was said to that effect. This conjecture motivates our next study.

Summing up, decision-makers trust moral advice from more and less transparent algorithms alike. Of course, not every decision-maker decides as advised, but their decisions are clearly influenced by the advice. We cannot rule out that the response to advice coming from an opaque algorithm is the same as to advice coming from an impartial human advisor. Again, we observe neither algorithm appreciation nor aversion relative to human advice. It is important to note that this finding does not invalidate the ethical argument for transparency to increase trustworthiness. It does suggest, though, that decision-makers attach less value to transparency or trustworthiness than commonly thought. Empirically, they blindly follow advice from an algorithm they know little about. Practically, this observation casts doubt on whether a decision-maker, as the human in the loop, can be expected to effectively control the algorithm’s decision in augmented decision-making.

5 Study 3

Starting from inconclusive evidence about trust in advice from AI-powered algorithms, we found in our first study that decision-makers trust ethical advice from an algorithm and an impartial human advisor alike. The algorithm was described as imitating decisions of impartial human advisors, though, making its training data quite transparent. Conceptually, transparency is an antecedent of trustworthiness, and a trustworthy algorithm is more likely to be trusted. This is arguably why transparency is the most prevalent principle in AI guidelines (Jobin et al., 2019). Unfortunately, transparency is often hard to attain (Mittelstadt, 2016). To test how much transparency matters, we made the algorithm more opaque and told the subjects in our second study that it was unknown what the algorithm’s advice was based on. Interestingly, they turned out to follow the algorithm’s advice both when it was more and when it was less transparent.

Having withheld information which suggests that the algorithm is trustworthy (i.e., that it imitates impartial human advisors), the next step to challenge users’ trust is to provide information which casts doubt on the algorithm’s trustworthiness. In our first study, trust in the algorithm’s moral judgment was derived from its training data. Obviously, there is little reason to distrust the moral judgment of impartial human advisors. It then seems intuitive to infer that the algorithm’s decisions resemble those advisors’ decisions, making the algorithm’s advice trustworthy by the same token, and our subjects did indeed follow the algorithm’s advice. Suppose now that the algorithm is instead designed to imitate decisions which are based on the moral judgments of people who have presumably acted immorally, such as convicted criminals. Will advisees still heed the algorithm’s advice, or will they now disregard it and decide independently of the advice?

It is not straightforward to conclude that the moral judgment of criminals is untrustworthy. First, this presupposes that the advisee shares the norms of the (unspecified) legal system under which the criminals have been convicted and that this legal system has rightly convicted them. That is, the convicts advising her are indeed criminals, and the deeds that led to their convictions were not only illegal but also immoral. We assume that advisees take a conviction to indicate a moral transgression. Second, a moral transgression does not necessarily result from a lack of moral judgment. A criminal might well know what is morally right to do in some situation but do the wrong thing nonetheless. She would then be a poor role model, but her moral judgment would be intact, and she would therefore be perfectly qualified to give moral advice. (“Do as I say, not as I do.”) Is it reasonable, then, to assume that a criminal’s moral judgment is trustworthy on average?

Psychological research shows that individuals who score high on the dark triad personality traits (Paulhus and Williams, 2002) show impaired moral judgment and a lower level of moral development than the average population (Campbell et al., 2009; Jonason et al., 2015; Blair, 2017). These traits are particularly pronounced among criminal offenders, who are often the subjects of such research. While juvenile offenders’ moral judgment is clearly impaired (Stams et al., 2006), the evidence for adult criminals is more mixed. However, recent studies did find systematic differences (Koenigs et al., 2011; Young et al., 2012; Lahat et al., 2015). For example, psychopaths attach lower relevance to fairness, authority, and others’ suffering (Jonason et al., 2015), and they have lower reservations against inflicting harm on others (Koenigs et al., 2011). Overall, this research suggests that convicts’ moral judgment differs from that of the general population.

This evidence gives reason to assume that many perceive the moral judgment of convicts as biased. Convicts are thus likely seen as a negatively selected pool of moral advisors, if not relative to the average population, then certainly relative to the impartial observers from our first study. Practically, the discrimination against ex-offenders in education, employment, and housing shows deep-seated distrust (Sokoloff & Schenck-Fontaine, 2017; Evans et al., 2019; Sugie et al., 2020). Of course, this distrust is arguably driven by fear of continued immoral behavior, which results from poor moral judgment or practical judgment or both. That said, people do not necessarily distinguish between moral and practical judgment. Instead, they will arguably find convicts’ moral judgment untrustworthy just because they find convicts untrustworthy. Hence, one would expect that decision-makers disregard advice from convicts, and that it has little effect on their decisions.

5.1 Method

Along the lines of our first study, we ran another experiment on CloudResearch, where the algorithm’s advice was described as based on the ethical judgments of convicted criminals rather than impartial human advisors. Other than that, we used the exact same vignettes as in the first two studies, where the decision-maker (e.g., the recruiter) was confidentially advised by an algorithm on whether to favor the friend or the stranger. To benchmark the impact of advice by the convicts-trained algorithm against that of a human advisor, we ran another condition where a human convicted criminal replaced the impartial human advisor from our first study. This condition also allows us to validate our argument that decision-makers distrust the moral judgment of convicted criminals. If we are right in assuming that a criminal record undermines an advisor’s trustworthiness in the eyes of the advisee, we should see the advisees disregard the convict’s advice.

5.2 Results

We collected answers from a total of 796 subjects for this study. A total of 651 answered the comprehension question correctly. Fifty-two percent of these subjects were male, and their average age was 45.9 years, within a range from 17 to 90 years. Like our first study, the experiment has a 2 × 2 × 3 factorial design, where either a convicted criminal or an algorithm that imitates the decisions of convicted criminals gives the advice that it is either acceptable or unacceptable for the decision-maker to favor the friend in one of our three scenarios. We first ran the conditions with the algorithm as advisor and subsequently those with the human advisor. Within either run, the subjects were randomly assigned to the six conditions. The focal variable is again how much the subjects tended to decide in favor of the stranger as opposed to the friend in their role as a recruiter, prosecutor, or health official who compiles the waiting list for donated kidneys.

Figure 3 depicts the results, which differ qualitatively from those in our first and second studies in Figs. 1 and 2. On the one hand, the decision-makers who were advised by the human advisor (i.e., a convicted criminal) showed a similar inclination to decide in favor of the friend, regardless of whether the advice said that it was acceptable or unacceptable to favor the friend. Put differently, the descriptive statistics suggest that they considered ethical advice coming from a criminal to be untrustworthy and consequently disregarded it, as one would expect. On the other hand, the decision-makers who were advised by the algorithm instead still tended to favor the friend more if the advice said that this was acceptable and less so if it said that this was unacceptable. Hence, it turns out that they heeded the algorithm’s advice although that algorithm was introduced as being trained on the moral judgments of convicted criminals.

Fig. 3 Inclination to favor the stranger over the friend on a scale from 0 (friend) to 100 (stranger) in response to human or an algorithm’s advice in favor of either the friend or the stranger, where the human advisor is a convicted criminal and the algorithm is modeled on convicted criminals. The figure depicts means and standard errors, broken down by scenario, type of advisor, and nature of advice

As in the previous two studies, we ran regressions with scenario-specific random effects of the decision-makers’ inclination to favor the stranger over the friend on the type of advisor and the nature of advice to test these qualitative differences for significance. Table 3 shows the results of separate regressions for the algorithm mimicking convicted criminals (column 1) and the convicted criminal as human advisor (column 2) as well as a regression on the whole data set (column 3). Like the convicted criminal, the impartial human advisor from our first study is, again, a meaningful benchmark for the influence of the convicts-trained algorithm’s advice. Column 4 reproduces the results listed in column 2 of Table 1 for convenience. Column 5 lists the results of the regression on the combined data for the decision-makers receiving advice from the human advisor in our first study and from the algorithm in our third study.

Table 3 Influence of advice by AI-powered criminal algorithm on moral judgment

The results in columns 1 and 2 show that advice to favor the friend reduces the inclination to favor the stranger if it comes from the convicts-trained algorithm but not if it comes from the convict. Accordingly, column 3 reports a negative coefficient on the interaction term but not on the advice to favor the friend. Incidentally, advice to favor the stranger by the algorithm increases this inclination relative to the same advice coming from the convict, as can be seen from the significant coefficient on the algorithm as advisor. These results argue for an impact of advice by the algorithm but not the convict. When we compare the algorithm to the impartial human advisor from our first study, we observe the same pattern in column 5 as in the previous two studies (column 3 of Tables 1 and 2, respectively): The influence of the algorithm does not differ from that of the impartial human advisor, although the algorithm is trained on convicts.

As in the previous two studies, the results for the inclination to decide for the friend or stranger also hold for the actual decision. Translating the inclination score into a decision as before, the percentage of decision-makers who favor the friend is almost the same whether the convict finds it acceptable or unacceptable to favor the friend (27 and 28%). By contrast, if the convicts-trained algorithm advises in favor of the friend rather than the stranger, the percentage is 36% compared to 15%, and thus significantly higher (χ2 = 17.72, p < 0.01). Recalling the percentages of decision-makers leaning toward these decisions with advice from the impartial human advisor from our first study, which are 43 and 23%, the difference in percentage points is about the same. Hence, the influence of the convicts-trained algorithm resembles that of the impartial human advisor, which matches the insignificant effect of the interaction term in column 5.

In summary, decision-makers are influenced by an algorithm’s advice even if suspicious training data give them reason to distrust it. Indeed, the influence of that algorithm does not statistically differ from that of an impartial human advisor. In turn, the influence of a criminal human advisor is zero, ruling out the possibility that decision-makers are always influenced by advice and never care who the advisor is. Hence, we undermined the algorithm’s trustworthiness by both concealing information that could argue for its trustworthiness and revealing information that argues against it. Neither does the lack of transparency reduce trust, nor does transparency immunize advisees against advice from an untrustworthy algorithm. It seems that, in addition to transparency about the training data, digital literacy is needed for users to benefit from this transparency and make use of the information about the algorithm.

6 Further Results

Having found that decision-makers are influenced by ethical advice, both when it comes from a human advisor and when it comes from an algorithm, we wonder how they perceive the role of advice in their judgments and decisions. To address this question, we asked our subjects in a post-experimental question, first, whether they would make the same judgment if there were no moral advice. Second, we asked them the same question about “most other participants.” Table 4 summarizes the answers, broken down by the type of advisor and the nature of advice. We note that about three quarters of the subjects, with little variation among the experimental conditions, claimed that they would make the same judgment, which suggests that they felt hardly influenced by the advice. Conversely, more than half of them believed that the other participants were influenced and therefore would make a different judgment without advice.

Table 4 Participants’ assessment of influence of advice

These observations are striking in two regards. On the one hand, it is noteworthy how few subjects considered themselves susceptible to advice despite the large and significant effect that our three studies establish. On the other hand, they consider others much more susceptible to it than themselves. The same kind of self-serving asymmetry has been found for morality. Starting from the fact that people consider themselves more selfless, kind, and generous than others, Epley and Dunning (2000) show that this is not because they underestimate others, but because they overestimate themselves, while their assessment of others is quite accurate. The subjects’ answers point to a considerable potential for self-deception about the influence of advice, and particularly advice by algorithms. There is a risk that decision-makers continue to believe that they own their decisions, although they largely adopt them from machines.

To examine the influence of advice on moral judgments and decisions, we intentionally raised the ethical question of whether to favor a friend over a stranger. While this question allows for a role of moral advice, the answer might also hinge on the decision-maker’s personal morality. To consider the potential effect of personal morality, we included a post-experimental question about ethical self-assessment, which asked the subjects how moral they considered themselves relative to other subjects. They answered on a scale ranging from 0 to 100, where a score above 50 identified the respondent as more ethical than others; below 50, as less ethical. The answers span the full range from 0 to 100, with a mean of 68.41 and a standard deviation of 19.61. We added moral self-assessment as a covariate to our regressions to test whether it is associated with the inclination to favor the stranger over the friend and to isolate the influence of ethical advice.
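Under the same assumptions as the sketch in Study 1 (hypothetical column names, long-format data), adding the covariate amounts to one extra term in the regression formula:

```python
# Minimal sketch: ethical self-assessment (0-100) added as a covariate
# (hypothetical column names; not the authors' code).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("all_studies_long.csv")   # hypothetical pooled file
fit = smf.mixedlm("inclination ~ advice_friend * algorithm + moral_self",
                  df, groups=df["scenario"]).fit()
print(fit.params["moral_self"])            # Table 6 reports a positive coefficient
```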

The results, which are reported in Table 6 in the Appendix, show a significant positive coefficient on self-assessed morality. Hence, decision-makers who considered themselves morally superior to others tended more to decide in favor of the stranger. Albeit not crucial for our findings, the decision in favor of the stranger thus turns out to be the morally superior choice in the eyes of our subjects. The focal effects of the algorithm, the advice, and their interaction do not differ qualitatively from those without ethical self-assessment in Tables 1–3, confirming our results. Incidentally, the aforementioned mean response of 68.41 reveals that our subjects assessed themselves to be more moral than the average. This is another example of the “better-than-average effect,” which is well-documented in the social psychology literature (Koellinger et al., 2007; Merkle & Weber, 2011), and which motivated Epley & Dunning’s (2000) above-cited study.

Additionally, we asked our subjects another question about their ethical mindset as well as some questions about their attitude to artificial intelligence. Specifically, we used the standard trolley problem by Foot (1967) to explore whether they were rather outcome-minded or rule-minded (Cornelissen et al., 2013), and added their responses to our regressions. The results in Table 6 in the Appendix reveal that outcome-minded decision-makers, who are identified by diverting the runaway trolley to kill one person and save five, tended slightly more to decide in favor of the friend than rule-minded subjects. Keeping in mind that favoring the stranger over the friend is considered more moral by the subjects, it is intuitive that outcome-minded people are more willing to trade off morality for friendship; however, the effect is not significant. Likewise, openness to artificial intelligence does not play a role in our findings.

7 Conclusion

We ran three experiments on trust in algorithms in the moral domain, where the subjects took the role of a decision-maker who faces a dilemma between friendship and duty. We manipulated the trustworthiness of the algorithm by concealing information about its training data, thus making it less transparent, and by revealing information that suggested it was biased. We found, first, that the algorithm’s advice influences users’ decisions like advice from an impartial human advisor. If it encourages a decision in favor of the friend, users are more inclined to decide in favor of the friend; if it discourages that decision, they are less so inclined. Second, users care little about how trustworthy the algorithm is. Its influence is almost the same whether it is presented as imitating impartial human advisors, as a black box, or as mimicking convicted criminals. By contrast, decision-makers do disregard advice from a human convicted criminal.

Our findings contribute to the growing literature on the ethical design of AI. It is commonplace that AI tends to be distrusted and that transparent and thus trustworthy AI is needed to reap the societal benefits AI can bring. This, however, is not what we find. Our data suggest that people follow AI-generated advice as much as human advice; that they are not bothered by a lack of transparency; that they follow AI-generated advice even in the presence of suspicious information; and that they (over)confidently believe others to be more susceptible to the algorithm’s influence than themselves. These empirical observations do not invalidate the ethical argument for transparent and trustworthy AI. They do indicate, though, that more than that is needed for a responsible use of AI. While users can learn by trial and error that AI can err, and thus become more diligent (Dietvorst et al., 2015), we feel it is better to improve digital literacy (Burton et al., 2020).

Moreover, our research extends the ongoing debate in the social sciences on whether and when people trust or distrust algorithms. Dietvorst et al. (2015) and Dietvorst et al. (2018) presented evidence for algorithm aversion. Logg et al. (2019), in contrast, found that advisees tend to appreciate AI-generated relative to human advice. Adding to this inconclusive prior evidence, we observe neither algorithm appreciation nor algorithm aversion in our moral dilemmas. Instead, the results of our experiments suggest that the impacts of AI-generated and human moral advice are very similar. (Technically, we see appreciation of the algorithm that mimics convicts relative to the human convict, but not relative to the impartial human advisor.) As evidence keeps accumulating on both sides, along with evidence like ours, which argues for neither side, it seems that neither algorithm appreciation nor algorithm aversion generally prevails, but that multiple factors matter.

This article naturally has limitations that invite further research. We considered a dilemma between friendship and duty to study trust in AI-powered algorithms. We varied the scenario to cover occupational, medical, and legal decisions, but it is easy to conceive further scenarios and dilemmas. While we believe that our findings generalize, it would be desirable to see them stand up to variation on both counts. We also confined ourselves to learning as opposed to rule-based AI. Learning machines have strengths, including their flexibility and scalability. However, their trustworthiness is limited by their training data and other factors. Rule-based machines, in turn, can be rendered fully transparent, and users can thus check whether they agree with the rules. There have been attempts to create rule-based AI advisors in the moral domain (Lara and Deckers, 2020), and it would be interesting to see similar research on them.

AI-powered advisors have tremendous potential to improve decision-making. On the one hand, it is good news that ethical advice from algorithms is accepted. On the other hand, it is worrisome how little users reflect on such advice, even when they are cautioned against it. This is also a caveat against the human-in-the-loop approach: It makes us feel better about the resulting decision, but it cannot mitigate this risk if the human in the loop succumbs to the temptation of trusting the algorithm too readily. This overtrust creates a risk that decision-making will be corrupted by flawed algorithms. In a future with AI-powered assistants supporting us in all areas of life, we cannot count on government or other regulation alone to ensure that these are trustworthy, and we do not want to wait for users to find out that algorithms can err. Instead, we need to improve digital literacy and train users to use algorithms responsibly.