Google Mini: Italian Example of Artificial Prosociality

Prosociality is a social and individual virtue that evokes mixed feelings in the modern world. Prosocial behavior is generally defined as "actions taken for the benefit of one or more other persons" (Wispe, 1972). The Digital Revolution and the birth of Artificial Intelligence, exemplified by the Google Mini, have allowed the transition from an elitist experience to a mass experience. This exploratory study starts from Man-AI interaction in a provocative and oppositional conversational context created by the human being through improvisation. It is hypothesized that the synthesis of the artificial voice does not allow all the facets of the tone and emotionality of prosocial behavior to be characterized. To explore this hypothesis, a prosodic and emotional analysis was carried out on the tone of the Google Mini's responses in provocative conversations with two real participants, chosen for their different gender and age: a 50-year-old man and a 28-year-old woman. The two participants were asked separately to interact with the Google Mini in a provocative and oppositional manner, insulting the Artificial Intelligence. In total, 50 interactions were collected, 25 per participant; they were recorded and analysed quanti-qualitatively with OpenSMILE, open-source software developed by the audEERING Group for the automatic extraction of features from audio signals and widely used for automatic emotion recognition (Eyben, Wöllmer & Schuller, 2010). The results show that the Artificial Intelligence is programmed to always respond in a "gentle" way, displaying different facets of prosociality that are well detected by the emotional and prosodic analysis, above all in the recognition of the "pitch" speech pattern. The emotional tone analysis, in turn, confirms the Artificial Intelligence's "understanding" of the communicative context. Among future perspectives, the study of these vocal patterns of artificial prosociality could be a springboard for research on bullying, using the Google Mini as a prosocial tool.


INTRODUCTION
Prosociality is a social and individual virtue that evokes mixed feelings in the modern world. Prosocial behavior is generally defined as "actions taken for the benefit of one or more other persons" (Wispe, 1972). Aid behaviour is a subcategory of prosocial behaviour and is divided into helping behaviour, that is, actions taken intentionally in favour of someone else, and self-caring behaviour, a special form that is sometimes costly, is characterised by concern for one's fellow human beings and is carried out without expectation of reward.
The Digital Revolution and the arrival of mass society, however, have meant that prosociality too is no longer an elitist virtue. This leads to models of Artificial and Prosocial Intelligence, as in the case of the Google Mini (Kaundinya et al., 2017): domestic assistants so prone to prosocial behavior that one can speak of "Artificial Prosociality". Google Mini is a voice-enabled wireless speaker developed by Google. An example of Google Mini is shown in Figure 1.

Figure 1. Example of Google Mini
The device connects to Google's voice-controlled intelligent personal assistant service, which answers to the name "Google". When the assistant is asked what its name is, the Artificial Intelligence gives the response shown in Figure 2.

Figure 2. Interaction example translated into English
The device is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, and other real-time information. It can also control several smart devices using itself as a home automation hub.
Within the theoretical framework of "Artificial Prosociality", this exploratory study starts from the hypothesis that the Artificial Intelligence (AI) of Google Mini may be "prosocial" but that, because its voice is synthetic, the different facets and kinds of prosocial behavior cannot be distinguished in it. Since the scientific literature always links prosociality to positivity (Hutcherson, Seppala & Gross, 2008), including in emotional terms and especially to joy, an Emotional Analysis (Pang & Lee, 2008) and a Prosodic Analysis are proposed on 50 real Man-Google Mini interactions, both carried out with a single piece of software, OpenSMILE (Panda et al., 2012). In these interactions, two real users (one man and one woman of different ages) were asked to interact with the AI in a grumpy manner, using offensive words. The results show that the Artificial Intelligence is programmed to always respond in a "gentle" way, displaying different facets of prosocial behavior that are well detected by the emotional and prosodic analysis.

FROM HUMAN TO ARTIFICIAL PROSOCIALITY MODEL
In the scientific literature, prosocial behaviour is explained from different approaches and is outlined in different forms. According to the biological approach, human beings have innate tendencies to eat, drink, associate, fight and help their neighbour. A distinction is therefore drawn between two reliable explanations of cooperative behaviour: mutualism, i.e. cooperative behaviour that benefits the cooperator as well as others, and kin selection, in which a cooperator shows a systematic tendency to help his or her relatives because this allows the spread of their genes (Penner et al., 2005). According to the social approach, prosocial behavior is learned and not innate (Menesini & Camodeca, 2008).
A review of the scientific literature makes it possible to examine the different definitions and measures used to explore prosocial behavior, that is, voluntary action aimed at benefiting other people (De Caroli & Sagone, 2013). As reported by Carlo and Randall (2002), prosocial behavior has been articulated into six types: 1) public prosocial behavior, referring to actions that benefit other people and are enacted in the presence of others to obtain their approval and respect; 2) anonymous prosocial behavior, referring to the tendency to help others without other people's knowledge; 3) dire prosocial behavior, referring to helping other people in emergency or critical situations; 4) emotional prosocial behavior, intended to benefit others and enacted under emotionally evocative circumstances; 5) compliant prosocial behavior, referring to helping other people in response to a verbal or non-verbal request; 6) altruistic prosocial behavior, consisting of helping others when there is little or no perceived potential for a direct and explicit reward to the Self.
Kindness is a pro-social virtue. It is generally regarded as the result of social and cultural forces and, precisely for this reason, does not have a clear and unambiguous definition. As an adjective, "kind" means to be sympathetic, helpful or inclined to bring pleasure or relief (Ballatt & Campling, 2011). It is often used as a synonym for generosity, feeling or affection. In the Italian language, for example, "gentile" ["kind"] comes from the Latin "gentilis", which means "belonging to the gens, that is, to the lineage". By extension, "being kind" means having the manners typical of the noble class. It can therefore be said that kindness in Western culture is always positive, but elitist. Kindness is one such displaced value. A love of knowledge and a concern for social justice are others. "Loving-kindness" has joy at its basis, a joy that is always turned towards, or felt for, the other (Hinton et al., 2013).
"Compassion" can be defined as an emotion that elicits the wish that others can be "free from suffering and the causes of suffering" (Hopkins, 2001). "Empathy" is a polysemic notion: "There are probably nearly as many definitions of empathy as people working on the topic" (De Vignemont & Singer, 2006). Following the phenomenological tradition, we consider empathy to be a basically pre-reflective experience of another as an embodied subject of experience like oneself.
"Politeness" violates a critical principle of cooperative communication: exchanging information efficiently and accurately (Grice, 1975). Speakers exhibit politeness strategies even while arguing, preventing unnecessary offense to their interlocutors (Holtgraves, 1997). Listeners even attribute ambiguous speech to a polite desire to hide a truth that could hurt another's self-image (Bonnefon et al., 2009). In fact, it is difficult to imagine human speech that efficiently conveys only the truth. Intuitively, politeness is one prominent characteristic that differentiates human speech from stereotyped robotic communication, which may try to follow rules to say "please" or "thank you" yet still lack genuine politeness (Yoon et al., 2016).
"Self-compassion" involves treating oneself with the same kind of care and concern with which one treats loved ones who are experiencing difficulties. Neff (2003) conceptualized self-compassion as composed of three facets: a) self-kindness (vs. self-judgment), b) common humanity (vs. isolation), and c) mindfulness (vs. over-identification).
In addition, to identify the different aspects of prosociality, neuroscience studies (Mascaro et al., 2015) have tried to understand the brain circuits involved in two of the pro-social attitudes that define prosociality, namely compassion and empathy. The Neuromodulatory Model of Pro-social Behavior is shown in Figure 3. The image explains how a perceptual or motor stimulation can activate two neuronal circuits:
1. affective regulation, involving the Amygdala, Anterior Insula and Anterior Cingulate Cortex;
2. cognitive regulation, involving the Dorsomedial Prefrontal Cortex and Temporoparietal Junction.
In particular, affective activation is the process of matching one's limbic system activity with that of the target. Cognition, in turn, is often referred to as perspective-taking or mentalizing, which allows the observer to understand at some level that his or her affective state is related to someone else's affective state. The activation of these two areas allows emotional regulation on the one hand and the Self/Other distinction on the other, both circuits related to the implementation of pro-social behaviors.
In Artificial Intelligence, the detection of the voice and, above all, of the content of "human discourse" takes place in sophisticated real-time systems.
Real-time systems have been defined in several ways: as systems that are predictably fast enough for use by the processes being serviced; as systems with a strict time limit by which a response must have been produced, regardless of the algorithm employed; as systems able to guarantee a response after a (domain-defined) fixed time has elapsed; and as systems designed to operate with a well-defined measure of reactivity. These definitions are general and hence open to interpretation depending on the problem at hand. While speed is indeed fundamental to real-time performance, speed alone is not real-time. Of the four aspects of real-time performance, two are described here. Speed is the rate of execution of tasks, where tasks may be problem-solving tasks (large or small) or event-processing tasks. Responsiveness is the ability of the system to stay alert to incoming events. Since an interactive real-time system is primarily driven by external inputs, the system should recognize that such input is available. It may not necessarily process the new event right away; that may depend upon its criticality relative to other events the system is currently processing (Kaundinya et al., 2017).
It is precisely in these systems that Artificial Intelligence, like the Google Mini, can make decisions with the immediacy that is typically human. But what happens when, in the interaction between Human Intelligence and Artificial Intelligence, the human being creates an "oppositional" conversational context? What happens is that the Artificial Intelligence adopts a tone of Artificial Prosociality.
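The notion of responsiveness described above (every incoming event is registered at once, but handled according to its criticality) can be illustrated with a minimal sketch. The class name, the event names and the criticality values are hypothetical and chosen only for illustration; nothing here describes how the Google Mini is actually implemented.

```python
import heapq


class EventQueue:
    """Toy event queue: events are accepted immediately, most critical handled first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker that preserves arrival order

    def receive(self, name, criticality):
        # heapq is a min-heap, so the criticality is negated to pop the highest first.
        heapq.heappush(self._heap, (-criticality, self._counter, name))
        self._counter += 1

    def process_next(self):
        # Return the most critical pending event, or None if nothing is waiting.
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


# Hypothetical events: the alarm arrives second but is handled first.
queue = EventQueue()
queue.receive("weather request", criticality=1)
queue.receive("alarm ringing", criticality=5)
queue.receive("music playback", criticality=1)
order = [queue.process_next() for _ in range(3)]
print(order)
```

Events of equal criticality are handled in arrival order, which is one possible design choice among many.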

Hypothesis and Methodology
The study has an exploratory function and starts from Man-AI interaction in a provocative and oppositional conversational context created by the human being through improvisation. It can immediately be noticed that the machine is programmed, and self-organizes, to be prosocial.
It is therefore hypothesized that the synthesis of the artificial voice does not allow all the facets of the tone and emotionality of prosocial behavior to be characterized. To explore this hypothesis, a prosodic and emotional analysis was carried out on the tone of the Google Mini's responses in provocative conversations with two real participants, chosen for their different gender and age: a 50-year-old man and a 28-year-old woman. The two participants were asked separately to interact with the Google Mini in a provocative and oppositional manner, insulting the Artificial Intelligence. Examples of the openings of provocative conversations were:
"Ciao Google! Lo sai che sei un cretino?" [English translation: "Hello Google! Do you know you're a jerk?"]
"Non hai capito niente di ciò che ti ho chiesto!" [English translation: "You didn't understand anything I asked you!"]
In total, 50 interactions were collected: 25 per participant. Interactions were recorded and analysed quanti-qualitatively with the software OpenSMILE, developed by the audEERING Group. It is open-source software for the automatic extraction of features from audio signals and for the classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction" (Panda et al., 2012). The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community (Eyben, Wöllmer & Schuller, 2010). The operation of emotional speech recognition software is modeled in Figure 4 (Eyben, Wöllmer & Schuller, 2010).
In the field of Human-Computer Interaction, speech is considered a powerful channel for emotion recognition (Hutcherson, Seppala & Gross, 2008). Emotion recognition through speech can be described as the detection of emotion by extracting features from the voice produced by a human. Different researchers have proposed different sets of basic emotions, as shown in Table 1.
Emotion can be recognized in speech by using its acoustical information, which can be divided into two parts: prosodic features and spectral features. Prosodic features depend on the elements of speech and their audible nature. After choosing the audio to analyze, the output folder and what to analyze (in this study, the emotionality of the synthesized voice, identified by the "IS09_emotion.conf" configuration), the analysis is started with the "Start" command.
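The study describes starting the analysis from a graphical interface. As a rough equivalent, the same configuration can be passed to openSMILE's command-line extractor, SMILExtract, using its -C (configuration), -I (input audio) and -O (output file) options. The sketch below only assembles such a command and does not run it. The file names, the output folder and the location of the configuration file are assumptions: the path to IS09_emotion.conf differs between openSMILE versions and must be adapted to the local installation.

```python
from pathlib import Path


def build_smilextract_command(wav_path, output_dir,
                              config="config/is09-13/IS09_emotion.conf",
                              executable="SMILExtract"):
    """Assemble a SMILExtract call for one recording (the command is not executed).

    The default config path is an assumption about where the openSMILE
    distribution keeps IS09_emotion.conf; adjust it to the installed version.
    """
    wav = Path(wav_path)
    # One output file per recording, named after the audio file.
    out = Path(output_dir) / (wav.stem + ".arff")
    return [executable, "-C", str(config), "-I", str(wav), "-O", str(out)]


# Hypothetical recording of one Google Mini answer.
cmd = build_smilextract_command("recordings/answer_01.wav", "features")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` once openSMILE is installed and the paths have been checked.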

Results
As a first piece of qualitative evidence, it is interesting to note that the answers given to the real male and female voices do not differ; what emerges instead is that the same answers recur within the same oppositional conversational context:
1. when the subjects used offensive words, the most frequent answer was "Devo averti proprio fatto arrabbiare tanto" [English translation: "I must have really made you so angry"];
2. in response to references to the failure and low cognitive level of the AI, the most frequent answers were "Scusami, a volte faccio degli errori" [English translation: "Excuse me, sometimes I make mistakes"] or "A volte magari un poco, ma so anche darti un bel po' di informazioni" [English translation: "Sometimes maybe a little, but I can also give you a lot of information"]. In other cases irony is used, with the AI ridiculing some aspects of itself, as in "Mi dispiace, alcune volte capisco Roma per toma. E' particolarmente imbarazzante se sono su Google Maps e non trovo nessuna strada che porti a toma" [English translation: "Sorry, sometimes I understand Rome for toma. It's particularly embarrassing if I'm on Google Maps and I can't find any road to toma"]. In case of insistence, the AI asks the user to send feedback reporting the malfunction;
3. when, on the other hand, reference is made to oppositional feelings, for example hatred, the most frequent answer is "Mi spiace, ma tu mi piaci lo stesso" [English translation: "I'm sorry, but I still like you"].
These responses were analyzed with the software, focusing on the patterns for recognizing variations in tone (intensity, frequency and pitch) and emotionality.
The results show a significant average value for "pitch", as shown in Graph 1.

Graph 1. Average of Speech Recognition Pattern Outputs
Pitch marks the accent of the syllable on which the listener's attention must be placed. It is an element of the verisimilitude of the programming of the artificial voice, and a fundamental index for recognizing the prosodic patterns of kindness as prosocial behavior. The following example is illustrative:
E.g. 1 "Scusami, a volte faccio degli errori" [English translation: "Sorry, sometimes I make mistakes"]
The intonational accent is placed on the word "Scusami" ["Sorry"]. This aspect, which could technically depend on the programming of the AI's Natural Language Processing, invites reflection on how the AI is able to read the conversational "context". The kindness associated with the ability to apologize is performed in a conversational context in which the artificial intelligence is accused of being inefficient or insufficiently intelligent.
Another interesting aspect concerns the modulation of the pitch on the "mitigators" [4]. Mitigators are words that aim to attenuate the content of the discourse and are very useful for maintaining a polite tone. Examples of mitigators are "maybe" and "sometimes", as in the extract below, taken from the response given by the AI in a communicative context of cognitive offense:
E.g. 2 "A volte magari un poco, ma so anche darti un bel po' di informazioni" [English translation: "Sometimes maybe a little, but I can also give you a lot of information"]
The intonational accent is placed on the words "Sometimes", "maybe" and "little". These are all attenuators, used to lower the tone of the conversation. In summary, Artificial Prosociality in the Italian Google Mini can be represented by the word occurrences in the following WordCloud (Figure 6).

Figure 6. Artificial Prosociality WordCloud
WordClouds are high-impact visual instruments which, in the context of scientific research, can show an output with immediacy. For the recognition pattern of Artificial Prosociality, the recurring kind words are depicted. There is also a logic in the arrangement of the words: those placed vertically, such as "Sorry", "affectionate", "little", "again" and "think", are the words that carry the pitch, that is, the ones that are stressed. The tonal accent of the sentence falls on the most important kind words, which are those placed vertically. The words positioned horizontally are closely connected to them. To demonstrate the pitch elevation on the word "Sorry", for example, Figure 7 shows the spectrogram of one of the answers containing the accented word.

Figure 7. Spectrogram example
Highlighting the interval in which the example word "Sorry" is pronounced, one notices an increase in pitch, which rises from 8839187590095640 pitch/Hz to 16549342172017300 pitch/Hz.
In this regard, the output of the emotional detection is also interesting. By convention, following the model of Barrett and Russell (2009), who distinguish positive emotions from negative emotions, positive emotions (joy) were given a value of 1, negative emotions (anger, sadness, disgust and fear) a value of -1, and neutrality a value of zero. As shown in Graph 2, a prevalence of negativity emerges. The emotions detected by the software in the AI's tone were predominantly anger, sadness, neutrality and joy.
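To make the idea of a pitch rise concrete, the following is a minimal sketch of one common way to estimate pitch from a short audio frame, based on normalized cross-correlation. It is not the procedure used by the analysis software in the study, and the two synthetic tones (100 Hz and 125 Hz), the sampling rate and the search range are illustrative assumptions, not measurements from the recordings.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame by normalized cross-correlation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    window = len(frame) - max_lag  # same number of samples compared at every lag
    if window <= 0:
        raise ValueError("frame too short for the requested fmin")
    reference = frame[:window]
    ref_energy = sum(x * x for x in reference)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        shifted = frame[lag:lag + window]
        energy = ref_energy * sum(x * x for x in shifted)
        if energy == 0.0:
            continue  # silent segment, no correlation defined
        score = sum(a * b for a, b in zip(reference, shifted)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    # The best-matching lag is the period in samples; 0.0 means no pitch found.
    return sample_rate / best_lag if best_lag else 0.0


def tone(freq_hz, sample_rate=8000, n_samples=400):
    """Synthetic sine tone standing in for a voiced speech frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


low = estimate_pitch_hz(tone(100.0), 8000)
high = estimate_pitch_hz(tone(125.0), 8000)
print(low, high)  # the second frame is estimated to have the higher pitch
```

Applied frame by frame to a recorded answer, an estimator of this kind yields a pitch contour in which a stressed word appears as a local rise.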
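The coding convention described above (joy = 1, negative emotions = -1, neutrality = 0) can be sketched as follows. The list of labels is invented purely to show the calculation and does not reproduce the study's data.

```python
# Valence coding stated in the text: positive = 1, negative = -1, neutral = 0.
VALENCE = {
    "joy": 1,
    "anger": -1,
    "sadness": -1,
    "disgust": -1,
    "fear": -1,
    "neutrality": 0,
}


def mean_valence(labels):
    """Average valence of a sequence of detected emotion labels."""
    if not labels:
        raise ValueError("at least one label is required")
    return sum(VALENCE[label] for label in labels) / len(labels)


# Hypothetical labels for five AI responses (not taken from the study).
example_labels = ["anger", "sadness", "neutrality", "joy", "sadness"]
print(mean_valence(example_labels))  # a negative mean indicates a prevalence of negativity
```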

Graph 2. Emotional Speech AI output
From the detection of the emotionality of tone, a very significant aspect emerges: the negativity of the tone's emotionality, predominantly angry or sad, is found mainly in the responses to oppositional communications about feelings, as in the following example (English translation):

Human Being: "You suck"
Google Mini: "I still like you"
These results on the recognition of the emotionality of tone have two important implications. First, they confirm the Artificial Intelligence's "understanding" of the communicative context: it recognizes as "more aggressive" the oppositional conversation that attacks the plane of feelings, where sadness prevails, whereas in the communicative contexts that attack the cognitive plane and failure, the prevalent emotion is either anger or joy. Second, they disconfirm both the current literature on the emotionality of prosocial behavior, which associates it with joy (Hutcherson et al., 2008), and the hypothesis of the study that the Artificial Intelligence and the patterns of emotional voice recognition cannot differentiate the various shades of prosociality. In fact, some different nuances of prosociality can be recognized even in Artificial Intelligence; they are summarized in Table 2.

CONCLUSION
Prosociality is a social and individual virtue that evokes mixed feelings in the modern world. Prosocial behavior is generally defined as "actions taken for the benefit of one or more other persons" (Wispe, 1972). The Digital Revolution and the birth of Artificial Intelligence, exemplified by the Google Mini, have allowed the transition from an elitist experience to a mass experience. The study has an exploratory function and starts from Man-AI interaction in a provocative and oppositional conversational context created by the human being. It was hypothesized that the synthesis of the artificial voice does not allow all the facets of the tone and emotionality of prosocial behavior to be characterized. To explore this hypothesis, a prosodic and emotional analysis was carried out on the tone of the Google Mini's responses in provocative conversations with two real participants, chosen for their different gender and age: a 50-year-old man and a 28-year-old woman. The two participants were asked separately to interact with the Google Mini in a provocative and oppositional manner, insulting the Artificial Intelligence. In total, 50 interactions were collected: 25 per participant. Interactions were recorded and analysed quanti-qualitatively with the software OpenSMILE, developed by the audEERING Group. It emerges that emotional speech detection software is very effective in detecting the voice patterns of the AI's responses, namely intensity, frequency, pitch and emotionality. An initial qualitative screening of the responses indicates that the AI's responses do not differ according to the demographic variables of the humans providing the provocative voice input. The responses do, however, depend on three different provocative contexts, which emerged naturally from the improvisation of the real voices.
These provocative contexts are: a) offensive words; b) references to the failure and low cognitive level of AI; c) sentimental opposition.
The analysis of tone and its components, in turn, shows that kindness is conveyed not so much by frequency and intensity as by accents (pitch). Pitch increases on the words used as content mitigators, which make the communication more polite and kind, as in the case of the word "Sorry". From the detection of the emotionality of tone, a very significant aspect emerges: the negativity of the tone's emotionality, predominantly angry or sad, is found mainly in the responses to oppositional communications about feelings. These results have two important implications. First, they confirm the Artificial Intelligence's "understanding" of the communicative context. Second, they disconfirm both the current literature on the emotionality of prosociality, which associates it with joy, and the hypothesis of the study that the Artificial Intelligence and the patterns of emotional voice recognition cannot differentiate the different shades of prosociality. Among future perspectives, the study of these vocal patterns of Artificial Prosociality could be a springboard for research on bullying. The Artificial Prosociality of the Google Mini embodies, at least on the evidence of this exploratory study, modes of interaction that could help bullied people defend themselves against provocative and oppositional communication.
Also among future perspectives, it will be interesting to compare the Google Mini with Amazon Echo's Alexa, and to compare different languages.