Modelling the emergence of cognitive styles

Individuals consistently differ in behaviour, exhibiting so-called personalities. In many species, individuals also differ in their cognitive abilities. When personalities and cognitive abilities occur in distinct combinations, they can be described as ‘cognitive styles’. Both empirical and theoretical investigations have produced contradictory or mixed results regarding the complex interplay between cognitive styles and environmental conditions. Here we use individual-based simulations to show that, under just slightly different environmental conditions, different cognitive styles exist and, under a variety of conditions, can also coexist. Coexistence is based on individual specialization on different resources or, more generally speaking, on individuals adopting different niches or microhabitats. The results presented here suggest that in many species, individuals of the same population may adopt different cognitive styles. The present study may thereby help to explain the variety of styles described in previous studies and why different, sometimes contradictory, results have been found under similar conditions.


Introduction
Joining studies of individual differences in cognition and of animal personalities leads to the exciting field of "cognitive styles". The concept of cognitive styles describes how individuals consistently differ in how they use their cognitive capacities in combination with consistent inter-individual differences in behaviours such as exploration, boldness or aggressiveness [reviewed in 1-5]. Empirical data support the existence of different cognitive styles in nature [e.g. 6-8]. Furthermore, the existence of animal personality in virtually all species tested [e.g. 9-11], combined with the fast-growing body of evidence of individual differences in cognitive abilities within species [reviewed in 5], makes it seem likely that different cognitive styles can be found in a vast variety of species and that this constitutes an important ecological and evolutionary aspect.
Interestingly, empirical studies often show opposing findings [reviewed in 8], and based on these and on theoretical considerations, different and contradictory predictions about cognitive styles have been formulated [see e.g. 1-3]. Probably the most influential of these, the proactive-reactive framework, states that "proactive" individuals tend to be bold and explorative, forming behavioural routines quickly but having trouble incorporating new information about the environment [2]. The latter may limit the performance of this behavioural type in many cognitive tasks. At the opposite end of this continuum are the so-called "reactive" individuals, which tend to be shy and less explorative but more sensitive towards environmental cues and changes in their surroundings. It has been hypothesized that these individuals should be better at dealing with some cognitive challenges, especially when the tasks require reversing previously formed associations [2]. Indeed, experimental studies have found supporting evidence for these behavioural/cognitive types in some species [reviewed in e.g. 2,8]. However, other studies found different combinations of behavioural and cognitive characteristics, contradicting the behavioural and cognitive types proposed by the "proactive-reactive" framework. For example, in some fish [12], bird [13], and mammal [14] species, bolder or more explorative individuals were generally better at cognitive tasks than shyer individuals. Yet other studies found only mixed, weak, or even no correlations between cognitive performance and exploration or activity level [e.g. 15-17].
While some differences in these results may be explained by methodological design, we believe that many of the demonstrated differences in previous studies are ecologically meaningful and reflect differences in the evolution and development of cognitive styles. It has been shown that the expression of traits underlying cognitive styles can crucially depend on environmental conditions [personality traits e.g. 18-20; cognition e.g. 21-23; brain morphology e.g. 24-26].
In particular, predation pressure is regarded as a major environmental factor which may strongly influence the development of consistent inter-individual differences in behaviour [e.g. 27-29, but see 30]. Based on the above stated findings and considerations, we believe that many different cognitive styles can emerge depending on the precise ecological circumstances in which individuals live.
Using individual-based simulations, we want to show here i) that different cognitive styles evolve under different environmental conditions, explaining apparently contradictory evidence from experimental and theoretical studies, and ii) that even within the same environment, different cognitive styles can coexist, which may help to explain the existence of large differences in cognitive abilities within a species. While in nature a huge variety of factors will influence this relationship, we concentrate here on two traits of individuals (namely, exploration tendency and learning ability) and two features of the environment (namely, complexity in terms of different resource types, and predation pressure). Taking these four variables, we investigate the effect of environmental conditions on the development/evolution of learning skills and exploration tendency in individuals of the same population. Our results can help explain apparently contradictory findings of previous studies and outline complex interactions between individual traits and environmental conditions with regard to the emergence of cognitive styles.

Methods
The models presented here are an extension of a model used in previous work (Liedtke and Fromhage, submitted). We implemented populations of N_Individuals individuals in which three traits can evolve independently: learning ability L, exploration tendency E, and selectiveness S. Both L and E are continuous traits and can take values between 0 and 1. S is binary and can be either 0 or 1. Simulations are run for N_Generations discrete generations (= seasons). At the end of each season, individuals reproduce asexually in relation to their fitness. Fitness is determined by the amount and value of resources an individual collected throughout its lifetime. Each season has T days, which sets the maximum lifespan of individuals. Each day consists of N_Steps steps through which each individual proceeds. At the beginning of each day, the order of individuals is randomized to ensure equal chances.
Predation: Predation is implemented by introducing three different predator types (P1, P2, and P3), which are defined by their baseline probability of being encountered (P_P) and their lethality λ_P, i.e. how likely an individual is to die when encountering this predator type. Whenever an individual moves in order to explore its environment, it is vulnerable to predation. We calculate the probability of a predator attack from a binomial distribution (Eq. 1), in which E_k is the focal individual's exploration tendency. Thus, the more explorative an individual is, i.e. the more it moves around, the more likely a predator attack becomes. When an attack happens, the type of predator is sampled according to the relative probabilities of the predator types (P_P1, P_P2, P_P3). The individual under attack survives with the probability given in Eq. 2.
Actions: During each time step, an individual can perform one of the following actions: rest (and hide), explore (searching for resources), handle resources, or escape a predator.
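The predator-type sampling and survival check described under Predation can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the numeric encounter probabilities and lethalities are placeholders, the function names are hypothetical, and the survival probability is assumed to be 1 minus the lethality, which matches the verbal definition of lethality but may differ from the exact form of Eq. 2.

```python
import random

# Placeholder parameter values for illustration only (not taken from the study).
ENCOUNTER_PROB = {"P1": 0.02, "P2": 0.01, "P3": 0.01}  # baseline encounter probabilities
LETHALITY = {"P1": 0.2, "P2": 0.5, "P3": 0.8}          # probability of dying if attacked


def sample_predator(encounter_prob, rng):
    """Pick the attacking predator type in proportion to its encounter probability."""
    types = list(encounter_prob)
    weights = [encounter_prob[t] for t in types]
    return rng.choices(types, weights=weights, k=1)[0]


def survives(predator, lethality, rng):
    """Assumption: an attacked individual survives with probability 1 - lethality."""
    return rng.random() >= lethality[predator]


rng = random.Random(42)
attacker = sample_predator(ENCOUNTER_PROB, rng)
alive = survives(attacker, LETHALITY, rng)
```

Keeping lethality in a per-individual dictionary would make it straightforward to lower the value for one predator type after a survived attack, as the learning rule requires.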
At the beginning of each day, or in any time step after an action has been completed, it is determined whether an individual will move in the current time step. If the individual's cumulative exploration tendency (C) is above a randomly drawn threshold (between 0 and 1), it will move; otherwise it will rest. Cumulative exploration tendency means that each time an individual rests, its C increases by the value of E_k. For example, if the focal individual has E = 0.3 and has rested for the previous two time steps, its C in the current time step becomes 0.9. Thus, it will move with 90% probability. Consequently, individuals with E ≥ 0.5 never rest more than once in a row. E is genetically encoded by a single locus, whose initial allelic values are randomly sampled from a uniform distribution between 0 and 1.
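The movement rule can be illustrated with a short sketch. The function names are hypothetical, and the code assumes that C equals E when the individual has not yet rested and grows by E with every consecutive rest, which reproduces the worked example in the text (E = 0.3, two rests, C = 0.9).

```python
import random


def cumulative_tendency(e, consecutive_rests):
    """C starts at E and increases by E for every consecutive rest."""
    return (consecutive_rests + 1) * e


def will_move(e, consecutive_rests, rng):
    """Move if C exceeds a threshold drawn uniformly from [0, 1)."""
    return cumulative_tendency(e, consecutive_rests) > rng.random()


rng = random.Random(0)
c = cumulative_tendency(0.3, 2)    # approximately 0.9, as in the text
moves = will_move(0.5, 1, rng)     # C = 1.0, so the individual always moves
```

Because the threshold never reaches 1, any individual whose C has reached 1 moves with certainty, which is why E ≥ 0.5 rules out two rests in a row.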
If an individual moves, it visits a randomly chosen site. Here it may encounter a predator with the probability given in Eq. 1 and survive its attack with the probability given in Eq. 2. If it survives, it has the possibility to learn anti-predation behaviour (see below), after which the time step is over. If it dies, the individual does not participate in any further actions. If a moving individual is not attacked, it can explore the randomly chosen site and search for resources. If it enters a site containing a resource, it finds the resource with a probability that depends on the detectability D_R of that resource type and on its own exploration tendency. Individuals can then start handling the resource and, depending on the handling time of this resource type, obtain its value. If the handling time is larger than 1, the individual continues to reduce the residual handling time by 1 unit in each following time step, until the residual handling time reaches 0 and the resource value is gained. When only one time step is left in the current day, the individual needs to stop handling the resource and return to its hiding place without obtaining the reward. When a resource has been successfully exploited, the site in which it was found is emptied and not refilled. Thus, any exploitation of a resource reduces the likelihood of finding a resource in subsequent exploration attempts for all individuals until the end of the season.
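The handling rule can be sketched as a small helper. The function name and the exact off-by-one convention are assumptions: here the last time step of the day is reserved for returning to the hiding place, so an individual with `steps_left_in_day` steps remaining can spend at most one fewer on handling.

```python
from typing import Tuple


def handle_resource(handling_time: int, steps_left_in_day: int) -> Tuple[bool, int]:
    """Return (reward_obtained, steps_spent_handling).

    Assumption: the final step of the day cannot be used for handling,
    because the individual has to return to its hiding place.
    """
    usable_steps = max(steps_left_in_day - 1, 0)
    steps_spent = min(handling_time, usable_steps)
    reward_obtained = steps_spent == handling_time
    return reward_obtained, steps_spent


quick = handle_resource(3, 10)    # handling completed within the day
slow = handle_resource(15, 10)    # interrupted at the end of the day
```

The number of steps spent is returned even when handling is interrupted, since the text states that incomplete handling still counts as a learning opportunity.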
Learning: We implement resource learning as a reduction of handling time due to experience with a given resource type. Each time an individual ends handling a resource type whose handling time is larger than a given minimum (HTmin = 3 in all presented cases), the handling time for this individual and this resource type is updated. In this update, L is the focal individual's learning speed; t is the number of time steps spent handling the resource item; h_{i,Initial} is the initial handling time for resource type R_i at the beginning of the current encounter; and t/h_{i,Initial} is the proportion of the learning episode that was completed. The maximization function max[.] ensures that handling times cannot drop below 3 (i.e. HTmin). L is genetically encoded by a single locus, whose initial allelic values are randomly sampled from a uniform distribution between 0 and 1.
Similarly to resource learning, the lethality of a predator type can be reduced through learning each time an individual survives an attack. After an unsuccessful attack by a specific predator, the current lethality of this predator type is updated for the focal individual. In this update, L is the focal individual's learning speed; λ_P is the current lethality for this predator type, which is identical for all individuals at the beginning of the season (i.e. before any learning has taken place); and β is a parameter defining the general speed of predation learning. The lethality of predators cannot be reduced below 1/10 of their original value (at the beginning of the season, before any learning has taken place). Thus, predators always have a minimum lethality no matter how often an individual has survived an attack by that predator type.
Selectiveness: We implemented individuals as being either selective or non-selective foragers. Selective individuals handle only resources whose handling time they can complete by the end of the day. Resources with longer handling times are rejected immediately, and the individual can move to a new site in the next time step. Non-selective individuals handle any resources they find. This can lead to handling being interrupted prematurely at the end of the day, yielding no immediate reward. Yet, such incomplete handling of resources still provides an opportunity for learning. Therefore, non-selective individuals can eventually learn to collect resources whose initial handling times exceed a day's length. Selectiveness is genetically implemented by one locus with two alleles, determining individuals to be either selective (S = 1) or non-selective (S = 0). The initial allelic values are randomly sampled with equal probability.
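The difference between the two foraging types comes down to one acceptance check, sketched below. The function name is hypothetical, and the code assumes, as an illustration, that the final time step of the day cannot be used for handling.

```python
def accepts_resource(selective: bool, handling_time: int, steps_left_in_day: int) -> bool:
    """Decide whether a forager starts handling a resource it has found.

    Assumption: one step at the end of the day is needed to return to the
    hiding place, so only steps_left_in_day - 1 steps are usable.
    """
    if not selective:
        return True  # non-selective foragers (S = 0) handle anything they find
    return handling_time <= steps_left_in_day - 1


selective_choice = accepts_resource(True, 15, 10)       # rejected: cannot be finished today
non_selective_choice = accepts_resource(False, 15, 10)  # accepted, may be interrupted
```

Under this rule only non-selective foragers ever gain experience with resources that cannot initially be finished within a day, which is the route by which they can eventually learn to exploit them.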

Reproduction: We assumed an 'income breeder' system in which all individuals, independently of their survival until the end of the season, obtain offspring in relation to the total value of the resources they collected throughout their lifetime. Reproductive success (F) is calculated from three quantities: V_Total, the total value of collected resources; L, the individual's learning speed; and α, a cost coefficient that specifies the cost of learning. No costs of exploration (E) are explicitly included in this calculation, as they are implicit in the risk of overlooking resources and attracting predators. The next generation is recruited by randomly sampling offspring from the present generation, using F as the independent sampling probabilities.
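Recruitment of the next generation can be sketched as fitness-proportional sampling. The sketch takes the reproductive success values F as given, because the formula combining V_Total, L and α is not reproduced here. Sampling with replacement, the function name and the fallback for a population with zero total fitness are assumptions.

```python
import random


def next_generation(parents, fitness, n_offspring, rng):
    """Sample parents of the next generation in proportion to their fitness F."""
    if sum(fitness) <= 0:
        # Assumed fallback: if nobody collected anything, sample uniformly.
        return rng.choices(parents, k=n_offspring)
    return rng.choices(parents, weights=fitness, k=n_offspring)


rng = random.Random(7)
parents = ["a", "b", "c"]
fitness = [5.0, 0.0, 1.0]   # placeholder F values for illustration
offspring = next_generation(parents, fitness, 10, rng)
```

Each offspring would then inherit its parent's L, E and S values before mutation is applied.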
Mutation: All three traits, L, E, and S, are independently subject to mutation. Mutation probability is set to q = 0.1 for each trait. For the continuous traits L and E, new trait values are drawn from a normal distribution with a mean equal to the parental trait value and an SD of 0.1. For the binary trait S, a mutation event changes the value from one state to the other (i.e. from 1 to 0 and vice versa).
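A minimal sketch of the mutation step is given below. The function names are hypothetical, and clamping mutated values to the interval [0, 1] is an assumption: the text states that L and E lie between 0 and 1 but does not say how out-of-range draws are handled.

```python
import random


def mutate_continuous(value, rng, q=0.1, sd=0.1):
    """With probability q, redraw the trait from Normal(parental value, sd)."""
    if rng.random() < q:
        value = rng.gauss(value, sd)
        value = min(1.0, max(0.0, value))  # assumption: keep L and E within [0, 1]
    return value


def mutate_binary(value, rng, q=0.1):
    """With probability q, flip the binary trait S between 0 and 1."""
    return 1 - value if rng.random() < q else value


rng = random.Random(3)
child_l = mutate_continuous(0.95, rng)
child_s = mutate_binary(1, rng)
```

Calling the functions separately for L, E and S keeps the three mutation events independent, as described above.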

Results
We heuristically explored the parameter space for conditions under which different cognitive styles exist while changing, for simplicity, as few parameters as possible. For the main findings we therefore changed the value of only three parameters unless stated otherwise. We found circumstances under which different combinations of the two individual traits L and E predominated in the population (see Fig. 1). We also found various cases of two different cognitive styles coexisting within the same population (see Fig. 2). The values of only two parameters needed to be changed in order to find these results. One is the detectability (D_R), which was either low (0) or high (0.9) for either resource type (R1 or R2). The other parameter is the length of the season (T; i.e. the maximal lifespan of individuals). Only in order to obtain a pure high-E, high-L cognitive style (Fig. 1b) did we need to increase the abundance of the high-value resource, so that an alternative style which exploited low-value resources was not adaptive even for a small portion of the population. Predation pressure (i.e. how likely an attack was and how lethal it was) was not needed to obtain these results. Nevertheless, this factor had a strong influence (see below).
As expected, we found no investment in learning (low L) whenever there was nothing to learn, i.e. when handling times of resources were low and predators were absent. Low L could also occur whenever individuals could not learn fast enough because the season length (lifespan) was too short or because predation pressure was so high that individuals were killed before they could learn sufficiently. In this way, predation could prevent the existence of 'fast learning' styles (see Fig. 3a). On the other hand, predation pressure could also lead to the evolution of high L in an otherwise "non-learning" environment (i.e. in an environment with only resources with low handling times, or one in which resources with high handling times were not worth learning to exploit). If predation pressure was not too severe, individuals could benefit from investing in learning abilities in order to reduce predation pressure and increase their expected lifespan, thereby increasing their overall income of resources (see Fig. 3b). Furthermore, predation could also hinder the existence of high exploration tendencies (high E), because the faster an individual explores, the more likely it is to attract predators (see Fig. S1).
Exploration tendency also depends strongly on how easily resources are detected. When resources are conspicuous, individuals can find them even when exploring fast; hence high E becomes adaptive. However, whenever resources are hard to find (i.e. D_R is high), low E can yield higher payoffs as it ensures that resources are not overlooked. Note that, since individuals need to explore in order to find anything at all, a minimum E (> 0) is to be expected. In our simulations without predation, the optimal exploration tendency was around 0.4. Due to the cumulative exploration tendency (C), this value of E ensures that individuals will most likely explore at least every second time step, while keeping the risk of overlooking resources moderately low. However, high exploration may be needed when life is very short, so that, to ensure finding any resources at all, individuals need to explore in each time step, regardless of the risk of predation and of overlooking resources.
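The statement about E ≈ 0.4 can be checked with a short calculation based on the movement rule from the Methods. The sketch assumes that the probability of resting in a step equals 1 − C (the threshold being uniform between 0 and 1), that C equals E before the first rest and grows by E per rest, and that nothing else interrupts the sequence; the function name is hypothetical.

```python
def prob_resting_streak(e, n):
    """Probability of resting in n consecutive decision steps.

    After k rests C equals (k + 1) * e, so the chance of resting again
    is max(0, 1 - (k + 1) * e).
    """
    p = 1.0
    for k in range(n):
        p *= max(0.0, 1.0 - (k + 1) * e)
    return p


two_rests = prob_resting_streak(0.4, 2)     # 0.6 * 0.2, approximately 0.12
three_rests = prob_resting_streak(0.4, 3)   # 0.0, because C exceeds 1 after two rests
```

Under these assumptions an individual with E = 0.4 rests twice in a row only about 12% of the time and never three times in a row, which is consistent with exploring at least every second time step in most cases.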
We found coexistence of cognitive styles when individuals specialize in exploiting one of the two resource types (Fig. 2). In the results presented here, R1 was always a low-valued resource (V_R = 1) which did not necessitate any learning, while R2 always had a high handling time (H_R = 15), which could be reduced through learning, and was higher-valued (V_R = 15). Coexistence under these conditions can occur, for example, when the high-valued resource (R2) has a long handling time that necessitates learning, while being relatively rare. Some individuals may then invest in high L, whereas others will instead explore fast and exploit less-valued but more abundant resources (R1). Due to negative frequency-dependence, this pattern can also occur the other way around when we tweak the parameters a little, so as to make the more valuable R2 resource relatively easier to find. Most individuals then invest in high L and exploit the more valuable and now easier-to-find R2 resources. Some individuals, however, will avoid competition and specialize on less-valued R1 resources even if they are hard to find. But since most other conspecifics will not exploit them (as they mostly overlook them while quickly exploring for easy-to-find R2), the less common 'slow explorer' can find relatively many R1 and thereby gain a similar payoff to that of fast-exploring and fast-learning individuals searching for R2 (Fig. 2).
Predation influenced the co-existence of two cognitive styles as well. Within a wide range of parameter space, predation can hinder co-existence by making fast exploration less beneficial (Fig. S2a-b). Moreover, predation can also make slow learning less beneficial (Fig. S2c), as slow learners are not able to learn sufficiently to reduce the lethality of predators. Alternatively, under conditions where even fast learning will not reduce predation sufficiently, fast learning styles are prevented (Fig. S2d). Yet, within a narrow parameter space, predation can also induce co-existence (Fig. S3) by reducing the payoffs of a fast learning style, making a slow learning strategy competitive. In one out of ten simulation runs, however, the co-existence of two styles collapsed due to the extinction of the fast learning strategy. This was likely caused by a combination of stochastic events and high predation pressure.
Coexistence can also occur when individuals of both cognitive styles show the same exploration tendency (E) (Fig. 2c and 2d). This can occur when both types of resources (R1 and R2) are easy to find and thus select for fast exploring (Fig. 2c). Some individuals may then specialize on the more abundant R1, with low handling times but lower value. Other individuals invest in higher L and exploit R2, which needs to be higher-valued. Thus, a fast-exploring and slow-learning cognitive style can occur alongside a fast-exploring and fast-learning style within the same environment. Similarly, when both resource types are hard to find, two cognitive styles with low E can coexist if some individuals specialize on low-valued but easy-to-exploit resources (R1) and others on high-valued but hard-to-exploit resources (R2) (Fig. 2d). These coexistences, which arise due to negative frequency-dependence, can be found in a moderately wide range of parameter space.
Finally, we expected to find the coexistence of different cognitive styles with the same learning strategy (L). However, we could not find any parameter space in which either a slow learning strategy could exist in combination with both high and low exploration, or in which a fast learning strategy could exist in combination with both high and low exploration. Although we found no evidence for this in our present model, this does not mean that these styles could not coexist in any model or environment.

Discussion
We found that combinations of the environmental factors "resource composition" and "predation" can select for a variety of cognitive styles. Depending on the value of these factors, our results are in line with the overall predictions of the proactive-reactive framework [2]: under certain circumstances, proactive (reactive) individuals invest less (more) in learning abilities. However, under just slightly different environmental conditions, the patterns are reversed, thereby being consistent with findings which oppose the predictions of the proactive-reactive framework.
How can we explain the emergence of these patterns in our simulations? For example, in dangerous environments in which resources are easy to exploit and thus do not necessitate any learning, individuals can gain the highest fitness by adopting a risky strategy. Individuals which accept a higher predation risk can explore more and thereby collect more resource items if they manage to survive long enough. This style, which represents a more proactive behavioural type, comes to predominate in the population because shy (reactive) types collect few resources despite suffering less predation.
However, if circumstances allow for effective anti-predation learning, increased learning skills combined with high exploration tendencies become the most adaptive cognitive style. Such a fast learning and highly active cognitive style is opposed to what is commonly expected under the proactive-reactive framework, but has been found in several species [e.g. 13,14,31].
When resources are present for which an investment in higher learning abilities is needed in order to exploit them, a different set of cognitive styles can be found. Under these circumstances, fast learning strategies become adaptive if lifespans are long enough to allow for handling the resources through learning. Whether individuals show high or low exploration tendencies depends both on how easily resources are found and on how severe predation pressure is. Furthermore, we found under a large range of environmental conditions that different cognitive styles can co-exist within the same population. Because specialization on a resource type interacts with the optimal search pattern (exploration tendency), fast and slow "styles" can co-exist. Frequency-dependence of these styles may stabilize their coexistence, as suggested by Boogert and colleagues [5, compare also 32]. For example, in one population some individuals can specialize on easy-to-find and easy-to-handle resources and thus exhibit a slow learning / fast exploration style, whereas other individuals can exploit resources which are hard to find and require learning abilities, thus exhibiting a fast learning / slow exploration style. All other possible combinations of these two individual traits can co-exist under specific environmental circumstances in our simulations. These results can therefore help to explain why different studies find such a large variety of behavioural and cognitive styles in nature, even within the same study system and under similar environmental conditions. Furthermore, it is conceivable that uncontrolled variables of the environment cause slightly different circumstances in two studies (e.g. small differences in predation pressure or in resource composition between two populations). Alternatively, depending on the sampling regime, one of two or more co-existing cognitive styles may be captured more frequently in one study than in another. When behavioural and cognitive tasks are conducted with these non-random subsets of individuals, this will likely lead to different population averages in performance.
In line with what has been suggested for individual specialisation in general [33], the coexistence of different cognitive styles may stabilize populations, as microhabitats can be occupied more efficiently and within-species competition can be reduced when individuals with different styles exploit, at least partly, different resources. Inter-individual differences can also facilitate speciation [e.g. 34,35], underlining their importance for ecology and evolution in general.
In our simulations, predation strongly influences the existence of cognitive styles, as has previously been shown for behavioural syndromes [reviewed in 28]. Predation can cause the development / evolution of alternative styles in an otherwise similar environment. In general, predation reduces exploration tendency. But under some circumstances, this effect is not found [see also 29,36]. For example, lifespan can be so short that individuals need to have a high exploration tendency and face the risk of predation, because otherwise they may not collect any resources at all. Or, if learning of predation avoidance is efficient enough to render the predation risk negligible, high exploration becomes more adaptive.
Furthermore, predation can also break down the co-existence by making only one strategy adaptive under given circumstances. However, predation can also cause the co-existence of cognitive styles e.g. by reducing lifespans to such an extent that investment in learning becomes less profitable, thus rendering slow learning strategies competitive. These effects were found in a limited parameter space only, which, however, is in line with findings of predators' effects on co-existence of interspecific competitors [reviewed in 37].
It would be interesting to investigate how social learning may influence this pattern. For example, in group-living species, shy individuals may learn anti-predator behaviour by observing bolder or more explorative individuals coping with predator encounters. Slow explorers or shyer individuals could thereby reduce the lethality of predators without increasing their own predation risk. This could create an interesting interplay in the evolution of bold individual learners and shy social learners.
Of course, our simulations are based on many simplifications, which limits their transferability to natural systems. However, these simplifications allow us to identify some more general principles.
We assumed that the trait "L" allows for learning in two different situations: anti-predator behaviour and handling resources. One might argue that this is an unjustified simplification, as these situations represent cognitive problems from two different domains. Indeed, this could be a valid point. However, we are confident that even with two independently evolving learning traits our main findings would remain qualitatively the same, i.e. that different environmental conditions can select for all combinations of exploration and learning styles and that these styles could in principle co-exist in the same population. Yet the parameter space under which similar strategies would be found will certainly shift to some degree. And of course, with more evolving traits, we would likely find more cognitive styles, e.g. some fast explorers which are good at anti-predator learning but slow at reducing resource handling times, and vice versa.
Moreover, the assumption that learning abilities such as associative learning can be domain-general may not be an unjustified simplification after all. In fact, studies have shown that, at least in some taxa, animals show "general intelligence", meaning that species, or individuals, which score high in one cognitive task also score high in tasks of other cognitive domains [reviewed e.g. in 38]. It is conceivable that mechanisms such as simple associative abilities allow learning in different situations and that our simulation may be realistic in this regard.
We also want to point out that, although the models presented here are based on genetic adaptation, we would expect similar outcomes if adaptive phenotypes, in our case cognitive styles, were generated by phenotypic plasticity. Both plastic and genetic responses lead to phenotypes which are adapted to local conditions. Under similar conditions we would expect the same phenotypes to occur whether caused by genetic or plastic responses of the study system. We therefore expect that the general conclusions of the present study can be transferred to systems in which differences in cognitive styles are generated by plasticity.
In this study, we regard the interplay of five aspects: exploration, learning, environmental complexity (implemented as "resource composition"), predation pressure, and maximum lifespan. We chose these aspects because they are often investigated and discussed in regard to animal personality, coping or cognitive style. However, of course, many other aspects of the environment and the species living in it are likely to influence the evolution and development of cognitive styles. For example, instead of handling resources, other environmental aspects may need to be learned, such as navigation through space [40], or nest-building [41]. Also, when interacting with conspecifics, cognitive styles may strongly be influenced by social learning skills. If learning is involved in interactions with other intelligent agents such as conspecifics or predators, interesting dynamics may occur in the evolution of cognitive styles. This may be a worthy field of further investigations which may help to understand the evolution of animal intelligence in general.
As a final remark, we want to point out that much work, both theoretical and experimental, has been done on the co-existence of competing species, and some general conclusions may be transferable to a within-species context. Thereby, the scientifically younger field of individual differences (i.e. behavioural types, coping styles, animal personality or cognitive styles) may benefit from decades of research on interactions between species. On the other hand, no such generalisations may be possible when within-species processes such as sexual selection or kin competition are involved.

Co-existence of different cognitive styles within the same environment. Each panel shows the result of one simulation as an example from 10 replicate runs. All replicates produced qualitatively similar results. Each simulation was run with N = 1000, G = 500 and without predation. The only differences in parameter settings between panels were in resource detectability (D_R) and season length (T).