Shai Ophir 
1. Cooperation and Game Theory
Ever since Darwin, the existence of an apparent contradiction to his evolutionary theory has raised the interests of naturalists all over the world: some cooperating animals are clearly mutualistic or even altruistic. Before the 1960s only a few scientists attempted to understand the evolutionary processes underlying cooperation, since group selection seemed to explain cooperative societies. Yet research in later years could not support a pervasive groupbenefit view of selection; how then can cooperative genotypes spread in an environment of selfish genes? Currently, accounts of the evolution of cooperation can be divided into several general categories: e.g. 1) byproduct mutualism where cooperation is an incidental outcome from genuinely selfish behavior, 2) kinselected altruism with its climax in the social insects and 3) reciprocal altruism among unrelated individuals. The Prisoner's Dilemma is used as the standard metaphor to conceptualize the conflict between mutual support and selfish exploitation among interacting nonrelatives in biological communities.
The Prisoner’s Dilemma game is issued from the Game Theory of John von Neumann (see [30]) and Oskar Morgenstern and has been introduced by RAND game theorists Merrill Flood and Melvin Dresher in 1952. The idea was to introduce some irrationality in Game Theory, which is used as a way of modelling interactions between individuals.
The game got its name from the following hypothetical situation: imagine two criminals arrested under the suspicion of having committed a crime together. However, the police do not have sufficient proof in order to have them convicted. The two prisoners are isolated from each other, and the police visit each of them and offer a deal: the one who offers evidence against the other one will be freed. If neither of them accepts the offer, they are in fact cooperating against the police, and both of them will get only a small punishment because of lack of proof. They both gain. However, if one of them betrays the other one, by confessing to the police, the defector will gain more, since he is freed; the one who remained silent, on the other hand, will receive the full punishment, since he did not help the police, and there is sufficient proof. If both betray, both will be punished, but less severely than if they had refused to talk. The dilemma resides in the fact that each prisoner has a choice between only two options, but cannot make a good decision without knowing what the other one will do.
This game has been found to be a very good way of studying cooperation and evolution of cooperation and thus a theory of cooperation based upon reciprocity has been set in a wide literature, such as in [1, 3, 4]. The experimental studies of the game and its strategies need a lot of time computation and thus with the progress of computers, a lot of computerscientists and mathematicians have studied it as they have been able to use specific methods, like genetic algorithms, on it. [9, 2, 5, 8, 19, 20, 21, 26, 27, 28, 31].
As cooperation is a topic of continuing interest for the social, zoological and biological sciences, a lot of works in those different fields have been made on the Prisoner’s Dilemma: [6, 7, 16, 17, 22, 23, 25, 29, 24]. Although the researchers who have studied the game come from different research fields, it could be said that they are all working on the same topic, which belong to the Artificial Life field, the bottomup study of Life.
2. The Iterated Prisoner’s Dilemma (IPD)
Let two artificial agents have the choice between cooperation and defection. They play one against the other, in a synchronous manner, so that they do not know what the other will play. They get a score according to the situation of the move:
· They both cooperate and both get the cooperation reward, let it be equivalent to R points;
· They both defect and both get the selfish punishment, let it be equivalent to P points.
· One chooses to defect while the other chooses to cooperate, then the one who has defected gets the selfish temptation salary, let it be T points, and the one who has cooperated gets the sucker score, let it be S points.
To have a dilemma, temptation must be better than cooperation, which must be better than punishment, which must be better than to be the sucker. This can be formalized as:
T > R > P > S
Since this oneshot version of the Prisoner's Dilemma is not very interesting (the most rational choice is to defect), the game is iterated, the final score being the sum of all the moves scored. Each player does not know how many moves there will be, thus each agent's strategy can be studied, to see, for instance, how each player tries to put cooperation in the game.
To avoid the oneshot Prisoner's Dilemma solution to influence strategies, by giving too much importance to temptation regarding cooperation, it is useful to add the following restriction:
the reward for cooperation is greater than the average of a defector’s and a sucker’s payoff.
2R > T + S
With this restriction, strategies have no advantage in alternatively cooperating and defecting. The classical payoff matrix used is shown in table 1.
Action of B \ Action of A 
Player A cooperates 
Player A defects 
Player B cooperates 
R=3 (reward for mutual cooperation) 
S=0 (sucker’s payoff) 
Player B defects 
T=5 (temptation to defect) 
P=1 (punishment for mutual defection) 
Table 1: Iterated Prisoner’s Dilemma (IPD) scoring for player B.
The two participants have two options: to cooperate or to defect. The payoff to player B is shown. Obviously, he gains more by defection not only if player A cooperates (5 instead of 3 points), but also if player A defects (1 instead of 0 points). Since this is also valid for player A, both end up defecting and score 1 point instead of 3 points each if they had cooperated. Hence the dilemma.
Using evolutionary dynamics software implementation, the IPD can be simulated as an iterated game, where the players (virtual or real players) learn from the results of previous iterations and dynamically select the appropriate strategy among various possible strategies. Strategies can be applied in an evolutionary process, where the fittest strategy survives.
3. The Emergence of Cooperation
Axelrod, the most famous researcher in the area of Evolution and Cooperation concludes his experiments with IPD strategies giving a recipe for a winning strategy:
The advice takes the form of four simple suggestions for how to do well in a durable Iterated Prisoner's Dilemma:
1. Don't be envious,
2. Don't be the first to defect,
3. Reciprocate both cooperation and defection,
4. Don't be too clever. [1, page 110]
When Axelrod writes that, he states that a strategy has to be understandable, hence simple, mainly for its opponents to understand that all that is required is the establishment of a cooperation period. He thinks that if the strategy clearly announces how it will act, then the other player will be able to cooperate more quickly.
Axelrod and Hamilton [4] used a computer tournament to detect strategies that would favor cooperation among individuals engaged in the IPD. In a first round, 14 more or less sophisticated strategies and one totally random strategy competed against each other for the highest average scores in an IPD of 200 moves. Unexpectedly, a very simple strategy did outstandingly well: cooperate on the first move and then copy your opponent's last move for all subsequent moves.
This strategy was called 'Tit for tat' (TFT) and became the founder of an ever growing amount of successful strategies. In a similar competition with 62 contestants, TFT won again. It has three characteristics that account for its impressive performance: it is nice (cooperates on the first move), retaliatory (punishes defection in the prior move with defection) and forgiving (immediate return to cooperation after one cooperative move of the adversary).
In order to study the behavior of strategies, two types of computation can be made.
The first one is a simple roundrobin tournament, in which each strategy meets all other strategies. Its final score is then the sum of all scores done in each confrontation. At the end, the strategy's strength measurement is given by its range in the tournament. This is the way the titfortat strategy has been isolated by Axelrod.
The second one is an evolutionary approach, where at the beginning the same number of agents hold each one of the strategies. A roundrobin tournament is made and then the population of inefficient strategies is decreased whereas efficient strategies obtain new elements. The simulation is repeated until the population has been stabilized, i.e. the population does not change anymore. It has been found that "nasty" strategies (those who take the initiative of the first defection) are less effective than "kind" ones!
Delahaye and P. Mathieu [1014] did some experiments with the evolutionary approach computation method.
Twelve strategies have been tested:
1. Cooperate: always cooperates
2. Defect: always defects
3. Random: cooperates with a probability of 0.5
4. Titfortat: cooperates on the first move and then plays what its opponent played on the previous move
5. Spite: cooperates until the opponent defects, then defects all the time
6. Per_kind: plays periodically [cooperate, cooperate, defect]
7. Per_nasty: plays periodically [defect, defect, cooperate]
8. Soft_majo: plays the opponent's most used move and cooperates in case of equality (first move considered as equality)
9. Mistrust: has the same behavior as titfortat but defects on the first move
10. Prober: begins by playing [cooperate, defect, defect], then if the opponent cooperates on the second and the third move continues to defect, else plays titfortat.
11. Gradual: This strategy acts as titfortat, except of the following: after the first defection of the other player, defects one time and cooperates two times; After the second defection of the opponent, defect two times and cooperate two times, and so on.
12. Pavlov: Cooperates on the first move and then cooperates only if the two players made the same move; this strategy was studied in [27].
Each game (experiment) consisted of 1000 moves, with the classical payoff shown in table 1.
The results were as follows: (the full results are in [11]).
Strategy Final Score 
Gradual 33416 
Titfortat 31411 
Soft majo 31210 
Spite 30013 
Prober 29177 
Pavlov 28910 
Mistrust 25921 
Cooperate 25484 
Per kind 24796 
Defect 24363 
Per nasty 23835 
Random 22965 
The results of these experiments show that strategies with an inclination to cooperate gain more than the "nasty" ones! Gradual, TitforTat, and Soft_majo (the leading three strategies) all have a strong tendency to cooperate, while Defect and Per_nasty are defectoriented (and Random is simply a dumb strategy).
Several other experiments have been made with virtual systems in order to show the emergence of cooperation among software agents. The conclusions derived from such results are farreaching, and are beyond the scope and the goal of this paper. I’ll just mention Daniel Dennett [15], who explains that Darwin’s idea is considered dangerous because it challenges the belief that there is something special about life, and in particular about human life, consciousness, emotions, etc. Instead, it shows how these are the result of billions and billions of applications of a simple, mindless, mechanistic process.
4. Simulating Ethics
A major question I’d like to raise is whether norms of ethics can emerge in a virtual system, and how it may affect our Philosophy.
Let’s examine a simple game called the Ultimatum game. A cake is divided into ten pieces. Player A can demand either five or nine pieces, player B can either accept or reject the proposal. We will analyze this game within the context of evolutionary game theory.
What are the strategies applicable for this game? Each player can be either A or B. While being A there are only two options: Demand5 and Demand9. Being B enables four options: Accept all, Reject all, Accept if 9 is demanded but reject if 5 is demanded, and Accept if 5 is demanded but reject if 9 is demanded. In total, we have 2X4 strategies for a certain player, where each strategy tells her how to behave while being A or B.
We can summarize the strategies in the following table:
IF BEING A 
IF BEING B 

S1 
Demand 9 
Accept All 

S2 
Demand 9 
Reject All 

S3 
Demand 9 
Accept 5, Reject 9 

S4 
Demand 9 
Accept 9, Reject 5 

S5 
Demand 5 
Accept All 

S6 
Demand 5 
Reject All 

S7 
Demand 5 
Accept 5, Reject 9 

S8 
Demand 5 
Accept 9, Reject 5 
This game has been simulated by Brian Skyrms [32], using "genetic algorithms". In this method, replication is governed by success, judged by some standards relevant for the problem. Once in a while, the code of the application is cut into two pieces, and the pieces are swapped between other programs, creating new programs. Most of these programs will be useless, but some will be successful. The task for the programs is predefined, so there are clear criteria whether the program is successful or not. This "crossover" takes place again, until a few of the generated programs are approaching closer and closer towards the target. This method has been brought into the computer sciences by John Holland [18] and expanded by John Koza [19].
How can genetic algorithms be applied to the Ultimatum game?
We can view strategies as strings. Thus, the strategy If A: demand 9; If B: accept 5, reject 9 has the following substrategies: If A: demand 9 and if B: accept 5, reject 9. In addition, the second substrategy can be split into: If B and 5 is required, accept it and If B and 9 is required, reject it. The idea is to cut the strategies into the smallest pieces and recombine them arbitrarily, and to create as many new strategies as possible. After a cycle of recombination the strategies are played against each other and only the fittest will survive.
Skyrms performed several experiments with this game. In each experiment he initiated groups of agents with different strategies, let them play against each other, and operated the recombination again and again. In many of the experiments he observed a persistence of the strategy S7. This strategy (known later as Fairman) is the fairest strategy among the eight!
In most cases Fairman was not the winning strategy. S1 and S4 did better when starting with a population with equal proportions of the strategies. But it did not go extinct, unlike other strategies, S2 for example. In certain initial conditions, the range of population holding Fairman has been expanded. One experiment started with 30% of the population using Fairman, with the remaining strategies having equal proportions of the rest of the population. Fairman gained 64% of the population, while S1 and S4 did not survive. In another experiment, the initial proportions of the population for S1S8 strategies were <.32,.02,.10,.02,.10,.02,.40,.02> , respectively. The results state show 56.5% for Fairman and 43.5% for S5.
The above does not pretend to be a full evolutionary explanation of the fairness effect. Fairness is not a rational strategy, in a sense that if confronted with an unfair offer, it chooses the lower payoff. However the above shows that such a strategy can persist in evolution.
The simulations I described, as well as many others, show that Kant’s categorical imperative does pretty well as a strategy! The cooperative strategies actually follow the rule: "Act only so that if others act likewise fitness is maximized." Strategies that violate this imperative are driven to extinction. The emergence of cooperation can show how human morality has emerged from the complex system of life and society. Complexity theory is trying to understand complex systems "bottomup", from the base elements and their relations up to the general phenomena, in oppose to the traditional "topdown" analysis, which is looking at the logic and mechanism that controls the behavior of the system.
5. The Potential of Simulating Complex Systems
Agentbased modelling of complex systems is not limited to game theory. Various simulations usually contain a physical landscape and a definition of environmental conditions. The agents are grouped into socialrelated groups and are assigned basic rules of social behavior. The agents then are associated with a range of alternative states and then subjected to competitive fitness tests.
Such ‘artificial societies’ will hopefully open new windows into the kind of variables that are influencing the evolution of human behavior today in the contemporary world. Models are being developed today in order to explain the evolution of cultures, the creation of sociological structures and the behavior of economic systems.
The new discipline of complexity brings new insights into our philosophical discussions and problems. Today we are using neural networks in order to simulate brain behavior. We cannot define the rules that govern the brain, but we can try to simulate it based on the elementary relations between neurons. The same can be applied for ethics, as well as for other philosophical issues.
References
[1] R. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
[2] R. Axelrod. The Evolution of Strategies in the Iterated Prisoner's Dilemma. In L. Davis, editor, Genetic Algorithms and the Simulated Annealing, chapter 3, pages 3241. Pitman, London, 1987.
[3] R. Axelrod and D. Dion. The Further Evolution of Cooperation. Science, 242:13851390, 1988.
[4] R. Axelrod and W. D. Hamilton. The Evolution of Coperation. Science, 211:13901396, 1981.
[5] S. Bankes. Exploring the Foundations of Artificial Societies. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 337342. MIT Press, 1994.
[6] J. Batali and P. Kitcher. Evolutionary Dynamics of Altruistic Behavior in Optional and Compulsory Versions of the Iterated Prisoner's Dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 344348. MIT Press, 1994.
[7] J. Bendor. In Good Times and Bad: Reciprocity in an Uncertain World. American J. of Political Science, 31:531558, 1987.
[8] Robert Boyd and Jeffrey P. Lorberbaum. No Pure Strategy is Evolutionarily Stable in the Repeated Prisoner's Dilemma Game. Nature, 327:5859, 1987.
[9] E. Ann Stanley D. Ashlock, M. D. Smucker and L. Tesfatsion. Preferential Partner Selection in an Evolutionary Study of Prisoner's Dilemma. Economics R. No 35, Submitted for publication, 1994.
[10] J.P. Delahaye. L'altruisme R'Ecompens'e? Pour La Science, 181:150156, 1992.
[11] J.P. Delahaye and P. Mathieu. Exp'eriences Sur le Dilemme It'er'e des Prisonniers. Rapport de Recherche 233, LIFL Lille CNRS (URA 369), 1992.
[12] J.P. Delahaye and P. Mathieu. L'altruisme Perfectionn'e. Pour La Science, 187:102107, 1993.
[13] J.P. Delahaye and P. Mathieu. L'altruisme Perfectionn'e. Rapport de Recherche 249, LIFL Lille CNRS (URA 369), 1993.
[14] J.P. Delahaye and P. Mathieu. Complex Strategies in the Iterated Prisoner's Dilemma. In A. Albert, editor, Chaos and Society, Amsterdam, 1995. IOS Press.
[15] D. Dennett. Darwin's Dangerous Idea: Evolution and the Meaning of Life. Simon and Schuster, NY, 1995.
[16] M. R. Frean. The Prisoner's Dilemma Without Synchrony. Proc. Royal Society, London, 257(B):7579, 1994.
[17] H. C. J. Godfray. The Evolution of Forgiveness. Nature, 355:206207, 1992.
[18] J. Holland. Adaptations in Neural and Artificial Systems. Ann Harbor: University of Michigan Press, 1975.
[19] J. Koza. Genetic Programming: On the Programming of Computers by Natural Selection. Cambridge, Mass.: MIT Press, 1992.
[20] D. Ashlock M., D. Smucker, E. Ann Stanley. Analyzing Social Network Structures in the Iterated Prisoner's Dilemma with Choice and Refusal. RR. CS TR941259, University of WisconsinMadison, Department of ComputerSciences, 1994.
[21] C. Martino. Emergent Nastiness in Iterated Prisoner's Dilemma Games. 2.725: Design and Automation, 1995.
[22] R. M. May. More Evolution of Cooperation. Nature, 327:1517, 1987.
[23] P. Molander. The Optimal Level of Generosity in a Selfish, Uncertain Environment. J. of Conflict Resolution, 29(4):611618, 1985.
[24] M. Nowak. Stochastic Strategies in the Prisoner's Dilemma. Theoretical Population Biology, 38:93112, 1990.
[25] M. Nowak and K. Sigmund. The Evolution of Stochastic Strategies in the Prisoner's Dilemma. Acta ApplicandaeMathematicae, 20:247265, 1990.
[26] M. Nowak and K. Sigmund. Tit for Tat in Heterogeneous Populations. Nature, 355:250253, 1992.
[27] M. Nowak and K. Sigmund. A Strategy of WinStay, LoseShift that Outperforms TitforTat in the Prisoner's Dilemma Game. Nature, 364:5658, 1993.
[28] M. Oliphant. Evolving Cooperation in the NonIterated Prisoner's Dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 350352. MIT Press, 1994.
[29] R. Pool. Putting Game Theory to the Test. Science, 267:15911593, 1995.
[30] W. Poundstone. Prisoner's Dilemma : John von Neumann, Game Theory, and the Puzzle of the Bomb. Number 019286162X. Oxford University Press, Oxford, 1993.
[31] Xin Yao and P. J. Darwen. An Experimental Study of NPerson Iterated Prisoner's Dilemma Game. Informatica, 18:435450, 1994.
[32] B. Skyrms. The Evolution of the Social Contract. Cambridge University Press, 1996.