Machines That Play (Building Chess Machines)

Written by samin | Published 2018/09/21
Tech Story Tags: artificial-intelligence | engineering | chess | chess-machine | machine-learning

TLDRvia the TL;DR App

Machines That Play series has been broken into 7 parts. This is Part 2 of the series.

This series covers the history of Artificial Intelligence and games (until Deep Blue) and focuses on machines that played chess, checkers, and backgammon. The following topics are covered: how to build chess machines, Shannon’s work on chess, Turing’s work on chess, The Turk, El Ajedrecista, MANIAC, Bernstein chess program, Samuel’s checkers, Mac Hack VI, Cray Blitz, BKG, HiTech, Chinook, Deep Thought, TD-Gammon, and Deep Blue.

Part 1: Machines That Play (Overview)

Part 2: Machines That Play (Building Chess Machines) — this one

Part 3: Machines That Play (Chess — Before Deep Blue)

Part 4: Machines That Play (Deep Blue)

Part 5: Machines That Play (Post Deep Blue)

If you want a summary of the first 5 parts, focusing on the human elements, go here.

Part 6: Machines That Play (Checkers)

Part 7: Machines That Play (Backgammon)

Part 2: Machines That Play (Building Chess Machines)

It will cover 1) why we wanted to build chess machines, 2) high level strategies to building chess machines and 3) progress in this field from 1940s-1990s.

Why chess?

Why would anyone want to teach a machine how to follow some arbitrary man-made rules to move a bunch of wooden pieces on a checkerboard, with the sole aim of cornering one special wooden piece? It’s so human to attack and capture pawns, knights, bishops and the queen to finally corner the king into an inescapable position. Those are our actions, that is our goal and we balance strategy and tactics to suit it. Then why teach machines to play chess?

Chess is an old game. It is believed to have originated in Eastern India (280–550). It reached Western Europe and Russia in the 9th century and by the year 1000, it had spread throughout Europe. It became popular and writings about chess theory (of how to play chess) began to appear in the 15th century.

Throughout history, many people have said many different things about chess and its connection to our intelligence and our mind.

  • During the Age of Enlightenment, chess was viewed as a means of self-improvement. Benjamin Franklin (1750) said, The Game of Chess is not merely an idle amusement; several very valuable qualities of the mind, useful in the course of human life, are to be acquired and strengthened by it, so as to become habits ready on all occasions; for life is a kind of Chess, in which we have often points to gain, and competitors or adversaries to contend with, and in which there is a vast variety of good and ill events, that are, in some degree, the effect of prudence, or the want of it. By playing at Chess then, we may learn: I. Foresight, which looks a little into futurity, and considers the consequences that may attend an action […], II. Circumspection, which surveys the whole Chess-board, or scene of action: — the relation of the several Pieces, and their situations […], III. Caution, not to make our moves too hastily […]
  • Chess is the touchstone of intellect.” — Johann Wolfgang von Goethe (1749–1832)
  • “The public must come to see that chess is a violent sport. Chess is mental torture.” — Garry Kasparov (1990s)
  • “As a chess player one has to be able to control one’s feelings, one has to be as cold as a machine.” — Levon Aronian (2008) (interview)

Chess has long been regarded as the game of intellect. And many people argued that a machine that could successfully play chess would prove that thinking can be modeled/understood or that machines can be built to think. And that is exactly why people wanted to build a machine that could follow some arbitrary rules of our game and become so good at it that it could one day beat us at it.

Norbert Wiener in 1948 book Cybernetics asked “whether it is possible to construct a chess-playing machine, and whether this sort of ability represents an essential difference between the potentialities of the machine and the mind.

It turned out that we would build a successful chess-playing machine in 1997 and it would beat the best of us, but it wouldn’t necessarily help us understand our mind — its way of playing would range from being extremely human to being almost alien and it would surely challenge our conception of what to call intelligence.

Playing the game

Chess is a two-player zero sum game (i.e. if one player wins, then the other player loses) with perfect information (i.e. in which players know, at each time, all moves that have taken place at any given point).

How do we usually play this game? We do the following:

  1. Consider all the legal moves that a player can make
  2. Compute the new position resulting from each move
  3. Evaluate to determine the next best move
  4. Make that (best) move
  5. Wait for opponent to make a move
  6. Respond by repeating above steps

From this perspective, almost all chess computers must deal with these fundamental steps. And in doing that, a chess computer would have to address the following key problems:

  1. Representing the “board”
  2. Generating all legal next states
  3. Evaluating a position

Let’s look at each in detail:

Representing the “board”: A chess computer would need to represent the chessboard and the pieces that occupy squares on the board, i.e., it would need to represent a single position in data structures. There are several ways of doing this — see here.

Before we talk about the next two steps, let’s understand why building a chess computer is hard. A chess computer would receive a given chess position as input and it would need to calculate the next best move. A lot of difficulty is here because of the complexity of chess.

  1. Each player has 16 pieces to play with and 20 possible moves.
  2. Suppose White starts the game — it has 20 possible moves (Black also has the same options — 20 possible moves).
  3. Total Number of Moves For White = 20.
  4. Moves for Pawns = 8 + 8 = 16 (one step) (2 steps).
  5. Moves for Knights = 2 + 2 = 4.
  6. Total = 20.
  7. Therefore, at level 1: 20 possible moves for White.
  8. Then, at level 2: 20 * 20 = 400 possible moves for Black, depending on what White does.
  9. At level 3: 400 * 20 = 8,000 for White.
  10. At level 4 : 8,000 * 20 = 160,000 for Black……and so on.
  11. For all possible positions, the computer would need to evaluate 10^120 possible moves! (Side note: Go has ~ 10^700 moves!!). Massive!

HowStuffWorks

In other words,

Complexity of a chess game

The problem space for chess can represented by a game tree, where nodes represent board positions and edges represent legal moves. The complete game tree for a game is the tree starting at the initial position (start of the game), containing all possible moves from each position, and ending at terminal position (end of the game: win/lose/draw). But chess has about 10^120 moves (and about10^40 nodes)!!! So the best way to think about a game tree is as a theoretical construct — it cannot be realized in the real world, it’s just too massive. But a player (human or computer) still needs to find a good move. This means a chess computer still needs to search some tree which is not too massive, but still has enough (relevant) nodes that can be examined to allow it to determine a next “good” move.

Games, including chess, are difficult because the search trees become astronomically large.

Game complexity

Here are some ways to measure game complexity.

The state-space complexity of a game is the number of legal game positions reachable from the initial position of the game.

The game tree size is the total number of possible games that can be played: the number of leaf nodes in the game tree rooted at the game’s initial position.

The branching factor is the number of children at each node. For example, chess, suppose that a “node” is considered to be a legal position, then the average branching factor is estimated to be about 35. This means that, on average, a player has about 35 legal moves available at each turn. By comparison, the average branching factor for the game Go is 250!

A quick look at game complexity of some commonly known games as well as their status:

Optimal status: it is not possible to perform better (some of these entries were solved by humans)

Super-human: performs better than all humans

Complexity of some games

What, then, does a chess computer have to do in order to beat the world champion?

Good game programs, therefore, must: 1) prune irrelevant branches of the game tree, 2) use good evaluation functions, and 3) look ahead as many moves as possible (and as fast a possible).

In particular, chess computers would need to do the following well:

Generating all legal next states: A chess computer would need to know how to identify possible moves and select the most promising moves for further analysis (i.e. prune out bad moves and keep good moves). Since it is impractical (actually impossible!) to consider every possible move, a computer must make choices by its use of search techniques — these are algorithms to select the best move. This is where a lot of the earlier artificial intelligence innovations took place. For example, a chess computer can evaluate 2 or 5 or 10 or 20 moves ahead in advance. And the depth of the tree it can generate depends on the speed of the computer — the faster the computer generates the moves, the better the performance. Hence the emphasis on computational power.

Evaluating a position: Once a chess computer generates the tree, it needs to evaluate the positions. When the search space is too large, the game tree can be created to a certain depth only, i.e. a computer cannot look ahead to all possible end (win/lose/draw) positions. It must, instead, look ahead a few plies (half moves) and compare the possible positions, i.e. it must correctly evaluate the value of a board position, given no further search will be done at that position. This is done via the evaluation function, which assigns real number scores to these board positions. This evaluation function (and the algorithms) are often vastly different between different chess programs. It is extremely complex to tell how good a position is. For example, Deep Blue examined 200 million positions per second and used very sophisticated evaluation — it had over 8000 features as part of its function.

Many chess computers also added endgame databases. These are databases of pre-calculated position evaluation of hundreds (or thousands or millions) of past games from top players. These endgame databases would mainly be used for openings and endgames.

There were two main philosophical approaches to developing chess computers: emulation vs. engineering — should computers emulate human knowledge and decision-making or should computers improve search via brute force? Those focusing on the first approach would build programs that had a lot of chess knowledge and a relatively smaller focus on search. Those focusing on the engineering approach would focus on computational power, by using special-purpose hardware, and search innovations. We’ll see that the best chess computers used the second approach, but even they ended up using a lot of chess knowledge and sophisticated evaluation heuristics.

Keep this framework in mind while reading through the remaining parts.

Optional: Below is a brief summary of progress in this direction. Key events. Key people. Key discoveries. Key machines. Details in other parts.

1950s: Playing at a very basic level

Computing power was limited in the 1950s, so machines could only play at a very basic level. In this period researchers developed the fundamental techniques for evaluating chess positions and searching best possible moves, considering opponent’s possible counter-moves. These ideas are still in use today.

  • In 1949–1950, Claude Shannon wrote the first article on computer chess titled “Programming a computer to play chess”. He described how to program a machine to play chess by scoring positions and selecting the next move. His work provided a framework for all future research in computer chess playing.
  • In 1950, Alan Turing wrote the first computer chess program. And he wrote it before computers even existed! He knew computers were coming and once they were powerful enough, they would be able to play chess. The same year he proposed the Turing Test. In 1951, Turing tried to implement his program “Turochamp” on the Ferranti Mark 1 computer. He never finished.
  • In 1952, Alick Glennie, who wrote the first computer compiler, defeated Alan Turing’s chess program. He was the first person to beat a computer program at chess.
  • In 1956, Mark Wells wrote a program to simulate “Los Alamos Chess,” a bishop-less, 6'x6' version of chess. In doing so, MANIAC became the first computer to run a chess program. It took 12 minutes to search a four moves depth (adding the two bishops would take three hours to search at the same depth).
  • In 1957, mathematician Alex Bernstein, an employee at IBM, created the first complete chess program for an IBM 704 machine. The program took 8 minutes to search a four moves depth and could play a full chess game.
  • Search innovation (1950): Minimax search (Shannon, Turing)
  • Search innovation (1956): Alpha-beta pruning (McCarthy)

1960s: Drosophila of AI

In the 1960s, AI pioneers Herbert Simon and John McCarthy referred to chess as “the Drosophila of AI”, which meant that chess, like the common fruit fly, represented a relatively simple system that could also be used to explore larger, more complex real-world phenomena.

By the end of the 1960s, computer chess programs were good enough to occasionally beat against club-level or amateur players.

  • In 1962, Alan Kotok, assisted by John McCarthy, developed the Massachusetts Institute of Technology’s (MIT) first chess program, now known as the Kotok-McCarthy program. The program ran on an IBM 7090 and it took about 5 to 20 minutes per move, depending on the complexity of the situation. The program could beat amateur chess players. During the Cold War, Kotok-McCarthy played (and lost to) the best Russian chess program.
  • In 1965, researchers from the Institute for Theoretical and Experimental Physics developed a chess program in Moscow. In 1966, this Russian program began a match with Kotok-McCarthy. The match lasted nine months and was won by the Russian program (3–1).
  • In 1967, Mac Hack Six, by Richard Greenblatt (introduced transposition tables) and became the the first program to defeat a person in tournament play. In 1965, Hubert Dreyfus said, “No chess program could play even amateur chess.” In 1967, Mac Hack VI beats Dreyfus.
  • In 1968, McCarthy and Mitchie bet David Levy (International Master) $1,000 that a computer would defeat him by 1978.
  • Search innovation (1966): Alpha-beta pruning (Kotok, McCarthy)
  • Search Innovation (1967): Transposition tables (MacHack)

1970s-1980s: Growing up (fast)

In the 1950s and 1960s, early pioneers focused on chess heuristics (rules of thumb) to choose the best next moves. Programs in 1970 also used heuristics, however, there was a much stronger focus was on software improvements and faster and more specialized hardware. Main chess programs used transposition tables (or hash tables) which stored information about positions the program had already searched. So, if the same position is reached on the board, the program wouldn’t need to search again; it would use the previously generated information. This customized hardware and software allowed programs to conduct much deeper searches of game trees (involving millions of chess positions), something humans did not do (because they could not) .

The 1980s brought an era of low cost chess computers. First microprocessor-based chess programs started becoming available. Because of the availability of home computers and these programs, anyone could now play chess (and improve their game) against a machine. By the mid-1980s, the sophistication of microprocessor-based chess software had improved so much they began winning tournaments — both against supercomputer based chess programs and some top-ranked human players.

Additionally, several search innovations were introduced: iterative deepening (searched down to a certain level of the game tree), opening books (included rules or sequences of moves, which experts knew to be good, to start a game), and endgame databases (contained sequences of moves that were good for ending a game).

  • In 1970, the Association for Computing Machinery (ACM) organized the first North American Computer Chess Championship. Six chess programs took part in this event. The ACM chess events were cancelled in 1995 because Deep Blue was preparing for the face-off against the world chess champion Garry Kasparov.
  • In 1974, the World Computer Chess Championships (WCCC) began. Thirteen chess programs participated in the first tournament, which was held in Stockholm, Sweden.
  • In 1977, Chess 4.6 became the first chess computer to be successful at a major chess tournament.
  • In 1977, the International Computer Chess Association (ICCA) was founded by computer chess programmers. That year, Grandmaster Michael Stean became the first grandmaster to lose against a chess program in a blitz game (chess games with five minutes for each player). It was much easier for a computer to beat chess Grandmasters in a blitz game. It would take another 11 years before a computer would finally beat a Grandmaster in a regulation game.
  • 1978: No machine has been able to beat Levy (Levy defeated Chess 4.7 by 4.5–1.5). Levy wins the bet, but a machine does win the first ever game against him.
  • In 1981, Cray Blitz won the Mississippi State Championship with a perfect 5–0 score and a performance rating of 2258. In round 4 it defeated Joe Sentef (2262) to become the first computer to beat a master in tournament play and the first computer to gain a master rating.
  • In 1982, Ken Thompson’s hardware chess player Belle earned a US master title and a performance rating of 2250. (Side note: Ken Thompson is the creator of UNIX operating system).
  • 1983, 1986: Cray Blitz (software by Robert Hyatt) won consecutive World Computer Chess Championships.
  • In 1985, Chiptest is built by Feng-hsiung Hsu ,Thomas Anantharaman, and Murray Campbell. It used a special VLSI move generator chip and could execute 50,000 moves per second
  • In 1987, Chiptest-M is built — the Chiptest team removed bugs in the chip. It could now execute 500,000 moves per second.
  • 1988: HiTech by Hans Berliner of CMU defeated grandmaster Arnold Denker in a match.
  • In 1988, Deep Thought by Hsu and Campbell of CMU shared first place with Tony Miles in the Software Toolworks Championship, ahead of several grandmasters including. It also defeated grandmaster Bent Larsen in a tournament game, making it the first computer to beat a grandmaster in a tournament. It could execute 700,000 moves per second. It obtained the performance rating of 2745, the highest rating obtained by a computer player. IBM took over the project and hired the people behind it.
  • In 1989 Deep Thought lost two exhibition games to Garry Kasparov.
  • Search innovation (1975): Iterative-deepening (Chess 3.0+)
  • Search innovation (1978): Special hardware (Belle)
  • Search innovation (1983): Parallel search (Cray Blitz)
  • Search innovation (1985): Parallel evaluation (HiTech)
  • Speed progress (1980s): In the 1980s a microcomputer could execute a little over 2 million instructions per second.

1990s: Childhood’s end?

In 1990s, chess programs began challenging International Chess Masters and later Grandmasters. These programs relied much more on memory and brute force than on strategic insight, and they consistently started to beat the best humans. Some dramatic moments in computer chess occurred in 1989 — two widely-respected Grandmasters were defeated by CMU’s Hitech and Deep Thought.

Researchers felt machines could finally beat a World Chess Champion. This got IBM interested so they began working on this challenge in 1989 and built a specialized chess machine, named Deep Blue. In 1997, it would end up beating Garry Kasparov, the best human chess player. So, did it think? And was it intelligent?

IBM researchers were awarded the Fredkin prize, a $100,000 award for the first program to beat a reigning world chess champion, which had gone unclaimed for 17 years. This (and later) victories in computer chess were disappointing to many AI researchers because they were interested in building machines that would succeed via “general intelligence” strategies rather than brute-force. In this sense, chess had started to decouple from AI research.

  • From 1991–1995, IBM worked on Deep Thought II, which was a stepping stone to Deep Blue. It was a 24 processor system and the Deep Thought software was rewritten to handle parallelism.
  • In 1992, the ChessMachine Gideon 3.1 (a microcomputer) by Ed Schröder won the 7th World Computer Chess Championship in front of mainframes, supercomputers and special hardware.
  • From 1992–1998: Chess Genius was a series of chess engines by Richard Lang, written for various processor architectures. The first version in 1992 was released as PC program running under 16-bit MS-DOS. These programs won the World Microcomputer Chess Championship in 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991 and 1993.
  • In 1993, Deep Blue, a highly specialized chess machine, lost a four-game match against Bent Larsen.
  • In 1994, during the Intel Grand Prix Cycle, Chess Genius 3, operated by Ossi Weiner, won a speed chess game (25-minutes per side) against Garry Kasparov and drew the second game, knocking Kasparov out of the tournament. This was the first match (speed-chess) that Kasparov ever lost to a computer. In the next round, Chess Genius 3 beat Predrag Nikolić, but then lost to Viswanathan Anand. Unlike Deep Blue, which was running on massively parallel custom-built hardware, ChessGenius was running on an early Pentium PC.
  • In 1996, Deep Blue lost a six-game match against Garry Kasparov (4–2): a) 1.6 to 2 million positions per second per chip, b) 50 to 100 million positions per second for system.
  • In 1997, Deep Blue (3.5–2.5) won a six-game match against Garry Kasparov . Kasparov’s loss to Deep Blue was his first ever chess match loss in his entire life: a) Enhanced chess chip with 2 to 2.5 million positions per chip, b) 200 million positions per second.
  • Speed progress (1990s): By the 1990s computers were executing over 50 million instructions per second.

Read about chess machines until Deep Blue in Part 3: Machines that Play (Chess — Before Deep Blue).

Read about Deep Blue here: Part 4: Machines That Play (Deep Blue) and Part 5: Machines That Play (Post Deep Blue)

Now:

  • No human can play the best computers on equal terms. We (humans) have had well over 1000 years to invent, play, and understand chess — machines have had less than 60 years and they are much much better than we are.
  • Speed progress (now): Processors in our tables and phone can execute over a billion instructions per second.

Luke Muehlhauser Historical chess engines’ estimated ELO rankings

Chess (Jersey City)


Written by samin | Math/physics to AI & data (15+ years in data science/products www.reasonets.com medium.com/@aminshreyaa
Published by HackerNoon on 2018/09/21