
Investigating Kasparov Chess Moves with Scikit-Learn


I am practicing my machine learning (ML) skills and became curious whether chess moves could be predicted, somehow.  Crazy, right? ;-) tl;dr: chess.py

For a previous project, I collected 547 Garry Kasparov games in PGN format.  That collection yields 21,088 Kasparov-only moves, and for this project I want to look at how prediction accuracy changes as the number of moves considered increases.

My hypothesis is that accuracy will drop sharply as the number of moves increases, since the number of unique moves (and thus the complexity) grows over time.  So I would be happily surprised, if not amazed, if a prediction accuracy of even 20% were possible.

So how do you represent a full game's board positions in a format that is digestible by a machine learning algorithm?  And how do you represent the target move to predict?  What I chose to do is flatten the Forsyth-Edwards Notation (or FEN) of each position into a comma-separated value (or CSV) row.  Each piece is encoded as an arbitrary number and unoccupied squares are zero.  I chose white p=10, r=11, n=12, b=13, q=14, k=15 and black P=20, R=21, N=22, B=23, Q=24, K=25.
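
Here is a minimal sketch of that flattening, assuming the python-chess library (the original code may well parse FEN strings directly; the function name flatten_board is mine):

import chess

# Piece values per the encoding above: white pawn=10 ... king=15;
# black pieces add 10 to the white value.
PIECE_CODES = {
    chess.PAWN: 10, chess.ROOK: 11, chess.KNIGHT: 12,
    chess.BISHOP: 13, chess.QUEEN: 14, chess.KING: 15,
}

def flatten_board(board, move_number):
    row = [move_number]
    for square in chess.SQUARES:  # a1 through h8, 64 squares
        piece = board.piece_at(square)
        if piece is None:
            row.append(0)  # unoccupied squares are zero
        else:
            offset = 0 if piece.color == chess.WHITE else 10
            row.append(PIECE_CODES[piece.piece_type] + offset)
    return row

print(flatten_board(chess.Board(), 1))  # reproduces the example row below, minus the target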

For the target, I chose to use the standard algebraic notation (or SAN) of the moves.   (But as of this writing, I wonder if using alternate chess move notations would change prediction accuracy. Hmmm…)

Also, I add the number of the move to the beginning of the list.  So for the initial game setup and an initial move we have:

1,11,12,13,14,15,13,12,11,10,10,10,10,10,10,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,20,20,20,20,20,20,20,21,22,23,24,25,23,22,21,e4

For a full game, we have rows 1, 3, 5, etc. when Kasparov is the white player and 2, 4, 6, etc. when he plays black.  Across every game in the collection, with every move considered, there are 21,088 rows (as mentioned above).
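
As a sketch of how such rows could be generated from a PGN game with python-chess (an assumption on my part; reading the PGN headers to determine which color Kasparov plays is elided here):

import chess.pgn

def game_rows(game, kasparov_is_white):
    # Replay the game, emitting one row per Kasparov move: the flattened
    # board before the move, plus the SAN of the move as the target.
    rows = []
    board = game.board()
    for ply, move in enumerate(game.mainline_moves(), start=1):
        white_to_move = (ply % 2 == 1)
        if white_to_move == kasparov_is_white:
            rows.append(flatten_board(board, ply) + [board.san(move)])
        board.push(move)
    return rows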

With this CSV data in hand we can analyze it with various ML algorithms.

The first thing is to break the data into training and testing sets.  I do this with the scikit-learn train_test_split() function.  Next is to train different models and check out their accuracy!  (Please see the code for details.)
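
For instance, a hedged sketch of the split, assuming the CSV has been loaded with pandas (the file name and variable names here are mine, not necessarily the original code's):

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('kasparov.csv', header=None)  # hypothetical file name
X = data.iloc[:, :-1]  # move number plus the 64 encoded squares
y = data.iloc[:, -1]   # target: the SAN move
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)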

The first model I tried is k-nearest neighbors (or KNN).  Here is a snip of the code for it:

from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 nearest neighbors
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

This yields 0.12101669195751139 for all moves in a game: a 12% prediction accuracy.  I expected this!  And looking at the accuracy graph for k = 1 to 25, the line drops steadily as k increases.  Again, I expected this…
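
The sweep behind that graph looks something like this (a sketch; it assumes the X/y split from above):

from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

k_scores = []
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    k_scores.append(metrics.accuracy_score(y_test, knn.predict(X_test)))

plt.plot(range(1, 26), k_scores)
plt.xlabel('k')
plt.ylabel('Accuracy')
plt.show()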

[Figure: KNN prediction accuracy for k = 1 to 25]

The next model I tried is Multinomial Naive Bayes.  This results in a 7.2% prediction accuracy.  Oof.

How about a support vector machine?  That’s a bit better, with a 15.4% prediction accuracy.

OK, lastly, let’s look at a decision tree: for all moves, that weighs in at a whopping 17.1%!  Woo.
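
For reference, here is a hedged sketch covering those three classifiers.  Each uses scikit-learn defaults, which is an assumption on my part; the original code may set hyperparameters differently:

from sklearn import metrics
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Fit each model on the same split and report its test accuracy.
for model in (MultinomialNB(), SVC(), DecisionTreeClassifier()):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(type(model).__name__, metrics.accuracy_score(y_test, y_pred))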

What does the accuracy graph look like as a function of the number of moves considered?  For example, what is the prediction accuracy when considering only the first move of each game?  What about as that number increases?

Here is a snip that performs that analysis (given the function definitions in the code):

import matplotlib.pyplot as plt

k_range = range(1, 15)  # consider only the first k moves of each game
scores = []
for k in k_range:
    create_csv(k)  # rebuild the CSV restricted to the first k moves
    X_train, X_test, y_train, y_test = train()
    accuracy = decision_tree(X_train, X_test, y_train, y_test)
    print(accuracy)
    scores.append(accuracy)

plt.plot(k_range, scores)
plt.xlabel('Moves')
plt.ylabel('Accuracy')
plt.show()

And here is the graph:

[Figure: decision-tree prediction accuracy vs. number of moves considered]

Dismal!  ;-)

