-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Training the engine to play custom variants #29
Comments
Hi @bkitej, As you stated correctly, the python MCTS version of CrazyAra uses python-chess for move generation and the C++ MCTS version is based on Multi-Variant-Stockfish. Python-chess is also used to prepare the training data in the case of supervised learning. I don't know your variant, but I assume that the games will mostly play out as regular chess games and only in a few, special cases it will be superior to capture their own piece. The neural network input representation and output representation would be the same as for regular chess. You could register your own variant in python chess or in Stockfish accordingly and modify the respective move-generation:
I assume there aren't any games available for your variant. Therefore supervised learning isn't possible.
Of course you could theoretically learn your variant from scratch or continue learning based on a given supervised network via selfplay. This however is rather computational intensive and would require an implementation in C++. There is also Fairy-Stockfish which supports a few more shogi-variants at the cost of slightly lower performance. In principle this could also be used as a move-generation back-end instead of Multi-Variant-Stockfish for the MCTS C++ version. I suggest that you first create your variant in python-chess and if this works correctly, you can either implement it in C++ and extend Mutli-Variant-Stockfish or add it to one of CrazyAra's MCTS versions. The current neural network weights in the official releases only support crazyhouse so far. |
Thank you for your very quick response! For a little background, I'm an undergraduate student in statistics, and this is for a project-based class. I have a few classes in ML under my belt, almost exclusively in Python, my language of choice.
I've seen it once referred to as "friendly fire chess" online, but otherwise it doesn't seem to be an established variant at all.
Correct!
The focus of my project is to discover these emergent behaviors. My own hypothesis is that games will become more materialistic, as checkmates can be escaped in more situations, at the cost of a weaker late game. How often it diverges from classical chess is another point of interest.
This is likely the best direction to take. I've noticed there are a few Stockfish projects on github, is there a pure Python implementation you would recommend?
I'm an undergraduate senior with virtually no knowledge of C++, but I do have access to my school's high-performance computing cluster... Does that brute-force approach sound like a viable alternative to writing it in C? I don't mind if the move generation itself takes up to a minute or so.
Can you tell me more about CrazyAra's MCTS versions? As in, if I'm able to successfully program my variant into python-chess, is that a relatively straightforward process? Naturally, I'd like to avoid C++ as much as possible... |
Well understandable. An implementation in C++ will very likely exceed the scope of your project. Starting a project with the idea to run reinforcement learning is often the first idea, but is usually neglected the more you learn and understand how much computational hardware is needed. Python only implementations might only be practicable for simple games with small board sizes and a single piece type: e.g. Othello, Gobang, TicTacToe, Connect4, Reversie The most simple python only MCTS implementation for classical chess is probably Just to put things into perspective, here a few statistics about how the number of MCTS simulations per second (NPS) developed in the case of CrazyAra. The NPS is a measurement of how many games you would be able to generate when setting a fixed number of nodes per move (e.g. 800): Chess-alpha-zero also uses python-chess for move-generation. Both of these are not ideal and thus we implemented the MCTS python version from scratch in vectorized form using numpy and a more direct access to the neural network with MXNet/Gluon. Over the course of the development additional techniques were added, like a transposition table, which allows to double the NPS in some cases and a time management system. For the most recent python MCTS version (0.5.1), the speed is about 250-300 NPS using the RISEv1 architecture, 7 residual blocks + 12 bottleneck residual blocks with SE. The same network achieved 1200-1500 NPS in the first MCTS C++ release (0.6.0) and 3500-4000 NPS when using the more efficient RISEv2 mobile architecture. In the current C++ development version, CrazyAra achieves 8000-10k NPS which generates about 20 crazyhouse games per minute on a GTX1080ti. For this CrazyAra uses larger a batch-size and a TensorRT back-end. 20 games per minute might seem much, but you should still have access to multiple GPUs in this case and preferably start learning from a given supervised network first.
Nice to hear that, but if your computing-cluster only includes CPUs it might be not of much use for training and running NN inference on large scale. Moreover, if you are planning to use python only you might be able to utilize the GPUs by 1-10% which is rather inefficient. Reinforcement learning methods are only supported in the C++ MCTS CrazyAra version. For an exemplary AlphaBeta search you can take a look at Sunfish: |
Thanks again for the quick response!
Oh okay, good to know. Generating the ruleset and gamestate is my main (and... first) bottleneck. It looks like Sunfish has its own board model. Maybe stealing parts of this this will be easier than adapting python-chess's opaque bitboard implementation. Re: my school's computing cluster, it's my understanding that it's a GPU cluster engineered specifically for the age of neural nets. Even inefficient GPU usage will outperform my humble MacBook Pro... plus, the demands of this first implementation aren't too strenuous. I'll try out alpha-zero at your recommendation. I don't mind having this issue closed out in the meantime. Thank you! |
This comment has been minimized.
This comment has been minimized.
I'm actually enjoying reading this conversation, and it may be of help to
me, as I'm interested in analogous projects. I vote for keeping the
generically useful questions here.
…On Tue, Nov 19, 2019 at 3:14 PM bkitej ***@***.***> wrote:
Again, sincere thanks for your help.
Would you mind if I continued to ask further questions over email, as I
discover them in the course of this project? I worry that having these
conversations on the public GitHub repo might be irrelevant to the project
as a whole, and thus rather impolite. I'm a beginner when it comes to the
GitHub community and its etiquette, and I'm wary about being unnecessarily
imposing...
On Tue, Nov 19, 2019 at 8:06 AM Johannes Czech ***@***.***>
wrote:
> Naturally, I'd like to avoid C++ as much as possible...
>
> Well understandable. An implementation in C++ will very likely exceed the
> scope of your project.
>
> Starting a project with the idea to run reinforcement learning is often
> the first idea, but is usually neglected the more you learn and
understand
> how much computational hardware is needed.
>
> Python only implementations might only be practicable for simple games
> with small board sizes and a single piece type: e.g. Othello, Gobang,
> TicTacToe, Connect4, Reversie
>
> - https://github.com/suragnair/alpha-zero-general
>
> The most simple python only MCTS implementation for classical chess is
> probably
>
> - https://github.com/Zeta36/chess-alpha-zero
>
> Just to put things into perspective, here a few statistics about how the
> number of MCTS simulations per second (NPS) developed in the case of
> CrazyAra. The NPS is a measurement of how many games you would be able to
> generate when setting a fixed number of nodes per move (e.g. 800):
> Our first MCTS python version *CrazyAra 0.1.0* (which isn't available)
> was mainly based on https://github.com/Zeta36/chess-alpha-zero. Here,
> CrazyAra achieved *25-35 NPS* on my GTX1080ti using an AlphaZero-like
> network with 7 blocks.
>
> Chess-alpha-zero also uses python-chess for move-generation.
> However, the main purpose of python-chess is not to be used in
> chess-engines.
> The default 3-fold-repetition check is an example for this:
> niklasf/python-chess#338
> <niklasf/python-chess#338>
> The mcts of chess-alpha-zero also uses the simpler Keras framework at the
> cost of performance as well as plain python list iterations for
> implementing the MCTS formulas.
>
> Both of these are not ideal and thus we implemented the MCTS python
> version from scratch in vectorized form using numpy and a more direct
> access to the neural network with MXNet/Gluon. Over the course of the
> development additional techniques were added, like a transposition table,
> which allows to double the NPS in some cases and a time management
system.
>
> For the most recent python MCTS version (0.5.1), the speed is about
*250-300
> NPS* using the RISEv1 architecture, 7 residual blocks + 12 bottleneck
> residual blocks with SE
> <https://github.com/QueensGambit/CrazyAra/wiki/Model-architecture>.
>
> The same network achieved *1200-1500 NPS* in the first MCTS C++ release
> (0.6.0) and *3500-4000 NPS* when using the more efficient RISEv2 mobile
> architecture.
>
> In the current C++ version, CrazyAra achieves *8000-10k NPS* which
> generates about 20 crazyhouse games per minute on my GTX1080ti. For this
> CrazyAra uses larger batch-size and a TensorRT back-end.
>
> 20 games per minute might seem much, but you should still have access to
> multiple GPUs in this case and preferably start learning from a given
> supervised network first.
> Finding the right RL hyper-parameters is non-trivial and often requires a
> lot of fine-tuning.
>
> [...] but I do have access to my school's high-performance computing
> cluster...
>
> Nice to hear that, but if your computing-cluster only includes CPUs it
> might be not of much use for training and running NN inference on large
> scale. Moreover, if you are planning to use python only you might be able
> to utilize the GPUs by 1-10% which is rather inefficient.
>
> Reinforcement learning methods are only supported in the C++ MCTS
CrazyAra
> version.
> Adapting CrazyAra to play "friendly fire chess" will be much easier when
> supervised networks for classical chess are available.
> Therefore, I suggest that you try out chess-alpha-zero instead because it
> supports classical chess and you can train your network first using
> supervised learning:
>
> - https://github.com/Zeta36/chess-alpha-zero
>
> For an exemplary AlphaBeta search you can take a look at Sunfish:
>
> - https://github.com/thomasahle/sunfish
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <
#29?email_source=notifications&email_token=AHPSKYMOEIPTNW63IC4SQNDQUQFJBA5CNFSM4JM5Q4I2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEOXBFA#issuecomment-555577492
>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AHPSKYJWT7Y4TDGFB4YX4I3QUQFJBANCNFSM4JM5Q4IQ
>
> .
>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#29?email_source=notifications&email_token=AACLNQ5LC3E4AZHAZQW5IC3QURCJVA5CNFSM4JM5Q4I2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEPS63I#issuecomment-555691885>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACLNQ3PFQGVDPCU7UNK6ZLQURCJVANCNFSM4JM5Q4IQ>
.
|
Answering these questions via GitHub is fine by me. PyChess is a great library when writing an engine for educational purposes because it comes along with many valuable tools and visualisation techniques. A fellow student also added the Bughouse variant into PyChess which might help you registering your own variant: |
I'd like to revisit a couple points that have been mentioned so far.
A further note on the scope of this variant: FFChess has a strictly larger range of legal moves at almost all stages of the game. For example, in classical chess, there are 20 possible first moves; in FFChess there are 40 by my count. Self-play is still the ideal though. Under the constraints of my current technical ability and time, I'm considering a shift to studying the effects of FFChess on famous endgame positions only. Any thoughts on how that might affect the feasibility of this Python-only project? |
Of course you're right that training on classical games will result in a network which only plays classical chess. However, since FFChess is very similar to classical chess you could use this network as a starting for either training on FFChess data or starting selfplay. This concept is called transfer learning and will very likely reduce computational cost.
In supervised learning you can only learn from sparse labels: The value target is also binary (or threefold, if a draw is possible). In supervised learning you usually have a broad diverse number of players in your data set which reduces the chance of over-fitting and missing key concepts of your variant. In reinforcement you have to deal with a new dimension of complexity. You're always at risk of zeroing out important lines. This happened to some extent to AlphaZero by almost never playing the Kings Indian Defence or Sicilian Defence (https://arxiv.org/pdf/1712.01815.pdf, p. 6). In reinforcement learning your policy target is the final MCTS visit statistics distribution on which a temperature scaling is applied and the next move to play is sampled from. The cross entropy loss is therefore calculated using the full distribution. This is an important aspect because some DL-frameworks only support sparse cross-entropy by default.
My assumption is that value evaluation of FFChess is quite similar to classical chess and the default policy distribution is also a good starting point. For instance, CrazyAra has a boolean UCI option "Enhance_Checks" which manually increases the probability for checks which are below a given threshold to ensure that short forced mating sequences are detected quickly even when the raw policy has blind spot for it (https://arxiv.org/pdf/1908.06660.pdf, 9.2.7, p. 23).
First this project was entirely written in python. The full preprocessing pipeline of converting pgn-files into plane representation makes use of the python-chess and numpy library. For compression we use the zarr library. Later the neural network can be trained supervised which is also written in python.
Endgame positions are typically neither a strong point for both classical A/B as well as neural network engines. |
Hi @bkitej , So you could define your variant friendly fire chess in one of them or implement your own environment from scratch using a blank State class. There is also a model for standard chess available now which was trained on human standard chess games. |
hey i need ur help in modifying an engine to play a specific variant where each player is granted 6 moves and the one with more material points at the end of 12th move wins the game nomatter how worse their position is or if theyre getting mated at next move however jst like normal chess if someone gets mated before 12th move they lose the game also to keep balance if someone captures a directly defended piece at 12th move thats counted as foul and opponent is granted a 13th penalty move, the game starts with any preset random equalized position offered to both players |
Hello @KuruSa-arch , for this you would need to extend the terminal condition to include the material evaluation. |
alright thanks for the reply man, appreciate it |
Is there a process for training this engine to play custom variants not included in the Python-chess module, or in Stockfish's supported variants? I'm looking to write a variant where the only rule modification is one can capture one's own pieces, which doesn't exist in either yet. Assuming that Board is programmed correctly, could I train an engine around it?
The text was updated successfully, but these errors were encountered: