Captain's Mistress

botman · Post by **botman** » Mon Mar 25, 2024 6:09 pm

I did modify lines 28 and 29 to represent my 480x320 pixel TFT display.
I tried the AImoveNN code for player 1, and the feedforward code seems to work.
I know that the weights won't change in this code without adding backpropagation.
Its not clear to me when and how you would do that.
Would you only backpropagate after a win, lose, or draw has been achieved,
or would you somehow estimate the quality of each move and backpropagate on each move?

BeanieBots · Post by **BeanieBots** » Tue Mar 26, 2024 2:43 pm

[Local Link Removed for Guests] wrote: [Local Link Removed for Guests]Mon Mar 25, 2024 6:09 pm I did modify lines 28 and 29 to represent my 480x320 pixel TFT display.
I tried the AImoveNN code for player 1, and the feedforward code seems to work.
I know that the weights won't change in this code without adding backpropagation.
Its not clear to me when and how you would do that.
Would you only backpropagate after a win, lose, or draw has been achieved,
or would you somehow estimate the quality of each move and backpropagate on each move?

I recall from when I converted from TFT to VGA that I had a few issues.
I meant to mention them at the time but forgot.
Annex uses global variables exclusively. That includes things which are configured with the gui objects.
If you set something like a colour or X,Y coordinate for printing, then update a gui text, the print colour and location get corrupted.
I'll try and do a post later with a specific example so that cicciocb can look into it.

The objective of the code so far as posted is to have routines that can play against the neural network and hence have it learn.
The initial problem was that I could not get the weights to converge to give the required output.
I think I have that sorted now and will start to write code over the next week or so.

When to run backpropogation:-
There lies the biggest issue!
The net needs to be updated every time it sees something new. Feedforward takes about 1.4 seconds to execute. Backpropogation is a similar amount of maths so it would be fair to assume a similar amount of time. The problem is that it needs to be done over many itterations for each pattern that it knows. By many, we're talking 10's or even 100's of thousands. That's getting on for two days each time it sees something new.
I was hoping to be able to run an itteration while waiting for the human player to make a move or do several hundred at the end of each game but it would cause an unacceptable delay. The plan now to play a few games and then leave it running over-night to update the weights and play a better game the following day.
Another option would be to send the data to another ESP (or maybe a PC) which could do the updating and send back the results.
On the bright side, I have included a few pause1 lines in the existing code to be pessimistic about times. These can be removed to speed things up.
Also, I've tried the ReLU activation instead of the sigmoid and it executes much quicker though it may require more itterations.
[ max(x,y) vs 1/(1+e^-x)] and don't forget, for backprop we need to take the derivative of these functions.
I've also done a lot of reading up on the subject and found a few other optimisations that may well help.
The existing method of seeding can be improved by using a random number between +/-SQR(6/Fan-in) and not just +/- 1.
Testing has shown that the learn rate is very sensitive to the initial seeding.
Also, any NN learns patterns. Once it starts to recognise them, learning is much more efficient and the number of new patterns will reduce.
Learning does not need to be presented with ALL the new patterns. It can be done in batches.
I also plan to drop a few patterns to prevent it becoming perfect which would be impossible to beat.
ie. I plan to design it so that it will also forget as well as learn.
Still lots to do and still hopeful that it can be achieved.
I welcome any input on what exists or suggestions to move forward.

EDIT:-
The routines are as follows:-
AI_Random: Plays a purely random move.
AddBias: adds a slight bias to play towards the center of the board. This alone has quite an effect and can be combined with all the other options.
AImove_TL: looks for a way to prevent losing on the next move
AImove_TW: looks for a way to win on the next move
AImove_TWL: does both of the above.
AImoveNN: uses the output from the neural network.

botman · Post by **botman** » Tue Mar 26, 2024 3:07 pm

I found this article about Neural Connect 4:
https://citeseerx.ist.psu.edu/document? ... 69604b3107

BeanieBots · Post by **BeanieBots** » Tue Mar 26, 2024 4:59 pm

Thanks for the link. A very interesting read.
However, his aproach was to introduce a scoring system based on how the game is played and what the desired outcome is.
I have specifically avoided doing that. My intention is to have the network learn from experience without ever knowing anything about the game other than what moves can/cannot be played within the game rules. ie. no prior knowledge of what to do with the possible exception of tending towards the middle if no other move is obvious.

I forgot to mention in my earlier post about how the backprop knows how to update.
It needs to have a 'target' for each possible move.
When an input pattern is presented, it updates the weights in such a way that the next move will either block a winning line or create a win.
That is determined from the result (if any) of playing that move, which is random to start with.
That knowledge comes from wether or not the following move results in either player winning.
If there is no result (or a draw) all options are given a target value of 50% (and thus randomly selected if seen again)
If the move results in a win, the position played is given 80%
If the move stops a win, it is given 60%. (because a winning move is better than a move that prevents the opponent from winning)
It is this data that is backprop'd into the net after each result.
What I'm hoping, is that it will learn to recognise the pattern and play a good move even with no prior knowledge of how the state of play looks at the time of deciding its next move. As stated in the article, it would be impossible to present all possible play states. I forget how many are possible but if I recall correctly it's trillions.
Each time there is a result, it will be saved to flash/sd and recalled during the backprop phase.
When I first started this, I was aiming to use a classic ESP32. That limited the size of the net and how many patterns could be held in memory at one time. The ESP-S3 has changed that limitiation, though I will probably stick with the current configuration for now.
42 input neurons.
11 hiden neuron
7 output neuron
10 patterns used for batch based backprop.

EDIT:-
Quote from the article.
"It became clear that small well-chosen game-archives
are better than big random archives. Thus an automatic
production by letting the symbolic algorithms play one
against the other, which was planned at first, has shown to
not be a good solution "
Oh dear! that is exactly my approach

I'll carry on though. After all, learning by experience (and failure) is the best way.
My biggest concern still remains how long it will take for each backprop to compute.

botman · Post by **botman** » Tue Mar 26, 2024 8:01 pm

When using AImoveNN the program uses FeedForward to calculate OutputResult values.
The column with the maximum OutputResult is selected as the next move.
Sigmoid is used to calculate the OutputResult values from the OutputSum values.
The column with the maximum OutputResult is always the same as the column with the maximum OutputSum.
If you use OutputSum to choose the move column, there is no need use Sigmoid to calculate OutputResult values.
I tried this change, and it saved about 10% of the time for FeedForward to execute with no change in move selection.

BeanieBots · Post by **BeanieBots** » Wed Mar 27, 2024 9:48 am

Thanks for taking the time to go through the code in such detail.
What you say is absolutely spot on if we only consider feedforward.
Unfortunately, it's there for a reason.
You may have noticed that at the same it calculates the sigmoid, it also saves a variable I've called "Rate" into the second dimension of each neuron's output array.
This is because "Rate" is required for backpropogation to determine the significance and hence update value for each weight.
The clever bit is the actual sigmoid calculation Sigmoid = 1/(1 + Exp(-x)).
In order to calculate the required adjustment to the weights that contribute to the input sum, we need to know their significance (rate of change of weight vs rate of change of output).
That's where good old calculus comes in. We need the first order derivative of the output with respect to the weights that contribute to the input sum.
dy/dx of 1/(1 + Exp(-x)) = 1/(1 + Exp(-x)) * (1 - [1/(1 + Exp(-x))])
This can be simplied dramatically with respect to processor time.
If Sigmoid = 1/(1 + Exp(-x)) then it's derivative dy/dx = Sigmoid * (1 - Sigmoid)
Hence, we only need to calculate 1/(1 + Exp(-x)) once and then use it's own value to calculate it's derivate.
By using that particular variant of the sigmoid and doing the calculation in the feedforward, it saves having to do it FOUR times in the backprop. (exp(x) is processor hungry.)
So yes, removing it (along with calculating Rate) would speed up feedforward, but would massively slow down backprop. If I had included the backprob code (which I've not yet written) it might have been more evident why I've done it in the feedforward.

There is also a reason I want to keep the outputs in the range 0 - 1.
Captain's Mistress is a solvable problem, hence eventually, it will become impossible to beat.
To avoid that, my intention is to add a small random number to the outputs so that it will sometimes get it wrong. Without the sigmoid, the output range would be large and require a large random number to select anything other than the best. By condensing the range, a small random number can be used which would make it more probable that the next best option would be selected rather than a totally random selection. It would also equalise the oportunity of each option when there is no learned output rather than continually picking what the untrained weights have selected (which it does now). The net result being a more interesting gameplay.

Some options I'm considering now are using ReLU (rectified linear unit) instead of Sigmoid.
That would replace exp(x) with max(x,y).
I think a simple if/then in Annex Basic would be quicker than caculating an exponent. Needs testing.
Another thought is to expand out the nested for/next loops into descrete lines of code.
It would run quicker but how much quicker. That would require a lot of cut-n-paste + editing.
Using hard-coded constants for network arrays instead of variables might also speed things up but at the expense of being able vary the network size and dimensions. Besides, part of this excersise is to see how small a network is required to actually play the game.
Using ReLU will be much quicker than Sigmoid but it has a few issues that might cost as much as it saves.
Time will tell.....

Thanks for you input. Please keep it comming.

botman · Post by **botman** » Wed Mar 27, 2024 2:39 pm

Thank you for your detailed reply. I wouldn't argue with any of your observations.
I would suggest that ReLU, if it were used at the output layer, would give the same result as my experiment.
For positive OutputSum values, which the maximum is likely to be, the OutputSum equals the OutputResult,
and the derivative equals 1.
Something that seemed remarkable to me is, in the article at the link that I sent to you, they proposed a very large
number of neurons for the hidden layer. That seemed like overkill to me.
Your choice of a small number would be what I would call minimalist.
That choice is understandable considering the capability of the ESP32 module you are using.
My suggestion was just a small improvement, anyways.
I will be interested to see how you calculate an error function and how you backpropagate the corrections to the weights.

BeanieBots · Post by **BeanieBots** » Wed Mar 27, 2024 3:18 pm

Indeed, as you've noted, the derivative of ReLU is 1.
It can't get much simpler and quicker to calculate than multiplying by 1

The classic ESP32 restricted the hidden layer to about 11 but I actually think it can be done with as few as 7.
(one of the objectives of this project is to find out)
A lot depends on the arithmetic precision and having a small learnrate when the neurons are low in number.
That paper used a learnrate of 0.34 which is the highest I've ever seen. Normal is 0.001 to 0.0001.
I tried a ReLU activation with a learnrate of 0.03 on a simple four neuron XOR solution.
Sometimes it could solve with just a few hundred epochs but more often than not it would oscillate and never get there.
It all depended on the initial weights. A learnrate of 0.001 always worked but could take +50k epochs again depending on initial weights.
I found a paper which gave a formula for initial weights which is a range between +/- root(6/fan-in) i'll give it a try later.

When trying things out in VB6, I transposed the ESP version and found a bug in the feedforward code.
Right at the beginning it is missing a for/next loop to reset the input neuron sums to zero prior to accumulation.
I'll not bother correcting at this stage, just so that you know. It will keep accumulating and eventually saturate the sigmoid.
I'm hoping a ReLU version is not far off now. I also need to work out an efficient way of loading and saving both the weights and training patterns to SD.
I'll probably go for a method that is human readable at first to help with debugging, but eventually it could be condensed considerably.
Thanks for making me look closer at the code and finding that bug.

I too will be interested to see how I calculate the error function and backpropergate

For the error function I plan to use a simple root-mean-square. Again, easy to calculate the derivative.
I'd like to use softmax but that would be too processor intensive.
I'm working with VB6 at the moment because it's much easier to debug by simply shoving everything out to labels and manipluting values with textboxes. I've even thrown in an array of checkboxes so that I can switch on/off each weight individually. Not to mention that a pentium i9 runs a bit quicker than an ESP

Hopefully, not far off now.

BeanieBots · Post by **BeanieBots** » Wed Mar 27, 2024 4:20 pm

Just for fun, here's my vb6 debug screen for a four neuron net solving XOR.

NN.png

To be fair, XOR is actually quite a hard problem to solve. It's OR AND NAND which causes a conflict when updating the weights and can easily go into oscillation.

Post by **cicciocb** » Wed Mar 27, 2024 6:38 pm

[Local Link Removed for Guests] wrote: [Local Link Removed for Guests]Wed Mar 27, 2024 4:20 pm Just for fun, here's my vb6 debug screen for a four neuron net solving XOR.
NN.png
To be fair, XOR is actually quite a hard problem to solve. It's OR AND NAND which causes a conflict when updating the weights and can easily go into oscillation.

Annex RDS forum

Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress

Re: Captain's Mistress