Captain's Mistress


Re: Captain's Mistress

Post by botman »

I checked the FeedForward code that you posted here, and it looks to me as though you did properly zero the sums for the input, hidden, and output layers before accumulating the weight times the previous output values.
However, I did see something that doesn't look correct.
During the accumulations for each layer, you calculate a source index named Src for each weight.
For the input layer, Src = (I * numInputs).
That value doesn't increment with the value N, so the same weight value is being used over and over until I increments.
I believe that it should be Src = (I * numInputs) + N.
I think that the hidden layer and the output layer have the same situation.
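To make the fix concrete, here is a minimal sketch of the accumulation I have in mind. Src and numInputs are from your post; the loop structure and the names numHidden, W(), InputData() and Sum() are my assumptions, not your actual code:

  ' Sketch only - accumulate weighted inputs into the hidden layer
  FOR I = 0 TO numHidden - 1
    Sum(I) = 0                          ' zero the accumulator first
    FOR N = 0 TO numInputs - 1
      Src = (I * numInputs) + N         ' the + N is the fix
      Sum(I) = Sum(I) + W(Src) * InputData(N)
    NEXT N
  NEXT I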

Re: Captain's Mistress

Post by BeanieBots »

We'll get there!
It's just a matter of expanding that 2-input diagram with no hidden layers to one with 42 inputs, 11 (or more) hidden-layer neurons and 7 output neurons.
Even the humble ESP32 has enough to do it.
It's my own set of neurons that are struggling. Good variable names help a lot, but then I forget what I called things :oops:
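For scale (my own quick sums, not from any posted code): a 42-11-7 net has (42 * 11) + (11 * 7) = 539 weights plus 11 + 7 = 18 biases, 557 numbers in all, so even stored as floats it sits comfortably in ESP32 RAM.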

Jokes aside, I have now got my head around partial differentiation and the chain rule.
It's actually not all that complex once understood. (guess that's true for most things).
As you'll note from the discussion with Botman, the ReLU activation function has whittled it down to 'times one', which I can even do without a calculator.
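For anyone else trying to get their head around it, the 'times one' drops straight out of the chain rule (standard notation, not my code's variable names):

\[
\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial w},
\qquad a = \mathrm{ReLU}(z) \;\Rightarrow\; \frac{\partial a}{\partial z} = \begin{cases}1 & z > 0 \\ 0 & z \le 0\end{cases}
\]

so for a neuron whose weighted sum is positive, the activation's contribution to the chain is literally a multiply by one.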
Now I just need to get off my backside and do it.

Hi Botman, my response above was aimed at cicciocb before I spotted your post.
I might not be looking at the same code I posted, because I could not see the sum reset for the input layer.
I think you have caught a genuine bug in the calculation of Src, and it should include + N as you point out.
Well spotted. The idea is to convert the two-dimensional board to a single dimension for the inputs, as you already appear to have worked out.

Re: Captain's Mistress

Post by botman »

I think that I found another similar bug in the GetGamePlay subroutine.
The array index is calculated as N = i * j.
I think that it should be something like N = ((i - 1) * Rs) + (j - 1).
With this change, the line InputData(0) = H(0) could not be used.
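To illustrate, here is a minimal sketch with assumed names (I take i as the 1-based column, j as the 1-based row and Rs as the number of rows; BoardValue stands in for however the board is stored):

  ' Sketch only - flatten the 2-D board to a 1-D input index
  N = ((i - 1) * Rs) + (j - 1)            ' 0-based: 0 .. (columns * Rs) - 1
  InputData(N + 1) = BoardValue(i, j)     ' the + 1 keeps InputData(0) free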

Re: Captain's Mistress

Post by BeanieBots »

botman wrote: Wed Mar 27, 2024 10:43 pm
I think that I found another similar bug in the GetGamePlay subroutine.
The array index is calculated to be N = i*j .
I think that it should be something like N = ((i-1) * Rs) + (j-1) .
With this change, the line InputData(0) = H(0) could not be used.
I think you've found a much worse error than that :oops:
It looks like I've made a major blunder with version control.
I'm using a printout to transpose it into VB6 so that I can try out backprop with the feedforward as written.
That's when I noticed it was missing the sum reset, which is actually correct in the posted code.
You then picked up on the Src calculation which is wrong in the posted code but correct in my printout.
The history is that this started on a 2.8" device and then moved to a 3.5" device.
I then used the 3.5" for something else and moved it back to a 2.8" device.
When I moved over to the 7" screen, I think I picked up the code from the wrong 2.8" device.
I only concentrated on converting to VGA and did not consider the other routines (assuming them to be correct).
So, please take the feedforward code in the posted version with caution.
The intent and a little history are as follows:-
Originally the input indexes were going to be 1 to (neuron count) with index 0 being used AS the bias.
I then changed it to be 0 to (neuron count) with input0 being set to 1 for the bias. (thus avoiding a special case in the for/next loop).
H(0) is the move count which is intended to be input0 to give information on the move sequence in case it is of significance during game play.
Hence the line InputData(0) = H(0).
I will need to go over all versions to find out which is (the more) correct but I think the versions are all messed up now.
So, in short, irrespective of what my code is doing, it should be using indexes from 0, with the data into 0 being a bias of 1.
I could not see anything in the posted code that sets index 0 to 1 so it is clearly the wrong version.
Sorry about that, but thanks for looking into it and finding the issue. I'd assumed that version of the feedforward was tried and tested.
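For the record, the intended convention boils down to something like this minimal sketch (BoardValue is an assumed name, and whether slot 0 should carry the constant 1 or the move count H(0) is exactly the version question above):

  ' Indexes 0 to numInputs, with input 0 fixed as the bias
  InputData(0) = 1                    ' avoids a special case in the loop
  FOR N = 1 TO numInputs
    InputData(N) = BoardValue(N)      ' assumed name for the flattened board
  NEXT N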
With Easter coming up, I'm not sure I'll have time to look until next week.

EDIT:-
This article introduces a new (to me) concept of using a quadratic for the cost function. Something to think about over the weekend.
http://neuralnetworksanddeeplearning.com/chap2.html
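For reference, the quadratic cost in that chapter is

\[
C(w,b) = \frac{1}{2n}\sum_x \lVert y(x) - a \rVert^2
\]

where n is the number of training samples, y(x) the desired output and a the network's actual output.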
Something else I'm pondering over is the presentation of the state of play.
At first I thought a simple tri-state. 0=blank, 1=player1, 2=player2
I then changed that to -1=player1, 0=blank, +1=player2. It felt more intuitive to have 0 as the multiplier for a blank space.
I'm now thinking the opposite. Have the blank as the most significant, as that's where you'll be making your next move, not somewhere already occupied.
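In code terms the second option would look something like this (Cell is an assumed name for the stored board):

  ' Sketch of the -1/0/+1 scheme
  IF Cell(i, j) = 0 THEN InputData(N) = 0     ' blank
  IF Cell(i, j) = 1 THEN InputData(N) = -1    ' player 1
  IF Cell(i, j) = 2 THEN InputData(N) = 1     ' player 2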
Any thoughts?

Re: Captain's Mistress

Post by botman »

Enjoy the holiday.

Re: Captain's Mistress

Post by botman »

I found the article by Martin Gardner in Scientific American magazine in 1962 that first introduced me to the concept of reinforcement learning by a machine.
That was back in the dark ages when computers were rare and room-sized, so the machine was constructed with matchboxes and beads.
I actually built this machine when I was in high school (the Hexapawn one, not the TicTacToe one).
You may find it to be an enjoyable read over the holiday.

http://cs.williams.edu/~freund/cs136-07 ... xapawn.pdf

Note that it is not only the box that has the final board configuration that gets rewarded or punished, but also the boxes for each move that was made to get to that configuration.
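In code form, the after-game update might look something like this (the names are mine, not Gardner's):

  ' Reward or punish every box visited during the game,
  ' not just the one holding the final board position
  FOR T = 1 TO MoveCount
    IF MachineWon THEN
      AddBeads Played(T), 3     ' more beads = more likely to repeat that move
    ELSE
      RemoveBead Played(T)      ' fewer beads = less likely
    END IF
  NEXT T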

Re: Captain's Mistress

Post by botman »

You have described the sigmoid activation as requiring an undesirably large amount of time to compute.
Using the ReLU activation should use significantly less time, but I am concerned about the zero slope for negative weighted sums.
I believe that it was Einstein who said something like "Everything should be made as simple as possible, but not simpler".
I have been reading about a slight modification to ReLU called Parametric ReLU.
It is also very simple to calculate, but it doesn't have that zero slope.
It is the same as ReLU with slope=1 for positive weighted sums, but it has a different, nonzero slope for negative weighted sums.
Intuitively, Parametric ReLU just seems to me to be simple and less likely to cause learning problems.
Ordinary ReLU seems only slightly simpler but more likely to have learning snags.

Re: Captain's Mistress

Post by BeanieBots »

Thanks for the file, another interesting read.
I actually tried something similar to that in the early '90s, using a database for the matchboxes/beads.
It kept resulting in empty boxes and thus no suggested move.
The database also got so huge it could no longer be stored on my computer. Gave up in the end.

What you describe as parametric ReLU is also known (in its simpler form) as Leaky ReLU which is what I plan to implement.
ReLU = max(0,sum)
Parametric ReLU = max(P,sum) where P is a small value and can be configured along with the learning rate.
Standard ReLU can lead to neurons dying for large negative input weights. Leaky ReLU prevents death, but once the negative slope comes into play they are very ineffective until/if the slope becomes positive again (at least they don't die completely).
I propose to hard code a value of around 0.01, again to avoid using a variable in the multiplication chain.

It should not be underestimated how important it is to make the maths as processor-friendly (speed-wise) as possible.
Consider a conservative estimate.
Let's say we have 1000 test samples, 42 input neurons, 10 hidden neurons and 7 output neurons.
It often takes 100,000 epochs to train a data set, and each epoch needs to process each data item for each weight (every dendrite in the net).
That's (42 * 10) + (10 * 7) = 490 dendrites, so 100,000 * 1000 * 490 = 49,000,000,000 multiply-and-accumulates, plus 100,000 * 1000 * 17 = 1,700,000,000 activation evaluations!
That's just the forward pass. The weights and biases also need to be updated on a similar scale.
Even with a decent PC it would take quite a while. I'm not sure how feasible it will be with Annex.
For now, I'll just be concentrating on getting some code that actually works.
Then look at how best to optimise structure and functions.
I'll probably cheat and train a set on a PC and load it as a text file. Then let Annex update as/when required. (probably overnight).

Re: Captain's Mistress

Post by botman »

I think that your "Parametric ReLU = max(P,sum)" should be Parametric ReLU = max(P*sum, sum).
Also, I wonder if P needs to be as small as 0.01.
What if it was 0.1 ?
How would you get a number for the final output column choice?
What number would you use as the ideal output number that gets subtracted from it to get the error?
Minimizing the squared error seems to be the most common optimization goal.
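In BASIC terms, a minimal sketch of the corrected form (the function name is mine):

  ' Leaky/parametric ReLU: slope 1 above zero, slope P below
  ' Equivalent to max(P * Sum, Sum) for 0 < P < 1
  FUNCTION LeakyReLU(Sum, P)
    IF Sum > 0 THEN
      LeakyReLU = Sum
    ELSE
      LeakyReLU = P * Sum     ' P could be hard-coded (0.01 or 0.1)
    END IF
  END FUNCTION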

Re: Captain's Mistress

Post by botman »

Here is an idea to save time in the FeedForward calculation. I tried it, and it works.
It comes from the observation that, even with nearly all the locations on the board filled,
over half of the InputData values are zeros.
Each InputData value is multiplied by a weight and accumulated to each sum in the hidden layer.
Those multiply and add calculations can be skipped when the InputData value is zero.
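A minimal sketch of the idea, with assumed loop structure and array names (Sum() already zeroed, and Src using the corrected + N formula from earlier in the thread):

  ' Loop over inputs on the outside so a zero input skips all its multiplies
  FOR N = 0 TO numInputs - 1
    IF InputData(N) <> 0 THEN
      FOR I = 0 TO numHidden - 1
        Sum(I) = Sum(I) + W((I * numInputs) + N) * InputData(N)
      NEXT I
    END IF
  NEXT N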