What could cause this?

Zim
Posts: 280
Joined: Mon Feb 08, 2021 9:15 pm
Has thanked: 253 times
Been thanked: 128 times

What could cause this?

Post by Zim »

Hi All
I have a project that runs for several weeks with no problems, but then stops running. I can still log into the ESP8266 and rerun the program. Where would I start to troubleshoot something like this?
Thanks for your suggestions.
Zim
Electroguard
Posts: 836
Joined: Mon Feb 08, 2021 6:22 pm
Has thanked: 268 times
Been thanked: 317 times

Re: What could cause this?

Post by Electroguard »

A halt on a memory error could account for those symptoms.
CiccioCB has found and corrected many memory leaks, so whether a leak is the cause will obviously depend on the Annex version in use.

You could check for a creeping memory leak, one that eventually causes a halt on an out-of-memory error, by appending the ramfree value to a file every night. That would give a record of any leakage trend over time.
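
Once a few weeks of values have been collected, the trend can be checked off the module. Below is a minimal Python sketch for a PC, not Annex code. It assumes each log line looks like `2024-01-31 03:00,41232` (timestamp, comma, ramfree value); that format and the name `leak_trend` are only illustrative, so adjust them to whatever your script actually writes.

```python
from datetime import datetime


def leak_trend(lines):
    """Return the least-squares slope of free RAM in bytes per day.

    Each line is assumed to look like '2024-01-31 03:00,41232'.
    That log format is hypothetical; change the parsing to suit yours.
    """
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, value = line.split(",")
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        points.append((t, float(value)))
    if len(points) < 2:
        raise ValueError("need at least two samples to estimate a trend")

    # x axis: days since the first sample, y axis: free RAM in bytes
    t0 = points[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in points]
    ys = [v for _, v in points]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up sample data: free RAM drops by 100 bytes each night.
sample = [
    "2024-01-01 03:00,42000",
    "2024-01-02 03:00,41900",
    "2024-01-03 03:00,41800",
    "2024-01-04 03:00,41700",
]
slope = leak_trend(sample)
print(f"free RAM changes by {slope:.0f} bytes/day")
```

A slope that stays clearly negative over several weeks would point towards a leak; a value hovering around zero would suggest looking elsewhere.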
Zim
Posts: 280
Joined: Mon Feb 08, 2021 9:15 pm
Has thanked: 253 times
Been thanked: 128 times

Re: What could cause this?

Post by Zim »

Thanks Electroguard.
I'll implement that and see if it points to memory.

Zim
rmsta
Posts: 32
Joined: Sat Feb 20, 2021 9:36 am
Location: Sindelfingen, Germany
Has thanked: 134 times
Been thanked: 9 times

Re: What could cause this?

Post by rmsta »

Just a hint:
To monitor several values over time, I am using ThingSpeak. It is simple to handle and can be checked from anywhere, so you don't have to log into the ESP.
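
As a rough illustration of what such an update looks like, here is a small Python sketch that only builds the request URL for ThingSpeak's update endpoint and does not send anything. Putting ramfree in `field1` and uptime in `field2` is just an assumption for this example, as are the function name and the placeholder key; on the module you would send the equivalent URL from your own script.

```python
from urllib.parse import urlencode

# ThingSpeak channel update endpoint (write API key goes in the query string).
THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"


def build_update_url(api_key, ramfree, uptime_s):
    """Build an update URL; the field assignment here is hypothetical."""
    params = {
        "api_key": api_key,   # write API key of your channel
        "field1": ramfree,    # assumed: free RAM in bytes
        "field2": uptime_s,   # assumed: uptime in seconds
    }
    return THINGSPEAK_UPDATE + "?" + urlencode(params)


url = build_update_url("YOUR_WRITE_KEY", 41232, 86400)
print(url)
```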
Good luck !

Rainer
BeanieBots
Posts: 325
Joined: Tue Jun 21, 2022 2:17 pm
Location: South coast UK
Has thanked: 173 times
Been thanked: 104 times

Re: What could cause this?

Post by BeanieBots »

Does your project have any sensors like DS11 or BME280?
If yes, they can sometimes return "nan" (not a number) instead of a numeric value.
If you don't trap for such an event, it can cause a crash due to type mismatch.
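
The idea of trapping a bad reading can be sketched like this. It is written in Python purely to show the logic, not as Annex code, and `safe_reading` is a made-up helper name: anything that is missing or is not a finite number gets replaced by the last good value before it is used in any calculation.

```python
import math


def safe_reading(raw, fallback):
    """Return raw as a float, or fallback if it is missing or not a number."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return fallback  # not convertible at all, e.g. None or garbage text
    if math.isnan(value) or math.isinf(value):
        return fallback  # converted, but to "nan" or infinity
    return value


# Made-up sequence of sensor results, including a "nan" and a missing value.
last_good = 21.5
for raw in ["22.1", "nan", None, "23.0"]:
    last_good = safe_reading(raw, last_good)
    print(last_good)
```

Whether to reuse the last good value, skip the sample, or retry the read depends on the project; the point is only that the check happens before the value is used.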
Zim
Posts: 280
Joined: Mon Feb 08, 2021 9:15 pm
Has thanked: 253 times
Been thanked: 128 times

Re: What could cause this?

Post by Zim »

Good tip, BeanieBots. No sensors, but several ESP-NOW transmissions.

Thanks
Zim
BeanieBots
Posts: 325
Joined: Tue Jun 21, 2022 2:17 pm
Location: South coast UK
Has thanked: 173 times
Been thanked: 104 times

Re: What could cause this?

Post by BeanieBots »

As if by magic, this happened to me yesterday. Three ESPs stopped sending data. They were still up and I could connect to them, but the program was not running. Almost as if somebody had connected and pressed the stop button.
I also had an RPi lock up and observed a slow wireless connection on my PC.
My suspicion is that I was the victim of an intensive port scan, which I know could have caused the RPi to crash, but I am not sure what it would have done to the ESPs. Most likely an out-of-memory error, if a large amount of data was sent to their listening port.
rmsta
Posts: 32
Joined: Sat Feb 20, 2021 9:36 am
Location: Sindelfingen, Germany
Has thanked: 134 times
Been thanked: 9 times

Re: What could cause this?

Post by rmsta »

Hello,
Are there any new ideas on this topic?
I also got the impression that it is somehow related to the internet connection.
When doing some testing, I found that the ESP32 (Annex32 WiFi BLE 1.43.7) does not reconnect to the router
after the router has been switched off for about 5 minutes. It seems to remain in AP mode, which works well, and
the program keeps running, but with "connection failed" in the log field when accessing anything "outside".

Rainer
BeanieBots
Posts: 325
Joined: Tue Jun 21, 2022 2:17 pm
Location: South coast UK
Has thanked: 173 times
Been thanked: 104 times

Re: What could cause this?

Post by BeanieBots »

Yes, there is something about the connection in the 1.44.2 release notes.
"Editor Page
- Improved the connection with the module using a ping / pong mechanism to hold the connection"
However, I'm not convinced that it relates to this particular issue (I may well be wrong, though).
What both Zim and I experienced was a module that was still connected but not running, as if the stop button had been pressed.
Unfortunately, it is such a rare occurrence that it is extremely difficult to diagnose conclusively.
If your issue is just a disconnect with everything else OK, then try upgrading to 1.44.2 and see if the problem goes away, as that is addressed in that release.