NRZ-2018-107: Very unstable config network #242
There seems to be a memory issue in access point mode. We need to shorten the config page. But I can't 'read' or change the Polish translation. Is there a chance to shorten some of the texts to save some RAM and flash? |
It definitely looks like the Polish translation would benefit from a review. I'll look into it later. As far as I remember, though, the same issues happened on the English and German versions as well. I'll try to reproduce it again today. Maybe it was a power consumption issue all along? We'll see. |
I'm again able to reproduce the same issue on When opening the web interface I get "ERR_CONTENT_LENGTH_MISMATCH" on Chrome as well, but I think this is just caused by TCP connection closing before end of content. (caused by packet drops) I haven't played with AP mode on ESP8266 lately, but AP shows up as late as after 10-20 seconds of refreshes on Android after "Starting WiFiManager" message is sent over Serial. I recall it being much faster (like 5 seconds tops) on Espressif SDK a year or so ago, but maybe that's just my failing memory... :) |
...and I've just disconnected USB-A side and connected it back again, opened serial monitor (picocom, not the Arduino one) and it worked perfectly fine now on the same board... 0% packet loss. |
...and back to being broken after another power cycle... :)
|
Are you trying to connect to the sensor while it is powered from a USB port? Some USB ports limit the available current, and that may not be enough for a stable WiFi connection in AP mode. It's better to use a power supply rated for at least 1 A during configuration. |
During our workshops we have experienced the same difficulty with the initial connection in AP mode. I am working on a branch that uses WiFiManager for the initial connection just to let the user configure the SSID and password in a captive portal instead of showing the full configuration page. The results of this approach so far are encouraging. |
Using WiFiManager would create a larger firmware image. At some point we wouldn't be able to update OTA anymore. But we could limit the config page to the WiFi part only in AP mode. Users would then change everything else once the sensor is connected to the WiFi network. |
After doing some more tests it seems the size of the config page is OK. I think the real culprit is enabling STA and AP mode at the same time. Go to airrohr-firmware.ino and try adding
and/or
at the top of the |
I have inserted the mentioned lines. This seems to work. I have pushed this to the beta branch. If someone else could test this and it's okay I would push this change to the master branch and publish this as the new release firmware. |
Wow, this seems to work much better now! Latest English beta build, in case anyone wants to try this without building it locally: |
I have pushed the beta to our update server (needed some time for all language versions ;-) ). So you can also download them at: https://www.madavi.de/sensor/update/data/ |
We have too few people looking at the code. So many thanks for your work @LuchtwachtersDelft . After 2 years of work on the firmware it's sometimes hard to find something like this. |
Much appreciated. This fix means I can finally help the people who came to our workshops but never managed to connect their sensors. And I can go ahead with the next workshops. While I was working with WiFiManager I noticed the captive portal feature. A captive portal improves the user experience, because users don't have to go to their browser and enter 192.168.4.1. Airrohr has the "Sign in to network" popup on Android, as shown above, but not on iOS. Do you think it's worth expanding the captive portal feature for Airrohr on iOS and should I open a new feature issue for this? |
If you know what iOS is looking for I can add it to the available paths. But I don't have an iOS device to test this. For iOS users this would be nice. |
I'll open a separate issue for captive portal. |
Still seems not that stable after the fix when the AP has a password (FS_PWD). It shows the config page, but subsequent reloads of the config page turn up white. |
Does it also show a blank page if you comment out some parts (i.e. the sensor configs)? |
Could you check the current beta? I have checked this version with my Android phone and had no blank pages (sensor as WPA access point). |
I have been testing the upstream beta when I reported the white page issue. Removing the optional parts from the config page seems to help. It improves user experience too, because the full config page tends to overwhelm and confuse new users. The OLED/LED section could remain, but I usually enable the correct screen in ext_def.h anyway. But making the config page smaller doesn't explain why setting a password triggered the white page. I think it's a combination of causes, but in the past hours I haven't been able to come up with a solid explanation. I suspected the WiFi.disconnect() calls. For some reason having 2 of them in wifiConfig() combined with setting fs_pwd and showing the full config page can cause the white page to appear on iOS. |
Maybe some other WiFi devices are interfering. I just noticed that my Android tablet keeps trying to connect to the AP, and fails because it doesn't have the correct password, while I'm busy with the iPhone. But then I'd expect more trouble when fs_pwd is empty. The white page doesn't happen as often anymore now, even with the full config page, both WiFi.disconnect() calls, a non-empty fs_pwd, and the Android tablet interfering. What triggers the white pages remains a mystery. In any case, this fix is much more stable than no fix at all. Add the minimal config page and I think we've got a good candidate for a public release. |
I have found two possible reasons for blank pages.
|
Okay, a solution for reason 2 is implemented as well. The firmware should now select the channel with the lowest signal level from channels 1, 6 and 11. |
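The channel selection described above could look roughly like the following. This is only a sketch in plain C++, not the firmware's actual code: `ScanEntry` and `selectChannel` are hypothetical names, and "lowest signal" is interpreted here as "the channel whose strongest neighbouring AP is weakest", which is an assumption.

```cpp
#include <array>
#include <climits>
#include <vector>

// One scan result: the channel an AP occupies and its RSSI in dBm.
// (Hypothetical type for illustration; the firmware reads these from the WiFi scan.)
struct ScanEntry {
    int channel;
    int rssi;  // dBm: -40 is strong, -90 is weak
};

// Return whichever of the non-overlapping channels 1, 6 and 11 has the
// weakest "strongest neighbour". A channel with no APs at all wins outright;
// ties are resolved in favour of the lower channel number.
int selectChannel(const std::vector<ScanEntry>& scan) {
    const std::array<int, 3> candidates = {1, 6, 11};
    int bestChannel = candidates[0];
    int bestLevel = INT_MAX;
    for (int ch : candidates) {
        int strongest = INT_MIN;  // stays INT_MIN if nobody is on this channel
        for (const ScanEntry& e : scan) {
            if (e.channel == ch && e.rssi > strongest) {
                strongest = e.rssi;
            }
        }
        if (strongest < bestLevel) {
            bestLevel = strongest;
            bestChannel = ch;
        }
    }
    return bestChannel;
}
```

With neighbours at -40 dBm on channel 1, -70 dBm on channel 6 and -55 dBm on channel 11, this picks channel 6. APs on other channels are ignored in this sketch, although in practice they also bleed into the neighbouring candidates.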
Could reason 2 explain also why WPA2 triggers blank pages more often than an unprotected connection and why a small page is able to pass through before too many packets are lost? I'm in an area that is absolutely saturated with WiFi. About 20 APs here all the time. |
I don't know how much overhead WPA2 adds to the transmitted data. But the crypto functions are time and RAM consuming. All data needs twice the RAM while it is being encrypted. WPA2 and also the HTTPS/TLS connections (i.e. data transmission to our servers) can cause memory problems. The config page hasn't changed. Maybe we don't need to shorten this page. |
@Informatic , @LuchtwachtersDelft just to say that: Many thanks for working on this issue! Solving this problem will help many users. |
The WiFi channel selection feature seems to work pretty well. With the Android app WiFi Analyzer I can see the ESP move to the best channel. No problem using the full config page. Once I set a WPA2 password the signal strength decreases, which also shows in WiFi Analyzer as a lower curve. And it becomes hard to connect again. So it might indeed be the case that WPA2 adds too much overhead. |
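So we have two possibilities:
- Short config page (WiFi + displays) in AP mode, both encrypted (WPA) and unencrypted
- Short page only for WPA, normal config page for unencrypted AP
|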
I prefer Option 1 for consistency.
But isn't it strange the connection goes bad with WPA even before the page is rendered?
…On Thu, Sep 13, 2018 at 13:47, Rajko Zschiegner ***@***.***> wrote:
So we have two possibilities:
- Short configpage (Wifi + displays) in AP mode encrypted (WPA) and unencrypted
- short page only for WPA, normal config page for unencrypted AP
|
I have made some more changes (setting the PHY mode to 802.11g, setting maximum signal strength) to force the right settings. With my system I can only see a small difference in signal strength between a WPA2 and an unencrypted AP. |
I just compiled the upstream beta from source and got my first hardware exception
And the second time I think it hangs
|
Downloaded the precompiled beta. It works, but on iOS the captive portal isn't activated. Will do some more testing. |
Very important: power down or hard reset the NodeMCU after flashing. The soft boot doesn't work after flashing. And this may not be the only side effect. |
Woah, the captive portal is finally working on iOS, after using the Forget This Network feature and toggling WiFi. Seriously, these kinds of weird behavior don't make developer life any easier... |
iOS still seems less stable than Android. Sometimes the captive portal disappears after a few seconds and the iOS Wi-Fi page shows a spinner next to Feinstaubsensor-1234567 to indicate it's scanning instead of being connected. This also happens without WPA and with a mini config page. I've added
to webserver_not_found() to check which URIs are requested when connecting.
|
Work in progress on the mini config page... |
Latest beta version is online. It includes a minimized config page. |
Ouch, I didn't see that commit. Oh well. I will try it tomorrow. |
Changed the captive portal again. I hope that this will work ... |
I tested the latest beta commit ba93ce3 and the previous c4c9a60 back and forth. ba93ce3 does not trigger the captive portal even with Forget This Network. Tested several times, even erasing the whole flash before. c4c9a60 does, no need for Forget This Network. WiFiManager also had this problem. tzapu/WiFiManager#296 |
Changed back last modification on Captive portal. |
In the list of WiFi networks on the config page there's suddenly one that's just a space and a star. It doesn't show on Android or iOS. Any idea?
And now without *.
|
I haven't heard of such behavior until now. Is there a hidden WiFi network you know about? The scan class will find such networks, but I don't know what name is shown in this case. |
Could it be related to changes in a recent commit? 7c7f0c2 Using |
Showing hidden SSIDs wasn't changed. But I will include the mentioned test in the next version to avoid those lines. Excluding hidden SSIDs at scan is not an option as we need the wifi channels and RSSIs occupied by them. |
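The approach described here, scanning everything but hiding unnamed entries only at display time, might be sketched as follows. This is plain C++ for illustration, not the firmware's code: `Network` and `visibleNetworks` are hypothetical names, and modelling a hidden network as an empty SSID string is an assumption.

```cpp
#include <string>
#include <vector>

// One scan result. Hypothetical type; hidden networks are assumed to
// report an empty SSID.
struct Network {
    std::string ssid;
    int channel;
    int rssi;  // dBm
};

// Return only the networks worth listing on the config page.
// The input is left untouched, so hidden networks still count towards
// the per-channel occupancy and RSSI statistics.
std::vector<Network> visibleNetworks(const std::vector<Network>& scan) {
    std::vector<Network> shown;
    for (const Network& n : scan) {
        if (!n.ssid.empty()) {
            shown.push_back(n);
        }
    }
    return shown;
}
```

The point of filtering a copy rather than the scan itself is that the channel selection still sees every AP, named or not.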
I just tried https:/esp8266/Arduino/tree/master/libraries/DNSServer/examples/CaptivePortalAdvanced a couple of times on iOS and eventually it also disconnected, closing the captive portal and showing the spinner on the iOS WiFi page. Looks like there is a deeper problem, maybe in one of the other libraries or even in iOS itself. |
I think it's iOS. I have read how they try to detect the captive portal. They try to access different pages, some of them with random paths. In some cases the request should return an error, in others not. So the solution "catch every 'not found' and redirect" doesn't really work, and there are no fixed special paths we could handle instead. |
Still, I'm not fully convinced there's nothing we can do to fix it. Yesterday I tried adding a check to prevent the captive portal from being triggered multiple times when it's already open. But it didn't seem to prevent the iPhone from disconnecting.
|
So far so good. The captive portal works on B11 on iOS, using a Javascript redirect instead of 302. Need to test a couple of times more to see if it's really stable. The hidden SSIDs are still displayed because |
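For readers wondering what a JavaScript redirect instead of a 302 means in practice: the server answers with a normal 200 page whose script sends the browser on. A minimal sketch in plain C++ follows. `redirectPage` is a hypothetical helper, not a function from the firmware, and the `/config` path in the usage note is an assumption (only the 192.168.4.1 address appears in this thread).

```cpp
#include <string>

// Build a small HTML page that is served with status 200 and redirects
// via script instead of an HTTP 302 Location header.
// Note: targetUrl is inserted verbatim, so it must be a trusted constant.
std::string redirectPage(const std::string& targetUrl) {
    return "<!DOCTYPE html><html><head><script>"
           "window.location.href = '" + targetUrl + "';"
           "</script></head><body>"
           // Fallback link in case scripting is disabled.
           "<a href=\"" + targetUrl + "\">Continue to configuration</a>"
           "</body></html>";
}
```

Calling `redirectPage("http://192.168.4.1/config")` returns a page the not-found handler could send as `text/html`. Whether iOS treats this differently from a 302 in every case is exactly what the testing in this thread is trying to find out.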
As I am away this weekend I would like to push the new release tonight. |
Let's do it! After fixing the typo ;) |
Typo fixed, new release version is online. |
Pop the champagne!
OTA update failed, is the server getting hammered right now by thousands of NodeMCUs trying to download the new firmware?
…On Thu, Sep 20, 2018 at 01:00, Rajko Zschiegner ***@***.***> wrote:
Typo fixed, new release version is online.
|
For a short time the new files are inaccessible on the server (while copying). At the moment the first sensors are getting the new version (105 sensors right now): |
Maybe OTA failed because my ISP's DNS is malfunctioning or some scheduled maintenance. Can't access Madavi, Github and other sites through cable, only LAN. Cellular is ok. These things always happen at unfortunate moments.
|
Well then. Everything seems to work much better now. Time to close this issue. |
Hey!
We've been working on getting the luftdaten project going in Warsaw, Poland. So far we've encountered some pretty bad problems with the initial configuration: WiFi clients connecting to the Feinstaubsensor-... network get randomly disconnected, and when they finally do connect, we see an average of ~40-60% packet loss to the ESP8266 IP, so the web interface is (obviously) very unstable (i.e. it keeps loading endlessly in browsers, needs multiple refreshes, etc.). I've looked at the debug serial output and the sensor is not rebooting or crashing.
After the initial configuration the sensors work perfectly fine, and we don't see any packet loss when pinging or accessing the web interface on the local network.
We've tested it on multiple different ESP8266 boards (4x NodeMCU Lolin v3 from the same batch, one random Wemos D1 Mini) and they all behave very similarly. On the other hand, software based on Sming or the plain Espressif SDK seems to work fine on these boards, so this doesn't sound like a problem with the boards themselves, does it? (I've done some ESP8266 development myself before.)
We've tested it on:
This sounds pretty similar to issues stated in #88