UHK 60 v2 backlight colors change randomly eventually #339

Closed
mondalaci opened this issue Mar 17, 2021 · 28 comments

Comments

@mondalaci
Member

Some UHK 60 v2 testers observed that over time, the backlight colors of some keys change. Sometimes, I also notice some keys displaying different colors than they should display.

This issue can be temporarily fixed by switching to the Mouse layer, for example, but it's disturbing nevertheless.

I have a hypothesis regarding the cause of this issue. The UHK has an I2C bus that interconnects the keyboard halves and modules. The longer the bridge cable, the more modules are attached, and the noisier the environment is, the higher the bus's parasitic capacitance. High parasitic capacitance results in communication errors, and unlike the microcontrollers of the keyboard halves and modules, the communication of the LED drivers is not CRC-validated, so they accept invalid LED values.
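
To illustrate why the missing validation matters, here is a minimal sketch in Python. It is not the UHK protocol: the CRC variant (CRC-CCITT via `binascii.crc_hqx`), the frame layout, and the color value are all assumptions chosen for illustration. It only shows that a receiver which checks a CRC can drop a frame with a flipped bit, while a receiver without a check applies the corrupted value.

```python
import binascii

def frame_with_crc(payload: bytes) -> bytes:
    """Append a 16-bit CRC (CRC-CCITT, chosen here only for illustration)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")

def accept_checked(frame: bytes):
    """Return the payload if its CRC matches, otherwise None (frame dropped)."""
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    if binascii.crc_hqx(payload, 0xFFFF) != received:
        return None
    return payload

def accept_unchecked(frame: bytes) -> bytes:
    """A receiver without validation takes whatever bytes arrive."""
    return frame

def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a single-bit error on the wire."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)

color = bytes([0xFF, 0xC0, 0x80])  # hypothetical RGB value for one key

# Without validation, the corrupted color is accepted as-is.
print(accept_unchecked(flip_bit(color, 2, 7)).hex())
# With validation, the corrupted frame is rejected and the clean one passes.
print(accept_checked(flip_bit(frame_with_crc(color), 2, 7)))
print(accept_checked(frame_with_crc(color)).hex())
```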

This issue should be fixable at the firmware level. The baud rate of the I2C bus should be dynamically adjusted based on I2C error counters. Alternatively, the firmware should periodically check if the colors are correct and override them if needed.
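
A rough sketch of the first idea, in Python rather than firmware C. Every name here (`BaudRateGovernor`, the rate ladder, the thresholds) is hypothetical, and which I2C counters should count as errors is left open. The sketch steps down to a slower rate when a sampling interval sees too many new errors, and steps back up after several clean intervals.

```python
class BaudRateGovernor:
    """Pick an I2C baud rate from a fixed ladder based on a cumulative error counter."""

    def __init__(self, steps, error_threshold, recover_after):
        self.steps = sorted(steps, reverse=True)  # fastest rate first
        self.index = 0                            # start at the fastest rate
        self.error_threshold = error_threshold    # new errors per interval that trigger a step down
        self.recover_after = recover_after        # clean intervals required before stepping up
        self.clean_intervals = 0
        self.last_error_count = 0

    @property
    def baud_rate(self):
        return self.steps[self.index]

    def update(self, error_count):
        """Call once per sampling interval with the cumulative error counter."""
        new_errors = error_count - self.last_error_count
        self.last_error_count = error_count
        if new_errors > self.error_threshold:
            # Too noisy: move one step slower (or stay at the slowest rate).
            self.index = min(self.index + 1, len(self.steps) - 1)
            self.clean_intervals = 0
        elif new_errors == 0:
            self.clean_intervals += 1
            if self.clean_intervals >= self.recover_after and self.index > 0:
                # Quiet for long enough: try one step faster again.
                self.index -= 1
                self.clean_intervals = 0
        else:
            # Some errors, but below the threshold: hold the current rate.
            self.clean_intervals = 0
        return self.baud_rate


governor = BaudRateGovernor([100_000, 200_000, 400_000], error_threshold=5, recover_after=3)
for counter in (50, 50, 50, 50):
    print(governor.update(counter))
```

The separate step-up delay is there so the rate does not oscillate when the bus is only marginally stable.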

At this point, I'd like to gather feedback on how often this issue happens and which keys are usually affected. Please attach photos.

@ttscoff

ttscoff commented Mar 17, 2021

I see the issue happen once or twice a day. There are two different color patterns that show up, all only affecting the right half.

  1. The 7, 8, and -_ keys turn yellow, everything else appears as it should.
  2. A more random combination, but it's the same every time:

[Image: upload on 2021-03-17 at 10-31-16]

As you said, switching to the mouse layer and back corrects it, but nothing else seems to.

I haven't been able to pin down any triggers for it yet, but I did notice today that the colors had switched after I'd been away from my machine for a bit. I can't be absolutely certain they hadn't already switched before I let the machine sleep, though.

I currently only have the key cluster module on the left half, no module on the right. I've seen the issue pop up with a mouse module attached as well. I'll try testing for a while with no modules and see if it still happens.

Update: as I finished typing this and went to unplug the key cluster module, I saw the colors described in item 1 above. Unplugging the key cluster module reset the backlighting to the correct colors.

@ttscoff

ttscoff commented Mar 18, 2021

I don't think it's coincidence that the 7/8/-_ bug shows up after my Mac sleeps. It seems like every time I come back to it, that coloring is there. Could just be a matter of time, I guess.

@kareltucek
Collaborator

kareltucek commented Mar 22, 2021

Sorry for the late reply... here are some pics by date. All were taken with the same firmware build of the fork, with the backlight set to uniform warm white.

EDIT: GitHub likes neither big pictures nor external references, so please continue at http://ktweb.cz/uhk_backlight/

Note the randomly changing LED display: the only correct abbreviations are "INI", "QT3" and "QTM". Also, "ERR" might show up at times, but in that case it most likely means that the keymap abbreviation array is corrupted.

17th March

was not taking pictures for this research at that moment yet, took the picture just because it was cool...

18th March

Moved the keyboard to the office and started trying to take pictures of every change. I noticed that while working, just a few keys change, but they change back and forth often. Also, suspiciously, I've noticed only the left half going wrong in the office, while I am sure that the right half used to go wrong at home too.

22nd March

first ten minutes of work so far


will keep bumping with some generic comments...

@mondalaci
Member Author

@kareltucek Thanks for the detailed feedback! Can you reproduce with the halves merged and without the bridge cable?

@kareltucek
Collaborator

kareltucek commented Mar 23, 2021

Yes, I can.

Yep...

@mondalaci
Member Author

@kareltucek Please reduce the I2C baud rate until you can reproduce this issue.

@kareltucek
Collaborator

kareltucek commented Mar 25, 2021

Hmmm, reduced the baud rate, then increased it again, and have not seen backlight issue since then. I guess that the cable was badly plugged in so the requested disconnect and consequent reconnect fixed the issue... Edit: this hypothesis is not consistent with reproducing the issue without the bridge cable.

Just ran twice into the keyboard going wild (stuck mouse movement etc)...

@kareltucek
Collaborator

kareltucek commented Mar 26, 2021

Still, there is some stability in the behaviour of the keyboard. It does not randomly do all the weird things I keep complaining about at the same time. It always does one again and again. Sometimes frequently, sometimes not so frequently. But then it suddenly changes to some other behaviour. There's one variable though - I am still doing some changes to my user config at a pace of approximately one change every one or two days. I am wondering if there might be some connection.

Anyway will keep an eye on it...

@steamraven
Contributor

steamraven commented Mar 28, 2021

Could part of the problem be a race condition with the array of LedValues? The array is updated from both the main loop and several interrupt callbacks (wakeUpUhk, UsbCommand_ApplyConfig). In addition, the data is read and sent to the chips from another interrupt callback (slaveSchedulerCallback). Thus it is possible for the ledValues array to be modified while it is being read.

Here is some code that tries to remove the biggest race conditions. There may still be more. Be warned: this code has not been tested on a real keyboard.
https:/steamraven/firmware/tree/synchronous_led
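
For readers unfamiliar with this class of bug, here is a small deterministic simulation in Python of the kind of torn read hypothesised above. It is not the firmware code and not the code in the linked branch: the function names, the three-byte color, and the "interrupt" that fires after a fixed number of bytes are all assumptions made for illustration.

```python
def send_live(led_values, new_values, interrupt_at):
    """Send bytes straight from the shared array; an 'interrupt' rewrites it mid-transfer."""
    sent = []
    for i in range(len(led_values)):
        if i == interrupt_at:
            led_values[:] = new_values  # simulated interrupt handler updating the array
        sent.append(led_values[i])
    return sent

def send_snapshot(led_values, new_values, interrupt_at):
    """Copy the array first (assumed atomic, e.g. with interrupts masked), then send the copy."""
    snapshot = list(led_values)
    sent = []
    for i in range(len(snapshot)):
        if i == interrupt_at:
            led_values[:] = new_values  # the same interrupt no longer affects this transfer
        sent.append(snapshot[i])
    return sent

blue, yellow = [0, 0, 255], [255, 255, 0]

# Torn read: one byte of the old color followed by two bytes of the new one.
print(send_live(list(blue), yellow, interrupt_at=1))
# Snapshot read: the old color is sent intact; the new one goes out on the next update.
print(send_snapshot(list(blue), yellow, interrupt_at=1))
```

A torn read like this produces a color that is neither the old nor the new value, and it would repeat identically whenever the timing repeats.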

@kareltucek
Collaborator

kareltucek commented Mar 29, 2021

@mondalaci just noticed this:

requestedBaudRate:100000 | *actualBaudRate:1102* | I2C0_F:0b00101100

Which implies that unless I am missing something, my messing with i2c was not very meaningful. If more experiments are desired, please provide more detailed instructions.

(Atm, backlight is still totally stable.)

@steamraven
Do you have any specific hypothesis?

I mean, yeah, race conditions and interrupt parallelism are both very dangerous things, and doing things in interrupts that don't need to be done in interrupts is irresponsible, and yeah, I do support your ideas about "well written code"... But as long as the LedMap content is stable (i.e., does not change rapidly during an update cycle), write race conditions should not matter. As for the reading part (reading the LedMap and writing to the i2c), multiple interrupt handlers interrupting each other and writing to the i2c "at the same time" sure can be a problem. However, to cite Laszlo:

> The slave scheduler is written in a way that buffer override shouldn't be an issue.

(Note that I am just using my common sense here - no actual expertise.) Also note that I cannot actually confirm whether the i2c interrupt race conditions are actually prevented or not, and if I were to invest time into it, this would be where I would start.

Laci's hypothesis is errors happening on the i2c. I am wondering if we can test it. @mondalaci any way to actually stress the system to start making errors?

@steamraven
Contributor

@kareltucek My thoughts about a race condition came from the different color patterns that repeatedly came up. They mentioned that 4 keys turn yellow at the same time, consistently. From your pictures, there are several keys with the same color. So every third byte in several places would have to be similar. Corruption due to noise on the i2c would, I think, be a lot more random. But race conditions on single cores can be consistent. And their solutions are usually a lot easier than trying to correct a noisy bus, so I wanted to look here first.

However, you are right, I don't think there is as big a problem as I thought. The slave scheduler and the USB handler are at the same priority and so will not pre-empt one another. The slave scheduler should be fine as long as nothing interrupts it. The USB handler runs on wakeup/sleep and ApplyConfig, so it wouldn't generally be causing continuous changes. Plus, it looks like even if they executed at the same time as the main thread, things should come out with the same value in the end.

It would be nice to pull the LedDriverValues array after corruption is seen just to be sure. (Issue #322 )

But back to baudrate. Your "actual baud rate" looks really weird. From the source, the SDK tries to come up with a value that matches the requested one. I2C0_F breaks down as follows: bits 7-6 are the multiplier, and b00 translates to 1; bits 5-0 are the divider index, and b101100 == 44 translates to a divisor of 576. So 120 MHz / 2 / (1 * 576) should give us 104167, which is about right. I don't know where 1102 comes from.

@kareltucek
Collaborator

> It would be nice to pull the LedDriverValues array after corruption is seen just to be sure. (Issue #322 )

Good point.

> But back to baudrate. Your "actual baud rate" looks really weird.

Well, what is your baud rate? For me, the value 1102 is the same across two computers and both my keyboards (production v1 and prototype v2). Or do you not even have a v1 for testing?

@steamraven
Contributor

I'm thinking the weird "actualBaudRate" is a red herring. I can't figure out why it is not correct, but the fact that the actual multiplier is correct is indicative that the bus baud rate really is 104 kHz.

> @mondalaci any way to actually stress the system to start making errors?

If you want to stress the system, you can set the baud rate to 400 kHz. Technically, the chips can handle up to 1 MHz, but the capacitance on the bus will probably make it impossible to communicate at that level. If you change the baud rate (through code or USB command), the I2C0_F should update as well.
A couple values:

50000:   actual 52083,   I2C0_F: 0b00110100
100000:  actual 104166,  I2C0_F: 0b00101100
200000:  actual 208333,  I2C0_F: 0b00100100
400000:  actual 416666,  I2C0_F: 0b00011100
800000:  actual 833333,  I2C0_F: 0b00010011
1000000: actual 1000000, I2C0_F: 0b01000101
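
The rows above follow the same arithmetic as the 104167 calculation earlier in the thread. Below is a small Python sketch that decodes each register value and recomputes the rate. Only the 60 MHz bus clock (120 MHz / 2), multiplier b00 = 1, and divider index 44 = 576 are stated in this thread. The other divider entries and the multiplier for b01 are inferred here by dividing 60 MHz by the quoted actual rates, so treat them as assumptions and check them against the MCU reference manual.

```python
BUS_CLOCK_HZ = 120_000_000 // 2  # bus clock used in the calculation earlier in the thread

# Bits 7-6 of I2C0_F. Only 0b00 -> 1 is stated in the thread; 0b01 -> 2 is inferred.
MULTIPLIER = {0b00: 1, 0b01: 2}

# Bits 5-0 of I2C0_F (divider index) -> SCL divider.
# 0x2C -> 576 is stated in the thread; the rest are inferred from the quoted rates.
SCL_DIVIDER = {0x05: 30, 0x13: 72, 0x1C: 144, 0x24: 288, 0x2C: 576, 0x34: 1152}

def decode_i2c_f(register):
    """Split an I2C0_F value into (multiplier, SCL divider)."""
    mult_bits = (register >> 6) & 0b11
    divider_index = register & 0b111111
    return MULTIPLIER[mult_bits], SCL_DIVIDER[divider_index]

def actual_baud_rate(register):
    """Baud rate implied by an I2C0_F value, truncated to an integer."""
    multiplier, divider = decode_i2c_f(register)
    return BUS_CLOCK_HZ // (multiplier * divider)

# (requested, I2C0_F, actual rate as quoted above)
ROWS = [
    (50_000, 0b00110100, 52_083),
    (100_000, 0b00101100, 104_166),
    (200_000, 0b00100100, 208_333),
    (400_000, 0b00011100, 416_666),
    (800_000, 0b00010011, 833_333),
    (1_000_000, 0b01000101, 1_000_000),
]

for requested, register, quoted in ROWS:
    print(requested, decode_i2c_f(register), actual_baud_rate(register), quoted)
```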

If you are interested, I could put some testing code into the firmware that would dump lots of random values onto the bus.

> (Atm, backlight is still totally stable.)

Are you still experiencing any led corruption?

@kareltucek
Collaborator

kareltucek commented Apr 5, 2021

> If you want to stress the system, you can set the baud rate to 400 kHz. Technically, the chips can handle up to 1 MHz, but the capacitance on the bus will probably make it impossible to communicate at that level. If you change the baud rate (through code or USB command), the I2C0_F should update as well.

Nice! Instant reproduction of that memory corruption (set to 400 kHz via agent script): the keystate matrix and keymap/layer indices got corrupted.

(Just for the record - I was already trying that a week ago via source code with 1 MHz without any results, so I am now quite surprised.)

> > (Atm, backlight is still totally stable.)
>
> Are you still experiencing any led corruption?

Still totally stable.

@kareltucek
Collaborator

I can confirm that after applying that memory corruption fix, the worst problems go away. I.e.:

  • Haven't seen any of the previously observed behaviours since then, not even with the baud rate set to 400 kHz
  • If I dig into the baud rate in any way, the left half becomes terribly slow, but apart from that behaves correctly.
  • Weirdly, the keyboard slows down even if I try to set it to 100000 from clean state. That should be no-op.
#reset keyboard
$ ./set-i2c-baud-rate.ts 100000
$ ./set-i2c-baud-rate.ts 100000
Segmentation fault (core dumped)
me@L490:/opt/firmware/lib/agent/packages/usb$ ./get-i2c-health.ts
uptime: 0d 0:1:15
requestedBaudRate:160 | actualBaudRate:3906 | I2C0_F:0b10111111
leftHalf      : nak:122
leftModule    : nak:13503
rightModule   : nak:13503
rightLedDriver: nak:13503
leftLedDriver :
kboot         : arbitrationLost:1

#reset keyboard 
$ ./get-i2c-health.ts
uptime: 0d 0:0:5
requestedBaudRate:100000 | actualBaudRate:104166 | I2C0_F:0b00101100
leftHalf      : nak:122
leftModule    : nak:3733
rightModule   : nak:3736
rightLedDriver: nak:3739
leftLedDriver :
kboot         :
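
Two numbers in the first dump can be checked with simple arithmetic, sketched below in Python. This is only an observation, not a diagnosis confirmed anywhere in this thread. First, 160 is exactly the low byte of 100000, which would fit the requested value being truncated to 8 bits somewhere between the script and the firmware. Second, I2C0_F 0b10111111 has multiplier bits 10 and divider index 63. Assuming those map to a multiplier of 4 and an SCL divider of 3840 (values not stated in this thread) and a 60 MHz bus clock, the resulting rate matches the reported 3906.

```python
def low_byte(value):
    """Keep only the least significant 8 bits of a value."""
    return value & 0xFF

def baud_from(bus_clock_hz, multiplier, scl_divider):
    """Baud rate implied by a bus clock, multiplier and SCL divider, truncated to an integer."""
    return bus_clock_hz // (multiplier * scl_divider)

# 100000 == 0x186A0, so its low byte is 0xA0.
print(hex(100_000), low_byte(100_000))

# Assumed decoding of I2C0_F 0b10111111: multiplier 4, SCL divider 3840.
register = 0b10111111
print((register >> 6) & 0b11, register & 0b111111)
print(baud_from(60_000_000, 4, 3840))
```

If that assumed decoding is right, a rate this low would also be consistent with the left half becoming very slow after the script runs.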

@kareltucek
Collaborator

kareltucek commented Apr 5, 2021

(Still, atm I have no evidence in hand saying that the slave count memory corruption is related to the LED matrix corruption.)

@steamraven
Contributor

@kareltucek you get the best errors!

Put in a pull request to fix the agent's setBaudRate

@kareltucek
Collaborator

kareltucek commented Apr 5, 2021

All right, one more problem: the UHK suddenly started refusing to be flashed from both my stations (tried two different cables and joining the halves). It worked fine for several hours. Maybe related, maybe not - it started happening right after I flashed an older branch, from before the fix of the LED brightness.

So I figure this might actually be a power issue (caused by LEDs draining too much power - I have been running on intensity 32, but now they shine on full intensity due to the bug in the currently flashed firmware), or issue related to corruption of the IsBusPalOn region.

I think I have encountered this before (related to UltimateHackingKeyboard/agent#1474 - at that time it wouldn't work on my notebook in the office, yet would work with my home desktop and maybe also with the notebook after I brought them home).

Anyway, this is just for your info and amusement - I will simply try to disconnect the keyboard for a few hours and hope that it is ok when I return... EDIT: no result. V1 works fine, so I will use that for development for now...

@kareltucek
Collaborator

Ok, just ran into the flashing problem with the second unit too.

But it turns out that shorting the reset pins fixes the problem.

Despite the case being easy to disassemble, I still wish there was a dedicated button accessible directly without having to take off palm rests etc...

@kareltucek
Collaborator

Turns out that (at least part of) flashing problems were probably caused by dereferencing a null pointer during firmware runtime. :slightly_embarassed:

@steamraven
Contributor

> Despite the case being easy to disassemble, I still wish there was a dedicated button accessible directly without having to take off palm rests etc

Isn't that what the magnetic reed switch is for? Just pass a magnet under the keyboard

@kareltucek
Collaborator

I am not aware of any reed switch, and a magnet does not seem to have any effect (at least not on v1).

The reset micro switch which is on the bottom only reloads settings and throws the keyboard into factory keymap.

@mondalaci
Member Author

There is a pair of pads for the reed switch, indeed, but it's not populated by default. I plan to ship your UHK 60 v2 this week, @steamraven. Do you want me to solder the reed switch? It'd be helpful in such cases.

@kareltucek I can send you a reed switch in your next package if you're interested in soldering it.

I'll write a dedicated guide on advanced UHK hardware development topics, including the reed switch, the test LEDs, and the I2C debug header, eventually.

Excellent work on your fixes, @steamraven!

@kareltucek
Collaborator

> @kareltucek I can send you a reed switch in your next package if you're interested in soldering it.

Yes please :-)!

@steamraven
Contributor

@mondalaci Yes please, if it will not delay anything. I can solder it on at home if it's quicker.

@AkechiShiro

Any news on this issue? Has a PR landed, or is it still WIP?

@kareltucek
Collaborator

kareltucek commented Mar 11, 2022

I believe it is not an issue anymore. Not sure when or how it got fixed though. (Most likely, it was one of steamraven's PRs - https:/UltimateHackingKeyboard/firmware/pulls?q=is%3Apr+steamraven )

@warsaw

warsaw commented Mar 12, 2022

In Agent, go to LED brightness and turn down Key backlight brightness.

6 participants