Autocomplete/autocorrect [feature-request] #710

Open
katherinespiess opened this issue Oct 11, 2023 · 6 comments


@katherinespiess

This might not be feasible, but it would be nice if we could implement, via a macro, a lookup based on the last few scancodes sent.

The use case would be:
Given a trigger, the scancodes would be stored.
Given the trigger again, those scancodes would be used to look up a table of registered typos; the lookup would produce the desired scancode sequence to be sent instead of the one produced by the user's typing.

With the space bar as that trigger, this could be used to autocorrect a set of typos in a somewhat fluid way.
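A minimal sketch of the registered-typo lookup, assuming letters stand in for scancodes and the captured word is already available as a string. `TypoEntry`, `typoTable` and `LookupTypo` are hypothetical names, not existing firmware code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-registered typo table: typed sequence -> replacement.
   Letters stand in for scancodes to keep the sketch readable. */
typedef struct {
    const char *typo;
    const char *fix;
} TypoEntry;

static const TypoEntry typoTable[] = {
    { "teh",     "the"     },
    { "adn",     "and"     },
    { "recieve", "receive" },
};

/* Returns the replacement for the captured word, or NULL to leave it as typed. */
const char *LookupTypo(const char *captured)
{
    for (size_t i = 0; i < sizeof typoTable / sizeof typoTable[0]; i++) {
        if (strcmp(typoTable[i].typo, captured) == 0) {
            return typoTable[i].fix;
        }
    }
    return NULL;
}
```

On a hit, the firmware would presumably tap Backspace once per captured key and then type the replacement before letting the space through.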

Alternatively, having it as a list of valid scancode sequences and sending the one most similar to what the user typed would allow loading a large dictionary of words without having to register every typo that could happen while typing them.
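"Most similar" needs a concrete measure. One common choice, assumed here rather than taken from the request, is Levenshtein edit distance. A sketch with a two-row table to keep stack usage small (`EditDistance`, `ClosestWord` and `MAX_WORD` are illustrative names):

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32  /* words of up to MAX_WORD - 1 keys are supported */

/* Levenshtein distance: inserting, deleting or replacing a key costs 1.
   Only two rows of the DP table are kept, indexed by positions in b. */
size_t EditDistance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD], cur[MAX_WORD];
    for (size_t j = 0; j <= lb; j++) prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Returns the dictionary word closest to `typed` (first one wins on ties),
   or NULL if even the best candidate is more than maxDistance edits away. */
const char *ClosestWord(const char *typed, const char *const *dict, size_t n,
                        size_t maxDistance)
{
    const char *best = NULL;
    size_t bestDist = maxDistance + 1;
    for (size_t i = 0; i < n; i++) {
        size_t d = EditDistance(typed, dict[i]);
        if (d < bestDist) {
            bestDist = d;
            best = dict[i];
        }
    }
    return best;
}
```

This scans the whole dictionary, so its cost grows with dictionary size times word length squared; the narrower one-error search discussed further down in the thread is much cheaper.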

Similarly, the trigger could be made such that matching considers only as many keys as were actually typed (i.e., a prefix match); that way the user could autocomplete words.
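A sketch of that prefix match, again with letters standing in for scancodes. Returning only the missing suffix means the keyboard would not need to erase anything, just type the rest. `CompletionSuffix` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Returns the keys still missing to complete `typed` into the first
   dictionary word that starts with it, or NULL if no word does. Only
   strlen(typed) keys are compared, which is what makes this a completion. */
const char *CompletionSuffix(const char *typed, const char *const *dict, size_t n)
{
    size_t len = strlen(typed);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(dict[i], typed, len) == 0) {
            return dict[i] + len;   /* only the rest needs to be typed */
        }
    }
    return NULL;
}
```

For example, with the dictionary `{ "about", "after", "again" }`, typing `af` yields `ter`.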

@tunney

tunney commented Oct 11, 2023

I cannot see the onboard memory or processor coping with that.
Nor would I really expect it to.

@kareltucek
Collaborator

Frankly, I am not optimistic about this either. Most likely this will end up on a "feel free to go on and mess with it yourself" basis.

Also, I consider this a problem that is better suited to software solutions. (Partially because the keyboard does not have access to the text buffer, so it would be based entirely on scancode-based guesswork, and partially because memory is indeed a bit of a problem.)

How big are the lookup tables we are talking about?

Could you suggest a specific macro syntax that would control this? Do you envision a user-defined typo dictionary or a predefined English dictionary?

@katherinespiess
Author

The idea came to me while I was reading about steno and imagined what parts of it could be made possible with the hardware I already own. Of course, in practice, to use steno it would be better to just run Plover and configure the UHK for that, but the idea of the hardware itself being able to improve the user's accuracy is nice.

Considering that the 200 most common words cover a bit over 50% of the average text and the 1000 most common words cover 80~90%, the dictionary would not need to be that large, and it would be user-defined. The idea is that, given a scancode sequence bounded by a trigger key (I suppose space or tab), the firmware could iterate over that dictionary, checking whether the provided sequence occurs in it within an error of one key, i.e. the provided sequence would need to have one element added, removed, or swapped to match a given token.

A possible approach would be to store that dictionary as a sorted list.
That would allow the search to happen only in the spans where the first scancode matches (1: the first key was right), where the second scancode matches (2: the second key was right, but the first was wrong), or where they are crossed (3: the second key was typed first)...

In the 2nd and 3rd cases there are no more errors to spare the user: either every other key matches, in which case we can fix the typo at the beginning of the word, or not, in which case the word is passed through as the user typed it.

The 1st case significantly restricts the search space (~1/26), and we'd do a recursive search...
Then, if the provided sequence ends, we can treat it as a snippet to be completed.

That, I believe, would be an approach feasible with the hardware. A thousand words with a separator byte would consume roughly 6 kB, but we could do better by restricting which scancodes are allowed into the dictionary: if we used only the 26 letters, we could fit three letters into two bytes. Aside from that, of course, we'd need to store the indexes for the lookup, that is, where the section of words starting with any given two scancodes starts and ends. Those indexes cannot go much deeper than that, as with two scancodes they would already use a large amount of memory, but the search beyond that can be done linearly, considering that it would cover around 40 words.
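The 6 kB figure implies about 5 letters per word: 1000 × (5 + 1 separator byte) ≈ 6,000 bytes. With 26 letters, a letter needs 5 bits (32 codes), so three letters fit into 15 bits of a `uint16_t`, and a spare code (0) can serve as the terminator; a 5-letter word plus terminator is then 6 codes, i.e. two 16-bit units or 4 bytes, roughly 4 kB for the same thousand words. A sketch of that packing (`PackWord` and `UnpackLetter` are hypothetical names; input is assumed to be lowercase `a`..`z`):

```c
#include <stddef.h>
#include <stdint.h>

/* Letters 'a'..'z' are coded 1..26 in 5 bits; code 0 marks the end of a word.
   Assumes the input contains only lowercase a..z. */
static uint16_t LetterCode(char c) { return (uint16_t)(c - 'a' + 1); }

/* Packs a word plus its terminating 0 code, three codes per uint16_t.
   Returns the number of uint16_t units written, or 0 if `out` is too small. */
size_t PackWord(const char *word, uint16_t *out, size_t outCap)
{
    size_t used = 0;
    int slot = 0;
    uint16_t cur = 0;
    for (size_t i = 0;; i++) {
        uint16_t code = word[i] ? LetterCode(word[i]) : 0;
        cur |= (uint16_t)(code << (5 * slot));
        if (++slot == 3 || code == 0) {      /* unit full, or word finished */
            if (used == outCap) return 0;
            out[used++] = cur;
            cur = 0;
            slot = 0;
        }
        if (code == 0) return used;
    }
}

/* Reads the i-th letter back out of a packed word ('\0' at the terminator). */
char UnpackLetter(const uint16_t *packed, size_t i)
{
    uint16_t code = (packed[i / 3] >> (5 * (i % 3))) & 0x1F;
    return code ? (char)('a' + code - 1) : '\0';
}
```

Letters can be read back at any position without unpacking the whole word, which keeps the linear scan over a ~40-word span cheap.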

The key point is that the dictionary doesn't need to be that big to be effective: common words that are rarely mistyped could be omitted, as well as words that are not that common. If we assume the user didn't commit more than one error, the search can be made relatively fast.

It is, for sure, something that could be done more easily in software, but it would be nice to have in the keyboard, especially because it makes it somewhat easy to differentiate which words the user wants to be spell-checked/completed and which not, by using keys that do or don't trigger that process (e.g. the right and the left space).

It is possible to emulate this today using gestures, but that is way too clunky to be of any use, as it requires the user to register every possible error.

Another possible implementation would be to store the list of sequences as a tree, where the node for "the" would lead to a node terminating the word, a node for "y" followed by the terminator, or a node for "re" followed by a terminator... That approach gives us a way to avoid storing shared prefixes more than once, but it introduces some overhead in the form of pointers, each node being composed of a chunk of the sequence, a pointer to its first child, and a pointer to its next sibling.

That way, the search would traverse the nodes where the first key matches, where the first didn't match but the second did, and where the first and second are swapped.
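A sketch of the first-child/next-sibling tree, simplified to one key per node instead of a chunk, with 16-bit array indices in place of pointers (on a 32-bit MCU, two raw pointers alone would take 8 bytes per node). The hand-built table holds "the", "they", "there" and "to"; `TrieNode`, `trie` and `TrieContains` are hypothetical names, and only exact lookup is shown:

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_NODE 0xFFFF

/* One key per node; children and siblings are 16-bit array indices. */
typedef struct {
    char key;
    bool endsWord;
    uint16_t firstChild;
    uint16_t nextSibling;
} TrieNode;

/* Hand-built trie for "the", "they", "there", "to". */
static const TrieNode trie[] = {
    /* 0 */ { 't', false, 1,       NO_NODE },
    /* 1 */ { 'h', false, 2,       5       },
    /* 2 */ { 'e', true,  3,       NO_NODE },
    /* 3 */ { 'y', true,  NO_NODE, 4       },
    /* 4 */ { 'r', false, 6,       NO_NODE },
    /* 5 */ { 'o', true,  NO_NODE, NO_NODE },
    /* 6 */ { 'e', true,  NO_NODE, NO_NODE },
};

/* Walks one sibling list per typed key; true if `word` is stored. */
bool TrieContains(const char *word)
{
    uint16_t level = 0;          /* first node of the current sibling list */
    uint16_t match = NO_NODE;
    for (; *word; word++) {
        match = NO_NODE;
        for (uint16_t n = level; n != NO_NODE; n = trie[n].nextSibling) {
            if (trie[n].key == *word) {
                match = n;
                break;
            }
        }
        if (match == NO_NODE) return false;
        level = trie[match].firstChild;
    }
    return match != NO_NODE && trie[match].endsWord;
}
```

The one-error variants described above would branch at each level: continue with the matching child, or spend the single allowed error on a different sibling.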

Alternatively, scrapping the idea of an auto-correct feature, implementing only the snippet part would be somewhat more feasible, that is, having the user provide the text to match and the text it should produce. Something akin to:

att -> attached      # simple snippet
(th|ht) -> the       # any of those sequences expands
{the} -> they        # all of those keys in any order

The idea is to allow the user to have {af}{tr} represent the word "after", such that pressing 'a' and 'f' together, then 't' and 'r' together, would produce "after". This can be done today with chording and gestures, but again it is too clunky to actually be used.
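A sketch of a matcher for the literal and `{...}` forms of that proposed syntax, where a group must match exactly its keys in any order (what a chord emits). The `(a|b)` alternation form is left out, and `MatchSnippet` is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_GROUP 16

/* Matches typed keys against a snippet pattern made of literal keys and
   {...} groups. A group matches exactly its keys, in any order. */
bool MatchSnippet(const char *pattern, const char *typed)
{
    while (*pattern) {
        if (*pattern != '{') {                      /* literal key */
            if (*typed != *pattern) return false;
            pattern++;
            typed++;
            continue;
        }
        const char *close = strchr(pattern, '}');
        if (!close) return false;                   /* malformed pattern */
        size_t groupLen = (size_t)(close - pattern - 1);
        if (groupLen > MAX_GROUP || strlen(typed) < groupLen) return false;
        bool used[MAX_GROUP] = { false };
        for (size_t k = 0; k < groupLen; k++) {     /* each typed key must claim */
            bool found = false;                     /* an unused key of the group */
            for (size_t g = 0; g < groupLen && !found; g++) {
                if (!used[g] && pattern[1 + g] == typed[k]) {
                    used[g] = true;
                    found = true;
                }
            }
            if (!found) return false;
        }
        typed += groupLen;
        pattern = close + 1;
    }
    return *typed == '\0';                          /* no extra keys typed */
}
```

With this, `{af}{tr}` accepts `fatr` and `afrt` but rejects `aftx`.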

Finally, I'd like to take a moment to praise the keyboard. It has earned its name as ultimate. I cannot think of any feature that other keyboards have that the v2 doesn't have or can't emulate, and I don't think I'll ever want to change it for anything short of a UHKv3.

@kareltucek
Collaborator

kareltucek commented Oct 11, 2023

  • I am still a bit unclear on how the mechanism would be triggered?

    • I assume words could be delimited by non-alpha scancodes, which means that the first trigger would be redundant?
    • Then this could be abstracted into following new commands. Is this thinking correct?:
    dictionary correct
    dictionary undoLastCorrect
    dictionary completeCycleForward <shortcut to produce if context is not relevant>
    dictionary completeCycleBackward
    

    With following mappings:

    tab to

    ifShift dictionary completeCycleBackward
    ifNotShift dictionary completeCycleForward tab
    

    space to

    dictionary correct
    holdKey space
    

    mouse/esc to

    dictionary undoLastCorrect
    holdLayer mouse
    

    (These mappings could be made implicit and not required to be actually mapped by the user.)

  • I still don't see how the user-defined dictionary would be communicated to the UHK. (Would it be via regular macro commands, or would you need to store it separately using a new Agent feature?)

  • Tree representation stored in RAM seems infeasible to me, as the pointers are likely to take far too much memory. Maybe if the pointers are compressed into memory offsets?

  • 6kB of RAM is roughly just as much as we have free at the moment. We have some backup plans to free as much as 40 kB of memory if need be. The bottom line is that if you decide to implement this in your fork for yourself, then you are fine. If this is to be implemented / merged into master, we would probably be quite reluctant to allocate 6kB.

  • Note that static memory (i.e., with constant data compiled in) is much cheaper than dynamic. We have 512kB of it and do not know what to use it for. We would be much less reluctant to allocate, let's say, 100kB of read-only memory than 5kB of RAM space.

  • Now the most important question: is this worth it for you to implement it yourself and maintain on your own firmware fork? (I assume that if you decided to do that, I would support you as much as I can when it comes to navigating existing code. I also assume it may get merged eventually into mainstream or it may not, depending on feedback and experience with the feature.)
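A sketch of how the proposed `dictionary correct` and `dictionary undoLastCorrect` commands could split memory along those lines: the typo table is `static const` (which ARM GCC normally places in flash), and only a small state struct lives in RAM. All names are hypothetical, key output is modelled as a string with `'\b'` for Backspace, and the space typed afterwards by `holdKey space` is not accounted for:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_WORD 32

/* Hypothetical typo table; constant data that can stay in flash. */
static const char *const typos[][2] = {
    { "teh", "the" },
    { "adn", "and" },
};

/* The only RAM this sketch needs: 3 x MAX_WORD bytes. */
typedef struct {
    char word[MAX_WORD];       /* keys typed since the last word boundary */
    char lastTyped[MAX_WORD];  /* original keys of the last correction */
    char lastFix[MAX_WORD];    /* replacement of the last correction, "" if none */
} DictionaryState;

/* Writes "erase `erase` keys, then type `text`" into out ('\b' = Backspace). */
static void EmitReplace(char *out, size_t outCap, size_t erase, const char *text)
{
    size_t n = 0;
    while (erase > 0 && n + 1 < outCap) { out[n++] = '\b'; erase--; }
    while (*text && n + 1 < outCap) out[n++] = *text++;
    out[n] = '\0';
}

/* `dictionary correct`: replaces the current word if it is a registered typo. */
bool DictionaryCorrect(DictionaryState *s, char *out, size_t outCap)
{
    out[0] = '\0';
    s->lastFix[0] = '\0';
    for (size_t i = 0; i < sizeof typos / sizeof typos[0]; i++) {
        if (strcmp(s->word, typos[i][0]) == 0) {
            strcpy(s->lastTyped, s->word);
            strcpy(s->lastFix, typos[i][1]);
            EmitReplace(out, outCap, strlen(s->lastTyped), s->lastFix);
            break;
        }
    }
    s->word[0] = '\0';             /* start capturing the next word */
    return s->lastFix[0] != '\0';
}

/* `dictionary undoLastCorrect`: erases the fix and retypes the original keys. */
bool DictionaryUndoLastCorrect(DictionaryState *s, char *out, size_t outCap)
{
    out[0] = '\0';
    if (s->lastFix[0] == '\0') return false;
    EmitReplace(out, outCap, strlen(s->lastFix), s->lastTyped);
    s->lastFix[0] = '\0';          /* undo only once */
    return true;
}
```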

@katherinespiess
Author

> I am still a bit unclear on how the mechanism would be triggered?

Your thinking is correct. I'd just add a starting/restarting trigger so that the user can indicate that what was typed does not need to be corrected and that capturing for completion can start over, because the idea is to have only words that are common enough to justify being completed, like using 'th' for "the" and 'the' for "they"...

Meaning that most of the time, preferably always, they have a single match.
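A sketch of the capture side with such a restart trigger, working on USB HID keyboard usage IDs (`a` = 0x04 through `z` = 0x1D, Space = 0x2C). Which physical key acts as the restart trigger is left to the caller, and `CaptureBuffer`, `CaptureKey` and `CaptureRestart` are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* USB HID keyboard usage IDs. */
#define HID_A     0x04
#define HID_Z     0x1D
#define HID_SPACE 0x2C
#define MAX_WORD  32

typedef struct {
    uint8_t keys[MAX_WORD];
    size_t len;
    bool overflowed;   /* word too long: never correct it */
} CaptureBuffer;

/* Restart trigger: forget what was typed so it is left uncorrected,
   and start capturing the next word from scratch. */
void CaptureRestart(CaptureBuffer *b)
{
    b->len = 0;
    b->overflowed = false;
}

/* Feeds one pressed key. Returns true when a word ended by Space is ready
   for lookup in b->keys[0..len); the caller restarts the buffer afterwards.
   Any other non-letter key ends the word without a lookup. */
bool CaptureKey(CaptureBuffer *b, uint8_t usage)
{
    if (usage >= HID_A && usage <= HID_Z) {
        if (b->len < MAX_WORD) b->keys[b->len++] = usage;
        else b->overflowed = true;
        return false;
    }
    bool ready = usage == HID_SPACE && b->len > 0 && !b->overflowed;
    if (!ready) CaptureRestart(b);
    return ready;
}
```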

> I still don't see how the user-defined dictionary would be communicated to the UHK. (Would it be via regular macro commands, or would you need to store it separately using a new Agent feature?)

The idea was to have the dictionary defined and kept static between configuration updates, like a "big" file uploaded to the keyboard by the Agent, which would be consumed while holding as little as possible in RAM, just what's needed to actually traverse the file.

That could be something appended to the firmware. That would make it harder to update, but if it allowed us to use much more memory it would be preferable. If we consider using those 100kB of ROM, it could have a large amount of indexing baked in: have it as an array of words plus a large pre-baked search tree whose leaves point into the array. Having it baked by the Agent would free up much more of the processing as well...
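As one example of indexing the Agent could bake ahead of time, here is the two-key start index from earlier in the thread, computed from a sorted word list. The sketch assumes 27 symbols per position (end-of-word plus `a`..`z`, so one-letter words also get a slot); that makes 27 × 27 + 1 = 730 16-bit entries, or 1,460 bytes of read-only data. Only starts are stored, since each span ends where the next prefix begins. `PrefixCode` and `BuildPrefixIndex` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

#define SYMBOLS 27                        /* 0 = end of word, 1..26 = 'a'..'z' */
#define INDEX_SIZE (SYMBOLS * SYMBOLS + 1)

static unsigned Sym(char c) { return c ? (unsigned)(c - 'a' + 1) : 0; }

/* Two-key prefix code of a lowercase word. Because end-of-word sorts below
   every letter, code order agrees with strcmp order of the words. */
unsigned PrefixCode(const char *w)
{
    unsigned first = Sym(w[0]);
    unsigned second = w[0] ? Sym(w[1]) : 0;
    return first * SYMBOLS + second;
}

/* Built once, e.g. by the Agent: idx[p] is the position of the first word
   whose prefix code is >= p, so words with code p occupy [idx[p], idx[p + 1]).
   The last entry is a sentinel equal to n. */
void BuildPrefixIndex(const char *const *sorted, size_t n, uint16_t idx[INDEX_SIZE])
{
    size_t w = 0;
    for (unsigned p = 0; p < INDEX_SIZE; p++) {
        while (w < n && PrefixCode(sorted[w]) < p) w++;
        idx[p] = (uint16_t)w;
    }
}
```

The firmware side would then only read `idx[PrefixCode(typed)]` and `idx[PrefixCode(typed) + 1]` and scan the short span between them.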

> Now the most important question: is this worth it for you to implement it yourself and maintain on your own firmware fork? (I assume that if you decided to do that, I would support you as much as I can when it comes to navigating existing code. I also assume it may get merged eventually into mainstream or it may not, depending on feedback and experience with the feature.)

Now, that's the question. I have little to no experience with C, so surely the code I'd produce would be unfit to merge into the mainstream. But I think I will take some time and do a proof of concept... And even for that, I believe I'll be needing some help...

@kareltucek
Collaborator

> And even for that I believe that I'll be needing some help...

I think I can provide that.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants