Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing CSP mode and analyzing movement issues #254

Closed
rsantos88 opened this issue Jan 21, 2021 · 30 comments
Closed

Testing CSP mode and analyzing movement issues #254

rsantos88 opened this issue Jan 21, 2021 · 30 comments

Comments

@rsantos88
Copy link
Contributor

I've recently been working with pt-mode to execute offline trajectories in TEO. This type of operation has led to some "accidents" that took place before Christmas (I think the video has already gone around all the robotics classes to show what not to do). This resulted in a broken forearm that @smcdiaz had to reprint and fix, after much sweat and effort. The movement resulted in a loss of control of the arm, causing strong tremors and leading to a broken arm. This resulted in me being more careful with the trajectories that I wanted to launch, forcing me to scale them in time, interpolating and making them much slower.
Currently, I am doing tests to analyze what this behavior is due to and I want to share it, in case you all can give me a hand and get TEO to perform fluid and fast movements without giving him parkinson's.

First, I am executing a simple trajectory in which the left elbow moves -90 degrees (forward -> 100 poses) and 90 degrees (backward-> 100 poses), returning to its home position, repeating this sequence 4 times. This trajectory has a total of 800 poses, each column corresponding to an articular position. I upload the CSV file test_elbow.zip with the test path. The fourth column would be the one that corresponds to the movement of the elbow, which is the one that interests us.

As you can see, after graphing the resulting trajectory, it is smooth and clean.

image

Here comes the tests. In the video that you can see below, the trajectory is executed using different frequency values ​​in the sending of the positions. In each of these tests, the period (delay) in which my application sends each of the positions to the driver, such as the syncPeriod variable when initializing launch, has been modified to the same value. The values ​​of these periods are 0.02s, 0.03s, 0.05s, 0.08s 0.1s.

IMAGE ALT TEXT
https://youtu.be/mZzC227wNbg

  • 0.02s: As @PeterBowman told me, this would be the lowest value that the driver would support. Lower than 2ms the channel would saturate. As you can see, the driver ends up jumping red and stops.

  • 0.03s: Strong blows or sudden movements occur along the trajectory. At the end of the execution of the trajectory, he loses control and begins to enter a dangerous parkinson's movement which is what gave rise to the "accident" mentioned at the beginning.

  • 0.05s, 0.08, 0.1s: Here a smoother progressive movement is appreciated but there are still those strong blows that give rise to a not very fluid trajectory and dangerous for the robot's joints. The problem still exists even when the period goes down a lot. This is the most worrying. Which may be due?

Any subjection you can try is welcome. Thanks in advance!

@PeterBowman
Copy link
Member

PeterBowman commented Jan 21, 2021

Thank you so much for sharing this, @rsantos88.

0.02s: As @PeterBowman told me, this would be the lowest value that the driver would support. Lower than 2ms the channel would saturate. As you can see, the driver ends up jumping red and stops.

Note this value is equivalent to 20 ms, not 2 ms. Perhaps (I hope it) the failure was not due to channel overflow, rather joint slippage or other kind of motion issue, maybe even a PT buffer overflow. Did you notice any error or warning line on the console? (90 deg)/(0.02 s/100) = 45 deg/s is pretty close to the maximum velocity limit of 50 deg/s (ref). If surpassed, new reference positions are clipped to match the travel distance allowed within syncPeriod at said max speed, logging a "Maximum velocity exceeded, clipping target position" line (ref).

The problem still exists even when the period goes down a lot. This is the most worrying. Which may be due?

PT mode is still WIP at #245, too bad I didn't document my earlier tests... (edit: see #245 (comment)) I believe my next step was to analyze data throughput aboard TEO, thus avoiding network latencies. Perhaps new setpoints are sent intermittently and the internal buffer reaches low/empty state (should never happen, though).

By the way, did you notice any of these blows when using default position direct mode (CSP)? (I mean, the only one available in master branch; I presume you have been working with the offline-ip-mode branch, please correct me if I'm wrong)

PS I have recently discovered that PT setpoints might be affected by sampling time: #253 (comment). If that's correct, it could explain the total havoc you observed when TEO dismembered itself.

@PeterBowman
Copy link
Member

Perhaps (I hope it) the failure was not due to channel overflow, rather joint slippage or other kind of motion issue, maybe even a PT buffer overflow.

Actually, in PT mode setpoints are sent in bursts in order to populate the internal buffer:

// drive's buffer is empty, motion has not started yet, we have enough points in the queue
if (!vars.ipBufferFilled && !vars.ipMotionStarted && linInterpBuffer->isQueueReady())
{
bool ok = true;
for (auto setpoint : linInterpBuffer->popBatch(true))
{
ok &= can->rpdo3()->write(setpoint); // load point into the buffer
}
vars.ipBufferFilled = ok;
return ok;
}

It could make sense that a network saturation occurred even if staying quite far from the empyrical 2 ms limit - precisely because of those swift moments of sending N*buffer_size points over the CAN network. @rsantos88 did you command just the elbow joint, or all six arm joints (but only changing the elbow angle)?

@rsantos88
Copy link
Contributor Author

Thank you so much for sharing this, @rsantos88.

thanks to you for the help!

Note this value is equivalent to 20 ms, not 2 ms

Yes, it was a typo. I ate a zero. I wanted to say 20 ms :)

Perhaps (I hope it) the failure was not due to channel overflow, rather joint slippage or other kind of motion issue, maybe even a PT buffer overflow. Did you notice any error or warning line on the console?

I can assure you that in the 0.05s, 0.08s and 0.1s tests there were no errors or warnings on the screen. Execution starts and ends successfully. In the case of the first 2 tests, yes it shows error but I didn't realize to save the registry, sorry. I can do it for next time. Anyway, I am still concerned that with such a low frequency of sending setpoints, there will continue to be sudden stops in the movement, which don't help to perform a fluid movement.

By the way, did you notice any of these blows when using default position direct mode (CSP)? (I mean, the only one available in master branch; I presume you have been working with the offline-ip-mode branch, please correct me if I'm wrong)

At all times I've worked with the master branch of yarp-devices. I'm not working with the offline-ip-mode branch. Do you recommend trying it?. I didn't realize the existence of several pt modes of operation in different branches.

It could make sense that a network saturation occurred even if staying quite far from the empyrical 2 ms limit - precisely because of those swift moments of sending N*buffer_size points over the CAN network. @rsantos88 did you command just the elbow joint, or all six arm joints (but only changing the elbow angle)?

Specifically, to carry out these tests, all the joints of both arms were commanded, with the exception of the trunk joints which would correspond to the last 2 columns of the csv (these were disabled). Therefore, all the joints of the arms were controlled but only by changing the angle of rotation of the elbow.

@PeterBowman
Copy link
Member

Ok, so you were actually working with CSP mode, I'll update the title accordingly. Sorry I wasn't clear enough in our previous conversations, in terms of YARP/iPOS control modes this boils down to:

  • In master branch, yarp::dev::IPositionDirect maps to Cyclic Synchronous Position (CSP) mode on the iPOS drives. New setpoints are sent at a fixed rate of syncPeriod. If no new point arrives via setPosition, the previous point is reused. It is the user's responsibility to adjust the rate of setPosition commands to syncPeriod (or faster, in which case intermediate points may be discarded). PT/PVT is disabled in this branch (ref).
  • In offline-ip-mode branch, I'm currently working to restore and improve our old implementation of yarp::dev::IPositionDirect using PT/PVT linear interpolation submodes. This should be especially useful for offline trajectories. It has been tested with a single joint with PT, but I do not recommend doing anything more complex at this point (and PVT is known to crash).

For the record, @jgvictores and I did some tests with the right arm at 0.01 s. It was registered in a YT video linked here. It looks smooth, but crank your PC volume up - you'll notice the sound the joints make on a harsh transition between some points. Yet the fastest I could work with CSP on the same limb was 0.002 s (not 0.02 s).

At this point, I wonder if the same tests you have described here would succeed or at least perform better on the other arm. I'm also curious, what if we issue position direct commands just to the elbow joint, i.e. leave the other five joints in default position control mode?

What would I do next: save the value variable from this snippet to a file along with the timestamp, and plot the positions over time to check whether there are some "gaps", i.e. the device sends the same point twice.

@PeterBowman PeterBowman changed the title Testing mode-pt and analyzing movement issues Testing CSP mode and analyzing movement issues Jan 22, 2021
@PeterBowman
Copy link
Member

By the way, could you share the code used in this demo (specifically, the command loop)?

@rsantos88
Copy link
Contributor Author

rsantos88 commented Jan 24, 2021

Thank you very much for explaining the different modes of operation that exist in each of the branches.

For the record, @jgvictores and I did some tests with the right arm at 0.01 s. It was registered in a YT video linked here. It looks smooth, but crank your PC volume up - you'll notice the sound the joints make on a harsh transition between some points. Yet the fastest I could work with CSP on the same limb was 0.002 s (not 0.02 s).

It's true that it looks quite soft in the video.. Uploading the CSV file and plotting it is one of the first things I wanted to make sure if the problem could come from the origin of the trajectory. It's interesting to consider all the possibilities.

At this point, I wonder if the same tests you have described here would succeed or at least perform better on the other arm. I'm also curious, what if we issue position direct commands just to the elbow joint, i.e. leave the other five joints in default position control mode?

Ok, it's a good idea. I'll try to perform the same trajectory with the other arm and also execute only the joint involved in position direct and the others in default position control mode.

What would I do next: save the value variable from this snippet to a file along with the timestamp, and plot the positions over time to check whether there are some "gaps", i.e. the device sends the same point twice.

Any proposal that occurs to you is welcome

By the way, could you share the code used in this demo (specifically, the command loop)?

Yes of course. This is the python code that I run for the CSV file, with the commented trunk code (just like I ran it last time):

script
import csv
import math
import sys
import yarp

DELAY = 0.02 

robotPrefix = sys.argv[1]
playerPrefix = '/player' + robotPrefix
csvPath = sys.argv[2]

if robotPrefix == '/teo':
    isReal = True
elif robotPrefix == '/teoSim':
    isReal = False
else:
    print('error: illegal prefix parameter, choose "/teo" or "/teoSim"')
    quit()

yarp.Network.init()

if not yarp.Network.checkNetwork():
    print('error: please try running yarp server')
    quit()

options = yarp.Property()
options.put('device', 'remote_controlboard')

options.put('remote', robotPrefix + '/leftArm')
options.put('local', playerPrefix + '/leftArm')
leftArmDevice = yarp.PolyDriver(options)
leftArmEnc = leftArmDevice.viewIEncoders()
leftArmMode = leftArmDevice.viewIControlMode()
leftArmPosd = leftArmDevice.viewIPositionDirect()

options.put('remote', robotPrefix + '/rightArm')
options.put('local', playerPrefix + '/rightArm')
rightArmDevice = yarp.PolyDriver(options)
rightArmEnc = rightArmDevice.viewIEncoders()
rightArmMode = rightArmDevice.viewIControlMode()
rightArmPosd = rightArmDevice.viewIPositionDirect()

'''
options.put('remote', robotPrefix + '/trunk')
options.put('local', playerPrefix + '/trunk')
trunkDevice = yarp.PolyDriver(options)
trunkEnc = trunkDevice.viewIEncoders()
trunkMode = trunkDevice.viewIControlMode()
trunkPosd = trunkDevice.viewIPositionDirect()
'''
leftArmAxes = leftArmEnc.getAxes()
rightArmAxes = rightArmEnc.getAxes()
# trunkAxes = trunkEnc.getAxes()

leftArmMode.setControlModes(yarp.IVector(leftArmAxes, yarp.VOCAB_CM_POSITION_DIRECT))
rightArmMode.setControlModes(yarp.IVector(rightArmAxes, yarp.VOCAB_CM_POSITION_DIRECT))
# trunkMode.setControlModes(yarp.IVector(trunkAxes, yarp.VOCAB_CM_POSITION_DIRECT))

with open(csvPath, 'r') as csvFile:
    for row in csv.reader(csvFile):
        if True:
            print(list(map(float, row[:6])))
            print(list(map(float, row[6:12])))
            print(list(map(float, row[12:])))
            leftArmPosd.setPositions(yarp.DVector(list(map(float, row[:6])))) 
            rightArmPosd.setPositions(yarp.DVector(list(map(float, row[6:12]))))
            #trunkPosd.setPositions(yarp.DVector(list(map(float, row[12:]))))
            yarp.delay(DELAY)

leftArmDevice.close()
rightArmDevice.close()
# trunkDevice.close()

yarp.Network.fini()

@PeterBowman
Copy link
Member

Thank you for sharing. Let's take this snippet:

with open(csvPath, 'r') as csvFile:
    for row in csv.reader(csvFile):
        if True:
            print(list(map(float, row[:6])))
            print(list(map(float, row[6:13])))
            print(list(map(float, row[13:])))
            leftArmPosd.setPositions(yarp.DVector(list(map(float, row[:6])))) 
            rightArmPosd.setPositions(yarp.DVector(list(map(float, row[6:12]))))
            #trunkPosd.setPositions(yarp.DVector(list(map(float, row[12:]))))
            yarp.delay(DELAY)

You are applying a fixed delay on each iteration. This might not be the same as executing each loop at a fixed rate because it takes some (tiny) time to process the lines that precede yarp.delay(DELAY). This time accumulated over 800 iterations might not be that small after all. I'd suggest something along the lines of the following code, not tested:

with open(csvPath, 'r') as csvFile:
    start = yarp.now()
    i = 1
    for row in csv.reader(csvFile):
        print(list(map(float, row[:6])))
        print(list(map(float, row[6:13])))
        print(list(map(float, row[13:])))
        leftArmPosd.setPositions(yarp.DVector(list(map(float, row[:6])))) 
        rightArmPosd.setPositions(yarp.DVector(list(map(float, row[6:12]))))
        delay = DELAY * i - (yarp.now() - start)
        yarp.delay(delay)
        i = i + 1

Probably saving the CSV to a local list variable first, and then iterating, is also worth considering.

@rsantos88
Copy link
Contributor Author

rsantos88 commented Jan 29, 2021

I'm thinking of testing the following code on the robot and I wanted to share it with you to see what you think.
I plan to get graphs resulting from encoders with different front joints (eg elbow and wrist). I also want to try it with the other arm.
The idea is execute only the joint involved in position direct and the others in default position control mode.

script
import csv
import math
import sys
import yarp

# usage: python3 testSingleJoint.py /teoSim 3 test/test.csv  test/test_encs.csv

DELAY = 0.05

robotPrefix = sys.argv[1]   # prefix
joint = int( sys.argv[2])   # joint
csvInPath = sys.argv[3]     # csv IN csvFile
csvOutPath = sys.argv[4]    # csv OUT csvFile


playerPrefix = '/player' + robotPrefix


if robotPrefix == '/teo':
    isReal = True
elif robotPrefix == '/teoSim':
    isReal = False
else:
    print('error: illegal prefix parameter, choose "/teo" or "/teoSim"')
    quit()

yarp.Network.init()

if not yarp.Network.checkNetwork():
    print('error: please try running yarp server')
    quit()

options = yarp.Property()
options.put('device', 'remote_controlboard')

options.put('remote', robotPrefix + '/leftArm')
options.put('local', playerPrefix + '/leftArm')
leftArmDevice = yarp.PolyDriver(options)
leftArmEnc = leftArmDevice.viewIEncoders()
leftArmMode = leftArmDevice.viewIControlMode()
leftArmPosd = leftArmDevice.viewIPositionDirect()

options.put('remote', robotPrefix + '/rightArm')
options.put('local', playerPrefix + '/rightArm')
rightArmDevice = yarp.PolyDriver(options)
rightArmEnc = rightArmDevice.viewIEncoders()
rightArmMode = rightArmDevice.viewIControlMode()
rightArmPosd = rightArmDevice.viewIPositionDirect()

leftArmAxes = leftArmEnc.getAxes()
rightArmAxes = rightArmEnc.getAxes()

# single joint
leftArmMode.setControlMode(joint, yarp.VOCAB_CM_POSITION_DIRECT)
#rightArmMode.setControlMode(joint, yarp.VOCAB_CM_POSITION_DIRECT)

with open(csvInPath, 'r') as csvInFile:
    start = yarp.now()
    i = 1
    csvreader = csv.reader(csvInFile)
    with open(csvOutPath, 'w', newline='') as csvOutfile:
        csvwriter = csv.writer(csvOutfile, delimiter=',')
        for row in csvreader:
            if True:
                leftArmPosd.setPosition(joint, float(row[3])) # set position
                # rightArmPosd.setPosition(joint, float(row[3])) # set position
                csvwriter.writerow([leftArmEnc.getEncoder(joint)])
                print(leftArmEnc.getEncoder(joint))
                delay = DELAY * i - (yarp.now() - start)
                yarp.delay(delay)
                i = i + 1

leftArmDevice.close()
rightArmDevice.close()

yarp.Network.fini()

@PeterBowman
Copy link
Member

PeterBowman commented Jan 30, 2021

Looks nice, but I find it more interesting to register the command-timestamp pairs sent at each precise SYNC instant from the TechnosoftIpos device itself, as explained in the quote below. You could still use the getEncoder output to compare both graphs, but the timestamp (yarp.now()) is missing in your code.

What would I do next: save the value variable from this snippet to a file along with the timestamp, and plot the positions over time to check whether there are some "gaps", i.e. the device sends the same point twice.

@rsantos88
Copy link
Contributor Author

Ok, I'll follow your advise. I've prepared the test code here.
Would it be correct to initialize the timestamp here ? (in the end of initiaze function)

@PeterBowman
Copy link
Member

It should be fine, but the substraction yarp::os::Time::now()-initTime wouldn't really matter as initTime is an arbitrary initial instant anyway. If you leave that simply as yarp::os::Time::now(), then yarp.now() in the Python script will give comparable values thanks to using a synchronized network clock.

@PeterBowman
Copy link
Member

As suggested by @jgvictores on private chat, you can query the timestamp associated to the exact instant each encoder value was read (beware that you need YARP 3.4.0 or higher):

encs = dd.viewIEncodersTimed()
v1 = yarp.DVector(1)
t1 = yarp.DVector(1)
encs.getEncoderTimed(0, v1, t1)
print(v1[0], t1[0])

@rsantos88
Copy link
Contributor Author

Nice! I'll test it when I upgrade to the new yarp version on my computer. At the moment I plan to test it like this because I am working with yarp 3.3.2 ^^u. Removed initiTime. Thanks for the info

with open(csvInPath, 'r') as csvInFile:
    start = yarp.now()
    i = 1
    csvreader = csv.reader(csvInFile)
    with open(csvOutPath, 'w', newline='') as csvOutfile:
        csvwriter = csv.writer(csvOutfile, delimiter=',')
        csvwriter.writerow(['timestamp', 'value'])
        for row in csvreader:
            if True:
                print('reading >> ', row[3])
                leftArmPosd.setPosition(joint, float(row[3])) # set position
                print('encoder >> ', leftArmEnc.getEncoder(joint))
                csvwriter.writerow([yarp.now(), leftArmEnc.getEncoder(joint)])
                delay = DELAY * i - (yarp.now() - start)
                yarp.delay(delay)
                i = i + 1

@rsantos88
Copy link
Contributor Author

I am going to gradually add some additional information to this issue so that some may become useful to discover the problem .
Testing the script without these extra lines, I've observed that at the moment in which the sudden movement occurs, the following launch message appears on the screen:

image

Maybe I can give some clue...

@rsantos88
Copy link
Contributor Author

Doing more experiments, I don't think this message from [info] about our issue makes sense, as it is reporting messages from other joints that we are not acting on ...
In this other experiment, I've applied the lines suggested here for the delay and I observe the phenomenon of small abrupt accelerations that give rise to the following messages:

image

@rsantos88
Copy link
Contributor Author

rsantos88 commented Feb 4, 2021

Yesterday I was carrying out a series of tests that I've tried to collect in detail to share as best as possible what is happening.
First of all, the tests have been performed on the right arm using the manipulation-rightArm.ini configuration file to avoid displaying unnecessary messages on launchCanBus. Tests have been made with the periods previously shown (0.03s, 0.05s, 0.08s and 0.1s). A video of each of the tests carried out has been recorded (captured at 60fps to be able to perceive the problem in more detail), a txt file with the data obtained from the launch named terminal.txt, the csv file named log.csv obtained from the log resulting from the positions commanded to the driver and the csv file named test_encs.csv created by a python script of the positions read and commanded to the robot, as well as the position of the encoder (note that this script has been modified to move exclusively the right arm). Both CSV files include the timestamp. I want to note that the python script was launched from the teo-manipulation computer.
Here the experiments:

Some observations: by adding the code to more precisely adjust the delay, you no longer notice sudden stops in the movement of the trajectory, but you do notice small sudden accelerations that can lead to loss of control of the articulation at high frequencies of position sending.

Lastly, I share the trajectories thrown in both arms where all the joints of each arm move here. The execution period of each setpoint is 0.05s. The scripts launched have been:

Right Arm:

Left Arm:

Results Right Arm:
Watch the video
Files: 0.05.zip

Results Left Arm:
Watch the video
Files: 0.05.zip

@rsantos88
Copy link
Contributor Author

Sorry for the mistake, I correct the attached files and upload them again.

@rsantos88
Copy link
Contributor Author

Sorry for the mistake, I correct the attached files and upload them again.

The file terminal.txt from test 0.03 is not correct. Sorry. The other files are correct. I remove it from the zip

@PeterBowman
Copy link
Member

PeterBowman commented Feb 4, 2021

Thank you for gathering the data. I have noticed that the synchronization callback is called twice at each command (most lines in log.csv are dupes regarding position values). This should be fixed at d8f244c. I believe that CAN slaves process the last of these two commands, silently discarding the other, but let's see if anything changes.

I've been using https://www.graphreader.com/plotter to plot the graphs. Notes:

  • Right arm, 0.05 s: a point is skipped at lines 58-59 (~17 degrees), which correlates with the behavior observed in the video.
  • Similarly, see 0.03 s at line 128 (~63 degrees) and 0.08 s at line 243 (~49 degrees).
  • There are no signs of the catastrophe of the 0.03 s clip in the CSV, the commanded trajectory looks pretty smooth.

Also, there is a delay between setPosition coordinates and getEncoder values, both read within the same iteration in the Python script. I need to investigate the underlying implementation of CSP on the iPOS-side.

@PeterBowman
Copy link
Member

PeterBowman commented Feb 5, 2021

Also, there is a delay between setPosition coordinates and getEncoder values, both read within the same iteration in the Python script. I need to investigate the underlying implementation of CSP on the iPOS-side.

Can't confirm, the closest thing I have found is a mention of a position contouring mode in MotionChip™ II TML Programming User Manual. Perhaps iPOS drives map old PT/PVT and "modern" CSP to the same thing under the hood. Default buffer size of PT submode is 9 points, which actually matches the observed delay in all test_encs.csv files regardless of the sync/command period.

Focusing on that brief accelerations, the correlation is clear: every time the joint did a funny thing, there was an abnormal transition between successive points in the CSV. I actually expected a twofold explanation to this: either two points arrive within the same interval, thus skipping the earliest and consequently traversing a longer distance, or no point arrived within said interval, resulting in a flat line in the pose-vs-time plot because of reusing the last setpoint. I'm surprised there is no trace of the latter case in the logs.

Idea: send new references faster, feed them to the drives slower. For instance if sync period is 50 ms (.ini file), send positions every 10 ms (Python script). Of course the trajectory should be discretized with care in order to avoid excessive joint speed.

@PeterBowman
Copy link
Member

PeterBowman commented Feb 5, 2021

You are applying a fixed delay on each iteration. This might not be the same as executing each loop at a fixed rate because it takes some (tiny) time to process the lines that precede yarp.delay(DELAY). This time accumulated over 800 iterations might not be that small after all.

It seems yarp::os::Timer, which is responsible for looping over CSP setpoints and forwarding them to the drives, either doesn't follow this rule or its implementation is buggy. There is a drift between the Python loop period and this CSP sync period, and clearly the CSP one adds a tiny delay on each iteration (as if not accounting for the time spent processing the step). It's easy to verify this by inspecting the CSV (iterations * period + start timestamp != end timestamp), or just check https://chart-studio.plotly.com/~PeterBowman/2/#/ (double click to zoom out). As soon as one loop "catches" and surpasses the other, a step is missed and hence the motor is commanded to travel twice the usual distance. This also explains the second case I was talking about earlier:

I actually expected a twofold explanation to this: either two points arrive within the same interval, thus skipping the earliest and consequently traversing a longer distance, or no point arrived within said interval, resulting in a flat line in the pose-vs-time plot because of reusing the last setpoint. I'm surprised there is no trace of the latter case in the logs.

Screenshot_2021-02-05 Online Graph Maker · Plotly Chart Studio

It is therefore utterly important to iterate with a constant period if a smooth motion is desired, both in user-script command loop and onboard sync loop. Besides, it would be best to interlace these loops so that timestamps don't concur at the same instant (in order to avoid race conditions which will cause these annoying "jumps"). If that's impossible or hard to achieve (mind the distributed architecture), let me stress on this idea:

Idea: send new references faster, feed them to the drives slower. For instance if sync period is 50 ms (.ini file), send positions every 10 ms (Python script). Of course the trajectory should be discretized with care in order to avoid excessive joint speed.

Alternatively, I was thinking about giving a second chance to legacy PT/PVT linear interpolation submodes, since supposedly CSP is implemented in a similar manner. CSP is much easier to work with, but it gives less or no freedom in terms of controlling the internal buffer it seems to have. I'd consider adding a second internal buffer to PT/PVT, subordinate to the 9-point drive's buffer, as a means to collect incoming points and even smooth that sequence a bit if necessary.

@PeterBowman
Copy link
Member

PeterBowman commented Feb 5, 2021

It seems yarp::os::Timer, which is responsible for looping over CSP setpoints and forwarding them to the drives, either doesn't follow this rule or its implementation is buggy.

Habemus bug. It is actually worse as the underlying PeriodicThread is misbehaving. Reported upstream at robotology/yarp#2488.

@rsantos88
Copy link
Contributor Author

rsantos88 commented Feb 7, 2021

Thank you for your analysis and your work,

Idea: send new references faster, feed them to the drives slower. For instance if sync period is 50 ms (.ini file), send positions every 10 ms (Python script). Of course the trajectory should be discretized with care in order to avoid excessive joint speed.

Perfect. I've created a new trajectory with 500 setpoints for the joint displacement by 90 degrees. Sending it at 10ms we would have a speed of 18º/s. Simulating it, I see that it is a fairly slow speed. There should be no problem. I'll keep using the right elbow as a test joint with the script:

script
import csv
import math
import sys
import yarp

# usage: python3 testRightElbow.py /teoSim test/test_elbow.csv  test/test_encs.csv

DELAY = 0.01
joint = 3 # elbow

robotPrefix = sys.argv[1]   # prefix
csvInPath = sys.argv[2]     # csv IN csvFile
csvOutPath = sys.argv[3]    # csv OUT csvFile


playerPrefix = '/player' + robotPrefix


if robotPrefix == '/teo':
    isReal = True
elif robotPrefix == '/teoSim':
    isReal = False
else:
    print('error: illegal prefix parameter, choose "/teo" or "/teoSim"')
    quit()

yarp.Network.init()

if not yarp.Network.checkNetwork():
    print('error: please try running yarp server')
    quit()

options = yarp.Property()
options.put('device', 'remote_controlboard')

options.put('remote', robotPrefix + '/rightArm')
options.put('local', playerPrefix + '/rightArm')
rightArmDevice = yarp.PolyDriver(options)
rightArmEnc = rightArmDevice.viewIEncoders()
rightArmMode = rightArmDevice.viewIControlMode()
rightArmPosd = rightArmDevice.viewIPositionDirect()

rightArmAxes = rightArmEnc.getAxes()

# single joint
rightArmMode.setControlMode(joint, yarp.VOCAB_CM_POSITION_DIRECT)

with open(csvInPath, 'r') as csvInFile:
    start = yarp.now()
    i = 1
    csvreader = csv.reader(csvInFile)
    with open(csvOutPath, 'w', newline='') as csvOutfile:
        csvwriter = csv.writer(csvOutfile, delimiter=',')
        csvwriter.writerow(['timestamp', 'value'])
        for row in csvreader:
            if True:
                print('reading >> ', row[3])
                rightArmPosd.setPosition(joint, float(row[3])) # set position
                print('encoder >> ', rightArmEnc.getEncoder(joint))
                csvwriter.writerow([yarp.now(), rightArmEnc.getEncoder(joint)])
                delay = DELAY * i - (yarp.now() - start)
                yarp.delay(delay)
                i = i + 1

rightArmDevice.close()

yarp.Network.fini()

test_elbow.zip

@PeterBowman
Copy link
Member

PeterBowman commented Feb 9, 2021

I'm thinking of a possible solution for thread synchronization: let's open yet another YARP port (YAYP) devoted for sending timestamps on each SYNC action, then make clients (such as this Python script) listen and send their command on response. This could be locally (client-side) implemented in two ways:

  • A port callback. Best suited for online trajectories and fairly simple as there are no time operations involved (neither implicit nor explicit delays). Quite verbose since a new callback class must be created, but lambdas could have a bright future here: Allow lambdas as callback robotology/yarp#2404.
  • A network clock. Assuming the port name is /teoclock, this could work (alternatively, use PeriodicThread or Timer):
    yarp::os::Network yarp;
    yarp::os::Time::useNetworkClock("/teoclock");
    
    while (true)
    {
        yInfo("Network time: %f", yarp::os::Time::now());
        yarp::os::Time::delay(0.005);
    }
    A port callback is set under the hood of yarp::os::Time. On each new timestamp received (let's say every 5 ms), now() is updated and returns the new value. Note delay(x) adds x seconds to the last timestamp (not to the current system time as given by yarp::os::SystemClock::nowSystem()), therefore it's best to substract a tiny amount of time here in order to avoid near-zero timestamp comparisons, resulting in delaying twice as long as requested. I'd rather always use PeriodicThread or Timer instead of a loop, but the same issue persists and possibly could be fixed as well: PeriodicThread accumulates error over time robotology/yarp#2488 (comment).
    Note: the useNetworkClock call can be removed if the YARP_CLOCK=/teoclock env variable is set.

A port prepared to work in the second case would also be valid in the first scenario. However, only the latter is applicable for offline trajectories. I'd also consider reimplementing PT/PVT submodes for that matter and in an asynchronous fashion if possible: make TechnosoftIpos buffer incoming points, then compute time on reception between successive points, and use that in the corresponding CAN message field (on the other hand, going synchronous implies ignoring this field and setting a fixed period beforehand).

PS I find this article useful https://en.wikipedia.org/wiki/Input_lag#Typical_overall_response_times.
PS2 regarding channel priorities and dropping messages sent over the network: http://yarp.it/git-master/channelprioritization.html.
PS3 thanks, @jgvictores! #160 (comment)

@rsantos88
Copy link
Contributor Author

rsantos88 commented Feb 11, 2021

I show the results of the experiments carried out :

Observations:

  • in Test 1 very subtle accelerations can be seen along the trajectory, more perceptible in the video if attention is paid to the sound. The warning is displayed in these cases:
[warning] StateVariables.cpp:228 clipSyncPositionTarget(): Maximum velocity exceeded, clipping target position (canId: 18).
  • In Test 2 there is also a clear acceleration between seconds 28 and 29.

@PeterBowman
Copy link
Member

PeterBowman commented Feb 21, 2021

Also, there is a delay between setPosition coordinates and getEncoder values, both read within the same iteration in the Python script. I need to investigate the underlying implementation of CSP on the iPOS-side.

Can't confirm, the closest thing I have found is a mention of a position contouring mode in MotionChip™ II TML Programming User Manual. Perhaps iPOS drives map old PT/PVT and "modern" CSP to the same thing under the hood. Default buffer size of PT submode is 9 points, which actually matches the observed delay in all test_encs.csv files regardless of the sync/command period.

This is not right. I have just tested it myself via examplePositionDirect on joints 24 and 26, up to 50º total distance at 10º/s, both C++ and Python bindings (YARP 3.3.3) at 8dbd67f. There is an observed delay of two points, but this is perfectly normal with CSP.

@PeterBowman
Copy link
Member

I've reached out to Technosoft support:

Q: (...) Section 10 explains that my master device should send SYNC signals at precisely the interpolation time period configured in object 0x60C2. Due to limitations of my framework, I might introduce a small delay each time I send SYNC. How bad is that (...)?

A: If there’s a synchronization issues, it can be simply detected by checking the Target_Speed or TSPD variable values. When the synchronization doesn’t work correctly, the Target_Speed / TSPD has spikes that drop to 0 or double the average speed. A wrong synchronization leads to unwanted vibration / shock in the motor control, the reference command isn’t correctly interpreted and the current through the motor will have strange behavior, all this can lead to motor overheating and very bad performance of your application.

@PeterBowman
Copy link
Member

Habemus bug. It is actually worse as the underlying PeriodicThread is misbehaving. Reported upstream at robotology/yarp#2488.

Implemented at robotology/yarp#2515 and merged into current master branch, scheduled for the YARP 3.5 release. Our code is prepared to comply with absolute accuracy clocks thanks to 4a0b2b0.

@rsantos88
Copy link
Contributor Author

Tested the new changes that will take place with the Yarp 3.5 release. The movement in CSP mode can be appreciated much smoother and without serious problems of jumps in the trayectory. T=0.005s

VID-20210415-WA0021.mp4

@PeterBowman
Copy link
Member

PeterBowman commented May 7, 2021

All done at merged at 61dc9e3. I prepared a battery of C++/Python examples which are also documented in Doxygen (here, look for exampleXxxTrajectoryXxx.(cpp|py)). The methodology is described in a brand new tutorial at teo-developer-manual: GitHub (perma), GitBook.

Note YARP 3.5 (unreleased) is required to enable our tweaks. I tested them in two ways:

  • Loading the 20200207-planArmV10.csv and 20200207-planBackArmV10.csv trajectories from teo-demos-misc. Our playback app needed some extra care at roboticslab-uc3m/tools@850f28a. Command:
    examplePlaybackThread --remote /teo/rightArm --timeIdx 0 --timeScale 1.0 --mask 0 1 1 1 1 1 1 0 0 0 0 0 0 --file /path/to/file.csv
    
  • Teleoperation via spnav 3D mouse, using a remote callback port: roboticslab-uc3m/kinematics-dynamics@9b476c9. Commands and video below:
    yarpdev --device SpaceNavigator --period 5 --name /spnav --channels 6
    yarpdev --device BasicCartesianControl --name /bcc --local /bcc/rightArm --remote /teo/rightArm --kinematics teo-fixedTrunk-rightArm-fetch.ini --ik st
    streamingDeviceController --streamingDevice SpaceNavigator --remoteCartesian /bcc --SpaceNavigator::remote /spnav --SpaceNavigator::fixedAxes "(rotx roty rotz)" --syncPort /teo/sync:o --gain 0.01 --movi
    
csp-spnav-20210503.mp4

As they use to say on Spanish TV commercials, now it goes smooth like a toddler's butt.

PS The observed micro-bumps are targeted and presumably fixed. I'm not sure whether the chaotic motion that made the arm break might still happen. Please open a new ticket (and make sure it's not a hardware-related issue) if it does.

cc @rsantos88 @jgvictores

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants