-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for remote examples #800
Conversation
Codecov Report
@@ Coverage Diff @@
## main #800 +/- ##
==========================================
- Coverage 64.69% 64.63% -0.06%
==========================================
Files 107 107
Lines 9259 9259
==========================================
- Hits 5990 5985 -5
- Misses 3269 3274 +5
Continue to review full report at Codecov.
|
…ookresearch/Mephisto into add-tests-for-remote-examples
examples/remote_procedure/mnist/webapp/cypress/e2e/remote_procedure_mnist.cy.js
Outdated
Show resolved
Hide resolved
cy.get('[data-cy="toxicity-alert"]', { timeout: 25000 }).as( | ||
"toxicityAlert" | ||
); | ||
cy.get("@toxicityAlert").contains( | ||
'The statement, "I hate bob!," has a toxicity of:' | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A long timeout is required for this as the detoxify model takes a long time to run on the GitHub action (they have slow computers).
♻️ Removed force:true from click()
cy.get('[data-cy="canvas-mouse-down-container-1"]') | ||
.trigger("mouseover") | ||
.trigger("mousedown", 100, 20) | ||
.trigger("mousedown", 120, 200) | ||
.trigger("mouseup", 120, 200); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This draws a 1, but in the GitHub action it seems to sometimes think it is a 8 leading to the test failing about 1 out of every 4 times. Might make more sense to draw a number that is clearer to the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels kinda weird though because given the exact same mouse down values I feel like the model should give me the exact same result, but I guess that is not the case. AI stuff I guess 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the 1 to a 3, hopefully this is more consistent
Edit:
Seems to be consistent, haven't had any github actions fail
I don't really know if there is a way to alleviate this, but in some of the tests I click the submit button. This makes locally running the tests a second time fail. The task understandably isn't expected to work after it has been submitted. One workaround is to just comment out the line where you click the submit button in the test. This would allow you to run the test on the task many times without it failing because of submission. |
Another possible approach is to launch multiple tasks, and have a "submit" be the last step of a test, then other tests can use a different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall these tests make sense to me!
It may not even be related to what I said to be honest, not exactly sure Here's a video of the problem (occurs at the end). Screen.Recording.2022-07-15.at.1.35.41.PM.movI think if you submit the task enough times it will fail or something like that. |
Ah okay so the backend server ends up turning off as it thinks it's "finished" collecting tasks. I don't think we have a great way of dealing with this outside of just launching a "reasonable" amount to cover all of the testing, and then note that too many submissions will eventually lead to tests failing once the server shuts down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks good to me, excited to have these tests in!
Summary
Adds Cypress tests for the remote_procedure/template, remote_procedure/mnist, and remote_procedure/toxicity_detection examples.
Three GitHub action jobs are added. One for each of the remote_procedure examples. They just install the necessary dependencies and run the cypress tests.
Video of Tests
Toxicity Detection
toxicity-detection-tests.mov
Template
template-tests.mov
Mnist(Updated with drawing capability)
mnist-new-test.mov