
Bert #105

Merged
merged 2 commits into from
Jan 15, 2021

Conversation

chenkelmann
Contributor

Channel for questions

Description

An implementation for Bert pretraining, an example for it, a simple base class for listeners to save boilerplate, a new initializer and learning rate tracker to get the same results as the TF reference implementation. A sketch of a utility class that should fix #84 by executing training in parallel.
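The TF BERT reference implementation warms the learning rate up linearly over the first steps and then decays it linearly toward zero, which is presumably what the new learning rate tracker reproduces. A minimal sketch of that schedule (class and method names here are illustrative, not the PR's actual API):

```java
/**
 * Hypothetical sketch of a warm-up-then-linear-decay learning rate schedule,
 * as used by the TF BERT reference implementation. Not the PR's actual class.
 */
public final class WarmUpLinearDecay {
    private final float baseLr;
    private final int warmupSteps;
    private final int totalSteps;

    public WarmUpLinearDecay(float baseLr, int warmupSteps, int totalSteps) {
        this.baseLr = baseLr;
        this.warmupSteps = warmupSteps;
        this.totalSteps = totalSteps;
    }

    /** Returns the learning rate to use at the given global step. */
    public float getValue(int step) {
        if (step < warmupSteps) {
            // linear ramp from 0 up to baseLr over the warm-up phase
            return baseLr * step / warmupSteps;
        }
        // linear decay from baseLr down to 0 at totalSteps
        float progress = (float) (step - warmupSteps) / (totalSteps - warmupSteps);
        return baseLr * Math.max(0f, 1f - progress);
    }
}
```

Matching this schedule step-for-step matters when trying to reproduce the TF reference results, since BERT pretraining is sensitive to the warm-up length.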

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage: no
  • Code is well-documented:
  • comments could do with updating; some had to be deleted again due to the never-ending checkstyle errors. The time budget for commenting was spent on quieting the style checkers.
  • To the best of my knowledge, examples and Jupyter notebooks are either not affected by this change or have been fixed to be compatible with it

Changes

  • Bert Pretraining Blocks
  • Utility class for easier memory management
  • New learning rate tracker
  • New initializer
  • Base class for training listeners to avoid boilerplate
  • Example for Bert pretraining
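On the new initializer: the TF BERT reference initializes weights from a truncated normal distribution (stddev 0.02), rejecting draws more than two standard deviations from the mean. A minimal rejection-sampling sketch of that idea (the class name is illustrative, not the PR's actual initializer):

```java
import java.util.Random;

/**
 * Hypothetical sketch of a truncated normal initializer in the style of the
 * TF BERT reference: samples N(0, stdDev^2) and resamples anything outside
 * [-2*stdDev, 2*stdDev]. Not the PR's actual class.
 */
public final class TruncatedNormal {
    private final Random random;
    private final float stdDev;

    public TruncatedNormal(long seed, float stdDev) {
        this.random = new Random(seed);
        this.stdDev = stdDev;
    }

    /** Draws one sample, rejecting values outside two standard deviations. */
    public float sample() {
        while (true) {
            float v = (float) random.nextGaussian() * stdDev;
            if (Math.abs(v) <= 2 * stdDev) {
                return v;
            }
        }
    }

    /** Fills an array of n samples, e.g. for a flattened weight tensor. */
    public float[] sample(int n) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            out[i] = sample();
        }
        return out;
    }
}
```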

Comments

  • The example does not log anything - are the default listeners broken?
  • With this code, the parallel gpu problem and the batch norm performance problems can be further explored.
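The utility class mentioned above targets #84 (devices being driven one after another instead of concurrently). One plausible shape for such a utility, sketched with plain `java.util.concurrent` primitives and illustrative names rather than the PR's actual classes: submit one task per device to a thread pool and block until all of them finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of a parallel-execution utility: instead of looping
 * over devices and running each forward/backward pass in turn, run one task
 * per device concurrently. Not the PR's actual class.
 */
public final class ParallelExecutor {

    /** Runs one task per device concurrently; returns results in task order. */
    public static <T> List<T> runOnDevices(List<Callable<T>> perDeviceTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(perDeviceTasks.size());
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : perDeviceTasks) {
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                try {
                    results.add(f.get()); // blocks until this device's task finishes
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException("device task failed", e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting results via `Future.get()` in submission order keeps the per-device outputs aligned with the device list, which matters when gradients are gathered and averaged afterwards.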

chenkelmann and others added 2 commits January 14, 2021 13:01
Added all classes necessary for a vanilla bert pretraining.

Added a training listener base class to save on boilerplate when implementing custom listeners.

Added Bert classes & simple example

Removed comments to shut up checkstyle.

Fixed PMD errors.

Fixed more PMD errors.

Made code layout less readable and less pretty with ./gradlew formatJava.
Change-Id: I5e46341fc48853f7ae0dfa4deaa1923fa5bb5c6a
@zachgk zachgk merged commit dac7c07 into deepjavalibrary:master Jan 15, 2021
Development

Successfully merging this pull request may close these issues.

Multiple GPUs are used sequentially, not parallel
3 participants