Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logDirectory option to copy logs to WACZ #119

Merged
merged 7 commits into from
Aug 28, 2024

Conversation

tw4l
Copy link
Collaborator

@tw4l tw4l commented Aug 13, 2024

Fixes #118

Thanks for taking a look, this is one of the last things needed for Browsertrix Crawler!

Copy link
Collaborator

@matteocargnelutti matteocargnelutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tw4l This looks great as usual 🤝 !

One question and one request:

  • Is there a common pattern for these log files (extension, content, etc ...)? I would be interested in making that feature less permissive, so that we don't have a "add whatever to WACZ" flag 😅 .
  • Would you mind adding docs for this new option to types.js so the IDE / JSDoc knows about this new option?

Thank you!

@tw4l
Copy link
Collaborator Author

tw4l commented Aug 15, 2024

* Is there a common pattern for these log files (extension, content, etc ...)? I would be interested in making that feature less permissive, so that we don't have a "add whatever to WACZ" flag 😅 .

The log files from Browsertrix Crawler are JSON-L files with .log as the extension - happy to add extra validation there. Perhaps we can limit to log or txt extensions for now? I think if there was an issue with the correctness of the log files (not that I've seen that happen, but you never know) it'd still be better to include a file with a formatting issue rather than no logs.

* Would you mind adding docs for this new option to `types.js` so the IDE / JSDoc knows about this new option?

Happy to!

@matteocargnelutti
Copy link
Collaborator

@tw4l

The log files from Browsertrix Crawler are JSON-L files with .log as the extension - happy to add extra validation there. Perhaps we can limit to log or txt extensions for now? I think if there was an issue with the correctness of the log files (not that I've seen that happen, but you never know) it'd still be better to include a file with a formatting issue rather than no logs.

That makes sense. Let's start by only accepting .log and .txt files. Thank you :)

index.js Outdated
* Path to directory of log files to copy into WACZ.
* @type {?string}
*/
logDirectory = null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tw4l Nitpick: Let's use logDir for consistency with pagesDir and the like :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I kept the option for class initialization as logDirectory and CLI flag as log-directory for compatibility with py-wacz but internally it now uses this.logDir

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to change the former though, I don't have a strong opinion, as long as the CLI flag is the same I think we're good :)

@tw4l
Copy link
Collaborator Author

tw4l commented Aug 16, 2024

Rebased and updated, thanks!

This is a breaking change in the case of `pagesDir`
@matteocargnelutti
Copy link
Collaborator

matteocargnelutti commented Aug 19, 2024

Hey there @tw4l! This is great, thanks!
I pushed a commit to suggest an updated naming convention for all options that expect / accept a path to dir - what do you think?

Cheers,

@tw4l
Copy link
Collaborator Author

tw4l commented Aug 26, 2024

Hey there @tw4l! This is great, thanks! I pushed a commit to suggest an updated naming convention for all options that expect / accept a path to dir - what do you think?

Hi @matteocargnelutti , makes sense to me! The consistency is nice :) Thanks for this!

@matteocargnelutti matteocargnelutti merged commit e7db309 into main Aug 28, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to copy log files from specified directory into WACZ
2 participants