Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental monitor mode to BasicCrawler #2692

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ImBIOS
Copy link

@ImBIOS ImBIOS commented Oct 2, 2024

Fixes #2680

Add a new Monitor class to track and display time estimation and concurrency status in the CLI output at regular intervals.

  • Monitor Class:
    • Add Monitor class in packages/core/src/monitor.ts.
    • Include logic to write into the output and gather and calculate the monitor data.
  • BasicCrawler Integration:
    • Import Monitor class in packages/basic-crawler/src/internals/basic-crawler.ts.
    • Initialize and start the Monitor class in the run function.
    • Ensure monitor output and log output are written on separate lines.
    • Add monitor option to BasicCrawlerOptions interface.

Fixes apify#2680

Add a new Monitor class to track and display time estimation and concurrency status in the CLI output at regular intervals.

* **Monitor Class**:
  - Add `Monitor` class in `packages/core/src/monitor.ts`.
  - Include logic to write into the output and gather and calculate the monitor data.
* **BasicCrawler Integration**:
  - Import `Monitor` class in `packages/basic-crawler/src/internals/basic-crawler.ts`.
  - Initialize and start the `Monitor` class in the `run` function.
  - Ensure monitor output and `log` output are written on separate lines.
  - Add `monitor` option to `BasicCrawlerOptions` interface.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/apify/crawlee/issues/2680?shareId=XXXX-XXXX-XXXX-XXXX).
@fnesveda fnesveda added the t-tooling Issues with this label are in the ownership of the tooling team. label Oct 2, 2024
Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR!

few nits to begin with, i will try it out later today

packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
Comment on lines 373 to 375
* If you encounter issues due to this change, please:
* - report it to us: https:/apify/crawlee
* - set `requestLocking` to `false` in the `experiments` option of the crawler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this part, its wrong. also this should not be included in experiments

@@ -904,11 +912,15 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
this.events.on(EventType.MIGRATING, boundPauseOnMigration);
this.events.on(EventType.ABORTING, boundPauseOnMigration);

const monitor = this.experiments.monitor ? new Monitor(this.stats, this.log) : null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this conditional, but not inside experiments, we use those for unstable features, I don't forsee any actual problems with it here.

@ImBIOS
Copy link
Author

ImBIOS commented Oct 2, 2024

@B4nan If you trying to run this code, You should expect something to go wrong because I'm not done yet (that's why I made this PR a draft), I should make the other log above this monitor like in puppeteer-cluster, so it's readable and tidy. I'm not implementing it yet. I'm not even running this code yet, I'm just writing it with imagination in GitHub (while testing their Beta Copilot Workspace to help me understand crawlee's codebase faster).

Not even writing a test yet, should I write a unit or e2e test for it? which one or both?

@B4nan
Copy link
Member

B4nan commented Oct 2, 2024

Yes, we need some tests, both works fine, but I am not sure now how to implement checks for the E2E tests really.

I'm not even running this code yet

That's a bit scary, opening a PR with code you haven't even tried...

You should expect something to go wrong because I'm not done yet (that's why I made this PR a draft)

We won't be merging this before it's ready, the fact that this is in progress PR is not relevant.

@ImBIOS
Copy link
Author

ImBIOS commented Oct 2, 2024

That's a bit scary, opening a PR with code you haven't even tried...

Plz don't worry, look at this:

image

What can go wrong?

@ImBIOS ImBIOS marked this pull request as ready for review October 2, 2024 14:51
@B4nan
Copy link
Member

B4nan commented Oct 4, 2024

you will need to add some tests. maybe the e2e test could do the job here, we at least need to know it wont break anything, we can check how it works manually, but its important to verify no runtime failures can happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Monitor mode
3 participants