Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignoring paths is not supported in source_files #213

Closed
carusooo opened this issue Sep 20, 2024 · 17 comments · Fixed by #219
Closed

Ignoring paths is not supported in source_files #213

carusooo opened this issue Sep 20, 2024 · 17 comments · Fixed by #219
Assignees
Labels
bug Something isn't working

Comments

@carusooo
Copy link

I'm having issues with configuring spellcheck to ignore file paths when passed in via the source_files argument

From the documentation here and in PySpelling, it looks like this should be all that's required.

  sources:
  - '**/*.md*|!docs/api/**'`

This is how the action is invoked

jobs:
  spellcheck:
    name: Spellcheck
    runs-on: ubuntu-latest

    steps:
    - uses: actions/[email protected]
      name: Checkout
    - id: changed_files
      name: Changed Files
      uses: tj-actions/changed-files@v45
      with:
        files: |
           **.mdx
    - name: List all changed files
      env:
        ALL_CHANGED_FILES: ${{ steps.changed_files.outputs.all_changed_files }}
      run: |
        for file in ${ALL_CHANGED_FILES}; do
        echo "$file was changed"
        done
    - uses: rojopolis/[email protected]
      name: Spellcheck
      
      if: ${{ steps.changed_files.outputs.all_changed_files }}
      with:
        config_path: spellcheck.yaml
        source_files: ${{ steps.changed_files.outputs.all_changed_files }}
        task_name: Markdown

I suspect this is working as intended, wanted to confirm. I think I'll need to filter out the changed files in the github action

@jonasbn
Copy link
Collaborator

jonasbn commented Sep 21, 2024 via email

@jonasbn jonasbn self-assigned this Sep 22, 2024
@carusooo
Copy link
Author

It is a public repo, here's the spellcheck.yaml file. I ended up adding the filter to the changed file detection workflow step, since I think this action is basically connecting to the PySpelling CLI and passing in the files there.

Overall I think I need to revise my approach to spellcheck the built HTML files since mdx files aren't well handled by PySpelling.

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 3, 2024

Hi @carusooo

Thanks for sharing I will see what I can learn from it, perhaps something can be used in the action.

@facelessuser
Copy link

facelessuser commented Oct 4, 2024

I'm not sure what is meant by not supporting mdx files. Pyspelling doesn't have a filter that has control to filter directly in Markdown, but has one that converts Markdown to HTML which is easier to selectively filter within.

I'm certain an interested party could possibly use write their own filter using a Markdown parser that allows more control than what we currently use that might allow such control, but I would generally argue HTML is much easier to filter with available libraries. Often I build my documentation from Markdown and then use the HTML filter.

As for file exclusions, PySpelling probably could make this more clear. I think we reference wcmatch (the library that controls this) but expect the user to take a look. But you can exclude directories. It uses the NEGATE flag in the wcmatch library by default that allows you to specify patterns that start with ! to exclude them from search.

So you could do something like what is shown below. You can provide multiple rules that will be evaluated simultaneously if you separate them with |. So you can add an exclusion rule and an inclusion rule (order doesn't matter).

  sources:
  - '!docs/api/**|**/*.mdx'

Assuming this action passes the source normally to Pyspelling, I imagine you could do something similar.

@facelessuser
Copy link

Since Pyspelling allows a list of source patterns, I wonder if I should just update it so all patterns our evaluated simultaneously so that this would work.

sources
- '**/*.mdx'
- '!docs/api/**'

There's no limitation in the underlying library to prevent this, I think we just loop through the sources...just because.

@facelessuser
Copy link

facelessuser commented Oct 4, 2024

I realized looking at the OP again, I did not read things very well as the opening post did specify how exclusions can be used 😅 . I also see that PySpelling documentation does indeed mention how exclusions are done (I didn't realize I had documented that).

Regardless, I'm probably going to allow the following moving forward as I think it will be more intuitive for the user. I'm not exactly sure how this action interfaces with Pyspelling in this regard, but this should work via the --source command line option as well. I also will not invalidate the old approach (separating patterns with |) so those can continue to be used as well.

sources
- '**/*.mdx'
- '!docs/api/**'

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 4, 2024

Hi @facelessuser

I believe the core problem is that when you specify the use of source_files in the workflow specification. It does not apply the patterns from the configuration file.

Meaning that all ignored files get processed even if specified to be ignored by the configuration file.

So there is a conflict between the use from the Action perspective and the configuration file, since the action uses: --source.

From the documentation:

--source SOURCE, -S SOURCE
Specify override file pattern. Only applicable when specifying exactly one --name.

REF: https://facelessuser.github.io/pyspelling/

So I need to find a way to get the pattern from the configuration applied after the list of source_files is accumulated.

@facelessuser
Copy link

So you want to be able to append new sources to sources already in the pyspelling config? Is there a reason this approach is desired? The way PySpelling was designed was just to setup rules and then call them. If you want to override something, then you can from the command line. It's possible I don't understand the use case, but from my perspective, if you were going to constantly append certain files to the rules, my question would be why not permanently include those patterns in the config?

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 4, 2024

@facelessuser I completely understand PySpelling and it's approach, but in the context of the GitHub Action, I am interested in only capturing the changed files, meaning a more dynamic approach. Here --source seemed to be a good approach.

But as pointed out by @carusooo this does not work as expected since the --source overrides the configuration, as you indicate is a more static approach, so the two usage patterns collide.

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 4, 2024

@facelessuser I have a large repository consisting of Markdown files, it takes significant time to process, to only checking the changes files makes sense.

Example output:

Using pyspelling on configuration outlined in >.spellcheck.yaml<
Running task >Markdown<
Checking file >awk/learn_awk.md<
Checking file >node/use_nvm.md<
Checking file >git/use_tig.md<
Checking file >node/use_nodeenv.md<
Checking file >rm/are_you_sure.md<
Checking file >shell/nifty_aliases.md<
----------------------------------------------------------------
Using aspell to spellcheck Markdown
Running Task: Markdown...
Compiling Dictionary...
> Processing: awk/learn_awk.md
> Processing: node/use_nvm.md
> Processing: git/use_tig.md
> Processing: node/use_nodeenv.md
> Processing: rm/are_you_sure.md
> Processing: shell/nifty_aliases.md

Spelling check passed :)

The repository contains 559 Markdown files if I count correctly.

@jonasbn jonasbn pinned this issue Oct 4, 2024
@jonasbn jonasbn added the bug Something isn't working label Oct 4, 2024
@facelessuser
Copy link

facelessuser commented Oct 4, 2024

Okay, in that case, you would need to pass in exclude patterns with !exclude. Currently, that requires you to specify them via --source "!exlude|include" as each --source is evaluated independently. If it makes it easier, the change I mentioned earlier would allow you to pass them in separately as all the patterns would be evaluated simultaneously. So --source include1 --source include2 --source !exclude would work.

Let me know if that helps, if not you can join exclude and include with |.

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 4, 2024

@facelessuser I will see if I can come up with a solution

@facelessuser
Copy link

It's possible I can just add git support for getting a file list between the current head and some branch or commit. I have all the glob and glob matching support already...

@facelessuser
Copy link

I have a proof of concept locally where a new option is added to specify the git merge-base, and when it is defined, only the modified files are checked:

pyspelling git:(feature/git) ✗ python3 -m pyspelling -n python -v -m master
Using aspell to spellcheck python
Running Task: python...
Compiling Dictionary...
Searching: Only checking files that changed in git...
> Processing: pyspelling/__init__.py
> Processing: pyspelling/__main__.py
> Processing: pyspelling/util/git.py

Spelling check passed :)

This could allow an easy way of checking only tracked, modified files in git and get access to inclusion and exclusion handling directly in PySpelling. Of course I need to test this directly in Github to see how well it works and probably should test it with an external PR from another user as well to see how it works.

This action is not obligated to use such functionality and is free to implement its own way as well, but I felt like exploring this as a possibility.

@facelessuser
Copy link

Branch that explores git handling in PySpelling https:/facelessuser/pyspelling/pull/193/files. Things are subject to change and the addition of such functionality is not guaranteed but only being explored at the present.

@jonasbn
Copy link
Collaborator

jonasbn commented Oct 17, 2024

Hi @facelessuser and @carusooo

I have made an attempt at a fix. which is local to the action by applying some file and filter gymnastics in the entrypoint.sh. The fix has been merged and tagged, but I have not made a public release to the Marketplace, because I want to conduct some more tests on some GitHub repositories.

@facelessuser I might end up reevaluating this approach, but I very much think this was an issue for the action and not PySpelling, so addressing it in this context seems the best approach. If you decide on following up on the exploration you made, I might follow that path.

@facelessuser
Copy link

That's fine. If there is no desire for it in PySpelling, I'm fine leaving it as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants