sudachipy is able to work as python module. #2

Kensuke-Mitsuzawa · 2018-05-06T07:45:35Z

The current sudachipy does not work as a Python module because of the following issues:

- Importing the package fails because of its relative import statements.
- setup.py is incomplete: it omits some module directories.
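
One way to check that setup.py lists every module directory is to enumerate the directories that contain an `__init__.py`. The sketch below does this with the standard library only. `discover_packages` and the `demo_pkg` layout are hypothetical names used for illustration, not SudachiPy's actual tree; in a real setup.py, `setuptools.find_packages()` serves the same purpose.

```python
import os
import tempfile


def discover_packages(root):
    """Return dotted names of every package directory found under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # The project root itself is not a package; just descend.
            continue
        if "__init__.py" not in filenames:
            # Not a package, so do not descend into it.
            dirnames[:] = []
            continue
        found.append(os.path.relpath(dirpath, root).replace(os.sep, "."))
    return sorted(found)


# Build a throwaway layout (hypothetical names) and list its packages.
with tempfile.TemporaryDirectory() as tmp:
    for parts in (("demo_pkg",), ("demo_pkg", "plugin"), ("demo_pkg", "plugin", "oov")):
        pkg_dir = os.path.join(tmp, *parts)
        os.makedirs(pkg_dir, exist_ok=True)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    os.makedirs(os.path.join(tmp, "docs"))  # no __init__.py, so it is skipped
    packages = discover_packages(tmp)

print(packages)
```

Comparing such a list against the `packages=` argument in setup.py shows at a glance which sub-packages would be left out of an installed distribution.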

Placing the system dictionary by hand is also dull work. This step can be automated with a Makefile.

By the way, sudachipy does not seem to support mode "C". I expect developers would welcome this mode, and I hope it will be added some day :)

sorami · 2018-05-08T12:02:40Z

Thank you for your comments and code!

Using SudachiPy as a Python module

Does it still fail even after the $pip install -e . installation step? My colleagues and I have confirmed that it works fine with that step.

That said, $pip install -e . is not the final form of usage. We aim to develop it into a stable version and then register it on PyPI, so that you can install it like any other public Python library with $pip install sudachipy (still under development ...).

Downloading and locating dictionary file

Yes, I totally agree with you that placing the system dictionary manually is dull work.

Your Makefile method works, but our plan is to perform this step from the code itself, similar to NLTK (e.g., import nltk; nltk.download()) or spaCy (e.g., $python -m spacy download en). That is our goal; for now you need to do that dull work by hand because we haven't implemented that part ... Sorry about that.
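
A download step driven from code could look roughly like the sketch below. This is only an illustration under assumptions: `download_dictionary` is a hypothetical helper, not part of SudachiPy, and the caller supplies both the URL and the destination path.

```python
import os
import shutil
import urllib.request


def download_dictionary(url, dest_path):
    """Fetch url to dest_path unless that file already exists; return dest_path."""
    if os.path.exists(dest_path):
        # Already downloaded, so skip the network round trip.
        return dest_path
    os.makedirs(os.path.dirname(os.path.abspath(dest_path)), exist_ok=True)
    tmp_path = dest_path + ".part"
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated dictionary at dest_path.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, dest_path)
    return dest_path
```

Writing to a `.part` file and renaming it at the end keeps the existence check reliable: a file at the destination path is always a complete download.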

C mode splitting

I suspect that this is not about the code but the dictionary.

So we have core and full dictionaries, and different versions as we update the vocabulary. The problem is that sometimes you cannot reproduce the examples in the documentation.

Say, with this setup,

import json

from sudachipy import tokenizer
from sudachipy import dictionary
from sudachipy import config

with open(config.SETTINGFILE, "r", encoding="utf-8") as f:
    settings = json.load(f)
tokenizer_obj = dictionary.Dictionary(settings).create()

With the current system_full.dic,

mode = tokenizer.Tokenizer.SplitMode.C
[m.surface() for m in tokenizer_obj.tokenize(mode, "医薬品安全管理責任者")]
# => ['医薬品安全管理責任者']

But with the current system_core.dic, the result is the following, because the word is not in the dictionary.

mode = tokenizer.Tokenizer.SplitMode.C
[m.surface() for m in tokenizer_obj.tokenize(mode, "医薬品安全管理責任者")]
# => ['医薬品', '安全', '管理', '責任者']
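
To tell which behaviour a loaded dictionary gives, a small check like the following may help. It is a sketch under assumptions: `splits_compound` is a hypothetical helper, and the two stub classes only replay the outputs quoted above so that the snippet runs without a dictionary file. It relies only on the `tokenize(mode, text)` and `surface()` calls used in the examples above.

```python
def splits_compound(tokenizer_obj, mode, text):
    """Return (was_split, surfaces) for text under the given split mode."""
    surfaces = [m.surface() for m in tokenizer_obj.tokenize(mode, text)]
    return len(surfaces) > 1, surfaces


class StubMorpheme:
    """Stand-in for a morpheme; exposes only surface()."""

    def __init__(self, surface):
        self._surface = surface

    def surface(self):
        return self._surface


class StubTokenizer:
    """Stand-in tokenizer that replays a fixed list of surfaces."""

    def __init__(self, canned_surfaces):
        self._canned = canned_surfaces

    def tokenize(self, mode, text):
        return [StubMorpheme(s) for s in self._canned]


WORD = "医薬品安全管理責任者"
full_like = StubTokenizer([WORD])                           # output quoted for system_full.dic
core_like = StubTokenizer(["医薬品", "安全", "管理", "責任者"])  # output quoted for system_core.dic

full_result = splits_compound(full_like, "C", WORD)
core_result = splits_compound(core_like, "C", WORD)
```

With a real tokenizer, `tokenizer_obj` and `tokenizer.Tokenizer.SplitMode.C` from the setup above would take the place of the stubs and the `"C"` placeholder.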

We hear about the same issue from various people (e.g., Clarify the definition of core and non_core lexicon · Issue #34 · WorksApplications/Sudachi); we are sorry for the confusion, and we will tidy up the documents so that they correspond to the real situation.

Misc

Yes, I think we need to add package_data part in setup.py as you suggested.
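
As a rough sketch of what that could look like, the keyword arguments below use the `package_data` option of setuptools, which maps a package name to glob patterns for non-Python files. The `resources/*` pattern and the `SETUP_KWARGS` name are assumptions for illustration, not the actual SudachiPy layout.

```python
# Keyword arguments that a setup.py could pass on as
# setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "SudachiPy",
    # Every sub-package must be listed too (or found with find_packages()).
    "packages": ["sudachipy"],
    # Hypothetical pattern: ship the files under sudachipy/resources/.
    "package_data": {"sudachipy": ["resources/*"]},
}
```

Patterns in `package_data` are resolved relative to the package directory, so the files end up next to the installed modules rather than at a path that exists only in the source checkout.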

Adding an example.py (or describing the usage in README.md) would be nice.

So I would like to close this PR, but we are very thankful to you for raising these issues in public, and questions or finer-grained PRs are more than welcome.

sorami · 2018-05-08T12:16:30Z

Following your comments and code, I have updated the README:

Add example usage as a package · WorksApplications/SudachiPy@4153a52
Add a future plan explanation about resource management · WorksApplications/SudachiPy@56b05d2

Kensuke-Mitsuzawa added 8 commits May 6, 2018 15:25

replaced relative import statement into absolutive import & added mis…

eea240b

…sing module directory in setup.py

fixed broken line in dictionary.py

f8264d5

Added resource files as part of package file

2d123d6

fixed a issue of importing error because of missing directory path wh…

563bd8a

…en package is installed with setup.py

added missing modules under plugin directory

0325890

added makefile to deploy system with the dictionary

afca84a

updated README.md

29ce10f

added example script file

14f3842

sorami self-requested a review May 8, 2018 12:00

sorami closed this May 8, 2018

izziiyt pushed a commit that referenced this pull request Jul 7, 2019

Merge pull request #2 from megagonlabs/feature/require_sudachidict_core

defdac8

Feature/require sudachidict core
