
Move texar to extra require #720

Merged: 102 commits merged into master from move_texar_to_extra_require on Apr 12, 2022

Commits
8a1bdfe
raise import error for texar-pytorch
hepengfe Mar 31, 2022
5319775
raise import error for texar-pytorch
hepengfe Mar 31, 2022
8d06d73
remove texar-pytorch requirement
hepengfe Mar 31, 2022
0c3edda
Merge branch 'master' into move_texar_to_extra_require
hepengfe Mar 31, 2022
6db52e1
Reverse test file changes
hepengfe Mar 31, 2022
5de9914
fix HParams importing
hepengfe Mar 31, 2022
f976dcb
Merge branch 'move_texar_to_extra_require' of https:/feip…
hepengfe Mar 31, 2022
757ba7d
change extra_require as suggested
hepengfe Mar 31, 2022
fb6dca1
rebase from master branch
hepengfe Mar 31, 2022
19e8ed2
update pre-commit black version
hepengfe Mar 31, 2022
fea18a0
add description about updating pre-commit configuration
hepengfe Mar 31, 2022
7e582d9
black
hepengfe Mar 31, 2022
aecc633
rm unrelevant files
hepengfe Apr 1, 2022
63ac0d1
rm unrelevant files
hepengfe Apr 1, 2022
aa3a83a
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 1, 2022
374183b
edit based on code review
hepengfe Apr 1, 2022
a2282f9
add requirement of transformers for audio test
hepengfe Apr 2, 2022
865fc6d
add pseudo hyperlink to point to the installation command
hepengfe Apr 2, 2022
82d6722
pylint
hepengfe Apr 4, 2022
24003f5
add extra requisite to main.yml
hepengfe Apr 4, 2022
8c3daa6
spacing
hepengfe Apr 4, 2022
46430dd
spacing
hepengfe Apr 4, 2022
22880eb
try transformers 4.15.0
hepengfe Apr 4, 2022
adf3ff5
loosen requirement
hepengfe Apr 4, 2022
66c0504
remove past requirement
hepengfe Apr 4, 2022
6f1228f
Merge branch 'master' into move_texar_to_extra_require
hunterhector Apr 5, 2022
893e84e
merge extra requirements
hepengfe Apr 5, 2022
147dbf4
fixed hyperlink in README.md
hepengfe Apr 5, 2022
0c3f509
fixed hyperlink title
hepengfe Apr 5, 2022
b748ed7
imports in the local scope and re-raise import error with error messages
hepengfe Apr 5, 2022
9c24693
remove modules from __init__
hepengfe Apr 5, 2022
b5d44a0
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 5, 2022
e863d53
pylint
hepengfe Apr 5, 2022
9cc6d18
pylint
hepengfe Apr 6, 2022
839ea76
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 6, 2022
58a4f4c
adjust imports in test files
hepengfe Apr 6, 2022
84aac86
HParams from asyml_utilities
hepengfe Apr 6, 2022
db0b32d
workflow edit
hepengfe Apr 6, 2022
e2bafe0
readthedocs asyml_utilities
hepengfe Apr 6, 2022
414be53
import special tokens from asyml_utilities
hepengfe Apr 6, 2022
9f4b265
move ImportError out of the loop
hepengfe Apr 7, 2022
24df884
fix import orders
hepengfe Apr 7, 2022
925b585
move out many more texar dependency
hepengfe Apr 7, 2022
5cb4eb1
add train preprocessor
hepengfe Apr 7, 2022
612b647
add train preprocessor
hepengfe Apr 7, 2022
ffec8dc
update main.yml
hepengfe Apr 7, 2022
4a46987
fix rst files
hepengfe Apr 7, 2022
0db66b3
pylint
hepengfe Apr 7, 2022
7392216
git add tests
hepengfe Apr 7, 2022
1498d64
special tokens
hepengfe Apr 7, 2022
5c98ff0
Fix test importing
hepengfe Apr 7, 2022
1cc9c3b
special tokens
hepengfe Apr 8, 2022
fa8b468
pylint
hepengfe Apr 8, 2022
203825b
Update data_pack_dataset.py
hunterhector Apr 8, 2022
1821670
Update README.md
hunterhector Apr 8, 2022
33da833
Update setup.py
hunterhector Apr 8, 2022
befffb3
Update main.yml
hunterhector Apr 8, 2022
8ae4cea
Update tagging_trainer.py
hunterhector Apr 8, 2022
7f464a2
Update __init__.py
hunterhector Apr 8, 2022
5aa2ce1
Update bert_based_query_creator.py
hunterhector Apr 8, 2022
dc30347
Update bert_ranker.py
hunterhector Apr 8, 2022
73ae361
Update bert_reranking_processor.py
hunterhector Apr 8, 2022
baed126
Update srl_predictor.py
hunterhector Apr 8, 2022
66598b1
Update data.py
hunterhector Apr 8, 2022
2d8bdf3
Update model.py
hunterhector Apr 8, 2022
e296cea
Update model_utils.py
hunterhector Apr 8, 2022
a9a3422
Update data.py
hunterhector Apr 8, 2022
166f473
Update model.py
hunterhector Apr 8, 2022
85ec72e
Update model_utils.py
hunterhector Apr 8, 2022
e2ba106
Update model_factory.py
hunterhector Apr 8, 2022
c7d2b4d
Update __init__.py
hunterhector Apr 8, 2022
eeb96cf
Update train_preprocessor.py
hunterhector Apr 8, 2022
1cefd81
Update main.yml
hunterhector Apr 8, 2022
ac074de
Update setup.py
hunterhector Apr 8, 2022
06df5e2
Update subword_tokenizer.py
hunterhector Apr 8, 2022
df6e745
Update README.md
hunterhector Apr 8, 2022
c7c5473
Update texar_nondependency_test.py
hunterhector Apr 8, 2022
b974eeb
Update main.yml
hunterhector Apr 8, 2022
7f7b37f
Update bert_ranker.py
hunterhector Apr 8, 2022
621046f
Update bert_ranker.py
hunterhector Apr 8, 2022
9b52a08
move texar dependent SRLSpanData out
hepengfe Apr 8, 2022
eee4c9c
merge requirement of texar encoder into nlp
hepengfe Apr 8, 2022
d2312c4
srl span data
hepengfe Apr 8, 2022
f3ddb81
udpate readme
hepengfe Apr 8, 2022
9d96d69
add temporary torch installation
hepengfe Apr 8, 2022
87619a6
uninstall texar pytorch after installing forte
hepengfe Apr 8, 2022
cfb5856
uninstall without prompt
hepengfe Apr 8, 2022
b8a0ae1
pylint
hepengfe Apr 8, 2022
10e3008
pylint
hepengfe Apr 8, 2022
08692c9
update texar pytorch version requirement
hepengfe Apr 8, 2022
99e7592
ignore texar nondependency test in coverage run
hepengfe Apr 8, 2022
01753e3
install the lastest texar pytorch
hepengfe Apr 9, 2022
9bc13d2
add texar related test cases
hepengfe Apr 9, 2022
fa03088
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 9, 2022
8ae2547
move bert classes depedent on texar inside the folder
hepengfe Apr 9, 2022
f3e5d15
move bert classes depedent on texar inside the folder
hepengfe Apr 9, 2022
7f15ae7
ir bert paths
hepengfe Apr 9, 2022
0e1359e
fixed hyperlinks
hepengfe Apr 11, 2022
0a24224
pylint
hepengfe Apr 11, 2022
baacbe4
remove unrelevant tests
hepengfe Apr 11, 2022
4e4155e
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 11, 2022
1c35c60
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 12, 2022
20 changes: 16 additions & 4 deletions .github/workflows/main.yml
@@ -55,8 +55,7 @@ jobs:
pip install --progress-bar off coverage codecov
python -m pip install ipykernel
python -m ipykernel install --user
pip install testbook
pip install termcolor
pip install --progress-bar off asyml-utilities
- name: Format check with Black
run: |
black --line-length 80 --check forte/
@@ -84,7 +83,20 @@
rm -rf texar-pytorch
- name: Install Forte
run: |
pip install --use-feature=in-tree-build --progress-bar off .[ner,test,example,wikipedia,augment,stave,audio_ext,remote]
pip install --use-feature=in-tree-build --progress-bar off .[models,test,wikipedia,data_aug,nlp,ir,texar-encoder,stave,audio_ext,remote,extractor]
- name: Test backbone Forte import test
run: |
# Try to install Forte backbone only and test basic imports.
pip install --use-feature=in-tree-build --progress-bar off .
# needs to remove it in after torch dependency is removed
pip uninstall -y texar-pytorch
pytest tests/forte/texar_nondependency_test.py
# install lastest texar pytorch
git clone https:/asyml/texar-pytorch.git
cd texar-pytorch
pip install --progress-bar off .
cd ..
rm -rf texar-pytorch
@hunterhector (Member) commented on Apr 11, 2022:

Why don't we run these steps before the `Install Texar` step, or even the `Install deep learning frameworks` step?

@hepengfe (Collaborator, PR author) replied:

Because we still have dependencies on other libraries in the dependency matrix, and this PR only needs to remove texar. In a later PR that sets up a dependency environment for different modules, these lines will be removed.

- name: Build ontology
run: |
./scripts/build_ontology_specs.sh
@@ -102,7 +114,7 @@ jobs:
if [[ ${{ matrix.torch-version }} != "1.5.0" && ${{ matrix.python-version }} == "3.9" ]]; then mypy forte; fi
- name: Test with pytest and run coverage
run: |
coverage run -m pytest tests --ignore=tests/forte/notebooks
coverage run -m pytest tests --ignore=tests/forte/notebooks --ignore=tests/forte/texar_nondependency_test.py
coverage run --append -m pytest --doctest-modules forte
- name: Upload coverage
run: |
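For context, here is a minimal sketch of the kind of check the `Test backbone Forte import test` step above performs: with only the Forte backbone installed (and texar-pytorch removed), core modules should import cleanly, while texar-backed modules should fail with an actionable `ImportError`. This is an illustration of the idea, not the contents of `tests/forte/texar_nondependency_test.py`; the module chosen for the failure case is one this PR guards with a deferred import.

```python
# Illustrative only; the real assertions live in tests/forte/texar_nondependency_test.py.
import importlib
import unittest


class BackboneImportTest(unittest.TestCase):
    """The Forte backbone should import without texar-pytorch installed."""

    def test_backbone_imports_without_texar(self):
        # Core modules that must not pull in texar after this PR.
        for module in ("forte", "forte.pipeline", "forte.common.configuration"):
            importlib.import_module(module)

    def test_texar_dependent_module_raises(self):
        # A module kept behind the texar-based extras should fail loudly
        # (this assumes texar-pytorch is absent in the test environment).
        with self.assertRaises(ImportError):
            importlib.import_module("forte.data.data_pack_dataset")


if __name__ == "__main__":
    unittest.main()
```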
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -135,7 +135,7 @@ the [Google Python Style guide](http://google.github.io/styleguide/pyguide.html)
project code is examined using `pylint`, `flake8`, `mypy`, `black` and `sphinx-build` which will be run
automatically in CI. It's recommended that you should run these tests locally before submitting your pull request to save time. Refer to the github workflow [here](https:/asyml/forte/blob/master/.github/workflows/main.yml) for detailed steps to carry out the tests. Basically what you need to do is to install the requirements (check out the `Install dependencies` sections) and run the commands (refer to the steps in `Format check with Black`, `Lint with flake8`, `Lint with pylint`, `Lint main code with mypy when torch version is not 1.5.0`, `Build Docs`, etc.).

We also recommend using tools `pre-commit` that automates the checking process before each commit since checking format is a repetitive process. We have the configuration file `.pre-commit-config.yaml` that lists several plugins including `black` to check format in the project root folder. Developers only need to install the package by `pip install pre-commit`.
We also recommend using tools `pre-commit` that automates the checking process before each commit since checking format is a repetitive process. We have the configuration file `.pre-commit-config.yaml` that lists several plugins including `black` to check format in the project root folder. Developers only need to install the package by `pip install pre-commit`. All the package versions in the `.pre-commit-config.yaml` must be consistent with package versions in [workflow configuration](https:/asyml/forte/blob/master/.github/workflows/main.yml). For example, `black` package version should be set to the same.
hunterhector marked this conversation as resolved.

### Docstring

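The CONTRIBUTING.md paragraph above lists the format and lint checks that CI runs and suggests running them locally first. A small convenience script is sketched here under the assumption that the tools are installed as in the workflow; only the `black` arguments are taken verbatim from `main.yml`, and the other invocations are illustrative.

```python
# run_checks.py -- illustrative local helper, not part of the repository.
import subprocess
import sys

CHECKS = [
    # Taken from the workflow's "Format check with Black" step.
    ["black", "--line-length", "80", "--check", "forte/"],
    # Illustrative invocations for the remaining linters.
    ["flake8", "forte/"],
    ["pylint", "forte/"],
    ["mypy", "forte/"],
]

failed = False
for cmd in CHECKS:
    print(">>", " ".join(cmd))
    if subprocess.run(cmd, check=False).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)
```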
31 changes: 17 additions & 14 deletions README.md
@@ -11,11 +11,11 @@
[![Chat](http://img.shields.io/badge/gitter.im-asyml/forte-blue.svg)](https://gitter.im/asyml/community)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https:/psf/black)

**Forte** is a toolkit for building Natural Language Processing pipelines, featuring
composable components, convenient data interfaces, and cross-task interaction. Forte designs
a universal data representation format for text, making it a
one-stop platform to assemble state-of-the-art NLP/ML technologies, ranging
from Information Retrieval, Natural Language Understanding to Natural Language Generation.
**Forte** is a toolkit for building Natural Language Processing pipelines, featuring
composable components, convenient data interfaces, and cross-task interaction. Forte designs
a universal data representation format for text, making it a
one-stop platform to assemble state-of-the-art NLP/ML technologies, ranging
from Information Retrieval, Natural Language Understanding to Natural Language Generation.

Forte was originally developed in CMU and is actively contributed
by [Petuum](https://petuum.com/)
@@ -50,13 +50,18 @@ pip install src/spacy

Some components or modules in forte may require some [extra requirements](https:/asyml/forte/blob/master/setup.py#L45):

* `pip install forte[ner]`: Install packages required for [ner_trainer](https:/asyml/forte/blob/master/forte/trainer/ner_trainer.py)
* `pip install forte[data_aug]`: Install packages required for [data augmentation modules](https:/asyml/forte/tree/master/forte/processors/data_augment).
* `pip install forte[ir]`: Install packages required for [Information Retrieval Supports](https:/asyml/forte/tree/master/forte/processors/ir/)
* `pip install forte[remote]`: Install packages required for pipeline serving functionalities, such as [Remote Processor](https:/asyml/forte/processors/misc/remote_processor.py).
* `pip install forte[audio_ext]`: Install packages required for Forte Audio support, such as [Audio Reader](https:/asyml/forte/blob/master/forte/data/readers/audio_reader.py).
* `pip install forte[stave]`: Install packages required for [Stave](https:/asyml/forte/blob/master/forte/processors/stave/stave_processor.py) integration.
* `pip install forte[models]`: Install packages required for [ner training](https:/asyml/forte/blob/master/forte/trainer/ner_trainer.py), [srl](https:/asyml/forte/tree/master/forte/models/srl), [srl with new training system](https:/asyml/forte/tree/master/forte/models/srl_new), and [srl_predictor](https:/asyml/forte/tree/master/forte/processors/nlp/srl_predictor.py)
* `pip install forte[test]`: Install packages required for running [unit tests](https:/asyml/forte/tree/master/tests).
* `pip install forte[example]`: Install packages required for running [forte examples](https:/asyml/forte/tree/master/examples).
* `pip install forte[wikipedia]`: Install packages required for reading [wikipedia datasets](https:/asyml/forte/tree/master/forte/datasets/wikipedia).
* `pip install forte[augment]`: Install packages required for [data augmentation module](https:/asyml/forte/tree/master/forte/processors/data_augment).
* `pip install forte[stave]`: Install packages required for [StaveProcessor](https:/asyml/forte/blob/master/forte/processors/stave/stave_processor.py).
* `pip install forte[audio_ext]`: Install packages required for [AudioReader](https:/asyml/forte/blob/master/forte/data/readers/audio_reader.py).
* `pip install forte[nlp]`: Install packages required for additional NLP supports, such as [subword_tokenizer](https:/asyml/forte/tree/master/forte/processors/nlp/subword_tokenizer.py) and [texar encoder](https:/asyml/forte/tree/master/forte/processors/third_party/pretrained_encoder_processors.py)
* `pip install forte[extractor]`: Install packages required for extrator-based training system, [extractor](https:/asyml/forte/blob/master/forte/data/extractors), [train_preprocessor](https:/asyml/forte/tree/master/forte/train_preprocessor.py) and [tagging trainer](https:/asyml/forte/tree/master/examples/tagging/tagging_trainer.py)

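Each bullet above corresponds to an `extras_require` group in `setup.py`. Below is a trimmed, hypothetical sketch of the shape of that mapping; the real group contents and version pins are in the linked setup.py, and the packages listed here are indicative only.

```python
# Hypothetical excerpt; see setup.py in the repository for the actual lists.
from setuptools import setup, find_packages

setup(
    name="forte",
    packages=find_packages(),
    install_requires=[
        # Backbone dependencies only; texar-pytorch is intentionally not here.
        "sortedcontainers>=2.1.0",
        "asyml-utilities",
    ],
    extras_require={
        # Optional, heavier dependencies are grouped per feature.
        "extractor": ["texar-pytorch>=0.1.2", "torch"],
        "nlp": ["texar-pytorch>=0.1.2"],
        "audio_ext": ["soundfile"],
    },
)
```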


## Getting Started

@@ -135,11 +140,11 @@ principle, we make Forte:

-----------------
| ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) |
|:--:|
|:--:|
| *A high level Architecture of Forte showing how ontology and entries work with the pipeline.* |
-----------------
| ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) |
|:--:|
|:--:|
|*Forte stores results in data packs and use the ontology to represent task logic.* |
-----------------

@@ -162,5 +167,3 @@ and [Contribution Guideline](https:/asyml/forte/blob/master/CONTRIBU
<img src="https://asyml.io/assets/institutions/cmu.png", width="200" align="top">
<img src="https://www.ucsd.edu/_resources/img/logo_UCSD.png" width="200" align="top">
</p>


6 changes: 3 additions & 3 deletions docs/code/data_aug.rst
@@ -164,11 +164,11 @@ Data Augmentation Models

:hidden:`Reinforcement Learning`
----------------------------------
.. autoclass:: forte.models.da_rl.MetaAugmentationWrapper
.. autoclass:: forte.models.da_rl.aug_wrapper.MetaAugmentationWrapper
:members:

.. autoclass:: forte.models.da_rl.MetaModule
.. autoclass:: forte.models.da_rl.magic_model.MetaModule
:members:

.. autoclass:: forte.models.da_rl.TexarBertMetaModule
.. autoclass:: forte.models.da_rl.magic_model.TexarBertMetaModule
:members:
2 changes: 1 addition & 1 deletion docs/notebook_tutorial/pipeline.ipynb
@@ -118,7 +118,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
"version": "3.10.0"
},
"orig_nbformat": 4
},
5 changes: 3 additions & 2 deletions docs/requirements.txt
@@ -18,7 +18,7 @@ testbook
pyyaml>=5.4
jsonpickle>=1.4
sortedcontainers>=2.1.0
texar-pytorch>=0.1.1
texar-pytorch>=0.1.2
typing>=3.7.4; python_version < '3.5'
typing-inspect>=0.6.0

@@ -44,5 +44,6 @@ nltk==3.6.6

nbsphinx==0.8.8
jinja2<=3.0.3

asyml_utilities
sphinx_autodoc_typehints

10 changes: 5 additions & 5 deletions examples/content_rewriter/README.md
@@ -8,11 +8,11 @@ will rewrite the sentence based on the table.
The code has been tested on:
- Python 3.6.0 and Python 3.7.6
- tensorflow-gpu==1.14.0
- texar-pytorch==0.1.1
- texar-pytorch==0.1.2
- texar==0.2.1
- cuda 10.0

** NOTE **:
** NOTE **:
Due to some historical texar compatibility issue, the model is only compatible
by installing texar 0.2.1 from source, which can be installed via the following
command.
@@ -27,12 +27,12 @@ Run the following commands:
```bash
cd model
pip install -r requirements.txt
```
```

### Downloading the models and data

Before we run the rewriting demo, we need to download models and data from the
[link](https://drive.google.com/drive/folders/1jNaJ_R_f89G8xbAC8iwe49Yx_Z-LXr0i?usp=sharing)
Before we run the rewriting demo, we need to download models and data from the
[link](https://drive.google.com/drive/folders/1jNaJ_R_f89G8xbAC8iwe49Yx_Z-LXr0i?usp=sharing)
and put the two directories(e2e_data, e2e_model) under the same directory [model_dir]

### Running the example
3 changes: 2 additions & 1 deletion examples/data_augmentation/reinforcement/main.py
@@ -28,7 +28,8 @@

from config import config_data, config_classifier
from utils import model_utils
from forte.models.da_rl import MetaAugmentationWrapper, TexarBertMetaModule
from forte.models.da_rl.aug_wrapper import MetaAugmentationWrapper
from forte.models.da_rl import TexarBertMetaModule

parser = argparse.ArgumentParser()
parser.add_argument(
2 changes: 1 addition & 1 deletion examples/ner/main_predict.ipynb
@@ -168,4 +168,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
}
}
2 changes: 1 addition & 1 deletion examples/passage_ranker/indexer_reranker_eval_pipeline.py
@@ -23,7 +23,7 @@
from forte.common.configuration import Config
from forte.data.multi_pack import MultiPack
from forte.pipeline import Pipeline
from forte.processors.ir import BertRerankingProcessor
from forte.processors.ir.bert import BertRerankingProcessor


if __name__ == "__main__":
@@ -23,7 +23,7 @@
from forte.data.multi_pack import MultiPack
from forte.data.readers import MultiPackTerminalReader
from forte.pipeline import Pipeline
from forte.processors.ir import BertRerankingProcessor
from forte.processors.ir.bert import BertRerankingProcessor
from ft.onto.base_ontology import Sentence


20 changes: 16 additions & 4 deletions examples/tagging/tagging_trainer.py
@@ -15,7 +15,7 @@
from typing import Iterator, Dict

import torch
from texar.torch.data import Batch
hunterhector marked this conversation as resolved.

from torch.optim import SGD
from torch.optim.optimizer import Optimizer
from tqdm import tqdm
@@ -112,11 +112,12 @@ def train(self):
val_pl: Pipeline = Pipeline()
val_pl.set_reader(val_reader)
val_pl.add(
predictor, config={
predictor,
config={
"batcher": {
"batch_size": 10,
}
}
},
)
val_pl.add(evaluator, config=evaluator_config)
val_pl.initialize()
@@ -127,9 +128,20 @@
train_sentence_len_sum: float = 0.0

logger.info("Start training.")

try:
from texar.torch.data import (
Batch,
) # pylint: disable=import-outside-toplevel
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Consider install texar via `pip install texar-pytorch`."
" Or refer to [extra requirement for extractor system](pip install forte[extractor])"
" for more information. "
) from e
while epoch < self.config_data.num_epochs:
epoch += 1

# Get iterator of preprocessed batch of train data
batch_iter: Iterator[Batch] = tp.get_train_batch_iterator()

1 change: 0 additions & 1 deletion forte/__init__.py
@@ -17,4 +17,3 @@

from forte.version import VERSION as __version__
from forte.pipeline import *
from forte.train_preprocessor import *
3 changes: 1 addition & 2 deletions forte/common/configuration.py
@@ -17,8 +17,7 @@
Config here.
"""
from typing import Dict

from texar.torch import HParams
from asyml_utilities.hyperparams import HParams

__all__ = ["Config"]

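The change above only swaps the `HParams` base from `texar.torch` to `asyml_utilities.hyperparams`; `Config` keeps behaving like an HParams container. A minimal usage sketch follows, assuming the familiar `(hparams, default_hparams)` constructor and attribute-style access carry over.

```python
# Minimal sketch; assumes Config keeps the HParams-style interface.
from forte.common.configuration import Config

default_configs = {"batcher": {"batch_size": 10}, "learning_rate": 1e-3}
user_configs = {"batcher": {"batch_size": 32}}

# User values override the defaults; unspecified keys fall back to them.
config = Config(user_configs, default_configs)

print(config.batcher.batch_size)  # 32
print(config.learning_rate)       # 0.001
```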
3 changes: 3 additions & 0 deletions forte/data/__init__.py
@@ -17,3 +17,6 @@
from forte.data.multi_pack import *
from forte.data.span import *
from forte.data.base_extractor import *
from forte.data.data_store import *
from forte.data.selector import *
from forte.data.index import *
14 changes: 12 additions & 2 deletions forte/data/data_pack_dataset.py
@@ -21,8 +21,18 @@
from typing import Dict, Iterator, Type, Optional, List, Tuple, Union, Any

import torch
from texar.torch import HParams
from texar.torch.data import IterDataSource, DatasetBase, Batch

try:
from texar.torch.data import IterDataSource, DatasetBase, Batch
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Consider install texar via `pip install texar-pytorch`"
" Or refer to [extra requirement for extrator](pip install forte[extractor])"
" for more information. "
) from e
from asyml_utilities.hyperparams import HParams


from forte.data.converter import Converter
from forte.data.converter import Feature
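The guarded import above is the pattern this PR applies wherever texar is optional: import it where it is needed and re-raise the `ImportError` with a pointer to the matching extra. A generic, illustrative form of that pattern is sketched below; the helper name and wording are not part of the PR.

```python
# Illustrative helper capturing the deferred-import pattern used in this PR.
def import_texar():
    """Import texar.torch lazily, with an actionable error if it is missing."""
    try:
        import texar.torch as tx  # pylint: disable=import-outside-toplevel
    except ImportError as e:
        raise ImportError(
            "`texar-pytorch` is not installed. Install it directly with "
            "`pip install texar-pytorch`, or via the extra that needs it, "
            "e.g. `pip install forte[extractor]`."
        ) from e
    return tx


# At a call site that actually needs texar:
#     tx = import_texar()
#     batch_cls = tx.data.Batch
```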
1 change: 1 addition & 0 deletions forte/data/extractors/seqtagging_extractor.py
@@ -20,6 +20,7 @@

from torch import Tensor


from forte.common.configuration import Config
from forte.data.base_extractor import BaseExtractor
from forte.data.converter.feature import Feature
13 changes: 12 additions & 1 deletion forte/data/extractors/subword_extractor.py
@@ -18,7 +18,7 @@
import logging
from typing import Union, Dict, Optional

from texar.torch.data.tokenizers.bert_tokenizer import BERTTokenizer

from forte.common.configuration import Config
from forte.data.data_pack import DataPack
from forte.data.converter.feature import Feature
@@ -42,6 +42,17 @@ class SubwordExtractor(BaseExtractor):
def initialize(self, config: Union[Dict, Config]):
# pylint: disable=attribute-defined-outside-init
super().initialize(config=config)

try:
from texar.torch.data.tokenizers.bert_tokenizer import ( # pylint:disable=import-outside-toplevel
BERTTokenizer,
)
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Please refer to [extra requirement for aug wrapper](pip install forte[extractor])"
" for more information. "
) from e
self.tokenizer = BERTTokenizer(
pretrained_model_name=self.config.pretrained_model_name,
cache_dir=None,
5 changes: 3 additions & 2 deletions forte/data/readers/audio_reader.py
@@ -36,8 +36,9 @@ def __init__(self):
except ModuleNotFoundError as e:
raise ModuleNotFoundError(
"AudioReader requires 'soundfile' package to be installed."
" You can run 'pip install soundfile' or 'pip install forte"
"[audio_ext]'. Note that additional steps might apply to Linux"
" You can refer to [extra modules to install]('pip install"
" forte['audio_ext']) or 'pip install forte"
". Note that additional steps might apply to Linux"
" users (refer to "
"https://pysoundfile.readthedocs.io/en/latest/#installation)."
) from e
7 changes: 3 additions & 4 deletions forte/data/vocabulary.py
@@ -16,8 +16,7 @@
from collections import Counter
from typing import List, Tuple, Dict, Union, Hashable, Iterable, Optional
from typing import TypeVar, Generic, Any, Set

import texar.torch as tx
from asyml_utilities.special_tokens import SpecialTokens

from forte.common import InvalidOperationException

@@ -174,15 +173,15 @@ def __init__(
# a vector of zeros.
pad_id = -1 if method == "one-hot" else None
self.add_special_element(
tx.data.SpecialTokens.PAD,
SpecialTokens.PAD,
element_id=pad_id,
special_token_name="PAD",
representation=pad_value,
)

if use_unk:
self.add_special_element(
tx.data.SpecialTokens.UNK,
SpecialTokens.UNK,
special_token_name="UNK",
representation=unk_value,
)