
Move texar to extra require #720

Merged: 102 commits merged into master from move_texar_to_extra_require on Apr 12, 2022

Commits
8a1bdfe
raise import error for texar-pytorch
hepengfe Mar 31, 2022
5319775
raise import error for texar-pytorch
hepengfe Mar 31, 2022
8d06d73
remove texar-pytorch requirement
hepengfe Mar 31, 2022
0c3edda
Merge branch 'master' into move_texar_to_extra_require
hepengfe Mar 31, 2022
6db52e1
Reverse test file changes
hepengfe Mar 31, 2022
5de9914
fix HParams importing
hepengfe Mar 31, 2022
f976dcb
Merge branch 'move_texar_to_extra_require' of https:/feip…
hepengfe Mar 31, 2022
757ba7d
change extra_require as suggested
hepengfe Mar 31, 2022
fb6dca1
rebase from master branch
hepengfe Mar 31, 2022
19e8ed2
update pre-commit black version
hepengfe Mar 31, 2022
fea18a0
add description about updating pre-commit configuration
hepengfe Mar 31, 2022
7e582d9
black
hepengfe Mar 31, 2022
aecc633
rm unrelevant files
hepengfe Apr 1, 2022
63ac0d1
rm unrelevant files
hepengfe Apr 1, 2022
aa3a83a
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 1, 2022
374183b
edit based on code review
hepengfe Apr 1, 2022
a2282f9
add requirement of transformers for audio test
hepengfe Apr 2, 2022
865fc6d
add pseudo hyperlink to point to the installation command
hepengfe Apr 2, 2022
82d6722
pylint
hepengfe Apr 4, 2022
24003f5
add extra requisite to main.yml
hepengfe Apr 4, 2022
8c3daa6
spacing
hepengfe Apr 4, 2022
46430dd
spacing
hepengfe Apr 4, 2022
22880eb
try transformers 4.15.0
hepengfe Apr 4, 2022
adf3ff5
loosen requirement
hepengfe Apr 4, 2022
66c0504
remove past requirement
hepengfe Apr 4, 2022
6f1228f
Merge branch 'master' into move_texar_to_extra_require
hunterhector Apr 5, 2022
893e84e
merge extra requirements
hepengfe Apr 5, 2022
147dbf4
fixed hyperlink in README.md
hepengfe Apr 5, 2022
0c3f509
fixed hyperlink title
hepengfe Apr 5, 2022
b748ed7
imports in the local scope and re-raise import error with error messages
hepengfe Apr 5, 2022
9c24693
remove modules from __init__
hepengfe Apr 5, 2022
b5d44a0
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 5, 2022
e863d53
pylint
hepengfe Apr 5, 2022
9cc6d18
pylint
hepengfe Apr 6, 2022
839ea76
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 6, 2022
58a4f4c
adjust imports in test files
hepengfe Apr 6, 2022
84aac86
HParams from asyml_utilities
hepengfe Apr 6, 2022
db0b32d
workflow edit
hepengfe Apr 6, 2022
e2bafe0
readthedocs asyml_utilities
hepengfe Apr 6, 2022
414be53
import special tokens from asyml_utilities
hepengfe Apr 6, 2022
9f4b265
move ImportError out of the loop
hepengfe Apr 7, 2022
24df884
fix import orders
hepengfe Apr 7, 2022
925b585
move out many more texar dependency
hepengfe Apr 7, 2022
5cb4eb1
add train preprocessor
hepengfe Apr 7, 2022
612b647
add train preprocessor
hepengfe Apr 7, 2022
ffec8dc
update main.yml
hepengfe Apr 7, 2022
4a46987
fix rst files
hepengfe Apr 7, 2022
0db66b3
pylint
hepengfe Apr 7, 2022
7392216
git add tests
hepengfe Apr 7, 2022
1498d64
special tokens
hepengfe Apr 7, 2022
5c98ff0
Fix test importing
hepengfe Apr 7, 2022
1cc9c3b
special tokens
hepengfe Apr 8, 2022
fa8b468
pylint
hepengfe Apr 8, 2022
203825b
Update data_pack_dataset.py
hunterhector Apr 8, 2022
1821670
Update README.md
hunterhector Apr 8, 2022
33da833
Update setup.py
hunterhector Apr 8, 2022
befffb3
Update main.yml
hunterhector Apr 8, 2022
8ae4cea
Update tagging_trainer.py
hunterhector Apr 8, 2022
7f464a2
Update __init__.py
hunterhector Apr 8, 2022
5aa2ce1
Update bert_based_query_creator.py
hunterhector Apr 8, 2022
dc30347
Update bert_ranker.py
hunterhector Apr 8, 2022
73ae361
Update bert_reranking_processor.py
hunterhector Apr 8, 2022
baed126
Update srl_predictor.py
hunterhector Apr 8, 2022
66598b1
Update data.py
hunterhector Apr 8, 2022
2d8bdf3
Update model.py
hunterhector Apr 8, 2022
e296cea
Update model_utils.py
hunterhector Apr 8, 2022
a9a3422
Update data.py
hunterhector Apr 8, 2022
166f473
Update model.py
hunterhector Apr 8, 2022
85ec72e
Update model_utils.py
hunterhector Apr 8, 2022
e2ba106
Update model_factory.py
hunterhector Apr 8, 2022
c7d2b4d
Update __init__.py
hunterhector Apr 8, 2022
eeb96cf
Update train_preprocessor.py
hunterhector Apr 8, 2022
1cefd81
Update main.yml
hunterhector Apr 8, 2022
ac074de
Update setup.py
hunterhector Apr 8, 2022
06df5e2
Update subword_tokenizer.py
hunterhector Apr 8, 2022
df6e745
Update README.md
hunterhector Apr 8, 2022
c7c5473
Update texar_nondependency_test.py
hunterhector Apr 8, 2022
b974eeb
Update main.yml
hunterhector Apr 8, 2022
7f7b37f
Update bert_ranker.py
hunterhector Apr 8, 2022
621046f
Update bert_ranker.py
hunterhector Apr 8, 2022
9b52a08
move texar dependent SRLSpanData out
hepengfe Apr 8, 2022
eee4c9c
merge requirement of texar encoder into nlp
hepengfe Apr 8, 2022
d2312c4
srl span data
hepengfe Apr 8, 2022
f3ddb81
udpate readme
hepengfe Apr 8, 2022
9d96d69
add temporary torch installation
hepengfe Apr 8, 2022
87619a6
uninstall texar pytorch after installing forte
hepengfe Apr 8, 2022
cfb5856
uninstall without prompt
hepengfe Apr 8, 2022
b8a0ae1
pylint
hepengfe Apr 8, 2022
10e3008
pylint
hepengfe Apr 8, 2022
08692c9
update texar pytorch version requirement
hepengfe Apr 8, 2022
99e7592
ignore texar nondependency test in coverage run
hepengfe Apr 8, 2022
01753e3
install the lastest texar pytorch
hepengfe Apr 9, 2022
9bc13d2
add texar related test cases
hepengfe Apr 9, 2022
fa03088
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 9, 2022
8ae2547
move bert classes depedent on texar inside the folder
hepengfe Apr 9, 2022
f3e5d15
move bert classes depedent on texar inside the folder
hepengfe Apr 9, 2022
7f15ae7
ir bert paths
hepengfe Apr 9, 2022
0e1359e
fixed hyperlinks
hepengfe Apr 11, 2022
0a24224
pylint
hepengfe Apr 11, 2022
baacbe4
remove unrelevant tests
hepengfe Apr 11, 2022
4e4155e
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 11, 2022
1c35c60
Merge branch 'master' into move_texar_to_extra_require
hepengfe Apr 12, 2022
20 changes: 16 additions & 4 deletions .github/workflows/main.yml
@@ -55,8 +55,7 @@ jobs:
pip install --progress-bar off coverage codecov
python -m pip install ipykernel
python -m ipykernel install --user
pip install testbook
pip install termcolor
pip install --progress-bar off asyml-utilities
- name: Format check with Black
run: |
black --line-length 80 --check forte/
@@ -84,7 +83,20 @@
rm -rf texar-pytorch
- name: Install Forte
run: |
pip install --use-feature=in-tree-build --progress-bar off .[ner,test,example,wikipedia,augment,stave,audio_ext,remote]
pip install --use-feature=in-tree-build --progress-bar off .[models,test,wikipedia,data_aug,nlp,ir,texar-encoder,stave,audio_ext,remote,extractor]
- name: Test backbone Forte import test
run: |
# Try to install Forte backbone only and test basic imports.
pip install --use-feature=in-tree-build --progress-bar off .
# needs to remove it in after torch dependency is removed
pip uninstall -y texar-pytorch
pytest tests/forte/texar_nondependency_test.py
# install lastest texar pytorch
git clone https:/asyml/texar-pytorch.git
cd texar-pytorch
pip install --progress-bar off .
cd ..
rm -rf texar-pytorch
@hunterhector (Member) commented on Apr 11, 2022:

Why don't we run these steps before the `Install Texar` step, or even the `Install deep learning frameworks` step?

@hepengfe (Collaborator, PR author) replied:

Because we still have dependencies on other libraries in the dependency matrix, and this PR only needs to remove texar. In a later PR that sets up a dependency environment for different modules, these lines will be removed.

- name: Build ontology
run: |
./scripts/build_ontology_specs.sh
@@ -102,7 +114,7 @@ jobs:
if [[ ${{ matrix.torch-version }} != "1.5.0" && ${{ matrix.python-version }} == "3.9" ]]; then mypy forte; fi
- name: Test with pytest and run coverage
run: |
coverage run -m pytest tests --ignore=tests/forte/notebooks
coverage run -m pytest tests --ignore=tests/forte/notebooks --ignore=tests/forte/texar_nondependency_test.py
coverage run --append -m pytest --doctest-modules forte
- name: Upload coverage
run: |
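For context, here is a minimal sketch of the kind of check the `Test backbone Forte import test` step above performs: with only the Forte backbone installed (and texar-pytorch removed), core modules should import cleanly, while texar-backed modules should fail with an actionable `ImportError`. This is an illustration of the idea, not the contents of `tests/forte/texar_nondependency_test.py`; the module chosen for the failure case is one this PR guards with a deferred import.

```python
# Illustrative only; the real assertions live in tests/forte/texar_nondependency_test.py.
import importlib
import unittest


class BackboneImportTest(unittest.TestCase):
    """The Forte backbone should import without texar-pytorch installed."""

    def test_backbone_imports_without_texar(self):
        # Core modules that must not pull in texar after this PR.
        for module in ("forte", "forte.pipeline", "forte.common.configuration"):
            importlib.import_module(module)

    def test_texar_dependent_module_raises(self):
        # A module kept behind the texar-based extras should fail loudly
        # (this assumes texar-pytorch is absent in the test environment).
        with self.assertRaises(ImportError):
            importlib.import_module("forte.data.data_pack_dataset")


if __name__ == "__main__":
    unittest.main()
```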
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -135,7 +135,7 @@ the [Google Python Style guide](http://google.github.io/styleguide/pyguide.html)
project code is examined using `pylint`, `flake8`, `mypy`, `black` and `sphinx-build` which will be run
automatically in CI. It's recommended that you should run these tests locally before submitting your pull request to save time. Refer to the github workflow [here](https:/asyml/forte/blob/master/.github/workflows/main.yml) for detailed steps to carry out the tests. Basically what you need to do is to install the requirements (check out the `Install dependencies` sections) and run the commands (refer to the steps in `Format check with Black`, `Lint with flake8`, `Lint with pylint`, `Lint main code with mypy when torch version is not 1.5.0`, `Build Docs`, etc.).

We also recommend using tools `pre-commit` that automates the checking process before each commit since checking format is a repetitive process. We have the configuration file `.pre-commit-config.yaml` that lists several plugins including `black` to check format in the project root folder. Developers only need to install the package by `pip install pre-commit`.
We also recommend using tools `pre-commit` that automates the checking process before each commit since checking format is a repetitive process. We have the configuration file `.pre-commit-config.yaml` that lists several plugins including `black` to check format in the project root folder. Developers only need to install the package by `pip install pre-commit`. All the package versions in the `.pre-commit-config.yaml` must be consistent with package versions in [workflow configuration](https:/asyml/forte/blob/master/.github/workflows/main.yml). For example, `black` package version should be set to the same.
hunterhector marked this conversation as resolved.

### Docstring

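The CONTRIBUTING.md paragraph above lists the format and lint checks that CI runs and suggests running them locally first. A small convenience script is sketched here under the assumption that the tools are installed as in the workflow; only the `black` arguments are taken verbatim from `main.yml`, and the other invocations are illustrative.

```python
# run_checks.py -- illustrative local helper, not part of the repository.
import subprocess
import sys

CHECKS = [
    # Taken from the workflow's "Format check with Black" step.
    ["black", "--line-length", "80", "--check", "forte/"],
    # Illustrative invocations for the remaining linters.
    ["flake8", "forte/"],
    ["pylint", "forte/"],
    ["mypy", "forte/"],
]

failed = False
for cmd in CHECKS:
    print(">>", " ".join(cmd))
    if subprocess.run(cmd, check=False).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)
```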
31 changes: 17 additions & 14 deletions README.md
@@ -11,11 +11,11 @@
[![Chat](http://img.shields.io/badge/gitter.im-asyml/forte-blue.svg)](https://gitter.im/asyml/community)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https:/psf/black)

**Forte** is a toolkit for building Natural Language Processing pipelines, featuring
composable components, convenient data interfaces, and cross-task interaction. Forte designs
a universal data representation format for text, making it a
one-stop platform to assemble state-of-the-art NLP/ML technologies, ranging
from Information Retrieval, Natural Language Understanding to Natural Language Generation.
**Forte** is a toolkit for building Natural Language Processing pipelines, featuring
composable components, convenient data interfaces, and cross-task interaction. Forte designs
a universal data representation format for text, making it a
one-stop platform to assemble state-of-the-art NLP/ML technologies, ranging
from Information Retrieval, Natural Language Understanding to Natural Language Generation.

Forte was originally developed in CMU and is actively contributed
by [Petuum](https://petuum.com/)
@@ -50,13 +50,18 @@ pip install src/spacy

Some components or modules in forte may require some [extra requirements](https:/asyml/forte/blob/master/setup.py#L45):

* `pip install forte[ner]`: Install packages required for [ner_trainer](https:/asyml/forte/blob/master/forte/trainer/ner_trainer.py)
* `pip install forte[data_aug]`: Install packages required for [data augmentation modules](https:/asyml/forte/tree/master/forte/processors/data_augment).
* `pip install forte[ir]`: Install packages required for [Information Retrieval Supports](https:/asyml/forte/tree/master/forte/processors/ir/)
* `pip install forte[remote]`: Install packages required for pipeline serving functionalities, such as [Remote Processor](https:/asyml/forte/processors/misc/remote_processor.py).
* `pip install forte[audio_ext]`: Install packages required for Forte Audio support, such as [Audio Reader](https:/asyml/forte/blob/master/forte/data/readers/audio_reader.py).
* `pip install forte[stave]`: Install packages required for [Stave](https:/asyml/forte/blob/master/forte/processors/stave/stave_processor.py) integration.
* `pip install forte[models]`: Install packages required for [ner training](https:/asyml/forte/blob/master/forte/trainer/ner_trainer.py), [srl](https:/asyml/forte/tree/master/forte/models/srl), [srl with new training system](https:/asyml/forte/tree/master/forte/models/srl_new), and [srl_predictor](https:/asyml/forte/tree/master/forte/processors/nlp/srl_predictor.py)
* `pip install forte[test]`: Install packages required for running [unit tests](https:/asyml/forte/tree/master/tests).
* `pip install forte[example]`: Install packages required for running [forte examples](https:/asyml/forte/tree/master/examples).
* `pip install forte[wikipedia]`: Install packages required for reading [wikipedia datasets](https:/asyml/forte/tree/master/forte/datasets/wikipedia).
* `pip install forte[augment]`: Install packages required for [data augmentation module](https:/asyml/forte/tree/master/forte/processors/data_augment).
* `pip install forte[stave]`: Install packages required for [StaveProcessor](https:/asyml/forte/blob/master/forte/processors/stave/stave_processor.py).
* `pip install forte[audio_ext]`: Install packages required for [AudioReader](https:/asyml/forte/blob/master/forte/data/readers/audio_reader.py).
* `pip install forte[nlp]`: Install packages required for additional NLP supports, such as [subword_tokenizer](https:/asyml/forte/tree/master/forte/processors/nlp/subword_tokenizer.py) and [texar encoder](https:/asyml/forte/tree/master/forte/processors/third_party/pretrained_encoder_processors.py)
* `pip install forte[extractor]`: Install packages required for extrator-based training system, [extractor](https:/asyml/forte/blob/master/forte/data/extractors), [train_preprocessor](https:/asyml/forte/tree/master/forte/train_preprocessor.py) and [tagging trainer](https:/asyml/forte/tree/master/examples/tagging/tagging_trainer.py)

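Each bullet above corresponds to an `extras_require` group in `setup.py`. Below is a trimmed, hypothetical sketch of the shape of that mapping; the real group contents and version pins are in the linked setup.py, and the packages listed here are indicative only.

```python
# Hypothetical excerpt; see setup.py in the repository for the actual lists.
from setuptools import setup, find_packages

setup(
    name="forte",
    packages=find_packages(),
    install_requires=[
        # Backbone dependencies only; texar-pytorch is intentionally not here.
        "sortedcontainers>=2.1.0",
        "asyml-utilities",
    ],
    extras_require={
        # Optional, heavier dependencies are grouped per feature.
        "extractor": ["texar-pytorch>=0.1.2", "torch"],
        "nlp": ["texar-pytorch>=0.1.2"],
        "audio_ext": ["soundfile"],
    },
)
```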


## Getting Started

@@ -135,11 +140,11 @@ principle, we make Forte:

-----------------
| ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) |
|:--:|
|:--:|
| *A high level Architecture of Forte showing how ontology and entries work with the pipeline.* |
-----------------
| ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) |
|:--:|
|:--:|
|*Forte stores results in data packs and use the ontology to represent task logic.* |
-----------------

@@ -162,5 +167,3 @@ and [Contribution Guideline](https:/asyml/forte/blob/master/CONTRIBU
<img src="https://asyml.io/assets/institutions/cmu.png", width="200" align="top">
<img src="https://www.ucsd.edu/_resources/img/logo_UCSD.png" width="200" align="top">
</p>


6 changes: 3 additions & 3 deletions docs/code/data_aug.rst
@@ -164,11 +164,11 @@ Data Augmentation Models

:hidden:`Reinforcement Learning`
----------------------------------
.. autoclass:: forte.models.da_rl.MetaAugmentationWrapper
.. autoclass:: forte.models.da_rl.aug_wrapper.MetaAugmentationWrapper
:members:

.. autoclass:: forte.models.da_rl.MetaModule
.. autoclass:: forte.models.da_rl.magic_model.MetaModule
:members:

.. autoclass:: forte.models.da_rl.TexarBertMetaModule
.. autoclass:: forte.models.da_rl.magic_model.TexarBertMetaModule
:members:
2 changes: 1 addition & 1 deletion docs/notebook_tutorial/pipeline.ipynb
@@ -118,7 +118,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
"version": "3.10.0"
},
"orig_nbformat": 4
},
5 changes: 3 additions & 2 deletions docs/requirements.txt
@@ -18,7 +18,7 @@ testbook
pyyaml>=5.4
jsonpickle>=1.4
sortedcontainers>=2.1.0
texar-pytorch>=0.1.1
texar-pytorch>=0.1.2
typing>=3.7.4; python_version < '3.5'
typing-inspect>=0.6.0

@@ -44,5 +44,6 @@ nltk==3.6.6

nbsphinx==0.8.8
jinja2<=3.0.3

asyml_utilities
sphinx_autodoc_typehints

10 changes: 5 additions & 5 deletions examples/content_rewriter/README.md
@@ -8,11 +8,11 @@ will rewrite the sentence based on the table.
The code has been tested on:
- Python 3.6.0 and Python 3.7.6
- tensorflow-gpu==1.14.0
- texar-pytorch==0.1.1
- texar-pytorch==0.1.2
- texar==0.2.1
- cuda 10.0

** NOTE **:
** NOTE **:
Due to some historical texar compatibility issue, the model is only compatible
by installing texar 0.2.1 from source, which can be installed via the following
command.
@@ -27,12 +27,12 @@ Run the following commands:
```bash
cd model
pip install -r requirements.txt
```
```

### Downloading the models and data

Before we run the rewriting demo, we need to download models and data from the
[link](https://drive.google.com/drive/folders/1jNaJ_R_f89G8xbAC8iwe49Yx_Z-LXr0i?usp=sharing)
Before we run the rewriting demo, we need to download models and data from the
[link](https://drive.google.com/drive/folders/1jNaJ_R_f89G8xbAC8iwe49Yx_Z-LXr0i?usp=sharing)
and put the two directories(e2e_data, e2e_model) under the same directory [model_dir]

### Running the example
3 changes: 2 additions & 1 deletion examples/data_augmentation/reinforcement/main.py
@@ -28,7 +28,8 @@

from config import config_data, config_classifier
from utils import model_utils
from forte.models.da_rl import MetaAugmentationWrapper, TexarBertMetaModule
from forte.models.da_rl.aug_wrapper import MetaAugmentationWrapper
from forte.models.da_rl import TexarBertMetaModule

parser = argparse.ArgumentParser()
parser.add_argument(
2 changes: 1 addition & 1 deletion examples/ner/main_predict.ipynb
@@ -168,4 +168,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
}
}
2 changes: 1 addition & 1 deletion examples/passage_ranker/indexer_reranker_eval_pipeline.py
@@ -23,7 +23,7 @@
from forte.common.configuration import Config
from forte.data.multi_pack import MultiPack
from forte.pipeline import Pipeline
from forte.processors.ir import BertRerankingProcessor
from forte.processors.ir.bert import BertRerankingProcessor


if __name__ == "__main__":
@@ -23,7 +23,7 @@
from forte.data.multi_pack import MultiPack
from forte.data.readers import MultiPackTerminalReader
from forte.pipeline import Pipeline
from forte.processors.ir import BertRerankingProcessor
from forte.processors.ir.bert import BertRerankingProcessor
from ft.onto.base_ontology import Sentence


20 changes: 16 additions & 4 deletions examples/tagging/tagging_trainer.py
@@ -15,7 +15,7 @@
from typing import Iterator, Dict

import torch
from texar.torch.data import Batch
hunterhector marked this conversation as resolved.

from torch.optim import SGD
from torch.optim.optimizer import Optimizer
from tqdm import tqdm
@@ -112,11 +112,12 @@ def train(self):
val_pl: Pipeline = Pipeline()
val_pl.set_reader(val_reader)
val_pl.add(
predictor, config={
predictor,
config={
"batcher": {
"batch_size": 10,
}
}
},
)
val_pl.add(evaluator, config=evaluator_config)
val_pl.initialize()
@@ -127,9 +128,20 @@
train_sentence_len_sum: float = 0.0

logger.info("Start training.")

try:
from texar.torch.data import (
Batch,
) # pylint: disable=import-outside-toplevel
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Consider install texar via `pip install texar-pytorch`."
" Or refer to [extra requirement for extractor system](pip install forte[extractor])"
" for more information. "
) from e
while epoch < self.config_data.num_epochs:
epoch += 1

# Get iterator of preprocessed batch of train data
batch_iter: Iterator[Batch] = tp.get_train_batch_iterator()

1 change: 0 additions & 1 deletion forte/__init__.py
@@ -17,4 +17,3 @@

from forte.version import VERSION as __version__
from forte.pipeline import *
from forte.train_preprocessor import *
3 changes: 1 addition & 2 deletions forte/common/configuration.py
@@ -17,8 +17,7 @@
Config here.
"""
from typing import Dict

from texar.torch import HParams
from asyml_utilities.hyperparams import HParams

__all__ = ["Config"]

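The change above only swaps the `HParams` base from `texar.torch` to `asyml_utilities.hyperparams`; `Config` keeps behaving like an HParams container. A minimal usage sketch follows, assuming the familiar `(hparams, default_hparams)` constructor and attribute-style access carry over.

```python
# Minimal sketch; assumes Config keeps the HParams-style interface.
from forte.common.configuration import Config

default_configs = {"batcher": {"batch_size": 10}, "learning_rate": 1e-3}
user_configs = {"batcher": {"batch_size": 32}}

# User values override the defaults; unspecified keys fall back to them.
config = Config(user_configs, default_configs)

print(config.batcher.batch_size)  # 32
print(config.learning_rate)       # 0.001
```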
3 changes: 3 additions & 0 deletions forte/data/__init__.py
@@ -17,3 +17,6 @@
from forte.data.multi_pack import *
from forte.data.span import *
from forte.data.base_extractor import *
from forte.data.data_store import *
from forte.data.selector import *
from forte.data.index import *
14 changes: 12 additions & 2 deletions forte/data/data_pack_dataset.py
@@ -21,8 +21,18 @@
from typing import Dict, Iterator, Type, Optional, List, Tuple, Union, Any

import torch
from texar.torch import HParams
from texar.torch.data import IterDataSource, DatasetBase, Batch

try:
from texar.torch.data import IterDataSource, DatasetBase, Batch
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Consider install texar via `pip install texar-pytorch`"
" Or refer to [extra requirement for extrator](pip install forte[extractor])"
" for more information. "
) from e
from asyml_utilities.hyperparams import HParams


from forte.data.converter import Converter
from forte.data.converter import Feature
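The guarded import above is the pattern this PR applies wherever texar is optional: import it where it is needed and re-raise the `ImportError` with a pointer to the matching extra. A generic, illustrative form of that pattern is sketched below; the helper name and wording are not part of the PR.

```python
# Illustrative helper capturing the deferred-import pattern used in this PR.
def import_texar():
    """Import texar.torch lazily, with an actionable error if it is missing."""
    try:
        import texar.torch as tx  # pylint: disable=import-outside-toplevel
    except ImportError as e:
        raise ImportError(
            "`texar-pytorch` is not installed. Install it directly with "
            "`pip install texar-pytorch`, or via the extra that needs it, "
            "e.g. `pip install forte[extractor]`."
        ) from e
    return tx


# At a call site that actually needs texar:
#     tx = import_texar()
#     batch_cls = tx.data.Batch
```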
1 change: 1 addition & 0 deletions forte/data/extractors/seqtagging_extractor.py
@@ -20,6 +20,7 @@

from torch import Tensor


from forte.common.configuration import Config
from forte.data.base_extractor import BaseExtractor
from forte.data.converter.feature import Feature
13 changes: 12 additions & 1 deletion forte/data/extractors/subword_extractor.py
@@ -18,7 +18,7 @@
import logging
from typing import Union, Dict, Optional

from texar.torch.data.tokenizers.bert_tokenizer import BERTTokenizer

from forte.common.configuration import Config
from forte.data.data_pack import DataPack
from forte.data.converter.feature import Feature
@@ -42,6 +42,17 @@ class SubwordExtractor(BaseExtractor):
def initialize(self, config: Union[Dict, Config]):
# pylint: disable=attribute-defined-outside-init
super().initialize(config=config)

try:
from texar.torch.data.tokenizers.bert_tokenizer import ( # pylint:disable=import-outside-toplevel
BERTTokenizer,
)
except ImportError as e:
raise ImportError(
" `texar-pytorch` is not installed correctly."
" Please refer to [extra requirement for aug wrapper](pip install forte[extractor])"
" for more information. "
) from e
self.tokenizer = BERTTokenizer(
pretrained_model_name=self.config.pretrained_model_name,
cache_dir=None,
5 changes: 3 additions & 2 deletions forte/data/readers/audio_reader.py
@@ -36,8 +36,9 @@ def __init__(self):
except ModuleNotFoundError as e:
raise ModuleNotFoundError(
"AudioReader requires 'soundfile' package to be installed."
" You can run 'pip install soundfile' or 'pip install forte"
"[audio_ext]'. Note that additional steps might apply to Linux"
" You can refer to [extra modules to install]('pip install"
" forte['audio_ext']) or 'pip install forte"
". Note that additional steps might apply to Linux"
" users (refer to "
"https://pysoundfile.readthedocs.io/en/latest/#installation)."
) from e
7 changes: 3 additions & 4 deletions forte/data/vocabulary.py
@@ -16,8 +16,7 @@
from collections import Counter
from typing import List, Tuple, Dict, Union, Hashable, Iterable, Optional
from typing import TypeVar, Generic, Any, Set

import texar.torch as tx
from asyml_utilities.special_tokens import SpecialTokens

from forte.common import InvalidOperationException

@@ -174,15 +173,15 @@ def __init__(
# a vector of zeros.
pad_id = -1 if method == "one-hot" else None
self.add_special_element(
tx.data.SpecialTokens.PAD,
SpecialTokens.PAD,
element_id=pad_id,
special_token_name="PAD",
representation=pad_value,
)

if use_unk:
self.add_special_element(
tx.data.SpecialTokens.UNK,
SpecialTokens.UNK,
special_token_name="UNK",
representation=unk_value,
)