Doc test readme #830

Merged: 9 commits, Jun 22, 2022
21 changes: 21 additions & 0 deletions .github/workflows/main.yml
@@ -239,3 +239,24 @@ jobs:
          token: ${{ secrets.REPO_DISPATCH_PAT_HECTOR }}
          repository: asyml/forte-wrappers
          event-type: trigger-forte-wrappers

  readme:
    needs: build
    runs-on: ubuntu-latest
    env:
      python-version: 3.9
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ env.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ env.python-version }}

      - name: Test README.md when python version is 3.9
        run: |
          pip install mkcodes
          pip install --progress-bar off .
          pip install --progress-bar off forte.spacy nltk
          mkcodes --github --output tests/temp_readme_test.py README.md
          python tests/temp_readme_test.py
          rm tests/temp_readme_test.py
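
The new job uses mkcodes to extract the README's fenced Python blocks into a throwaway script and then runs that script, so any snippet that no longer executes will fail CI. As a rough mental model of the extraction step (an illustrative sketch only, not how mkcodes is actually implemented), it amounts to something like:

```python
# Conceptual sketch of the mkcodes extraction step (illustration only; the real
# tool handles more cases). It gathers every ```python block in README.md and
# concatenates the bodies into one script, which CI then runs as a smoke test.
import re
from pathlib import Path

readme = Path("README.md").read_text(encoding="utf-8")
# Capture the body of each GitHub-style fenced ```python ... ``` block.
blocks = re.findall(r"```python\n(.*?)```", readme, flags=re.DOTALL)

out = Path("tests/temp_readme_test.py")
out.parent.mkdir(exist_ok=True)
out.write_text("\n".join(blocks), encoding="utf-8")
```

Because the extracted snippets run top to bottom as plain Python, every example in the README has to be executable non-interactively, which is what motivates the README edits below (for instance, replacing the interactive `TerminalReader` with a `StringReader`).
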
22 changes: 13 additions & 9 deletions README.md
@@ -89,7 +89,6 @@ pip install forte.spacy
Let's start by writing a simple processor that assigns POS tags to tokens using the good old NLTK library.
```python
import nltk

from forte.processors.base import PackProcessor
from forte.data.data_pack import DataPack
from ft.onto.base_ontology import Token
@@ -105,14 +104,13 @@ class NLTKPOSTagger(PackProcessor):
    def _process(self, input_pack: DataPack):
        # get a list of token data entries from `input_pack`
        # using the `DataPack.get()` method
-       token_entries = input_pack.get(Token)
-       token_texts = [token.text for token in token_entries]
+       token_texts = [token.text for token in input_pack.get(Token)]

        # use nltk pos tagging module to tag token texts
        taggings = nltk.pos_tag(token_texts)

        # assign nltk taggings to token attributes
-       for token, tag in zip(token_entries, taggings):
+       for token, tag in zip(input_pack.get(Token), taggings):
            token.pos = tag[1]
```
If we break it down, we will notice there are two main functions.
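
The hunk above only shows `_process`; the other main function is the processor's `initialize` hook, which is collapsed out of this diff. It is typically where the model data needed by `nltk.pos_tag` would be fetched. A minimal sketch of such a hook, written here as an assumption for illustration rather than a copy of the actual README code:

```python
class NLTKPOSTagger(PackProcessor):
    def initialize(self, resources, configs):
        # Let the base class set up its own state first.
        super().initialize(resources, configs)
        # nltk.pos_tag needs the averaged perceptron tagger model; download it
        # once at pipeline initialization instead of on every data pack.
        nltk.download("averaged_perceptron_tagger")
```

This sketch reuses the `nltk` and `PackProcessor` imports from the snippet above.
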
@@ -127,31 +125,37 @@ a full pipeline.
```python
from forte import Pipeline

-from forte.data.readers import TerminalReader
+from forte.data.readers import StringReader
from fortex.spacy import SpacyProcessor

pipeline: Pipeline = Pipeline[DataPack]()
-pipeline.set_reader(TerminalReader())
+pipeline.set_reader(StringReader())
pipeline.add(SpacyProcessor(), {"processors": ["sentence", "tokenize"]})
pipeline.add(NLTKPOSTagger())
```

Here we have successfully created a pipeline with a few components:
-* a `TerminalReader` that reads data from terminal
+* a `StringReader` that reads data from a string.
* a `SpacyProcessor` that calls SpaCy to split sentences and perform tokenization
* and finally the brand new `NLTKPOSTagger` we just implemented.

Let's see it run in action!

```python
-for pack in pipeline.initialize().process_dataset():
+input_string = "Forte is a data-centric ML framework"
+for pack in pipeline.initialize().process_dataset(input_string):
    for sentence in pack.get("ft.onto.base_ontology.Sentence"):
        print("The sentence is: ", sentence.text)
        print("The POS tags of the tokens are:")
        for token in pack.get(Token, sentence):
-           print(f" {token.text}({token.pos})", end = " ")
+           print(f" {token.text}[{token.pos}]", end = " ")
        print()
```
It gives us output as follows:

```
Forte[NNP] is[VBZ] a[DT] data[NN] -[:] centric[JJ] ML[NNP] framework[NN] .[.]
```

We have successfully created a simple pipeline. In a nutshell, the `DataPack`s are
the standard packages "flowing" on the pipeline. They are created by the reader, and