forked from 2mh/PyBioC

Python library for working with BioC files


OntoGene/PyBioC

 
 


PyBioC

PyBioC is a native Python library for reading and writing BioC XML data.

More information about BioC is available on SourceForge.

Installation

Use pip:

pip install git+https://github.com/OntoGene/PyBioC.git

On systems where pip refers to Python 2, use pip3 to install for Python 3.

Usage

Two example programs, test_read+write.py and stemming.py, are shipped in the src/ folder.

  • test_read+write.py demonstrates the library's basic reading and writing capabilities.
  • stemming.py reads a BioC XML file, tokenizes the text with the Python Natural Language Toolkit (NLTK), stems the tokens, and writes the modified PyBioC objects back out as valid BioC XML.
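The tokenize-then-stem step that stemming.py performs can be illustrated without any dependencies. The following is only a rough sketch: it uses a regular-expression tokenizer and a toy suffix stripper as stand-ins for NLTK, and the helper names `tokenize_with_offsets` and `toy_stem` are hypothetical and not part of PyBioC or NLTK. It keeps the character offset of each token, since BioC addresses text by offset.

```python
import re


def tokenize_with_offsets(text, base_offset=0):
    """Return (token, offset) pairs; offsets count from base_offset."""
    # Words are runs of word characters; any other non-space character
    # (punctuation) becomes a token of its own.
    return [(match.group(), base_offset + match.start())
            for match in re.finditer(r"\w+|[^\w\s]", text)]


def toy_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer such as NLTK's."""
    lowered = token.lower()
    for suffix in ("ing", "es", "s"):
        # Only strip when at least four characters remain.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 4:
            return lowered[:-len(suffix)]
    return lowered


text = 'This is a biomedical sentence about various rare diseases.'
stems = [(toy_stem(token), offset)
         for token, offset in tokenize_with_offsets(text)]

for stem, offset in stems:
    print(offset, stem)
```

In a real pipeline the stems and offsets would be attached to the passage as annotations before writing; how stemming.py does this is best checked in the script itself.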

Example

Generate a BioC object for export

from bioc import BioCXMLWriter, BioCCollection, BioCDocument, BioCPassage

writer = BioCXMLWriter()
writer.collection = BioCCollection()
collection = writer.collection
collection.date = '20150301'
collection.source = 'ngy1 corpus'

document = BioCDocument()
document.id = '123456'  # PubMed ID

passage = BioCPassage()
passage.put_infon('type', 'paragraph')
passage.offset = '0'
passage.text = 'This is a biomedical sentence about various rare diseases.'
document.add_passage(passage)

collection.add_document(document)

print(writer)  # print the collection held by the writer
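For orientation, the element structure that the BioC format prescribes for such a collection can be reproduced with the standard library alone. This sketch does not use PyBioC and is not its implementation; it only assumes the element order of the BioC DTD (source, date, key, then documents), and the exact output of BioCXMLWriter (XML declaration, DOCTYPE, whitespace) may differ.

```python
import xml.etree.ElementTree as ET

# collection: source, date, key, then one or more documents
collection = ET.Element('collection')
ET.SubElement(collection, 'source').text = 'ngy1 corpus'
ET.SubElement(collection, 'date').text = '20150301'
ET.SubElement(collection, 'key')  # left empty in this sketch

# document: id, then one or more passages
document = ET.SubElement(collection, 'document')
ET.SubElement(document, 'id').text = '123456'

# passage: infons, offset, text
passage = ET.SubElement(document, 'passage')
infon = ET.SubElement(passage, 'infon', key='type')
infon.text = 'paragraph'
ET.SubElement(passage, 'offset').text = '0'
ET.SubElement(passage, 'text').text = (
    'This is a biomedical sentence about various rare diseases.')

xml_string = ET.tostring(collection, encoding='unicode')
print(xml_string)
```

Comparing this structure with what `print(writer)` emits is a quick way to check that a collection was populated as intended.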
