atap-corpus-slicer

atap_corpus_slicer Documentation



atap_corpus_slicer.CorpusSlicer

A GUI tool for applying filters to a corpus based on its data and metadata.

It can be imported using:

```python
from atap_corpus_slicer import CorpusSlicer
```
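The sketch below shows what filtering a corpus "based on its data and metadata" means, using plain Python. It is not the CorpusSlicer API, which applies filters interactively through its GUI. The document structure, the metadata fields `year` and `source`, and the `slice_corpus` helper are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical document shape: {"text": str, "meta": dict}. Not the library's Corpus type.
Document = Dict[str, Any]

corpus: List[Document] = [
    {"text": "The council met on Tuesday.", "meta": {"year": 2019, "source": "news"}},
    {"text": "Rain is expected tomorrow.", "meta": {"year": 2021, "source": "news"}},
    {"text": "Minutes of the annual meeting.", "meta": {"year": 2021, "source": "report"}},
]


def slice_corpus(docs: List[Document], keep: Callable[[Document], bool]) -> List[Document]:
    """Hypothetical helper: return the documents for which keep(doc) is True."""
    return [doc for doc in docs if keep(doc)]


# Filter on metadata: keep documents from 2021 onwards.
recent = slice_corpus(corpus, lambda d: d["meta"]["year"] >= 2021)

# Filter on the data itself: keep documents whose text mentions "meeting".
meetings = slice_corpus(corpus, lambda d: "meeting" in d["text"].lower())

# Filters can be chained, since each slice is itself a list of documents.
recent_news = slice_corpus(recent, lambda d: d["meta"]["source"] == "news")
```

Each filter returns a new sub-corpus and leaves the original list unchanged, so several slices can be taken from the same corpus.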

CorpusSlicer.__init__

CorpusSlicer constructor

Params

Example

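```python
import spacy
from atap_corpus_slicer import CorpusSlicer

# Corpora built through the slicer will be piped through this spaCy model
nlp = spacy.load('en_core_web_sm')
slicer = CorpusSlicer(run_logger=True, model=nlp)
```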

CorpusSlicer.servable

Inherited from panel.viewable.Viewer. Call servable() on a CorpusSlicer instance in a Jupyter notebook to display the CorpusLoader widget with the CorpusSlicer embedded as a tab.

Example

```python
from atap_corpus_slicer import CorpusSlicer

slicer = CorpusSlicer()
slicer.servable()
```

CorpusSlicer.get_corpus_loader

Returns the CorpusLoader object used by the CorpusSlicer to build and load the corpus. The CorpusLoader panel is displayed with the CorpusSlicer embedded as a tab.

Returns: CorpusLoader - the CorpusLoader object in which the CorpusSlicer is embedded.

Example

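```python
from atap_corpus_slicer import CorpusSlicer

slicer = CorpusSlicer()
# The CorpusLoader in which the CorpusSlicer is embedded as a tab
loader = slicer.get_corpus_loader()
```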

CorpusSlicer.get_mutable_corpora

Returns the Corpora object that holds the loaded corpus objects. The returned object is mutable rather than a copy, so corpora can be added to it from outside the CorpusSlicer. The Corpora object enforces unique names: a corpus cannot be added if another corpus with the same name is already present. The same constraint applies to the rename method of any corpus object that has been added to the corpora.

Returns: TCorpora - the mutable corpora object that contains the loaded corpus objects

Example

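```python
from atap_corpus_slicer import CorpusSlicer

slicer = CorpusSlicer()
# Changes made to this object are visible to the CorpusSlicer (it is not a copy)
corpora_object = slicer.get_mutable_corpora()
# Look up a loaded corpus by its unique name
corpus = corpora_object.get("example")
```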
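The unique-name rule described above can be illustrated with a small toy container in plain Python. This is a hypothetical sketch, not the library's TCorpora implementation. The classes `ToyCorpus` and `ToyCorpora` and their `add`, `get` and `rename` methods are assumptions made for illustration. In the library, rename is a method of the corpus object rather than of the container.

```python
class ToyCorpus:
    """Hypothetical stand-in for a corpus: only its name matters here."""

    def __init__(self, name: str):
        self.name = name


class ToyCorpora:
    """Hypothetical container that enforces unique corpus names (not TCorpora)."""

    def __init__(self):
        self._by_name = {}  # corpus name -> ToyCorpus

    def add(self, corpus: ToyCorpus) -> None:
        # Reject a corpus whose name is already taken.
        if corpus.name in self._by_name:
            raise ValueError(f"a corpus named {corpus.name!r} already exists")
        self._by_name[corpus.name] = corpus

    def get(self, name: str):
        # Return the corpus with this name, or None if there is none.
        return self._by_name.get(name)

    def rename(self, old_name: str, new_name: str) -> None:
        # In this toy, renaming goes through the container so the same
        # uniqueness check runs before the name changes.
        if new_name in self._by_name:
            raise ValueError(f"a corpus named {new_name!r} already exists")
        corpus = self._by_name.pop(old_name)
        corpus.name = new_name
        self._by_name[new_name] = corpus


corpora = ToyCorpora()
corpora.add(ToyCorpus("example"))
corpora.add(ToyCorpus("draft"))

try:
    corpora.add(ToyCorpus("example"))  # duplicate name
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

try:
    corpora.rename("draft", "example")  # would collide with an existing name
    rename_rejected = False
except ValueError:
    rename_rejected = True
```

Keying the toy container by name makes the uniqueness check a single dictionary lookup, and a rejected rename leaves the corpus under its old name.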

Example usage

The following snippet could be used as a cell in a Jupyter notebook. Each time the user builds a corpus, the corpus will be piped through the provided spaCy Language.

```python
import spacy
from atap_corpus_slicer import CorpusSlicer
from atap_corpus_loader import CorpusLoader

# spaCy model that every newly built corpus will be piped through
nlp = spacy.load('en_core_web_sm')
# Loader that lists files under ./corpus_data
loader = CorpusLoader(root_directory='./corpus_data')
corpus_slicer: CorpusSlicer = CorpusSlicer(corpus_loader=loader, run_logger=True, model=nlp)
corpus_slicer.servable()
```