A GUI tool for applying filters to a corpus based on its data and metadata.
It can be imported using:
from atap_corpus_slicer import CorpusSlicer
CorpusSlicer constructor
Params
Example
import spacy
nlp = spacy.load('en_core_web_sm')
slicer = CorpusSlicer(run_logger=True, model=nlp)
The servable() method is inherited from panel.viewable.Viewer. Call servable() on a CorpusSlicer instance in a Jupyter notebook to display the CorpusLoader widget with the CorpusSlicer embedded as a tab.
Example
slicer = CorpusSlicer()
slicer.servable()
Returns the CorpusLoader object used by the CorpusSlicer to build and load the corpus. The CorpusLoader panel is displayed with the CorpusSlicer embedded as a tab.
Returns: CorpusLoader - the CorpusLoader object in which the CorpusSlicer is embedded.
Example
slicer = CorpusSlicer()
loader = slicer.get_corpus_loader()
Returns the Corpora object that contains the loaded corpus objects. The returned object is the live, mutable instance rather than a copy, so corpora can be added to it from outside the CorpusSlicer. The Corpora object enforces unique names: a corpus cannot be added if another corpus with the same name is already present. The same constraint applies when renaming a corpus that has already been added to the Corpora object.
Returns: TCorpora - the mutable corpora object that contains the loaded corpus objects
Example
slicer = CorpusSlicer()
corpora_object = slicer.get_mutable_corpora()
corpus = corpora_object.get("example")
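The two behaviours described above (the returned object is shared rather than copied, and names must be unique on both add and rename) can be illustrated with a small standalone sketch. Every class here (SketchCorpus, SketchCorpora, SketchSlicer, NameTakenError) is a hypothetical stand-in written for illustration only. None of them is part of atap_corpus_slicer or atap_corpus, and the real library may raise a different exception type or expose different method names.

```python
class NameTakenError(ValueError):
    """Hypothetical error raised when a corpus name is already in use."""


class SketchCorpus:
    """Hypothetical stand-in for a corpus object (not the real class)."""

    def __init__(self, name):
        self.name = name
        self._owner = None  # set once the corpus is added to a collection

    def rename(self, new_name):
        # A corpus that belongs to a collection must keep a unique name.
        if self._owner is not None:
            self._owner.check_name_free(new_name, ignore=self)
        self.name = new_name


class SketchCorpora:
    """Hypothetical collection that enforces unique corpus names."""

    def __init__(self):
        self._items = []

    def check_name_free(self, name, ignore=None):
        for corpus in self._items:
            if corpus is not ignore and corpus.name == name:
                raise NameTakenError(f"a corpus named {name!r} already exists")

    def add(self, corpus):
        self.check_name_free(corpus.name)
        corpus._owner = self
        self._items.append(corpus)

    def get(self, name):
        # Returns None when no corpus has the requested name.
        for corpus in self._items:
            if corpus.name == name:
                return corpus
        return None

    def names(self):
        return [corpus.name for corpus in self._items]


class SketchSlicer:
    """Hypothetical stand-in that hands out its own collection, not a copy."""

    def __init__(self):
        self._corpora = SketchCorpora()

    def get_mutable_corpora(self):
        return self._corpora


slicer = SketchSlicer()
corpora = slicer.get_mutable_corpora()
corpora.add(SketchCorpus("example"))

# The addition is visible through the slicer because the object is shared.
print(slicer.get_mutable_corpora().names())

try:
    corpora.add(SketchCorpus("example"))  # duplicate name is rejected
except NameTakenError as err:
    print("rejected:", err)
```

In this sketch a rejected rename leaves the original name in place, because the check runs before the name is assigned. Whether the real Corpora object behaves the same way on a failed rename is not stated in this document.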
The following snippet could be used as a cell in a Jupyter notebook. Each time the user builds a corpus, the corpus is passed through the provided spaCy Language object (nlp).
import spacy
from atap_corpus_slicer import CorpusSlicer
from atap_corpus_loader import CorpusLoader
nlp = spacy.load('en_core_web_sm')
loader = CorpusLoader(root_directory='./corpus_data')
corpus_slicer: CorpusSlicer = CorpusSlicer(corpus_loader=loader, run_logger=True, model=nlp)
corpus_slicer.servable()