agent.process.classification_and_content module

Extract plain text, and get classification suggestions, features, and flags from the classifier.

class agent.process.classification_and_content.CheckStopwordCount(submission_id, process_id=None)

Bases: agent.process.base.Process

Check the submission content for a stopword count that is too low.

check_stop_count(previous, trigger, emit)

Flag the submission if the number of stopwords is too low.

Return type

None
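
The check described above can be sketched roughly as follows. This is only an illustration: `STOPWORDS`, `MIN_STOPWORD_COUNT`, `count_stopwords`, and `has_low_stopword_count` are hypothetical names, and the stopword list and threshold are made-up values. The real step may well read a precomputed count (for example from the classifier's features) instead of tokenizing the text itself.

```python
import re

# Hypothetical values for illustration only; the module's real stopword
# list and threshold are not documented here.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})
MIN_STOPWORD_COUNT = 5


def count_stopwords(text: str) -> int:
    """Count the tokens in ``text`` that appear in ``STOPWORDS``."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in STOPWORDS)


def has_low_stopword_count(text: str, minimum: int = MIN_STOPWORD_COUNT) -> bool:
    """Return True if the text has fewer stopwords than ``minimum``."""
    return count_stopwords(text) < minimum
```

A very low count is usually a hint that text extraction failed or that the content is not ordinary prose, which is presumably why the submission is flagged rather than rejected.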

steps = [<function CheckStopwordCount.check_stop_count>]

class agent.process.classification_and_content.CheckStopwordPercent(submission_id, process_id=None)

Bases: agent.process.base.Process

Check the submission content for a stopword percentage that is too low.

check_stop_percent(previous, trigger, emit)

Flag the submission if the percentage of stopwords is too low.

Return type

None
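
A minimal sketch of a percentage check of this kind, assuming the stopword and word counts are already available. `MIN_STOPWORD_PERCENT`, `stopword_fraction`, and `has_low_stopword_percent` are hypothetical names, the threshold is a made-up value, and representing the percentage as a fraction between 0 and 1 is an assumption.

```python
# Hypothetical threshold for illustration only; the real value, and whether
# it is stored as a fraction or a percentage, is not documented here.
MIN_STOPWORD_PERCENT = 0.1


def stopword_fraction(stopword_count: int, word_count: int) -> float:
    """Return the share of words that are stopwords, or 0.0 for empty text."""
    if word_count <= 0:
        return 0.0
    return stopword_count / word_count


def has_low_stopword_percent(stopword_count: int, word_count: int,
                             minimum: float = MIN_STOPWORD_PERCENT) -> bool:
    """Return True if the stopword share falls below ``minimum``."""
    return stopword_fraction(stopword_count, word_count) < minimum
```

Guarding against a zero word count matters here, since a submission whose extraction produced no text would otherwise raise `ZeroDivisionError` instead of being flagged.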

steps = [<function CheckStopwordPercent.check_stop_percent>]

class agent.process.classification_and_content.PlainTextExtraction(submission_id, process_id=None)

Bases: agent.process.base.Process

Extract plain text from a compiled PDF.

handle_plaintext_exception(exc)

Handle exceptions raised when calling the plain text service.

Return type

None

poll_extraction(previous, trigger, emit)

Poll the plain text service until extraction is complete.

Return type

None
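
Polling of this kind can be sketched as a bounded loop. The names `poll_until_complete`, `is_complete`, and `ExtractionNotReady` are hypothetical, and the blocking loop is an assumption made for illustration: the actual process may instead rely on the framework's own retry mechanism between attempts.

```python
import time
from typing import Callable


class ExtractionNotReady(Exception):
    """Raised when extraction has not finished within the allowed tries."""


def poll_until_complete(is_complete: Callable[[], bool],
                        max_tries: int = 10,
                        delay: float = 1.0) -> int:
    """Call ``is_complete`` until it returns True; return the tries used."""
    for attempt in range(1, max_tries + 1):
        if is_complete():
            return attempt
        if attempt < max_tries:
            # Wait before asking the service again.
            time.sleep(delay)
    raise ExtractionNotReady(f"not complete after {max_tries} tries")
```

Bounding the number of tries keeps a stalled extraction from blocking the process indefinitely; the failure then surfaces as an exception that a handler such as `handle_plaintext_exception` could deal with.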

retrieve_content(previous, trigger, emit)

Retrieve the extracted plain text.

Return type

bytes

source_id(trigger)

Get the source ID for the submission content.

Return type

int

start_extraction(previous, trigger, emit)

Request extraction by the plain text service.

Return type

None

steps = [<function PlainTextExtraction.start_extraction>, <function PlainTextExtraction.poll_extraction>, <function PlainTextExtraction.retrieve_content>]

class agent.process.classification_and_content.RunAutoclassifier(submission_id, process_id=None)

Bases: agent.process.classification_and_content.PlainTextExtraction

Extract plain text and poll the autoclassifier.

In addition to generating classification suggestions, the current implementation of the autoclassifier also generates features (such as word counts) and content flags (for example, possible language issues or line numbers).

CLASSIFIER_FLAGS = {'%stop': None, 'charset': <Type.CHARACTER_SET: 'character set'>, 'language': <Type.LANGUAGE: 'language'>, 'linenos': <Type.LINE_NUMBERS: 'line numbers'>, 'stops': None}

call_classifier(content, trigger, emit)

Send plain text content to the autoclassifier.

Return type

None

handle_classifier_exception(exc)

Handle exceptions raised when calling the classifier service.

Return type

None

process_result(result, trigger, emit)

Process the results returned by the autoclassifier.

Return type

None
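
One way a mapping like `CLASSIFIER_FLAGS` might be used when processing results is sketched below. `FlagType` is a stand-in for the flag type enum shown in the mapping, and `flags_to_emit` is a hypothetical helper. Treating keys mapped to `None` as "no content flag here" is an assumption; those two keys (`'%stop'` and `'stops'`) plausibly correspond to the separate stopword checks in this module.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class FlagType(Enum):
    """Stand-in for the flag type enum shown in CLASSIFIER_FLAGS."""

    CHARACTER_SET = "character set"
    LANGUAGE = "language"
    LINE_NUMBERS = "line numbers"


CLASSIFIER_FLAGS: Dict[str, Optional[FlagType]] = {
    "%stop": None,
    "charset": FlagType.CHARACTER_SET,
    "language": FlagType.LANGUAGE,
    "linenos": FlagType.LINE_NUMBERS,
    "stops": None,
}


def flags_to_emit(raw_flags: Dict[str, str]) -> List[Tuple[FlagType, str]]:
    """Pair each recognised classifier flag with its flag type.

    Keys mapped to None, and keys missing from the mapping, are skipped
    on the assumption that they are handled elsewhere.
    """
    emitted = []
    for key, value in raw_flags.items():
        flag_type = CLASSIFIER_FLAGS.get(key)
        if flag_type is not None:
            emitted.append((flag_type, value))
    return emitted
```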

steps = [<function PlainTextExtraction.start_extraction>, <function PlainTextExtraction.poll_extraction>, <function PlainTextExtraction.retrieve_content>, <function RunAutoclassifier.call_classifier>, <function RunAutoclassifier.process_result>]
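
The `steps` list above starts with the three `PlainTextExtraction` steps and appends the two classifier steps. The toy sketch below shows how a subclass could extend its parent's steps in this way, and how each step's return value could be handed to the next as its first argument. `SketchProcess`, `Extract`, and `Classify` are hypothetical stand-ins, not the real `Process` base class, and the chaining behaviour is an assumption inferred from the `previous`, `content`, and `result` parameter names.

```python
from typing import Any, Callable, List, Optional


class SketchProcess:
    """Toy stand-in for a process base class; not the real Process."""

    steps: List[Callable[..., Any]] = []

    def run(self, trigger: Any, emit: Callable[[Any], None]) -> Any:
        """Run each step in order, passing along the previous result."""
        previous: Optional[Any] = None
        for step in self.steps:
            previous = step(self, previous, trigger, emit)
        return previous


class Extract(SketchProcess):
    def start(self, previous, trigger, emit):
        return None

    def retrieve(self, previous, trigger, emit):
        return b"plain text"

    steps = [start, retrieve]


class Classify(Extract):
    def call(self, content, trigger, emit):
        # ``content`` is whatever the last extraction step returned.
        return {"length": len(content)}

    def process(self, result, trigger, emit):
        emit(result)

    # Reuse the parent's steps and append the classifier-specific ones.
    steps = Extract.steps + [call, process]
```

For example, `Classify().run(trigger=None, emit=print)` would run all four steps in order and print the dictionary built by `call`.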