arxiv.submission.services.plaintext.plaintext module

Provides integration with the plaintext extraction service.

This integration is focused on usage patterns required by the submission system. Specifically:

  1. Must be able to request an extraction for a compiled submission.

  2. Must be able to poll whether the extraction has completed.

  3. Must be able to retrieve the raw binary content once the extraction has finished successfully.

  4. Must encounter an informative exception if something goes wrong.

This represents only a subset of the functionality provided by the plaintext service itself.
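The four requirements above can be combined into a request, poll, retrieve loop. The following is a minimal sketch under stated assumptions, not the submission system's actual code: `FakePlainTextService` is a hypothetical in-memory stand-in for `PlainTextService`, `extract_plaintext` is an illustrative helper, and passing `source_id` as the single argument to each method is an assumption based on the parameter lists documented below.

```python
import time


class ExtractionFailed(Exception):
    """Local stand-in for the ExtractionFailed exception documented below."""


class FakePlainTextService:
    """Hypothetical in-memory stand-in for PlainTextService (illustration only)."""

    def __init__(self, polls_until_done=2):
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def request_extraction(self, source_id):
        # Start a pretend extraction that finishes after a few polls.
        self._remaining[source_id] = self._polls_until_done

    def extraction_is_complete(self, source_id):
        if source_id not in self._remaining:
            raise ExtractionFailed(f"no extraction for {source_id}")
        if self._remaining[source_id] > 0:
            self._remaining[source_id] -= 1
            return False
        return True

    def retrieve_content(self, source_id):
        return b"extracted text for " + source_id.encode("utf-8")


def extract_plaintext(service, source_id, interval=0.0, max_polls=10):
    """Request an extraction, poll until it completes, and return the bytes."""
    service.request_extraction(source_id)
    for _ in range(max_polls):
        if service.extraction_is_complete(source_id):
            return service.retrieve_content(source_id)
        time.sleep(interval)
    raise TimeoutError(f"extraction for {source_id} did not complete")


content = extract_plaintext(FakePlainTextService(), "12345")
```

Bounding the loop with `max_polls` keeps a stalled extraction from blocking the caller indefinitely; a real caller would likely use a non-zero `interval`.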

exception arxiv.submission.services.plaintext.plaintext.ExtractionFailed(msg, response)[source]

Bases: arxiv.integration.api.exceptions.RequestFailed

The plain text extraction service failed to extract text.

exception arxiv.submission.services.plaintext.plaintext.ExtractionInProgress(msg, response)[source]

Bases: arxiv.integration.api.exceptions.RequestFailed

An extraction is already in progress.

class arxiv.submission.services.plaintext.plaintext.PlainTextService(verify=True, headers={})[source]

Bases: arxiv.integration.api.service.HTTPIntegration

Represents an interface to the plain text extraction service.

class Meta[source]

Bases: object

Configuration for PlainTextService.

service_name = 'plaintext'

class Status[source]

Bases: enum.Enum

Task statuses.

FAILED = 'failed'
IN_PROGRESS = 'in_progress'
SUCCEEDED = 'succeeded'

VERSION = 0.3

Version of the service for which this module is implemented.

endpoint()[source]

Get the URL of the extraction endpoint.

Return type

str

extraction_is_complete()[source]

Check the status of an extraction task by submission upload ID.

Parameters

source_id (str) – ID of the submission upload workspace.

Return type

bool

Returns

bool – True if the extraction has completed successfully; False if it is still in progress.

Raises

ExtractionFailed – Raised if the task is in a failed state, or an unexpected condition is encountered.
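One way to read the contract above is that the method reduces a task status to a boolean and turns the failed state, or anything unexpected, into an exception. The sketch below illustrates that reading only; it is not the module's implementation. `Status` mirrors the values listed for `PlainTextService.Status`, while `is_complete` and the local `ExtractionFailed` class are hypothetical stand-ins.

```python
from enum import Enum


class Status(Enum):
    """Mirrors the task statuses listed for PlainTextService.Status."""

    FAILED = 'failed'
    IN_PROGRESS = 'in_progress'
    SUCCEEDED = 'succeeded'


class ExtractionFailed(Exception):
    """Local stand-in for the documented ExtractionFailed exception."""


def is_complete(raw_status):
    """Hypothetical helper: map a raw status string to a completion flag."""
    try:
        status = Status(raw_status)
    except ValueError as exc:
        # An unrecognized status counts as an unexpected condition.
        raise ExtractionFailed(f"unexpected status: {raw_status!r}") from exc
    if status is Status.FAILED:
        raise ExtractionFailed("extraction task failed")
    return status is Status.SUCCEEDED
```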

is_available()[source]

Check our connection to the plain text service.

Return type

bool

request_extraction()[source]

Make a request for plaintext extraction using the submission upload ID.

Parameters

source_id (str) – ID of the submission upload workspace.

Return type

None

retrieve_content()[source]

Retrieve plain text content by submission upload ID.

Parameters

source_id (str) – ID of the submission upload workspace.

Return type

bytes

Returns

bytes – Raw text content.

Raises

  • RequestFailed – Raised if an unexpected status was encountered.

  • ExtractionInProgress – Raised if an extraction is currently in progress.
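Because ExtractionInProgress is documented as a subclass of RequestFailed, a caller that wants to treat "not ready yet" differently from a real failure has to catch the subclass first. The sketch below shows that ordering; the exception classes are local stand-ins that copy only the documented `(msg, response)` signature, and `try_retrieve` is a hypothetical helper that accepts any callable shaped like `retrieve_content`.

```python
class RequestFailed(Exception):
    """Local stand-in for arxiv.integration.api.exceptions.RequestFailed."""

    def __init__(self, msg, response=None):
        super().__init__(msg)
        self.response = response


class ExtractionInProgress(RequestFailed):
    """Local stand-in; subclasses RequestFailed as documented above."""


def try_retrieve(retrieve, source_id):
    """Return (content, state) where state is 'ready', 'pending' or 'error'."""
    try:
        return retrieve(source_id), 'ready'
    except ExtractionInProgress:
        # Must be caught before RequestFailed, because it is a subclass.
        return None, 'pending'
    except RequestFailed:
        return None, 'error'
```

Swapping the two `except` clauses would make the `'pending'` branch unreachable, since the broader RequestFailed handler would match first.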

status_endpoint()[source]

Get the URL of the extraction status endpoint.

Return type

str