agent package

Orchestrates backend processes based on rules triggered by submission events.

The primary concerns of the agent are:

  • Orchestrating automated processes in support of submission and moderation.

  • Keeping track of what processes have been carried out on a submission, and the outcomes of those processes.

  • Providing a framework for defining conditions under which processes should be carried out.

In addition, we anticipate future development of:

  • Interfaces for administrators to monitor submission-related processes, and to start processes manually.

  • A metrics endpoint for [Prometheus](https://prometheus.io/), to expose process performance/rates.

  • Interfaces for administrators to define processing rules.

Conceptual overview

A process is a set of one or more related steps that should be carried out in order, usually focusing on a single submission. Steps are small units of work with a specific objective, such as getting a resource from a service or applying a policy. If a step in a process fails, the subsequent steps are not carried out. Examples of processes include running the autoclassifier and annotating a submission with the results, and placing submissions on hold when they exceed size limits.

Processes are implemented by defining a class that inherits from Process.
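
The relationship between a process and its steps can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the Process base class below is a hypothetical stand-in written for this example, and the names Failed, steps, run, SizeLimitHold, and the 1,000,000-byte limit are all assumptions. The real Process class may expose a different interface.

```python
from typing import Any, Dict, List


class Failed(Exception):
    """Raised by a step to signal that the process cannot continue."""


class Process:
    """Hypothetical stand-in for the agent's Process base class."""

    # Names of the methods to run, in order. Subclasses override this.
    steps: List[str] = []

    def __init__(self, submission_id: int) -> None:
        self.submission_id = submission_id
        self.completed: List[str] = []  # steps that finished successfully

    def run(self, trigger: Dict[str, Any]) -> Any:
        """Run each step in order, passing the previous result along."""
        result: Any = None
        for name in self.steps:
            step = getattr(self, name)
            # If a step raises Failed, the loop ends here and the
            # remaining steps are never called.
            result = step(result, trigger)
            self.completed.append(name)
        return result


class SizeLimitHold(Process):
    """Illustrative process: decide whether to hold an oversize submission."""

    MAX_BYTES = 1_000_000  # assumed limit, for illustration only
    steps = ["check_size", "apply_hold"]

    def check_size(self, previous: Any, trigger: Dict[str, Any]) -> bool:
        size = trigger.get("size_bytes")
        if size is None:
            raise Failed("no size information available")
        return size > self.MAX_BYTES

    def apply_hold(self, previous: Any, trigger: Dict[str, Any]) -> str:
        return "hold" if previous else "no action"
```

Calling SizeLimitHold(1).run({"size_bytes": 2_000_000}) runs both steps in order. Calling run({}) raises Failed in the first step, so apply_hold is never reached.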

A rule defines the circumstances under which a process should be carried out. Specifically, a rule pairs a particular type of event with a function that decides, based on the event properties and/or the state of the submission, whether the process should be carried out.

Rules are implemented by instantiating Rule in rules.
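
To make the idea of a rule concrete, the sketch below pairs an event type with a condition function and the name of a process to start. Everything here is hypothetical and self-contained: the Event, UploadEvent, CommentEvent, and Rule classes and the triggered() helper are assumptions made for illustration and are not the classes defined in arxiv.submission or in the agent's rules module.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Type


@dataclass
class Event:
    """Hypothetical minimal event; real events live in arxiv.submission."""
    submission_id: int


@dataclass
class UploadEvent(Event):
    size_bytes: int


@dataclass
class CommentEvent(Event):
    text: str


@dataclass
class Rule:
    """Hypothetical rule: event type + condition + process to start."""
    event_type: Type[Event]
    condition: Callable[[Event, Dict[str, Any]], bool]
    process: str  # name of the process to carry out


def triggered(rules: List[Rule], event: Event,
              submission: Dict[str, Any]) -> List[str]:
    """Return the names of processes whose rules match this event."""
    return [
        rule.process
        for rule in rules
        # The type check runs first, so the condition only ever sees
        # events of the type it was written for.
        if isinstance(event, rule.event_type)
        and rule.condition(event, submission)
    ]


RULES = [
    Rule(UploadEvent, lambda e, s: e.size_bytes > 1_000_000, "SizeLimitHold"),
]
```

With these definitions, an UploadEvent larger than the assumed limit selects the "SizeLimitHold" process, while a smaller upload or a CommentEvent selects nothing.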

An event is a specific mutation of a submission by an actor at a particular point in time. See arxiv.submission for an overview of the event model used in the submission system.

Events are implemented by defining an Event class in arxiv.submission, and emitted via arxiv.submission.core.save().

Architectural overview

Context

The agent operates within the scope of the submission system.

Fig. 6 System context for the arXiv submission agent.

The submission agent consumes submission events generated by other applications running in the submission system, such as the submission UI, via the SubmissionEvents Kinesis stream. The agent uses the arxiv.submission package to generate new events, which involves writing to the submission database and putting records on the SubmissionEvents Kinesis stream.

In carrying out processes, the agent makes requests to backend services in the submission system, such as the plain text extraction service and the file management service.

Containers

The submission agent comprises four containers that are deployed and scaled more or less independently.

Fig. 7 Containers within the arXiv submission agent.

The agent.consumer consumes notifications about events on the SubmissionEvents Kinesis stream. It is implemented on top of arxiv.integration.kinesis. The agent consumes events on the stream one at a time, in order, and keeps track of its progress by marking checkpoints in a database. The agent also uses the database to record process-relevant submission events. If an agent process stops unexpectedly, the checkpoints allow us to resume processing the stream while minimizing the amount of duplicated work. The agent dispatches steps in triggered processes to be carried out by the agent.worker. Only one agent process should run per shard, to avoid processing the same events more than once.
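
The checkpointing behavior described above can be illustrated with a small, self-contained sketch. It does not use arxiv.integration.kinesis or a real database: CheckpointStore is an in-memory stand-in for the checkpoint table, records are plain (sequence number, payload) tuples, and the consume() function and its parameters are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Record = Tuple[int, str]  # (sequence number, payload); illustrative shape


class CheckpointStore:
    """In-memory stand-in for the checkpoint table in the agent database."""

    def __init__(self) -> None:
        self._positions: Dict[str, int] = {}

    def load(self, shard_id: str) -> Optional[int]:
        return self._positions.get(shard_id)

    def save(self, shard_id: str, sequence_number: int) -> None:
        self._positions[shard_id] = sequence_number


def consume(shard_id: str, records: Iterable[Record],
            store: CheckpointStore, handle: Callable[[str], None]) -> int:
    """Handle records in order, skipping any at or before the checkpoint.

    Returns the number of records actually handled on this call.
    """
    last = store.load(shard_id)
    handled = 0
    for sequence_number, payload in records:
        if last is not None and sequence_number <= last:
            continue  # handled before the previous run stopped
        handle(payload)
        # Checkpoint after each record so a restart repeats at most
        # the record that was in flight when the process stopped.
        store.save(shard_id, sequence_number)
        last = sequence_number
        handled += 1
    return handled
```

If a first call handles only the first two of three records (for example, because the process stopped), a second call over the full stream with the same store handles only the third.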

The agent database is a MariaDB SQL database used by the consumer. It stores checkpoints, process-relevant submission events, and (future) configurations for user-defined rules.

The agent.worker is a horizontally scalable Celery worker that carries out the steps of processes. These steps are dispatched as tasks by the agent.consumer via a Redis in-memory key-value store. The worker is responsible for calling backend services as it carries out its work. Worker processes can be scaled horizontally, independently of agent processes.
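
The separation between dispatching tasks and carrying them out can be sketched with the Python standard library alone. This is deliberately not Celery or Redis code: a queue.Queue stands in for the Redis broker, threads stand in for worker processes, and fake_step, dispatch, and run_workers are hypothetical names introduced for this example. The point is only that the dispatching side does not need to know how many workers exist.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

Task = Tuple[Callable[[int], int], int]  # (step function, submission ID)


def fake_step(submission_id: int) -> int:
    """Placeholder for a step that would call a backend service."""
    return submission_id * 2


def dispatch(tasks: "queue.Queue[Task]", submission_ids: Iterable[int]) -> None:
    """Consumer side: enqueue one task per submission."""
    for submission_id in submission_ids:
        tasks.put((fake_step, submission_id))


def run_workers(tasks: "queue.Queue[Task]", n_workers: int) -> List[int]:
    """Worker side: drain the queue with n_workers threads."""
    results: List[int] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                func, arg = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            value = func(arg)
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The same dispatch() call works whether run_workers() is given one thread or several; only the order in which results arrive may differ.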