arxiv.submission package

Core event-centric data abstraction for the submission & moderation subsystem.

This package provides an event-based API for mutating submissions. Instead of representing submissions as objects and mutating them directly in web controllers and other places, we represent a submission as a stream of commands or events. This ensures that we have a precise and complete record of activities concerning submissions, and provides an explicit and consistent definition of operations that can be performed within the arXiv submission system.

Overview

Event types are defined in domain.event. The base class for all events is domain.event.base.Event. Each event type defines additional required data, and has validate and project methods that implement its logic. Events operate on domain.submission.Submission instances.

from arxiv.submission import CreateSubmission, User, Submission
user = User(1345, 'foo@user.com')
creation = CreateSubmission(creator=user)

core defines the persistence API for submission data. core.save() is used to commit new events. core.load() retrieves events for a submission and plays them forward to get the current state, whereas core.load_fast() retrieves the latest projected state of the submission (faster, theoretically less reliable).

from arxiv.submission import save, SetTitle
submission, events = save(creation, SetTitle(creator=user, title='Title!'))

Watch out for exceptions.InvalidEvent to catch validation-related problems (e.g. bad data, submission in wrong state). Watch for SaveError to catch problems with persisting events.

Callbacks can be attached to event types in order to execute routines automatically when specific events are committed, using domain.Event.bind().

from typing import Iterable

@SetTitle.bind()
def flip_title(event: SetTitle, before: Submission, after: Submission,
               creator: Agent) -> Iterable[SetTitle]:
    yield SetTitle(creator=creator, title=f"(╯°□°)╯︵ ┻━┻ {event.title}")

Finally, services.classic provides integration with the classic submission database. We use the classic database to store events (new table), and also keep its legacy tables up to date so that other legacy components continue to work as expected.

Using commands/events

Command/event classes are defined in arxiv.submission.domain.event, and are accessible from the root namespace of this package. Each event type defines a transformation/operation on a single submission, and defines the data required to perform that operation. Events are played forward, in order, to derive the state of a submission. For more information about how event types are defined, see arxiv.submission.domain.event.Event.
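The idea of playing events forward can be illustrated with a minimal, self-contained sketch. The classes below (`SketchSubmission`, `SketchSetTitle`, `SketchSetAbstract`) and the `replay` function are hypothetical stand-ins, not the package's actual Event or Submission classes; only the `project` method name is borrowed from the description above.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class SketchSubmission:
    """Hypothetical stand-in for the state of a submission."""
    title: Optional[str] = None
    abstract: Optional[str] = None


@dataclass(frozen=True)
class SketchSetTitle:
    title: str

    def project(self, submission: SketchSubmission) -> SketchSubmission:
        # Return a new state; the previous state is never mutated.
        return replace(submission, title=self.title)


@dataclass(frozen=True)
class SketchSetAbstract:
    abstract: str

    def project(self, submission: SketchSubmission) -> SketchSubmission:
        return replace(submission, abstract=self.abstract)


def replay(events: Iterable) -> SketchSubmission:
    """Derive the current state by applying events in order."""
    state = SketchSubmission()
    for event in events:
        state = event.project(state)
    return state


history = [
    SketchSetTitle("Draft"),
    SketchSetAbstract("We study foo."),
    SketchSetTitle("A new theory of foo"),
]
current = replay(history)
```

Because later events are applied after earlier ones, the second title in `history` wins, while the abstract set in between is preserved.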

Note

One major difference between the event stream and the classic submission database table is that in the former model, there is only one submission id for all versions/mutations. In the legacy system, new rows are created in the submission table for things like creating a replacement, adding a DOI, or requesting a withdrawal. The integration layer described under "Integration with the legacy system" handles the interchange between these two models.

Command/event types are PEP 557 data classes. Each command/event inherits from Event, and may add additional fields. See Event for more information about common fields.

To create a new command/event, initialize the class with the relevant data, and commit it using save(). For example:

>>> from arxiv.submission import User, SetTitle, save
>>> user = User(123, "joe@bloggs.com")
>>> update = SetTitle(creator=user, title='A new theory of foo')
>>> submission, events = save(update, submission_id=12345)

If the commands/events are for a submission that already exists, the latest state of that submission will be obtained by playing forward past events. New events will be validated and applied to the submission in the order that they were passed to save().

  • If an event is invalid (e.g. the submission is not in an appropriate state for the operation), an InvalidEvent exception will be raised. Note that at this point nothing has been changed in the database; the attempt is simply abandoned.

  • The command/event is stored, as is the latest state of the submission. Events and the resulting state of the submission are stored atomically.

  • If the notification service is configured, a message about the event is propagated as a Kinesis event on the configured stream. See arxiv.submission.services.notification for details.
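The validate-then-store sequence described above can be sketched as follows. Everything here is a hypothetical stand-in (`SketchStore`, `SketchInvalidEvent`, `SketchSetTitle`, `sketch_save`); the real save() persists to a database and may notify a stream, both of which are omitted. Treating a batch as all-or-nothing is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class SketchInvalidEvent(ValueError):
    """Hypothetical analogue of InvalidEvent."""


@dataclass
class SketchStore:
    """In-memory stand-in for the event store and the projected state."""
    events: List[object] = field(default_factory=list)
    state: Optional[dict] = None


@dataclass(frozen=True)
class SketchSetTitle:
    title: str

    def validate(self, state: dict) -> None:
        if not self.title.strip():
            raise SketchInvalidEvent("title must not be empty")

    def project(self, state: dict) -> dict:
        return {**state, "title": self.title}


def sketch_save(store: SketchStore, *new_events):
    # Work on a copy so that a failed validation leaves the store untouched.
    state = dict(store.state or {})
    for event in new_events:
        event.validate(state)         # may raise SketchInvalidEvent
        state = event.project(state)  # apply in the order given
    # Only after every event has validated: record events and state together.
    store.events.extend(new_events)
    store.state = state
    return state, list(store.events)


store = SketchStore()
state, events = sketch_save(store, SketchSetTitle("A new theory of foo"))

try:
    sketch_save(store, SketchSetTitle("Better title"), SketchSetTitle("  "))
    rejected = False
except SketchInvalidEvent:
    rejected = True
```

After the rejected call, the store still contains only the first event and its resulting state.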

Special case: creation

Note that if the first event is a CreateSubmission, the submission ID need not be provided, because it is not known yet. For example:

>>> from arxiv.submission import User, CreateSubmission, SetTitle, save
>>> user = User(123, "joe@bloggs.com")
>>> creation = CreateSubmission(creator=user)
>>> update = SetTitle(creator=user, title='A new theory of foo')
>>> submission, events = save(creation, update)
>>> submission.submission_id
40032

Versioning events

Handling changes to this software in a way that does not break past data is a non-trivial problem. In a traditional relational database arrangement we would leverage a database migration tool to do things like apply ALTER statements to tables when upgrading software versions. The premise of the event data model, however, is that events are immutable – we won’t be going back to modify past events whenever we make a change to the software.

The strategy for version management around event data is implemented in arxiv.submission.domain.event.versioning. When event data are stored, they are tagged with the current version of this software. When event data are loaded from the store, and before the appropriate Event subclass is instantiated, the data are mapped to the current software version using any version mappings defined for that event type. This happens on the fly, in domain.event.event_factory().
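A rough sketch of on-the-fly version mapping follows. It is not the package's versioning module: the registry `_MAPPINGS`, the `upgrade` function, the version tuples, and the example field rename are all hypothetical, chosen only to show stored data being brought up to date at load time while the stored record stays unchanged.

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, int, int]

CURRENT_VERSION: Version = (0, 3, 0)  # hypothetical current software version


def rename_name_to_title(data: dict) -> dict:
    """Hypothetical mapping: older data stored the title under 'name'."""
    data = dict(data)  # copy, so the stored record is not modified
    if "name" in data:
        data["title"] = data.pop("name")
    return data


# Hypothetical registry: event type name -> [(version introduced, transform)].
_MAPPINGS: Dict[str, List[Tuple[Version, Callable[[dict], dict]]]] = {
    "SetTitle": [((0, 2, 0), rename_name_to_title)],
}


def upgrade(event_type: str, data: dict) -> dict:
    """Apply, in version order, every mapping newer than the stored data."""
    stored: Version = tuple(data.get("version", (0, 0, 0)))
    mappings = sorted(_MAPPINGS.get(event_type, []), key=lambda m: m[0])
    for introduced, transform in mappings:
        if stored < introduced:
            data = transform(data)
    return {**data, "version": CURRENT_VERSION}


old = {"version": (0, 1, 0), "name": "A new theory of foo"}
upgraded = upgrade("SetTitle", old)
```

The upgraded dictionary is what would then be handed to the event class constructor; `old` itself is left as stored, in keeping with the premise that past events are immutable.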

Integration with the legacy system

The classic service module provides integration with the classic database. See the documentation for that module for details. As we migrate off of the classic database, we will swap in a new service module with the same API.

Until all legacy components that read from or write to the classic database are replaced, we will not be able to move entirely away from the legacy submission database. Particularly in the submission and moderation UIs, design has assumed immediate consistency, which means a conventional read/write interaction with the database. Hence the classic integration module assumes that we are reading and writing events and submission state from/to the same database.

As development proceeds, we will look for opportunities to decouple from the classic database, and focus on more localized projections of submission events that are specific to a service/application. For example, the moderation UI/API need not maintain or have access to the complete representation of the submission; instead, it may track the subset of events relevant to its operation (e.g. pertaining to metadata, classification, proposals, holds, etc).
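A localized projection of this kind might look like the sketch below. The event names other than SetTitle (`AddHold`, `SetLicense`), the tuple representation of events, and the shape of the view are hypothetical; the point is only that a consumer can fold over the stream and ignore events it does not care about.

```python
from typing import Iterable, Tuple


def moderation_view(stream: Iterable[Tuple[str, dict]]) -> dict:
    """Build a reduced view from (event name, payload) pairs."""
    view = {"title": None, "holds": []}
    for name, payload in stream:
        if name == "SetTitle":
            view["title"] = payload["title"]
        elif name == "AddHold":  # hypothetical event name
            view["holds"].append(payload["reason"])
        # Any other event type is irrelevant to this view and is skipped.
    return view


stream = [
    ("SetTitle", {"title": "A new theory of foo"}),
    ("SetLicense", {"license": "CC BY 4.0"}),  # hypothetical, ignored here
    ("AddHold", {"reason": "needs reclassification"}),
]
view = moderation_view(stream)
```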

class arxiv.submission.IAwaitable(*args, **kwargs)

Bases: typing_extensions.Protocol

An object that provides an is_available predicate.

is_available(**kwargs)

Check whether an object (e.g. a service) is available.

Return type

bool

arxiv.submission.init_app(app)

Configure a Flask app to use this package.

Initializes and waits for StreamPublisher and classic to be available.

Return type

None

arxiv.submission.wait_for(service, delay=2, **extra)

Wait for a service to become available.

Return type

None
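The combination of an is_available protocol and a polling helper can be sketched as below. This is an assumption about the general pattern, not the package's code: `SketchAwaitable`, `sketch_wait_for`, and `FlakyService` are hypothetical, and the `max_attempts` and `sleep` parameters are added here so the example can run without real delays; the documented wait_for() signature has only `service`, `delay`, and `**extra`.

```python
import time
from typing import Any, Callable, Optional, Protocol


class SketchAwaitable(Protocol):
    """Anything with an is_available predicate satisfies this protocol."""

    def is_available(self, **kwargs: Any) -> bool:
        ...


def sketch_wait_for(service: SketchAwaitable, delay: float = 2,
                    max_attempts: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep,
                    **extra: Any) -> None:
    """Poll the service until it reports that it is available."""
    attempts = 0
    while not service.is_available(**extra):
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("service did not become available")
        sleep(delay)


class FlakyService:
    """Hypothetical service that becomes available after a few checks."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    def is_available(self, **kwargs: Any) -> bool:
        self.calls += 1
        return self.calls > self.ready_after


naps = []  # records requested delays instead of actually sleeping
service = FlakyService(ready_after=2)
sketch_wait_for(service, delay=0.5, sleep=naps.append)
```

Using a structural protocol means a service does not need to inherit from any base class; it only needs to provide the predicate.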

Subpackages