arxiv.canonical.integrity package

Integrity structs and collections for the canonical record.

This package provides a class hierarchy for integrity and consistency concerns in the canonical record. Its classes generate and validate checksums and generate manifests.

In order to efficiently verify the completeness and integrity of the record (or a replica of the record), and to identify the source of inconsistencies, consistency checks are performed at several levels of granularity (e.g. entry, day, month, year, global). The completeness and integrity of all or a part of the arXiv collection can be verified by comparing the checksum values at the corresponding level of granularity.
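The drill-down described above can be sketched as follows. This is a minimal illustration rather than the package's own implementation: the Node class and the find_mismatches function are hypothetical names, and the checksums are placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for one level of the hierarchy (not a library class)."""

    checksum: str
    members: Dict[str, "Node"] = field(default_factory=dict)


def find_mismatches(local: Node, remote: Node, path: str = "") -> List[str]:
    """Return the paths of the deepest nodes at which two copies disagree."""
    if local.checksum == remote.checksum:
        return []  # Matching checksums: nothing below this node needs checking.
    found: List[str] = []
    for name in sorted(set(local.members) | set(remote.members)):
        child_path = f"{path}/{name}" if path else name
        if name not in local.members or name not in remote.members:
            found.append(child_path)  # Completeness failure: a member is missing.
        else:
            found.extend(
                find_mismatches(local.members[name], remote.members[name], child_path)
            )
    # If no child explains the difference, report this node itself.
    return found or [path]


# Two copies that differ only in January 2019 (placeholder checksum values).
local = Node("A", {"2019": Node("y1", {"01": Node("m1")}), "2020": Node("y2")})
replica = Node("B", {"2019": Node("y1x", {"01": Node("m1x")}), "2020": Node("y2")})
print(find_mismatches(local, replica))
```

Because a matching checksum at any level vouches for everything beneath it, only the branches that differ are ever traversed.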

How checksum values are calculated at each level is described below. The approach is inspired by the strategy for checksum validation of large chunked uploads to Amazon S3. All checksum values are MD5 hashes, stored and transmitted as URL-safe base64-encoded strings.
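As an illustration of that encoding, the sketch below computes an MD5 digest and encodes it as a URL-safe base64 string using only the standard library. The function name md5_checksum is a hypothetical helper, not part of this package.

```python
import base64
import hashlib


def md5_checksum(content: bytes) -> str:
    """Return the MD5 digest of ``content`` as a URL-safe base64 string."""
    digest = hashlib.md5(content).digest()  # 16 raw bytes
    return base64.urlsafe_b64encode(digest).decode("ascii")


print(md5_checksum(b""))  # → 1B2M2Y8AsgTpgAmY7PhCfg==
```

The URL-safe alphabet replaces "+" and "/" with "-" and "_", so the value can appear in a URL or file name without escaping.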

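=======  ====================================================================  ===============================  ==================================================
Level    Contents                                                              Completeness                     Integrity
=======  ====================================================================  ===============================  ==================================================
File     Binary data.                                                          Presence/absence of descriptor.  Hash of binary file content.
Version  Collection of metadata, source, and render files.                     Presence of files.               Hash of concatenated (sorted by name) file hashes.
E-Print  One or more sequential versions.                                      Presence of version records.     Hash of concatenated (sorted) version hashes.
Day      All e-prints the first version of which was announced on this day.    Presence of e-print records.     Hash of concatenated (sorted) e-print hashes.
Month    All e-prints the first version of which was announced in this month.  Presence of day records.         Hash of concatenated (sorted) day hashes.
Year     All e-prints the first version of which was announced in this year.   Presence of month records.       Hash of concatenated (sorted) month hashes.
All      All e-prints.                                                         Presence of year records.        Hash of concatenated (sorted) year hashes.
=======  ====================================================================  ===============================  ==================================================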
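The roll-up from one level to the next can be sketched as below. This is a sketch under stated assumptions, not the package's implementation: it assumes members are ordered by name at every level and that the encoded checksum strings (rather than the raw digests) are concatenated. The helper names and file names are hypothetical.

```python
import base64
import hashlib
from typing import Mapping


def md5_checksum(content: bytes) -> str:
    """Return the MD5 digest of ``content`` as a URL-safe base64 string."""
    return base64.urlsafe_b64encode(hashlib.md5(content).digest()).decode("ascii")


def collection_checksum(member_checksums: Mapping[str, str]) -> str:
    """Hash the member checksums, concatenated in order of member name."""
    ordered = (member_checksums[name] for name in sorted(member_checksums))
    return md5_checksum("".join(ordered).encode("ascii"))


# A version is a collection of files; an e-print is a collection of versions.
files = {
    "metadata.json": md5_checksum(b'{"title": "An example"}'),
    "source.tar.gz": md5_checksum(b"source bytes"),
    "render.pdf": md5_checksum(b"%PDF-1.5"),
}
version_checksum = collection_checksum(files)
eprint_checksum = collection_checksum({"v1": version_checksum})
```

Sorting before concatenation makes the result independent of the order in which members are discovered, and a change to any file propagates to every level above it.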

The same hierarchy is used for listing files, where the terminal bitstream is the binary serialized manifest.

A global integrity collection, Integrity, draws together the e-print and listing hierarchies into a final, composite level.

class arxiv.canonical.integrity.Integrity(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Apex of the integrity collection.

class arxiv.canonical.integrity.IntegrityBase(name, record=None, members=None, manifest=None, checksum=None)

Bases: typing.Generic

Generic base class for all integrity collections.

Provides a uniform protocol for integrity collections, while allowing the name, record, member name, and member types to vary from subclass to subclass.

calculate_checksum()

Return type

str

property checksum

The checksum of this integrity collection.

Return type

str

extend_manifest(member)

Return type

None

classmethod from_record(record, checksum=None, calculate_new_checksum=True)

Return type

~_Self

property is_valid

Indicates whether or not this collection has a valid checksum.

Return type

bool

iter_members()

Return type

Iterable[~_Member]

classmethod make_manifest(members)

Make a Manifest for this integrity collection.

Return type

Manifest

classmethod make_manifest_entry(member)

Return type

ManifestEntry

property manifest

The Manifest of this integrity collection.

Return type

Manifest

property manifest_name

Get the name of this object for a parent manifest.

Return type

str

member_type = None

The type of members contained by an instance of this class.

property members

The members of this collection.

Return type

Mapping[~_MemberName, ~_Member]

property number_of_events

Return type

int

property number_of_versions

Return type

int

property record

The record associated with this collection.

Return type

~_Record

set_record(record)

Return type

None

update_checksum()

Set the checksum for this record.

Return type

None

update_or_extend_manifest(member, checksum)

Update the checksum on a manifest entry, or add a new entry.

Return type

None
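The update-or-add behaviour can be illustrated with a minimal sketch. The structure used here (a list of dicts with "key" and "checksum" fields) and the function name update_or_extend are assumptions made for illustration; they are not taken from the Manifest type itself.

```python
from typing import Dict, List


def update_or_extend(entries: List[Dict[str, str]], key: str, checksum: str) -> None:
    """Set ``checksum`` on the entry named ``key``; append a new entry if none exists."""
    for entry in entries:
        if entry["key"] == key:
            entry["checksum"] = checksum  # Existing entry: update in place.
            return
    entries.append({"key": key, "checksum": checksum})  # No match: add an entry.


entries = [{"key": "2019-01-01", "checksum": "old"}]
update_or_extend(entries, "2019-01-01", "new")    # Updates the existing entry.
update_or_extend(entries, "2019-01-02", "fresh")  # Adds a second entry.
```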

class arxiv.canonical.integrity.IntegrityDay(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for e-prints associated with a single day.

Specifically, this includes all versions of e-prints the first version of which was announced on this day.

property day

The date represented by this collection.

Return type

date

class arxiv.canonical.integrity.IntegrityEntryBase(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

class arxiv.canonical.integrity.IntegrityEntryMembers

Bases: arxiv.canonical.util.GenericMonoDict

A dict that returns only IntegrityEntry instances.

Consistent with Mapping[str, IntegrityEntry].
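One way such a single-type dict could work is sketched below. This is an assumption for illustration only and says nothing about how GenericMonoDict is actually implemented; Entry and EntryDict are hypothetical names.

```python
from typing import Dict


class Entry:
    """Hypothetical placeholder for an entry type."""

    def __init__(self, checksum: str) -> None:
        self.checksum = checksum


class EntryDict(Dict[str, Entry]):
    """A dict whose lookups are guaranteed to return ``Entry`` instances."""

    def __getitem__(self, key: str) -> Entry:
        value = super().__getitem__(key)
        if not isinstance(value, Entry):
            raise TypeError(f"Expected Entry for {key!r}, got {type(value).__name__}")
        return value


members = EntryDict({"metadata": Entry("abc")})
print(members["metadata"].checksum)
```

Narrowing the return type of the lookup lets a static type checker treat every retrieved member as the expected type without a cast at each call site.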

class arxiv.canonical.integrity.IntegrityEPrint(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for an EPrint.

classmethod make_manifest_entry(member)

Return type

ManifestEntry

member_type

alias of IntegrityVersion

class arxiv.canonical.integrity.IntegrityEPrints(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for all e-prints in the canonical record.

class arxiv.canonical.integrity.IntegrityListing(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityEntryBase

calculate_checksum()

Return type

str

classmethod from_record(record, checksum=None, calculate_new_checksum=True)

Make an IntegrityListing from a RecordListing.

Return type

~_Self

record_type

alias of arxiv.canonical.record.listing.RecordListing

class arxiv.canonical.integrity.IntegrityListingDay(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection of listings for a single day.

classmethod from_record(record, checksum=None, calculate_new_checksum=True)

Generate an IntegrityListingDay from the corresponding listing record.

Return type

~_Self

classmethod make_manifest_entry(member)

Return type

ManifestEntry

property manifest_name

The name to use for this record in a parent manifest.

Return type

str

class arxiv.canonical.integrity.IntegrityListingMonth(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection of listings for a single month.

property manifest_name

The name to use for this record in a parent manifest.

Return type

str

property month

The numeric month represented by this collection.

Return type

int

property year

The numeric year represented by this collection.

Return type

int

class arxiv.canonical.integrity.IntegrityListingYear(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection of listings for a single year.

property year

The numeric year represented by this collection.

Return type

int

class arxiv.canonical.integrity.IntegrityListings(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection of all listings.

class arxiv.canonical.integrity.IntegrityMetadata(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityEntryBase

Integrity entry for a metadata bitstream in the record.

calculate_checksum()

Return type

str

classmethod from_record(record, checksum=None, calculate_new_checksum=True)

Return type

~_Self

record_type

alias of arxiv.canonical.record.metadata.RecordMetadata

class arxiv.canonical.integrity.IntegrityMonth(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for e-prints associated with a single month.

Specifically, this includes all versions of e-prints the first version of which was announced in this month.

property manifest_name

The name to use for this record in a parent manifest.

Return type

str

property month

The numeric month represented by this collection.

Return type

int

property year

The numeric year represented by this collection.

Return type

int

class arxiv.canonical.integrity.IntegrityVersion(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for an e-print version.

property formats

Return type

Dict[ContentType, IntegrityEntry]

classmethod from_record(version, checksum=None, calculate_new_checksum=True, manifest=None)

Get an IntegrityVersion from a RecordVersion.

Parameters
  • version (RecordVersion) – The record for which this integrity object is to be generated.

  • checksum (str or None) –

  • manifest (dict) – If provided, checksum values for member files will be retrieved from this manifest. Otherwise they will be calculated from the file content.

  • calculate_new_checksum (bool) – If True, a new checksum will be calculated from the manifest.

Return type

IntegrityVersion
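The effect of the manifest parameter described above can be sketched as follows. This is a simplified illustration, assuming the manifest is a plain mapping of file name to checksum and that files absent from it fall back to hashing; the names member_checksums and md5_checksum are hypothetical.

```python
import base64
import hashlib
from typing import Dict, Mapping, Optional


def md5_checksum(content: bytes) -> str:
    """Return the MD5 digest of ``content`` as a URL-safe base64 string."""
    return base64.urlsafe_b64encode(hashlib.md5(content).digest()).decode("ascii")


def member_checksums(
    files: Mapping[str, bytes],
    manifest: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Take checksums from ``manifest`` where available, else hash the content."""
    result: Dict[str, str] = {}
    for name, content in files.items():
        if manifest is not None and name in manifest:
            result[name] = manifest[name]  # Reuse the recorded value.
        else:
            result[name] = md5_checksum(content)  # Calculate from the file content.
    return result


files = {"metadata.json": b"{}", "source.tar.gz": b"source bytes"}
recorded = {"metadata.json": "recorded-checksum"}
print(member_checksums(files, recorded))
```

Reusing recorded values avoids reading every file, at the cost of trusting the manifest; omitting the manifest forces a full recalculation.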

classmethod make_manifest(members)

Make a Manifest for this integrity collection.

Return type

Manifest

classmethod make_manifest_entry(member)

Return type

ManifestEntry

property metadata

Return type

IntegrityMetadata

property render

Return type

Optional[IntegrityEntry]

property source

Return type

IntegrityEntry

class arxiv.canonical.integrity.IntegrityYear(name, record=None, members=None, manifest=None, checksum=None)

Bases: arxiv.canonical.integrity.core.IntegrityBase

Integrity collection for e-prints associated with a single year.

Specifically, this includes all versions of e-prints the first version of which was announced in this year.

property year

The numeric year represented by this collection.

Return type

int