arxiv.canonical.integrity package
Integrity structs and collections for the canonical record.
This module provides a class hierarchy for integrity and consistency-related concerns pertaining to the canonical record. The classes herein generate and validate checksums, and generate manifests.
In order to efficiently verify the completeness and integrity of the record (or a replica of the record), and to identify the source of inconsistencies, consistency checks are performed at several levels of granularity (e.g. entry, day, month, year, global). The completeness and integrity of all or a part of the arXiv collection can be verified by comparing the checksum values at the corresponding level of granularity.
The table below describes how checksum values are calculated at each level. The approach is inspired by the strategy for checksum validation of large chunked uploads to Amazon S3. All checksum values are MD5 hashes, stored and transmitted as URL-safe base64-encoded strings.
Level | Contents | Completeness | Integrity |
---|---|---|---|
File | Binary data. | Presence/absence of descriptor. | Hash of binary file content. |
Version | Collection of metadata, source, and render files. | Presence of files. | Hash of concatenated (sorted by name) file hashes. |
E-Print | One or more sequential versions. | Presence of version records. | Hash of concatenated (sorted) version hashes. |
Day | All e-prints the first version of which was announced on this day. | Presence of e-print records. | Hash of concatenated (sorted) e-print hashes. |
Month | All e-prints the first version of which was announced in this month. | Presence of day records. | Hash of concatenated (sorted) day hashes. |
Year | All e-prints the first version of which was announced in this year. | Presence of month records. | Hash of concatenated (sorted) month hashes. |
All | All e-prints. | Presence of year records. | Hash of concatenated (sorted) year hashes. |
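The calculation described in the table can be sketched roughly as follows. This is an illustration only, not the package's implementation: the function names `checksum` and `collection_checksum` are hypothetical, and it assumes that member checksums are concatenated as their encoded strings, ordered by member name, before being hashed again.

```python
import hashlib
from base64 import urlsafe_b64encode
from typing import Mapping


def checksum(content: bytes) -> str:
    """Return the MD5 digest of ``content`` as a URL-safe base64 string."""
    return urlsafe_b64encode(hashlib.md5(content).digest()).decode("utf-8")


def collection_checksum(member_checksums: Mapping[str, str]) -> str:
    """Hash the member checksums, concatenated in order of member name.

    Assumption: the encoded checksum strings (not the raw digests) are
    concatenated. The real package may differ on this point.
    """
    ordered = (member_checksums[name] for name in sorted(member_checksums))
    return checksum("".join(ordered).encode("utf-8"))


# File level: hash of the binary content.
files = {
    "metadata.json": checksum(b'{"title": "An example"}'),
    "source.tar.gz": checksum(b"source bytes"),
    "render.pdf": checksum(b"pdf bytes"),
}

# Version level: hash of the file checksums, sorted by name.
version_checksum = collection_checksum(files)

# E-print level: hash of the version checksums, and so on up the hierarchy.
eprint_checksum = collection_checksum({"v1": version_checksum})
```

Because every level is computed only from the checksums of the level beneath it, a change to a single file alters the checksum of each collection that contains it, while sibling collections are unaffected.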
The same hierarchy is used for listing files, where the terminal bitstream is the binary serialized manifest.
A global integrity collection, Integrity, draws together the e-print and listing hierarchies into a final, composite level.
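To illustrate how comparing checksums at successive levels of granularity can locate the source of an inconsistency, here is a minimal sketch. It does not use the classes in this package. The nested-dict layout (`"checksum"` and `"members"` keys) and the function `find_mismatches` are assumptions made for the example.

```python
from typing import Any, Dict, Iterator, Tuple

Node = Dict[str, Any]  # hypothetical shape: {"checksum": str, "members": {name: Node}}


def find_mismatches(local: Node, remote: Node,
                    path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the paths at which two checksum trees disagree."""
    if local["checksum"] == remote["checksum"]:
        return  # Matching checksums verify the whole subtree at once.
    local_members = local.get("members") or {}
    remote_members = remote.get("members") or {}
    if not local_members and not remote_members:
        yield path  # A terminal entry whose content differs.
        return
    for name in sorted(set(local_members) | set(remote_members)):
        if name not in local_members or name not in remote_members:
            yield path + (name,)  # Present on one side only: incomplete.
        else:
            yield from find_mismatches(local_members[name],
                                       remote_members[name],
                                       path + (name,))


local = {"checksum": "A", "members": {
    "2019": {"checksum": "B", "members": {
        "2019-01": {"checksum": "C"},
        "2019-02": {"checksum": "D"},
    }},
}}
remote = {"checksum": "A2", "members": {
    "2019": {"checksum": "B2", "members": {
        "2019-01": {"checksum": "C"},
        "2019-02": {"checksum": "D2"},
    }},
}}

mismatches = list(find_mismatches(local, remote))
```

Only branches whose checksums differ are descended into, so a replica that matches at the top level is verified with a single comparison.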
class arxiv.canonical.integrity.Integrity(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Apex of the integrity collection.
class arxiv.canonical.integrity.IntegrityBase(name, record=None, members=None, manifest=None, checksum=None)
Bases: typing.Generic
Generic base class for all integrity collections.
Provides a uniform protocol for integrity collections, while allowing the name, record, member name, and member types to vary from subclass to subclass.
    classmethod from_record(record, checksum=None, calculate_new_checksum=True)
        Return type: ~_Self
    classmethod make_manifest(members)
        Make a Manifest for this integrity collection.
    member_type = None
        The type of members contained by an instance of a register class.
    property record
        The record associated with this collection.
        Return type: ~_Record
class arxiv.canonical.integrity.IntegrityDay(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for e-prints associated with a single day.
Specifically, this includes all versions of e-prints the first version of which was announced on this day.
class arxiv.canonical.integrity.IntegrityEntryBase(name, record=None, members=None, manifest=None, checksum=None)
class arxiv.canonical.integrity.IntegrityEntryMembers
Bases: arxiv.canonical.util.GenericMonoDict
A dict that returns only IntegrityEntry instances.
Consistent with Mapping[str, IntegrityEntry].
class arxiv.canonical.integrity.IntegrityEPrint(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for an EPrint.
    member_type
        alias of IntegrityVersion
class arxiv.canonical.integrity.IntegrityEPrints(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for all e-prints in the canonical record.
class arxiv.canonical.integrity.IntegrityListing(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityEntryBase
    classmethod from_record(record, checksum=None, calculate_new_checksum=True)
        Make an IntegrityListing from a RecordListing.
        Return type: ~_Self
    record_type
class arxiv.canonical.integrity.IntegrityListingDay(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection of listings for a single day.
    classmethod from_record(record, checksum=None, calculate_new_checksum=True)
        Generate an IntegrityListing from a RecordListing.
        Return type: ~_Self
class arxiv.canonical.integrity.IntegrityListingMonth(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection of listings for a single month.
class arxiv.canonical.integrity.IntegrityListingYear(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection of listings for a single year.
class arxiv.canonical.integrity.IntegrityListings(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection of all listings.
class arxiv.canonical.integrity.IntegrityMetadata(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityEntryBase
Integrity entry for a metadata bitstream in the record.
    classmethod from_record(record, checksum=None, calculate_new_checksum=True)
        Return type: ~_Self
    record_type
class arxiv.canonical.integrity.IntegrityMonth(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for e-prints associated with a single month.
Specifically, this includes all versions of e-prints the first version of which was announced in this month.
class arxiv.canonical.integrity.IntegrityVersion(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for an e-print version.
    property formats
        Return type: Dict[ContentType, IntegrityEntry]
    classmethod from_record(version, checksum=None, calculate_new_checksum=True, manifest=None)
        Get an IntegrityVersion from a RecordVersion.
        Parameters:
            version (RecordVersion) – The record for which this integrity object is to be generated.
            manifest (dict) – If provided, checksum values for member files will be retrieved from this manifest. Otherwise they will be calculated from the file content.
            calculate_new_checksum (bool) – If True, a new checksum will be calculated from the manifest.
        Return type: ~_Self
    classmethod make_manifest(members)
        Make a Manifest for this integrity collection.
    property metadata
    property render
    property source
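The role of the manifest parameter of IntegrityVersion.from_record can be pictured with the sketch below. It is not the package's code: `member_checksums` and `file_checksum` are hypothetical helpers, the manifest is assumed to be a plain mapping of file name to checksum (the actual Manifest structure may differ), and falling back to the file content for names absent from the manifest is likewise an assumption.

```python
import hashlib
from base64 import urlsafe_b64encode
from typing import Dict, Mapping, Optional


def file_checksum(content: bytes) -> str:
    """MD5 digest of the content as a URL-safe base64 string."""
    return urlsafe_b64encode(hashlib.md5(content).digest()).decode("utf-8")


def member_checksums(files: Mapping[str, bytes],
                     manifest: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect a checksum for each member file of a version.

    Values found in ``manifest`` are reused as-is. Anything else is
    calculated from the file content (assumed fallback behaviour).
    """
    checksums: Dict[str, str] = {}
    for name, content in files.items():
        if manifest is not None and name in manifest:
            checksums[name] = manifest[name]  # Trust the stored value.
        else:
            checksums[name] = file_checksum(content)  # Recompute from bytes.
    return checksums


files = {"metadata.json": b"{}", "source.tar.gz": b""}
calculated = member_checksums(files)
reused = member_checksums(files, manifest={"metadata.json": "stored-value"})
```

Reusing stored values avoids reading every file when an integrity object is loaded, at the cost of trusting the manifest. Recalculating from content is what actually validates the files.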
class arxiv.canonical.integrity.IntegrityYear(name, record=None, members=None, manifest=None, checksum=None)
Bases: arxiv.canonical.integrity.core.IntegrityBase
Integrity collection for e-prints associated with a single year.
Specifically, this includes all versions of e-prints the first version of which was announced in this year.