arxiv.canonical.classic.abs module

Parse fields from a single arXiv abstract (.abs) file.

class arxiv.canonical.classic.abs.AbsData(identifier, submitter, submitted_date, announced_month, updated_date, license, primary_classification, title, abstract, authors, size_kilobytes, submission_type, secondary_classification, source_type, journal_ref, report_num, doi, msc_class, acm_class, proxy, comments, previous_versions)[source]

Bases: tuple

property abstract

Alias for field number 8

property acm_class

Alias for field number 18

property announced_month

Alias for field number 3

property authors

Alias for field number 9

property comments

Alias for field number 20

property doi

Alias for field number 16

property identifier

Alias for field number 0

property journal_ref

Alias for field number 14

property license

Alias for field number 5

property msc_class

Alias for field number 17

property previous_versions

Alias for field number 21

property primary_classification

Alias for field number 6

property proxy

Alias for field number 19

property report_num

Alias for field number 15

property secondary_classification

Alias for field number 12

property size_kilobytes

Alias for field number 10

property source_type

Alias for field number 13

property submission_type

Alias for field number 11

property submitted_date

Alias for field number 2

property submitter

Alias for field number 1

property title

Alias for field number 7

property updated_date

Alias for field number 4

class arxiv.canonical.classic.abs.AbsRef(identifier, submitted_date, announced_month, source_type, size_kilobytes)[source]

Bases: tuple

property announced_month

Alias for field number 2

property identifier

Alias for field number 0

property size_kilobytes

Alias for field number 4

property source_type

Alias for field number 3

property submitted_date

Alias for field number 1
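
The "Alias for field number N" entries above are the standard named-tuple pattern: each property is a named view onto one tuple position. A minimal sketch of that behaviour follows, using a stand-in class (`AbsRefSketch` is hypothetical and is not the library class) with the same field order as `AbsRef`. The value types here are placeholders, since the reference does not state them.

```python
from typing import NamedTuple


class AbsRefSketch(NamedTuple):
    """Stand-in with the same field order as AbsRef; not the library class."""

    identifier: str
    submitted_date: str      # placeholder type, assumed for illustration
    announced_month: str     # placeholder type, assumed for illustration
    source_type: str
    size_kilobytes: int


ref = AbsRefSketch("1901.00123v1", "2019-01-01", "2019-01", "tex", 42)

# Attribute access and positional access read the same slot.
same_by_name_and_index = ref.source_type == ref[3]

# Map each field name to the position reported as "field number".
field_positions = {name: i for i, name in enumerate(AbsRefSketch._fields)}
```

Because the instances are plain tuples, they are immutable and can be unpacked positionally, so the field order in the class signature matters.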

arxiv.canonical.classic.abs.NAMED_FIELDS = ['Title', 'Authors', 'Categories', 'Comments', 'Proxy', 'Report-no', 'ACM-class', 'MSC-class', 'Journal-ref', 'DOI', 'License']

Fields that may be parsed from the key-value pairs in the second major component of an .abs string. Field names are not normalized.
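
To illustrate how a list like this can drive key-value extraction, here is a rough sketch. It is not the module's actual parser: the helper `parse_named_fields`, the sample text, and the rule that indented lines continue the previous field are all assumptions made for the example.

```python
NAMED_FIELDS = ['Title', 'Authors', 'Categories', 'Comments', 'Proxy',
                'Report-no', 'ACM-class', 'MSC-class', 'Journal-ref',
                'DOI', 'License']


def parse_named_fields(block: str) -> dict:
    """Collect 'Name: value' pairs whose name appears in NAMED_FIELDS."""
    fields = {}
    current = None
    for line in block.splitlines():
        name, sep, value = line.partition(':')
        if sep and name in NAMED_FIELDS:
            # A recognised field starts here; keys keep their original case.
            current = name
            fields[current] = value.strip()
        elif current is not None and line.startswith(' '):
            # Assumed convention: an indented line continues the last field.
            fields[current] += ' ' + line.strip()
    return fields


sample = (
    "Title: A sample title\n"
    "  that wraps\n"
    "Authors: A. Author\n"
    "Categories: hep-th math.MP\n"
    "Unknown: ignored"
)
parsed = parse_named_fields(sample)
```

The keys in `parsed` keep the capitalisation used in `NAMED_FIELDS`, which matches the note that field names are not normalized at this stage.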

exception arxiv.canonical.classic.abs.NoSuchAbs[source]

Bases: RuntimeError

arxiv.canonical.classic.abs.REQUIRED_FIELDS = ['title', 'authors', 'abstract']

Required parsed fields with normalized field names.

Note the absence of ‘categories’ as a required field. A subset of version-affixed .abs files with the old identifiers predate the introduction of categories and therefore do not have a “Categories:” line; only the (higher-level) archive and group can be inferred, and this must be done via the identifier itself.

The latest versions of these papers should always have the “Categories:” line.
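
A short sketch of the two ideas in this note: checking parsed fields against `REQUIRED_FIELDS`, and falling back to the identifier when no categories are available. Both helper functions are hypothetical and are not part of this module. The second assumes the old-style identifier shape `archive/number` (optionally `archive.SUBJECT/number`) and returns only the archive.

```python
REQUIRED_FIELDS = ['title', 'authors', 'abstract']


def missing_required(fields: dict) -> list:
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


def archive_from_old_identifier(identifier: str):
    """Return the archive of an old-style identifier, or None otherwise."""
    prefix, sep, _ = identifier.partition('/')
    if not sep:
        # No slash: not an old-style identifier, nothing to infer.
        return None
    # Drop a subject-class suffix such as the '.GT' in 'math.GT'.
    return prefix.split('.')[0]


missing = missing_required({'title': 'A title', 'authors': 'A. Author'})
archive = archive_from_old_identifier('hep-th/9901001v1')
```

Here `missing` lists `abstract` only; a record with no categories would still pass the check, consistent with ‘categories’ not being required.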

arxiv.canonical.classic.abs.get_path(data_path, identifier)[source]

Return type

str

arxiv.canonical.classic.abs.iter_all(data_path, from_id=None, to_id=None)[source]

List all of the identifiers for which we have abs files.

The “latest” section will have an abs file for every e-print, so that’s the only place we need to look.

Return type

Iterable[Identifier]

arxiv.canonical.classic.abs.latest_base_path(data_path)[source]

Return type

str

arxiv.canonical.classic.abs.latest_path(data_path, identifier)[source]

Return type

str

arxiv.canonical.classic.abs.latest_path_month(data_path, identifier)[source]

Get the base path for the month block containing the “latest” e-prints.

This is where the most recent version of each e-print always lives.

Return type

str

arxiv.canonical.classic.abs.list_versions(data_path, identifier)[source]

List all of the versions for an identifier from abs files.

This works by looking at the presence of abs files in both the “latest” and “original” locations.

Return type

List[VersionedIdentifier]

arxiv.canonical.classic.abs.original_base_path(data_path)[source]

Return type

str

arxiv.canonical.classic.abs.original_path(data_path, identifier)[source]

Return type

str

arxiv.canonical.classic.abs.original_path_month(data_path, identifier)[source]

Get the main base path for an abs file.

This is where all of the versions except for the most recent one live.

Return type

str

arxiv.canonical.classic.abs.parse(data_path, identifier)[source]

Return type

AbsData

arxiv.canonical.classic.abs.parse_first(data_path, identifier)[source]

Parse the abs for the first version of an e-print.

Return type

AbsData

arxiv.canonical.classic.abs.parse_latest(data_path, identifier)[source]

Parse the abs for the latest version of an e-print.

Return type

AbsData

arxiv.canonical.classic.abs.parse_versions(data_path, identifier)[source]

Return type

Iterable[AbsData]