arxiv.canonical.domain.content module

Core concepts for characterizing bitstream/version content.

class arxiv.canonical.domain.content.ContentType[source]

Bases: enum.Enum

Characterization of the content type of an individual bitstream.

abs = 'abs'
dvi = 'dvi'
property ext

The preferred filename extension for this ContentType.

Return type

str

classmethod from_filename(filename)[source]
classmethod from_mimetype(mime)[source]
html = 'html'
json = 'json'
make_filename(identifier, is_gzipped=False)[source]

Make a filename for a bitstream with this ContentType.

Return type

str

property mime_type

The MIME content type for this ContentType.

Return type

str

pdf = 'pdf'
ps = 'ps'
tar = 'tar'
tex = 'tex'
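
As an illustration of how the members above fit together, here is a minimal, self-contained sketch. It does not import arxiv.canonical: `SketchContentType`, the `_MIME_TYPES` table, and the `<identifier>.<ext>[.gz]` filename layout are assumptions made for this example, not the library's actual implementation.

```python
from enum import Enum


class SketchContentType(Enum):
    """Hypothetical stand-in for ContentType; not the real class."""

    pdf = 'pdf'
    ps = 'ps'
    html = 'html'

    @property
    def ext(self) -> str:
        # Assumption: the preferred extension is simply the member value.
        return self.value

    @property
    def mime_type(self) -> str:
        # Assumption: a small hand-written lookup table (defined below).
        return _MIME_TYPES[self]

    def make_filename(self, identifier: str, is_gzipped: bool = False) -> str:
        # Assumption: "<identifier>.<ext>", plus ".gz" when gzipped.
        name = f'{identifier}.{self.ext}'
        return f'{name}.gz' if is_gzipped else name

    @classmethod
    def from_filename(cls, filename: str) -> 'SketchContentType':
        # Strip an optional ".gz" suffix, then match on the last extension.
        # Raises ValueError if the extension is not a known member value.
        stem = filename[:-3] if filename.endswith('.gz') else filename
        return cls(stem.rsplit('.', 1)[-1])


# Kept outside the class body so that it does not become an enum member.
_MIME_TYPES = {
    SketchContentType.pdf: 'application/pdf',
    SketchContentType.ps: 'application/postscript',
    SketchContentType.html: 'text/html',
}

print(SketchContentType.ps.make_filename('2101.00001v1', is_gzipped=True))
```

In this sketch, `from_filename` and `make_filename` are inverses for the extensions covered; the real class may handle more types and edge cases.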
arxiv.canonical.domain.content.DISSEMINATION_FORMATS_BY_SOURCE_EXT = [('.tar.gz', None), ('.tar', None), ('.dvi.gz', None), ('.dvi', None), ('.pdf', [<ContentType.pdf: 'pdf'>]), ('.ps.gz', [<ContentType.pdf: 'pdf'>, <ContentType.ps: 'ps'>]), ('.ps', [<ContentType.pdf: 'pdf'>, <ContentType.ps: 'ps'>]), ('.html.gz', [<ContentType.html: 'html'>]), ('.html', [<ContentType.html: 'html'>]), ('.gz', None)]

Dissemination formats that can be inferred from source file extension.

Note

This is largely to support format discovery in classic. In the NG canonical record, this should all be explicit.

class arxiv.canonical.domain.content.SourceFileType[source]

Bases: enum.Enum

Source file types are represented by single-character codes.

Ancillary = 'A'

Submission includes ancillary files in the /anc directory.

DCPilot = 'B'

Submission has associated data in the DC pilot system.

DOCX = 'X'

Submission in Microsoft DOCX (Office Open XML) format.

HTML = 'H'

Multi-file HTML submission.

Ignore = 'I'

All files are automatically ignored; no paper is available.

ODF = 'O'

Submission in Open Document Format.

PDFLaTeX = 'D'

A TeX submission that must be processed with PDFLaTeX.

PDFOnly = 'F'

PDF-only submission with a .tar.gz package (likely because of ancillary files).

PostscriptOnly = 'P'

Multi-file PS submission.

It is not necessary to indicate P for a single-file PS submission, since in that case the source file has a .ps.gz extension.

SourceEncrypted = 'S'

Source is encrypted and should not be made available.

class arxiv.canonical.domain.content.SourceType(value)[source]

Bases: str

Characterizes a version source package.

property available_formats

List the available dissemination formats for this source type.

Depending on the original source type, we may not be able to provide all supported formats.

This does not include the source format. Note also that this does not enforce rules about what should be displayed as an option or provided to end users.

Return type

List[ContentType]

property has_docx

Indicate whether the source has DOCX content.

Return type

bool

property has_encrypted_source

Indicate whether the source is encrypted.

Return type

bool

property has_html

Indicate whether the source has HTML content.

Return type

bool

property has_ignore

Indicate whether the source content should be ignored.

Return type

bool

property has_odf

Indicate whether the source has ODF content.

Return type

bool

property has_pdf_only

Indicate whether the source contains only a PDF.

Return type

bool

property has_pdflatex

Indicate whether the source has PDFLaTeX content.

Return type

bool

property has_ps_only

Indicate whether the source has PostScript content only.

Return type

bool
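
Because SourceType subclasses str and SourceFileType uses single-character codes, one plausible reading is that a source type is a string of such codes and each has_* property tests for one of them. The sketch below illustrates that idea only. `SketchSourceFileType`, `SketchSourceType`, and the plain membership test are assumptions; the real class may treat case or ordering differently.

```python
from enum import Enum


class SketchSourceFileType(Enum):
    """Subset of the single-character codes documented for SourceFileType."""

    Ancillary = 'A'
    PDFLaTeX = 'D'
    PDFOnly = 'F'
    SourceEncrypted = 'S'


class SketchSourceType(str):
    """Hypothetical stand-in: a string of single-character codes."""

    def _has(self, code: SketchSourceFileType) -> bool:
        # Assumption: a flag is set when its code appears in the string.
        return code.value in self

    @property
    def has_pdf_only(self) -> bool:
        return self._has(SketchSourceFileType.PDFOnly)

    @property
    def has_encrypted_source(self) -> bool:
        return self._has(SketchSourceFileType.SourceEncrypted)

    @property
    def has_pdflatex(self) -> bool:
        return self._has(SketchSourceFileType.PDFLaTeX)


source_type = SketchSourceType('AF')
print(source_type.has_pdf_only)          # True
print(source_type.has_encrypted_source)  # False
```

Subclassing str keeps the value trivially serializable while still allowing descriptive properties on top of it.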

arxiv.canonical.domain.content.available_formats_by_ext(filename)[source]

Attempt to determine the available dissemination formats by file extension.

It is sometimes (but not always) possible to infer the available dissemination formats based on the filename extension of the source package.

Note

This is largely to support format discovery in classic. In the NG canonical record, this should all be explicit.

Return type

Optional[List[ContentType]]
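
A lookup of this kind can be sketched as a first-match scan over the ordered table. The version below is an assumption about the approach, not the library's code: `sketch_formats_by_ext` is a hypothetical name, and plain strings stand in for ContentType members so the example has no dependencies.

```python
from typing import List, Optional, Tuple

# Mirrors DISSEMINATION_FORMATS_BY_SOURCE_EXT, with strings in place of
# ContentType members. None means the formats cannot be inferred.
FORMATS_BY_EXT: List[Tuple[str, Optional[List[str]]]] = [
    ('.tar.gz', None), ('.tar', None), ('.dvi.gz', None), ('.dvi', None),
    ('.pdf', ['pdf']),
    ('.ps.gz', ['pdf', 'ps']), ('.ps', ['pdf', 'ps']),
    ('.html.gz', ['html']), ('.html', ['html']),
    ('.gz', None),
]


def sketch_formats_by_ext(filename: str) -> Optional[List[str]]:
    """Return the formats for the first matching extension, else None."""
    # Assumption: the first match wins, so specific extensions such as
    # '.ps.gz' must be listed before the catch-all '.gz'.
    for ext, formats in FORMATS_BY_EXT:
        if filename.endswith(ext):
            return formats
    return None


print(sketch_formats_by_ext('2101.00001v1.ps.gz'))
print(sketch_formats_by_ext('2101.00001v1.tar.gz'))
```

Callers need to treat a None result as "unknown" rather than "no formats", which is why the return type is Optional.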

arxiv.canonical.domain.content.list_source_extensions()[source]

List all of the known filename extensions for source files.

Return type

List[str]
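
One use for such a list is splitting a source filename into its stem and extension. The sketch below assumes the known extensions are the keys of DISSEMINATION_FORMATS_BY_SOURCE_EXT in the same order; `SOURCE_EXTENSIONS` and `split_source_filename` are hypothetical names introduced for this example.

```python
from typing import List, Optional, Tuple

# Assumption: the known source extensions are the keys of
# DISSEMINATION_FORMATS_BY_SOURCE_EXT, kept in the same order.
SOURCE_EXTENSIONS: List[str] = [
    '.tar.gz', '.tar', '.dvi.gz', '.dvi', '.pdf',
    '.ps.gz', '.ps', '.html.gz', '.html', '.gz',
]


def split_source_filename(filename: str) -> Tuple[str, Optional[str]]:
    """Split a filename into (stem, extension) using the known extensions."""
    # Compound extensions precede '.gz', so '.tar.gz' is not split as '.gz'.
    for ext in SOURCE_EXTENSIONS:
        if filename.endswith(ext):
            return filename[:-len(ext)], ext
    # No known extension: return the filename unchanged.
    return filename, None


print(split_source_filename('2101.00001v1.tar.gz'))
```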