Context & Scope

arXiv.org is a central part of the scholarly publishing model for academic researchers in many fields of physics, mathematics, data science, computer science, economics, and other quantitative disciplines.

This section describes the system context for arXiv. The system context encompasses direct interactions between the arXiv platform and both users and external systems. With minor exceptions, it does not describe how those users and external systems interact.

Business Context

The following user groups and external systems constitute the business context for the arXiv system.

Fig. 1 Business context for arXiv.org.

Submitters & Readers

Researchers submit e-print versions of scholarly works to arXiv.org, usually through the form-based submission interface. They may upload documents in one of a handful of formats (including LaTeX, PDF, and PostScript), provide additional metadata (e.g. title, abstract, co-authors), and select primary and secondary categories under which the submission should be indexed. Submitters may also submit replacements for existing papers, resulting in new versions. During the submission process, submitters may respond to moderator or administrator inquiries and make corrections. Researchers may also appeal for ownership of existing arXiv papers (e.g. as a co-author).
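The submission described above can be pictured as a small record with a completeness check. The following is only an illustrative sketch: the class name, field names, and validation rules are hypothetical and are not taken from the arXiv codebase, and the accepted formats are limited to the three named in the text.

```python
from dataclasses import dataclass, field
from typing import List

# Formats named in the text above; the real system may accept others.
ACCEPTED_FORMATS = {"latex", "pdf", "postscript"}


@dataclass
class Submission:
    """Hypothetical record for one e-print submission (illustrative only)."""

    title: str
    abstract: str
    authors: List[str]
    source_format: str
    primary_category: str
    secondary_categories: List[str] = field(default_factory=list)
    version: int = 1

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        if not self.title.strip():
            found.append("title is empty")
        if not self.abstract.strip():
            found.append("abstract is empty")
        if not self.authors:
            found.append("at least one author is required")
        if self.source_format.lower() not in ACCEPTED_FORMATS:
            found.append(f"unsupported format: {self.source_format}")
        if self.primary_category in self.secondary_categories:
            found.append("primary category repeated as secondary")
        return found

    def replacement(self, **changes) -> "Submission":
        """Model a replacement: same paper, updated fields, next version number."""
        data = {**self.__dict__, **changes, "version": self.version + 1}
        return Submission(**data)
```

For instance, `Submission("T", "A", ["X"], "LaTeX", "hep-th", ["math.MP"]).replacement(abstract="B")` yields a record with `version == 2` and the new abstract, mirroring how a replacement results in a new version of the same paper.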

The arXiv readership is diverse, and readers access the platform in a variety of ways. Some readers receive email notifications about new announcements; others visit arxiv.org directly and browse categorized lists of announcements; others use third-party services (e.g. search indexes) to find arXiv content.

Moderators

Moderators primarily interact with the system through the moderation web interface. That interface allows moderators to make changes to categorizations, hold papers that require review or do not merit announcement, and perform a variety of other actions. Some moderators also interact with the system via email.

Administrators

Administrators interact with the system through the administration web interface. Admins respond to moderator actions, adjudicate disputes and appeals, and monitor the system. In the classic system, moderators also interact with the RequestTracker (RT) service (external) to track submitter requests. In the NG system, the functionality provided by RT should be provided by the arXiv application.

Administrators also interact with iThenticate <http://www.ithenticate.com/>, a plagiarism detection service under contract with Cornell University Library (CUL), to investigate possible problems with submissions.

Power users

In the classic system, some power users have direct read access to the core database. In the NG system, all access to data is mediated by RESTful APIs.

CULAR

We have collaborated with the Cornell University Library Archival Repository to explore long-term preservation of arXiv.org announcements. We provide a protected endpoint from which CULAR staff are able to download monthly compressed archives from arXiv.org web servers.

Ginsparg Tools & Services

Paul Ginsparg and his research staff operate a collection of tools and services to support quality assurance processes in the submission system. Some of those tools perform automated tasks via special-purpose APIs and web interfaces. Other tools (e.g. the classifier) provide APIs upon which the arXiv submission system depends.

External platforms, applications, & databases

External metadata feeds

External organizations (e.g. ADS, INSPIRE, APS, EDP, MSP) provide access to supplementary metadata that are used to enhance arXiv.org announcements. For example, externally derived metadata are used to generate links from arXiv e-print announcements to “final” papers published by peer-reviewed journals.

Peers & collaborators

arXiv is embedded in a rich landscape of organizations and services that support scholarly knowledge-sharing. This includes scholarly societies, authoring platforms, editorial platforms, journals, data repositories, and other e-print publishers.

We have long-standing relationships with several scholarly data systems that consume arXiv metadata and provide alternative interfaces to arXiv papers. These include:

External developers/API consumers

There exists a large and diverse ecosystem of apps, libraries, services, and other software created by external developers and researchers that consume arXiv metadata and content programmatically. In NG, these requests are serviced by an API Gateway that handles authentication and routing, and provides comprehensive documentation for arXiv services.

Examples of external interfaces to arXiv content include:

Existing integrations include metadata retrieval from trusted partners, programmatic submission via the SWORD API, opt-in author identity linkage using ORCID, and opt-in annotation using Hypothes.is.

Many external systems retrieve metadata about arXiv.org announcements via an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) endpoint.
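As a rough illustration of what such harvesting involves, the sketch below builds an OAI-PMH `ListRecords` request URL and extracts identifiers and Dublin Core titles from a response. The verb, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 protocol; the function names are hypothetical, and the base URL is left as a parameter because the text above does not give the address of the arXiv endpoint.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI_NS}record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        records.append({"identifier": identifier, "title": title})
    # A missing or empty token means there are no further pages.
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)
```

A harvester would fetch the first URL, parse the response, and keep requesting with the returned resumption token until none is returned. Fetching is omitted here so that the sketch stays independent of any particular HTTP client or endpoint.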

Overlay applications

In addition to external data systems, third-party developers have generated a range of tools designed to be used as an overlay to the arXiv platform. In general, these are browser plugins and add-ons that allow users to interact with arXiv content in various ways.

For example:

  • Hypothes.is is an open-source project that enables users to comment on web content. Many Hypothes.is users have annotated arXiv papers, e.g. as part of journal clubs.

  • The Fermat's Library project offers a browser plugin called Librarian <https://fermatslibrary.com/librarian> that provides cited references, comments, and BibTeX citations as an overlay on arXiv.org PDFs.

Overlay journals

Overlay journals aggregate arXiv papers into peer-reviewed online announcements. While there are a variety of operational models, a typical use-case is for authors to deposit papers in arXiv and submit their arXiv ID to the overlay journal for peer review. Accepted papers continue to be hosted on the arXiv platform; the overlay journal is effectively a special-purpose index of selected arXiv papers.
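The idea of an overlay journal as a special-purpose index can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real overlay journal: the class, its method names, and the status values are hypothetical, and only new-style arXiv identifiers are recognized.

```python
import re

# New-style arXiv identifiers, e.g. 1706.03762 or 1706.03762v2.
# Old-style identifiers (e.g. hep-th/9901001) are deliberately not handled.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


class OverlayJournal:
    """Hypothetical overlay journal: tracks arXiv IDs, hosts no papers itself."""

    def __init__(self, name):
        self.name = name
        self._status = {}  # arXiv ID -> "under_review" | "accepted" | "rejected"

    def submit(self, arxiv_id):
        """An author submits the ID of a paper already deposited in arXiv."""
        if not ARXIV_ID.match(arxiv_id):
            raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
        self._status.setdefault(arxiv_id, "under_review")

    def decide(self, arxiv_id, accepted):
        """Record the outcome of peer review for a paper under review."""
        if self._status.get(arxiv_id) != "under_review":
            raise KeyError(f"{arxiv_id} is not under review")
        self._status[arxiv_id] = "accepted" if accepted else "rejected"

    def index(self):
        """Accepted papers as links back to arXiv, which continues to host them."""
        return [
            f"https://arxiv.org/abs/{arxiv_id}"
            for arxiv_id, status in sorted(self._status.items())
            if status == "accepted"
        ]
```

The point of the sketch is that the journal stores only identifiers and review outcomes; the papers themselves stay on the arXiv platform.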

arXiv Mirror Sites

In addition to the main site at Cornell University Library, there are several mirror sites for arXiv content. We are in the process of discontinuing the arXiv mirror network. Geographic locality of servers is much less useful now than it was when the mirror network was established, and maintaining support for mirrors impedes development of new features on arXiv.

Current mirrors include:

  • lanl.arXiv.org (née xxx.lanl.gov, U.S. mirror at Los Alamos)

  • cn.arXiv.org (China)

  • de.arXiv.org (Germany)

  • in.arXiv.org (India)

  • es.arXiv.org (Spain)

  • front.math.ucdavis.edu

Stakeholders

In no particular order, the main arXiv stakeholder groups (and contacts) and arXiv-NG advisory groups are listed below.

arXiv Users.

The arXiv user base is large, and users interact with the platform in a variety of ways and bring a variety of opinions and expectations. User input is collected through surveys, polls, focus groups, usability testing, and other venues. Gail Steinhart, Program Associate, is the team’s lead on user engagement.

Scientific Advisory Board (SAB).

“arXiv’s Scientific Advisory Board is composed of scientists and researchers in disciplines covered by arXiv. The Board provides advice and guidance pertaining to the repository’s intellectual oversight, with a particular focus on the policies and operation of arXiv’s moderation system. The Board is governed by bylaws that detail its duties, composition, and operation as well as the election of its members.” [1] David Morrison is the current SAB chair. Steinn Sigurdsson (Scientific Director) facilitates the activities of the SAB and works closely with the arXiv team.

Member Advisory Board (MAB).

MAB members represent the views of large user blocks as well as external information systems that integrate with arXiv. “Based on the arXiv’s operating principles, MAB represents participating institutions’ interests and advises CUL on issues related to repository management and development, standards implementation, interoperability, development priorities, business planning, and outreach and advocacy.

Representation on MAB is reserved for libraries, research institutions, laboratories, and foundations that are members of arXiv and that contribute to the financial support of the service.” [2] The MAB currently has no chair.

Scientific Director.

The Scientific Director is the chief stakeholder representing the Scientific Advisory Board. For details, see the arXiv Scientific Director job posting. Steinn Sigurdsson is the current Scientific Director.

arXiv Administrators.

Administrators manage moderation and user support, and monitor the technical infrastructure. The arXiv NG system should significantly improve the ability of administrators to monitor and control all aspects of the arXiv platform. Jim Entwood, Operations Manager, is the development team’s primary contact with administrators.

arXiv Moderators.

Moderators are volunteers who screen submissions to arXiv. “The arXiv moderators are experts in their fields and in the types of submissions that are appropriate for their subject classifications. They evaluate based on the content of the submission and the policies of arXiv.” [3] Providing improved functionality for the moderation team is a significant priority for this project. Depending on the need, Jim Entwood or Gail Steinhart may serve as the development team’s primary contact with moderators.

Program Director.

The Program Director oversees membership, business planning, and governance, and participates in identifying technical requirements and setting development priorities. Oya Rieger is the arXiv Program Director.

External developers (API Consumers).

A significant ecosystem of applications, wrappers, and research projects has emerged around arXiv’s public APIs. Some of those external applications provide significant value to arXiv readers, e.g. by providing advanced search functionality, article discovery, and alt-metrics. The arxiv-api Google group is the primary venue for communication with external API consumers. Erick Peirson, Lead Architect, and Martin Lessmeister, IT Lead, are the primary contacts with external developers.

Development Team.

The arXiv development team is responsible for engineering and implementing arXiv software. In addition to generating value for end users, the arXiv NG process should significantly improve the ability of the development team to respond to stakeholder requests and produce new functionality. Martin Lessmeister, IT Lead, is the primary contact for the development team.

arXiv-NG advisory groups.

As specified in the arXiv NG Project Charter, two advisory groups also inform the decision making processes of the arXiv-NG team:

  • arXiv-NG IT Advisory Group: composed of IT experts with particular strength in supporting the sciences, the IT Advisory Group provides input on technology and partnership choices.

  • arXiv-NG Steering Group: composed of members of arXiv’s leadership team and delegates from the MAB and SAB, this group advises the Program Director on high-level decisions.

These groups were created specifically for the NG project.

See also:

[1] https://arxiv.org/help/scientific_ad_board

[2] https://confluence.cornell.edu/display/arxivpub/Member+Advisory+Board

[3] https://arxiv.org/help/moderation