Search Interface

The current version of the arXiv search application is designed to meet the goals outlined in arXiv-NG milestone H1: Replace Legacy Search.

  • H1.1. Replace the current advanced search interface, search results, and search by author name.
  • H1.2. The search result view should support pagination and ordering by publication date or relevance.
  • H1.3. An indexing agent updates the search index at publication time in response to a Kinesis notification, using metadata from the docmeta endpoint in the classic system.
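
As a rough illustration of H1.2, pagination and ordering map naturally onto the `from`, `size`, and `sort` keys of an Elasticsearch search request body. The sketch below is a minimal example under assumptions: the function name `build_paging_and_order`, the default page size of 25, and the field name `submitted_date` are hypothetical and are not taken from the actual search application.

```python
from typing import Any, Dict

# Hypothetical field name; the real index mapping may differ.
DATE_FIELD = "submitted_date"


def build_paging_and_order(page: int, per_page: int = 25,
                           order: str = "relevance") -> Dict[str, Any]:
    """Return the paging and sort portion of an Elasticsearch search body."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if per_page < 1:
        raise ValueError("per_page must be positive")
    if order == "relevance":
        # Sort by the relevance score computed by Elasticsearch.
        sort = [{"_score": {"order": "desc"}}]
    elif order == "date":
        # Newest papers first.
        sort = [{DATE_FIELD: {"order": "desc"}}]
    else:
        raise ValueError(f"unsupported order: {order!r}")
    # "from" is a zero-based offset into the full result set.
    return {"from": (page - 1) * per_page, "size": per_page, "sort": sort}
```

For example, `build_paging_and_order(3, 25, "date")` would request results 51 through 75, newest first. The returned dictionary would be merged with the query portion of the request before it is sent to the cluster.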

Key Requirements

  • Simple search:
    • Users should be able to search for arXiv papers by title, author, and abstract.
    • Searches can originate from any part of the arXiv.org site, via the search bar in the site header.
  • Advanced search:
    • Users can search for papers using boolean combinations of search terms on title, author names, and/or abstract.
    • Users can filter results by primary classification and submission date.
    • The submission date filter supports three modes: prior year, a specific year, or a date range.
  • Author name search:
    • Users should be able to search for papers by author name.
    • This should support queries originating on the abs page and in search results.
  • UI: The overall flavor of the search views should be substantially similar to the classic views, but with styling that improves readability, usability, and accessibility.
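
The advanced search requirements above could be translated into an Elasticsearch `bool` query along the following lines. This is a sketch under stated assumptions, not the application's actual query builder: the names `date_filter` and `build_query`, the field names `submitted_date` and `primary_classification`, and the reading of "prior year" as the 365 days before a reference date are all hypothetical. It also maps AND/OR/NOT onto a single flat `bool` query rather than parsing nested boolean expressions.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Optional, Tuple

Term = Tuple[str, str, str]  # (operator, field, text), e.g. ("AND", "title", "quantum")

# Hypothetical field names; the real index mapping may differ.
DATE_FIELD = "submitted_date"
PRIMARY_FIELD = "primary_classification"

# Flat mapping of boolean operators onto bool-query clauses.
_CLAUSE = {"AND": "must", "OR": "should", "NOT": "must_not"}


def date_filter(mode: str, today: date, year: Optional[int] = None,
                start: Optional[date] = None,
                end: Optional[date] = None) -> Dict[str, Any]:
    """Build a range filter for one of the three submission date modes."""
    if mode == "prior_year":
        # Assumption: "prior year" means the 365 days up to and including today.
        lo, hi = today - timedelta(days=365), today + timedelta(days=1)
    elif mode == "specific_year":
        if year is None:
            raise ValueError("specific_year requires a year")
        lo, hi = date(year, 1, 1), date(year + 1, 1, 1)
    elif mode == "date_range":
        if start is None or end is None or end < start:
            raise ValueError("date_range requires start <= end")
        lo, hi = start, end + timedelta(days=1)  # make the end date inclusive
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {"range": {DATE_FIELD: {"gte": lo.isoformat(), "lt": hi.isoformat()}}}


def build_query(terms: List[Term], primary: Optional[str] = None,
                dates: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Combine search terms and filters into a single bool query."""
    clauses: Dict[str, List[Dict[str, Any]]] = {}
    for operator, field, text in terms:
        if field not in ("title", "author", "abstract"):
            raise ValueError(f"unsupported field: {field!r}")
        clauses.setdefault(_CLAUSE[operator.upper()], []).append(
            {"match": {field: text}})
    # Filters restrict the result set without affecting relevance scores.
    if primary is not None:
        clauses.setdefault("filter", []).append({"term": {PRIMARY_FIELD: primary}})
    if dates is not None:
        clauses.setdefault("filter", []).append(dates)
    return {"query": {"bool": clauses}}
```

A query for papers with "quantum" in the title but not "survey" in the abstract, restricted to one category and one year, might then be built as `build_query([("AND", "title", "quantum"), ("NOT", "abstract", "survey")], primary="quant-ph", dates=date_filter("specific_year", date.today(), year=2017))`.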

Quality Goals

  • Code quality:
    • 90% test coverage on Python components that we develop/control.
    • Linting: pylint passes with >= 9/10.
    • Documentation: pydocstyle passes.
    • Static checking: mypy passes.
  • Performance & reliability:
    • Response time: 99% of requests have a latency of 1 second or less.
    • Error rate: parity with classic search.
    • Request rate: support three times the request volume of the existing search (a safety factor of 3).
  • Accessibility: meet or exceed WCAG 2.0 Level A.
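
The performance goals above can be checked mechanically against measured data. The following is a minimal sketch that assumes latencies are available as a list of per-request durations in seconds. The function names are hypothetical, and the nearest-rank method is only one of several common ways to compute a percentile.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_goal(latencies_s: Sequence[float], limit_s: float = 1.0) -> bool:
    """True if 99% of requests completed within limit_s seconds."""
    return percentile_nearest_rank(latencies_s, 99) <= limit_s


def target_request_rate(classic_rate: float, safety_factor: float = 3.0) -> float:
    """Request rate the new search should sustain, given the classic rate."""
    return classic_rate * safety_factor
```

With 100 samples, the goal tolerates exactly one request slower than the limit; a second slow request pushes the 99th percentile over it.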

Constraints

  • Must be implemented in Python/Flask, and be deployable behind Apache as a Python/WSGI application.
  • The search application itself must be stateless. It must be able to connect to an arbitrary Elasticsearch cluster, specified via configuration.
  • Notifications about new content are delivered via the Kinesis notification broker.
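
One way to satisfy the statelessness and configuration constraints is to read the cluster location from the process environment at startup, so that the same code can be pointed at any Elasticsearch cluster without modification. The sketch below is illustrative only: the variable names `ELASTICSEARCH_HOST`, `ELASTICSEARCH_PORT`, `ELASTICSEARCH_SCHEME`, and `ELASTICSEARCH_INDEX`, the default index name, and both function names are assumptions rather than the application's actual settings.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_search_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read Elasticsearch connection settings from the environment.

    All variable names here are hypothetical. Passing a mapping instead of
    relying on os.environ keeps the function easy to test.
    """
    env = os.environ if env is None else env
    scheme = env.get("ELASTICSEARCH_SCHEME", "http")
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return {
        "host": env.get("ELASTICSEARCH_HOST", "localhost"),
        # 9200 is the default Elasticsearch HTTP port.
        "port": int(env.get("ELASTICSEARCH_PORT", "9200")),
        "scheme": scheme,
        "index": env.get("ELASTICSEARCH_INDEX", "arxiv"),
    }


def cluster_url(config: Mapping[str, Any]) -> str:
    """Build the base URL of the configured cluster."""
    return f"{config['scheme']}://{config['host']}:{config['port']}"
```

Because nothing is stored between requests, any number of identical application processes could be run behind Apache and pointed at the same cluster.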