
Legacy Issues

As might be expected of a system more than 30 years old, arXiv suffers from a number of legacy issues.

Growing requirements

  • arXiv usage is growing
  • heading from about 150k per year to 00k+
  • lots of delayed feature requests
  • Example: research agencies want funding metadata attached to articles
  • accessibility requirements

Inflexible infrastructure

  • services run directly on VMs
  • a new VM takes several days to configure
  • configuration is a manual process based on a ~30-item checklist
  • we know there are differences between our VM nodes, and fixing this is hard
  • no comprehensive automated test suite to verify that a VM is fully functional
  • CentOS 7, which is currently deployed on nearly all VMs, will reach end of life (EOL) in a year
  • we're running an unsupported version of Apache because newer versions would conflict with some of the legacy packages we use

Inadequate SecOps

  • deploys are largely manual and are performed on one web node at a time
  • no comprehensive test suites
  • very few tests at all
  • too many Git repositories, which makes it hard to deploy a set of coordinated components atomically
  • monitoring and alerting facilities are limited

Aging code base

  • The submission process is almost entirely legacy Perl code
  • Perl programmers are hard to find
  • We don't want to continue writing new Perl code
  • The user management system is legacy PHP code
  • based on a PHP version that's no longer supported
  • breaking changes in newer PHP versions make upgrading problematic
  • highly problematic TeX/LaTeX pipeline
  • a number of old articles no longer build in the automated process
  • we fell two years behind in the version of TeX we use for new articles
  • a LaTeX article that renders correctly in Overleaf might not compile on arXiv