Infrastructure & deployment

Three principal concerns drive the infrastructure & deployment strategy for the classic renewal process:

  • Portability/interoperability between CUL and cloud infrastructure;

  • Maximizing security and stability, in part by minimizing complexity;

  • Reducing barriers to rapid development and deployment.

Interoperability

Given the need to maintain integration with the classic arXiv system throughout the classic renewal process, the NG architecture must support a staged deployment strategy. As outlined in Incremental decoupling, many subsystems will be deployed on existing infrastructure and later migrated to the cloud; other subsystems may be directly deployed to cloud infrastructure.

Security, stability, complexity

The arXiv development team is responsible not only for the actual development of arXiv software but also for the bulk of infrastructure management. The team is small, which heightens the need for technology choices that (a) make it easier to do things right, and (b) minimize the number of distinct tools and technologies required to operate the system.

Rapid development and deployment

The classic renewal process will require significant retooling within the dev team, e.g. transitioning from Perl to Python as the primary development language, adopting a new application framework, etc. Our deployment strategy should balance considerations like vendor lock-in against increased burden on the dev team to support a larger number of unfamiliar technologies.

The following technology choices will help the IT team to address those concerns throughout the project.

Amazon Web Services

The arXiv-NG system will be deployed in Amazon Web Services, primarily on top of the Elastic Compute Cloud (EC2) platform.

We will also utilize data store and cache services, including:

  • Relational Database Service (RDS); provides MySQL servers, and tools for redundancy and backup.

  • DynamoDB; a distributed NoSQL data store.

  • Simple Storage Service (S3); a blob-store, which we will use primarily for storing and serving announcement content (source packages, PDFs).

  • Amazon Elasticsearch Service; a managed deployment of Elasticsearch, to back the search service.

A June 2017 report by Gartner ranks AWS as the top Infrastructure-as-a-Service provider in terms of maturity and risk, followed by Microsoft Azure and Google Cloud._[#] Cornell University Library is already engaged with AWS, and the dev team already has some AWS expertise, which provides a clearer migration path than alternative platforms would. At the same time, avoiding lock-in with AWS is important; implementation choices will reflect this caution, while balancing the risk of tight coupling against dev team productivity goals and overall system complexity.
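One way to limit coupling to a specific AWS service is to have application code depend on a small interface rather than on the vendor SDK directly. The following is a minimal sketch of that idea, not the project's actual design: the names `ContentStore`, `InMemoryStore`, and `archive_pdf`, and the key layout, are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ContentStore(ABC):
    """Hypothetical interface for storing announcement content."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store a blob under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the blob stored under the given key."""


class InMemoryStore(ContentStore):
    """Stand-in backend for tests and local development."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_pdf(store: ContentStore, paper_id: str, pdf: bytes) -> str:
    """Store a PDF; depends only on the interface, not on S3."""
    key = f"pdf/{paper_id}.pdf"  # assumed key layout, for illustration only
    store.put(key, pdf)
    return key
```

An S3-backed class would implement the same two methods, so replacing the blob store later would touch one class rather than every caller. The cost is an extra layer of indirection, which is the kind of trade-off against complexity described above.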

Docker

Docker is an open source containerization system with broad support across cloud platforms and operating systems.

In the NG architecture, each independently-deployable service/subsystem is released as a Docker “image”, which includes all of (and only) the dependencies required for that application to run. This enormously simplifies the configuration of development and deployment environments (they need only run Docker), increases isolation (which imparts security properties), and minimizes dependence on any particular server environment.

Using Docker lowers the cost and complexity of withdrawing from AWS in the future, or of modifying our deployment strategy within AWS, since some of the configuration details that otherwise would be AWS- and product-specific are internalized by the Docker image.

Application builds will be distributed via a private Docker registry (e.g. Amazon EC2 Container Registry).

Kubernetes

We use Kubernetes to deploy, scale, and manage containerized services in AWS. The core functionality of Kubernetes is scheduling and monitoring Docker containers in a cluster environment. Kubernetes manages a cluster of virtual servers (AWS EC2 instances); Docker images describing arXiv-NG applications are deployed as “pods” (groups of one or more containers) distributed across that cluster. Kubernetes helps us to address the following problems in the arXiv-NG architecture:

  • Networking: manages AWS resources including VPCs, load balancers, etc.

  • Deployment:

      - Deploying new applications.

      - Deploying updates to existing applications with no downtime.

  • Load-balancing & circuit-breaking:

      - Spreading load effectively across multiple instances of an application.

      - Routing requests away from unhealthy application instances.

      - Horizontal scaling of applications based on customizable metrics.

      - Horizontal scaling of the underlying cluster.

  • Service discovery.

  • API gateway.

  • Monitoring.
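To make the horizontal scaling item in the list above more concrete, here is a small sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: the desired replica count is the current count scaled by the ratio of the observed metric to the target metric, rounded up. The function name, the 10% tolerance default, and the floor of one replica are assumptions made for this illustration; actual behavior depends on the autoscaler configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Estimate how many replicas an autoscaler would request.

    Follows the documented rule
        desired = ceil(current_replicas * current_metric / target_metric)
    and skips scaling when the ratio is within `tolerance` of 1.0.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        return current_replicas
    # Assumed floor of one replica, so the service is never scaled to zero.
    return max(1, math.ceil(current_replicas * ratio))
```

For example, four replicas averaging 90% CPU against a 60% target would be scaled to six, while four replicas averaging 62% would be left alone because the ratio falls within the tolerance.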

Kubernetes uses a declarative resource description model, which provides a straightforward way to document deployments and configurations.
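As an illustration of what a declarative resource description looks like, the sketch below builds a Deployment description as a plain Python dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. The service name, image path, port, and replica count are hypothetical, and the `apps/v1` API version is an assumption that depends on the cluster version in use.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str,
                        replicas: int = 2, port: int = 8000) -> Dict[str, Any]:
    """Describe the desired state of one service; Kubernetes reconciles to it."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumed; older clusters use a beta version
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": labels},
            # Rolling updates replace pods gradually to avoid downtime.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image location.
    manifest = deployment_manifest(
        "search", "registry.example.org/arxiv/search:0.1")
    print(json.dumps(manifest, indent=2))
```

Because the description states what should exist rather than the steps to create it, the same file can be kept under version control and serves as documentation of the deployed configuration.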

Kubernetes provides another layer of protection against vendor lock-in: it can be deployed on multiple cloud platforms including Google Compute Engine and Microsoft Azure, as well as on-site infrastructure. This would facilitate transition in the case that arXiv were to move away from AWS in the future.