Infrastructure & deployment
***************************
Three principal concerns drive the infrastructure & deployment strategy for the
classic renewal process:

- Portability/interoperability between CUL and cloud infrastructure;
- Maximizing security and stability, in part by minimizing complexity;
- Reducing barriers to rapid development and deployment.

Interoperability
  Given the need to maintain integration with the classic arXiv system
  throughout the classic renewal process, the NG architecture must support a
  staged deployment strategy. As outlined in :ref:`incremental-decoupling`,
  many subsystems will be deployed on existing infrastructure and later
  migrated to the cloud; other subsystems may be directly deployed to cloud
  infrastructure.

Security, stability, complexity
  The arXiv development team is responsible not only for the actual
  development of arXiv software but also for the bulk of infrastructure
  management. The team is small, which heightens the need for technology
  choices that (a) make it easier to do things right, and (b) minimize the
  number of distinct tools and technologies required to operate the system.

Rapid development and deployment
  The classic renewal process will require significant retooling within the
  dev team, e.g. transitioning from Perl to Python as the primary development
  language, adopting a new application framework, etc. Our deployment
  strategy should balance considerations like vendor lock-in against the
  increased burden on the dev team of supporting a larger number of
  unfamiliar technologies.

The following technology choices will help the IT team to address those
concerns throughout the project.
Amazon Web Services
===================
The arXiv-NG system will be deployed in Amazon Web Services, primarily on top
of the Elastic Compute Cloud (EC2) platform.
We will also utilize data store and cache services, including:

- Relational Database Service (RDS): provides MySQL servers, plus tools for
  redundancy and backup.
- DynamoDB: a distributed NoSQL data store.
- Simple Storage Service (S3): a blob store, which we will use primarily for
  storing and serving announcement content (source packages, PDFs).
- Amazon Elasticsearch Service: a managed deployment of Elasticsearch, to back
  the search service.

A June 2017 report by Gartner ranks AWS as the top
Infrastructure-as-a-Service provider in terms of maturity and risk, followed
by Microsoft Azure and Google Cloud. [#]_ Cornell University Library is
already engaged with AWS, and the dev team already has some AWS expertise,
which provides a clearer migration path than alternative platforms. At the
same time, avoiding lock-in with AWS is important; implementation choices
will reflect this caution, but also balance the risk of tight coupling against
dev team productivity goals and overall system complexity.
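
One way to limit coupling to a specific AWS product is to have application code depend on a narrow storage interface rather than on a vendor SDK. The following is a minimal sketch under that assumption; ``ContentStore``, ``FilesystemStore``, and ``publish_pdf`` are hypothetical names chosen for illustration and are not part of the arXiv-NG codebase. An S3-backed class exposing the same two methods could be swapped in without changing callers.

```python
from pathlib import Path
from typing import Protocol


class ContentStore(Protocol):
    """Hypothetical interface that application code depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class FilesystemStore:
    """Stores blobs under a local directory (e.g. on existing servers)."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys that would resolve outside the storage root.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def publish_pdf(store: ContentStore, paper_id: str, pdf: bytes) -> str:
    """Write a PDF through the interface and return the key it was stored under."""
    key = f"pdf/{paper_id}.pdf"
    store.put(key, pdf)
    return key
```

Because ``publish_pdf`` only sees the interface, the trade-off described above becomes a per-service decision: a service can call a vendor SDK directly where productivity matters most, and go through an abstraction like this where portability matters more.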
Docker
======
Docker is an open source containerization system
with broad support across cloud platforms and operating systems.
In the NG architecture, each independently-deployable service/subsystem is
released as a Docker "image", which includes all of (and only) the dependencies
required for that application to run. This enormously simplifies the
configuration of development and deployment environments (they need only run
Docker), increases isolation (which imparts security properties), and minimizes
dependence on any particular server environment.
Using Docker lowers the cost and complexity of withdrawing from AWS in the
future, or of modifying our deployment strategy within AWS, since some of the
configuration details that otherwise would be AWS- and product-specific are
internalized by the Docker image.
Application builds will be disseminated via a private Docker registry
(e.g. the Amazon EC2 Container Registry).
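
For a single image to run unchanged on a developer laptop, on existing servers, and in the cloud, environment-specific values have to come from outside the image. A common convention, assumed here and not prescribed by this document, is to read them from environment variables at startup. The variable names and defaults below are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    search_endpoint: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, with local-dev defaults."""
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///dev.db"),
        search_endpoint=env.get("SEARCH_ENDPOINT", "http://localhost:9200"),
        # Only the literal string "1" enables debug mode.
        debug=env.get("DEBUG", "0") == "1",
    )
```

With this arrangement the image carries the code and its dependencies, while the container runtime supplies values such as ``DATABASE_URL`` when the container starts.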
Kubernetes
==========
We use Kubernetes to deploy, scale, and manage
containerized services in AWS. The core functionality of Kubernetes is
scheduling and monitoring Docker containers in a cluster environment.
Kubernetes manages a cluster of virtual servers (AWS EC2 instances); Docker
images describing arXiv-NG applications are deployed to "pods" (groups of one
or more containers) distributed across that cluster. Kubernetes helps us to
address the following problems in the arXiv-NG architecture:

* Networking: manages AWS resources including VPCs, load balancers, etc.
* Deployment.

  - Deploying new applications.
  - Deploying updates to existing applications with no downtime.

* Load-balancing & circuit-breaking.

  - Spreading load effectively across multiple instances of an application.
  - Routing requests away from unhealthy application instances.
  - Horizontal scaling of applications based on customizable metrics.
  - Horizontal scaling of the underlying cluster.

* Service discovery.
* API gateway.
* Monitoring.

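
To make metric-based horizontal scaling concrete, the sketch below reproduces the basic replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The function name and the default bounds are illustrative assumptions, and the real autoscaler applies further rules (such as a tolerance band) that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return ceil(current * observed / target), clamped to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, four instances averaging 90% CPU against a 60% target would be scaled to six, while the same four instances at 30% would be scaled down to two.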
Kubernetes uses a declarative resource description model, which provides a
straightforward way to document deployments and configurations.
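
As an illustration of the declarative model, the sketch below builds a Deployment description as plain data and leaves serialization to the caller. The field names follow the Kubernetes ``apps/v1`` Deployment schema; the service name, image reference, port, and ``/status`` health-check path passed in or defaulted are hypothetical, and the API version available depends on the cluster.

```python
import json


def deployment_manifest(
    name: str, image: str, replicas: int = 2, port: int = 8000
) -> dict:
    """Describe the desired state of one service as a Deployment resource."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            # Replace pods one at a time and never drop below the desired
            # count, so that updates roll out without downtime.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # Traffic is routed to a pod only once this
                            # (hypothetical) endpoint responds successfully.
                            "readinessProbe": {
                                "httpGet": {"path": "/status", "port": port}
                            },
                        }
                    ]
                },
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize a manifest; Kubernetes accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

The output of ``to_json`` can be committed to version control alongside the application, so the repository itself documents what is deployed and how it is configured.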
Kubernetes provides another layer of protection against vendor lock-in: it can
be deployed on multiple cloud platforms including Google Compute Engine and
Microsoft Azure, as well as on-site infrastructure. This would ease the
transition if arXiv were to move away from AWS in the future.