Development Practices

This section describes development practices used across all arXiv-NG projects, including code management, application versioning, QA/QC, CI/CD, and documentation.

Code management

Source code and attendant documentation for each service will be kept under version control in its own Git repository, hosted on GitHub. Except where impracticable, new GitHub repositories should be public and should include a copy of the MIT license (e.g. see https://github.com/arxiv/arxiv-zero/blob/master/LICENSE). Repositories containing code for the classic arXiv system must remain private due to security concerns.

Branch Management

We use the Gitflow branching model to manage concurrent work within each repository. In brief, each repository has a master branch that contains the latest stable version of the application, a develop branch that contains additional work not yet released, and feature branches that contain work on a specific task or story.

Feature branches are named after the corresponding ticket in the ARXIVNG or ARXIVDEV JIRA project: [type]/[project]-[number]. For example, story/ARXIVDEV-4092. JIRA ticket numbers (e.g. ARXIVDEV-1234) should also be included in commit messages, especially when the ticket number differs from the one named in the feature branch.
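As an illustration of the naming convention, the following is a minimal sketch of a check that a branch name follows the [type]/[project]-[number] pattern. The function name parse_feature_branch and the set of accepted ticket types (story, task, bug) are assumptions made for the example; this document does not enumerate the allowed types.

```python
import re

# Hypothetical pattern for the [type]/[project]-[number] convention. The
# ticket types accepted here (story, task, bug) are assumptions for
# illustration only.
BRANCH_PATTERN = re.compile(
    r"^(?P<type>story|task|bug)/(?P<project>ARXIVNG|ARXIVDEV)-(?P<number>\d+)$"
)


def parse_feature_branch(name: str) -> dict:
    """Split a feature branch name into its type, project, and ticket number."""
    match = BRANCH_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a valid feature branch name: {name!r}")
    parts = match.groupdict()
    parts["number"] = int(parts["number"])  # Ticket numbers are numeric.
    return parts
```

A check like this could run in a local Git hook, so that a misnamed branch is caught before a pull request is raised.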

Delivering Work

Feature branches are not merged directly into the develop branch, nor is the develop branch merged directly into the master branch. Instead, the developer responsible for delivering the changes in question raises a pull request, which (except in rare cases) is subject to review by at least one other member of the dev team. Pull requests must also pass all automated tests and quality checks; see Testing & QA.

Tagging

Application versions are commemorated using annotated tags on the master branch. A tag is applied only after the prospective release has been staged and verified for deployment. Tags should contain only the version number. See Versioning for details.

Versioning

arXiv-NG services are versioned independently, using semantic versioning. In brief:

  • Major versions commemorate incompatible API changes;

  • Minor versions commemorate new functionality;

  • Patch versions commemorate bug fixes.

Version numbers used in tags are “bare”; i.e. they contain only the version number itself without any prefixes. For example: 1.4.3.
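The versioning rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the Version class and its parse and bump methods are hypothetical names, not part of any arXiv-NG package. It accepts only bare version numbers, so a prefixed tag such as v1.4.3 is rejected.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical representation of a bare semantic version number."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, tag: str) -> "Version":
        """Parse a bare tag such as '1.4.3'; prefixes like 'v' raise ValueError."""
        major, minor, patch = (int(part) for part in tag.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "Version":
        """Return the next version for a 'major', 'minor', or 'patch' change."""
        if change == "major":  # Incompatible API change.
            return Version(self.major + 1, 0, 0)
        if change == "minor":  # New functionality.
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":  # Bug fix.
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"Unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"
```

For example, str(Version.parse("1.4.3").bump("minor")) gives the bare version string for the next minor release, with the patch number reset to zero.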

JIRA Releases

Work tickets are added to a release in JIRA using the “Fix/Version” field. JIRA releases are labeled with both the service slug and the semantic version. For example, JIRA releases for the arxiv-fulltext service are labeled with fulltext-MAJOR.MINOR.PATCH.

Versioning arXiv-NG as a whole

We may decide to version the entire arXiv-NG system. If so, we would use some kind of romantic versioning scheme.

Todo

We should decide how we want to do this.

Release process

  1. At a sprint meeting, the arXiv team decides what constitutes a versionable release. Those tickets are added to a JIRA release using the Fix/Version metadata field.

  2. When all of the tickets in the release are Done, a PR is raised from the develop branch to the master branch.

  3. In addition to automated tests, the release candidate is staged in the staging namespace of the Kubernetes cluster:

    • Travis-CI automatically builds Docker image(s) for the service, tags them with develop, and pushes them to the Docker registry (ECR).

    • Travis-CI patches the Kubernetes/Helm deployment(s) in the staging namespace with the new Docker image(s).

    • Automated and manual tests are performed.

    • If tests fail, additional commits are added to the PR. This process is repeated until all tests pass.

  4. When all tests pass:

    • The PR is merged.

    • An annotated tag is added to the merge commit on the master branch (bare version number only).

    • The JIRA version is “released”, and the release notes are added to the tag/release comments on GitHub.

    • Upon pushing the tag on the master branch, Travis-CI builds the Docker image(s) for the service, adds version tags to the image(s) (M.m.p, M.m, M, and latest), pushes the images to the Docker repository, and patches the Kubernetes/Helm deployment in the production namespace via the Kubernetes API.

  5. The production deployment is verified with automated and/or manual tests. In the regrettable (and hopefully rare) case that a production deployment fails, it is rolled back via the Kubernetes API.
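The set of image tags applied in step 4 (M.m.p, M.m, M, and latest) can be derived from the bare version number. The following is a minimal sketch; the function name image_tags is hypothetical and the code is not taken from the actual Travis-CI configuration.

```python
from typing import List


def image_tags(version: str) -> List[str]:
    """Derive Docker image tags from a bare version number such as '1.4.3'."""
    major, minor, patch = version.split(".")
    return [
        f"{major}.{minor}.{patch}",  # Exact release.
        f"{major}.{minor}",          # Latest patch of this minor version.
        major,                       # Latest release of this major version.
        "latest",                    # Most recent release overall.
    ]
```

Publishing the shorter tags lets a deployment pin to a major or minor series and pick up later compatible releases without changing its configuration.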

Testing & QA

This section describes testing practices and procedures to be implemented across all arXiv-NG projects.

Unit tests

Unit, integration, and end-to-end tests should be written using the built-in unittest module.

Nose2 is the preferred test runner. Coverage should also be installed to check test coverage. Nose2 should automatically discover your unit tests. From the root of the project repository, run:

nose2 --with-coverage
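For reference, a minimal sketch of a test module written with unittest is shown below. The slugify function is a hypothetical stand-in for the code under test; in a real project it would be imported from the application package rather than defined alongside the tests.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lower-case and hyphenate a title."""
    return "-".join(title.lower().split())


class TestSlugify(unittest.TestCase):
    """Tests for the hypothetical slugify function."""

    def test_joins_words_with_hyphens(self):
        """Words are lower-cased and joined with hyphens."""
        self.assertEqual(slugify("Release Process"), "release-process")

    def test_collapses_whitespace(self):
        """Leading, trailing, and repeated whitespace is ignored."""
        self.assertEqual(slugify("  Testing   QA "), "testing-qa")
```

Test runners such as nose2 find TestCase subclasses in modules whose names match their discovery pattern (test*.py by default for nose2), so no explicit test suite needs to be assembled.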

The minimum test coverage target is 90%. Test coverage should not decrease by more than 5% in a given PR.
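The two coverage rules can be expressed as a simple check. This sketch rests on two assumptions that the text above does not settle: that the 5% limit is measured in percentage points, and that a PR must satisfy both rules at once. The names below are hypothetical.

```python
MINIMUM_COVERAGE = 90.0  # Minimum coverage target, in percent.
MAXIMUM_DROP = 5.0       # Largest allowed decrease (assumed percentage points).


def coverage_acceptable(previous: float, current: float) -> bool:
    """Check a PR's coverage against the target and the allowed decrease."""
    meets_target = current >= MINIMUM_COVERAGE
    within_drop = (previous - current) <= MAXIMUM_DROP
    return meets_target and within_drop
```

Under these assumptions, a PR that takes coverage from 97% to 93% passes, while one that takes it from 97% to 91% fails because the decrease exceeds five points.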

Integration tests

Integration tests should be used to test service modules. Ideally, these tests will use the “real” service with which the module integrates. For integration tests involving AWS resources, the localstack project is invaluable. To test integrations with other arXiv-NG services, the latest Docker image for that service can be pulled from the Docker registry (ECR).

For an example of an integration test that uses localstack via Docker, see this test module.

Todo

Consider how we can leverage Swagger with Flex for testing integrations with other arXiv-NG services.

End-to-end tests

End-to-end tests should make use of Docker Compose.

If the subsystem/project involves multiple constituent services, the docker-compose configuration should build and start all of those constituents, and pull in images for any additional services needed for integrations. See also this example docker-compose.yml.

A separate image (e.g. defined by a separate Dockerfile, Dockerfile-tests) can then be used to run the end-to-end tests, for example by exercising service APIs or generating notifications. See this example Dockerfile.

Static analysis & type annotations

We use type hint annotations throughout the codebase. Although Python is not a statically typed language, type hints emerged in Python 3 as a mechanism for documenting code (specifically, function behavior) and for introducing some of the benefits of static typing: in particular, the ability to analyze code for programming errors without actually executing it. While this does not obviate the need for comprehensive unit tests, it does provide another layer of quality assurance to supplement those tests.

Type hints may be especially useful when defining the core data domain of a service, in cases where full-fledged classes would be overkill.
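As a sketch of what a lightweight, typed domain object might look like, the example below uses typing.NamedTuple from the standard library. The Thing type, its fields, and the describe function are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Thing(NamedTuple):
    """Hypothetical domain object: typed fields, no behavior of its own."""

    id: int
    name: str
    created: datetime
    comment: Optional[str] = None  # Optional field with a default value.


def describe(thing: Thing) -> str:
    """Return a short human-readable label for a thing."""
    return f"{thing.name} (#{thing.id})"
```

Because the fields are annotated, a static analyzer can flag a call such as describe("baz") or a Thing constructed with a string id, without running the code.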

We use mypy for static analysis. Ideally mypy will pass without any errors, but use your judgment about when to exclude code from type checking. mypy chokes on dynamic base classes and proxy objects (which you’re likely to encounter using Flask); it’s perfectly fine to disable checking on those offending lines using “# type: ignore”. For example:

g.baz = get_session(app)  # type: ignore

See this issue for more information.

A mypy.ini file may be included in the root of the repository. See for example: https://github.com/arxiv/arxiv-zero/blob/develop/mypy.ini

Code quality & linting

All new code should adhere as closely as possible to PEP 8.

Use the NumPy style for docstrings.
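For reference, a docstring in this style looks roughly like the following. The paginate function is a hypothetical example written for this illustration, not code from an arXiv-NG service.

```python
def paginate(total: int, per_page: int) -> int:
    """
    Calculate the number of pages needed to display a set of results.

    Parameters
    ----------
    total : int
        Total number of results.
    per_page : int
        Maximum number of results shown on each page.

    Returns
    -------
    int
        Number of pages; zero when there are no results.

    Raises
    ------
    ValueError
        If ``per_page`` is less than one.

    """
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return -(-total // per_page)  # Ceiling division without floats.
```

The section headings underlined with dashes (Parameters, Returns, Raises) are what documentation tools that understand this style look for when rendering the docstring.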

Use Pylint to check your code prior to raising a pull request. The parameters below will be used when checking code cleanliness on commits, PRs, and tags, with a target score of >= 9/10.

If you’re using Atom as your text editor, consider using the linter-pylama package for real-time feedback.

We currently ignore the following flags (subject to change):

  • W0622: Redefining built-in %r

  • W0611: Unused import %s

  • F0401: Unable to import %s

  • R0914: Too many local variables (%s/%s)

  • W0221: Arguments number differs from %s method

  • W0222: Signature differs from %s method

  • W0142: Used * or ** magic

  • F0010: error while code parsing: %s

  • W0703: Catching too general exception %s

  • R0911: Too many return statements (%s/%s)

  • C0103: Invalid %s name “%s”

  • R0913: Too many arguments (%s/%s)

$ pylint --disable=W0622,W0611,F0401,R0914,W0221,W0222,W0142,F0010,W0703,R0911,C0103,R0913 -f parseable zero
No config file found, using default configuration
************* Module zero.context
zero/context.py:10: [W0212(protected-access), get_application_config] Access to a protected member _Environ of a client class
************* Module zero.encode
zero/encode.py:11: [E0202(method-hidden), ISO8601JSONEncoder.default] An attribute defined in json.encoder line 158 hides this method
************* Module zero.controllers.baz
zero/controllers/baz.py:1: [C0102(blacklisted-name), ] Black listed name "baz"
************* Module zero.services.baz
zero/services/baz.py:1: [C0102(blacklisted-name), ] Black listed name "baz"
************* Module zero.services.things
zero/services/things.py:11: [R0903(too-few-public-methods), Thing] Too few public methods (0/2)
zero/services/things.py:49: [E1101(no-member), get_a_thing] Instance of 'scoped_session' has no 'query' member

------------------------------------------------------------------
Your code has been rated at 9.49/10 (previous run: 9.41/10, +0.07)

Continuous Integration/Continuous Delivery

We use Travis-CI to perform automated tests and to automatically build and deploy services in staging and production.

Each project contains a .travis.yml configuration file that describes the build process. For example, see: https://github.com/arxiv/arxiv-zero/blob/master/.travis.yml

Todo

Add build/deploy to arxiv-zero travis config.

Travis may also trigger pylint and mypy checks, using a script like this one.

Todo

Better documentation for the linstats.sh example, including config params that need to be set in Travis-CI.

Travis reports the success or failure of the build process to GitHub, for use in pull requests.

Upon completion, Travis triggers test coverage analysis by Coveralls, which evaluates test coverage targets and reports the result to GitHub.

When a PR is raised from develop to master, Travis builds and pushes the Docker image(s) for the service, and deploys to staging using the Kubernetes API.

When a tag is pushed on the master branch, Travis builds and pushes the Docker image(s) for the service, and deploys to production using the Kubernetes API.
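The two deployment rules above amount to a small decision function. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it does not reflect how the actual .travis.yml or deployment scripts are written. It also assumes that any pushed tag is a release tag on master.

```python
from typing import Optional


def deployment_namespace(
    event_type: str,
    target_branch: str,
    source_branch: Optional[str] = None,
    tag: Optional[str] = None,
) -> Optional[str]:
    """Decide which namespace (if any) a build should be deployed to."""
    if tag:
        # Assumption: tags are only ever pushed on the master branch.
        return "production"
    if (
        event_type == "pull_request"
        and source_branch == "develop"
        and target_branch == "master"
    ):
        return "staging"
    return None  # Other builds run tests only and deploy nothing.
```

Keeping the decision in one place makes it easy to confirm that a feature-branch build, or a PR into develop, can never trigger a deployment.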

See also Release process.

Documentation

Most documentation (including this document) is written using reStructuredText markup, which we build to HTML and/or PDF with Sphinx.

Documentation for each project/service should be stored in a docs folder in the repository root.

Architectural documentation

Each service/project should include an architecture.rst file that describes what the service does, how it’s built, and any significant technical decisions that have been made in the course of its development.

This architecture documentation is based on the arc42 documentation model, and also draws heavily on the C4 software architecture model. The C4 model describes an architecture at four hierarchical levels, from the business context of the system to the internal architecture of small parts of the system.

For example, see: https://github.com/arxiv/arxiv-zero/blob/master/docs/source/architecture.rst

In documenting arXiv-NG services, we have departed slightly from the original language of C4 in order to avoid collision with names in adjacent domains. Specifically, we describe the system at three levels:

  • Context: This includes both the business and technical contexts in the arc42 model. It describes the interactions between a service and other services and systems.

  • Building block: This is similar to the “container” concept in the C4 model. A building block is a part of the system that is developed, tested, and deployed quasi-independently. This might be a single application, or a data store.

  • Component: A component is an internal part of a building block. In the case of a Flask application, this might be a module or submodule that has specific responsibilities, behaviors, and interactions.

Code API documentation

Documentation for the (code) API is generated automatically with sphinx-apidoc, and lives in docs/source/api.

sphinx-apidoc generates references to modules in the code, which are followed at build time to retrieve docstrings and other details. This means that you won’t need to run sphinx-apidoc unless the structure of the project changes (e.g. you add/rename a module).

To rebuild the API docs, run (from the project root):

sphinx-apidoc -M -f -o docs/source/api/ foo

Docstrings should be written in the NumPy style.

REST API documentation

Both internal and external APIs should be documented using the OpenAPI specification (aka Swagger). A separate API description should be provided for the internal and external APIs.

In addition, JSON Schema should be provided for each endpoint and referenced from the OpenAPI/Swagger description. These documents should describe both response and (as appropriate) request payloads.

API documentation will also be aggregated across subsystems for inclusion in the API consumer portal.

See Schema.