search.services.index package

Provides integration with an Elasticsearch cluster.

The primary entry point to this module is search(), which accepts search.domain.Query instances passed in by controllers and returns a DocumentSet containing the search results. get_document() is available for future use, e.g. as part of a search API.

In addition, add_document() and bulk_add_documents() are provided for indexing (e.g. by the search.agent.consumer.MetadataRecordProcessor).

SearchSession encapsulates configuration parameters and a connection to the Elasticsearch cluster, to support thread safety. The functions mentioned above load the appropriate SearchSession instance for the context of the current request.
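To illustrate how module-level functions can delegate to a per-context session, here is a minimal sketch. It assumes a thread-local cache and a simplified, hypothetical SearchSession stand-in that makes no network calls; the real module may key sessions on the application context instead.

```python
import functools
import threading
from typing import Any, Callable


class SearchSession:
    """Hypothetical stand-in for SearchSession; makes no network calls."""

    def __init__(self, host: str, index: str, port: int = 9200) -> None:
        self.host = host
        self.index = index
        self.port = port

    def exists(self, paper_id_v: str) -> bool:
        # Placeholder: the real method would query Elasticsearch.
        return False


_local = threading.local()  # each thread sees its own attributes


def current_session() -> SearchSession:
    """Get or create the SearchSession for the calling thread."""
    if not hasattr(_local, "session"):
        # Hypothetical fixed settings; a real app would read its config.
        _local.session = SearchSession("localhost", "arxiv")
    return _local.session


def _bind(method: Callable[..., Any]) -> Callable[..., Any]:
    """Expose a SearchSession method as a function using the current session."""
    @functools.wraps(method)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return method(current_session(), *args, **kwargs)
    return wrapper


# Module-level function: call exists("1801.00001v1") without passing a session.
exists = _bind(SearchSession.exists)
```

Because functools.wraps sets `__wrapped__`, inspect.signature() reports the wrapped method's parameters, including self; a wrapper of this kind is one way a signature such as exists(self, paper_id_v) can appear in generated documentation.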

class search.services.index.SearchSession(host, index, port=9200, scheme='http', user=None, password=None, mapping=None, verify=True, **extra)

Bases: object

Encapsulates a session with an Elasticsearch host.

add_document(document)

Add a document to the search index.

Uses paper_id_v as the primary identifier for the document. If the document is already indexed, it is quietly overwritten.

Parameters:	document (`Document`) – Must be a valid search document, per `schema/DocumentMetadata.json`.
Raises:	`IndexConnectionError` – Problem communicating with the Elasticsearch host. `QueryError` – Problem serializing `document` for indexing.
Return type:	`None`

bulk_add_documents(documents, docs_per_chunk=500)

Add documents to the search index using the bulk API.

Parameters:	documents (list of `Document`) – Documents to index; each must be a valid search document, per `schema/DocumentMetadata.json`. docs_per_chunk (int) – Number of documents to send to Elasticsearch in a single chunk (default 500).
Raises:	`IndexConnectionError` – Problem communicating with the Elasticsearch host. `BulkIndexingError` – Problem serializing `documents` for indexing.
Return type:	`None`
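The chunking controlled by docs_per_chunk can be sketched as follows. This is an illustration, not the module's code: it assumes documents are plain dicts carrying a paper_id_v key, and it builds action dicts in the shape accepted by elasticsearch-py's helpers.bulk() (`_op_type`, `_index`, `_id`, `_source`). The helper names chunked and bulk_actions are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_actions(index: str, documents: Iterable[Dict[str, Any]],
                 docs_per_chunk: int = 500) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into chunks of bulk "index" actions keyed by paper_id_v."""
    for chunk in chunked(documents, docs_per_chunk):
        yield [
            {
                "_op_type": "index",  # "index" overwrites an existing ID
                "_index": index,
                "_id": doc["paper_id_v"],
                "_source": doc,
            }
            for doc in chunk
        ]
```

Using a lazy generator means only one chunk of actions is held in memory at a time, which matters when indexing a large backlog.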

cluster_available()

Determine whether or not the ES cluster is available.

Return type:	`bool`
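The documentation does not say how availability is decided. One plausible criterion, shown here as a hypothetical sketch, is that a cluster health request succeeds and reports a status other than red. The health call is injected as a parameter so the example runs without a cluster.

```python
from typing import Any, Callable, Dict


def cluster_available(fetch_health: Callable[[], Dict[str, Any]]) -> bool:
    """Return True if a health request succeeds with a green or yellow status."""
    try:
        health = fetch_health()
    except Exception:  # any failure to reach the cluster counts as unavailable
        return False
    return health.get("status") in ("green", "yellow")
```

With elasticsearch-py, the injected callable could be a client's cluster.health method.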

create_index()

Create the search index.

The index is created with the session's mappings (dict; see elastic.co/guide/en/elasticsearch/reference/current/mapping.html).
Return type:	`None`

exists(paper_id_v)

Determine whether a paper exists in the index.

Return type:	`bool`

get_document(document_id)

Retrieve a document from the index by ID.

Uses metadata_id as the primary identifier for the document.

Parameters:	document_id (int) – Value of `metadata_id` in the original document.
Raises:	`IndexConnectionError` – Problem communicating with the search index. `QueryError` – Invalid query parameters.
Return type:	`Document`
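A lookup by metadata_id can be expressed as an Elasticsearch term query. The sketch below assumes metadata_id is indexed as an exact-match field; the helper names and the choice of LookupError for a missing document are hypothetical.

```python
from typing import Any, Dict


def document_query(document_id: int) -> Dict[str, Any]:
    """Build a search body matching the single document with this metadata_id."""
    return {
        "query": {"term": {"metadata_id": document_id}},
        "size": 1,  # at most one hit is expected
    }


def first_source(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the _source of the first hit, or raise if nothing matched."""
    hits = response["hits"]["hits"]
    if not hits:
        raise LookupError("no document found")
    return hits[0]["_source"]
```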

get_task_status(task)

Get the status of a running Elasticsearch task (e.g. a reindex operation).

Parameters:	task (str) – A task ID, e.g. returned in response to an asynchronous reindexing request.
Returns:	Response from the Elasticsearch task API.
Return type:	`dict`
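Since reindex() returns a task ID when wait_for_completion is False, a caller might poll get_task_status() until the task finishes. The sketch below assumes the task API response carries a boolean `completed` field, as Elasticsearch's tasks API does, and takes the status function as a parameter; wait_for_task is a hypothetical helper.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(get_status: Callable[[str], Dict[str, Any]], task: str,
                  interval: float = 5.0, timeout: float = 3600.0) -> Dict[str, Any]:
    """Poll get_status(task) until it reports completion or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task)
        if status.get("completed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task} did not complete within {timeout} s")
        time.sleep(interval)  # back off between status requests
```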

index_exists(index_name)

Determine whether or not an index exists.

Parameters:	index_name (str) – Name of the index to check.
Return type:	`bool`

reindex(old_index, new_index, wait_for_completion=False)

Create a new index and reindex with the current mappings.

Creating the new index and performing the reindexing operation are two separate actions via the ES API. If creation of the new index succeeds but the reindex request fails, no attempt is made to clean up. If the new index already exists, the reindex operation is still attempted.

Parameters:	old_index (str) – Name of the index to copy from. new_index (str) – Name of the index to create and copy to. wait_for_completion (bool) – Whether to wait for the reindexing operation to finish before returning (default False).
Returns:	Response from the Elasticsearch reindex API. If wait_for_completion is False (the default), the response should include a `task` key with a task ID that can be passed to get_task_status() to check the status of the reindexing operation.
Return type:	`dict`
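The two-step flow described above (create the index, then issue a separate reindex request, with no cleanup on failure) can be sketched as follows. The request body uses the source/dest form of the Elasticsearch _reindex API; the injected callables and the IndexAlreadyExists exception are hypothetical stand-ins for the real client calls.

```python
from typing import Any, Callable, Dict


class IndexAlreadyExists(Exception):
    """Hypothetical error raised when the target index already exists."""


def reindex_body(old_index: str, new_index: str) -> Dict[str, Any]:
    """Request body for the Elasticsearch _reindex API."""
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


def reindex(create_index: Callable[[str], None],
            start_reindex: Callable[[Dict[str, Any], bool], Dict[str, Any]],
            old_index: str, new_index: str,
            wait_for_completion: bool = False) -> Dict[str, Any]:
    """Create new_index, then copy old_index into it in a second request."""
    try:
        create_index(new_index)
    except IndexAlreadyExists:
        pass  # still attempt the reindex, as documented
    # No cleanup: if this request fails, new_index is left in place.
    return start_reindex(reindex_body(old_index, new_index), wait_for_completion)
```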

search(query, highlight=True)

Perform a search.

Parameters:	query (`Query`) – The search query to perform.
Raises:	`IndexConnectionError` – Problem communicating with the search index. `QueryError` – Invalid query parameters.
Return type:	`DocumentSet`

search.services.index.add_document(self, document)

Add a document to the search index.

Uses paper_id_v as the primary identifier for the document. If the document is already indexed, it is quietly overwritten.

Parameters:	document (`Document`) – Must be a valid search document, per `schema/DocumentMetadata.json`.
Raises:	`IndexConnectionError` – Problem communicating with the Elasticsearch host. `QueryError` – Problem serializing `document` for indexing.
Return type:	`None`

search.services.index.bulk_add_documents(self, documents, docs_per_chunk=500)

Add documents to the search index using the bulk API.

Parameters:	documents (list of `Document`) – Documents to index; each must be a valid search document, per `schema/DocumentMetadata.json`. docs_per_chunk (int) – Number of documents to send to Elasticsearch in a single chunk (default 500).
Raises:	`IndexConnectionError` – Problem communicating with the Elasticsearch host. `BulkIndexingError` – Problem serializing `documents` for indexing.
Return type:	`None`

search.services.index.cluster_available(self)

Determine whether or not the ES cluster is available.

Return type:	`bool`

search.services.index.create_index(self)

Create the search index.

The index is created with the session's mappings (dict; see elastic.co/guide/en/elasticsearch/reference/current/mapping.html).
Return type:	`None`

search.services.index.current_session()

Get or create the SearchSession for the current context.

Return type:	`SearchSession`

search.services.index.exists(self, paper_id_v)

Determine whether a paper exists in the index.

Return type:	`bool`

search.services.index.get_document(self, document_id)

Retrieve a document from the index by ID.

Uses metadata_id as the primary identifier for the document.

Parameters:	document_id (int) – Value of `metadata_id` in the original document.
Raises:	`IndexConnectionError` – Problem communicating with the search index. `QueryError` – Invalid query parameters.
Return type:	`Document`

search.services.index.get_session(app=None)

Get a new session with the search index.

Return type:	`SearchSession`

search.services.index.get_task_status(self, task)

Get the status of a running Elasticsearch task (e.g. a reindex operation).

Parameters:	task (str) – A task ID, e.g. returned in response to an asynchronous reindexing request.
Returns:	Response from the Elasticsearch task API.
Return type:	`dict`

search.services.index.handle_es_exceptions()

Handle common ElasticSearch-related exceptions.

Return type:	`Generator`
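A `Generator` return type is typical of a function decorated with contextlib.contextmanager. Assuming handle_es_exceptions() is used that way, it could look like the sketch below, which re-raises low-level client errors as the module's own exception types. TransportFailure and BadRequest are hypothetical client errors, and IndexConnectionError and QueryError are local stand-ins for the module's classes.

```python
from contextlib import contextmanager
from typing import Iterator


class IndexConnectionError(Exception):
    """Stand-in for the module's IndexConnectionError."""


class QueryError(Exception):
    """Stand-in for the module's QueryError."""


class TransportFailure(Exception):
    """Hypothetical low-level client error (e.g. connection refused)."""


class BadRequest(Exception):
    """Hypothetical client error for a malformed query."""


@contextmanager
def handle_es_exceptions() -> Iterator[None]:
    """Translate client-level errors raised in the with-block."""
    try:
        yield
    except TransportFailure as exc:
        raise IndexConnectionError(str(exc)) from exc
    except BadRequest as exc:
        raise QueryError(str(exc)) from exc
```

Callers then wrap each client request in `with handle_es_exceptions():` and only need to handle the module's own exceptions.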

search.services.index.index_exists(self, index_name)

Determine whether or not an index exists.

Parameters:	index_name (str) – Name of the index to check.
Return type:	`bool`

search.services.index.init_app(app=None)

Set default configuration parameters for an application instance.

Return type:	`None`
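Setting defaults without overriding explicit configuration is commonly done with dict.setdefault. The sketch below assumes a dict-like config (such as Flask's app.config); the ELASTICSEARCH_* key names are hypothetical, and the default values mirror the SearchSession signature (port=9200, scheme='http', verify=True).

```python
from typing import Any, Dict, MutableMapping

# Hypothetical config keys; values mirror the SearchSession defaults.
DEFAULTS: Dict[str, Any] = {
    "ELASTICSEARCH_PORT": 9200,
    "ELASTICSEARCH_SCHEME": "http",
    "ELASTICSEARCH_VERIFY": True,
}


def init_app(config: MutableMapping[str, Any]) -> None:
    """Fill in missing settings without overriding explicit ones."""
    for key, value in DEFAULTS.items():
        config.setdefault(key, value)
```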

search.services.index.ok()

Health check.

Return type:	`bool`

search.services.index.reindex(self, old_index, new_index, wait_for_completion=False)

Create a new index and reindex with the current mappings.

Creating the new index and performing the reindexing operation are two separate actions via the ES API. If creation of the new index succeeds but the reindex request fails, no attempt is made to clean up. If the new index already exists, the reindex operation is still attempted.

Parameters:	old_index (str) – Name of the index to copy from. new_index (str) – Name of the index to create and copy to. wait_for_completion (bool) – Whether to wait for the reindexing operation to finish before returning (default False).
Returns:	Response from the Elasticsearch reindex API. If wait_for_completion is False (the default), the response should include a `task` key with a task ID that can be passed to get_task_status() to check the status of the reindexing operation.
Return type:	`dict`

search.services.index.search(self, query, highlight=True)

Perform a search.

Parameters:	query (`Query`) – The search query to perform.
Raises:	`IndexConnectionError` – Problem communicating with the search index. `QueryError` – Invalid query parameters.
Return type:	`DocumentSet`

Subpackages

search.services.index.tests package
arXiv search

Navigation

Related Topics