search.services.index package
Provides integration with an Elasticsearch cluster.
The primary entrypoint to this module is search(), which handles search.domain.Query instances passed by controllers and returns a DocumentSet containing search results. get_document() is available for future use, e.g. as part of a search API.
In addition, add_document() and bulk_add_documents() are provided for indexing (e.g. by the search.agent.consumer.MetadataRecordProcessor).
SearchSession encapsulates configuration parameters and a connection to the Elasticsearch cluster for thread-safety. The functions mentioned above load the appropriate instance of SearchSession depending on the context of the request.
-
class search.services.index.SearchSession(host, index, port=9200, scheme='http', user=None, password=None, mapping=None, verify=True, **extra)
Bases: object
Encapsulates a session with an Elasticsearch host.
-
add_document(document)
Add a document to the search index.
Uses paper_id_v as the primary identifier for the document. If the document is already indexed, it is quietly overwritten.
Parameters: document (Document) – Must be a valid search document, per schema/DocumentMetadata.json.
Raises:
  IndexConnectionError – Problem communicating with Elasticsearch host.
  QueryError – Problem serializing document for indexing.
Return type: None
-
bulk_add_documents(documents, docs_per_chunk=500)
Add documents to the search index using the bulk API.
Raises:
  IndexConnectionError – Problem communicating with Elasticsearch host.
  BulkIndexingError – Problem serializing document for indexing.
Return type: None
-
cluster_available()
Determine whether or not the ES cluster is available.
Return type: bool
-
create_index()
Create the search index.
Parameters: mappings (dict) – See elastic.co/guide/en/elasticsearch/reference/current/mapping.html
Return type: None
-
get_document(document_id)
Retrieve a document from the index by ID.
Uses metadata_id as the primary identifier for the document.
Parameters: document_id (int) – Value of metadata_id in the original document.
Raises:
  IndexConnectionError – Problem communicating with the search index.
  QueryError – Invalid query parameters.
-
get_task_status(task)
Get the status of a running task in ES (e.g. reindex).
Parameters: task (str) – A task ID, e.g. returned in response to an asynchronous reindexing request.
Returns: Response from the Elasticsearch task API.
Return type: dict
-
index_exists(index_name)
Determine whether or not an index exists.
Parameters: index_name (str)
Return type: bool
-
reindex(old_index, new_index, wait_for_completion=False)
Create a new index and reindex with the current mappings.
Creating the new index and performing the reindexing operation are two separate actions via the ES API. If creation of the new index succeeds but the request to reindex fails, no attempt is made to clean up. If the new index already exists, the reindex operation is still attempted.
Returns: Response from the Elasticsearch reindex API. If wait_for_completion is False (default), the response should include a task key with a task ID that can be used to check the status of the reindexing operation.
-
search.services.index.add_document(self, document)
Add a document to the search index.
Uses paper_id_v as the primary identifier for the document. If the document is already indexed, it is quietly overwritten.
Parameters: document (Document) – Must be a valid search document, per schema/DocumentMetadata.json.
Raises:
  IndexConnectionError – Problem communicating with Elasticsearch host.
  QueryError – Problem serializing document for indexing.
Return type: None
-
search.services.index.bulk_add_documents(self, documents, docs_per_chunk=500)
Add documents to the search index using the bulk API.
Raises:
  IndexConnectionError – Problem communicating with Elasticsearch host.
  BulkIndexingError – Problem serializing document for indexing.
Return type: None
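The docs_per_chunk parameter suggests that documents are sent to the bulk API in fixed-size batches. The following is a minimal sketch of that kind of batching, not the package's actual implementation; the helper name chunked is hypothetical, and plain integers stand in for Document instances.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 500) -> Iterator[List[T]]:
    """Yield successive lists of at most ``size`` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next batch lazily so large inputs are never fully copied.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 1,200 placeholder documents split with the default chunk size of 500.
sizes = [len(chunk) for chunk in chunked(range(1200))]
print(sizes)  # [500, 500, 200]
```

A smaller chunk size lowers the memory held per request at the cost of more round trips to the cluster.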
-
search.services.index.cluster_available(self)
Determine whether or not the ES cluster is available.
Return type: bool
-
search.services.index.create_index(self)
Create the search index.
Parameters: mappings (dict) – See elastic.co/guide/en/elasticsearch/reference/current/mapping.html
Return type: None
-
search.services.index.current_session()
Get/create SearchSession for this context.
Return type: SearchSession
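The get/create behaviour described here can be pictured as a session cached on a per-request context object. The sketch below is an illustration under stated assumptions, not the package's code: FakeSession stands in for SearchSession, the context and factory arguments are hypothetical (the real current_session() takes no arguments), and the host and index values are placeholders.

```python
from types import SimpleNamespace
from typing import Callable


class FakeSession:
    """Stand-in for SearchSession; holds only a host and an index name."""

    def __init__(self, host: str, index: str) -> None:
        self.host = host
        self.index = index


def current_session(
    context: SimpleNamespace, factory: Callable[[], FakeSession]
) -> FakeSession:
    """Return the session cached on ``context``, creating it on first use."""
    session = getattr(context, "search_session", None)
    if session is None:
        session = factory()
        context.search_session = session
    return session


def make_session() -> FakeSession:
    # Placeholder connection parameters, not taken from the package.
    return FakeSession("localhost", "documents")


# Two separate contexts, e.g. two requests handled independently.
request_a = SimpleNamespace()
request_b = SimpleNamespace()
first = current_session(request_a, make_session)
second = current_session(request_a, make_session)
other = current_session(request_b, make_session)
print(first is second, first is other)  # True False
```

Within one context the same session is reused, while each new context gets its own, which is one way to keep sessions from being shared across threads.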
-
search.services.index.exists(self, paper_id_v)
Determine whether a paper exists in the index.
Return type: bool
-
search.services.index.get_document(self, document_id)
Retrieve a document from the index by ID.
Uses metadata_id as the primary identifier for the document.
Parameters: document_id (int) – Value of metadata_id in the original document.
Raises:
  IndexConnectionError – Problem communicating with the search index.
  QueryError – Invalid query parameters.
-
search.services.index.get_session(app=None)
Get a new session with the search index.
Return type: SearchSession
-
search.services.index.get_task_status(self, task)
Get the status of a running task in ES (e.g. reindex).
Parameters: task (str) – A task ID, e.g. returned in response to an asynchronous reindexing request.
Returns: Response from the Elasticsearch task API.
Return type: dict
-
search.services.index.handle_es_exceptions()
Handle common Elasticsearch-related exceptions.
Return type: Generator
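The Generator return type suggests a generator-based context manager, although the source does not confirm this. The sketch below shows that general pattern only. The exception classes are local stand-ins for the package's IndexConnectionError and QueryError, the name handle_backend_exceptions is hypothetical, and the choice of built-in exceptions to translate is an assumption made so the example runs without an Elasticsearch client.

```python
from contextlib import contextmanager
from typing import Generator


class IndexConnectionError(Exception):
    """Stand-in for the package's exception of the same name."""


class QueryError(Exception):
    """Stand-in for the package's exception of the same name."""


@contextmanager
def handle_backend_exceptions() -> Generator[None, None, None]:
    """Translate low-level errors into service-level ones (hypothetical)."""
    try:
        yield
    except ConnectionError as exc:
        raise IndexConnectionError(f"Problem communicating with host: {exc}") from exc
    except ValueError as exc:
        raise QueryError(f"Invalid query parameters: {exc}") from exc


def run(operation):
    """Run ``operation`` inside the context manager and report the outcome."""
    try:
        with handle_backend_exceptions():
            return operation()
    except (IndexConnectionError, QueryError) as exc:
        return type(exc).__name__


def unreachable_host():
    raise ConnectionError("connection refused")


def bad_query():
    raise ValueError("unknown field")


print(run(unreachable_host))  # IndexConnectionError
print(run(bad_query))         # QueryError
print(run(lambda: "ok"))      # ok
```

Wrapping each call this way keeps client-specific exceptions from leaking to controllers, which then only need to handle the service's own exception types.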
-
search.services.index.index_exists(self, index_name)
Determine whether or not an index exists.
Parameters: index_name (str)
Return type: bool
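One plausible use of index_exists() together with create_index() is a create-if-missing guard. The sketch below assumes a session object that exposes its index name as an index attribute (an assumption, not confirmed by the source); InMemorySession and ensure_index are hypothetical names used so the example runs without a cluster.

```python
class InMemorySession:
    """In-memory stand-in exposing index_exists() and create_index()."""

    def __init__(self, index: str) -> None:
        self.index = index
        self.indices = set()

    def index_exists(self, index_name: str) -> bool:
        return index_name in self.indices

    def create_index(self) -> None:
        self.indices.add(self.index)


def ensure_index(session) -> bool:
    """Create the session's index if missing; return True if it was created."""
    if session.index_exists(session.index):
        return False
    session.create_index()
    return True


session = InMemorySession("documents")
print(ensure_index(session), ensure_index(session))  # True False
```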
-
search.services.index.init_app(app=None)
Set default configuration parameters for an application instance.
Return type: None
-
search.services.index.reindex(self, old_index, new_index, wait_for_completion=False)
Create a new index and reindex with the current mappings.
Creating the new index and performing the reindexing operation are two separate actions via the ES API. If creation of the new index succeeds but the request to reindex fails, no attempt is made to clean up. If the new index already exists, the reindex operation is still attempted.
Returns: Response from the Elasticsearch reindex API. If wait_for_completion is False (default), the response should include a task key with a task ID that can be used to check the status of the reindexing operation.
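When reindexing runs asynchronously, the task ID in the response can be passed to get_task_status() until the task finishes. The sketch below shows one way to poll. The helper wait_for_task is hypothetical, the status function is a stand-in for get_task_status(), and the boolean "completed" field in the status response is an assumption based on the Elasticsearch task API rather than something this page states.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    get_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 1.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Poll ``get_status`` until the task reports completion (hypothetical)."""
    for _ in range(max_attempts):
        status = get_status(task_id)
        # Assumption: the response carries a boolean "completed" field.
        if status.get("completed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete in {max_attempts} polls")


# Stand-in for get_task_status(): reports completion on the third call.
calls = {"n": 0}


def fake_get_task_status(task: str) -> Dict[str, Any]:
    calls["n"] += 1
    return {"completed": calls["n"] >= 3, "task": {"id": task}}


result = wait_for_task(fake_get_task_status, "node:123", interval=0.0)
print(result["completed"], calls["n"])  # True 3
```

Bounding the number of polls avoids waiting forever if the task never reports completion.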
-
search.services.index.search(self, query, highlight=True)
Perform a search.
Parameters: query (Query)
Raises:
  IndexConnectionError – Problem communicating with the search index.
  QueryError – Invalid query parameters.
Return type: DocumentSet
Subpackages
Submodules
- search.services.index.advanced module
- search.services.index.authors module
- search.services.index.exceptions module
- search.services.index.highlighting module
- search.services.index.prepare module
- search.services.index.results module
- search.services.index.simple module
- search.services.index.util module