search.services.index.util module

Helpers for building Elasticsearch (ES) queries.

search.services.index.util.DATE_PARTIAL = '(?:^|[\\s])(\\d{2})((?:0[1-9]{1})|(?:1[0-2]{1}))(?:$|[\\s])'

Used to match parts of paper IDs that encode the announcement date.
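
As a brief illustration of what this pattern accepts, the sketch below applies it with Python's re module. The pattern is copied from the value documented above; the helper name find_date_partial and the sample strings are made up for this example and are not part of the module.

```python
import re

# Pattern copied from the documented value of DATE_PARTIAL.
DATE_PARTIAL = r"(?:^|[\s])(\d{2})((?:0[1-9]{1})|(?:1[0-2]{1}))(?:$|[\s])"


def find_date_partial(term):
    """Hypothetical helper: return (YY, MM) for a standalone YYMM token, else None."""
    match = re.search(DATE_PARTIAL, term)
    return match.groups() if match else None


samples = ["1401", "dark matter 0704 halo", "1413", "1401.1234"]
results = [find_date_partial(s) for s in samples]
# "1413" is rejected because 13 is not a valid month; "1401.1234" is rejected
# because the four digits must be followed by whitespace or the end of the string.
```

Note that the four digits only match when they are delimited by whitespace or by the start or end of the string, so a full identifier such as 1401.1234 is not treated as a date partial.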

search.services.index.util.MAX_RESULTS = 10000

This is the maximum result offset for pagination.

search.services.index.util.OLD_ID_NUMBER = '(910[7-9]|911[0-2]|9[2-9](0[1-9]|1[0-2])|0[0-6](0[1-9]|1[0-2])|070[1-3])(00[1-9]|0[1-9][0-9]|[1-9][0-9][0-9])'

The number part of the old arXiv identifier looks like YYMMNNN.

The old arXiv identifier scheme was used between 1991-07 and 2007-03 (inclusive).
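
The date range stated above can be checked directly against the pattern. The following is a small sketch using re.fullmatch; the pattern is copied from the documented value, while matches_old_id_number is a hypothetical name used only for this example.

```python
import re

# Pattern copied from the documented value of OLD_ID_NUMBER.
OLD_ID_NUMBER = (
    r"(910[7-9]|911[0-2]|9[2-9](0[1-9]|1[0-2])|0[0-6](0[1-9]|1[0-2])|070[1-3])"
    r"(00[1-9]|0[1-9][0-9]|[1-9][0-9][0-9])"
)


def matches_old_id_number(term):
    """Hypothetical helper: True if the whole string is a valid YYMMNNN number."""
    return re.fullmatch(OLD_ID_NUMBER, term) is not None


first_month = matches_old_id_number("9107001")   # 1991-07, paper 001
last_month = matches_old_id_number("0703999")    # 2007-03, paper 999
too_late = matches_old_id_number("0704001")      # 2007-04 is outside the range
too_early = matches_old_id_number("9106001")     # 1991-06 is outside the range
zero_number = matches_old_id_number("9901000")   # paper number 000 is not allowed
```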

search.services.index.util.Q_(qtype, field, value, operator='or')[source]

Construct a Q, but handle wildcards first.

Return type:Q
search.services.index.util.STRING_LITERAL = re.compile('([\\"][^\\"]*[\\"])')

Pattern for string literals (quoted) in search queries.
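
For illustration, the sketch below shows what the pattern extracts from a query. The pattern is copied from the documented value; the sample query is invented for the example.

```python
import re

# Pattern copied from the documented value of STRING_LITERAL.
STRING_LITERAL = re.compile(r'([\"][^\"]*[\"])')

query = 'title "dark matter" and author "de la Cruz"'

# Each match includes its surrounding double quotes.
literals = STRING_LITERAL.findall(query)

# Removing the literals leaves the unquoted part of the query.
remainder = STRING_LITERAL.sub("", query)
```

The pattern only recognizes double-quoted spans and does not handle escaped quotes inside a literal.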

search.services.index.util.escape(term, quotes=False)[source]

Escape special characters.

Return type:str
search.services.index.util.has_wildcard(term)[source]

Determine whether or not term contains a wildcard.

Return type:bool
search.services.index.util.is_literal_query(term)[source]

Determine whether the term is intended to be treated as a literal.

Return type:bool
search.services.index.util.is_old_papernum(term)[source]

Check whether term matches the 7-digit pattern for old arXiv ID numbers.

Return type:bool
search.services.index.util.is_tex_query(term)[source]

Determine whether the term is intended as a TeX query.

Return type:bool
search.services.index.util.parse_date(term)[source]

Attempt to find date-related information in the query.

Parameters:term (str) – Search term.
Returns:First element is the corresponding date-related fragment, second element is the remainder of term (without the date).
Raises:ValueError – Raised if no date-related information is found in term.
Return type:Tuple[str, str]
search.services.index.util.parse_date_partial(term)[source]

Convert a 4-digit ID date partial into a full year-month value.

This can be used to search for papers by announcement date.

Parameters:term (str) – Search term.
Returns:Date in yyyy-MM format if found, otherwise None.
Return type:Optional[str]
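
The conversion described above might look roughly like the following. This is a minimal sketch, not the module's actual code: the function name parse_date_partial_sketch is hypothetical, and the century rule (two-digit years of 91 or greater map to 19YY, all others to 20YY) is an assumption based on the old identifier scheme starting in 1991.

```python
import re
from typing import Optional

# Pattern copied from the documented value of DATE_PARTIAL.
DATE_PARTIAL = r"(?:^|[\s])(\d{2})((?:0[1-9]{1})|(?:1[0-2]{1}))(?:$|[\s])"


def parse_date_partial_sketch(term: str) -> Optional[str]:
    """Hypothetical re-implementation: turn a YYMM token into 'yyyy-MM'."""
    match = re.search(DATE_PARTIAL, term)
    if match is None:
        return None
    year, month = match.groups()
    # Assumption: YY >= 91 belongs to the 1900s, anything lower to the 2000s.
    century = "19" if int(year) >= 91 else "20"
    return f"{century}{year}-{month}"


recent = parse_date_partial_sketch("1401")
oldest = parse_date_partial_sketch("9107")
missing = parse_date_partial_sketch("galaxies")
```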
search.services.index.util.remove_single_characters(term)[source]

Remove any single characters in the search string.

Return type:str
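
One plausible way to achieve this is sketched below. The function name remove_single_characters_sketch is hypothetical, and splitting on whitespace is an assumption; the module's real implementation may tokenize differently.

```python
def remove_single_characters_sketch(term: str) -> str:
    """Hypothetical sketch: drop whitespace-delimited tokens of length one."""
    # Assumption: tokens are separated by whitespace, and the surviving
    # tokens are re-joined with single spaces.
    return " ".join(part for part in term.split() if len(part) > 1)


cleaned = remove_single_characters_sketch("a search for x rays")
```

A side effect of this approach is that runs of whitespace are collapsed to a single space.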
search.services.index.util.sort(query, search)[source]

Apply sorting to a Search.

Return type:Search
search.services.index.util.strip_punctuation(s)[source]

Remove all punctuation characters from a string.

Return type:str
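
As a rough sketch of the idea, the snippet below removes ASCII punctuation using str.translate. The name strip_punctuation_sketch is hypothetical, and restricting the character set to string.punctuation is an assumption; the module may use a different definition of punctuation.

```python
import string

# Translation table that deletes every character in string.punctuation.
# Assumption: only ASCII punctuation is stripped in this sketch.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation_sketch(s: str) -> str:
    """Hypothetical sketch: remove ASCII punctuation characters from s."""
    return s.translate(_PUNCTUATION_TABLE)


stripped = strip_punctuation_sketch("Hello, world! (v2.0)")
```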
search.services.index.util.strip_tex(term)[source]

Remove TeX-isms from a term.

Return type:str
search.services.index.util.wildcard_escape(querystring)[source]

Detect wildcard characters, and escape any that occur within a literal.

Parameters:querystring (str) – Query string to check for wildcard characters.
Returns:
  • str – Query string with wildcard characters enclosed in literals escaped.
  • bool – True if a wildcard character is present outside of a literal.
Return type:Tuple[str, bool]
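
The behaviour described above can be approximated as follows. This is a sketch under stated assumptions rather than the module's implementation: wildcard_escape_sketch is a hypothetical name, the wildcard characters are assumed to be * and ?, and literals are assumed to be the double-quoted spans matched by STRING_LITERAL.

```python
import re
from typing import Tuple

# Pattern copied from the documented value of STRING_LITERAL.
STRING_LITERAL = re.compile(r'([\"][^\"]*[\"])')

# Assumption: '*' and '?' are the wildcard characters.
WILDCARD = re.compile(r"[*?]")


def wildcard_escape_sketch(querystring: str) -> Tuple[str, bool]:
    """Hypothetical sketch: escape wildcards in literals, report any outside."""

    def _escape_literal(match):
        # Prefix each wildcard inside the quoted span with a backslash.
        return WILDCARD.sub(lambda m: "\\" + m.group(0), match.group(0))

    escaped = STRING_LITERAL.sub(_escape_literal, querystring)
    # Strip the literals, then look for wildcards in what remains.
    outside = STRING_LITERAL.sub("", querystring)
    return escaped, WILDCARD.search(outside) is not None


mixed = wildcard_escape_sketch('"a*b" c?')
literal_only = wildcard_escape_sketch('"x?"')
plain = wildcard_escape_sketch("no wildcards")
```

In this sketch the boolean reflects only wildcards outside of quoted spans, so a query whose wildcards all sit inside literals is reported as having none.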