agent.process.metadata_checks module

Automated metadata checks.

class agent.process.metadata_checks.CheckAbstractForUnicodeAbuse(submission_id, process_id=None)

Bases: agent.process.base.Process

Screen for possible abuse of Unicode in abstracts.

Unicode characters are supported in abstracts, but they can be overused. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.
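As a rough illustration of the ratio test described above, the sketch below counts non-ASCII and ASCII characters and compares their ratio to a threshold. The function names and the threshold value of 0.5 are assumptions made for this example and are not taken from the module.

```python
def non_ascii_ratio(text: str) -> float:
    """Return the ratio of non-ASCII to ASCII characters in ``text``."""
    ascii_count = sum(1 for char in text if ord(char) < 128)
    non_ascii_count = len(text) - ascii_count
    if ascii_count == 0:
        # No ASCII characters at all: a non-empty string gets an infinite
        # ratio, an empty string gets 0.0 (an assumption of this sketch).
        return float("inf") if non_ascii_count else 0.0
    return non_ascii_count / ascii_count


def looks_like_unicode_abuse(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical check: True if the ratio exceeds ``threshold``."""
    return non_ascii_ratio(text) > threshold
```

With this sketch, an abstract containing a few accented names stays well under the threshold, while text made up almost entirely of non-ASCII symbols would be flagged.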

check_abstract(previous, trigger, emit)

Check abstract for low ASCII content.

Return type: None

steps = [<function CheckAbstractForUnicodeAbuse.check_abstract>]

class agent.process.metadata_checks.CheckForSimilarTitles(submission_id, process_id=None)

Bases: agent.process.base.Process

Check for other submissions with very similar titles.

Ask the classic system for the titles of papers submitted within the last several months. Add an annotation to the submission if any of those titles is more similar to the current submission’s title than a configurable threshold.
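The thresholding described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name find_similar, the simplified (submission ID, title) candidate shape, the default threshold of 0.7, and the pluggable similarity callable are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def find_similar(
    title: str,
    candidates: Iterable[Tuple[int, str]],
    similarity: Callable[[str, str], float],
    threshold: float = 0.7,
) -> List[Tuple[int, str, float]]:
    """Return the candidates whose similarity to ``title`` exceeds ``threshold``.

    ``candidates`` is a hypothetical, simplified shape: (submission ID, title).
    ``similarity`` is any function that scores two titles between 0 and 1.
    """
    matches = []
    for submission_id, candidate_title in candidates:
        score = similarity(title, candidate_title)
        if score > threshold:
            matches.append((submission_id, candidate_title, score))
    return matches
```

Passing the similarity function in as an argument keeps the threshold logic separate from the choice of measure, so the same loop works with a Jaccard score or any other title comparison.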

check_for_duplicates(candidates, trigger, emit)

Look for very similar titles, and add flags if appropriate.

Return type: None

get_candidates(previous, trigger, emit)

Get candidate titles from the database.

Return type: List[Tuple[int, str, Agent]]

steps = [<function CheckForSimilarTitles.get_candidates>, <function CheckForSimilarTitles.check_for_duplicates>]

class agent.process.metadata_checks.CheckTitleForUnicodeAbuse(submission_id, process_id=None)

Bases: agent.process.base.Process

Screen for possible abuse of Unicode in titles.

Unicode characters are supported in titles, but they can be overused. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.

check_title(previous, trigger, emit)

Check title for low ASCII content.

Return type: None

steps = [<function CheckTitleForUnicodeAbuse.check_title>]

agent.process.metadata_checks.intersection(phrase_a, phrase_b)

Calculate the number of tokens shared by two phrases.

Return type: int

agent.process.metadata_checks.jaccard(phrase_a, phrase_b)

Calculate the Jaccard similarity of two phrases.

Return type: float
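The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies that definition to token sets. It uses a plain lowercase whitespace split as a stand-in for the module's own tokenization, and the name jaccard_sketch and the handling of two empty phrases are assumptions of this example.

```python
def jaccard_sketch(phrase_a: str, phrase_b: str) -> float:
    """Jaccard similarity of two phrases, using whitespace-split tokens."""
    tokens_a = set(phrase_a.lower().split())
    tokens_b = set(phrase_b.lower().split())
    union_size = len(tokens_a | tokens_b)
    if union_size == 0:
        # Both phrases are empty; this sketch defines the similarity as 0.0.
        return 0.0
    return len(tokens_a & tokens_b) / union_size
```

For example, "deep learning models" and "deep learning methods" share two of four distinct tokens, giving a similarity of 0.5.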

agent.process.metadata_checks.normalize(phrase)

Prepare a phrase for tokenization.

Return type: str

agent.process.metadata_checks.tokenized(phrase)

Split a phrase into tokens and remove stopwords.

Return type: Set[str]
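One plausible way to normalize and tokenize a phrase is sketched below. The normalization rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) and the small stopword list are assumptions chosen for illustration; the module's actual rules and stopwords may differ.

```python
import re
from typing import Set

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def normalize_sketch(phrase: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    lowered = phrase.lower()
    without_punctuation = re.sub(r"[^\w\s]", " ", lowered)
    return re.sub(r"\s+", " ", without_punctuation).strip()


def tokenized_sketch(phrase: str) -> Set[str]:
    """Split a normalized phrase on spaces and drop stopwords."""
    return {
        token
        for token in normalize_sketch(phrase).split()
        if token not in STOPWORDS
    }
```

Returning a set rather than a list discards word order and repeated words, which suits set-based measures such as Jaccard similarity.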

agent.process.metadata_checks.union(phrase_a, phrase_b)

Calculate the total number of tokens in two phrases.

Return type: int

agent.process.metadata_checks.window(days)

Get the datetime from the given number of days ago.

Return type: datetime
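A lookback window of this kind can be computed with the standard library, as in the sketch below. The name window_sketch, the optional now parameter (added here to make the function testable), and the use of timezone-aware UTC datetimes are assumptions; the module may use a different clock or timezone convention.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def window_sketch(days: int, now: Optional[datetime] = None) -> datetime:
    """Return the datetime ``days`` days before ``now`` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(days=days)
```

A caller could then select only the submissions created after window_sketch(90), for instance, to limit a search to roughly the last three months.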
