agent.process.metadata_checks module
Automated metadata checks.
-
class agent.process.metadata_checks.CheckAbstractForUnicodeAbuse(submission_id, process_id=None)
Bases: agent.process.base.Process
Screen for possible abuse of Unicode in abstracts.
We support Unicode characters in abstracts, but this can get out of hand. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.
-
check_abstract(previous, trigger, emit)
Check abstract for low ASCII content.
- Return type
None
-
steps = [<function CheckAbstractForUnicodeAbuse.check_abstract>]
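The ratio test described for this class can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names `non_ascii_ratio` and `has_unicode_abuse` and the `max_ratio` default of 0.5 are assumptions made for the example.

```python
def non_ascii_ratio(text: str) -> float:
    """Return the ratio of non-ASCII to ASCII characters in ``text``."""
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    non_ascii_count = len(text) - ascii_count
    if ascii_count == 0:
        # No ASCII characters at all: avoid dividing by zero.
        return float("inf") if non_ascii_count else 0.0
    return non_ascii_count / ascii_count


def has_unicode_abuse(text: str, max_ratio: float = 0.5) -> bool:
    """Return True if the ratio exceeds ``max_ratio`` (a hypothetical threshold)."""
    return non_ascii_ratio(text) > max_ratio
```

A check step built on such a helper would add a flag to the submission whenever `has_unicode_abuse` returns True for the abstract.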
-
-
class agent.process.metadata_checks.CheckForSimilarTitles(submission_id, process_id=None)
Bases: agent.process.base.Process
Check for other submissions with very similar titles.
Ask classic for titles of papers submitted within the last several months. Add an annotation to the submission if the similarity between a candidate title and the current submission’s title exceeds a configurable threshold.
-
check_for_duplicates(candidates, trigger, emit)
Look for very similar titles, and add flags if appropriate.
- Return type
None
-
get_candidates(previous, trigger, emit)
Get candidate titles from the database.
-
steps = [<function CheckForSimilarTitles.get_candidates>, <function CheckForSimilarTitles.check_for_duplicates>]
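The threshold comparison in the second step might look roughly like the sketch below. Everything here is assumed for illustration: the function names, the representation of candidates as a mapping from submission ID to title, the stand-in token-overlap similarity, and the default threshold of 0.7 are not taken from the module.

```python
from typing import Dict, List, Tuple


def title_similarity(title_a: str, title_b: str) -> float:
    """Stand-in similarity: shared lowercase tokens over all distinct tokens."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    combined = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(combined) if combined else 0.0


def find_similar_titles(
    title: str,
    candidates: Dict[int, str],
    threshold: float = 0.7,  # hypothetical default; the real value is configurable
) -> List[Tuple[int, float]]:
    """Return (candidate_id, score) pairs scoring above ``threshold``, best first."""
    matches = []
    for candidate_id, candidate_title in candidates.items():
        score = title_similarity(title, candidate_title)
        if score > threshold:
            matches.append((candidate_id, score))
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

Each returned pair would correspond to one annotation or flag on the current submission.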
-
-
class agent.process.metadata_checks.CheckTitleForUnicodeAbuse(submission_id, process_id=None)
Bases: agent.process.base.Process
Screen for possible abuse of Unicode in titles.
We support Unicode characters in titles, but this can get out of hand. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.
-
check_title(previous, trigger, emit)
Check title for low ASCII content.
- Return type
None
-
steps = [<function CheckTitleForUnicodeAbuse.check_title>]
-
-
agent.process.metadata_checks.intersection(phrase_a, phrase_b)
Calculate the number of tokens shared by two phrases.
- Return type
-
agent.process.metadata_checks.jaccard(phrase_a, phrase_b)
Calculate the Jaccard similarity of two phrases.
- Return type
-
agent.process.metadata_checks.tokenized(phrase)
Split a phrase into tokens and remove stopwords.
-
agent.process.metadata_checks.union(phrase_a, phrase_b)
Calculate the total number of tokens in two phrases.
- Return type
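As a rough illustration of how these four helpers could fit together, the sketch below computes Jaccard similarity as shared tokens divided by distinct tokens. It rests on assumptions not stated in this page: whitespace tokenization, lowercasing, a small made-up stopword list, and a `union` that counts distinct tokens across both phrases. The real implementations may differ.

```python
from typing import FrozenSet, Set

# Hypothetical stopword list, for illustration only.
STOPWORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "the", "of", "in", "on", "for", "to"}
)


def tokenized(phrase: str) -> Set[str]:
    """Split a phrase on whitespace, lowercase it, and drop stopwords."""
    return {token for token in phrase.lower().split() if token not in STOPWORDS}


def intersection(phrase_a: str, phrase_b: str) -> int:
    """Count the tokens shared by both phrases."""
    return len(tokenized(phrase_a) & tokenized(phrase_b))


def union(phrase_a: str, phrase_b: str) -> int:
    """Count the distinct tokens appearing in either phrase."""
    return len(tokenized(phrase_a) | tokenized(phrase_b))


def jaccard(phrase_a: str, phrase_b: str) -> float:
    """Return intersection / union, or 0.0 when both phrases have no tokens."""
    total = union(phrase_a, phrase_b)
    return intersection(phrase_a, phrase_b) / total if total else 0.0
```

Under these assumptions, "The quantum theory of gravity" and "A quantum theory of light" share two of four distinct tokens after stopword removal, giving a similarity of 0.5.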