agent.process.metadata_checks module
Automated metadata checks.
-
class agent.process.metadata_checks.CheckAbstractForUnicodeAbuse(submission_id, process_id=None)
Bases: agent.process.base.Process
Screen for possible abuse of unicode in abstracts.
We support unicode characters in abstracts, but this can get out of hand. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.
-
check_abstract(previous, trigger, emit)
Check abstract for low ASCII content.
- Return type
None
-
steps = [<function CheckAbstractForUnicodeAbuse.check_abstract>]
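The ratio test described for this class can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper names, the handling of text with no ASCII characters, and the threshold value are all assumptions, since the documentation only says the ratio must not be "too high".

```python
def non_ascii_ratio(text: str) -> float:
    """Return the ratio of non-ASCII to ASCII characters in ``text``."""
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    non_ascii_count = len(text) - ascii_count
    if ascii_count == 0:
        # Assumption: text with no ASCII at all gets an unbounded ratio,
        # and empty text gets a ratio of zero.
        return float("inf") if non_ascii_count else 0.0
    return non_ascii_count / ascii_count


# Hypothetical threshold; the real value is configured elsewhere.
HYPOTHETICAL_THRESHOLD = 0.5


def looks_like_unicode_abuse(text: str,
                             threshold: float = HYPOTHETICAL_THRESHOLD) -> bool:
    """Return True if the non-ASCII to ASCII ratio exceeds ``threshold``."""
    return non_ascii_ratio(text) > threshold
```

Under these assumptions, an abstract with an occasional Greek letter passes, while one made up mostly of non-ASCII symbols would be flagged.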
-
-
class agent.process.metadata_checks.CheckForSimilarTitles(submission_id, process_id=None)
Bases: agent.process.base.Process
Check for other submissions with very similar titles.
Ask classic for titles of papers submitted within the last several months. Add an annotation to the submission if any of those titles is more similar to the current submission's title than a configurable threshold allows.
-
check_for_duplicates(candidates, trigger, emit)
Look for very similar titles, and add flags if appropriate.
- Return type
None
-
get_candidates(previous, trigger, emit)
Get candidate titles from the database.
-
steps = [<function CheckForSimilarTitles.get_candidates>, <function CheckForSimilarTitles.check_for_duplicates>]
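The threshold comparison behind this class might look roughly like the sketch below. It is only an illustration under stated assumptions: the candidate shape (an id paired with a title), the function names, and the placeholder similarity function are hypothetical. In the real module, the similarity score presumably comes from jaccard, and the candidates from the get_candidates step.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate shape: (submission id, title).
Candidate = Tuple[int, str]


def find_similar_titles(
    title: str,
    candidates: Iterable[Candidate],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[int, str, float]]:
    """Return (id, title, score) for each candidate scoring above threshold."""
    flagged = []
    for candidate_id, candidate_title in candidates:
        score = similarity(title, candidate_title)
        if score > threshold:
            flagged.append((candidate_id, candidate_title, score))
    return flagged


def exact_match(title_a: str, title_b: str) -> float:
    """Placeholder similarity: 1.0 for a case-insensitive match, else 0.0."""
    return 1.0 if title_a.strip().lower() == title_b.strip().lower() else 0.0
```

Passing the similarity function as a parameter keeps the sketch independent of any particular metric; each flagged entry would correspond to one annotation on the submission.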
-
-
class agent.process.metadata_checks.CheckTitleForUnicodeAbuse(submission_id, process_id=None)
Bases: agent.process.base.Process
Screen for possible abuse of unicode in titles.
We support unicode characters in titles, but this can get out of hand. This rule adds a flag if the ratio of non-ASCII to ASCII characters is too high.
-
check_title(previous, trigger, emit)
Check title for low ASCII content.
- Return type
None
-
steps = [<function CheckTitleForUnicodeAbuse.check_title>]
-
-
agent.process.metadata_checks.intersection(phrase_a, phrase_b)
Calculate the number of tokens shared by two phrases.
- Return type
-
agent.process.metadata_checks.jaccard(phrase_a, phrase_b)
Calculate the Jaccard similarity of two phrases.
- Return type
-
agent.process.metadata_checks.tokenized(phrase)
Split a phrase into tokens and remove stopwords.
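A tokenizer of this kind could be sketched as below. The stopword list, the lowercasing, and the regular expression used for splitting are assumptions made for illustration; the module's actual stopwords and splitting rules are not documented here.

```python
import re
from typing import List

# Hypothetical stopword list; the module's real list may differ.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}
)


def tokenized_sketch(phrase: str) -> List[str]:
    """Lowercase ``phrase``, split it into word tokens, and drop stopwords."""
    words = re.findall(r"\w+", phrase.lower())
    return [word for word in words if word not in STOPWORDS]
```

For example, under these assumptions `tokenized_sketch("The Theory of Everything")` yields `["theory", "everything"]`.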
-
agent.process.metadata_checks.union(phrase_a, phrase_b)
Calculate the total number of tokens in two phrases.
- Return type
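The intersection, union, and jaccard helpers fit together in the usual way: Jaccard similarity is the size of the token intersection divided by the size of the token union. The sketch below illustrates that relationship. It assumes tokens are compared as sets, uses a simplified whitespace tokenizer without stopword removal, and returns 0.0 when both phrases are empty. None of these details are confirmed by the documentation above, and the function names are hypothetical.

```python
from typing import Set


def _tokens(phrase: str) -> Set[str]:
    """Simplified stand-in tokenizer: lowercase and split on whitespace."""
    return set(phrase.lower().split())


def intersection_sketch(phrase_a: str, phrase_b: str) -> int:
    """Number of distinct tokens that appear in both phrases."""
    return len(_tokens(phrase_a) & _tokens(phrase_b))


def union_sketch(phrase_a: str, phrase_b: str) -> int:
    """Number of distinct tokens that appear in either phrase."""
    return len(_tokens(phrase_a) | _tokens(phrase_b))


def jaccard_sketch(phrase_a: str, phrase_b: str) -> float:
    """Jaccard similarity: shared tokens divided by total distinct tokens."""
    total = union_sketch(phrase_a, phrase_b)
    if total == 0:
        # Assumption: two empty phrases are treated as having zero similarity.
        return 0.0
    return intersection_sketch(phrase_a, phrase_b) / total
```

With the titles "deep learning for graphs" and "deep learning on graphs", three tokens are shared out of five distinct tokens, giving a similarity of 0.6.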