Submissions
Results of this year’s submissions
The results of this year’s submissions can be found here:
- Auswertung_Abgaben_GermEval_2019_Subtask_1
- Auswertung_Abgaben_GermEval_2019_Subtask_2
- Auswertung_Abgaben_GermEval_2019_Subtask_3
Submission Runs
Each group that has registered for the shared task may submit up to 3 different runs per subtask, so a group registered for all three subtasks can submit up to 9 runs. Submitting multiple runs is not mandatory, however; a single run per subtask is perfectly sufficient.
Filename Format of Submission Runs
The filenames of your runs must follow a strict naming convention. Each filename consists of three components, in the following format:
<group-id>_<subtask>_<run-number>.txt
<group-id>: only letters of the English alphabet are accepted. Do not use any special characters (not even umlauts or similar characters), and do not use spaces. For better readability, you may mix uppercase and lowercase letters.
Note that this id will be used in the official documents publishing the results of the shared task, so we encourage you to think of an id by which one can easily identify the affiliation of your group.
<subtask>: coarse for the coarse-grained task, fine for the fine-grained task, or implicit for the third subtask (explicit vs. implicit offensive language). Please use exactly this spelling; it must be lowercase!
<run-number>: a single digit to indicate your run number, i.e. 1, 2 or 3.
Here are examples of valid filenames:
unisaar_coarse_1.txt
unisaar_coarse_2.txt
unisaar_coarse_3.txt
unisaar_fine_1.txt
unisaar_fine_2.txt
unisaar_fine_3.txt
unisaar_implicit_1.txt
unisaar_implicit_2.txt
unisaar_implicit_3.txt
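As an informal aid, the naming rules can be checked before the files are sent. The following is a minimal Python sketch, not an official tool from the organizers; the helper names parse_run_filename and check_submission are hypothetical, and the subtask list (coarse, fine, implicit) is taken from the examples above.

```python
import re

# Subtask names as spelled in the naming rules and examples (lowercase only).
SUBTASKS = ("coarse", "fine", "implicit")

# <group-id>: English letters only; <subtask>: one of SUBTASKS;
# <run-number>: a single digit 1, 2 or 3; then the .txt extension.
FILENAME_RE = re.compile(r"([A-Za-z]+)_(" + "|".join(SUBTASKS) + r")_([1-3])\.txt")


def parse_run_filename(name):
    """Return (group_id, subtask, run_number), or None if the name is invalid."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    group_id, subtask, run = match.groups()
    return group_id, subtask, int(run)


def check_submission(filenames):
    """Return a list of human-readable problems found in one group's run files."""
    problems = []
    group_ids = set()
    seen = set()
    for name in filenames:
        if name in seen:
            problems.append(f"duplicate file: {name}")
            continue
        seen.add(name)
        parsed = parse_run_filename(name)
        if parsed is None:
            problems.append(f"invalid filename: {name}")
        else:
            group_ids.add(parsed[0])
    # All runs of one group should share the same group id.
    if len(group_ids) > 1:
        problems.append(f"several group ids used: {sorted(group_ids)}")
    return problems
```

For example, parse_run_filename("unisaar_fine_2.txt") returns ('unisaar', 'fine', 2), while a name with a hyphen or umlaut in the group id, an uppercase subtask, or a run number above 3 returns None.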
File Format
The files should have the same format as that of the labeled data files (i.e. trial data, training data). For further details, please go to the Format section.
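Because runs must mirror the labeled data, one rough self-check is to compare a run file with the training data. The sketch below assumes, hypothetically, that files are tab-separated with the text in the first column and the labels in the remaining columns; please verify this against the Format section. The function names are illustrative, and this is no substitute for the official evaluation tool.

```python
def column_labels(labeled_lines):
    """Return the column count and the set of labels seen per label column.

    ASSUMPTION: tab-separated lines; column 1 is the text, the rest are labels.
    """
    n_cols = None
    labels = []
    for line in labeled_lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        if n_cols is None:
            n_cols = len(fields)
            labels = [set() for _ in range(n_cols - 1)]
        elif len(fields) != n_cols:
            raise ValueError(f"inconsistent column count: {line!r}")
        for col, value in enumerate(fields[1:]):
            labels[col].add(value)
    if n_cols is None:
        raise ValueError("no labeled data given")
    return n_cols, labels


def check_run(run_lines, labeled_lines):
    """Return (line_number, message) pairs where a run deviates from the labeled data."""
    n_cols, labels = column_labels(labeled_lines)
    errors = []
    for i, line in enumerate(run_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != n_cols:
            errors.append((i, f"expected {n_cols} fields, got {len(fields)}"))
            continue
        for col, value in enumerate(fields[1:]):
            if value not in labels[col]:
                errors.append((i, f"unseen label {value!r} in column {col + 2}"))
    return errors
```

An empty result only means that no deviation was found under these assumptions; a label that is valid but happens to be absent from the training data would also be reported.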
How to Submit the Submission Files of Your Runs
Send your file(s) to the organizers: iggsa2019@googlegroups.com
We will confirm every submission we receive. If you do not receive a confirmation email within one day of submitting, please contact the organizers again. We encourage you to send this enquiry without any attachment.
As a further safety precaution, we will publish a list of the names of the runs we received by the submission deadline. You should double check that your runs are included on that list.
We will only consider submission files that conform to the specifications of the shared task. Please check in advance whether the format of your file is correct.
We recommend checking your runs with the provided evaluation tool, which runs format checks before computing the evaluation scores.
Since the tool expects a gold standard file, simply create a dummy gold standard file with randomly generated labels.
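A dummy gold standard of this kind can be produced in a few lines. This Python sketch is illustrative only: it assumes tab-separated files with the text in the first column, and label_tuples (a hypothetical parameter) must list the valid label combinations for the subtask, for instance as taken from the training data. Drawing whole tuples rather than independent labels keeps the label columns consistent with each other.

```python
import random


def make_dummy_gold(run_lines, label_tuples, seed=0):
    """Build dummy gold-standard lines with random labels for format testing.

    ASSUMPTION: tab-separated lines whose first column is the text; each
    entry of label_tuples is one valid combination of label-column values.
    """
    rng = random.Random(seed)  # fixed seed makes the dummy file reproducible
    gold = []
    for line in run_lines:
        if not line.strip():
            continue  # skip blank lines
        text = line.rstrip("\r\n").split("\t")[0]
        labels = rng.choice(label_tuples)  # one valid combination per item
        gold.append("\t".join([text, *labels]))
    return gold
```

Write the returned lines to a file, one per line, and pass it to the evaluation tool as the gold standard. The resulting scores are meaningless; only the format check matters.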
Important — Please Read!
Submitting runs that comply with the format restrictions is not enough: in order to appear in the official publication of the results, you also have to complete a survey and submit a participation paper.
You can find some information on the survey.
Further information on the format of the participation paper and its submission can be found here.
We have a few basic quality requirements for the participation paper. If your paper does not meet them, we will reject it, which also means that we will not include you in the official publication of the results.
Proceedings
Struß, Julia Maria and Siegel, Melanie and Ruppenhofer, Josef and Wiegand, Michael and Klenner, Manfred (2019). Overview of GermEval Task 2, 2019 Shared Task on the Identification of Offensive Language. In: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), October 9 – 11, 2019 at Friedrich-Alexander-Universität Erlangen-Nürnberg. – München [u.a.]: German Society for Computational Linguistics & Language Technology and Friedrich-Alexander-Universität Erlangen-Nürnberg, 2019, pp. 352-363.
Resources, Tools and Literature
Resources
Dataset for abusive language in English — focusing on racism and sexism (from University of Copenhagen)
Dataset for abusive language in German — from University of Duisburg-Essen
German slur dictionary (a second one is also available)
German Sentiment Lexicon — from University of Zurich
SentiWS — a Publicly Available German-language Resource for Sentiment Analysis (from University of Leipzig)
GermanPolarityClues — A Lexical Resource for German Sentiment Analysis (from University of Bielefeld)
GermaNet — a lexical-semantic wordnet for German (from University of Tübingen)
API for GermaNet
JWKTL — a Java-based Wiktionary library (from Technische Universität Darmstadt)
Word Embeddings trained on German tweets — from SpinningBytes
Word Embeddings trained on German Wikipedia
German corpora generated from the Web — deWaC and sdewac
Tools
- Textblob-de — The German language extension for TextBlob, a Python (2 and 3) library for processing textual data
- spaCy — a Python library for processing English and German text
- TreeTagger — a part-of-speech tagger for German, including lemmatization (from LMU)
- MarMoT — a fast and accurate morphological tagger for German (from LMU)
- Mate tools — lemmatizer, POS-tagger, morphology, dependency parser for German
- To run the German Mate tools, you need anna-3.61.jar and ger-tagger+lemmatizer+morphology+graph-based-3.6+.tgz
- Description of processing pipeline and corresponding formats
- Morphisto — a tool for morphological analysis for German (from Institute for German Language, Mannheim)
- Keras — a high-level API for neural networks in Python
- SVMlight — an implementation of Support Vector Machines
- FastText — a library for fast text representation and text classification (from Facebook Research)
- Word2vec — a tool for inducing word embeddings
- vecMap — a tool for inducing cross-lingual word embeddings
- Brown clustering tool — from Stanford University
- Website — listing useful tools for processing Twitter data
Literature
Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Veronique Hoste: “Detection and fine-grained classification of cyberbullying events”, In Proceedings of Recent Advances in Natural Language Processing (RANLP), 2015.
URL: http://www.aclweb.org/anthology/R15-1086
Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang: “Abusive language detection in online user content”, In Proceedings of the International Conference on World Wide Web, 2016.
URL: http://www.yichang-cs.com/yahoo/WWW16_Abusivedetection.pdf
Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin: “Offensive language detection using multi-level classification”, In Proceedings of the Canadian Conference on Advances in Artificial Intelligence, 2010.
URL: https://link.springer.com/content/pdf/10.1007%2F978-3-642-13059-5_5.pdf
Björn Ross, Michael Rist, Guillermo Carbonell, Ben Cabrera, Nils Kurowsky, Michael Wojatzki: “Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis”, In Proceedings of the KONVENS-Workshop on Natural Language Processing for Computer-Mediated Communication (KONVENS-NLP4CMC), 2016.
URL: https://www.linguistics.rub.de/bla/nlp4cmc2016/ross.pdf
Anna Schmidt, Michael Wiegand: “A Survey on Hate Speech Detection using Natural Language Processing”, In Proceedings of the EACL-Workshop on Natural Language Processing for Social Media (EACL-SocialNLP), 2017.
URL: https://aclanthology.info/papers/W17-1101/w17-1101
Ellen Spertus: “Smokey: Automatic recognition of hostile messages”, In Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI), 1997.
URL: https://www.aaai.org/Papers/IAAI/1997/IAAI97-209.pdf
William Warner and Julia Hirschberg: “Detecting hate speech on the world wide web”, In Proceedings of the NAACL-Workshop on Language in Social Media (NAACL-LSM), 2012.
URL: http://www.aclweb.org/anthology/W12-2103
Zeerak Waseem and Dirk Hovy: “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter”, In Proceedings of the NAACL Student Research Workshop, 2016.
URL: http://www.aclweb.org/anthology/N16-2013
A large bibliography related to the detection of abusive language, maintained by Zeerak Waseem (University of Sheffield).
URL: https://drive.google.com/file/d/0B4xDAGbwZJjQRS1Pa2VYOHdnRjA/view?usp=sharing