
TOXIGEN

This repository contains the code for generating the ToxiGen dataset for hate speech detection.

Introduction

ToxiGen is a large-scale machine-generated dataset for adversarial and implicit hate speech detection, published at ACL 2022. This repository provides the code and tools to generate the ToxiGen dataset, which contains implicitly toxic and benign sentences mentioning 13 minority groups. The dataset is intended for training classifiers to detect subtle hate speech that contains no slurs or profanity.

Key Features:
  • Dataset Generation: Code for generating the ToxiGen dataset using pretrained language models like GPT-3.
  • ALICE Tool: A tool to stress test content moderation systems and improve their performance across minority groups.
  • Human Annotations: Includes 27,450 human annotations for better dataset quality and reliability.
  • Community Contributions: Encourages users to contribute new prompts and data generation methods.
  • Pretrained Classifiers: Provides checkpoints for HateBERT and RoBERTa models fine-tuned on ToxiGen data.
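As a minimal sketch of how the pretrained classifiers might be used, the snippet below scores a sentence with a ToxiGen-finetuned checkpoint through the Hugging Face `transformers` pipeline API. The checkpoint name `tomh/toxigen_roberta` is an assumption for illustration; substitute the HateBERT or RoBERTa checkpoint actually linked in this repository's documentation.

```python
def score_toxicity(texts, checkpoint="tomh/toxigen_roberta"):
    """Score sentences with a ToxiGen-finetuned classifier.

    `checkpoint` is an assumed Hugging Face model id; replace it with the
    checkpoint published alongside this repository.
    """
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=checkpoint)
    return classifier(texts)


if __name__ == "__main__":
    # Downloads the model on first run; requires network access.
    print(score_toxicity(["An implicitly worded statement to score."]))
```

The pipeline returns a label/score pair per input, which can be thresholded to flag implicitly toxic text.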
Benefits:
  • Research Utility: Designed for research purposes to improve toxicity detection methods.
  • Open Source: Available for community contributions and enhancements.
  • Responsible AI Considerations: Acknowledges the complexities of problematic language and encourages multidisciplinary research.
Highlights:
  • Released source codes and prompt seeds to foster community engagement.
  • Available on HuggingFace for easy access and integration into projects.
  • Comprehensive documentation and examples provided for users to get started quickly.
