UC Irvine
ML Repository

Bengali Hate Speech Detection Dataset

About

This dataset supports hate speech detection in Bengali social media texts. Samples are categorized as political, personal, geopolitical, religious, or gender abusive hate, either directed at a specific person or entity or generalized towards a group. The dataset was collected and subsequently annotated for research-related purposes only. See https://github.com/rezacsedu/Bengali-Hate-Speech-Dataset for more details about this dataset.

Preprocessing description: PoS tagging, removal of proper nouns, hashtag normalization, stemming, removal of emojis and duplicates, removal of infrequent words.

Does this dataset contain sensitive information?: The data and lexicons contain content that is racist, sexist, homophobic, and offensive in many different ways. The authors accept no liability for very offensive or hateful statements in the data, whether directed at a specific person or entity or generalized towards a group. Please use it at your own risk.
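A few of the preprocessing steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline: the function names (`normalize_hashtags`, `clean`, `preprocess`), the emoji code point ranges, and the `min_count` threshold are all assumptions made for the example. PoS tagging, proper-noun removal, and stemming are omitted because they require Bengali-specific tooling.

```python
import re
from collections import Counter

# Rough emoji matcher (assumption): covers common emoji blocks only.
# Zero-width joiners are deliberately left alone, since Bengali text can use them.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]")


def normalize_hashtags(text):
    """Drop the leading '#' and turn underscores inside the tag into spaces."""
    return re.sub(r"#(\S+)", lambda m: m.group(1).replace("_", " "), text)


def clean(text):
    """Normalize hashtags, replace emojis with spaces, and collapse whitespace."""
    text = normalize_hashtags(text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())


def preprocess(texts, min_count=2):
    """Clean each text, drop exact duplicates, then drop infrequent tokens."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    cleaned = list(dict.fromkeys(clean(t) for t in texts))
    # Token frequencies are counted over the deduplicated corpus.
    counts = Counter(tok for t in cleaned for tok in t.split())
    return [[tok for tok in t.split() if counts[tok] >= min_count] for t in cleaned]
```

Duplicates are removed after cleaning so that two posts differing only in emojis or hashtag markup count as the same text. Whether the original pipeline applied the steps in this order is not stated in the source.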
Subject Area
Computer Science
Instances
4,500
Data Types
Text
Tasks
Classification

Introductory Paper

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network
Md. Rezaul Karim, Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae, Michael Cochez. 2020.
2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)

Additional Metadata

Authors
Sumon Kanti Dey
Michael Cochez
Md. Rezaul Karim
Year Created
2020
License
CC BY 4.0