Back to Portfolio

Sexism Detection API

Production-Ready ML System for Content Moderation

View on GitHub 📅 2023 → 2026

Overview

Modernized a 2023 Master's research project into a production-ready ML system for detecting sexist language. Restructured from Jupyter notebooks into a scalable FastAPI application with Docker deployment, comprehensive testing, and interactive web interface.

Interface

Gradio interface showing sexism detection with confidence scores

Interactive Gradio interface with real-time predictions and confidence visualization

Key Achievements

Model Performance
  • • F1 Score: 0.827
  • • Accuracy: 87.85%
  • • Binary classification (sexist/not sexist)
  • • 20,000 EDOS dataset samples
Production API
  • • FastAPI with OpenAPI docs
  • • 80-95ms inference latency
  • • Batch processing (up to 100 texts)
  • • Request validation & error handling
Infrastructure
  • • Docker & docker-compose setup
  • • Gradio web interface
  • • Health monitoring
  • • Multi-container orchestration
Code Quality
  • • Modern Python package structure
  • • Type hints & modular design
  • • Comprehensive testing
  • • Clean Git history (12 commits)

Technical Implementation

Training Strategy

Trained on Google Colab (Tesla T4 GPU) to overcome local hardware limitations, reducing training time from 4-6 hours (Mac M3) to 90 minutes while achieving target performance metrics.

Architecture

RoBERTa-base encoder with custom classification head, optimized preprocessing pipeline with LRU caching, and PyTorch DataLoaders for efficient batch processing.

Deployment

Multi-container setup with separate API and UI services, automatic model loading on startup, error handling with proper HTTP status codes, and CORS middleware for cross-origin requests.

Key Trade-offs & Design Decisions

Binary vs. Multi-class

Focused on binary classification (sexist/not sexist) rather than fine-grained categories (threat, derogation, animosity, prejudiced) to maximize F1 score and deployment simplicity. Binary classification is more actionable for content moderation.

Model Size

Used RoBERTa-base (125M parameters) instead of RoBERTa-large (355M parameters) to balance performance with inference speed and deployment costs. Achieved comparable results (F1 0.827 vs. 0.83) with 3x faster inference.

Training Infrastructure

Opted for cloud GPU (Colab) over local training to avoid 4-6 hour Mac M3 training time and system freeze issues. Demonstrates practical problem-solving and resource optimization.

Technology Stack

Python 3.12 PyTorch 2.10 Transformers 5.1 FastAPI Gradio Docker Google Colab
View Full Code on GitHub