Trust and safety (TNS) is crucial for online platforms like Tinder, which faces many forms of harmful content. Generative AI presents both new challenges and new opportunities for TNS: it accelerates the spread of misinformation and the proliferation of scams, but it also enables stronger violation detection through fine-tuned large language models (LLMs). By leveraging LLMs, Tinder can detect and mitigate harm efficiently at scale, creating a safer user experience.
Tinder's TNS Challenges
• 00:01:50 Tinder, as the world's largest dating app, experiences a wide range of harmful content, including social media links in profiles, hate speech, harassment, and scams. These violations are diverse, with some being more prevalent but less harmful (e.g., social media links), while others are less frequent but pose a greater risk (e.g., hate speech).
Generative AI's Impact on TNS
• 00:02:28 Generative AI presents challenges to TNS through the rapid creation of harmful content, including misinformation, propaganda, and spam. The accessibility of deepfake technology increases the risk of impersonation and catfishing, while generative AI can also be used to scale up organized spam and scam operations. Platforms may face copyright issues due to the increased use of generative AI tools.
LLM Opportunities for TNS
• 00:03:54 Pre-trained LLMs provide a strong foundation for building TNS solutions, with strong latent semantic understanding and broad multilingual coverage. Fine-tuning these models achieves state-of-the-art performance in detecting violations in text, often exceeding the capabilities of closed-source models like GPT-4. The open-source community provides tools and libraries that simplify the fine-tuning process, accelerating model development from months to weeks.
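As a sketch of how violation detection can be framed as a fine-tuning task, each moderation decision can be turned into a prompt/completion pair in JSONL, the format most open-source fine-tuning toolkits accept. The label set and prompt template below are illustrative assumptions, not Tinder's actual taxonomy:

```python
import json

# Illustrative violation taxonomy -- not Tinder's actual label set.
LABELS = ["social_media_link", "hate_speech", "harassment", "scam", "none"]

def to_training_example(text: str, label: str) -> dict:
    """Frame a moderation decision as an instruction-tuning pair."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    prompt = (
        "Classify the following profile text for policy violations.\n"
        f"Allowed labels: {', '.join(LABELS)}\n"
        f"Text: {text}\nLabel:"
    )
    return {"prompt": prompt, "completion": " " + label}

examples = [
    to_training_example("follow me on insta @someuser", "social_media_link"),
    to_training_example("hey, how was your weekend?", "none"),
]
# One JSON object per line -- the usual fine-tuning dataset format.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

Framing detection as completion of a short, fixed label keeps inference cheap and makes it easy to extend the taxonomy by adding labeled examples rather than changing model code.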
LLM-based TNS Implementation
• 00:05:38 The process starts with creating a high-quality training dataset, which can be assembled manually or through a hybrid process combining LLMs and internal analytics data. Fine-tuning LLMs allows for full control over model weights and facilitates adaptability to evolving harmful content. Productionizing these models using LoRA and Lorax enables efficient inference at scale, with low marginal costs for deploying new adapters.
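The "low marginal cost" of new adapters follows from LoRA's parameter math: instead of updating a full weight matrix, LoRA learns two low-rank factors whose product approximates the update. A back-of-the-envelope sketch (layer dimensions and rank here are typical illustrative values, not Tinder's actual configuration):

```python
# LoRA replaces a full update to a d_out x d_in weight matrix W with two
# low-rank factors B (d_out x r) and A (r x d_in); the effective weight
# is W + (alpha / r) * B @ A. Only B and A are trained and deployed.
d_in, d_out, r = 4096, 4096, 16    # illustrative transformer layer, small rank

full_params = d_out * d_in          # parameters in one dense layer
lora_params = r * (d_in + d_out)    # parameters in its LoRA adapter

ratio = lora_params / full_params
print(f"adapter is {ratio:.1%} the size of the layer it adapts")
```

Because each adapter is well under 1% of the adapted layer's size, a serving system like Lorax can keep one base model in GPU memory and hot-swap many per-violation adapters, which is what makes deploying a new detector cheap.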
Future of LLM-based TNS
• 00:13:27 Future efforts focus on incorporating non-textual modalities like images using visual language models for TNS. The goal is to rapidly train adapters for various TNS violations, automating training and retraining pipelines with AI-in-the-loop. Leveraging Lorax ensures efficient inference for new adapters, ultimately building a robust defense against harm and improving the safety and health of the platform.