Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
Open Leaderboards. Trustworthy Evaluation. Robust AI Detection. RAID is the largest and most comprehensive dataset for evaluating AI-generated text detectors. It contains over 10 million documents ...