Aman Saini, Artem Chernodub, Vipul Raheja, Vivek Kulkarni
arXiv 2024
We introduce Spivavtor, a Ukrainian-focused, instruction-tuned text-editing model and dataset. Spivavtor adapts the CoEdIT framework to Ukrainian and provides models and data for instruction-based text rewriting and editing. We describe the dataset construction, model training, and evaluation results.
Shubhanshu Mishra, Aman Saini, Raheleh Makki, Sneha Mehta, Aria Haghighi, Ali Mollahosseini
NeurIPS — Datasets & Benchmarks Track, 2022
We present TweetNERD, a large-scale benchmark for entity extraction and linking on Twitter. The dataset covers tweets from 2010–2021 and supports three tasks: NER, Entity Linking with gold spans, and End-to-End Entity Linking. We provide strong baselines and analyze in-domain and out-of-domain performance.
Artem Chernodub, Aman Saini, Yejin Huh, Vivek Kulkarni, Vipul Raheja
arXiv 2025
We propose APIO, a two-stage approach that first induces task-specific prompts from examples and then optimizes them for text editing tasks. For text simplification, APIO attains strong SARI scores on ASSET-Test. We outline the induction and optimization pipeline and report evaluations across simplification and grammatical error correction.