publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
2026
- Dr. Zero: Self-Evolving Search Agents without Training Data. arXiv preprint arXiv:2601.07055, 2026
- Learning Personalized Agents from Human Feedback. arXiv preprint arXiv:2602.16173, 2026
- Verifying Chain-of-Thought Reasoning via Its Computational Graph. ICLR 2026 (Oral), 2026
2025
- Weak-to-strong jailbreaking on large language models. In International Conference on Machine Learning (ICML), 2025
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons. arXiv preprint arXiv:2503.05731, 2025
- MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents. arXiv preprint arXiv:2502.05174, 2025
- Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder. arXiv preprint arXiv:2502.14050, 2025
- Many-Turn Jailbreaking. arXiv preprint arXiv:2508.06755, 2025
- Your Thoughts Tell Who You Are: Characterize the Reasoning Patterns of LRMs. arXiv preprint arXiv:2509.24147, 2025
- Humanity’s last exam. arXiv preprint arXiv:2501.14249, 2025
- Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models. arXiv preprint arXiv:2510.21978, 2025
2024
- A survey on large language models for critical societal domains: Finance, healthcare, and law. arXiv preprint arXiv:2405.01769, 2024
- Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv preprint arXiv:2404.12241, 2024
- A safe harbor for AI evaluation and red teaming. In International Conference on Machine Learning (ICML), 2024 (Oral)
- Test-time backdoor attacks on multimodal large language models. arXiv preprint arXiv:2402.08577, 2024
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
- PLLaMa: An open-source large language model for plant science. arXiv preprint arXiv:2401.01600, 2024
- MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
- TrustAgent: Towards safe and trustworthy LLM-based agents. arXiv preprint arXiv:2402.01586, 2024
- A survey on detection of LLMs-generated content. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
- Large language models can be good privacy protection learners. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
- Quokka: An Open-source Large Language Model Chatbot for Materials Science. arXiv preprint arXiv:2401.01089, 2024
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning. arXiv preprint arXiv:2404.10552, 2024
- Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning. arXiv preprint arXiv:2405.20535, 2024
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs. arXiv preprint arXiv:2406.05232, 2024
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2024
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy. arXiv preprint arXiv:2410.13218, 2024
2023
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models. 2023
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models. 2023
- Zero-Shot Detection of Machine-Generated Codes. arXiv preprint arXiv:2310.05103, 2023
- Large Language Models Can Be Good Privacy Protection Learners. arXiv preprint arXiv:2310.02469, 2023
- DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text. arXiv preprint arXiv:2305.17359, 2023
- Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting. arXiv preprint arXiv:2305.12723, 2023
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation. arXiv preprint arXiv:2305.11116, 2023
- Dynamic Prompting: A Unified Framework for Prompt Tuning. arXiv preprint arXiv:2303.02909, 2023
- Exploring the limits of ChatGPT for query or aspect-based text summarization. arXiv preprint arXiv:2302.08081, 2023
- MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures. arXiv preprint arXiv:2302.05597, 2023
- ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval. arXiv preprint arXiv:2302.02285, 2023
- OASum: Large-Scale Open Domain Aspect-based Summarization. In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023
- Few-Shot Document-Level Event Argument Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023
- AlpaCare: Instruction-tuned large language models for medical application. arXiv preprint arXiv:2310.14558, 2023
2022
- PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text. In Findings of the Association for Computational Linguistics: EMNLP 2022, Jul 2022
2021
- An Analysis of Relation Extraction within Sentences from Wet Lab Protocols. In 2021 IEEE International Conference on Big Data (Big Data), Jul 2021
- On explosive boiling of a multicomponent Leidenfrost drop. Proceedings of the National Academy of Sciences, Jul 2021
2019
- Convective heat transfer along ratchet surfaces in vertical natural convection. Journal of Fluid Mechanics, Jul 2019