Does In-IDE Calibration of Large Language Models Work at Scale?
Empirical investigation of calibration techniques for in-IDE LLM suggestions at scale.
Platform for code-completion research combining IDE plugin, backend, and ML infrastructure.
Theoretical work exploring formal models of agentic AI via classical automata and language hierarchies.
Runtime verification framework to ensure AI agents operate within defined constraints and enable auditing.
Review and tooling for elevating benchmark quality in AI4SE; introduces BenchScout and an enhancement protocol.
Preprint on improving developer understanding of LLM-generated unit tests.