About Jay Bailey

Hi, I’m Jay! I’m a former software engineer from Brisbane, Australia, who’s now made the career transition into AI alignment. I’ve previously worked for Amazon, Service NSW, and a consulting company called Versent as a contractor, so I’ve seen a lot, both good and bad. For more, you can check out my CV! In alignment, I’ve explored both language model evaluation (AI psychology) and mechanistic interpretability (AI neuroscience).

I’m especially interested in improving our world’s capacity to perform thorough, scientifically backed evaluations of frontier AI models. This means building institutions (both private and government) and advancing the conceptual science of how to evaluate language models, especially for dangerous capabilities that may be hidden from us (such as deception) or difficult to elicit without a lot of work (such as agent-like behaviour that requires AutoGPT-like frameworks). I think our ability to do this is going to be key to maintaining control over the future as this transformation occurs, and to ensuring we keep harnessing these amazing tools rather than being controlled by them.

Outside of work, I’m a former regional Magic: The Gathering competitor, a current avid Kindle reader, a signatory of the Giving What We Can pledge, and constantly grateful to be living in an age of technological and material wonders. I can be reached at jaybaileycs@gmail.com.