teaching a gpt to judge itself, part zero
baby datasets, baselines, and traces
Aug 25, 2025 · 3 min read
Series
i’m building a reasoning lab to judge gpt’s logic harder than i judge my own life. reason-saver logs, scores, and critiques completions. this is my journey through llm tooling, prompt debugging, and other ml chaos.