If a proof depends on a fact, the answer follows.
stable
Releasing soon
Sophontic cultivates geometric density in the substrate to achieve reasoning performance on public benchmarks exceeding that of models up to 60× its size. We have pioneered methods of directly training the internal geometry of the model.
Observed signal
Methodology
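We measure reasoning by flip rate, a paired-item test that precludes the surface-feature heuristics most models use to game conventional benchmarks. Each item is paired with a perturbed twin whose load-bearing fact has been changed, so a correct answer to the original must change for the twin. The pair is the unit of measurement, not the item. Reasoning, not recall.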
Inspect the eval kit
If a proof depends on a fact, the answer follows.
stable
Change the load-bearing fact. The answer must change.
flip
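The stable/flip pair above can be sketched in code. This is a minimal illustration, not the eval kit's implementation: the `Pair` schema, the exact-match scoring, and the rule that a pair counts only when both halves are answered correctly are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Pair:
    """One paired item (hypothetical schema, not the eval kit's format)."""
    base_prompt: str        # original item ("stable")
    base_answer: str        # expected answer to the original item
    perturbed_prompt: str   # same item with the load-bearing fact changed ("flip")
    perturbed_answer: str   # expected answer after the change

    def __post_init__(self) -> None:
        # The answer must change when the load-bearing fact changes.
        if self.base_answer == self.perturbed_answer:
            raise ValueError("perturbed answer must differ from base answer")


def pair_passes(model: Callable[[str], str], pair: Pair) -> bool:
    """Assumed scoring rule: a pair passes only if both halves are correct."""
    return (
        model(pair.base_prompt) == pair.base_answer
        and model(pair.perturbed_prompt) == pair.perturbed_answer
    )


def flip_rate(model: Callable[[str], str], pairs: Iterable[Pair]) -> float:
    """Fraction of pairs passed. The pair is the unit, not the item."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(pair_passes(model, p) for p in pairs) / len(pairs)
```

Under this assumed rule, a model that gives the same answer to both halves of a pair cannot pass it, even though it gets one of the two items right. That is the sense in which surface-feature shortcuts stop paying off.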
The model
fig. iv.a Our first prototype demonstrates that reasoning need not scale with size.
The eval kit
fig. iv.b Five public reasoning benchmarks, extended with paired perturbation items and released open-source.