And you could identify these neurons using dropout-style ablation: switch off a set of neurons, then repeatedly query the model and compare its answers.

Drop a set of neurons and there’s no change? That set probably doesn’t contain the “sky color” concept.

Drop a set of neurons and the model freaks out? Those are very likely conceptual neurons.

Rinse and repeat to find the distilled pattern across all the neurons.
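
To make the drop-and-query loop concrete, here is a rough sketch on a toy network. Everything in it is made up for illustration: the weights `W1` and `W2`, the probe inputs, the neuron groups, and the helper names `forward` and `ablation_effect`. A real model would need hooks into its actual layers, and a real test would use prompts about the concept you care about.

```python
# Toy ablation sketch: force groups of hidden units to zero and measure
# how much the output moves. All weights and inputs are hypothetical.

def relu(v):
    return v if v > 0.0 else 0.0

# Made-up network: 2 inputs, 4 hidden units, 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
W2 = [2.0, 0.0, 0.0, 1.0]  # units 1 and 2 have zero outgoing weight

def forward(x, dropped=frozenset()):
    """Run the toy network with the hidden units in `dropped` set to zero."""
    out = 0.0
    for j, (row, w_out) in enumerate(zip(W1, W2)):
        h = relu(sum(w * xi for w, xi in zip(row, x)))
        if j in dropped:
            h = 0.0
        out += w_out * h
    return out

def ablation_effect(group, probes):
    """Mean absolute change in output across `probes` when `group` is dropped."""
    total = 0.0
    for x in probes:
        total += abs(forward(x) - forward(x, frozenset(group)))
    return total / len(probes)

probes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for repeated queries
groups = [(0,), (1, 2), (3,)]                  # candidate neuron sets to drop

effects = {g: ablation_effect(g, probes) for g in groups}

# Rank groups from most to least disruptive.
ranked = sorted(groups, key=lambda g: effects[g], reverse=True)
for g in ranked:
    print(g, round(effects[g], 3))
```

In this toy, dropping group `(1, 2)` changes nothing, so it is probably not involved in whatever the probes test, while dropping `(0,)` moves the output the most. One caveat: "no change" only tells you about the probes you actually ran, so the quality of the queries matters as much as the dropping.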

You could train an LLM against the neuron graph to do this for you.