Drop a set of neurons and there’s no change? Probably doesn’t contain the “sky color” concept.
Drop a set of neurons and the model freaks out? Those are likely the conceptual neurons.
Rinse and repeat to find the distilled pattern across all the neurons.
You could train an LLM against the neuron graph to do this for you.
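
For what it's worth, here is a rough sketch of that drop-and-compare loop in Python with NumPy. Everything in it is made up for illustration: the tiny two-layer network, the weights, the group names "a" and "b", and the helper names `forward` and `ablation_effects`. A real model would need hooks into its actual activations, so treat this as a toy under those assumptions rather than a working interpretability tool.

```python
import numpy as np

def forward(x, w1, w2, mask=None):
    """Tiny two-layer network; mask zeroes out (ablates) hidden neurons."""
    hidden = np.maximum(0.0, x @ w1)  # ReLU hidden layer
    if mask is not None:
        hidden = hidden * mask  # dropped neurons contribute nothing
    return hidden @ w2

def ablation_effects(x, w1, w2, groups):
    """Per neuron group, the mean absolute output change when it is dropped."""
    baseline = forward(x, w1, w2)
    effects = {}
    for name, idx in groups.items():
        mask = np.ones(w1.shape[1])
        mask[list(idx)] = 0.0
        ablated = forward(x, w1, w2, mask)
        effects[name] = float(np.mean(np.abs(ablated - baseline)))
    return effects

# Hypothetical toy weights: hidden neurons 0-1 drive the output, 2-3 do not.
w1 = np.eye(4)
w2 = np.array([[1.0], [1.0], [0.0], [0.0]])
x = np.ones((3, 4))  # three identical probe inputs

effects = ablation_effects(x, w1, w2, {"a": [0, 1], "b": [2, 3]})
print(effects)
```

In this toy, dropping group "a" changes the output and dropping group "b" does not, which is the "no change vs. freaks out" signal. A big change only says the group matters for these inputs, not that it encodes one specific concept, so you would still want to repeat it across many prompts and groupings.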