Large Language Models (LLMs) have made impressive progress and can perform a wide variety of tasks, from generating human-like text to answering questions. However, understanding how these models work internally remains challenging, largely because of a phenomenon called superposition, in which many features are packed into a single neuron, making it very difficult to extract human-understandable […]
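To make the superposition problem concrete, below is a minimal sketch of a sparse autoencoder of the kind commonly trained on a layer's activations to disentangle such mixed features. The dimensions, the `l1_coeff` value, and the loss shown here are illustrative assumptions, not the post's actual setup.

```python
# A minimal sparse autoencoder (SAE) sketch, assuming PyTorch.
# The encoder maps dense activations into a wider, sparsely activated
# feature space; the decoder reconstructs the original activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Overcomplete dictionary (d_features >> d_model), so a neuron's
        # superposed signal can be split across many candidate features.
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative and sparse.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

# Hypothetical usage: a batch of 8 activation vectors from one LLM layer.
sae = SparseAutoencoder(d_model=512, d_features=4096)
acts = torch.randn(8, 512)
recon, feats = sae(acts)

# Typical training objective: reconstruction error plus an L1 penalty
# that pushes each feature to fire on only a few inputs.
l1_coeff = 1e-3  # assumed value for illustration
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
```

The L1 term is what makes the learned dictionary sparse: with most features silent on any given input, each active feature tends to correspond to a single, more interpretable direction in activation space.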
