The transformers used by LLMs discover and encode the relationships between words IN HUMAN THOUGHT EXPRESSIONS. The multi-layer nature of the deep neural net into which the transformer encodes these relationships means that different layers encode the relationships at increasing levels of abstraction, along with the ways in which, and the strength with which, each more abstract relationship-aspect participates in each more particular relationship. By focusing on the contextual relationships that humans attend to in their communications, and by being able to capture abstract aspects of those relationships, the LLM effectively learns a semantic network of the concepts that the words and word sequences uttered by humans represent. The hierarchically organized statistical properties of human syntax (averaged over very many utterances, so that accidental, non-essential differences in expression wash out) ultimately yield a usable (cheaply, associatively tourable) representation of the semantics (MEANING) behind human communications. In other words, a knowledge base of both general and specific knowledge has been represented in the deep neural net.
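As a minimal sketch of the "increasing abstraction across layers" idea, one can inspect the per-layer hidden states of a pretrained transformer. This assumes the Hugging Face `transformers` library and the publicly available `gpt2` checkpoint; it only shows where the layer-wise representations live, not how abstract each layer actually is.

```python
# Sketch: inspect the per-layer hidden states that the text describes as
# increasingly abstract encodings of word relationships.
# Assumes the Hugging Face `transformers` library and the `gpt2` checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

sentence = "The cat sat on the mat."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple: the embedding-layer output plus one tensor
# per transformer block, each of shape (batch, sequence_length, hidden_size).
for layer_index, layer_states in enumerate(outputs.hidden_states):
    # Average over tokens to get one vector summarizing the sentence at this layer;
    # deeper layers tend to encode more abstract, contextual relationships.
    sentence_vector = layer_states.mean(dim=1)
    print(layer_index, sentence_vector.shape)
```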
It is then possible to design and implement various query-driven, goal-directed associative touring algorithms that visit the concepts in the knowledge base in appropriate touring orders (nearby concepts are nearby in the neural net's representation, more general ones are "above", more specific ones are "below"), so that we could say these algorithms are thinking about the queries and about appropriate answers to them.
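A toy sketch of such a query-driven associative tour follows. The concept names and embeddings are hypothetical stand-ins for vectors one might extract from an LLM; "nearby" is modeled as cosine similarity, and the tour greedily visits the closest not-yet-visited concept to its current position. It is an illustration of the touring idea, not the algorithm any particular system uses.

```python
# Toy sketch of a query-driven associative tour over concept embeddings.
# The vectors below are invented for illustration; in practice they would come
# from the LLM's learned representation.
import numpy as np

concepts = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "animal": np.array([0.7, 0.5, 0.2]),    # more general: "above" cat and dog
    "siamese": np.array([0.95, 0.05, 0.0]),  # more specific: "below" cat
    "car": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity, used here as the notion of conceptual nearness."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def associative_tour(query_vector, concepts, steps=4):
    """Greedily visit the concept nearest to the current position, then move there."""
    visited = []
    current = query_vector
    remaining = dict(concepts)
    for _ in range(min(steps, len(remaining))):
        name = max(remaining, key=lambda n: cosine(current, remaining[n]))
        visited.append(name)
        current = remaining.pop(name)
    return visited

# A cat-like query tours the nearby, more specific, and more general cat-related
# concepts before reaching unrelated ones like "car".
print(associative_tour(np.array([0.9, 0.1, 0.05]), concepts))
```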