The state is the board after each move: three stacks of disks across three pegs.
Since the LLM's context cannot be arbitrarily written to, only appended to, it essentially has to keep re-recording the full board configuration as it goes, plus the couple of variables the algorithm needs (step number, even/odd parity, IIRC). In practice, I was only able to do 6 disks before I ran out of context on my local models. Performance degraded seriously at 5.
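For reference, the iterative algorithm I assume is being described really does need only that much state: a move counter and a parity rule. A minimal sketch (peg numbering and direction choice are my assumptions, not from the original):

```python
def solve_hanoi(n):
    """Iteratively solve n-disk Hanoi; return (moves, final pegs).

    Pegs are numbered 0..2; peg 0 starts with disks n..1 (1 = smallest).
    The parity rule: on odd-numbered moves, the smallest disk moves one
    peg in a fixed direction; on even-numbered moves there is exactly one
    legal move not involving the smallest disk.
    """
    pegs = [list(range(n, 0, -1)), [], []]
    step = 1 if n % 2 == 0 else -1  # smallest disk's cycling direction
    moves = []
    for move_no in range(1, 2 ** n):  # optimal solution is 2^n - 1 moves
        if move_no % 2 == 1:
            # Odd move: shift the smallest disk one peg in its direction.
            src = next(i for i, p in enumerate(pegs) if p and p[-1] == 1)
            dst = (src + step) % 3
        else:
            # Even move: the only legal move between the other two pegs.
            a, b = [i for i, p in enumerate(pegs) if not p or p[-1] != 1]
            if pegs[a] and (not pegs[b] or pegs[a][-1] < pegs[b][-1]):
                src, dst = a, b
            else:
                src, dst = b, a
        pegs[dst].append(pegs[src].pop())
        moves.append((src, dst))
    return moves, pegs
```

The point is that the algorithm's own working memory is tiny; what blows up in-context is re-stating the board every move.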
As I said, with an agentic system where you manage the board and context externally, I have no doubt it could be done to arbitrary length. The problem is fitting the moves and the "memorized" board state in context. The context window isn't a notepad with perfect recall; it's pumped through attention layers that weigh everything at once. The fuller it gets, the more the model has to consider all of it to do any one step, and the higher the chance of a mistake. There are no training wheels stopping it from making an illegal move.
As I said, now try to do it in your head. No constraints.
I can do 3, but with minor difficulty. I didn't try 4, and I really don't want to.
An agentic system would maintain the board externally and feed it to the LLM for each step. Even a very tiny LLM would succeed at this task at essentially arbitrary sizes.
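A minimal sketch of that setup, under my assumptions about the wiring: the harness owns the board, validates every proposed move (the "training wheels" missing from the pure in-context version), and only ever shows the model the current state. `query_model` here is a stand-in for the actual LLM call, not a real API:

```python
def legal(pegs, src, dst):
    """A move is legal if src has a disk and it fits on top of dst."""
    return bool(pegs[src]) and (not pegs[dst] or pegs[src][-1] < pegs[dst][-1])

def run_agent(n, query_model, max_steps=10_000):
    """Drive the model one move at a time against an externally held board.

    query_model is any callable taking the pegs and returning (src, dst);
    in a real system it would prompt the LLM with just the current board.
    """
    pegs = [list(range(n, 0, -1)), [], []]
    for _ in range(max_steps):
        if pegs[2] == list(range(n, 0, -1)):
            return True, pegs  # solved: all disks on the last peg
        src, dst = query_model(pegs)
        if legal(pegs, src, dst):
            pegs[dst].append(pegs[src].pop())
        # Illegal proposals are simply rejected and the same board re-shown.
    return False, pegs

# Example: a scripted "model" that solves the 2-disk case.
script = iter([(0, 1), (0, 2), (1, 2)])
solved, final = run_agent(2, lambda pegs: next(script))
```

Because the harness rejects illegal moves and the model only ever reasons about one position, context never accumulates, which is why the disk count stops mattering.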