Neural networks are huge graphs of nodes and activation functions. Some common design patterns exist for indicating that certain weights are actually the same shared variable, reused in many places. For example, an image recognition network should be able to recognize a lion not just in the center of the image, but anywhere in the image. The lion-recognizing portion of the network is the same no matter where in the image it's looking, so its weights can be shared across positions. Convolutional neural networks (CNNs) are the standard way to express this.
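As a concrete illustration, here is a minimal NumPy sketch of that weight sharing: one small kernel (the shared weights) is slid over every position of the image, so the same detector runs everywhere. The kernel values and image are made up for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over every position of the image.

    The same kernel weights are reused at every location, which is
    what lets the network detect the same pattern (e.g. a lion) no
    matter where it appears.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # identical weights at every position
    return out

# Toy example: a 3x3 vertical-edge detector applied across an 8x8 image.
image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (6, 6): one response per image position
```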
The problem here is time. How do you represent time in a neural network? CNNs sort of work, but they only capture relationships within a fixed window and don't really understand longer-range temporal structure. Another approach is recurrent neural networks (RNNs), but they tend to forget their early inputs too quickly as the sequence gets long. Attention is an alternative network structure. Instead of looking back just a few steps the way recurrence effectively does, it looks back over the entire sequence, with the weight given to each previous position computed dynamically from the data. In practice it works dramatically better than RNNs on long sequences.
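Here is a minimal NumPy sketch of scaled dot-product attention, the usual form of this idea: every position attends over the whole sequence, and each softmax row holds the dynamically computed weights for all previous positions. Using the input itself as queries, keys, and values is a simplification for illustration; real models use separate learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over a whole sequence.

    Every position can look at every other position; the softmax
    scores are the dynamically computed weights described above.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy sequence of 5 positions, each a 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
# Simplification: queries, keys, and values are all just x here;
# a real model would apply separate learned projections to x first.
output, weights = attention(x, x, x)
print(weights.shape)  # (5, 5): one weight for every pair of positions
```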