The central task of machine learning research is to extract regularities from data. These regularities are often subject to transformations that arise from the complexity of the generating process. Considerable effort has gone into creating data representations that are invariant to such transformations, yet most work on learning invariances does not model the transformations explicitly. My research focuses on modeling data in ways that separate its "content" from the "transformations" it undergoes. I primarily used a probabilistic generative framework, both for its expressive power and because any inferred representation is subject to uncertainty.

To model data content I focused on sparse coding techniques, owing to their ability to extract highly specialized dictionaries. I defined and implemented a discrete sparse coding model that represents the presence or absence of each dictionary element, subject to a finite set of scaling transformations. I then extended this model with an explicit representation of temporal shifts, which learns time-invariant representations of the data without loss of temporal alignment. To model data transformations more generally, I defined a neural network that uses gating units to encode the transformation relating a pair of datapoints. Furthermore, I defined a non-linear dynamical system whose dynamics are expressed as a bilinear transformation: the previous state is combined with a variable encoding the transformation to generate the current state.

To examine how these models behave in practice, I tested them on a variety of tasks. In almost all cases, I evaluated the models on recovering parameters from artificially generated data. I also found interesting properties in their encodings of natural images, extracellular neural recordings, and audio data.
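To make the discrete sparse coding idea concrete, here is a minimal sketch of the generative process in NumPy. The dimensions, the coefficient set `phi`, the sparsity prior `p`, and the noise level are all hypothetical choices for illustration; in the actual model they are parameters to be specified or learned. The key property is that each latent coefficient takes a value from a finite set, with 0 encoding absence of the dictionary element and the non-zero values acting as discrete scalings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D observed dims, H dictionary elements.
D, H = 16, 8
# Finite coefficient set: 0 encodes absence of an element; the non-zero
# values are a small set of discrete scalings (illustrative choice).
phi = np.array([0.0, 0.5, 1.0, 2.0])
# Sparsity-inducing prior over the coefficient set: most mass on 0.
p = np.array([0.85, 0.05, 0.05, 0.05])

W = rng.normal(size=(D, H))   # dictionary, one element per column
sigma = 0.1                   # observation noise std (illustrative)

def sample_datapoint(W, phi, p, sigma, rng):
    """Draw one datapoint from the discrete sparse coding model:
    each latent s_h is drawn from the finite set phi (mostly 0),
    and the observation is y = W s + Gaussian noise."""
    H = W.shape[1]
    s = rng.choice(phi, size=H, p=p)
    y = W @ s + sigma * rng.normal(size=W.shape[0])
    return y, s

y, s = sample_datapoint(W, phi, p, sigma, rng)
```

Inference then amounts to recovering the discrete latent vector `s` (and the dictionary `W`) from observations, which is what the parameter-recovery experiments on artificial data test.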
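The gating-unit idea can be sketched as a factored encoder in which the responses of two filter banks, one applied to each datapoint of a pair, interact multiplicatively. All names and sizes below are hypothetical; the point is that the multiplicative (gating) interaction makes the mapping units sensitive to the relation between the two inputs rather than to either input alone.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: input dim, number of factors, number of mapping units.
D, F, M = 10, 6, 4

Fx = rng.normal(scale=0.1, size=(F, D))   # filters applied to the first input
Fy = rng.normal(scale=0.1, size=(F, D))   # filters applied to the second input
Wm = rng.normal(scale=0.1, size=(M, F))   # weights into the mapping units

def encode_transformation(x, y):
    """Infer mapping-unit activities for a datapoint pair (x, y).
    The element-wise product of the two factor responses is the gating
    interaction: it encodes how x must be transformed to obtain y."""
    f = (Fx @ x) * (Fy @ y)                      # multiplicative interaction
    return 1.0 / (1.0 + np.exp(-(Wm @ f)))       # sigmoid mapping units

x = rng.normal(size=D)
y = rng.normal(size=D)
m = encode_transformation(x, y)   # transformation code for the pair
```

Training such a network on pairs related by a known transformation (e.g. shifted images) lets the mapping units specialize to the transformation while the filters capture content.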
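The bilinear dynamical system can likewise be sketched in a few lines. Here the transformation variable `u` selects a mixture of interaction matrices that is applied to the previous state; the sizes, the `tanh` non-linearity, and the parameter scales are illustrative assumptions, not necessarily those of the actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: N state dims, K transformation dims.
N, K = 4, 3
# One interaction matrix per component of the transformation code u.
W = rng.normal(scale=0.3, size=(K, N, N))

def step(x_prev, u, W):
    """One step of the bilinear dynamics:
    x_t = tanh( sum_k u_k * W_k @ x_{t-1} ),
    i.e. the new state is a bilinear function of the previous state and
    the transformation code, passed through an illustrative non-linearity."""
    A = np.tensordot(u, W, axes=1)   # effective (N, N) transition matrix
    return np.tanh(A @ x_prev)

x = rng.normal(size=N)
u = rng.normal(size=K)
x_next = step(x, u, W)
```

Because `u` enters the dynamics multiplicatively, fixing `u` yields a linear system, while inferring `u` over time recovers the sequence of transformations driving the data.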