Event-based cameras encode changes in a visual scene with high temporal precision and low power consumption, generating millions of events per second in the process. Current event-based processing algorithms scale poorly in runtime and computational resources when applied to large amounts of data, a problem that is further exacerbated by the development of high-spatial-resolution vision sensors. We introduce a fast and computationally efficient clustering algorithm designed specifically for large event-based datasets. The approach is based on the Expectation-Maximization (EM) algorithm and relies on a stochastic approximation of the E-step over a truncated space to reduce the computational burden and speed up learning. We evaluate the quality, complexity, and stability of the clustering algorithm on a variety of large event-based datasets, and then validate our approach on a classification task. The proposed algorithm is significantly faster than standard k-means and reduces computational demands by two to three orders of magnitude while being more stable, more interpretable, and close to the state of the art in classification accuracy.
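To make the idea of a stochastic E-step over a truncated space concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes an isotropic Gaussian mixture with equal mixing weights and a single shared variance, and it assumes one particular truncation scheme: each data point keeps a small candidate set of clusters, a few randomly drawn clusters are probed at every iteration, and only the closest candidates are retained. The function name `truncated_em` and the parameters `n_keep` and `n_probe` are hypothetical and introduced only for illustration.

```python
import numpy as np


def truncated_em(X, n_clusters, n_keep=3, n_probe=2, n_iter=30, seed=0):
    """Illustrative truncated, stochastic EM (assumed isotropic Gaussian mixture)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialise means on distinct data points; one shared variance (assumption).
    means = X[rng.choice(N, n_clusters, replace=False)].astype(float)
    var = float(X.var()) + 1e-8
    # Each point starts with a random candidate set of n_keep cluster indices.
    cand = rng.integers(0, n_clusters, size=(N, n_keep))

    for _ in range(n_iter):
        # Stochastic E-step: probe a few random clusters per point.
        probe = rng.integers(0, n_clusters, size=(N, n_probe))
        pool = np.sort(np.concatenate([cand, probe], axis=1), axis=1)
        # Distances are evaluated only for the pooled candidates,
        # i.e. N * (n_keep + n_probe) pairs instead of N * n_clusters.
        d2 = ((X[:, None, :] - means[pool]) ** 2).sum(axis=-1)
        # Repeated cluster indices in a row get infinite distance so that
        # each cluster contributes at most once per point.
        d2[:, 1:][pool[:, 1:] == pool[:, :-1]] = np.inf
        # Truncation: keep the n_keep closest candidates for each point.
        order = np.argsort(d2, axis=1)[:, :n_keep]
        cand = np.take_along_axis(pool, order, axis=1)
        d2k = np.take_along_axis(d2, order, axis=1)

        # Responsibilities normalised over the truncated set only.
        logits = -0.5 * d2k / var
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted means from the truncated posteriors.
        num = np.zeros((n_clusters, D))
        den = np.zeros(n_clusters)
        np.add.at(den, cand, resp)
        np.add.at(num, cand, resp[:, :, None] * X[:, None, :])
        live = den > 1e-12  # clusters that received no mass keep their old mean
        means[live] = num[live] / den[live, None]

        # Shared variance from distances to the updated means.
        d2_new = ((X[:, None, :] - means[cand]) ** 2).sum(axis=-1)
        var = max(float((resp * d2_new).sum()) / (N * D), 1e-8)

    # Hard label: the closest candidate found in the last E-step.
    return means, cand[:, 0]
```

Calling `truncated_em(X, n_clusters=C)` on an `(N, D)` array returns a `(C, D)` array of cluster means and one hard label per point. In this sketch the distance computation in each iteration costs on the order of `N * (n_keep + n_probe) * D` operations rather than `N * C * D`, which is where the saving over a full E-step or a standard k-means assignment step would come from when `C` is large.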