We created the first-ever qualitative, shape-based, functional anomaly detection algorithm for time series: the STAR (Shocklet Transform and Ranking) algorithm. Using this methodology, you can analyze high-dimensional time series data, find the periods in which individual series displayed anomalous behavior, and distill coherent, human-understandable narratives from them, all without any off-line training. We have successfully used this algorithm to:

  • generate social timelines from terabytes of social media data;
  • classify periods of financial market instability; and
  • generate document-free topic networks from natural language data.

Our algorithm compares favorably with Twitter’s anomaly detection algorithm at uncovering spiky dynamics over short time intervals, but unlike that algorithm and other anomaly detection methods, STAR can also pull out long-term, shock- and cusp-like dynamics. These dynamics often correspond to important social and economic movements, such as the Occupy Wall Street protests of 2011…

Words surrounding the Occupy Wall Street protests

…or the Israel-Gaza conflict of 2014…

Words surrounding the Israel-Gaza conflict

Each panel shows the time series of one word’s popularity on Twitter. Red highlights mark anomalies flagged by Twitter’s anomaly detection algorithm; blue highlights mark anomalies flagged by STAR. STAR finds functionally meaningful anomalies across a wide range of timescales, from short spikes to long-term shifts.

Please contact star@sociotechnicalsignals.com for more information.

Technical overview

Since pictures are worth \(\geq 1000 \) words…

The STAR algorithm

  • We first search for pieces of the time series that resemble small kernel archetypes, such as rapid ramp-ups followed by equally rapid cool-downs (what we term “shocks” and “cusps”), or discontinuous jumps to new levels (in which case STAR simply acts as a changepoint detection algorithm).
  • We then aggregate the information collected in the previous step into new time series and threshold these series to extract windows of anomalous behavior.
  • Weighting each anomaly window by the change in the time series over the duration of that window (or by some other custom weighting function), we sort the set of time series at each point in time by their window weights. This defines a univariate Markov chain (panel E above) that gives a straightforward narrative about which time series is most anomalous at each time step.
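The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the STAR implementation from the paper: the symmetric power-law cusp kernel, the kernel widths, the threshold fraction `beta`, the use of the range (max minus min) of the series as "the change over the window", the one-step padding in that weighting, and every function name (`cusp_kernel`, `indicator`, `anomaly_windows`, `star_like_ranking`) are hypothetical choices made for this example.

```python
import math


def cusp_kernel(width, theta=0.5):
    """Hypothetical symmetric cusp kernel (an assumption, not the paper's exact
    kernel): peaks at the centre and decays as a power law toward the edges.
    Shifted to zero mean and scaled to unit norm so that flat stretches of a
    series produce no response. Use odd widths >= 3."""
    centre = (width - 1) / 2.0
    k = [(abs(i - centre) + 1.0) ** (-theta) for i in range(width)]
    mean = sum(k) / width
    k = [v - mean for v in k]
    norm = math.sqrt(sum(v * v for v in k))
    return [v / norm for v in k]


def indicator(series, widths=(5, 9, 15), theta=0.5):
    """Step 1 and aggregation: slide cusp kernels of several widths along the
    series and sum the positive responses at each centre point. Points too
    close to the edges for a given width receive nothing from that width."""
    n = len(series)
    total = [0.0] * n
    for w in widths:
        k = cusp_kernel(w, theta)
        half = w // 2
        for t in range(half, n - w + half + 1):
            window = series[t - half : t - half + w]
            response = sum(a * b for a, b in zip(window, k))
            total[t] += max(response, 0.0)
    return total


def anomaly_windows(ind, beta=0.5):
    """Step 2: return inclusive (start, end) runs where the aggregated
    indicator is at least beta times its maximum."""
    peak = max(ind)
    if peak <= 0:
        return []
    windows, start = [], None
    for t, v in enumerate(ind):
        if v >= beta * peak and start is None:
            start = t
        elif v < beta * peak and start is not None:
            windows.append((start, t - 1))
            start = None
    if start is not None:
        windows.append((start, len(ind) - 1))
    return windows


def star_like_ranking(named_series, beta=0.5):
    """Step 3: weight each window by the range (max - min) of the series over
    it, padded by one step on each side so a one-sample spike still has a
    nonzero range, then order the series active at each time step by weight.
    All series are assumed to have the same length."""
    n = len(next(iter(named_series.values())))
    ranked = [[] for _ in range(n)]
    for name, x in named_series.items():
        for start, end in anomaly_windows(indicator(x), beta):
            seg = x[max(start - 1, 0) : end + 2]
            weight = max(seg) - min(seg)
            for t in range(start, end + 1):
                ranked[t].append((weight, name))
    # Heaviest window first; ties broken alphabetically for determinism.
    return [[name for _, name in sorted(r, key=lambda p: (-p[0], p[1]))] for r in ranked]
```

Taking `r[0] if r else None` for each entry of the returned list yields the single most anomalous series at each time step, which is the rank-1 narrative described above. Swapping in a different kernel or weighting function only requires changing `cusp_kernel` or the `weight` line.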

This process can be generalized so that we provide a summary of the top \(r\) most anomalous time series at each time step. For the full technical details, see the paper.
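Generalizing the ranking to the top \(r\) series amounts to truncating the sorted list at each time step. The sketch below assumes the anomaly windows and their weights have already been computed and are supplied in a hypothetical `(name, start, end, weight)` format with inclusive bounds; `top_r_per_step` is an illustrative helper, not part of any published STAR code.

```python
def top_r_per_step(windows, n_steps, r=3):
    """Hypothetical helper: for each time step, return the names of at most r
    series whose anomaly windows cover that step, heaviest window first.
    `windows` holds (name, start, end, weight) tuples with inclusive bounds."""
    active = [[] for _ in range(n_steps)]
    for name, start, end, weight in windows:
        # Clip each window to the observed horizon before registering it.
        for t in range(max(start, 0), min(end, n_steps - 1) + 1):
            active[t].append((weight, name))
    # Sort by descending weight (ties alphabetical), then keep the top r.
    return [
        [name for _, name in sorted(a, key=lambda p: (-p[0], p[1]))[:r]]
        for a in active
    ]
```

For example, with windows `[("a", 2, 6, 8.0), ("b", 4, 9, 5.0), ("c", 3, 5, 6.5)]` and `r=2`, step 4 yields `["a", "c"]` because `b` has the lightest of the three overlapping windows. Setting `r=1` recovers the single most anomalous series at each step.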

Social media

The mathematical machinery underlying this algorithm, the so-called “shocklet transform”, got some love on Twitter.