Storing time series statistics in Redis
What follows describes the creation of a flexible strategy for storing time series statistical data in Redis.
This was developed as part of a recent rethink of how the popular API proxy ApiAxle handles its statistical data, so it is quite specific to that domain. It is certainly not the only way to tackle this problem, and may not be the best; it is simply the one that worked for this developer on this project. The requirements were:
- Super fast inserts.
- Query by arbitrary time range.
- Support for both near real-time (per second) and historical data.
- Reasonable DB space usage.
After looking at various solutions, including Sorted Sets, we decided we could get the best performance/space trade-off by storing each API hit in a range of hashmaps representing different granularities of time (e.g. seconds, minutes, …). Each hashmap holds a suitable number of values to provide useful data at that granularity.
granularities =
  seconds:          # Kept for 1 hour
    size:   3600
    ttl:    7200
    factor: 1
  minutes:          # Available for 24 hours
    size:   1440    # Minutes in 24 hours
    ttl:    172800  # Seconds in 48 hours
    factor: 60      # Number of seconds that make up this granularity
This structure is easily extensible and customisable to suit a project's needs.
Each key is then assigned a TTL (using Redis EXPIREAT) of twice the duration of storage; this is doubled to accommodate rollover between one day/hour/minute and the next.
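To make the "doubled" relationship concrete, here is a small Python check (a sketch; the dict literals mirror the granularity configuration above): the ttl of each granularity is exactly twice the span of seconds that one hash key covers.

```python
# ttl for each granularity is double the span one hash key covers
# (size * factor seconds), so a key survives one full rollover period.
granularities = {
    "seconds": {"size": 3600, "ttl": 7200, "factor": 1},
    "minutes": {"size": 1440, "ttl": 172800, "factor": 60},
}

for name, gran in granularities.items():
    span = gran["size"] * gran["factor"]  # seconds one key covers
    assert gran["ttl"] == 2 * span, name
```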
Each key name then includes the timestamp at which the hit occurred, rounded down to the nearest multiple of the total span the key covers (size × factor seconds).
# Round a timestamp (in seconds) for a given granularity
getRoundedTimestamp: ( timestamp, granularity ) ->
  factor = granularity.size * granularity.factor
  return Math.floor( timestamp / factor ) * factor
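The same rounding in Python, for readers who want to experiment outside CoffeeScript (a sketch; `GRANULARITIES` and the function name are illustrative, mirroring the configuration above):

```python
# Granularity configuration mirroring the example above
GRANULARITIES = {
    "seconds": {"size": 3600, "ttl": 7200, "factor": 1},
    "minutes": {"size": 1440, "ttl": 172800, "factor": 60},
}

def get_rounded_timestamp(timestamp, granularity):
    """Round a Unix timestamp down to the span one key of this granularity covers."""
    span = granularity["size"] * granularity["factor"]
    return (timestamp // span) * span
```

For a hit at 1364833411 this yields 1364832000 at the seconds granularity (3600-second span) and 1364774400 at the minutes granularity (86400-second span).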
A hit occurring at 1364833411 would create (or update) the following keys:

  <API_ID>:stats:seconds:1364832000
  <API_ID>:stats:minutes:1364774400
Each key then contains a mapping of timestamp → hits. Here the field timestamp is rounded down to the nearest multiple of the number of seconds per unit of the granularity (the factor, e.g. 60 for minutes). The values are updated using the atomic HINCRBY operation.
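A minimal Python sketch of the write path, using a plain dict in place of Redis so it runs standalone (against a real server, redis-py's `hincrby` and `expireat` would take the same key/field arguments; the helper names here are illustrative, not ApiAxle's):

```python
GRANULARITIES = {
    "seconds": {"size": 3600, "ttl": 7200, "factor": 1},
    "minutes": {"size": 1440, "ttl": 172800, "factor": 60},
}

store = {}  # stands in for Redis: key -> {field: count}

def get_rounded_timestamp(timestamp, granularity):
    span = granularity["size"] * granularity["factor"]
    return (timestamp // span) * span

def record_hit(api_id, timestamp):
    for name, gran in GRANULARITIES.items():
        # Key name: timestamp rounded to the full span the hash covers
        key_ts = get_rounded_timestamp(timestamp, gran)
        key = f"{api_id}:stats:{name}:{key_ts}"
        # Hash field: timestamp rounded to one unit of this granularity
        field = (timestamp // gran["factor"]) * gran["factor"]
        store.setdefault(key, {})
        store[key][field] = store[key].get(field, 0) + 1  # HINCRBY key field 1
        # With real Redis: r.expireat(key, key_ts + gran["ttl"])

record_hit("<API_ID>", 1364833411)
```

Note the two distinct roundings: the key name uses the full span (size × factor), while the hash field uses a single unit (factor).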
The statistics data is made available via the ApiAxle API, with the consumer specifying the required granularity and time range as query parameters. The from and to timestamps are rounded to the required granularity, reusing the rounding logic from the write path, and a simple while loop iterates over the range, incrementing by the number of seconds per unit (the factor).
i = from
while i <= to
  rounded_ts = getRoundedTimestamp( i, granularities["seconds"] )
  redis_key  = "<API_ID>:stats:seconds:#{rounded_ts}"
  results[i] = hget( redis_key, i )
  i += granularities["seconds"].factor
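The read path sketched in Python, again with a plain dict standing in for Redis (function and variable names are illustrative; missing points simply come back as 0, as an absent hash field would):

```python
GRANULARITIES = {
    "seconds": {"size": 3600, "ttl": 7200, "factor": 1},
    "minutes": {"size": 1440, "ttl": 172800, "factor": 60},
}

def get_rounded_timestamp(timestamp, granularity):
    span = granularity["size"] * granularity["factor"]
    return (timestamp // span) * span

def get_stats(store, api_id, gran_name, ts_from, ts_to):
    gran = GRANULARITIES[gran_name]
    factor = gran["factor"]
    # Align both endpoints to whole units of this granularity
    ts_from = (ts_from // factor) * factor
    ts_to = (ts_to // factor) * factor
    results = {}
    i = ts_from
    while i <= ts_to:
        key_ts = get_rounded_timestamp(i, gran)
        key = f"{api_id}:stats:{gran_name}:{key_ts}"
        results[i] = store.get(key, {}).get(i, 0)  # HGET key i
        i += factor
    return results

# A hash as the write path would have left it: 3 hits in one second
sample = {"<API_ID>:stats:seconds:1364832000": {1364833411: 3}}
stats = get_stats(sample, "<API_ID>", "seconds", 1364833410, 1364833412)
```

Recomputing the key inside the loop is what lets a query range span more than one hash (e.g. across an hour boundary at the seconds granularity).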
The full source code for this implementation is available in ApiAxle’s GitHub repo. This version is specific to ApiAxle, but if anyone is particularly interested, get in touch with me; I’d be happy to help create a more generic JS/CoffeeScript library for this.
© Copyright 2013