Mastodon

Redis, from the Ground Up

A deep dive into the data structure server: origins, design decisions, feature set, applications, and more.

Redis Logo

Redis (“REmote DIctionary Service”) is an open-source (BSD-licensed) database server.

History

The Redis project was started in early 2009 by an Italian developer named Salvatore Sanfilippo. Redis was initially written to improve the performance of LLOOGG, a real-time web analytics product out of Salvatore’s startup.

By June of 2009, Redis was stable enough, and had enough of a base feature set, to serve production traffic at LLOOGG. On June 19, 2009, an important milestone was hit: Salvatore deployed Redis to LLOOGG’s production environment and retired the MySQL installation.

Over the next months, Redis rapidly grew in popularity. Salvatore fostered a great community, added features at a very rapid pace, and dealt with any and all reports of database corruption, instability, etc. with the utmost severity.

In March of 2010 VMWare hired Salvatore to work full-time on Redis. (Redis itself remains BSD licensed.) Shortly thereafter, VMWare hired Pieter Noordhuis, a key Redis contributor, to give the project an additional momentum boost.

The rest, as they say, is history.

Data Structure Server

The most apt description of Redis is that it is a “data structure server”. This is a very natural label for the database, because Redis natively supports many of the foundational data types of computer science, and provides a rich set of familiar primitives for manipulating these types.

The supported data types are:

  • Strings
  • Lists
  • Sets
  • Sorted Sets
  • Hashes

Key Advantages

Performance

Redis can perform >100k+ SETs per second, and >80k+ GETs per second.

The redis-benchmark utility included with the project can be used to reproduce these findings. (More on benchmarks here and here.)

Atomicity

Every operation that Redis exposes via the available command primitives is atomic. (And, as we’ll see later, there are ways to combine multiple primitive commands into larger atomic units.) This makes application-level code easier to build and reason about.

Atomic guarantees are a direct consequence of Redis’ single-threaded core; as a result, Redis’ internals are simpler (no complex locking and synchronization code complicating the codebase, introducing bugs, and burning up CPU cycles).

Foundational Data Types

The data types that Redis offers are foundational. All computer scientists are very familiar with these data types and have already used them to solve countless problems. Lists, sets, etc. are more fundamental to computer scientists than relational database tables, columns, and rows.

The documentation for all Redis commands includes complexity measurements, in big O notation. This makes it very straightforward for computer scientists and software engineers to visualize, reason about, optimize, and understand the performance of Redis queries. Is it this straightforward to understand query performance in a relational database?

Polyglot Persistence

This is not specifically an advantage inherent to Redis, but is worth noting. Using Redis does not require a commitment to use it exclusively. Use Redis to solve problems that can be naturally modelled using its primitives, and embrace polyglot persistence for everything else.

Client Protocol

The client protocol is very straightforward. This makes it simple to build client libraries.

Getting Started

It is very easy to get started. Redis has no external dependencies, and getting up and running only involves the following steps:

  1. Clone the git repository or unpack the tarball
  2. Run make
  3. Run ./src/redis-server

Then, from another terminal, start the interactive command line interface by running ./src/redis-cli.

Feel free to issue a few commands:

    redis> PING
    PONG
    redis> INFO
    redis_version:2.1.1
    ...

(Alternatively, to try Redis from your browser, visit try.redis-db.com.)

Key Disadvantages

Redis requires that the whole dataset be loaded into main memory at all times. (Redis Virtual Memory, which we’ll discuss later, relaxes this requirement, but still needs all keys to always be in memory.). Guaranteed in-memory access to most of the dataset is Redis’ main performance driver – and is also responsible for creating its main limitations.

RAM

RAM is the gold of cloud computing. (Cloud servers are primarily priced based on the amount of available RAM. By comparison, disk and CPU are cheap.)

The amount of RAM that Redis needs is proportional to the size of the dataset. Large datasets in Redis are going to be fast, but expensive.

Persistence

Redis persistence is highly configurable but the implementation makes extremely heavy use of I/O resources. Furthermore, most save operations require additional memory to complete successfully, and, in some cases, asynchronous saves can block the server for lengthy periods of time. (These points are discussed in more detail, below; see the Persistence section.)

Memory Bloat

Redis’ internal design typically trades off memory for speed. For some workloads, there can be an order of magnitude difference between the raw number of bytes handed off to Redis to store, and the amount of memory that Redis uses.

Diving In

String Keys

Regardless of the data type, the data is always identified by a key, and the key is always a string.

For example, using the string data type:

    redis> SET foo bar
    OK
    redis> GET foo
    "bar"
    redis> GET dne
    (nil)

Expiry

Keys can be marked for expiry. For example:

    redis> EXPIRE foo 2
    (integer) 1

After waiting for 2 seconds:

    redis> GET foo
    (nil)

Sidenote: memcached?

So far, this looks quite similar to memcached (GET/SET API, in-memory storage, etc.). However, there are a few important things to note:

  1. Redis supports replication out of the box. Any sort of topology is possible, so you can create replication trees.
  2. Redis supports persistence, so you don’t lose everything that’s in memory when the server restarts.
  3. Redis supports a rich set of data types (far beyond memcached’s simple key-value-pairs).

Each of these points will be addressed in more detail, below.

Replication

Redis’ replication capabilities are powerful yet straightforward.

A master can have any number of slaves, and each slave can have any number of their own slaves, and so on and so forth. Any topology is possible.

To point a slave to a specific master, issue the SLAVEOF command on the slave. Slaves will block until the initial synchronization with the master is complete.

This initial synchronization process consists of the master asynchronously snapshotting the current state of the database, then transferring the snapshot to the slave, and then subsequently streaming all commands received after initiating the snapshot.

Persistence

Redis has configurable persistence settings, enabling durability to be tweaked depending on the problem domain.

Options

If durability is not important:

Redis can be configured in “snapshotting mode”. In this mode, Redis saves a binary dump of the contents of the database every x seconds or every y operations. If one of these criteria are met, Redis forks the process. The child process writes the dump file to disk while the master continues to service requests.

This procedure can be memory-efficient due to the way that Copy-On-Write works when forking. (Here, a snapshot of the database is saved as it existed exactly at the time of forking; extra memory is required only to store the keys that change during the snapshot procedure. If every key changes in value over the course of the snapshot, then roughly 2x the amount of memory used by Redis before the save is required to complete the save operation. This is the upper bound on the memory usage required for saving.)

Of course, in this mode, any data that is not written in the snapshot is immediately lost if the server is killed.

If durability is important:

Redis can be configured to use an Append-Only File (AOF). Here, every command is written to a file. To recover from a crash or other server restart, the append-only file is replayed. There are three modes:

  • fsync() on every new command
  • fsync() every second
  • Let the OS decide when to fysnc()

Using the BGREWRITEAOF command, Redis will update the snapshot and re-write the Append-Only File to shorten it. Like snapshotting, this is done asynchronously, in the background.

More advanced configurations:

Persistence can be turned off completely. This is useful in a number of scenarios.

For example, if performance is very critical and your application demands extremely tight control over RAM usage, the following configuration is possible:

  • One master, persistence off, and
  • One slave, persistence off, and
  • Periodic synchronous saves, issues against the slave only

The advantage of this set-up is that it requires no extra memory to complete a save, regardless of the number and frequency of writes. In this way, you are trading off durability for extremely tight control over memory usage.

No extra memory is required to complete the save because the SAVE command performs a synchronous save operation, thereby blocking the server that the command is issued against until the saving process completes. (Asynchronous saves, as discussed above, require extra memory proportional to the number of writes performed during the save.)

Other variations on this theme are possible, for example AOF can be enabled on the slave only while persistence remains off on the master.

Binary Dumps and In-Memory Representation

The binary dumps (i.e. those produced by the snapshot operations) are stored in a very efficient manner on disk.

Once a binary dump is loaded, Redis will use several factors more memory than the on-disk representation requires. The exact factor increase depends primarily on the data types that are in use. For example, Sorted Sets use significantly more memory than Sets, even though both data structures require similar amounts of space when serialized to disk.

This is expected behaviour, given that Redis optimizes heavily both read and write performance.

Note that optimizations are continually being made to reduce the amount of memory required to represent each of the data types in memory.

Problems

Redis exhibits the following issues with persistence:

  • Most save operations require additional memory to complete successfully (as previously discussed). Depending on the size of the dataset, the frequency of writes, and the amount of RAM you are comfortable reserving, this may or may not be an issue.

  • Redis persistence requires extremely heavy I/O usage. This is discussed in detail here. Also see Salvatore’s response.

  • In some cases, asynchronous saves can block the server for lengthy periods of time. See this post on the mailing list for an interesting discussion.

Although the issues with Redis persistence are hard problems to solve, the issues are beginning to be discussed at length. We should continue to see improvements in this area.

Expiry

The EXPIRE command enables a timeout to be set for a specified key. Once the timeout is reached, the server can delete the key. We call a key volatile if it is set to expire.

Note that expiry semantics changed very significantly with Redis 2.1.3. It is important to be aware of the differences, especially given the wide install base of Redis 2.0.

Redis Versions Prior to 2.1.3

The initial design of Redis’ expiry mechanism is extremely concerned with consistency.

Here, if the value of a volatile key is modified in any way, then the key becomes non-volatile and will not expire unless explicitly EXPIRE‘d again.

Furthermore, the following restrictions apply:

  • If a key is volatile, the EXPIRE command cannot be re-issued to change the expiry time
  • A write operation against a volatile key will destroy the key first and then perform the write

For example:

    redis> SET foo bar
    OK
    redis> EXPIRE foo 100000
    (integer) 1
    redis> GET foo
    "bar"
    redis> APPEND foo 123
    (integer) 3
    redis> GET foo
    "123"

In order to enforce consistency requirements, this is expected behaviour. (To be strongly consistent, if n servers receive the same list of commands in the same sequence, they must always end with the same dataset in memory. However, with replication over unreliable network links, and without the restrictions detailed above, slowdowns in the network could cause the master and slaves to get out sync, thus violating the consistency requirement.)

Redis Versions 2.1.3 and Above

None of these constraints apply to the later versions of Redis (i.e., write operations no longer destroy the key before performing the write, and expiry times can be modified after initially being set). The implementation was changed to remove the constraints, without sacrificing the consistency requirements.

The new implementation simply injects a DEL (delete) operation into the replication stream and the AOF file every time that the server expires a key.

More on Strings

We previously demonstrated the GET, SET, and APPEND commands.

Increment

The INCR command makes it possible to use strings as an atomic counter.

For example:

    redis> SET foo 0
    OK
    redis> INCR foo
    (integer) 1
    redis> INCR foo
    (integer) 2
    redis> GET foo
    "2"
    redis> SET bar baz
    OK
    redis> INCR bar
    (error) ERR value is not an integer
SETNX and Locking

The SETNX (“set if not exists”) command works like SET, but performs no operation if the key already exists.

This command can be used to implement a locking system. For example, the Redis documentation describes a locking algorithm.

Lists

Lists are used to store an (ordered) collection of items. Stacks and queues can be very easily modelled with lists.

For example:

    redis> LPUSH newset a
    (integer) 1
    redis> LRANGE newset 0 -1
    1. "a"
    redis> LPUSH newset b
    (integer) 2
    redis> RPUSH newset c
    (integer) 3
    redis> LRANGE newset 0 -1
    1. "b"
    2. "a"
    3. "c"
    redis> LPUSH myset z
    (integer) 1
    redis> RPOPLPUSH newset myset
    "c"
    redis> LRANGE newset 0 -1
    1. "b"
    2. "a"
    redis> LRANGE myset 0 -1
    1. "c"
    2. "z"

Sets

Sets store an un-ordered, unique collection of items. In addition to supporting sets, Redis has native support for for union, intersection, and diff operations.

    redis> SADD set1 a
    (integer) 1
    redis> SADD set1 b
    (integer) 1
    redis> SADD set1 c
    (integer) 1
    redis> SADD set2 c
    (integer) 1
    redis> SADD set2 d
    (integer) 1
    redis> SADD set2 e
    (integer) 1
    redis> SINTER set1 set2
    1. "c"
    redis> SUNION set1 set2
    1. "c"
    2. "d"
    3. "a"
    4. "b"
    5. "e"

Sets are a critical tool for building highly-optimized indexes. For example, if your application allows users to like articles and follow users, then you should be manually maintaining indexes (i.e. sets) to pre-compute the answers to questions like “which users liked article x” and “which users follow user y”.

Sorted Sets

Sorted sets (also known as “zsets”) are similar to sets, but each member of the set has an associated floating point score. Members are stored in sorted order, so no sorting is required to retrieve the data ordered by the floating point score.

For example:

    redis> ZADD days 1 Monday
    (integer) 1
    redis> ZADD days 2 Tuesday
    (integer) 1
    redis> ZADD days 3 Wednesday
    (integer) 1
    redis> ZADD days 4 Thursday
    (integer) 1
    redis> ZADD days 5 Friday
    (integer) 1
    redis> ZADD days 6 Saturday
    (integer) 1
    redis> ZADD days 7 Sunday
    (integer) 1
    redis> ZRANGE days 0 -1
    1. "Monday"
    2. "Tuesday"
    3. "Wednesday"
    4. "Thursday"
    5. "Friday"
    6. "Saturday"
    7. "Sunday"

Sorted sets are essential whenever a range query is needed. For a clever application of sorted sets, see Auto Complete with Redis.

Furthermore, as with plain sets, sorted sets are an important tool to consider when constructing indexes.

Hashes

Redis hashes are conceptually equivalent to the Ruby hash, Python dictionary, Java hash table or hash map, etc.

For example:

    redis> HSET user:1 name bob
    (integer) 1
    redis> HSET user:1 email b@bob.com
    (integer) 1
    redis> HGET user:1 name
    "bob"
    redis> HGETALL user:1
    1. "name"
    2. "bob"
    3. "email"
    4. "b@bob.com"

Before this data type was added to Redis, users had two main options for storing hashes: use one key to store a JSON structure with multiple fields, or use one key for each field in the hash. Both of these approaches have many downsides versus using the native hash data type; the former introduces concurrency issues, and the latter requires significantly more memory.

Redis Transactions

Redis transactions can be used to make a series of commands atomic. (Every Redis command is already atomic; transactions, however, allow us to combine a number of commands into a single atomic unit.)

In this example, the SET and two INCR commands are all executed atomically within the context of a transaction:

    redis> MULTI
    OK
    redis> SET foo 0
    QUEUED
    redis> INCR foo
    QUEUED
    redis> INCR foo
    QUEUED
    redis> EXEC
    1. OK
    2. (integer) 1
    3. (integer) 2

Here, MULTI delineates the start of a transaction block; EXEC is responsible for actually running the transaction (i.e., executing all commands between the MULTI and the EXEC, with no other commands executed in-between).

There is an important caveat: because the commands are queued and then executed sequentially (all at once, as if they were a single unit), it is not possible to read a value while a transaction is being executed and to then subsequently use this value at any point during the course of the same transaction.

To address this restriction, Redis supports “Check and Set” (CAS) transactions. CAS is available via the WATCH command.

The following example (in pseudo-code) is an example of a good implementation of a function that only increments a key if the current value of the key is even.

    WATCH mykey
    val = GET mykey
    MULTI
    if val % 2 == 0:
        SET mykey (val+1)
    EXEC

If mykey changes after the WATCH command is executed but before the EXEC, then the entire transaction is aborted. The client can retry indefinitely on transaction failure. Any implementation that does not not use CAS is subject to race conditions, unless some other locking mechanism is employed (for example, the locking system described during the treatment of SETNX).

A more involved example, taken from an earlier post I made to the Redis mailing list:

I'm looking to build a FIFO queue, with one important/ non-standard property: if an item that already exists in the queue is added again, then the insert should effectively be ignored. (This data structure can be useful for managing certain types of jobs, like warming caches, etc.)

An efficient way to model this structure in Redis would be to use a ZSET, where the score of each item in the set is the unix timestamp of when the item was added.

However, the semantics of ZADD are that if an item already exists in the set, that the score is updated to the score specified in the command. If there were some way to tell ZADD to not update the score if an element already exists, or perhaps to supply a certain score if the element is new and a different score if the element is not new, then this queue-like data structure could be easily (and very efficiently) modelled in Redis.

There are potentially a few other ways to do this, with the obvious approach being to check if an item already exists in the ZSET (using a set intersection) before issuing the ZADD command. However, this isn't bullet-proof, because it doesn't look like we will be able to do this within the context of a MULTI/EXEC.

Does anyone have any other ideas?

Coincidentally, this was posted on the exact same day that the WATCH command made it into the Redis master branch.

A solution (in pseudocode, with thanks to Salvatore):

    FUNCTION ZADDNX(key,score,element):
        WATCH key
        MULTI
        IF (ZSCORE key element) == NIL
            ZADD key score element
            retval = EXEC
            IF retval == NIL:
                ZADDNX(key,score,element)
        ELSE
            DISCARD

(In this example: if the element already exists in the ZSET, then do nothing.)

Redis Virtual Memory

The goal of Redis Virtual Memory (VM) is to swap infrequently-accessed data from RAM to disk, without drastically changing the performance characteristics of the database. This enables a single instance of Redis to support datasets that are larger than main memory.

Virtual Memory is a very important feature of most modern operating systems. However, for efficiency reasons, Redis does not use the OS-supplied VM facilities and instead implements its own system. The rationale is as follows:

  1. A single page, as managed by the OS, is 4 kB.
  2. The value of a single Redis key may touch many different pages, even if the key is small enough to fit in a single page.
  3. For reasons previously discussed, Redis objects can be an order of magnitude larger in RAM than they are when stored on disk. Therefore, if using the OS’ Virtual Memory facilities, the OS would need to perform an order of magnitude more I/O versus a custom Redis Virtual Memory implementation.
  4. Hence, by building Virtual Memory into the database server, overall efficiency can be significantly improved.
Limitations

There are a few main limitations of Redis Virtual Memory:

  • All keys must be stored in memory at all times. Values can be swapped to disk, but keys cannot.
  • Values must be swapped in their entirety, even for complex types. For example, if a list has one thousand items, all one thousand items must be resident in main memory before any list-related operation can be performed, including accessing the head of the list or appending a single item to the list’s tail.
Implementation Details

When Virtual Memory is enabled, Redis stores the last time that each object was accessed. Additionally, Redis maintains a swap file that is divided into pages of configurable size, with the page allocation table stored in memory. Each page uses 1 bit of actual RAM.

When Redis is out of memory and there is something to swap, a few random objects from the dataset are sampled. The object with the higher “swappability factor” is the object that will be swapped to disk.

    Swappability = Object.age * Logarithm(Object.used_memory)

Redis maintains a pool of I/O threads that are solely responsible for loading values from disk into RAM.

When a request arrives, the command is read and the list of keys is examined. If any of the keys have been swapped to disk, the client is temporarily suspended while an I/O job is enqueued. Finally, once all keys that are needed by a given client are loaded, then the client resumes execution of the command.

From a configuration perspective, the vm-max-memory setting can be used to set the maximum amount of memory that Redis can use before it swaps to disk.

For more detail, see Redis Virtual Memory: the Story and the Code.

Publish/Subscribe

Redis has native support for publish/subscribe.

In addition to supporting exact matches on channel names, it is also possible to subscribe against a pattern. In this way, subscribers do not need to know the exact name of all channels a priori, thereby increasing the flexibility of this messaging mechanism.

Although pub/sub may seem like an odd fit, Redis’ internals are very well suited for this feature. Furthermore, pub/sub brings with it numerous advantages. In particular, this feature is highly convenient in the context of the use cases of a large class of modern web applications, and, with some creativity, can be used as a substitute for not having native scripting support within Redis.

Example

Imagine the scenario where a news-related site needs to update the cached copy of its home page every time that a new article is published.

The background cache worker process subscribes to all channels that begin with ‘new.article.’:

    redis> PSUBSCRIBE new.article.*

The article publishing process creates a new technology article (in this example, this article has ID ‘1021’), adds the article’s ID to the set of all technology articles, and publishes the article’s ID to the ‘new.article.technology’ channel:

    redis> MULTI
    OK
    redis> SET article.technology.1021 "In today's technology news, ..."
    QUEUED
    redis> SADD article.technology 1021
    QUEUED
    redis> PUBLISH new.article.technology 1021
    QUEUED
    redis> EXEC
    1. OK
    2. (integer) 1
    3. (integer) 1

At this point, the background cache worker process will receive a message and know immediately that a new technology article was published, subsequently executing the appropriate callback to re-generate the home page.

Usage Examples

Redis is extremely flexible and highly usable in a number of different scenarios.

I see Redis definitely more as a flexible tool than as a solution specialized to solve a specific problem: his mixed soul of cache, store, and messaging server shows this very well.

Salvatore Sanfilippo

A small sampling of potential applications:

Caching

Caching (particularly for web applications) is likely Redis’ most common use case. For details on configuring Redis as an LRU cache, see here.

Interestingly, despite memcached’s dominance in this area, plain key-value stores (i.e. those without support for data types like lists and sets) are at a disadvantage when acting as a web application cache.

For example, the resources returned from requests to web apps are typically composed of lists (lists of posts, lists of comments, lists of friends, etc.). With plain key-value stores, these lists will almost always be stored in single units (“blobs”). This makes very common list-related operations, such as adding an element to a list, getting the first ten items in a list, deleting the last item in a list, etc. very inefficient because the list is stored as a single unit and needs to frequently be serialized and deserialized within the application server. Furthermore, atomic updates of these lists are impossible without implementing some other mutual exclusion system. (Redis, with native support for lists, can perform these operations efficiently and atomically.)

This flexibility enables other cache-related advantages. For example:

One potential use for Redis is as a smarter replacement for memcached. A common challenge with caching systems is de-caching things based on dependencies - if a blog entry tagged with "redis" and "python" has its title updated, the cache for both the entry page and the "redis" and "python" tag pages needs to be cleared. Redis sets could be used to keep track of dependencies and hence take a much more finely grained approach to cache invalidation.

Simon Willison, Redis Tutorial
Nginx + Redis

This is a more specific type of (web application) caching than described above. Here, responses for certain types of dynamic requests are delivered directly to the requestor via the cache, bypassing the application server entirely. (See here for a more detailed treatment of the subject.)

With the HttpRedis module, the Nginx web server can serve certain requests directly from Redis.

Interprocess Communication

Redis provides a very effective set of primitives for multiple processes on a single machine (or multiple machines connected via a network) to share state and communicate via message passing.

Views

Redis can be used to compute “views” for tables in relational (or other NoSQL) databases that are difficult to query effectively, due to factors such as schema design, index design, data volume, write volume, etc.

For example, given a relational table that is used in an append-only fashion, a daemon could periodically pull down rows that it has not yet processed and “explode” the data into Redis, building out a number of lists, sets, sorted sets, counters, etc. (This is, effectively, hand-rolled index generation.) A reporting script can then perform operations against these data structures to compute all of the desired metrics.

Job Management

Resque (and alternate implementations, like Pyres) leverage Redis’ capabilities very extensively.

A number of other job systems/ task queues (e.g. Celery and Octobot) also support Redis.

Locking

Redis can be used to implement a lock service. As described earlier, SETNX is a key element of this locking algorithm.

Designing with Redis

There is no query optimizer. Redis provides extremely fast primitives, but overall query performance is highly dependent on how the user chooses to arrange the data.

The most important things to remember are:

  • The layout of the data should be designed based on how it will be queried.
  • It is the user’s responsibility to manually build indexes.

As a direct consequence, data will almost always be duplicated in several places.

For example, imagine the scenario of using Redis to store a book database. An efficient data layout will include storing the details of each book (title, author, publisher, ISBN, genre, etc.) in a Redis hash.

In order to query the database to answer questions like “what other books did this book’s author write?”, the data layout should also include a number of manually-designed indexes. In this case, sets like the following should be built, each of which contain the ID number of all applicable books:

  • all authors
  • all books by author
  • all publishers
  • all books by publisher
  • all genres
  • all books by genre
  • etc.

In this example, we have duplicated the ID number of each book across multiple disparate data structures. (More generally, we have de-normalized our data to optimize the speed of each query.)

Redis cannot automatically remove all instances of a book from all indexes when the book is deleted. The application developer should keep track of all sets that a book is in (using an additional set) so that clean-up can be performed efficiently.

This type of data duplication is extremely common with non-relational data sets. For most systems, this necessitates running background workers that are responsible for constantly scanning the data set and repairing any inconsistencies that are detected.

Other Resources

Some other fantastic Redis-related resources include:

You should follow me on Twitter here.