**What is Resampled Importance Sampling (or RIS)?**
        Chris Wyman
        June 18, 2022

Resampled importance sampling, or RIS, is a two-step sampling procedure introduced to computer graphics by [Justin Talbot](http://justintalbot.com/) in his 2005 [Eurographics Symposium on Rendering](https://egsr.eu/) paper ["Importance Resampling for Global Illumination"](https://diglib.eg.org/handle/10.2312/EGWR.EGSR05.139-146), with more extended discussion and analysis in [his master's thesis](https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=1662&context=etd) from Brigham Young University.

Personally, I *really* hate the name, especially since it is closely related to the statistics algorithm "sampling importance resampling" (or SIR). In both cases, I think the names are something only a statistician could love. For clarity (and my sanity), I will typically refer to them as merely "importance resampling," "resampling," or sometimes RIS.

The basic idea: if you have a very large pool of low-quality samples, you can intelligently select a subset of that pool to get a smaller set of much higher-quality samples. Algorithmically, this means:

1. First, use a cheap, or naive, algorithm to generate a large number of samples $S_i$.
2. Second, pick a subset of $S_i$ (possibly using different weights) to create a new, smaller set of samples $R_j$.
3. Use samples $R_j$ for your rendering.

This is called "resampling" because you pick your final samples $R_j$ by re-evaluating weights for your earlier samples $S_i$ and picking a subset of them. (I.e., every $R_j$ is also a sample $S_i$.)

While not particularly important for the rest of this document, for completeness' sake, here is how the theory for using these samples (in step 3) changes. In traditional [Monte Carlo integration with importance sampling](https://en.wikipedia.org/wiki/Monte_Carlo_integration#Importance_sampling_algorithm), you can approximate an integral by averaging over a bunch of samples $S_i$, each drawn with some probability $P_S(S_i)$:

$$\int f(x)dx \approx \frac{1}{N}\sum_{i=1}^N\frac{f(S_i)}{P_S(S_i)}$$

Instead, with importance resampling, you estimate the integral as follows:

$$\int f(x)dx \approx \frac{1}{N}\sum_{j=1}^N\left[\frac{f(R_j)}{P_R(R_j)}\cdot\frac{1}{M}\sum_{i=1}^M\frac{P_R(S_i)}{P_S(S_i)}\right]$$

Here $P_R$ is a (potentially) different sampling distribution (than $P_S$) that you use in step 2, above, as part of reweighting your samples. It is the _target_ distribution you would _like_ the samples $R_j$ to follow, if only you had a cheap way to sample it. Importantly, resampling does _not_ guarantee that samples $R_j$ are distributed according to $P_R$; they only approximate $P_R$'s distribution.

Why Is This Interesting?
==============================================================

The key reason this idea is useful is **it gives you two places** to control the distribution of your final samples $R_j$:

1. Changing the sampling algorithm used to pick $S_i$ affects which samples you get.
2. Changing how you subsample (or resample) $R_j$ from $S_i$ **also** affects which samples you get.

Importantly, when picking samples $S_i$ we often need to use analytic sample distributions, e.g., uniform samples, cosine-distributed samples, Phong samples, etc. Essentially, we need a clear algorithm for turning a uniform random number $\xi_i\in[0..1)$ into a sample $S_i$, and we often do this by inverting the cumulative distribution function corresponding to some density $P_S(x)$ and computing $S_i = \mathrm{CDF}_S^{-1}(\xi_i)$.
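As a tiny illustration of that inversion step (the specific density here is an arbitrary assumption I picked for this sketch, not anything required by RIS), suppose $P_S(x) = 2x$ on $[0,1)$; its CDF is $x^2$, so mapping $\xi_i$ to $S_i$ is just a square root:

```cpp
#include <cmath>

// Purely illustrative: sample an (assumed) density P_S(x) = 2x on [0,1).
// Its CDF is x^2, so the inverse-CDF mapping is S_i = sqrt(xi).
float samplePS(float xi) { return std::sqrt(xi); }   // xi uniform in [0,1)
float pdfPS(float x)     { return 2.0f * x; }        // P_S(x), used later in RIS weights
```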
But this is *not* a requirement when resampling $R_j$ from our $S_i$'s. When resampling, we can numerically approximate the distribution $P_R(x)$. This means we can possibly select $R_j$ according to *much more complex* distributions $P_R(x)$ for which we do not have an analytic form. (One fascinating application is that RIS essentially allows us to *approximate* [perfect importance sampling](https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/Importance_Sampling). This is somewhat similar to the idea of "almost exact inversion" described by Luc Devroye on [p. 133](http://www.nrbook.com/devroye/Devroye_files/chapter_four.pdf#page=16) of his book ["Non-Uniform Random Variate Generation"](http://www.nrbook.com/devroye/).)

This is potentially very powerful, though basic RIS has some practical considerations that limit how good sample quality can get. But the key insight is that applying RIS *improves sample quality* at the cost of thinning your available sample pool. Sometimes this is a very good trade.

Now you might read the above and think, "this must be a very complex algorithm!" Not so. In fact, you probably do resampling *all the time*, already.

A Simple Example
==============================================================

![ ](risImgs/raffleTicket.png width=80)

So let's consider a simple example. Imagine you are going to a big sports game, and the home team is giving away some big prize (a car, vacation, etc.) by giving every attendee a raffle ticket and having a drawing during a break in the game. Hooray! You get to see your favorite team and you also have a (small) chance of a big surprise. Now, if you're attending a major sports event, there might be 100,000 attendees. So your chances of winning are small, at $1/100000$.

![ ](risImgs/ticketMixer.jpg width=150)

But imagine the live drawing at half-time uses some relatively small ticket mixer, e.g., on the right. This device might not actually *hold* 100,000 tickets. Can they even have a fair drawing? Sure.

![](risImgs/stadiumMap.png width=150 style="float:left; padding: 15px;") Consider a typical stadium. It might be divided into 100 seating sections, each with 1000 seats. Since the ticket mixer doesn't hold 100,000 tickets, perhaps they can hold 100 separate drawings: one for each section. If they select 1 winning ticket for each section, and then do a final drawing where they select from the 100 winning tickets, the probability of winning is $[1/1000]\cdot[1/100] = 1/100000$.

This is a very simple two-step resampling process using RIS: first select 100 raffle tickets from the entire pool of tickets, then resample from those 100 tickets to select the one winner. Note that the first step, in this example, is a [stratified sampling](https://en.wikipedia.org/wiki/Stratified_sampling) step, since only one ticket is selected per section of the stadium. You could do the same resampling but with a non-stratified first step: first select 100 tickets from the entire set of 100,000, then pick one of those 100 tickets as the final winner. (Final probability: $[100/100000]\cdot[1/100] = 1/100000$.) Of course, this non-stratified approach doesn't solve the issue of the mixer being too small.
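If you'd like to convince yourself the stratified two-stage drawing really is fair, here is a small, purely illustrative sanity check; the seat/section numbering and the single tracked attendee are assumptions of this sketch, not part of the example above:

```cpp
#include <cstdio>
#include <random>

// Sanity check of the stratified two-stage drawing (100 sections x 1000 seats).
// We track one arbitrary attendee (section 0, seat 0) and estimate how often
// they win.  Expected answer: (1/1000) * (1/100) = 1/100000.
int main()
{
    std::mt19937 rng(1234);
    std::uniform_int_distribution<int> pickSeat(0, 999);     // stage 1: draw within a section
    std::uniform_int_distribution<int> pickSection(0, 99);   // stage 2: draw among section winners

    const long trials = 20000000;
    long wins = 0;
    for (long t = 0; t < trials; ++t) {
        // Stage 1: every section draws one winning seat; only section 0's draw
        // matters for our attendee, so it is the only one simulated here.
        bool wonSection = (pickSeat(rng) == 0);
        // Stage 2: one of the 100 section winners is pulled from the mixer.
        bool wonFinal = (pickSection(rng) == 0);
        if (wonSection && wonFinal) ++wins;
    }
    printf("Estimated win probability: %g (expected 1e-05)\n", double(wins) / trials);
}
```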
Adding a bit of complexity
--------------------------------------------------------------

Now what if the prize sponsor or sports team *does not* want the drawing to be fair? Perhaps they would like to give season ticket holders, student attendees, home-team fans, or some other type of attendee a higher chance of winning?

It turns out that with this two-stage resampling process, this is very simple to do. Fans at the game probably won't even notice, since the final drawing will still appear to be selecting one winning ticket uniformly out of the mixer.

If you want the chance of a winner from the student section of the stadium to be $10\times$ larger than other attendees, you could select 10 tickets from the student section in the first stage while continuing to select 1 ticket from every other section. The final drawing is identical, except there are now 109 tickets in the mix (instead of 100). This means the probabilities of winning change:

- Student section: $[10/1000]\cdot[1/109] = 10/109000$
- Other sections: $[1/1000]\cdot[1/109] = 1/109000$

But remember RIS gives you *two* ways to control the final probabilities. The above approach changes the probabilities for sampling the original samples $S_i$. You can also change the probabilities for resampling $R_j$ from $S_i$. What does that look like in this simple example?

Let's say in the first stage, we continue to select one winning ticket per section of the stadium. But we'd still like the student section to have a $10\times$ higher chance of winning. Well, we could put 10 copies of the winning student section ticket into the mixer for the final drawing. This changes the probabilities of winning as follows:

- Student section: $[1/1000]\cdot[10/109] = 10/109000$
- Other sections: $[1/1000]\cdot[1/109] = 1/109000$

We get the same probabilities two different ways: by changing the PDF for picking our initial samples, or by changing how we reweight intermediate samples $S_i$ when resampling!

Resampling for Environment Map Sampling
==============================================================

A simple, but more rendering-related example is environment map sampling. Let's start with a thought experiment. Mentally pick a random(-ish) number $\xi\in[0..1)$. Now go outside and look up at the real-world sky (i.e., a real environment map light probe). I'd like you to sample the (real-world) sphere of directions based on intensity. Point in the direction corresponding to $\xi$. If your random number $\xi=0$, you should point towards the darkest direction. If you picked $\xi=1$, you would point towards the brightest direction. If you picked $\xi=0.37$, you should point where 37% of the sky is darker than your selected direction.

Now, unless you select $\xi=0$ or $1$, this is **remarkably** hard. Why? In graphics, we *regularly* sample from an environment probe by intensity, often using a single random number (like this) as input. In fact, you've almost certainly written code to do this. If you've written that code before, you could probably rewrite it in 5 minutes or less. So why is it hard to sample by physically pointing in real life?

Resampling vs. no resampling
--------------------------------------------------------------

The difference between this thought experiment and light probe sampling in renderers is simple: in rendering, we *always* use RIS. This allows easy and cheap (approximate) sampling from a very complex distribution, specifically sampling proportional to intensity from a continuous distribution of environment lighting (that has no analytic representation). In my thought experiment, I asked you to go out and directly sample this continuous lighting function. This is more-or-less impossible.

Now I hear you thinking, "I don't remember implementing RIS for sampling my light probes!" Well, you did. Resampling with RIS is a two-step sampling procedure. What are the two steps here? Well:

1. Use a camera to capture a light probe. This creates numerous discrete (texel) samples $S_i$ on the continuous sphere of directions.
2. Reweight each texel by its intensity (times solid angle), and select one $R_j$ as the $S_i$ corresponding to the intensity specified by $\xi$.

The reason nobody thinks of this as resampling is *you* probably did not personally capture the probe, so it is easy to forget about that sampling step. And reweighting by solid angle is thought of as just "part of the process," not what it actually is: carefully controlling the target PDF of our resampled $R_j$ by accounting for non-uniformities in the samples generated by the camera (and any postprocess stitching).
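Written as code, a minimal sketch of that second step might look like the following. Here I assume a hypothetical lat-long probe whose per-texel luminances already sit in a flat array `lum` of size `width * height`; those names (and storing the probe this way at all) are illustrative assumptions, not any particular API:

```cpp
#include <cmath>
#include <vector>

// Map a single uniform random number xi to one texel R_j of an (assumed)
// lat-long light probe, selecting texels proportional to intensity times
// solid angle.  The sin(theta) term accounts for lat-long texels near the
// poles covering less solid angle than texels near the equator.
int sampleProbeTexel(const std::vector<float>& lum, int width, int height, float xi)
{
    // Build the (unnormalized) discrete CDF over all texels S_i.
    std::vector<float> cdf(lum.size());
    float total = 0.0f;
    for (int y = 0; y < height; ++y) {
        float theta = (y + 0.5f) / height * 3.14159265f;   // polar angle of this texel row
        float solidAngleWeight = std::sin(theta);
        for (int x = 0; x < width; ++x) {
            total += lum[y * width + x] * solidAngleWeight;
            cdf[y * width + x] = total;
        }
    }
    // Select the texel whose CDF bucket contains xi (linear scan for clarity;
    // real code would precompute the CDF once and binary search it).
    float target = xi * total;
    for (size_t i = 0; i < cdf.size(); ++i)
        if (cdf[i] >= target) return (int)i;
    return (int)lum.size() - 1;
}
```

In practice the CDF would be built once per probe rather than per sample, but the idea is the same: the texels are our $S_i$, the intensity-times-solid-angle weights define the target, and the texel returned is $R_j$.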
Resampling Is Everywhere
==============================================================

To be clear, I picked these examples as simple, intuitive, and fairly widely-understood examples. But once you start looking for multi-step sampling in our renderers (real-time, interactive, and offline), resampling appears everywhere. It's often not obvious, and it's rarely *described* as resampling, but this is an idea that any experienced rendering engineer is already comfortable with (though you may not know it). For instance: mip-mapping, hierarchical voxel grids, geometric level-of-detail, and even many neural networks can often be viewed as multi-step sampling processes, where each sampling step reduces the size of the available sample pool.

Why Use Resampling?
==============================================================

Let's go back to the standard Monte Carlo integration estimator:

$$\int f(x)dx \approx \frac{1}{N}\sum_{i=1}^N\frac{f(S_i)}{P_S(S_i)}$$

What problems can this have? Well:

- It may be expensive (or impossible) to sample according to a good distribution $P_S$.
- Evaluating $f$ may be expensive; the more samples $N$ that are needed, the higher this cost goes.
- With an affordable distribution $P_S$, extremely high $N$ may be needed for a good approximation. This may be impractical.

Resampling addresses these problems. By splitting the estimate into two stages,

$$\int f(x)dx \approx \frac{1}{N}\sum_{j=1}^N\left[\frac{f(R_j)}{P_R(R_j)}\cdot\frac{1}{M}\sum_{i=1}^M\frac{P_R(S_i)}{P_S(S_i)}\right]$$

we can use a cheap $P_S$ and a more expensive $P_R$ that need not have an analytic sampling algorithm. Because we do not evaluate $f$ for each of the $M$ candidates, we can increase $M$ significantly without doing more expensive evaluations of $f$. Because we use a better distribution $P_R$, we need fewer final samples $N$, which may make estimation of more complex integrals feasible. (Though there are still limits if $M$ must grow excessively.)

But there are some **additional** reasons to use resampling that are not obvious from staring at the equations. If you try to amortize the cost of your samples by reusing them across adjacent pixels, you need to store these samples prior to reuse. When reusing large numbers of samples, the required storage quickly becomes impractical. But resampling reduces this overhead. Rather than sharing a large number of samples $S_i$ for each pixel, these are resampled into a few better-quality samples $R_j$; since there are fewer $R_j$'s, this reduces the storage overhead and dramatically increases the practical scale of sample reuse and amortization.
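To tie the equations and the three-step algorithm together, here is a minimal, self-contained sketch of basic RIS in C++. Everything concrete in it (the integrand $f(x)=x^2$, the uniform source $P_S$, and a target $P_R$ proportional to $f$) is an assumption chosen so the example runs and has a known answer ($\int_0^1 x^2\,dx = 1/3$); it is a sketch, not anyone's production implementation.

```cpp
#include <cstdio>
#include <random>
#include <vector>

// Purely illustrative choices: integrate f(x) = x^2 over [0,1) using a
// uniform source distribution P_S and a target P_R proportional to f itself.
static float f(float x)         { return x * x; }
static float pdfCheap(float)    { return 1.0f; }         // P_S: uniform on [0,1)
static float targetPdf(float x) { return 3.0f * x * x; } // P_R: normalized x^2 density

// One RIS sample: draw M cheap candidates S_i, reweight them toward P_R, keep
// a single R_j, and return f(R_j)/P_R(R_j) * (1/M) * sum_i P_R(S_i)/P_S(S_i).
static float risOneSample(int M, std::mt19937& rng)
{
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    std::vector<float> S(M), w(M);
    float wSum = 0.0f;
    for (int i = 0; i < M; ++i) {                        // step 1: cheap candidates from P_S
        S[i] = uni(rng);
        w[i] = targetPdf(S[i]) / pdfCheap(S[i]);         //   resampling weight w_i
        wSum += w[i];
    }
    float pick = uni(rng) * wSum;                        // step 2: choose R_j with
    int j = 0;                                           //   probability w_j / sum_i w_i
    for (float acc = w[0]; acc < pick && j + 1 < M; acc += w[++j]) {}
    return f(S[j]) / targetPdf(S[j]) * (wSum / M);       // step 3: evaluate f only at R_j
}

int main()
{
    std::mt19937 rng(7);
    const int N = 100000, M = 32;
    double sum = 0.0;
    for (int n = 0; n < N; ++n) sum += risOneSample(M, rng);
    printf("RIS estimate of integral of x^2 over [0,1): %f (exact 0.333...)\n", sum / N);
}
```

Note that `targetPdf` is only ever *evaluated*, never sampled or inverted; that is exactly the freedom the resampling step buys you.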