Archive for August, 2009

Foliage Rendering in Pure

Posted in Video Game Development with tags , , , , on August 21, 2009 by Dan Amerson

As I noted in my post on the first day at SIGGRAPH, I found the presentation from the guys at Black Rock Studio very interesting. Since the show, I’ve traded a few emails with Jeremy Moore, who talked about the foliage rendering in Pure. Although the slides and course notes are not yet available, he provided me with some of the diagrams and screens from his talk to help my summary here. I think their method was very cool in a number of respects.

  1. Their ground cover used a nice blend of precomputation and runtime sorting to achieve good-looking results.
  2. The foliage rendering for trees was an interesting technique that I think is applicable to both forward and deferred rendering.
  3. The tree rendering balanced the sort-order independent characteristics of alpha testing with the antialiasing of alpha blending.

Ground Cover

The ground cover system was independent of tree rendering. Here are the highlights of the system:

  • Density was painted by an artist into a 2D map. Values ranged from 0-15.
  • Each density value corresponded to a ground cover image within a texture atlas rather than a dynamic amount of geometry.
  • Ground cover was rendered within a fixed distance from the camera.
  • This area, which I assume was square but could easily have been radial, was divided into 400 tiles.
  • Each tile contained 256 camera-aligned sprites using alpha blending for pleasing results. Experiments with the art staff rejected an alpha-testing or alpha-to-coverage approach.
  • Within each tile, there were 32 chunks each containing 8 sprites spaced uniformly.
  • Drawing orders for chunks within a tile were precomputed for 16 different directions. See the image below. Index buffers (IBs) for each ordering were built to avoid excessive runtime work.
  • At runtime, the system simply needed to sort the tiles for draw ordering and then select from the 16 precomputed orderings of the chunks.

Ground cover sorting in Pure. Darker areas render later. Image courtesy of Jeremy Moore at Black Rock Studio.
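To make the runtime half of that concrete, here’s a tiny Python sketch of the two-level sort: tiles back-to-front, then a table lookup for the chunk ordering within each tile. This is my own reconstruction, not Black Rock’s code; the tile representation, the squared-distance sort key, and the `precomputed_orders` table are illustrative assumptions.

```python
import math

NUM_DIRECTIONS = 16  # number of precomputed chunk orderings per tile

def direction_bucket(camera_pos, tile_center):
    """Quantize the 2D camera-to-tile vector into one of 16 directions."""
    dx = tile_center[0] - camera_pos[0]
    dy = tile_center[1] - camera_pos[1]
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(angle / (2.0 * math.pi) * NUM_DIRECTIONS) % NUM_DIRECTIONS

def draw_order(camera_pos, tiles, precomputed_orders):
    """Sort tiles back-to-front, then look up the precomputed chunk
    ordering (a prebuilt index buffer in the real system) for each tile."""
    back_to_front = sorted(
        tiles,
        key=lambda t: -((t[0] - camera_pos[0]) ** 2
                        + (t[1] - camera_pos[1]) ** 2))
    return [(tile, precomputed_orders[direction_bucket(camera_pos, tile)])
            for tile in back_to_front]
```

The point is that the only per-frame work is the coarse tile sort plus a cheap table lookup; the expensive fine-grained sort of 256 sprites per tile was paid for offline.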


  • There’s basically a hierarchical sorting algorithm here, which is a twist I hadn’t considered, although it’s obvious in hindsight.
  • Using precomputed results for the finest level of sorting is a great optimization.
  • Artist interface is simple. Paint ground cover into a texture.


  • The system was designed to minimize popping in Pure, which is a fairly high-speed game based on the footage I’ve seen. I wonder if the popping would be more noticeable with a slower camera. (I should probably rent the game at this point, but I am occasionally lazy.)
  • The precomputed sorting of chunks within a tile occurs based on a 2D vector to that tile. Since Pure is a racing game, that probably works well; the camera is generally at a low angle to the ground. With a free camera, you might need to map those vectors onto a hemisphere and hence use more than 16 precomputed directions.

Overall, there’s a great balance of “correct” sorting for alpha blending with practical performance considerations here. Obviously, there’s going to be some amount of popping when you switch between the precomputed draw orderings for chunks. There’s no escaping that with a discretization like this. However, if the system minimizes that to the point where it’s not noticed by the player, it’s a success.


Tree Rendering

The tree rendering in Pure was another very well balanced system that I think could be applicable to objects other than trees. Based on their description, this approach is good any time some internal aliasing is acceptable but good antialiasing at the edge of the alpha-blended object is needed. With trees, some internal aliasing between leaves is acceptable, but you really want smooth alpha blending with no aliasing between the trees and the environment. I think this extends to any self-intersecting, dense, alpha-blended object. Hair comes to mind as another good example.

Here’s the technique:

  1. Render the opaque objects.
  2. Clear the color buffer but not depth.
  3. Render the foliage alpha mask using alpha testing and depth testing but not depth writes.
    • There’s a separate pass here due to the alpha mask accumulation they use. Otherwise, we could store it during the color pass.
    • In the color channel, they store additively accumulated alpha.
    • In the alpha channel, they store the max alpha for a pixel.
    • Since both of these modes are commutative, there’s no need for depth writes, and hence no depth tests, between the blended pixels.
    • See the image below. The top half is the accumulated alpha and the bottom is the max.
  4. Render the foliage color using alpha testing, depth testing, and depth writes to another color buffer.
  5. Composite the foliage color over the opaque scene using standard SRCALPHA/INVSRCALPHA blending.
    • The alpha value is computed as the average of the accumulated and max values.
    • This is not strictly correct, but the guys at Black Rock noted that it gives a good look with a pleasingly soft edge.
Alpha mask of foliage in Pure. Image courtesy of Jeremy Moore at Black Rock Studio.

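As I understand steps 3 and 5, the per-pixel math works out as below. This is a CPU-side Python sketch of my own, not their shader code; in particular, I’m assuming the additive channel saturates at 1.0 the way a fixed-point render target would.

```python
def accumulate_alpha_mask(fragment_alphas):
    """Per-pixel alpha mask pass: the color channel additively
    accumulates alpha (clamped, as a render target would), while the
    alpha channel keeps the max. Both operations are commutative, so
    fragment order doesn't matter and no depth writes are needed."""
    accum = min(1.0, sum(fragment_alphas))
    peak = max(fragment_alphas) if fragment_alphas else 0.0
    return accum, peak

def composite(foliage_rgb, opaque_rgb, accum, peak):
    """SRCALPHA/INVSRCALPHA blend of the foliage color buffer over the
    opaque scene, using the average of the two mask values as alpha."""
    a = 0.5 * (accum + peak)
    return tuple(f * a + o * (1.0 - a)
                 for f, o in zip(foliage_rgb, opaque_rgb))
```

Because both accumulation operations are order-independent, shuffling the fragment list gives the same mask, which is exactly why that pass needs no sorting between blended pixels.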


  • I don’t have to sort my trees at all! Hooray, alpha test and depth test!
  • The system still retains the soft, antialiased look of alpha blending at the boundary of trees rather than the highly aliased look of alpha testing.


  • Rendering the trees twice creates a lot of extra draw calls. Depending on the overhead of your system, that could be a concern.
  • Less of a concern and more just a crazy thought: I’m wondering what effects you could achieve using math other than a simple average in step 5.

Closing Summary:

Stuff like this always gets me excited because it’s immensely practical and current. Since it’s been used in production, I know I could turn around and use it tomorrow. In this case, there are some really good tricks, hacks, approximations, or whatever you want to call them for high performance sorting and order-independent alpha blending. They aren’t for everyone, but it’s good food for thought.

If anyone has questions, let me know. I took a lot of notes, so maybe I can answer. If we’re lucky, Jeremy might even come read and post a comment or two.

The alpha blending on the trees outside my office is stupendous. dba


SIGGRAPH 09: Days 4 and 5

Posted in Video Game Development with tags , , , , , , , , on August 9, 2009 by Dan Amerson

Friday, after the whirlwind of the show, I couldn’t muster the energy to write up the fullness of days 4 and 5. Luckily, a 3-hour layover in St. Louis accommodated me a bit. Since this will be a long post, here’s the executive summary.

  • More from Beyond Programmable Shading – This talk was very good, and I think it has some pieces that would be really nice to share even with co-workers who aren’t focused on graphics and GPU computing just so they understand the problem space better.
  • New Exporters for PhysX – NVIDIA had the new exporters showing in their booth. They look much slicker than the current stuff.
  • Papers from Friday – Some of these were really sweet. I have some summaries below. I think I’ll break out the more exciting ones in a later post.
  • Teapot FAIL. Chris wanted a teapot, but the line was too long. See image below.

To Shading and Beyond

The course notes are here, and they are worth a read. I’ll probably mention topics from this talk for some time. Here are the relatively quick bullet points.

  • There was a great discussion of how GPUs are SIMD devices. We’re all familiar with the explicit 4X SIMD that we get in SSE instructions, but a lot of times we lose sight of the fact that GPUs are really SIMD in the same way. They are just 16X or 32X or more. This is the portion I mentioned above that I want to share with others in my company. When you have a very wide SIMD product, it behooves you to recast many if not all problems as compute intensive even if that’s more work.
  • The information presented by Johan at DICE about their experiments with Frostbite was top notch. He showed a prototype running 1000 lights using a compute shader to sort, tile, and render the lights. That portion was the most valuable. The cool thing was that the compute shader used a few synchronization barriers. At first, it was parallel on pixels, then on lights, then on pixels again. It’s a very cool trick, and I never thought about parallelizing on multiple dimensions in a single shader to amortize the cost of dispatch.
  • In the afternoon, Paul Lalonde from Intel talked about extensible graphics pipelines. In short, he advocated a graphics pipeline that’s in place and usable but that has fully extensible pieces in software. On the one hand, everything that he said was straight Larrabee marketing. On the other hand, I really agree with his points. I don’t want to have to write a pipeline from scratch every time I grab a GPU. However, it’s nice to be able to violate any “restriction” that is there.
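For flavor, here’s roughly what the tiling half of the DICE demo looks like, serialized to Python on the CPU. This is nothing like the actual Frostbite compute shader (which does the binning in parallel with barriers between phases); the screen-space light representation and the tile size are assumptions for illustration.

```python
def assign_lights_to_tiles(lights, screen_w, screen_h, tile_size=16):
    """Bin point lights, given as screen-space (x, y, radius), into
    screen tiles so each tile later shades only the lights that can
    affect it. The compute-shader version runs this in parallel; this
    is a serial CPU sketch of the same binning."""
    tiles_x = (screen_w + tile_size - 1) // tile_size
    tiles_y = (screen_h + tile_size - 1) // tile_size
    tiles = [[] for _ in range(tiles_x * tiles_y)]
    for index, (lx, ly, radius) in enumerate(lights):
        # Conservative bounding box of the light's screen footprint.
        x0 = max(0, int((lx - radius) // tile_size))
        x1 = min(tiles_x - 1, int((lx + radius) // tile_size))
        y0 = max(0, int((ly - radius) // tile_size))
        y1 = min(tiles_y - 1, int((ly + radius) // tile_size))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles[ty * tiles_x + tx].append(index)
    return tiles, tiles_x, tiles_y
```

The win is that a tile touching only 3 of the 1000 lights pays for 3, not 1000, during shading.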

Let’s Get Physical

PhysX is a great physical simulation system, but the art pipeline has always left something to be desired. Luckily, there are new exporters for Max and Maya coming from NVIDIA. I got a demo at the show from Dan Horowitz. (I hope that’s his name. It was really loud the night before when I met him.) The new plugins look really slick and offer a much better interface for editing. However, it may be a bit of work for Emergent to update the Gamebryo exporters. I’ve definitely got to touch base with Adam and Stephen tomorrow to make sure we’re on top of this.

Papers and Talks

Multi-Layer, Dual-Resolution Screen-Space Ambient Occlusion. The multilayer stuff was a technique similar to depth peeling to deal with single-sample artifacts in SSAO. I don’t think it’s as valuable as the dual resolution stuff which has some good performance implications.

RACBVH. That expands to Random Access Compressed Bounding Volume Hierarchies. This was a great presentation on compressed bounding volume hierarchies for large scenes. I don’t think it has a lot of value for client side graphics in games in the short term, but it could be useful on the server side or perhaps as part of a streaming system in a large world game.

Bucket Depth Peeling. This talk had very impressive results, but I think the cost is too high: 16 full-resolution buffers for storing 8 color and 8 depth samples. That’s a pile of memory.
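Back-of-the-envelope on that pile: assuming 1280×720 with RGBA16F color (8 bytes/pixel) and R32F depth (4 bytes/pixel), neither of which I know were the actual resolution or formats from the talk, the buffers alone come to roughly 84 MB.

```python
def buffer_megabytes(width, height, bytes_per_pixel, count):
    """Total size in MiB of `count` buffers at the given resolution."""
    return width * height * bytes_per_pixel * count / (1024 * 1024)

# Assumed, not from the talk: 1280x720, RGBA16F color, R32F depth.
color_mb = buffer_megabytes(1280, 720, 8, 8)  # 8 color buffers
depth_mb = buffer_megabytes(1280, 720, 4, 8)  # 8 depth buffers
total_mb = color_mb + depth_mb                # ~84 MiB
```

On a console of this era, that’s a large fraction of total video memory for one transparency technique.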

An Efficient GPU-based Approach for Interactive Global Illumination. This was a very impressive implementation that ran a pretty robust global illumination solution on the GPU at 3-4 Hz. It’s way too costly for now, but the results were very impressive. Here’s a link to the paper.

Beyond Triangles: GigaVoxels Effects in Games. This talk was mistitled. I don’t think we’ll see this in games in the near term. However, the medical imaging applications in the short term could be enormous. The technique could render a 2,048³ grid of voxels in real time. A full CT scan was one of the examples. Here’s a site from the author.

Teapots! We don’t need no stinking teapots!

Ok, I lie. The teapot is cool. Apparently, this year’s theme was 3D so it had some 3D glasses on. However, I couldn’t bring myself to burn what little time I had on Thursday waiting in line. Sorry, Chris.

Line for Teapots


For context, Pixar’s booth was a 20′ by 40′ affair. That makes the line a good 300+ feet long when compared with the booths on the map.

I hit publish because I’m tired of typing. Comment about mistakes. dba

SIGGRAPH 09: Day 4 Quickshot

Posted in Video Game Development with tags , on August 7, 2009 by Dan Amerson

A quick update. I was out too late networking and carousing with the guys from IDV, FIEA, and Intel last night, so I didn’t get a full post up. I’ll rectify that later. For now, here’s the quick summary.

  • The bulk of the day was in the Beyond Programmable Shading course. It was pretty cool. The notes are here if you want to peruse.
  • Another top secret meeting.
  • Saw the Caustic Graphics guys’ real-time raytracing stuff. It’s pretty cool, but it’s a new card. Run it on the GPU! They claim they have the best cost per ray in terms of dollars and watts, hence the new HW, but…
  • Demo of the new PhysX DCC plugins. Pretty cool.

Short but sweet. dba

SIGGRAPH 09: Day 3

Posted in Video Game Development with tags , , on August 6, 2009 by Dan Amerson

Day 3 was long, quite possibly because Evenings 1 and 2 were long, so I’m taking it easy in the room tonight. Here’s the recap:

  • Information Aesthetics Showcase
  • Another Super Secret Meeting!
  • Emerging Technologies
  • Efficient Substitutes for Subdivision Surfaces
  • Creating Natural Variation
  • Dinner with Interesting Folk


I took some time this morning to swing through the Information Aesthetics Showcase. Overall, it was pretty thin. However, there was a very cool installation that visualized stock market data as a solar system. Distance from the center represented trading volume. Size of the planets represented market capitalization, and orbital velocity represented percentage change from yesterday’s price. It really got me thinking about visualization. I had never thought of using velocity or distance that way, but it’s quite compelling. Once the metaphor was explained, it was really easy to pick out “planets” with high velocities, for example. I’m not really sure how it might apply to games or graphics dev tools, but it definitely served the purpose of provoking thought.

Not Emergent, Emerging

The Emerging Technologies section at SIGGRAPH always has a mix of interesting and utterly crazy stuff. Here’s my rundown from this year:

  • There was a sleighing simulator that tried to give an enhanced sense of speed by running a noise pattern in your peripheral view. It scores points for getting me to climb into a sled at a professional conference. Not much else.
  • A project called Twinkle projected a fairy onto a whiteboard with a little handheld projector. The projector also had a camera that read values back allowing the fairy to collide with the world and catch on fire. Overall, a really cool little project. You could fly around and bump into things which were just whiteboard scribbles.
  • There was an installation using night vision goggles called the Post Global Warming Survival Kit or something. Here’s the artist’s site. I was underwhelmed. It was a tent and a screen with an empty landscape. The backstory has potential.
  • There was a totally cool cabinet in the Generative Fabrication exhibit by this group/person. Unfortunately, there’s nothing even close to information on this site, so I can’t point to an image of the cabinet in question there. Maybe here.
  • One of the coolest things was a haptic pen called Pen De Touch from this group. It used a set of motors to vary the feedback. If you pushed down on a virtual surface, the feedback felt distinctly different than if you brushed it sideways. Very cool to play with. I have no idea what one might use it for.

Efficient Substitutes for Subdivision Surfaces

This course was a pretty solid overview of things. You can find the course notes here. I’ve got some reading to do after the show to really digest all this. One of the more interesting notes was a pair of samples from NVIDIA showing how to simulate tessellation on D3D10 using multipass techniques and stream out. They are linked from the course notes page if you want to play. I ducked out after a couple of hours to hit the next talk on my list, so I didn’t get to hear Valve talk about their implementation.

Mixing It Up

I went to the papers session on creating natural variation to hear this talk on saliency-based variation for crowd rendering. That talk was pretty cool. In summary, people fixate on the upper body and face. Therefore, variation in locomotion of the hips or changing lower body color is a waste of time and effort. The best bang for your buck is upper-body texture or palette swaps and facial variation. There’s more data in the paper; it’s worth a quick scan.

The preceding talk on producing noise with a sparse Gabor kernel turned out to be much cooler though. I saw a video on this a few weeks back and didn’t get that excited, mostly because I don’t have a huge use for noise. However, this talk was very compelling. It got me thinking about what I could use noise for in the engine.
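The core idea is simple enough to toy with: the noise is a sparse sum of randomly placed, randomly weighted Gabor kernels, each a Gaussian envelope times an oriented cosine. Here’s a naive Python sketch of my own based on the published formulation; all the parameter values are illustrative, and a real implementation evaluates kernels per grid cell with a Poisson impulse process rather than looping over one global list.

```python
import math
import random

def gabor_kernel(x, y, a, f0, omega):
    """One Gabor kernel: Gaussian envelope times an oriented cosine.
    `a` controls envelope width, `f0` the frequency, `omega` the
    orientation of the harmonic."""
    envelope = math.exp(-math.pi * a * a * (x * x + y * y))
    harmonic = math.cos(2.0 * math.pi * f0
                        * (x * math.cos(omega) + y * math.sin(omega)))
    return envelope * harmonic

def sparse_gabor_noise(x, y, seed=0, impulses=64, a=0.05, f0=0.0625):
    """Evaluate noise at (x, y) as a sparse sum of randomly placed,
    randomly weighted, randomly oriented Gabor kernels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(impulses):
        px, py = rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)
        weight = rng.uniform(-1.0, 1.0)
        omega = rng.uniform(0.0, math.pi)
        total += weight * gabor_kernel(x - px, y - py, a, f0, omega)
    return total
```

Fixing `omega` for every impulse gives anisotropic noise; drawing it at random, as here, gives the isotropic variant.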


I grabbed dinner after the show with Timothy Farrar of Human Head Studios or, more interestingly, this blog where he puts his mental spew. Tim is really sharp, and he’s great dinner conversation. If you aren’t reading his stuff, you should. I emailed him cold on a suggestion from a colleague, and it turned out to be an interesting, pleasant evening. I figured I’d thank him publicly.

Tomorrow’s post may well be late. dba

SIGGRAPH 09: Day 2

Posted in Video Game Development with tags , , , on August 4, 2009 by Dan Amerson

Today was another great day at the show. I started with the course on Real-Time Global Illumination for Dynamic Scenes. This was a great overview of techniques. They started with SSAO and proceeded to walk through various algorithms. One category used virtual point lights to approximate indirect illumination. Another set basically ran a real-time radiosity solution. Unfortunately, it was information overload. I spent the whole session writing down references to papers so I can look them up next week. I’ll try to post some of those once I have links and titles. I’ll also try to link to the course notes at some point since they have a really great table on the last page that shows the pros and cons of each technique. Overall, this was a great course. If they repeat next year, it’s well worth your time.

I spent some time on the exhibition floor. There was some cool stuff, but I’ll definitely need to go back through. There are a ton of booths.

  • IDV’s hand modeling tools for SpeedTree 5.0 are unbelievable. I had seen a demo a few months back, but it’s come so far that my jaw dropped. If you’re at the show, go ask Steve for the hand modeling demo. You can check out the second coolest piece of middleware that I know… LightSpeed is first, obviously.
  • The University of Central Florida – FIEA has been using Gamebryo for some time for student projects. I stopped by today to see some of them. They have a DJ/dance game called Sultans of Scratch which is very cool and worth a look.
  • The job fair was running. It looked a bit anemic which is not surprising given the economy, but games were well represented.

The rest of my day was in meetings. I originally intended to go to the Color Imaging course, but I made a tactical decision to peruse the exhibition a bit more and then skip it to come back to the room. That let me swap emails back home, work out, and write this post BEFORE I go out tonight. Dinner tonight is at NOLA. Sweet!

Bam! dba

SIGGRAPH 09: Day 1/2

Posted in Uncategorized, Video Game Development with tags on August 4, 2009 by Dan Amerson

Let’s review the day:

  • Wake up in Miami because your flights were late on Sunday.
  • Wait on standby hoping you get lucky to get to New Orleans.
  • Miss Anton Kaplanyan’s talk on real time GI that you really wanted to see.
  • Get lucky on a standby flight and at least catch the last half of the day.

Yes, that was my day. I was blessed with a delay at RDU last night that landed me in Miami 4 minutes after my connection to New Orleans left. I did, however, get lucky on standby today. My confirmed flight landed me around 5:30 PM CDT.

Once I hit the ground, I swept into Advances in Real-Time Rendering in 3D Graphics and Games. The afternoon sessions were pretty good. They started with a presentation from Jeremy Moore and David Jeffries from Black Rock Studio, who worked on Pure and the upcoming Split/Second. The latter half of the presentation on Split/Second had some interesting tidbits about deferred rendering. However, I was much more interested in the foliage rendering discussion from Pure. There was a very interesting technique presented which rendered color and alpha to separate buffers with a subsequent combine pass. The technique used alpha testing for internal aliasing but relied on a more sophisticated shader to reduce aliasing at foliage edges when combining. I’m really looking forward to the course notes on this since they have more detail on the alpha blending operations that were used. It’s a combination of blending and testing that looked somewhat order independent in the talk.

Jason Yang presented a number of techniques from AMD. I really wish I could dive into them, but he presented a ton of information. I touched base with him after the talk to get some of the papers via email. There was a ton of technical information.

The session closed, for me anyway, with a discussion from Alex Evans of Media Molecule on the rendering decisions in Little Big Planet. I’d summarize them thusly:

  • Make development decisions based on your team’s constraints, limitations, and prejudices.
  • Mix a number of techniques into a rendering pipeline that works for you and gives you the look you desire.
  • We’re past DX9, so don’t be afraid to use techniques that don’t fit precisely in the VS/PS model.

Overall, Alex’s talk was great. There were a number of great technical nuggets even when he said we shouldn’t implement them.

I finished the night up having a bite with Mark from NVIDIA, and that was a good time. We had a great creole dinner plus Hurricanes at Pat O’Brien’s which is why this post may or may not have made sense. 🙂

And that’s the way it is. More good stuff tomorrow. dba.