The Forge Versions

The Forge Cross-Platform Rendering Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2

v1.56

Release 1.56 - April 4th, 2024 I3D | Warzone Mobile | Visibility Buffer | Aura on macOS | Ephemeris on Switch | GPU breadcrumbs | Swappy in Android | Screen-space Shadows | Metal Debug Markers improved

I3D

We are sponsoring I3D again. Come by and say hi! We will also be giving a talk on new developments in the Triangle Visibility Buffer.

Warzone Mobile launched

We have been working on Warzone Mobile since August 2020. The game launched on March 21, 2024.

Warzone Mobile

Visibility Buffer

We removed CPU cluster culling and simplified the animation data usage. Triangle filtering now takes only one dispatch per frame again.

Swappy frame pacer is now available in Android/Vulkan

We integrated the Swappy frame pacer into the Android / Vulkan ecosystem.

GPUCfg system improved with more IDs and fewer string compares

We did another pass on the GPUCfg system: we can now generate the vendor IDs and model IDs with a Python script, which keeps the *_gpu.data list for each platform easily up to date. We replaced most of the name comparisons with ID comparisons, which should speed up parsing and is more specific.
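To illustrate why ID comparisons are cheaper and more specific than name matching, here is a minimal sketch of an ID-keyed lookup. The `GpuEntry` layout, the `gpu_lookup_preset` function, the presets and the model IDs are hypothetical and do not reflect the real *_gpu.data format; only the PCI vendor IDs (0x1002 AMD, 0x10DE NVIDIA, 0x8086 Intel) are real.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical entry parsed from a per-platform gpu.data file.
struct GpuEntry {
    uint32_t vendorId; // PCI vendor ID, e.g. 0x10DE for NVIDIA
    uint32_t modelId;  // PCI device ID (the values below are made up)
    int      preset;   // hypothetical quality preset, 0 = low .. 3 = ultra
};

// Hypothetical table; the model IDs are illustrative only.
static const GpuEntry kGpuTable[] = {
    { 0x1002, 0x0001, 2 },
    { 0x10DE, 0x0002, 3 },
    { 0x8086, 0x0003, 1 },
};

// Returns the preset of the exact vendor/model match, or fallbackPreset.
// Two integer compares per entry replace a string compare on the GPU name.
int gpu_lookup_preset(uint32_t vendorId, uint32_t modelId, int fallbackPreset)
{
    for (size_t i = 0; i < sizeof(kGpuTable) / sizeof(kGpuTable[0]); ++i)
    {
        if (kGpuTable[i].vendorId == vendorId && kGpuTable[i].modelId == modelId)
            return kGpuTable[i].preset;
    }
    return fallbackPreset;
}
```

An unknown model falls back to a caller-chosen preset instead of silently matching a similarly named GPU.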

Screen-Space Shadows in UT9

In this unit test we added screen-space shadows to the existing shadow approaches. They complement regular shadow mapping and add more detail. We also fixed a number of inconsistencies in the other shadow map approaches.

PS5 - Screen-Space Shadows on Screen-Space Shadows PS5

PS5 - Screen-Space Shadows off Screen-Space Shadows PS5

Nintendo Switch Screen-Space Shadows Switch

PS4 Screen-Space Shadows PS4

GPU breadcrumbs on all platforms

You can now get GPU crash reports on all platforms, except OpenGL ES and DX11, which we skipped.

A simple example of a crash report is this:

2024-04-04 23:44:08 [MainThread ] 09a_HybridRaytracing.cp:1685 ERR| [Breadcrumb] Simulating a GPU crash situation (RAYTRACE SHADOWS)...
2024-04-04 23:44:10 [MainThread ] 09a_HybridRaytracing.cp:2428 INFO| Last rendering step (approx): Raytrace Shadows, crashed frame: 2

We will extend the reporting a bit more over time.
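The general breadcrumb idea can be modeled on the CPU: each rendering step writes its index into a small buffer that stays readable after a device loss, and the crash handler maps the last value back to a step name. This is a sketch under assumptions; `breadcrumbMark`, `breadcrumbLastStep` and the step list are hypothetical, and the GPU-side marker write is simulated with a plain variable.

```cpp
#include <cstdint>

// Hypothetical rendering steps; each step's index is its breadcrumb value.
static const char* const kSteps[] = { "GBuffer", "Raytrace Shadows", "Lighting", "Present" };
static const uint32_t kStepCount = sizeof(kSteps) / sizeof(kSteps[0]);

// Stand-in for a small host-visible buffer. On a real GPU the marker would be
// written by a command recorded into the command buffer ahead of each pass,
// so the value survives a device loss; here it is a plain variable.
static volatile uint32_t gBreadcrumb = UINT32_MAX;

// Records that `step` has been reached (on the GPU timeline in a real setup).
void breadcrumbMark(uint32_t step) { gBreadcrumb = step; }

// After a device-lost error, maps the last marker back to a step name. The result
// is approximate: the GPU may have progressed past the last marker it wrote.
const char* breadcrumbLastStep(void)
{
    const uint32_t step = gBreadcrumb;
    return step < kStepCount ? kSteps[step] : "unknown";
}
```

The approximate nature of the answer is the same reason the log above reports the "Last rendering step (approx)".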

Ephemeris now also runs on Switch ...

v1.55

Release 1.55 - March 1st, 2024 - Ephemeris | gpu.data | Many bug fixes and smaller improvements

Ephemeris 2.0 Update

We improved Ephemeris again: we updated some of the algorithms it uses, added more features and now support it on more platforms.

It now supports PC, XBOXes, PS4/PS5, Android, Steamdeck and iOS (requires iPhone 11 or higher). Switch is not supported so far.

Ephemeris on XBOX Series X Ephemeris 2.0 on February 28th, 2024

Ephemeris on Android Ephemeris 2.0 on February 28th, 2024

Ephemeris on PS4 Ephemeris 2.0 on February 28th, 2024

Ephemeris on PS5 Ephemeris 2.0 on February 28th, 2024

IGraphics.h

We changed the graphics interface for cmdBindRenderTargets:

// old
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, uint32_t renderTargetCount, RenderTarget** ppRenderTargets, RenderTarget* pDepthStencil, const LoadActionsDesc* loadActions, uint32_t* pColorArraySlices, uint32_t* pColorMipSlices, uint32_t depthArraySlice, uint32_t depthMipSlice)
// new
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, const BindRenderTargetsDesc* pDesc)

Instead of a long list of parameters we now provide a struct that gives us enough flexibility to pack more functionality in there.
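A minimal sketch of why a descriptor struct is more flexible than a long parameter list. The types below are simplified stand-ins written for this illustration, not the real IGraphics.h definitions, and `mockCmdBindRenderTargets` is hypothetical.

```cpp
#include <cstdint>

// Simplified stand-ins for illustration only; the real IGraphics.h definitions differ.
struct RenderTarget { uint32_t mWidth, mHeight; };
enum LoadActionType { LOAD_ACTION_DONTCARE = 0, LOAD_ACTION_LOAD, LOAD_ACTION_CLEAR };

struct BindRenderTargetDesc {
    RenderTarget*  pRenderTarget;
    LoadActionType mLoadAction; // zero-initialized -> LOAD_ACTION_DONTCARE
    uint32_t       mArraySlice; // zero-initialized -> slice 0
    uint32_t       mMipSlice;   // zero-initialized -> mip 0
};

struct BindRenderTargetsDesc {
    uint32_t             mRenderTargetCount;
    BindRenderTargetDesc mRenderTargets[8];
    BindRenderTargetDesc mDepthStencil; // pRenderTarget == nullptr -> no depth target
};

// Mock of a struct-based bind call: reports how many color targets request a clear.
uint32_t mockCmdBindRenderTargets(const BindRenderTargetsDesc* pDesc)
{
    uint32_t clears = 0;
    for (uint32_t i = 0; i < pDesc->mRenderTargetCount; ++i)
        if (pDesc->mRenderTargets[i].mLoadAction == LOAD_ACTION_CLEAR)
            ++clears;
    return clears;
}
```

A caller value-initializes the desc with `= {}` and sets only the fields it needs. When a field is added later, existing call sites keep compiling and get a zero default, which a positional parameter list cannot offer.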

Variable Rate Shading

We added Variable Rate Shading to the Visibility Buffer OIT example (test 15a). This gives us a better-looking test scene with St. Miguel.

VRS allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true

The key idea behind the software-based approach is to render everything into 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on local image gradients. This approach can be used on a much wider range of platforms and devices than the hardware-based approach, since hardware VRS support is broken or missing on many platforms. Because this software approach uses 2x2 tiles, we can also achieve higher image quality than hardware-based VRS.

Shading rate view based on the color per 2x2 pixel quad:

White – 1 sample (top left, always shaded);
Blue – 2 horizontal samples;
Red – 2 vertical samples;
Green – all 4 samples;
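One way to turn local image gradients into the four rates above is to classify each 2x2 quad by how much its luminance changes along x and y; the result would then be written into the stencil-based VRS map. This is a sketch under assumptions: the gradient estimate, the threshold and the mapping of "horizontal" to a change along x are our illustration, not The Forge's shader.

```cpp
#include <algorithm>
#include <cmath>

// Shading rates per 2x2 quad, matching the debug colors listed above.
enum QuadRate {
    QUAD_RATE_1_SAMPLE     = 0, // white: 1 sample (top left, always shaded)
    QUAD_RATE_2_HORIZONTAL = 1, // blue: 2 horizontal samples
    QUAD_RATE_2_VERTICAL   = 2, // red: 2 vertical samples
    QUAD_RATE_4_SAMPLES    = 3  // green: all 4 samples
};

// Classifies one quad from luminance values of an already rendered image.
// l[0] = top-left, l[1] = top-right, l[2] = bottom-left, l[3] = bottom-right.
// The gradient estimate and the threshold are illustrative assumptions.
QuadRate classifyQuad(const float l[4], float threshold)
{
    // Largest change along x (between columns) and along y (between rows).
    const float dx = std::max(std::fabs(l[1] - l[0]), std::fabs(l[3] - l[2]));
    const float dy = std::max(std::fabs(l[2] - l[0]), std::fabs(l[3] - l[1]));
    const bool varyX = dx > threshold;
    const bool varyY = dy > threshold;
    if (varyX && varyY) return QUAD_RATE_4_SAMPLES;
    if (varyX) return QUAD_RATE_2_HORIZONTAL; // shade the left and right samples
    if (varyY) return QUAD_RATE_2_VERTICAL;   // shade the top and bottom samples
    return QUAD_RATE_1_SAMPLE;
}
```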

PC VRS

Debug Output with the original Image on PC VRS

PC VRS

Debug Output with the original Image on PC VRS

Android VRS

Debug Output with the original Image on Android VRS

Android VRS

Debug Output with the original Image on Android VRS

UI description:

Toggle VRS – enable/disable VRS
Draw Cubes – enable/disable dynamic objects in the scene
Toggle Debug View – shows auto-generated VRS map if VRS is enabled
Blur kernel Size – change the size of the blur kernel applied to the background image, which makes the fragment shader heavy enough to highlight the performance benefits of the solution.

Limitations: relies on programmable sample locations support, which is not widely supported on Android devices.

Supported platforms: PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows (Vulkan/DX12), macOS/iOS.

gpu.data

You will want to check out those files. There is now a dedicated file per supported platform, which makes it easier for us to distinguish between different PlayStations, XBOXes, Switches, Android and iOS devices, etc.

Unlinked Multi GPU

The Unlinked Multi GPU example was broken on AMD 7x GPUs with Vulkan. This looks like a bug; we had to disable DCC (Delta Color Compression) to make it work.

Vulkan

We now track GPU memory and will extend this to other platforms.

Vulkan mobile support

We now support the VK_EXT_ASTC_DECODE_MODE_EXTENSION_NAME extension.

Remote UI

Various bug fixes to make this more stable. It is still alpha software ... it will crash.

Retired:

35 Variable Rate Shading ... this went into the Visibility Buffer OIT example 15a.
Basis Library - since we did not find any practical use case, we removed Basis again.

V1.54

Release 1.54 - February 2nd, 2024 - Remote UI Control | Shader Server | Visibility Buffer | Asset Pipeline | GPU Config System | macOS/iOS | Lots more ...

Our last release was in October 2022. We were so busy that we lost track of time. We planned to make the next release in March 2023, and we have been testing, fixing and improving code up until today. The improvements coming back from the 8 - 10 projects we are usually working on were so numerous that it was hard to integrate, test and then maintain them all. To a certain degree our business has a higher priority than making GitHub releases, but we realize that letting a lot of time pass makes it substantially harder to get the whole code base back in shape, even with a company of nearly 40 graphics programmers.

So we cut down the number of functional and unit tests to have fewer variables. We also restructured large parts of our code base so that it is easier to maintain. One of the constant maintenance challenges was the macOS / iOS run-time (more about that below). We invested a lot in our testing environment: we now have more consoles for testing, and we also have a much-needed screenshot testing system. We also outsource more testing to external service providers. We removed Linux as a stand-alone target, but the native Steamdeck support should make up for this.

We tried to be conservative about increasing API versions because we know that on many platforms our target group uses older OS or API implementations. Nevertheless, we were more adventurous this year than before, so we bumped the versions by a larger step than in previous years. Our next release is planned in about four weeks. We still have work to do to bring up a few parts of the source code, but the increments are now much smaller. In the meantime, some of the games we worked on, or are still working on, have shipped:

Forza Motorsport has launched in the meantime:

Starfield has launched:

Starfield

No Man's Sky has launched on macOS:

No Man's Sky

Internal automated testing setup on our internal GitLab server

Our automated testing setup, which tests all platforms, now takes 38 minutes per run; at some point it took longer. We revamped it substantially since the last release, adding screenshot comparisons and a few extra steps for static code analysis.
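Screenshot comparisons usually need some tolerance, because GPUs and drivers differ slightly in rasterization and filtering. The following is a minimal sketch of such a check; the per-channel tolerance plus maximum fraction of differing pixels is an assumed scheme, and `screenshot_matches` is a hypothetical name, not part of our test setup.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Compares two RGBA8 images with the same size. A pixel counts as different when
// any channel differs by more than channelTolerance. Returns true (pass) when the
// fraction of differing pixels is at most maxDiffFraction.
bool screenshot_matches(const uint8_t* a, const uint8_t* b, size_t pixelCount,
                        int channelTolerance, double maxDiffFraction)
{
    if (pixelCount == 0)
        return true;
    size_t diffPixels = 0;
    for (size_t p = 0; p < pixelCount; ++p)
    {
        for (int c = 0; c < 4; ++c)
        {
            const int d = std::abs((int)a[p * 4 + c] - (int)b[p * 4 + c]);
            if (d > channelTolerance) { ++diffPixels; break; } // count each pixel once
        }
    }
    return (double)diffPixels / (double)pixelCount <= maxDiffFraction;
}
```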

Visibility Buffer

The Visibility Buffer went through a lot of upgrades since October 2022. I think the most notable ones are:
- Refactored the whole code so that it is easier to re-use in all our examples, there is now a dedicated Visibility Buffer directory holding this code
- Animation of characters is now integrated
- Tangent and bitangent calculation moved to the pixel shader, and we removed the tangent / bitangent buffers
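Computing the tangent frame in the pixel shader typically uses the screen-space derivatives of position and UV. The sketch below is a CPU version of one well-known formulation (Christian Schüler's "normal mapping without precomputed tangents"); whether The Forge's shader uses exactly this formulation is an assumption, and `tangentFrame` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

static Vec3 v3Cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static float v3Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 v3Scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static Vec3 v3Add(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }

// Builds tangent T and bitangent B from the interpolated normal N and the
// screen-space derivatives of position (dpdx, dpdy) and UV (duvdx, duvdy),
// i.e. what a pixel shader gets from ddx()/ddy(). Degenerate (zero) derivatives
// are not handled in this sketch.
void tangentFrame(Vec3 N, Vec3 dpdx, Vec3 dpdy, Vec2 duvdx, Vec2 duvdy, Vec3* T, Vec3* B)
{
    const Vec3 dpdyPerp = v3Cross(dpdy, N);
    const Vec3 dpdxPerp = v3Cross(N, dpdx);
    const Vec3 t = v3Add(v3Scale(dpdyPerp, duvdx.x), v3Scale(dpdxPerp, duvdy.x));
    const Vec3 b = v3Add(v3Scale(dpdyPerp, duvdx.y), v3Scale(dpdxPerp, duvdy.y));
    // Scale-invariant normalization by the longer of the two vectors.
    const float invMax = 1.0f / std::sqrt(std::max(v3Dot(t, t), v3Dot(b, b)));
    *T = v3Scale(t, invMax);
    *B = v3Scale(b, invMax);
}
```

The trade-off is a few extra ALU instructions per pixel in exchange for no tangent / bitangent vertex buffers, which fits the memory-bandwidth focus of the Visibility Buffer.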

Software Variable Rate Shading

This unit test demonstrates a software-based variable rate shading (VRS) technique that renders parts of the render target at different resolutions based on an auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true

PC Windows (2560x1080): Variable Rate Shading on PC

Switch (1280x720): Variable Rate Shading on Switch

XBOX One S (1080p): Variable Rate Shading on XBOX One S

PS4 Pro (3840x2160): Variable Rate Shading on PS4 Pro

The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients. The advantage of this approach is that it runs on a wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we can also achieve higher image quality compared to hardware-based VRS.

Shading rate view based on the color per 2x2 pixel quad:

White – 1 sample (top left, always shaded);
Blue – 2 horizontal samples;
Red – 2 vertical samples;
Green – all 4 samples;

Variable Rate Shading Debug

UI description:

Toggle VRS – enable/disable VRS
Draw Cubes – enable/disable dynamic objects in the scene
Toggle Debug View – shows auto-generated VRS map if VRS is enabled
Blur kernel Size – change the size of the blur kernel applied to the background image, which makes the fragment shader heavy enough to highlight the performance benefits of the solution.

Limitations: relies on programmable sample locations support, which is not widely supported on Android devices.

Supported platforms:

PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows (Vulkan/DX12). Implemented on macOS/iOS, but it doesn't give the expected performance benefits due to an issue with stencil testing on that platform.

Shader Server

To enable re-compilation of shaders at run-time, we implemented a cross-platform shader server that lets you recompile shaders by pressing CTRL-S or a button in a dedicated menu. You can find the documentation in the Wiki in the FSL section.

Remote UI Control

When working remotely, on mobile or on console, it can be cumbersome to control the development UI. We added a remote control application in Common_3/Tools/UIRemoteControl that allows control of all UI elements on all platforms. It works as follows:

1. Build and launch the Remote Control App located in Common_3/Tools/UIRemoteControl.
2. When a unit test is started on the target (e.g. a console), it starts listening for connections on a port (8889 by default).
3. In the Remote Control App, enter the target IP address and click connect.

Remote UI Control

This is alpha software so expect it to crash ...

VK_EXT_device_fault support

This extension allows developers to query for additional information on GPU faults which may have caused device loss, and to generate binary crash dumps.

Ray Queries in Ray Tracing

We switched to Ray Queries for the common Ray Tracing APIs on all the platforms we support. The current Ray Tracing APIs substantially increase the amount of memory necessary, decrease performance and can't add much visually, because the whole game has to run at lower resolution, with lower texture resolution and lower graphics quality (to make up for this, upscalers were introduced, which add new issues to the final image). Because Ray Tracing became a marketing term valuable to GPU manufacturers, some game developers now support Ray Tracing to help increase hardware sales. So we are going with the flow here by offering those APIs.

macOS (1440x810) Ray Queries on macOS

PS5 (3840x2160) Ray Queries on PS5

Windows 10 (2560x1080) Ray Queries on Windows 10

XBOX Series X (1920x1080) Ray Queries on XBOX Series X

iPhone 11 (Model A2111) at resolution 896x414 Ray Queries on iOS

We do not have a denoiser for the Path Tracer.

GPU Configuration System

This is a cross-platform system that can track GPU capabilities on all platforms and switch on and off features of a game for different platforms. To read a lot more about this follow the link below.

GPU Configuration system

New macOS / iOS run-time

We think the Metal API is a well-balanced Graphics API that walks the path between low-level and high-level very well. However, we ran into one general problem with the Metal API on both platforms: it is hard to maintain the code base. There is an architectural problem that was probably introduced due to a lack of experience in shipping games. In essence, what Apple decided to do is to have calls like this:

https://developer.apple.com/documentation/swift/marking-api-availability-in-objective-c

Anything a hardware vendor describes as available and working might stop working with the next upgrade of the operating system, the hardware, or just the API or Xcode. If you have a few hundred of those macros in your code, it becomes a lottery what works and what doesn't on a variety of hardware: on some hardware one feature is broken, on other hardware something else. There are two ways to deal with this. The first is to add a #define for every @available macro that switches off or replaces that code based on the underlying hardware and software platform. You would then have to manually track whether what the macro says is true on a wide range of platforms with different outcomes. For example, on macOS 10.13 a certain MacBook Pro with an Intel GPU (a made-up example) is broken, but a very similar MacBook Pro that additionally has a dedicated GPU actually runs it. Now you have to track what "class of MacBook Pro" we are talking about and whether it has an Intel or an AMD GPU. We already track all this data, so that is not the problem; we know exactly what piece of hardware we are looking at (see the GPU Configuration system above). The problem is that we would have to guard every @available macro with some of this logic, and from a QA standpoint that generates an explosion of QA requests. So we took the second way: to cut down on the number of variables, we decided to focus only on calls that are available in two different macOS and two different iOS versions. Here is the code in iOperatingSystem.h:

// Support fixed set of OS versions in order to minimize maintenance efforts
// Two runtimes. Using IOS in the name as iOS versions are easier to remember
// Try to match the macOS version with the relevant features in iOS version
#define IOS14_API     API_AVAILABLE(macos(11.0), ios(14.1))
#define IOS14_RUNTIME @available(macOS 11.0, iOS 14.1, *)

#define IOS17_API     API_AVAILABLE(macos(14.0), ios(17.0))
#define IOS17_RUNTIME @available(macOS 14.0, iOS 17.0, *)
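With only two runtimes, the version decision collapses into a single tier selection instead of hundreds of individual checks. The C++ sketch below only shows that decision logic, using the version pairs from the two macros above (macOS 11.0 / iOS 14.1 and macOS 14.0 / iOS 17.0); in real Objective-C code the check is done with `IOS14_RUNTIME` / `IOS17_RUNTIME`, and `AppleRuntime` / `selectRuntime` are hypothetical names.

```cpp
// Hypothetical runtime tiers mirroring IOS14_API / IOS17_API.
enum AppleRuntime { APPLE_RUNTIME_UNSUPPORTED = 0, APPLE_RUNTIME_IOS14 = 14, APPLE_RUNTIME_IOS17 = 17 };

// Maps an OS version to the newest supported tier.
// macOS thresholds are 11.0 and 14.0; iOS thresholds are 14.1 and 17.0.
AppleRuntime selectRuntime(bool isMacOS, int major, int minor)
{
    if (isMacOS)
    {
        if (major >= 14) return APPLE_RUNTIME_IOS17;
        if (major >= 11) return APPLE_RUNTIME_IOS14;
        return APPLE_RUNTIME_UNSUPPORTED;
    }
    if (major >= 17) return APPLE_RUNTIME_IOS17;
    if (major > 14 || (major == 14 && minor >= 1)) return APPLE_RUNTIME_IOS14;
    return APPLE_RUNTIME_UNSUPPORTED;
}
```

Every feature path is then written against one of two tiers, so QA only has to cover two configurations per platform.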

Dynamic Rendering extension - VK_KHR_dynamic_rendering

We were one of the big proponents of the Dynamic Rendering extension. By adopting Vulkan as a Graphics API, game developers took over part of the driver development. This raised the cost of game production substantially, because our QA efforts had to increase to deal with a lower-level API. One interesting finding made by many who adopted Vulkan in games is that a Vulkan run-time in most cases runs slower on the GPU than the DirectX 11 run-time. It is very hard to optimize a Vulkan run-time to run as fast as a DirectX 11 run-time on the GPU, because parts of the responsibilities shifted from the device driver writer to the game developer. The main advantage of Vulkan is its lower CPU overhead. Older DirectX 11 drivers were so inefficiently programmed that they brought down high-end PC CPUs and no longer allowed running two GPUs in SLI / Crossfire (hence the practical death of multi-GPU support in games), whereas Vulkan has substantially lower CPU overhead. That being said, the one thing that shouldn't have been moved from the driver into the game developer's space is the render pass concept. It is so close to the hardware that device driver writers can deal with it much better than game developers. In our measurements on tiled hardware renderers, using tiles never had a positive effect. We imagine that a device driver writer with direct access to the hardware can make tiles run much faster than we can. This is why we embrace the dynamic rendering extension.

Removed glTF loading from the resource loader

glTF is an art exchange format, but it is not suitable for loading game assets. There are two main reasons for this: it doesn't make sense to load one glTF file for each "model", and it also doesn't make sense to load one glTF file for a scene, because that would not allow streaming. Generally we load art assets from one large zipped file, opened with a single "fopen" call. To avoid any other OS calls, we load all assets from this large file by looking them up in a look-up table, finding the address in memory, moving a pointer there and then copying the data into system memory (this is a simplified view, but it should suffice for this purpose). To allow streaming depending on the type of game, we pre-package art assets in meaningful ways, so that they are laid out as expected when we load them on demand. glTF does not align with this concept, or with the fact that we compress data to fit into our internal caches. In other words, it doesn't make sense to use glTF at run-time in a game.
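The simplified loading path described above (one archive in memory, a look-up table, a pointer into the archive, a copy into system memory) can be sketched as follows. The `AssetEntry` layout, the FNV-1a name hashing and `loadAsset` are assumptions for illustration, not our actual archive format; decompression is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical table-of-contents entry for one asset inside a large archive
// that was read into memory after a single file open.
struct AssetEntry {
    uint64_t nameHash; // hash of the asset path, computed offline
    uint64_t offset;   // byte offset of the asset inside the archive
    uint64_t size;     // size in bytes
};

// FNV-1a 64-bit hash, used here only as an example key function.
uint64_t hashName(const char* s)
{
    uint64_t h = 14695981039346656037ull;
    for (; *s; ++s) { h ^= (unsigned char)*s; h *= 1099511628211ull; }
    return h;
}

// Finds the asset in the table, points into the archive and copies the bytes into dst.
// Returns the number of bytes copied, or 0 if the asset is missing or dst is too small.
size_t loadAsset(const uint8_t* archive, const AssetEntry* toc, size_t tocCount,
                 const char* name, void* dst, size_t dstCapacity)
{
    const uint64_t key = hashName(name);
    for (size_t i = 0; i < tocCount; ++i)
    {
        if (toc[i].nameHash != key) continue;
        if (toc[i].size > dstCapacity) return 0;
        std::memcpy(dst, archive + toc[i].offset, (size_t)toc[i].size);
        return (size_t)toc[i].size;
    }
    return 0;
}
```

A real table would likely be sorted by hash and binary-searched; the linear scan keeps the sketch short.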

The Asset Pipeline now loads, converts and optimizes glTF data into our internal format offline. This can be adjusted to the needs of any type of game and currently represents a proof-of-concept.

Unified the art assets to St. Miguel

Today there is no point in using Sponza as a test scene anymore. One can argue that even St. Miguel does not fulfill that purpose. Nevertheless, it is currently the best we have. We removed all Sponza art assets and replaced them with St. Miguel.

Updated Wiki Documentation

We updated the Wiki documentation. Check it out. We know it could be more ...

Retired functional / unit tests
- 04_ExecuteIndirect - it is used now extensively in the Visibility Buffer ... we don't need that test anymore
- 07_Tessellation - similar to geometry shaders, the tessellation shaders never really worked out. So we are removing the unit test
- 18_VirtualTexture - similar to geometry shader and tessellation shader, this didn't seem to have worked out ... support became more spotty ... so better remove it
- 33_YUV - looks like this never worked out and was only supported by a small number of hardware / software combinations ... so not useful for game development
- 37_PrecomputedVolumeDLUT - a very specific technique that didn't show any new abilities, so we removed it
- 38_AmbientOcclusion_GTAO - the maintainer could not fix one bug in the implementation ... so we removed it until someone else can write a consistent implementation for all platforms

V1.53

Release 1.53 - October 5th, 2022 - Steamdeck Support | App life cycle changes | Shader Byte Code Offline Generation | GTAO Unit Test | Improved gradient calculation in Visibility Buffer | New C Containers | Reorg TF Directory Structure | Upgraded to newer ImGUI | The Forge Blog

The Starfield Official Gameplay Reveal Trailer is out. It always brings us pleasure to see The Forge running in AAA games like this:

We added The Forge to the Creation Engine in 2019.

The Forge made an appearance during Apple's developer conference (WWDC) 2022. We added it to the game "No Man's Sky" from Hello Games to bring the game to macOS / iOS. For the YouTube video, click on the image below and jump to 1:22:40.

We switched our Linux OS to Manjaro to have an easier upgrade path to the Steamdeck. Please note the changed Linux requirements below.
Shader byte code can now be generated offline.
- Shader binaries are compiled through FSL
- Introduced ShaderList files that determine all the binary shaders that FSL needs to produce. Defines, shader target and other specific configuration can be specified per shader binary declaration
- Updated all projects (UT, VB, Aura, Ephemeris) to use the new ShaderLists
- Removed all ShaderStageLoadDesc::pMacros; shaders are compiled offline through ShaderLists
- Removed all Renderer::pBuiltinShaderDefines; all configuration is done through FSL
Over the last few projects we always had challenges with EASTL. So over the last 9 months we slowly removed it and replaced it with new C-language-based containers that prefer stack allocations over heap allocations.

For string management: bstrlib

For dynamic arrays and hash tables: stb_ds.h

There is a new unit test to make sure those new containers are tested. It is called 36_AlgorithmsAndContainers
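To illustrate what "prefer stack allocations over heap allocations" means for a container, here is a minimal C-style small-buffer array: elements live in inline storage (on the stack when the struct is a local variable) and only move to the heap when that storage overflows. This is a hypothetical sketch of the idea, not the bstrlib or stb_ds.h API; `SmallIntArray` and its functions are made up.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

#define SMALL_ARRAY_INLINE_CAPACITY 4 // hypothetical inline capacity

// Hypothetical C-style dynamic array of ints. Do not copy it by value:
// `data` may point at the original's inlineData.
struct SmallIntArray {
    int    inlineData[SMALL_ARRAY_INLINE_CAPACITY];
    int*   data;     // points at inlineData until the first heap growth
    size_t size;
    size_t capacity;
};

void smallArrayInit(SmallIntArray* a)
{
    a->data = a->inlineData;
    a->size = 0;
    a->capacity = SMALL_ARRAY_INLINE_CAPACITY;
}

bool smallArrayOnHeap(const SmallIntArray* a) { return a->data != a->inlineData; }

// Appends v and doubles the capacity on overflow. Returns false if allocation fails.
bool smallArrayPush(SmallIntArray* a, int v)
{
    if (a->size == a->capacity)
    {
        const size_t newCapacity = a->capacity * 2;
        int* p = (int*)std::malloc(newCapacity * sizeof(int));
        if (!p) return false;
        std::memcpy(p, a->data, a->size * sizeof(int));
        if (smallArrayOnHeap(a)) std::free(a->data);
        a->data = p;
        a->capacity = newCapacity;
    }
    a->data[a->size++] = v;
    return true;
}

// Releases heap storage (if any) and resets the array to its inline state.
void smallArrayFree(SmallIntArray* a)
{
    if (smallArrayOnHeap(a)) std::free(a->data);
    smallArrayInit(a);
}
```

Small, short-lived arrays, which are the common case in frame code, never touch the allocator at all.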

We changed the App life cycle: modern APIs have so many ways to reset the driver or reload assets, so we made a more flexible "reload" mechanism that generalizes all the special cases we had in there before.
- App extended with reload functionality by making use of ReloadDesc* parameter for the Load/Unload functions
- define reload/reset descriptors structs
- define reload/reset enum types
- Updated OS base files regarding new structs
- Able to reload shaders in all examples

This is a breaking change to all of our rendering interfaces.
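A sketch of how a single reload descriptor can generalize the special cases: the app's Unload/Load functions check flags instead of having one code path per event. `ReloadDesc` is named in the list above, but its field and the flag values below are assumptions for illustration; `appUnload` and `ReloadStats` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reload flags; the real ReloadType values may differ.
enum ReloadType : uint32_t {
    RELOAD_TYPE_RESIZE       = 1u << 0, // swapchain size changed
    RELOAD_TYPE_SHADER       = 1u << 1, // shaders recompiled
    RELOAD_TYPE_RENDERTARGET = 1u << 2, // render target formats or settings changed
    RELOAD_TYPE_ALL          = 0xFFFFFFFFu,
};

struct ReloadDesc { uint32_t mType; }; // field name assumed

// Counters standing in for the real unload work in this sketch.
struct ReloadStats { int pipelines, renderTargets, shaders; };

// One Unload entry point handles every case by checking flags.
void appUnload(const ReloadDesc* pDesc, ReloadStats* stats)
{
    // Pipelines depend on shaders and on render target formats / sizes.
    if (pDesc->mType & (RELOAD_TYPE_SHADER | RELOAD_TYPE_RENDERTARGET | RELOAD_TYPE_RESIZE))
        stats->pipelines++;
    // Size- or format-dependent render targets.
    if (pDesc->mType & (RELOAD_TYPE_RESIZE | RELOAD_TYPE_RENDERTARGET))
        stats->renderTargets++;
    if (pDesc->mType & RELOAD_TYPE_SHADER)
        stats->shaders++;
}
```

A matching Load function would recreate the same resources under the same flags, so a shader hot-reload does not have to tear down render targets.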
New Animation test that unifies most of the former animation tests into one. This way we can save some testing time in our Jenkins setup.
We added a new unit test called 38_AmbientOcclusion_GTAO. It implements the paper "Practical Real-Time Strategies for Accurate Indirect Occlusion" by Jorge Jimenez et al.
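The core of that paper is a closed-form, cosine-weighted visibility integral over one slice between two horizon angles. The sketch below evaluates that inner integral on the CPU; the horizon search, horizon clamping and the weighting / averaging over slices are not shown, and `gtaoSliceVisibility` is a hypothetical name rather than code from the unit test.

```cpp
#include <cmath>

// Cosine-weighted visibility of one GTAO slice in closed form:
//   a = 1/4 (-cos(2 h1 - n) + cos n + 2 h1 sin n)
//     + 1/4 (-cos(2 h2 - n) + cos n + 2 h2 sin n)
// h1, h2: horizon angles in radians, measured from the view vector (h1 <= 0 <= h2).
// n: angle of the normal projected into the slice plane.
// Returns 1 for a fully open slice with n = 0 and 0 when both horizons are at 0.
float gtaoSliceVisibility(float h1, float h2, float n)
{
    const float cosN = std::cos(n);
    const float sinN = std::sin(n);
    const float a1 = -std::cos(2.0f * h1 - n) + cosN + 2.0f * h1 * sinN;
    const float a2 = -std::cos(2.0f * h2 - n) + cosN + 2.0f * h2 * sinN;
    return 0.25f * (a1 + a2);
}
```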

macOS GTAO running on macOS

PC GTAO running on PC

PS4 GTAO running on PS4

PS5 GTAO running on PS5

Switch GTAO running on Switch

XBOX GTAO running on XBOX

We improved the gradient calculation in the Visibility Buffer. Thanks to Stephen Hill @self_shadow who brought this to our attention.
We reorganized the whole TF directory structure to allow development in more areas. Here is an image representing the new structure:

The Forge Reorg

What is still missing is the "Render Abstraction Layer", "Scene Loader" and we have to populate the "Game Layer" more.

We upgraded to ImGUI 1.88 to get access to the docking feature. In the process we improved the ImGUI integration substantially.
We started a blog for The Forge at The-Forge-Blog. We have no idea where we can find the time to write blog posts ... let's see what is happening ...
Retired Unit/Functional Tests:
- 08_GltfViewer - generally glTF is not a model format that is applicable to game development, so we use it as an intermediate format in the resource loader. In the future we might only use it in the offline asset pipeline. The main idea is to extract the data and bring it into a form that is usable in games. Unfortunately many people thought that the glTF viewer was a good model to start with, so we want to guide them in the right direction by no longer offering direct access to a glTF reader.
- Most of the animation unit tests are now merged into 21_Animations, to reduce our hardware testing time. Our Jenkins testing environment that tests all platforms before someone can merge code is taking too long.

V1.52

Release 1.52 - April 29th, 2022 - C Code Hot Reloading Unit Test | Visibility Buffer OIT | Pre-Computed DLUT Test | Unified Window and Resolution control | Android Vulkan Validation Layer | CPU Features | Upgraded Vulkan and DX GPU allocator | macOS / iOS improvements | Double precision Math Library | Improved Input System with HID support

We are always looking for more graphics / engine programmers. We are also specifically looking for a consultant who can help us to scale up our hardware testing environment.

The following list of changes is not fully representative of all the improvements we made, so it is just a selection:

C Code Hot Reloading Unit Test - This unit test showcases an implementation of code hot reloading in C; we used and adapted the following GitHub library for this.

C Code Hot Reloading Unit test

The test contains two projects:

19_CodeHotReload_Main: generates the executable. None of the code in this project can be hot-reloaded. This is the project you should set as the startup project when running the program from an IDE.
19a_CodeHotReload_Game: on the development platforms Windows/macOS/Linux this generates a dynamic library that the Main project loads at runtime; when the dynamic library changes, the Main program reloads the new code. For Android/iOS/Quest/Consoles this project is compiled and linked statically.

How to use it: while the Main project is running, open 19_CodeHotReload_Game.cpp and make a change; there are lines marked with TRY_CODE_RELOAD to make changes easy. Once the file is saved, you can rebuild the project and see the changes happen automatically.

Windows/Linux: Click on the UI "RebuildGame" button.
macOS: Command+B in Xcode to rebuild.

Note: in this implementation we can't call any functions from The Forge from the hot-reloadable project (19a_CodeHotReload_Game), because we compile OS and Renderer as static libraries and link them directly into the exe. Ideally these projects should be compiled as dynamic libraries to expose their functionality to both the exe and the hot-reloadable dll. We didn't implement it this way because all our other projects are already set up to use static libraries.
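The "when the dynamic library changes, the Main program reloads" step needs some change detection. Below is a minimal sketch that polls the library's modification time with `stat`; `libraryChanged` is a hypothetical helper, and the actual unload / reload through the OS loader (dlopen/dlsym on Linux and macOS, LoadLibrary/GetProcAddress on Windows) is not shown.

```cpp
#include <sys/stat.h>
#include <ctime>

// Returns true when the file at `path` has a different modification time than
// the one recorded in *lastMtime, and records the new time. A main loop could
// poll this for the game library and reload the code when it returns true.
bool libraryChanged(const char* path, time_t* lastMtime)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false; // missing file (e.g. mid-build): keep running the old code
    if (st.st_mtime == *lastMtime)
        return false;
    *lastMtime = st.st_mtime;
    return true;
}
```

Treating a missing file as "unchanged" avoids reloading while the linker is still rewriting the library.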

Visibility Buffer Order-Independent Transparency - we added OIT to a Visibility Buffer (VB) rendering architecture by using a per-pixel linked list. In the case of Deferred Shading (DS), the per-pixel linked list holds per-pixel data; in the case of VB, it only holds the triangle index data. You can switch between DS and VB in this example. The VB version occupies substantially less memory and is faster. With memory bandwidth being the biggest challenge in graphics programming, this is not unexpected. Most people by now have adopted the idea of VB in one form or another, but it doesn't hurt to show another advantage of the architecture.
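The per-pixel linked list can be modeled on the CPU as a head index per pixel plus a shared node pool, where each VB node stores only a triangle ID and a depth. In a shader, the counter increment and head swap would be atomics (InterlockedAdd / InterlockedExchange in HLSL); here they are plain operations. The names, the node layout and the "smaller depth is nearer" convention are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

const uint32_t OIT_END = 0xFFFFFFFFu; // end-of-list marker

// One fragment in the VB variant: just an ID and a depth, no shaded color.
struct OitNode { uint32_t triangleId; float depth; uint32_t next; };

struct OitBuffers {
    uint32_t* heads;        // one head index per pixel, initialized to OIT_END
    OitNode*  nodes;        // shared node pool
    uint32_t  nodeCount;    // atomic counter on the GPU
    uint32_t  nodeCapacity;
};

// Pushes a fragment to the front of the pixel's list. Returns false when the pool is full.
bool oitInsert(OitBuffers* b, uint32_t pixel, uint32_t triangleId, float depth)
{
    if (b->nodeCount >= b->nodeCapacity) return false; // fragment dropped
    const uint32_t idx = b->nodeCount++;
    b->nodes[idx].triangleId = triangleId;
    b->nodes[idx].depth = depth;
    b->nodes[idx].next = b->heads[pixel]; // on the GPU: the exchange returns the old head
    b->heads[pixel] = idx;
    return true;
}

// Gathers up to maxOut nodes of one pixel, insertion-sorted front to back
// (smaller depth first) so they can be shaded and blended in order.
size_t oitResolve(const OitBuffers* b, uint32_t pixel, OitNode* out, size_t maxOut)
{
    size_t n = 0;
    for (uint32_t i = b->heads[pixel]; i != OIT_END && n < maxOut; i = b->nodes[i].next)
    {
        const OitNode node = b->nodes[i];
        size_t j = n++;
        while (j > 0 && out[j - 1].depth > node.depth) { out[j] = out[j - 1]; --j; }
        out[j] = node;
    }
    return n;
}
```

Because a VB node is 12 bytes in this sketch instead of a full set of shaded attributes, the memory footprint stays small, which is the advantage described above.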

Linux 1080p resolution Visibility Buffer OIT Linux

macOS 3200x1760 resolution Visibility Buffer OIT macOS

PS4 1080p resolution Visibility Buffer OIT Orbis

PS5 4k resolution Visibility Buffer OIT Prospero

Windows 10 1080p resolution Visibility Buffer OIT Windows

XBOX One (original) 1080p resolution Visibility Buffer OIT XBOX One

Pre-Computed DLUT Test - this test implements pre-computing volume transmittance in Blender or Houdini for 6 directions and shading clouds/smoke based on the following tweets:

https://twitter.com/Vuthric/status/1286796950214307840

A detailed description can be found here: https://realtimevfx.com/t/smoke-lighting-and-texture-re-usability-in-skull-bones/5339

DLUT Test Blender Support

In this repository is a "dlut.blend" file that contains a minimal volumetric render setup. In order to generate DLUT image do the following steps:

1. Set the viewport shading to "Rendered"
2. Select the "Sun" object
3. Set the X rotation to 0 degrees
4. Press F12 to render the image and wait a few minutes until it's done
5. Save the rendered image as "dlut_0.png"
6. Repeat steps 3-5 for 90, 180 and 270 degrees and save "dlut_90.png", "dlut_180.png" and "dlut_270.png"
7. Run the "combine_dlut.py" Python script or manually combine the rendered images in your image editor of choice: each color channel should contain the red channel of the corresponding "dlut_*.png" image multiplied by the alpha channel of the same image. For example, the green channel should contain the red channel of "dlut_90.png" multiplied by the alpha channel of "dlut_90.png"
8. Experiment and implement further ideas from the article above. Setting up a Mantaflow simulation in Blender and exporting animated smoke and simulation attributes like temperature can yield interesting results!
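The combine step described above can be expressed per pixel as "channel c = red × alpha of source image c". This C++ sketch operates on already decoded RGBA8 pixel arrays (image loading is not shown) and is not the contents of combine_dlut.py. Red = 0 degrees and green = 90 degrees follow the text; blue = 180 and alpha = 270 degrees are our assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Rounded a * b / 255 for 8-bit unorm values.
static uint8_t mulUnorm8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a * b + 127u) / 255u);
}

// Combines four RGBA8 renders (0, 90, 180, 270 degrees) into one RGBA8 DLUT.
// Output channel c = red * alpha of source image c (R=0, G=90, B=180, A=270 assumed).
void combineDlut(const uint8_t* src0, const uint8_t* src90, const uint8_t* src180,
                 const uint8_t* src270, uint8_t* dst, size_t pixelCount)
{
    const uint8_t* srcs[4] = { src0, src90, src180, src270 };
    for (size_t p = 0; p < pixelCount; ++p)
    {
        for (int c = 0; c < 4; ++c)
        {
            const uint8_t* px = srcs[c] + p * 4; // RGBA of source image c
            dst[p * 4 + c] = mulUnorm8(px[0], px[3]);
        }
    }
}
```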

Resulting DLUT image should look like this:

DLUT Test Blender Support

The example program running on Android:

DLUT Test running on Android

Window Management - all platforms that support windowed applications now have a base file named {Platform}Window.cpp. There is now a common UI element that offers multi-monitor support (where available) and various window settings. There are also Lua scripts that test this functionality in our Jenkins setup.
Android Vulkan Validation layers: we added the validation layer from Khronos GitHub repo as they have stopped shipping the layer in the NDK.

Android Vulkan Validation Layers

You can find them in ThirdParty/OpenSource/AndroidVulkanValidationLayers

CPU / GPU Features - we integrated the following library to test CPU features during start-up. Now you will see a lot more information about the CPU in the upper left corner of a window.

CPU Features

This library is the stepping stone for utilizing more CPU intrinsics on various platforms. You can see its results in the screenshots above, which show the name of the CPU and the supported instruction sets. We now also show the GPU name and the GPU driver version.

Upgraded Vulkan and DX Allocators: following the updates to these open-source libraries on GitHub we upgraded our code base accordingly.
macOS / iOS - while working with TF on various projects, we bring back improvements and lessons learned from those projects. You will find numerous macOS / iOS improvements in this release.
For one of the business applications we worked on, we needed double-precision math, so we extended the math library accordingly.
We also improved the input system with HID support, which is an on-going effort. So better controller support on more platforms ...

HIDAPI

Windows 7 - better Windows 7 support with DX11 and Vulkan ... still a bug in the Vulkan run-time with sRGB ...
We upgraded the 06_MaterialPlayground with shadows:

Material Playground Unit Test

Retired unit tests: we are retiring many unit tests now because our automated testing cycle takes too long and heats up the "engine" room (see the passage above about us looking for a consultant to scale up our testing environment). Today we retire:
- 02_Compute
- 05_FontRendering
- 13_UserInterface - we might create a much more advanced one for tools development in the future
- 16a_SphereTracing
- 32_Windows - no longer necessary now that every unit test offers window management
Resolved GitHub Issues:

V1.51

Release 1.51 - December 21st, 2021 - ECS uses flecs | Better Borderless Window | Descriptor Management improvements | sRGB | Android Game Development Extensions | FSL Improvements | Ray Tracing | Meshoptimizer | Buildbox | Lethis

Happy Holidays! 🎄🎅🔥🎁🧨

We wish you and your loved ones all the best for the Holidays and a Happy New Year 2022!

This update is again a mixture of things we learned while integrating The Forge and feedback and contributions from our users. Thanks for all the support!

In one of the next updates we will remove EASTL and offer dedicated containers compatible with C99. Over time EASTL became a huge productivity drain. Its inefficient memory access patterns hogged too much CPU time in games where we integrated TF, and we always had to go back and fix those manually later. We know this is a breaking change, but considering that STL was a good idea on CPUs 20+ years ago, we would like to align more with what modern CPUs expect.

We keep moving towards C99 usage. We replaced the old ECS code with Flecs:

Now our build times are much better and the overall system runs faster:

CPU: Intel i7-7700K
GPU: AMD Radeon RX570

Configuration               Old ECS    flecs
Debug, single-threaded      90.0 ms    23.0 ms
Debug, multi-threaded       29.0 ms     6.8 ms
Release, single-threaded     5.7 ms     1.7 ms
Release, multi-threaded      2.3 ms     0.9 ms

Descriptor Management improvements - we changed the rendering interface for all platforms: cmdBindPushConstants now takes an index instead of a name, and we now allow partial updates of array descriptors.
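Taking an index instead of a name lets the caller resolve the root-constant slot once (for example at load time) instead of comparing strings on every bind. The sketch below is a hypothetical illustration of that trade-off only: `RootSignatureSketch`, `findConstantIndex`, `CmdSketch` and `cmdBindPushConstantsSketch` are invented names, not The Forge's actual types or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical root signature: just the names of its root constants.
struct RootSignatureSketch {
    std::vector<std::string> constantNames;
};

constexpr uint32_t kInvalidIndex = UINT32_MAX;

// Resolve a name to an index once, e.g. when the pipeline is loaded.
uint32_t findConstantIndex(const RootSignatureSketch& rs, const char* name) {
    for (uint32_t i = 0; i < rs.constantNames.size(); ++i)
        if (rs.constantNames[i] == name)
            return i;
    return kInvalidIndex;
}

// Hypothetical command buffer that records the last bound constants.
struct CmdSketch {
    uint32_t lastIndex = kInvalidIndex;
    std::vector<uint8_t> lastData;
};

// Per-frame bind: no string comparisons, only the pre-resolved index.
void cmdBindPushConstantsSketch(CmdSketch& cmd, uint32_t index, const void* data, size_t size) {
    assert(index != kInvalidIndex);
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    cmd.lastIndex = index;
    cmd.lastData.assign(bytes, bytes + size);
}
```

In this sketch the name lookup cost is paid once per pipeline rather than once per draw.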
Borderless window - there are improvements to borderless window support:
- removed the "top white bar" of borderless windows on Windows OS
- added "Win Key + arrow" behavior (standard maximize/minimize/split of the borderless window using the keyboard)
- the top resize area necessarily overlaps rendering (this is how we can remove the top white bar)
FSL improvements
- incremental shader generation and build with header dependencies
- improved error reporting by extending line directives; errors now show up at the correct line in the source FSL file
- extended matrix column/row access functions for all targets
- vec type padding to match our math library data types

sRGB - all examples should now be more correct with respect to linear lighting and sRGB.
Android Game Development Extensions usage: in one of our AAA game engine projects we are now successfully using the Android Game Development Extensions for the first time, so we also brought them to The Forge.
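Lighting math is only correct in linear space, so sRGB-encoded colors have to be decoded before lighting and encoded again for display (GPUs usually do this in hardware when sRGB texture and render target formats are used). For reference, here is the standard piecewise sRGB transfer function (IEC 61966-2-1) per channel; it is a generic sketch, not code taken from The Forge.

```cpp
#include <cassert>
#include <cmath>

// Decode an sRGB-encoded channel value in [0, 1] to linear light.
float srgbToLinear(float c) {
    return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Encode a linear channel value in [0, 1] back to sRGB.
float linearToSrgb(float l) {
    return l <= 0.0031308f ? l * 12.92f : 1.055f * std::pow(l, 1.0f / 2.4f) - 0.055f;
}
```

Note that an sRGB value of 0.5 decodes to roughly 0.214 in linear space, which is why lighting directly on encoded values looks wrong.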

Android Game Extension

We redid a lot of the Android development setup to streamline the experience. We still use Visual Studio 2017 because it lets us be more productive than other IDEs. Two years ago we went back and forth between IDEs but concluded that the only IDE we could efficiently integrate into our Jenkins testing setup was Visual Studio. Please check out the new Readme below and let us know if we missed anything.

Vulkan: moved to KHR ray tracing extensions. Upgraded max spec to Vulkan SDK 1.2.162 and tested ray tracing support on an AMD RX 6700 XT GPU
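Moving to the KHR ray tracing extensions means a device has to expose several device extensions before the ray tracing path can be enabled. The helper below is a hypothetical sketch of that check; it takes the extension names as plain strings (as you would collect them from vkEnumerateDeviceExtensionProperties) so it compiles without the Vulkan SDK. The list is a minimum, not exhaustive: Vulkan 1.2 core features such as buffer device address must also be enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Device extensions the KHR ray tracing pipeline path depends on (minimum set).
const std::vector<std::string> kRayTracingExtensions = {
    "VK_KHR_acceleration_structure",
    "VK_KHR_ray_tracing_pipeline",
    "VK_KHR_deferred_host_operations",
};

// `available` would be filled from vkEnumerateDeviceExtensionProperties().
bool supportsKhrRayTracing(const std::vector<std::string>& available) {
    return std::all_of(kRayTracingExtensions.begin(), kRayTracingExtensions.end(),
        [&](const std::string& ext) {
            return std::find(available.begin(), available.end(), ext) != available.end();
        });
}
```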
meshoptimizer - somehow the integration of meshoptimizer "got lost" over time, and we have now re-integrated it into the resource loader. Here are some numbers we got:

meshoptimizer on various art assets meshoptimizer improvements
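One of the metrics that vertex cache optimization improves is the average cache miss ratio (ACMR): vertex shader invocations per triangle. meshoptimizer ships its own analyzer for this (meshopt_analyzeVertexCache); the sketch below does not use the library and only illustrates the metric, simulating a simple FIFO post-transform cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// ACMR with a FIFO post-transform cache. Lower is better: 3.0 means every
// vertex of every triangle is shaded again; around 0.5 is the practical best
// for large regular meshes.
double computeAcmr(const std::vector<unsigned>& indices, size_t cacheSize) {
    assert(indices.size() % 3 == 0 && cacheSize > 0);
    std::deque<unsigned> cache;
    size_t misses = 0;
    for (unsigned index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;  // hit: a FIFO cache does not reorder entries on a hit
        ++misses;
        cache.push_back(index);
        if (cache.size() > cacheSize)
            cache.pop_front();
    }
    return indices.empty() ? 0.0 : double(misses) / double(indices.size() / 3);
}
```

For a quad made of two triangles sharing an edge ({0,1,2, 2,1,3}), a large cache shades 4 vertices for 2 triangles (ACMR 2.0), while a 1-entry cache shades 5 (ACMR 2.5).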

Buildbox

The game engine Buildbox now uses The Forge.

Lethis

The game Lethis Path of Progress now uses The Forge.

v1.50

2 years ago

M²H uses The Forge for stroke therapy - M²H is a medical technology company. They developed a physics-based video game therapy solution that is backed by leading-edge neuroscience, powered by artificial intelligence and controlled by dynamic movement, all working in concert to stimulate vast improvement of cognitive and motor functions for stroke patients and the elderly. The Forge provides the rendering layer for their application. Here is a YouTube video on what they do:

Unlinked multiple GPU support: for professional visualization applications, we now support unlinked multiple GPUs. A new renderer API enumerates the available GPUs, and renderer creation is extended to allow explicit GPU selection from the enumerated list, so multiple Renderers can be created this way. The resource loader interface has been extended to support multiple Renderers; it is initialized with the list of all Renderers created. To select which Renderer (GPU) a resource is loaded on, the NodeIndex used in linked GPU configurations is reused for the same purpose. Resources cannot be shared across Renderers; they must be duplicated explicitly if needed. To transfer generated content from one GPU to another (e.g. for presentation), a new resource loader operation schedules a copy from a texture to a buffer. The target buffer should be mappable. This operation requires proper synchronization with the rendering work; a semaphore can be provided to the copy operation for that purpose. This is available with Vulkan and D3D12. For other APIs, the enumeration API does not create a RendererContext, which indicates the lack of unlinked multi-GPU support.
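The ownership rules described above (one Renderer per GPU, NodeIndex selecting the owning Renderer, explicit duplication instead of sharing) can be modeled in a few lines. This is a hypothetical model of the rules only: `RendererSketch`, `BufferSketch`, `ResourceLoaderSketch` and their functions are invented stand-ins, not The Forge's API, and the texture-to-buffer transfer and semaphore are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; not The Forge's actual types.
struct RendererSketch { int gpuIndex; };
struct BufferSketch { int ownerGpu; };

// Enumerate GPUs, then create one renderer per selected GPU.
std::vector<RendererSketch> createRenderersForGpus(const std::vector<std::string>& gpuNames) {
    std::vector<RendererSketch> renderers;
    for (int i = 0; i < int(gpuNames.size()); ++i)
        renderers.push_back(RendererSketch{i});
    return renderers;
}

struct ResourceLoaderSketch {
    std::vector<RendererSketch*> renderers;  // initialized with all renderers

    // nodeIndex selects the renderer (GPU) that owns the new resource.
    BufferSketch loadBuffer(unsigned nodeIndex) const {
        assert(nodeIndex < renderers.size());
        return BufferSketch{renderers[nodeIndex]->gpuIndex};
    }

    // Resources are never shared: duplicate explicitly, one copy per GPU.
    std::vector<BufferSketch> loadBufferOnAllGpus() const {
        std::vector<BufferSketch> copies;
        for (unsigned i = 0; i < renderers.size(); ++i)
            copies.push_back(loadBuffer(i));
        return copies;
    }
};
```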
Config.h: we now have central config files that can be used to configure TF.
- Created config files:
  - Common_3/OS/Core/Config.h
  - Common_3/Renderer/RendererConfig.h
  - Common_3/Renderer/{RenderingAPI}/{RenderingAPI}Config.h
- Modified PyBuild.py:
  - proper handling of config options
  - every config option has a --{option-name}/--no-{option-name} flag that uses define/undef directives to enable/disable macros
  - macros are guarded with ifndef/ifdef
  - updated Android platform handling
- Deleted Common_3/Renderer/Compiler.h. Its functionality was moved into Config.h
- Moved all macro options to config files
- Renamed USE_{OPTION_NAME} to ENABLE_{OPTION_NAME}
- Changed some macros to be defined/not defined instead of having values of 0 or 1
- Deleted all DISABLE_{OPTION_NAME} macros
- When detecting ray tracing, replaced ENABLE_RAYTRACING with RAYTRACING_AVAILABLE, because not all projects need ray tracing even if it is available. RendererConfig.h defines the ENABLE_RAYTRACING macro if ray tracing is available, so it can be commented out in a single place instead of being searched for on every platform
- Removed most of the macro definitions from build systems. Some of the remaining macros are:
  - target platform macros: NX64, QUEST_VR
  - the Arm NEON macro ANDROID_ARM_NEON
  - Windows suppression macros (like _CRT_SECURE_NO_WARNINGS)
  - macros specific to gainputstatic
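The split between "available" (detected by the platform layer) and "enabled" (a single opt-in in RendererConfig.h) can be sketched as below. The detection condition is a placeholder chosen for illustration, and the `raytracingAvailable`/`raytracingEnabled` helpers are hypothetical; this is not the actual content of The Forge's config files.

```cpp
#include <cassert>

// Platform layer: only reports capability (placeholder detection condition).
#if defined(_WIN32) || defined(__linux__)
#define RAYTRACING_AVAILABLE
#endif

// RendererConfig.h-style opt-in: the single place to comment the feature out.
#ifdef RAYTRACING_AVAILABLE
#define ENABLE_RAYTRACING
#endif

// Code tests the presence of the macro rather than a 0/1 value.
constexpr bool raytracingAvailable() {
#ifdef RAYTRACING_AVAILABLE
    return true;
#else
    return false;
#endif
}

constexpr bool raytracingEnabled() {
#ifdef ENABLE_RAYTRACING
    return true;
#else
    return false;
#endif
}
```

Commenting out the `#define ENABLE_RAYTRACING` line disables the feature everywhere while RAYTRACING_AVAILABLE still reports what the platform can do.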

glTF viewer improvements:
- sRGB fixes
- IBL support now with prefiltered CC0/public domain cube maps
- TAA support on more platforms and fixes
- Vignette support

glTF Viewer running on Android Galaxy Note 9

glTF Viewer running on iPhone 7

glTF Viewer running on Linux with NVIDIA RTX 2060

glTF Viewer running on Mac Mini M1

glTF Viewer running on PS5

glTF Viewer running on Switch

glTF Viewer running on XBOX One Original

Specialization/function constants support on Vulkan and Metal only - these constants get baked into the microcode at pipeline creation time, so performance matches using a macro, without the downsides of macros (too many shader variations increasing the size of the build).

A good read on specialization constants (the same things apply to function constants on Metal):

https://arm-software.github.io/vulkan_best_practice_for_mobile_developers/samples/performance/specialization_constants/specialization_constants_tutorial.html

Specialization constants are declared at global scope using the SHADER_CONSTANT macro and are used like any regular variable after declaration.

Macro arguments:

#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)

Example usage:

SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal  - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;

void main()
{
    // Can be used like regular variables in shader code
    if (gRenderMode == 1)
    {
        // 
    }
}
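On the Vulkan side, the value for a specialization constant is supplied at pipeline creation through a VkSpecializationInfo attached to VkPipelineShaderStageCreateInfo::pSpecializationInfo. The Forge presumably fills this in internally. The sketch below only shows what those structures carry: it declares local mirrors with the same fields as VkSpecializationMapEntry and VkSpecializationInfo so it compiles without the Vulkan SDK (with the SDK, use the real types), and `ShaderConstants`/`makeSpecializationInfo` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirrors of VkSpecializationMapEntry / VkSpecializationInfo (same fields).
struct SpecializationMapEntry { uint32_t constantID; uint32_t offset; size_t size; };
struct SpecializationInfo {
    uint32_t mapEntryCount;
    const SpecializationMapEntry* pMapEntries;
    size_t dataSize;
    const void* pData;
};

// Host-side values for the shader's constants; field offsets go into the map entries.
struct ShaderConstants { uint32_t renderMode; };

// constant_id 0 corresponds to gRenderMode in the shader example above.
const SpecializationMapEntry kRenderModeEntry{
    0, static_cast<uint32_t>(offsetof(ShaderConstants, renderMode)), sizeof(uint32_t)};

// The returned info points at `values` and `entry`: keep both alive until the
// pipeline has been created.
SpecializationInfo makeSpecializationInfo(const ShaderConstants& values,
                                          const SpecializationMapEntry& entry) {
    return SpecializationInfo{1, &entry, sizeof(values), &values};
}
```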

Resolved GitHub Issues
- #206 - Executing Unit Tests on Mac OS 10.14 gives a Bad Access error
- #209 - way to read texture back from GPU to CPU - this functionality is now in the resource loader
- #210 - memory allocation challenge - not an issue
- #212 - Question: updating partial uniform data on OpenGLES backend - not possible with OpenGL ES 2.0 run-time
- #219 - Question : way to support Vulkan SpecializationInfo? - support is now in the code base, see above

v1.49

2 years ago

Quest 2 support - after working on various Quest projects for the last four years, we decided to add Quest 2 support to our framework.

Quest 2 running 01_Transformations

Quest 2 running 09_ShadowPlayground

At this moment the following unit tests do not work:
- 07_Tessellation: Tessellation is not supported when using Multiview. The unit test has been removed from the Quest solution file.
- 10_ScreenSpaceReflections: Lots of artifacts.
- 14_WaveIntrinsics: Wave intrinsics are not supported.

Apple M1 support - we are now testing on an M1 iMac and an M1 iPad Pro. Unfortunately we see crashes in one unit test and in all of the more complex examples and middleware.

iMac with M1 chip running at 3840x2160 resolution iMac with M1 Running 10_PerPixelProjectedReflections

iPad with M1 chip running with 1024x1366 resolution iPad with M1 Running 10_PerPixelProjectedReflections

It is astonishing how well the iPad with the M1 chip performs. Due to what we consider driver bugs, M1 hardware crashes in:

Aura
16_raytracing
Visibility Buffer
Ephemeris

UI / Fonts / Lua interface refactor

Moved Virtual Joystick to IInput.h / InputSystem.cpp
Pulled current Lua implementation out of AppUI and gave it its own interface (IScripting.h)
Pulled Fontstash implementation out of AppUI and gave it its own interface (IFont.h)
IFont and IScripting are now initialized on the OS Layer, with user customization functions available on the App Layer
Fonts and Lua can now be disabled via preprocessor defines and UI will still function (using default 'ProggyClean' font)

Zip unit test refactored to support encryption and writing into archives

minizip-ng

Extended iOS Gesture / Android gesture support
Partial C99 rewrite of OS/Interfaces headers and implementation files
OpenGL ES 2 - Unit test 17 is now working as well.

GitHub fixes:

Pull Request "Fix typo" #199
Pull Request "Fix iOS Xcode OpenGL ES Error breakpoint crash" #202
Pull Request "Reduce GL ES buffer allocation frequency" #204
Pull Request "Apple silicon m1 fixes" #208

v1.48

2 years ago

This is our biggest update since we started this repository more than three years ago. This update is one of those "what we have learned from the last couple of projects that are using TF" updates and a few more things.

Aura - Dynamic Global Illumination - we developed this system in the 2010 / 2011 time frame. It is hard to believe that was 10 years ago :-) ... it shipped in Agents of Mayhem at some point and was implemented and used in other games. We are putting the "base" version, without any game-specific modifications, in our commercial Middleware repository on GitHub. The games that used this system made specific modifications to the code base to align with their art assets and art style. By today's standards this system still fulfills the requirements of a stable rasterizer-based Global Illumination system. It runs efficiently on the original XBOX One, which was the original target platform, but might require art asset modifications in a game level. It works with an unlimited number of light sources with a minimal memory footprint. You can also cache the reflective shadow maps for directional, point and spot lights the same way you currently cache shadow maps. At some point we did a demo running on a second-generation integrated Intel GPU with 256 lights that emitted direct and indirect light and had shadow maps, in 2011 at GDC? :-) It is best to integrate this system into a custom game engine that can cache shadow maps in an intelligent way.

Aura - Windows DirectX 12 Geforce 980TI 1080p Driver 466.47

Aura on Windows DX12

Aura - Windows Vulkan Geforce 980TI 1080p Driver 466.47

Aura on Windows Vulkan

Aura - Ubuntu Vulkan Geforce RTX 2080 1080p

Aura on Ubuntu Vulkan

Aura - PS4

Aura on PS4

Aura - XBOX One original

Aura on XBOX One original

Forge Shader Language (FSL) translator - after struggling with writing a shader translator for one and a half years, we restarted from scratch. This time we developed everything in Python, because it is cross-platform. We also picked a really "low-tech, keep it simple" approach. The idea is that a small game team can actually maintain the code base and write shaders efficiently. We wanted a shader translator that translates an FSL shader to the native shader language of each platform. This way, whatever shader compiler is used on that platform can take over the actual job of compiling the native code. We are doing this mostly because of the unreliability of DXC and SPIR-V in general, and especially their lack of reliability when it comes to cross-platform translation.

There is a Wiki entry that holds a FSL language primer and general information how this works here:

https://github.com/ConfettiFX/The-Forge/wiki
Run-Time API Switching - we had some sort of run-time API switching in an early version of The Forge. At the time we did not expect this to be very useful because most game teams do not switch APIs on the fly. In the meantime we found a use case on Android, where we have to reach a large number of devices. So we came up with a better solution that is more consistent with the overall architecture and works on at least PC and Android. On Windows PC one can switch between DX12, Vulkan and DX11 if all are supported. On Android one can switch between Vulkan and OpenGL ES 2.0. The latter allows us to target a much larger group of devices for business application frameworks. We could easily extend this architecture to other platforms like consoles. This new API switching required us to change the rendering interfaces, so it is a breaking change for existing implementations, but we think upgrading is not much effort, and the resulting code is easier to read and maintain and improves the code base overall by being more consistent.
Device Reset - this was implemented together with API switching. Windows forces game developers to respond to a crashing device driver by resetting the device. We already implemented this functionality in the last update here on GitHub; this update integrates it better into the OS base layer. We also verified that the life cycle management for Windows in each application based on the IApp interface now works for device change, device reset and API switching, so that we cover all cases of losing and recovering the device.
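One common way to make a rendering interface switchable at run time is to route every call through function pointers that are re-pointed when the API changes. The source does not describe how The Forge's new interface is built internally, so treat this as a hypothetical sketch: `d3d12Draw`, `vulkanDraw`, `cmdDraw` and `selectApi` are invented names, and the "backends" only write to a log string.

```cpp
#include <cassert>
#include <string>

// Hypothetical backend entry points; real ones would record GPU commands.
static std::string gLog;
void d3d12Draw(unsigned vertexCount) { gLog = "D3D12 draw " + std::to_string(vertexCount); }
void vulkanDraw(unsigned vertexCount) { gLog = "Vulkan draw " + std::to_string(vertexCount); }

enum class RendererApi { D3D12, Vulkan };

// Application code only ever calls through this pointer.
using CmdDrawFn = void (*)(unsigned vertexCount);
CmdDrawFn cmdDraw = nullptr;

// Re-point the table on an API switch, after all GPU resources were unloaded.
void selectApi(RendererApi api) {
    switch (api) {
        case RendererApi::D3D12:  cmdDraw = d3d12Draw;  break;
        case RendererApi::Vulkan: cmdDraw = vulkanDraw; break;
    }
}
```

Because callers never name a backend directly, switching APIs only requires an unload, a table swap and a reload.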

The functions for API switching and device reload and reset are:

void onRequestReload();
void onDeviceLost();
void onAPISwitch();

Variable Rate Shading (VRS) - we implemented VRS in a new unit test 35_VariableRateShading. It is only supported by DirectX 12 on Windows and XBOX Series S / X. In this demo, we demonstrate two main ways of setting the shading rate:
- Per-tile Shading Rate: a shading rate lookup texture is generated on the fly and used for drawing the color palette that makes up the background. The rate decreases the further the pixels are from the center. Artifacts become visible at aggressive rates such as 4x4. There is also a slider in the UI to move the center of the circle.

Per-tile Shading Rate

- Per-draw Shading Rate: the cubes are drawn with a different, per-draw shading rate, which can be changed via the dropdown menu in the UI. A combiner that overrides the screen-space rates ensures that the cubes are drawn with an independent rate.

The cubes are using per-draw shading rate while the background is using per-tile shading rate.

Notes:
- There is a debug view showing the shading rates and the tiles' size.
- Per-tile method may not be available on certain GPUs even if they support the Per-draw method.
- The tile size is enforced by the GPU and is readable, as shown in the example.
- The shading rates available can vary based on the active GPU.
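The per-tile method boils down to filling a small image with one rate per screen tile, using the tile size reported by the GPU. The sketch below builds such an image with the rate coarsening away from a movable center, and models the per-draw override as a combiner that simply returns the draw's rate. The enum values follow the D3D12_SHADING_RATE encoding (log2 of the width in the upper bits, log2 of the height in the lower bits); the distance thresholds and all function names are hypothetical, not those of 35_VariableRateShading.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Same numeric values as D3D12_SHADING_RATE_1X1 / _2X2 / _4X4.
enum ShadingRate : uint8_t { Rate1x1 = 0x0, Rate2x2 = 0x5, Rate4x4 = 0xA };

// One rate per tile: full rate near (centerX, centerY), coarser further out.
// The 0.3 / 0.6 distance thresholds are made up for illustration.
std::vector<uint8_t> buildRateImage(uint32_t width, uint32_t height, uint32_t tileSize,
                                    float centerX, float centerY) {
    const uint32_t tilesX = (width + tileSize - 1) / tileSize;
    const uint32_t tilesY = (height + tileSize - 1) / tileSize;
    const float maxDist = std::hypot(float(width), float(height)) * 0.5f;
    std::vector<uint8_t> rates(tilesX * tilesY);
    for (uint32_t ty = 0; ty < tilesY; ++ty) {
        for (uint32_t tx = 0; tx < tilesX; ++tx) {
            const float px = (tx + 0.5f) * tileSize;  // tile center in pixels
            const float py = (ty + 0.5f) * tileSize;
            const float d = std::hypot(px - centerX, py - centerY) / maxDist;
            rates[ty * tilesX + tx] = d < 0.3f ? Rate1x1 : (d < 0.6f ? Rate2x2 : Rate4x4);
        }
    }
    return rates;
}

// Per-draw override: the draw's rate wins over whatever the tile image says.
uint8_t combinePerDrawWins(uint8_t /*tileRate*/, uint8_t drawRate) { return drawRate; }
```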
Multi-Sample Anti-Aliasing (MSAA) - we added a way to pick the MSAA level dynamically to unit test 9 and the Visibility Buffer example on all platforms.
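Picking MSAA dynamically implies clamping the requested sample count to what the device supports for the chosen formats. Here is a minimal sketch, assuming support is reported as a bitmask in which the bit whose value equals the sample count is set (Vulkan's VkSampleCountFlags uses this layout); `pickSampleCount` is a hypothetical helper, not The Forge's code.

```cpp
#include <cassert>
#include <cstdint>

// Highest supported power-of-two sample count that does not exceed `requested`.
// `supportedMask` has the bit with value N set when N samples are supported.
uint32_t pickSampleCount(uint32_t requested, uint32_t supportedMask) {
    assert(requested != 0 && (requested & (requested - 1)) == 0);  // power of two
    for (uint32_t count = requested; count > 1; count >>= 1)
        if (supportedMask & count)
            return count;
    return 1;  // fall back to no MSAA
}
```

For example, requesting 8x on a device that supports 1x, 2x and 4x yields 4x.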

PC MSAA

PS4 MSAA

PS5 MSAA

Android & OpenGL ES 2 - the OpenGL ES 2 layer for Android is now more stable, better tested and closer to production code. As mentioned above, on an Android phone one can switch between Vulkan and OpenGL ES 2 dynamically if both are supported. Android & OpenGL ES 2 now additionally support unit test 17 - Entity Component System Test. In general we are currently testing many Android phones at both the low and the high end of the spectrum, following the two Android projects we are working on, which sit at opposite ends of that spectrum.
PVS-Studio - we did another manual pass on the code base with PVS-Studio, a static code analyzer, to increase code quality.

v1.47

3 years ago

As the year winds slowly down, we finally found time to do another release. First of all, Happy Holidays and a happy new Year!

Happy Holidays and a happy new Year!

Most of us will take time off over the holiday season and spend it with our families. We should be back online in the middle of January 2021.

OpenGL ES 2.0: TF will probably run on several hundred million mobile devices in the future, as the rendering layer of business application frameworks. For this use case, we added OpenGL ES 2.0 support for Android only. The OpenGL ES 2.0 layer currently supports only unit tests 1, 5, 12 and 31.
Device change / reset: we finally implemented all the code that deals with device change, device reset and device removed scenarios on all platforms. The underlying design was always there, but it took us 3+ years to finally add the functionality :-) When you open any of the *OSBase.* files you can find a snippet of code that looks like this:

if (pApp->mSettings.mResetGraphics)
{
    pApp->Unload();
    pApp->Load();
    pApp->mSettings.mResetGraphics = false;
}
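To show where that snippet sits, here is a hypothetical frame loop: a failed present (for example DXGI_ERROR_DEVICE_REMOVED on D3D12 or VK_ERROR_DEVICE_LOST on Vulkan) sets the flag, and the next frame tears down and rebuilds everything tied to the device. `App`, `presentFrame` and `runFrame` are minimal stand-ins invented for this sketch, not The Forge's IApp or OS base code.

```cpp
#include <cassert>

// Hypothetical minimal app mirroring the snippet above.
struct AppSettings { bool mResetGraphics = false; };

struct App {
    AppSettings mSettings;
    int loadCount = 0, unloadCount = 0;
    void Load()   { ++loadCount; }    // would recreate swapchain, pipelines, resources
    void Unload() { ++unloadCount; }  // would release everything tied to the device
};

// Stand-in for present: returns false when the device was lost or removed.
bool presentFrame(bool simulateDeviceLost) { return !simulateDeviceLost; }

// One iteration of the OS base loop.
void runFrame(App* pApp, bool simulateDeviceLost) {
    if (pApp->mSettings.mResetGraphics) {
        pApp->Unload();
        pApp->Load();
        pApp->mSettings.mResetGraphics = false;
    }
    // ... update and draw ...
    if (!presentFrame(simulateDeviceLost))
        pApp->mSettings.mResetGraphics = true;  // handled at the start of the next frame
}
```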

DRED / Breadcrumb support: to better determine why a device was removed, we implemented DRED support on PC with DirectX 12 and on XBOX. We integrated this into the first functional test, 01_Transformations. Here is a screenshot. Look for the "Simulate crash" button:

Image of the Transformations Unit test

Breadcrumbs are user-defined markers used to pinpoint which command caused the GPU to stall. In the Breadcrumb unit test, two markers get injected into the command list. Pressing the crash button results in a GPU hang. In this situation, the first marker is written before the draw command, but the second one waits for the draw command to finish. Due to the infinite loop in the shader, the second marker is never written, so we can reason that the draw command caused the GPU to hang. We log the markers' information to verify this. Check out this link for more info: D3D12 Device Removed Extended Data (DRED)
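The reasoning above can be simulated on the CPU. In a real implementation the markers live in GPU-writable memory that survives a device removal and are written by GPU commands (on D3D12, for example, via ID3D12GraphicsCommandList2::WriteBufferImmediate); here a plain vector and a flag stand in for the GPU. Writing the frame number as the marker value keeps stale markers from earlier frames from being mistaken for progress. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simulated breadcrumb buffer, read back by the CPU after a hang.
struct BreadcrumbBuffer { std::vector<uint32_t> markers = std::vector<uint32_t>(2, 0); };

// Marker 0 is written before the draw; marker 1 only after the draw completes,
// so a hung draw leaves marker 1 holding an older frame number.
void executeFrame(BreadcrumbBuffer& crumbs, uint32_t frame, bool drawHangs) {
    crumbs.markers[0] = frame;  // before the draw
    if (drawHangs)
        return;                 // the GPU never gets past the draw
    crumbs.markers[1] = frame;  // after the draw completed
}

std::string diagnose(const BreadcrumbBuffer& crumbs, uint32_t frame) {
    if (crumbs.markers[0] != frame) return "hang before the draw";
    if (crumbs.markers[1] != frame) return "hang in the draw";
    return "no hang";
}
```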

More Lua Scripting support for all functional tests:
- For the scripted testing of the Unit Tests, this layer provides automated function registration of the UI elements to Lua State.
- Any UI element added to the GUI adds a function or a pair of functions (getter/setter) to the Lua state so it can be used in any script. Lua function name resolution works like this:
- The UI widget "label" name is included in the function name as follows:
  - For widget events: label name + "Event Name", e.g., the Lua function name for label "Press" and event OnEdited is "PressOnEdited"
  - For widget modifiers such as ints / floats: a "Set" and "Get" function pair is added, with "Set"/"Get" prefixed to the label name, e.g., variable "X" gets the "SetX" and "GetX" pair of functions.
After writing the scripts, you can let the layer know about them with the AddTestScripts() function call and run them on any frame with RunTestScript(), defined in the UIApp class. Most of the UTs contain examples of these test scripts, showing how you can also add them to the UI and test them at run time.
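The naming rules above are simple enough to state as code. The two helpers below are hypothetical (they are not part of The Forge) and only encode the described convention, which a test script relies on when it calls, for example, SetX or PressOnEdited.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Widget event callback name: label + event, e.g. "Press" + "OnEdited".
std::string luaEventFunctionName(const std::string& label, const std::string& event) {
    return label + event;
}

// Getter/setter pair for value widgets (ints, floats, ...): "GetX" / "SetX".
std::pair<std::string, std::string> luaAccessorNames(const std::string& label) {
    return {"Get" + label, "Set" + label};
}
```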

Here is what the current Lua support in the functional tests might look like:

Lua support

DX11 refactor: we rewrote the DX11 run-time a few times and ended up with the most straightforward version. This version recently shipped in Hades along with the Vulkan run-time on PC.
YUV support: we now have YUV support on all our Vulkan API platforms: PC, Linux, Android and Switch. There is a new functional test for YUV that runs on all these platforms:
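For reference, converting a YUV sample to RGB is a small matrix multiply. The source does not say which color matrix or range the YUV test uses; the sketch below assumes full-range BT.601 with all channels normalized to [0, 1] and U/V centered at 0.5 (BT.709 or limited-range video use different constants). On Vulkan this conversion can also be done by the sampler through VK_KHR_sampler_ycbcr_conversion, core since Vulkan 1.1.

```cpp
#include <cassert>
#include <cmath>

struct Rgb { float r, g, b; };

// Full-range BT.601 YUV -> RGB; inputs in [0, 1], U and V centered at 0.5.
Rgb yuvToRgb(float y, float u, float v) {
    const float cb = u - 0.5f;
    const float cr = v - 0.5f;
    return Rgb{
        y + 1.402f * cr,
        y - 0.344136f * cb - 0.714136f * cr,
        y + 1.772f * cb,
    };
}
```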

YUV unit test

Audio: we removed the audio functional test. It was the only test that was released unfinished and did not run on all our platforms. Our customers show love for FMOD, so it would make more sense to show an integration of that.
GitHub issues fixed:
- #188 - typo - lowercase L in first DepthStencilClearFlags constant "ClEAR_DEPTH"
- #186 - Ubuntu: Examples fail to build
- #182 - Flickering on master when vsync is off
- #176 - [08_GlftViewer] Application crash on missing resource

Numerous other fixes ...