GPU
Adapter info, frame stats, raycasting against the 3D scene, screen-to-world projection, and storage buffer plumbing for shaders.
local GPU = import("GPU")
print(GPU.Info().Name, GPU.FrameStats().FPS)
Adapter / frame info
Static adapter info. Cached, so calling it repeatedly is
free. Useful for logging, settings menus, and gating high-cost
features behind DeviceType == "DiscreteGpu".
Hard limits the active GPU device reports. wgpu doesn't
expose total VRAM portably across DX12 / Vulkan / Metal, so
MaxBufferSize is the closest single number for
"biggest thing I can allocate".
FPS / frame time / scene part count. Updated once per heart tick. Cheap; safe to poll every frame for a debug overlay.
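The feature gating mentioned above can be sketched as a small pure function. This is a minimal illustration, assuming a table shaped like the GPUInfo returned by GPU.Info(); the helper name pickPreset and the preset labels are hypothetical, not part of the engine.

```lua
-- Hypothetical helper: maps adapter info to a quality preset name.
-- `info` is assumed to be shaped like the GPUInfo table (see GPUInfo).
local function pickPreset(info)
    if info.DeviceType == "DiscreteGpu" then
        return "High"
    elseif info.DeviceType == "IntegratedGpu" then
        return "Medium"
    end
    -- VirtualGpu, Cpu, Other: stay conservative.
    return "Low"
end
```

In a real script the argument would presumably be GPU.Info() itself, read once at startup since the data is static.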
Raycasting
GPU-dispatched ray test against every renderable BasePart.
One compute thread per part runs in parallel: cubes test as
oriented bounding boxes (CFrame + Size), spheres as
ellipsoids (CFrame + Size), and models with mesh
data test per-triangle via Möller-Trumbore in
part-local space, so corners and concavities are hit precisely
instead of registering against a bounding box. Models without mesh data fall
back to OBB. Parts with Render = false or
IgnoreInRaycast = true are skipped. The query falls back
to a CPU loop only if the GPU has not been initialized
(i.e. no window has been opened yet).
Parameters
origin | Vector, world-space ray start. |
direction | Vector, world-space ray direction. Length doesn't matter; Ruzit normalizes it. Zero-length errors. |
filter | Optional (part: BasePart) -> boolean. Hits visited nearest-first; return true to accept, false / nil to skip past. If omitted, the first hit is always accepted. |
maxDistance | Optional cap in world units. Hits past this distance are ignored. Defaults to ~1e6. |
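The filter and maxDistance rules in the table can be modelled in plain Lua. This is only a sketch of the documented behaviour, not the engine's implementation; the hit list shape ({ Part, Distance }) and the name firstAccepted are assumptions made for illustration.

```lua
-- Sketch of the accept/skip rule: visit hits nearest-first, ignore hits
-- past maxDistance, and return the first one the filter accepts.
-- `hits` is a hypothetical list of { Part = ..., Distance = number }.
local function firstAccepted(hits, filter, maxDistance)
    maxDistance = maxDistance or 1e6
    local ordered = {}
    for i, h in ipairs(hits) do ordered[i] = h end
    table.sort(ordered, function(a, b) return a.Distance < b.Distance end)
    for _, h in ipairs(ordered) do
        if h.Distance > maxDistance then
            break -- sorted, so every later hit is too far as well
        end
        -- No filter means the nearest hit is always accepted.
        if filter == nil or filter(h.Part) then
            return h
        end
    end
    return nil
end
```

Returning false or nil from the filter moves on to the next-nearest hit, which is what lets a ray pass "through" parts you want to ignore.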
Example: pick a part under the mouse, ignoring red ones
local Mouse = import("Mouse")
local origin, dir = GPU.ScreenToRay(Mouse.Position)
local hit = GPU.Raycast(origin, dir, function(part)
    return part.Color ~= Color3.new(1, 0, 0)
end)
if hit then
    print("clicked", hit.Part, "at", hit.Position, "distance", hit.Distance)
end
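For readers unfamiliar with the per-triangle test named in the Raycasting description, here is a minimal CPU sketch of Möller-Trumbore in plain Lua. It uses plain {x, y, z} tables rather than the engine's Vector type, and the helper names are hypothetical; the engine runs its own version in a compute shader.

```lua
-- Plain-table vector helpers ({x, y, z}); not engine types.
local function sub(a, b) return { a[1] - b[1], a[2] - b[2], a[3] - b[3] } end
local function dot(a, b) return a[1] * b[1] + a[2] * b[2] + a[3] * b[3] end
local function cross(a, b)
    return {
        a[2] * b[3] - a[3] * b[2],
        a[3] * b[1] - a[1] * b[3],
        a[1] * b[2] - a[2] * b[1],
    }
end

-- Returns the ray parameter t at which the ray hits triangle (v0, v1, v2),
-- or nil on a miss. If `dir` is unit length, t is the hit distance.
local function rayTriangle(origin, dir, v0, v1, v2)
    local EPS = 1e-7
    local e1, e2 = sub(v1, v0), sub(v2, v0)
    local p = cross(dir, e2)
    local det = dot(e1, p)
    if math.abs(det) < EPS then return nil end -- ray parallel to triangle
    local inv = 1 / det
    local s = sub(origin, v0)
    local u = dot(s, p) * inv                  -- first barycentric coordinate
    if u < 0 or u > 1 then return nil end
    local q = cross(s, e1)
    local v = dot(dir, q) * inv                -- second barycentric coordinate
    if v < 0 or u + v > 1 then return nil end
    local t = dot(e2, q) * inv
    if t <= EPS then return nil end            -- triangle is behind the origin
    return t
end
```

Running the test in part-local space, as the description says, means the ray is transformed once per part instead of transforming every vertex.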
Projection
Convert a screen pixel (Mouse.Position style) to a world-space
ray. Returns (origin, direction), which can be fed
straight into GPU.Raycast. Uses the active
Renderable.Camera as the eye + projection.
Project a world-space point onto the screen. Returns
(Dim, false) when the point is in front of the
camera and (nil, true) when it is at or behind the near
plane. Useful for billboards, world-anchored UI, off-screen
indicators.
Storage buffers
Read-only data accessible from any 3D fragment shader. Bound at
@group(0) @binding(4) as
SDATA: array<f32>.
Allocate a GPU storage buffer of size floats.
Memory lives on the GPU. Freed when the GPUBuffer userdata is
garbage-collected.
Bind a GPUBuffer as the active SDATA storage. Cheap to swap between frames: bind groups are rebuilt only when the binding identity changes.
Unbind the active storage buffer. SDATA reverts to a 1-float stub so shaders that don't reference it keep working.
Cap or uncap the heart-loop tick rate. Pass
nil (or 0) to remove the cap
and run as fast as the system allows; pass a positive
number to throttle. The engine ships
uncapped — frames run as fast as the
renderer + game logic allow, bounded only by the
swapchain present mode (see SetVSync).
The cap is applied by sleeping at the end of each
tick if the frame finished early, so it has no
effect on a system that can't keep up. Cheap to
call live (atomic store), so wire it to a settings
menu freely.
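The sleep-at-end-of-tick rule can be illustrated with a small calculation. This is a sketch under the assumption that the cap is expressed in ticks per second; the function name sleepSeconds is hypothetical.

```lua
-- Hypothetical helper: how long to sleep at the end of a tick.
-- `cap` is the target ticks per second (nil or 0 means uncapped);
-- `elapsed` is how long this tick's work took, in seconds.
local function sleepSeconds(cap, elapsed)
    if cap == nil or cap <= 0 then return 0 end
    local budget = 1 / cap
    -- A tick that already overran its budget gets no sleep at all,
    -- which is why the cap has no effect on a system that can't keep up.
    return math.max(0, budget - elapsed)
end
```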
Current cap, or nil when uncapped.
Toggle vertical sync on the swapchain. Default is OFF — the engine picks the highest-FPS tear-free mode the driver offers (Mailbox > Immediate > FifoRelaxed > Fifo). Turning VSync on hard-caps presents to the monitor's refresh rate (Fifo / FifoRelaxed). Use it for laptop battery life or to eliminate tearing on monitors that don't expose a tear-free uncapped mode. Takes effect on the next rendered frame.
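The preference order described above amounts to picking the first supported mode from an ordered list. The sketch below models that in plain Lua; the relative order of the two VSync-on modes is an assumption (the text lists them without ranking), and pickPresentMode is a hypothetical name.

```lua
-- Preference order with VSync off, as stated in the text.
local PREFERENCE_OFF = { "Mailbox", "Immediate", "FifoRelaxed", "Fifo" }
-- Assumption: the text does not rank the two VSync-on modes.
local PREFERENCE_ON = { "Fifo", "FifoRelaxed" }

-- `supported` is a list of mode names the driver offers.
local function pickPresentMode(supported, vsync)
    local has = {}
    for _, mode in ipairs(supported) do has[mode] = true end
    for _, mode in ipairs(vsync and PREFERENCE_ON or PREFERENCE_OFF) do
        if has[mode] then return mode end
    end
    return nil -- nothing in the preference list is supported
end
```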
True if the swapchain is currently in a VSync mode.
Set the GPU power-mode preference. Three options:
- "Quality": the default. Requests the HighPerformance adapter (discrete GPU on dual-GPU laptops) and biases the driver toward high clocks. Best for shooters / action games where you want the highest sustained FPS the hardware can give you.
- "Performance": despite the name, the battery-friendly mode. Requests the LowPower adapter (integrated GPU when present) and lets the driver downclock aggressively. Use for lightweight / stylized games where you'd rather extend laptop battery life than push FPS.
- "Auto": whatever the driver picks when no preference is given. On laptops that often means LowPower; on desktops it usually means the only available adapter.
Calling SetPowerMode after the window
opens stores the preference (so
GetPowerMode returns it) but doesn't
switch the live adapter. To actually change the
adapter, call it before opening the
window (or set it from a startup script the launcher
runs).
Current power-mode preference. Default is
"Quality".
Return one or more world-space AABBs covering a BasePart.
Each entry is { Position, Size, Min, Max } with
Position the AABB center and Size
its full extents (max − min). Honors deformations,
current animation state, and DynMesh-driven movement, so
the boxes follow the live mesh.
quality controls how many boxes you get:
- 1 (default) — single box that encloses the entire part. Cheapest. Always accurate for cubes / spheres.
- 2+ — split a model's vertices into N bins along the longest axis and return a tight AABB per bin. Useful for long thin meshes where one big box is loose. Cubes and spheres always return one box regardless of quality.
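To make the quality setting concrete, here is a minimal sketch of binning vertices along the longest axis and fitting one AABB per bin. It assumes equal-width bins and drops empty ones; both choices are assumptions, since the text does not say how the engine partitions vertices. Points are plain {x, y, z} tables and the function names are hypothetical.

```lua
-- Tight AABB of a point list, in the { Position, Size, Min, Max } shape.
local function aabbOf(points)
    local min = { math.huge, math.huge, math.huge }
    local max = { -math.huge, -math.huge, -math.huge }
    for _, p in ipairs(points) do
        for a = 1, 3 do
            min[a] = math.min(min[a], p[a])
            max[a] = math.max(max[a], p[a])
        end
    end
    local pos, size = {}, {}
    for a = 1, 3 do
        pos[a] = (min[a] + max[a]) / 2
        size[a] = max[a] - min[a]
    end
    return { Position = pos, Size = size, Min = min, Max = max }
end

local function binnedBounds(points, quality)
    local whole = aabbOf(points)
    if quality <= 1 then return { whole } end
    local axis = 1 -- longest axis of the enclosing box
    for a = 2, 3 do
        if whole.Size[a] > whole.Size[axis] then axis = a end
    end
    local length = whole.Size[axis]
    if length == 0 then return { whole } end
    local bins = {}
    for i = 1, quality do bins[i] = {} end
    for _, p in ipairs(points) do
        local f = (p[axis] - whole.Min[axis]) / length -- 0..1 along the axis
        local i = math.min(quality, math.floor(f * quality) + 1)
        table.insert(bins[i], p)
    end
    local boxes = {}
    for i = 1, quality do
        if #bins[i] > 0 then table.insert(boxes, aabbOf(bins[i])) end
    end
    return boxes
end
```

For a long thin mesh the per-bin boxes hug the geometry far more closely than one enclosing box, which is the trade-off the quality parameter exposes.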
Return every renderable BasePart whose bounds (computed at
the given quality) overlap the AABB centered at
center with full extents size.
Skips parts with Render = false. Honors
deformed / animated meshes and DynMesh offsets.
GPU spatial queries
These run as a compute pass on the GPU: one thread per part tests the
part's world-space AABB against the query shape, matches are written
to a packed atomic-counter buffer, then the index list is read back
and resolved to BaseParts. Skips parts with Render = false.
Filter functions (when provided) run on the CPU after the GPU returns
candidates, so they're only invoked on hits.
Return every part whose AABB intersects the sphere centered
at center with the given radius.
Filter is optional (BasePart) -> boolean;
return true to keep, false / nil to drop.
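The sphere-versus-AABB test behind this query is a standard closest-point check. The following plain-Lua sketch shows the idea on {x, y, z} tables; it is illustrative only, since the engine performs the test in a compute shader, and the function name is hypothetical.

```lua
-- True if the sphere (center, radius) touches or overlaps the box [min, max].
local function sphereIntersectsAABB(center, radius, min, max)
    local distSq = 0
    for a = 1, 3 do
        -- Clamp the center onto the box to get the closest point on this axis.
        local closest = math.max(min[a], math.min(center[a], max[a]))
        local d = center[a] - closest
        distSq = distSq + d * d
    end
    return distSq <= radius * radius
end
```

Because the test is against each part's AABB, a filter callback is the place to reject parts whose actual geometry sits outside the sphere.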
Return every part whose AABB intersects the oriented box at
center with full extents size,
rotated by an Euler rotation (radians, defaults
to no rotation). The GPU test is conservative
(AABB-vs-OBB-AABB), so a heavily rotated thin box may
include a few false positives near the corners. Tighten
with the filter callback if you need exact OBB-OBB on a
small candidate set.
Return every part whose AABB is inside or intersects the
current camera's view frustum. Uses
Renderable.Camera's CFrame, FOV, near and far
plus the window's aspect ratio. Useful as a fast first pass
for "what would be drawn this frame" or for area-of-interest
streaming around the player.
Mesh-precise oriented-zone query. cframe places
and rotates the zone, size is its full extents
along the zone's local X / Y / Z. The compute shader tests
each part's eight world-space corners against the zone OBB,
and for parts with mesh data (Models) it additionally
transforms the actual triangle vertices and tests each one,
so a spike sticking out of a body still registers even when
the body's bounding box doesn't reach the zone. Honors
deformed meshes (uses the live deformed verts when
DynMesh or animations are active, otherwise
the source mesh). Also catches the inverse case where the
zone is fully inside a large part.
Like the other Overlap queries this runs as one compute thread per part with an atomic-counter index list, then the CPU reads back only the matched indices.
Shadow configuration
Knobs the default 3D shader reads each frame to decide how the sun terminator behaves and how dark receivers get on their unlit side. Real depth-map sampling (per-light render passes, comparison samplers, cascade splits) is scaffolded for a future pass — this API ships now so you can author lighting against it already.
When SetShadowsEnabled(true) the default shader stops
using a soft half-Lambert and instead does a smoothstep at the
light terminator (sharper, more "shadowy" look). Receivers darken
proportional to MapQuality, the PCF knob
widens the smoothstep band to simulate softness, and
ShadowDistance fades the term back into half-Lambert
past that camera distance.
Toggle the shadow term in the default 3D shader. Default
false. Custom Frag3D shaders can
read the same value via F.shadow_enabled
(u32, 0 or 1) and choose to use it or ignore it.
Texture resolution future shadow maps will be rendered at.
Clamped to [64, 8192] and rounded to the next
power of two. Today the value also drives the strength of
the stylized shadow floor in the default shader (higher =
more contrasty receive-shadow darkening). Default
1024.
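The clamp-then-round rule can be worked through in a few lines. This sketch assumes the clamp is applied before rounding up, which the text implies but does not spell out; normalizeShadowMapSize is a hypothetical name.

```lua
-- Clamp to [64, 8192], then round up to the next power of two.
local function normalizeShadowMapSize(size)
    local clamped = math.max(64, math.min(8192, size))
    local pow2 = 64
    while pow2 < clamped do
        pow2 = pow2 * 2
    end
    return pow2
end
```

Under these assumptions a request for 1000 becomes 1024 and a request for 1025 becomes 2048.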
World-space radius (from the camera) beyond which the
shadow term fades back to the regular half-Lambert. Set to
0 to disable distance fade. Default
80. Surfaced as
F.shadow_distance in user shaders.
Depth comparison bias for future shadow-map sampling.
Stored in the GPU module state and forwarded to the Frame
uniform so user shaders can read it now. Default
0.0015.
Percentage-Closer-Filtering tap count, clamped to
[1, 9]. 1 is hard shadows;
3 / 5 / 9 are
progressively softer. Today this widens the smoothstep
band of the default shader's stylized terminator; once
real depth-map sampling lands the same value will drive
the actual PCF kernel. Default 1.
Reading shadow state from your own shaders
Every GPU.SetShadow* knob is mirrored into the
F uniform that every Frag3D shader already
sees. You don't bind anything extra — just read these from
inside your fs_main:
| Lua call | WGSL field | Type | Notes |
|---|---|---|---|
| GPU.SetShadowsEnabled(on) | F.shadow_enabled | u32 | Branch with if (F.shadow_enabled == 0u) { ... }. |
| GPU.SetShadowMapQuality(size) | F.shadow_quality | u32 | Power-of-two pixel size. Use as the kernel resolution if you ever bind a real shadow texture. |
| GPU.SetShadowDistance(d) | F.shadow_distance | f32 | World units. Fade your shadow term to zero past this. |
| GPU.SetShadowBias(b) | F.shadow_bias | f32 | Offset the shadow-ray origin along the surface normal to avoid self-shadowing. |
| GPU.SetShadowPCF(taps) | F.shadow_pcf | u32 | 1, 3, 5, or 9. Use as your tap count for jittered shadow samples. |
| (derived) | F.shadow_strength | f32 | How dark a receiver gets on its unlit side. Engine-derived (0.3..0.85) from MapQuality. |
| (derived) | F.shadow_softness | f32 | Smoothstep band width at the light terminator. Engine-derived from PCF. |
I.cast_shadow and I.receive_shadow are also
available per-part (u32, 0 or 1) so a shader can skip the shadow
test on parts that have opted out:
// Skip the work entirely on non-receivers.
var sun_mask = 1.0;
if (I.receive_shadow != 0u && F.shadow_enabled != 0u && ndl > 0.0) {
    sun_mask = compute_shadow(in.world_pos, n, l);
}
A working ray-traced soft-shadow shader for the floor,
plus a per-part variant in which parts shadow each other, can be found in
examples/shadow_floor.frag and
examples/shadow_part.frag. The short version: read
your caster list from SDATA, do a tangent-space jitter
using F.shadow_pcf taps and the engine's
hash3 helper, offset the ray origin by
n * F.shadow_bias, and fade with
smoothstep(F.shadow_distance * 0.7, F.shadow_distance, d).
fn compute_shadow(world_pos: vec3<f32>, n: vec3<f32>, l: vec3<f32>) -> f32 {
    if (F.shadow_enabled == 0u) { return 1.0; }
    let tb = tangent_basis(l);
    let taps = max(F.shadow_pcf, 1u);
    let spread = 0.015 * f32(taps);
    // Bias the ray origin along the normal to avoid self-shadowing.
    let ro = world_pos + n * max(F.shadow_bias, 0.0001);
    var occ_sum = 0.0;
    for (var s: u32 = 0u; s < taps; s = s + 1u) {
        // Jitter each tap's direction around the light vector.
        let hv = hash3(world_pos * 137.7 + vec3<f32>(f32(s), f32(s)*1.3, f32(s)*2.7));
        let ox = (hv.x - 0.5) * spread * 2.0;
        let oy = (hv.y - 0.5) * spread * 2.0;
        let rd = normalize(l + tb[0] * ox + tb[1] * oy);
        occ_sum = occ_sum + occluded(ro, rd);
    }
    // Average occlusion over all taps.
    var occ = occ_sum / f32(taps);
    if (F.shadow_distance > 0.0) {
        // Fade the shadow out over the last 30% of the shadow distance.
        let d = distance(world_pos, F.camera_pos);
        let fade = smoothstep(F.shadow_distance * 0.7, F.shadow_distance, d);
        occ = occ * (1.0 - fade);
    }
    return 1.0 - clamp(occ * 0.95, 0.0, 1.0);
}
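The distance fade at the end of compute_shadow can be checked numerically outside the shader. The Lua sketch below mirrors the same smoothstep arithmetic so you can see what a given ShadowDistance does at a given camera distance; the function names are hypothetical and nothing here runs on the GPU.

```lua
-- Same definition as WGSL's smoothstep for e0 < e1.
local function smoothstep(e0, e1, x)
    local t = math.max(0, math.min(1, (x - e0) / (e1 - e0)))
    return t * t * (3 - 2 * t)
end

-- Factor the occlusion is multiplied by at camera distance d:
-- 1 means full shadow strength, 0 means fully faded out.
local function distanceFade(shadowDistance, d)
    if shadowDistance <= 0 then return 1 end -- distance fade disabled
    return 1 - smoothstep(shadowDistance * 0.7, shadowDistance, d)
end
```

With the default distance of 80, shadows keep full strength out to 56 units, are at half strength around 68, and are gone by 80.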
Caster list. The occluded() helper
above walks the SDATA storage buffer for caster
geometry. That buffer is the same general-purpose slot
GPU.SetBuffer binds, so you populate it from Lua:
allocate via GPU.NewBuffer, pack 4 header
floats followed by 8 floats per caster
(kind then padding then center+half-extents), and
rebind on every frame the casters move. The engine does not
auto-populate this — the shape of the caster layout is
entirely up to you.
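One possible way to flatten a caster list into the layout described above is sketched below. Since the layout is up to you, treat every detail as an assumption: the header here stores the caster count in its first slot and leaves the other three at zero, and the caster table shape (kind, center, half) and the name packCasters are hypothetical.

```lua
local HEADER_FLOATS = 4
local FLOATS_PER_CASTER = 8

-- `casters` is a hypothetical list of
-- { kind = number, center = {x, y, z}, half = {x, y, z} }.
local function packCasters(casters)
    -- Assumed header: caster count first, three unused slots after it.
    local data = { #casters, 0, 0, 0 }
    for _, c in ipairs(casters) do
        data[#data + 1] = c.kind
        data[#data + 1] = 0 -- padding
        for a = 1, 3 do data[#data + 1] = c.center[a] end
        for a = 1, 3 do data[#data + 1] = c.half[a] end
    end
    return data
end
```

The resulting flat list has HEADER_FLOATS + FLOATS_PER_CASTER * #casters entries, which is the size to allocate with GPU.NewBuffer before writing the list into the buffer and binding it with GPU.SetBuffer. Whatever layout you choose, your occluded() helper in WGSL has to read it back with the same offsets.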
GPUBuffer
Returned by GPU.NewBuffer.
Number of f32 slots. Same value as the size arg
passed to GPU.NewBuffer; doesn't change after
construction.
Write a list of f32 values into the buffer at
offset (in floats, not bytes). Bounds-checked:
writing past the end raises an error instead of corrupting
adjacent GPU memory. Cheap; one queue-write under the hood.
Fill the entire buffer with one number. Convenience for
clearing (:Fill(0)) or initialising to a sentinel.
RaycastHit
Returned by GPU.Raycast. All vectors world-space.
Part | BasePart, the part the ray hit. |
Position | Vector, world-space hit point. |
Distance | number, world-space length from the ray's origin. |
Normal | Vector, surface normal at the hit, world-space. |
GPUInfo
Static adapter info, filled once at window-open time. Returned by
GPU.Info().
Name | string, driver-reported display name ("NVIDIA GeForce RTX 4070"). |
Vendor | string, friendly label: "NVIDIA" | "AMD" | "Intel" | "Apple" | "ARM" | "Qualcomm" | "Imagination" | "Microsoft" | "Other" | "Unknown". |
VendorID | number, raw PCI vendor id. |
DeviceID | number, raw PCI device id. |
Backend | string, "Dx12" | "Vulkan" | "Metal" | "Gl" | "BrowserWebGpu". |
Driver | string, driver version, free-form. |
DriverInfo | string, extra driver detail (build, date). May be empty. |
DeviceType | string, "DiscreteGpu" | "IntegratedGpu" | "VirtualGpu" | "Cpu" | "Other". Useful for quality presets at startup. |
GPULimits
Hard caps reported by the wgpu device. Returned by GPU.Limits().
MaxTextureSize | number, largest 2D texture dimension (e.g. 16384 on most GPUs). |
MaxBufferSize | number, largest single buffer allocation in bytes. |
MaxBindGroups | number, max simultaneously bound bind groups in one pipeline. |
MaxVertexBuffers | number, max vertex buffers attached to one pipeline. |
MaxComputeWorkgroup | { X: number, Y: number, Z: number }, max compute workgroup size per axis. |
GPUFrameStats
Live frame stats. Smoothed dt is an EMA so FPS doesn't strobe. Returned
by GPU.FrameStats().
FPS | number, smoothed frames-per-second. |
FrameTime | number, smoothed frame time in seconds. |
FrameCount | number, total frames drawn since process start. |
Uptime | number, wall-clock seconds since GPU tracking started. |
PartCount | number, live count of renderable parts in the scene. |
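The EMA smoothing mentioned for GPUFrameStats can be sketched as follows. The smoothing factor alpha and the seeding behaviour on the first sample are assumptions, since the text does not give the engine's values; makeSmoother is a hypothetical name.

```lua
-- Returns a function that folds each raw dt into an exponential moving
-- average and reports (smoothed frame time, smoothed FPS).
-- `alpha` in (0, 1]: higher reacts faster, lower is steadier.
local function makeSmoother(alpha)
    local smoothed = nil
    return function(dt)
        if smoothed == nil then
            smoothed = dt -- seed with the first sample
        else
            smoothed = smoothed + alpha * (dt - smoothed)
        end
        return smoothed, 1 / smoothed
    end
end
```

A single slow frame only nudges the average, which is why an FPS readout built this way doesn't strobe.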