Metal Shading Language, Part 6: Textures, Samplers, and Reading Image Data

Most of what you see on screen in a 3D application is a texture. The wood grain on the floor, the scratches on the metal surface, the clouds in the sky, the text in the UI: all of them are rectangular grids of color data that fragment shaders sample during rendering. Textures are the primary storage format for image data on the GPU, and the sampling system that reads from them is both more sophisticated and more important than it appears from the outside.

This part covers textures and samplers: what they are, how to declare them in MSL, how to read from them, and what the sampling hardware actually does when you call .sample().

What a texture is

A texture is typed, multidimensional image data stored in a GPU-optimized layout. The GPU stores texture data in swizzled or tiled memory layouts that optimize for 2D spatial locality, which makes reading a small 2D neighborhood of texels much faster than reading the same neighborhood from a row-major linear buffer.
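To make the locality argument concrete, here is a small CPU-side C++ sketch that compares row-major addressing with Z-order (Morton) addressing. This is only an illustration: the layouts real GPUs use are vendor-specific and not documented, Morton order is one well-known scheme with the same goal, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Row-major offset: vertically adjacent texels are `width` elements apart,
// so a 2x2 neighborhood touches two distant places in memory.
inline uint32_t linearOffset(uint32_t x, uint32_t y, uint32_t width) {
    assert(x < width);
    return y * width + x;
}

// Spread the low 16 bits of v so a zero bit sits between each original bit.
inline uint32_t spreadBits(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton (Z-order) offset: interleave the bits of x and y. Each aligned 2x2
// block of texels occupies four consecutive offsets.
inline uint32_t mortonOffset(uint32_t x, uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

In a 1024-wide row-major image, the texel directly below (0, 0) is 1024 elements away. In Morton order it is 2 elements away, which is the kind of property that keeps a filter's neighboring texels in the same cache line.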

Textures come in several types:

texture1d<T>          // one-dimensional, rarely used
texture2d<T>          // two-dimensional, the common case
texture3d<T>          // volumetric, for effects like fog or 3D noise
texturecube<T>        // six-faced cube, for skyboxes and environment maps
texture2d_array<T>    // array of 2D textures, for terrain tiling or atlases
depth2d<T>            // depth texture with comparison sampling support

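The type parameter T is the component type: float, half, int, uint, short, or ushort. It determines the numeric type returned by read and sample operations, not the storage format of the texture. For most color textures, float is a safe choice; half is common where reduced precision is acceptable.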

A second template parameter specifies access mode:

texture2d<float, access::read>       // read-only, unfiltered
texture2d<float, access::write>      // write-only
texture2d<float, access::read_write> // read and write (needs read-write texture support)
texture2d<float, access::sample>     // sampled access (the default when omitted)

access::sample is the default when no access mode is given, in graphics and compute functions alike. It enables the hardware sampling system (filtering, mipmapping, coordinate wrapping) and also permits .read(). access::read bypasses the sampler and reads from an exact texel. access::write outputs to a specific texel. access::read_write combines read and write in compute kernels, but requires hardware support that not all devices (including the iOS Simulator) provide. The supported tier is reported by MTLDevice.readWriteTextureSupport, and which pixel formats qualify depends on that tier.

Declaring textures in shader functions

fragment float4 textured(
    VertexOut in [[stage_in]],
    texture2d<float> albedo      [[texture(0)]],
    texture2d<float> normalMap   [[texture(1)]],
    sampler texSampler           [[sampler(0)]]
) {
    float4 color  = albedo.sample(texSampler, in.uv);
    float4 normal = normalMap.sample(texSampler, in.uv);
    // ...
    return color;
}

The [[texture(n)]] and [[sampler(n)]] attributes bind to slots in Metal's argument tables. Your Swift code sets textures and samplers at the matching indices before the draw call:

encoder.setFragmentTexture(albedoTexture, index: 0)
encoder.setFragmentTexture(normalMapTexture, index: 1)
encoder.setFragmentSamplerState(sampler, index: 0)

One sampler can be shared across multiple texture accesses. There is no rule requiring a one-to-one match. A sampler describes how to access, not which texture to access.

What a sampler is

A sampler is a configuration object that controls how the hardware reads from a texture when the requested coordinate does not fall exactly on a texel center. It specifies the filtering mode (what to return when sampling between texels), the mipmap mode (which mip level to sample and whether to blend between levels), the address mode (what to do when coordinates fall outside [0, 1]), and the max anisotropy limit.
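The address mode is the easiest of these settings to reason about numerically. The following CPU-side C++ sketch shows how three common modes (address::repeat, address::clamp_to_edge, address::mirrored_repeat) remap one normalized coordinate. It is an approximation for intuition only: real hardware applies the mode at texel granularity during filtering, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// address::repeat: keep only the fractional part, so the texture tiles.
inline float wrapRepeat(float u) {
    return u - std::floor(u);
}

// address::clamp_to_edge: coordinates outside [0, 1] stick to the border.
inline float wrapClampToEdge(float u) {
    return std::fmin(std::fmax(u, 0.0f), 1.0f);
}

// address::mirrored_repeat: the texture flips direction at every integer.
inline float wrapMirroredRepeat(float u) {
    float t = u - 2.0f * std::floor(u * 0.5f); // t is in [0, 2)
    assert(t >= 0.0f && t <= 2.0f);
    return t <= 1.0f ? t : 2.0f - t;
}
```

For a coordinate of 1.25, repeat yields 0.25, clamp_to_edge yields 1.0, and mirrored_repeat yields 0.75.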

You can define samplers in MSL directly with constexpr sampler:

constexpr sampler texSampler(
    filter::linear,            // bilinear filtering within a mip level
    mip_filter::linear,        // trilinear: blend between mip levels
    address::repeat,           // wrap coordinates outside [0, 1]
    max_anisotropy(4)          // anisotropic filtering up to 4x
);

fragment float4 my_fragment(
    VertexOut in [[stage_in]],
    texture2d<float> albedo [[texture(0)]]
) {
    return albedo.sample(texSampler, in.uv);
}

A constexpr sampler defined in the shader is compiled into the pipeline state. No Swift-side sampler object is needed. This avoids the overhead of binding a sampler from Swift and is generally preferred when the sampler settings do not need to change between draw calls.

When you need runtime-configurable sampling behavior, define the sampler in Swift and bind it:

let desc = MTLSamplerDescriptor()
desc.minFilter = .linear
desc.magFilter = .linear
desc.mipFilter = .linear
desc.sAddressMode = .repeat
desc.tAddressMode = .repeat
let sampler = device.makeSamplerState(descriptor: desc)!

The `.sample()` method

The primary way to read a texture with filtering applied:

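float4 color = albedo.sample(sampler, uv);
// uv is a float2 of normalized coordinates; [0.0, 1.0] spans the texture
// values outside that range are resolved by the sampler's address mode
// (0, 0) is the top-left corner of the texture
// (1, 1) is the bottom-right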

The sampling hardware takes the UV coordinate, locates it within the texture, and returns a filtered result. With filter::linear, it reads four surrounding texels and returns their weighted average. With filter::nearest, it returns the single nearest texel. The difference in image quality is significant: linear filtering produces smooth gradients, nearest produces the blocky pixelated look of classic video games (intentional in pixel art aesthetics, wrong everywhere else).
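The weighted average can be written out on the CPU. The sketch below is a single-channel reference in C++, assuming texel centers at half-integer positions and clamp-to-edge addressing. The Image type and function names are hypothetical, and real hardware uses fixed-point weights, so results can differ in the last bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Image {
    int width;
    int height;
    std::vector<float> texels; // row-major, one channel

    // Fetch one texel, clamping coordinates to the edge.
    float at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// filter::nearest: return the texel whose area contains (u, v).
inline float sampleNearest(const Image& img, float u, float v) {
    int x = static_cast<int>(std::floor(u * img.width));
    int y = static_cast<int>(std::floor(v * img.height));
    return img.at(x, y);
}

// filter::linear: blend the four texels whose centers surround (u, v).
inline float sampleBilinear(const Image& img, float u, float v) {
    assert(img.width > 0 && img.height > 0);
    // Shift by half a texel so integer positions land on texel centers.
    float fx = u * img.width - 0.5f;
    float fy = v * img.height - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float tx = fx - x0; // horizontal weight toward x0 + 1
    float ty = fy - y0; // vertical weight toward y0 + 1
    float top    = img.at(x0, y0)     * (1.0f - tx) + img.at(x0 + 1, y0)     * tx;
    float bottom = img.at(x0, y0 + 1) * (1.0f - tx) + img.at(x0 + 1, y0 + 1) * tx;
    return top * (1.0f - ty) + bottom * ty;
}
```

On a 2x2 image holding 0, 1, 2, 3, sampling the exact middle (0.5, 0.5) with the bilinear function gives 1.5, the average of all four texels, while sampling at a texel center such as (0.25, 0.25) returns that texel unchanged.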

Mip levels and .sample(). Mipmaps are precomputed downsampled versions of a texture, each half the dimensions of the previous, stored together in the same texture object. The GPU selects the mip level automatically based on how many texels fall under one screen pixel, which depends on both distance and viewing angle. A surface far from the camera maps many texels to a few pixels; sampling from a reduced mip avoids aliasing and cache thrashing. A surface close to the camera maps one texel to many pixels; the full-resolution mip provides maximum detail.
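The "each half the dimensions of the previous" rule is easy to check with a few lines of C++. This sketch lists the dimensions of a full mip chain, halving with integer division and never going below 1 in either dimension. The MipSize type and mipChain function are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct MipSize {
    int width;
    int height;
};

// Dimensions of every level in a full mip chain, from level 0 down to 1x1.
inline std::vector<MipSize> mipChain(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<MipSize> levels;
    while (true) {
        levels.push_back({width, height});
        if (width == 1 && height == 1) break;
        width  = std::max(1, width / 2);
        height = std::max(1, height / 2);
    }
    return levels;
}
```

A 256x256 texture has nine levels (256, 128, 64, 32, 16, 8, 4, 2, 1). A 256x64 texture also has nine, because the chain continues until the larger dimension reaches 1.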

With mip_filter::linear (trilinear filtering), the GPU blends between two mip levels, eliminating the visible pop as you move toward or away from a surface. This is the standard setting for most applications.

Anisotropic filtering. When a surface is viewed at a steep angle, the footprint of a screen pixel in texture space becomes elongated rather than square. Isotropic filtering treats that footprint as roughly circular, so it has to choose a mip level blurry enough to cover the long axis, and the result looks smeared. Anisotropic filtering takes multiple samples along the elongated axis and produces a sharper result. The max_anisotropy setting caps the number of samples taken per lookup. Higher values are sharper but more expensive.

The `.read()` method

Bypasses filtering entirely. Returns the value of a specific texel at integer pixel coordinates:

float4 pixel = source.read(uint2(x, y));
// x and y are integer pixel coordinates
// returns the exact texel value at that location

Use .read() in compute kernels when you need precise per-texel access without filtering, such as image processing where each output pixel reads from one specific input pixel. Use .sample() in graphics shaders when you have UV coordinates and want the hardware to handle filtering.

In a 2D image processing kernel:

kernel void blur(
    texture2d<float, access::read>  source [[texture(0)]],
    texture2d<float, access::write> dest   [[texture(1)]],
    uint2 gid [[thread_position_in_grid]]
) {
    uint2 size = uint2(source.get_width(), source.get_height());
    if (gid.x >= size.x || gid.y >= size.y) return;

    // 3x3 box blur
    float4 sum = float4(0.0);
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int2 samplePos = int2(gid) + int2(dx, dy);
            // Clamp to texture boundaries
            samplePos = clamp(samplePos, int2(0), int2(size) - 1);
            sum += source.read(uint2(samplePos));
        }
    }
    dest.write(sum / 9.0, gid);
}
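When checking a kernel like this, it can help to have a CPU reference to compare against. The following C++ sketch mirrors the same 3x3 box blur with the same clamp-to-boundary rule on a single-channel image. The function name and the flat std::vector layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a 3x3 box blur with clamped borders (one channel).
inline std::vector<float> boxBlur3x3(const std::vector<float>& src,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp to the image boundaries, as the kernel does.
                    int sx = std::min(std::max(x + dx, 0), width - 1);
                    int sy = std::min(std::max(y + dy, 0), height - 1);
                    sum += src[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum / 9.0f;
        }
    }
    return dst;
}
```

Note how clamping weights border texels more heavily: a corner texel is read four times by the output pixel at that corner, because every out-of-range neighbor collapses onto it.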

The `.write()` method

Writes to a specific texel. Only available with access::write or access::read_write:

// 'output' is declared as a kernel parameter:
//     texture2d<float, access::write> output [[texture(1)]]
output.write(float4(r, g, b, 1.0), uint2(x, y));

No filtering, no interpolation. A direct write to a texel. The values you write persist in the texture and are visible to later work that uses it, such as a subsequent compute pass or a render pass that samples or presents it.

Texture coordinate conventions

Metal textures use a coordinate system with the origin at the top-left corner. Positive x goes right, positive y goes down. This is the opposite of the mathematical convention (origin bottom-left), which is also the convention OpenGL uses for textures.

If you load image data with standard image loading code, the coordinate system usually matches: row 0 is the top of the image, which matches Metal's UV (0,0) being the top-left. Confusion arises when loading assets from OpenGL-oriented pipelines, where the y axis is flipped. Replacing the v coordinate with 1.0 - v flips a texture vertically.

Normalized device coordinates (NDC) in Metal run from -1 to +1 in x and y, with +y pointing up (z runs from 0 to 1). Texture coordinates run from 0 to 1 with +y pointing down. Both conventions coexist in the same rendering pipeline, and bugs appear whenever one is used where the other was expected, for example when a screen-space position is turned into a UV without flipping y.
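The conversions between these conventions are short enough to write down once and reuse. This C++ sketch shows them on the CPU; the same arithmetic carries over to MSL with float2. The Float2 type and the function names are hypothetical.

```cpp
#include <cassert>

struct Float2 {
    float x;
    float y;
};

// NDC (x, y in [-1, 1], +y up) to texture coordinates ([0, 1], +y down).
inline Float2 ndcToUV(Float2 ndc) {
    return { ndc.x * 0.5f + 0.5f, 0.5f - ndc.y * 0.5f };
}

// Texture coordinates back to NDC.
inline Float2 uvToNDC(Float2 uv) {
    return { uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f };
}

// Vertical flip for assets authored with a bottom-left origin.
inline Float2 flipV(Float2 uv) {
    return { uv.x, 1.0f - uv.y };
}

// Integer pixel coordinate to the UV of that pixel's center.
inline Float2 pixelCenterToUV(int px, int py, int width, int height) {
    assert(width > 0 && height > 0);
    return { (px + 0.5f) / width, (py + 0.5f) / height };
}
```

The top-left corner of the screen is NDC (-1, +1) and maps to UV (0, 0); the bottom-right corner is NDC (+1, -1) and maps to UV (1, 1).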

Cube map textures

A cube map represents the six faces of a cube. It is indexed with a 3D direction vector rather than 2D UV coordinates. The hardware determines which face and which texel to sample based on which axis the direction vector most closely aligns with.

// 'skybox' is declared as a function parameter:
//     texturecube<float> skybox [[texture(0)]]
constexpr sampler cubeSampler(filter::linear);

float4 skyColor = skybox.sample(cubeSampler, reflectionDirection);

Cube maps are used for skyboxes (rendering the background from a precomputed panoramic capture), reflection probes (querying approximate reflections from a cube capture at a scene location), and omnidirectional shadow maps (storing depth values in all directions for point light shadows).

The direction vector does not need to be normalized; only its direction affects the lookup, not its length.
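Face selection can be sketched on the CPU as "pick the component with the largest magnitude, then look at its sign". The C++ below assumes Metal's face order of +X, -X, +Y, -Y, +Z, -Z for slices 0 through 5. The enum and function names are hypothetical, and how exact ties between components are broken is an assumption of this sketch rather than documented hardware behavior.

```cpp
#include <cassert>
#include <cmath>

enum CubeFace {
    PositiveX = 0,
    NegativeX = 1,
    PositiveY = 2,
    NegativeY = 3,
    PositiveZ = 4,
    NegativeZ = 5
};

// Choose the cube face a direction vector points at.
inline CubeFace selectCubeFace(float x, float y, float z) {
    float ax = std::fabs(x);
    float ay = std::fabs(y);
    float az = std::fabs(z);
    assert(ax + ay + az > 0.0f); // the zero vector has no direction
    if (ax >= ay && ax >= az) return x >= 0.0f ? PositiveX : NegativeX;
    if (ay >= az)             return y >= 0.0f ? PositiveY : NegativeY;
    return z >= 0.0f ? PositiveZ : NegativeZ;
}
```

Scaling the vector scales all three magnitudes equally, so (1, 0.2, -0.3) and (3, 0.6, -0.9) select the same face. The position within the face comes from dividing the two remaining components by the dominant one, which cancels the length in the same way.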

Texture array access

A texture array holds multiple textures of the same dimensions and pixel format in a single object. It is indexed with an integer slice index alongside the usual UV coordinate:

// 'spriteAtlas' is declared as a function parameter:
//     texture2d_array<float> spriteAtlas [[texture(0)]]
// 's' is a sampler and 'spriteIndex' an unsigned integer slice index
float4 sprite = spriteAtlas.sample(s, uv, spriteIndex);

Texture arrays avoid rebinding textures between draw calls, which is expensive because it forces otherwise identical geometry to be split into separate draws. A sprite renderer or a terrain renderer that needs many different tile textures can pack them into a texture array and select tiles by index in the shader.

Texture sampling in the fragment shader: automatic derivatives

Fragment shaders have access to screen-space derivatives that compute kernels do not. The dfdx() and dfdy() functions return the rate of change of any value across adjacent fragments, and the texture sampler uses these derivatives internally to select the appropriate mip level.

When you call .sample() in a fragment shader, the hardware computes the UV derivatives automatically and passes them to the sampling unit to determine the level of detail (LOD). You do not need to compute this yourself. The mip level selection is correct for the current view.

In a compute kernel, there are no adjacent fragments and therefore no automatic derivatives. If you sample a texture in a compute kernel with a sampler that has mipmapping enabled, you must either specify the mip level explicitly with level(), supply your own derivatives with gradient2d(), or skip LOD logic altogether with .read():
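For intuition, the LOD calculation can be approximated on the CPU. The C++ sketch below uses the common textbook formula: scale the UV derivatives to texel units, take the longer of the two footprint axes, and take log2. Actual hardware may use a different approximation, and the type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct MipSelection {
    int lower;   // finer mip level
    int upper;   // coarser mip level
    float blend; // 0 = all lower, 1 = all upper (used by mip_filter::linear)
};

// LOD from UV derivatives: log2 of the pixel footprint measured in texels.
inline float computeLod(float dudx, float dvdx, float dudy, float dvdy,
                        int width, int height) {
    assert(width > 0 && height > 0);
    float lenX = std::hypot(dudx * width, dvdx * height);
    float lenY = std::hypot(dudy * width, dvdy * height);
    float rho = std::max(lenX, lenY);
    return std::log2(std::max(rho, 1e-8f)); // guard against log2(0)
}

// Clamp the LOD to the mip chain and split it into two levels plus a weight.
inline MipSelection selectMips(float lod, int mipCount) {
    assert(mipCount > 0);
    float clamped = std::min(std::max(lod, 0.0f), static_cast<float>(mipCount - 1));
    int lower = static_cast<int>(std::floor(clamped));
    int upper = std::min(lower + 1, mipCount - 1);
    return { lower, upper, clamped - static_cast<float>(lower) };
}
```

If one screen pixel spans four texels of a 256x256 texture in each direction, the LOD is log2(4) = 2, so mip level 2 (64x64) is sampled. A fractional LOD such as 1.5 blends levels 1 and 2 equally under trilinear filtering, and a negative LOD (magnification) clamps to level 0.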

// Explicit mip level in a compute kernel
float4 texel = myTexture.sample(s, uv, level(mipLevel));

// Or use .read() for exact texel access without any LOD logic
float4 texel = myTexture.read(texelCoord, mipLevel);

This is why image processing kernels typically use .read(): they process textures at a fixed resolution where LOD does not apply.

A complete fragment shader with textures

A physically based rendering fragment shader using albedo, normal, metallic, and roughness maps:

#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 uv;
    float3 worldNormal;
    float3 worldPosition;
};

struct Lighting {
    float3 lightDirection;
    float3 lightColor;
    float3 cameraPosition;
};

fragment float4 pbr_fragment(
    VertexOut in [[stage_in]],
    texture2d<float> albedoMap    [[texture(0)]],
    texture2d<float> normalMap    [[texture(1)]],
    texture2d<float> mrMap        [[texture(2)]],  // metallic/roughness packed
    constant Lighting &lighting   [[buffer(0)]]
) {
    constexpr sampler s(filter::linear, mip_filter::linear, address::repeat);

    float4 albedo    = albedoMap.sample(s, in.uv);
    float4 normalSample = normalMap.sample(s, in.uv);
    float4 mr        = mrMap.sample(s, in.uv);

    float metallic   = mr.r;
    float roughness  = mr.g;

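    // Unpack normal from [0,1] to [-1,1].
    // Simplification: the result is used directly as a world-space normal.
    // A tangent-space normal map would need a TBN transform built from
    // in.worldNormal and a tangent vector before lighting.
    float3 N = normalize(normalSample.rgb * 2.0 - 1.0);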

    float3 L = normalize(-lighting.lightDirection);
    float3 V = normalize(lighting.cameraPosition - in.worldPosition);
    float3 H = normalize(L + V);

    float NdotL = max(dot(N, L), 0.0);
    float NdotV = max(dot(N, V), 0.0);
    float NdotH = max(dot(N, H), 0.0);

    // Simple diffuse + specular approximation
    float3 diffuse  = albedo.rgb * (1.0 - metallic) * NdotL;
    float  gloss    = mix(0.04, 1.0, 1.0 - roughness);
    float3 specular = float3(gloss) * pow(NdotH, 64.0) * NdotL;

    float3 color = (diffuse + specular) * lighting.lightColor;
    return float4(color, albedo.a);
}

This is a simplification of PBR, but the structure reflects real usage: multiple textures sampled with the same sampler, mathematical operations on the sampled values, a final composite returned as the fragment color.

Part 7

Textures and samplers cover most of the resource access patterns you will encounter. The final part covers the Metal standard library's built-in functions in depth: the math functions, the geometric operations, the synchronization primitives, and the atomic operations that make safe concurrent writes to device memory possible.