
I am attempting to find a simple way in SceneKit to calculate the depth of a pixel from the LiDAR data in sceneView.session.currentFrame?.smoothedSceneDepth?.depthMap.

Ideally I don't want to use Metal shaders. I would prefer to find points in my currentFrame and their corresponding depth-map values, so I can get the depth of those points in SceneKit (ideally in world coordinates, not just local to the frustum at that point in time).
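Concretely, what I am after is something like the sketch below, a rough helper of my own (it relies on the depth map being kCVPixelFormatType_DepthFloat32), which samples the depth under a pixel of the captured image but still doesn't give me a world-space position:

import ARKit

/// Rough sketch: the LiDAR depth (in meters) under a point given in
/// captured-image pixel coordinates. Assumes kCVPixelFormatType_DepthFloat32.
func depth(at imagePoint: CGPoint, in frame: ARFrame) -> Float32? {
    guard let depthMap = frame.smoothedSceneDepth?.depthMap else { return nil }

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    // The depth map is much smaller than the captured image, so scale the
    // pixel coordinates down to depth-map resolution.
    let depthWidth = CVPixelBufferGetWidth(depthMap)
    let depthHeight = CVPixelBufferGetHeight(depthMap)
    let imageWidth = CVPixelBufferGetWidth(frame.capturedImage)
    let imageHeight = CVPixelBufferGetHeight(frame.capturedImage)
    let uu = max(0, min(depthWidth - 1, Int(imagePoint.x / CGFloat(imageWidth) * CGFloat(depthWidth))))
    let vv = max(0, min(depthHeight - 1, Int(imagePoint.y / CGFloat(imageHeight) * CGFloat(depthHeight))))

    // One Float32 per pixel; rows may be padded, hence bytesPerRow.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let rowPointer = CVPixelBufferGetBaseAddress(depthMap)!
        .advanced(by: vv * bytesPerRow)
        .assumingMemoryBound(to: Float32.self)
    return rowPointer[uu]
}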

Fast performance isn't necessary, as this won't be calculated at capture time.

I am aware of the Apple sample project at link; however, it is far too complex for my needs.

As a starting point, my code works like this:

guard let depthData = frame.sceneDepth else { return }
let camera = frame.camera

// The depth map is much smaller than the captured image, so work out the
// scale between the two and resize the color image to match.
let depthPixelBuffer = depthData.depthMap
let depthHeight = CVPixelBufferGetHeight(depthPixelBuffer)
let depthWidth  = CVPixelBufferGetWidth(depthPixelBuffer)
let resizeScale = CGFloat(depthWidth) / CGFloat(CVPixelBufferGetWidth(frame.capturedImage))
let resizedColorImage = frame.capturedImage.toCGImage(scale: resizeScale)
guard let colorData = resizedColorImage.pixelData() else {
    fatalError()
}

// Scale the camera intrinsics down from the captured-image resolution
// to the depth-map resolution.
var intrinsics = camera.intrinsics
let referenceDimensions = camera.imageResolution
let ratio = Float(referenceDimensions.width) / Float(depthWidth)

intrinsics.columns.0[0] /= ratio
intrinsics.columns.1[1] /= ratio
intrinsics.columns.2[0] /= ratio
intrinsics.columns.2[1] /= ratio

var points: [SCNVector3] = []

let depthValues = depthPixelBuffer.depthValues()

// Build one point per depth pixel: x/y are normalized [-1, 1] screen
// coordinates, z is the negated depth in meters.
for vv in 0..<depthHeight {
    for uu in 0..<depthWidth {
        let z = -depthValues[uu + vv * depthWidth]
        let x = Float32(uu) / Float32(depthWidth) * 2.0 - 1.0
        let y = 1.0 - Float32(vv) / Float32(depthHeight) * 2.0
        points.append(SCNVector3(x, y, z))
    }
}

The resulting point cloud looks OK, but it is severely bent on the Z-axis. I realize this code is not adjusting for screen orientation either.
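For reference, depthValues(), toCGImage and pixelData() are small helper extensions rather than ARKit API; a minimal sketch of depthValues(), assuming the kCVPixelFormatType_DepthFloat32 format of the LiDAR depth map, would be:

import CoreVideo

extension CVPixelBuffer {
    /// Copies the Float32 depth values out of the buffer, row by row,
    /// into a flat [Float32] indexed as [column + row * width].
    func depthValues() -> [Float32] {
        CVPixelBufferLockBaseAddress(self, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(self, .readOnly) }

        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(self)
        let baseAddress = CVPixelBufferGetBaseAddress(self)!

        var values = [Float32](repeating: 0, count: width * height)
        for row in 0..<height {
            // Rows can be padded, so step by bytesPerRow rather than width * 4.
            let rowPointer = baseAddress
                .advanced(by: row * bytesPerRow)
                .assumingMemoryBound(to: Float32.self)
            for column in 0..<width {
                values[column + row * width] = rowPointer[column]
            }
        }
        return values
    }
}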


1 Answer


Someone at Cupertino kindly got back to me with this response on the forums at developer.apple.com:

The unprojection calculation itself is going to be identical, regardless of whether it is done CPU side or GPU side.

CPU side, the calculation would look something like this:

/// Returns a world-space position given a point in the camera image, the eye-space depth
/// (sampled/read from the corresponding point in the depth image), the inverse camera
/// intrinsics, and the inverse view matrix.
func worldPoint(cameraPoint: SIMD2<Float>, eyeDepth: Float, cameraIntrinsicsInversed: simd_float3x3, viewMatrixInversed: simd_float4x4) -> SIMD3<Float> {
    let localPoint = cameraIntrinsicsInversed * simd_float3(cameraPoint, 1) * -eyeDepth
    let worldPoint = viewMatrixInversed * simd_float4(localPoint, 1)
    return (worldPoint / worldPoint.w)[SIMD3(0, 1, 2)]
}

Implemented, this looks like:

// The inverse view matrix is the same for every pixel, so compute it once.
let viewMatInverted = (sceneView.session.currentFrame?.camera.viewMatrix(for: UIApplication.shared.statusBarOrientation))!.inverse

for vv in 0..<depthHeight {
    for uu in 0..<depthWidth {
        let z = -depthValues[uu + vv * depthWidth]
        // Unproject using the intrinsics scaled to the depth-map resolution and
        // the rotateToARCamera helper from Apple's sample project.
        let worldPoint = worldPoint(cameraPoint: SIMD2(Float(uu), Float(vv)), eyeDepth: z, cameraIntrinsicsInversed: intrinsics.inverse, viewMatrixInversed: viewMatInverted * rotateToARCamera)
        points.append(SCNVector3(worldPoint))
    }
}

The point cloud is pretty messy and still needs the confidence map taken into account, and there are vertical gaps where integer rounding has occurred, but it's a solid start. The missing functions come from the Apple demo project linked in the question above.
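To factor the confidence in, one option (my own sketch, not part of the Apple reply) is to read frame.sceneDepth?.confidenceMap, which has the same resolution as the depth map and stores one ARConfidenceLevel raw value per pixel as a UInt8:

import ARKit

/// Copies the per-pixel confidence (ARConfidenceLevel raw values: 0 = low,
/// 1 = medium, 2 = high) out of the confidence map into a flat [UInt8].
func confidenceValues(from confidenceMap: CVPixelBuffer) -> [UInt8] {
    CVPixelBufferLockBaseAddress(confidenceMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(confidenceMap, .readOnly) }

    let width = CVPixelBufferGetWidth(confidenceMap)
    let height = CVPixelBufferGetHeight(confidenceMap)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(confidenceMap)
    let baseAddress = CVPixelBufferGetBaseAddress(confidenceMap)!

    var values = [UInt8](repeating: 0, count: width * height)
    for row in 0..<height {
        let rowPointer = baseAddress
            .advanced(by: row * bytesPerRow)
            .assumingMemoryBound(to: UInt8.self)
        for column in 0..<width {
            values[column + row * width] = rowPointer[column]
        }
    }
    return values
}

Inside the unprojection loop, the point at (uu, vv) can then be skipped whenever its value in that array is below ARConfidenceLevel.high.rawValue.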