You’re already most of the way there — your code places a plane atop the detected image, so clearly you have something going on there that successfully sets the center position of the plane to that of the image anchor. Perhaps your first step should be to better understand the code you have...
ARPlaneAnchor has a center (and extent) because planes can effectively grow after ARKit initially detects them. When you first get a plane anchor, its transform tells you the position and orientation of some small patch of flat horizontal (or vertical) surface. That alone is enough for you to place some virtual content in the middle of that small patch of surface.
Over time, ARKit figures out where more of the same flat surface is, so the plane anchor’s extent gets larger. But you might initially detect, say, one end of a table and then recognize more of the far end — that means the flat surface isn’t centered around the first patch detected. Rather than change the transform of the anchor, ARKit tells you the new center (which is relative to that transform).
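In SceneKit terms, that usually means adjusting your plane visualization whenever ARKit updates the anchor. A rough sketch (untested; it assumes your plane visualization is an SCNPlane node that you added as the first child of the node ARKit provides, rotated to lie flat):
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor,
        let planeNode = node.childNodes.first,       // assumption: your plane visualization
        let plane = planeNode.geometry as? SCNPlane
        else { return }
    // The ARKit-provided node already follows the anchor's transform;
    // center is relative to that transform, so apply it to the child node.
    planeNode.simdPosition = planeAnchor.center
    // Resize the visualization to cover the (possibly grown) extent.
    plane.width = CGFloat(planeAnchor.extent.x)
    plane.height = CGFloat(planeAnchor.extent.z)
}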
An ARImageAnchor doesn’t grow — either ARKit detects the whole image at once or it doesn’t detect the image at all. So when you detect an image, the anchor’s transform tells you the position and orientation of the center of the image. (And if you want to know the size/extent, you can get that from the physicalSize of the detected reference image, like the sample code does.)
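For example, in your ARSCNViewDelegate callback you can read both of those directly (a small sketch, assuming the anchor came from image detection):
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }
    let physicalSize = imageAnchor.referenceImage.physicalSize  // width/height in meters
    let imageTransform = imageAnchor.transform                  // center of the image, in world space
    // ... size your content with physicalSize, position it using imageTransform ...
}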
So, to place some SceneKit content at the position of an ARImageAnchor (or any other ARAnchor subclass), you can:
1. Simply add it as a child node of the SCNNode ARKit creates for you in that delegate method. If you don’t do something to change them, its position and orientation will match those of the node that owns it. (This is what the Apple sample code you’re quoting does.)
2. Place it in world space (that is, as a child of the scene’s rootNode), using the anchor’s transform to get position or orientation or both. (You can extract the translation — that is, relative position — from a transform matrix: grab the first three elements of the last column; e.g. transform.columns.3 is a float4 vector whose xyz elements are your position and whose w element is 1.)
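Here’s a rough sketch of both options in one delegate method (untested; someLabelNode() and sceneView are placeholders for however you create your content and for your ARSCNView):
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // Option 1: parent your content to the ARKit-provided node, which follows the anchor.
    let attachedContent = someLabelNode()        // placeholder for your content
    node.addChildNode(attachedContent)

    // Option 2: place content in world space, using the anchor's transform yourself.
    let worldContent = someLabelNode()           // placeholder for your content
    let translation = anchor.transform.columns.3          // float4: xyz = position, w = 1
    worldContent.simdPosition = float3(translation.x, translation.y, translation.z)
    sceneView.scene.rootNode.addChildNode(worldContent)   // sceneView: your ARSCNView
}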
The demo video you linked to isn’t putting things in 3D space, though — it’s putting 2D UI elements on the screen, whose positions track the 3D camera-relative movement of anchors in world space.
You can easily get that kind of effect (to a first approximation) by using ARSKView (ARKit+SpriteKit) instead of ARSCNView (ARKit+SceneKit). That lets you associate 2D sprites with 3D positions in world space, and then ARSKView automatically moves and scales them so that they appear to stay attached to those 3D positions. It’s a common 3D graphics trick called “billboarding”, where the 2D sprite is always kept upright and facing the camera, but moved around and scaled to match 3D perspective.
If that’s the effect you’re looking for, there’s an App(le sample code) for that, too. The Using Vision in Real Time with ARKit example is mostly about other topics, but it does show how to use ARSKView to display labels associated with ARAnchor positions. (And as you’ve seen above, placing content to match an anchor position is the same no matter which ARAnchor subclass you’re using.) Here’s the key bit in their code:
func view(_ view: ARSKView, didAdd node: SKNode, for anchor: ARAnchor) {
    // ... irrelevant bits omitted...
    let label = TemplateLabelNode(text: labelText)
    node.addChild(label)
}
That is, just implement the ARSKView didAdd delegate method, and add whatever SpriteKit node you want as a child of the one ARKit provides.
However, the demo video does more than just sprite billboarding: the labels it associates with paintings not only stay fixed in 2D orientation, they stay fixed in 2D size (that is, they don’t scale to simulate perspective like a billboarded sprite does). What’s more, they seem to be UIKit controls, with the full set of inherited interactive behaviors that entails, not just the kind of 2D images that are easy to do with SpriteKit.
Apple’s APIs don’t provide a direct way to do this “out of the box”, but it’s not a stretch to imagine some ways one could put API pieces together to get this kind of result. Here are a couple of avenues to explore:
If you don’t need UIKit controls, you can probably do it all in SpriteKit, using constraints to match the position of the “billboarded” nodes ARSKView provides but not their scale. That’d probably look something like this (untested, caveat emptor):
func view(_ view: ARSKView, didAdd node: SKNode, for anchor: ARAnchor) {
    let label = MyLabelNode(text: labelText) // or however you make your label
    view.scene?.addChild(label)              // SKView.scene is optional
    // constrain label to zero distance from the ARSKView-provided, anchor-following node
    let zeroDistanceToAnchor = SKConstraint.distance(SKRange(constantValue: 0), to: node)
    label.constraints = [ zeroDistanceToAnchor ]
}
If you want UIKit elements, make the ARSKView a child view in your view controller (not its root view), and make those UIKit elements other child views. Then, in your SpriteKit scene’s update method, go through your ARAnchor-following nodes, convert their positions from SpriteKit scene coordinates to UIKit view coordinates, and set the positions of your UIKit elements accordingly. (The demo appears to be using popovers, so those you wouldn’t be managing as child views... you’d probably be updating the sourceRect for each popover.) That’s a lot more involved, so the details are beyond the scope of this already long answer.
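If you do go down that road, the core of it might look something like this (untested sketch; OverlayScene, arView, overlayLabel, and anchorNode are all placeholder names for your scene subclass, your ARSKView, a UIKit view, and the anchor-following node ARKit gave you):
import ARKit
import SpriteKit
import UIKit

class OverlayScene: SKScene {
    weak var arView: ARSKView?      // the ARSKView presenting this scene
    weak var overlayLabel: UILabel? // the UIKit element to keep positioned
    weak var anchorNode: SKNode?    // the ARSKView-provided, anchor-following node (a child of this scene)

    override func update(_ currentTime: TimeInterval) {
        guard let arView = arView, let node = anchorNode, let label = overlayLabel
            else { return }
        // Convert from SpriteKit scene coordinates to ARSKView coordinates (this also flips the y-axis)...
        let pointInView = arView.convert(node.position, from: self)
        // ...then into the label's superview's coordinate space, and move the label there.
        label.center = arView.convert(pointInView, to: label.superview)
    }
}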
A final note... hopefully this long-winded answer has been helpful with the key issues of your question (understanding anchor positions and placing 3D or 2D content that follows them as the camera moves).
But to clarify and give a warning about some of the key words early in your question:
When ARKit says it doesn’t track images after detection, that means it doesn’t know when/if the image moves (relative to the world around it). ARKit reports an image’s position only once, so that position doesn’t even benefit from how ARKit continues to improve its estimates of the world around you and your position in it. For example, if an image is on a wall, the reported position/orientation of the image might not line up with a vertical plane detection result on the wall (especially over time, as the plane estimate improves).
Update: In iOS 12, you can enable "live" tracking of detected images. But there are limits on how many you can track at once, so the rest of this advice may still apply.
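If you’re targeting iOS 12 and want that behavior, the configuration looks roughly like this (the "AR Resources" group name is just an assumption about your asset catalog; sceneView stands in for your ARSCNView or ARSKView):
let configuration = ARWorldTrackingConfiguration()
// Images to detect, loaded from an asset catalog group (group name is an assumption).
if let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) {
    configuration.detectionImages = referenceImages
}
// iOS 12: ask ARKit to keep updating the positions of up to this many detected images.
configuration.maximumNumberOfTrackedImages = 1
sceneView.session.run(configuration)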
This doesn’t mean that you can’t place content that appears to “track” that static-in-world-space position, in the sense of moving around on the screen to follow it as your camera moves.
But it does mean your user experience may suffer if you try to do things that rely on having a high-precision, real-time estimate of the image’s position. So don’t, say, try to put a virtual frame around your painting, or replace the painting with an animated version of itself. But having a text label with an arrow pointing to roughly where the image is in space is great.