
I am experiencing a sporadic issue running containers on ACI that seems to cause Azure to "lose track" of my container instance, leaving an orphaned container behind. The containers themselves always run successfully, but every now and then I hit this weird issue. Some peculiarities:

  • the container instance still succeeds internally (the code in it runs successfully), and the parent container group even says "Succeeded", but Azure never reports a "Created" event for the container instance - it just says "Started". Typically the events you see are Pulling-->Pulled-->Created-->Started. Why is "Created" missing?
  • I can't view the container's logs without hooking up Azure Log Analytics. The "Logs" tab on the container blade in the Azure portal just says "No logs available". Normally you can see the logs of a successful container there.
  • when the issue occurs, ACI tries to pull the image twice (and appears to succeed both times - see image below).
  • sometimes there will be a 4th event displayed in the portal, "Killed"

[Screenshot: portal events for the container instance, showing the image being pulled twice]

I am creating a single-container container group via the Azure Container Instance connector in Logic Apps - I do this reliably across many automated workflows. The logic app monitors the container group's state, pulls the instance's logs, and then deletes the group when done. All of my images are hosted in Azure Container Registry. The Python code inside the container pulls data from SQL, generates a PDF report, and posts it to an Azure Blob. I know the code is running and succeeding because I can see the report being posted! I have also hooked up Log Analytics to the container, so I can see my internal Python logging, and Log Analytics reports NO other errors. The logic app does fail, though, when it tries to pull the container logs and can't find them (see the second bullet above).
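
In case the exact shape of the request matters, here is roughly what the connector does on my behalf, sketched with the Python azure-mgmt-containerinstance SDK (resource names, region, and credentials below are placeholders, not my real setup):

```python
# Rough Python equivalent of the Logic App flow: create a single-container
# group from an ACR image (by tag), wait for the run-once container to finish,
# read its logs, then delete the group. All names/credentials are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, ImageRegistryCredential,
    ResourceRequests, ResourceRequirements,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
GROUP_NAME = "report-job"
IMAGE = "myregistry.azurecr.io/riptuskimage:1.2.5"  # referenced by tag, not digest

client = ContainerInstanceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

container = Container(
    name=GROUP_NAME,
    image=IMAGE,
    resources=ResourceRequirements(
        requests=ResourceRequests(cpu=1.0, memory_in_gb=2.0)
    ),
)

group = ContainerGroup(
    location="eastus",
    os_type="Linux",
    restart_policy="Never",  # run-once job
    containers=[container],
    image_registry_credentials=[
        ImageRegistryCredential(
            server="myregistry.azurecr.io",
            username="<acr-username>",
            password="<acr-password>",
        )
    ],
)

# Create the group and wait for provisioning to complete.
client.container_groups.begin_create_or_update(
    RESOURCE_GROUP, GROUP_NAME, group
).result()

# Poll until the run-once container has terminated, like the logic app does.
while True:
    state = client.container_groups.get(RESOURCE_GROUP, GROUP_NAME)
    view = state.containers[0].instance_view
    if view and view.current_state and view.current_state.state == "Terminated":
        break
    time.sleep(15)

# This is the step that fails for me when the issue occurs: no logs available.
logs = client.containers.list_logs(RESOURCE_GROUP, GROUP_NAME, GROUP_NAME)
print(logs.content)

# Clean up the group once the report has been posted.
client.container_groups.begin_delete(RESOURCE_GROUP, GROUP_NAME).result()
```

Note the image is always referenced by its tag, which is why the digest-based pull described further down surprises me.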

Here's output from Log Analytics on container events (a more detailed version of the screenshot above) - so bizarre that the container RE-PULLS 10 seconds after the first pull successfully completed. You can then see my first container actually run successfully and exit with 0, and then there's this orphaned container left over that gets killed.

[Screenshot: Log Analytics container events showing the re-pull ~10 seconds after the first pull, the successful exit 0, and the orphaned container being killed]
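
For anyone who wants to pull the same event rows themselves, this is the kind of query I'm running - ACI writes these events to the ContainerEvent_CL table once a Log Analytics workspace is attached. The workspace ID and group name here are placeholders, and I'm using the azure-monitor-query package purely to illustrate:

```python
# Pull the ACI event rows from Log Analytics programmatically.
# ContainerEvent_CL is the table ACI writes events to when a Log Analytics
# workspace is attached; column names are as they appear in my workspace.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"

client = LogsQueryClient(DefaultAzureCredential())

query = """
ContainerEvent_CL
| where ContainerGroup_s == 'report-job'
| project TimeGenerated, ContainerName_s, Reason_s, Message
| order by TimeGenerated asc
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(hours=6))
for table in response.tables:
    for row in table.rows:
        print(list(row))
```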

I have noticed one thing that is VERY consistent when this issue occurs. Normally, a successful container creation event in Azure says it is pulling my image by its tag: myregistry.azurecr.io/riptuskimage:1.2.5. When this issue occurs, the event message says the image is being pulled by its digest instead: myregistry.azurecr.io/riptuskimage@sha256:d98fja.... EVERY time the issue has occurred, I've seen this. I have no idea why Azure is doing this - I most certainly specify the tag in my creation request.
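
To rule out the tag itself having moved, I can ask ACR which digest the tag currently resolves to and compare it against the digest shown in the event - a quick sketch with the azure-containerregistry package (the credential setup here is an assumption on my part):

```python
# Check which digest the tag currently points at in ACR, so it can be
# compared against the sha256 digest shown in the ACI event message.
from azure.containerregistry import ContainerRegistryClient
from azure.identity import DefaultAzureCredential

client = ContainerRegistryClient(
    "https://myregistry.azurecr.io",
    DefaultAzureCredential(),
    audience="https://containerregistry.azure.net",
)

props = client.get_manifest_properties("riptuskimage", "1.2.5")
print(props.digest)           # compare with the digest in the ACI event
print(props.last_updated_on)  # when that manifest last changed
```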

I have viewed this post and this post and neither really helps.

I've been scratching my head over this one for a while. The fact that it's sporadic (doesn't always happen), and that the image pulls twice when it does, makes me suspect my container registry. The image I'm pulling is large - about 1.6 GB. I checked the container registry's throttling limits and I don't think a single pull of a 1.6 GB image should get throttled - but ACI container creation doesn't really give me a way to see whether the registry returned an HTTP 429. I'm not pulling anything else at that time.
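
The closest I've found to checking for 429s is the registry's own resource logs. Assuming diagnostic settings on the ACR route the ContainerRegistryRepositoryEvents category to my Log Analytics workspace, something like this (same query pattern as the event query above) would show the registry-side responses during the failure window:

```python
# Dump the registry-side events around the failure window and look through
# the result columns for 429 / TooManyRequests responses. This assumes the
# registry's diagnostic settings send ContainerRegistryRepositoryEvents
# (and ContainerRegistryLoginEvents) to the same Log Analytics workspace.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"
client = LogsQueryClient(DefaultAzureCredential())

query = """
ContainerRegistryRepositoryEvents
| where TimeGenerated > ago(1d)
| order by TimeGenerated desc
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
for table in response.tables:
    for row in table.rows:
        print(list(row))
```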

Anyone have any ideas? Thanks!

Edit: This is a recent phenomenon! I have logic apps that have been creating containers for over a year, and this issue only started occurring in the last few weeks (as of this posting, 9/24/2021).