4
votes

I'd like to alert on the lack of a heartbeat (or 0 bytes received) from any one of a large number of Google Cloud IoT Core devices. I can't seem to do this in Stackdriver. Instead, it appears to let me alert only on the entire device registry, which does not give me what I'm looking for. (How would I know that a particular device is disconnected?)

So how does one go about doing this?


2 Answers

5
votes

I have no idea why this question was downvoted as 'too broad'.

The truth is that Google Cloud IoT Core doesn't have per-device alerting; it only offers alerting on an entire device registry. If this is not true, please reply to this post. The page that clearly states this is here:

Cloud IoT Core exports usage metrics that can be monitored programmatically or accessed via Stackdriver Monitoring. These metrics are aggregated at the device registry level. You can use Stackdriver to create dashboards or set up alerts.

The importance of per-device alerting is implied by the promise in this statement:

Operational information about the health and functioning of devices is important to ensure that your data-gathering fabric is healthy and performing well. Devices might be located in harsh environments or in hard-to-access locations. Monitoring operational intelligence for your IoT devices is key to preserving the business-relevant data stream.

So it's not easy today to get an alert when one of many globally dispersed devices loses connectivity. One needs to build that, and the right solution depends on what one is trying to do.

In my case I wanted to alert if the last heartbeat time, or the last event or state publish time, was older than 5 minutes. For this I need to run a function on a regular schedule that scans the device registry and performs this check for each device. The usage of this API is outlined in this other SO post: Google iot core connection status
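The check itself can be sketched roughly as below. This is only a minimal sketch, not a tested production implementation: it assumes the device list has already been fetched from the registry, and that each device carries lastHeartbeatTime, lastEventTime and lastStateTime as RFC 3339 strings that Date.parse accepts. The names DeviceTimestamps, lastSeenMs and findStaleDevices are made up for illustration.

```typescript
// Hypothetical subset of the device fields used for the check (RFC 3339 strings).
interface DeviceTimestamps {
  id: string;
  lastHeartbeatTime?: string;
  lastEventTime?: string;
  lastStateTime?: string;
}

// Latest activity in epoch milliseconds, or undefined if the device has never reported.
function lastSeenMs(device: DeviceTimestamps): number | undefined {
  const epochs = [device.lastHeartbeatTime, device.lastEventTime, device.lastStateTime]
    .map((stamp) => (stamp ? Date.parse(stamp) : NaN))
    // Drop missing/unparseable values and the zero epoch used for "never seen"
    .filter((ms) => !Number.isNaN(ms) && ms > 0);
  return epochs.length > 0 ? Math.max(...epochs) : undefined;
}

// IDs of devices whose latest activity is older than maxAgeMs, or that were never seen.
function findStaleDevices(
  devices: DeviceTimestamps[],
  nowMs: number,
  maxAgeMs: number = 5 * 60 * 1000
): string[] {
  return devices
    .filter((device) => {
      const seen = lastSeenMs(device);
      return seen === undefined || nowMs - seen > maxAgeMs;
    })
    .map((device) => device.id);
}
```

A scheduled job would call findStaleDevices(devices, Date.now()) on every run and raise an alert for each returned ID. Passing the current time in as a parameter keeps the check easy to test.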

1
votes

For reference, here's a Firebase function I just wrote to check a device's online status. It probably needs some tweaks and further testing, but it should give anybody else something to start with:

// Example code to call this function
// const checkDeviceOnline = functions.httpsCallable('checkDeviceOnline');
// Include the 'wasOnline' key (last known online status) so the db is only updated when the status has changed
// const { data: isOnline } = await checkDeviceOnline({ deviceID: 'XXXX', wasOnline: true })
export const checkDeviceOnline = functions.https.onCall(async (data, context) => {

    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

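    // Only write to Firestore when the caller passed its previous status and it has changed
    const dbUpdate = async (isOnline: boolean) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            // Await the write so the function does not return before it completes
            await db.collection("devices").doc(deviceID).update({ online: isOnline })
        }

        return isOnline
    }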

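    const deviceLastSeen = () => {
        // We only want to use these to determine "latest seen timestamp"
        const stamps = ["lastHeartbeatTime", "lastEventTime", "lastStateTime", "lastConfigAckTime", "deviceAckTime"]
        // Read the timestamps from the device returned by the API, not from the caller's input data
        const epochs = stamps.map(key => moment(iotDevice[key], "YYYY-MM-DDTHH:mm:ssZ").unix()).filter(epoch => !isNaN(epoch) && epoch > 0)
        // Math.max compares numerically; sort() without a comparator would compare as strings
        return epochs.length ? Math.max(...epochs) : undefined
    }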

    await dm.setAuth()

    const iotDevice: any = await dm.getDevice(deviceID)

    if (!iotDevice) {
        // 'failed-get-device' is not a valid HttpsError code; 'not-found' is
        throw new functions.https.HttpsError('not-found', 'Failed to get device!');
    }

    console.log('iotDevice', iotDevice)

    // If there is no error status and there is last heartbeat time, assume device is online
    if (!iotDevice.lastErrorStatus && iotDevice.lastHeartbeatTime) {
        return dbUpdate(true)
    }

    // Add iotDevice.config.deviceAckTime to root of object
    // For some reason in all my tests, I NEVER receive anything on lastConfigAckTime, so this is my workaround
    if (iotDevice.config && iotDevice.config.deviceAckTime) iotDevice.deviceAckTime = iotDevice.config.deviceAckTime

    // If there is a last error status, let's make sure it's not a stale (old) one
    const lastSeenEpoch = deviceLastSeen()
    const errorEpoch = iotDevice.lastErrorTime ? moment(iotDevice.lastErrorTime, "YYYY-MM-DDTHH:mm:ssZ").unix() : false

    console.log('lastSeen:', lastSeenEpoch, 'errorEpoch:', errorEpoch)

    // Device should be online, the error timestamp is older than latest timestamp for heartbeat, state, etc
    if (lastSeenEpoch && errorEpoch && (lastSeenEpoch > errorEpoch)) {
        return dbUpdate(true)
    }

    // error status code 4 matches
    // lastErrorStatus.code = 4
    // lastErrorStatus.message = mqtt: SERVER: The connection was closed because MQTT keep-alive check failed.
    // will also be 4 for other mqtt errors like command not sent (qos 1 not acknowledged, etc)
    if (iotDevice.lastErrorStatus && iotDevice.lastErrorStatus.code && iotDevice.lastErrorStatus.code === 4) {
        return dbUpdate(false)
    }

    return dbUpdate(false)
})

I also created a function to use with commands, to send a command to the device to check if it's online:

export const isDeviceOnline = functions.https.onCall(async (data, context) => {

    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

    await dm.setAuth()

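    // Only write to Firestore when the caller passed its previous status and it has changed
    const dbUpdate = async (isOnline: boolean) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            console.log('updating db', deviceID, isOnline)
            // Await the write so the function does not return before it completes
            await db.collection("devices").doc(deviceID).update({ online: isOnline })
        } else {
            console.log('NOT updating db', deviceID, isOnline)
        }

        return isOnline
    }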

    try {
        await dm.sendCommand(deviceID, 'alive?', 'alive')
        console.log('Assuming device is online after successful alive? command')
        return dbUpdate(true)
    } catch (error) {
        console.log("Unable to send alive? command", error)
        return dbUpdate(false)
    }
})

This also uses my modified version of DeviceManager. You can find all the example code in this gist (so you get the latest update, and to keep this post small): https://gist.github.com/tripflex/3eff9c425f8b0c037c40f5744e46c319

All of this code, just to check if a device is online or not ... which could be easily handled by Google emitting some kind of event or adding an easy way to handle this. COME ON GOOGLE GET IT TOGETHER!