Yes, if you are trying to do a stateful upgrade of your Flink application, there are a few things that can cause it to fail.
The UIDs of the stateful operators are used to find the state for each operator. If you haven't set the UIDs explicitly, Flink generates them from the structure of the job graph, so if the job's topology has changed, state restore will fail because Flink won't be able to match the state to its operators. See the docs on Assigning Operator IDs for details.
If you have dropped a stateful operator, then you should run the new job while specifying --allowNonRestoredState (short form: -n).
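To make the ID-matching behaviour concrete, here is a small plain-Java model of it. This is only a sketch and not Flink's actual restore code: the class RestoreSketch, the restore method, and the operator names "dedup" and "counter" are all made up for illustration. It assumes a savepoint can be pictured as a map from operator ID to state, and shows why leftover state for a dropped operator fails the restore unless skipping it is explicitly allowed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of restoring state by operator ID.
// It is NOT Flink code; it only mirrors the matching rule described above.
class RestoreSketch {

    // savepointState: operator ID -> state stored in the savepoint
    // jobOperatorIds: IDs of the stateful operators in the new job
    static Map<String, String> restore(Map<String, String> savepointState,
                                       Set<String> jobOperatorIds,
                                       boolean allowNonRestoredState) {
        Map<String, String> restored = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : savepointState.entrySet()) {
            String operatorId = entry.getKey();
            if (jobOperatorIds.contains(operatorId)) {
                // The ID matches an operator in the new job: hand the state over.
                restored.put(operatorId, entry.getValue());
            } else if (!allowNonRestoredState) {
                // State with no matching operator fails the restore by default.
                throw new IllegalStateException(
                    "Savepoint state for operator '" + operatorId
                        + "' cannot be mapped to the new job");
            }
            // Otherwise the unmatched state is silently skipped.
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<String, String> savepoint = new LinkedHashMap<>();
        savepoint.put("dedup", "seen-keys");
        savepoint.put("counter", "counts");

        // The new job dropped the "dedup" operator.
        Set<String> newJob = Collections.singleton("counter");

        System.out.println(restore(savepoint, newJob, true));
        try {
            restore(savepoint, newJob, false);
        } catch (IllegalStateException e) {
            System.out.println("restore failed: " + e.getMessage());
        }
    }
}
```

In this model, auto-generated IDs correspond to keys that change whenever the topology changes, which is why none of the old state would match after a restructuring unless the IDs were pinned explicitly.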
If you have modified your data types, the job can fail when attempting to deserialize the state in the checkpoint or savepoint. Flink 1.7 did not have any support for automatic schema evolution or state migration. In more recent versions of Flink, if you stick to POJOs or Avro, this is handled automatically. Otherwise you need custom serializers.
If this doesn't help you figure out what's going wrong, please share the information from the logs showing the specific exception.