
I am working on a Salesforce integration for a high-traffic app where we want to automate importing records from Salesforce into our app. To be clear, I am not working on the Salesforce side (i.e., Apex); I am using the Salesforce REST API from within the other app.

The first idea was to use a cutoff time based on when records were created, advancing it on each poll to the creation time of the newest record from the previous poll. We quickly realized this wouldn't work. The query can contain other filters, such as a status field in Salesforce where a record should only be imported after a certain status is set. That makes any check against creation time unreliable, since an older record could later become relevant to our auto-importing.
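For illustration, roughly what that cutoff poll looked like ("Lead", "Status", and the variable names here are stand-ins for the real ones):

    # Hypothetical cutoff-based poll. A Lead created before the cutoff
    # whose Status only changes to 'Qualified' after the cutoff has
    # advanced past its CreatedDate will never match, so it is skipped.
    cutoff = last_poll_max_created_date # e.g. "2021-06-01T00:00:00Z"
    soql = "SELECT #{columns} FROM Lead " \
           "WHERE CreatedDate > #{cutoff} AND Status = 'Qualified'"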

My next idea was to poll the Salesforce API for records every few hours. To avoid importing the same record twice, the only way I could think of was to keep track of the IDs we had already attempted to import and use them in a NOT IN condition:

SELECT #{columns} FROM #{sobject_name} 
WHERE Id NOT IN #{ids_we_already_imported} AND #{other_filters}

My big concern at this point was whether Salesforce limits the length of the WHERE clause. A NOT IN list can't be split across multiple queries the way an IN list can, since every already-imported ID has to appear in the same statement, so the clause only grows over time. Through some research I see there are actually several relevant limits:

https://developer.salesforce.com/docs/atlas.en-us.salesforce_app_limits_cheatsheet.meta/salesforce_app_limits_cheatsheet/salesforce_app_limits_platform_soslsoql.htm

The next thing I considered was querying for all of the IDs in Salesforce that meet the other filter conditions, without checking the ID itself. We could then take that list, remove the IDs we already track on our end, and use the smaller remainder in an IN condition to fetch the full data for only the records we actually need.
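A minimal sketch of that two-phase approach, assuming a hypothetical run_query helper that executes a SOQL string via the REST API and returns the parsed records, and an already_imported_ids list tracked on our side (the chunk size of 500 is an arbitrary guess to stay under the query length limits, not a documented number):

    # Phase 1: fetch only the Ids that currently match the filters.
    matching_ids = run_query("SELECT Id FROM #{sobject_name} WHERE #{other_filters}")
                     .map { |record| record["Id"] }

    # Phase 2: drop the Ids we already imported, then fetch the full
    # rows in IN-clause chunks. Unlike NOT IN, an IN list can be split,
    # because the overall result is just the union of the chunk results.
    new_ids = matching_ids - already_imported_ids
    new_records = new_ids.each_slice(500).flat_map do |chunk|
      id_list = chunk.map { |id| "'#{id}'" }.join(", ")
      run_query("SELECT #{columns} FROM #{sobject_name} WHERE Id IN (#{id_list})")
    end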

This still doesn't seem completely reliable, though. I see a single query only returns up to 2,000 rows, and OFFSET only goes up to 2,000. If we have already imported 2,000 records, the first query might not contain any rows we actually want to import, and we can't offset past them because of these limitations.

With these limitations I can't figure out a reliable way to find the relevant records to import as the number of already-imported records grows. This feels like common usage of a Salesforce integration, but I can't find anything on it. How can I do this without running into problems once we reach high volume?

There's no 2,000-row limit on query; the first 2,000 rows are returned, but you can access the rest of them by GETing the nextRecordsUrl field value. – superfell
Thanks! I had no idea there was a built-in way to get the rest of the results of a query! This would definitely allow the plan I explained above to work. – mikewoj
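For reference, a minimal sketch of draining every page of a query by following nextRecordsUrl, assuming an OAuth access token and instance URL are already in hand (the API version is illustrative):

    require "net/http"
    require "json"
    require "uri"

    def query_all(instance_url, access_token, soql)
      records = []
      path = "/services/data/v52.0/query?q=#{URI.encode_www_form_component(soql)}"
      loop do
        uri = URI.join(instance_url, path)
        request = Net::HTTP::Get.new(uri)
        request["Authorization"] = "Bearer #{access_token}"
        response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
          http.request(request)
        end
        body = JSON.parse(response.body)
        records.concat(body["records"])
        break if body["done"] # true once the last batch has been returned
        path = body["nextRecordsUrl"] # relative URL for the next batch
      end
      records
    end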

1 Answer


Not sure what all of your requirements are or whether the solution needs to be generic, but you could do a few things:

  1. Flag records that have been imported. This means making a call back to Salesforce to update the records, but those calls can be bulkified to reduce their number, and you then modify your query to exclude the flag (see the sketch after this list).
  2. Reverse the flow so you push instead of pull: have Salesforce send records to your app whenever they meet the criteria, using workflow rules and outbound messages.
  3. Use the Streaming API to set up a PushTopic your app can subscribe to, so your app is notified whenever a record meets the criteria.
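A minimal sketch of option 1, assuming a custom checkbox field named Imported__c exists on the object and that run_query/patch_record/import_into_app helpers wrap the REST API and your app (all of these names are hypothetical):

    # Only fetch records that haven't been flagged yet.
    new_records = run_query(
      "SELECT #{columns} FROM #{sobject_name} " \
      "WHERE Imported__c = false AND #{other_filters}"
    )

    new_records.each do |record|
      import_into_app(record)
      # PATCH /services/data/vXX.X/sobjects/<SObject>/<Id>. One call
      # per record here; the Composite or Bulk API can batch these
      # updates to cut down on API call consumption.
      patch_record(sobject_name, record["Id"], "Imported__c" => true)
    end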