1
votes

I'm trying to download a CSV file from a website in Data Factory using the HTTP connector as the source linked service in a copy activity. It's basically a web call to a URL that looks like https://www.mywebsite.org/api/entityname.csv?fields=:all&paging=false.

The website uses basic authentication. I have tested manually by opening the URL in a browser and entering the credentials, and everything works fine. I have also used the REST connector in a copy activity to download the data as a JSON file (same URL, just without the ".csv"), and that works fine. But something about the authentication in the HTTP connector is different and is causing issues: when I execute my copy activity, it downloads a CSV file that contains the HTML for the login page of the source website.

While searching, I came across a GitHub issue on the docs suggesting that the basic auth header is not sent with the initial request, which may be causing the problem.

As I have it now, the authentication is defined in the linked service. I'm hoping that maybe I can add something to the Additional Headers or Request Body properties of the source in my copy activity to make this work, but I haven't found the right thing yet.

Suggestions of things to try or code samples of a working copy activity using the HTTP connector and basic auth would be much appreciated.


2 Answers

0
votes

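The HTTP connector expects the API to return a 401 Unauthorized response to the initial request, which is sent without credentials. Only then does the connector resend the request with the basic auth credentials. If the API doesn't return a 401 (for example, because it serves a login page instead), the connector never sends the credentials provided in the HTTP linked service.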
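To make that behaviour concrete, here is a small Python simulation of a client that only sends credentials after a 401 challenge. This is a sketch of the flow described above, not the connector's actual code; the two "server" functions and the client function are hypothetical names made up for illustration.

```python
from typing import Callable, Optional, Tuple

# A "server" is modelled as a function that takes an optional
# Authorization header value and returns (status_code, body).
Server = Callable[[Optional[str]], Tuple[int, str]]

CSV_BODY = "id,name\n1,example\n"
LOGIN_HTML = "<html><body>Please log in</body></html>"


def challenging_server(auth: Optional[str]) -> Tuple[int, str]:
    """Hypothetical API that answers unauthenticated requests with 401."""
    if auth is None:
        return 401, ""
    return 200, CSV_BODY


def login_page_server(auth: Optional[str]) -> Tuple[int, str]:
    """Hypothetical API that answers unauthenticated requests with a 200 login page."""
    if auth is None:
        return 200, LOGIN_HTML
    return 200, CSV_BODY


def challenge_based_client(server: Server, auth_value: str) -> str:
    """Send the first request without credentials; retry with them only on 401."""
    status, body = server(None)
    if status == 401:
        status, body = server(auth_value)
    return body
```

Against the first server the client ends up with the CSV; against the second it never sees a 401, so it keeps the login page HTML, which matches the symptom in the question.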

If that is the case, go to the copy activity source, and in the additional headers property add Authorization: Basic followed by the base64 encoded string of username:password. It should look something like this (where the string at the end is the encoded username:password):

Authorization: Basic ZxN0b2njFasdfkVEH1fU2GM=

It's best if that isn't hard coded into the copy activity but is retrieved from Key Vault and passed as secure input to the copy activity.
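If it helps, the header line described above can be generated with a few lines of Python. This is only a sketch: the function name is my own, the credentials are placeholders, and it assumes the username and password are encoded as UTF-8 before Base64 encoding.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an 'Authorization: Basic ...' header line from credentials."""
    # Basic auth encodes "username:password" as Base64 (UTF-8 assumed here).
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


if __name__ == "__main__":
    # Placeholder credentials; substitute your own, ideally read from a secret store.
    print(basic_auth_header("user", "pass"))
```

Paste the resulting line into the additional headers property of the copy activity source, or store just the encoded part as a secret and build the header from it.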

-1
votes

I suggest you try the REST connector instead of the HTTP one. It supports Basic as an authentication type, and I have verified this using a test endpoint on httpbin.org.

Screenshot of linked service configuration

Above is the configuration for the REST linked service. Once you have created a dataset connected to this linked service, you can include it in your copy activity.

Screenshot of including REST dataset as source in copy activity

Once the pipeline executes, the content of the REST response will be saved in the specified file.

Screenshot of data written to sink file