0 votes

I have Neo4j operational on Azure. I can load data using python and a series of create statements:

create (n:Person) return n

I can query successfully using python.
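A minimal sketch of this kind of per-statement loading, assuming the official `neo4j` Python driver (the connection details mentioned in the comments are placeholders, not from the question):

```python
# Sketch (assumption): running parameterized CREATE statements from Python
# with the official neo4j driver; URI, user, and password are placeholders.
def build_create_statement(label):
    # One parameterized CREATE per row; $props is bound when the query runs.
    return f"CREATE (n:{label} $props) RETURN n"

def load_rows(session, rows, label="Person"):
    # session is a neo4j.Session, e.g. from
    # GraphDatabase.driver("bolt://<host>:7687", auth=("neo4j", "<pw>")).session()
    stmt = build_create_statement(label)
    for row in rows:
        session.run(stmt, props=row)
```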

Using LOAD CSV requires a file in the Neo4j import directory. I've located that directory, but moving a file into it is blocked. I've also tried putting the file in an accessible directory, but then cannot figure out how to address that path in the LOAD CSV statement.

This LOAD gives an error because the file cannot get into the Neo4j import directory:

USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM 'file:///FTDNATree.csv' AS line FIELDTERMINATOR '|' merge (s:SNPNode{SNP:toString(line.Parent)})

This statement does not find the file and gives an error: EXTERNAL file not found

USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM 'file:///{my directory path/}FTDNATree.csv' AS line FIELDTERMINATOR '|' merge (s:SNPNode{SNP:toString(line.Parent)})

Even though the python code and Neo4j are in the same resource group, they run on different VMs. Could the problem be the interoperability between the two VMs?


2 Answers

1 vote

If you have access to neo4j.conf, you can change the value of dbms.directories.import to point to a directory that is accessible to you.

See https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.directories.import
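For example, a sketch of the relevant neo4j.conf line (the path shown is only a placeholder):

```
# neo4j.conf: point the import directory at a directory you can write to
dbms.directories.import=/path/to/accessible/import
```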

0 votes

The solution was NOT well documented in one place, but here is what evolved by trial and error and which works:

  1. I created a storage account within the resource group.
  2. Created a directory, accessible from code, in which the upload file was placed.
  3. Added a container, called neo4j-import.
  4. I could then transfer the file (i.e., the *.csv file) to the container as a blob.
  5. You then need to make the file accessible. This involves creating a SAS token and attaching it to a URL pointing to the container and the file (see the python code below).
  6. You can test this URL in your local browser. It should retrieve the file, which is not accessible without the SAS token.
  7. This URL is used in the LOAD CSV statement and successfully loads the Neo4j database.

The code for step 4 (indentation cleaned up; ImportDirectory and AzureBlobConnectionString are defined elsewhere in my module):

    from azure.storage.blob import (BlobServiceClient, BlobClient,
        ContainerClient, generate_account_sas, ResourceTypes,
        AccountSasPermissions)

    def UploadFileToDataStorage(FileName,
                                UploadFileSourceDirectory=ImportDirectory,
                                BlobConnStr=AzureBlobConnectionString,
                                Container="neo4j-import"):
        # uploads the file as a blob to data storage
        # https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python#upload-blobs-to-a-container
        blob_service_client = BlobServiceClient.from_connection_string(BlobConnStr)
        blob_client = blob_service_client.get_blob_client(container=Container, blob=FileName)
        with open(UploadFileSourceDirectory + FileName, "rb") as data:
            blob_client.upload_blob(data)

The key python code (step 5 above); note the datetime imports it needs:

    from datetime import datetime, timedelta

    def GetBlobURLwithSAS(FileName, Container="neo4j-import"):
        # https://pypi.org/project/azure-storage-blob/
        # https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient?view=azure-python
        # generates a SAS token for the blob so it can be consumed by another process
        sas_token = generate_account_sas(
            account_name="{storage account name}",
            account_key="{storage acct key}",
            resource_types=ResourceTypes(service=False, container=False, object=True),
            permission=AccountSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1))
        return ("https://{storage account name}.blob.core.windows.net/"
                + Container + "/" + FileName + "?" + sas_token)

The LOAD statement looks like this and does not use the file:/// prefix:

LOAD CSV WITH HEADERS FROM '{URL from above}' AS line FIELDTERMINATOR '|' {your cypher query for loading csv}
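Put together, the statement can be assembled in Python like this (a sketch; the MERGE clause is the one from the question, and the url value is whatever GetBlobURLwithSAS returned):

```python
# Sketch: building the LOAD CSV statement around the SAS URL.
# No file:/// prefix is needed because the source is an HTTPS URL.
def build_load_csv_query(url):
    return ("LOAD CSV WITH HEADERS FROM '" + url + "' AS line "
            "FIELDTERMINATOR '|' "
            "MERGE (s:SNPNode {SNP: toString(line.Parent)})")
```

The resulting string is then executed like any other Cypher query, e.g. session.run(build_load_csv_query(url)).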

I hope this helps other to navigate this scenario!