
I am new to MS Azure. I am trying to download the Microsoft Academic Graph (MAG) for various analyses, and they don't offer bulk downloads of the structured dataset. External sources such as openacademicgraph weren't really useful, so I thought I could try getting the data through Azure.

Luckily, there is a guide for exactly that: "Get Microsoft Academic Graph on Azure storage" (docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning).

I followed the steps in the guide to create an Azure account for MAG and received the following email from the Academic Knowledge API:


Welcome to the Microsoft Academic Graph (MAG) Azure Storage (AS) Distribution preview. Please be advised that this distribution is in free preview stage. Pricing structure is subject to change.

Your Azure Storage is successfully setup to receive MAG update through Azure Data Factory. Each MAG dataset is provisioned to a separate container named "mag-yyyy-mm-dd". The 2020-02-14 dataset was pushed to your Azure Storage.

As MAG comes with ODC-BY license, you are granted the rights to add values and redistribute the derivatives based on the terms of the open data license, e.g., the attribution to MAG in your products, services or community events.

Each snapshot of MAG will show up in your Azure Storage as a distinct container. In Microsoft Academic Graph documentation, you could find a sample to extract knowledge from MAG for your application using Azure Databricks. Also there is a sample using U-SQL, a member of Azure Data Lake Analytic Framework.

We also put together great Analytics and visualization samples that we used for our WWW Conference Analytics blog post. We hope this can help accelerate your development process and spark imagination!


The next step was "Set up Azure Databricks for Microsoft Academic Graph" (docs.microsoft.com/en-us/academic-services/graph/get-started-setup-databricks), which I also followed. I was able to create an Azure Databricks workspace for MAG (I have no idea what that is, as I'm new to this), but now I cannot get its cluster to run.

Following is the error message I get:


Message

Cluster terminated. Reason: Cloud Provider Launch Failure

A cloud provider error was encountered while launching worker nodes. See the Databricks guide for more information.

Azure error code: OperationNotAllowed

Azure error message: Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: centralus, Current Limit: 4, Current Usage: 4, Additional Required: 4, (Minimum) New Limit Required: 8. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/~~~ by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests.


I'm not sure what I'm supposed to do.

The message says the "Total Regional Cores quota" is exceeded, not a limit of my personal subscription. How would I ask to increase the quota for the whole region? It says I need to request a larger quota, which cannot be done with the Free Trial account I created as per the guide. Does this mean the guide is wrong, and I have to switch to Pay-As-You-Go?

It also says "Current Usage: 4", but I am not using anything at the moment. All I have is Azure Storage and a Databricks cluster, neither of which is running. I retried starting the cluster, and the second time it started successfully, only to be terminated a couple of minutes later with the same error message.

I'm not going to do any complex querying, which would be pretty expensive. Being a poor researcher, all I want is the dataset following the MAG schema; I will run my analyses on my desktop, which is free, if slower. Any help would be really appreciated.
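Since the email says each snapshot lands in a container named "mag-yyyy-mm-dd" in your own storage account, the files can in principle be copied to a desktop without starting a cluster at all. Below is a minimal sketch, not taken from the MAG guide, assuming the `azure-storage-blob` (v12) Python package and the storage account's connection string in an environment variable named `AZURE_STORAGE_CONNECTION_STRING` (a name chosen for illustration). The container name `mag-2020-02-14` comes from the email; the local folder layout is an arbitrary choice.

```python
import os
from pathlib import Path


def local_path_for(blob_name: str, dest_root: Path) -> Path:
    """Map a blob name such as 'folder/file.txt' to a path under dest_root."""
    # Blob names use '/' as a virtual directory separator; keep that layout
    # locally, dropping empty, '.' and '..' segments so nothing escapes dest_root.
    parts = [p for p in blob_name.split("/") if p not in ("", ".", "..")]
    return dest_root.joinpath(*parts)


def download_container(conn_str: str, container: str, dest_root: Path) -> int:
    """Download every blob in `container` under dest_root; return the file count."""
    # Imported here so local_path_for can be used without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    container_client = service.get_container_client(container)
    count = 0
    for blob in container_client.list_blobs():
        target = local_path_for(blob.name, dest_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "wb") as fh:
            # Stream the blob's contents into the local file.
            container_client.download_blob(blob.name).readinto(fh)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical variable name; set it to the storage account's connection string.
    conn = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
    if conn:
        n = download_container(conn, "mag-2020-02-14", Path("mag-2020-02-14"))
        print(f"downloaded {n} files")
    else:
        print("set AZURE_STORAGE_CONNECTION_STRING first")
```

The full graph is large, and outbound data transfer from Azure may be billed, so it could be worth checking the container's total size before downloading everything.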


5 Answers

4
votes

To try Azure Databricks, you need a “Pay-As-You-Go” subscription.

The Azure Free Trial has a limit of 4 cores, and you cannot create an Azure Databricks cluster with a Free Trial subscription, because creating a Spark cluster requires more than 4 cores.

If you have a free account, go to your profile and change your subscription to pay-as-you-go. Then, remove the spending limit, and request a quota increase for vCPUs in your region. When you create your Azure Databricks workspace, you can select the Trial (Premium - 14-Days Free DBUs) pricing tier to give the workspace access to free Premium Azure Databricks DBUs for 14 days.
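Before and after the quota request, you can check how much of the regional vCPU quota is actually in use. A minimal sketch, assuming the `azure-identity` and `azure-mgmt-compute` packages, and assuming the "Total Regional vCPUs" quota is reported under the machine-readable name `cores` (as in `az vm list-usage` output); the `AZURE_SUBSCRIPTION_ID` variable name is hypothetical. VM-family quotas (for example the one covering DSv2 sizes) are returned by the same call and may also need raising.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class QuotaRow:
    name: str     # machine-readable quota name, e.g. "cores" (assumed)
    current: int  # vCPUs currently allocated
    limit: int    # approved quota


def headroom(rows: Iterable[QuotaRow], quota_name: str = "cores") -> Optional[int]:
    """Return limit - current for the named quota, or None if it is not listed."""
    for row in rows:
        if row.name == quota_name:
            return row.limit - row.current
    return None


def fetch_rows(subscription_id: str, location: str) -> list:
    """Query the compute usage/quota list for one region."""
    # Assumes azure-identity and azure-mgmt-compute are installed and you are logged in.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    return [
        QuotaRow(u.name.value, u.current_value, u.limit)
        for u in client.usage.list(location)
    ]


if __name__ == "__main__":
    sub = os.environ.get("AZURE_SUBSCRIPTION_ID")  # hypothetical variable name
    if sub:
        rows = fetch_rows(sub, "centralus")  # region from the error message
        print("free regional vCPUs:", headroom(rows))
    else:
        print("set AZURE_SUBSCRIPTION_ID first")
```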

For more details, refer to "Sign up for a Free Azure Databricks Trial".

2
votes

You can try most of the examples with Databricks Community Edition:

https://community.cloud.databricks.com/login.html

0
votes

Your subscription has a limit of 4 cores in total. The picture in your guide (https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-databricks) shows a Databricks cluster set up with Standard_DS3_v2 VMs and a minimum of 2 workers. The picture further shows that the DS3 VM has 4 cores, so that is 2x4 = 8 cores for your subscription.

You need to either set the minimum to 1 worker or use a smaller VM size. I'd also recommend turning off autoscaling to avoid issues.
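The sizing arithmetic in this answer can be written as a small check to run before launching a cluster. This sketch adds one assumption beyond the answer: on Azure Databricks the driver runs on its own VM, so by default its cores are counted toward the quota as well (`include_driver=True`). The 4-vCPU figure for Standard_DS3_v2 comes from the guide's picture and the 4-vCPU limit from the error message.

```python
from typing import Optional


def cores_needed(worker_cores: int, workers: int,
                 driver_cores: Optional[int] = None,
                 include_driver: bool = True) -> int:
    """vCPUs a fixed-size cluster allocates; the driver defaults to the worker VM size."""
    total = worker_cores * workers
    if include_driver:
        total += worker_cores if driver_cores is None else driver_cores
    return total


def fits(quota: int, in_use: int, needed: int) -> bool:
    """True if allocating `needed` more vCPUs stays within the regional quota."""
    return in_use + needed <= quota


DS3_V2_CORES = 4  # Standard_DS3_v2, as shown in the MAG setup guide
QUOTA = 4         # Free Trial regional limit reported in the error message

# 0 workers corresponds to a driver-only (single-node) cluster.
for workers in (2, 1, 0):
    need = cores_needed(DS3_V2_CORES, workers)
    print(f"{workers} worker(s): {need} vCPUs, fits: {fits(QUOTA, 0, need)}")
```

Counting only the workers reproduces the 2x4 = 8 figure; counting the driver as well, only a driver-only DS3_v2 cluster fits within 4 vCPUs.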

0
votes

Using a free Azure subscription and the trial tier for Databricks, I got the same error while doing this module: https://docs.microsoft.com/en-us/learn/modules/describe-azure-databricks/

When creating the cluster, I changed the cluster mode from 'standard' to 'none', which solved the problem; I could then run the Python notebook.

0
votes

I changed the cluster mode to Single Node. That works for me.
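For reference, a Single Node cluster can also be created through the Databricks Clusters API (`POST /api/2.0/clusters/create`). A minimal sketch using only the Python standard library; the workspace URL, personal access token, cluster name and the runtime string `7.3.x-scala2.12` are placeholders to replace with your own (the runtimes a workspace offers can be listed with `GET /api/2.0/clusters/spark-versions`). The `spark_conf` and `custom_tags` entries follow the single-node settings described in the Databricks documentation.

```python
import json
import urllib.request


def single_node_spec(name: str, spark_version: str,
                     node_type: str = "Standard_DS3_v2") -> dict:
    """Cluster spec for a driver-only ("Single Node") Databricks cluster."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # must be a runtime your workspace offers
        "node_type_id": node_type,
        "num_workers": 0,                # no worker VMs: only the driver is allocated
        "autotermination_minutes": 30,   # stop the VM when idle to save credit
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


def create_cluster(host: str, token: str, spec: dict) -> dict:
    """POST the spec to the Clusters API and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the request body; call create_cluster(host, token, spec) to send it.
    print(json.dumps(single_node_spec("mag-single-node", "7.3.x-scala2.12"), indent=2))
```

With `num_workers` set to 0 only the driver VM is allocated, so a 4-vCPU driver size stays within a 4-vCPU regional quota.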