In the previous exercise, you issued a simple query against a single file in an Azure Data Lake Store. To make things more interesting, you are now going to create a SQL database in your Azure subscription and configure it to serve as a federated data source in Data Lake Analytics. This will allow you to not only query the database with U-SQL, but also join data from the database to data already residing in your Data Lake Store. This demonstrates the power of Azure Data Lake as a distributed storage and analytics engine.
Enabling federated queries requires a series of steps: you will create a storage account, upload a database backup to it, import that backup into a new Azure SQL Database server, open the server's firewall, and register the database with Data Lake Analytics.
Let's get started!
In the Azure Portal, click + Create a resource, followed by Storage and Storage account.
Adding a storage account
In the ensuing "Create storage account" blade, enter a name for the new storage account in the Name field. Storage account names must be 3 to 24 characters in length and can only contain numbers and lowercase letters. In addition, the name you enter must be unique within Azure. If someone else has already chosen the same name, a red exclamation mark in the Name field will notify you that the name isn't available.
Once you have a name that Azure will accept, select Use existing under Resource group and select the resource group that you created in Exercise 1. Then select the location that you selected for the Data Lake Store. Finish up by clicking the Create button at the bottom of the blade to create the new storage account.
Creating a new storage account
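If you prefer the command line, the same storage account can be created with the Azure CLI. This is an optional sketch; STORAGE_ACCOUNT_NAME, RESOURCE_GROUP_NAME, and LOCATION are placeholders you would replace with your own values:

```shell
# Create a general-purpose storage account (all uppercase names are placeholders).
# Storage account names must be 3 to 24 characters, lowercase letters and numbers only.
az storage account create \
    --name STORAGE_ACCOUNT_NAME \
    --resource-group RESOURCE_GROUP_NAME \
    --location LOCATION \
    --sku Standard_LRS
```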
Click Resource groups in the ribbon on the left side of the portal, and then click the resource group that holds the storage account. Wait for the storage account to finish deploying. Then click the storage account to open a blade for that account.
Opening the storage account
You need to create a container in the storage account to hold the database backup. To begin, click Blobs in the storage account's blade.
Opening blob storage
Click + Container to create a new container. Enter the name "bacpacs" (without quotation marks) for the new blob container, and then click OK.
Adding a container
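The same container can be created from the Azure CLI if you prefer. A minimal sketch, assuming the storage account name placeholder is replaced with yours:

```shell
# Create the "bacpacs" blob container in the storage account
# (STORAGE_ACCOUNT_NAME is a placeholder).
az storage container create \
    --name bacpacs \
    --account-name STORAGE_ACCOUNT_NAME
```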
Click the "bacpacs" container to open it.
Opening the "bacpacs" container
Uploading to the "bacpacs" container
Click the Open button to the right of the Files box. Navigate to the folder containing the lab resources and select the file named academics-stackexchange-users.bacpac. Then click the Upload button to upload the file to the container.
Uploading a file
Wait until the upload has completed. Then close the "Upload blob" blade and return to the blade for the "bacpacs" container. Confirm that the container now contains a blob named academics-stackexchange-users.bacpac.
The uploaded blob
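As a command-line alternative to the upload blade, the .bacpac file can be uploaded with the Azure CLI. A sketch, assuming the file is in the current directory and STORAGE_ACCOUNT_NAME is replaced with yours:

```shell
# Upload the database backup to the "bacpacs" container
# (STORAGE_ACCOUNT_NAME and the local file path are placeholders).
az storage blob upload \
    --account-name STORAGE_ACCOUNT_NAME \
    --container-name bacpacs \
    --name academics-stackexchange-users.bacpac \
    --file ./academics-stackexchange-users.bacpac
```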
The next step is to create a new SQL database server. In the Azure portal, click More services in the ribbon on the left and type "sql" (without quotation marks) in the search box. Then click SQL servers:
Searching for SQL servers
Click + Add in the "SQL servers" blade. In the "SQL Server" blade, enter a unique name for your SQL server. (It must be unique across all of Azure; be sure a green check mark appears in the box.) Enter "azureuser" as the user name, and "Azure4Research!" as the password. Under Resource group, select Use existing and select the same resource group you have used throughout this lab. For Location, select the same location you selected in previous exercises. When you're finished, click the Create button at the bottom of the blade.
Creating a new SQL server
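If you would rather script this step, the logical SQL server can also be created with the Azure CLI. A sketch using the same user name and password as the portal instructions; the server name, resource group, and location are placeholders:

```shell
# Create the logical SQL server (uppercase names are placeholders;
# the admin user and password match the values used in this exercise).
az sql server create \
    --name DATABASE_SERVER_NAME \
    --resource-group RESOURCE_GROUP_NAME \
    --location LOCATION \
    --admin-user azureuser \
    --admin-password 'Azure4Research!'
```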
Next, you need to create a new database instance using the blob you uploaded earlier. Wait until the new SQL server is created. (It will probably take a minute or two. Click the Refresh button in the "SQL servers" blade from time to time until it appears.) In the "SQL servers" blade, click the SQL server you just created.
Opening the SQL server
Click Import database.
Importing a database
In the "Import database" blade, click Storage and select the storage account that you uploaded the .bacpac file to, followed by the "bacpacs" container and, after that, the blob you uploaded to that container. Then click the Select button at the bottom of the blade. Return to the "Import database" blade and enter "azureuser" as the user name and "Azure4Research!" as the password. Finish up by clicking OK at the bottom of the blade.
Specifying database import options
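The import can also be performed from the Azure CLI. A sketch, assuming the resource names are replaced with yours; the storage key is a placeholder you can retrieve with az storage account keys list:

```shell
# Import the .bacpac into a new database named after the backup
# (uppercase names, including STORAGE_ACCOUNT_KEY, are placeholders).
az sql db import \
    --resource-group RESOURCE_GROUP_NAME \
    --server DATABASE_SERVER_NAME \
    --name academics-stackexchange-users \
    --admin-user azureuser \
    --admin-password 'Azure4Research!' \
    --storage-key-type StorageAccessKey \
    --storage-key STORAGE_ACCOUNT_KEY \
    --storage-uri https://STORAGE_ACCOUNT_NAME.blob.core.windows.net/bacpacs/academics-stackexchange-users.bacpac
```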
While you wait for the database instance to be created, click Show firewall settings in the database-server blade.
Viewing firewall settings
Add an IP range to allow Data Lake Analytics to communicate with your server during federated query execution. Type the following values into the three input fields, and then click Save at the top of the blade:
Configuring the firewall
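The same firewall rule can be added from the Azure CLI. A sketch; the rule name is hypothetical, and START_IP and END_IP stand in for the values shown in the portal instructions:

```shell
# Allow Data Lake Analytics to reach the server during federated queries
# (rule name, IP range, and resource names are placeholders).
az sql server firewall-rule create \
    --resource-group RESOURCE_GROUP_NAME \
    --server DATABASE_SERVER_NAME \
    --name AllowDataLakeAnalytics \
    --start-ip-address START_IP \
    --end-ip-address END_IP
```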
Now that you have a SQL database instance up and running, the final step is to register it with Data Lake Analytics for federation. Navigate back to your Data Lake Analytics account in the portal and click + New Job at the top of the blade. In the "New U-SQL Job" blade, enter the following statement and then click Submit Job to run the job:
CREATE DATABASE UserIntegration;
If you haven't installed the Azure CLI, take a moment to install it now. Then open a command shell (Bash, Terminal, Command Prompt, etc.) and execute the following command to log in to your Azure account:

az login
As an alternative to installing the Azure CLI, you can use the Azure Cloud Shell available in the Azure Portal. Simply open the portal in your browser and click the Cloud Shell button in the toolbar at the top of the page. One of the benefits of using the Cloud Shell is that you're always running an up-to-date version. Note that you can use Shift+Insert to paste commands into the Cloud Shell, and Ctrl+Insert to copy text from the Cloud Shell to the clipboard.
Opening the Azure Cloud Shell
Copy the access code presented to you in the command shell to the clipboard. Then open a browser window and navigate to https://aka.ms/devicelogin and enter the code. If prompted to sign in, do so using your Microsoft account. Upon successful authentication, your command-line session will be connected to your Azure subscription.
Assuming you are using the Azure Pass subscription provided to you for these labs, execute the following command to ensure that Azure Pass is the active subscription (the subscription that will be charged against) for operations performed with the CLI, replacing SUBSCRIPTION_NAME_OR_ID with the subscription's name or ID:
az account set --subscription "SUBSCRIPTION_NAME_OR_ID"
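To confirm that the switch took effect, you can ask the CLI which subscription is now active (a quick sanity check; the output will vary with your subscription):

```shell
# Print the name of the currently active subscription.
az account show --query name --output tsv
```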
Now execute the following command to create a Data Lake catalog credential used to authenticate when executing federated queries. Substitute your Data Lake Analytics account name for ANALYTICS_ACCOUNT_NAME, your database server name (the one specified in Step 11 of this exercise) for DATABASE_SERVER_NAME, and the database server user (also specified in Step 11) for DATABASE_USER:

az dla catalog credential create \
    --account ANALYTICS_ACCOUNT_NAME \
    --database-name UserIntegration \
    --uri tcp://DATABASE_SERVER_NAME.database.windows.net \
    --credential-name FederatedCredential \
    --user-name DATABASE_USER
When prompted for a password, enter the SQL server password ("Azure4Research!") you specified in Step 11.
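If you would rather not be prompted, the credential can usually be created non-interactively by passing the password inline. This is an assumption worth checking against az dla catalog credential create --help for your CLI version:

```shell
# Non-interactive variant: supply the password inline instead of at the prompt
# (uppercase names are placeholders, as in the command above).
az dla catalog credential create \
    --account ANALYTICS_ACCOUNT_NAME \
    --database-name UserIntegration \
    --uri tcp://DATABASE_SERVER_NAME.database.windows.net \
    --credential-name FederatedCredential \
    --user-name DATABASE_USER \
    --password 'Azure4Research!'
```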
Return to your Data Lake Analytics account in the Azure Portal. Then click + New Job and execute the following query:
USE DATABASE UserIntegration;

CREATE DATA SOURCE IF NOT EXISTS AcademicSEDb FROM AZURESQLDB WITH
(
    PROVIDER_STRING = "Database=academics-stackexchange-users;Trusted_Connection=False;Encrypt=True",
    CREDENTIAL = FederatedCredential,
    REMOTABLE_TYPES = (bool, byte, sbyte, short, ushort, int, uint, long, ulong, decimal, float, double, string, DateTime)
);

CREATE EXTERNAL TABLE User
(
    [id] int,
    [reputation] int,
    [created] DateTime,
    [displayname] string,
    [lastaccess] DateTime,
    [location] string
)
FROM AcademicSEDb LOCATION "dbo.User";
This query registers your SQL database as a data source authenticated with the credential you created in Step 21, and then creates an external table in your Data Lake Analytics database that is backed by the dbo.User table in the SQL database.
That was a lot of work, but you are now ready to issue federated queries. Let's try it out!