Exercise 5: Create an Azure SQL Database as a federated data source

In the previous exercise, you issued a simple query against a single file in an Azure Data Lake Store. To make things more interesting, you are now going to create a SQL database in your Azure subscription and configure it to serve as a federated data source in Data Lake Analytics. This will allow you to not only query the database with U-SQL, but also join data from the database to data already residing in your Data Lake Store. This demonstrates the power of Azure Data Lake as a distributed storage and analytics engine.

Enabling federated queries will require a series of steps:

  • Create an Azure storage account in your Azure subscription
  • Upload a SQL database backup file (a .bacpac file) to the new storage account
  • Create a new SQL database in your Azure subscription and restore it from the .bacpac file
  • Configure your Data Lake Analytics account to query against the database

Let's get started!

  1. In the Azure Portal, click + Create a resource, followed by Storage and Storage account.

    Adding a storage account

  2. In the ensuing "Create storage account" blade, enter a name for the new storage account in the Name field. Storage account names must be 3 to 24 characters in length and can contain only numbers and lowercase letters. In addition, the name you enter must be unique within Azure; if someone else has already taken it, a red exclamation mark in the Name field will indicate that the name isn't available.

    Once you have a name that Azure will accept, select Use existing under Resource group and select the resource group that you created in Exercise 1. Then, under Location, choose the same location you selected for the Data Lake Store. Finish up by clicking the Create button at the bottom of the blade to create the new storage account.

    Creating a new storage account
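
    If you prefer scripting to the portal, the storage account can also be created with the Azure CLI (introduced in Step 18). This is a sketch; the resource-group name, account name, and region below are placeholders that you should replace with the values you used in Exercise 1:

    ```shell
    # Placeholder names -- substitute the resource group and region you
    # chose in Exercise 1, and a globally unique storage-account name
    RESOURCE_GROUP=DataLakeLabGroup
    STORAGE_ACCOUNT=datalakelabstorage01   # 3-24 chars, lowercase letters and digits only

    # Create a general-purpose storage account with locally redundant storage
    az storage account create \
        --name "$STORAGE_ACCOUNT" \
        --resource-group "$RESOURCE_GROUP" \
        --location eastus \
        --sku Standard_LRS
    ```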

  3. Click Resource groups in the ribbon on the left side of the portal, and then click the resource group that holds the storage account. Wait for the storage account to finish deploying. Then click the storage account to open a blade for that account.

    Opening the storage account

  4. You need to create a container in the storage account to hold the database backup. To begin, click Blobs in the storage account's blade.

    Opening blob storage

  5. Click + Container to create a new container. Enter the name "bacpacs" (without quotation marks) for the new blob container, and then click OK.

    Adding a container
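
    The container can likewise be created from the Azure CLI; a sketch, assuming the placeholder storage-account name used above:

    ```shell
    # Create the "bacpacs" blob container in the storage account
    # (datalakelabstorage01 is a placeholder account name)
    az storage container create \
        --name bacpacs \
        --account-name datalakelabstorage01
    ```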

  6. Click the "bacpacs" container to open it.

    Opening the "bacpacs" container

  7. Click Upload.

    Uploading to the "bacpacs" container

  8. Click the Open button to the right of the Files box. Navigate to the folder containing the lab resources and select the file named academics-stackexchange-users.bacpac. Then click the Upload button to upload the file to the container.

    Uploading a file

  9. Wait until the upload has completed. Then close the "Upload blob" blade and return to the blade for the "bacpacs" container. Confirm that the container now contains a blob named academics-stackexchange-users.bacpac.

    The uploaded blob
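
    Steps 7 through 9 can also be performed from the Azure CLI; a sketch with a placeholder account name, run from the folder containing the lab resources:

    ```shell
    # Upload the .bacpac file from the current directory to the "bacpacs" container
    az storage blob upload \
        --account-name datalakelabstorage01 \
        --container-name bacpacs \
        --name academics-stackexchange-users.bacpac \
        --file academics-stackexchange-users.bacpac
    ```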

  10. The next step is to create a new SQL database server. In the Azure portal, click More services in the ribbon on the left and type "sql" (without quotation marks) in the search box. Then click SQL servers:

    Searching for SQL servers

  11. Click + Add in the "SQL servers" blade. In the "SQL Server" blade, enter a unique name for your SQL server. (It must be unique across all of Azure; be sure a green check mark appears in the box.) Enter "azureuser" as the user name, and "Azure4Research!" as the password. Under Resource group, select Use existing and select the same resource group you have used throughout this lab. For Location, select the same location you selected in previous exercises. When you're finished, click the Create button at the bottom of the blade.

    Creating a new SQL server
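
    For reference, the equivalent Azure CLI command (a sketch; the server name and resource group below are placeholders):

    ```shell
    # Create the logical SQL server with the admin credentials used in this lab
    az sql server create \
        --name datalakelab-sqlserver \
        --resource-group DataLakeLabGroup \
        --location eastus \
        --admin-user azureuser \
        --admin-password 'Azure4Research!'
    ```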

  12. Next, you need to create a new database instance using the blob you uploaded earlier. Wait until the new SQL server is created. (It will probably take a minute or two. Click the Refresh button in the "SQL servers" blade from time to time until it appears.) In the "SQL servers" blade, click the SQL server you just created.

    Opening the SQL server

  13. Click Import database.

    Importing a database

  14. In the "Import database" blade, click Storage and select the storage account that you uploaded the .bacpac file to, followed by the "bacpacs" container and, after that, the blob you uploaded to that container. Then click the Select button at the bottom of the blade. Return to the "Import database" blade and enter "azureuser" as the user name and "Azure4Research!" as the password. Finish up by clicking OK at the bottom of the blade.

    Specifying database import options
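
    The import can be scripted as well. Note that, unlike the portal's Import database command, the CLI's `az sql db import` targets an existing database, so this sketch (placeholder names throughout) creates an empty database first:

    ```shell
    # Look up a storage-account key so the import can read the .bacpac blob
    STORAGE_KEY=$(az storage account keys list \
        --account-name datalakelabstorage01 \
        --resource-group DataLakeLabGroup \
        --query "[0].value" --output tsv)

    # Create an empty database, then import the .bacpac into it
    az sql db create \
        --resource-group DataLakeLabGroup \
        --server datalakelab-sqlserver \
        --name academics-stackexchange-users

    az sql db import \
        --resource-group DataLakeLabGroup \
        --server datalakelab-sqlserver \
        --name academics-stackexchange-users \
        --admin-user azureuser \
        --admin-password 'Azure4Research!' \
        --storage-key-type StorageAccessKey \
        --storage-key "$STORAGE_KEY" \
        --storage-uri "https://datalakelabstorage01.blob.core.windows.net/bacpacs/academics-stackexchange-users.bacpac"
    ```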

  15. While you wait for the database instance to be created, click Show firewall settings in the database-server blade.

    Viewing firewall settings

  16. Add an IP range to allow Data Lake Analytics to communicate with your server during federated query execution. Type the following values into the three input fields, and then click Save at the top of the blade:

    • Rule Name: Allow Data Lake
    • Start IP: 25.66.0.0
    • End IP: 25.66.255.255

    Configuring the firewall
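
    The firewall rule can also be added from the Azure CLI (a sketch; CLI rule names can't contain spaces, so the hypothetical name AllowDataLake is used here):

    ```shell
    # Open the IP range used by Data Lake Analytics for federated queries
    az sql server firewall-rule create \
        --resource-group DataLakeLabGroup \
        --server datalakelab-sqlserver \
        --name AllowDataLake \
        --start-ip-address 25.66.0.0 \
        --end-ip-address 25.66.255.255
    ```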

  17. Now that you have a SQL database instance up and running, the final step is to register it with Data Lake Analytics for federation. Navigate back to your Data Lake Analytics account in the portal and click + New Job at the top of the blade. In the "New U-SQL Job" blade, enter the following statement and then click Submit Job to run the job:

     CREATE DATABASE UserIntegration;
    
  18. If you haven't installed the Azure CLI, take a moment to install it now. Then open a command shell (Bash, Terminal, Command Prompt, etc.) and execute the following command:

     az login
    

    As an alternative to installing the Azure CLI, you can use the Azure Cloud Shell available in the Azure Portal. Simply open the portal in your browser and click the Cloud Shell button in the toolbar at the top of the page. One of the benefits of using the Cloud Shell is that you're always running an up-to-date version. Note that you can use Shift+Insert to paste commands into the Cloud Shell, and Ctrl+Insert to copy text from the Cloud Shell to the clipboard.

    Opening the Azure Cloud Shell

  19. Copy the access code presented to you in the command shell to the clipboard. Then open a browser window and navigate to https://aka.ms/devicelogin and enter the code. If prompted to sign in, do so using your Microsoft account. Upon successful authentication, your command-line session will be connected to your Azure subscription.

  20. Assuming you are using the Azure Pass subscription provided to you for these labs, execute the following command to ensure that Azure Pass is the active subscription (the subscription that will be charged against) for operations performed with the CLI, replacing SUBSCRIPTION_NAME_OR_ID with the subscription's name or ID:

     az account set --subscription "SUBSCRIPTION_NAME_OR_ID"
    
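    If you're not sure which subscriptions are available to the account you signed in with, you can list them first:

    ```shell
    # List the subscriptions visible to the signed-in account
    az account list --output table
    ```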
  21. Now execute the following command to create a Data Lake catalog credential used to authenticate when executing federated queries. Substitute your Data Lake Analytics account name for ANALYTICS_ACCOUNT_NAME, your database server name (the one specified in Step 11 of this exercise) for DATABASE_SERVER_NAME, and the database server user name (also specified in Step 11) for DATABASE_USER:

     az dla catalog credential create --account ANALYTICS_ACCOUNT_NAME --database-name UserIntegration --uri tcp://DATABASE_SERVER_NAME.database.windows.net --credential-name FederatedCredential --user-name DATABASE_USER

    When prompted for a password, enter the SQL server password ("Azure4Research!") you specified in Step 11.

  22. Return to your Data Lake Analytics account in the Azure Portal. Then click + New Job and execute the following query:

    USE DATABASE UserIntegration;
    
    CREATE DATA SOURCE IF NOT EXISTS AcademicSEDb FROM AZURESQLDB WITH
       ( PROVIDER_STRING = "Database=academics-stackexchange-users;Trusted_Connection=False;Encrypt=True",
         CREDENTIAL = FederatedCredential,
         REMOTABLE_TYPES = (bool, byte, sbyte, short, ushort, int, uint, long, ulong, decimal, float, double, string, DateTime) );
    
    CREATE EXTERNAL TABLE User (
                            [id] int,
                            [reputation] int,
                            [created] DateTime,
                            [displayname] string,
                            [lastaccess] DateTime,
                            [location] string
                        ) FROM AcademicSEDb LOCATION "dbo.User";
    

    This query configures your SQL database as a data source authenticated with the credential you created in Step 21, and then creates a named table in your local Data Lake Analytics database which is backed by the SQL data source.

That was a lot of work, but you are now ready to issue federated queries. Let's try it out!