Azure Serverless Computing Cookbook

Integrating Azure Functions with Data Factory pipelines

Most enterprise applications need to work with data, especially when it comes from a variety of heterogeneous sources. In such cases, we need tools that help us extract the raw data, transform it, and load the processed data into other persistent stores so that reports can be generated.

Azure assists organizations in carrying out the preceding scenarios by using a service called Azure Data Factory (ADF).

Azure Data Factory is another cloud-native serverless solution from Microsoft Azure. ADF can be used as an Extract, Transform, and Load (ETL) tool to process data from various data sources, transform it, and load the processed data into a wide variety of data destinations. Before we start working on the recipe, I would recommend that you learn more about Azure Data Factory and its concepts at https://docs.microsoft.com/azure/data-factory/introduction.

ADF's built-in activities don't let us write complex custom logic when we have advanced processing requirements. Fortunately, ADF supports plugging Azure Functions into its pipelines, so we can pass input data to a function and receive the values it returns.

In this recipe, you'll learn how to integrate Azure Functions with ADF pipelines. The following is a high-level architecture diagram that depicts what we are going to do in this recipe:

Figure 3.41: Integration of Azure Functions with an ADF pipeline

As shown in the preceding architecture diagram, we are going to implement the following steps:

  1. Client applications upload the employee data in the form of CSV files to the storage account as blobs.
  2. Trigger the ADF pipeline and read the employee data from the storage blob.
  3. Call a ForEach activity in the Data Factory pipeline.
  4. Iterate through every record and invoke the Azure Functions HTTP trigger to implement the logic of sending emails.
  5. Invoke the SendGrid output binding to send the emails.
  6. The end user receives the email.

Getting ready

In this section, we'll create the prerequisites for this recipe, which are as follows:

  1. Upload the CSV files to a storage container.
  2. Create an Azure Function HTTP trigger with the authorization level set to Function.
  3. Create a Data Factory instance.

Uploading the CSV files to a storage container

Please create a storage account and a container, and upload the CSV file that contains the employee information, as shown in Figure 3.42:

Figure 3.42: Storage container

The following is an example of the CSV file. Please make sure that there is a column named Email. We'll be using this field to pass data from the Data Factory pipeline to Azure Functions:

Figure 3.43: Employee data in a CSV file
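For reference, here is a minimal sketch of what Employees.csv might contain. The column names other than Email, as well as the names and addresses themselves, are placeholders; only the Email column is required by this recipe:

Name,Email
John Doe,john.doe@example.com
Jane Smith,jane.smith@example.com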

Having uploaded the Employees.csv file to a storage container, let's move on to the next section.

Creating an Azure Function HTTP trigger with the authorization level set to Function

In this section, we are going to create an HTTP trigger; later, while setting up the Data Factory service, we will also create a linked service for the function app.

Create an HTTP-triggered function named SendMail. It receives an input named email and prints the value, as shown on line 18 in Figure 3.44:

Figure 3.44: Creating an Azure function HTTP trigger
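If you start from the default in-portal C# script (run.csx) template for an HTTP trigger, the function body might look something like the following minimal sketch (the exact code shown in Figure 3.44 may differ slightly). It reads the email value from the query string or the request body, logs it, and returns a JSON object, because the Azure Function activity in ADF expects the function to return a valid JSON response:

#r "Newtonsoft.Json"

using System.Net;
using Microsoft.AspNetCore.Mvc;
using Newtonsoft.Json;

public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
    log.LogInformation("SendMail HTTP trigger received a request.");

    // Read the email value from the query string, if present
    string email = req.Query["email"];

    // Fall back to the request body, which is how the ADF pipeline will pass it
    string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
    dynamic data = JsonConvert.DeserializeObject(requestBody);
    email = email ?? data?.email;

    // Print the received value to the logs
    log.LogInformation($"Email received: {email}");

    // Return a JSON object so that the ADF Azure Function activity can parse the response
    return new OkObjectResult(new { email });
}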

In this section, we have created an Azure function with the HTTP authorization level set to Function. Let's now move on to the next section to create the Data Factory instance.

Creating a Data Factory instance

In this section, we'll create a Data Factory instance by performing the following steps.

  1. Click on Create a resource and search for Data Factory, as shown in Figure 3.45. This will take you to the next step, where you must click on the Create button:
    Figure 3.45: Searching for Data Factory
  2. In the New data factory blade, provide the name and other details, as shown in Figure 3.46, and click on the Create button:
    Figure 3.46: Creating a new Data Factory instance
  3. Once the Data Factory service is created, click on the Author & Monitor button available in the Overview blade, as shown in Figure 3.47:
    Figure 3.47: Author & Monitor
  4. This will open a new browser tab and take you to the https://adf.azure.com page, where you can see the Let's get started section.
  5. In the Let's get started view, click on the Create pipeline button, as shown in Figure 3.48, to create a new pipeline:
    Figure 3.48: ADF—Let's Get Started
  6. This will take you to the Authoring section, where you can author the pipeline, as shown in Figure 3.49:
    Figure 3.49: ADF—new pipeline
  7. Before you start authoring the pipeline, you need to create connections to the storage account and Azure Functions. Click on the Connections button, as shown in Figure 3.49.
  8. In the Linked services tab, click on the New button, search for blob in the Data store section, and select Azure Blob Storage, as shown in Figure 3.50:
    Figure 3.50: ADF—New linked service—choosing a linked service
  9. In the New linked service pop-up window, provide the name of the linked service, choose Azure subscription and Storage account name, test the connection, and then click on the Create button to create the linked service for the storage account, as shown in Figure 3.51:
    Figure 3.51: ADF—New linked service—providing connection details
  10. Once you click on the Create button, this will create a linked service, as shown in Figure 3.52:
    Figure 3.52: ADF—Linked services
  11. After reviewing the linked service, click on Publish all to save the changes to the Data Factory instance.
  12. Now, create another linked service for Azure Functions by clicking on the New button in the Connections tab again.
  13. In the New linked service pop-up window, choose the Compute drop-down option, select Azure Function, and then click on the Continue button, as shown in Figure 3.53:
    Figure 3.53: ADF—New linked service—choosing Azure Function
  14. In the next step, provide a name for the linked service, choose the subscription and the function app, provide the Function Key value, and then click on Create, as shown in Figure 3.54:
    Figure 3.54: ADF—New linked service—Azure function app

    Note

    You can get the Function Key value from the Manage blade of the function app. Function keys are discussed in detail in the Controlling access to Azure Functions using function keys recipe of Chapter 9, Configuring security for Azure Functions.

  15. Once the Azure function linked service is created, you should see something similar to Figure 3.55 (a JSON sketch of this linked service definition is shown after these steps). Click on Publish all to save and publish the changes to the Data Factory service:
Figure 3.55: ADF—Linked services
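As mentioned in step 15, the following is a hedged sketch of how the Azure Function linked service definition might look in JSON. The linked service name and the placeholder values in angle brackets are assumptions based on this recipe; the actual definition is generated for you by the portal:

{
    "name": "AzureFunction",
    "properties": {
        "annotations": [],
        "type": "AzureFunction",
        "typeProperties": {
            "functionAppUrl": "https://<your-function-app>.azurewebsites.net",
            "functionKey": {
                "type": "SecureString",
                "value": "<your-function-key>"
            }
        }
    }
}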

In this section, we have created the following:

  1. A Data Factory instance
  2. A Data Factory pipeline
  3. A linked service to the storage account
  4. A linked service to Azure Functions

We will now move on to the next section to see how to build the Data Factory pipeline.

How to do it...

In this section, we are going to create the Data Factory pipeline by performing the following steps:

  1. Create a Lookup activity that reads the data from the storage account.
  2. Create a ForEach activity that takes input from the Lookup activity. Add an Azure Function activity inside the ForEach activity.
  3. The ForEach activity iterates based on the number of input items that it receives from the Lookup activity and then invokes the Azure function to implement the logic of sending the emails.

Let's begin by creating the Lookup activity by performing the following steps:

  1. Drag and drop the Lookup activity, which is available in the General section, and name the activity ReadEmployeeData, as shown in Figure 3.56. Learn more about the activity by clicking on the Learn more button highlighted in Figure 3.56:
    Figure 3.56: ADF—Lookup activity settings
  2. Select the Lookup activity and click on the Settings tab, shown in Figure 3.56. By default, the Lookup activity reads only the first row. Since we need to read all the rows available in the CSV file, uncheck the First row only checkbox, as shown in Figure 3.57:
    Figure 3.57: ADF—Lookup activity—new source dataset
  3. The Lookup activity's responsibility is to read data from a blob, and it requires a dataset that refers to the data stored in the blob. Let's create a dataset by clicking on the New button, as shown in Figure 3.57.
  4. In the New dataset pop-up window, choose Azure Blob Storage and then click on the Continue button, as shown in Figure 3.58:
    Figure 3.58: ADF—Lookup activity—new source dataset—choosing Azure Blob Storage
  5. In the Select format pop-up window, click on the Delimited Text option, as shown in Figure 3.59, and click Continue:
    Figure 3.59: ADF—Lookup activity—new source dataset—choosing the blob format
  6. In the Set properties pop-up window, choose AzureBlobStorage under Linked service (which we created in the Getting ready section of this recipe) and click on the Browse button, as shown in Figure 3.60:
    Figure 3.60: ADF—Lookup activity—new source dataset—Set properties
  7. In the Choose a file or folder pop-up window, double-click on the blob container:
    Figure 3.61: ADF—Lookup activity—new source dataset—selecting the blob container
  8. This opens up all the blobs in which the CSV files reside, as shown in Figure 3.62. Once you have chosen the blob, click on the OK button:
    Figure 3.62: ADF—Lookup activity—new source dataset—selecting the blob
  9. You'll be taken back to the Set properties pop-up window. Click on the OK button to create the dataset.
  10. Once it is created, navigate to the dataset and check the First row as header checkbox, as shown in Figure 3.63:
    Figure 3.63: ADF—Lookup activity—new source dataset—First row as header checkbox
  11. Now, the Lookup activity's Settings tab should look something like this:
    Figure 3.64: ADF—Lookup activity—selecting Source dataset
  12. Drag and drop the ForEach activity to the pipeline and change its name to SendMailsForLoop, as shown in Figure 3.65:
    Figure 3.65: ADF—creating a ForEach activity
  13. Now, drag the green box on the right-hand side of the Lookup activity and drop it onto the ForEach activity, as shown in Figure 3.66, to connect them:
    Figure 3.66: ADF—linking the Lookup and ForEach activities
  14. Once the Lookup activity and the ForEach activity are connected, the Lookup activity can send its data to the ForEach activity as a parameter. For the ForEach activity to receive the data, go to its Settings section and click on the Add dynamic content option, available below the Items field, as shown in Figure 3.67:
    Figure 3.67: ADF—ForEach activity settings
  15. In the Add dynamic content pop-up window, click the ReadEmployeeData activity output, which adds @activity('ReadEmployeeData').output to the text box. Now, append .value so that the expression becomes @activity('ReadEmployeeData').output.value, as shown in Figure 3.68, and click on the Finish button:
    Figure 3.68: ADF—ForEach activity settings—choosing the output of the lookup activity
  16. You should see something similar to what is shown in Figure 3.69 in the Items text box:
    Figure 3.69: ADF—ForEach activity settings—configured input
  17. Let's now click on the pen icon inside the ForEach activity, as shown in Figure 3.70:
    Figure 3.70: ADF—ForEach activity—editing
  18. Drag and drop the Azure Function activity onto the pipeline canvas, change its name to SendMail, as shown in Figure 3.71, and click on the Settings tab:
    Figure 3.71: ADF—ForEach activity—adding a function activity
  19. In the Settings tab, choose the AzureFunction linked service that was created in the Getting ready section of this recipe and also choose the Function name, as shown in Figure 3.72:
    Figure 3.72: ADF—ForEach activity—passing inputs to the function activity
  20. As shown in Figure 3.72, you need to provide the input to the SendMail Azure function, which expects email as an input. The expression provided in the Body field is called an ADF expression; a sketch of how this configuration might look in the pipeline's JSON definition is provided after these steps. Learn more about ADF expressions at https://docs.microsoft.com/azure/data-factory/control-flow-expression-language-functions.
  21. Now, click on the Publish all button to save the changes.
  22. Once the changes are published, click on Add trigger and then the Trigger now button, as shown in Figure 3.73:
    Figure 3.73: ADF—running the pipeline
  23. A new pop-up window will appear, as shown in Figure 3.74. Click on OK to start running the pipeline:
    Figure 3.74: ADF—Pipeline run parameters
  24. Immediately after clicking OK, navigate to the Azure function and view the logs, as shown in Figure 3.75, to see the inputs received from the Data Factory pipeline:
Figure 3.75: ADF—Azure Functions—console logs
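As referenced in step 20, the following is a hedged sketch of how the ForEach activity and the nested Azure Function activity might appear in the pipeline's JSON definition (viewable through the pipeline's code view). The activity and linked service names assume the ones used in this recipe, and @{item().Email} picks up the Email column of the current row supplied by the Lookup activity:

{
    "name": "SendMailsForLoop",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "ReadEmployeeData", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('ReadEmployeeData').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "SendMail",
                "type": "AzureFunctionActivity",
                "linkedServiceName": {
                    "referenceName": "AzureFunction",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "functionName": "SendMail",
                    "method": "POST",
                    "body": {
                        "value": "{ \"email\": \"@{item().Email}\" }",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
}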

That's it! You have learned how to integrate Azure Functions as an activity inside the ADF pipeline.

The next step is to integrate the functionality of sending an email to the end user based on the input received. These steps have already been discussed in the Sending an email notification dynamically to the end user recipe in Chapter 2, Working with notifications using the SendGrid and Twilio services.

You can also monitor the progress of the pipeline execution by clicking on the Monitor tab, as shown in Figure 3.76:

Figure 3.76: ADF—monitoring the pipeline

Click on the pipeline name to view detailed progress, as shown in Figure 3.77:

Figure 3.77: ADF—monitoring individual activities

In this recipe, you have learned how to integrate Azure Functions as an activity in an ADF pipeline.

In this chapter, you have learned how to integrate Azure Functions with various Azure services, including Cognitive Services, Logic Apps, and Data Factory.