Chaining multiple HTTP APIs via Apache NiFi

Nagaraj Tantri
Jan 15, 2021


Be it streamlining a set of defined steps or automating some finite workflow, automation has become routine in many organisations. There are many tools and frameworks that help us achieve this. Apache NiFi is one such tool for any organisation that needs to onboard data and transform it via a defined workflow.

Apache NiFi has been around the industry for a long time, and to quote Wikipedia on its origin, it has a unique introduction:

Leveraging the concept of Extract, transform, load, it is based on the “NiagaraFiles” software previously developed by the US National Security Agency (NSA), which is also the source of a part of its present name — NiFi. It was open-sourced as a part of NSA’s technology transfer program in 2014.

There are various connectors, or so-called Processors, in Apache NiFi which can be leveraged to orchestrate a set of instructions. In this blog, I will show how to automate a data pull strategy by chaining a set of APIs in Apache NiFi.

For this blog please note the tools used:

  • Apache NiFi Version: 1.11.1
  • The OS under consideration: MacOS

What are we going to build?

Something like this

The above diagram shows a lot of processors, but trust me, it boils down to the steps below (a rough sketch of the processor chain follows the list):

  1. Hit a POST API to get a Token using a Login and Password
  2. Consume the token from step 1 and hit another POST API to get the Report URL
  3. Consume the Report URL from step 2 and use the same token from step 1 to get the actual report downloaded via a GET API
  4. Upload the report to AWS S3
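
If it helps to visualise the flow without the screenshot, the chain of processors looks roughly like this (one possible layout; the actual flow simply wires several of these in sequence, and the S3 upload processor is not named in the text, PutS3Object being the usual choice):

GenerateFlowFile -> UpdateAttribute -> AttributesToJSON -> InvokeHTTP (get token)
-> EvaluateJsonPath -> UpdateAttribute -> AttributesToJSON -> ReplaceText -> InvokeHTTP (query report)
-> EvaluateJsonPath -> InvokeHTTP (download report) -> S3 upload (typically PutS3Object)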

How are we going to build this?

We need Apache NiFi's InvokeHTTP processor to do the job of hitting the APIs via POST and GET requests.

Apache NiFi’s InvokeHTTP processor and its current usage

Apache NiFi has a couple of processors to get API data from any given API provider. For instance, it already had the GetHTTP and PostHTTP processors, and later introduced InvokeHTTP, which supports the Expression Language and incoming flowfiles. The InvokeHTTP processor helps us trigger a POST API and consume its response. This, however, isn't straightforward: it requires the right configuration, with a set of supporting processors, to ensure the POST body and headers are passed to it correctly.

To begin, let's go through it step by step:

  1. Hit a POST API to get a Token using a Login and Password

An InvokeHTTP POST requires a body and headers to be passed via a FlowFile, and we can generate one using the following processors:

  • GenerateFlowFile

This processor generates the required FlowFile (which will carry the data for the HTTP request). The configuration here uses a file size of 1 byte and a batch size of 1. Do note that the Scheduling tab has been changed to a Run Schedule of 60 mins; without this, it would keep generating FlowFiles continuously.
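
As a rough sketch of this step's configuration (the values mirror what is described above; anything else is just a sensible default):

File Size    : 1B
Batch Size   : 1
Data Format  : Text
Run Schedule : 60 min (on the Scheduling tab)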

  • UpdateAttribute

UpdateAttribute is used to add attributes to the FlowFile. A FlowFile has attributes and content; attributes can be used for HTTP headers and the content as the HTTP body. The example here shows how we can add new key-value pairs and then, in the next step, make them part of the FlowFile's content.
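
A minimal sketch of the dynamic properties added on UpdateAttribute (the attribute names follow the blog; the values are obviously placeholders, not real credentials):

Login    : my-api-login
Password : my-api-password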

  • AttributesToJSON

As the previous processor added these attributes to the FlowFile, we now use this processor to first CHOOSE the right set of attributes (via the Attributes List property) and make them part of the POST body by setting the Destination to flowfile-content. The Login, Password and so on can now land in the FlowFile content.
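
A sketch of the AttributesToJSON configuration for this step (the attribute list simply mirrors the attributes added above):

Attributes List         : Login,Password
Destination             : flowfile-content
Include Core Attributes : false

This turns the chosen attributes into a JSON document in the FlowFile content, e.g. {"Login":"my-api-login","Password":"my-api-password"}.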

  • InvokeHTTP

The previous FlowFile's content is sent by default as the body of the HTTP request, which then triggers the API. The above-mentioned API is an example from one of our data providers, TheTradeDesk.
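
A sketch of the InvokeHTTP settings for the token call (the URL below is illustrative, not the provider's real endpoint):

HTTP Method  : POST
Remote URL   : https://api.example.com/v3/authentication
Content-Type : application/json

The FlowFile content produced by AttributesToJSON is sent as the request body by default.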

2. Consume the token from step 1 and hit another POST API to get the Report URL

Once that API is triggered, it sends the token as part of the response body. So we need to consume the token in the next step:

  • EvaluateJsonPath

In order to consume the previous API's output, which is a JSON body, we use the EvaluateJsonPath processor. The output was as follows:

{ "Token": "<actual_token>" }

So we ended up with the JSONPath expression $.Token.
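
A sketch of the EvaluateJsonPath configuration (the dynamic property name Token and the path $.Token come straight from the response above):

Destination : flowfile-attribute
Return Type : auto-detect
Token       : $.Token

With Destination set to flowfile-attribute, the extracted value lands in a FlowFile attribute named Token.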

  • UpdateAttribute

Once we have the Token extracted from the response body of the HTTP API, we need to make it part of the FlowFile's attributes so it can feed into the next API, which lets us download the file.
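
Assuming this is where the token gets mapped to the header name used later, the UpdateAttribute dynamic property would look roughly like:

TTD-Auth : ${Token}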

  • AttributesToJSON

To ensure the above token can be sent as an HTTP header by InvokeHTTP, we should keep these attributes on the FlowFile rather than in its content, so we choose the Destination as flowfile-attribute, which is what InvokeHTTP expects for headers.
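
A brief sketch of the relevant AttributesToJSON properties for this step (the attribute list is illustrative):

Attributes List : TTD-Auth
Destination     : flowfile-attribute

Keeping the Destination as flowfile-attribute leaves the FlowFile content free for ReplaceText to write the POST body in the next step.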

  • ReplaceText

Now comes the most important part for this API: we need to pass the following JSON body in the HTTP POST request:

{
  "PartnerIds": [
    "61ikbrz"
  ],
  "ReportExecutionStates": [
    "Complete"
  ],
  "SortFields": [
    {
      "FieldId": "ReportStartDateInclusive",
      "Ascending": "false"
    }
  ],
  "PageStartIndex": 0,
  "ReportScheduleNameContains": "Report Name",
  "PageSize": 2
}

So we configured ReplaceText to match all of the existing FlowFile content using the regex .* as the Search Value, and set the Replacement Value to the JSON POST body above. This updates the FlowFile content, which the InvokeHTTP processor then uses as the POST request body.
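
A sketch of the ReplaceText configuration (the Evaluation Mode here is one reasonable choice for wiping and replacing the whole content):

Replacement Strategy : Regex Replace
Search Value         : .*
Replacement Value    : (the JSON body shown above)
Evaluation Mode      : Entire text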

  • InvokeHTTP

Do note in the above screenshot that we have added the header information, gathered from the FlowFile's attributes, as TTD-Auth from the previous UpdateAttribute step. This ensures the HTTP API's headers and POST body are set as expected.
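
A sketch of the InvokeHTTP settings for the report-query call (the URL is illustrative; Attributes to Send is the processor's own way of turning matching FlowFile attributes into request headers):

HTTP Method        : POST
Remote URL         : https://api.example.com/v3/reports/query
Content-Type       : application/json
Attributes to Send : TTD-Auth

The JSON set by ReplaceText goes out as the POST body, and the TTD-Auth attribute goes out as a header.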

3. Consume the Report URL from step 2 and use the same token from step 1 to get the actual report downloaded via a GET API

Once the above InvokeHTTP call succeeds, the API returns a response containing the file's download URL, which is basically an HTTP redirect-based download.

  • EvaluateJsonPath

If you notice carefully, this processor extracts the download URL from the response and adds it to a FlowFile attribute for use in the next HTTP request. The response is a nested JSON which contains the download URL for the file.
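
A sketch of this EvaluateJsonPath step (the JSONPath below is purely illustrative, since the exact shape of the nested response isn't shown here):

Destination : flowfile-attribute
Return Type : auto-detect
DownloadURL : $.Result[0].ReportDeliveries[0].DownloadURL (illustrative path)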

  • InvokeHTTP

We then take the ${DownloadURL} attribute from the previous step and pass it into the Remote URL property so the file gets downloaded.
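
A sketch of the final InvokeHTTP download call, again sending the token header (illustrative where not stated above):

HTTP Method        : GET
Remote URL         : ${DownloadURL}
Attributes to Send : TTD-Auth
Follow Redirects   : true

On success, the downloaded report becomes the FlowFile content, ready to be pushed to S3.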

4. Upload the report to AWS S3

Lastly, the downloaded report is uploaded to an S3 bucket of our choice.
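
The processor isn't named in the text, but the usual choice for this step is PutS3Object; a minimal sketch with illustrative values:

Bucket                           : my-report-bucket
Object Key                       : reports/${filename}
Region                           : us-east-1
AWS Credentials Provider service : (an AWSCredentialsProviderControllerService)

The FlowFile content downloaded by the previous InvokeHTTP call is uploaded as the S3 object.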

Concluding

It is definitely a rollercoaster ride to build something this simple in Apache NiFi, but I am pretty sure a tool of its stature is not meant only for such tasks; it can be used for many other requirements and heavy lifting, like S3 data transfers with huge volumes (running in cluster mode), real-time data ingestion to Kafka, and so on. I just wanted to share how to perform this in Apache NiFi, as I certainly struggled to get it automated at first.
