Apache Nifi – Combining MySQL and PostgreSQL records over REST API


Prasada Jayasinghe - Global Wavenet
      Written by:
      Prasad Jayasinghe,
      Tech Lead,
      Global Wavenet

 

This blog, will assist you in how to ETL (Extra – Transform – Load) data from two different databases simultaneously over REST APIs.

Prerequisites:
For this tutorial, you will need an understanding of Apache NiFI,MySQL, PostgreSQL, Postman, Docker software.

Introduction
Here I’m using Apache NiFi ETL tool to assess 2 DBs and combined the output together. Also I’m running MySQL and PostgreSQL Docker containers with sample data to simulate this use case.

MySQL Table

 

 

 

 

 

 

 

 

 

 

PostgreSQL table

 

 

 

 

 

 

 

 

 

 

 

 

 

The Use Case:

Need to query two databases with a unique ID and combined results, remove similar fields and output as js on object.

The Solution:

Using Apache NifI as a ETL tool and combined 2 DBs records and output as json object.

The Big Picture:

Apache Nifi Telco Software

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Required Components — Nifi Processors

  1. HandleHttpRequest
  2. HandleHttpResponse
  3. GenerateTableFetch
  4. ExecuteSQLRecord
  5. SplitRecord
  6. ConvertRecord
  7. MergeContent
  8. ExtractText
  9. ReplaceText
  10. LogAttribute

LLD

Telco Software Coding

Nifi Processor Configuration

  1. Processor Name : HandleHttpRequest

Configuration:

Telco software

Note:

We can give any available port under the Listening Port property.

Under Hostname property, we can give NiFi server IP or localhost. (If you give Server IP make sure to configure it on nifi.properties file).

Under HTTP Context Map select “StabdardHttpContextMap” keep below configurations.

Telco SOftare Coding

2) Processor Name : HandleHttpResponse

Configuration:

Telco software coding

Note: 

Same as HandleHttpRequest process we can use “StabdardHttpContextMap”.

Telco Software coding

3) Processor Name : GenerateTableFetch

Configuration:

telco software coding

Note:

Under Table nam,e we have to provide our RDBMS table name.

Here I have configured NiFi expression language in the additional WHERE clause property. What it does is,  it can read the REST request and get some values to use as WHERE clause property.

Eg : http://localhost :8008/field/?id=111111 -> this id=111111 will take as WHERE clause

Under the Database Connection Pooling Service,  select the “DBCPConnectionPool” and clink on -> icon.

Then you will move forward to;

telco software coding

Here,  select the “DBCPConnectionPool” configuration and click on * icon, Then you will have the configuration screen as below:

Telco software coding

On this window we have provided,

  • Databases connection url
  • DB Driver class name
  • DB driver location (Local NiFi server)
  • Database user and Password

Once all completed, we can enable the configuration as below;

Telco Software coding

You can conduct the same procedure for the PostgreSQL as well. The only change you have to make is provide

  • Databases connection url
  • DB Driver class name
  • DB driver location (Local NiFi server)
  • Database user and Password

Telco Software Coding

4) Processor Name : ExecuteSQLRecord

Configuration:

Telco Softare Coding

Note :

As per earlier actions, we have to configure the same configuration for MySQL and PostgreSQL.

Record Writer Property we can configure as below;

Telco Software Coding

5) Processor Name: SplitRecord

Configuration:

Telco software coding

Note:

Record Reader configurations.

Telco software coding

Record Writer configurations.Telco softaware coding

6) Processor Name: ConvertRecord

Configuration:

Note:

We can use the same Record Reader, Record Writer configurations where we have user for “SplitRecord” process.

 

7) Processor Name: MergeContent

Configuration:

Coding

Note :

This Process will merge two database query records together.

8) Processor Name: ExtractText

Configuration:

Note :

This process will extract the CSV record as an attributes by the help of NiFi expression language

Eg:

CSV Property values

(.+),(.+)

(.+),(.+)

(.+),(.+)

(.+),(.+)

 

9) Processor Name: ReplaceText

Configuration:

Note :

This process will replace the attribute according to our required format.

Eg : {

${“csv.1”} : ${“csv.3”},

${“csv.2”} : ${“csv.4”},

${“csv.6”} : ${“csv.8”}

}

10) Processor Name: LogAttribute

Configuration:

Note :

We can forward failure logs in to LogAttribute (if needed) and it will help us to T-shoot the process.

Results over Postman

Once the configuration is completed, we can start all the processors and access the configured URL from the Postman tool.

 

Written by

 

 

 

More from our blog

See all posts