Apache Nifi – Combining MySQL and PostgreSQL records over REST API
- August 13, 2020
- Code, technology, Telco Software, Uncategorized
This blog, will assist you in how to ETL (Extra – Transform – Load) data from two different databases simultaneously over REST APIs.
Prerequisites:
For this tutorial, you will need an understanding of Apache NiFI,MySQL, PostgreSQL, Postman, Docker software.
Introduction
Here I’m using Apache NiFi ETL tool to assess 2 DBs and combined the output together. Also I’m running MySQL and PostgreSQL Docker containers with sample data to simulate this use case.
MySQL Table
PostgreSQL table
The Use Case:
Need to query two databases with a unique ID and combined results, remove similar fields and output as js on object.
The Solution:
Using Apache NifI as a ETL tool and combined 2 DBs records and output as json object.
The Big Picture:
Required Components — Nifi Processors
- HandleHttpRequest
- HandleHttpResponse
- GenerateTableFetch
- ExecuteSQLRecord
- SplitRecord
- ConvertRecord
- MergeContent
- ExtractText
- ReplaceText
- LogAttribute
LLD
Nifi Processor Configuration
-
Processor Name : HandleHttpRequest
Configuration:
Note:
We can give any available port under the Listening Port property.
Under Hostname property, we can give NiFi server IP or localhost. (If you give Server IP make sure to configure it on nifi.properties file).
Under HTTP Context Map select “StabdardHttpContextMap” keep below configurations.
2) Processor Name : HandleHttpResponse
Configuration:
Note:
Same as HandleHttpRequest process we can use “StabdardHttpContextMap”.
3) Processor Name : GenerateTableFetch
Configuration:
Note:
Under Table nam,e we have to provide our RDBMS table name.
Here I have configured NiFi expression language in the additional WHERE clause property. What it does is, it can read the REST request and get some values to use as WHERE clause property.
Eg : http://localhost :8008/field/?id=111111 -> this id=111111 will take as WHERE clause
Under the Database Connection Pooling Service, select the “DBCPConnectionPool” and clink on -> icon.
Then you will move forward to;
Here, select the “DBCPConnectionPool” configuration and click on * icon, Then you will have the configuration screen as below:
On this window we have provided,
- Databases connection url
- DB Driver class name
- DB driver location (Local NiFi server)
- Database user and Password
Once all completed, we can enable the configuration as below;
You can conduct the same procedure for the PostgreSQL as well. The only change you have to make is provide
- Databases connection url
- DB Driver class name
- DB driver location (Local NiFi server)
- Database user and Password
4) Processor Name : ExecuteSQLRecord
Configuration:
Note :
As per earlier actions, we have to configure the same configuration for MySQL and PostgreSQL.
Record Writer Property we can configure as below;
5) Processor Name: SplitRecord
Configuration:
Note:
Record Reader configurations.
6) Processor Name: ConvertRecord
Configuration:
Note:
We can use the same Record Reader, Record Writer configurations where we have user for “SplitRecord” process.
7) Processor Name: MergeContent
Configuration:
Note :
This Process will merge two database query records together.
8) Processor Name: ExtractText
Configuration:
Note :
This process will extract the CSV record as an attributes by the help of NiFi expression language
Eg:
CSV Property values
(.+),(.+)
(.+),(.+)
(.+),(.+)
(.+),(.+)
9) Processor Name: ReplaceText
Configuration:
Note :
This process will replace the attribute according to our required format.
Eg : {
${“csv.1”} : ${“csv.3”},
${“csv.2”} : ${“csv.4”},
${“csv.6”} : ${“csv.8”}
}
10) Processor Name: LogAttribute
Configuration:
Note :
We can forward failure logs in to LogAttribute (if needed) and it will help us to T-shoot the process.
Results over Postman
Once the configuration is completed, we can start all the processors and access the configured URL from the Postman tool.
Written by