Data Transformation and Integration

Below is an example of a tool that highlights my ability to bring together several skills to perform a complex task. This is a real-world example, so the details of each page are blurred to protect the data of the organization I did the work for. The project required advanced capabilities with Microsoft Excel functions and data queries: I had to gain a deep understanding of the data structures of two disparate systems, then build a tool that could ingest an export from one, transform the data, and produce a properly structured import for the other. The mechanisms in the tool were later used as the roadmap for replacing it with a fully automated integration.

The first step in the process was to bring in the export from the source data, which in this case was a payroll file. Time was spent working with the payroll company to ensure that the dimensions and headers in this export would remain consistent. They were not able to produce a raw data export, so it was critical that the data in this matrix be structured in a way that let the organization use the tool with a few button clicks and little need for troubleshooting.
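Because the tool depends on the export arriving in a consistent shape, the first thing its logic effectively does is confirm the header row. A minimal sketch of that check, with hypothetical column names standing in for the real (blurred) ones:

```python
# Hypothetical header names -- the real export's columns are not shown here.
EXPECTED_HEADERS = ["Company Code", "Employee ID", "Earning Code", "Amount", "Pay Date"]

def headers_match(first_row):
    """Return True only if the export's header row is exactly as expected,
    ignoring stray whitespace. Any drift means the mappings downstream
    would misfire, so the user should stop and investigate."""
    return [h.strip() for h in first_row] == EXPECTED_HEADERS
```

In the actual workbook this guarantee came from the agreement with the payroll company rather than from code, but the check illustrates why that consistency mattered.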

Next, Power Query is used to remove unnecessary data columns, create some duplicate columns that are used for detailed debit and credit activity, and then unpivot the whole matrix so that it is in a usable data table format.
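The key move in that step is the unpivot: every value column in the wide matrix becomes its own row in a long table. A small Python sketch of the same transformation, using invented column names, shows the idea behind Power Query's "Unpivot Columns":

```python
# Illustrative wide-format matrix; the real payroll columns are not shown.
wide = [
    {"Employee": "A100", "Regular": 1200.0, "Overtime": 150.0},
    {"Employee": "A101", "Regular": 980.0,  "Overtime": 0.0},
]

id_column = "Employee"

# Each wide row becomes one long row per value column,
# mirroring what Power Query's unpivot step produces.
long_rows = [
    {id_column: row[id_column], "Attribute": col, "Value": val}
    for row in wide
    for col, val in row.items()
    if col != id_column
]
```

The long format is what makes the later lookups and filters straightforward, since every amount is now addressable by its row's attributes.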

The resulting table uses several lookup functions to retrieve the appropriate data stored in two mapping tabs. One of these houses information about the accounting software's Ledger of Accounts, and the other contains information about its Cost Categories. The majority of the file is balanced against a single account, and each individual account is defined as either a debit or a credit, which determines whether the related value lands in the debit or the credit column of the table.
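The lookup-and-split logic can be sketched in Python. The mapping table, codes, and account numbers below are entirely hypothetical; in the workbook this is done with lookup functions against the two mapping tabs:

```python
# Hypothetical mapping table: payroll code -> GL account and posting side.
gl_mapping = {
    "Regular":  {"account": "6000", "side": "debit"},
    "Overtime": {"account": "6010", "side": "debit"},
    "401k":     {"account": "2100", "side": "credit"},
}

def to_journal_line(code, amount):
    """Look up a payroll code and place its amount in the debit or
    credit column, depending on how the account is defined."""
    entry = gl_mapping[code]
    debit = amount if entry["side"] == "debit" else 0.0
    credit = amount if entry["side"] == "credit" else 0.0
    return {"account": entry["account"], "debit": debit, "credit": credit}
```

Defining each account's side once in the mapping tab, rather than per row, is what keeps the table maintainable as accounts change.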

It is not shown here because the entire page is filled with sensitive information, but the "GL Mapping" tab also contains some filter functions that, through data validation, let the user select from a list of company codes and pay dates found in the data in the "Variables" tab. This helps limit the size of the resulting import, which cannot include every record from the payroll export at once. Those filter functions look like this:

=SORT(UNIQUE(FILTER(Transform!A:A,Transform!B:B=Variables!B3)))

=SORT(UNIQUE(FILTER(Input!$E:$E,(NOT(Input!$E:$E="PAY DATE"))*(NOT(Input!$E:$E="")))))
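A rough Python equivalent of those two SORT(UNIQUE(FILTER(...))) formulas may make the intent clearer: the first pulls the distinct values from one column where a second column matches the selected criterion, and the second pulls the distinct pay dates while skipping the header text and blank cells. The function names and sample values are my own, for illustration only:

```python
def distinct_matching(values, criteria, selected):
    """Like SORT(UNIQUE(FILTER(values, criteria = selected)))."""
    return sorted({v for v, c in zip(values, criteria) if c == selected})

def distinct_pay_dates(column):
    """Like the second formula: drop the repeated 'PAY DATE' header
    text and empty cells, then return sorted unique values."""
    return sorted({v for v in column if v not in ("", "PAY DATE")})
```

Feeding these lists into data validation dropdowns is what gives the user a guided, low-troubleshooting way to scope each import.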

Below is the resulting import structure used to move this data into their accounting software. The column headers are static, and the remaining cells use dynamic array functions to pull the correct data from the query table based on the selections made in the "Variables" tab. Once complete, the user can save this tab as a .csv file, which can then be imported into the software.
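The final save-as-.csv step can be sketched as follows. In the workbook the user simply does a Save As; the column names here are hypothetical placeholders for the blurred import headers:

```python
import csv

def write_import_file(path, rows):
    """Write the prepared journal rows to a .csv file in the fixed
    column order the accounting software expects on import.
    Header names are illustrative, not the real import spec."""
    headers = ["Account", "Debit", "Credit", "Memo"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)
```

Keeping the header row static while the body is driven by dynamic arrays means the exported file's structure never drifts from what the import routine expects.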

Since the mapping between these two systems was produced in this tool, it served two purposes. First, it gave the organization a way to move large volumes of payroll data into their accounting software each week with minimal labor and very little risk of the errors introduced by manual input. Second, it served as a roadmap for their integration teams to build a direct connection between the two platforms, allowing the data to move from one system to the other each week without any human intervention. The benefits to the organization include not only reduced cost and accurate reporting, but also the peace of mind of having clean, synchronous data to review when their annual audit comes at the end of the year, despite having changed two major accounting systems.