Introducing: Dirty Master Data Management Application
This is a side project of mine, something to keep me out of mischief at home. My aim is for this project to be an example, template or starting point for using Flask to build a quick (and dirty) Master Data Management (MDM) application, AKA a Data Warehouse Management Application (DWMA). In every environment I’ve worked in, there has been at least one technical edge case that none of the “off the shelf” tools at hand managed to solve. In some cases the solution was to “roll my own”; in others I made do without. This project takes a modular approach to collecting together some of the “applications” I’ve created (or wanted to create) to solve technical problems when working with data.
As well as creating a collection of working examples for solutions to share with the wider community, I wanted to use this as an opportunity to improve my python programming skills… I’m a data guy, and many of us use python for bits of our work, but it’s not a core skill for most of us. This project gives me a reason to improve my code quality and approach. I’m happy for feedback and constructive criticism.
Design, Architectural & Philosophical Decisions
I’ve made some architectural decisions early on, along with some philosophical ones about my approach, from both a technical and a personal perspective.
I’m using Python and Flask because:
- Python is a common language known to data people.
- Flask uses Jinja for templating, similar to dbt (used by some data people).
- I wanted to improve my Python skills.
- Flask allows web applications to be built quickly, leveraging common Python libraries.
I wanted to take a modular approach to the design, with a core framework and an array of independent “applications”. This keeps each example a little more self-contained, and also paves the way for them to be easily removed/modified depending on people’s use cases.
As this is a learning and example project, I’ve decided that experimentation and examples of different approaches are worth more than consistency. To this end, there are cases where different approaches have been taken side by side, or different solutions created for similar problems. That said, in any production environment consistency is key to reducing maintenance effort.
The demos use SQLite as the database of choice, but this can easily be changed to another database engine. I found SQLite the easiest to work with for development, as it doesn’t require a more substantial database installation (the whole thing can be installed using pip).
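As a rough illustration of swapping engines: Flask-SQLAlchemy reads its connection string from the `SQLALCHEMY_DATABASE_URI` config key, so the change can often be limited to one value. The sketch below is not taken from the project. The environment variable name `DMDMA_DATABASE_URI`, the file name `dmdma.db` and the helper `database_uri` are all assumptions made for the example.

```python
import os

# Hypothetical default: a SQLite file next to the application, no server needed.
DEFAULT_URI = "sqlite:///dmdma.db"


def database_uri(environ=None):
    """Return the connection string, preferring an environment override."""
    environ = os.environ if environ is None else environ
    # DMDMA_DATABASE_URI is an assumed variable name, not one the project defines.
    return environ.get("DMDMA_DATABASE_URI", DEFAULT_URI)


# SQLALCHEMY_DATABASE_URI is the key Flask-SQLAlchemy reads; a Flask app
# would pick it up through something like app.config.update(config).
config = {"SQLALCHEMY_DATABASE_URI": database_uri()}
```

Setting the variable to something like `postgresql://user:secret@localhost/dmdma` would point the same code at PostgreSQL, provided a matching driver is installed.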
See it in Action
It’s still a work in progress (and probably always will be), but it can be seen:
- Working Demo: https://dmdma.roman-halliday.com/ (you can log in with the username d and password d)
- Code: https://github.com/d-roman-halliday/Dirty-Master-Data-Management-Application
Applications Implemented
- iframes – A simple example of including iframes to combine multiple resources such as documentation, file access or other externally hosted resources.
- Reporting – A collection of approaches for creating simple reports (resultsets from SQL queries) which can in some cases be downloaded as CSV files.
- Mapping Data CRUD – Implementation of three tables (entities, groups and mappings) and CRUD (Create, Retrieve, Update and Delete) functions using Python and Flask-SQLAlchemy.
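To give a feel for the reporting idea (a SQL resultset turned into a downloadable CSV), here is a minimal sketch using only the standard library rather than Flask or Flask-SQLAlchemy. The table and column names are hypothetical, loosely modelled on the entities, groups and mappings tables mentioned above, and `query_to_csv` is an illustrative helper, not a function from the project.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, params=()):
    """Run a SELECT and return the resultset as CSV text with a header row."""
    cursor = conn.execute(sql, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


# Hypothetical schema and sample rows, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE entity_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE mappings (
        entity_id INTEGER NOT NULL REFERENCES entities(id),
        group_id INTEGER NOT NULL REFERENCES entity_groups(id)
    );
    INSERT INTO entities VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO entity_groups VALUES (1, 'Hardware');
    INSERT INTO mappings VALUES (1, 1), (2, 1);
    """
)

report_sql = """
    SELECT e.name AS entity, g.name AS group_name
    FROM mappings m
    JOIN entities e ON e.id = m.entity_id
    JOIN entity_groups g ON g.id = m.group_id
    ORDER BY e.name
"""
report = query_to_csv(conn, report_sql)
print(report)
```

In a Flask view, the returned text could be sent with a `text/csv` content type so the browser offers it as a download.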
Future Applications
Some things that I’d like to add in the future are:
- CSV Loading & Checking – If you have worked with data long enough, you will know that some people/departments live for spreadsheets. They either need a CSV of everything (which is where the reporting application comes in) or they have some custom numbers that need to find their way into the database… That’s what this is about.
- dbt Integrations – Many of us use the open-source version of dbt, dbt Core. This will cover some ways of interacting with the associated processes and data.
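The CSV loading and checking idea is not built yet, so the following is only a guess at what a first pass might look like: read the uploaded file, check the header, and separate usable rows from rows that need to go back to whoever sent the spreadsheet. The required columns (`entity`, `amount`) and the function name `check_csv` are assumptions for the example.

```python
import csv
import io

# Hypothetical expected layout of an uploaded file.
REQUIRED_COLUMNS = ("entity", "amount")


def check_csv(text):
    """Return (valid_rows, errors) for CSV text that starts with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]

    valid_rows, errors = [], []
    for row in reader:
        # line_num is the source line the reader has just consumed.
        line_number = reader.line_num
        entity = (row["entity"] or "").strip()
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            errors.append(f"line {line_number}: amount is not a number")
            continue
        if not entity:
            errors.append(f"line {line_number}: entity is empty")
            continue
        valid_rows.append({"entity": entity, "amount": amount})
    return valid_rows, errors


sample = "entity,amount\nWidget,10.5\n,3\nGadget,abc\n"
rows, errors = check_csv(sample)
```

Returning the errors alongside the good rows, instead of failing on the first problem, lets the whole file be reported on in one go.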