Generating Data | kebabShopBlues

I’ve worked on a couple of projects recently where the design of the underlying data-model has, I think, surprised my colleagues. In both cases, there were elements to the design where I have chosen to persist something to the database which I think they may have tried to avoid saving that information; so I wanted to jot down some reasons why it’s a good idea (in these cases anyway).

Working Hours

This was part of a Human Resources system, and there was a need to track ‘working hours’ for each employee. The shift patterns were complex; they operated on a four-week cycle but also had overrides in the form of 1st Sunday in the calendar month, Last Saturday, and so on. Furthermore, there were employees who worked shifts that scheduled them for 6 days in one week, but they got one of those days off individually, to work 5 days in total.

Generating EmployeeWorkDay records from template data

Reporting: Separating the generated working time data from the template data allowed for very direct reporting on the information; if you wanted to find out someone’s scheduled working hours, the reporting tools would only need to examine one table (as long as the data had been generated far enough in advance);
Calculation Overhead: Generating the data into the future ensured, with certain constraints, that the calculations would only have to be done once. If the client wanted programs to warn of late arrival of staff or whatever, the program could again query the generated data, and not have to redo all the calculations to find out when an employee was due to start that day;
Holiday: Really this is just a restatement of the previous point; if I want to figure out how many days holiday someone is requesting, it is far, far easier to look at the days and working times that they are expected to work on and between their requested holiday, than it is to calculate their working pattern, and then count how many days of holiday they are requesting;
Storing Actual Data: There was also a demand for storing the actual times employees logged-in and logged-out of the system (first and last only). Therefore, we were inevitably going to need a table to store this, and it seemed to me that storing it alongside the generated data was appropriate.

The Problems with Generated Data

This main problem with generating data like this is that it is really a cache (one that happens to be persisted in the DB) that we need to manage; or at least that is the case for future data. As such, we need to put a whole bunch of stuff around that to manage it, keep it up-to-date, allow changes and so forth. For example, we now have to provide batch routines to generate these ‘Employee Work Days’ into the future, or screens to allow someone to move shift patterns, again at some specific data in the future.

However, the summary of this design is that it works, and it is very flexible; even allowing manual swaps of shifts (e.g. allowing someone to swap an afternoon for a double-shift later in the week), or allowing the system to swap a shift temporarily (e.g. if a team is short-staffed, it’s easy to copy an employee’s expected work hours to another employee, to facilitate holiday or sickness cover).