I recently got around to researching a topic on the internet that had intrigued me, that being the question: “Should Table Names be in the singular, or plural?” For example, should you call a table Employee… or Employees? For as long as I can remember, my impression has always veered towards the singular. I don’t believe this is because I am a programmer, concentrating on row-based operations rather than set-based (because, as much as anything else, I actually started learning SQL with Oracle as pure SQL… not in the context of learning it with a third-generation programming language).
I’ll probably wax lyrical about the distinctions I make between the two modes of naming at some other time, but for the moment, my research indicated to me that there are essentially the two opposing schools of thought that are entirely contradictory, and they are both rational and sane thought-processes, when looked at through certain perspectives. In other words, neither approach is ‘right’ all of the time.
Because of this duality, and the frequent observation that the best thing to do was to pick one approach, and stick to it, I reminded myself of the need for standards in development… and also that my current workplace has no such standards.
Here, then, are the rules that make up my personal approach to naming rules for database tables and columns… though of course if there is already a consistent approach in place I would use that (e.g. I’m sure I will have used a different approach on a PHP / MySQL based shopping cart I once used and changed).
Casing, Underscores, etc
Currently, on SQL Server and Windows based development (I also use C#) I prefer to use PascalCase table and column names, without the use of underscores, and certainly not spaces. However, on Unix / Oracle environments in the past, I have almost certainly used underscores and the environment has probably also dictated no usage of upper-case letters. Examples in this post use my current approach for my current work development environment.
I use the singular table name, as already noted. This is because of my personal past and learning, and probably also because this is by definition a table with zero to many rows. The natural implication is that it is ‘obvious’ that there are ‘up to’ many rows of the ‘thing’ the table is of. Consequently to consider a pluralised naming scheme is as superfluous as adding ‘Table’ to every table name. In context, ‘Employee’ is as symbolically useful as ‘EmployeesTable’, and the former is more succinct.
There are also more reasons for preferring this style when you involve consideration of Data Access Layers and tools like Gentle, Subsonic or (N/J/Whatever)Hibernate where the table name is typically mapped to an object name which represents the data from a single row.
In commercial environments, table prefixes are perhaps a little less-likely to be used than open-source software that (often) expects that it may have to share a single DB instance with other applications. For example, all the tables in MySQL that are related to this WordPress blog are prefixed ‘wp_’. It’s a fairly reasonable way to ensure there are no clashes with other applications that may have to share the namespace of the DB, but (as with WordPress), the prefixes are not applied in the discussion above regarding the naming of key columns. For example, the key of the WordPress table ‘wp_comments’ is ‘comment_id’ which seems very reasonable to me (excepting my preference for a singular table name!).
Key Column Names and Foreign Keys
In general, if a table row is uniquely identified by an integer numeric field, it ought to be called
EmployeeType.EmployeeTypeId = Employee.EmployeeTypeId.
Personally, I find this kind of equi-join far easier to understand than:
Employee.EmployeeTypeId = EmployeeType.Id.
Of course, the use of table aliases will abbreviate that to
e.EmployeeTypeId = et.EmployeeTypeId.
Furthermore, in some SQL dialects, you can skip the ‘ON’ part of an equi-join by using a ‘USING’ clause; for example
Employee JOIN EmployeeType USING EmployeeTypeId. To do this, both primary key and foreign-key columns must have the same name ‘EmployeeTypeId’.
On those occasions where there are multiple Foreign Key columns to a table e.g. CreatedByUserId and LastUpdatedByUserId I postfix the full name of the lookup table column in the column name on the referencing table.
I suppose that if you had some other non-integer primary-key column for your table, such as a Guid, it would be appropriate to use Guid (or similar) postfix instead. However, I personally think there are far too many people out there who appear to love Guids far too much. Indeed, I work on a DB which is only about 16GB in size, but bigints (8 bytes) have been used for so many lookup columns that we probably waste over a GB of space on lookup values that could have been shorter integer values. Guids are even worse for space consumption and generation time and comparison time, and for manual debugging (when you have to type them in manually yourself)!
Whichever way, all columns should be a different name from the table, as modelling tools that convert table-names to classes, and field names to members, will have problems in languages like C# where a member can not have the same name as the enclosing type.
Other Column Names
My choice to name Primary keys and Foreign Keys the same name, with the primary-table-name prefix perhaps belies my preference to name non-relational columns without the table name prefix. It is far quicker and easier to alias a table name in a SQL query than it is to constantly have to type a long column name: so
et.Name can be as obvious as
Furthermore, where does one stop with such a naming convention? CustomerFirstName, CustomerSurname etc.? Table aliases are so easy to use that if you have to distinguish an employee’s first-name from a customer’s first-name it’s easy: e.FirstName = ‘Peter’ OR c.FirstName = ‘Paul’.
My general rule is to avoid abbreviations unless the long version is simply too long to start off with. Some DB’s allow you to name a column anything you like up to silly numbers like hundreds of characters long… but will then only pay attention to the first 32 or so characters. Clearly, there is a limit to how far you can go.
Some people seem remarkably reluctant to use reserved words in their database; for example, ‘User’. I don’t know how many databases that is a reserved-word (probably several) but I am less inclined to be worried by this. Part of the point of the SQL ‘standard’ (however unstandard it may all be) is that in theory, we should be able to transition from one database server to another flavour, and have limited changes to make during that transition. Considering that you are never entirely safe from new reserved words appearing in later versions of the same DB, the only safe mechanism in your SQL is to quote all column and table names appropriately to identify them as such, e.g. [User] instead of just User (SQL Server-style quoting). Some people suggest changing the table-name to something for focused, e.g. XxxUser… but I dislike this because, if you ever find yourself wanting to make your user table modular and used across several systems, the ‘Xxx’ may now be mis-describing the table.