Database Design in a Nutshell

Database design is a very extensive subject and I intent to build an entire course on that, but we need to understand at least a few basic things about database design. We’ll be dealing mostly with table design here in this introductory course, so we will set aside the complexity involved with retaliating tables. Still there are some basic principles that should be acknowledged. Database design deals first and foremost with how we combine data items or columns to define a table. As we add more tables to a database we must consider whether a new table is related to any of our existing tables and what that relationship is. We must then consider how those related tables are too accessed and how to manifest their relationship in the table structures themselves. In MySQL this also involves our choice of storage engine for each of these tables. Our basic goal is to create an efficient operational structure that minimizes storage space, access time and maintenance complexity. But this is an ideal that is nearly impossible to achieve without some compromises.

There are some basic rules for a table structure that we can apply without being concerned with other tables though. So that’s where we’ll begin. The most basic is rule that each column in a table should be design to hold a single value, and that that value is not derived from other columns in that table. We can always derived composite values in our result sets for a specific query, so it’s waste of space to store both the raw values and then one or more composite values to derive from them. For example, in a database of people information if we need to have separate access to first and last name of a person we should have separate columns for these two values. We can always combine them in query use print mailing label if we need to. But we would not store the two separately and then store them combine together in a third column that would be redundant. And an obvious waste of space. We should always define a column, or if absolutely necessary a group of columns to act as the primary key. This is a unique identifier for a given row, and it’s defined to maintain uniqueness among the rows in a table. The primary key can be either a value from the real world such as employee id, or social security number, or product serial number, or batch code etc, or student id etc. We will see how important this is later. A table should be defined in such a way, that all columns are solely dependent on the primary key value that is no part of a given table should be masking grater complexity in table structure of the database. This issue is most often noticed by examining the data once we have some data in the table. We would notice redundant grouping of the same values in certain columns in row after row. This is a situation we want to avoid wherever possible. As it leads to data entry or some other things.

A design flaw that is much easier to spot is repeating groups of columns, another thing we need to avoid. The occurrence of repeating columns in a table’s definition is a clear indicator that another table or two is required in our design. Rules like these have been canonized in to a principle database normalization long ago. The rules of normalization make great cross check for us as we design our tables. But beware of reference sources that suggest that normalization should be used as a primary step by step design technique. That can quickly lead to errors in database design, especially when the designer is not intimately familiar with the data being mapped into the database.

Here are the four logical relationships that there can be between two tables

  1. One to One
  2. One to Many
  3. Many to One
  4. Many to Many

Understanding these relationship types plus understanding that there are also structural and operational aspects to them is beyond our needs in this chapter.