Loading and Updating Data

This chapter shows how we can load data into a database and make changes and deletions.

In the last chapter, we created a database by defining the schema (i.e., columns and constraints) for each table. After creating the schema, the table is, however, empty. In this chapter, we'll see how to fill tables with data. The most direct way of inserting data into a table is by writing corresponding SQL commands. The following command inserts data into a table:

INSERT INTO [Table] VALUES [Data];

This command inserts data into the table [Table] and specifies the data to insert. The data to insert, [Data], is specified as a comma-separated list of rows. Each row is specified as a comma-separated list of values for each table column, embedded between parentheses. For instance, let's insert two students, Marc and Isabel, with IDs 1 and 2:

INSERT INTO Students VALUES
	(1, 'Marc'), (2, 'Isabel');

We specify values for each of the table columns, the student ID and the student name, in the order in which those columns appear in the table. Separated by a comma, we provide values for both students. As a result, two new rows will be added to the Students table.

So far, we have inserted rows that specify a value for each column in the table. Sometimes, we may not know the final value for each single column yet. Still, as long as we know the values for some of the columns, we might already want to insert the data we have (and fill in the blanks later). The insertion command supports those cases using the following syntax:

INSERT INTO [Table] ([Column-List]) VALUES [Data];

Different from the previous version, we now specify additionally a (comma-separated) list of columns. When specifying data for the rows to insert, we only need to specify values for this subset of columns. For instance, let's assume the university plans to offer a new course on databases but the precise title of the course is still being debated. In that case, we can insert a row that only specifies a value for the course ID but not yet for the course name:

INSERT INTO Courses (courseID) values (1);

What happens to the columns whose value is not specified? The database system will automatically insert SQL NULL values for those columns. This means also that we cannot omit values for columns in which NULL values are not allowed. For instance, in the last chapter, we added constraints to make the courseID column in the Courses table the primary key. That means NULL values are not allowed for that column. The following statement should therefore lead to an error:

INSERT INTO Courses (name) values ('Intro to Databases');

Quite often, we don't need to start inserting data from scratch. Instead, we might already have data, albeit in a different format. For instance, relational data is commonly represented as CSV (comma-separated values) files on disk. If we have such data available, we can easily load the file content into a table in our relational database.

Commands for loading data from disk may differ across different database management systems. The following command loads CSV data from disk into a PostgreSQL database:

COPY [Table] FROM [File-Path] DELIMITER [Delimiter] NULL [Null-Representation] CSV;

The command loads data from the file at [File-Path] into the table named [Table]. Note that the file structure must match the table schema. For instance, the file must contain values for the precise number of columns found in the table. Also, the value types in the file must match the value types of the table columns (and satisfy all other constraints such as unique values in certain columns across different rows).

Different files may use different symbols to separate values of different columns for the same row. Also, the SQL NULL value may be represented in different ways (e.g., using the string "N/A" or by simply by the empty string). To avoid parsing the data incorrectly, we can explicitly specify the symbols separating values ([Delimiter]) as well as the strings representing SQL NULL values ([Null-Representation]). If we omit those specifications, the database system will assume default values which may or may not match the ones used in the file we want to load.

For instance, the following command loads data for our Courses table:

COPY Courses FROM 'courses.csv' DELIMITER ',' NULL 'N/A' CSV;

This command loads data from the "courses.csv" file, assuming that values of different columns are separated via commata and NULL values are represented by the string "N/A". Finally, note that, beyond the two options discussed here (delimiters and representation of NULL values), the COPY command offers more options (e.g., to specify how different rows are separated). Have a look at the PostgreSQL documentation for more details.

After inserting data, we can delete all or a subset of rows using the following command:

DELETE FROM [Table] WHERE [Condition];

This commands deletes a subset of rows from [Table]. More precisely, the command deletes all rows that satisfy the condition specified behind the keyword WHERE. The [Condition] may refer to all columns and use operators such as equality (=) or inequality (>=, <=) comparisons. The rows for which the condition is satisfied are deleted. SQL conditions are discussed in more detail in the following chapters. For now, we will restrict ourselves to viewing a few simple example conditions. For instance, let's assume we want to delete the course with ID 5. The following SQL command takes care of that:

DELETE FROM Courses WHERE courseID = 5;

In some cases, we do not want to delete rows but merely change their values for a few columns. This is possible using the following SQL command:

UPDATE [Table] SET [Column] = [Value] WHERE [Condition];

This command changes all rows in table [Table] that satisfy [Condition]. It changes those rows by setting [Column] to [Value]. Note that the value does not have to be a constant. Instead, it can be an SQL expression (e.g., performing arithmetic operations between multiple columns) that evaluates to different values in different rows. For instance, let's assume we want to add the suffix " course" to the name of each course in our database. The following SQL command takes care of that (the || operator concatenates two strings in PostgreSQL):

UPDATE Courses SET name = name || ' course';

In this case, we omit the condition completely, meaning that the change affects all rows in the table. Note that we assign a new value to the name column by referencing its old value (at the right side of the equality symbol), then concatenating with the " course" suffix.

To avoid having to issue multiple UPDATE commands to change values in multiple columns, we can simply specify all requested changes in a comma-separated list. E.g., to change the course name and multiply all course IDs with 100 at the same time (careful, since the course ID column must contain unique values, this only works as long as the database contains no courses with the ID values that result from multiplication), we can use the following command:

UPDATE Courses SET name = name || ' course', courseID = courseID * 100;

We have seen how to load, delete, and update data in our database. Next, we will see how we can analyze data using SQL queries.