SQL Table Identity Columns

Jun 30, 2026

Identity columns let a table assign a new numeric value during an insert. They’re usually used as surrogate row identifiers, which means the value identifies the row inside the database instead of explaining the customer, invoice, order, employee, ticket, or other business item stored in that row. The mechanics are worth knowing because the database is not trying to create perfect counting numbers. It is handing out values safely while inserts, failed inserts, rollbacks, deletes, restarts, imports, and manual resets can all occur. Good table modeling treats identity values as technical identifiers, not as meaningful business data.

Identity Values During Inserts

During an insert, an identity column lets the database fill a numeric value without asking the application to calculate the next number. We usually use that value as a surrogate identifier, which means it gives the row a stable database-level identity instead of carrying meaning about the customer, invoice, employee, ticket, or order stored in that row. The insert statement can focus on the data the application already has, while the database handles the generated value according to the rules attached to that column.

Column Definition

Relational databases support this generated-number behavior, but the syntax is not identical across products. PostgreSQL uses GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY, and the identity column is backed by an implicit sequence. SQL Server uses the IDENTITY(seed, increment) property. MySQL commonly uses AUTO_INCREMENT. Oracle Database also supports identity columns with options such as START WITH, INCREMENT BY, CACHE, and NOCYCLE.

The same table idea can look different depending on the database. In PostgreSQL, a customer table can define the row identifier this way:

CREATE TABLE customers (
    customer_id BIGINT GENERATED ALWAYS AS IDENTITY,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (customer_id)
);

The customer_id column receives a generated value during insert. BIGINT gives the column a large numeric range, which fits tables that may grow for a long time. The UNIQUE constraint is separate from the identity rule. The identity rule supplies values, while the constraint protects the table from duplicate identifiers.

SQL Server expresses the same table idea with IDENTITY(1, 1). The first number is the seed, which is the starting point. The second number is the increment, which is the amount added as new identity values are requested:

CREATE TABLE dbo.customers (
    customer_id BIGINT IDENTITY(1, 1) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT UQ_customers_customer_id UNIQUE (customer_id)
);

This definition starts at 1 and moves forward by 1 for each generated value. Different seeds or increments can be chosen, although most tables use 1 and 1 because the number only needs to identify rows inside the table. The value does not need to match a business sequence unless the table was built for that separate rule.

MySQL uses AUTO_INCREMENT, usually on an integer column indexed as a primary key or unique key:

CREATE TABLE customers (
    customer_id BIGINT NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (customer_id)
);

That definition tells MySQL to generate the customer_id value when the insert leaves it out or requests automatic behavior for that column. Although the syntax differs from PostgreSQL and SQL Server, the table design idea stays close. The database owns the next generated value, and the application stores the returned identifier if it needs to refer to that row later.

The column type should be treated as part of table planning rather than a small detail. SMALLINT or INT can be enough for lookup tables with limited growth, while high-volume tables usually call for a larger type. Moving from a smaller identity type to a larger one after growth has already happened can affect the table, indexes, foreign keys, and application testing. Starting with a type that fits the table’s expected life usually saves trouble later.

Generated Value Flow

During a normal insert, we usually leave the identity column out of the column list. The database fills it during the insert, which keeps application code from reading the current largest id and adding one. That application-side calculation can break when multiple inserts happen at the same time.

PostgreSQL can insert rows by leaving the identity column out:

INSERT INTO customers (email)
VALUES ('alex@example.com');

INSERT INTO customers (email)
VALUES ('kaitlyn@example.com');

Both rows receive generated customer_id values. The application sends the email values, and the database fills the identity column. After the insert succeeds, the application can read the generated id through the database feature or driver method it already uses for inserted rows.

The DEFAULT keyword can also ask the database to generate the value:

INSERT INTO customers (customer_id, email)
VALUES (DEFAULT, 'pippin@example.com');

That form is useful when a statement includes the identity column in the column list for consistency with a larger insert builder. DEFAULT still tells the database to apply the column’s generation rule instead of accepting a manually typed number.

SQL Server has the same general flow, although applications commonly retrieve the generated value with an output clause when the inserted id is needed right away:

INSERT INTO dbo.customers (email)
OUTPUT inserted.customer_id
VALUES ('alex@example.com');

The insert writes the row and returns the generated customer_id from the inserted row. That is safer than asking for the largest id after the insert, because another session could insert a different row between those steps.

MySQL applications often read the most recent automatic value from the current session after an insert. The session part is important because the returned value belongs to the connection that performed the insert, not a global query for the largest value in the table:

INSERT INTO customers (email)
VALUES ('kaitlyn@example.com');

SELECT LAST_INSERT_ID();

This flow is still driven by the database’s automatic value generator. The application does not pick the number before sending the insert. It lets the storage engine assign the value, then reads back the value tied to that insert session.

Generated value flow also affects multi-row inserts. The database may assign values to more than one row from a single statement, and those values belong to the rows created by that statement. The exact return syntax changes by product, but the modeling rule stays the same. We let the database assign the identity value, then capture it through the database’s supported return feature when application code needs it.

Default Rules With Constraints

Rules like GENERATED ALWAYS, GENERATED BY DEFAULT, and BY DEFAULT ON NULL control how strict the database is when an insert statement supplies a value for the identity column. The difference is most obvious during imports, migrations, and administrative inserts. GENERATED ALWAYS is the stricter form, the database is expected to generate the value, and regular inserts should not provide their own value for that column. PostgreSQL allows an explicit value only when the statement includes OVERRIDING SYSTEM VALUE. Oracle Database also treats GENERATED ALWAYS as database-controlled, so a normal insert cannot supply its own identity value.

CREATE TABLE support_tickets (
    ticket_id BIGINT GENERATED ALWAYS AS IDENTITY,
    requester_email VARCHAR(255) NOT NULL,
    subject VARCHAR(200) NOT NULL,
    opened_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (ticket_id)
);

The table expects ticket_id to come from the database. Insert statements normally provide the request data and leave the identifier alone:

INSERT INTO support_tickets (requester_email, subject)
VALUES ('alex@example.com', 'Password reset link expired');

The generated value becomes the row’s internal identifier. The request email and subject are application data, while ticket_id is database-assigned.

GENERATED BY DEFAULT gives inserts more room. The database generates a value when the insert does not provide one, but an insert can supply a value directly. That can help during a migration where old rows already have internal ids from a prior system.

CREATE TABLE archived_accounts (
    account_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    email VARCHAR(255) NOT NULL,
    imported_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (account_id)
);

Normal inserts can still let the database assign the id:

INSERT INTO archived_accounts (email)
VALUES ('kaitlyn@example.com');

Migration inserts can also provide an older id directly:

INSERT INTO archived_accounts (account_id, email)
VALUES (9001, 'pippin@example.com');

That flexibility has a cost. If manual values are allowed, the table needs a constraint to reject duplicates. The identity rule controls generation behavior, but it does not replace a primary key, unique constraint, or unique index in every database. PostgreSQL identity columns are marked NOT NULL, yet uniqueness still has to be declared separately. SQL Server also treats uniqueness as a separate table rule that should be enforced with a constraint or index.

Oracle’s BY DEFAULT ON NULL form fills a generated value when the insert supplies NULL. That form can help when an import process sends NULL for missing identifiers but still needs the database to generate the value. The plain BY DEFAULT form and the ON NULL form are not the same rule. One responds to an omitted column, while the other can also respond to an explicit NULL.

Tables commonly combine identity generation with a primary key when the identity value is meant to be the main row identifier:

CREATE TABLE shipments (
    shipment_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    destination_city VARCHAR(120) NOT NULL,
    destination_state VARCHAR(80) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

The PRIMARY KEY gives the table its uniqueness and row reference rule. The identity property supplies the value. Those two parts are commonly placed on the same column, but they are not the same feature.

Foreign keys can then refer to that generated identifier from related tables:

CREATE TABLE shipment_events (
    shipment_event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    shipment_id BIGINT NOT NULL,
    event_name VARCHAR(120) NOT NULL,
    event_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT FK_shipment_events_shipments
        FOREIGN KEY (shipment_id)
        REFERENCES shipments (shipment_id)
);

The child table has its own identity column for event rows, while shipment_id points back to the parent shipment. That layout keeps row identity local to each table and lets relationships be expressed through constraints rather than copied text values.

Sequence Movement After Data Changes

Identity generators move forward as the database assigns values, and that movement is separate from the final row count in the table. We can insert rows, roll back transactions, delete data, reload test tables, or import older records, yet the identity generator still keeps its own state. That is why identity values should be read as generated row identifiers instead of perfect counting numbers.

Rollbacks Create Gaps

Transaction behavior explains an early surprise with identity columns. The database can consume an identity value during an insert attempt, then the transaction can fail or roll back later. The row goes away, but the generated value usually does not return to the pool.

SQL Server states this behavior directly for identity values. If an insert fails or a transaction rolls back, the consumed identity value is not reused. MySQL InnoDB follows the same general rule for AUTO_INCREMENT, where an automatic value is not rolled back after assignment. The result is a table that can have missing numbers without damaged data.

We can see the behavior with a small SQL Server example:

CREATE TABLE dbo.order_attempts (
    order_attempt_id BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    customer_email VARCHAR(255) NOT NULL
);

BEGIN TRANSACTION;

INSERT INTO dbo.order_attempts (customer_email)
VALUES ('alex@example.com');

ROLLBACK;

INSERT INTO dbo.order_attempts (customer_email)
VALUES ('kaitlyn@example.com');

SELECT order_attempt_id, customer_email
FROM dbo.order_attempts;

The committed row may receive order_attempt_id 2 instead of 1 because the rolled-back insert already consumed the first identity value. That gap is normal, and the database is not trying to make the visible ids match the number of committed rows.

Failed statements can create the same result, an insert may request an identity value and then fail because another column violates a constraint. The row is rejected, but the identity generator may have already advanced.

CREATE TABLE dbo.account_requests (
    account_request_id BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
);

INSERT INTO dbo.account_requests (email)
VALUES ('pippin@example.com');

BEGIN TRY
    INSERT INTO dbo.account_requests (email)
    VALUES ('pippin@example.com');
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS error_message;
END CATCH;

INSERT INTO dbo.account_requests (email)
VALUES ('morgan@example.com');

SELECT account_request_id, email
FROM dbo.account_requests
ORDER BY account_request_id;

The duplicate email insert fails, yet the later successful row can still receive a value that skips a number. That skip does not mean a row disappeared from the table. It means the generator assigned a value before the rest of the insert passed every table rule.

Concurrent inserts add more reasons not to expect perfect numbering. Databases let multiple sessions insert data without forcing every session to wait for a gap-free count. Some products also cache generated values for speed, and cached values can be lost after a restart or failure. The practical rule is that identity values should stay stable after assignment, but they should not be treated as gap-free.

Deletes Do Not Reorder Existing Rows

Removing rows does not cause the database to renumber the remaining rows. If rows with ids 1 through 10 exist and row 5 is deleted, rows 6 through 10 keep their original values. The next insert normally continues from the generator’s current state rather than filling the missing value.

That behavior protects references. Other tables may point to the row through a foreign key, logs may contain the old id, and application records may have stored the id for later lookups. Reassigning ids after every delete would make those references unreliable. Identity values are meant to stay attached to the row that received them.

PostgreSQL gives us a nice way to see how the ids can stop matching the row count:

CREATE TABLE help_requests (
    help_request_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_email VARCHAR(255) NOT NULL,
    subject VARCHAR(200) NOT NULL
);

INSERT INTO help_requests (requester_email, subject)
VALUES
    ('alex@example.com', 'Billing question'),
    ('kaitlyn@example.com', 'Login issue'),
    ('pippin@example.com', 'Profile update');

DELETE FROM help_requests
WHERE requester_email = 'kaitlyn@example.com';

INSERT INTO help_requests (requester_email, subject)
VALUES ('morgan@example.com', 'Address change');

SELECT help_request_id, requester_email, subject
FROM help_requests
ORDER BY help_request_id;

The remaining rows can have ids 1, 3, and 4. The missing 2 only means a row with that value was deleted. It does not mean the table needs repair, and it does not mean the next generated value should move backward.

The same idea applies to SQL Server and MySQL. Deleting data changes which rows are visible, but it does not normally change the identity generator’s next value. That separation is part of why identity columns fit internal row references. The id belongs to the row, not to the row’s current position in a count.

Resets Need Care

Changing an identity generator affects future inserts, so resets need careful handling. Development databases and test tables may be reset frequently, but production tables call for a stricter review because a reset can send the generator back into values that already exist. SQL Server provides DBCC CHECKIDENT for checking or changing the current identity value. NORESEED reports the current identity information, while RESEED changes the current value that SQL Server uses to calculate the next generated value.

DBCC CHECKIDENT ('dbo.order_attempts', NORESEED);

DBCC CHECKIDENT ('dbo.order_attempts', RESEED, 1000);

After reseeding a non-empty table to 1000 with an increment of 1, the next generated value is normally 1001. If the table was emptied with TRUNCATE TABLE, the next inserted row can receive the reseed value itself. If rows were removed with DELETE, SQL Server still follows the reseed-plus-increment behavior for the next generated value.

The risk appears when the new value is lower than values already stored in the table. If a primary key or unique constraint exists, future inserts can fail when the generator reaches an existing value. Without such a constraint, duplicate identifiers could be stored, which would create serious trouble for row lookups and relationships.

DBCC CHECKIDENT ('dbo.order_attempts', RESEED, 1);

INSERT INTO dbo.order_attempts (customer_email)
VALUES ('riley@example.com');

If order_attempt_id 2 already exists, this insert can fail because the reseeded generator may try to reuse that value. The constraint catches the duplicate, but the table still ends up with a broken insert flow until the generator is moved above the existing maximum value.

For PostgreSQL identity columns, the backing sequence state can be adjusted when needed. That commonly comes up after loading rows with explicit ids during a migration. The generator should be moved above the largest stored id before normal inserts resume.

SELECT setval(
    pg_get_serial_sequence('help_requests', 'help_request_id'),
    (SELECT MAX(help_request_id) FROM help_requests)
);

That statement sets the related sequence based on the current maximum id in the table. The next generated value then follows from the sequence state instead of colliding with values that were imported earlier.

Resets are usually safe for empty test tables, disposable local databases, or carefully planned migrations. They are not a cosmetic fix for missing numbers. Gaps do not need repair just because the visible ids are not consecutive.

Business Meaning Belongs Elsewhere

Identity values should not carry business meaning. They do not prove creation order across every process, they do not prove that no row was deleted, and they do not prove that every attempted insert committed. They only tell us which generated value was assigned to a row.

Invoice numbers are a common place where this distinction becomes important. An internal identity value can identify the invoice row, while a separate invoice number can follow accounting, customer-facing, or legal rules. Those rules may involve prefixes, fiscal-year ranges, branch codes, approval timing, or a separate numbering table. The identity column should not carry all of that meaning.

CREATE TABLE invoices (
    invoice_row_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    invoice_number VARCHAR(40) NOT NULL UNIQUE,
    customer_id BIGINT NOT NULL,
    issued_on DATE NOT NULL,
    total_due DECIMAL(12, 2) NOT NULL
);

The invoice_row_id value helps with joins and internal references. The invoice_number value is the business-facing identifier. Keeping those values separate lets the database tolerate identity gaps without changing the number customers see on an invoice.

Order numbers, ticket numbers, receipt numbers, claim numbers, and public reference codes follow the same rule. If people outside the database read the value as meaningful, it deserves its own column and its own validation rules. Identity values can still support internal joins, but they should not be treated as proof of order, count, or business sequence.

Public APIs can benefit from this separation too. Exposing sequential identity values can reveal growth trends or make record discovery easier than intended. A separate public reference can be random, formatted, or generated by a different rule, while the identity column remains inside the database model.

CREATE TABLE customer_cases (
    case_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_case_code VARCHAR(32) NOT NULL UNIQUE,
    customer_email VARCHAR(255) NOT NULL,
    case_title VARCHAR(200) NOT NULL,
    opened_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

The table can rely on case_id for joins and foreign keys while public_case_code appears in emails, URLs, support messages, and customer screens. That separation keeps the internal row identifier useful without forcing it to act like business data.

Conclusion

Identity columns give SQL tables a database-managed way to assign row identifiers during inserts, but their mechanics are tied to generation rules, sequence state, constraints, and transaction behavior rather than perfect counting. We can rely on them for stable internal references, while still expecting gaps after failed inserts, rollbacks, deletes, cached values, and reseeds. Treating the identity value as technical data keeps the table model safer, while business-facing numbers can follow their own rules in separate columns.

Share Alexander Obregon's Substack

Alexander Obregon's Substack

Discussion about this post

Ready for more?