Generated Columns and Computed Columns in SQL

Apr 06, 2026

Derived values can cut down on repeated expressions in day-to-day queries. In MySQL, this feature is called a generated column. For SQL Server, the comparable idea is called a computed column. Both let the database calculate a value from other columns in the same row, which keeps that logic attached to the table instead of repeating it across query text or application code. The main decisions come down to where that value lives, how the engine keeps it current, and what changes after an index is built on top of it. Those choices affect storage space, write overhead, and how much calculation still takes place during reads.

Core Mechanics

Table definitions can hold derived columns, not just values entered directly. With that arrangement, the database keeps a formula at the schema level and applies it row by row under the rules of the engine. Query text stays shorter because the same expression does not need to be written again for every statement that touches the table. That gives the schema a central place for row-level calculations and makes the column part of the table itself rather than something rebuilt in scattered query text.

MySQL Generated Columns

Within MySQL, a generated column is declared with GENERATED ALWAYS AS (expression). The column still has a name and data type, but its value comes from the expression instead of direct input. MySQL supports two forms, VIRTUAL and STORED. A virtual generated column is calculated from other column values in the row, while a stored generated column is materialized as part of the row data. That choice is written directly into the column definition, so the table schema states how MySQL should treat the derived value from the start.

We can see this in action with this syntax example:

CREATE TABLE employees (
    employee_id BIGINT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    full_name VARCHAR(101)
        GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL
);

full_name is derived from first_name and last_name, so the application does not send a separate value for it. Every row gets the same formula, and every query that reads full_name is reading from that single schema rule rather than rebuilding the concatenation in ad hoc SQL. That keeps the table definition in charge of the expression and cuts down on repeated formula text in queries.

Generated columns are also helpful when part of a stored value needs to be pulled into its own column. JSON data is a common case because queries can become noisy when the same extraction expression is repeated again and again. Moving that extraction into a generated column keeps the table definition more readable and gives the derived value a normal column name.

CREATE TABLE shipments (
    shipment_id BIGINT PRIMARY KEY,
    payload JSON NOT NULL,
    postal_code VARCHAR(20)
        GENERATED ALWAYS AS (
            JSON_UNQUOTE(JSON_EXTRACT(payload, '$.postalCode'))
        ) STORED
);

postal_code now comes from the payload document, but it behaves like a named column in the table definition. Queries can reference postal_code directly instead of carrying the full JSON_EXTRACT expression every time that value is needed. That is a practical reason generated columns show up in table schemas that carry semi-structured data.

MySQL treats generated columns as part of table structure, not as a loose display feature. That means they interact with schema rules such as indexing support, partitioning support, and foreign key restrictions in documented ways. At this stage, the main point is that a generated column is built into the table definition itself. It is not a query alias, not a view-only expression, and not a temporary value that exists only during a single SELECT.

MySQL also lets you add a generated column later through ALTER TABLE, which is useful when a table already exists and a repeated expression needs to move into the schema:

ALTER TABLE employees
ADD email VARCHAR(255) NOT NULL,
ADD email_domain VARCHAR(255)
    GENERATED ALWAYS AS (SUBSTRING_INDEX(email, '@', -1)) VIRTUAL;

That statement adds email_domain as a derived column without turning it into a value users type in directly. The table now carries the expression as part of its definition, so future queries can refer to email_domain as a column name instead of repeating the string function.

SQL Server Computed Columns

For SQL Server, the matching feature is a computed column. The table definition uses AS (expression) to tell SQL Server that the column value comes from an expression tied to other columns in the same row. By default, a computed column is virtual. SQL Server calculates it when the column is referenced instead of storing it physically in the table. If physical storage is wanted, PERSISTED can be added to the definition.

Let’s see how that looks:

CREATE TABLE Sales.Payroll (
    PayrollId int NOT NULL PRIMARY KEY,
    HoursWorked decimal(9,2) NOT NULL,
    HourlyRate decimal(9,2) NOT NULL,
    GrossPay AS (HoursWorked * HourlyRate)
);

GrossPay is derived from HoursWorked and HourlyRate, so there is no separate insert value for it. SQL Server keeps the formula with the table definition and applies that formula whenever GrossPay is referenced. That keeps arithmetic out of repeated query text and gives the schema a fixed rule for that derived value.

Data type behavior is part of the mechanics too. SQL Server resolves the computed column data type through its normal type precedence rules. If an expression combines values in a way that does not convert cleanly, the definition can fail or produce a type you did not expect. In those cases, CAST or CONVERT is used to state the intended result type directly. That becomes very helpful with money values, formatted strings, or expressions that combine integers with decimals.

This example shows that type choice inside the expression can be stated directly:

CREATE TABLE Sales.Invoices (
    InvoiceId int NOT NULL PRIMARY KEY,
    Subtotal decimal(10,2) NOT NULL,
    TaxRate decimal(5,4) NOT NULL,
    TaxAmount AS (CAST(Subtotal * TaxRate AS decimal(10,2)))
);

TaxAmount is still derived, but the expression now fixes the result type instead of leaving the final type entirely to implicit conversion rules. That keeps the schema aligned with the type you want the column to expose.

Physical storage is optional in SQL Server, and PERSISTED is the keyword that changes the column from virtual to stored in the table. A computed column marked that way is updated by SQL Server when the source columns change.

ALTER TABLE Sales.Payroll
ADD MonthlyPay AS (
    CAST(HoursWorked * HourlyRate * 4.33 AS decimal(12,2))
) PERSISTED;

MonthlyPay is now part of the stored row data rather than a value SQL Server recalculates only at read time. That storage choice becomes more important later when read cost and indexing come into view, but the mechanical point belongs here because PERSISTED is written right into the column definition.

Computed columns also follow stricter schema rules than regular columns. SQL Server does not treat them as ordinary input columns, so you do not insert or update them directly. They cannot be used as DEFAULT definitions, and they have documented rules around nullability, foreign keys, determinism, and indexing. Those rules make more sense once you remember what the column really is. Its value is tied to an expression owned by the table schema, not to direct user-supplied data.

Persistence Indexing Performance Tradeoffs

Storage choices change what the engine has to do during reads and writes. Virtual definitions leave the derived value out of the stored row, stored forms place it in row data, and indexes can place that value inside a search structure the optimizer can read directly. Read speed, row size, and update cost all move when that choice changes. That is where generated columns in MySQL and computed columns in SQL Server stop being only table-definition features and start affecting daily query behavior.

Read activity is only part of the story. Every derived value has to come from somewhere, and the database has to decide when that expression is evaluated, where the result is kept, and what extra maintenance follows when base columns change. Storage stays lower when the value is left virtual, but reads still have to evaluate the expression. Storing the result removes that repeated calculation for the column value itself, though row growth and update overhead rise in exchange. When an index is added, the trade becomes even more noticeable because the engine can search the derived value directly instead of recalculating it during a scan.

Leaving the Value Virtual

Virtual definitions leave the derived value outside the base row. MySQL handles that with VIRTUAL, while SQL Server leaves a computed column virtual until PERSISTED is added. In both products, the formula stays in the schema and the result is calculated when the column is referenced. That keeps row storage lower, which can be useful when the expression is short and the derived value is not pulled into every query.

Let’s look at a lightweight MySQL case:

CREATE TABLE page_views (
    view_id BIGINT PRIMARY KEY,
    viewed_at DATETIME NOT NULL,
    viewed_month CHAR(7)
        GENERATED ALWAYS AS (DATE_FORMAT(viewed_at, '%Y-%m')) VIRTUAL
);

viewed_month does not take space in the row as stored data. MySQL derives it from viewed_at whenever the column is needed. For a short date-format expression like this, that can be a reasonable trade because the source value already exists and the derived value is just a small transformation of it.

This form also helps keep queries from repeating the same expression again and again. Instead of writing DATE_FORMAT(viewed_at, '%Y-%m') in every report query, the table exposes viewed_month as a named column. That keeps query text shorter and reduces the chance that two queries end up with slightly different formulas for what should be the same value.

Read cost still exists, though. If a query touches a large number of rows and needs the virtual column for each row, the engine has to evaluate that expression across the read set. For light expressions that may be perfectly fine. For heavy expressions or very busy read paths, that repeated calculation can become more noticeable. Virtual form, then, fits best when lower row storage is more valuable than pre-saving the result.

SQL Server follows the same broad idea with different terminology. A computed column stays virtual by default, so SQL Server calculates it when the column is referenced instead of storing it physically in the row. That makes virtual computed columns a reasonable fit for columns that tidy up query logic but are not central to the busiest filters and result sets.

Storing the Value in the Row

Stored forms place the derived result inside the row itself. MySQL does that with STORED. SQL Server does it with PERSISTED. Both products then refresh the saved value when dependent columns change. Reads no longer need to recalculate the column value just to return it, but inserts and updates now carry extra maintenance and extra storage.

We can look at a short SQL Server definition that shows that change directly:

CREATE TABLE Sales.SubscriptionBilling (
    BillingId int NOT NULL PRIMARY KEY,
    MonthlyRate decimal(10,2) NOT NULL,
    BillableMonths int NOT NULL,
    ContractTotal AS (CAST(MonthlyRate * BillableMonths AS decimal(12,2))) PERSISTED
);

ContractTotal is stored with the row, so SQL Server updates it when MonthlyRate or BillableMonths changes. That can fit repeated reads of the same amount, mainly when the value appears in result sets again and again and there is little reason to recalculate it every time a query returns it.

MySQL presents the same trade with STORED generated columns. The derived value becomes part of row data, which means a read can fetch it directly rather than evaluating the expression for that column at that moment. That can be useful for values such as extended prices, tax totals, extracted JSON fields referenced constantly, or formatted values that appear across a broad slice of read traffic. Write activity becomes more expensive at the same time. If the dependent columns change, the saved derived value has to be refreshed. That refresh is automatic, but it still means more row maintenance during INSERT and UPDATE. Row size also grows because the result is now part of stored data rather than something built only when needed.

Stored form is really a trade between repeating CPU calculation during reads and carrying extra bytes and maintenance during writes. Tables that are read far more than they are changed may benefit from that trade. Tables with heavy write traffic and only occasional reference to the derived value may not gain much from storing it.

Indexing the Derived Column

Indexes can turn a derived column from a convenience feature into a search-friendly value the optimizer can use for filtering, ordering, and in selected cases index-only retrieval. MySQL allows indexes on stored generated columns, and InnoDB also supports secondary indexes on virtual generated columns. SQL Server allows indexes on computed columns too, but the expression must satisfy documented rules around determinism and precision, or be deterministic and persisted.

Let’s look at a MySQL table where a derived value comes out of JSON:

CREATE TABLE customer_events (
    event_id BIGINT PRIMARY KEY,
    payload JSON NOT NULL,
    region_code VARCHAR(8)
        GENERATED ALWAYS AS (
            JSON_UNQUOTE(JSON_EXTRACT(payload, '$.region'))
        ) VIRTUAL,
    INDEX ix_region_code (region_code)
);

The base row does not store region_code, yet the secondary index does store the generated value in its index records. That gives MySQL something searchable without requiring the table itself to carry the derived value as stored row data. Queries that filter by region_code can then use the index instead of recalculating the JSON_EXTRACT expression across a scan of the table. Also, if a query can be satisfied from the secondary index alone, MySQL can read the generated value from the index record rather than recalculating it from the base row. That makes indexed virtual generated columns more useful than they first appear, because the value may still live in searchable storage even though it is not part of the stored row itself.

SQL Server handles indexed computed columns with stricter rules. Determinism is part of that rule set, which means the expression must always return the same result for the same input values. Precision also matters. If a computed expression is deterministic and precise, indexing can be allowed without storage in some cases. If the expression is deterministic but not precise, PERSISTED becomes part of the route to indexing.

This definition shows a common SQL Server case:

CREATE TABLE Sales.InvoiceLines (
    InvoiceLineId int NOT NULL PRIMARY KEY,
    Quantity int NOT NULL,
    UnitPrice decimal(10,2) NOT NULL,
    ExtendedAmount AS (CAST(Quantity * UnitPrice AS decimal(12,2))) PERSISTED
);

CREATE INDEX IX_InvoiceLines_ExtendedAmount
    ON Sales.InvoiceLines (ExtendedAmount);

ExtendedAmount is now both stored and indexed, which gives SQL Server a searchable derived amount instead of forcing a fresh multiplication across rows during every relevant filter or sort. That can help range predicates, ordered retrieval, and some join cases where the computed value is central to the query.

Constraints can follow those same indexing rules in SQL Server. A computed column may participate in PRIMARY KEY or UNIQUE constraints if the expression satisfies the documented requirements. That matters because it shows the engine is treating the derived value as more than a display convenience. Once the expression meets the rule set, the column can become part of schema-level uniqueness and search structures.

Write overhead rises here too. Each index on a derived column has to be maintained when the dependent values change. In MySQL that means generated values are materialized into secondary index records during row changes. In SQL Server the indexed computed value must also stay in sync with updates to the underlying columns. Reads can get much faster, but that speed comes with added write cost and extra storage in index structures.

Picking the Right Form

Choice comes down to query behavior, row size, expression cost, and indexing rules in the database engine in front of you. Virtual or nonpersisted form fits best when the formula is light, the derived value mainly keeps query text shorter, and storing the result inside every row would add little value. Stored or persisted form fits better when the same derived value is returned constantly or when storage in the row lines up better with the indexing rules needed for that column.

In MySQL, a table that keeps base rows lean while still exposing a searchable derived value shows that balance well:

CREATE TABLE support_requests (
    request_id BIGINT PRIMARY KEY,
    payload JSON NOT NULL,
    priority_code VARCHAR(10)
        GENERATED ALWAYS AS (
            JSON_UNQUOTE(JSON_EXTRACT(payload, '$.priority'))
        ) VIRTUAL,
    INDEX ix_priority_code (priority_code)
);

priority_code stays out of the base row, but the secondary index still carries the derived value for search use. That can be a good fit when preserving row space is more important than storing the extracted value inside every table record, yet fast filtering on that derived field is still needed.

SQL Server makes the choice a little tighter because indexing rules are more restrictive. If an expression is deterministic and precise, an indexed computed column may be allowed without persistence. If the expression is deterministic but not precise, PERSISTED can become necessary before the column is indexed. That means the storage decision is tied not only to read frequency, but also to the rule set attached to computed-column indexing.

Read-heavy reporting, repeated range filters, and frequent ordering by the derived value all push the decision toward indexed forms. Tables with heavy write traffic and only occasional reference to the derived value usually fit better with leaner definitions. Good schema choices come from reading the query load first, then matching that load to row growth, update overhead, and the indexing limits of the engine involved.

Conclusion

Generated columns in MySQL and computed columns in SQL Server give the table a built-in way to derive values from other columns, which moves repeated formulas out of day-to-day queries and into the schema itself. From there, the mechanics come down to when the value is calculated, where it is kept, and how that choice affects reads, writes, and indexing. Virtual definitions save row space but keep calculation on the read side, while stored or persisted definitions add row maintenance in exchange for direct access to the derived value itself. After indexing enters the discussion, those derived values can also become searchable parts of the table structure, which is why the best fit comes from how the expression is read, updated, and filtered in the database.

Share Alexander Obregon's Substack

Alexander Obregon's Substack

Discussion about this post

Ready for more?