In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

MySQL official documentation

Let us consider this problem using an e-commerce store as an example. Here is how the product (item) entity could be defined (you can most likely understand this code even if you have not learned TypeScript):

import { FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4 } from "fundamental-constants";


type Product = {
  readonly ID: Product.ID;
  label: string;
  price__dollars__withoutTaxes: number;
};


namespace Product {

  export type ID = string;
  export namespace ID {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const FIXED_CHARACTERS_COUNT: number = FIXED_CHARACTERS_COUNT_IN_UNIVERSAL_UNIQUE_ID__VERSION_4;
  }

  export namespace Label {
    export const TYPE: StringConstructor = String;
    export const REQUIRED: boolean = true;
    export const MINIMAL_CHARACTERS_COUNT: number = 2;
    export const MAXIMAL_CHARACTERS_COUNT: number = 127;
  }

  export namespace Price__Dollars__WithoutTaxes {
    export const TYPE: NumberConstructor = Number;
    export const REQUIRED: boolean = true;
    export const MINIMAL_VALUE: number = 0;
  }

}

Input validation on the frontend, request data validation on the backend, and the database definition must all obey the business rules above. In particular, the product label must contain from 2 to 127 characters.

Assume that the values above are never typed in separately on the frontend and the backend; instead, they are referenced:

<!-- BAD: the maximal characters count is HARDCODED -->
<label for="PRODUCT_LABEL--INPUT">Please input 2-127 characters.</label>
<input type="text" maxlength="127" id="PRODUCT_LABEL--INPUT" />

<!-- GOOD: the maximal characters count is referenced (no matter which template engine is used) -->
<label for="PRODUCT_LABEL--INPUT">Please input {{ Product.Label.MINIMAL_CHARACTERS_COUNT }}-{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }} characters.</label>
<input type="text" maxlength="{{ Product.Label.MAXIMAL_CHARACTERS_COUNT }}" id="PRODUCT_LABEL--INPUT" />

When defining the database (no matter how exactly: via a raw SQL query, a GUI tool, or an ORM), we will also set a VARCHAR-like type for the label column of the Products table with a maximal length of 127 characters (again, by referring to Product.Label.MAXIMAL_CHARACTERS_COUNT instead of typing in the value 127 directly).
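As a minimal sketch of that idea, the column definition can be generated from the shared constant rather than typed by hand. The function name, the column names, the DECIMAL precision, and the utf8mb4 character set below are assumptions made for illustration; they are not part of the original setup.

```typescript
// Stand-in for Product.Label.MAXIMAL_CHARACTERS_COUNT from the entity definition above.
const LABEL_MAXIMAL_CHARACTERS_COUNT: number = 127;

// Hypothetical helper: builds the table definition so that the VARCHAR length
// always follows the shared constant instead of a hardcoded 127.
function buildProductsTableDefinition(labelMaximalCharactersCount: number): string {
  return [
    "CREATE TABLE Products (",
    "  ID CHAR(36) NOT NULL PRIMARY KEY,", // a UUID in its textual form has 36 characters
    `  Label VARCHAR(${labelMaximalCharactersCount}) NOT NULL,`,
    "  Price DECIMAL(10, 2) NOT NULL", // precision chosen arbitrarily for the sketch
    ") CHARACTER SET utf8mb4;"
  ].join("\n");
}

const productsTableDefinition: string = buildProductsTableDefinition(LABEL_MAXIMAL_CHARACTERS_COUNT);
console.log(productsTableDefinition);
```

Changing the constant then changes the frontend hint, the backend validation, and the generated column length together.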

Then, assume that the seller has entered a product label of almost 127 characters that includes 2-byte ones. Neither the validation on the frontend nor the validation of the request data on the backend has rejected the entered value. But once the server application tries to save the added (or updated) product to the table, we get an exception saying that the label's value exceeds the maximal length!

Question: which value must be set in Product.Label.MAXIMAL_CHARACTERS_COUNT? (Let me repeat that this value is referenced from both the frontend and the backend.)

Asked Nov 16, 2024 at 2:29 by Takeshi Tokugawa YD; edited Nov 17, 2024 at 6:57 by Progman.

2 Answers

The length argument of a VARCHAR(L) column specifies how many characters can be saved. It does not include the additional bytes that MySQL needs in order to store the value in the table. The quoted documentation only describes how many additional bytes are needed to store a value in a VARCHAR column. See the following example:

mysql> CREATE TABLE Dummy (Label VARCHAR(10));
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO Dummy(Label) VALUES('12345');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO Dummy(Label) VALUES('123456789');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO Dummy(Label) VALUES('1234567890');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO Dummy(Label) VALUES('12345678901');
ERROR 1406 (22001): Data too long for column 'Label' at row 1

As you can see, it is possible to save the string 123456789 (length of nine) in the VARCHAR(10) column, since 9<=10. It will, however, require 1 additional byte to save the data.

When you try to save the string 1234567890 (length of ten) in the VARCHAR(10) column, it works as well, since 10<=10. Again, it needs 1 additional byte for the length of the string.

The value 12345678901 cannot be saved since the string has a length of eleven and is too big to save in a column of type VARCHAR(10).

So when you want to save only labels with a maximum length of 127, then use VARCHAR(127). A user will be able to save values with a string up to a length of 127, but no bigger strings.

Keep in mind that the column length is counted in characters, not bytes. This means that the value äöüäöüäöü (nine umlauts) can be saved in a VARCHAR(10) column, since 9<=10, even though 18+1 bytes are needed to save the data in the table. See the following SELECT statement, where LENGTH() returns the length in bytes:

mysql> SELECT Label, LENGTH(Label) FROM Dummy;
+--------------------+---------------+
| Label              | LENGTH(Label) |
+--------------------+---------------+
| 12345              |             5 |
| 123456789          |             9 |
| 1234567890         |            10 |
| äöüäöü             |            12 |
| äöüäöüöäü          |            18 |
+--------------------+---------------+
5 rows in set (0.00 sec)

mysql> EXPLAIN Dummy;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| Label | varchar(10) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
1 row in set (0.00 sec)
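The same distinction can be reproduced on the application side. The following is a minimal sketch (the function names are made up for illustration) that measures one string in three ways: UTF-16 code units, which is what the JavaScript `length` property returns; Unicode code points, which is what a utf8mb4 VARCHAR(N) limit counts; and UTF-8 bytes, which is what LENGTH() reports in the output above.

```typescript
// What String.prototype.length reports: UTF-16 code units.
function countUtf16CodeUnits(value: string): number {
  return value.length;
}

// Array.from iterates a string by code points, so surrogate pairs count once.
function countCodePoints(value: string): number {
  return Array.from(value).length;
}

// Size of the string when encoded as UTF-8.
function countUtf8Bytes(value: string): number {
  return new TextEncoder().encode(value).length;
}

// "äöüäöüäöü", written with escapes so the precomposed characters are unambiguous.
const nineUmlauts: string = "\u00E4\u00F6\u00FC".repeat(3);

console.log(countUtf16CodeUnits(nineUmlauts)); // 9
console.log(countCodePoints(nineUmlauts));     // 9
console.log(countUtf8Bytes(nineUmlauts));      // 18
```

Only the last number grows with multi-byte characters, and it is not the number that the VARCHAR(10) limit is checked against.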

Your quote from the MySQL documentation at the top seems to show that you're conflating two concerns. The "2-byte length prefix" the documentation refers to is just a number stored at the beginning of every varchar column value which represents the length of the string contained within that column. For your purposes, it's not something you really need to be thinking about.

At least from my understanding of your question, the 2-byte values that you seem to be concerned about are actually Unicode characters within the text that require multiple bytes to be represented (and it is worth noting that plenty of Unicode characters require more than two bytes, up to four in UTF-8).

As a general rule of thumb, you should think of all your character limits in terms of actual Unicode characters (code points) rather than bytes. For example, if I have 10 Unicode characters that each require 4 bytes to store, I should be at 10/127 of your character limit, not 40/127.

This is how MySQL works, assuming you're on a version > 5, and have your table configured to use UTF-8 (docs):

For definitions of character string columns (CHAR, VARCHAR, and the TEXT types), MySQL interprets length specifications in character units.
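A validator that follows the character-unit rule quoted above could look like the sketch below. It assumes a utf8mb4 column, where one MySQL character corresponds to one Unicode code point; the constants stand in for Product.Label.MINIMAL_CHARACTERS_COUNT and Product.Label.MAXIMAL_CHARACTERS_COUNT from the question, and the function names are made up for illustration.

```typescript
// Stand-ins for the shared Product.Label constants from the question.
const MINIMAL_CHARACTERS_COUNT: number = 2;
const MAXIMAL_CHARACTERS_COUNT: number = 127;

// Counts Unicode code points; Array.from splits a string by code points,
// whereas the `length` property counts UTF-16 code units.
function countCharacters(value: string): number {
  return Array.from(value).length;
}

function isLabelLengthValid(label: string): boolean {
  const charactersCount: number = countCharacters(label);
  return charactersCount >= MINIMAL_CHARACTERS_COUNT && charactersCount <= MAXIMAL_CHARACTERS_COUNT;
}

// Ten characters that need 4 bytes each in UTF-8 (U+1F600).
const tenFourByteCharacters: string = "\u{1F600}".repeat(10);

console.log(tenFourByteCharacters.length);                   // 20 (UTF-16 code units)
console.log(countCharacters(tenFourByteCharacters));         // 10 (characters)
console.log(isLabelLengthValid("\u{1F600}".repeat(127)));    // true
console.log(isLabelLengthValid("\u{1F600}".repeat(128)));    // false
```

With this kind of check on both the frontend and the backend, the shared limit of 127 means the same thing in the application as it does in a VARCHAR(127) column.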

However, the maxlength and minlength attributes in HTML do not count characters this way. They measure the value in UTF-16 code units (two bytes per code unit), so a character outside the Basic Multilingual Plane, such as most emoji, counts as two or more instead of one:

<p>You can't type any additional characters into this box, as
the flag emojis use 4 UTF-16 code units each:</p>
<input type="text" value="
