PHP version
8.4
PHP SQLSRV or PDO_SQLSRV version
5.13
Microsoft ODBC Driver version
unixODBC 2.3.11 / ODBC Driver 17 for SQL Server
Also tested with version 18, problem still exists.
SQL Server version
Microsoft SQL Server 2019
Client operating system
Windows 11
Table schema
CREATE TABLE dbo.roundtripjson
(
id int IDENTITY(1,1) NOT NULL,
name varchar(255) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL,
json varchar(max) COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
CONSTRAINT PK_roundtripjson PRIMARY KEY (id)
);
Problem description
We are currently testing a migration from the legacy collation SQL_Latin1_General_CP1_CI_AS to UTF-8 collations (_UTF8) in SQL Server 2019.
When using PDO::SQLSRV_ENCODING_UTF8, UTF-8 data roundtrips correctly. However, when using prepared statements, string parameters appear to be sent through the Unicode path, resulting in NVARCHAR semantics in SQL Server.
This leads to implicit conversions when comparing against VARCHAR(_UTF8) columns and makes it difficult to use UTF-8 collations efficiently.
At the moment, there does not appear to be a documented or supported way in pdo_sqlsrv to bind parameters as VARCHAR while still using UTF-8 encoding.
Expected behavior and actual behavior
Expected behavior
When using PDO::SQLSRV_ENCODING_UTF8 with VARCHAR columns that have a _UTF8 collation, string parameters in prepared statements should be treated as VARCHAR (UTF-8), so that:
- no implicit conversion is required
- indexes on
VARCHAR columns can be used efficiently
- query plans remain optimal
Actual behavior
- UTF-8 data roundtrips correctly
- however, prepared string parameters appear to be treated as
NVARCHAR
- this results in implicit conversions when compared against
VARCHAR(_UTF8) columns
- as a result, index usage and query plans may be negatively affected
Repro code or steps to reproduce
//$pdo->setAttribute(\PDO::SQLSRV_ATTR_ENCODING, \PDO::SQLSRV_ENCODING_SYSTEM); // varchar is used, but utf-8 is broken
//$pdo->setAttribute(\PDO::SQLSRV_ATTR_ENCODING, \PDO::SQLSRV_ENCODING_UTF8); // nvarchar is used, so performance is broken
$sql = <<<'SQL'
IF OBJECT_ID('dbo.roundtripjson', 'U') IS NULL
BEGIN
CREATE TABLE dbo.roundtripjson
(
id int IDENTITY(1,1) NOT NULL,
name varchar(255) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL,
json varchar(max) COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
CONSTRAINT PK_roundtripjson PRIMARY KEY (id)
);
END;
SQL;
$pdo->exec($sql);
// Clear table for this test run
$pdo->exec('TRUNCATE TABLE dbo.roundtripjson');
$testCases = [
[
'label' => 'basic_utf8',
'name' => 'Example – name with UTF-8 👍',
'data' => ['text' => 'Example – test with UTF-8 👍'],
],
[
'label' => 'europe_mixed',
'name' => 'Straße Köln – Grüß Gott',
'data' => ['text' => 'Straße Köln – Grüß Gott'],
],
[
'label' => 'multi_script_emoji',
'name' => '日本語 / русский / عربى 😀',
'data' => ['text' => '日本語と漢字 / Привет мир / مرحبا 😀'],
],
];
foreach ($testCases as $case) {
$json = json_encode($case['data'], JSON_UNESCAPED_UNICODE);
// Insert using PDO prepared statement
$stmt = $pdo->prepare('INSERT INTO roundtripjson (name, json) VALUES (?, ?)');
$stmt->execute([$case['name'], $json]);
$id = $pdo->lastInsertId();
// Read back using PDO
$stmt = $pdo->prepare('SELECT name, json FROM roundtripjson WHERE id = ?');
$stmt->execute([$id]);
$row = $stmt->fetch(\PDO::FETCH_ASSOC);
$nameMatches = $case['name'] === $row['name'] ? 'YES' : 'NO';
$jsonMatches = $json === $row['json'] ? 'YES' : 'NO';
echo "==== CASE (PDO): {$case['label']} ====\n";
echo "id: {$id}\n";
echo "name original : {$case['name']}\n";
echo "name from DB : {$row['name']}\n";
echo "name matches : {$nameMatches}\n";
echo "json original : {$json}\n";
echo "json from DB : {$row['json']}\n";
echo "json matches : {$jsonMatches}\n\n";
}
PHP version
8.4
PHP SQLSRV or PDO_SQLSRV version
5.13
Microsoft ODBC Driver version
unixODBC 2.3.11 / ODBC Driver 17 for SQL Server
Also tested with version 18, problem still exists.
SQL Server version
Microsoft SQL Server 2019
Client operating system
Windows 11
Table schema
Problem description
We are currently testing a migration from the legacy collation
SQL_Latin1_General_CP1_CI_ASto UTF-8 collations (_UTF8) in SQL Server 2019.When using
PDO::SQLSRV_ENCODING_UTF8, UTF-8 data roundtrips correctly. However, when using prepared statements, string parameters appear to be sent through the Unicode path, resulting inNVARCHARsemantics in SQL Server.This leads to implicit conversions when comparing against
VARCHAR(_UTF8)columns and makes it difficult to use UTF-8 collations efficiently.At the moment, there does not appear to be a documented or supported way in
pdo_sqlsrvto bind parameters asVARCHARwhile still using UTF-8 encoding.Expected behavior and actual behavior
Expected behavior
When using
PDO::SQLSRV_ENCODING_UTF8withVARCHARcolumns that have a_UTF8collation, string parameters in prepared statements should be treated asVARCHAR(UTF-8), so that:VARCHARcolumns can be used efficientlyActual behavior
NVARCHARVARCHAR(_UTF8)columnsRepro code or steps to reproduce