Skip to content

Using UTF-8 encoding in PDO_SQLSRV results in NVARCHAR parameters instead of VARCHAR(_UTF8) #1587

@jochem-blok

Description

@jochem-blok

PHP version
8.4

PHP SQLSRV or PDO_SQLSRV version
5.13

Microsoft ODBC Driver version
unixODBC 2.3.11 / ODBC Driver 17 for SQL Server

Also tested with version 18, problem still exists.

SQL Server version
Microsoft SQL Server 2019

Client operating system
Windows 11

Table schema

    CREATE TABLE dbo.roundtripjson
    (
        id   int IDENTITY(1,1) NOT NULL,
        name varchar(255) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL,
        json varchar(max) COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
        CONSTRAINT PK_roundtripjson PRIMARY KEY (id)
    );

Problem description
We are currently testing a migration from the legacy collation SQL_Latin1_General_CP1_CI_AS to UTF-8 collations (_UTF8) in SQL Server 2019.

When using PDO::SQLSRV_ENCODING_UTF8, UTF-8 data roundtrips correctly. However, when using prepared statements, string parameters appear to be sent through the Unicode path, resulting in NVARCHAR semantics in SQL Server.

This leads to implicit conversions when comparing against VARCHAR(_UTF8) columns and makes it difficult to use UTF-8 collations efficiently.

At the moment, there does not appear to be a documented or supported way in pdo_sqlsrv to bind parameters as VARCHAR while still using UTF-8 encoding.


Expected behavior and actual behavior

Expected behavior
When using PDO::SQLSRV_ENCODING_UTF8 with VARCHAR columns that have a _UTF8 collation, string parameters in prepared statements should be treated as VARCHAR (UTF-8), so that:

  • no implicit conversion is required
  • indexes on VARCHAR columns can be used efficiently
  • query plans remain optimal

Actual behavior

  • UTF-8 data roundtrips correctly
  • however, prepared string parameters appear to be treated as NVARCHAR
  • this results in implicit conversions when compared against VARCHAR(_UTF8) columns
  • as a result, index usage and query plans may be negatively affected

Repro code or steps to reproduce

//$pdo->setAttribute(\PDO::SQLSRV_ATTR_ENCODING, \PDO::SQLSRV_ENCODING_SYSTEM); // varchar is used, but utf-8 is broken
//$pdo->setAttribute(\PDO::SQLSRV_ATTR_ENCODING, \PDO::SQLSRV_ENCODING_UTF8); // nvarchar is used, so performance is broken

$sql = <<<'SQL'
IF OBJECT_ID('dbo.roundtripjson', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.roundtripjson
    (
        id   int IDENTITY(1,1) NOT NULL,
        name varchar(255) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL,
        json varchar(max) COLLATE Latin1_General_100_BIN2_UTF8    NOT NULL,
        CONSTRAINT PK_roundtripjson PRIMARY KEY (id)
    );
END;
SQL;

$pdo->exec($sql);

// Clear table for this test run
$pdo->exec('TRUNCATE TABLE dbo.roundtripjson');

    $testCases = [
        [
            'label' => 'basic_utf8',
            'name'  => 'Example – name with UTF-8 👍',
            'data'  => ['text' => 'Example – test with UTF-8 👍'],
        ],
        [
            'label' => 'europe_mixed',
            'name'  => 'Straße Köln – Grüß Gott',
            'data'  => ['text' => 'Straße Köln – Grüß Gott'],
        ],
        [
            'label' => 'multi_script_emoji',
            'name'  => '日本語 / русский / عربى 😀',
            'data'  => ['text' => '日本語と漢字 / Привет мир / مرحبا 😀'],
        ],
    ];

    foreach ($testCases as $case) {
        $json = json_encode($case['data'], JSON_UNESCAPED_UNICODE);

        // Insert using PDO prepared statement
        $stmt = $pdo->prepare('INSERT INTO roundtripjson (name, json) VALUES (?, ?)');
        $stmt->execute([$case['name'], $json]);

        $id = $pdo->lastInsertId();

        // Read back using PDO
        $stmt = $pdo->prepare('SELECT name, json FROM roundtripjson WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(\PDO::FETCH_ASSOC);

        $nameMatches = $case['name'] === $row['name'] ? 'YES' : 'NO';
        $jsonMatches = $json === $row['json'] ? 'YES' : 'NO';

        echo "==== CASE (PDO): {$case['label']} ====\n";
        echo "id: {$id}\n";
        echo "name original : {$case['name']}\n";
        echo "name from DB  : {$row['name']}\n";
        echo "name matches  : {$nameMatches}\n";
        echo "json original : {$json}\n";
        echo "json from DB  : {$row['json']}\n";
        echo "json matches  : {$jsonMatches}\n\n";
    }

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions