
Use null_value as implicit default when adding/defining nullable columns #631

@Jacc4224

Description


Background

When adding a new column with add_column(), a default value is always required. Similarly, when defining a schema for a new CTable, columns that declare a null_value sentinel still need a separate default. This is redundant: for a nullable column, the most natural default for rows without an explicit value is the null sentinel itself.

Proposed behaviour

  • If a spec declares null_value=X and no explicit default is provided, use X as the default automatically.
  • If an explicit default is provided, it takes precedence, as it does today.
  • Same logic should apply at CTable construction time (when columns are pre-filled).
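The resolution rule above could be expressed roughly as follows. This is a minimal sketch, not the blosc2 implementation: `ColumnSpec`, `resolve_default` and the private `_MISSING` marker are hypothetical names, and it assumes that a spec without a null_value simply stores `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in for a spec such as b2.float32(null_value=...);
# this is NOT the blosc2 class, just enough to show the rule.
@dataclass
class ColumnSpec:
    dtype: str
    null_value: Optional[Any] = None  # assumption: None means "not nullable"

# Private marker so "no default passed" is distinguishable from any real value.
_MISSING = object()

def resolve_default(spec: ColumnSpec, default: Any = _MISSING) -> Any:
    """Return the fill value for a new column (add_column or construction time)."""
    if default is not _MISSING:
        # An explicit default always takes precedence.
        return default
    if spec.null_value is not None:
        # Use `is not None` rather than truthiness: "" and 0 are valid sentinels.
        return spec.null_value
    raise TypeError(
        f"column of type {spec.dtype!r} has no null_value, so a default is required"
    )

spec = ColumnSpec("float32", null_value=-9999.0)
print(resolve_default(spec))               # falls back to the null sentinel
print(resolve_default(spec, default=0.0))  # the explicit default wins
```

Checking `null_value is not None` instead of its truthiness matters for sentinels like the empty string used for the `label` column in the example below.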

Example

import blosc2 as b2
from dataclasses import dataclass

@dataclass
class Row:
    temperature: float = b2.field(b2.float32(null_value=-9999.0))
    label: str = b2.field(b2.string(null_value=""))

t = b2.CTable(Row)
t.extend([{"temperature": 22.5, "label": "sunny"}] * 10)

# Current behaviour: the null sentinel must be repeated as the default
# t.add_column("pressure", b2.float32(null_value=-9999.0), default=-9999.0)

# Proposed: null_value is picked up automatically
t.add_column("pressure", b2.float32(null_value=-9999.0))

# All existing rows should now read as null
assert t["pressure"].null_count() == 10

Open question

Should default become keyword-only and optional (default=None meaning "use null_value if available, else raise")? Or keep it positional but allow omitting it when null_value is set? Worth discussing before implementing.
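To make the first option concrete, here is a rough sketch of a keyword-only `default=None` with the "use null_value if available, else raise" semantics. `ToyTable` is a hypothetical toy, not CTable, and it takes a bare `null_value` argument in place of a real column spec.

```python
from typing import Any, Optional

class ToyTable:
    """Hypothetical stand-in for CTable; only models add_column's default handling."""

    def __init__(self, nrows: int) -> None:
        self.nrows = nrows
        self.columns: dict = {}
        self.null_values: dict = {}

    def add_column(self, name: str, null_value: Optional[Any] = None, *,
                   default: Optional[Any] = None) -> None:
        # Option 1: `default` is keyword-only; None means
        # "use null_value if available, else raise".
        if default is None:
            if null_value is None:
                raise ValueError(f"column {name!r}: pass default= or declare a null_value")
            default = null_value
        # Existing rows are filled with the resolved default.
        self.columns[name] = [default] * self.nrows
        if null_value is not None:
            self.null_values[name] = null_value

    def null_count(self, name: str) -> int:
        sentinel = self.null_values.get(name)
        if sentinel is None:
            return 0
        # Plain equality; a NaN sentinel would need math.isnan instead.
        return sum(1 for v in self.columns[name] if v == sentinel)

t = ToyTable(nrows=10)
t.add_column("pressure", -9999.0)  # null_value given, default omitted
print(t.null_count("pressure"))    # every pre-existing row reads as null
```

One trade-off of this option: because `None` means "not given", a caller can no longer pass `None` itself as an explicit default. A private sentinel object as the parameter's default would avoid that, at the cost of a less self-documenting signature.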

Metadata


Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
