Background
When adding a new column with add_column(), a default value is always required. Similarly, when defining a schema for a new CTable, columns with a null_value sentinel still need a separate default. This is redundant: if a column is nullable, the most natural default for existing rows is the null sentinel itself.
Proposed behaviour
- If a spec declares null_value=X and no explicit default is provided, use X as the default automatically.
- If an explicit default is provided, it takes precedence as today.
- Same logic should apply at CTable construction time (when columns are pre-filled).
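The rules above could be sketched as a small resolution helper. This is only an illustration of the intended precedence: `resolve_default`, `_MISSING`, and passing the sentinel as a plain `null_value` argument are hypothetical and not part of the blosc2 API.

```python
# Hypothetical sketch; none of these names exist in blosc2.
_MISSING = object()  # marks "no default was passed", distinct from None


def resolve_default(null_value=None, default=_MISSING):
    """Pick the fill value for the existing rows of a new column."""
    if default is not _MISSING:
        # An explicit default always wins, as today.
        return default
    if null_value is not None:
        # No default given: fall back to the column's null sentinel.
        return null_value
    raise ValueError("a default is required when the column has no null_value")
```

The check is `is not None` rather than a truthiness test, because sentinels such as `""` or `0` are falsy but still valid null values.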
Example
import blosc2 as b2
from dataclasses import dataclass
@dataclass
class Row:
    temperature: float = b2.field(b2.float32(null_value=-9999.0))
    label: str = b2.field(b2.string(null_value=""))
t = b2.CTable(Row)
t.extend([{"temperature": 22.5, "label": "sunny"}] * 10)
# Current behaviour: must repeat the null sentinel
t.add_column("pressure", b2.float32(null_value=-9999.0), default=-9999.0)
# Proposed (replaces the call above): null_value is picked up automatically
t.add_column("pressure", b2.float32(null_value=-9999.0))
# All existing rows should now read as null
assert t["pressure"].null_count() == 10
Open question
Should default become keyword-only and optional (default=None meaning "use null_value if available, else raise")? Or keep it positional but allow omitting it when null_value is set? Worth discussing before implementing.
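To make the two options concrete, here is a rough sketch of how each signature would behave from the caller's side. Both functions are hypothetical stand-ins: they take the sentinel as a plain `null_value` argument instead of a spec object, and they only return the fill value that would be used.

```python
# Hypothetical stand-ins for add_column(); not the blosc2 API.
def _fill_value(null_value, default):
    # default=None means "use null_value if available, else raise".
    if default is not None:
        return default
    if null_value is None:
        raise ValueError("no default given and the column has no null_value")
    return null_value


def add_column_kwonly(name, null_value=None, *, default=None):
    """Option A: default is keyword-only and optional."""
    return _fill_value(null_value, default)


def add_column_positional(name, null_value=None, default=None):
    """Option B: default stays positional but may be omitted."""
    return _fill_value(null_value, default)
```

With option A, `add_column_kwonly("pressure", -9999.0, 0.0)` raises a `TypeError`, so any existing caller that passes the default positionally would break. Option B accepts that call unchanged.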