Strive for "equivalent" semantics of datasplash.api fns and clojure.core fns #134

@FiV0

I think it would be nice if the datasplash.api functions had roughly the same semantics as their clojure.core counterparts; otherwise debugging at the REPL becomes extremely painful.
As an example:

  (require '[clojure.set]
           '[datasplash.api :as ds])

  (def data1 {:a (int 1)})
  (def data2 {:a (long 1)})

  (= data1 data2)
  ;; => true

  (clojure.set/intersection #{data1} #{data2})
  ;; => #{{:a 1}}

  (let [p (ds/make-pipeline {})
        input1 (ds/generate-input [data1] p)
        input2 (ds/generate-input [data2] p)
        _ (ds/->> :intersect-pipeline
                  (ds/intersect-distinct {:name :intersect} input1 input2)
                  (ds/write-json-file "test-output" {}))]
    (-> (ds/run-pipeline p)
        (ds/wait-pipeline-result)))

The last pipeline produces no results, although it does once data2 is changed to {:a (int 1)}. The problem is that whenever a lot of data has to be compared, intersected or grouped, every row first has to be made comparable (with something like clojure.walk), which can be very expensive.
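For illustration, the clojure.walk workaround mentioned above could look roughly like the sketch below. `normalize-numbers` is a hypothetical helper, not part of datasplash, and the choice to widen only Integer, Short and Byte values to Long is an assumption made for this example.

```clojure
(require '[clojure.walk :as walk])

(defn normalize-numbers
  "Hypothetical helper (not part of datasplash): widens every Integer,
  Short and Byte nested anywhere in `row` to a Long."
  [row]
  (walk/postwalk
   (fn [x]
     (if (or (instance? Integer x) (instance? Short x) (instance? Byte x))
       (long x)
       x))
   row))

(def data1 {:a (int 1)})
(def data2 {:a (long 1)})

;; Before normalizing, the two values have different JVM classes.
(= (class (:a data1)) (class (:a data2)))
;; => false

;; After normalizing, both rows hold a java.lang.Long.
(= (class (:a (normalize-numbers data1)))
   (class (:a (normalize-numbers data2))))
;; => true
```

Such a function would have to run over every element of both inputs before the intersect step (for example through a map transform), which is exactly the full extra pass over the data that makes this workaround expensive.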
