
[Feature] ISS 35: Semantic Compression Schema Design#96

Open
jonpsy wants to merge 2 commits into main from semantic_pr

@jonpsy (Collaborator, Author) commented Sep 10, 2025

Addresses issue #35

@MartinLeitgab (Owner) commented Sep 10, 2025

Hi team @jonpsy @mmulet @Hacxmr, thank you for creating the PR!

  • Please document the new attributes cycled_id, created_at and updated_at, both in the PR description and (compactly) in the code.
  • From a scalability (memory management) perspective, @axellabs:
    • I think cycled_id may record how many compression operations have already touched this node. If recursive compression only considers parent nodes, wouldn't this value be either 0 or 1 for any single node (since the child nodes of an earlier compression step are no longer directly considered in subsequent compressions)? If so, could we remove it?
    • What can we learn from the created_at and updated_at timestamps? If we retain whether a node resulted from a compression operation (e.g., via the tombstone or compression-edge assignments), can we remove the timestamps?
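The cycled_id reasoning in the second bullet can be checked with a small simulation. This is a minimal Python sketch under stated assumptions: the `Node` class, its `generation` and `times_compressed` fields, and `compress_recursively` are hypothetical names, not the PR's schema. It assumes each round merges the current frontier in groups of `fan_in` nodes into new parent summaries, and that only the parents produced by the previous round are compressed again. Under those assumptions every node is consumed by at most one compression, while the round in which a node was created keeps growing with the depth of the recursion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model used only for this illustration."""
    node_id: str
    generation: int               # round that created the node (0 = raw input chunk)
    times_compressed: int = 0     # compression operations that consumed this node
    children: list = field(default_factory=list)


def compress_recursively(leaves, fan_in):
    """Repeatedly merge the current frontier into parent summary nodes.

    Assumption: only the frontier (the parents produced by the previous round)
    is eligible for compression; a node merged into a parent never returns.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 for the recursion to terminate")
    all_nodes = list(leaves)
    frontier = list(leaves)
    generation = 0
    while len(frontier) > 1:
        generation += 1
        next_frontier = []
        for start in range(0, len(frontier), fan_in):
            group = frontier[start:start + fan_in]
            parent = Node(f"g{generation}-{start // fan_in}", generation, children=group)
            for child in group:
                child.times_compressed += 1  # the child is consumed by this compression
            next_frontier.append(parent)
            all_nodes.append(parent)
        # Children leave the frontier: only the new parents are candidates next round.
        frontier = next_frontier
    return all_nodes


if __name__ == "__main__":
    leaves = [Node(f"leaf{i}", generation=0) for i in range(8)]
    nodes = compress_recursively(leaves, fan_in=2)
    print(len(nodes))                           # 15
    print({n.times_compressed for n in nodes})  # {0, 1}
    print(max(n.generation for n in nodes))     # 3
```

With eight leaves and `fan_in=2`, the sketch builds 15 nodes over three rounds, and only the root has `times_compressed == 0`. Under these assumptions a per-node touch count collapses to a 0/1 flag, whereas a per-node generation number does not.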

@jonpsy (Collaborator, Author) commented Sep 10, 2025

@MartinLeitgab

  1. Sure, I'll add the documentation.
  2. Correct: cycle_id will be a unique ID recording which "generation" this node belongs to. This lets us reconstruct what the state of things was in generation X. Happy to hear alternatives, though.
  3. The timestamps are for tamper detection and sanity checks. Someone might accidentally edit these nodes or, say, re-create an existing node and destroy its relations with other nodes (since we do an UPSERT). This is easy to catch with updated_at != created_at. It might also be useful to record the identity of whoever pushed the query, but I left that out because I like to trust people :P
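Points 2 and 3 can be illustrated together with a hypothetical relational stand-in. The PR's actual store and schema are not shown in this thread, so the `nodes` table, its `summary` column, and the helper functions below are assumptions; only `cycle_id`, `created_at` and `updated_at` come from the discussion. The sketch uses SQLite's `INSERT ... ON CONFLICT(id) DO UPDATE` (available since SQLite 3.24) so that `created_at` is written only on the first insert while `updated_at` moves on every write. An accidental re-creation therefore shows up as `updated_at != created_at`. Timestamps are passed in explicitly to keep the example deterministic.

```python
import sqlite3

# Hypothetical stand-in for the node store; not the PR's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id         TEXT PRIMARY KEY,
    summary    TEXT NOT NULL,
    cycle_id   INTEGER NOT NULL,  -- compression generation the node belongs to
    created_at TEXT NOT NULL,     -- written once, on first insert
    updated_at TEXT NOT NULL      -- bumped on every write
)
"""


def upsert_node(conn, node_id, summary, cycle_id, now):
    """Insert a node, or update it in place if the id already exists.

    created_at is only set on the INSERT path, so re-upserting the same id
    leaves it untouched while updated_at moves forward.
    """
    conn.execute(
        """
        INSERT INTO nodes (id, summary, cycle_id, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            summary    = excluded.summary,
            cycle_id   = excluded.cycle_id,
            updated_at = excluded.updated_at
        """,
        (node_id, summary, cycle_id, now, now),
    )


def modified_nodes(conn):
    """Return ids of nodes written more than once (updated_at != created_at)."""
    rows = conn.execute("SELECT id FROM nodes WHERE updated_at != created_at ORDER BY id")
    return [row[0] for row in rows]


def nodes_in_generation(conn, cycle_id):
    """Return ids of nodes tagged with the given compression generation."""
    rows = conn.execute("SELECT id FROM nodes WHERE cycle_id = ? ORDER BY id", (cycle_id,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    upsert_node(conn, "n1", "summary of chunk A", 1, "2025-09-10T10:00:00Z")
    upsert_node(conn, "n2", "summary of chunk B", 1, "2025-09-10T10:00:00Z")
    # A later accidental re-creation of n1 silently overwrites its content...
    upsert_node(conn, "n1", "rewritten summary", 2, "2025-09-10T12:00:00Z")
    # ...but the timestamp mismatch flags it.
    print(modified_nodes(conn))  # ['n1']
```

This check only flags that a node was rewritten. It does not say who rewrote it or which relations were lost; recording the writer's identity would need an extra column.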

MartinLeitgab linked an issue on Sep 11, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Implement Semantic Compression with LLMs

3 participants