This repository was archived by the owner on Apr 27, 2026. It is now read-only.

Adjust reading order when there are no columns#190

Open
zuphilip wants to merge 1 commit into master from kassenzettel

@zuphilip
Collaborator

@zuphilip zuphilip commented Mar 9, 2017

When there are no white column separators and no black
column separators, we determine the reading order
simply by checking which line lies above which other lines.
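The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `Line` box type and the `vertical_reading_order` helper are hypothetical, and ties on the top coordinate are broken by the left coordinate only to keep the result deterministic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Line:
    """Hypothetical line bounding box in pixels (y grows downward)."""
    top: int
    left: int
    bottom: int
    right: int


def vertical_reading_order(lines: List[Line]) -> List[int]:
    """Return the indices of `lines` ordered purely by vertical position.

    Intended only for pages where no column separators were found.
    Lines are compared by their top edge; the left edge is used only
    to break exact ties so that the output is deterministic.
    """
    return sorted(range(len(lines)), key=lambda i: (lines[i].top, lines[i].left))


if __name__ == "__main__":
    # Three stacked lines of a single-column document such as a receipt.
    receipt = [Line(10, 0, 20, 50), Line(40, 0, 50, 50), Line(25, 0, 35, 50)]
    print(vertical_reading_order(receipt))  # [0, 2, 1]
```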

@zuphilip
Collaborator Author

The ordering of the textual parts on the same line is sometimes wrong.

@mittagessen

This might not be entirely pertinent to this pull request, but the inclusion of an equivalent switch came up a few days ago in kraken (prompted by an RTL reading-order bug). I am not confident that there are no edge cases where horizontal intra-column ordering is desirable and which such a switch would disable. Have you looked into scenarios where such a switch would fail?

It isn't as if the current segmenter is particularly good, and I still plan on replacing it reasonably soon, but it still shouldn't be made worse for some inputs.

@zuphilip
Collaborator Author

@mittagessen Another PR, #118, is also about the reading order. The picture there shows some of the cases that you might want to test as edge cases. The current code for this is hard to understand and also only provides a partial ordering, which is later extended to its transitive closure.
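Turning a partial order into a single reading order via its transitive closure can be sketched as below. This assumes the partial order is given as a boolean matrix where `before[i][j]` means line `i` must be read before line `j`; the matrix format and the names `transitive_closure` and `linear_extension` are hypothetical and not taken from the segmenter's code. The closure uses Warshall's algorithm, and the final order sorts lines by their number of predecessors in the closure, which yields a valid linear extension whenever the relation is acyclic.

```python
from typing import List, Sequence


def transitive_closure(before: Sequence[Sequence[bool]]) -> List[List[bool]]:
    """Warshall's algorithm: closure[i][j] is True if i precedes j
    directly or through a chain of intermediate lines."""
    n = len(before)
    closure = [list(map(bool, row)) for row in before]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure


def linear_extension(before: Sequence[Sequence[bool]]) -> List[int]:
    """Return one total reading order consistent with the partial order."""
    closure = transitive_closure(before)
    n = len(closure)
    if any(closure[i][i] for i in range(n)):
        raise ValueError("ordering relation contains a cycle")
    # If i precedes j, every predecessor of i also precedes j, so j has
    # strictly more predecessors than i; sorting by that count is safe.
    # The index is a tie-breaker for lines the partial order leaves unordered.
    preds = [sum(closure[i][j] for i in range(n)) for j in range(n)]
    return sorted(range(n), key=lambda j: (preds[j], j))


if __name__ == "__main__":
    F, T = False, True
    before = [[F, T, F], [F, F, F], [T, F, F]]  # 0 before 1, 2 before 0
    print(linear_extension(before))  # [2, 0, 1]
```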

Maybe the question is also: do we expect that column recognition does not give us all the details, so that we then have to handle them in the reading-order step?

@mittagessen

I would expect the reading order determination to be completely independent of column detection, just as it is right now, mainly because the metric for column separation is different from the one used to separate lines horizontally.

Disabling the horizontal ordering means that minor vertical variations in typesetting cause reordering. A theoretical example I found is certain (single-column) poetry that is justified through whitespace in the middle of the line; without the y_overlap logic, the ordering of the two line parts would be mostly random.
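The poetry case can be illustrated with two fragments of one printed line whose tops differ by a single pixel. In this sketch, `Box`, `y_overlap` and both predicates are hypothetical helpers, not the segmenter's actual functions: the first predicate compares top edges only, as a vertical-only fallback would, while the second orders vertically overlapping fragments left to right and everything else top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical fragment box in pixels; y grows downward, bottom/right exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def y_overlap(a: Box, b: Box) -> bool:
    """True if the vertical extents of a and b intersect."""
    return a.top < b.bottom and b.top < a.bottom


def precedes_vertical_only(a: Box, b: Box) -> bool:
    """Vertical-only rule: a comes first if its top edge is higher."""
    return a.top < b.top


def precedes_with_overlap(a: Box, b: Box) -> bool:
    """Fragments sharing a text line are read left to right, others top to bottom."""
    if y_overlap(a, b):
        return a.right <= b.left
    return a.top < b.top


if __name__ == "__main__":
    # Two halves of one whitespace-justified verse line; the right half
    # happens to sit one pixel higher than the left half.
    left_half = Box(top=101, left=0, bottom=121, right=200)
    right_half = Box(top=100, left=260, bottom=120, right=480)
    print(precedes_vertical_only(left_half, right_half))  # False
    print(precedes_with_overlap(left_half, right_half))   # True
```

With the vertical-only rule the right half is read first merely because it sits one pixel higher; with the overlap check the left half comes first. A pairwise rule like this is not guaranteed to be transitive, so it is safer to use it to build a partial order than as a sort comparator.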

Admittedly, that's an esoteric use case but the proposed behavior would be worse for these inputs than the old one.
