Adjust reading order when there are no columns #190
When there are no white column separators and no black column separators, we determine the reading order simply by checking which line is above which other line.
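A minimal sketch of what "looking at which line is above which" could amount to, assuming each text line is given as a bounding box. The `(top, left, bottom, right)` tuple layout and the function name `order_by_vertical_position` are hypothetical and are not taken from the actual segmenter code.

```python
from typing import List, Tuple

# Hypothetical box format: (top, left, bottom, right) in pixel coordinates.
Box = Tuple[int, int, int, int]


def order_by_vertical_position(boxes: List[Box]) -> List[int]:
    """Return box indices sorted top to bottom, ties broken by left edge."""
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][0], boxes[i][1]))


# Three single-column lines given out of order.
lines = [(120, 10, 150, 400), (10, 10, 40, 400), (60, 10, 90, 400)]
order = order_by_vertical_position(lines)
print(order)  # [1, 2, 0]
```

This only yields a sensible result when every line spans the page on its own; it says nothing about several text parts that share one visual line.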

The ordering of the textual parts on the same line is sometimes wrong.

This might not be entirely pertinent to this pull request, but the inclusion of an equivalent switch came up a few days ago in kraken (because of an RTL reading order bug). I am not confident that there are no edge cases where horizontal intra-column ordering is desirable and that such a switch would disable. Have you looked into scenarios where such a switch would fail? It isn't like the current segmenter is particularly good, and I still plan on replacing it reasonably soon, but it still shouldn't be made worse for some inputs.

@mittagessen Another PR, #118, is also about the reading order. The picture there shows some of the cases which you might want to test as edge cases. The current code for this is hard to understand and only provides a partial ordering, which is later extended to the transitive hull. Maybe the question is also: do we expect that column recognition does not give us all the details, so that we then have to handle them in the reading order step?

I would expect the reading order determination to be completely independent of column detection, just as it is right now, mainly because the metric for column separation is different from the one used to separate lines horizontally. With horizontal ordering disabled, minor vertical variations in typesetting cause reordering. A theoretical example I found is certain (single-column) poetry that is justified through whitespace in the middle of the line; without the y_overlap logic, the ordering of the two line parts would be mostly random. Admittedly, that's an esoteric use case, but the proposed behavior would be worse for these inputs than the old one.
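To make the poetry example concrete, here is a small sketch comparing a purely vertical ordering with one that first groups boxes that overlap vertically and then reads each group left to right. The box layout, the coordinates, and both function names are made up for illustration; `y_overlap` here is a simplified stand-in and not the segmenter's actual logic.

```python
from typing import List, Tuple

# Hypothetical box format: (top, left, bottom, right) in pixel coordinates.
Box = Tuple[int, int, int, int]


def y_overlap(a: Box, b: Box) -> bool:
    """True if the vertical extents of a and b intersect."""
    return a[0] < b[2] and b[0] < a[2]


def order_vertical_only(boxes: List[Box]) -> List[int]:
    """Order boxes by top coordinate alone."""
    return sorted(range(len(boxes)), key=lambda i: boxes[i][0])


def order_with_y_overlap(boxes: List[Box]) -> List[int]:
    """Group boxes into rows by vertical overlap, then read rows left to right."""
    rows: List[List[int]] = []
    for i in order_vertical_only(boxes):
        # A box joins the current row if it overlaps the row's first box.
        if rows and y_overlap(boxes[rows[-1][0]], boxes[i]):
            rows[-1].append(i)
        else:
            rows.append([i])
    return [i for row in rows for i in sorted(row, key=lambda j: boxes[j][1])]


# Two lines of verse, each split in the middle by whitespace.
boxes = [
    (12, 10, 42, 180),   # 0: left half of line 1, set 2 px lower
    (10, 260, 40, 430),  # 1: right half of line 1
    (60, 10, 90, 180),   # 2: left half of line 2
    (61, 260, 91, 430),  # 3: right half of line 2
]
print(order_vertical_only(boxes))   # [1, 0, 2, 3]
print(order_with_y_overlap(boxes))  # [0, 1, 2, 3]
```

In this toy input a 2 px offset is enough for the vertical-only ordering to read the right half of the first line before the left half, which is the kind of reordering described above.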