I'm trying to convert a PDF file (free, openly available):
Alejandro Villamor - Subtracting Suffering - An Anti-Aggregationist Approach to Suffering in Nature (2024).pdf
Using the following command:
docling --device cuda --num-threads 8 --table-mode accurate --ocr-lang en --from pdf --to text --ocr --verbose "Alejandro Villamor - Subtracting Suffering - An Anti-Aggregationist Approach to Suffering in Nature (2024).pdf" --debug-visualize-ocr --debug-visualize-cells --debug-visualize-layout
Debug images are generated, but they're not very helpful. I'm posting examples for OCR, cells, and layout.
The generated file:
Alejandro Villamor - Subtracting Suffering - An Anti-Aggregationist Approach to Suffering in Nature (2024).txt
The problem is that in the PDF, on the second page, just above the line separating the main text from the footnotes, there is this text:
Ontological Prevalence of Suffering in Nature. There is an ontological prevalence of suffering over welfare in nature. That is, the net sum or iterative comparison (one by one) of
This fragment has not been converted, so it is missing from the generated text file. (Of course, this is just one example of text missing from the PDF.)
I investigated further by converting the file to images:
convert -background white -alpha remove -density 300 +antialias -interpolate Nearest -quality 90 "Alejandro Villamor - Subtracting Suffering - An Anti-Aggregationist Approach to Suffering in Nature (2024).pdf" /mnt/d/imgs/page-%d.png
and then running easyocr manually:
easyocr --download_enabled True --detector True --decoder beamsearch --workers 4 --paragraph True --lang en --gpu False --verbose True -f "D:\imgs\page-1.png"
After a long while, it generated the output below:
D:\AI\Tools\docling\venv\Lib\site-packages\easyocr\utils.py:221: RuntimeWarning: overflow encountered in scalar add
  curr.entries[labeling].prTotal += prBlank + prNonBlank
D:\AI\Tools\docling\venv\Lib\site-packages\easyocr\utils.py:248: RuntimeWarning: overflow encountered in scalar add
  curr.entries[newLabeling].prNonBlank += prNonBlank
D:\AI\Tools\docling\venv\Lib\site-packages\easyocr\utils.py:249: RuntimeWarning: overflow encountered in scalar add
  curr.entries[newLabeling].prTotal += prNonBlank
D:\AI\Tools\docling\venv\Lib\site-packages\easyocr\utils.py:219: RuntimeWarning: overflow encountered in scalar add
  curr.entries[labeling].prNonBlank += prNonBlank
[[[656, 289], [1821, 289], [1821, 420], [656, 420]], 'Subtracting Suffering: An Anti-Aggregationist A Alejandro Villamor Iglesias']
[[[1141, 504], [1338, 504], [1338, 560], [1141, 560]], 'Resumen']
[[[344, 576], [2137, 576], [2137, 1422], [344, 1422]], 'En los ultimos anos, cada vez es ma prevalencia del sufrimiento sobre el b Esta creencia suele coincid Una axiologia sensocentrista segun la cual lo moralmente relevante placer y dolor: Esta combinacion conduce tiene una enorme relevancia moral, Est y argumenta, en su lugar, que podria no ser coherente: La afirmacion de que existe una prevalencia ontologica, en abstracto, del s embargo, no sucede lo mismo al respecto de su puede considerar que un calculo agregacionista sea moralmente valioso, estri pues no hay sujeto que lo sienta. No obstante, podria mantenerse la necesidad d una intervencion positiva en la naturaleza Palabras clave: agregacionismo, antiagregacionismo, etica animal, sufrimiento animal, intervencionismo.']
[[[351, 1537], [666, 1537], [666, 1586], [351, 1586]], '1. Introduction']
[[[344, 1605], [2141, 1605], [2141, 2277], [344, 2277]], "In recent decades; more and more p suffering of non-human animals. In aca phenomenon translates into a growing theoretical interest in the suffering of wild animals (e.g;: Dawkins, 1995; Rolston III, 1992 Horta, 2010a, 2010b, 2015; Faria, 2016; Villamor, 2 Although not a necessary condition;' most of these authors maintain that som that suffering predominates over well-being aggregationist component? into their theories, these positions conduct a controversial inference from the following statement: Ontological Prevalence of Suffering in Nature. There is an ontological prevalence of suffering over welfare in nature: That i comparison (one by one) of"]
[[[344, 2398], [2135, 2398], [2135, 2883], [344, 2883]], "It is important to remember that there is no relation of necessity between consequentialism and aggregationism: Some theories, such as Maximin or Leximin, are clearly consequentialists but not aggregationists (Hirose, 2015, 30-31) Likewise, as Hirose has shown, a be present in deontological theories such as Scanlon's th 2 Even though the consequences could be s a conception of additive aggregation. As Larry Temkin emphasize for example, one might have principles o on weighted totals, like prioritarianism, OI on the highest or best achievements, like some forms of perfectionism, 0 on the wellbeing of those who are worst off, like max"]
[[[1000, 2925], [1479, 2925], [1479, 2974], [1000, 2974]], 'RHV, 2024, No 26,243-267']
[[[1025, 3043], [1054, 3043], [1054, 3058], [1025, 3058]], 'CC']
[[[1074, 3032], [1473, 3032], [1473, 3083], [1074, 3083]], 'CC BY-NC-ND BY Nc ND']
[[[1200, 3157], [1279, 3157], [1279, 3206], [1200, 3206]], '244']
As can be seen, the missing fragment is present there:
Ontological Prevalence of Suffering in Nature. There is an ontological prevalence of suffering over welfare in nature: That i comparison (one by one) of
So, the OCR seems to be working OK. Something else fails in the process, but I don't know what.
I don't know which step in the conversion fails, so I don't know where I should post a specific bug report: here or in a dependent project. Could you please help?
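In case it helps with triage, here is a rough sketch of how the other missing fragments could be located automatically rather than by eye. It assumes the paragraphs reported by easyocr and the text file produced by docling have already been loaded as plain strings; the helper names (`coverage`, `missing_fragments`), the 3-word window, and the 0.5 threshold are my own illustrative choices and not part of docling or easyocr. The `converted` string below is a made-up stand-in, not the real generated file.

```python
import re


def words(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of consecutive n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(fragment, converted, n=3):
    """Fraction of the fragment's word n-grams that also occur in `converted`.

    Fragments shorter than n words are compared with a smaller window,
    so one-word headings are not flagged automatically.
    """
    frag_tokens = words(fragment)
    if not frag_tokens:
        return 1.0  # nothing to look for
    k = min(n, len(frag_tokens))
    frag_grams = ngrams(frag_tokens, k)
    conv_grams = ngrams(words(converted), k)
    return len(frag_grams & conv_grams) / len(frag_grams)


def missing_fragments(ocr_paragraphs, converted, threshold=0.5):
    """Return the OCR paragraphs that are mostly absent from the converted text."""
    return [p for p in ocr_paragraphs if coverage(p, converted) < threshold]


# Hypothetical stand-in for the converted text (NOT the real docling output).
converted = (
    "Resumen\n1. Introduction\n"
    "It is important to remember that there is no relation of necessity"
)
ocr_paragraphs = [
    "Resumen",
    "1. Introduction",
    "Ontological Prevalence of Suffering in Nature. There is an ontological "
    "prevalence of suffering over welfare in nature",
]

for paragraph in missing_fragments(ocr_paragraphs, converted):
    print("missing:", paragraph)
```

Comparing word n-grams instead of exact strings is meant to tolerate the small punctuation differences visible in the OCR output above (for example `nature:` versus `nature.`). The threshold would likely need tuning on real pages.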