Long time no write... that doesn't mean I'm not around. This post is yet another proof that I'm a faithful Windows user :-). Admittedly I have been using Windows 8.1 for months now, and damn, what a buggy Explorer!
You probably heard about the DjVu format about 5 years ago, but it kind of failed to gain momentum against the venerable PDF. With its rather simple (or rather "sufficient") feature set, it is meant to be a less malicious format than PDF.
Djvudigital has been around with no cmd batch port... and its Ghostscript driver is also a bit controversial (for Linux zealots). But in this Windows world anything is good as long as it apparently works. Duh.
This batch port of djvudigital is converted into an exe for convenience and will act just like a batch file (i.e. it *needs* cmd); see the "BAT to EXE" page for info. It is not a complete port of the bash version, but the parameters are the same as in the Linux one. Note: it doesn't do a sanity check of whether the Ghostscript executable supports DjVu or not.
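Since the port skips that sanity check, it can be done by hand. Below is a minimal Python sketch, assuming the Ghostscript console executable is called `gswin32c` and that `gswin32c -h` prints the usual list of compiled-in devices. The helper names `has_device` and `ghostscript_supports_djvu` are made up for illustration and are not part of djvudigital.

```python
import subprocess


def has_device(help_text, device="djvusep"):
    """Return True if `device` appears as a whole word in Ghostscript's help text."""
    return device in help_text.split()


def ghostscript_supports_djvu(gs_exe="gswin32c"):
    """Run `gs_exe -h` and look for the djvusep device in its output.

    Assumption: the help output lists the available devices as
    whitespace-separated names.
    """
    try:
        result = subprocess.run([gs_exe, "-h"], capture_output=True, text=True)
    except OSError:
        # Executable not found or not runnable.
        return False
    return has_device(result.stdout)


if __name__ == "__main__":
    print("djvusep available:", ghostscript_supports_djvu())
```

If this prints `False`, the Ghostscript build on hand most likely lacks the djvusep device and the conversion cannot work.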
Requirements:
- Ghostscript with djvusep device
- Csepdjvu (part of djvulibre)
- and of course a working CMD
- optionally gzip or 7za/7z
Installation:
Put all the executables above in the same directory as djvudigital; you may also add that directory to your PATH environment variable so you can call it from anywhere. Type djvudigital --help for the manual.
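To see whether the required tools can actually be found, something like the following Python sketch may help. The executable names `gswin32c` and `csepdjvu` are assumptions based on the usual Windows builds, and `missing_tools` and `report` are hypothetical helpers. Note that it only tells you what the system can locate via PATH, not what sits next to djvudigital.

```python
import shutil

# Assumed executable names for the requirements listed above.
REQUIRED = ["gswin32c", "csepdjvu"]
# gzip or 7za/7z are optional; any one of them is enough.
OPTIONAL_ANY = ["gzip", "7za", "7z"]


def missing_tools(names, which=shutil.which):
    """Return the names that `which` cannot locate."""
    return [name for name in names if which(name) is None]


def report(which=shutil.which):
    """Summarise which required tools are missing and whether a compressor exists."""
    missing = missing_tools(REQUIRED, which)
    compressor_found = len(missing_tools(OPTIONAL_ANY, which)) < len(OPTIONAL_ANY)
    return {"missing": missing, "compressor_found": compressor_found}


if __name__ == "__main__":
    print(report())
```

The `which` parameter exists only so the lookup can be swapped out when trying the sketch without the tools installed.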
Download:
djvudigital.exe 83 Kb
I've been using/abusing online-convert.com for a while for on-the-fly 'downversion' stuff from the internet, but no djvu there.
To do the conversion I needed the following files:
djvudigital.exe
gswin32c.exe
csepdjvu.exe (DjVuLibre)
libdjvulibre.dll (DjVuLibre)
libjpeg.dll (DjVuLibre)
msvcp100.dll (DjVuLibre)
msvcr100.dll (DjVuLibre)
I got a nice-looking DjVu file, but without the text layer. I do not know where the fault is.
Best regards, Andy
Do you use the --words or --lines option? In case it's still not working, you might want to try my versions of Ghostscript and DjVuLibre from http://opensourcepack.blogspot.co.id/p/converter.html
I used the --words/--lines options and your version of Ghostscript. I also tried your version of csepdjvu. I do not receive any warning and do not get a text layer. With "Pdf To Djvu GUI" everything works fine, but they say that djvudigital should be faster.
I tested it again and it works fine (the text is selectable). Could you give me your PDF test case?
Note that djvudigital itself isn't that important; the work is done by the djvusep device in Ghostscript. Whether it is fast or slow depends on Ghostscript.
I believe you and I understand, but "Pdf To Djvu GUI" does not use Ghostscript: https://en.wikisource.org/wiki/Help:DjVu_files
I have sent a folder with all the files I use to your email.
I see. Could it be that Pdf To Djvu GUI has an OCR engine? I haven't tried it myself.
No. There is no OCR process. It uses the text layer of the PDF file, but without Ghostscript. Pdf To Djvu GUI is based on pdf2djvu: http://jwilk.net/software/pdf2djvu
This is a very good thing.
I just received the file. It's a semi-OCR'ed PDF file, and djvusep doesn't know how to handle an existing text layer (djvusep only supports true text). You might want to report this issue upstream. Indeed, it seems an important issue, as Ghostscript itself is able to preserve the text layer in "PDF to PDF" conversion.
I did a comparison test and I see that the image quality of pdf2djvu and djvudigital is the same. The latter really is faster, but in my case it never created a text layer, while pdf2djvu never had a problem with that. Speed is not so important for me, because I do not convert many files. I was just curious.
Thanks for your help and best regards, Andy
I made a bug report here: https://sourceforge.net/p/djvu/bugs/263/ As you can see, djvudigital can now preserve the text overlay.
Hi, TumaGonx Zakkum. Thank you so much for your Windows version of djvudigital. It works great, except that I also encountered the problem of missing text in the output DjVu. Here is the sample PDF: https://drive.google.com/file/d/0By4WA9GK6t9DOWhrVnBZSm10TEk/view?usp=sharing
Sorry, I haven't tested your file, but just to let you know: the file here has NOT been updated; you should download it from the bug report link above (https://sourceforge.net/p/djvu/bugs/_discuss/thread/18627b82/6a9f/attachment/djvudigital.exe)
Thank you for the reply. Yeah, I also tried that updated version, but it's still not working.
I tested your file and got this error:
*** Syntax error in text data: missing parenthesis,
near 'the")'
*** (djvused.cpp:380)
*** 'void verror(const char*, ...)'
It seems to have something to do with the parser. Do you get the error? How about other files?
Well, I got something like "GPL Ghostscript 9.16: Warning: 'loca' length 1676 is greater than numGlyphs 1674 in the font RTFHRY+TimesNewRomanPSMT." I have never succeeded in getting a text layer except for text PDFs. For example, this is another one: "https://drive.google.com/file/d/0By4WA9GK6t9DTjBaMHdRTVlfUGM/view?usp=sharing". There is no error, but there is still no text layer.
Try putting the other tools (gswin32c, djvused, csepdjvu, pdftotext, gawk, gzip, and xml2dsed.awk) together with djvudigital.exe and try again; there should be an error message in the very last lines.
Thank you for the reply. But I cannot find xml2dsed.awk. So you also can't get a text layer from my PDF files, right?
pdftotext is able to get the text, but gawk incorrectly parses one of the "the" words (see the error message above), so djvused complains about it.
OK, I already included pdftotext and gawk. I cannot reproduce your message. Anyway, have you succeeded in extracting text from my second PDF file https://drive.google.com/file/d/0By4WA9GK6t9DTjBaMHdRTVlfUGM/view?usp=sharing Thank you very much
Oh, the xml2dsed.awk file is inside djvudigital.exe (it's a self-extracting 7zip archive). The last PDF works fine here. FYI the command is
djvudigital --poppler=text "Science1972_anderson_More Is Different.pdf"
Oh, I never thought djvudigital.exe was a self-extracting file. I didn't use --poppler=text previously. However, even now that I use this option, there is still no text. But the CMD output is strange, in the sense that it prints the help info of pdftotext at the end. See the screen capture here: http://pasteboard.co/1Ijb4hbP.png
@balabi.rss Ah, I think we are talking about slightly different pdftotext programs: the one I use is by the Poppler project, while the one you use is its precursor, from Xpdf. You can download mine here http://sourceforge.net/projects/tumagcc/files/converters/poppler.exe/download and rename it to pdftotext.exe
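For anyone hitting the same mix-up, here is a rough Python sketch for telling the two pdftotext builds apart. It assumes that `pdftotext -v` prints a version banner and that the Poppler build mentions "Poppler" in it while the Xpdf build does not; both function names are hypothetical.

```python
import subprocess


def pdftotext_flavor(banner):
    """Classify a pdftotext version banner as 'poppler' or 'xpdf-or-unknown'."""
    # Assumption: only the Poppler build names Poppler in its banner.
    return "poppler" if "poppler" in banner.lower() else "xpdf-or-unknown"


def detect_pdftotext(exe="pdftotext"):
    """Run `exe -v` and classify the banner; return None if exe cannot be run."""
    try:
        result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    except OSError:
        return None
    # The banner may be written to stderr, so look at both streams.
    return pdftotext_flavor(result.stdout + result.stderr)


if __name__ == "__main__":
    print(detect_pdftotext())
```

A result of `xpdf-or-unknown` would match the symptom described above, where pdftotext just prints its help text when called with options it does not understand.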
Finally it works! I really appreciate your patience : )
This is a great advance for Windows users, who may not know that until your work only the Linux version of DjVuLibre included the ability to build and use the sophisticated and fast PDF-to-DjVu converter djvudigital. Is this Windows executable actually free to distribute? On Linux, owing to a conflict between the CPL and the GPL, one is allowed to build and use, but not distribute, the executable.
Whoops... well, the "correct" way to use it is to download the gsdjvu-1.9 source, build it yourself (including all third-party dependencies) and use the included contrib/djvudigital.bat and xml2dsed.awk
Thanks for making things clearer. A good and useful project.