24 June 2015

Windows version of djvudigital

Long time no write... but that doesn't mean I'm not around. This post is yet more proof that I'm a faithful Windows user :-). Admittedly, I have been using Windows 8.1 for months now, and damn, what a buggy Explorer!

You probably heard about the DjVu format around five years ago, but it kinda failed to gain momentum against the venerable PDF. Despite its rather simple (or rather "sufficient") feature set, it was meant to be a format less prone to malicious content than PDF.

djvudigital has been around for a while, but with no cmd batch port... Its Ghostscript driver is also a bit controversial licensing-wise (for Linux zealots). But in this Windows world anything is good as long as it apparently works. Duh.

This batch version of djvudigital has been converted into an exe for convenience and acts just like a batch file (i.e. it *needs* cmd); see the "BAT to EXE" page for info. It is not a complete port of the bash version, but the parameters are the same as in the Linux one. Note: it doesn't sanity-check whether the Ghostscript executable supports DjVu.
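Since djvudigital skips that check, you can do it by hand. Ghostscript's `-h` switch prints the devices it was built with under an "Available devices:" heading, so a build that can feed djvudigital should list `djvusep` there. The Python sketch below is only an illustration, not part of djvudigital: it assumes the device names sit on indented lines right after that heading, and it assumes `gswin32c` (the 32-bit console Ghostscript) as the default executable name.

```python
import subprocess
import sys


def list_gs_devices(help_text):
    """Collect device names listed under 'Available devices:' in `gs -h` output."""
    devices = []
    in_devices = False
    for line in help_text.splitlines():
        if line.startswith("Available devices:"):
            in_devices = True
            continue
        if in_devices:
            # Assumed layout: device names sit on indented lines; the next
            # unindented line (e.g. "Search path:") ends the section.
            if not line[:1].isspace():
                break
            devices.extend(line.split())
    return devices


def gs_supports_djvusep(gs_exe="gswin32c"):
    """Return True if the given Ghostscript executable lists the djvusep device."""
    try:
        result = subprocess.run([gs_exe, "-h"], capture_output=True, text=True)
    except OSError:
        # Executable not found or not runnable: treat as unsupported.
        return False
    return "djvusep" in list_gs_devices(result.stdout)


if __name__ == "__main__":
    # Optional first argument: another Ghostscript executable, e.g. gswin64c.
    gs = sys.argv[1] if len(sys.argv) > 1 else "gswin32c"
    print("djvusep available:", gs_supports_djvusep(gs))
```

If it prints False for a Ghostscript that is definitely on your PATH, that build most likely lacks the djvusep driver and djvudigital won't work with it.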

Requirements:
- Ghostscript with the djvusep device
- csepdjvu (part of DjVuLibre)
- and of course a working CMD
- optionally gzip or 7za/7z

Installation:
Put all the executables above in the same directory as djvudigital; you may add that directory to your PATH environment variable so you can call it from anywhere. Type djvudigital --help for the manual.
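To double-check the installation before a long conversion, a small Python sketch like the one below (not part of djvudigital) looks for the required tools in djvudigital's directory and on PATH using `shutil.which`. The executable names are assumptions: `gswin32c` for Ghostscript (64-bit builds ship `gswin64c`) plus `csepdjvu`, while gzip or 7za/7z count as optional and only one of them is needed.

```python
import os
import shutil
import sys

# Executable names are assumptions: gswin32c is the 32-bit console
# Ghostscript (64-bit builds ship gswin64c); csepdjvu comes with DjVuLibre.
REQUIRED = ["gswin32c", "csepdjvu"]
# Optional compressors: any one of them is enough.
OPTIONAL = ["gzip", "7za", "7z"]


def missing_tools(names, search_dirs):
    """Return the names that shutil.which cannot find in any of search_dirs."""
    path = os.pathsep.join(d for d in search_dirs if d)
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Directory holding djvudigital.exe (first argument), else the current one.
    djvu_dir = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    dirs = [djvu_dir] + os.environ.get("PATH", "").split(os.pathsep)
    missing = missing_tools(REQUIRED, dirs)
    print("missing required tools:", ", ".join(missing) if missing else "none")
    if len(missing_tools(OPTIONAL, dirs)) == len(OPTIONAL):
        print("no gzip/7za/7z found (optional)")
```

On Windows, `shutil.which` also tries the extensions in PATHEXT, so `gswin32c` matches `gswin32c.exe`.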

Download:
djvudigital.exe 83 KB

I've been using/abusing online-convert.com for a while for on-the-fly 'downversion' conversions of stuff from the internet, but it has no DjVu support.

26 comments:

  1. To do the conversion, I needed the following files:

    djvudigital.exe
    gswin32c.exe
    csepdjvu.exe (DjVuLibre)
    libdjvulibre.dll (DjVuLibre)
    libjpeg.dll (DjVuLibre)
    msvcp100.dll (DjVuLibre)
    msvcr100.dll (DjVuLibre)

    I got a nice-looking DjVu file, but without the text layer. I don't know where the fault is.

    Best regards, Andy

    1. Do you use the --words or --lines option? If it's still not working, you might want to try my versions of Ghostscript and DjVuLibre from http://opensourcepack.blogspot.co.id/p/converter.html

  2. I used the --words/--lines options and your version of Ghostscript. I also tried your version of csepdjvu. I don't get any warning, and I don't get a text layer. With "Pdf To Djvu GUI" everything works fine, but they say djvudigital should be faster.

    1. I tested it again and it works fine (the text is selectable). Could you send me your PDF test case?
      Note that djvudigital itself isn't that important: the work is done by the djvusep device in Ghostscript, so whether it's fast or slow depends on Ghostscript.

  3. I believe you and I understand, but "Pdf To Djvu GUI" does not use Ghostscript: https://en.wikisource.org/wiki/Help:DjVu_files
    I have sent a folder with all the files I use to your email.

    1. I see. Could it be that Pdf To Djvu GUI has an OCR engine? I haven't tried it myself.

  4. No, there is no OCR step. It uses the PDF's existing text layer, just without Ghostscript. Pdf To Djvu GUI is based on pdf2djvu: http://jwilk.net/software/pdf2djvu
    It's a very good tool.

    1. I just received the file. It's a semi-OCR'ed PDF, and djvusep doesn't know how to handle an existing text layer (djvusep only supports true text). You might want to report this issue upstream. It does seem like an important issue, since Ghostscript itself is able to preserve the text layer in "PDF to PDF" conversion.

  5. I did a comparison test, and the image quality of pdf2djvu and djvudigital is the same. The latter is really faster, but in my case it never created a text layer, while pdf2djvu never had a problem with that. Speed isn't that important to me, because I don't convert many files. I was just curious.
    Thanks for your help and best regards, Andy

  6. I made a bug report here: https://sourceforge.net/p/djvu/bugs/263/. As you can see, djvudigital can now preserve the text overlay.

    1. Hi, TumaGonx Zakkum. Thank you so much for your Windows version of djvudigital. It works great, except that I also ran into the problem of missing text in the output DjVu. Here is the sample PDF: https://drive.google.com/file/d/0By4WA9GK6t9DOWhrVnBZSm10TEk/view?usp=sharing

    2. Sorry, I haven't tested your file yet, but just to let you know: the file on this page has NOT been updated. You should download it from the bug report link above (https://sourceforge.net/p/djvu/bugs/_discuss/thread/18627b82/6a9f/attachment/djvudigital.exe).

    3. Thank you for the reply. Yeah, I also tried that updated version, but it's still not working.

    4. I tested yours and got this error:
      *** Syntax error in text data: missing parenthesis,
      near 'the")'
      *** (djvused.cpp:380)
      *** 'void verror(const char*, ...)'

      It seems to have something to do with the parser. Do you get that error? How about other files?

    5. Well, I got something like "GPL Ghostscript 9.16: Warning: 'loca' length 1676 is greater than numGlyphs 1674 in the font RTFHRY+TimesNewRomanPSMT." I never succeed in getting a text layer, except with those pure-text PDFs. For example, here is another one: "https://drive.google.com/file/d/0By4WA9GK6t9DTjBaMHdRTVlfUGM/view?usp=sharing". There is no error, but there is still no text layer.

    6. Try putting the other tools (gswin32c, djvused, csepdjvu, pdftotext, gawk, gzip, and xml2dsed.awk) together with djvudigital.exe and try again; there should be an error message in the very last lines.

    7. Thank you for the reply. But I cannot find xml2dsed.awk. So you also can't get a text layer from my PDF files, right?

    8. pdftotext is able to get the text, but gawk incorrectly parses one of the "the" words (see the error message above), so djvused complains about it.

    9. OK, I already included pdftotext and gawk, but I cannot reproduce your message. Anyway, have you succeeded in extracting text from my second PDF file https://drive.google.com/file/d/0By4WA9GK6t9DTjBaMHdRTVlfUGM/view?usp=sharing? Thank you very much.

    10. Oh, the xml2dsed.awk file is inside djvudigital.exe (it's a self-extracting 7-Zip archive). The last PDF works fine here. FYI, the command is:
      djvudigital --poppler=text "Science1972_anderson_More Is Different.pdf"

    11. Oh, I never thought djvudigital.exe was a self-extracting file. I didn't use --poppler=text previously. Now I use this option, but still no text. The CMD output is strange, though: at the end it prints the pdftotext help info. See the screen capture here: http://pasteboard.co/1Ijb4hbP.png

  7. @balabi.rss Ah, I think we're talking about slightly different pdftotext programs: the one I use is from the Poppler project, while the one you use is its precursor, from Xpdf. You can download mine here http://sourceforge.net/projects/tumagcc/files/converters/poppler.exe/download and rename it to pdftotext.exe.

    1. Finally it works! I really appreciate your patience :)

  8. This is a great advance for Windows users, who may not know that until your work only the Linux version of DjVuLibre included the ability to build and use the sophisticated and fast PDF-to-DjVu converter djvudigital. Is this Windows executable actually free to distribute? On Linux, owing to a conflict between the CPL and the GPL, one is allowed to build and use the executable, but not distribute it.

    1. Whoops... well, the "correct" way to use it is to download the gsdjvu-1.9 source, build it yourself (including all third-party dependencies), and use the included contrib/djvudigital.bat and xml2dsed.awk.

    2. Thanks for making things clearer. A good and useful project.
