25 July 2010

Optimized FFMPEG with AoTuV

Newer binary ffmpeg 0.9 moved here

In my previous post I demonstrated a wacky batch conversion from mp3 to ogg (Vorbis). Unfortunately, AAC is now poised to unseat MP3. So this time let's settle it once and for all: let's convert every kind of inferior lossy format to Ogg Vorbis! Alright.

This time I compiled a special FFMPEG build that uses AoTuV instead of vanilla libvorbis. The change was dramatic: when encoding at very low bitrates (32 or 48 kbps, or quality -2 and -1), AoTuV is up to 4x faster and gives higher quality, and regular Vorbis simply can't go that low. With Ogg having emerged as one of the preferred formats for HTML5 audio/video, this makes streaming possible even on bandwidth-starved connections.

FFMPEG-SVN-r24538 w/ AoTuV 2.67MB (win32 binary with MT and proc. runtime detection)
VorbisGain (from rarewares)
FFMPEG docs and presets file

License: GPLv3

Put them (vorbisgain.exe and ffmpeg.exe) in a convenient place such as C:\Windows, or anywhere else listed in your PATH variable. Use ffplay to test WebM video.

A simple batch to convert known audio files (excluding lossless/flat formats like flac and wav):
From a command prompt in the root folder where your music is located, type:
for /r %f in (*.mp3 *.aac *.m4a *.vqf *.mp2 *.ac3 *.wma *.ra) do ffmpeg -loglevel quiet -i "%f" -acodec libvorbis -aq 0 -map_meta_data 0:0 -y "%~dpnf.ogg" && vorbisgain -q -s -f "%~dpnf.ogg" && del "%f"

Meaning: for every file of those types in the current folder and all its subfolders, run ffmpeg silently to convert it to Ogg Vorbis at quality 0 and copy the source's tags (metadata) into the target file; then run vorbisgain to apply ReplayGain to the encoded file; then, only if everything went fine, delete the source file.

Or you can do the same with video clips (audio only, and the source file is kept):
for /r %f in (*.avi *.wmv *.rm *.asf *.mov *.mpg) do ffmpeg -loglevel quiet -i "%f" -vn -acodec libvorbis -aq 0 -map_meta_data 0:0 -y "%~dpnf.ogg" && vorbisgain -q -s -f "%~dpnf.ogg"

The output will have its tags fully transferred and ReplayGain enabled. -aq 0 is roughly equal to 64kbps (-ab 64k).
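
For readers who prefer a script, the two loops above can be sketched in Python. This is only an illustration under assumptions: the function names `build_commands` and `plan` are made up for this example, the flags are copied verbatim from the batch lines (newer ffmpeg builds may spell some of them differently), and the sketch only builds the command lines. It does not run them or delete anything.

```python
from pathlib import Path
from typing import List

# Extensions taken from the two batch loops above.
AUDIO_EXTS = {".mp3", ".aac", ".m4a", ".vqf", ".mp2", ".ac3", ".wma", ".ra"}
VIDEO_EXTS = {".avi", ".wmv", ".rm", ".asf", ".mov", ".mpg"}


def build_commands(source: Path) -> List[List[str]]:
    """Return the ffmpeg and vorbisgain command lines for one source file."""
    ext = source.suffix.lower()
    if ext not in AUDIO_EXTS | VIDEO_EXTS:
        return []  # unknown or lossless format: leave it alone
    target = str(source.with_suffix(".ogg"))
    ffmpeg = ["ffmpeg", "-loglevel", "quiet", "-i", str(source)]
    if ext in VIDEO_EXTS:
        ffmpeg.append("-vn")  # ignore the video stream, keep audio only
    ffmpeg += ["-acodec", "libvorbis", "-aq", "0",
               "-map_meta_data", "0:0", "-y", target]
    vorbisgain = ["vorbisgain", "-q", "-s", "-f", target]
    return [ffmpeg, vorbisgain]


def plan(root: Path) -> List[List[str]]:
    """Walk root recursively (like `for /r`) and collect every command."""
    commands = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            commands.extend(build_commands(path))
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`, which stops on the first failure much as `&&` does in the batch version.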

Notes for audiophiles:
- I'm aware that vorbisgain gives a somewhat lower value (loudness) than expected, but IMO that's the safer bet.
- I have neither HQ ears nor a HQ sound system, so to my humble ears the quality of SOX and FFMPEG is not too dissimilar.

Notes for existing ffmpeg users:
- This build is roughly the same as the official autobuild, except that I use the vorbiscomment patch and AoTuV instead of vanilla libvorbis. It also bundles the latest encoders: x264, Theora, Lame, WebM and so on. See the msg for details.

20 July 2010

Stardict KBBI edition

A few months ago someone named Steven Haryanto downloaded all the web pages from http://pusatbahasa.diknas.go.id/kbbi/ using perl (regex?) and converted them into the IFO (Stardict) format. That got me interested in building a Stardict installer specifically for English<->Indonesian. It contains, among others: the WordNet English (US) vocabulary and synonym dictionary, the MasNDon Indonesian<->English dictionary, and the online version of Kamus Besar Bahasa Indonesia itself.

Legal aspect: in my opinion the IFO version is not a problem, since the site is meant for everyone anyway; what matters is that the name stays the same, namely Kamus Besar Bahasa Indonesia, Edisi III (2005). It's just that not everyone has internet access, and the book is expensive, so Steven's idea deserves four thumbs up!

Stardict is an open source dictionary program created by Hu Zheng.

Installer: Stardict 3.0.1 KBBI edition for Windows (12.9 MB)

19 July 2010

Thoughts about wikipedia xml dump

It is the BIGGEST single plain text file that I've ever seen in my life! And I'm talking about the 27GB English version.
So why did they choose bzip2? OK, they recently moved to 7z anyway (but I still got the bz2 one)... 7zip decompression is far faster than bzip2.
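
A file this size doesn't have to be unpacked to disk before you can look inside it. As a rough sketch (using only the Python standard library, not mwlib), the bz2 dump can be read as a stream and parsed element by element. The function name `count_pages` is invented for this example, and it assumes the dump wraps each article in a `<page>` element.

```python
import bz2
import xml.etree.ElementTree as ET


def count_pages(dump_path):
    """Count <page> elements in a bzip2-compressed XML dump, streaming."""
    pages = 0
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags may carry a namespace prefix such as "{...}page".
            if elem.tag.rsplit("}", 1)[-1] == "page":
                pages += 1
                elem.clear()  # drop the finished page's text and children
    return pages
```

The decompressed text never exists in full anywhere; only the current page is held in detail. For a really long run you would probably also want to detach finished elements from the root, which this sketch skips.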

They provide a Python library called mwlib for working with the dump, and this is where I got interested. For my country, Indonesia, I think this is a great asset for education. Most people here (even on the island of Java) are still strangers to the internet. Worse, when the recent videotape scandal of a local artist was blown up by the media, most people developed a negative feeling about the internet. Even our minister is outspoken about his plan to censor the internet! LOL, talk about China. Anyway, it takes a little change of mindset. For example, instead of being pushed to buy a second bike that is mostly for showing off, people would do better to invest in a computer, even the cheapest one they can get. At least it doesn't eat gas. With the idea of a portable wiki on a flash drive, like WikiTaxi, people get access to one of the best sources of knowledge, legally and for free. Youngsters can be pushed to "read" more instead of foolishly chatting on Facebook and playing oxymoron games.

17 July 2010

Downloading huge file under slow dial-up modem

Today I've reached 70% of the Wikipedia download, and I've found a good setting for downloading this 6GB monster of science: http://download.wikimedia.org/enwiki/20100622/enwiki-20100622-pages-articles.xml.bz2

When a download stalls for a long time, it seems to be a sign that data corruption has taken place. At least that is what happened during the first 800MB, downloaded with Free Download Manager (FDM). When the prolonged glitches didn't stop, I switched to curl and continued the unfinished download. After patching 10 corrupted areas, I later found that curl wasn't hampered by the same issue. And here is how...
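
This is not the curl setting I used, just a general illustration of how continuing an unfinished download works: the client asks the server only for the bytes it doesn't have yet, using the HTTP Range header. A minimal Python sketch, with made-up function names and assuming the server supports range requests:

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" means: send everything from offset N to the end.
        request.add_header("Range", f"bytes={have}-")
    return request, have


def resume_download(url, partial_path, chunk_size=64 * 1024):
    """Append the missing tail to partial_path, or restart if ranges fail."""
    request, have = resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range; anything else is a full body.
        mode = "ab" if have and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Note that resuming only appends the missing tail. It does nothing about corruption inside the part already downloaded, which still has to be found and patched separately.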

Pimp your GTK application

Fact: visually ugly applications get poor acceptance in the Windows world (even with blessed functionality).

PyGTK/GTK applications on Windows are commonly distributed with their own copy of the GTK runtime. Why? Because Windows users hate bloated dependencies. An official GTK runtime is available for download, but that is still a poor excuse, and there is more than one source for it (one at SourceForge and another at GNOME's FTP), which adds to the confusion. Eventually every developer started abandoning the shared GTK runtime (the one installed in Program Files\Common Files\GTK or Program Files\GTK). In short, there has been no strict convention to follow since GTK was ported to Windows.

This situation gives developers a good chance to pimp their apps independently, since each app has its own gtkrc. The de facto theme is MS-Windows, which takes on Windows' native look. But there is much more! Visit http://gnome-look.org/ for user-contributed themes or http://art.gnome.org/themes/ for the official collection; some major Linux distros also design their own themes. If you want to make your own theme, visit http://live.gnome.org/GnomeArt/Tutorials/GtkThemes
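
To make the per-app gtkrc idea concrete, here is a small Python sketch that writes one. The function name `write_gtkrc` is invented, and the location `etc/gtk-2.0/gtkrc` inside the application folder is an assumption about how a bundled GTK 2 runtime is usually laid out; check where your own bundle keeps its gtkrc. The chosen theme must also be shipped with the app.

```python
from pathlib import Path


def write_gtkrc(app_dir, theme_name):
    """Write a minimal gtkrc that selects theme_name for a bundled GTK 2 runtime."""
    # Assumed layout: <app_dir>/etc/gtk-2.0/gtkrc
    gtkrc = Path(app_dir) / "etc" / "gtk-2.0" / "gtkrc"
    gtkrc.parent.mkdir(parents=True, exist_ok=True)
    gtkrc.write_text(f'gtk-theme-name = "{theme_name}"\n', encoding="utf-8")
    return gtkrc
```

Calling `write_gtkrc("MyApp", "MS-Windows")` would select the native-looking theme; swapping in another theme name is all it takes to restyle the app, as long as that theme's files are bundled too.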

16 July 2010

The two most valuable and gigantic downloads for FREE

The 1st is the monthly Wikipedia database dump, currently 6GB and growing (I'm still downloading it and fighting corrupted data): http://download.wikimedia.org/enwiki/

The 2nd relates to my hobby, observing maps: the Blue Marble NG satellite imagery from NASA Observatory. Currently I have level 4, which I got 3 years ago by secretly installing eMule at a public cybercafe :). Now (as of a year ago) NASA publishes 500m/pixel imagery, where each monthly globe image is more than 3GB (almost 40GB in total). Of course it's not as detailed as Google Earth, or even Google Maps. But you can get it offline for free! No fuss, no compromise!: http://mirrors.arsc.edu/nasa/ or http://worldwind28.arc.nasa.gov/public/

To open very high resolution png/jpg files, I use nip2 (other than WorldWind itself).

14 July 2010

Blender 2.4 Documentation in CHM

This is my first attempt at creating a CHM version of Blender's wiki pages. I used HTTrack with some deliberate scan rules (filters) to include only the Manual, Reference, Books and Tutorials in English. Afterwards I used some simple regexes to trim and strip the unnecessary parts.

However, I wanted to keep the navigation features to make up for the poor bookmarks, so several more regexes had to be applied. Once it was complete, I realized that it was too big: the images alone took 180MB.
I've managed to optimize the image files to almost half of their original size. Now it's a 100MB CHM (with images):
PNG: First, files > 50KB converted to 8bit depth, then all PNG optimized with optipng
JPG: Optimized with jpegoptim 75%
GIF: Optimized with gifsicle
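
The per-format steps above could be scripted roughly like this. It is a sketch under assumptions: the function names are made up, "75%" is read as a maximum JPEG quality of 75, the `-O2` level for gifsicle is my own pick, and the tool for the 8-bit conversion is left open because it isn't named above. The code only builds command lines and runs nothing.

```python
from pathlib import Path

PALETTE_THRESHOLD = 50 * 1024  # "files > 50KB" from the list above


def needs_8bit_conversion(suffix, size_bytes):
    """True for PNG files big enough to be reduced to 8-bit depth first."""
    return suffix.lower() == ".png" and size_bytes > PALETTE_THRESHOLD


def optimizer_command(path):
    """Return the optimizer command line for one image, or None if unsupported."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".png":
        return ["optipng", str(path)]
    if suffix in (".jpg", ".jpeg"):
        # Assumption: "75%" means re-encode at quality 75 at most.
        return ["jpegoptim", "--max=75", str(path)]
    if suffix == ".gif":
        # Assumption: optimization level 2, rewriting the file in place.
        return ["gifsicle", "--batch", "-O2", str(path)]
    return None
```

All three tools overwrite the file in place with these options, so it is worth running them on a copy of the mirrored site.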

Here is how I fetched it from Blender's wiki using HTTrack.
At first I didn't know what the site map looked like, so I started without any filter and downloaded only the html pages. After some hours HTTrack finished with a very messy result. I browsed through it for quite a while to find out which parts were important, which should be included and so on. And here are the scan rules for the English documentation:

+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar -*title=* -*BlenderWiki:* -*Category:* -*Org:* -*Meta:* -*Talk:* -*User:* -*-Flag-* -*Special:* -*File:* -*:2.5/* -*action=* -*section=* -*stylish.swf* -*Attic:* -*Robotics:* -*Dev:* -*index.php/BlenderDev* -*Template:* -*/invalid_files/* -*Help:* -*index.php/AR/* -*:AR/* -*index.php/BG/* -*:BG/* -*index.php/BR/* -*:BR/* -*index.php/CA/* -*:CA/* -*index.php/CZ/* -*:CZ/* -*index.php/DE/* -*:DE/* -*index.php/DK/* -*:DK/* -*index.php/EL/* -*:EL/* -*index.php/ES/* -*:ES/* -*index.php/FA/* -*:FA/* -*index.php/FI/* -*:FI/* -*index.php/FR/* -*:FR/* -*index.php/ID/* -*:ID/* -*index.php/IT/* -*:IT/* -*index.php/KO/* -*:KO/* -*index.php/MN/* -*:MN/* -*index.php/NL/* -*:NL/* -*index.php/PL/* -*:PL/* -*index.php/PT/* -*:PT/* -*index.php/RO/* -*:RO/* -*index.php/RU/* -*:RU/* -*index.php/SV/* -*:SV/* -*index.php/TH/* -*:TH/* -*index.php/TR/* -*:TR/* -*index.php/ZH/* -*:ZH/*

The language filter could be much simpler if HTTrack supported regex...
After fetching the wiki a second time, I had a relatively clean dump.
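
Since the language part of the scan rules is just the same two patterns repeated for every language code, it can be generated instead of typed. A small Python sketch (function names invented for this example) that rebuilds that part of the rules, plus the single regular expression that would do the same job in a tool that does support regex:

```python
import re

# Language codes taken from the scan rules above.
LANGUAGE_CODES = ["AR", "BG", "BR", "CA", "CZ", "DE", "DK", "EL", "ES",
                  "FA", "FI", "FR", "ID", "IT", "KO", "MN", "NL", "PL",
                  "PT", "RO", "RU", "SV", "TH", "TR", "ZH"]


def language_filters(codes):
    """Build the '-*index.php/XX/* -*:XX/*' exclusion rules for each code."""
    rules = []
    for code in codes:
        rules.append(f"-*index.php/{code}/*")
        rules.append(f"-*:{code}/*")
    return " ".join(rules)


def language_regex(codes):
    """One regex matching any URL the generated rules would exclude."""
    alternatives = "|".join(re.escape(code) for code in codes)
    return re.compile(rf"(?:index\.php/|:)(?:{alternatives})/")
```

`print(language_filters(LANGUAGE_CODES))` gives the text to paste after the other rules in HTTrack's scan rules box.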

Here are the files (dedicated to all users who CAN'T afford an internet connection; somebody please mirror them, but only with direct links, not rapidshit, ziddumb or other crap):


Blender249Man.chm Contains the manual in English, 30MB (this is the main file and is linked to the 4 files below)
Blender249Books.chm Books converted to html, 32MB
Blender249Tuts.chm Tutorials and Theory, 31MB
Blender249API.chm Python API and Game Engine API for Blender 2.49, 1MB
Blender249Ref.chm Contains the Reference, FAQ and Scripts Catalog, 6MB

To decompile, use 7zip or run hh -decompile [outputfolder] [chmfile]

Blender249Doc.chm (July 13, 2010) The whole thing as a very basic, text-only version, 6.8MB