csv-blobs improvements: windows binary, more speed, dramatically less memory
Connected component labeling is a relatively simple form of computer vision. The goal is to identify contiguous regions of pixels and note their outline, size, number or position.
In the simplest sense, how many blobs are in the image?
There is plenty of CCL software existing, see below for a list. But none of them filled my need for a small, standalone, fast and free library/utility. First and foremost, QuickBlobs was made to meet all those goals.
4k test image, with white and black areas marked
So how does QuickBlobs stack up? It is written in C and is extremely light weight, compiling to less than 5KB as a library or 15KB as an executable. The worst case memory use of the algorithm is O(sqrt(n)) and a 10 thousand pixel wide image needs (in theory) 600KB. QuickBlob is naturally fairly quick. My low end 1.6GHz laptop can do a 4k by 4k image in just 0.4 seconds. Benchmarks suggest that a 640x480 x 30fps video stream with around 100 blobs in each frame could be processed in real time on a single 1GHz ARM core. Finally it is licensed LGPL, so you can use it freely in your applications.
How is it so fast? It uses a single pass algorithm based on sorted quadruply linked lists. Most other algorithms work one pixel at a time, while QuickBlob operates on run-length-encoded line segments. It only holds at most a single RLE-ed row of data in memory at any given moment. This particular implementation allocates the maximum amount of memory possible from the start for best cache spatial locality.
Building the QuickBlob library should need almost nothing. It is very close to not even needing a libc. To use it as a library, you'll need to define a few stub functions for loading images and logging results. Take a look at
quickblob.h for a summary.
QuickBlob comes with a small example utility,
csv-blobs. This takes an image as input and produces a comma separated list of every blob found. The output includes the size of blob, the color, the centroid and the bounding box. The utility needs the LibPNG library for image loading. For those of you on windows, here is a small pre-compiled and statically linked version: quickblob-mingw32.zip On Linux or OSX, just run
make to build the library and utility.
For most common workloads, preprocessing with Imagemagick and the
csv-blobs utility should meet your needs. For more demanding uses link the library into your code. If you need help building QuickBlob for your platform or integrating it into your application or even just have a feature request, feel free to email me.
After some profiling, the best-case input (a blank page) runs twice as fast and the worst-case input (100% static) runs 25 times faster. There still remains room for improvement. It is single threaded but parts of the algorithm could be run in parallel. If you only need to track blobs of a single color it could be made several times faster and lighter.
make libc optional
Dotcount proprietary, windows only, non-commericial
RoboRelm proprietary, windows only, $150
Insight Tool Kit Apache licensed, very large (550 pages of docs)
cvBlobsLib BSD, based on OpenCV
cvBlob LGPL, C++ library, based on OpenCV
Blobscanner GPLv3, Processing Language
Blob Tool Lisp based image processing toolkit
Imagemagick is not too bad, see the Benchmarks for more about it.
Plugins for Matlab, ImageJ and other similar large math/image suites. Often these plugins do not work with recent versions.
Of those other programs, Imagemagick is by far the easiest to try out. We'll be using the
lorem.png image as a test case. The image is 4000 pixels on a side and has around 500 blobs in it. Tests show the following:
csv-blobs -- 0.38 seconds, 2.36MB ram
imagemagick -- 25 seconds, 695MB ram
Imagemagic is 65x slower and uses 290x more memory. But this was on a 64 bit platform. On a 32 bit platform,
csv-blobs uses only 1.1MB to process a 4k image. Imagemagick OOM-ed on that test, so I don't think it sees a similar benefit.
Imagemagick also tends to choke up easily. The
lorem.png image is antialiased but those benchmarks convert it to a 2-level bitmap first. What if we force both programs to use the full 256 level grayscale? This bumps the number of blobs up by 280x, to more than 140 thousand. Csv-blobs handles this without complaint. Memory use stays the same but runtime takes 5x as long. Imagemagick completely breaks, with a "too many objects" error.