Debugging TensorFlow ImportError: DLL load failed Exception

I have encountered this error at least twice on two different machines and have spent too much time tracking down all the different reasons it can occur.

import tensorflow
Traceback (most recent call last):
File "C:\...\site-packages\tensorflow\python\", line 18, in swig_import_helper
return importlib.import_module(mname)
File "C:\...\", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
File "<frozen importlib._bootstrap>", line 577, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 906, in create_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

So for anyone facing this issue (especially with TensorFlow 1.4.0), here is how to debug it:

  1. First make sure you have the correct versions of the CUDA Toolkit and cuDNN. NVidia ships newer versions as the default downloads and they won't work. See my post on TensorFlow installation.
  2. I would highly recommend using Python 3.5 instead of 3.6 with TensorFlow 1.4. If you have the latest Anaconda, you probably have Python 3.6. You can check this with the command conda info. If you do have Python 3.6, you can downgrade to 3.5 with the command conda install python=3.5.
  3. Make sure both NVidia's CUDA Toolkit path and the cuDNN path are listed before the Anaconda path in your Path environment variable. Anaconda now seems to ship the same DLLs in its own folders, but they appear to cause the ImportError.
  4. Use where command to actually see if you can find these DLLs on path:
                 where cuDNN64_6.dll
                 where curand64_80.dll

    The first path should be where you downloaded cuDNN 6 and the second path should be in NVidia's CUDA Toolkit folder.

  5. If you still get this error, download Process Monitor from Sysinternals. You will see icons in the toolbar to monitor registry, disk etc. Disable all of them except the icon that says "Show Process and Thread Activity". Then click on the filter icon and add a filter for Image Path contains python. Now you should see only process and thread activity from python.exe. Close all Python instances, open a new one, and execute

    import tensorflow as tf

    Process Monitor will now show you the DLLs being loaded by TensorFlow. The last DLL in this list is usually the one causing the problem.
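Step 4 above can also be scripted. Here is a small Python 3 sketch (stdlib only; `find_on_path` is my own hypothetical helper) that mimics the `where` command by scanning each PATH entry in order:

```python
import os

def find_on_path(filename):
    """Return all directories on PATH containing filename, in search order."""
    hits = []
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, filename)):
            hits.append(d)
    return hits

if __name__ == "__main__":
    # DLL names TensorFlow 1.4 looks for, per the steps above.
    for dll in ("cudnn64_6.dll", "curand64_80.dll"):
        hits = find_on_path(dll)
        print(dll, "->", hits[0] if hits else "NOT FOUND on PATH")
```

If a DLL resolves to Anaconda's folder instead of the cuDNN or CUDA Toolkit folders, that's the ordering problem from step 3.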

Installing TensorFlow GPU Version on Windows

TensorFlow 1.4 installation on Windows is still not as straightforward as it should be, so here are the quick steps:

  1. Install Anaconda. Grab the version that has Python 3.6. After installation you will need to downgrade to Python 3.5, as quite a few libraries like OpenCV still aren't compatible with Python 3.6, and TensorFlow also seems to have a few issues with 3.6. So after installing Anaconda, run the following command to downgrade:
    conda install python=3.5

    Another alternative is to create a virtual environment, but I want to keep this post short.

  2. Next, install NVidia CUDA Toolkit 8.0. This is currently not the latest version, so you can only find it in the archives, and you might have to register on NVidia's website first. You will see a Base Installer and Patch 2; install both, one after another. Note that you must do this after step 1. If you did it before, go to the Environment Path editor and make sure the NVidia CUDA Toolkit path is listed before the Anaconda path. Anaconda currently also ships CUDA 8.0 DLLs, and they don't seem to be compatible, causing the DLL ImportError.
  3. Next, download cuDNN v6.0 (April 27, 2017), for CUDA 8.0. Extract the zip file to some folder and add the path cuDNN6\cuda\bin to your machine's Environment Path. Make sure this path comes before Anaconda's path in your Environment Path variable. TensorFlow 1.4.0 looks for cuDNN64_6.dll, and this folder has that DLL. Again, this step should be done after step 1, otherwise Anaconda's cuDNN DLL will be found first on the path and you will get the DLL ImportError.
  4. Open Command Prompt as Administrator and install TensorFlow using this command:
    pip install --ignore-installed --upgrade tensorflow-gpu

    Sometimes this command might fail. In that case, update pip (if you see a warning) and run the command again.

  5. Validate your installation.

If you are getting ImportError for DLL load then see my post on how to debug it.

Git Workflow: Branch - Rebase - Squash - Merge

So you want to make a change to your git repo while other people may be simultaneously working on the same repo. The longer your changes take, the greater the chance that your local repo is already out of date because others have pushed their changes. In this setting you don't want to make your changes directly in master, because you might end up creating large merge commits that make your repo's history convoluted and hard to follow.

Here's a better git workflow you might want to use in any team of size > 1.

Before you make changes, create a branch.

git checkout -b MyFeature

Next make changes, do commits as usual.

If you don't want to rely on your hard drive, you can also push your branch to the server every once in a while:

git push -u origin MyFeature

Once you are done with all your changes, you first want to rebase your branch onto master. If master has no new changes since you created your branch, this is essentially a no-op. Otherwise, git will take all your commits and replay them on top of master. This way your commits look as if they happened on the latest version of master instead of the version you branched from, which keeps your repo's commit history clean and easy to reason about. If you were the only developer this might not be very important, but with more than one person it makes it easy to see the changes everyone is making.

To do rebase, first get latest master.

git checkout master
git pull origin master

Then go back to your branch and rebase, i.e.,

git checkout MyFeature
git rebase master

If you are lucky, you won't see the word "conflict" in git's messages; otherwise there is more work for you! If someone has already changed file sections you have also changed, you will see a list of conflicts. If you get lost in too many messages, use this command to list the pending conflicts:

git diff --name-only --diff-filter=U

Now, about resolving conflicts... there are lots of tools out there, and most, unfortunately, are a pain to install or use. If you absolutely want a GUI tool, install DiffMerge, make sure it's in your path, and invoke it like this:

git mergetool -t diffmerge .

However my preferred method is to simply open up conflicted file in editor, search for ">>>" and review sections that looks like:

<<<<<<< HEAD
This is change in master
=======
This is change in your branch
>>>>>>> branch-a

Now keep the change you want, delete the markers, and you are done with that conflict. Another shortcut is to just tell git to take master's version ("ours") or your branch's version ("theirs"). For example, to resolve all conflicts by taking your changes:

git checkout --theirs -- .

Another tricky conflict arises when a file is deleted by one person and simultaneously changed by you, or vice versa. In this case, git will put the deleted file back in your repo and you have to decide whether to keep it, remove it, or update your version. You won't have markers like the above this time. I tend to use a tool like Beyond Compare to compare the two files and make edits as needed.

To tell git that you have resolved all conflicts,

git add .

Now you can continue with your rebase,

git rebase --continue

If you don't want to continue, for whatever reason,

git rebase --abort

Sometimes git might error out on continue because there is nothing to commit (maybe it detected that the change already exists upstream). In that case you can do,

git rebase --skip

At this point, your changes are on top of the latest master. You can verify this by looking at a quick history of the latest 10 commits,

git log --pretty=oneline -n 10

Note that everything still resides in your own branch. If you are not yet ready to push to master, keep working in your branch, doing more commits as you go. After the rebase, if you want to save your branch on the server, you must use --force because you are rewriting history.

git push --force origin MyFeature

This is perfectly fine as long as you are the only one working on the branch.

Once you are ready to push, first merge your branch with master,

git checkout master
git merge --squash MyFeature

This shouldn't give any errors or conflict messages because your branch was already synced up to the latest master. The --squash tells git to combine all your commits into a single commit. This is a good idea most of the time if you have made lots of commits like "added forgotten file" or "fixed minor typo". That's too much noise, and it's not nice to make other people scroll through tons of minor commits to figure out your higher level goals. However, it's also OK to skip the --squash option.

Finally do the commit after the merge,

git commit -m "MyFeature does X"

If you used --squash above, you will see only one commit with the above message at the top of your history.

At this point, you can decide to push your changes to master OR move your changes to a new branch and keep working. To move to a new branch and revert master to its original state,

git checkout -b MyFeature2
git checkout master
git reset --hard origin/master

OR if you are happy, go ahead and

git push

In either case you can delete the old branch (note the -D: after a --squash merge, git doesn't consider the branch merged, so a plain -d would refuse),

git push origin --delete MyFeature
git branch -D MyFeature

And you are done!
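The whole workflow above can be rehearsed end to end in a scratch repo. This is only a sketch: a local bare repo stands in for the server, and names like MyFeature, file.txt and the commit messages are placeholders.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare origin.git                # local stand-in for the server
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master      # make sure the branch is 'master'
git remote add origin ../origin.git
git config user.name you && git config user.email you@example.com

echo base > file.txt && git add . && git commit -qm "initial"
git push -qu origin master

git checkout -qb MyFeature                   # 1. branch
echo feature >> file.txt && git commit -qam "wip: part 1"
git push -qu origin MyFeature                # optional: back up the branch

git checkout -q master                       # 2. refresh master
git pull -q origin master
git checkout -q MyFeature                    # 3. rebase your work on top
git rebase -q master
git checkout -q master                       # 4. squash-merge back
git merge -q --squash MyFeature
git commit -qm "MyFeature does X"
git push -q origin master                    # 5. publish and clean up
git push -q origin --delete MyFeature
git branch -D MyFeature
```

Here the rebase is a no-op because master never moved; in real life that's the step where conflicts can appear.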

As usual, there are many ways to do things in git. There is another, quicker and simpler way to achieve the goal of clean history, but it's a bit limited.

Make your changes in master and commit as usual - but don't push. Every once in a while, sync up with master. To do this use,

git pull --rebase

This will get all changes from master and then replay your unpushed commits on top of them. This may generate conflicts as described above, so resolve them the same way. Once you are done with your changes you can push your commits, and they should appear on top without extra merge commits. An obvious problem here is that you can't push until you are really done, so this is OK for quick, short changes. If you want to "save" your commits on the server, or work from multiple machines for multiple days without pushing to master, then the workflow above works better.

Playing with HoloLens

A few days ago I got a chance to play with HoloLens. Here's the clip I shot from the HoloLens in my bedroom during my first few minutes with the device. The video was posted directly from the HoloLens to YouTube, with no postprocessing, and it's very close to what I was seeing:

A few things surprised me, like how easy it was to just put on the HoloLens and start using it. No calibration or other complicated settings required! My 3-year-old and my mother-in-law could take turns without modifying any settings. You can operate the whole device pretty much with just two gestures, and it took about 30 seconds to learn them. Given how often gesturing fails, I was actually more impressed by its accuracy than by almost every other feature. I can even type on the virtual keyboard, although it does get tiring.

I put the HoloLens through a battery of tests: some in low-light environments like the video above, others around untextured walls, and still more outdoors in relatively large areas with a bunch of trees. In all cases, virtual objects maintained their poses in the real world and the HoloLens knew its own pose fairly accurately. I can put an object in one of the bushes around my home, quickly look away, go around the corner, come back from a totally different angle, and it's there just like I'd left it! Beyond these simplistic tests of how well the SLAM and feature matching algos work, I also tried out apps like HoloTour and a shootout game. In HoloTour, you can see places in Rome and Peru rendered around you in 360 degrees! It absolutely impressed me that even though the glass is transparent, I can't actually see objects in the real world while these 360-degree videos are rendered. This is like having VR instead of AR, and the black actually looked fairly black. However, this is true only for reasonably lit environments. If you have bright lights in the background, you can "see through" the virtual environment and the "VR mode" is no longer convincing. In this VR-like environment I thought the resolution was fairly good. It's not as if you are watching HD, but it still felt better than SD and didn't fall short for the VR-like experience.

Things can get pretty real though. I put a roaring, fire-spewing dragon on the floor and gave the HoloLens to my 3-year-old to view it. He took one look and said he didn't want to see that scary thing again! For me, though, it was absolutely mesmerizing and addictive. I also checked out the galaxy and solar system apps that you often see in Magic Leap demo videos, except you can actually try them out in HoloLens right now! I can totally see how this could become an indispensable educational tool with well-made apps.

So, the question on everyone's mind: what about the field of view? At first I noticed it quite a bit, but then I got used to it. The bottom line is that the whole experience is so magical that you forget about it pretty soon; your brain seems to adjust. It's like when the first TVs came out: they were small, black-and-white, and low resolution, but hey, you are seeing moving images in your home, and people still got immersed in them! In fact, the more bothersome thing for me was the color bleeding when you move your head around. I suspect that was the reason I often started to feel a bit of a headache after 30 minutes of use. Still, having a completely untethered device that can do such heavy computational lifting on a single battery charge is just totally amazing.

Writing Generic Container Functions in C++11

Let's say we want to write a function to append one vector to another. It can be done like this:

template<typename T>
void append(std::vector<T>& to, const std::vector<T>& from)
{
    to.insert(to.end(), from.begin(), from.end());
}

One problem with this approach is that we can only use this function with std::vector. So what about all the other container types? In languages such as C#, we have IEnumerable, which simplifies a lot of things, but C++ templates are duck typed and it takes a bit more work to make the above function generic across container types. One quick and dirty route is this:

template<typename Container>
void append(Container& to, const Container& from)
{
    to.insert(to.end(), from.begin(), from.end());
}

The problem with this approach is that any class with begin() and end() will now qualify for this call. In fact, if someone has a class with these methods that isn't actually implemented as an iterable container, you can get some nasty surprises. A simple modification is to make sure we call begin() and end() from the std namespace instead of the ones defined on the class:

template<typename Container>
void append(Container& to, const Container& from)
{
    using std::begin;
    using std::end;
    to.insert(end(to), begin(from), end(from));
}

Sure, this is better, but wouldn't it be nice if we could restrict the types passed to this function to only those which strictly behave like STL containers? Enter type traits! First, we need to define a SFINAE type trait for containers. Fortunately Louis Delacroix, who developed the prettyprint library, has already fine-tuned this code extensively. Below is mostly his code with a slight modification of mine that allows it to pass GCC strict mode compilation. This is a lot of code, so I would usually put it in a separate file, say type_utils.hpp, so you can use it for many generic container methods:

#ifndef common_utils_type_utils_hpp
#define common_utils_type_utils_hpp

#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <valarray>

namespace common_utils { namespace type_utils {
    //also see
    namespace detail {
        // SFINAE type trait to detect whether T::const_iterator exists.

        struct sfinae_base
        {
            using yes = char;
            using no  = yes[2];
        };

        template <typename T>
        struct has_const_iterator : private sfinae_base
        {
            template <typename C> static yes & test(typename C::const_iterator*);
            template <typename C> static no  & test(...);

            static const bool value = sizeof(test<T>(nullptr)) == sizeof(yes);
            using type = T;

            void dummy(); //for GCC to suppress -Wctor-dtor-privacy
        };

        template <typename T>
        struct has_begin_end : private sfinae_base
        {
            template <typename C>
            static yes & f(typename std::enable_if<
                std::is_same<decltype(static_cast<typename C::const_iterator(C::*)() const>(&C::begin)),
                             typename C::const_iterator(C::*)() const>::value>::type *);

            template <typename C> static no & f(...);

            template <typename C>
            static yes & g(typename std::enable_if<
                std::is_same<decltype(static_cast<typename C::const_iterator(C::*)() const>(&C::end)),
                             typename C::const_iterator(C::*)() const>::value, void>::type*);

            template <typename C> static no & g(...);

            static bool const beg_value = sizeof(f<T>(nullptr)) == sizeof(yes);
            static bool const end_value = sizeof(g<T>(nullptr)) == sizeof(yes);

            void dummy(); //for GCC to suppress -Wctor-dtor-privacy
        };

    }  // namespace detail

    // Basic is_container template; specialize to derive from std::true_type for all desired container types

    template <typename T>
    struct is_container : public std::integral_constant<bool,
                                                        detail::has_const_iterator<T>::value &&
                                                        detail::has_begin_end<T>::beg_value  &&
                                                        detail::has_begin_end<T>::end_value> { };

    template <typename T, std::size_t N>
    struct is_container<T[N]> : std::true_type { };

    template <std::size_t N>
    struct is_container<char[N]> : std::false_type { };

    template <typename T>
    struct is_container<std::valarray<T>> : std::true_type { };

    template <typename T1, typename T2>
    struct is_container<std::pair<T1, T2>> : std::true_type { };

    template <typename ...Args>
    struct is_container<std::tuple<Args...>> : std::true_type { };

}}  // namespace common_utils::type_utils

#endif

Now you can use these traits to enforce what types get accepted into your generic function:

#include "type_utils.hpp"

#include "type_utils.hpp"

template<typename Container>
static typename std::enable_if<type_utils::is_container<Container>::value, void>::type
append(Container& to, const Container& from)
{
    using std::begin;
    using std::end;
    to.insert(end(to), begin(from), end(from));
}
Much better!

How to use Windows network share from domain joined machine on Linux

I'm seeing a lot of websites with somewhat outdated or incomplete instructions. So here are the full steps that work on Ubuntu 14 for mounting a Windows network file share through an Active Directory domain account:

First you need to install cifs-utils. Check if you already have it:

dpkg -l cifs-utils

If not, just install it:

sudo apt-get install cifs-utils

You can mount Windows shares anywhere, but /mnt is generally preferred. Another often-used location is /media, but in modern environments /mnt is preferred for things users mount manually, while /media is preferred for things the system mounts for you. Regardless, you should create a folder where the content of your share will appear. Run the following command to do that:

Note: In this guide, replace ALL_CAPS words with values you want.

sudo mkdir -p /mnt/FOLDER

Then run the mount command:

sudo mount -t cifs //SERVER/FOLDER /mnt/FOLDER -o username=USER,domain=DOMAIN,iocharset=utf8

Note that,

  1. We set iocharset to utf8. This is optional, but better than the default charset (ISO 8859-1) that mount uses.
  2. Some websites set file_mode/dir_mode to 777 (i.e. grant all permissions). This is usually not necessary.

That's it! If you need to unmount share then run the command,

sudo umount //SERVER/FOLDER

One problem is that you will need to run the mount command every time you restart. While there are ways to connect network shares at startup, they often involve storing your password and are not recommended. So I usually just add a line to my ~/.bash_aliases file like this:

alias mountshare='sudo mount -t cifs //SERVER/FOLDER /mnt/FOLDER -o username=USER,domain=DOMAIN,iocharset=utf8'

So the next time I need the share, I just type mountshare on the command line.
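For completeness: if you do accept the tradeoff of storing the password (again, not something I recommend), the usual pattern is an /etc/fstab entry pointing at a root-only credentials file. This is only a sketch; the ALL_CAPS values and file locations are placeholders, as above:

```
# /etc/fstab (single line; mounts the share at boot)
//SERVER/FOLDER  /mnt/FOLDER  cifs  credentials=/root/.smbcredentials,iocharset=utf8  0  0

# /root/.smbcredentials (protect it: chmod 600)
username=USER
password=PASSWORD
domain=DOMAIN
```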

How to Enable and Use GCC Strict Mode Compilation

One of the great features that many C++ programmers rarely use is GCC strict mode compilation. Enabling it lets the compiler warn you about many potential issues that might otherwise go unnoticed in build noise. Unfortunately there is little documentation, let alone a quick tutorial, on this subject, so I thought I'd write this up.

First, let's clear this up: there is no official GCC mode called "strict". I just made that term up. Fortunately there are enough compiler options that you can rig up the kind of "strict" mode that is available in many other languages.

To get "strict" mode, I use the following command line options for gcc/g++. They are written below in a format consumable in CMakeLists.txt, but you can use the same options from pretty much anywhere.

set(CMAKE_CXX_FLAGS "-std=c++11 -Wall -Wextra  -Wstrict-aliasing -pedantic -fmax-errors=5 -Werror -Wunreachable-code -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wlogical-op -Wmissing-include-dirs -Wnoexcept -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wstrict-null-sentinel -Wstrict-overflow=5 -Wswitch-default -Wundef -Wno-unused -Wno-variadic-macros -Wno-parentheses -fdiagnostics-show-option ${CMAKE_CXX_FLAGS}")

That's a looong list of compiler options, so I hope you can agree that we really mean "strict" business here :). In essence it enables extra warnings, turns all warnings into errors, points out coding issues that border on pedantic, and then enables some more warnings on top of that. Rest assured, the above is not overkill. You are going to thank the compiler for taking care of this stuff as your code base becomes larger and more complex.

Unfortunately, the road from here has lots of twists and turns. The first thing that might happen is that you get tons of errors, most likely not from your own code but from included headers that you don't own! Because of the way C++ works, other people's bad code in their headers becomes your liability. Except for Boost and the standard library, I haven't found many packages that can get through strict mode compilation. Even for relatively nicely written packages such as ROS you will get tons of compiler errors, and for badly written packages such as the DJI SDK, forget about it. Right... so now what?

Here's the fix I have used with fair amount of success. First, declare these two macros in some common utility file you have in your project:

#define STRICT_MODE_OFF                                             \
    _Pragma("GCC diagnostic push")                                  \
    _Pragma("GCC diagnostic ignored \"-Wreturn-type\"")             \
    _Pragma("GCC diagnostic ignored \"-Wdelete-non-virtual-dtor\"") \
    _Pragma("GCC diagnostic ignored \"-Wunused-parameter\"")        \
    _Pragma("GCC diagnostic ignored \"-pedantic\"")                 \
    _Pragma("GCC diagnostic ignored \"-Wshadow\"")                  \
    _Pragma("GCC diagnostic ignored \"-Wold-style-cast\"")          \
    _Pragma("GCC diagnostic ignored \"-Wswitch-default\"")

/* Additional options that can be enabled:
    _Pragma("GCC diagnostic ignored \"-Wpedantic\"")
    _Pragma("GCC diagnostic ignored \"-Wformat=\"")
    _Pragma("GCC diagnostic ignored \"-Werror\"")
    _Pragma("GCC diagnostic ignored \"-Werror=\"")
    _Pragma("GCC diagnostic ignored \"-Wunused-variable\"")
*/

#define STRICT_MODE_ON                                              \
    _Pragma("GCC diagnostic pop")

Here we have two macros: one tells GCC to turn off selected warnings before some chunk of code, and the second tells GCC to re-enable them. Why can't we just turn off all strict mode warnings at once? Because GCC currently doesn't have that option; you must list every individual warning :(. The above list is something I put together while dealing with ROS and the DJI SDK and is obviously incomplete. Your project might encounter more, in which case you will need to keep adding to the list. Another issue you might encounter is that GCC currently can't suppress every possible warning! Yes, a big oops there. One that I recently encountered in the DJI SDK was this:

warning: ISO C99 requires rest arguments to be used

The only way out for me in this case was to modify DJI's source code and submit the issue to them, so hopefully they will fix it in the next release.

Once you have above macros, you can place them around problematic headers. For example,

#include <string>
#include <vector>

STRICT_MODE_OFF
#include <ros/ros.h>
#include <actionlib/server/simple_action_server.h>
#include <dji_sdk/dji_drone.h>
STRICT_MODE_ON

#include "mystuff.hpp"

We are not out of the water yet, because the above trick only works for some header files. The reason is that GCC sometimes doesn't compile the entire file as soon as it encounters the #include statement, so it's pointless to put macros around those #include statements. Solving those cases requires some more work, and sometimes a lot more. The trick I used was to create wrappers around the things you use from bad headers, so that only the wrappers need the #include <BadStuff.h> statements and the rest of your code doesn't. Then you can disable strict mode for the wrappers and the rest of your code remains clean. To do this, you need to implement the pimpl pattern in your wrapper classes so that all objects from BadStuff.h sit behind an opaque member. Notice that the #include <BadStuff.h> statement goes in your wrapper.cpp file, not your wrapper.hpp file.

Even though this might require significant work in a big project, it's often worth it because you are cleanly separating the interface and the dependency on the external stuff. Your own code then remains free of #include <BadStuff.h>. This also enables you to do more things, like static code analysis, just for your own code. In either case, consider contributing to those projects with the bad stuff and making them pass strict compilation!

So, as it happens, making strict mode work requires buy-in from the C++ community. If everyone isn't doing it, it becomes hard for everyone else. So tell everyone, and start using it yourself today!

Downloading All of Hacker News Posts and Comments


There are two files that contain all the stories and comments posted to Hacker News from its start in 2006 to May 29, 2014 (exact dates are below). The data was downloaded with a simple program I wrote, Hacker News Downloader, by making REST API calls to HN's official APIs. The program used API parameters to paginate through item created dates to retrieve all posts and comments. Each file contains the entire sequence of JSON responses, exactly as returned by the API calls, in a JSON array.


Contains all the stories posted on HN from Mon, 09 Oct 2006 18:21:51 GMT to Thu, 29 May 2014 08:25:40 GMT.

Total count


File size

1.2GB uncompressed, 115MB compressed

How was this created

I wrote a small program, Hacker News Downloader, to create these files; it's available on Github.


The entire file is a JSON-compliant array. Each element in the array is a JSON object that is exactly the response returned by the HN Algolia REST API. The property named `hits` contains the actual list of stories. As this file is very large, I recommend a JSON parser that can work on file streams instead of reading the entire data into memory.

{
	"hits": [{
		"created_at": "2014-05-31T00:05:54.000Z",
		"title": "Publishers withdraw more than 120 gibberish papers",
		"url": "",
		"author": "danso",
		"points": 1,
		"story_text": "",
		"comment_text": null,
		"num_comments": 0,
		"story_id": null,
		"story_title": null,
		"story_url": null,
		"parent_id": null,
		"created_at_i": 1401494754,
		"_tags": ["story"],
		"objectID": "7824727",
		"_highlightResult": {
			"title": {
				"value": "Publishers withdraw more than 120 gibberish papers",
				"matchLevel": "none",
				"matchedWords": []
			},
			"url": {
				"value": "",
				"matchLevel": "none",
				"matchedWords": []
			},
			"author": {
				"value": "danso",
				"matchLevel": "none",
				"matchedWords": []
			},
			"story_text": {
				"value": "",
				"matchLevel": "none",
				"matchedWords": []
			}
		}
	}],
	"nbHits": 636094,
	"page": 0,
	"nbPages": 1000,
	"hitsPerPage": 1,
	"processingTimeMS": 5,
	"query": "",
	"params": "advancedSyntax=true\u0026analytics=false\u0026hitsPerPage=1\u0026tags=story"
}
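Since loading the whole 1.2GB array at once is impractical, a streaming reader can be sketched in a few lines of Python using the standard library's incremental raw_decode (iter_json_array and the filename are my own hypothetical names):

```python
import json

def iter_json_array(path, chunk_size=65536):
    """Lazily yield each top-level element of a huge JSON array
    without reading the whole file into memory."""
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, "r", encoding="utf-8") as f:
        # Find the opening '[' of the top-level array.
        while not buf:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            buf = (buf + chunk).lstrip()
        if buf[0] != "[":
            raise ValueError("not a JSON array")
        buf = buf[1:]
        while True:
            buf = buf.lstrip().lstrip(",").lstrip()
            if buf.startswith("]"):
                return                      # end of the array
            try:
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                chunk = f.read(chunk_size)  # element split across chunks
                if not chunk:
                    return
                buf += chunk
                continue
            yield obj                       # one full API response object
            buf = buf[end:]
```

Each yielded object is one API response, so the stories themselves are in obj["hits"], as in the sample above.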


Contains all the comments posted on HN from Mon, 09 Oct 2006 19:51:01 GMT to Fri, 30 May 2014 08:19:34 GMT.

Total count


File size

9.5GB uncompressed, 862MB compressed

How was this created

I wrote a small program, Hacker News Downloader, to create these files; it's available on Github.


The entire file is a JSON-compliant array. Each element in the array is a JSON object that is exactly the response returned by the HN Algolia REST API. The property named `hits` contains the actual list of comments. As this file is very large, I recommend a JSON parser that can work on file streams instead of reading the entire data into memory.

{
	"hits": [{
		"created_at": "2014-05-31T00:22:01.000Z",
		"title": null,
		"url": null,
		"author": "rikacomet",
		"points": 1,
		"story_text": null,
		"comment_text": "Isn\u0026#x27;t the word dyes the right one to use here? Instead of dies?",
		"num_comments": null,
		"story_id": null,
		"story_title": null,
		"story_url": null,
		"parent_id": 7821954,
		"created_at_i": 1401495721,
		"_tags": ["comment"],
		"objectID": "7824763",
		"_highlightResult": {
			"author": {
				"value": "rikacomet",
				"matchLevel": "none",
				"matchedWords": []
			},
			"comment_text": {
				"value": "Isn\u0026#x27;t the word dyes the right one to use here? Instead of dies?",
				"matchLevel": "none",
				"matchedWords": []
			}
		}
	}],
	"nbHits": 1371364,
	"page": 0,
	"nbPages": 1000,
	"hitsPerPage": 1,
	"processingTimeMS": 8,
	"query": "",
	"params": "advancedSyntax=true\u0026analytics=false\u0026hitsPerPage=1\u0026tags=comment"
}

Where to download

As GitHub restricts each file to 100MB and also has policies against data warehousing, these files are currently hosted at FileDropper. Unfortunately, FileDropper currently shows ads with misleading download links, so be careful which link you click. Below is the screenshot of what FileDropper shows; currently the button marked in red downloads the actual file.


HN Stories Download URL

Using Browser:

Using Torrent Client: magnet link (thanks to @saturation)

Archived at: Internet Archive (thanks to Bertrand Fan)

HN Comments Download URL

Using Browser:

Using Torrent Client: magnet link (thanks to @saturation)

Archived at: Internet Archive (thanks to Bertrand Fan)

A few points of interest

  • The API rate limit is 10,000 requests per hour, or you get blacklisted. I tried to be even more conservative by putting 4 seconds of sleep between calls.
  • I like to keep the entire response from the call as-is, so the return value of this function is used to stream a serialized array of JSON response objects to a file.
  • As the output files are giant JSON files, you will need a JSON parser that can use streams. I used JSON.NET, which worked out pretty well. You can find the sample code in my GitHub repo.
  • In total, 1.3M stories and 5.8M comments were downloaded, and each took about 10 hours.
  • It's amazing to see that all of HN's stories and comments so far fit into under just 1GB compressed!
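The stream-parsing point above can be sketched in Python (the actual sample code in the repo uses JSON.NET in C#; this is a minimal stdlib-only sketch that assumes the dump is one top-level JSON array of response objects, as described earlier):

```python
import json

def iter_responses(stream, chunk_size=1 << 20):
    """Yield one API response object at a time from a giant JSON array,
    without loading the whole file into memory."""
    decoder = json.JSONDecoder()
    buf = stream.read(chunk_size).lstrip()
    if not buf.startswith('['):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        # Skip separators between array elements.
        buf = buf.lstrip().lstrip(',').lstrip()
        while not buf:  # refill when a chunk boundary leaves the buffer empty
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            buf = chunk.lstrip().lstrip(',').lstrip()
        if buf.startswith(']'):
            return  # end of the array
        while True:  # keep reading until one full object is buffered
            try:
                obj, end = decoder.raw_decode(buf)
                break
            except json.JSONDecodeError:
                chunk = stream.read(chunk_size)
                if not chunk:
                    raise
                buf += chunk
        yield obj
        buf = buf[end:]

# Usage on a small in-memory sample:
import io
sample = '[{"hits": [{"objectID": "1"}]}, {"hits": [{"objectID": "2"}]}]'
for resp in iter_responses(io.StringIO(sample)):
    for hit in resp["hits"]:
        print(hit["objectID"])
```

`raw_decode` parses one complete JSON value from the front of the buffer and reports where it stopped, which is what makes incremental consumption possible without a third-party streaming parser.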

Issues and Suggestions

Please let me know of any issues and suggestions in the comments. You can also file an issue at the "shell" GitHub repo I'd created for this data.

Moving from dasBlog to WordPress

I've written earlier about why I decided to move my site to WordPress instead of choosing Jekyll or continuing to update my custom code. In this post I'll go into some details on how I moved to WordPress, with the hope that others might have an easier time.

Previously I'd decided to use dasBlog because it was fairly minimal and hackable. In the end I modified dasBlog in such a way that it would be hard for normal users to tell where my own code ended and dasBlog started. As happens with so many open source projects, people moved on, and this project hasn't even been updated since 2012. So please move on!

Installing WordPress

As I still have some legacy ASP.Net code, I decided to host WordPress on IIS. Fortunately, the famous 5-minute installation claim does hold on Windows as well. You just install it through the Microsoft Web Platform Installer (WebPI) and off you go, well, except for a few things.

  1. It's best to install and test everything on your local machine first and then move it to your web host. By default WebPI uses the WebMatrix server, but you might want to use full IIS with all its goodies for experimentation. There are plenty of instructions for installing IIS on Windows.
  2. Search for WordPress in WebPI and choose the WordPress product that has the WordPress logo; avoid variants such as Brandoo. In WebPI, make sure you click on options:


    On the options screen you should select these options:


  3. When the WordPress installation dialog comes up, select New Web Site instead of using the default (it's a good practice!) and specify some local folder for all the WordPress files.
  4. Start IIS, stop the Default site and start the wordpress site. Navigate to localhost, fill in username and password, and you should be able to log on to a brand new WordPress website!
  5. I would highly recommend that you move the WordPress installation to a subfolder instead of keeping it in the root. This has several advantages. First, you keep WordPress files separate, so during updates there is no worry of overwriting your own stuff in the root. Second, in the root folder you can host your own code or override WordPress behavior using URL redirects. Finally, this arrangement allows you to host other web applications and sub-sites in their own folders sitting next to WordPress. The instructions are very easy, and your external WordPress URLs don't change by doing this.
  6. There are a few essential settings you want to set now: the Timezone in Settings > General and URL formats in Settings > Permalinks. For permalinks I used the Custom Structure with the value


Exporting from dasBlog

The easiest way to migrate posts and comments from dasBlog is the DasBlogML tool. Unfortunately, it seems to have disappeared from the web altogether after the MSDN folks decided to reorganize a few things. I've put the copy I used on GitHub, and for me the process went smoothly without errors. If you do encounter issues, there are a few posts out there to help.

Importing to WordPress

While WordPress doesn't have any built-in way to import BlogML content, there is a plugin, BlogML Importer. Again, this plugin hasn't been maintained and is broken with the current version of WordPress. So I forked it on GitHub and updated it with the fix. You just need to install the original plugin and overwrite its files with the ones in my repository. Also look at this article for tips.

Cleaning Up the Markup

Over the years I'd used quite a few tools to post to my blogs, and some of these produced really messed up HTML. So one thing I needed to do was clean up the markup in my posts. The Tidy2 plugin, which can be downloaded from within Notepad++, is a godsend for this purpose. However, you might need to invest significant time in configuring it. I've put the Tidy2 config that I tweaked for hours on GitHub. This config is fairly robust and does a good job of cleaning bad markup in HTML fragments, even those awful MS Word extensions.

Managing Redirects for the Old Links

One of the things I care about a lot is making sure that links to my website remain valid during moves. I've practically obsessed over this even though this website is not popular and there are hardly any old links out there pointing back here. Still, I have 301 redirects going all the way back to the year 2000, so even those links remain valid today after 3 major technology stack changes. With IIS Rewrite Maps, things have become much easier. Here's what I did: create a file like the one below with the list of URLs for individual posts dumped from the DasBlogML tool, plus others you add manually. For categories you should have one URL for each category in dasBlog.

	<rewriteMap name="ShitalShahV3Redirects">
		<!-- redirects for the pages -->
		<add key="/aboutme.asp" value="/about/author/"/>
		<add key="/aboutme.aspx" value="/about/author/"/>
		<!-- etc -->
		<!-- 401s detected from Google Webmaster tools -->
		<add key="/?s=CategoryView.aspx" value="/p/category/" />
		<add key="/blog/CategoryView.aspx" value="/p/category/" />
		<add key="/?s=content/AllComments.xml" value="/comments/feed/" />
		<add key="/?s=CommentView.aspx" value="/comments/feed/" />
		<add key="/blog/content/AllComments.xml" value="/comments/feed/" />
		<!-- redirects for the feeds -->
		<add key="/blog/SyndicationService.asmx/GetRss" value="/feed/"/>
		<add key="/blog/SyndicationService.asmx/GetAtom" value="/feed/atom/"/>
		<!-- Redirects for categories -->
		<add key="/blog/CategoryView.aspx?category=AI" value="/p/category/machine-learning/" />
		<add key="/blog/CategoryView.aspx?category=Announcement" value="/p/category/personal-news/" />
		<!-- etc -->
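Since the map entries come from a dump of old/new URL pairs, generating the file is easily scripted. Here is a hypothetical helper that renders such a map; the function name and file layout are illustrative, not from my actual migration scripts:

```python
from xml.sax.saxutils import quoteattr

def build_rewrite_map(name, redirects):
    """Render an IIS <rewriteMap> element from (old_url, new_url) pairs.

    quoteattr handles XML-escaping of keys like
    '/blog/CategoryView.aspx?category=AI'.
    """
    lines = ['<rewriteMap name=%s>' % quoteattr(name)]
    for old, new in redirects:
        lines.append('\t<add key=%s value=%s />' % (quoteattr(old), quoteattr(new)))
    lines.append('</rewriteMap>')
    return '\n'.join(lines)

# Example: a couple of entries like the ones shown above.
print(build_rewrite_map("ShitalShahV3Redirects", [
    ("/aboutme.aspx", "/about/author/"),
    ("/blog/CategoryView.aspx?category=AI", "/p/category/machine-learning/"),
]))
```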

Now you can reference the above map in your web.config. The example below also takes care of other URL patterns that dasBlog had. It does not, however, handle the GUID-based URLs that dasBlog used as permalinks. Unfortunately, it's just too much effort to mine them and map them to the new WordPress URLs. I used Google Webmaster Tools to find external GUID links that were getting 404s. For me there were only a couple, so it was a quick fix.

		<!-- Include the map -->
		<rewriteMaps configSource="ShitalShahV3Redirects.config" />
		<rules>
			<!-- If we find a match in the map then use just that -->
			<rule name="dasBlogTitleRedirects" stopProcessing="true">
				<match url="(.*)" />
				<conditions>
					<add input="{ShitalShahV3Redirects:{REQUEST_URI}}" pattern="(.+)" />
				</conditions>
				<action type="Redirect" url="{C:1}" appendQueryString="false" redirectType="Permanent" />
			</rule>
			<!-- Redirects for various URL patterns that dasBlog provided -->
			<!-- date based URLs -->
			<rule name="dasBlogDateRedirect" stopProcessing="true">
				<match url="^blog(.*)" />
				<conditions>
					<add input="{QUERY_STRING}" pattern="(?:^|&amp;)date=(\d+)-(\d+)-(\d+)(?:&amp;|$)" />
				</conditions>
				<action type="Redirect" url="/p/{C:1}/{C:2}/{C:3}/" appendQueryString="false" redirectType="Permanent" />
			</rule>
			<!-- month based URLs -->
			<rule name="dasBlogMonthRedirect" stopProcessing="true">
				<match url="^blog(.*)" />
				<conditions>
					<add input="{QUERY_STRING}" pattern="(?:^|&amp;)month=(\d+)-(\d+)(?:&amp;|$)" />
				</conditions>
				<action type="Redirect" url="/p/{C:1}/{C:2}/" appendQueryString="false" redirectType="Permanent" />
			</rule>
			<!-- year based URLs -->
			<rule name="dasBlogYearRedirect" stopProcessing="true">
				<match url="^blog(.*)" />
				<conditions>
					<add input="{QUERY_STRING}" pattern="(?:^|&amp;)year=(\d+)(?:&amp;|$)" />
				</conditions>
				<action type="Redirect" url="/p/{C:1}/" appendQueryString="false" redirectType="Permanent" />
			</rule>
			<!-- Any other URLs -->
			<rule name="dasBlogRootOtherRedirect" stopProcessing="true">
				<match url="^blog\/(.+)" />
				<action type="Redirect" url="/?s={R:1}" appendQueryString="false" redirectType="Permanent" />
			</rule>
			<!-- Blog's root -->
			<rule name="dasBlogRootRedirect" stopProcessing="true">
				<match url="^blog[\/]?" />
				<action type="Redirect" url="/" appendQueryString="true" redirectType="Permanent" />
			</rule>
			<!-- main website old redirects -->
			<rule name="defaultAspxRedirect" stopProcessing="true">
				<match url="^(default\.asp[x]?)$" />
				<action type="Redirect" url="/" appendQueryString="true" redirectType="Permanent" />
			</rule>
			<rule name="wordpress" patternSyntax="Wildcard">
				<match url="*" />
				<conditions>
					<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
					<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
				</conditions>
				<action type="Rewrite" url="index.php" />
			</rule>
		</rules>

Themes, Plugins, Pages, Commenting and General Organization

I'll write a more detailed post on how to find programmer-friendly themes and essential plugins to make WordPress more hackable (the short answer is that I'm using Decode for the theme). However, eventually the question will come and haunt you: should you use a page or a post for X, where X = photo albums or your projects or articles and so on? I eventually settled on the principle of using posts for pretty much everything, except for a few rare cases such as About and Disclaimer. The primary reason is that posts can be categorized, which ultimately appears as the navbar on my website. Also, I've stopped treating posts as the immutable pieces of textual stream that they were a decade ago in the RSS world. Instead, I look at them as evolving articles that get refreshed as new information becomes available. Just that underlying principle has helped me clear up my mind about using posts as opposed to pages for most scenarios.

I decided not to write my own gallery code or use WordPress's built-in option (which I think is pretty bad). The thing is that photo galleries are finally becoming sophisticated enough that, just like blog engines, they would take a lot of your time to do really well. That's time you could have spent working on more interesting problems. So I left the hosting of photo galleries to PicasaWeb (and Flickr, which is offering a whopping 1TB for free!). The way this works is that I simply create a post for each new album with a description and a link to PicasaWeb and/or Flickr.

The next thing to get rid of was hosting my projects, binaries and code on my website. GitHub has evolved to be the obvious choice to browse, view and download, so there is now little point in building my own stuff to do the same thing. So again, the strategy is to create a post for each new project that points to my repos on GitHub.

Finally, I also decided to dump the built-in commenting system of WordPress. As an experiment I'd left it turned on for a few weeks, and I got 150 spam comments. It's a huge pain to clean that up, and it's disappointing that even in 2014 WordPress's out-of-box comment system is just not usable. There are plugins like Akismet, but it's not completely free. The next obvious option was Disqus. They have proven that they can scale, they are robust, they have great community support and, most importantly, they allow exporting all your data so you can switch to something else if you want to. Despite all these positives, I did encounter a few unhappy moments. For example, the current markup injected by Disqus doesn't actually validate as HTML5. This is super bad for a product that is almost viral. I contacted their support, who apparently didn't consider this an alarming issue and asked me to post it in their community forums where all their devs hang out. Huh? Why can't their support just forward it to their own devs instead of me having to find them in their public forums?

Where Do You Host This Thing?

I'd had WebHost4Life as my hosting provider for a very long time. However, recently they have been going downhill. Their control panel is ancient and a mess of Frankenstein apps. They haven't yet gotten around to supporting the latest versions of .Net, IIS and so on. Just doing FTP on their servers gave me nightmares with frequent errors and disconnections. Plus their prices are no longer competitive. So I took this opportunity to check out all the cloud providers. It turns out that none of the popular providers (Amazon, Rackspace, Azure) has a viable option for a low-traffic website like this one at a price comparable to something like WebHost4Life (while Azure has an option for free websites, it doesn't allow custom domains). Even after the recent price cuts from Google, Azure and Amazon, hosting a website like this can easily cost $30 per month, and that's with severe limitations on bandwidth, storage and compute. So I went back to looking at regular web hosts and zeroed in on these folks. They are just great: a very modern control panel, nice support, and easy management of emails, multiple websites, databases, FTP and so on. Plus their advertised storage and bandwidth are unlimited, which had been my primary criterion even though it really doesn't mean that in practice. I just don't want my users to ever see the error "This website has exceeded its bandwidth quota".

Deploying to Production

Finally, it's time to move your localhost WordPress installation to your actual web host. How do you do it? It turns out that there is no built-in easy way. Sure, you can export your content as an archive and import it somewhere else, but what about all the themes, plugins and customizations you had been doing all along? Fortunately, there is a fantastic WordPress plugin called Duplicator that worked like a charm in my case. It moved everything to my actual server without a hitch.

Welcome to V4

It's that time of the year again: upgrading the technology stack behind my site! Actually, much more than that. I'd been neglecting to post here for a very long time. Pretty much everything that could happen to prevent me from posting does seem to have happened in the past few years. There have indeed been lots of glowing moments of insight, clarity and awesomeness, which have all now slipped away from my keystrokes to remain buried in my fragile, volatile memory. The only thing I can say is: you poor reader!

People have argued that social media will spell the end of posting on personal websites and blogs. In reality, social media has optimized itself so much for sharing statuses, links, photos and so on that it is a rather dull tool for meaningful longer writing, especially technical writing. I guess no one in social media currently cares about the ability to syntax highlight code, add LaTeX equations or embed the latest commits from GitHub repos.

The decision to revive this website came with many choices. Way back in the 1990s, I'd insisted on building my own computers for my home and writing my own software for my homepage. I enjoyed doing both because I loved obsessing over all the tiny details of hardware specs and software behavior. But then two things happened: first, writing blog engines with all the bells and whistles started becoming a full-time job; and second, it's hard to beat the MacMini on size, specs, price and the ability to run MacOS as well as Windows. Of course, I could dumb down my blog engine to the minimum, but then where's the fun?

The result is that last year I finally bought a MacMini instead of building my own desktop. This year I decided to shelve my old SyFastPage framework as well as the new ccBlog project, at least until I'm done with other, more important things. The decision got much easier given that WordPress has finally evolved into something that is robust, easy to use, hackable and extensible, with enormous community support that would be hard to replicate.

These days no hacker can possibly choose WordPress without looking at Jekyll and its semi-clones. Initially I was excited by the whole concept of throwing away the fat server-side stack and having every change archived at GitHub, but as I thought more about it, I felt Jekyll wasn't passing this litmus test: everything should be made as simple as possible, but not simpler.

If you think about it, even though modern CMS/blog software dynamically generates pages using its fat footprint, for most requests the content is served right off the cache. In essence, these fat, complex infrastructures are static site generators that store their generated content in memory instead of on disk. This actually enables simplicity in use, which would otherwise be sacrificed to keep the technology stack simple.

So the new version of this website is mostly WordPress, and I'm pretty happy with everything so far. The reason for "mostly" is that I'm still running some code that I wrote using ASP.Net. The ability to use WordPress side by side with your own code and override any WordPress idiosyncrasy is very important to me. This gives me an escape hatch to write whatever I want using whatever stack I prefer. The source code of the old version of this website will remain available, like all of my open source projects.

In the past, I'd kept the content of this blog more personal and less technical because at that time social media didn't exist, and the eyes of many of my friends and family would have glazed over at technical content. Thanks to social media, I can now continue posting all those personal, opinionated blurbs there and use this website for sharing something more serious. If you are interested in the former, you can follow my social feeds.