How Much Is There To See?

We often say there is just too much out there. You can't even possibly see everything that this universe has to offer in a million year or even a billion year.

Consider an image with resolution of 640X420 (i.e. better than standard TVs and pretty close to DVD) and RGB color depth of 0-255. Although not the best, this is perhaps a good approximation for the human eye. With this setup there are actually only 4457 billion images possible. Let's say you were watching one image per second then in about 141,332 years you would have seen everything that is possible to see! If you were making video at 25 fps with each frame being unique then within 5653 years you would no longer have anything else to shoot.

Fast Asymmetric Generalized Hebbian Algorithm

I can get sucked in to challenges very easily especially when that involves Artificial Intelligence or statistical analysis. The challenge that has occupied my interests these days is the one that was put up by Netflix. It’s easy to describe: They give you 100 million data points for a triplet (Customer, Movie, Rating) and you have to predict the rating for given (customer, movie) pairs. If the average of squared errors of your predictions is below certain value you get a million dollar prize.

The problem is nothing new in the field. The researchers have been developing techniques for this class of problems since centuries – often without anticipating rewards. Any such material reward would be embarrassingly insignificant compared to the real prize - understanding the most unique and powerful thing in existence that we are aware of: The intelligence.

What makes Netflix prize an interesting challenge, however, is that it’s very well defined and several researchers are trying out their tools of trade so it also provides quantitative measurement for comparison. There are some consensus that Netflix have set the bar just high enough that no one would ever be able to achieve the lowest required RMS. But that shouldn’t stop you to enjoy the game and push extremes to new boundaries.

So how am I doing this? I started out brushing up on all existing techniques: Neural networks with linear elements, back propagation, principle component analysis and SVD, logistic regression (still many more to go: Bayesian networks, Markov decision process, SOMs and Recurrent networks). It’s one thing to read about these algorithms from text books and other to actually put in the practice to solve real world problems efficiently. The difficulty using these techniques straight from textbook (without domain specific enhancements) is that they suck when your data set is huge (matrix with 8.6 billion elements) and that there is no real generalized algorithms to determine several parameters such as learning rate, number of units and so on effectively.

The one algorithm that swept me away among all of these is called Generalized Hebbian Algorithm (GHA) - which probably is the most practical algorithm out there for linear problems since 1985 but is not described in even latest well known textbooks! This algorithm can deal with essentially infinite data that available serially, it will use only required amount of memory to hold eigenvectors and perform SVD starting with most significant eigenvalue!

In any case, I'm making my implementation of highly optimized version of this neural network algorithm available with source code.

Again limitation of GHA (and AGHA) is that they work best on linear problems only.

Math In Office 2007

I’d been attempting to post this entry right from inside Word 2007 Beta 2, but no luck so far interfacing with dasBlog. But the good thing is that most Office 2007 apps seems to be now blog and RSS aware. You can manage your blogging accounts right inside the Word or OneNote, although few things like pinging Technorati is missing.

The coolest thing in Office 12 is, though, new math related functionality. Now you can insert math equation in Word with rendering quality that matches LaTex (well, most of the time in Beta 2). The equation can be entered very interactively or in linear format which also allows TeX like symbol naming convention such as \alpha, \to, \infty and so on. Full TeX syntax however is not yet supported and some features such as equation numbering is missing in this release. But the interactive UI to build equations is pretty funcky. You can align equations and form equation array by using shift+Enter key and right clicking on ‘=’ symbol.

Even cooler is the fact that this functionality is available from any Office app. So you can actually start writing equation in your emails and Excel sheet! The equations are converted in to png file when sent in an email so even the lousy email reader can handle it. For example, here’s the Prime Number Theorem typed in Outlook 2007 is rendered as png image like this:

PrimeNumberTheorem

Andrei Burago working in with authoring team demoed this awesome power of Word12 and set out to rewrite my entire paper on twin primes just using Word 2007 (which I had written originally in TeX)!

The Ribbon bar is ultra cool and has replaced both menus and toolbar. This might drive you nuts however when you are trying to find some usual stuff such as list of recent files, print preview and so on. The trick is to click on Office symbol on upper left corner (hack, who would have known that!). The Lookout doesn’t seem to work any longer in Outlook 2007 but the instant searching is pretty satisfactory once you give it sometime to index stuff.

World's Most Beautiful Equation

UPDATE: Check out my article on detailed HowTo for this topic.

A weekend worth of effort has paid off so I can finally write about this equation. For those who wants to write about mathematics in their blogs knows what I'm talking about. Quite ironically, there is no built-in support for writing math equations in HTML. Of all types of knowledges, mathematics is something that remains invariant over time, cultures, languages. But the fact that the Internet, the largest knowledge resource of our times, does not yet have the capability to easily represent theses crown jewels is ironic.

That got to be changed. Thanks to John Forkosh who authored MimeTeX, the C code that parses equations in TeX format and renders them in to images. The code could be compiled as CGI executable to run under Windows. But my goal is different. I want to enable all forums, blogs, wikis and even desktop apps like Yahoo/MSN messenger so users can quickly write math equations. Considering some of forums and wikis have thousands of simultaneous users, the CGI executable just won't cut it. Neither it's usable for integration with desktop apps. So my decision was to convert original MimeTex code in to Win32 DLL and that's were the trouble begins (and weekend plans ends). I realized the MimeTeX code had several memory leaks which don't matter that much when you run it as a CGI EXE but could bring down the server if I'd to run it in-proc. Fortunately I was able to fix those leaks in just a weekend worth of effort and finally have my C# test app talking to MimeTeX Win32 DLL and displaying equations as I type! However the coolest part of the whole process was the long long emails with John Forkosh over next few days discussing every change I made in his code, carefully scrutinizing it, sending back and forth our changes to each other. While John would be updating his distro soon, you can get the code with fixes along with VB.Net and C# samples, DLL for desktop apps and my HttpHandler HttpModule code. This will enable you to integrate this functionality in any website or desktop app and let your users write equations as simply as:

Fermat's Last Theorem is <img src="$x^n + y^n = z^n$">

And Now without further ado, here's the answer to life, the universe and everything (and no, it's not 42):

e^{i\pi}=-1

Known as Euler's Identity, this equation reflects the relationship between four most fundamental numbers in the universe. There are not many important mathematical and physical equations where e, i or /pi hasn't invaded yet and that in essence implies that these fundamental constants very tightly controls the ways the universe works (did you noticed G_{\mu\nu}=\frac{8\pi G}{c^4} T_{\mu\nu}). So in nutshell, this equation just might be the concise definition of the universe ;).

Digits Of Pi

A famous quote from John Von Neumann goes like this,

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.

This is something I've intuitively believed since I was 15 and even hadn't heard of Neumann. Pure random numbers is (or probably more) as fascinating concept as \infty or i. It is impossible to generate sequence of purely random numbers without tapping in to nature. That means I could never write a computer code that generates a sequence of random numbers without showing up absolutely any patterns in a long run. There are only better random generators, never a perfect one, except thy nature itself.

So when I saw an article that digits of Pi are so far empirically proven to be randomly distributed, I was shocked. Infect huge progress has been made to prove that digits of Pi are indeed randomly distributed. Now the fact is, p can indeed be calculated algebraically (i.e. without tapping in to any natural phenomenon) and the idea that this can produce a pure random distribution just gives me a feeling as if sky is falling. I'd been hypothesizing since long time that the ability to generate infinite sequence of pure random numbers is the most significant (and probably the only) property to identify the existence of real universe, if it at all exist, that is ;). Consider this question: How do you know, at this precise moment, that you aren't part of some simulation running on some huge alien computer, or that you aren't some character in StarTrek holosuit or that you aren't dreaming with all these things around you (however "real" they may feel) aren't really "real"? Ok, it's hard to explain what I'm asking you but in nutshell, I'm trying to find out from pure mathematical perspective if there is anything in the nature that I can't masquerade, a property of the physical world around us that is impossible to simulate by any artificial means however sophisticated.

My hypothesis is that this property of the real world is an ability to generate infinite sequence of pure random numbers. That means, if you really want to find out whether you are some simulation running in a giant alien computer, all you have to do is to observe some natural phenomenon over a time with precision P and verify that your readings demonstrate pure randomness over the period of time T, where the P and T depends on sophistication of that alien simulation. The P and T can be very large but can never be infinity, except unless you are in the real world, of course. This is the mathematician's version of "I exist because I think".

So now you know why randomness of digits of pi made my stomach cringe. When I think about it, I'm starting to feel that any transcendental number obtained through convergence of infinite series (lets call them algebric transcendentals or ATs) must indeed have its digits distributed randomly. If you remember Cantor, there are more transcendental numbers than any other kind. But what this really means is I'm able to generate sequence of pure random numbers only using algebric means. It's as simple as finding new AT and emitting its digits. If you were someone who had given lot of thoughts to the nature of random numbers for years, this would sound both frightening and exciting to you. But hold on, could this really be true? After giving this some thought I believe it couldn't possibly be. I've finally constructed the following conjecture:

From a finite sequence of minimum length L of digits of any AT, there exist a Turing machine program G(L) to calculate the next digit in that sequence in finite steps. In other words, for any AT there always exist a number L which is finite and for which G(L) is a computable function.

In simple language, if you just give me sequence of AT's digits I should be able to predict the next digits provided you gave me enough of them to start with. This simply means if the alien computer was trieng to fool you by feeding you digits of some AT as a stream of random numbers, you can just sit back, collect these digits for a while and when you get handful of those, you can run through your algorithm to predict the next digits and find out you are not really in a real world (and also the fact that aliens didn't knew about Shital's AT Conjecture)! So random distribution is not the one and only property to identify a sequence of pure random numbers. The sequence of pure random numbers would not satisfy this conjecture (i.e. L would be 8). Infect this should be outright obvious: For sequence of natural numbers 0, 1, 2, \ldots we have all digits equally distributed but this sequence isn't by any means random.

This also gets us on to something else: the L now becomes a valuable property of an AT. A random number with infinite digits can be considered as a special class of AT with L\approx\infty. Let's call set of all such number ? (greek capital letter Rho) then the cardinality of \rho should be \aleph.

If all of this went over your head, here is fun part: Here you can look it up if your phone number has showed up in digits of pi calculated so far or even your name expressed as hex codes! For example, I can be found in pi at 67357954th digit ;).