Like most early online communities, the graduating class of 1989 from IIT Kanpur has a Yahoo! group: iitk-89. It was created way back in 1999 and was quite active until a few months ago. We discussed stuff that most of our other friends would find uninteresting. Someone would send a link to an article that he (there were girls in our batch but they rarely participated) liked or found outrageous, and then a heated discussion would ensue. Sometimes we collectively solved mathematical puzzles. It was fun.
But then a dispute arose about an offhand comment made by one of the members. Without going into details, I'll only say that this incident polarized the group and the nature of discussion became very different. At this point, one of the members wondered: "it would have been nice if Yahoo! allowed a simple form of expressing likeness/dislikeness of posts". Posting a response to a message you disagree with takes too much energy, is seen as an attack and is delivered as email to everyone in the group. A click to express agreement or disagreement, which is then aggregated and shown as a count only to those who visit the group pages, would be milder and much more effective. Think of this as the simple nodding or shaking of the head during normal conversation. These are cues that get picked up and change the conversation in subtle ways before it turns into a heated and loud verbal exchange.
I kept thinking that adding a capability like this would be very beneficial to the Yahoo! group communities. So when the opportunity came this month in the form of a Yahoo! (internal) hackday, I coded up YLike, a hack that adds like and dislike buttons. With a little bit of extra work, I was able to make it work on my personal server and make it available to others. Visit the YLike page and give it a try. If you are a member of the iitk-89 group then you can even see my votes for some of the recent messages.
Posted on March 27, 2011 10:35 PM | Permalink | Comments (2)
This blog post is motivated by three seemingly unrelated events -- a mail by an IIT Kanpur batchmate pointing out the availability of JEE 2009 marks of each of its 384,977 test takers, down to name, father's name, gender, PIN, category, and marks in different subjects (yes, this would be a major violation of privacy in the US or in any of the European countries, but apparently not in India); a brief encounter with R-Project, a software package for statistical analysis, in the course of doing some day-job-related number crunching; and a simmering interest in comparing relative performance in tests. The giant list of individual marks (warning: it is a 67 MB PDF) turned out to be the starting point for questions like: What does the frequency distribution of marks look like? Is it bell-shaped? Is it the same for boys and girls? Is there any perceptible difference in marks for different subjects? Is there any correlation between marks of different subjects -- say, Maths and Physics, or Maths and Chemistry, or Physics and Chemistry? Is the correlation, if any, different for boys and girls? For students scoring high or low total marks?
Continue reading "Statistical Analysis of JEE 2009 Results" »
Posted on July 2, 2010 3:22 PM | Permalink | Comments (0)
Let us say your client program running on machine chost is talking to the server program, which runs on machine shost and listens for connections on port 8000. To capture the request and response traffic in files, you need to do two things:
Posted on May 17, 2010 5:49 PM | Permalink | Comments (3)
Posted on May 8, 2009 4:09 PM | Permalink | Comments (1)
The most common question I have answered in the last few days, second only to "why Yahoo!", is this: how long did I stay with HP? A straightforward question that should have a simple and definite answer. But it isn't so, and it usually prompts me to launch into a long narrative -- I became an HPer through the VeriFone acquisition in June 1997. This is the same VeriFone that went for an IPO in 2005 after being sold to a private investment group by HP sometime in 2000 or 2001, and has been in the news recently for all the wrong reasons. I had joined the Bangalore office of VeriFone in January 1993, relocated to the US office in October 1998 and then moved to the E-speak group within HP in July 1999. So when did I really join HP? As per HP HR records for service anniversary awards and leave calculation, I have been an HPer since the day I joined VeriFone in Bangalore. For certain other benefits, it is the day VeriFone got acquired by HP. Personally, I felt like an HPer only after moving to the HP E-speak group in one of the Cupertino campus buildings.
You see, it isn't that simple. So, I just picked the round number 10. A bit less than what the official records indicate, a bit more than my real years at HP and pretty close to the average of these two figures.
Besides the obvious aging and graying (or rather, loss) of hair, these 10 years have brought numerous changes: relocation from Bangalore to the Bay Area and all the attendant transitions in lifestyle, the addition of Unnati (my younger daughter) to our three-member family, fulfilling part of the American dream, naturalization to US citizenship and many others.
My years at HP saw many historically significant events: the spinning off of Agilent, the merger with Compaq, the colorful days of Carly Fiorina and a resurgent HP under Mark Hurd, to name a few. However, these had much less impact on my day-to-day professional life than events that are less well known but much closer to what I worked on, and with whom, in the software business of HP: the initial excitement and euphoria around E-speak and its subsequent unraveling along with the dotcom bust of 2001 (I personally, and HP as a company, did learn a thing or two from this whole endeavour), the acquisition of Bluestone (a company that developed a J2EE App Server) and its subsequent closing for business reasons, and the rapid expansion of the HP Software business through the acquisition of Peregrine, Mercury Interactive and Opsware in recent years. Each of these touched and affected my professional life in a much more profound way and saw me go through a succession of roles, each building upon the previous one: developer, development manager, product design architect and then solution architect.
Besides the customary project deliveries and customer visits, what I remember most about working for HP is meeting and working with very different, interesting and wonderful people. Attending TechCons, the invite-only annual gatherings of HP technologists from all over the world to share ideas and showcase the best of their work, has been another highlight, though the competition to get invited has become much fiercer in recent years.
Projects at work, though interesting and important, weren't quite as exciting and fulfilling as semi-professional projects at home: assembling a PC in early 2000 with individually purchased parts from the local Fry's, authoring a book on J2EE Security (though the torrid pace of change in technology has made it obsolete in less than 5 years), launching a hobby Web 2.0 site which found a mention in the venerable Wall Street Journal, and numerous other smaller projects at home including a home radio based on iTunes and an FM transmitter, a modded NSLU2 and this blog.
My latest home project: a Linux-based media server that can rip song/book CDs and self-recorded DVDs into shorter clips and then serve them to the living room TV through the Wii Internet Channel or a future internet-enabled phone (it will be an iPhone 2.0 or an Android-based phone -- haven't made up my mind yet!) over the home network, a combination of PowerLine networking and WiFi access points. An ffmpeg-based prototype running Fedora Core 7 within a VM is almost ready but lacks the usability that 11-year-old Akriti demands for ripping and 7-year-old Unnati demands for viewing.
As you would most certainly agree, these were wonderful 10 years!
Posted on June 30, 2008 5:40 AM | Permalink | Comments (0)
Posted on June 27, 2008 3:35 PM | Permalink | Comments (0)
I should clarify upfront that I love PHP for its simplicity in developing web applications and this post is not meant to be PHP bashing by any stretch of the imagination. My only motivation is to plainly state certain facts that I came across while researching and experimenting for a design decision on how best to keep track of structured information within a PHP program. What I found was quite surprising, to say the least.
One of my function calls returned a collection of pairs of integers and I was wondering whether to store each pair as an array of two named values (as in array('value1' => $value1, 'value2' => $value2)) or a PHP5 class (as in class ValuePair { var $value1; var $value2; }). As the number of pairs could be quite large, I thought I'd optimize for memory. Based on experience with compiled languages such as C/C++ and Java, I expected the class-based implementation to take less space. Based on a simple memory measurement program, as I'll explain later, this expectation turned out to be misplaced. Apparently PHP implements both arrays and objects as hash tables and, in fact, objects require a little more memory than arrays with the same members. In hindsight, this doesn't appear so surprising. Compiled languages can convert member accesses to fixed offsets but this is not possible for dynamic languages.
But what did surprise me was the amount of space being used for an array of two elements. Each array having two integers, when placed in another array representing the collection, was using around 300 bytes. The corresponding number for objects was around 350 bytes. I did some googling and found out that a single integer value stored within a PHP array uses 68 bytes: 16 bytes for the value structure (zval), 36 bytes for the hash bucket, and 2*8 = 16 bytes for memory allocation headers. No wonder an array with two named integer values takes up around 300 bytes.
I am not really complaining -- PHP is not designed for writing data-intensive programs. After all, how much data are you going to display on a single web page? But it is still nice to know the actual memory usage of variables within your program. What if your PHP program is not generating an HTML page to be rendered in the browser but a PDF or Excel report to be saved on disk? Would you want your program to exceed the memory limit on a slightly larger data set?
Coming back to the original problem -- how should I store a collection of pairs of values? As an array of arrays or an array of objects? For memory optimization, the answer may be to have two arrays, one for each value.
For those who care for nitty-gritties, here is the program I used for measurements:
<?php
class EmptyObject { };

class NonEmptyObject {
    var $int1;
    var $int2;
    function NonEmptyObject($a1, $a2) {
        $this->int1 = $a1;
        $this->int2 = $a2;
    }
};

$num = 1000;

$u1 = memory_get_usage();
$int_array = array();
for ($i = 0; $i < $num; $i++) {
    $int_array[$i] = $i;
}

$u2 = memory_get_usage();
$str_array = array();
for ($i = 0; $i < $num; $i++) {
    $str_array[$i] = "$i";
}

$u3 = memory_get_usage();
$arr_array = array();
for ($i = 0; $i < $num; $i++) {
    $arr_array[$i] = array();
}

$u4 = memory_get_usage();
$obj_array = array();
for ($i = 0; $i < $num; $i++) {
    $obj_array[$i] = new EmptyObject();
}

$u5 = memory_get_usage();
$arr2_array = array();
for ($i = 0; $i < $num; $i++) {
    $arr2_array[$i] = array('int1' => $i, 'int2' => $i + $i);
}

$u6 = memory_get_usage();
$obj2_array = array();
for ($i = 0; $i < $num; $i++) {
    $obj2_array[$i] = new NonEmptyObject($i, $i + $i);
}

$u7 = memory_get_usage();

echo "Space Used by int_array: " . ($u2 - $u1) . "\n";
echo "Space Used by str_array: " . ($u3 - $u2) . "\n";
echo "Space Used by arr_array: " . ($u4 - $u3) . "\n";
echo "Space Used by obj_array: " . ($u5 - $u4) . "\n";
echo "Space Used by arr2_array: " . ($u6 - $u5) . "\n";
echo "Space Used by obj2_array: " . ($u7 - $u6) . "\n";
?>

And here is a sample run:
[pankaj@fc7-dev ~]$ php -v
PHP 5.2.4 (cli) (built: Sep 18 2007 08:50:58)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
[pankaj@fc7-dev ~]$ php -C memtest.php
Space Used by int_array: 72492
Space Used by str_array: 88264
Space Used by arr_array: 160292
Space Used by obj_array: 180316
Space Used by arr2_array: 304344
Space Used by obj2_array: 349144
[pankaj@fc7-dev ~]$
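The two-array layout suggested earlier, one array per value, can be sketched roughly as follows. This is not part of the original measurements: the variable names ($pairs, $int1, $int2) are made up for illustration, and the actual byte counts depend on the PHP version, so none are quoted here. Going by the sample run, where 1000 integers took 72492 bytes, two such arrays would need about 2 * 72492 = 144984 bytes, against 304344 bytes for the array of two-element arrays.

```php
<?php
// Illustrative sketch only: compares an array of pair-arrays with two
// parallel arrays. All variable names here are hypothetical.
$num = 1000;

// Layout 1: one small associative array per pair.
$before = memory_get_usage();
$pairs = array();
for ($i = 0; $i < $num; $i++) {
    $pairs[$i] = array('int1' => $i, 'int2' => $i + $i);
}
$pairs_bytes = memory_get_usage() - $before;

// Layout 2: two parallel arrays; index $i identifies the pair.
$before = memory_get_usage();
$int1 = array();
$int2 = array();
for ($i = 0; $i < $num; $i++) {
    $int1[$i] = $i;
    $int2[$i] = $i + $i;
}
$parallel_bytes = memory_get_usage() - $before;

echo "Array of pair arrays: " . $pairs_bytes . " bytes\n";
echo "Two parallel arrays:  " . $parallel_bytes . " bytes\n";
```

The price of the saving is that a pair is no longer a single value: code that needs pair number $i has to read $int1[$i] and $int2[$i] itself, and the two arrays must be kept in step whenever elements are added or removed.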
Posted on March 31, 2008 4:49 PM | Permalink | Comments (3)
Okay, a short blog post like this (or even a big one, like those penned by Steve Yegge) can't tell you everything *known today* about Ajax, forget "all you ever need to know". In fact, it can't tell you everything about anything worth knowing. There is just way too much information and knowledge around us about almost everything, consequential or not. To make things worse, at least for those who claim to "tell everything", this body of information and knowledge keeps growing every minute.
So why did I choose this particular title? No, I didn't intend to write everything I know about Ajax. It is just a link-bait. Seems to have worked quite well for others. Might work for me as well.
What I really want to do in this post is to write a short review of "Ajax -- The Definitive Guide", a book published by O'Reilly. Those who are familiar with O'Reilly's The Definitive Guide series know that these books have a reputation for being very comprehensive and all-encompassing about the chosen topic. This certainly seems to be the case for a number of books in this series on my bookshelf, such as "JavaScript: The Definitive Guide" and "SSH, The Secure Shell: The Definitive Guide". But a definitive guide on something like Ajax? It would have to cover a lot of stuff, in all its fullness and fine detail, to do justice to the title: the basics of Ajax interactions, (X)HTML, JavaScript, XML, XMLHttpRequest, CSS, DOM, browser idiosyncrasies, Ajax programming style and design patterns, tips-n-tricks, numerous browser-side Ajax libraries such as prototype, YUI library, jQuery etc. and their integration with server-side frameworks such as RoR, Drupal etc. The list is fairly long, if not endless. And each topic is worthy of a book by itself.
On the other hand, the book does provide good introduction to basic concepts, is quite readable, includes a lot of source code for non-trivial working programs and lists relevant resources, such as Ajax libraries, frameworks and applications, in its References section. I especially liked the "chat" and "whiteboard" application that allows two or more users to share a whiteboard and chat through their browsers.
Okay, so how does this book compare with other books on the same topic? This is a tough question, for I haven't been paying attention to most books that have come out on this topic. There is an answer, though, and it comes from this Amazon Sales Rank comparison chart:
A higher Sales Rank for an item implies that more people are buying it from Amazon. This doesn't tell how well a particular book will meet your needs but just that the high ranking items, in general, are being bought by more people than the low ranking ones. The above chart does indicate that Ajax -- The Definitive Guide is outselling its rivals, at least at the time of this review (March 17-18, 2008).
Posted on March 11, 2008 4:20 PM | Permalink | Comments (0)
"Should innovation-minded managers look at the fast-growing Internet company as a model — or an anomaly?" This is the question posed by Nick G Carr in a Strategy & Business article. Delving into various aspects of the enigmatic company, he opines:
The way Google makes money is actually straightforward: It brokers and publishes advertisements through digital media. ... snip ... Google’s protean appearance is not a reflection of its core business. Rather, it stems from the vast number of complements to its core business. ... snip ... For Google, literally everything that happens on the Internet is a complement to its main business. The more things that people and companies do online, the more ads they see and the more money Google makes. In addition, as Internet activity increases, Google collects more data on consumers’ needs and behavior and can tailor its ads more precisely, strengthening its competitive advantage and further increasing its income. As more and more products and services are delivered digitally over computer networks - entertainment, news, software programs, financial transactions - Google’s range of complements is expanding into ever more industry sectors.
Though this argument appears plausible, I don't think it will withstand critical scrutiny. Not all online activities can be equally monetized through ads. It is well documented that ads alongside search results perform much better than ads on content pages, email messages, online productivity apps, video clips or social networks (to be fair, the jury is still out on the last two). Would a company as focussed on effectiveness as Google try to increase the online ad market by doing things which are proven not to be very effective?
In my opinion, Google's core competency is in developing and running highly customized hardware and software systems and they will use this competency to solve mega-problems that others are ill-equipped to address. In the process, they will disrupt a number of established businesses.