Monthly Archives: July 2011

The Need for the Hadoop Distributed File System


People often wonder how organizations like Yahoo, Google and Facebook store such large amounts of user data. It is worth noting that Facebook stores more photos than Google’s Picasa. Any guesses?

The answer, in many cases, is Hadoop, a way to store large amounts of data running into petabytes. Its storage system is called the Hadoop Distributed File System (HDFS). Hadoop was created by Doug Cutting, based on ideas published in Google’s papers. Much of this data is machine-generated. For example, the Large Hadron Collider, built to study the origins of the universe, produces about 15 petabytes of data every year.

The next question is how quickly we can access such large amounts of data. For this, Hadoop uses MapReduce, which follows a ‘divide and conquer’ approach. The data is organized as key-value pairs. A job submitted from a single node is split into chunks that are processed in parallel on the many machines where the data is stored (the map step). The intermediate results are then sorted by key and combined (the reduce step).
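The map, sort and reduce steps can be illustrated with a small word count. This is only a single-machine sketch in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are made up for illustration, and on a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) key-value pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Sort step: group the values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce step: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently; here that happens one after another,
    # whereas a cluster would do it in parallel.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: two chunks of one larger file.
    print(word_count(["big data big", "data on hadoop"]))
```

Because every chunk is mapped on its own, adding more machines lets more chunks be processed at the same time, which is the point of the divide-and-conquer design.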

Hadoop runs on standard PC servers: it connects all the servers and distributes the data files across these nodes. It uses all these nodes as one large file system to store and process the data, making it a true distributed file system. Extra nodes can be added when the data reaches the installed capacity, which makes the setup highly scalable. It is very cheap because it is open source and does not require the specialized hardware used in traditional servers. Hadoop is also often counted among the NoSQL technologies.
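To picture how one file can be spread over several nodes, here is a toy sketch. It assumes fixed-size blocks handed out round-robin, which is a simplification chosen for illustration and not HDFS's actual placement policy (real HDFS also keeps several replicas of each block). The names `split_into_blocks` and `place_blocks` and the node names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block indexes to nodes in round-robin order (toy policy)."""
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    # With two nodes, the three blocks are shared between them.
    print(place_blocks(blocks, ["node1", "node2"]))
    # Adding a node spreads the same blocks more thinly.
    print(place_blocks(blocks, ["node1", "node2", "node3"]))
```

Adding a third node to the list is all it takes for the same blocks to be spread over more machines, which mirrors the scalability argument above.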

The Tennessee Valley Authority (TVA) uses smart-grid field devices to collect data on its power-transmission lines and facilities across the country. These sensors send in data 30 times per second; at that rate, the TVA estimates it will have half a petabyte of data archived within a few years. The TVA uses Hadoop to store and analyse this data. Our own Power Grid Corporation of India intends to install such smart devices in its grids to collect data and reduce transmission losses. It would do well to emulate the TVA.

Connecting Informatica and Social Media (Facebook)


Informatica PowerExchange is a data access tool for connecting to and accessing enterprise-wide data regardless of the type of source: SQL Server, Oracle, Netezza, Sybase, Salesforce, Teradata, Greenplum, all kinds of mainframe data, SAP data, etc. PowerExchange can be used in the same way to access Hadoop data.

The most interesting use of Informatica is accessing social media data on Facebook, LinkedIn and Twitter, which is about the most current real-time data available today. Using Informatica, we can collect data from any of these social media and provide it to sales and marketing teams, so that they can get more out of social media in meeting their sales and revenue targets.

Consider this scenario: when an FMCG or mobile company launches a new product, social media is immediately flooded with people’s comments and opinions. These help decide whether the product is a success or a flop in the market. Based on the outcome, the product can be tweaked and released as, say, version 1.1, and the management can plan and budget their revenue forecasts and profits. PowerExchange can be used along with PowerCenter to reap the maximum benefit.

My Worst Nightmare


I have been meaning to write about this nightmare, which happened to me when I was a kid. Every time I think of it, I perspire and my body freezes, with my thoughts running out of control.

My father was working for a private bank on the banks of the River Cauvery. The place where I lived is known as Pandamangalam, a name derived from the visit of the Pandavas, who were in the middle of their 13-year exile in the forests. Legend says that Draupadi prayed to Lord Krishna for darshan and he appeared before them; a temple to both of them stands there today, with a huge irrigation canal passing beside it. To this day, this quiet village supplies betel leaves for paan to places as far away as Mumbai.

I remember it was mid-May 1998. Being a school kid, I was enjoying the summer holidays, which are hard to come by now. After a lot of pleading, my parents bought me a brand new bright red Hero cycle in spite of their weak financial condition; they bought it after months of budgeting my father’s meagre salary. I was on cloud nine as my father rode the bicycle the 5 km from the nearby town to my village, and I enjoyed the pillion ride on the bumpy road. The next day I was in the hot seat, with my father running behind me holding the carrier. The same went on for the next two days. Then, just as it happens in cinema and ads, he let go of the cycle without my noticing, and I bit the dust when I realized he had left me all alone. Happily, it was not a terrible fall. Brimming with confidence that I had mastered the art of riding a bicycle, I took it out and rode around the four rath streets of the village.

Little did I realize that the village lay on the route the sand mining mafia used to smuggle sand out of the river. I pedalled fast down the sloping part of the road, hitting the speed I wanted; I felt like I was cruising through the air. The road turned 45 degrees, and I realized too late that a sand-laden lorry was coming towards me. I was sure I was going to die. I went totally numb and forgot to hit the brakes. I hit the bitumen gravel and fell from the bicycle onto the road in front of the lorry. I just closed my eyes. God was great, as He saved me from the lorry running over my neck. When I opened my eyes, I was still under the lorry, with just a small gap between my neck and its tyres. The driver had had only a split second to apply the brakes, and the lorry came to a halt just in time to save my life.

A middle-aged woman who was carrying a bundle of firewood threw it on the road and rushed to my rescue. She dragged me off the road and asked me whether I was all right. I was touched by her kindness, and I left the place hoping that my parents would not come to know of this.

I didn’t have the guts to face my father, and I lied to my mother about the bruises. I hit the bed early and woke up late, hoping that my father would have left for the office. Little did I realize that a group of elders had recognized me at the spot, as I frequented the temple with my father every weekend, and had met him at home late the previous night. They discussed the near-fatal crash and came to the conclusion that I would be barred from riding the bicycle until my legs could touch the ground for balance. No ‘monkey pedal’, as it is called in Tamil Nadu. I took a sabbatical of nearly a year and was allowed to ride again when the government came down heavily and cracked its whip on the sand mafia.

Even though this chilling incident happened nearly 15 years ago, it still gives me nightmares whenever I ride a motorcycle. It taught me never to ride fast, and I am always afraid to travel above 70 km/hr, which I have set as a threshold based on my instinct. I still cherish the warmth and kindness of the village people, which is totally absent in this city, and I now understand why my father always liked to work in rural and semi-urban areas.

Implementing Nested Aggregation in SQL Server 2005


SQL Server 2005 does not allow one aggregate function to be nested directly inside another, so to get the effect of nested aggregates we have to combine the existing features carefully.

We can’t directly use

Select max(count(city)), country from dimgeography group by country

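Instead, use the query below, which applies max as a window function (the over() clause) on top of the grouped count, so every country row carries the highest city count.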

Select country,max(count(city))  over() from dimgeography group by country
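To see what result to expect, the sketch below reproduces the same output with a derived table, a more portable way of nesting one aggregate inside another. It is only an illustration under stated assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than SQL Server, the rows are made up, and the function name `max_city_count_per_country` is hypothetical. Only the `dimgeography` table and its `country` and `city` columns come from the query above.

```python
import sqlite3


def max_city_count_per_country(rows):
    """Return (country, highest city count over all countries) for each country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dimgeography (country TEXT, city TEXT)")
    conn.executemany("INSERT INTO dimgeography VALUES (?, ?)", rows)
    # Inner query: count cities per country.
    # Middle query: take the maximum of those counts.
    # Outer query: attach that maximum to every country.
    query = """
        SELECT country,
               (SELECT MAX(city_count)
                  FROM (SELECT COUNT(city) AS city_count
                          FROM dimgeography
                         GROUP BY country)) AS max_city_count
          FROM dimgeography
         GROUP BY country
         ORDER BY country
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Made-up sample data: India has three cities, USA has two.
    sample = [
        ("India", "Chennai"), ("India", "Mumbai"), ("India", "Delhi"),
        ("USA", "Boston"), ("USA", "Austin"),
    ]
    print(max_city_count_per_country(sample))
```

The derived-table form is longer than the over() version, but it relies only on plain GROUP BY and subqueries.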