
Marriage between Big Data and Cloud


When I started my career in IT three years ago, everyone was talking about the cloud, calling it the next big thing in IT. Yet it still hasn’t made the anticipated impact. Now the industry is buzzing about ‘Big Data’, and I hope it doesn’t turn out to be just hype.

When we talk about ‘Big Data’, application solutions developed on Hadoop need not be restricted to handling petabytes of data. The best of big data can also be applied to smaller databases, since not every organization is going to handle huge volumes of data. So if we ask who will use Big Data solutions for smaller volumes of data, the focus will definitely be on those who actively use the cloud for their business.

The cloud came into the picture for a variety of reasons, such as green IT, affordability and robustness. Its primary target audience was SMB (Small and Medium-scale Business) units, which had previously found it difficult to adopt IT solutions because of cost constraints. Early adopters of the cloud have benefited tremendously over time. Beyond SMBs, we should also remember that the cloud is capable of storing petabytes of data: Amazon’s data centers, which provide the EC2 cloud service, are a perfect example.

To provide Big Data solutions such as analytics services to SMBs, a marriage of convenience between the cloud and big data has to be considered, so that all companies get the best value. With this marriage, Big Data can extract the high-quality information needed for business improvement. Since Hadoop is open source, companies need to budget only for the cloud. Amazon provides Hadoop in the cloud, but it does not provide any other services to derive value from the data.

IT services companies currently provide most of their industry-specific solutions as templates in their respective cloud environments; REPUBLIC from Hexaware and Ion from TCS are two examples. It would be better if they included big data in these offerings.

Readers, kindly let me know your thoughts on this.

Need for Hadoop Distributed File System


People often wonder how organizations like Yahoo, Google and Facebook store such large amounts of user data. Note that Facebook stores more photos than Google’s Picasa. Any guesses?

The answer is Hadoop, a way to store data running into petabytes and even zettabytes. Its storage system is called the Hadoop Distributed File System (HDFS). Hadoop was developed by Doug Cutting based on ideas from Google’s papers on the Google File System and MapReduce. Much of this data is machine-generated. For example, the Large Hadron Collider, built to study the origins of the universe, produces about 15 petabytes of data every year.

The next question is how quickly we can process such large amounts of data. For this, Hadoop uses MapReduce, which follows a ‘divide and conquer’ approach. The data is organized as key-value pairs. A job is split into chunks that are processed in parallel on the many machines across which the data is spread, coordinated from a single master node. The intermediate results are then sorted by key and processed further to produce the final output.
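To make the key-value, divide-and-conquer idea concrete, here is a minimal single-machine sketch in Python of the classic word-count job. It is not Hadoop’s API: a thread pool stands in for the cluster’s nodes, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, chosen only to mirror the map, sort and reduce steps described above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) key-value pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(pairs):
    """Shuffle/sort: group all values by key and return the keys in order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Divide: each chunk is mapped independently and in parallel
    # (threads here play the role of Hadoop's worker nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    # Conquer: merge the partial outputs, sort by key, then reduce each key.
    grouped = shuffle_and_sort(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in grouped)


if __name__ == "__main__":
    chunks = ["big data in the cloud", "the cloud stores big data", "hadoop stores data"]
    print(word_count(chunks))
    # → {'big': 2, 'cloud': 2, 'data': 3, 'hadoop': 1, 'in': 1, 'stores': 2, 'the': 2}
```

In a real cluster each map task would run on the node that already holds its chunk of data, so only the much smaller key-value output has to travel over the network.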

Running on standard PC servers, Hadoop connects all the servers and distributes the data files across these nodes. It uses all these nodes as one large file system to store and process the data, making it a truly distributed file system. Extra nodes can be added when the data reaches the installed capacity, which makes the setup highly scalable. It is also very cheap, since it is open source and doesn’t require the special processors used in traditional servers. Hadoop is also counted among the NoSQL implementations.
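The sketch below illustrates, in Python, how a distributed file system can spread one file across ordinary servers and why adding nodes raises capacity. It assumes a 128 MB block size and a replication factor of 3 (the HDFS defaults in Hadoop 2.x; Hadoop 1.x used 64 MB blocks), and it places replicas round-robin, which is a simplification: real HDFS uses a rack-aware placement policy. The helper names `place_blocks` and `usable_capacity_tb` are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: HDFS default block size in Hadoop 2.x
REPLICATION = 3      # assumption: HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes starting at a different offset for each block,
        # so the copies of one block never land on the same node.
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


def usable_capacity_tb(node_capacities_tb, replication=REPLICATION):
    """Raw disk across all nodes divided by the number of copies kept."""
    return sum(node_capacities_tb) / replication


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    for block, replicas in place_blocks(300, nodes).items():
        print(block, replicas)
    # Scaling out: adding a fifth 12 TB node grows usable space from 16 TB to 20 TB.
    print(usable_capacity_tb([12] * 4), usable_capacity_tb([12] * 5))
```

Keeping three copies on different machines is what lets inexpensive servers be used safely: if one node fails, every block it held is still available elsewhere.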

The Tennessee Valley Authority uses smart-grid field devices to collect data on its power-transmission lines and facilities across the country. These sensors send in data 30 times per second; at that rate, the TVA estimates it will have half a petabyte of data archived within a few years. TVA uses Hadoop to store and analyse this data. Our own Power Grid Corporation of India intends to install such smart devices in its grids to collect data for reducing transmission losses. It would be wise for them to follow TVA’s example as well.
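The TVA estimate is essentially arithmetic: readings per second × number of sensors × bytes per reading × time. The Python sketch below works that through. The sensor count and record size in the usage line are hypothetical placeholders, not TVA’s actual figures, and the model assumes a constant reporting rate with no compression or deletion.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years
HALF_PETABYTE = 5 * 10**14             # decimal petabyte: 10**15 bytes


def samples_per_year(rate_hz=30):
    """Readings produced by one sensor in a year at `rate_hz` readings per second."""
    return rate_hz * SECONDS_PER_YEAR


def years_to_fill(target_bytes, sensors, bytes_per_sample, rate_hz=30):
    """Years until the archive reaches `target_bytes`, assuming every sensor
    reports at a constant rate and nothing is compressed or deleted."""
    bytes_per_year = sensors * bytes_per_sample * samples_per_year(rate_hz)
    return target_bytes / bytes_per_year


if __name__ == "__main__":
    print(samples_per_year())  # → 946080000 readings per sensor per year
    # Hypothetical fleet: 1,000 sensors writing 100-byte records.
    print(round(years_to_fill(HALF_PETABYTE, sensors=1000, bytes_per_sample=100), 1))  # → 5.3
```

Even at 30 readings per second, a single sensor produces close to a billion readings a year, which is why such archives quickly outgrow a single server.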