mapreduce

Shared storage in a 'shared nothing' environment

The computing industry is seeing dramatic growth in the use of "shared nothing" database architectures, in which each node functions independently of the others and is self-sufficient (the Hadoop Distributed File System, for example). For the sake of performance, these architectures avoid contention among nodes for shared disk resources (SAN and NAS) by dedicating storage resources to each node, i.e., no shared disk.

While these computing architectures are best known in the context of Web-based applications and development activities, they are no longer confined to the Web. EMC Greenplum, IBM Netezza, and ParAccel are all …
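The data-placement idea behind a shared-nothing design can be sketched in a few lines. This is a minimal illustration, not how HDFS or any of the products named above actually place data: it assumes simple hash partitioning, and the `Node` class and helper functions are hypothetical. Each node keeps rows in its own local list (standing in for dedicated disk), and a query fans out so that every node scans only what it owns.

```python
import zlib
from typing import Callable, Dict, List


class Node:
    """Hypothetical node that owns its storage outright (no shared disk)."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.local_rows: List[Dict] = []  # stands in for the node's dedicated disk

    def store(self, row: Dict) -> None:
        self.local_rows.append(row)

    def count_where(self, predicate: Callable[[Dict], bool]) -> int:
        # Scans only this node's rows; no other node's storage is touched.
        return sum(1 for row in self.local_rows if predicate(row))


def owner(key: str, num_nodes: int) -> int:
    # crc32 is stable across processes (unlike Python's salted str hash),
    # so a given key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def load(rows: List[Dict], nodes: List[Node]) -> None:
    # Route every row to the single node that owns its key.
    for row in rows:
        nodes[owner(row["key"], len(nodes))].store(row)


def parallel_count(nodes: List[Node], predicate: Callable[[Dict], bool]) -> int:
    # A real system would run these scans concurrently; the coordinator
    # only adds up one small partial result per node.
    return sum(node.count_where(predicate) for node in nodes)
```

For example, loading 100 rows of the form `{"key": f"user{i}", "amount": i}` onto four nodes and calling `parallel_count(nodes, lambda r: r["amount"] >= 50)` returns 50, however the rows happen to be spread. Because nothing is shared, adding nodes adds storage and scan capacity together, at the cost of having to route each key to its owner.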

Could open source abandon the Google train?

As arguably the world's largest open-source company, Google has a big stake in maintaining its place at the heart of the open-source ecosystem. Recent events, however, suggest that Google can't rest on its laurels if it wants to secure the hearts and minds of open-source developers.

Make no mistake: Google needs those developers. Android, Chrome (and Chrome OS), and other Google initiatives depend upon fostering vibrant open-source communities that can help Google surpass Microsoft and Apple.

Such communities may be ready to cut the Google umbilical cord, however, and that should worry Google.

There have been …

MySpace to open source data processing

MySpace today announced a new open-source project called Qizmt, a distributed computation framework developed by its data mining team.

Qizmt is based on the MapReduce distributed processing framework, well-known as a core part of Google's search indexing infrastructure. Qizmt, however, runs on large clusters of Microsoft Windows servers, an interesting sidebar to a computing style we most commonly associate with commodity Linux machines.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, …
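In the MapReduce model, the user's reduce function then merges all intermediate values that share a key; the classic illustration is counting words. The sketch below is a minimal, single-process Python stand-in, not Qizmt's or Google's API: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and the dictionary grouping plays the role of the distributed shuffle that a real framework performs across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    # Input pair: (document name, contents). Emit (word, 1) per occurrence.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> int:
    # All intermediate values for one key arrive together; merge by summing.
    return sum(counts)


def map_reduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase, with the "shuffle" folded in: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: one reducer call per distinct intermediate key, in key order.
    return {ikey: reducer(ikey, groups[ikey]) for ikey in sorted(groups)}
```

With `docs = [("a.txt", "the quick fox"), ("b.txt", "The lazy dog saw the fox")]`, calling `map_reduce(docs, map_fn, reduce_fn)` returns `{'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'saw': 1, 'the': 3}`. Because map calls are independent of one another, and so are reduce calls for different keys, a framework is free to spread both phases across a cluster.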

Hadoop buzz continues to excite the cloud

Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables you to explore complex data using custom analyses tailored to your information and questions. It's also one of the most buzzed-about open-source projects around.

I spoke with Christophe Bisciglia, Hadoop World organizer and founder of Cloudera, to ask some questions about this inaugural event. And by the way, if you're interested in attending, click on the link in the answer to question No. 5. (My readers get a 25 percent discount if they register before September 15.)

Q: How can you explain the buzz around Hadoop? It's deafening. …
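Custom Hadoop analyses are usually written in Java, but Hadoop also ships a Streaming utility that lets any executable serve as the mapper or reducer: the program reads input lines on standard input and writes tab-separated key/value lines to standard output, and Hadoop sorts the mapper's output by key before the reducer reads it. The following sketch of a word-count pair in that style assumes the default tab separator; the function names are hypothetical, and in local testing Python's `sorted()` stands in for Hadoop's sort-and-shuffle step.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word<TAB>1" per word; by default Streaming treats the text
    # before the first tab as the key and the rest as the value.
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Lines arrive sorted by key, so all counts for a word are adjacent
    # and groupby can total them without holding every key in memory.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Locally, `list(reducer(sorted(mapper(["the cat\n", "The dog and the cat\n"]))))` yields `["and\t1", "cat\t2", "dog\t1", "the\t3"]`. Under Streaming, each function would be wrapped in a small script that does `for out in mapper(sys.stdin): print(out)` (or the same with `reducer`); the exact job-launch command varies by Hadoop version, so it is left out here.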

More universities join Yahoo for Net-scale research

Yahoo has signed up three new universities to participate in Internet-scale computing research, the Internet pioneer said Thursday.

The University of California-Berkeley, Cornell University, and the University of Massachusetts-Amherst have joined an effort that already included Carnegie Mellon University, Yahoo said. The universities get access to a cluster of Yahoo computers called M45 that runs open-source software called Hadoop, which can be used to process data rapidly.

Yahoo is a major contributor to Hadoop, a project within the Apache Software Foundation's collection, but Google created the underlying technology, its MapReduce programming model. MapReduce and Hadoop can be used …

Amazon launches Hadoop data-crunching service

This was originally posted at ZDNet's Between the Lines.

A correction has been made to this story. See details below.

Amazon on Thursday announced a new cloud computing service that uses Hadoop, a free software framework, to crunch tons of data.

The service, called Amazon Elastic MapReduce, is designed for businesses, researchers, and analysts who need to do data-intensive number crunching. Hadoop is used by companies like Yahoo, and start-ups like Cloudera are trying to push it into the enterprise data center.

Correction, 7:15 a.m. PDT: This story initially miscast Google's connection to Hadoop. …

Understanding MapReduce and Hadoop (Video)

For those of you interested in just how cloud computing (and I do mean computing) works, check out this video from a recent AWSome Atlanta Cloud Computing User's Group meeting. Twitpay's Don Brown explains how MapReduce and its open-source implementation, Hadoop, are used to process enormous amounts of data at Google and other large websites.

For more on MapReduce, check out these articles by Eugene Ciurana. For more on Hadoop (including support) check out Cloudera.

Via John M. Willis

You can follow me on Twitter @daveofdoom

Google spotlights data center inner workings

SAN FRANCISCO--The inner workings of Google just became a little less secret.

The search colossus has shed only occasional light on its data center operations, but on Wednesday, Google fellow Jeff Dean turned a spotlight on some parts of the operation. Speaking to an overflowing crowd at the Google I/O conference here, Dean managed both to demystify Google a little and to show just how exotic the company's infrastructure really is.

On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory--you know the drill.

On the other hand, Dean seemingly thinks clusters of …