Wednesday, March 27, 2013

GridGain - In-memory Big Data

Process TBs of data on 1000s of nodes with in-memory speed and database reliability using Java or Scala.

http://www.gridgain.com/
http://www.gridgain.com/blog/gridgain-hadoop-differences-synergies/

GridGain is Java-based middleware for in-memory processing of big data in a distributed environment. It is based on a high-performance in-memory data platform that integrates a fast In-Memory MapReduce implementation with In-Memory Data Grid technology, delivering software that is easy to use and easy to scale. Using GridGain, you can process terabytes of data on 1000s of nodes in under a second.
GridGain typically resides between business, analytics, transactional or BI applications and long-term data storage such as RDBMS, ERP or Hadoop HDFS, and provides an in-memory data platform for high-performance, low-latency data storage and processing.
Both GridGain and Hadoop are designed for parallel processing of distributed data. However, the two products serve very different goals and in most cases complement each other. Hadoop is mostly geared towards batch-oriented offline processing of historical and analytics payloads, where latencies and transactions don’t really matter, while GridGain is meant for real-time in-memory processing of both transactional and non-transactional live data with very low latencies.
 

Tuesday, March 26, 2013

Heteroskedasticity

http://en.wikipedia.org/wiki/Heteroscedasticity

The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion').

In statistics, a collection of random variables is heteroskedastic (often spelled heteroscedastic,[1] and commonly pronounced with a hard k sound regardless of spelling) if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity.
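
To make the definition concrete, here is a minimal sketch in Python that simulates two sub-populations with the same mean but different spread and compares their sample variances. The group names, sample size, seed and standard deviations (1.0 and 3.0) are assumptions chosen purely for illustration; they do not come from the Wikipedia article.

```python
import random
import statistics


def simulate(n_per_group=5000, seed=0):
    """Draw two hypothetical sub-populations with equal means but unequal spread."""
    rng = random.Random(seed)
    # Group A: standard deviation 1.0 (illustrative value)
    low = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # Group B: standard deviation 3.0 (illustrative value)
    high = [rng.gauss(0.0, 3.0) for _ in range(n_per_group)]
    return low, high


low, high = simulate()

# Sample variance is one possible measure of "variability".
var_low = statistics.variance(low)
var_high = statistics.variance(high)

print(f"variance of group A: {var_low:.2f}")
print(f"variance of group B: {var_high:.2f}")
print(f"variance ratio B/A:  {var_high / var_low:.2f}")
```

Because the two groups differ in variance (in theory by a factor of about 3^2 = 9), the pooled collection is heteroskedastic. If both groups were drawn with the same standard deviation, the ratio would be close to 1 and the collection would be homoskedastic.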