Spark, the Rising Star of Big Data

ADVANTAGES

1. Spark delivers interactive response times for every query (less than 40 seconds).
2. Spark is the only Big Data technology able to combine stored data with real-time data streaming.

BENCHMARK (AMPLab, UC Berkeley)

[Chart: Scan query (Query 1C, 89,974,976 results). Median response time (s), axis 0 to 40, for Impala (disk), Impala (memory), Spark (disk), Spark (memory) and Hive on Tez (Hadoop 2.0).]

[Chart: Aggregation query (Query 2C, 253,890,330 groups). Median response time (s), axis 0 to 800, for the same five configurations.]

ACTIVITY

[Chart: Code size of Hadoop MapReduce, Storm, Impala, Giraph and Spark.]

Spark was the most active Big Data Apache project from April 1 to April 8, 2014, with 7X the commits and 4X the authors of Apache Hadoop.

EVOLUTION OF SPARK ADOPTION (2013 to 2014)

Organizations and projects that moved to or adopted Spark, shown against the Hadoop trend: Baidu, WANdisco, Apache Mahout, Groupon, ClearStory, Databricks, Yahoo!, Intel, Ooyala, Conviva, Sharethrough.

HEADLINES

"Spark provides speed and ease of programming" (Derrick Harris, Gigaom)
"Spark reduces run time, latency and cost" (Gavin Li, Jaebong Kim, Andy Feng, Yahoo)
"Fast and easy Big Data processing with Spark" (Justin Kestelyn, Cloudera)

SOURCES

- Big Data Benchmark, AMPLab, UC Berkeley, 2014.
- apache/spark, GitHub, 2014.
- apache/hadoop-common, GitHub, 2014.
- Matei Zaharia, The State of Spark, Databricks, 2013.
- Derrick Harris, Apache Mahout, Hadoop's original machine learning project, is moving on from MapReduce, Gigaom, 2014.
- Gavin Li, Jaebong Kim, Andy Feng, Yahoo Audience Expansion: Migration from Hadoop Streaming to Spark, Yahoo, 2014.
- Justin Kestelyn, Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications, Cloudera, 2013.

Contact and information: [email protected]
STRATIO

Gonzalo Alamar




