Hadoop: The Intel Way
Bringing New Analytics Capabilities to the Hadoop Stack
Jingxiang He (何京翔), General Manager, Intel Asia-Pacific Research & Development Ltd., Software and Services Group
Cloud and IoT: More Users, More Devices, More Data
Immersive experiences • Cloud workload consolidation • Connectivity • Open cloud architecture • Security & trust • Data analytics
Intel's Vision
This decade we will create and extend computing technology to connect and enrich the lives of every person on earth.
Our Big Data Goal: Make Hadoop the Foundation of the Next-Gen Data Analytics Platform
• Data mining and analytics: business intelligence, statistical modeling, machine learning, …
• Existing IT systems: RDBMS, EDW, data marts, BI, …
• All of your big data (structured and unstructured): sensor readings, logs, tables, documents, images, …
Hadoop in Telecom Carriers
Use cases: network optimizations, user segmentation, ETL of 3G base station data, instantaneous query of 3G records by subscribers
Stack: Hive, MapReduce, HBase, HDFS
Hadoop in Smart Cities
Use cases: data mining (e.g., vehicle tracking), instantaneous query (e.g., road images), legacy applications, stream processing (e.g., real-time road conditions)
Stack: Hive, MapReduce, HBase
Hadoop: The Intel Way
Enterprise-grade solution: Intel Hadoop Distribution
• Stable, enterprise-grade software product
• Feature enhancements for vertical industries
Advanced development: “Project Panthera”
• Advanced development and path-finding
• Open source and community driven
Focus areas: Instantaneous Analysis • Reduced Complexity • Improved Efficiency
Bring New Analytics Capabilities to the Hadoop Stack
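Intel Hadoop Distribution: An Optimized Big Data Processing Software Product
• A stable, enterprise-grade Hadoop distribution
• Optimized to take advantage of new hardware technologies
• Provides instantaneous data processing capabilities for Hadoop
• Industry-specific feature enhancements to meet the big data challenges of different industries
Components:
• Data processing, analysis, statistics and mining toolset
• Mahout: machine learning
• R (from Revolution Analytics): statistics
• Hive: interactive data warehouse
• Pig: data flow processing language
• Sqoop: ETL tool for relational data
• Flume: log collection tool
• Intel Hadoop Manager: installation, deployment, configuration, monitoring, alerting and access control
• MapReduce: stable, efficient distributed computing framework
• HBase: distributed, high-dimensional database, with improvements and innovations that provide instantaneous data processing
• ZooKeeper: distributed coordination service
• HDFS: reliable distributed file system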
“Project Panthera”: open source initiatives to enable advanced analytics capabilities on Hadoop
• Document store on HBase: document semantics and significantly faster query processing on HBase
• SQL engine for Hive/MapReduce: better integration with existing infrastructure using SQL
• …: efficient utilization of new HW platform technologies
Instantaneous Analysis
Instantaneous analysis with a greatly enhanced HBase:
• Stream new data into HBase for real-time analysis
• Support workloads with high update rates (keeping the system always up to date)
• Allow very low-latency, online data serving
• Etc.
Interactive Query on HBase (Intel Hadoop Distribution)
10X faster than MapReduce for certain queries on HBase (e.g., group-by aggregation)
HBase Query Engine layer:
• Fast, distributed aggregations directly inside HBase
• Parallel scanning over multiple regions
• Advanced, distributed filtering (CRC32 comparator, fuzzy row filter, etc.)
HBase Query Engine as a new Hive backend:
• Most “SELECT” queries are automatically optimized to use the HBase Query Engine: “WHERE” via advanced scanners/filters, “GROUP-BY” via distributed aggregations
• “JOIN” still goes to MapReduce
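The split between work done inside HBase and work done at the client can be sketched as a toy model in plain Python. This is not the HBase Query Engine's API: "regions" here are assumed to be in-memory lists of (group key, value) rows, a thread pool stands in for parallel region scans, and `scan_region` / `group_by_avg` are hypothetical names. What it shows is that each region applies the WHERE filter and returns only small per-group (sum, count) partials, so a GROUP-BY aggregation needs no MapReduce shuffle.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Toy model (assumption): a region is an in-memory list of (group_key, value) rows.
Row = Tuple[str, float]
Partial = Dict[str, Tuple[float, int]]  # group_key -> (sum, count)


def scan_region(region: List[Row], where: Callable[[Row], bool]) -> Partial:
    """Region-side step: apply the WHERE filter and pre-aggregate per group."""
    partial: Partial = {}
    for row in region:
        if where(row):
            key, value = row
            total, count = partial.get(key, (0.0, 0))
            partial[key] = (total + value, count + 1)
    return partial


def group_by_avg(regions: List[List[Row]], where: Callable[[Row], bool]) -> Dict[str, float]:
    """Client-side step: scan all regions in parallel, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda region: scan_region(region, where), regions))
    merged: Partial = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0.0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    # AVG is computed from the merged (sum, count); averaging per-region averages would be wrong.
    return {key: total / count for key, (total, count) in merged.items()}


regions = [[("a", 1.0), ("b", 4.0)], [("a", 3.0), ("b", -1.0)]]
print(group_by_avg(regions, where=lambda row: row[1] > 0))  # {'a': 2.0, 'b': 4.0}
```

Shipping (sum, count) rather than averages is the design choice that keeps the merge correct: partial sums and counts add up exactly, whereas per-region averages cannot be combined without their counts.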
A Document Store on HBase (“Project Panthera”)
Up to 3x storage reduction and 3x query speedup for Hive/MapReduce query processing on HBase (see HBASE-6800)
DOT (Document Oriented Table) on HBase:
• Each row contains a collection of documents
• Each document contains a collection of fields
• A document is mapped to an HBase column and serialized using Avro
• Completely transparent to existing HBase applications
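The DOT mapping can be illustrated by comparing two cell layouts for the same row. This is a minimal sketch, not the HBASE-6800 code: a "cell" is modeled as a (row key, column family, qualifier, value) tuple, JSON stands in for Avro so the example stays stdlib-only, and `cells_per_field`, `cells_per_document` and `read_field` are hypothetical helpers.

```python
import json
from typing import Dict, List, Tuple

# Terminology from the slide: a row holds documents; a document holds fields.
Document = Dict[str, object]
Cell = Tuple[str, str, str, bytes]  # (row_key, column_family, qualifier, value)


def cells_per_field(row_key: str, family: str, docs: Dict[str, Document]) -> List[Cell]:
    """Conventional wide-row layout: one HBase cell per field, qualifier 'doc.field'."""
    return [
        (row_key, family, f"{doc_name}.{field}", json.dumps(value).encode("utf-8"))
        for doc_name, doc in docs.items()
        for field, value in doc.items()
    ]


def cells_per_document(row_key: str, family: str, docs: Dict[str, Document]) -> List[Cell]:
    """DOT-style layout: one cell per document, all fields serialized together.
    JSON stands in for Avro here (assumption made to stay stdlib-only)."""
    return [
        (row_key, family, doc_name, json.dumps(doc, sort_keys=True).encode("utf-8"))
        for doc_name, doc in docs.items()
    ]


def read_field(cells: List[Cell], doc_name: str, field: str) -> object:
    """Read one field by decoding its whole document cell."""
    for _, _, qualifier, value in cells:
        if qualifier == doc_name:
            return json.loads(value.decode("utf-8"))[field]
    raise KeyError(f"{doc_name}.{field}")


docs = {"profile": {"name": "Li", "age": 30}, "device": {"model": "X1", "os": "Android"}}
print(len(cells_per_field("user#001", "d", docs)))     # 4
print(len(cells_per_document("user#001", "d", docs)))  # 2
print(read_field(cells_per_document("user#001", "d", docs), "profile", "age"))  # 30
```

Every HBase cell carries its own row key, column family, qualifier and timestamp, so packing a document's fields into one cell removes much of that repeated per-cell metadata. That is one plausible contributor to the storage reduction claimed above; the slide does not break the figure down. A schema-based Avro encoding would also avoid the repeated field names that JSON stores in every value.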
Reduced Complexity
• Better data mining and statistics capabilities: full-text indexing and search; statistical modeling with the R language
• Better integration with existing infrastructure: geo-distributed datacenters; full SQL support for OLAP
Full-Text Indexing and Search (Intel Hadoop Distribution)
Full-text indexing and near real-time search for advanced data mining (e.g., log and click stream analysis, healthcare record analysis)
Incremental full-text indexing on HBase:
• Full-text indexing for semi-structured data (text, strings, numbers, etc.)
• Index built incrementally as records are inserted or updated
• Supports very high data insertion/update rates
Near real-time search:
• Distributed search based on keywords or logical expressions
• Zero delay in searching the latest data that has just been inserted
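Incremental indexing means the index is maintained on every write rather than rebuilt in batches, which is what makes freshly inserted records searchable without delay. The sketch below shows that idea with a single-process, in-memory inverted index; it is not distributed and not the product's implementation, and the class `IncrementalIndex` and its methods are hypothetical. AND/OR keyword search stands in for the "logical expression based" search on the slide.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"\w+")


class IncrementalIndex:
    """Toy inverted index updated on every put, so new rows are searchable immediately."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> row keys
        self.row_terms: Dict[str, Set[str]] = {}               # row key -> its indexed terms

    @staticmethod
    def _terms(record: Dict[str, object]) -> Set[str]:
        # Index every field value (text, strings, numbers) as lowercase tokens.
        text = " ".join(str(value) for value in record.values())
        return {token.lower() for token in TOKEN.findall(text)}

    def put(self, row_key: str, record: Dict[str, object]) -> None:
        """Insert or update a record, first removing the terms of its previous version."""
        for term in self.row_terms.get(row_key, set()):
            self.postings[term].discard(row_key)
        terms = self._terms(record)
        for term in terms:
            self.postings[term].add(row_key)
        self.row_terms[row_key] = terms

    def search_and(self, *keywords: str) -> Set[str]:
        """Rows containing every keyword (logical AND)."""
        matches = [self.postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*matches) if matches else set()

    def search_or(self, *keywords: str) -> Set[str]:
        """Rows containing at least one keyword (logical OR)."""
        return set().union(*(self.postings.get(k.lower(), set()) for k in keywords))


index = IncrementalIndex()
index.put("r1", {"msg": "disk error on node7", "code": 500})
index.put("r2", {"msg": "login ok", "code": 200})
print(index.search_and("error", "500"))  # {'r1'}
index.put("r1", {"msg": "disk ok", "code": 200})  # update: the old terms are dropped
print(index.search_and("error"))                  # set()
```

Keeping a reverse map from row key to its indexed terms is what lets an update remove stale postings cheaply, so an updated record never matches its old content.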
Bringing R Statistics into Hadoop (Intel Hadoop Distribution)
Distributed statistical modeling on Hadoop using the R language
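The slide does not say how R models are distributed across the cluster, so the following is only an illustration, in Python rather than R, of a common pattern behind distributed statistical modeling: each partition computes a few sufficient statistics in one pass (the "map" side), and those are merged and solved centrally (the "reduce" side). Simple linear regression is used as the example model; `Moments`, `map_partition` and `fit` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Moments:
    """Sufficient statistics for the simple linear regression y = a + b * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other: "Moments") -> "Moments":
        # Merging is associative and commutative, so partitions can combine in any order.
        return Moments(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy)


def map_partition(points: Iterable[Tuple[float, float]]) -> Moments:
    """'Map' step: one pass over a data partition, emitting just five numbers."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return Moments(n, sx, sy, sxx, sxy)


def fit(partitions: List[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """'Reduce' step: merge partition moments and solve the normal equations."""
    m = reduce(lambda a, b: a + b, map(map_partition, partitions), Moments())
    denom = m.n * m.sxx - m.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (m.n * m.sxy - m.sx * m.sy) / denom
    intercept = (m.sy - slope * m.sx) / m.n
    return intercept, slope


partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]  # points on y = 1 + 2x
print(fit(partitions))  # (1.0, 2.0)
```

Only five numbers per partition cross the network regardless of data size. The textbook formula used here can lose precision when x values are large; a production system would likely use a numerically stabler update.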
Cross-Datacenter BigTable/HBase (Intel Hadoop Distribution)
A virtual big table overlaid on existing geo-distributed data centers:
• Global table view
• Data stored in geo-distributed data centers
• Better locality and higher availability
• Data transfer eliminated through distributed aggregation
Diagram: a virtual big table spanning Data Centers A, B and C, linked by asynchronous replication
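The combination of a global table view with asynchronous replication can be sketched as a toy in-memory model. The assumptions here are not from the slide: each row has a "home" data center that accepts its writes, other data centers receive copies later from a replication queue, and `VirtualBigTable` with its methods is a hypothetical name. The sketch shows the locality benefit (reads served from a local replica) and the trade-off that comes with asynchrony (a local replica can briefly be stale).

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class VirtualBigTable:
    """Toy global table view over several data centers (all names hypothetical).

    Assumptions: each row has a home data center that accepts its writes, and
    other data centers receive copies later via an asynchronous replication queue.
    """

    def __init__(self, data_centers: Iterable[str]) -> None:
        self.stores: Dict[str, Dict[str, str]] = {dc: {} for dc in data_centers}
        self.home: Dict[str, str] = {}                       # row -> home data center
        self.pending: Deque[Tuple[str, str, str]] = deque()  # (source_dc, row, value)

    def put(self, home_dc: str, row: str, value: str) -> None:
        # The write commits in the home data center only; replication happens later.
        self.home[row] = home_dc
        self.stores[home_dc][row] = value
        self.pending.append((home_dc, row, value))

    def get(self, local_dc: str, row: str) -> Optional[str]:
        # Global view: serve the local replica if present (locality), else ask the home DC.
        if row in self.stores[local_dc]:
            return self.stores[local_dc][row]
        home_dc = self.home.get(row)
        return None if home_dc is None else self.stores[home_dc].get(row)

    def replicate(self) -> int:
        """Drain the queue, copying each change to every other data center."""
        copies = 0
        while self.pending:
            source_dc, row, value = self.pending.popleft()
            for dc, store in self.stores.items():
                if dc != source_dc:
                    store[row] = value
                    copies += 1
        return copies


table = VirtualBigTable(["A", "B", "C"])
table.put("A", "user#1", "v1")
print(table.get("B", "user#1"))  # v1, fetched from home data center A
table.replicate()
table.put("A", "user#1", "v2")
print(table.get("B", "user#1"))  # v1, because B's replica is stale until the next replicate()
```

Concurrent writes to the same row from different data centers are not handled here; a real system would need an ordering or conflict-resolution rule on top of this model.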
An Analytical SQL Engine for Hive/MapReduce (“Project Panthera”)
Goal: provide full SQL support for OLAP in Hadoop, as required by business users, enterprise applications, third-party tools (e.g., BI applications), etc. (see HIVE-3472)
Architecture: SQL → SQL parser (open source) → SQL AST → SQL-Hive AST translator (subquery unnesting, multi-table SELECT, INTERSECT support, MINUS support, …) → Hive AST → Hive semantic analyzer → Hive query driver → Hadoop MapReduce
HiveQL keeps its existing path: HiveQL → Hive parser → Hive AST
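INTERSECT and MINUS appear on this slide as features the SQL-Hive AST translator adds. The sketch below shows, at the string level, one way such set operations can be expressed with joins that Hive supports; the real translator works on ASTs and its actual output is not shown on the slide. `translate_intersect`, `translate_minus` and the aliases `l`/`r` are hypothetical, and the caller is assumed to pass the output column list explicitly.

```python
from typing import List


def _join_condition(columns: List[str]) -> str:
    # Equality on every output column; older Hive versions only support equality joins.
    return " AND ".join(f"l.{c} = r.{c}" for c in columns)


def translate_intersect(left_sql: str, right_sql: str, columns: List[str]) -> str:
    """Express `left INTERSECT right` as DISTINCT plus a LEFT SEMI JOIN."""
    select_list = ", ".join(f"l.{c}" for c in columns)
    return (f"SELECT DISTINCT {select_list} FROM ({left_sql}) l "
            f"LEFT SEMI JOIN ({right_sql}) r ON ({_join_condition(columns)})")


def translate_minus(left_sql: str, right_sql: str, columns: List[str]) -> str:
    """Express `left MINUS right` as DISTINCT plus an anti-join (LEFT OUTER JOIN ... IS NULL)."""
    select_list = ", ".join(f"l.{c}" for c in columns)
    return (f"SELECT DISTINCT {select_list} FROM ({left_sql}) l "
            f"LEFT OUTER JOIN ({right_sql}) r ON ({_join_condition(columns)}) "
            f"WHERE r.{columns[0]} IS NULL")


print(translate_minus("SELECT id, city FROM users_2012",
                      "SELECT id, city FROM users_2013", ["id", "city"]))
```

One caveat on this rewrite: SQL set operations treat two NULLs as equal, while a join on `=` never matches NULLs, so rows with NULL in any listed column are dropped by the INTERSECT rewrite and kept by the MINUS rewrite. A faithful translator would need extra NULL-safe handling.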
Improved Efficiency
• Performance benchmarks and tools
• Efficient utilization of new HW platform technologies (e.g., SSD, InfiniBand)
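Intel Hadoop Distribution Efficiently Supports Analysis of Massive Mobile Internet Access Records
China Unicom's nationwide query and analysis system for mobile subscribers' Internet access records
• China's first commercial telecom service system built on Hadoop/HBase
• Deployment:
  - Intel Hadoop Distribution, meeting the requirements for high-performance data loading and fast queries
  - A stable enterprise-grade solution that is easy to deploy and manage
  - 180+ node Hadoop/HBase cluster
• Performance metrics:
  - Record loading time: generally under 30 minutes, about 10 minutes in practice
  - Capacity to store at least 6 months of raw Internet access records for all mobile subscribers nationwide
  - Intermediate statistical report data retained for at least 5 years
  - Record query latency: no more than 1 second
  - Concurrent queries supported: 1,000 requests/second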
HiBench & HiTune Performance Tools (“Project Panthera”)
• HiBench: Hadoop benchmark suite
• HiTune: Hadoop performance analyzer
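Trying Is Believing
Intel Hadoop Distribution Free Edition gives end users and application providers a powerful, easy-to-use entry-level platform for big data.
• The Free Edition and the Enterprise Edition share the same core code
• The Free Edition includes all core enhancements
• The Free Edition is limited in node count and system storage capacity
Intel Hadoop Distribution home page:
CSDN Intel Hadoop Distribution community: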
Summary
Immersive Computing = Big Data = Big Opportunities
Intel is committed to delivering better and faster Hadoop solutions for big data analytics.
Intel Hadoop Distribution (IHD) Free Edition is here; try it out!