ORACLE数据库云服务器技术揭密娄恒@高级电信行业架构师
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术•Exadata数据库整合与资源管理•Exadata系统指标和配置
问题1:随着系统规模增加,传统数据库架构瓶颈凸显•存储层:1)数据量不断增加,带来的IO瓶颈;2)随着数据长时间运行带来的数据分布不均匀,存在IO热点•网络层:传输带宽不足,无法快速传输大量数据到服务器•服务器层:接收过多数据进行处理,内存优势无法发挥
解决思路:减轻负载、加宽通道、提高并行传统数据库架构解决思路:1.加宽通道、增加通道数量2.减少需要传送到服务器处理的数据量提高系统扩展能力3.增加系统并行处理
Exadata设计原则:存储层智能化、数据预处理能力Oracle Database ServersExadata CellExadata CellExadataExadatasoftwaresoftwareD……iskDisk
问题2:多系统资源互相独立,无法共享有些资源过度,有些资源不足,由于生产环境动态变化,无法动态满足Database ServersStorage Arrays
Exadata设计原则:资源共享和资源控制Oracle Database ServersSmartstorageI/O Resource ManagementoperationsHigh performancestorage networkStorageconsolidation(Transparent todatabases)Data compression
问题3:复杂的数据库系统均衡化配置Each machine has 2 CPUs All four servers drive about 2 * 200MB/s * 4 = 1600 MB/s•“The weakest link” defines the throughputEac•hEach building block is a machine has 2 Gb HBAsAbalanced unitll 8 HBAs can sustain8• C* 2o00mMpB/os n= e16n0t0s M tBo/s consider:•CPU: Quantity and speed Each• sHwBitcAh ( nHeoesdts Btou ssu Appdoarpt t8e0r0)M: B/s to guarQauntaene a total system throughput tity and speedFC-Switch1FC-Switch2of1600 MB/s•Switch speed•Controller: Quantity and speed•DiEsakc:h Q duisakn array has one 2Gbit contity and speedtroller All 8 disk arrays can sustain 8 * 200MB/s = 1600 MB/sDiskDiskDiskDiskDiskDiskDiskDiskArray 1Array 2Array 3Array 4Array 5Array 6Array 7Array 8HBA1HBA2HBA1HBA2HBA1HBA2HBA1HBA2
Exadata设计原则:平衡且优化配置•客户化系统无法达到最佳性能-组件不平衡,错误配置,瓶颈•Exadata 端到端优化-从磁盘到数据库传输是两倍的GB/sec-磁盘,闪存,控制器,总线,HBA, 网络,CPU等-数据库从固件,驱动,操作系统,网络的优化-多年的调优经验-没有瓶颈端到端优化•将IT 智慧转化成业务需求-而不要设计,调优,维护,硬件配置
问题4:系统的维护和扩容过程复杂均衡化配置方法论分区和HA规划网络互联配置规划数据库存储规划
Exadata设计原则:简化部署•消除了数据库系统部署的复杂性-数月的配置、排错、调优•当天即可完成部署-预置的、经过测试的、可稳定运行的标准配置-连接网线、电源线、加电即可用-无需更改现有应用程序,you are Exadatacertified数月到数天•即可获得极限性能
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术•Exadata数据库整合与资源管理•Exadata系统指标和配置
Exadata硬件架构•高性能•超级性能&开箱即用, 低成本, 冗余, 线性扩展数据库网格•8台数据库服务器(X2-2)Exadata智能存储网格96 CPU cores (12Core per server,2x Six-Core Intel X5670 Processors ( GHz)•14 x高性能低成本存储服务器(2U)768 GB memory•或2台数据库服务器(X2-8)128 CPU cores (64Core per server)2 TB (1 TB per server)•100 TB High Speed disk, 或336 TB High Capacitydisk•112 Intel cores in storage• TB PCI 闪存InfiniBandNetwork•Redundant 40Gb/s switches•跨存储服务器的数据镜像保护Data mirrored across storage servers•Unifiedserver & storage networkExadata低功耗Maximum Power usage of a Full Rack Database Machine is 14KW, typical is KW. A single high end SMP platform without storage or switches can consume well over 20 KW.
Database Machine X2-2 Full Rack14 Exadata Storage Servers (all high performance or all high 8 Intel-based 2 CPUcapacity)Sun Fire database servers1 Cisco Ethernet switch(48-port)Keyboard, Video, Mouse (KVM) hardware3 Sun Datacenter 2 Power Distribution InfiniBand Switch 36Units (PDUs) at rear(36-port managed QDR switch -40Gb/s)
Start Small and GrowX2-2 X2-2 Half X2-2 Full Quarter RackRackRack
ExadataX2-2 Full Rack •8 Dual-procesorDatabase Servers (Sun Fire X4170 M2)-96 cores (12 per server)-768 GB memory (96GB per server)-10 GigEconnectivity to Data Center-16 x 10GbE ports (2 per server)•14 ExadataStorage Servers X2-2-All with High Performance 600GB SAS disks OR-All with High Capacity 2 TB SAS disks•3 Sun Datacenter InfiniBandSwitch 36-36-port Managed QDR (40Gb/s) switch•1 “Admin” Cisco Ethernet switch•Keyboard, Video, Mouse (KVM) hardware•2 Redundant Power Distributions Units (PDUs)Add more racks for additional scalability
ExadataX2-2 Half Rack •4 Dual-procesorDatabase Servers (Sun Fire X4170 M2)-48 cores (12 per server)-384 GB memory (96GB per server)-10 GigEconnectivity to Data Center-8 x 10GbE ports (2 per server)•7 ExadataStorage Servers X2-2-All with High Performance 600GB SAS disks OR-All with High Capacity 2 TB SAS disks•3 Sun Datacenter InfiniBandSwitch 36-36-port Managed QDR (40Gb/s) switch•1 “Admin” Cisco Ethernet switch•Keyboard, Video, Mouse (KVM) hardware•2 Redundant Power Distributions Units (PDUs)Can Upgrade to a Full Rack
ExadataX2-2 Quarter Rack •2 Dual-procesorDatabase Servers (Sun Fire X4170 M2)-24 cores (12 per server)-192 GB memory (96GB per server)-10 GigEconnectivity to Data Center-4 x 10GbE ports (2 per server)•3 ExadataStorage Servers X2-2-All with High Performance 600GB SAS disks OR-All with High Capacity 2 TB SAS disks•2 Sun Datacenter InfiniBandSwitch 36-36-port Managed QDR (40Gb/s) switch•1 “Admin” Cisco Ethernet switch•Keyboard, Video, Mouse (KVM) hardware•2 Redundant Power Distributions Units (PDUs)Can Upgrade to an Half Rack
Database Machine X2-2 Architecture
X2-2 Database Server (Sun Fire X4170 M2)Processors2 Six-Core Intel®Xeon® X5670 processors ( GHz)Memory96 GB (12 x 8 GB)Local DisksFour 300 GB 10K RPM SAS disksDisk ControllerDisk controller HBA with 512 MB battery backed cacheNetworkTwo InfiniBand4X QDR (40 Gb/s) ports (1 dual-port HCA)Four 1GbE Ethernet portsTwo 10GbE Ethernet SFP+ ports (1 dual-port 10GbE network card based on the Intel 82599 10GbE Controllertechnology)Remote Management1 Ethernet port (ILOM)Power Supplies2 redundant hot-swappable power supplies
Exadata Storage Server(Sun Fire X4270 M2)Processors2 Six-Core Intel®Xeon® L5640 Processors ( GHz)Memory24 GB (6 x 4 GB)Local Disks12 x 600 GB 15K RPM High Performance SASor12 x 2 TB RPM High Capacity SASFlash4 x 96 GB Sun Flash Accelerator F20 PCIeCardsDisk ControllerDisk controller HBA with 512 MB battery backed cacheNetwork2 InfiniBand4X QDR (40Gb/s) ports (1 dual-port HCA)4 embedded Gigabit Ethernet portsRemote Management1 Ethernet port (ILOM)Power Supplies2 redundant hot-swappable power supplies
Exadata X2-8 Full Rack14 Exadata Storage Servers (all high performance or all high 2 Intel-based 8 CPUcapacity)Sun Fire database servers1 Cisco Ethernet switch(48-port)3 Sun Datacenter 2 Power Distribution InfiniBand Switch 36Units (PDUs) at rear(36-port managed QDR switch -40Gb/s)
ExadataX2-8 Full Rack •2 Eight-processor Database servers (Sun Fire 4800)-High Core, High Memory Database Servers-128 CPU cores (64 per server)-2 TB (1 TB per server)-10 GigEconnectivity to Data Center-16 x 10GbE ports (8 per server)•14 ExadataStorage Servers X2-2-All with High Performance 600GB SAS disks OR-All with High Capacity 2 TB SAS disks•3 Sun Datacenter InfiniBandSwitch 36-36-port Managed QDR (40Gb/s) switch•1 “Admin” Cisco Ethernet switch•2 Redundant Power Distributions Units (PDUs)Add more racks for additional scalability
X2-8 Database Server (Sun Fire X4800)Processors8 x Eight-Core Intel®Xeon® X7560 Processors ( GHz)Memory1 TB (128 x 8 GB)Local DisksEight 300 GB 10K RPM SAS disksDisk ControllerDisk controller HBA with 512 MB battery backed cacheNetworkEight InfiniBand 4X QDR (40Gb/s) ports(4 dual-port PCE express modules)Two Network Express Modules (NEM), providing a total of Eight 1GbE Ethernet ports andEight 10 GbE Ethernet SFP+ ports (4 Fabric Express Modules (FEM) using Intel 82599 10GbE controllertechnology)Remote Management1 Ethernet port (ILOM)Power supplies4 redundant hot-swappable power supplies
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server Pool•Exadata数据库整合与资源管理•Exadata系统指标和配置
Exadata Software Architecture OverviewSingle-instance DBRAC DBDB ServerDB ServerDB ServerEnterpriseDBManager InstanceDB InstanceDB InstanceDBRMDBRMDBRMSingle ASMASM clusterASMASMLIBCELLLIBCELLLIBCELLiDB Protocol over InfiniBand with PInfiniBand Storage Switch/Networkath FailoverOELOELOELCell ControlCLICELLSRVMSCELLSRVMSCELLSRVMS(cellcli/dcli)IORMRSIORMRSIORMRSSSHExadata ServerExadata ServerExadata Server
Exadata Software Architecture OverviewSingle-instance DBRAC DBDB ServerDB ServerDB ServerEnterpriseDBManager InstanceDB InstanceDB InstanceDBRMDBRMDBRMSingle ASM clusterASMASMASMLIBCELLLIBCELLLIBCELLiDB Protocol over InfiniBand with PInfiniBand Storage Switch/Networkath FailoverOELOELOELCell ControlCLICELLSRVMSCELLSRVMSCELLSRVMS(cellcli/dcli)IORMRSIORMRSIORMRSSSHExadata ServerExadata ServerExadata Server
传统SQL和数据库I/O处理过程SELECT customer_id16FROM ordersRow returnedWHERE order_amount>20000;Extents identified25SQL processing:2 MB returnedcuted:I/O issuedI/O exe3410 GB returned
Exadata 智能化存储处理过程SELECT customer_id16FROM ordersRow returnedWHERE order_amount>20000;iDB command Consolidated resultconstructed25set built from alland sent to Exadata cellsExadata cellsSQL processing342 MB returnedin Exadatato server
Exadata Smart Scan Scale-Out ExampleDatabasedbs1ServerInfiniBand Storage Network40 Gb/s Maximum ExadataCelledsc1edsc2…edsc13edsc14Each cell can deliver GB/ of 14 cells that can deliver 14 x = GB/sDisks(12/cell)
Exadata Smart Scan Scale-Out Exampleselect /*+ full(lineitem) */ count(*)from lineitemwhere l_orderkey < 0;DatabaseDatabase asks to retrieve all dbs1Serverblocks by doing a full table scan, and then filters matching the table is evenly distributedIf the table is 4800 GB in size, the across all disks, each cell complete scan would take cannot send more than 40 / 14 = approximately 16 Gb/s = GB/sto the database … GB/sDisks are throttled by the network bandwidth!Disks(12/cell)
Exadata Smart Scan Scale-Out Exampleselect /*+ full(lineitem) */ count(*)from lineitemwhere l_orderkey < 0;Databasedbs1Database asks Exadata cellsServerto send back all matching the table is evenly distributedIf the table is 4800 GB in size, the complete across all disks, each cell table scan will complete in approximately cannot send more than 40 / 14 = three minutes and ten seconds! GB/s = GB/sto the database … GB/sEach cell can scan at aspeed of GB/s,and send its matchingrows to the database instance. This represents a total scan at a speed of GB/s!Disks(12/cell)
Exadata Software Architecture DetailsExadata CellDatabase ServerSmartRDBMS instanceASM instanceSGASGADataFlash CacheASMASMI/OdskmdskmI/OProcProcLIBCELLLIBCELL/opt/oracle/cell/cellsrv/deploy/ ProtocoliDB ProtocolCELLSRV/etc/oracle/cell/ internalList accessibleList localdictionaryExadata cellsinterface IPandCELLSRVinternalInfiniBand switchparameters andlocal interface IP
Exadata Smart Storage Capabilities-Predicate filtering:-Only the rows requested are returned to the database server rather than all the rows in a table.-Column filtering:-Only the columns requested are returned to the database server rather than all the columns in a table.-Join processing:-Simple star join processing is performed within Exadata.-Scans on encrypted data-Scoring for Data Mining:-All data mining scoring functions are offloaded.-Up to 10x performance gains.-Backups:-I/O for incremental backups is much more efficient because only changed blocks are returned to the database server.-Create/extend tablespace:-Exadataformats database blocks.
Disk Storage Entities and RelationshipsDiskLUNCELLDISKGRIDDISKASM diskExadata CellCellCLI> CREATE GRIDDISK ...First twoDataLUNs onlyStorageGridPartitionDiskSystem AreaORCellORVisible to DiskASMGrid Disk(hot part)LUNOther ten LUNsGrid Disk(cold part)
Interleaved Grid DisksGrid Disk 1Grid Disk 3Fastest Tracks Fast TracksFastest Tracks Slower Tracks50%50%50%50%Slower Tracks Slowest TracksFast Tracks Slowest TracksGrid Disk 2Grid Disk 4Grid Disk1 benefits from the higher performance outer tracks of the diskThe performance of Grid Disk 3 and Grid Disk 4 is more evenly balanced
Creating Grid DisksCellCLI> CREATE GRIDDISK ALL PREFIX=data, SIZE=300GUse fastestGridDiskdata_CD_00_cell01 successfully createddisk portion...GridDiskdata_CD_11_cell01 successfully createdCellCLI> CREATE GRIDDISK ALL PREFIX=fraGridDiskfra_CD_00_cell01 successfully created...GridDiskfra_CD_11_cell01 successfully createdBeforeAfterCellGridCellCLI> LIST GRIDDISKdiskdisksdata_CD_00_cell01 active...data_CD_11_cell01 activefra_CD_00_cell01 active...fra_CD_11_cell01 active……CellCLI> exit[celladmin@cell01 ~]$
Exadata Smart Flash Cache ArchitectureWrite OperationRead OperationRead Operationon previously cached dataon uncached dataDB1DB1DB313322424Exadata SmartFlash CacheAcknowledgementcellsrvRead RequestcellsrvRead Requestcellsrv
Exadata Smart Flash Cache•High performance cache that understands different types of database I/O:-Frequently accessed data and index blocks are cached.-Control file reads and writes are cached.-File header reads and writes are cached.-DBA can influence caching priorities.-I/Os to mirror copies are not cached.-Backup-related I/O is not cached.-Data Pump I/O is not cached.-Data file formatting is not cached.-Table scans do not monopolize the cache.
Flash Cache and Flash Grid Disks•384 GB flash memory per cell is used to create 16 cell disks 24 GB each •Flash-based cell disks can be used for-Smart Flash cache-默认设置下使用全部可用flash空间-自动管理实现空间高效利用-对OLTP与DW均可起到加速作用-Flash-based grid disks-可被看作额外的数据库存储空间使用-需要精细的使用计划从而发挥其最大效能-对频繁更新的写操作对象有明显帮助
Flash Storage Entities and RelationshipsFlashLUNCELLDISKGRIDDISKASM diskORFLASHCACHEExadataCellCellCLI> CREATE FLASHCACHE ...CellCLI> CREATE GRIDDISK ... FLASHDISK ...FlashCacheFlashCellLUNDiskORFlash CacheGrid DiskVisible to ASM
Creating Flash-Based Grid DisksCellCLI> DROP FLASHCACHEFlash cache cell01_FLASHCACHE successfully droppedCellCLI> CREATE FLASHCACHE ALL SIZE=100GFlash cache cell01_FLASHCACHE successfully createdCellCLI> CREATE GRIDDISK ALL FLASHDISK PREFIX=flashGridDisk flash_FD_00_cell01 successfully createdGridDisk flash_FD_01_cell01 successfully createdBeforeAfterFlashCache...FlashGridDisk flash_FD_15_cell01 successfully createdCacheGrid diskCellCLI> LIST GRIDDISK...flash_FD_00_cell01 active...flash_FD_15_cell01 active……CellCLI> exit[celladmin@cell01 ~]$
Pinning DB Objects in Cache•DBA can enforce that an object is kept in flash cache-CELL_FLASH_CACHE = Keep, Default, None•Different cache retention policy for „keep‟ objects-Cached more aggressively-Cannot be pushed out by „default‟ objects•Keep blocks are automatically „un-pinned‟ if-Object is dropped, shrunk, or truncated-Object is not accessed on the cell within 24 hours-Block is not accessed on the cell within 48 hours-Downgraded to „DEFAULT‟ behavior
Disk Group ConfigurationSQL> CREATE DISKGROUPExadata CellExadata CellCELLDATA 1 Failure GroupDisCELL2 Failure Groupk GroupCELLFRA 1 Failure GroupDisCELL2 Failure Groupk Group
Configuring ASM Disk Groups for ExadataDisk group DATAFailure group cell01Failure group cell02o/<cell01 IP address>/data_cd_00_cell01o/<cell02 IP address>/data_cd_00_cell02o/<cell01 IP address>/data_cd_01_cell01o/<cell02 IP address>/data_cd_01_cell02......o/<cell01 IP address>/data_cd_11_cell01o/<cell02 IP address>/data_cd_11_cell02o/<cell01 IP address>/fra_cd_00_cell01o/<cell02 IP address>/fra_cd_00_cell02o/<cell01 IP address>/fra_cd_01_cell01o/<cell02 IP address>/fra_cd_01_cell02......o/<cell01 IP address>/fra_cd_11_cell01o/<cell02 IP address>/fra_cd_11_cell02All candidate disks on cell01and cell02CREATE DISKGROUP data NORMAL REDUNDANCYDISK 'o/*/data*'ATTRIBUTE ''='',''='','_scan_capable'='TRUE','au_size'='4M';
Screen reader hint:This slide contains a diagram that is explained in the speaker notes. Disk Group OverviewDisk 1Disk 2Disk 3Disk 4Disk 5Disk 6File 3File 4File 1File 2
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server Pool•Exadata数据库整合与资源管理•Exadata系统指标和配置
自动存储管理ASM•大幅度简化数据库存储管理•消除对专业存储管理产品的需求•实现真正的按需增加容量•提供最佳的I/O 吞吐率Application•卷管理系统Database•文件管理系统File•可以有:SystemASM镜像(基于文件作不同的镜像)Volume条带化(对文件可指定不同的条Manager带大小)Operating System动态平衡(可控)48
ASM存储保护及负荷分担•均衡及镜像:ASM在Allocation Unit级别实现镜像,AU被均衡分ASM布在各智能存储单元上(1AU=DBDB缺省4M块)File #1File #2•高可用:主及镜像版本被分配到不同的硬件单元上•自动恢复:在磁盘或存储单元失效时,自动执行再镜像过程可抵御磁盘及Cell单元的故障•透明性:磁盘或智能存储单元的损坏对数据库透明
ASM对数据的全面保护•采用Normal保护方式,任何一份数据会同时分布到两个不同Fail Group•任何两个不同的Fail Group一定不来自同一个Storage Cell•如果需要增加数据保护,可以增加Fail Group数量实现数据更多重的保护50
ASM和传统非ASM相比不使用ASM使用ASMTablesTablesTablespaceTablespace0010 0010 0010 0010 0010Files 0010 0010 0010 0010 0010File Names AutomaticFile SystemFile SystemStorageLogical VolLogical VolManagementDisksDisk Group网络存储(SAN, NAS, DAS)51
ASM 磁盘群组-磁盘群组视为一个逻辑单位来ASM管理Instance-将每个文件平均分散到所有的磁盘-依照文件类型选择大范围的条带(Coarse-grain)分割或小范围的条带(fine-grain) 分割-管理磁盘群组而不是管理文件Disk Group52
磁盘群组镜像-在扩展(extent)层作完成镜像-每个磁盘都混合主要扩展和镜像扩展-外部冗余:依赖外部硬件镜像冗余,ASM不做冗余-一般冗余: -两路镜像-高度冗余: -三路镜像53
磁盘组的动态数据平衡-当存储配置修改时,会自动动态数据平衡-向新增的磁盘增加成比例的数据-不需要手工调整I/O -可支持在线存储升级54
ASM动态数据平衡Automatic Storage Management Extends SAME–Allows dynamic online Storage reconfiguration–Efficient relocation of data during rebalance–Eliminate manual I/O tuning in all storage configurations55
Exadata 内部数据的分布数据的分布只有在系统变更时才发生改变–当新StorageCell添加时均衡仍然得以保持–当硬件出故障时均衡仍然得以保持–单块磁盘损坏/单个storagecell损坏都能忍受–缺省分配单元AU_SIZE=4M,Strip_size=512KStorage cell1Storage cell 2Node 1Node 2Storage cell 3Node 356
ASM对性能的提升数据自动分布,提供最高性能–当新硬件添加时均衡仍然得以保持–当旧硬件移除时均衡仍然得以保持–当硬件出故障时均衡仍然得以保持Storage cell1Storage cell2Storage cell 3Storage cell 4Add new storage cell57
Traditional vs ASM –Add Disk to Disk to volume(s) with the Add Disk Volume Manager File System over volume out data to move to new disk data to new files in -tune I/O
Traditional vs ASM –Remove all data that is on drop disk existing filesystem the disk from OS to hold data from dropped when Oracle says it is done data to new files in disk from -tune I/O
Traditional vs ASM –Tune I/ monitor I/O This space intentionally left performance hot out how to remove hot data to new files in that hot spot is gone
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server Pool•Exadata数据库整合与资源管理•Exadata系统指标和配置
ExadataHybrid Columnar CompressionWarehouse CompressionArchival CompressionOptimized for SpeedOptimized for Space•10x average storage savings•15x average storage savings•10x scan I/O reduction–Up to 50x on some data•Optimized for query performance•Some access overhead •For cold or historical dataReduced Warehouse SizeReclaim DisksBetter PerformanceKeep Data OnlineCan mix compression types by partition for ILM
Exadata Hybrid Columnar CompressionArchitecture OverviewCompression Unit (CU)Block HeaderBlock HeaderBlock HeaderBlock HeaderCU HeaderC2C5C7C8C1C4C3C2C5C6-A compression unit is a logical structure spanning multiple database blocks.-Each row is self-contained within a compression unit.-Data organized by column during data load.-Each column compressed separately.-Smart Scan is supported.-Typically 32k (4 blocks x 8k block size)
Warehouse CompressionBuilt on Hybrid Columnar Compression•10x average storage savings-100 TB Database compresses to 10 TB-Reclaim 90 TB of disk space -Space for 9 more „100 TB‟ databases•10x average scan improvement-1,000 IOPS reduced to 100 IOPS10 TB100 TBOracle Confidential
Archive CompressionBuilt on Hybrid Columnar Compression•Compression algorithm optimized for max storage savings•Benefits any application with data retention requirements•Best approach for ILM and data archival-Minimum storage footprint-No need to move data to tape or less expensive disks-Data is always online and always accessible-Run queries against historical data (without recovering from tape)-Update historical data-Supports schema evolution (add/drop columns)Oracle Confidential
Archive CompressionILM and Data Archiving Strategies•OLTP Applications-Table Partitioning-Heavily accessed data -Partitions using OLTP Table Compression-Cold or historical data -Partitions using Online Archival Compression•Data Warehouses-Table Partitioning-Heavily accessed data-Partitions using Warehouse Compression-Cold or historical data-Partitions using Online Archival CompressionOracle Confidential
Efficient Data Storage•Up to 50x savings in storage and associated costs•EHCC is a feature of ExadataStorage V2-Decompression, selection and projection performed on storage-Data is stored compressed on disk and compressed in the Flash Cache, frequently accessed data cached on Flash-Table can be forced to be on Flash by setting cell_flash_cache_keep•EHCC is tightly integrated with Oracle DB 11gR2-Data is stored compressed in buffer cache (DRAM)•With EHCC, entire databases can now run in memory-DRAM can hold 5TB of a compressed database-Flash can hold 50TB of a compressed database
Efficient Data Movement•No back and forth conversions between compressed and uncompressed formats•Read/Write Compressed data to disk-Write Compressed data to ASM mirrors•Read/Write Compressed data in Flash Cache-10x improvement for Flash price performance•Send Compressed data over Infiniband•Write Compressed data to Redo Logs•Send Compressed data to Standby-10x reduction in WAN bandwidth cost: makes ADG appealing for DW•Write Compressed data to Backups
Efficient Data Processing•End-to-end optimization of data processing for queries•Specialized columnar query processing engine runs in ExadataStorage Server to run directly against compressed data-Column optimized processing of query projection and filtering-Vector processing techniques used to fully leverage columnar format•10x to 100x smaller subset of qualifying data returned over Infinibandto database server for further query processing•Optimized single row lookupsto perform efficient I/O of a contiguous set of blocks that form a Compression Unit
Exadata Storage IndexStorage Index in MemoryOnly first block can matchRegion IndSELECT * FROM T1 WHERE B<2;exB:E:a/j…1/5B:3/8…G:4/91 ASM AU…1MBStorage RegionDBA1 ASM DiskTable T1Table T1Table T2ABCDABCDEFG…1………5……a…4Min B = 1= 5…3………Min B = 3Max B 8……Max Bd…7 = 8…5………3……j…9
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server Pool•Exadata数据库整合与资源管理•Exadata配置和系统指标
Connection Load BalancingApplication ServerOracle RAC DatabaseSCANListenersLocal ListenersClients
Connection Load BalancingApplication ServerOracle RAC DatabaseSCANListenersListenersClients
Oracle Database 11g Release 2Single Client Access Name (SCAN)•Used by clients to connect App to any database in the Serverscluster•Removes the requirement to change the client connection if cluster changes•Load balances across the DWinstances providing a service•Provides failover between “moved instances”© 2009 Oracle Corporation –Proprietary and ConfidentialRAC ONE ARAC ONE BRAC ONE CFrontRAC ONE DRAC ONE EOfficeRAC ONE FRAC ONE GRAC ONE HBackRAC ONE IRAC ONE JFreeRAC ONEOffice KRAC ONE
Single Client Access Name (SCAN)•A DNS entry is required (unless using GNS)-A single name defined to resolve to 3 addresses (the scanvips) •Each cluster will have 3 scan listeners, each having a scanvipdefined as cluster resources•A SCAN VIP/LISTENER will failover to another node in clusterCluster 1 ONLINE 1 ONLINE 1 ONLINE ONLINEnode3
Oracle ClusterwareSCAN Resources •指定SCAN Listener 和SCAN VIP$srvctlconfigscan_listenerSCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521$srvctlconfigscanSCAN name: sales1-scan, Network: 1/ VIP name: scan1, IP: / VIP name: scan2, IP: / VIP name: scan3, IP: /
Single Client Access Name (SCAN)•For client connections, SQL*Net (11g Release 2) will get the 3 IP addresses that the SCAN resolves to. It will randomly select one of the addresses and try connecting to that listener.•When a SCAN Listener receives a connection request, the SCAN Listener will check for the least loaded instance providing the requested service. It will then re-direct the connection request to the local listener on the node where the least loaded instance is running. The local listener will create the connection to the database instance.
Single Client Access Name (SCAN)•SCAN listener is the load balancer for the cluster.•Allows clients to use EZConnector simple JDBC connections•Instance registers with local listener on its node•Database “remote_listener” registers instances with all SCAN listenerssqlplussystem/manager@ sales1-scan:1521/oltpjdbc:oracle:thin:@sales1-scan:1521/oltp
Client-Side Connection Load =(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=sales1-scan)(PORT=1521)) ))(CONNECT_DATA= (SERVICE_NAME=oltp)))SCANListenersEZConnectsqlplus scott@sales1-scan/oltpClients
Server Side Connection Load Balancing with SCANLOCAL_LISTENERSCAN_LISTENEROLTP on PMDB3Service OLTP? PMDB1 on N1PMDB2 on N2NePMDB3 on N3tworkNetworkSCANListenersOracle RAC Database
Connection Load BalancingCRM work requestsConnection Cache 10%?60%30%RACRACRACInst1Inst2Inst3CRM isCRM isCRM is very .
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server PoolDBRM 和IORM•Exadata数据库整合与资源管理•Exadata配置和系统指标
Services•Serviceis an abstraction that customers use to divide work into logical workloads-globally unique name -service quality levels (performance, Priority, availability)-Unit of Management-Location Transparency-Traditional HA(stop/start/relocate) operations-Workload Management -Dependencies and notification
=(DESCRIPTION=GL=(DESCRIPTION=(ADDRESS=(ADDRESS=(PROT=TCP)(PROT=TCP)(HOST=BLEU)(HOST=BLEU)PORT=1521) PORT=1521) ))CRM(CONNECT_DATA=(CONNECT_DATA=GL(SERVICE_NAME=CRM)(SERVICE_NAME=GL)))))PRODGLCRMDatabase servicesPRODGLCRMInstance parameter:SERVICE_NAMES=„PROD,GL,CRM‟
Use EM to Define Services
Published services
Server Pool•Logical division of a cluster into pools of servers. •Hosts applications (which could be database or application)•Defined by 3 attributes (min, max, importance) or a defined list of nodes-Min-minimum number of servers (default 0)-Max –maximum number of servers (default 0) -Importance –0 (least important) to 1000Best practice for Oracle RAC: min 0, max=cardinality
Database Server Pool•Special server pool used for Oracle databases•Non-overlapping: server can only be in one pool at a time•DBCA asks for cardinality (number of instances)-Server pool max=cardinality, min 0 (default), importance 0 (default)•Managed by srvctl(only)-Not allowed to modify with crsctl-Cluster resource prefixed by orasrvctl add serverpool –g name –u max
Cluster Managed Services with Server Pools•One to one mapping-IE a service can only run in one server pool•Services are uniform(run on all instances in the pool) or singleton(runs on only one instance in the pool)
定义服务于服务池部署关系服务器资源池化服务可用性定义(服务池部署)服务属性定义服务级别阀值资源定义
GENERIC Server Pool•Used to model databases that are “Administrator Managed”. . “The OLD WAY!”-Parent of all server pools for “Administrator Managed” databases•Always exists but may be of size 0•Used for upgrade from 10g or 11g Release 1-IE includes all nodes that have an APPLICATION resource•Managed by srvctlmanaging the database•Servers in Generic are named (hosting member in CRS resource)
FREE Server Pool•A special server pool created by CRS-Always exists but may be of size 0•Any unassigned server will go to FREE•Default importance is 0 (least important)-Importance can be modified
Server Assignment•Servers are assigned in the following order: server pool assigned server •CRS uses importance of server pool to determine all server pools in order of importance until they meet their all server pools in order of importance until they meet their Default any left over go into FREE-You can increase the importance of free for manual assignment
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 2)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 2)-Services OLTP–uniform, BATCH-singleton-EMAIL server pool for email db (min 0, max 3, Imp 2)-Service
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 2, Imp 2)-BATCH server pool for prod database (min 0, max 2, Imp 1)-Services OLTP–uniform, BATCH–singleton (isolated)-EMAIL server pool for email db (min 0, max 3, Imp 2)-Service
Example –Administrator Managed Database•Cluster with 4 nodes-PRODDB RAC Database runs on Node 1, Node 2-EMAIL RAC database runs on Node3, Node 1NODE 2NODE 3NODE 4Generic
Example –Administrator Managed Database•Cluster with 4 nodes-PRODDB RAC Database runs on all nodes-EMAIL RAC database runs on all -NODE 1NODE 2NODE 3NODE 4Generic
Example –Mixed Environment•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL Oracle RAC database runs on Node 4, Node 5, Node 4NODE 5NODE
Cluster Reconfiguration•If a server leaves the cluster Oracle Clusterwaremay move servers from one server pool to another only if a server pool falls below its min. •It chooses the server to move from -A server pool that is less important -A server from a server pool of the same importance which has more servers than its min•Oracle Clusterwarewill only move servers if you have non-default values for min, importance
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 0, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
Example –Server Allocation to Server Pool•Cluster with 6 nodes-PRODDB server pool for prod database (min 1, max 3, Imp 4)-EMAIL server pool for email db (min 0, max 3, Imp 2)
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术整体架构和Smart ScanASMEHCC和存储索引SCAN (Single Client Access Name)Service 和Server PoolDBRM 和IORM•Exadata数据库整合与资源管理•Exadata配置和系统指标
混合型数据库系统资源需求-管理混合事务的资源负载-控制系统性能Database resource managerOLTPOLTP userMore resourcesDSS Less resourcesOracleDatabaseDSSuser
资源计划DAY_PLANResource ConsumerAllocation MethodsGroupCPU_P1CPU_P2CPU_P3OLTP_GROUP100%40%0%OTHER_GROUPS0%0%100%DSS_GROUP0%60%0%NIGHT_PLANResource ConsumerAllocation MethodsGroupCPU_P1CPU_P2CPU_P3OLTP_GROUP30%0%0%OTHER_GROUPS0%0%100%DSS_GROUP70%100%0%
Oracle Enterprise Manager:资源管理器1234567CPU_MTHvalues
Maximum Utilization LimitIn 11g Release 2, the “max_utilization_limit” directive limits the CPU consumption of an applicationDB Consolidation Plan #1DB Consolidation Plan #2CPU Max CPUCPU Max CPUAllocationAllocationAllocationAllocationApp 150%50%App 150%App 220%50%App 220%App 320%50%App 320%App 410%50%App 410%Specify minimum and maximum Specify maximum CPU utilization CPU utilization limits limits only© 2010 Oracle Corporation
Max_utilization_limit25% –50% -75% Proof PointsGraph Explanation:Test results indicating the max_utilization_limit directive set at no limit, 25%, 50%, 75%Workload Description: A mix of OLTP and DSS queries and DMLs from the Oracle Financials application
设置活动会话池
配置资源组切换
设置网络空闲时间
Resource Consumer Group Mapping
Activating a Resource Planfor an Instance
I/O Scheduling, the Traditional WayWith traditional storage, disks service I/Os in FIFO are reordered only to improve disk cannot influence their behavior!ReportReportReportReportA burst of Report I/Os will be queued ahead of OLTP I/ serviced ahead of OLTP I/Os!
I/O Scheduling, the Exadata WayI/O Resource Manager controls order that I/Os are issued to issues enough I/Os to keep disks busy and are queued per database, as necessary, within Storage CellResource PlanI/O Resource Manager ReportReportReportReportUses Resource Plan to determine the order of I/O requestsPrevents a database from flooding the disk© 2010 Oracle Corporation
I/O Resource ManagementTraditionalI/O Storage ServerFIFO Disk QueueRequestsRDBMSHLHLLLYou cannot influence theHigh-priorityLow-priorityI/O scheduler based on prioritization schemeI/O RHHequestsRDBMSLHHHLLLL
I/O Resource Management PlansI/OResourceManagementInsideAcrossonemultipledatabasedatabasesIntradatabaseInterdatabaseCategoryResourceResourceResourcePlanPlanPlanIORM Plan
I/O Resource Management Plans ExampleDatabase ADatabase B(Single Inst)(RAC)IntradatabasePlan AIntradatabasePlan B(DBMS_RESOURCE_MANAGER)(DBMS_RESOURCE_MANAGER)Consumer group 1: 15%Consumer group 5: 22%Consumer group 2: 10%Consumer group 6: 18%Consumer group 3: 35%Consumer group 7: 15%Consumer group 4: 40%Consumer group 8: 45%Controlled I/OdistributionExadata Storage ServerDisk…DiskDB A PlanDB B PlanInterdatabasePlanIORM PlanCategory Plan(CellCLI)(CellCLI)Database A: 70%INTERACTIVE : 60%Database B: 30%BATCH : 40%
I/O Resource Management Plans ExampleDatabase ADatabase BDatabase ADatabase BIORMallocationIntradatabaseInterdatabaseCategories40%60%BATCHINTERACTIVEAllUser I/Os(100%)45%30%15%40%70%35%18%30%22%10%70%15%
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术•Exadata数据库整合与资源管理•Exadata配置和系统指标
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
Scenarios for Server ConsolidationScenario 1-My server has enough CPU to run 4 medium-size production databases-These databases have strict performance requirements-Surges in one database‟s CPU utilization cannot be tolerated by the othersNeed a way to dedicate CPU resources to each databaseScenario 2-My quarter-rack Exadata is shared by multiple development databases-I want to limit the CPU utilization of each database to control the load on the server-I want each database to be able to use as much CPU as possible when the server is not fully utilizedNeed a way to limit the CPU utilization of each database© 2010 Oracle Corporation
Instance Caging•Instance Caging is an Oracle feature for “caging” or limiting the amount of CPU that a database instance can use at any time•Important tool for server consolidation •Available in •Just 2 steps:1. Set “cpu_count” parameter-Maximum number of CPUs the instance can use at any time2. Set “resource_manager_plan” parameter-Enables CPU Resource . out-of-box plan “DEFAULT_PLAN”
CPU Usage WithoutInstance CagingWait for CPU on O/S run queueOracle processes from one Database Instance try to use all CPUsRunning Processes© 2010 Oracle Corporation
CPU Usage WithInstance CagingWait for CPU on Resource Manager run queuesInstance Caging limits the number of Oracle processes running at any moment in timeRunning Processes© 2010 Oracle Corporation
Instance Caging: Under the CoversIf cpu_countis set to 4 on a 16 CPU server-All foreground processes make progress-But only 4 foregrounds are running at any time-Fine-grained scheduling!-No CPU affinity! -All CPUs may be used-CPU utilization averaged across all CPUs ≤ 25%© 2010 Oracle Corporation
Partitioning ApproachCPU Allocations32•Provides maximum isolation28•For performance-critical databases24•If one database is idle, 20its CPU allocation is Number unused16InstanceD: 2 CPUsof CPUs on ServerInstanceC: 2 CPUs12Instance B: 4 CPUs84Instance A: 8 CPUs0
Over-Provisioning ApproachCPU Allocations32•For non-critical databases that are 28typically well-behaved•Contention for CPU if 24databases are Instance D: 4 CPUs20sufficiently loadedInstance C: 4 CPUsNumber -Not enough contention to 16of CPUs on destabilize OS or Serverdatabase Instance B: 8 CPUsinstances12•Best approach if goal 8is fully utilize CPUs4Instance A: 8 CPUs0
Instance Caging Results•4 CPU server•Workload is a mix of OLTP transactions, parallel queries, and DMLs from Oracle Financials
Best Practices for Instance Caging•Cage size, cpu_count, is a dynamic parameter-Changes take place immediately!-Changes to cpu_countalso affects other settings, such as parallel execution•Monitor Instance Caging throttling-AWR reports: “resmgr:cpuquantum” wait event-Indicates that this instance would benefit from larger cage size
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
Scenarios for Server ConsolidationScenario 1-My server has enough CPU to run 4 medium-size production databases-These databases have strict performance requirements-Surges in one database‟s CPU utilization cannot be tolerated by the othersUse Instance Caging to dedicate CPU resources to each the partitioning approach!Scenario 2-My quarter-rack Exadata is shared by multiple development databases-I want to limit the CPU utilization of each database to control the load on the server-I want each database to be able to use as much CPU as possible when the server is not fully utilizedUse Instance Caging to limit the CPU utilization of each the over-provisioning approach!© 2010 Oracle Corporation
Exadata I/O Resource ManagerHow to configure I/O Resource Manager?Customer Sales Finance Service WWarehouseWarehousearehouseAllocation: 30%Allocation: 60%Allocation: 10%Exadata StorageConfigure an IORM Resource Plan, setting allocations for each database Resource allocation is equivalent to disk utilization
Exadata I/O Resource ManagerWhat happens if the Customer Service database is temporarily idle? Customer Sales Finance Service WWarehouseWarehousearehouseAllocation: 30%Allocation: 60%Actual: 75%Allocation: 10%Actual: 0%Actual: 25%Exadata StorageResource plan specifies how unused bandwidth is reallocatedGoal is to keep disks fully utilized
Exadata I/O Resource ManagerWhat happens if the Finance database should not use more than 20% of disk bandwidth?Customer Sales Finance Service WWarehouseWarehousearehouseAllocation: 30%Allocation: 60%Actual: 80%Allocation: 10%Actual: 0%Limit: 20%Actual: 20%Exadata StorageResource plan can specify hard utilization limits per database (new in !)Useful for hosted environments and providing consistent performance
Exadata I/O Resource ManagerExadata I/O Resource Manager manages disk I/O at two levels•Between databases •Between Consumer Groups within a . OLTP vsreports, tactical queries vsanalytics-Configure using the Database Resource Plan -Same plan used to manage CPU-Specify resource allocations per Consumer Group-Specify disk utilization limits per Consumer Group-Use IORM metrics to track -I/O load per Consumer Group (IOPS, MBPS, disk utilization %)-I/O throttling per Consumer Group-Specify optimization objective-Auto –automatically determine, based on actual traffic and resource plan-High throughput -optimize for high throughput-Balanced -balance between throughput and latency-Low Latency -optimize for low latency
Exadata I/O Resource Manager1. Pick a databaseSales Database2. Pick a Consumer GroupOLTP Queue3. Issue the head I/O requestOOSalesResource DatabasePlansRRRReports QueueI/O FResource Rinance DatabaseOTOManager Tactical Queries QueueTOutstanding I/O FinanceRequestsDatabaseBBBBExadata Storage CellBatch Queries Queue
I/O Utilization Limit Results100%90%80%70%60%No I/O LimitDisk Utili50%75% I/O Limitzation40%50% I/O Limit30%25% I/O Limit20%10%0%TimeQueries from TPC-H benchmark suiteDisk utilization measured via iostat© 2010 Oracle Corporation
Scenarios for Storage ConsolidationScenario 1-My Exadata storage cells are shared by an OLTP database and data warehouse-The data warehouse has a throughput-intensive I/O load that causes huge increases in buffer cache and log write latencies –up to 10x!-My OLTP performance is critical –it requires low I/O latenciesNeed a way to optimize disks for low latency and to prioritize I/Os from the OLTP databaseScenario 2-My Exadata storage cells are being shared by multiple data warehouses-Users on one data warehouse issue giant batch jobs, utilizing all of the disk bandwidth-I want to limit the disk bandwidth for each data warehouses for predictable, consistent performanceNeed a way to specify a disk utilization limit for each data warehouse
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
Managing Contending Workloads100%20%CPUWith Resource 90%Usage80%90%80%Manager, youcontrol how CPU resources should be allocated10%OLTP Reports OLTP + ReportsOLTP + ReportsonlyonlyOLResource MaRneager EnabledTP ports PrioritizedPrioritized
Configuring Resource sessions with similar performance objectives into Consumer resources to consumer groups usingResource Resource Plan
Step 1: Create Consumer Groups-Create Consumer Groups for each type of workload, .-OLTP consumer group-Reports consumer group-Low-Priority consumer group-Create rules to dynamically map sessions to consumer groupsMapping RulesConsumer Groups OLTPservice = „Customer_Service‟client program = „Siebel Call Center‟Oracle user = „Oscar‟Reportsmodule = „AdHoc%‟query has been running > 1 hourestimated execution time of query > 1 hourLow-Priority
Attributes for Consumer Group MappingsSession Attributes:Query Attributes:•Oracle user name•Estimated execution time•Client O/S user name•CPU time used so far•Client program name•I/O requests issued so far•Client machine name•Amount of I/O issued so far•Client id•Function being performed•Service name•“backup” (RMAN)•Module name•“dataload” (datapump)•Action name
Step 2: Create Resource PlansRatio PlanPriority PlanPriority 1: OLTPReports Priority 2: Reports30%Priority 3: Low-PriorityOLTP60%Low-Priority 10%Hybrid Plan with Hard LimitsLevel 1 Level 2 MaxOLTP 90% Reports 60% 75%Low-Priority 40% 50%
Step 3: Enable Resource Plan•Enable manually-Set resource_manager_planparameter•Enable automatically-Configure a job scheduler window to use a resource plan-Weekdays 8 am –6 pm: DAYTIME_PLAN-Weeknights and weekends: OFFHOURS_PLAN
CPU Usage with Resource ManagerSessions wait on “resmgr:cpuquantum” Oracleevent-Internal CPU QueueOLTPReportsResource Plan: Sessions CPU ResourceOLTP 75% ManagerReports 25%scheduled every (OLTPpicked3 out of 4 times)100 ms
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
Parallel Statement QueuingNPaor malloerle s pearvrearllse la sre ravvearisl aabvlaei l–able –Parallel statements aruren nimowm eqduieauteldyAvailable Servers: 3610248ParallelStatementQueue Parallel CoordinatorStatement QueueRunning Parallel Statements
Parallel Statement Queuing, Part IIDBAs want to control the order that parallel queries are dequeued-Prioritize tactical queries over batch and ad-hoc queries-Impose a user-defined policy for ordering queued parallel statementsNew in :Separate queues per Consumer GroupResource Plan specifies which queue‟s parallel statements are issued next
Parallel Statement QueuingSince there SaWirnehc neeno T pmaacortriacelal eTl la issc etPircvraieolr rpsita ybr ea1cl,l oeitlms s epta aatrveaamlillealnb tlse,, wthee p riecsko urce either Batchs potalrat Aenm dis-eH nuotssce .d a r Bteoa a tsclwehl aeisyc sts aes leqeluceteceutdeed .7 0fTi%rhs eto. hf ethaed tpimarea laleftle r from that queue is SSeerrvveerrss:: 10664Tactical QueueParallelStatementQueue Batch QueueCoordinatorResource Plan:Ad-Hoc QueuePriority 1: TacticalPriority 2, 70%: BatchRunning QueriesPriority 2, 30%: Ad-Hoc
Parallel Statement QueuingFlood of batch queries can use up all parallel servers-Tactical queries are forced to queueNew in :Limit the percentage of parallel servers a Consumer Group can use-For example, parallel queries from the Batch Consumer Group can only use 50% of the parallel servers-Reserves parallel servers for Tactical queries
Parallel Statement QueuingSince parallel servers are available, Tactical queries can be run immediatelyAvailable Servers: 436824Resource Plan:64Batch limited to 50% of the parallel servers!Tactical QueueParallel StatementQueue Batch QueueCoordinatorResource Plan:Ad-Hoc QueuePriority 1: TacticalRunning QueriesPriority 2, 70%: BatchPriority 2, 30%: Ad-Hoc
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
失控的作业Runaway Queries•Runaway queries are caused by-Missing indices-Unexpected inputs-Bad execution plans•Severely impact performance of well-behaved queries•Very hard to completely eradicate!Query 1Query 2Query 3Query 4Query Time
Managing Runaway Queries with Resource ManagerDefine runaway Manage runaway queries:queries:Estimated execution timeSwitch to another consumer Actual execution timegroupActual number of I/Os ()-Lower-priority consumer groupActual bytes of I/O -Consumer group with CPU ()utilization limit ()Abort callKill session
Managing Runaway Queries with Resource ManagerFor Tactical consumer group, runaway means:Switch to “Low 30+ secPriority” consumer group!For Reports consumer group, runaway means:Abort query!32GB+ I/OsFor Ad-Hoc consumer Don’t execute!group, runaway means:24+ hour estimated execution time
数据库整合和资源管理•Server Consolidation-Managing CPU-Managing IO•Database Consolidation-Managing IO-Managing CPU-Managing Parallel Execution-Managing Runaway Queries-Putting It Together
Scenarios for Database ConsolidationScenario 1-I have a mixed-workload database –OLTP and batch jobs-OLTP has strict performance requirements-OLTP should have no long-running operations. Any such operations should be identified and aborted.-Batch jobs should capitalize on all unused resourcesUse CPU and I/O Resource Manager to prioritize the OLTP workload over CPU and I/O thresholds to identify and abort runaway queries from the OLTP 2-My data warehouse has 3 types of queries –tactical, normal, and ad-hoc-Tactical queries have strict performance requirements-Normal queries are less critical; ad-hoc queries are the least criticalUse Resource Manager to allocate resources for these 3 Parallel Statement Queuing to ensure that tactical queries run immediately.
Resource Manager –End to EndTest scenario:•2 workloads in a data warehouse-Tactical queries (short TPC-H queries)-Batch jobs (long TPC-H queries)•Goal: -Run Batch jobs with Tactical queries -Don‟t impact response time of Tactical queries!
Resource Manager, End to End© 2010 Oracle Corporation
提纲•Exadata架构设计原理•Exadata硬件体系架构•Exadata软件体系技术•Exadata数据库整合与资源管理•Exadata系统指标和配置
Exadata主要指标ComponentQuarter HalfRackRaFull Rack2-8 Full RacksckDatabase Servers2 (24 cores)4 (48 cores)8 (96 cores)16-64Storage Servers3 (36 disks)7 (84 disks)14 (168 disks28-112Total Disk Capacity (SAS 21 TB50 TB100 TB200 –800 TBHP)Total Disk Capacity (SAS 72 TB168 TB336 TB672 –2,688 TBHC)Flash Cache TBUser Data (SAS HP) –360 TBUser Data (SAS HC) TB300 –1200 TBI/O Throughput (disks) GB/sec25GB/sec50 -200 GB/secI/O Throughput (flash)16GB/ -600 GB/secI/O per Second (IOPS)375,000750,0001,500,0003M –12MRacks1112-8
Exadata X2-2最新指标
Exadata X2-8最新指标
X2-2 Full Rack EnvironmentalPowerMax – kW ( kVA)Typical – kW ( kVA) CoolingMax –47,800 BTU/hr (50,400 kJ/hr)Typical –33,400 BTU/hr (35,300 kJ/hr)Airflow: front-to-backMax ~2,200 CFM(subject to actual data center Typical ~1,560 CFMenvironment)Physical ” (H) x ”(W) x ”(D)1998mm (H) x 600mm (W) x 1200mm (D)Weight2131 lbs ( kg)Operating TemperatureOperating temperature/humidity: 5 ºC to 32 ºC (41 ºF to ºF), 10% to 90% relative humidity, non-condensing Altitude Operating: Up to 3,048 m, max. ambient temperature is de-rated by 1°C per 300 m above 900 mIP addresses22 (InfiniBand Network)70 (Ethernet Network -assuming single cluster)Network DropsMinimum 12 network dropsExternal ConnectivityUp to 24 x 1GbE Ethernet PortsUp to 16 x 10GbE Ethernet SFP+ PortsAt least 12 InfiniBandPorts
X2-2 Half Rack EnvironmentalPowerMax – kW ( kVA)Typical – kW ( kVA) CoolingMax –24,600 BTU/hr (25,950 kJ/hr)Typical –17,400 BTU/hr (18,400 kJ/hr)Airflow: front-to-backMax ~1,130 CFM(subject to actual data center Typical ~840 CFMenvironment)Physical ” (H) x ”(W) x ”(D)1998mm (H) x 600mm (W) x 1200mm (D)Weight1329 lbs ( kg)Operating TemperatureOperating temperature/humidity: 5 ºC to 32 ºC (41 ºF to ºF), 10% to 90% relative humidity, non-condensing Altitude Operating: Up to 3,048 m, max. ambient temperature is de-rated by 1°C per 300 m above 900 mIP addresses11 (InfiniBand Network)40 (Ethernet Network –assuming single cluster)Network DropsMinimum 8 Network DropsExternal ConnectivityUp to 12 x 1GbE Ethernet PortsUp to 8 x 10GbE Ethernet SFP+ PortsAt least 34 InfiniBandPorts
X2-2 Quarter Rack EnvironmentalPowerMax – kW ( kVA)Typical – kW ( kVA) CoolingMax –12,300 BTU/hr (13,000 kJ/hr)Typical –9,200 BTU/hr (9,700 kJ/hr)Airflow: front-to-backMax ~550 CFM(subject to actual data center Typical ~410 CFMenvironment)Physical ” (H) x ”(W) x ”(D)1998mm (H) x 600mm (W) x 1200mm (D)Weight902 lbs ( kg)Operating TemperatureOperating temperature/humidity: 5 ºC to 32 ºC (41 ºF to ºF), 10% to 90% relative humidity, non-condensing Altitude Operating: Up to 3,048 m, max. ambient temperature is de-rated by 1°C per 300 m above 900 mIP addresses5 (InfiniBand Network)23 (Ethernet Network –assuming single cluster)Network DropsMinimum 6 Network DropsExternal ConnectivityUp to 6 x 1GbE Ethernet PortsUp to 4 x 10GbE Ethernet SFP+ PortsAt least 48 InfiniBandPorts
X2-8 Full Rack EnvironmentalPowerMax – kW ( kVA)Typical – kW ( kVA) CoolingMax –58,050 BTU/hr (61,200 kJ/hr)Typical –40,630 BTU/hr (42,840 kJ/hr)Airflow: front-to-backMax ~2,690 CFM(subject to actual data center Typical ~1,880 CFMenvironment)Physical ” (H) x ”(W) x ”(D)1998mm (H) x 600mm (W) x 1200mm (D)Weight2080 lbs ( kg)Operating TemperatureOperating temperature/humidity: 5 ºC to 32 ºC (41 ºF to ºF), 10% to 90% relative humidity, non-condensing Altitude Operating: Up to 3,048 m, max. ambient temperature is de-rated by 1°C per 300 m above 900 mIP addresses22 (InfiniBand Network)45 (Ethernet Network -assuming single cluster)Network DropsMinimum 5 network dropsExternal ConnectivityUp to 16 x 1GbE Ethernet PortsUp to 16 x 10GbE Ethernet SFP+ PortsAt least 12 InfiniBandPorts
X2-2 Leaf Switch Topology
Full Rack Spine and Leaf Topology
Scale Performance and Capacity•Scalable•Redundant and Fault Tolerant-Scale to eight racks by -Failure of any component adding cablesis tolerated.-Scale beyond eight racks by -Data is mirrored across adding InfiniBandswitchesstorage servers.-Scale to hundreds of storage servers to support multi-petabytedatabases
X2-2 Full Rack –Server to Leaf SwitchPort 1Port 2DB Servers (1,2,3,4)DB Servers (5,6,7,8)Exadata Servers (1,2,..,7)Exadata Servers (8,9,..,14)1111Leaf swL1L2itchLeaf switch1111DB Servers (5,6,7,8)DB Servers (1,2,3,4)Exadata Servers (8,9,..14)Exadata Servers (1,2,..,7)Port 2Port 2•InfiniBandnetwork with Redundancy•Servers connect to the two leaf switches•Active & Passive ports balanced across switches•Full Bandwidth even if switch fails•Connections pre-wired at factory
X2-2 Half Rack –Server to Leaf SwitchPort 1DB Servers (1,2,3,4)Exadata Servers (1,2,3,4,5,6,7)11L1Leaf switch7L2Leaf switch11DB Servers (1,2,3,4)Exadata Servers (1,2,3,4,5,6,7)Port 2•Connect Port1 from DB & Exadata Servers to Switch “L1”•Connect Port2 from DB & Exadata Servers to Switch “L2”•Interconnect two switches –7 links•Pre-wired at Factory
X2-2 Quarter Rack –Server to Leaf SwitchPort 1DB Servers (1,2)Exadata Servers (1,2,3)11L1Leaf switch7L2Leaf switch11DB Servers (1,2)Exadata Servers (1,2,3)Port 2•Connect Port1 from DB & Exadata Servers to Switch “L1”•Connect Port2 from DB & Exadata Servers to Switch “L2”•Interconnect two switches –7 links•Pre-wired at Factory
Spine and Leaf InfiniBand SwitchDB and Exadata Spine switchServers Ports11Leaf switchLeaf switchL17L2•Use 3rdswitch (S) as “spine” switch for expansion to multiple racks•Available in X2-2 Full Rack, X2-2 Half Rack, and X2-8 Full Rack•Connect each leaf switch to spine switch (1 links wide)•Interconnect “leaf” switches with each other (7 links wide)•Enough bandwidth even if switch failure•Pre-wired at factory
Configuration Worksheet Example
Configuring ASM Disk Groupswith Configuration WorksheetSystem Area (Disks 0-1)orSYSTEMDG (Disks 2-11Exadata Cell 1)Exadata Cell (29G)Exa2data Cell nDATA(220G)RECO(ALL ~ 309G)
Generating the Configuration Files12
The Result After Installation and Configurationeidm01 -vip -vip -scan Server 1DB Server 2DB Instance: dbm1DB Instance: dbm2ASM Instance: +ASM1ASM Instance: +ASM2Cluster: eidm Cluster: Storage totalDATADATADATA7,920GB totalRECORECORECO11,112GB totalExadata Server 1Exadata Server 2Exadata Server
Exadata:Oracle Database Machine适用于云计算的数据库和存储平台•灵活的容量•网格体系结构用于数据库和存储服务器的向外扩展•智能扫描可将查询处理卸载到存储层•智能闪存缓存存储实现实时随机I/O•数据压缩针对OLTP、数据仓储和存档数据进行了优化•无限带宽联网支持大量数据传输•资源共享•ASM(自动存储管理)让所有数据库共享Exadata存储•RAC(真正应用集群)让所有节点共享大型数据库•IORM(I/O 资源管理)根据数据库和应用程序的优先级分配I/O 带宽•实例囚笼让一个节点内的多个数据库共享CPU•Oracle 数据库的全部强大功能•真正应用集群、备份/恢复、复制、安全性、分区、大型对象、Enterprise Manager……
188