Implementing Oracle Real Application Clusters
Using Network Block Device Technology

A hardware implementation of Oracle 10g RAC is described in a separate document.

Purpose

This document describes how to implement cost-efficient Oracle Real Application Clusters using only commodity PC hardware and the Linux operating system. In a Real Application Clusters environment, each node has to access all the data stored in the database. While the traditional approach requires expensive storage subsystems (such as network disk arrays) to provide this functionality, this solution allows you to build a scalable and highly available database system using only common Intel PCs connected by an Ethernet network.

In this solution, a standard shared disk subsystem is replaced by a native Linux technology -- Network Block Device (NBD), which maps remote files to local block devices (e.g. /dev/nb0) over a TCP/IP network. As a result, a single computer (not necessarily a Linux machine) serves as the data storage for all cluster nodes (Linux machines), instead of an expensive disk array.


Configuration example:

	Figure: Typical configuration
	Figure: Simple NBD configuration


1. NBD server for Sun Solaris

Although Network Block Device is a Linux technology, the NBD server is a user-space daemon that can also run on other systems. Since installation in a Linux environment is covered by the NBD project documentation (and does not depend on Oracle requirements), this document focuses on the installation of the Solaris version. Moreover, it may prove useful to store the data on a server other than a PC.

1.1 Installing NBD server

For this purpose it is necessary to install NBD version 2.0. Note that there is also an Enhanced Network Block Device project that provides additional functionality, but there were some problems with new kernels and Solaris compilation. The standard Linux NBD implementation is sufficient. First, download the source from http://nbd.sourceforge.net/ and extract it from the gzipped tar archive:
	gunzip nbd-2.0.tar.gz
	tar xfv nbd-2.0.tar
The source distribution does not contain the necessary kernel header file, which is usually included in a Linux distribution -- make a new directory for the Linux kernel NBD driver header
	mkdir nbd/linux
and copy nbd.h from your Linux kernel source into this new nbd/linux directory, as shown below.
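For example, assuming the kernel source tree is installed in /usr/src/linux (the exact path depends on your distribution):
	cp /usr/src/linux/include/linux/nbd.h nbd/linux/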


To compile the NBD server you will need GCC 2 (version 2.95.2 worked for me). You can compile it by typing something like
	cd nbd
	./configure
	gcc -O2 -I. -lsocket -lnsl -o nbd-server nbd-server.c
Don't forget to link the socket and nsl libraries (on the Solaris version only!).

If the compilation finished without errors, you can test the result simply by executing the nbd-server binary. Execution without parameters prints usage help:
	This is nbd-server version 2.0
	Usage: port file_to_export [size][kKmM] [-r] [-m] [-c] [-a timeout_sec]
		-r read only
		-m multiple file
		-c copy on write
		-a maximum idle seconds, terminates when idle time exceeded
		if port is set to 0, stdin is used (for running from inetd)
		if file_to_export contains '%s', it is substituted with IP
			address of machine trying to connect

The nbd-server binary is the only file needed on the server side.
Copy it into your local system sbin directory, e.g.:
	cp nbd-server /usr/local/sbin/
Finally, it is possible to create an /etc/nbd_server.allow file that contains the list of IP addresses allowed to connect to the NBD server.
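For example, to allow only the two cluster nodes (using the data network addresses configured in section 2; one address per line is an assumption about the file format):
	cat > /etc/nbd_server.allow <<EOF
	10.1.1.1
	10.1.1.2
	EOF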

1.2 Running NBD server

It is recommended to run the NBD server as an ordinary user, not as root (the NBD server listens on a non-privileged TCP port). Before starting the server, create the necessary datafiles for the basic database tablespaces. You can use the mkfile or dd command to create new empty files (existing datafiles cannot be reused). It is better to create slightly larger datafiles than the documentation recommends. Also, if you want to install additional database components (e.g. Oracle interMedia), create a larger SYSTEM tablespace (more than 300MB).

Syntax:
	mkfile <size>m <path_to_data_disk>/<tbs_name>_raw	
Recommended sizes and names:
	tablespace/file          size    datafile name
	SYSTEM                   300M    system_raw
	USERS                     30M    users_raw
	TEMP                     110M    temp_raw
	UNDOTBS (per instance)   210M    undo_<n>_raw
	INDX                      30M    indx_raw
	TOOLS                     20M    tools_raw
	DRSYS                     20M    drsys_1_raw
	controlfile1             120M    controlfile_1_raw
	controlfile2             120M    controlfile_2_raw
	redo logs (2 per inst)   130M    redo<n>_<x>_raw
	spfile                     5M    spfile_raw
	srvmconfig               110M    srvctl_raw
	node monitor              10M    nm_raw

Example:
	mkfile 300m /orac/system_raw
	mkfile 30m  /orac/users_raw
	mkfile 110m /orac/temp_raw
	mkfile 210m /orac/undo_1_raw
	mkfile 210m /orac/undo_2_raw
	mkfile 30m  /orac/indx_raw
	mkfile 30m  /orac/tools_raw
	mkfile 120m /orac/controlfile_1_raw
	mkfile 120m /orac/controlfile_2_raw
	mkfile 130m /orac/redo1_1_raw
	mkfile 130m /orac/redo1_2_raw
	mkfile 130m /orac/redo2_1_raw
	mkfile 130m /orac/redo2_2_raw
	mkfile 5m   /orac/spfile_raw
	mkfile 110m /orac/srvctl_raw
	mkfile 10m  /orac/nm_raw
	mkfile 20m  /orac/drsys_1_raw
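If the mkfile command is not available, an equivalent empty file can be created with dd by reading from /dev/zero. A sketch for the SYSTEM datafile (bs=1024k keeps the syntax portable between Solaris and GNU dd):
	dd if=/dev/zero of=/orac/system_raw bs=1024k count=300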

It is possible, and recommended, to configure the /orac volume as a redundant device (using e.g. Solaris Solstice DiskSuite). You may also want to place different datafiles on different physical disks (controllers).

To start the NBD servers you must choose one TCP port for each datafile.

Syntax:
	nbd-server <port> <filename>

Example:

	/usr/local/sbin/nbd-server 4101  /orac/system_raw &
	/usr/local/sbin/nbd-server 4102  /orac/users_raw &
	/usr/local/sbin/nbd-server 4103  /orac/temp_raw &
	/usr/local/sbin/nbd-server 4104  /orac/undo_1_raw &
	/usr/local/sbin/nbd-server 4105  /orac/undo_2_raw &
	/usr/local/sbin/nbd-server 4106  /orac/indx_raw &
	/usr/local/sbin/nbd-server 4107  /orac/tools_raw &
	/usr/local/sbin/nbd-server 4108  /orac/controlfile_1_raw &
	/usr/local/sbin/nbd-server 4109  /orac/controlfile_2_raw &
	/usr/local/sbin/nbd-server 4110  /orac/redo1_1_raw &
	/usr/local/sbin/nbd-server 4111  /orac/redo1_2_raw &
	/usr/local/sbin/nbd-server 4112  /orac/redo2_1_raw &
	/usr/local/sbin/nbd-server 4113  /orac/redo2_2_raw &
	/usr/local/sbin/nbd-server 4114  /orac/spfile_raw &
	/usr/local/sbin/nbd-server 4115  /orac/srvctl_raw &
	/usr/local/sbin/nbd-server 4116  /orac/nm_raw &
	/usr/local/sbin/nbd-server 4117  /orac/drsys_1_raw &
Now the NBD servers should be running and waiting for client connections.
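You can check that the servers are really listening, e.g. with netstat (testing the first port here; the output format differs between systems):
	netstat -an | grep 4101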

2. Linux NBD Client

Using NBD technology, all communication is performed over Ethernet, so you need three different Ethernet connections -- one for connecting the cluster to your network (and application servers), one for inter-node communication, and one for the connection between a database cluster node (NBD client) and the datafile server (NBD server). It is possible to use a single Ethernet interface, but three separate interfaces can improve performance (as described in the Performance section later). The cluster interface configuration is written simply in the /etc/hosts file on each node.

Example:
	
	# External (real and public) interface (eth0) 
	147.251.48.197 rac1 
	# Second node connected via public interface
	147.251.48.198 rac2

	# Cluster interconnect (private) interface (eth1)
	10.0.0.1  rac1-int
	# Second node connected via private interface	
	10.0.0.2  rac2-int

	# Shared data subsystem (NBD) interface (eth2)
	10.1.1.1      rac1-data
	# Second node will use the 10.1.1.2 address, but it is unreachable
	# via this interface.
	# NBD server address
	10.1.1.100    rac-data
Configure your local network interfaces (using the ifconfig command) according to these /etc/hosts settings.
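For example, on node rac1 (the netmasks shown are an assumption; use the values appropriate for your network):
	ifconfig eth0 147.251.48.197 netmask 255.255.255.0 up
	ifconfig eth1 10.0.0.1 netmask 255.255.255.0 up
	ifconfig eth2 10.1.1.1 netmask 255.255.255.0 up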

Compilation and installation of the NBD client for Linux is described in the NBD documentation. Please follow the instructions listed there.

Like the NBD server, each NBD client process is bound to one NBD server port and one local block device. Unlike the server, however, the NBD client must be run as root (because of the kernel part of NBD). Before starting an NBD client you have to load the Linux kernel NBD module by typing:
	modprobe nbd
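You can verify that the module is loaded:
	lsmod | grep nbd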
To start an NBD client you must specify the data server, its port and the assigned Linux block device.

Syntax:
	/usr/sbin/nbd-client <data server> <port> /dev/nb<n>
Example:
	/usr/sbin/nbd-client rac-data 4101 /dev/nb1
	/usr/sbin/nbd-client rac-data 4102 /dev/nb2
	/usr/sbin/nbd-client rac-data 4103 /dev/nb3
	/usr/sbin/nbd-client rac-data 4104 /dev/nb4
	/usr/sbin/nbd-client rac-data 4105 /dev/nb5
	/usr/sbin/nbd-client rac-data 4106 /dev/nb6
	/usr/sbin/nbd-client rac-data 4107 /dev/nb7
	/usr/sbin/nbd-client rac-data 4108 /dev/nb8
	/usr/sbin/nbd-client rac-data 4109 /dev/nb9
	/usr/sbin/nbd-client rac-data 4110 /dev/nb10
	/usr/sbin/nbd-client rac-data 4111 /dev/nb11
	/usr/sbin/nbd-client rac-data 4112 /dev/nb12
	/usr/sbin/nbd-client rac-data 4113 /dev/nb13
	/usr/sbin/nbd-client rac-data 4114 /dev/nb14
	/usr/sbin/nbd-client rac-data 4115 /dev/nb15
	/usr/sbin/nbd-client rac-data 4116 /dev/nb16
	/usr/sbin/nbd-client rac-data 4117 /dev/nb17
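Since the port numbers and device numbers line up, the same clients can also be started in a simple shell loop (a sketch equivalent to the list above):
	i=1
	while [ $i -le 17 ]; do
		/usr/sbin/nbd-client rac-data $((4100 + i)) /dev/nb$i
		i=$((i + 1))
	done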

Now the block devices should be configured and you should be able to access the remote data. Furthermore, Oracle Real Application Clusters needs raw access to the shared disk subsystem, so the block devices must be bound to raw devices. This can be done with the standard raw command.

Syntax:
	/usr/bin/raw /dev/raw/raw<n> /dev/nb<n>
Example:
	/usr/bin/raw /dev/raw/raw1  /dev/nb1
	/usr/bin/raw /dev/raw/raw2  /dev/nb2
	/usr/bin/raw /dev/raw/raw3  /dev/nb3
	/usr/bin/raw /dev/raw/raw4  /dev/nb4
	/usr/bin/raw /dev/raw/raw5  /dev/nb5
	/usr/bin/raw /dev/raw/raw6  /dev/nb6
	/usr/bin/raw /dev/raw/raw7  /dev/nb7
	/usr/bin/raw /dev/raw/raw8  /dev/nb8
	/usr/bin/raw /dev/raw/raw9  /dev/nb9
	/usr/bin/raw /dev/raw/raw10 /dev/nb10
	/usr/bin/raw /dev/raw/raw11 /dev/nb11
	/usr/bin/raw /dev/raw/raw12 /dev/nb12
	/usr/bin/raw /dev/raw/raw13 /dev/nb13
	/usr/bin/raw /dev/raw/raw14 /dev/nb14
	/usr/bin/raw /dev/raw/raw15 /dev/nb15
	/usr/bin/raw /dev/raw/raw16 /dev/nb16
	/usr/bin/raw /dev/raw/raw17 /dev/nb17
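You can verify the bindings with the query option of the raw command, which prints the major and minor device numbers behind each raw device:
	/usr/bin/raw -qa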
To access these devices also as the Oracle user, additional permission commands should be run (assuming that oracle is the name of the user who runs the Oracle database server):

Example:
	chmod 600 /dev/nb*
	chmod 600 /dev/raw/raw*

	chown oracle:dba /dev/nb*
	chown oracle:dba /dev/raw/raw*
It is also recommended to create local client aliases (Linux symbolic links) for every raw device.

Syntax:
	ln -s /dev/raw/raw<n> /orac/<database name>/<raw file alias>
Example:
	ln -s /dev/raw/raw1  /orac/system_raw
	ln -s /dev/raw/raw2  /orac/users_raw
	ln -s /dev/raw/raw3  /orac/temp_raw
	ln -s /dev/raw/raw4  /orac/undo_1_raw
	ln -s /dev/raw/raw5  /orac/undo_2_raw
	ln -s /dev/raw/raw6  /orac/indx_raw
	ln -s /dev/raw/raw7  /orac/tools_raw
	ln -s /dev/raw/raw8  /orac/controlfile_1_raw
	ln -s /dev/raw/raw9  /orac/controlfile_2_raw
	ln -s /dev/raw/raw10 /orac/redo1_1_raw
	ln -s /dev/raw/raw11 /orac/redo1_2_raw
	ln -s /dev/raw/raw12 /orac/redo2_1_raw
	ln -s /dev/raw/raw13 /orac/redo2_2_raw
	ln -s /dev/raw/raw14 /orac/spfile_raw
	ln -s /dev/raw/raw15 /orac/srvctl_raw
	ln -s /dev/raw/raw16 /orac/nm_raw
	ln -s /dev/raw/raw17 /orac/drsys_1_raw
Finally, create an ASCII configuration file identifying the raw devices (it is used by the Database Configuration Assistant).

The format of this file is:
	<tablespace or datafile>=<path to raw device>
Example:
	cat > /orac/datafiles.conf <<EOF		
	system=/orac/system_raw
	users=/orac/users_raw
	temp=/orac/temp_raw
	undotbs1=/orac/undo_1_raw
	undotbs2=/orac/undo_2_raw
	indx=/orac/indx_raw
	tools=/orac/tools_raw
	control1=/orac/controlfile_1_raw
	control2=/orac/controlfile_2_raw
	redo1_1=/orac/redo1_1_raw
	redo1_2=/orac/redo1_2_raw
	redo2_1=/orac/redo2_1_raw
	redo2_2=/orac/redo2_2_raw
	spfile=/orac/spfile_raw
	srvconfig_loc=/orac/srvctl_raw
	EOF
Don't forget to set the environment variable DBCA_RAW_CONFIG to this file name so that the Database Configuration Assistant can find this configuration.

Example:
	export DBCA_RAW_CONFIG=/orac/datafiles.conf


Note:

Installation of Oracle RAC also requires the Linux kernel module softdog. For testing purposes, you may want to load it with the parameter soft_noboot set to 1, which prevents the system from rebooting after a cluster software error.

Example:
	modprobe softdog soft_margin=60 soft_noboot=1



Now the NBD part (the shared disk subsystem) is prepared and you can continue with installing the Oracle database. We strongly recommend reading the Oracle Notes listed in References very carefully while continuing with the installation. These documents are licensed and can be found on Oracle MetaLink by Doc. ID or simply by searching for the "linux clusters" keyword.

3. References

4. Author

Miroslav Kripac, Masaryk University Brno

Comments, suggestions and questions are welcome and can be sent to kripac@fi.muni.cz




Creation date: December 10, 2002
Last modified: Friday, 21-Jan-2005 12:26:53 CET