Ceph is an open-source, massively scalable, software-defined storage system which provides object, block and file system storage in a single platform. It runs on commodity hardware, saving you costs and giving you flexibility, and because it’s in the Linux kernel, it’s easy to consume.
Ceph is able to manage:
Object Storage: Ceph provides seamless access to objects using native language bindings or radosgw, a REST interface that’s compatible with applications written for S3 and Swift.
Block Storage: Ceph’s RADOS Block Device (RBD) provides access to block device images that are striped and replicated across the entire storage cluster.
File System: Ceph provides a POSIX-compliant network file system that aims for high performance, large data storage, and maximum compatibility with legacy applications (not yet stable at the time of writing).
Whether you want to provide Ceph Object Storage and/or Ceph Block Device services to Cloud Platforms, deploy a Ceph Filesystem or use Ceph for another purpose, all Ceph Storage Cluster deployments begin with setting up each Ceph Node, your network and the Ceph Storage Cluster. A Ceph Storage Cluster requires at least one Ceph Monitor and at least two Ceph OSD Daemons. The Ceph Metadata Server is essential when running Ceph Filesystem clients.
OSDs: A Ceph OSD Daemon (OSD) stores data, handles data replication, recovery, backfilling, rebalancing, and provides some monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat. A Ceph Storage Cluster requires at least two Ceph OSD Daemons to achieve an active + clean state when the cluster makes two copies of your data (Ceph makes 2 copies by default, but you can adjust it).
Monitors: A Ceph Monitor maintains maps of the cluster state, including the monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map (see the commands after this list for how to inspect them). Ceph maintains a history (called an “epoch”) of each state change in the Ceph Monitors, Ceph OSD Daemons, and PGs.
MDSs: A Ceph Metadata Server (MDS) stores metadata on behalf of the Ceph Filesystem (i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). Ceph Metadata Servers make it feasible for POSIX file system users to execute basic commands like ls, find, etc. without placing an enormous burden on the Ceph Storage Cluster.
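Once a cluster is running, each of these maps can be inspected from any admin node with the standard ceph CLI; a quick sketch:
> ceph mon dump
> ceph osd dump
> ceph pg dump
> ceph osd crush dump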
Ceph stores a client’s data as objects within storage pools. Using the CRUSH algorithm, Ceph calculates which placement group should contain the object, and further calculates which Ceph OSD Daemon should store the placement group. The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
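Once the cluster described below is running, this behaviour can be observed with the rados tool; a sketch (the ‘data’ pool and the object/file names are illustrative):
> echo "test data" > testfile.txt
> rados put test-object-1 testfile.txt --pool=data
> rados -p data ls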
# -*- mode: ruby -*-
# vi: set ft=ruby :

ENV['LANG'] = 'C'

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

# Insert all your VMs with configs
boxes = [
  { :name => :mon1, :role => 'mon' },
  { :name => :mon2, :role => 'mon' },
  { :name => :mon3, :role => 'mon' },
  { :name => :osd1, :role => 'osd', :ip => '192.168.33.31' },
  { :name => :osd2, :role => 'osd', :ip => '192.168.33.32' },
  { :name => :osd3, :role => 'osd', :ip => '192.168.33.33' },
]

$install = <<INSTALL
wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
echo deb http://ceph.com/debian/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
aptitude update
aptitude -y install ceph ceph-deploy openntpd
INSTALL

Vagrant::Config.run do |config|
  # Default box OS
  vm_default = proc do |boxcnf|
    boxcnf.vm.box = "deimosfr/debian-wheezy"
  end

  # For each VM, add a public and private card. Then install Ceph
  boxes.each do |opts|
    vm_default.call(config)
    config.vm.define opts[:name] do |config|
      config.vm.network :bridged, :bridge => "eth0"
      config.vm.host_name = "%s.vm" % opts[:name].to_s
      config.vm.provision "shell", inline: $install

      # Create 8G disk file and add private interface for OSD VMs
      if opts[:role] == 'osd'
        config.vm.network :hostonly, opts[:ip]
        file_to_disk = 'osd-disk_' + opts[:name].to_s + '.vdi'
        config.vm.customize ['createhd', '--filename', file_to_disk, '--size', 8 * 1024]
        config.vm.customize ['storageattach', :id, '--storagectl', 'SATA', '--port', 1, '--device', 0, '--type', 'hdd', '--medium', file_to_disk]
      end
    end
  end
end
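With this Vagrantfile in place, the whole lab can be started with one command (assuming Vagrant and VirtualBox are installed):
> vagrant up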
This will spawn the VMs with the required hardware and install Ceph on each of them. After booting those instances, all the Ceph servers (mon1 to mon3 and osd1 to osd3) will be up and reachable.
To add the two other monitor nodes (mon2 and mon3) to the cluster, you’ll need to edit the configuration on the monitor/admin node. You’ll have to set the mon_host, mon_initial_members and public_network settings in the Ceph configuration:
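As a sketch (the monitor IP addresses below are illustrative; use the addresses actually assigned to your VMs), the [global] section could look like:
[global]
mon initial members = mon1, mon2, mon3
mon host = 192.168.0.11, 192.168.0.12, 192.168.0.13
public network = 192.168.0.0/24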
To get the first admin node, you’ll need to gather the keys on a monitor node. To keep things simple, all ceph-deploy commands should be run from that machine:
> ceph-deploy gatherkeys mon1
[ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy gatherkeys osd1
[ceph_deploy.gatherkeys][DEBUG ] Checking osd1 for /etc/ceph/ceph.client.admin.keyring
[ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.client.admin.keyring key from osd1.
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Checking osd1 for /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-osd.keyring key from osd1.
[ceph_deploy.gatherkeys][DEBUG ] Checking osd1 for /var/lib/ceph/bootstrap-mds/ceph.keyring
[ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-mds.keyring key from osd1.
Then you need to exchange SSH keys to be able to connect to the target machines remotely.
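A minimal sketch (assuming the same user account exists on every node and that the host names resolve as defined in the Vagrantfile):
> ssh-keygen
> ssh-copy-id mon2
> ssh-copy-id mon3
> ssh-copy-id osd1
> ssh-copy-id osd2
> ssh-copy-id osd3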
To deploy a Ceph OSD, we’ll first erase the remote disk and create a GPT partition table on the dedicated disk ‘sdb’:
> ceph-deploy disk zap osd1:sdb
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap osd1:sdb
[ceph_deploy.osd][DEBUG ] zapping /dev/sdb on osd1
[osd1][DEBUG ] connected to host: osd1
[osd1][DEBUG ] detect platform information from remote host
[osd1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: debian 7.2 wheezy
[osd1][DEBUG ] zeroing last few blocks of device
[osd1][INFO ] Running command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb
[osd1][DEBUG ] Warning: The kernel is still using the old partition table.
[osd1][DEBUG ] The new table will be used at the next reboot.
[osd1][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[osd1][DEBUG ] other utilities.
[osd1][DEBUG ] Warning: The kernel is still using the old partition table.
[osd1][DEBUG ] The new table will be used at the next reboot.
[osd1][DEBUG ] The operation has completed successfully.
It will create a journal partition and a data partition. Then we can create the partitions on the ‘osd1’ server and prepare + activate this OSD:
> ceph-deploy --overwrite-conf osd create osd1:sdb
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy --overwrite-conf mon create osd1:sdb
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts osd1:sdb
[ceph_deploy.mon][DEBUG ] detecting platform for host osd1 ...
ssh: Could not resolve hostname sdb: Name or service not known
[ceph_deploy.mon][ERROR ] connecting to host: sdb resulted in errors: HostNotFound sdb
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
The Ceph client retrieves the latest cluster map, and the CRUSH algorithm calculates how to map the object to a placement group, and then calculates how to assign the placement group to a Ceph OSD Daemon dynamically. By default, Ceph keeps 2 replicas; you can change it to 3 by adding these lines to the Ceph configuration:
[global]
osd pool default size = 3
osd pool default min size = 1
osd pool default size: the number of replicas
osd pool default min size: the minimum number of replicas that must be available before Ceph acknowledges I/O to clients
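These defaults only apply to pools created after the change. For an existing pool, the replication level can be checked and adjusted at runtime; a quick sketch (‘data’ is an illustrative pool name):
> ceph osd pool get data size
> ceph osd pool set data size 3
> ceph osd pool set data min_size 1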
Configure the placement groups (Total PGs = (number of OSDs * 100) / number of replicas; here, (3 * 100) / 3 = 100):
[global]
osd pool default pg num = 100
osd pool default pgp num = 100
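As with the replica settings, these values only apply to newly created pools. An existing pool can be resized at runtime; a sketch (again, ‘data’ is an illustrative pool name):
> ceph osd pool set data pg_num 100
> ceph osd pool set data pgp_num 100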
For the OSDs, you’ve got 2 network interfaces (private and public). To configure them properly, update the configuration file on your admin machine as follows:
[osd]
cluster network = 192.168.33.0/24
public network = 192.168.0.0/24
To avoid restarting services for a simple modification, you can interact directly with Ceph to change some values at runtime. First of all, you can get all the current values of your Ceph cluster:
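For example, a sketch using the admin socket (the daemon id, the injected option and the pool/object names are illustrative):
> ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
> ceph tell osd.* injectargs '--osd_max_backfills 1'
You can also ask the cluster where a given object lives; ceph osd map prints the placement group and the hashed object name used below:
> ceph osd map data test-object-1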
To locate the object’s file on the hard drive, look at this folder (/var/lib/ceph/osd/ceph-1/current). Then take the previous result: the placement group (3.47) and the hashed object name (af0f2847). So the file will be placed here:
> ls /var/lib/ceph/osd/ceph-1/current/3.47_head
ceph\ulog__head_AF0F2847__3
If you can’t add a new monitor (here mon2)[5]:
> ceph-deploy --overwrite-conf mon create mon2
[...]
[mon2][DEBUG ] === mon.mon2 ===
[mon2][DEBUG ] Starting Ceph mon.mon2 on mon2...
[mon2][DEBUG ] failed: 'ulimit -n 32768; /usr/bin/ceph-mon -i mon2 --pid-file /var/run/ceph/mon.mon2.pid -c /etc/ceph/ceph.conf '
[mon2][DEBUG ] Starting ceph-create-keys on mon2...
[mon2][WARNIN] No data was received after 7 seconds, disconnecting...
[mon2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.mon2.asok mon_status
[mon2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[mon2][WARNIN] monitor: mon.mon2, might not be running yet
[mon2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.mon2.asok mon_status
[mon2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[mon2][WARNIN] mon2 is not defined in `mon initial members`
[mon2][WARNIN] monitor mon2 does not exist in monmap
[mon2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[mon2][WARNIN] monitors may not be able to form quorum
You have to add the public network and the new monitor to the mon_initial_members list in the configuration file. Look here to see how to correctly add a new mon.
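As a sketch of the fix (the network below follows the public network used earlier and is illustrative), declare both keys mentioned in the warnings in the monitors’ /etc/ceph/ceph.conf, then push the configuration and create the monitor again:
[global]
mon initial members = mon1, mon2, mon3
public network = 192.168.0.0/24
> ceph-deploy --overwrite-conf config push mon2
> ceph-deploy --overwrite-conf mon create mon2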