Using rrDNS to allow mount-failover with GlusterFS

•January 9, 2012 • Leave a Comment

Round-robin DNS (rrDNS) is an easy way to make sure your clients can mount their volumes even if a server is down. To do this, create a DNS entry that points to all of your servers. (If you’re not using nsupdate to manage your DNS entries, you’re doing it wrong. 😉 )

# nsupdate -k${UPDATE_KEY}
Creating key...
> zone domain.dom
> update add glusterfs.domain.dom. 86400 A 10.0.0.1
> update add glusterfs.domain.dom. 86400 A 10.0.0.2
> update add glusterfs.domain.dom. 86400 A 10.0.0.3
> send

If this worked, you should now see something like the following (note that the order may differ on each lookup):

# host glusterfs.domain.dom
glusterfs.domain.dom has address 10.0.0.2
glusterfs.domain.dom has address 10.0.0.1
glusterfs.domain.dom has address 10.0.0.3

Now the rest is easy: use this new DNS entry to mount your volumes:

mount -t glusterfs glusterfs:/testvol1 /mnt/testvol1
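
To make this persistent across reboots, an /etc/fstab entry along these lines should also work (_netdev is just the usual choice for network filesystems; season the options to taste):

glusterfs:/testvol1 /mnt/testvol1 glusterfs defaults,_netdev 0 0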

Bad customer service

•November 14, 2011 • 1 Comment

I hate being treated like a newb about as much as I hate support people reading half of a support ticket and dismissing it as invalid.

ME:

I’ve tried sending dtmf via SIP INFO or via RFC2833, and neither works consistently. If I dial my POTS line, and dial digits on the voip phone keypad, I can see sip info packets if I set it to info, or rtp rfc2833 packets if I set it that way, but on the pots phone, I hear maybe 1 in 10 tones.

THEM:

INFO DTMF is not supported on our service for calls to or from the PSTN. You must use RFC2833. RFC2833 is fully supported by all of our underlying carriers for both inbound and outbound service, so if you are having a problem, the issue is likely between your SIP phone and the PBX.

When I called 12066230560 and entered 256 it immediately interrupted the IVR. I then received an ‘Incorrect Entry’ response. When I enter #1, I am able to reach the directory.

This call was placed from my Polycom phone and an MX system that is configured to utilized RFC2833 only.

Thanks,

ME:

As I said, I get the same results either way.
THEM:
The problem, as was noted in the previous response, is likely related to the DTMF relay between your phone and your PBX, as this issue cannot be reproduced from our PBX when calling your POTS line.

In addition, what you hear when pressing tones should not make any difference as RFC2833 is out-of-band and has nothing to do with the audible tones.

Thanks,

ME:
When you say, “between your phone and your pbx”, are you referring to my Polycom phone to my Freeswitch PBX, or my Panasonic phone to my KSU? If the former, then that would prevent the RFC2833 packets from showing up in a packet capture that I’ve reviewed with Wireshark between my Freeswitch PBX and your Acme Packet SBC. If the latter, then I would have the same results utilizing the IVR as you.

When referring to what I hear, I was referring to the Panasonic phone on the POTS KSU. If I’m not hearing DTMF at the POTS end and the RFC2833 is leaving my Freeswitch PBX correctly then the only possibility that I can imagine is that the problem is within your realm of responsibility.

THEM:
I am referring to the Polycom to Freeswitch connection. The inbound leg of the call to the POTS line is the same for my call tests and your call tests. I do not have any problems originating traffic via nexVortex and terminating to your POTS line and then relaying DTMF.

Are there other numbers that you have problems passing DTMF to?

Thanks,

ME:
Well then, as I said in my very first message, “I can see … rtp rfc2833 packets … but on the pots phone, I hear maybe 1 in 10 tones.”

Yes, I cannot use DTMF consistently anywhere that we’ve tried that uses an IVR system. The bank, the alarm company… even just calling a POTS line with an old-fashioned simple handset.

I mean really…. If you’re going to answer a question, at least make an attempt at understanding it first. I give way better support than this over IRC to people that I have no financial interest in helping. You’d think that the vendor that gets money from us every month could do a bit better.

Gluster Acquired by RedHat for $136M

•October 4, 2011 • 2 Comments

In the latter part of 2008, we had a server that started to have problems. That’s when we discovered that DRBD would fail miserably if you tried switching the active state back and forth rapidly. This caused a split-brain situation that took our entire company down for half a day while I got critical systems back up, and I spent more than a month tracking down everything that was botched.

I had to find a new solution.

By March of 2009, I had found and learned how to use GlusterFS. This was no easy task, as it turned out. The documentation was hit-and-miss: settings might or might not apply to the current version, and terms were used inconsistently. Though they had a mailing list, questions to that list went unanswered. The IRC channel was silent.

Eventually, I figured it out on my own and decided to put my newfound knowledge to work editing the wiki and helping people in the IRC channel. Now, if you’ve ever tried to get support from an IRC channel, most of the time you get surly narcissists who feel that demeaning people is somehow helpful. I vowed that our channel would be different. My narcissism would not be surly.

I have no idea how many people I’ve helped over the past two and a half years, but I do know that my example and influence have created one of the nicest and most helpful IRC support channels I’ve ever encountered. This is in large part thanks to the wonderful people who have joined me on #gluster: Samuli ‘samppah’ Heinonen, Louis ‘semiosis’ Zuckerman, Jeff ‘jdarcy’ Darcy, Adam ‘m0zes’ Tygart, Todd ‘Heebie’ Hebert, and others.

Today, RedHat and Gluster announced the acquisition. Anand Babu (AB) Periasamy, Gluster’s CTO, stated in the webcast, “The community was central to our success.” Brian Stevens, Red Hat’s CTO and VP of Worldwide Engineering, also acknowledged in his announcement that the community was part of the decision.

I’ve been assured that the check is in the mail for my part in this acquisition, but that the envelope and postage have more value than the check.

All kidding aside, my sincere congratulations to everyone, employee, community member, and end user alike, who has made Gluster such a success. And congratulations to RedHat on filling this gap in their enterprise offerings.

GlusterFS 3.1.6 performance tests

•September 16, 2011 • Leave a Comment

Testing on our Drupal 6 intranet server. Simple ab test.

Volume Name: intranet
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/var/spool/glusterfs/a_intranet
Brick2: ewcs4:/var/spool/glusterfs/a_intranet
Brick3: ewcs7:/var/spool/glusterfs/a_intranet
Brick4: ewcs2:/var/spool/glusterfs/b_intranet
Brick5: ewcs4:/var/spool/glusterfs/b_intranet
Brick6: ewcs7:/var/spool/glusterfs/b_intranet
Brick7: ewcs2:/var/spool/glusterfs/c_intranet
Brick8: ewcs4:/var/spool/glusterfs/c_intranet
Brick9: ewcs7:/var/spool/glusterfs/c_intranet
Brick10: ewcs2:/var/spool/glusterfs/d_intranet
Brick11: ewcs4:/var/spool/glusterfs/d_intranet
Brick12: ewcs7:/var/spool/glusterfs/d_intranet
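
For reference, a 4 x 3 distributed-replicate layout like this one could be created with something along these lines (bricks are listed in groups of three, each group forming one replica set; paths are the same as in the volume info above):

gluster volume create intranet replica 3 transport tcp \
ewcs2:/var/spool/glusterfs/a_intranet ewcs4:/var/spool/glusterfs/a_intranet ewcs7:/var/spool/glusterfs/a_intranet \
ewcs2:/var/spool/glusterfs/b_intranet ewcs4:/var/spool/glusterfs/b_intranet ewcs7:/var/spool/glusterfs/b_intranet \
ewcs2:/var/spool/glusterfs/c_intranet ewcs4:/var/spool/glusterfs/c_intranet ewcs7:/var/spool/glusterfs/c_intranet \
ewcs2:/var/spool/glusterfs/d_intranet ewcs4:/var/spool/glusterfs/d_intranet ewcs7:/var/spool/glusterfs/d_intranet
gluster volume start intranet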

Defaults

$ ab -n 100 -c 5 http://intranet/
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking intranet (be patient)…..done

Server Software: Apache/2.2.3
Server Hostname: intranet
Server Port: 80

Document Path: /
Document Length: 16883 bytes

Concurrency Level: 5
Time taken for tests: 113.311 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 1733000 bytes
HTML transferred: 1688300 bytes
Requests per second: 0.88 [#/sec] (mean)
Time per request: 5665.557 [ms] (mean)
Time per request: 1133.111 [ms] (mean, across all concurrent requests)
Transfer rate: 14.94 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 27 58.8 2 281
Processing: 2141 5593 2214.2 4936 11377
Waiting: 1985 5436 2175.5 4762 11001
Total: 2141 5620 2213.0 4941 11378

Percentage of the requests served within a certain time (ms)
50% 4941
66% 5546
75% 6706
80% 7483
90% 9540
95% 10590
98% 11245
99% 11378
100% 11378 (longest request)

With performance.cache-refresh-timeout set to 30
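
For anyone following along, this option is changed through the usual volume set command:

gluster volume set intranet performance.cache-refresh-timeout 30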

$ ab -n 100 -c 5 http://intranet/
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking intranet (be patient).....done

Server Software: Apache/2.2.3
Server Hostname: intranet
Server Port: 80

Document Path: /
Document Length: 16908 bytes

Concurrency Level: 5
Time taken for tests: 124.092 seconds
Complete requests: 100
Failed requests: 45
(Connect: 0, Receive: 0, Length: 45, Exceptions: 0)
Write errors: 0
Total transferred: 1734285 bytes
HTML transferred: 1689585 bytes
Requests per second: 0.81 [#/sec] (mean)
Time per request: 6204.579 [ms] (mean)
Time per request: 1240.916 [ms] (mean, across all concurrent requests)
Transfer rate: 13.65 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 15 33.8 1 184
Processing: 3521 6189 2428.1 5970 13399
Waiting: 3214 6031 2442.0 5861 13288
Total: 3522 6204 2427.7 5971 13399

Percentage of the requests served within a certain time (ms)
50% 5971
66% 6533
75% 7080
80% 7501
90% 9621
95% 12271
98% 12967
99% 13399
100% 13399 (longest request)

Using the local nfs layer method: http://community.gluster.org/p/nfs-performance-with-fuse-client-redundancy/
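
As I recall, that method boils down to mounting the volume through GlusterFS’s built-in NFS server on the local machine rather than through the FUSE client, roughly like this (the docroot path here is just an example):

mount -t nfs -o vers=3,tcp localhost:/intranet /var/www/intranet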

$ ab -n 100 -c 5 http://intranet/
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking intranet (be patient).....done

Server Software: Apache/2.2.3
Server Hostname: intranet
Server Port: 80

Document Path: /
Document Length: 16926 bytes

Concurrency Level: 5
Time taken for tests: 87.674 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 1737300 bytes
HTML transferred: 1692600 bytes
Requests per second: 1.14 [#/sec] (mean)
Time per request: 4383.720 [ms] (mean)
Time per request: 876.744 [ms] (mean, across all concurrent requests)
Transfer rate: 19.35 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 18 56.3 1 490
Processing: 917 4344 3324.3 3495 16214
Waiting: 719 4227 3312.7 3445 16031
Total: 955 4363 3325.0 3498 16215

Percentage of the requests served within a certain time (ms)
50% 3498
66% 4357
75% 4706
80% 4909
90% 10750
95% 13920
98% 15947
99% 16215
100% 16215 (longest request)

Unified File and Object Storage on GlusterFS 3.1 Howto

•August 25, 2011 • Leave a Comment

The following article is outdated. It remains here for historical purposes but is no longer valid.

This tutorial assumes you already know how to install a working GlusterFS filesystem and add the Swift components.[1]

  • Create a volume for storing the authentication data. This must be named “auth”.

    gluster volume create auth replica 2 server1:/data/auth server2:/data/auth

    gluster volume start auth

  • Create a volume for your storage. This will be referred to as an “account”.

    gluster volume create account1 replica 2 server1:/data/account1 server2:/data/account1

    gluster volume start account1

  • Run gluster-object-config

    # /usr/bin/gluster-object-config
    Enter external IP for Storage-Server: 192.168.0.10
    Enter Super Admin key: P-assword-1
    Enter FileSystem (Press Enter for default Glusterfs):
    Enter Object server port (Press Enter for default 6010):
    Enter Container server port (Press Enter for default 6011):
    Enter Account server port (Press Enter for default 6012):
    Enter '1' for https '2' for 'http' (Recommended is https):1

  • Start the object service

    # gluster-object-start

  • Prepare the authentication store

    # gluster-object-prep -A https://192.168.0.10/auth/ -K P-assword-1

  • Add an administrative user to the account

    # gluster-object-add-user --admin -A https://192.168.0.10/auth/ -K P-assword-1 account1 adminuser1 adminpass1

To test, use “st” from the openstack-swift package:

# st -A https://192.168.0.10/auth/v1.0 -U account1:adminuser1 -K adminpass1 stat
Account: AUTH_bc4de365fd9e8a6a262d99056f5bd47f
Containers: 0
Objects: 0
Bytes: 0
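
From there, the same st client can create a container and upload a test object; the container and file names below are just examples:

# st -A https://192.168.0.10/auth/v1.0 -U account1:adminuser1 -K adminpass1 upload container1 /etc/hosts
# st -A https://192.168.0.10/auth/v1.0 -U account1:adminuser1 -K adminpass1 list container1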

That’s all there was to it.


[1] If you don’t, come see us on IRC – #gluster on freenode.


Very minimal CentOS image howto

•October 7, 2010 • 1 Comment

I’ve seen it said by someone at Redhat, “The favorite whipping boy around here is anaconda. Every slow or failed install is anaconda’s fault. Never mind that someone here wrote it,” and I agree. I don’t like anaconda installs, and I think even the minimal install is bloated for a minimalist VM approach, so here’s how I like to make my minimal CentOS install.

First, download the current centos-release and centos-release-notes packages for your architecture from your favorite mirror. I’m doing this on Fedora 13; adjust for your favorite flavor.

su -
TARCH=x86_64
wget ftp://mirrors.kernel.org/centos/5.5/os/${TARCH}/CentOS/centos-release-5-5.el5.centos.${TARCH}.rpm
wget ftp://mirrors.kernel.org/centos/5.5/os/${TARCH}/CentOS/centos-release-notes-5.5-0.${TARCH}.rpm

Next, let’s create a work directory to do this under. I have lots of space in my home directory, so I’m going to use that:

WKDIR=/home/jjulian/centos_image
mkdir -p ${WKDIR}/image
cd ${WKDIR}

We’re going to use yum to do the install, so let’s create a custom yum config:

cat >yum.conf <<EOF
[main]
cachedir=/var/cache/yum/${TARCH}/5
keepcache=0
debuglevel=2
logfile=${WKDIR}/yum.log
exactarch=1
obsoletes=1
gpgcheck=0
plugins=1
installonly_limit=3
reposdir=${WKDIR}/image/etc/yum.repos.d
installroot=${WKDIR}/image
EOF

We initialize the rpm database and install the centos-release rpms:

rpm -r ${WKDIR}/image --initdb
rpm -ivh --nodeps -r ${WKDIR}/image centos-release*rpm

Now we install rpm and yum into the image using the host’s yum:

yum -c yum.conf --nogpgcheck install rpm yum

I’m sure there’s some way to leave gpgcheck enabled, but I haven’t looked for it. If someone wants to figure that out and let me know, I’ll update my article.
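
One untested possibility: the centos-release package drops the CentOS signing key into the image, so you may be able to import it into the image’s RPM database and then set gpgcheck=1 in the custom yum.conf. Something like:

rpm -r ${WKDIR}/image --import ${WKDIR}/image/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5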

Since we created the rpm database using our parent OS, it may not be using the correct db version, so let’s recreate it inside a chroot using the tools we just installed. We’ll reinstall centos-release with those tools, so copy in the rpms we downloaded earlier as well.

cp centos-release*rpm ${WKDIR}/image/root
cp /etc/resolv.conf ${WKDIR}/image/etc/
mount -t proc none ${WKDIR}/image/proc
chroot ${WKDIR}/image
# everything from here down runs inside the chroot
rm -rf /var/lib/rpm/*
rpm --initdb
rpm -ivh --nodeps ~/centos-release*rpm
yum -y groupinstall Base
yum clean all
exit
# back on the host: unmount proc before moving the image
umount ${WKDIR}/image/proc

So now we have a fairly minimal CentOS tree (397 packages) weighing in at about 820 MB. You can install other packages now, set up some initial configuration, etc.

You could have an even slimmer install if you wanted to be more selective. I find the Base group to be sufficiently slim for most cases.

If you’re using PXE for NFS-rooted VMs, then your image is done. Move the image where you want it, export it, and see the next article about NFS-rooted VMs.

If you need a bootable GRUB image for a stand-alone VM, ask me; if I get enough requests, I’ll write about how to do that too.

GlusterFS in an Enterprise setting

•June 11, 2010 • 3 Comments

About a year ago I began my experiment with GlusterFS as a 0DT (zero-downtime) SAN. There were a few hiccups at the beginning, and it took a couple of version changes to get to the point where I consider it stable, but now I don’t know how anyone lives without it.

My test configuration started with 3 servers and 4 drives each. I created a single LVM volume on each drive, using half the drive so I have room for snapshots. I mount them under /cluster as 0, 1, 2, and 3 and share them using glusterfsd.
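
In case it helps, here is roughly what that per-drive setup looks like on one server; the device names, volume group names, and sizes are placeholders, not my exact values:

pvcreate /dev/sdb
vgcreate vg_cluster0 /dev/sdb
lvcreate -n brick -l 50%VG vg_cluster0   # half the drive, leaving room for snapshots
mkfs.ext3 /dev/vg_cluster0/brick
mkdir -p /cluster/0
mount /dev/vg_cluster0/brick /cluster/0  # repeat for the remaining drives -> /cluster/1..3

With those four bricks mounted, each server exports them with a glusterfsd volfile like this: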

volume posix0
type storage/posix # POSIX FS translator
option directory /cluster/0 # Export this directory
end-volume

volume locks0
type features/locks # Implement posix locks, not working 100%
subvolumes posix0
end-volume

volume brick0
type performance/io-threads # Performance enhancement
option thread-count 8
subvolumes locks0
end-volume

volume posix1
type storage/posix
option directory /cluster/1
end-volume

volume locks1
type features/locks
subvolumes posix1
end-volume

volume brick1
type performance/io-threads
option thread-count 8
subvolumes locks1
end-volume

volume posix2
type storage/posix
option directory /cluster/2
end-volume

volume locks2
type features/locks
subvolumes posix2
end-volume

volume brick2
type performance/io-threads
option thread-count 8
subvolumes locks2
end-volume

volume posix3
type storage/posix
option directory /cluster/3
end-volume

volume locks3
type features/locks
subvolumes posix3
end-volume

volume brick3
type performance/io-threads
option thread-count 8
subvolumes locks3
end-volume

volume server
type protocol/server
option transport-type tcp
subvolumes brick0 brick1 brick2 brick3
option auth.addr.brick0.allow *
option auth.addr.brick1.allow *
option auth.addr.brick2.allow *
option auth.addr.brick3.allow *
end-volume

Each client then connects to all 3 servers and handles AFR (replication) and distribution itself, effectively a RAID 1(3)+0(4): four distribute sets, each of which is a 3-way mirror.

volume fs1_cluster0
type protocol/client
option transport-type tcp
option remote-host fs1.ewcs.local
option remote-subvolume brick0
end-volume

volume fs1_cluster1
type protocol/client
option transport-type tcp
option remote-host fs1.ewcs.local
option remote-subvolume brick1
end-volume

volume fs1_cluster2
type protocol/client
option transport-type tcp
option remote-host fs1.ewcs.local
option remote-subvolume brick2
end-volume

volume fs1_cluster3
type protocol/client
option transport-type tcp
option remote-host fs1.ewcs.local
option remote-subvolume brick3
end-volume

volume fs2_cluster0
type protocol/client
option transport-type tcp
option remote-host fs2.ewcs.local
option remote-subvolume brick0
end-volume

volume fs2_cluster1
type protocol/client
option transport-type tcp
option remote-host fs2.ewcs.local
option remote-subvolume brick1
end-volume

volume fs2_cluster2
type protocol/client
option transport-type tcp
option remote-host fs2.ewcs.local
option remote-subvolume brick2
end-volume

volume fs2_cluster3
type protocol/client
option transport-type tcp
option remote-host fs2.ewcs.local
option remote-subvolume brick3
end-volume

volume fs3_cluster0
type protocol/client
option transport-type tcp
option remote-host fs3.ewcs.local
option remote-subvolume brick0
end-volume

volume fs3_cluster1
type protocol/client
option transport-type tcp
option remote-host fs3.ewcs.local
option remote-subvolume brick1
end-volume

volume fs3_cluster2
type protocol/client
option transport-type tcp
option remote-host fs3.ewcs.local
option remote-subvolume brick2
end-volume

volume fs3_cluster3
type protocol/client
option transport-type tcp
option remote-host fs3.ewcs.local
option remote-subvolume brick3
end-volume

volume repl0
type cluster/replicate
subvolumes fs1_cluster0 fs2_cluster0 fs3_cluster0
end-volume

volume repl1
type cluster/replicate
subvolumes fs1_cluster1 fs2_cluster1 fs3_cluster1
end-volume

volume repl2
type cluster/replicate
subvolumes fs1_cluster2 fs2_cluster2 fs3_cluster2
end-volume

volume repl3
type cluster/replicate
subvolumes fs1_cluster3 fs2_cluster3 fs3_cluster3
end-volume

volume distribute
type cluster/distribute
subvolumes repl0 repl1 repl2 repl3
end-volume

volume writebehind
type performance/write-behind
option aggregate-size 32MB
option cache-size 64MB
subvolumes distribute
end-volume

volume ioc
type performance/io-cache
option cache-size 64MB
subvolumes writebehind
end-volume

This configuration allows any 2 drives in a replicate set, or even any 2 entire servers, to fail without loss of operation. Only 1/6th of the raw drive capacity ends up usable (half of each drive is reserved for LVM snapshots, and what remains is replicated 3 ways, so 1/2 × 1/3 = 1/6), but with the dramatically lower cost of storage compared to other redundant systems, this is a negligible problem.

Here’s a graphic representation of the configuration:

[Figure: 3-server, 12-disk, triple-redundant distributed GlusterFS design]