DBA 3.0 (The Holistic DBA) – Part Three
It’s been a little while since I wrote part two of my DBA 3.0 series. It’s 3am, I can’t sleep, I don’t feel well and I’m cold.
A perfect time to write.
First, I’ve decided that my concept of DBA 3.0, though well intended, is not very descriptive. One commenter on my previous post mentioned that they had already used the moniker DBA 3.0 in their own work to describe something. Now, DBA 3.0 is generic enough that I don’t feel all that bad about adopting the name, but to my mind it doesn’t describe the concept. So, I’m rechristening the idea as the Holistic DBA (and maybe, to make it extra cool, we could add something like The Next Generation to the title… naaahhhhhhh).
In exploring the idea of the Holistic DBA in my previous posts, I lamented that many DBAs are entrenched in their cubicles, that many do not speak the language of business, and that many are too busy writing scripts or digging into the next new technology. All of these things, in their proper proportion, are good things, no doubt. It’s when the priorities get out of whack that what we do (or, as often, what management tells us to do) can get us into trouble. In my world view, the holistic DBA is not a specialized DBA. They don’t spout translations of raw 10046 trace files without using TKPROF, and they don’t dump the contents of datafile headers and use bbed to magically open the database (and if they did, I’d question whether they were really an expert at all).
Let me say, first and foremost, that not every DBA needs to be a holistic DBA, and not every organization needs every one of its DBAs to be a holistic DBA. There is a need in the technical world for specialization, if your organization can afford it and if the need can be justified based on cost, on meeting customer expectations and, principally (in my mind), on meeting established Service Level Agreements (SLAs), Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). I will counter, though, that every specialist DBA needs to have the skill set and countenance of a holistic DBA, especially the ability to communicate.
There will always be a need for Lewis, Kyte, Millsap and that breed of expert (forgive me if I left your name out; you are all much smarter than I am, and I freely admit it).
Now, I’ll pause and let the shock of SLAs, RTOs and RPOs wear off your face. You might be asking, “What, do organizations really have those?” Indeed, the best-run, most successful data organizations do have them, somewhere… often in a form that is pretty much useless and hasn’t been updated since the last great depression.
The organizations that have repeatable success have those. Organizations that fly by the “seat of their pants” typically do not, and that is one reason why those types of data organizations fail, eventually.
Eventually is a key word here, too. Just because your hamstrung, baling-wire-and-spit technical marvel has not failed yet, rest assured that it will fail someday. When that day comes, the song will be “Who’s got the job today? Oh Yeah! Oh Yeah!” as opposed to “Information, I need the number for the unemployment office.”
If you think it’s just the little boys who fail, and that the big boys (and your big-boy organization) are immune from failure because of sheer size, number of DBAs and wonderful, glowing talk about procedure, process and the like, you are headed for the unemployment line too (or you should already be there). No, my friends, the evil angel of death is an equal-opportunity player when it comes to data center death and the subsequent total mismanagement of the world. In fact, I would suggest that the bigger you are, the more likely you are to fall, and to fall hard. Just because it hasn’t happened yet does not mean the grim reaper doesn’t have you on his visiting list… he is, after all, not omnipresent. Oh, the stories I could tell about foolish big boys and the key assumptions they made that caused massive failures.
So… what do we do to fix these problems? First of all, we have to realize that we have some principal goals as DBAs. I’d like you to consider the following items as part of a list of these principal goals:
- Be able to reliably, consistently and efficiently back up and recover all databases in accordance with SLAs, RPOs and RTOs.
- Ensure database uptime with respect to all SLAs, RPOs and RTOs.
- Ensure all databases are secure.
- Monitor all databases in a consistent and reliable manner.
- Communicate in an effective manner.
- Help users to help themselves.
- Work to understand and correct bad behaviors with respect to the organization’s data policies.
- Work to understand and correct bad behaviors with respect to the organization’s data designs.
- Work to understand and correct bad behaviors with respect to the organization’s application designs.
- As befits the organization, your abilities and your time, keep up with current data-related technologies.
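Goals 1 and 2 only become checkable once the RPO and RTO are written down as numbers. As a minimal Python sketch, assuming the RPO is stated as the maximum tolerable data loss measured in time, and that you can obtain the time of each database’s most recent recoverable point (its last usable backup or archived log), an RPO check is simple arithmetic: if the database failed right now, everything since that point would be lost. The database names, timestamps and the functions `rpo_breaches` and `rto_met` are hypothetical illustrations, not part of RMAN or Grid Control.

```python
from datetime import datetime, timedelta


def rpo_breaches(last_recoverable, rpo, now):
    """Return {database: exposure} for databases whose exposure exceeds the RPO.

    last_recoverable maps a (hypothetical) database name to the time of its
    most recent recoverable point. If the database failed at `now`, every
    change made after that point would be lost.
    """
    breaches = {}
    for db, point in last_recoverable.items():
        exposure = now - point  # worst-case data loss if we failed right now
        if exposure > rpo:
            breaches[db] = exposure
    return breaches


def rto_met(measured_restore, rto):
    """True if a tested restore completed within the RTO."""
    return measured_restore <= rto


# Hypothetical example data.
now = datetime(2024, 1, 1, 12, 0)
last_points = {
    "SALES": now - timedelta(minutes=10),
    "HR": now - timedelta(hours=3),
}
print(rpo_breaches(last_points, rpo=timedelta(hours=1), now=now))
# {'HR': datetime.timedelta(seconds=10800)}
print(rto_met(timedelta(hours=5), rto=timedelta(hours=4)))  # False
```

The RTO check needs a restore time that has actually been measured, which is one more reason the backup and recovery testing implied by this list matters. In practice your monitoring tooling would feed these numbers rather than a hand-maintained dictionary.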
Do you see anything on this list about being the hero of the day? Good, because you should never need to be the hero of the day. Yeah, it feels good when it works, but the unemployment line feels a lot worse when it doesn’t.
Do you notice the irritating details about policies, procedures and backup and recovery testing? You didn’t? Really? Look closer at the list, my friend. It’s in there.
Now… I put this list in a very specific order for a specific reason. Can you tell me why? What is it about this list that, when worked in the order listed, makes for a holistic DBA?
First, I think they are arguably in order of importance. You might shift a couple of them around, say 2 and 3 or 8 and 9 (or you might consider those pairs one and the same), but generally they are in order of priority (in my eyes) for any data organization. What else, though, is magical about this list?
Automation, replication, consistent execution. Look carefully at 1 – 4. Holy cow, they can all be documented, easily. They can be implemented fairly easily, once, and then run on a fire-and-kinda-forget basis through automation, within the constraints of an established policy (for example, with occasional maintenance, testing and the like thrown into the schedule).
Something else I’ve noticed about the whole automation thing: we really like re-inventing the wheel, don’t we? Why do we feel like we have to spend hours re-inventing the wheel by writing Korn shell scripts, throwing them into cron and having them run all over creation? Oh, here’s a smart guy: he put them all on a shared NFS drive. UGH. Do I really need to mention the problem here? Do I really need to say this is the wrong way to be doing things?
Case in point (again, the characters, places and all identifying information have been properly scrubbed): Maxwell Smart was a DBA in a rather large DBA organization. Maxwell was tasked with getting RMAN “up and running” and, in the process, replacing the old stock of spaghetti-code backup scripts. Maxwell is a pretty smart guy, but he really needed guidance, and he didn’t get much on this assignment. He felt its weight heavily on his shoulders. As a result, Maxwell replaced the old spaghetti-code backup scripts with a new set of equally spectacular spaghetti-code RMAN backup scripts. Essentially, we changed nothing in the process except the tool that did the backups. The overall architecture was not considered, the total unsupportability of the scripts in the future was not considered, and the result was a product that was no better than what it replaced.
The point is that, as we continue to define who the holistic DBA is (in my own limited understanding), we would hope that Maxwell Smart had moved past steps 1-4 long ago, and on to the steps that offer the greater good: #5 in particular, and then steps 6 through 10. If Maxwell were a holistic DBA, he might have come to understand some principles of architecture and design that he didn’t possess when he re-created hell, and he might well have created a backup haven instead.
Perhaps your retort to using these technologies is something like, “Well, Grid Control didn’t work with backups n versions ago, so I gave up on it.” Really. How very odd. Here is the one product Oracle gives you to make your life easier, and at the first hint of failure you give up on it. Instead you resort to Korn, Perl or heaven knows what your favorite scripting language is, and you spit out this cool, really dense code. WOW… Meet Dr. Frankenstein. You have your creation, and it’s a monster. Do I really need to tell you why it’s a monster? The bottom line is that the holistic DBA puts ego and the desire to raise the dead on hold and does it the right way. He/She does it in a way that is automated, centrally manageable and easy to use.
So your retort is, “What if it’s broken?” Then, my friend, it’s time to remember your priorities above, call Oracle support and get the damn thing fixed.
Then you retort, “They never work on my SR, they never call me back.” My goodness, someone call the wambulance, and rush ahead to priority number 5 and get it down fast. The bottom line is that the holistic DBA cannot be a passive “yes man.” You don’t have to be a jerk, and you don’t have to be evil incarnate to get what you want, but you do have to persevere. You have to pursue excellence, even if those around you don’t seem to be following the same path. Heavens, this is leading me down the path to calling this the Black Belt DBA, but I’ll avoid the temptation.
If you are not getting the support you desire from Oracle (or any other vendor), then whose responsibility is it to get the level of support you need? YOURS! It is your job to make these things work, and to bust dams (and perhaps even utter one or two aloud) in pursuit of a solution. Oracle support can be very responsive if you know how to use it and if you are persistent. If you don’t know how to use it beyond opening an SR on Metalink, there are some great training tools available to teach you how to use the support system. Working with any support organization is a bit like driving an old car whose engine floods way too often. You have to figure out how it works, which knobs to pull and just how to pull them. Once you figure that out, the car will start up pretty well just about every time. Again, though, I put the onus back on you to make support work for you. If you sit and wait for support to come looking for you, then the fault, in my opinion, lies with you.
So, we’ve figured out that 1-4 are pretty easy to take care of (though the implementation may take some time and major initial effort, which can frustrate those who love to do things by the seat of their pants). Do you notice something else about 1-4? First, they don’t require you to know how to do 10046 tracing. They don’t require that you know how to use explain plans. They don’t require you to be able to do data file header dumps or a triple somersault with a half-gainer pike on the database (is that even possible?). You might protest, “But I need to know how to do these things! I need to look like I know what I’m doing.” I agree, you need to look like you know what you are doing. But remember that this list is a list of priorities. Few people will remember you for that marvelous once-every-six-months 10046 trace that fixed year-old SQL code that was never designed to scale. They just won’t. They will remember that your database was restored flawlessly and on time and that the business never even noticed things were down. Better yet, they won’t remember at all, because they didn’t notice and there was no need to. If you have a good boss, he will remember. If you get off the wambulance and show some persistence in your reviews, he will certainly remember.
Here is another point about the overarching nature of numbers 1 through 4: they can be costly if not done right. If you find yourself iterating only among these areas of DBA Ville, you are costing your employer vast sums of money that really don’t need to be spent.
Another point about 1 through 4 concerns automation and simplification. Realize that these are just lines on the page, and that there are other needs in this section you have to “read between the lines” to see. Automate, automate; simplify, simplify. For example, if you find that you are creating lots of databases or lots of schemas, and that takes a lot of your time, find a way to automate that workflow. Make it user serviceable. The tools are out there to allow you to do basic, simple, automated provisioning. Rather than spending 6 hours to 5 days to provision a database, let’s spend the 2 seconds it takes to click the approval email that is part of the overall workflow. These are the kinds of things you need to be doing, now.
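The approval-driven workflow described above can be pictured as a small state machine: a request is submitted, someone approves or rejects it, and approval alone triggers a standard, templated build with no DBA hand-off. The following is a minimal Python sketch of that shape only; `SchemaRequest`, `approve` and `provision` are hypothetical names, and `provision` is a stub standing in for whatever provisioning feature your management tooling provides, not a call into any real Oracle API.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    PROVISIONED = "provisioned"


# Allowed transitions; any other move is refused.
TRANSITIONS = {
    State.REQUESTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.PROVISIONED},
    State.REJECTED: set(),
    State.PROVISIONED: set(),
}


@dataclass
class SchemaRequest:
    requester: str
    schema_name: str
    state: State = State.REQUESTED
    history: list = field(default_factory=list)  # (from_state, to_state) pairs

    def advance(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.history.append((self.state, new_state))
        self.state = new_state


def provision(request):
    """Hypothetical stub for the standard build step; here it only records the transition."""
    request.advance(State.PROVISIONED)


def approve(request):
    """The approver's single click: approval immediately triggers provisioning."""
    request.advance(State.APPROVED)
    provision(request)


req = SchemaRequest(requester="alice", schema_name="REPORTING_DEV")
approve(req)
print(req.state.value)  # provisioned
```

Keeping the allowed transitions in one table keeps the policy visible, and an out-of-order step (provisioning something that was never approved) fails loudly instead of silently.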
Plenty has been said about re-inventing the wheel, about the costs of re-inventing the wheel and about the dangers of re-inventing the wheel. Use the damn wheel, why don’t you, and quit finding little reasons not to use it and to build your own instead. “But there are bugs in the wheel,” you say. All wheels have bugs. The problem is that the only person who can fix your wheel’s bugs is you, and if you leave, who is it going to be? Sure, the Oracle wheel might have a bug or an “undocumented” feature in there, but can you really cost-justify the time and money spent re-inventing the wheel, rather than taking the support organization to task and making them fix the problem? Also, keep in mind that their wheel has been tested many more times, and in many more ways, than the wheel you are creating.
The real point with numbers 1 through 4 is that they can take up an inordinate amount of time if not done correctly. They can also cause an inordinate amount of damage to the organization if not done properly. We need to have these settled, on autopilot and flying reliably without the aid of an instructor. Then we will have freed up time for tasks that are more strategic, and therefore quite important.
That leads us to that magical, mystical number 5. Let’s talk about it, and the rest of the list, in my next post.