To switch or not to switch - The MySQL Question
Before I let my enthusiasm overpower me and jump into this rewrite with all it's worth, I have to take a long and hard look at what technologies are being used, and what I want to replace them with.
Currently, the whole thing is built on MySQL, but during the lifespan of this fairly complex CRM application I have seen some of the limitations of MySQL up close.
A lot of my contacts and friends also insist that while MySQL is fine for a website or any simple CRUD operations, it will let you down when it comes to more complex queries.
Rumor has it that these problems are both in performance and dependability, and while it might have been simple user error, I have seen both.
Keep in mind, however, that I am no DBA. What indexing my databases currently have I have gotten help from friends and colleges to build, or they were already built, and I inherited them.
What I think, however, is that the optimum choice of database is largely dependent on the use case.
For our statistics needs, for example, there is a lot of COUNT usage, and anyone worth their salt knows that PostgreSQL and most (all?) other MVCC-implementing database engines, even InnoDB for MySQL, will COUNT rather slowly.
A splash of googling turned up a PGSQL COUNT(*) workaround.
I must say, that's pretty good. The total number of rows in the database, however, is useless to me.
I need stuff like...
SELECT COUNT(DISTINCT(`caseID`)) FROM events WHERE userID = 31;
...to determine how many unique cases user 32 has logged events for.
Alright, so I update the user record every time a case is assigned, to reflect the "ownership count" for the user that lost it and the user that gained it. Right?
SELECT COUNT(DISTINCT(`caseID`)) FROM events WHERE userID = 31 AND loggedTime > NOW() - INTERVAL 30 DAY;
...Okay, now what? I need to know how many unique cases user 31 has logged events for in the last 30 days. Suddenly, it becomes very complex to keep updated. So complex, in fact, that it's better for overall performance if I just leave it with the straight-up COUNT syntax with no further hooplah.
In the end, it comes down to this: Where do I need the speed?
Both me and the people who will be using the statistics on a daily basis are fine with the statistics taking a few seconds to load. That's not where the performance is needed.
Where it is needed, however, is in the so-called Morphing, where our business rules are applied. It's also needed in Case Selection, where the cases are prioritized and the one that comes up on top is handed to the first available agent.
So, yeah, COUNT is relatively slow, but I won't be using that as much as long, complicated and confusing SELECT that don't have any COUNT in them at all.
It's these 50-line SELECTs I have to worry about. The WHEREwolves.
When it comes to these beasts, I set out to try and find dirt on each database engine. The problem with doing this kind of research is that anything I might find might be very much out of date. A major problem with any DB engine could be fixed in the latest release, so I need the data as fresh as I can get it.
How about June 2009? Since that article, PostgreSQL has advanced from 8.3.7 to 9.0.4 and MySQL from 5.1.30 to 5.5.14.
After over an hour just searching for data and even trying to run some benchmarks of my own, I spoke to a coworker on the subject. He's not doing software development for the company, but he runs a website with thousands and thousands of hits per day in his spare time. It runs MySQL.
He made me realize one major upside to MySQL: People know MySQL!
I won't be around to maintain this application for ever, and poking around the company a little reveals a healthy heap of MySQL people, but only the occasional "Yeah, I've used Postgres once".
So, it turns out that this decision won't be about dependability or performance at all. It will be about maintainability down the line, and about what people out there know how to use.
With the performance so seemingly close, and dependability reportedly being so much improved for InnoDB in MySQL in later versions, it seems we're staying with it.
I just never even thought of this reason to do so.
Thanks, Tom Erik. Your perspective was very valuable to me.