1. Aug 23rd, 2008

    Dumb, and faster, databases

    Experiences like this take me two decades back, when I was introduced to RDBMS as “slower, heavier, but oh so good for …”. Somewhere along the way we lost both qualifications. There’s an ongoing myth that declarative is always faster/better because it lets the software smarts kick in, SQL must therefore be … Worse, I bet most people who do use a database server can’t even complete the sentence “good for ____”.

    It’s just something you do, let’s not ask why.

    Well, why?

    Put another way, taking the brain-dead stupid, non-SQL, mainframe-like approach got me results 12 times faster than doing it the seemingly “correct” way.

    Now this isn’t exactly what the whole disk vs. tape thing is about but it’s pretty close. I’m aware that InnoDB works with pages (that will contain multiple records, some of which I don’t need) and that’s part of the problem in this scenario. But it’s a really interesting data point. And it’s certainly going to change my thinking about working with our data in the future.

    Not to say that RDBMS are bad for what they do, just that applied liberally they’re stillĀ incapableĀ of restoring a receding hairline. And that’s not the only problem they’re bad at solving.

    1. Aug 24th, 2008

      jeremy

      Whenever I read something along the lines of “I tried to do xyz in SQL and I couldn’t work it out/it wasn’t fast enough, so instead I did it with zyx” my immediate thought is “I bet their schema, indexes or query is incorrect”. I think this is definately the case here after reading the rest of the post and comments.

      In the worst case the product could be to blame (often MySQL), but another RDBMS will probably be able to handle what you’re trying to do in a sensible fashion. Without looking at the alternatives you end up saying all RDBMS are bad for a particular task, whereas that isn’t necessarily the case.

    2. Aug 25th, 2008

      Assaf

      “Jeremy Zawodny is an incoming employee of Craigslist, having left Yahoo!’s platform engineering group. He has been described as ‘Yahoo!’s MySQL guru’”

      I’m just saying (source: Wikipedia, so you know it must be true!)

      The problem is not RDBMS, but that people hold on to certain “truths” that are wrong. Computers are very lousy in playing along to our believes. If you know how the query optimizer works and the algorithms it uses, you know why (and when and where) it’s wrong. I think most people will find it easier to relate to a real life example, rather than discussing the internals of query optimizers, which is why I chose to link to this post.

      And the problem is not isolated to MySQL, all the major database servers have these issues.

    3. Aug 27th, 2008

      Jason Watkins

      Assaf: I agree with the general idea, however MySQL’s query planner is particularly dumb, and it’s lack of certain features taken as given for other database engines (ie hash join) amplifies the problem.

    4. Aug 27th, 2008

      Chipping the web: August 26th — Chip’s Quips

      [...] Labnotes » Dumb, and faster, databasesPrecisely why Synergy DBMS still powers some of the heaviest used apps in the world.Tags: sql performance database [...]

    5. Aug 27th, 2008

      Ole Phat Stu

      Could it be that 3rd normal form is slower than 2nd is slower than 1st?
      And 3NF is what most people use/undertsand by a RDBMS?

    6. Aug 27th, 2008

      apotheon

      Unfortunately, Stu, if you’re running a database in first normal form, you’re running an almost random pile of data — at least as far as maintenance is concerned. That’s not exactly something I’d care to deal with over long periods, except in data sets so small I’d be better off using a flat file anyway (like Firefox’s cookie policy exceptions database).

    7. Aug 27th, 2008

      Assaf

      Generally, higher normal forms would be slower to query: there’s more work involved because the data is spread over more rows, and query optimizers are only so smart. But higher normal forms is where the relational model shines.

      Like, Apotheon said, if you don’t care about higher normal forms, perhaps you shouldn’t be using an RDBMS to begin with. If the best way to manage your data is 3NF (or 4NF or 5NF), then you’re willing to put up with less performance for the added benefits.

      What I see a lot of people doing, is using RDBMS because they need to store data. Fail.

      Without understanding what RDBMS are good for you end up misusing them. On one end we have cases where the RDBMS adds unnecessary complexity that slows you down (performance and development). On the other hand, poor schemas that don’t take advantage of the relational model.