Mr. Alex Popescu posted a response to my latest post, and I wanted to take the opportunity to address some of the points he raises.
To frame the discussion, it's important to note that my analysis of MySQL and NoSQL's pros and cons relates to how large, medium, and even small businesses today would resort to using these databases within their cloud environments.
We are not talking about the mammoths of this world, which deal with monstrous amounts of data and very extreme use cases – those are willing to pay a high price to find a working solution to their problems. However, for the 90% of operations interested in benefiting from the cloud, there needs to be a simpler solution: one that works with their ecosystem, not to mention within their budget, and still supports future scaling and optimal performance.
*Popescu's responses to my previous post are marked in italics.
Why NoSQL
I’d say that data normalization is not a goal per se, but a solution to a problem (data duplication, frequent updates to common entities). But what if this solution is introducing another bigger problem (read JOINs)?
I believe there is considerable value in normalization – in both SQL and NoSQL – especially where read joins are concerned. Non-normalized models require you to transfer much more data over the network than a normalized data model would; normalized models keep this overhead to a minimum. I do agree, though, that sometimes non-normalized data models can be faster; still, this can also be implemented in SQL.
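To make the trade-off concrete, here is a minimal sketch using an in-memory SQLite database (all table and column names are illustrative). It shows what normalization buys on writes – a shared entity is updated in one place – and what it costs on reads – a JOIN to reassemble the data the denormalized model keeps inline:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized: each user's email lives in exactly one row.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    -- Denormalized: the email is copied into every order row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, user_email TEXT, item TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'joe@example.com')")
db.executemany("INSERT INTO orders VALUES (?, 1, ?)",
               [(1, 'book'), (2, 'lamp')])
db.executemany("INSERT INTO orders_flat VALUES (?, 'joe@example.com', ?)",
               [(1, 'book'), (2, 'lamp')])

# Updating a common entity: one row touched vs. one row per duplicate.
one = db.execute("UPDATE users SET email='j@new.com' WHERE id=1").rowcount
many = db.execute("UPDATE orders_flat SET user_email='j@new.com' "
                  "WHERE user_email='joe@example.com'").rowcount
print(one, many)  # 1 2

# The read-side cost: the JOIN the denormalized model avoids.
rows = db.execute("""SELECT o.item, u.email
                     FROM orders o JOIN users u ON u.id = o.user_id""").fetchall()
print(rows)
```

The same duplicated-update cost applies whether the denormalized store is a SQL table or a NoSQL document collection; the sketch only uses SQLite because it is self-contained.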
[Alongside the ability to easily scale, NoSQL] may give you more flexibility in your data model, plus it may be a better (as in operational, complexity, performance, etc.) storage for different formats of data.
Looking at the context of medium to large businesses, rather than huge enterprises with obscene amounts of data (and funds), these could actually be disadvantages rather than advantages, because the organization's IT department would have to struggle to implement these new tools and approaches across the organization and get them all the way to production. I'm not dismissing, though, the option of using NoSQL for very specific use cases.
Why not NoSQL
Previously, I had stated that at the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks.
Actually I think the reality might be a bit different. Because NoSQL imposes a “narrow predefined access pattern” it will require one to spend more time understanding and organizing data. Secondly, the final model will reflect and be based on the reality of the application, not only on pure theory (as is the case with most initial relational model designs).
I agree this can be good and beneficial for very specific use cases, but the skillset is still an issue. Who is the right person to design a NoSQL schema? What is the price of an error in such a design? Do enterprises today have the tools to lower the risks involved in a bad NoSQL design? Moreover, what happens when the application needs to step outside the “narrow predefined access pattern”? Say you need to switch engines (your competition has acquired the open-source project you rely on) – if switching from one SQL provider to another is hard because of deviations from the standard, moving between entirely different data models is harder still.
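A hypothetical sketch of what a “narrow predefined access pattern” means in practice, using a plain Python dict as a stand-in for a key-value store (all names and data are illustrative): the query the model was designed for is cheap, while an unanticipated query forces either a full scan or a redesign that adds a second, manually maintained index.

```python
# Key-value model designed to answer exactly one question: user by id.
users_by_id = {
    1: {"name": "Ana", "city": "Berlin"},
    2: {"name": "Bob", "city": "Paris"},
    3: {"name": "Cal", "city": "Berlin"},
}

# The designed-for access path is a cheap O(1) lookup:
print(users_by_id[2]["name"])  # Bob

# An access the model was NOT designed for ("who lives in Berlin?")
# degenerates into a full scan over every record...
berlin = [u["name"] for u in users_by_id.values() if u["city"] == "Berlin"]
print(berlin)  # ['Ana', 'Cal']

# ...or into a schema redesign: a second structure keyed by city that
# must now be kept in sync on every write to the primary one.
users_by_city = {}
for uid, u in users_by_id.items():
    users_by_city.setdefault(u["city"], []).append(uid)
```

In a relational database the second question is just another ad-hoc `SELECT ... WHERE city = ?`; in a key-value model it is a design change, which is the cost the questions above are pointing at.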
At the architecture level, I pointed to two major issues: interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized.
The interface limitation is a temporary issue in terms of getting more/better/quicker tooling support and probably a longer term issue for developers needing to learn different models. But as we’ve agreed, NoSQL has a small, predefined access mode and so we are not talking about learning completely new languages.
Personally, I think the real issue is the steep learning curve of understanding each NoSQL database’s semantics and operational behavior, rather than not having a common API.
I agree with Popescu. The API issue is just the shell of the problem; the different data models are the real problem. This is why the absence of clear NoSQL standards is so critical.
I claimed Interoperability is an important point, especially when data needs to be accessed by multiple services.
I’m not seeing the problem here. As far as I know each relational database is coming with its per-language drivers. On the NoSQL side, there are already quite a few products using standard protocols.
I think this is good progress in the NoSQL arena. However, many NoSQL solutions find it hard to expose their special features via standard protocols, forcing you to choose between standards compliance, better performance, or a richer feature set. That said, as pointed out, you can work with different protocols from different locations, as needed.
I maintain the operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. […] Operation needs to be systematic and self contained.
Now, this is completely the other way around. If you read any large scale application story, you’ll notice the pattern: the operational costs were a significant factor in deciding to use NoSQL. Just check the stories of Twitter, Adobe, Facebook. Complexity is a fundamental dimension of scalability and right now the balance is towards NoSQL databases.
I fully agree that these kinds of companies require special solutions, as do numerous other organizations such as government agencies, research centers, etc. However, these are the mammoths I'm referring to. I think your average Joe (as well as below-average or above-average :)) will need something else entirely. I also believe that even in the case of the heavy lifters, such as the Twitters and Facebooks of the world, 95% of their databases will remain SQL, and they will look for SQL solutions for the cloud. And let's not forget that Twitter's Fail Whale and other similar examples occurred because there weren't scalable SQL solutions available – which is exactly what Xeround addresses! :)
It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.
Unfortunately I don’t think that’s actually possible, or at least not for all solutions. But if we just want some common access language, we will probably get it.
If, on the other hand, what we want is more tunable and scenario specific engines, we will probably get these too. (nb: as far as I’ve heard the PostgreSQL community is learning a lot from the various NoSQL databases and trying to bring in as many of the good ideas they can).
Agility requires a very flexible solution, based on readily available resources. Agility and efficiency together allow making changes at a reasonable cost and with faster time-to-market. Many companies cannot pay an upfront price for potential growth; the model I present therefore works well for them. There will always be specific applications that need special data models, where an RDBMS is not a good fit. But let us not forget that for general enterprise applications, it's a practice that has proven itself.
Conclusion
My conclusion is simple. As with programming languages where we are not stuck with COBOL, polyglot persistence is here to stay and it’ll only get better.
I do not see a contradiction between Popescu's conclusion and my post. Many companies will deploy NoSQL solutions in addition to SQL solutions. Plus, with a SQL database built on NoSQL foundations, you can get the best of both worlds in a polyglot setup. I also believe that SQL is here to stay, and that it will get better and more scalable implementations – like Xeround.
Learn more on how Xeround supports extreme MySQL scalability in the cloud.