Data Modelling: am I brain dead?

So, there I was… getting ready to write my own plugin to suit my own purposes; yay! Open source! As part of that process, I began researching how other developers are modelling their data. You know, for plagiarism… er, inspiration kind of thing. I noticed that a lot of them, nearly everyone in fact, use a post table and then a posts_meta table with key/value pairs stored in it. Then they store everything in the meta table, perhaps way too much. Then they loop through the meta table like a greased hog at a Sunday picnic; it seems comical at first but ends in tragedy. From an efficiency perspective, especially as the data set grows, isn’t this a really bad idea? For this kind of data model, wouldn’t JSON work faster and result in a much cleaner code base as well? Additionally, it seems to compound the task of building out an API for that kind of data model.
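To make the efficiency concern concrete, here is a minimal sketch of the EAV ("entity-attribute-value") shape being described, using Python’s built-in SQLite purely as a stand-in for MySQL (the table and column names are illustrative, not WP’s actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# EAV-style schema, mirroring the posts/posts_meta split (names illustrative)
cur.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
""")
cur.execute("INSERT INTO posts VALUES (1, 'Hello')")
cur.executemany("INSERT INTO postmeta VALUES (?, ?, ?)",
                [(1, "price", "9.99"), (1, "color", "red")])

# Every attribute fetch becomes a self-join (or an extra round trip);
# a normalized table would make 'price' a plain indexed column instead
row = cur.execute("""
SELECT p.title, m.meta_value
FROM posts p
JOIN postmeta m ON m.post_id = p.id AND m.meta_key = 'price'
""").fetchone()
print(row)  # ('Hello', '9.99')
```

With one post this is harmless; with millions of rows in `postmeta`, every extra attribute means another join against the same ever-growing table, which is the "greased hog" loop in question.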

Why do developers prefer a key:value pair over normalizing a table, and then write what seems to me to be unnecessary PHP to make it all work? Please help me understand the decision to prefer key:value-type data models, or optionally explain how I’m brain dead. That will work too, as long as I come away with a better understanding of why decisions are being made that I would not normally make.

Please don’t use the “because that’s just the WP answer” or the “that’s the way we do it answer” without providing the context to what I may have missed.

Thanks in advance.

Other plugin developers might have a different answer. I’m not a plugin developer, but I’ve been part of, and seen, many database discussions.

The answer, unfortunately, is “because that’s just WP”. Let me clarify. WordPress was built to use MySQL, which is a relational database; there’s no way to use a NoSQL database. MySQL, like other relational databases, follows the first normal form (1NF) rule, which requires every column to store a single value. Storing JSON, for example, breaks that rule, although it is allowed nowadays. And if you do store JSON in MySQL, you can’t index the values inside it directly (generated columns can work around this, but few plugins bother), so you’re really doing yourself a disservice.
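For what it’s worth, the indexing workaround looks roughly like this. A sketch using Python’s bundled SQLite, whose JSON1 functions play a role analogous to MySQL’s generated columns (this assumes a SQLite build with JSON1 compiled in, which modern Python bundles):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")  # body holds JSON
cur.execute("""INSERT INTO docs (body) VALUES ('{"sku": "A-1", "qty": 3}')""")

# An expression index over json_extract lets the planner use the JSON field,
# instead of re-parsing every row's JSON on every query
cur.execute("CREATE INDEX idx_sku ON docs (json_extract(body, '$.sku'))")
row = cur.execute(
    "SELECT id FROM docs WHERE json_extract(body, '$.sku') = ?", ("A-1",)
).fetchone()
print(row)  # (1,)
```

The catch, as noted above, is that each indexed JSON path needs its own explicit index, so you end up hand-maintaining what a normalized column would have given you for free.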

It’s not perfect and it has issues, but that’s basically the whole story. In WP’s case, that’s how data is stored and manipulated by the core. If you want something different, use a different CMS or write your own.

Now, this brings us to a few points when it comes to building themes/plugins:

  1. WP recommends plugins/themes use existing functions instead of writing custom database queries. This forces developers to use what’s available and store data in a standardized way, which is what you see with post meta, user meta, and options tables.
  2. WP sanitizes and escapes data as needed, so when developers use native functions, their plugins/themes are less likely to have vulnerabilities. With custom queries, you’re more prone to miss something.
  3. Using native functions also speeds up the development of plugins/themes. That’s really the point of a CMS/framework: standardize and re-use.
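Point 2 is easy to demonstrate. A sketch, again using SQLite as a stand-in, of why hand-rolled query strings are more prone to miss something than the parameterized calls that the native functions make for you:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE options (option_name TEXT PRIMARY KEY, option_value TEXT)")
cur.execute("INSERT INTO options VALUES ('blogname', 'My Site')")

hostile = "blogname' OR '1'='1"  # attacker-controlled input

# Hand-rolled concatenation: the quote in the input rewrites the query
unsafe = cur.execute(
    f"SELECT option_value FROM options WHERE option_name = '{hostile}'"
).fetchone()
print(unsafe)  # ('My Site',) -- the injected OR clause matched everything

# Parameterized call (the role wpdb->prepare / native WP functions play)
safe = cur.execute(
    "SELECT option_value FROM options WHERE option_name = ?", (hostile,)
).fetchone()
print(safe)  # None -- the input was treated as a literal string
```

This is exactly the class of bug that standardized access functions take off each individual developer’s plate.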

In the last few years, more plugins have begun using custom tables (WooCommerce, Yoast, etc.). So things are changing slowly. But WP will always be bound to restrictions imposed on it by MySQL.

That’s my 2 cents.

Viktor,
I appreciate the quick and straightforward response. I feared this answer. My comment about using JSON was a slight towards 1NF normalization practices <== drives me nuts. I’m one of those guys who writes SQL via the CLI.

I get the standardization, code reuse, and ease-of-development points. WP/CP wants you to use their functions, which enforces 1NF. This is where my follow-up question starts. Does WP/CP provide any core functions that support any of the higher levels of normalization? If not, which I suspect, is there a project working on that concept?

Missing #4: Caching, and thus processing speed. Caching gets insanely difficult if everybody decides to write their own custom queries. Yes, one could add filters everywhere, but let’s face it: NOBODY DOES.

And at the end of the day, people like humble me have to dig through that hot mess and try to move and arrange everything into a properly maintainable whole. That costs much more effort, and more of one’s mental sanity, than sticking to the “olden, but not always golden” ways.

I have to add: if it’s a small, nimble plugin which only does ONE thing, that MAY work initially. But as soon as you extend it, or if it’s meant to progress into something bigger, the issues start. A few real-life examples:

a) My UI SSL Enforcer plugin doesn’t come with admin settings: in 95% of the cases you don’t need any, and if you do, global constants currently still do the job reasonably well. In theory, one could easily add an admin screen, but I don’t see any advantages over the “drop in, activate + forget” solution yet. It does one thing, and that it does well.

b) For the last 6 months I have been working on migrating a medium-sized sales/distribution platform of a public insurance company, and I am going to continue working on it at least till the end of the year (and probably beyond, because right now I’m the person with the most insight into its inner workings).

Here, EVERYBODY decided to do THEIR OWN THING, not caring about standards, code conventions and the advantages of using an existing framework. That includes:

  • just skipping over adding comments in the code, so when returning to it a few weeks later, NOBODY is going to understand what was committed here
  • building plugins with completely different data models that have only a very few things in common with the rest (i.e. they “use” the plugin infrastructure, the WPDB class and some sort of SQL)
  • keeping important prep work like diagrams, specification sheets, etc. to the absolute minimum, with obscure and ambivalent naming, no references to other important documents, and so on

So in terms of b) and “just roll your own” - please do, if you hate other people :slight_smile:
But if you want to do yourself a favor, especially when it comes to extensibility and maintainability: try finding a compromise for yourself :slight_smile:

cu, w0lf.


I have not seen anything related in the works, but I also don’t know everything that might be going on at WP. The only database-related thing I’m aware of is the SQLite integration, which is not really what you’re interested in.

There is one aspect to this, where you can definitely improve on what many plugin developers do, @shimmy. As you say, “nearly everyone … uses a post table and then a posts_meta table with key, value pairs stored in it.” But even according to WP’s own best practices, this is often completely the wrong thing to do.

In many cases (though admittedly not all), the data stored as meta should, in fact, be stored as taxonomies. While this still means using more than one table in the database, it also means using a table that is designed for the purpose, which makes a huge difference to performance at scale.
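A sketch of why a purpose-built taxonomy-style table helps: the lookup becomes two indexed joins instead of a scan over a generic meta table (the schema below is simplified and illustrative, not WP’s actual terms tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE term_relationships (
    post_id INTEGER, term_id INTEGER,
    PRIMARY KEY (post_id, term_id)
);
""")
cur.execute("INSERT INTO posts VALUES (1, 'Widget'), (2, 'Gadget')")
cur.execute("INSERT INTO terms VALUES (10, 'red')")
cur.executemany("INSERT INTO term_relationships VALUES (?, ?)", [(1, 10), (2, 10)])

# "All posts tagged red" walks two indexed keys --
# no LIKE or equality scan over a shared meta_value column
rows = cur.execute("""
SELECT p.title FROM posts p
JOIN term_relationships tr ON tr.post_id = p.id
JOIN terms t ON t.id = tr.term_id
WHERE t.name = 'red'
ORDER BY p.id
""").fetchall()
print(rows)  # [('Widget',), ('Gadget',)]
```

The same query against a meta table has to filter a table shared by every plugin’s data, which is where the performance gap at scale comes from.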


Thanks for the replies, everyone. I appreciate your thoughts. If you’re new to the thread and have something to contribute, please add to the conversation.

At this point, I’m considering building a “developers plugin” for CP v2.0 (because PHP 8.x). The concept is simple enough: provide an add-on library, something like normalizations.php, that supplies a new class such as cpdb (since the class name wpdb is already taken), utilizing functions, hooks, and methods semantically similar to wpdb’s. IMO, with 3NF/ECNF compliance as the goal for v1.0, the complexity of this undertaking is greatly reduced compared to, say, 5NF. Those who are okay with 1NF normalization in their plugins need not consider it. Those who want to take advantage of enhanced normalization can do so and enjoy a familiar, standardized toolset that provides the sanitizing and escaping features they have come to expect.
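To make the shape of that idea concrete, here is a minimal, purely hypothetical sketch of such a wrapper class, in Python for illustration (cpdb, its methods, and the table layout are the concept from this post, not an existing API):

```python
import sqlite3

class CPDB:
    """Hypothetical cpdb-style wrapper: all access goes through
    parameterized helpers, so callers get escaping 'for free', as with wpdb."""

    def __init__(self, dsn=":memory:"):
        self.con = sqlite3.connect(dsn)

    def insert(self, table, data):
        # Identifiers (table/column names) can't be parameterized; a real
        # implementation must whitelist them against the known schema
        cols = ", ".join(data)
        marks = ", ".join("?" for _ in data)   # placeholders, never raw values
        self.con.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                         tuple(data.values()))
        self.con.commit()

    def get_row(self, table, where_col, value):
        return self.con.execute(
            f"SELECT * FROM {table} WHERE {where_col} = ?", (value,)
        ).fetchone()

db = CPDB()
db.con.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.insert("authors", {"id": 1, "name": "Ada"})
print(db.get_row("authors", "id", 1))  # (1, 'Ada')
```

The normalization guarantees themselves would live in the schema the library ships and migrates, with the class merely enforcing that all reads and writes go through it.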

Sidebar: a standardized 3NF library could help reignite interest in ClassicCommerce (development appears to have stalled for about a year now on GitHub). It might also provide a foundation for a distinct ClassicBuddy project.

Thoughts on this approach?

Both SQLite3 and MariaDB have in-memory features; SQLite works best with smaller datasets, MariaDB with larger ones. These features could be exploited on the fly to really optimize the performance of read operations. If SQLite3 isn’t installed, the logic just gets simpler. IMO, we need at least 3NF to really benefit from a design like this.
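As a toy illustration of the in-memory read-cache idea, Python’s sqlite3 can snapshot a durable store into a RAM-only database (here a second SQLite connection stands in for the MariaDB side):

```python
import sqlite3

durable = sqlite3.connect(":memory:")  # stands in for the on-disk/MariaDB store
durable.execute("CREATE TABLE hits (page TEXT, n INTEGER)")
durable.execute("INSERT INTO hits VALUES ('/home', 42)")
durable.commit()

# Snapshot the durable store into a RAM-only database for fast reads;
# writes would still go to the durable side, with periodic re-snapshots
cache = sqlite3.connect(":memory:")
durable.backup(cache)

row = cache.execute("SELECT n FROM hits WHERE page = '/home'").fetchone()
print(row)  # (42,)
```

The hard part, of course, is invalidation: deciding when the in-memory copy is stale is where such a scheme earns or loses its keep.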

I have reservations about using SQLite3 for persisting data on public-facing applications. I love using it for “local copy” apps, with or without sync to remote/cloud apps.

When the Classic Commerce maintainer left, we found no one interested enough to continue active development. There is a need for CC, though. People do use it. Any help with it is appreciated.

ClassicPress is a great option for developers to fully control their website or app without needing a drag-and-drop website builder to drag performance down. Making CP more useful to developers, with improved performance, is definitely something we can get behind.

More thoughts on SQLite integration.

On a web server, when processing sensitive information such as PII, SQLite is never an option. However, for more mundane data, I can see a use case for running SQLite in conjunction with MariaDB.

Theoretically, MariaDB and SQLite could run a relay race together that processes that data faster. MariaDB goes the distance while SQLite takes the baton for the sprints; back and forth they go.

I suggest this because MariaDB has features (LOAD DATA INFILE and SELECT … INTO OUTFILE) with which she can read or write delimited datasets (CSV etc.) natively, while SQLite can work with CSV virtual tables. I am supposing that mapping one’s output to the other’s input on the fly can be solved programmatically. I’ve never tried this, but I might have to now.
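A rough sketch of that relay, with Python’s csv module standing in for SELECT … INTO OUTFILE on one side and for the CSV virtual table on the other (both engines are simulated with SQLite here, so this only illustrates the data flow, not the real MariaDB features):

```python
import csv
import io
import sqlite3

src = sqlite3.connect(":memory:")  # stands in for MariaDB
src.execute("CREATE TABLE sales (sku TEXT, qty INTEGER)")
src.executemany("INSERT INTO sales VALUES (?, ?)", [("A-1", 3), ("B-2", 5)])

# The "SELECT ... INTO OUTFILE" leg: dump the result set as CSV
buf = io.StringIO()
csv.writer(buf).writerows(src.execute("SELECT sku, qty FROM sales"))

# The "CSV virtual table" leg: load the CSV into the second engine
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE sales (sku TEXT, qty INTEGER)")
buf.seek(0)
dst.executemany("INSERT INTO sales VALUES (?, ?)", csv.reader(buf))

total = dst.execute("SELECT SUM(qty) FROM sales").fetchone()[0]
print(total)  # 8
```

In a real deployment the CSV hop would be a file (or pipe) that both engines can see, and the open question is whether the serialization overhead eats the gains.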

When you combine this idea with the in-memory features that I previously mentioned, we get close to a performance solution that neither could deliver alone.

Perhaps this, or something like it, is WP’s motivation for looking into SQLite? Or maybe they just want their plugin developers to stop piling everything into the same database the core uses, in order to alleviate performance issues caused by Gutenberg’s blocks.

and

Can you elaborate on these statements and/or point me in a direction?

Tom Nowell is the guru on this. See the following:


Thank you


Then there are Redis and PostgreSQL, but I’m not sure how compatible they are with CP2. SQLite is a great option and I use it very often, but outside of CP/WP.

In general I try to give CPTs minimal extension coverage: meta data and user data at most. If you can get away without cats and tags, that makes the spread even smaller, and join calls are not needed as much.

So yes, it is all in the design of the plugin. Building a custom table is just as taxing as using postmeta and usermeta if the custom table uses the meta in the same manner that posts/users data is used. This is why Woo uses dbDelta table creation, so that it can handle UNIQUE datasets. If the data is not unique, you really should not worry too much about database speed. If you do worry about it, then see the first sentence. LOL

First and last sentence. I’ve been looking at drop-in normalization solutions that work for site admins and do not require new dependencies. Vanilla WP/CP means MariaDB, or at the very least MySQL. So that’s our playground.

Agreed. I was just going down a thought corridor with my previous comments.

That’s the argument at the heart of my thoughts. Yes, at smaller scales the performance is about equal, but as the data set grows, normalization means something.

I’m very much inclined towards that “developer’s plugin” I mentioned, targeting 3NF for v1 and 5NF for v2. It’s a paradigm shift, though, and one that could distinguish CC, and ultimately CP, from the competition.

If our normalized solution utilized hot-swapped MariaDB views, or persistent database queries, with the appropriate names, data, and fields, then we could do more things. For instance, in CP v2 the cp_users table has the field user_pass varchar(60). If we implemented the normalized DB schema I am suggesting, then plugin developers could start moving towards NIST, CIS, etc. compliance. At the same time, we can maintain backwards compatibility, because MariaDB allows us to write to tables through its views, assuming all the “not null” data is provided at the time of the commit. That’s a feature the competition can’t claim. Thoughts?
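The writable-view trick can be sketched as follows. MariaDB views over a single base table are directly writable; SQLite’s are not, so this illustration uses an INSTEAD OF trigger to play the same role (the normalized split of cp_users below is hypothetical, invented for the sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Normalized storage behind the scenes (hypothetical split of cp_users)
CREATE TABLE user_core (id INTEGER PRIMARY KEY, login TEXT NOT NULL);
CREATE TABLE user_auth (user_id INTEGER PRIMARY KEY, pass_hash TEXT NOT NULL);

-- Legacy-shaped view keeps old code working against the new schema
CREATE VIEW cp_users AS
SELECT c.id AS ID, c.login AS user_login, a.pass_hash AS user_pass
FROM user_core c JOIN user_auth a ON a.user_id = c.id;

-- In MariaDB the view itself could accept writes; SQLite needs an
-- INSTEAD OF trigger to route them to the normalized tables
CREATE TRIGGER cp_users_ins INSTEAD OF INSERT ON cp_users
BEGIN
    INSERT INTO user_core (id, login) VALUES (NEW.ID, NEW.user_login);
    INSERT INTO user_auth (user_id, pass_hash) VALUES (NEW.ID, NEW.user_pass);
END;
""")

# Legacy-style write against the old table shape still works
cur.execute("INSERT INTO cp_users (ID, user_login, user_pass) "
            "VALUES (1, 'admin', '$2y$10$placeholderhash')")
row = cur.execute("SELECT user_login FROM cp_users WHERE ID = 1").fetchone()
print(row)  # ('admin',)
```

As noted, the write succeeds only when all NOT NULL columns behind the view receive values, which is exactly the constraint mentioned above.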

