Application design: Data-driven vs Domain-driven

Information technology is developing by leaps and bounds. There are new devices, platforms, operating systems, and a growing range of problems, which need to be solved by developers.

But, it’s not so bad—new development tools, IDEs, new programming languages, methodologies, etc., rush to help programmers. The list of programming paradigms is impressive, and with the modern multi-paradigm PL (e.g., C#), it is reasonable to ask: «What is the best way to handle it? What to choose?»

Let’s try to figure this answer out.

From where did so many paradigms come?

In fact, the answer has already been stated in this post—different types of tasks are now easier and faster to solve, using the appropriate paradigm. Accordingly, new types of problems have appeared with the IT evolution (or the old ones become relevant), and solving them using the old approach is not suitable and it is inconvenient, which resulted in rethinking things and the development of new techniques.

What to choose?

Everything depends on what is required. It is worth noting that all the development tools are different. For example, PHP with a «standard» set of modules does not support aspect-oriented programming. Therefore, the choice of methodology is quite closely linked to the development platform. Oh, and do not forget that you can combine different approaches, which leads us to the choice of stacking paradigms.

For paradigm categorization, I used to use four dimensions that are inherent in almost any task:

Data
Any program somehow works with data: stores, processes, analyzes, reports.

Actions
Any program should do something—the action is usually connected with the data.

Logic
Logic or business logic defines the rules that govern the data and actions. Without the program, logic does not make sense.

Interface
How the program interacts with the outside world.

We can go further and get deeper into this idea to come up with quality characteristics for these four measures, creating strict rules and adding in a little math, but this is perhaps a topic for another post. I think most system architects determine the characteristics of the data for a specific task on the basis of their knowledge and experience.

Once you analyze your problem to these four dimensions, it is likely that you will see that a certain dimension is expressed stronger than the other. And this in turn will determine the programming paradigm, as they usually focus on some singular dimension.

Consider this example

Orientation to the data (Data-driven design)

Data is in the consideration, rather than how they are related to each other

Types of suitable applications:

1. Grabbers/crawlers (collect data from different sources, save somewhere)
Various admin interfaces to databases; everything with a lot of simple CRUD operations.
2. Cases, when the resource is already defined, for example, it requires the development of a program and the database already exists and you can’t change the schema of the data. In this case, it may be easier to focus on what is already done, rather than creating additional wrappers over data and data access layers. Using an ORM often leads to data-driven, but it is impossible to say in advance if it is good or bad (see below).

Orientation to actions—imperative approaches to development

Event-driven Programming, Aspect-oriented Programming, etc.

Orientation to logic: Domain-driven design (DDD) and everything connected with it

Here we have an important subject area task. We pay attention to the modeling of objects, analysis of the relationships and dependencies. This is mainly used in business applications, and it is a declarative approach, and partly functional programming (tasks that are well described by mathematical formulas) is a part of DDD as well.

Orientation on the interface

Used when it is first important as the program interacts with the outside world.
An applications development with a focus only on the interface is a situation that is quite rare. Although some of the books I’ve read mention the fact that such an approach was considered seriously, and is based on the user interface, meaning it takes what the user sees directly and, on this basis, designs the data structures and everything else.

Orientation to the user interface in business applications is often manifested indirectly. For example, the user wants to see the specific data that is difficult to obtain due to what additional structures the architecture acquires (e.g., forced redundancy data). Formally, event-driven programming is included here.

What about real life?

Based on my experience, I can say that two approaches are indicated: focus on data (data-driven) and focus on logic (domain-driven). In fact, they are competing methodologies, but in practice can be combined in symbiosis, which is often known as anti-patterns.

One of the advantages of data-driven over domain-driven is the ease of use and implementation. Therefore, data-driven is used where it is necessary to apply the domain-driven (and often this happens unconsciously). Problems arise from the fact that the data-driven is hardly compatible with the concepts of object-oriented programming (of course, if you do use OOP). In small applications, these problems are almost invisible. In medium-sized applications, these problems are already visible and begin to lead to anti-patterns. On major projects, problems become serious and require appropriate action.

In turn, domain-driven wins on major projects, but on small, it complicates the solution and requires more resources for development, which is often critical in terms of business requirements (to bring the project to the market «asap», for a small budget).

To understand the differences in the approaches, consider a more concrete example. Suppose we want to develop a system of accounting for sales orders. We have things such as:

1. Product
2. Customer
3. Quote
4. Sales Order
5. Invoice
6. Purchase Order
7. Bill

Deciding what the scope is at a glance, we begin to design the database. Create the appropriate tables, run the ORM, generate essential classes (well, in the case of a smart ORM, put this scheme somewhere separately, for example, XML, and generate a database and essential classes). Finally, we get an independent class for each essence. Enjoy life; it’s easy and simple to work with objects.

Time passes, and we need to add additional logic in the program, for example, to find the products with the highest price. There may already be a problem if your ORM does not support external communication (i.e., essential classes do not know anything about the context of the data). In this case, it is necessary to create a service, which is a method, to return the suitable product for the order. But our good ORM can work with external relations, and we simply add a method to the class order. Enjoy life again; the goal is achieved, the method is added in a class, and we have almost the real OOP.

Time passes, and we need to add the same method for the quote, for the invoice, and for other similar entities. What to do? We can simply add this method in all classes, but it will, in fact, code duplication and backfire with the support and testing. We want to avoid complicating and simply copy methods in all classes. Then there are similar methods, and the essential classes begin to swell with the same code.

Time passes, and there is a logic that can’t be described by the external connections in the database. In this case, there is no way to place it in an essential class. We begin to create services that perform these functions. As a result, we find that the business logic is scattered by essential classes and services, and understanding where to look for the correct method is becoming increasingly difficult. Decide to refactor and to move out repetitive code in services—highlight common functionality into the interface (for example, make the interface IProductable, i.e., something that contains products). Services can work with these interfaces, thereby winning a little in abstraction. But it does not fundamentally solve the problem; we get more methods in the services and solutions for the unity of the painting techniques to transfer all the essential services in the classes. Now we know where to look for methods, but our essential classes lost any logic, and we got the so-called «anemic model.»

At this stage, we are completely gone from the concept of OOP—the object stores only the data, all the logic is in separate classes, and there is no encapsulation and no inheritance.

It is worth noting that this is not as bad as it may seem—nothing prevents implementing unit testing and the development through testing (TDD) to integrate dependency management patterns (IoC, DI), etc.. In short, we can live with it. Problems arise when the application will grow large—when we get so many entities that it’s unrealistic to keep in mind. In this case, the support and development of such an application would be a problem.

As you have probably guessed, this scenario describes the use of the data-driven approach and its problems.
In the case of domain-driven, we would proceed as follows. Firstly, there is no database designing in the first stage. We would need to carefully analyze the problem domain context, model it, and move on to OOP language.

For example, we can create an abstract model of the document, which would have a set of basic properties. Inherit from this a document that has the products, to inherit from this a «payment» document, with the price and billing address, and so on. With this approach, it’s pretty easy to add a method that finds the most expensive product—we just add it to the appropriate base class.

As a result, the scope of the problem will be described using the OPP to the fullest.
But there are obvious problems: how to store data in the database? Actually, it will require the creation of a function for mapping data from models to the fields in the database. Such a mapper can be quite complex, and when you change models, you also need to change the mapper.

Moreover, you are not immune from errors in the modeling, which can lead to complex refactoring.

Summary:
Data-driven vs Domain-driven

Data-driven

Pros

1. Allows you to quickly develop an application or prototype
2. Convenient to design (code generation, scheme, etc.)
3. Can be a good solution for small or medium-sized projects

Cons

1. Can lead to anti-patterns and loss of the OOP
2. Leads to chaos on large projects, complex support, etc.

Domain-driven

Pros

1. Use the power of OOP
2. Allows you to control the complexity of the scope (domain)
3. There are a number of advantages that are not described in the article, for example, the creation of the domain of language and the use of BDD
4. Provides a powerful tool for developing complex and large solutions

Cons

1. Requires significantly more resources for the development, which leads to greater solutions cost
2. Certain parts are becoming harder to support (mapper data, etc.)

So, what the hell should I choose?

Unfortunately, there is no single answer. Analyze your problem, resources, prospects, goals, and objectives. The right choice is always a compromise.