An ORM for C?

Introduction

Hanging out on StackOverflow I was interested by this question and then disappointed by answers that argued:

Taking each of these in turn:

Hence my disappointment. So below I’m going to sketch what I think an ORM for C could look like.

Note: I was just exploring the space here. I hadn’t thought about this before and, it turns out, I didn’t get that far.

Basic Requirements

This is what would have helped me recently:

I think these are all fairly self-explanatory. Maybe the efficiency issue is controversial, but it reduces scope for an initial project and addresses my own needs.

Initial Technology Choices

Since we’re working in C, this has to be layered over ODBC - it’s the recognised standard.

As discussed earlier, the lack of RTTI suggests that we need a pre-runtime solution. And that implies generating code. Which doesn’t fill me with pleasure, I must admit. But I can’t see a way around this, so will roll with the punches, make lemonade, etc, etc…

The need to work with existing structs implies parsing C code (to read those structs). It also, likely, implies talking to a database and constructing C code via a template engine or similar. How should all that be implemented? Using C would require fewer additional tools (since the target is C), but if I am implementing it I’d prefer something I can work more quickly in. Python might be a good choice - it’s widely available and has tools like pycparser and a pile of templating engines.
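To make the code-generation idea a little more concrete, here is a rough sketch of what a generator might emit for a single struct: a table describing each field by name, type and offset. All of the names here (srm_type, srm_field, struct_a_fields, srm_find_field) are invented for illustration, and I am assuming that field names match column names; struct_a is the example struct used later in this post.

```c
#include <stddef.h>
#include <string.h>

/* The user's existing struct (the same example used later in the post). */
typedef struct {
    int id;
    int foo;
    char *bar;
} struct_a;

/* Hypothetical: the kinds of column the generated code understands. */
typedef enum { SRM_INT, SRM_STRING } srm_type;

/* Hypothetical: one entry per struct field, emitted by the generator. */
typedef struct {
    const char *name;   /* field name (assumed equal to the column name) */
    srm_type type;
    size_t offset;      /* byte offset of the field within the struct */
} srm_field;

static const srm_field struct_a_fields[] = {
    {"id",  SRM_INT,    offsetof(struct_a, id)},
    {"foo", SRM_INT,    offsetof(struct_a, foo)},
    {"bar", SRM_STRING, offsetof(struct_a, bar)},
};

#define STRUCT_A_NFIELDS (sizeof struct_a_fields / sizeof struct_a_fields[0])

/* Look up a field by name; returns NULL if there is no such field. */
static const srm_field *srm_find_field(const srm_field *fields, size_t n,
                                       const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].name, name) == 0) return &fields[i];
    }
    return NULL;
}
```

A table like this stands in for the RTTI that C lacks: generic library code can walk it to build SQL and to copy column values into the right place in the struct.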

Foreign Keys

How do we handle the relationship between pointers to other structs and foreign keys? The requirements give part of the answer - we don’t retrieve connected objects - but leave open how we store the information so that related values can be retrieved efficiently later.

A “dirty” solution might store foreign keys in the pointer itself. I think that would work (for integer keys on x86_64 architectures, at least), but it likely offends something in the ISO spec and/or places an unclear limit on the values of keys.
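For the curious, the “dirty” approach might look something like the sketch below. The helper names are hypothetical, and it assumes that uintptr_t exists and is 64 bits wide (as on x86_64), that keys fit in 32 bits, and that real struct pointers are at least 2-byte aligned so their low bit is always clear. Converting an integer to a pointer gives an implementation-defined result in ISO C, which is exactly the portability worry.

```c
#include <stdint.h>

typedef struct struct_a struct_a;  /* opaque here; only the pointer matters */

/* Hypothetical: encode a key as a fake pointer, tagged by setting the low bit.
   Assumes uintptr_t is wider than 32 bits so the shift loses nothing. */
static struct_a *srm_key_to_ptr(uint32_t key) {
    return (struct_a *)(((uintptr_t)key << 1) | 1u);
}

/* True if the "pointer" is really a tagged key rather than an address. */
static int srm_ptr_is_key(const struct_a *p) {
    return ((uintptr_t)p & 1u) != 0;
}

/* Recover the key; only meaningful when srm_ptr_is_key(p) is true. */
static uint32_t srm_ptr_to_key(const struct_a *p) {
    return (uint32_t)((uintptr_t)p >> 1);
}
```

Every piece of code that touches the field would have to check the tag before dereferencing, which is part of why I don’t like this option.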

Another solution would be to have a separate cache for this information. We could use the address of the struct in memory as a key. But how do we manage the lifetime of this information? This seems like it would be complex and intrusive (eg. requiring a custom free()).

For guidance, I have looked at my own project. In general, because I was having to deal with the database by hand, I stored related keys rather than nested structs. That seems like a simple solution that avoids what is otherwise a hard problem.

But there’s another possible solution too, because most structs also store their own key, and that’s to support, in the API, a request for the related struct (or structs). This has a cost (a join) that could be avoided by some of the (rejected) ideas above, but it fits with the requirements and feels like it would make a “comfortable” API.

So we will not map nested / related structs. You can also have a pointer to a nested struct, and populate it from the key, or from the key of the struct you already have, plus the type. But we won’t have any “magic” for making links work.
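As a sketch of what this means in practice: the struct stores the related key as a plain field, and any pointer to the related struct stays NULL until the caller explicitly asks for it to be filled in. The structs and the helpers lookup_a and populate_a are hypothetical, and an in-memory array stands in for the database.

```c
#include <stddef.h>

typedef struct {
    int id;
    int foo;
} struct_a;

typedef struct {
    int id;
    int a_id;       /* foreign key: the id of the related struct_a */
    struct_a *a;    /* optional; NULL until the caller asks for it */
} struct_b;

/* Stand-in for a database lookup: search an in-memory array by id. */
static struct_a *lookup_a(struct_a *table, size_t n, int id) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].id == id) return &table[i];
    }
    return NULL;
}

/* Explicit, non-magic link: fill b->a from b->a_id.
   Returns 0 on success, non-zero if no matching struct_a exists. */
static int populate_a(struct_b *b, struct_a *table, size_t n) {
    b->a = lookup_a(table, n, b->a_id);
    return b->a != NULL ? 0 : 1;
}
```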

Relationships

Implicitly above I was considering one-to-one relationships. But many different relationships are possible:

There seems to be a simplification here, because C makes little distinction between 1 and many pointers, so a single API may be able to handle multiple cases.
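A minimal sketch of that simplification, assuming results come back as a pointer plus a count: a one-to-one relation is then just the case where the count is 1. The function related_bs is hypothetical and, again, an in-memory array stands in for the database.

```c
#include <stdlib.h>

typedef struct {
    int id;
    int a_id;   /* key of the related struct_a */
} struct_b;

/* Collect every struct_b whose a_id matches. On success returns 0 and sets
   *result (malloc'd, caller frees; NULL if nothing matched) and *n.
   Returns non-zero if allocation fails. */
static int related_bs(const struct_b *table, size_t n_table, int a_id,
                      struct_b **result, int *n) {
    size_t count = 0;
    *result = NULL;
    *n = 0;
    for (size_t i = 0; i < n_table; i++)
        if (table[i].a_id == a_id) count++;
    if (count == 0) return 0;

    struct_b *out = malloc(count * sizeof *out);
    if (out == NULL) return 1;
    size_t k = 0;
    for (size_t i = 0; i < n_table; i++)
        if (table[i].a_id == a_id) out[k++] = table[i];
    *result = out;
    *n = (int)count;
    return 0;
}
```

The caller uses result[0] or result[i] as appropriate; the function itself does not need to know which kind of relationship it is serving.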

Tentative API

Given the above, it’s probably worth making a strawman API. For error handling all routines will return an int, 0 for success. The database connection itself, any options, etc, will be stored via an ADT (opaque pointer - I’m exposing pointers so that const is useful):

(Please excuse my C-like pseudocode).

typedef struct SRM SRM;
struct SRM {
    /* a function pointer, so that it can be called as srm->free(...) */
    int (*free)(SRM **srm, int status);
    ...
};
int srm_open(SRM **srm, const char *db_url);
const char *srm_error(int error);

Here SRM provides a namespace for other operations. It also provides names for structures. In the examples below I will use struct_a etc as arbitrary structures. These are already known (the library code is generated at compile time) and named via, for example, SRM.struct_a.

The status parameter for free() allows simple chaining of the status: if the incoming value is non-zero it is returned unchanged; if it is 0 (OK), any error that occurs while closing is returned instead.

Passing SRM ** to free() allows the caller's pointer to the ADT to be nulled. Is this worth the non-intuitive API?
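Here is a minimal sketch of how such a free() could behave. The close_error field and the srm_new() constructor are hypothetical, added only so that the example can simulate a failure while closing; a real implementation would close the database connection instead.

```c
#include <stdlib.h>

typedef struct SRM SRM;
struct SRM {
    int (*free)(SRM **srm, int status);
    int close_error;   /* hypothetical: simulated result of closing */
};

/* Release the ADT, null the caller's pointer, and chain the status:
   an earlier error wins, otherwise any error from closing is returned. */
static int srm_free_impl(SRM **srm, int status) {
    if (srm == NULL || *srm == NULL) return status;
    int close_status = (*srm)->close_error;  /* stand-in for a real close */
    free(*srm);
    *srm = NULL;
    return status != 0 ? status : close_status;
}

/* Hypothetical constructor, for this example only. */
static SRM *srm_new(int close_error) {
    SRM *srm = malloc(sizeof *srm);
    if (srm != NULL) {
        srm->free = srm_free_impl;
        srm->close_error = close_error;
    }
    return srm;
}
```

With this shape, cleanup code can always write status = srm->free(&srm, status) without losing the original error, and a second call on the nulled pointer is harmless.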

Instance Retrieval

int find(SRM *srm, const char *name, void **result, int *n, ...);

    typedef struct {
        int id;
        int foo;
        char *bar;
    } struct_a;

    SRM *srm = NULL;
    struct_a *a = NULL;  /* must start as NULL: exit: tests it even if srm_open fails */
    int n;

    int status;
    if ((status = srm_open(&srm, "mysql://...."))) goto exit;
    ...
    if ((status = srm->find(srm, srm->struct_a, &a, &n, "foo", 42, NULL))) goto exit;
    printf("retrieved %d instances where foo=42\n", n);

exit:
    if (a) free(a);
    if (srm) status = srm->free(&srm, status);
    return status;

Here the varargs name and provide values for fields in the struct. The type must match the type in the struct, which is going to lead to errors. Perhaps there should be specialised versions with explicit arguments for small structs? Or for larger structs with pre-selected fields?
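To show where the type problem comes from, here is a sketch of how the library side might read those varargs: it has to look up each field name to decide which type to pull from the argument list. The metadata table and all names are hypothetical (what a generator might emit for struct_a), and only int and string fields are handled.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

typedef enum { SRM_INT, SRM_STRING } srm_type;
typedef struct { const char *name; srm_type type; } srm_field;

/* Hypothetical generated metadata for struct_a (id, foo, bar). */
static const srm_field struct_a_fields[] = {
    {"id", SRM_INT}, {"foo", SRM_INT}, {"bar", SRM_STRING},
};

/* One parsed "field = value" constraint. */
typedef struct {
    const srm_field *field;
    int i;            /* value when field->type == SRM_INT */
    const char *s;    /* value when field->type == SRM_STRING */
} srm_constraint;

/* Read (name, value) pairs until a NULL name. Returns the number parsed,
   or -1 for an unknown field or more than max constraints. */
static int srm_parse_constraints(srm_constraint *out, int max, ...) {
    size_t nfields = sizeof struct_a_fields / sizeof struct_a_fields[0];
    va_list ap;
    int count = 0;
    va_start(ap, max);
    for (;;) {
        const char *name = va_arg(ap, char *);
        if (name == NULL) break;
        const srm_field *f = NULL;
        for (size_t k = 0; k < nfields; k++)
            if (strcmp(struct_a_fields[k].name, name) == 0) f = &struct_a_fields[k];
        if (f == NULL || count == max) { count = -1; break; }
        out[count].field = f;
        out[count].i = 0;
        out[count].s = NULL;
        /* The type read here is chosen by the metadata, not by the caller. */
        if (f->type == SRM_INT) out[count].i = va_arg(ap, int);
        else out[count].s = va_arg(ap, char *);
        count++;
    }
    va_end(ap);
    return count;
}
```

A call looks like srm_parse_constraints(c, 4, "foo", 42, (char *)NULL). Nothing stops a caller passing 42L or 42.0 for an int field, which is undefined behaviour the compiler cannot catch. The terminator is also safer written with a cast, since a bare NULL may be a plain integer 0 on some platforms.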

Do we need a constraint on the maximum number returned? Would it be better to have a specialised version that raises an error if it doesn’t return a single value?
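One cheap way to get the single-value behaviour, sketched below, is a helper that converts the status and count from a general lookup into a single-row result. The error codes and the name srm_expect_one are made up for illustration; only the convention that 0 means success comes from the API above.

```c
/* Hypothetical error codes; 0 is success, as elsewhere in this post. */
enum { SRM_OK = 0, SRM_ERR_NONE_FOUND = 1001, SRM_ERR_NOT_UNIQUE = 1002 };

/* Given the status and count from a multi-row lookup, require exactly one
   row. An earlier error is passed through unchanged. */
static int srm_expect_one(int status, int n) {
    if (status != SRM_OK) return status;
    if (n == 0) return SRM_ERR_NONE_FOUND;
    if (n > 1) return SRM_ERR_NOT_UNIQUE;
    return SRM_OK;
}
```

This still retrieves every matching row before complaining, so a real single-value version would probably also want to limit the query itself.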

Would it be better to have separate functions (srm->find_struct_a etc) rather than the name parameter? I suspect the name parameter will reappear on other functions, so it is reused in a way that per-struct functions would not be (reducing the total number of components in the API).

Similarly, should names and functions be namespaced to avoid conflicts? For example, srm->f.find and srm->n.struct_a might be used.

int related(SRM *srm, const char *from, void *from_id, const char *to, void **result, int *n);

    if ((status = srm->related(srm, srm->struct_a, &a->id, srm->struct_b, &b, &n))) goto exit;

Can we automatically handle the different types of relations listed earlier at compile time by inspecting the database? Is that too clever? Are ambiguities likely to be important?

Questions about size limits also apply here.

We could add varargs constraints…

Out of Time / Conclusions

OK, I’m out of time. I think I’ll publish this as it is and see if there are any useful comments.

It seems to me that there is useful functionality here, but the API is already pretty complex. And the smarts needed to implement it at compile time are significant.

