Auto-incrementing Identifiers in LDAP

The LDAP protocol was designed for a subset of all database tasks. Frequently, developers run into a requirement that is satisfied in other database systems but is not addressed in LDAP. One of those pernicious gaps is the ability to auto-increment the identifiers for a class of entry. Whereas you can tell an RDBMS to generate keys for new rows and add data without concern, the LDAP model expects you to explicitly identify whatever entries you create. The desire to overcome this issue is expressed so often on the OpenLDAP mailing list that many of the project's veterans have (sadly) grown weary of even referring others to information on recommended solutions. On one such occasion, I described my own algorithm for an application-layer solution to the problem. Nobody gave me any negative feedback, so when the time came to use auto-incrementing identifiers in my weblog example, I implemented it. The next time somebody asked, I pointed them to my code. The following thread with senior OpenLDAP developer Howard Chu resulted:

Jon serves
Howard spikes
Jon volleys
Howard concedes point, takes match

In a nutshell, I chose to keep track of the incremented identifier in memory while Howard recommended keeping this value in the database. I came away from the exchange thoughtful yet confident, but it wasn't long before I found a flaw in my approach. Between the time a new identifier is inferred and the time the new entry is created, there is always a possibility that something can go wrong. For my weblog, it was a simple matter of data validation, but this state could also arise from any number of reasonable causes including an unavailable LDAP server or a mistyped password. The net effect is that some valid identifiers get skipped under the assumption that they've already been used. This shortcoming is relevant to both approaches, but with my algorithm these gaps spell disaster: when memory is reset and the routine is invoked to again infer the highest identifier value, it can't be reliably done without examining the identifiers for all existing entries. If the value is persistent in the directory database, it doesn't make a difference unless you're anal and the idea of non-consecutive identifiers bugs you.
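To make the gap problem concrete, here is a minimal sketch. It does not reproduce the original getNextIdentifier() routine, which is not shown here; inferByProbing is a hypothetical shortcut that assumes identifiers are consecutive, and inferByFullScan is the expensive alternative that looks at every entry. The class name, method names, and sample identifiers are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class GapSketch {

    /**
     * Hypothetical shortcut: probe upward from 1 and stop at the first
     * unused identifier. Only correct if no identifier was ever skipped.
     */
    static int inferByProbing(Set<Integer> existing) {
        int candidate = 1;
        while (existing.contains(candidate)) {
            candidate++;
        }
        return candidate - 1; // highest identifier the shortcut believes is in use
    }

    /** Reliable but expensive: examine the identifier of every existing entry. */
    static int inferByFullScan(Set<Integer> existing) {
        int highest = 0;
        for (int id : existing) {
            highest = Math.max(highest, id);
        }
        return highest;
    }

    public static void main(String[] args) {
        // Identifier 4 was handed out, but creating its entry failed, leaving a gap.
        Set<Integer> existing = new TreeSet<>(Arrays.asList(1, 2, 3, 5));

        System.out.println(inferByProbing(existing));  // 3
        System.out.println(inferByFullScan(existing)); // 5
    }
}
```

After a restart, the shortcut would hand out 4 and then 5, and 5 collides with an entry that already exists. The full scan avoids the collision, but only by reading every identifier.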

I could foolproof my algorithm by checking the validity of a new identifier before using it (I actually did so for posterity), but the new performance tax of verifying identifiers made the Howard-approved distributed technique too attractive. The real reason I was against storing the value in the directory was that I was intent on minimizing the constraints of my offering by avoiding any assumptions about particular object classes, attributes, or access control rules. I thought storing the incrementor in the directory would entail all three, but after some thought I realized that wasn't the case. I liked the new solution so much that, instead of just putting it in the example, I baked it into my framework. The following versions of my weblog comment class will both work.

Original version

Final version

You will note that the original version sets the new identifier in the preCreate() method using an extensive and expensive private routine, getNextIdentifier(). The final version replaces this with a single call:

setIncremental("CJBA", "c=CJBAcounter, ou=Anonymous, ou=Comments, ou=Expressions,", "description");

Now all you need to do to make an LDAPHttp object use auto-incrementing is to put a counter entry in your database and add a line like this. The method call lets you specify the counter's distinguished name and attribute to be whatever you want. An access control rule may be necessary, but if you put the counter entry in the same base as the entries you're creating, the authority to update the counter can come with the same rule that already gives users authority to create entries in the first place. The original version may still be faster, especially when lots of entries are created in rapid succession, but as Howard points out, the final version has the major advantage of working in a distributed environment.
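How setIncremental() updates the counter internally is not shown here. One common way to use a counter entry safely from several clients is an optimistic read-then-replace loop: read the current value, then issue a single modify that deletes that exact value and adds the next one, retrying if someone else got there first. The sketch below assumes that technique and simulates it in memory; CounterStore, InMemoryStore, and nextIdentifier are hypothetical names, and replaceIfUnchanged stands in for the LDAP modify rather than calling a real directory.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {

    /** Hypothetical stand-in for the counter entry kept in the directory. */
    interface CounterStore {
        long read();

        /**
         * Mimics one modify request that deletes the expected value and adds
         * the next one. Returns false if the stored value has changed.
         */
        boolean replaceIfUnchanged(long expected, long next);
    }

    /** In-memory simulation so the sketch runs without an LDAP server. */
    static class InMemoryStore implements CounterStore {
        private final AtomicLong value;

        InMemoryStore(long initial) {
            value = new AtomicLong(initial);
        }

        public long read() {
            return value.get();
        }

        public boolean replaceIfUnchanged(long expected, long next) {
            return value.compareAndSet(expected, next);
        }
    }

    /** Claims the next identifier, retrying when another client wins the race. */
    static long nextIdentifier(CounterStore store, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long current = store.read();
            long next = current + 1;
            if (store.replaceIfUnchanged(current, next)) {
                return next; // the counter now records this identifier as taken
            }
        }
        throw new IllegalStateException(
                "counter contended; gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        CounterStore store = new InMemoryStore(41);
        System.out.println(nextIdentifier(store, 5)); // 42
        System.out.println(nextIdentifier(store, 5)); // 43
    }
}
```

Because the counter is advanced before the new entry is created, a failed creation still leaves a gap. With the value kept in the store, though, a restart never requires re-inferring the highest identifier.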

I was wrong. Admitting you made a mistake also means you are now wiser. Best practices can be challenged, but their existence presumes that they have already withstood challenge. I am hosting both algorithms because I don't think there is one right answer to this problem, but I accept with all humility that my idea was indeed inferior. My thanks to Howard Chu for taking the time to engage me and help me see the light. May this be the way in the open source community at large, so that we can all grow wiser.


© 2003 Mentata Systems. All rights reserved.