A Simple Design Pattern for Clarity

I hate opaque flow control conditions: you know, “if” clauses with impenetrable conditions that defy reason without clarifying comments or extensive crib notes. The scenario is very common and not easily avoided. In these cases, I tend to wrap the condition in a simple method (one that adds no behaviour), whose name explains what the condition is testing for.
Here’s a very simple example:


if (usertype == "2" || usertype == "4" || usertype == "6")


It isn’t clear what this condition is testing for. In fact, it’s checking whether the user type belongs to a particular category of user, in this case a “business” user. So the pattern is just to wrap the condition inside a simple method, like so:


if (userIsBusinessType(usertype))


The new method is simplicity itself and could, if needed, be documented with a more extensive description of the condition being tested.

private static bool userIsBusinessType(string usertype)
{
    return usertype == "2" || usertype == "4" || usertype == "6";
}

Superficially, this is an extremely simple pattern, but few would dispute that it provides better clarity. Importantly, it typically costs nothing at run time: we’ve added no behaviour to the condition, only an abstraction, and a method this small is a prime candidate for being inlined by the compiler (in .NET, the JIT compiler), although inlining is never guaranteed.
I use this pattern extensively. Good object-oriented techniques and judicious use of extension methods mean that you can improve the pattern further, with an example like this:

userType.IsBusinessType

But the basics remain the same. Either way, it’s little touches like this that really apply the polish to good code for me.
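One way the extension method variant might look is sketched below. The class name UserTypeExtensions is hypothetical, and the user type is assumed to still be a plain string. Note that an extension method is called with parentheses, as userType.IsBusinessType(); the property-style spelling shown above would need a dedicated user type class with an IsBusinessType property.

```csharp
public static class UserTypeExtensions
{
    // Same test as userIsBusinessType, but callable directly on the value.
    public static bool IsBusinessType(this string userType)
    {
        return userType == "2" || userType == "4" || userType == "6";
    }
}
```

With this in scope, the original condition reads as if (usertype.IsBusinessType()).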


Bind Variables Good, Literals Bad…

It’s always a temptation for developers to just concoct a literal query with injected parameters, for example:

"SELECT first_name, last_name FROM my_users WHERE user_name = '" + strUserName + "'"

Don’t give in to this temptation. The correct way to construct parameterized application queries is to use bind variables for each parameter. In the MS-SQL dialect the equivalent would be:

"SELECT first_name, last_name FROM my_users WHERE user_name = @userName"

Or in the Oracle dialect this might be:

"SELECT first_name, last_name FROM my_users WHERE user_name = :userName"

In both cases a bind variable has been used instead of a literal username value in the query. There are two immediate and very strong reasons for doing this.

  1. Queries with bind variables are resistant to SQL injection attacks. This includes most SQL injection smuggling exploits as well.
  2. It is very difficult for any RDBMS to exercise query caching successfully against queries with purely literal parameters, because every distinct value produces a different query text. Query caches are a valuable performance aid, particularly for complex queries. It’s easy to exactly match an explain/execution plan to a query with bind variables because the text of the query is immutable; only the parameters change.

In C# terms, the actual value of each bind variable is added as a parameter object to the query command before it is executed. This can be done in similar fashion through JDBC in Java by using parameterized queries with the prepared statement object. While using these features is a little less transparent and results in more code, the advantages vastly outweigh the disadvantages.
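A minimal sketch of the C# side, using ADO.NET’s provider-neutral interfaces from System.Data. The class and method names are hypothetical, and the query uses the MS-SQL style @userName marker from above; other providers expect their own marker syntax.

```csharp
using System.Data;

public static class UserQueries
{
    // Builds the lookup command; the caller supplies an open connection
    // and executes the returned command.
    public static IDbCommand CreateUserLookup(IDbConnection connection, string userName)
    {
        IDbCommand command = connection.CreateCommand();

        // The query text never changes, whatever the user name is.
        command.CommandText =
            "SELECT first_name, last_name FROM my_users WHERE user_name = @userName";

        // The value travels separately from the SQL text as a bind variable.
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = "@userName";
        parameter.DbType = DbType.String;
        parameter.Value = userName;
        command.Parameters.Add(parameter);

        return command;
    }
}
```

Because the user name is never concatenated into the SQL string, a value such as ' OR '1'='1 is treated as data rather than as part of the query.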

There is one performance-related note I’d like to add, which can paradoxically result in some literal queries functioning better than queries with bind variables. Literal values are much easier for the optimizer to check against index cardinality values (through histograms and the like) and hence, in some unusual and rare cases, the optimized explain/execution plan for a literal query can outperform the equivalent plan for a bind variable based query. However, in almost every case, it’s best to use bind variables.


Sixty second explanation: What is Hashing?

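Hashing is the term used for a one-way transformation of data, often loosely described as one-way encryption (not to be confused with asymmetric encryption, which uses key pairs and can be reversed by the key holder). In other words, you can make something secret but never be able to reveal the secret again, ever. The question “what use is something like that?” immediately springs to mind.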

As it happens, it’s actually very useful. Hashing is the key (no pun intended) to inviolable digital signatures. When you think about it, nobody actually needs to read a signature; you just need to be able to recognize it. Hashing works as a signature mechanism because it’s always a repeatable exercise. If you know what the signature should be, then rehashing the data you expect should give you a match. If your assumption is wrong, because something has changed or been tampered with, then the hashed value will never match (at least that’s the idea). Thus the signature has served its purpose. It doesn’t tell you what has changed or who is responsible, just that something has changed.

That’s not the only use for hashed data. A hash algorithm also reduces whatever you are hashing to a fixed, manageable size. Thus the signature for anything, even the complete works of Shakespeare, will be a hash (or message digest to the pedantic) of a fixed size, depending on the hash algorithm used. Popular algorithms are MD5, SHA-1, SHA-256 (and other flavours of SHA-2) and, most recently, SHA-3. There are other variants but those are the most common. MD5, SHA-1 and SHA-2 share a common lineage (SHA-3 uses a different internal design) and, as you might imagine, newer algorithms are generally more secure than their predecessors, with digest sizes growing along the way from 128 bits for MD5 to 160 bits for SHA-1 and 224 to 512 bits for SHA-2.
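The repeatable, fixed-size nature of a hash can be sketched in a few lines of C#. The class and method names here are hypothetical; SHA256 comes from System.Security.Cryptography.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Returns the SHA-256 digest of the text as 64 lowercase hex characters,
    // however long or short the input is.
    public static string Sha256Hex(string text)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] digest = sha256.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Hashing the same text twice gives the same digest, while changing a single character gives a completely different one, which is what makes the digest usable as a tamper check.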

So, the more astute of you are probably wondering how one can compress a document the size of the complete works of Shakespeare into a hash of, say, 256 bits (with SHA-256). Sort of TARDIS-like, if you enjoy colourful metaphors like I do. Well, the answer is that there are obviously multiple collections of data out there that will give you the same hash. This is called a collision, in true opaque style. Indeed, most attempts to break hash algorithms involve trying to deduce arrangements of data that yield collisions. Consequently, the larger the message digest size in bits, the harder it is to find a viable collision; at least that’s the theory.


Sixty second explanation: What is Thread Safety?

Since individual threads operate on your code simultaneously, in the normal course of events it’s possible for separate threads of execution to reassign the values of shared variables with unforeseen consequences. In other words, the left hand doesn’t always know what the right hand is doing…

Too many cooks spoil the broth.

These side-effects are usually referred to as race conditions. Race conditions typically manifest chaotically, with nebulous, unreproducible and often contradictory results. In fact, this is often the clue I use to detect a potential race condition. They’re hard to prevent unless you have a familiarity with threading concerns and truly understand the concurrency regime your application will support.
Remediation:

  1. Isolation. i.e. don’t share variables. Have all your state local to a method. It’s very effective and a pretty efficient pattern trading RAM for parallel performance. Note that garbage collection and object instantiation costs can rapidly undermine the performance of this pattern.
  2. Synchronization. (My least favourite). Using some form of lock, mutex or semaphore to force the shared code to be executed one thread at a time. Kind of undermines the benefits of concurrency in my opinion. Use it sparingly to avoid potential deadlocks or other performance killers.
  3. Immutability. (My favourite). Leverages the fact that assigning a reference is an atomic operation, so a reader sees either the old object or the new one, never a half-updated mix. If you’re worried about shared state being mutated, make it immutable! This pattern is very effective where reads vastly outnumber writes (changes).
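The last two options can be sketched side by side in C#. All type names here are hypothetical; this is an illustration under simple assumptions rather than a complete treatment of either technique.

```csharp
using System.Threading;

// Synchronization: the lock forces increments to run one thread at a time.
public sealed class LockedCounter
{
    private readonly object gate = new object();
    private int count;

    public void Increment()
    {
        lock (gate) { count++; }
    }

    public int Count
    {
        get { lock (gate) { return count; } }
    }
}

// Immutability: state is set once in the constructor and never changes.
public sealed class RateTable
{
    public RateTable(decimal standardRate, decimal reducedRate)
    {
        StandardRate = standardRate;
        ReducedRate = reducedRate;
    }

    public decimal StandardRate { get; }
    public decimal ReducedRate { get; }
}

public static class RateCache
{
    private static RateTable current = new RateTable(0.20m, 0.05m);

    // Readers always get one complete snapshot.
    public static RateTable Current => Volatile.Read(ref current);

    // Writers build a whole new object and publish it with a single
    // reference assignment.
    public static void Publish(RateTable updated) => Volatile.Write(ref current, updated);
}
```

Readers of RateCache.Current never take a lock, which is why this style suits state that is read far more often than it is written.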


Microsoft Now Supports Migrating On-Premises TFS to the Cloud (Visual Studio Team Services)

Apart from the usual regularity with which MS renames its products, there’s now a clear migration pathway for moving your on-premises TFS DevOps infrastructure off-premises and into the cloud. Keeping TFS up-to-date, robust and secure is just a pain and detracts from our core business, which is developing code, right? (I hope)

I, for one, don’t like necessary evils.  However, sometimes you can apply a “somebody else’s problem field” to them.  This will allow us to stay up-to-the-minute in terms of source control and build infrastructure without having to spend one minute in out-of-hours hair pulling. I’m all for that!

You can also download the Migration Guide at: https://www.visualstudio.com/team-services/migrate-tfs-vsts/


Seven Reasons for Employing Asynchronous Messaging

To me, asynchronous messaging means asynchronous, persistent, transactional messages; think MSMQ, MQSeries, RabbitMQ, that sort of thing. It’s a communications pattern that’s been in use since the mainframe (read: overpriced space heater) era, and it’s still a very popular solution for providing scalable, reliable messaging between disparate systems. Modern enterprise application design principles advocate avoiding single large monolithic solutions, hence the focus on patterns like microservices. Instead, they favour loosely coupled subsystems, preferably with separate ecosystems, all operating in concert to provide a single unified system.

Loose coupling is a great concept designed to reduce the resulting side-effects of any change to core features. Loose coupling acts like a developmental circuit breaker, minimizing any downstream ripples that can affect related features when making application changes. Asynchronous messaging is one of the more powerful methods for providing an application with a systematic solution for loose coupling.

Transactional message handling, much like traditional database operations, must be atomic. This means all of the tasks undertaken in handling a message must succeed if any are to succeed at all. Any fault encountered has the effect of cancelling out the results of any tasks completed prior to the fault. As you might imagine, this is often a very attractive feature that guarantees application state consistency. But like all powerful features, it comes at a price. The cost of transaction management across multiple subsystems like database connections and message queue services can be appreciable. Thus, it’s wise to consider the extra cost of transaction control when handling asynchronous messaging, particularly when messages are being persisted. This raises the question: when should I employ asynchronous transactional persistent messaging?

1. Crossing Bounded Contexts

A bounded context is a concept associated with Domain Driven Design and often discussed in SOA related conversations. Bounded contexts are a divide-and-conquer solution to domain complexity, dividing a large application’s domain into clearly defined contexts (often around functional business areas) and declaratively managing their interrelationships. One of the prime goals of these contexts is loose coupling, and employing asynchronous persistent transactional messaging between them is often a very successful strategy for achieving it.

2. Extensibility Points 

I’ve already written a prior article about the benefits of application extensibility. Message queues are tailor-made extension points within any application. The pub/sub pattern can be readily employed to add functionality when handling a message.
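To illustrate the extension point idea only, here is a deliberately tiny in-process publish/subscribe sketch. It is synchronous, non-persistent and non-transactional, so it is a stand-in for the shape of the pattern and not for a real message queue; the OrderPlaced and InMemoryBus names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class OrderPlaced
{
    public OrderPlaced(string orderId) { OrderId = orderId; }
    public string OrderId { get; }
}

public sealed class InMemoryBus
{
    private readonly List<Action<OrderPlaced>> subscribers = new List<Action<OrderPlaced>>();

    // Each new subscriber adds functionality without touching the publisher.
    public void Subscribe(Action<OrderPlaced> handler) => subscribers.Add(handler);

    public void Publish(OrderPlaced message)
    {
        foreach (Action<OrderPlaced> handler in subscribers)
        {
            handler(message);
        }
    }
}
```

Adding, say, an audit handler later is a single extra Subscribe call; the code that publishes OrderPlaced does not change.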

3. Parallel Processing (many threads)

The pub/sub pattern makes naïve parallelism for message handling relatively trivial to implement. I say naïve parallelism because the asynchronous nature of the pattern implies that concurrency is facilitated, not required. Some of the features inherent in a more traditional approach to parallelism can be awkward to implement, such as waiting for a group of parallel tasks to complete before progressing further.

4. Sequential Task Processing (single thread)

Similar to the scenario above, a single message handler instance makes processing tasks sequentially relatively easy. You might be wondering why this might be useful. Some tasks must occur in sequential order to complete at all. For example, I recently had to hash the contents of a file, block by block. If the progressive hash function had not been applied sequentially to the file blocks, the message digest would never have been calculated correctly, nor would it even have been consistently wrong.
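The block-by-block hashing scenario can be sketched with .NET’s IncrementalHash class. The BlockHasher name is hypothetical and SHA-256 is an assumption; the post does not say which algorithm was used.

```csharp
using System.Collections.Generic;
using System.Security.Cryptography;

public static class BlockHasher
{
    // Feeds blocks into one running hash, in the order they are supplied.
    public static byte[] HashBlocks(IEnumerable<byte[]> blocks)
    {
        using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            foreach (byte[] block in blocks)
            {
                hasher.AppendData(block);
            }
            return hasher.GetHashAndReset();
        }
    }
}
```

Supplying the blocks in file order gives the same digest as hashing the whole file at once; supplying them in any other order gives a different digest, which is why the handler has to process them one at a time and in sequence.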

5. Out-of-band tasks (new transactions, fire and forget and non-transactional tasks)

Please note that transaction scope support can vary wildly from platform to platform. Many tasks don’t require a transaction or must complete regardless of whether the main task is completed successfully or not. Message handling provides an easy way to change the transaction scope and hence process tasks outside the scope of the original transaction.

6. Facilitating Retry Semantics 

Transactional messaging implies utilizing a retry strategy. This can vary from simple to very sophisticated with timed retries etc. When a robust, yet powerful retry mechanism is required for complex tasks, it’s often convenient to leverage the features supported by the message platform and avoid having to write extensively robust components from scratch without any appreciable business value.

7. Transaction Auditing

Messages are a convenient way to track transactions for audit trails. While they don’t tend to indicate transaction progress, they can be a very succinct way of determining what sort of transaction is undertaken.


Another Real World Tip for Implementing Enterprise Applications

My second tip for creating production ready enterprise applications is to ensure your design is extensible. Application extensibility is one of those qualities that separates the merely talented from the truly professional. An extensible application is specifically designed to accommodate change. To clarify, change means functional modification rather than an exercise in refactoring.

So how can one build extensibility into a design? The main secret to application extensibility is loose coupling. I’m a fan of loose coupling in general and will usually promote it as a matter of course. Adopting some good development practices and implementing several common design patterns will all promote extensibility within your application. A secondary way to promote extensibility is to use patterns that promote choosing behaviour at run time over compile time.

Good Practices

Work against business interfaces, not concrete classes. There are many reasons why using a business interface will make life easier for you. In this context, using an interface makes it easy to replace an implementation. Extensibility often comes down to replacing one implementation with another that just does more. Using an interface makes this easier.
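As a small illustration, with entirely hypothetical names, the calling class below depends only on an interface, so a richer implementation can be swapped in without touching it:

```csharp
using System;

public interface IDiscountPolicy
{
    decimal Apply(decimal price);
}

public sealed class NoDiscount : IDiscountPolicy
{
    public decimal Apply(decimal price) => price;
}

// A later implementation that "just does more".
public sealed class SeasonalDiscount : IDiscountPolicy
{
    public decimal Apply(decimal price) => Math.Round(price * 0.90m, 2);
}

public sealed class Checkout
{
    private readonly IDiscountPolicy discountPolicy;

    // Checkout never names a concrete policy class.
    public Checkout(IDiscountPolicy discountPolicy)
    {
        this.discountPolicy = discountPolicy;
    }

    public decimal Total(decimal price) => discountPolicy.Apply(price);
}
```

Choosing which policy to pass in can then happen at run time, for example from configuration, rather than at compile time.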

Consider using metadata to control behaviour (use attributes in C# or annotations in Java). It’s easy to control how an object’s properties are processed when they are decorated with a custom attribute or annotation. For example, many custom serializers use this technique.
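A minimal sketch of the attribute approach, assuming a hypothetical [Sensitive] marker that tells a log formatter to mask a property’s value. All names here are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Marker attribute: carries no behaviour, only metadata.
[AttributeUsage(AttributeTargets.Property)]
public sealed class SensitiveAttribute : Attribute
{
}

public sealed class Customer
{
    public string Name { get; set; }

    [Sensitive]
    public string CardNumber { get; set; }
}

public static class LogFormatter
{
    // Lists public properties as name=value, masking any marked [Sensitive].
    public static string Describe(object instance)
    {
        var parts = instance.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => p.Name + "=" +
                (p.IsDefined(typeof(SensitiveAttribute), false) ? "***" : p.GetValue(instance)));
        return string.Join(", ", parts);
    }
}
```

The formatter knows nothing about Customer; decorating another property with [Sensitive] changes how it is processed without any change to the formatter itself.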

Design Patterns

I suggest using many of the design patterns that promote loose coupling such as:

A word of warning about loose coupling; yes, it is possible to take it too far, just look at the dangers of the relational database EAV anti-pattern if you don’t believe me.
