Sixty second explanation: What is Hashing?

Hashing is the term used for a specific sort of cryptographic transformation: a one-way operation. In other words, you can make something secret but never be able to reveal the secret again, ever. What use is something like that? The question immediately springs to mind.

As it happens, it’s actually very useful. Hashing is the key (no pun intended) to inviolable digital signatures. When you think about it, nobody actually needs to read a signature; you just need to be able to recognize it. The reason hashing works as a signature mechanism is that it’s always a repeatable exercise. If you know what the signature should be, then when you rehash the data you expect it to cover, you should get a match. If your assumption is wrong, because something has changed or been tampered with, then the hashed value will never match (at least that’s the idea). Thus the signature has served its purpose. It doesn’t tell you what has changed or who is responsible, just that something has changed.

That’s not the only use for hashed data. A hash algorithm also reduces whatever you feed it to a fixed, manageable size. Thus the signature for anything, even the complete works of Shakespeare, will be a hash (or message digest to the pedantic) of a fixed size, depending on the hash algorithm used. Popular algorithms are MD5, SHA-1, SHA-256 (and the other flavours of SHA-2), and most recently the new SHA-3 algorithm has emerged. There are other variants, but those are the most common. As you might imagine, each successive generation is more secure than the last (incidentally resulting in progressively larger hash sizes).

So, the more astute of you are probably wondering how one can reduce a document the size of the complete works of Shakespeare to a hash of, say, 256 bits (with SHA-256). Sort of TARDIS-like, if you enjoy colourful metaphors like I do. Well, the answer is that there are obviously multiple collections of data out there that will eventually give you the same hash. This is called a collision, in true opaque style. Indeed, most attempts to break hash algorithms involve trying to deduce arrangements of data that yield collisions. Consequently, the larger the message digest size in bits, the harder it is to find a viable collision; at least, that’s the theory.
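
To make the signature check concrete, here’s a minimal C# sketch using the standard System.Security.Cryptography library. The file path and expected digest are placeholders; the point is that rehashing the same unchanged data always reproduces the same 256-bit digest:

using System;
using System.IO;
using System.Security.Cryptography;

class HashCheck
{
    static void Main()
    {
        // The digest is always 256 bits (32 bytes), whether the input is
        // one line of text or the complete works of Shakespeare.
        using var sha256 = SHA256.Create();
        using var stream = File.OpenRead("complete-works.txt"); // placeholder path
        byte[] digest = sha256.ComputeHash(stream);

        string actual = BitConverter.ToString(digest).Replace("-", "");
        string expected = "PREVIOUSLY-RECORDED-DIGEST";        // placeholder value

        // If even one byte of the file has changed, the digests won't match.
        Console.WriteLine(actual == expected ? "Signature matches" : "Something changed!");
    }
}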


Sixty second explanation: What is Thread Safety?

Since multiple threads execute your code simultaneously, in the normal course of events it’s possible for separate threads of execution to reassign the values of shared variables with unforeseen consequences. In other words, the left hand doesn’t always know what the right hand is doing…

Too many cooks spoil the broth.

These side-effects are usually referred to as race conditions. Race conditions typically manifest chaotically, with nebulous, unreproducible and often contradictory results. In fact, this is often the clue I use to detect a potential race condition. They’re hard to prevent unless you have a familiarity with threading concerns and truly understand the concurrency regime your application will support.
Remediation:

  1. Isolation, i.e. don’t share variables. Keep all your state local to a method. It’s very effective and a pretty efficient pattern, trading RAM for parallel performance. Note that garbage collection and object instantiation costs can rapidly undermine the performance of this pattern.
  2. Synchronization (my least favourite). Use some form of lock, mutex or semaphore to force the shared code to be executed one thread at a time; there’s a sketch after this list. It kind of undermines the benefits of concurrency in my opinion, so use it sparingly to avoid potential deadlocks and other performance killers.
  3. Immutability (my favourite). This leverages the fact that simple assignment instructions are atomic. If you’re worried about shared state being mutated, make it immutable! This pattern is very effective where reads vastly outnumber writes (changes).
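
For illustration, here’s a small C# sketch (the names are mine, invented for the example) showing the race on a shared counter, with option 2, synchronization, fixing it:

using System;
using System.Threading.Tasks;

class Counters
{
    private static int _unsafeCount;                  // shared, unsynchronized
    private static int _safeCount;                    // shared, guarded by a lock
    private static readonly object Gate = new object();

    static void Main()
    {
        Parallel.For(0, 1_000_000, _ =>
        {
            _unsafeCount++;                           // read-modify-write: a classic race
            lock (Gate) { _safeCount++; }             // one thread at a time: no race
        });

        // The unsafe total is usually (and chaotically) less than 1,000,000;
        // the locked total is always exact.
        Console.WriteLine($"unsafe: {_unsafeCount}, locked: {_safeCount}");
    }
}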


Microsoft Now Supports Migrating On-Premises TFS to the Cloud (Visual Studio Team Services)

Apart from the usual regularity with which MS renames its products, there’s now a clear migration pathway for moving your on-premises TFS DevOps infrastructure into the cloud and off-premises. Keeping TFS up-to-date, robust and secure is just a pain and detracts from our core business, which is developing code, right? (I hope)

I, for one, don’t like necessary evils. However, sometimes you can apply a “somebody else’s problem field” to them. This will allow us to stay up-to-the-minute in terms of source control and build infrastructure without having to spend one minute on out-of-hours hair pulling. I’m all for that!

You can also download the Migration Guide at: https://www.visualstudio.com/team-services/migrate-tfs-vsts/


Seven Reasons for Employing Asynchronous Messaging

To me, asynchronous messaging means asynchronous, persistent, transactional messages; think MSMQ, IBM MQ Series, RabbitMQ, that sort of thing. Asynchronous messaging is a communications pattern that’s been in use since the mainframe (read: overpriced space heater) era, and it’s still a very popular solution for providing scalable, reliable messaging between disparate systems. Modern enterprise application design principles advocate avoiding single large monolithic solutions, hence the focus on patterns like microservices. Instead, they favour loosely coupled subsystems, preferably with separate ecosystems, all operating in concert to provide a single unified system.

Loose coupling is a great concept designed to reduce the resulting side-effects of any change to core features. Loose coupling acts like a developmental circuit breaker, minimizing any downstream ripples that can affect related features when making application changes. Asynchronous messaging is one of the more powerful methods for providing an application with a systematic solution for loose coupling.

Transactional message handling, much like traditional database operations, must be atomic. This means all of the tasks undertaken in handling a message must succeed if any are to succeed at all. Any fault encountered has the effect of cancelling out the results of any tasks completed before the fault. As you might imagine, this is often a very attractive feature because it guarantees application state consistency. But like all powerful features, it comes at a price. The cost of transaction management across multiple subsystems like database connections and message queue services can be appreciable. Thus, it’s wise to consider the extra cost of transaction control when handling asynchronous messaging, particularly when messages are being persisted. This raises the question: when should I employ asynchronous transactional persistent messaging?

1. Crossing Bounded Contexts

A bounded context is a concept associated with Domain Driven Design and often discussed in SOA-related conversations. Bounded contexts are a divide-and-conquer solution to domain complexity, dividing a large application’s domain into clearly defined contexts (often around functional business areas) and declaratively managing their interrelationships. One of the prime goals of these contexts is loose coupling, and employing asynchronous persistent transactional messaging between them is often a very successful strategy for achieving it.

2. Extensibility Points 

I’ve already written a prior article about the benefits of application extensibility. Message queues are tailor-made extension points within any application. The pub/sub pattern can be readily employed to add functionality when handling a message.
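
As a sketch of what such an extension point can look like, here’s roughly how an event might be published to a fanout exchange with the RabbitMQ .NET client (the exchange name and payload are invented for the example, and the calls shown are from the RabbitMQ.Client library, so check them against your client version):

using System.Text;
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// A fanout exchange: any number of subscribers can later bind their own
// queues to it, which is exactly what makes this a tailor-made extension point.
channel.ExchangeDeclare(exchange: "book-events", type: ExchangeType.Fanout, durable: true);

var properties = channel.CreateBasicProperties();
properties.Persistent = true; // persist the message so it survives a broker restart

var body = Encoding.UTF8.GetBytes("{ \"event\": \"BookAdded\" }");
channel.BasicPublish(exchange: "book-events", routingKey: "",
                     basicProperties: properties, body: body);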

3. Parallel Processing (many threads)

The pub/sub pattern makes naïve parallelism for message handling relatively trivial to implement. I say naïve parallelism because the asynchronous nature of the pattern implies that concurrency is facilitated, not required. Some of the features inherent in a more traditional approach to parallelism can be awkward to implement, such as waiting for a group of parallel tasks to complete before progressing further.

4. Sequential Task Processing (single thread)

Similar to the scenario above. A single message handler instance makes processing tasks sequentially relatively easy. You might be wondering why this might be useful. Some tasks must occur in sequential order to complete at all. For example, I recently had to hash the contents of a file, block by block. If the progressive hash function had not been applied to the file blocks in sequence, the message digest would never have been calculated correctly; it wouldn’t even have been consistently wrong.
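
For what it’s worth, the block-by-block hashing itself looks something like the following C# sketch (the file name and buffer size are arbitrary); the point is that AppendData must see the blocks in order, which is why a single sequential handler was essential:

using System;
using System.IO;
using System.Security.Cryptography;

using var stream = File.OpenRead("large-upload.bin"); // placeholder file
using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

var buffer = new byte[64 * 1024];
int read;
while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    // Each block folds into the running digest; feed the blocks out of
    // order and the final digest is garbage (and not even reproducible garbage).
    hasher.AppendData(buffer, 0, read);
}

byte[] digest = hasher.GetHashAndReset();
Console.WriteLine(BitConverter.ToString(digest).Replace("-", ""));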

5. Out-of-band tasks (new transactions, fire and forget and non-transactional tasks)

Many tasks don’t require a transaction, or must complete regardless of whether the main task completes successfully. Message handling provides an easy way to change the transaction scope and hence process tasks outside the scope of the original transaction. Please note that transaction scope support can vary wildly from platform to platform.
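
On .NET, for instance, changing scope can be as simple as a TransactionScope block. The message type and helper methods below are hypothetical, but TransactionScopeOption.Suppress is the real System.Transactions mechanism for stepping outside the ambient transaction:

using System.Transactions;

public void Handle(BookMessage message) // hypothetical message type
{
    // This work participates in the ambient (message) transaction
    // and rolls back with it on failure.
    ProcessCoreTask(message);

    // The audit write is out-of-band: it commits even if the
    // ambient transaction later rolls back.
    using (var scope = new TransactionScope(TransactionScopeOption.Suppress))
    {
        WriteAuditRecord(message);
        scope.Complete();
    }
}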

6. Facilitating Retry Semantics 

Transactional messaging implies a retry strategy, which can vary from the simple to the very sophisticated, with timed retries and so on. When a robust yet powerful retry mechanism is required for complex tasks, it’s often convenient to leverage the features supported by the messaging platform rather than write robust retry components from scratch, work that adds no appreciable business value.

7. Transaction Auditing

Messages are a convenient way to track transactions for audit trails. While they don’t tend to indicate transaction progress, they can be a very succinct way of determining what sort of transaction was undertaken.


Another Real World Tip for Implementing Enterprise Applications

My second tip for creating production ready enterprise applications is to ensure your design is extensible. Application extensibility is one of those qualities that separates the merely talented from the truly professional. An extensible application is specifically designed to accommodate change. To clarify, change means functional modification rather than an exercise in refactoring.

So how can one build extensibility into a design? The main secret to application extensibility is loose coupling. I’m a fan of loose coupling in general and will usually promote it as a matter of course. Adopting some good development practices and implementing several common design patterns will promote extensibility within your application. A secondary route is to prefer patterns that choose behaviour at run time rather than at compile time.

Good Practices

Work against business interfaces, not concrete classes. There are many reasons why using a business interface will make life easier for you. In this context, using an interface makes it easy to replace an implementation. Extensibility often comes down to replacing one implementation with another that just does more, and an interface makes this easier.
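
Here’s a tiny, invented example of what “just does more” means in practice: wrap the existing implementation behind the same interface rather than editing it (the types are hypothetical).

using System.Collections.Generic;

public class Book { }

// The business interface; callers only ever see this.
public interface IBookRepository
{
    Book FindByIsbn(string isbn);
}

public class SqlBookRepository : IBookRepository
{
    public Book FindByIsbn(string isbn) { /* query the database */ return new Book(); }
}

// The extension: same interface, but it "just does more" (caching)
// by delegating to whatever implementation it wraps.
public class CachingBookRepository : IBookRepository
{
    private readonly IBookRepository _inner;
    private readonly Dictionary<string, Book> _cache = new Dictionary<string, Book>();

    public CachingBookRepository(IBookRepository inner) => _inner = inner;

    public Book FindByIsbn(string isbn)
    {
        if (!_cache.TryGetValue(isbn, out var book))
            _cache[isbn] = book = _inner.FindByIsbn(isbn);
        return book;
    }
}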

Consider using metadata to control behaviour (use attributes in C# or annotations in Java). It’s easy to control how an object’s properties are processed when they are decorated with a custom attribute or annotation. For example, many custom serializers use this technique.
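
Here’s a minimal sketch of the technique; the [Persist] attribute and the toy serializer are invented for illustration:

using System;
using System.Collections.Generic;
using System.Reflection;

// Invented marker attribute: decorate the properties you want processed.
[AttributeUsage(AttributeTargets.Property)]
public sealed class PersistAttribute : Attribute { }

public class Book
{
    [Persist] public string Title { get; set; } = "Hamlet";
    public string ScratchNotes { get; set; } = "ignore me"; // undecorated: skipped
}

public static class TinySerializer
{
    // Behaviour is driven entirely by metadata: only [Persist] properties are emitted.
    public static string Serialize(object obj)
    {
        var parts = new List<string>();
        foreach (PropertyInfo prop in obj.GetType().GetProperties())
        {
            if (prop.GetCustomAttribute<PersistAttribute>() != null)
                parts.Add($"{prop.Name}={prop.GetValue(obj)}");
        }
        return string.Join(";", parts);
    }
}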

Design Patterns

I suggest using design patterns that promote loose coupling, such as publish/subscribe and Factory Method (a delegate-based variant of which is described in the next post below).

A word of warning about loose coupling: yes, it is possible to take it too far; just look at the dangers of the relational database EAV anti-pattern if you don’t believe me.


A Pattern for Switching Factory Method Implementations Dynamically using C# Delegates

I see this as an alternative to the usual IoC patterns such as Service Locator or Dependency Injection. In this pattern, dependency instances are created using a factory method which is actually a delegate.

A delegate in C# is analogous to a function pointer in C or C++ and often used in similar scenarios to anonymous classes in Java. Delegates encapsulate individual method signatures as named types within a class definition. For example, the following code defines a factory method delegate named CreateBookProcessor which declares a method, returning an IProcessBooks implementation, and requiring a Book instance as an argument.

// Declare a delegate type for creating a book processor:
public delegate IProcessBooks CreateBookProcessor(Book book);

This named type can now be used just as you might any declared type within a class. The implication is, a delegate can be used as a method argument or indeed as a property value. I find it’s convenient to expose the delegate as a property, but a basic setter method is just as easy and probably cleaner. Exposing a delegate as a property is as simple as:

// Property to expose the BookProcessor factory method.
public CreateBookProcessor NewBookProcessor { get; set; }

Thus it’s possible to replace the factory method implementation just by setting the property to a new factory method implementation; lambdas are really convenient for this. If you’re worried about polluting your business logic with delegate properties (as you probably should be), you can use an interface to hide these class features.

Something as simple as this will replace a book processor factory implementation:

bookManager.NewBookProcessor = (book) => { return new MyBookProcessor(book); };

This will work fine as long as the MyBookProcessor class implements the IProcessBooks interface.  So to complete the pattern it’s just necessary for your class to invoke the factory method through the property any time a new dependency instance is required. Here’s an example:

public void WhoIsTheAuthor(Book book)
{
  // this looks funny because I’m invoking a property like a method with an argument!
  var processor = NewBookProcessor(book);
  processor.ProcessBook(book);
}
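
Pulling the pieces together, here’s a compilable sketch of the whole pattern; Book, IProcessBooks and MyBookProcessor are stand-ins, since the originals aren’t shown:

public class Book { public string Author { get; set; } = ""; }

public interface IProcessBooks
{
    void ProcessBook(Book book);
}

public class MyBookProcessor : IProcessBooks
{
    public MyBookProcessor(Book book) { /* capture per-instance state here */ }
    public void ProcessBook(Book book) { /* the real processing work */ }
}

public class BookManager
{
    // The factory method delegate from the start of the post.
    public delegate IProcessBooks CreateBookProcessor(Book book);

    // Default factory; a test (or anything else) can swap in another lambda.
    public CreateBookProcessor NewBookProcessor { get; set; }
        = book => new MyBookProcessor(book);

    public void WhoIsTheAuthor(Book book)
    {
        var processor = NewBookProcessor(book); // invoke the current factory
        processor.ProcessBook(book);
    }
}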


A Real World Tip for Implementing Enterprise Applications

I thought I’d share a technique I’ve found very useful for any application of significant size with a need for automated testing.  A lot of transaction processing tasks are time sensitive, particularly in the financial or insurance domains.

This can make automated testing brittle or even impossible if your application acquires the current time or date directly from the system clock. The tip is to implement an application clock abstraction which is always used by your application for obtaining the current date and time. Thus, in any testing, your application clock can be reliably used to simulate any arbitrary time and date, repeatably. It’s a simple but invaluable pattern, making test results repeatable and consistent for years if necessary.

This pattern is really simple to implement and is even relevant for database only operations as the same pattern can be applied there too.

It’s usually worth ensuring your application clock can only simulate a date and time in development environments. It would be nasty to have a production system accidentally start using test values for time and date.
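
Here’s a minimal sketch of the abstraction (the interface and class names are my own invention); the guard in FixedClock is one possible way to honour the warning above:

using System;

// Every part of the application asks this for the time, never DateTime.Now directly.
public interface IApplicationClock
{
    DateTimeOffset Now { get; }
}

public sealed class SystemClock : IApplicationClock
{
    public DateTimeOffset Now => DateTimeOffset.Now;
}

// Test double: frozen at whatever instant the test scenario requires.
public sealed class FixedClock : IApplicationClock
{
    private readonly DateTimeOffset _instant;

    public FixedClock(DateTimeOffset instant)
    {
        // One way to keep simulated time out of production: refuse to construct
        // unless the (hypothetical) environment flag says we're in development.
        if (Environment.GetEnvironmentVariable("APP_ENVIRONMENT") != "Development")
            throw new InvalidOperationException("Simulated clocks are for development only.");
        _instant = instant;
    }

    public DateTimeOffset Now => _instant;
}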
