Structured, Unstructured and Semi-Structured Data

Big Data vs. Relational Data

“Big Data” is a relatively new term, loosely defined as data that contains greater variety, arriving in increasing volumes and with more velocity. These three properties are known as the Three V’s. Greater than what? Well, greater than the data that would traditionally be handled by a Relational Database Management System (RDBMS).

The Three V’s

Volume – Every day, global data volume is increasing exponentially. There are many cultural, scientific and technological reasons for this including the invention and proliferation of smart phones, wearable technology, IoT devices, cloud computing, machine learning and artificial intelligence.

Velocity – The rate at which data is received and processed. Velocity has less to do with the aforementioned exponential growth of stored data and more to do with streaming that data in real time and the need to process it in near real time. Traditional ETL pipelines that ran as daily or even hourly batch jobs were no longer enough, so new solutions were needed that could derive meaningful insights from data sets as they arrived.

Variety – The increasingly varied types of data being processed. Constraining usable data to a predefined schema (structured data) had, and still has, its advantages, and in a perfect world all data would automagically arrive this way. But in the real world, big data solutions offer the flexibility to process data much more quickly and in ways that would never have been possible with traditional RDBMS structures. Unstructured and semi-structured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.

Data Structures

In the context of processing, storage and analysis, all data that exists can be categorized as either structured, unstructured or semi-structured.

Structured Data

Sometimes referred to as quantitative data, this is how all data in the enterprise used to be stored at scale. Structured data is data whose elements are addressable for effective analysis. It has been organized into a formatted repository, typically a database. Its records have relational keys and can easily be mapped into pre-designed fields.

Unstructured Data

Unstructured data is data that is not organized in a predefined manner or does not have a predefined data model, so it is not a good fit for a mainstream relational database. It’s sometimes referred to as qualitative data: you can derive meaning from it, but it can also be incredibly ambiguous and difficult to parse.

Semi-Structured Data

Data that does not reside in a relational database but that has some organizational properties that make it easier to analyze. This data is typically not as strictly typed as structured data but does follow some rules, such as hierarchy and nesting. JSON and XML documents are common examples.
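To make the three categories a little more concrete, here is a minimal Python sketch that holds the same fact (an order total) in each form. The table name, field names and sample values are all made up for illustration; only the standard library is used.

```python
import json
import re
import sqlite3

# Structured: a fixed schema enforced by a relational table (hypothetical "orders" table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)
conn.execute("INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)", (1, "Ada", 42.5))
structured_total = conn.execute("SELECT total FROM orders WHERE id = 1").fetchone()[0]

# Semi-structured: no table schema, but hierarchy and nesting give us something to navigate.
document = json.loads(
    '{"id": 1, "customer": {"name": "Ada"},'
    ' "items": [{"sku": "A1", "price": 40.0}, {"sku": "B2", "price": 2.5}]}'
)
semi_structured_total = sum(item["price"] for item in document["items"])

# Unstructured: free text; any meaning has to be extracted with preprocessing,
# here a deliberately naive regular expression that looks for a dollar amount.
note = "Ada called to say her order came to $42.50 and arrived late."
match = re.search(r"\$(\d+(?:\.\d+)?)", note)
unstructured_total = float(match.group(1)) if match else None

print(structured_total, semi_structured_total, unstructured_total)
```

Notice how much more guesswork the last case needs: the regular expression only works because we happen to know what the sentence looks like.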

Software Architectural Patterns

What is Software Architecture?

The job of a Building Architect is typically to design buildings, structures and civil infrastructure. Similarly, the job of a Software Architect is to design the systems, services and infrastructure of computing systems. More importantly, just as architectural planning is typically the first step in any major construction project, software architecture is typically the first step in any major software project (albeit one of the two is better suited to an agile methodology).

What is an Architectural Pattern?

An architectural pattern is a general, reusable solution to a commonly occurring problem in software architecture within a given context. The architectural patterns address various issues in software engineering, such as computer hardware performance limitations, high availability and minimization of a business risk.

Wikipedia: Architectural Pattern

You can think of an Architectural Pattern as a sort of “template” that you can use as a first step when designing the architecture of your system or application; it is not, in and of itself, an architecture. Rather, an architectural pattern is generally considered “strictly described and commonly available”. Architectural patterns are designed to be broad and to represent high level solutions to recurring, general software engineering problems.

Just as there are many different “styles” of Building Architecture (e.g. Classical, Industrial, Victorian, Tudor, Art Deco, etc.), Software Architecture has “Patterns”.

Why use an established Architectural Pattern?

It’s good to learn from your mistakes. It’s better to learn from other people’s mistakes.

Warren Buffett, CEO of Berkshire Hathaway

Like I said before, an architectural pattern is a starting point; a template. Starting with the model that most closely fits your project’s needs has advantages:

  • More optimized systems – by using architectural patterns, we build transferable models that can be reused, which makes systems easier to scale and extend.
  • Early design changes – most architectural patterns are flexible and provide you the opportunity to examine your project holistically so that you can work out errors or fundamental changes that need to be made before technical debt is accrued.
  • Simplification – not just for your sake but for the sake of collaboration among all the stakeholders involved. The faster stakeholders can form a mutual understanding, the faster communication, negotiation, and consensus will follow. Obfuscation never solved anything.

Common Architectural Patterns

Below are just some of today’s most commonly used patterns:

  • Layered
  • Multi-Tier
  • Pipe and Filter
  • Client Server
  • Event-Driven
  • Microservices

There are many other architectural patterns out there, and this list represents only a small subset of them. I may cover some of these, plus others, in more detail in a future post.

Below are high level conceptual diagrams for each of the above.

Layered Architecture

Multi-Tiered Architecture

Pipe and Filter Architecture
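
As a rough illustration of the idea in code, the sketch below models each filter as a Python generator and the pipes as the iterators passed between them. The filter names and the `pipeline` helper are hypothetical, chosen only for this example.

```python
from typing import Callable, Iterable, Iterator

# A filter takes a stream of strings and yields a transformed stream.
Filter = Callable[[Iterable[str]], Iterator[str]]


def strip_whitespace(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        yield line.strip()


def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        if line:
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        yield line.upper()


def pipeline(source: Iterable[str], *filters: Filter) -> Iterable[str]:
    """Connect the filters in order; each one consumes the previous one's output."""
    stream: Iterable[str] = source
    for apply_filter in filters:
        stream = apply_filter(stream)
    return stream


result = list(pipeline(["  hello ", "", " world"], strip_whitespace, drop_empty, to_upper))
print(result)  # ['HELLO', 'WORLD']
```

Each filter knows nothing about its neighbours, so filters can be reordered, removed or reused without touching the others.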

Client Server Architecture

Event-Driven Architecture
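
A minimal in-process sketch of the event-driven idea follows. The `EventBus` class, the `order_placed` event name and the handlers are all hypothetical; a real system would more likely put a message broker between producers and consumers.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Toy publish/subscribe bus: producers publish events, subscribers react to them."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> None:
        # Handlers run in subscription order; the publisher does not know who they are.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


bus = EventBus()
log: List[str] = []

bus.subscribe("order_placed", lambda order: log.append(f"email sent for order {order['id']}"))
bus.subscribe("order_placed", lambda order: log.append(f"stock reserved for order {order['id']}"))

bus.publish("order_placed", {"id": 7})
print(log)  # ['email sent for order 7', 'stock reserved for order 7']
```

The point of the pattern is the decoupling: new reactions to `order_placed` can be added by subscribing another handler, without changing the code that publishes the event.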

Microservices Architecture