Monday, April 27, 2015

Do NoSQL databases have a place in your enterprise?

There’s been a quiet revolution going on in the world of database technology. After dominating for years, the monopoly of relational databases is ending. Names like Cassandra, MongoDB, Google AppEngine DataStore, Amazon’s DynamoDB, HBase (a Hadoop technology), and Riak are showing up alongside traditional databases like Oracle, SQL Server, and MySQL. The primary data storage technologies offered today by many cloud providers are NoSQL, and NoSQL databases are already at work within a growing number of enterprises, enabling them to store and process more data than ever before.

So what changed?

Loosely encompassing everything ‘not SQL’, or not relational, NoSQL databases began emerging as a phenomenon in the 2000s at the intersection of several developments in the industry. Internet presences like Google, Amazon, Facebook, and Twitter were rapidly growing. Trend-defining concepts like ‘cloud computing’ and ‘big data’ were also beginning to take hold, powered by more affordable physical data storage. Together these increased the demand for databases that could scale far beyond what was typically needed before.

At the same time an important change was also taking place on the software side. Web services were becoming a prominent means of providing access to data, reducing dependence on SQL. SQL is tied to relational databases. Web services opened the door to alternatives.

And in the era of ever ‘bigger data’, companies needed alternatives. Relational databases weren’t designed to be run on clusters of commodity computers, where many of the benefits that give them value can be lost and traditional pricing structures just don’t fit. They also sometimes don't give enough control over query performance, and it can be tough for developers to make them fit application logic.

Enter NoSQL.

Led by pioneering work by Google and Amazon to create databases that could scale to massive clusters of commodity computer hardware, NoSQL is now a loose term encompassing all databases that don’t model and access data in the same way relational databases do.

The majority of NoSQL databases are for clusters and come in roughly three flavors, each of which models data in a different way:

Key-value databases associate chunks of data with keys which are used to access the chunks. These chunks can be virtually anything, but to the database are just meaningless bytes to store and retrieve. For example, a customer chunk may include name and address information, but a key-value database cannot identify them and can only retrieve these chunks by their key.

Document databases also associate chunks of data with keys. The difference with document databases is that information within chunks can be identified to the database and used for indexing and queries. For example, a customer chunk’s name and address information could be identified to a document database which would then be able to find customer chunks by name, address, or by key.

Column family databases save chunk information in columns within column families. For example, a customer chunk's name and address could be stored in columns in one column family, and customer orders in columns in another column family, enabling information on customers to be accessed independently from their orders. Column family databases like Cassandra and HBase store column family data across rows together, giving them some resemblance to column-oriented databases. It’s fairly easy to imagine storing data that needs to be summed or averaged in a column family so processes can get to the data efficiently within a cluster. Storing event information like customer updates in a CRM system, posted messages in a blog, and user actions in a system are other uses that come to mind.
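To make the three flavors a little more concrete, here is a minimal sketch that models each one with plain Python dictionaries. It is not how any real product is implemented; the store names, keys, and helper functions are all made up for illustration, and a real document database would use an index rather than the scan shown here.

```python
import json

# Key-value: the store sees only a key and opaque bytes (hypothetical data).
kv_store = {}
kv_store["customer:42"] = json.dumps({"name": "Ada Smith", "city": "Denver"}).encode()

def kv_get(key):
    """Retrieval by key is the only lookup a key-value store offers."""
    return kv_store.get(key)

# Document: fields inside each chunk are visible to the store, so it can query them.
documents = {
    "customer:42": {"name": "Ada Smith", "city": "Denver"},
    "customer:43": {"name": "Bo Chen", "city": "Austin"},
}

def find_documents(field, value):
    """Find document keys by a field inside the chunk (a linear scan here)."""
    return sorted(k for k, doc in documents.items() if doc.get(field) == value)

# Column family: each row key maps to named families of columns.
column_families = {
    "customer:42": {
        "profile": {"name": "Ada Smith", "city": "Denver"},
        "orders": {"order:1001": "2 items", "order:1002": "1 item"},
    },
}

def read_family(row_key, family):
    """Read one column family without touching the row's other families."""
    return column_families[row_key][family]
```

The key-value store can only hand back the bytes for "customer:42", the document store can also answer "which customers are in Austin?", and the column family store can return a customer's profile without reading their orders.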

In addition to those designed for clusters, the NoSQL term also includes other types like NoSQL graph databases. Graph databases are not designed specifically for clusters and are instead focused on connecting data values together in a variety of ways for social, organizational, spatial, and other rich interrelationships.
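As a rough illustration of what "connecting data values together" means, the sketch below keeps labeled relationships in an in-memory adjacency map and walks them to find friends of friends. The names, relationship labels, and functions are hypothetical; a real graph database provides its own storage and query language.

```python
from collections import defaultdict

# Adjacency map: node -> set of (relationship, other node) edges.
edges = defaultdict(set)

def relate(a, relationship, b):
    """Record an undirected, labeled relationship between two nodes."""
    edges[a].add((relationship, b))
    edges[b].add((relationship, a))

def neighbors(node, relationship):
    """Nodes directly connected to `node` by the given relationship."""
    return {other for rel, other in edges[node] if rel == relationship}

def friends_of_friends(node):
    """Nodes two 'friend' hops away that are not already direct friends."""
    direct = neighbors(node, "friend")
    found = set()
    for friend in direct:
        found |= neighbors(friend, "friend")
    return found - direct - {node}

relate("ada", "friend", "bo")
relate("bo", "friend", "cy")
relate("ada", "colleague", "dee")
```

Here `friends_of_friends("ada")` follows the friend edges through "bo" to reach "cy", while the colleague edge to "dee" is ignored.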

While many organizations aspire to standardize on a single family of technology products, making all data storage needs fit in a single database technology is a bit like mandating carpenters only use a table saw for cutting wood. Modern functions like online session management, transaction processing, event tracking, data warehousing, analysis, social relations, spatial correlation, customer preferences, compliance, and search, to name a few, have different database needs. As the data grows, these differences only become more apparent. This is why in the era of big data the trend is towards matching databases with specific needs, even to the extent of using more than one database within a single application.

I'll paint a simple picture to illustrate.

Consider an e-commerce system positioned for growth. Such a system today may include the following high-level requirements:
  • Collect rich data on user interactions for future data mining of customer patterns, security audits, compliance, and disclosures.
  • Capture relationships between customers for cross-marketing.
  • Support broad international expansion.
  • Aggregate an expanding inventory of catalog items from a growing list of international and regional suppliers.
  • Perform well under heavy loads.
  • Scale without technology costs spiraling exponentially.

Such a system probably could not use relational databases exclusively and meet these requirements under today’s loads. Implementations of large systems do, however, meet similar requirements by incorporating NoSQL technologies, oftentimes alongside relational database technologies. Breaking the system down into subsystems, and then relating each to a type of database that could fit, a hypothetical implementation could look something like this:
  • User profiles, preferences, and other cached data served from a NoSQL key-value database for simple, highly scalable, rapid lookup of arbitrary chunks of data.
  • A user session shopping cart subsystem also built on a NoSQL key-value database, keyed by user session.
  • A central accounting and finance subsystem built on a relational database for strong consistency, rigid data structures, rich data validation, and ad-hoc reporting.
  • A customer subsystem for capturing customer information, relationships, and recommendations for cross marketing and social mining built, in part, on a NoSQL graph database, capable of connecting data together in a variety of ways.
  • A catalog of products capable of handling and organizing large amounts of data from a variety of suppliers based on a NoSQL document database where stored values have structure that can be queried on.
  • An order maintenance subsystem for capturing, viewing, and exchanging information on recent orders to supplier fulfillment systems also based on a NoSQL document database.
  • An analytics subsystem for archiving orders for trend analysis and data mining based on a NoSQL column family database, where related information in rows can be grouped together into column families for efficient processing.
  • An event tracking and logging subsystem that captures actions taken in the system for experience optimization, fraud analysis, intrusion detection, and future data mining also built on a NoSQL column family database.

NoSQL technologies are increasingly being used like this to power modern systems, but they do have limitations of their own.

NoSQL databases built for clusters keep chunks of data whole for more efficient distribution within a cluster, but this can limit real-time, ad-hoc reporting capability. For example, it may make sense to save orders in NoSQL whole, with each order chunk containing complete information on the customer who made the order and all items in the order. Doing so, however, makes it more difficult to query for things like “give me all customers who ordered a particular item” because this information is spread among all order chunks. In contrast, a relational database design would typically store customer and item information separately then relate both in separate order records, enabling the data to be queried in a variety of ways.
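The contrast can be sketched in a few lines. The first half below stores whole order chunks in a dictionary and has to open every chunk to answer the question; the second half uses Python's built-in sqlite3 module as a stand-in for a relational database. The table names, sample data, and function names are assumptions made up for this example.

```python
import sqlite3

# Aggregate-style storage: each order chunk carries its customer and items.
order_chunks = {
    "order:1": {"customer": "Ada", "items": ["hammer", "nails"]},
    "order:2": {"customer": "Bo", "items": ["rake"]},
    "order:3": {"customer": "Cy", "items": ["hammer"]},
}

def customers_who_ordered(item):
    """Every chunk must be opened, because the answer is spread across them."""
    return sorted({o["customer"] for o in order_chunks.values() if item in o["items"]})

# Relational storage: customers, orders, and order items live in separate tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (order_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO order_items VALUES (1, 'hammer'), (1, 'nails'), (2, 'rake'), (3, 'hammer');
""")

def customers_who_ordered_sql(item):
    """The same question expressed as a join over the separate tables."""
    rows = db.execute(
        """SELECT DISTINCT c.name FROM customers c
           JOIN orders o ON o.customer_id = c.id
           JOIN order_items i ON i.order_id = o.id
           WHERE i.item = ? ORDER BY c.name""",
        (item,),
    )
    return [name for (name,) in rows]
```

Both functions return the same customers for "hammer", but only the relational version could answer a different question tomorrow, such as "which items did Ada order?", without a new scan being written by hand.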

This, of course, relates to the problem of how to group data together into chunks in the first place. Continuing with the same example, would it be best to group information into order chunks or customer chunks, where each customer chunk would contain information on all orders and items in each order for that customer? Grouping by customer chunks could be best for providing customers access to their orders because satisfying a customer's request to view their orders would only require a single retrieval of data within a cluster, helping make the application more responsive. On the other hand, grouping by order chunks may be best for a fulfillment system so it’s not necessary to retrieve and open all customer chunks just to see if there’s an active order that needs to be fulfilled. How to group data into chunks depends on how data will be queried. Unfortunately, it’s usually not possible to predict, especially early in projects, exactly how data will be queried, and making changes to how data is grouped into chunks down the road can be difficult.
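A small sketch of the tradeoff, again using plain dictionaries as stand-ins for chunks in a cluster. It counts how many chunks each grouping has to read for two different requests. The data, function names, and the "chunks read" counter are all hypothetical, and the sketch assumes no secondary indexes exist.

```python
# The same three orders grouped two different ways (made-up data).
orders = [
    {"id": 1, "customer": "Ada", "status": "shipped"},
    {"id": 2, "customer": "Ada", "status": "active"},
    {"id": 3, "customer": "Bo", "status": "shipped"},
]

# Grouping A: one chunk per customer, holding all of that customer's orders.
by_customer = {}
for order in orders:
    by_customer.setdefault(order["customer"], []).append(order)

# Grouping B: one chunk per order.
by_order = {order["id"]: order for order in orders}

def customer_orders_a(name):
    """Grouping A: a customer's orders come back in a single retrieval."""
    return by_customer.get(name, []), 1

def customer_orders_b(name):
    """Grouping B: every order chunk is read to collect one customer's orders."""
    found = [o for o in by_order.values() if o["customer"] == name]
    return found, len(by_order)

def order_by_id_b(order_id):
    """Grouping B: a single order is one retrieval by key."""
    return by_order.get(order_id), 1

def order_by_id_a(order_id):
    """Grouping A: customer chunks are opened one by one until the order turns up."""
    chunks_read = 0
    for chunk in by_customer.values():
        chunks_read += 1
        for order in chunk:
            if order["id"] == order_id:
                return order, chunks_read
    return None, chunks_read
```

Each grouping makes one request cheap and the other expensive, which is why the choice depends on how the data will be queried.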

This example’s a bit oversimplified, but it serves to illustrate some of the new design challenges NoSQL databases present in contrast to those of relational databases. Relational technology offers fairly standard guidance on how data should be broken apart into finer-grained units so new queries can be made without restructuring the data or extra processing, but in doing so trades off the ability to easily distribute the data in clusters. For example, how can order information linked to separate customer and item information be sent to different computers in a cluster and the relational database still support queries -- at least without sending linked customer and item information along to each machine too?

If it sounds like NoSQL presents new software engineering challenges, it does. Some work, like designing data structures and queries, ensuring data integrity and consistency, aggregating data for reporting, and performance tuning, handled for years by Database Administrators (DBAs) using built-in relational database features, is headed squarely back over to your application developers. And NoSQL technologies are young and changing quickly. Software design skills capable of modeling data in new ways and keeping database logic separate from application logic are becoming more important again.

While relational databases may continue to be a good fit for many enterprise applications, big data has ended their monopoly. NoSQL databases are now prominent offerings of major cloud platform providers like Amazon, Rackspace, and Google. Whether it be because of a next-generation application, the ability to archive a richer set of data for mining purposes, or the need to collect more system events for new compliance requirements, it’s prudent to prepare for NoSQL having a place alongside relational databases within your enterprise.

Wednesday, March 18, 2015

Demystifying Apple’s iBeacon

Chances are you’ve heard of Apple’s iBeacon. Wisdom of the day has it ushering in a new revolution in smartphone technology. As with everything mobile, the hype surrounding it is pretty intense, but hype isn’t reality. To discover ways your business may be able to take advantage of it -- or avoid getting taken advantage of by it -- it’s necessary to take a look at how it really works.

Beacons establish a virtual region around them. When in range, interested smartphones can detect entrance, exit and rough distance to a beacon along with its identifying information with room-level accuracy of about 30 feet.
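To give a feel for why the distance is only "rough", here is a sketch of one common way to estimate it from received signal strength, the log-distance path loss model. This is a generic radio formula, not necessarily what any phone platform uses internally. The calibration value of -59 dBm at one meter, the path loss exponent, and the zone thresholds are all assumptions chosen for illustration.

```python
def estimate_distance_m(rssi_dbm, measured_power_dbm=-59, path_loss_exponent=2.0):
    """Estimate distance in meters from signal strength.

    measured_power_dbm is the signal strength expected at 1 meter (assumed).
    path_loss_exponent is about 2 in open space and higher indoors (assumed).
    """
    return 10 ** ((measured_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

def proximity_zone(distance_m):
    """Bucket a distance into coarse zones; the thresholds are hypothetical."""
    if distance_m < 0.5:
        return "immediate"
    if distance_m < 4.0:
        return "near"
    return "far"
```

With these assumed values, a reading of -59 dBm maps to about 1 meter and -79 dBm to about 10 meters. Because walls, bodies, and pockets all change the signal, apps generally treat the result as a coarse zone rather than a measurement.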

The term “iBeacon” often gets mixed up with Bluetooth Low Energy (BLE). iBeacon isn’t BLE; it’s built upon it.

Just like traditional Bluetooth, BLE (marketed as Bluetooth Smart) is a near-range radio communication technology. BLE slims down on the amount of data it can transfer, which saves power. Whereas traditional Bluetooth can transmit audio to your headset yet batteries last days at most, BLE is limited to transmitting small amounts of data like a heart rate from a wrist monitor every few seconds but batteries can last for months -- or longer.

BLE beacon transmissions are like the heart rate monitor, only even simpler. They transmit an identifying code and a smidgen of data about the device over and over again in one direction only. That’s it.

iBeacon is just a beacon specification promoted by Apple, and it’s not the only game in town. There's another by Gimbal, an open standard, and others. The primary difference between them is the format of the identifying code that’s transmitted.
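As an example of one such format, the sketch below decodes the iBeacon layout as it is commonly documented: an Apple company identifier, a type and length byte, a 16-byte UUID, two 2-byte numbers usually called major and minor, and a 1-byte calibrated signal strength. Treat the layout and the function name as assumptions for illustration and check the current specification before relying on them.

```python
import struct
import uuid

APPLE_COMPANY_ID = 0x004C  # Bluetooth company identifier assigned to Apple
IBEACON_TYPE = 0x02
IBEACON_LENGTH = 0x15      # 21 bytes follow: UUID + major + minor + power

def parse_ibeacon(manufacturer_data: bytes):
    """Return (uuid, major, minor, measured_power) or None if not iBeacon-shaped."""
    if len(manufacturer_data) != 25:
        return None
    # Header: company ID is little-endian, followed by type and length bytes.
    company_id, beacon_type, length = struct.unpack_from("<HBB", manufacturer_data, 0)
    if (company_id, beacon_type, length) != (APPLE_COMPANY_ID, IBEACON_TYPE, IBEACON_LENGTH):
        return None
    # Body: UUID, then big-endian major and minor, then a signed power byte.
    raw_uuid, major, minor, power = struct.unpack_from(">16sHHb", manufacturer_data, 4)
    return uuid.UUID(bytes=raw_uuid), major, minor, power
```

Everything a beacon says fits in those 25 bytes, which is why any meaning has to come from the app that receives them.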

While the functionality of beacons is simple, by recognizing and looking up other information based on identifying codes received from beacons, apps running on smartphones can do some pretty interesting things. Imagine a retail app automatically showing information on tools when customers walk into the tools department then gardening information when they enter gardening. Or an app that gives workers an indoor map of a production floor, showing their location in relation to other workers and moveable equipment. How about an app that helps riders find buses in a large indoor station; an app that turns lights on when entering a room; or an app that helps you find your car in a parking garage?

Because apps must be able to relate identifying codes with other information, specific beacons are usually related to specific apps. After all, what use would there be in receiving an identification code but not being able to relate it to any other information?
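In code, that relationship is little more than a lookup table. The sketch below models an identifying code as a (UUID, major, minor) tuple, which is an assumption about how the identifier is structured, and maps it to content an app could show. The registry contents and the function name are hypothetical.

```python
# Hypothetical registry an app might ship with or fetch from its own backend.
STORE_UUID = "e2c56db5-dffb-48d2-b060-d0f5a71096e0"  # sample value only

BEACON_CONTENT = {
    (STORE_UUID, 1, 1): "Tools department: cordless drills on sale",
    (STORE_UUID, 1, 2): "Gardening department: spring planting guide",
}

def content_for(beacon_uuid, major, minor):
    """Return the content tied to an identifying code, or None if unknown."""
    return BEACON_CONTENT.get((beacon_uuid.lower(), major, minor))
```

An identifying code that is not in the registry returns nothing, which is exactly the situation an app is in when it hears someone else's beacon.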

Recent Apple i-devices can receive and decode iBeacon transmissions (and become iBeacons as well), but there are some frustrating limits to what you can do. Using the identifier is up to apps, and how iBeacons are managed is up to manufacturers. Although Apple can’t stop newer Android devices from using iBeacons, it’s not sanctioned. So, if you want a more comprehensive solution that includes Android devices, it may be worth looking into cross-platform alternatives, with Gimbal and Radius Networks as two examples. It’s important to keep in mind that BLE beacons are a relatively new technology, so it’s probably best to stick with one device manufacturer at this point.

Despite the hype out there, knowing how beacons really work should make it obvious that they don’t talk to heart monitors or handle payments. They're not a way to deliver coupons or a replacement for ‘old’ smartphone location services. They're also not a new way for hackers to control phones. And iBeacon is certainly not the only type of beacon out there -- or necessarily even the best. Yet even after peeling away the misconceptions, iBeacon and others can still be used for some pretty innovative things.

Thursday, January 22, 2015

Software contractors and IP

"Work made for hire" may not apply to commissioned software

Did you know using an independent contractor agreement that stipulates software contractor work as work made for hire, presumably so your company owns copyright to software developed by the independent contractor, may not result in your company owning copyright? The contractor may still retain ownership because most software does not fit into one of the nine categories of works required for commissioned works to qualify as work made for hire.

Assignments are used instead

Obtaining copyright to software developed by an independent contractor may instead require an assignment. With an assignment, the contractor expressly assigns copyright to your company. Since it's hard to know what intellectual property may actually exist in software, and typically software deliverables also include technical documentation and other peripheral works, assignments of software I’ve seen have typically assigned all independent contractor rights, title, and interest to all deliverables prepared by the independent contractor.

But there’s more in deliverables than meets the eye

No matter how convenient it would be to separate a work from the mind that created it, software development and consulting works contain the unique experiences, skills, perspectives, and expert opinions of the contractor that produced them. Each of these is a valuable component of the contractor's intellectual 'toolbox', needed by the contractor to continue making a living. Of course these can't really be "removed" from the contractor anyway: an intellectual work, and the ideas it contains, can't be separated either from the work or the mind that invented it. At least I don't know how to do it.

Insisting on everything could get you less

Would a contractor freely provide you with everything their intellect has to offer if they know they're potentially giving up all rights to everything that arises, results, or can be derived from any part of their work with you forever? Does motivating a contractor in this way encourage them to provide you with everything of value they have to offer?

A better alternative?

There is another way that may be better for everyone. Having the independent contractor provide you with a license that authorizes all the uses you require of a deliverable may be able to meet your needs without excluding the contractor from rights of authorship. The intellectual property clauses I've seen in agreements typically give clients a nonexclusive, perpetual, worldwide license to use and sub-license the use of all work products for the purpose of developing and marketing their products or internal business operations. Can you imagine really needing more than that?

When you still need an assignment

Well, maybe there are circumstances where you DO need more. Maybe you have an existing work made for hire relationship with your customer, or maybe you just feel you need to own the copyright. For these situations, including a provision that supports adding an assignment on a project-by-project basis may enable the best of both worlds. You're still provided a license to all works by default, enabling your contractor to share the full value of their experiences, skills, opinions, and software creativity uninhibited. But for specific situations when it's needed, a full assignment can be made.

My take

Work made for hire agreements with your independent software contractors may not result in you having the rights you expect. An assignment can transfer "all rights, title, and interest", but going that far may not be necessary -- or ideal. Licenses authorizing uses can probably give you all you need (and more) and won't exclude the contractor from continuing to use expressions of their intellect and knowledge to make a living. Best of all, they won’t be restricted from giving you everything they've got.


Note that this information is for informational purposes only. Compass Point, Inc. is not a law firm and does not provide legal services or legal advice, and this information is not a substitute for an attorney’s advice. Please consult a licensed attorney in your area with specific legal questions or concerns.

Wednesday, August 6, 2014

Making employee productivity mobile apps the lean way

How many times have you caught yourself asking that same question as you observe your employees at work: A truck arrives and someone enters a handwritten form into a PC. A manager spot checking quality stops to track down a hand-held scanner. An employee digs for receipts in their wallet and enters expenses when back in the office. Why can’t we develop a mobile app to handle that?

The thought of having a thick mobile app polished over the course of months for the iPhone is enough to stop many cold in their tracks. Then what about Android? Then Windows? A pretty expensive employee productivity boost, and do you really need a work of art?

Luckily there is a lean way to develop apps once that can then run across devices -- including the big thing sitting on your desk. It doesn’t carry expensive licensing fees and has been running in your browser for years.

HTML has gone from simple web ‘pages’ to powering full-fledged applications. In fact HTML was the original way developers produced iPhone apps. Its latest version, HTML5, is supported on all major mobile devices and provides access to prominent mobile device capabilities such as camera and GPS. HTML5 apps can also be designed to work offline when the network is down, just like a native mobile app can. And an HTML5 app can be turned into a full-fledged native mobile app by placing it within a thin native application ‘shell’ when it needs full access to underlying device capabilities.

Imagine a web data entry application that captures GPS coordinates and images of transaction receipts in the field when and where the transactions occur. Imagine it working even where network coverage is spotty. Imagine employees doing all this from the devices they are comfortable with already in their pockets. It’s all possible using open technology that’s been around for years.

Yes, native application development is still there when you need an app optimized for flashy user performance on a specific device. But when lean business productivity is your focus, it’s nice to know you can have a mobile app for that too.

Wednesday, December 5, 2012

Understanding MS enterprise integration technologies

ESB, or Enterprise Service Bus, is a concept that's been around for a while. Borrowed from the concept of a computer hardware I/O bus, the purpose of ESB is to facilitate communication between the various software applications -- internal and external -- that comprise an enterprise system.

ESB was traditionally designed as MOM, or Message Oriented Middleware, and implemented with message queuing technologies. This is giving way, however, to a service-based design.
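For readers who haven't worked with message queuing, the sketch below shows the basic idea in miniature: senders drop messages on a queue by topic and never call receivers directly, and the bus delivers them later. It is an in-memory toy with made-up class and method names, not a model of any particular product.

```python
from collections import defaultdict, deque

class MessageBus:
    """A toy message-oriented bus: publish now, deliver later."""

    def __init__(self):
        self._queue = deque()
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive messages published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Senders only enqueue; they never call receivers directly."""
        self._queue.append((topic, message))

    def deliver_pending(self):
        """Drain the queue, handing each message to its topic's subscribers."""
        delivered = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for handler in self._subscribers[topic]:
                handler(message)
                delivered += 1
        return delivered

# Usage: an order application publishes, a billing application receives.
bus = MessageBus()
billing_inbox = []
bus.subscribe("orders", billing_inbox.append)
bus.publish("orders", {"order_id": 1001})
```

Nothing reaches `billing_inbox` until `bus.deliver_pending()` runs, which is the decoupling in time that queuing provides and that a service-based design trades for direct calls.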

Over the years MS has provided a variety of technologies for moving data around the enterprise, with BizTalk most closely resembling an ESB. Other complementary and overlapping technologies have emerged from MS, however, indicating they are not taking a 'one size fits all' approach to ESB:

  • WCF: At the core of these technologies is WCF, or Windows Communication Foundation. WCF supports the development of communication-unaware interfaces and business-logic implementation classes which can then be 'placed' into a WCF host and accessed across networks mostly as if they were local objects.
  • BizTalk: BizTalk is the MS technology that most resembles a traditional ESB. One of the features that made BizTalk somewhat unique is its ability to design and 'run' orchestrations, or flow charts of processing and communication steps. With its use of WCF and resemblance to WF Services, it's less clear than in years past what unique value it has to offer.
  • WF Services: WF, or Windows Workflow, supports both graphical design of program logic and an engine to run the logic. With the addition of two particular, built-in Workflow Activities, Receive and Send, WF can now graphically create and run WCF services and control the flow of their operations, known as WF Services.
  • AppFabric: Used to provide caching, monitoring, and control of WCF Services, including WF Services.
  • Azure Service Bus: Used to provide cloud-based WCF Service communication routing between organizations behind network firewalls.
  • LOB Adapters: MS also provides pre-built LOB, or Line of Business, Adapters to a variety of third party applications as well as an SDK, or Software Development Kit, for creating them. LOB Adapters expose line of business application capabilities as WCF Services so they can integrate with other applications across enterprise networks using MS technologies.

Instead of writing complex communication protocol code, with WCF it is possible to focus almost exclusively on developing business logic, thereby minimizing the need for a specialized ESB solution. That may be the strategy: make the communication technology full-featured then provide technologies to extend it when necessary instead of confining organizations to a single bus and its way of doing things.