Storj.io: An open source, massively distributed object store and API for developers

I’ve been playing with a new object storage solution that’s kind of cool. It’s called Storj. Before I describe how it works, let me start by comparing it to a more familiar solution.

Probably the best-known example of object storage is Amazon S3. It allows you to define buckets and then upload files into those buckets. Amazon charges you based on the amount you store and the amount you transfer, plus a little based on the total number of objects stored. There are three tiers of storage based on frequency of access and pricing varies by region, but for discussion purposes let’s say it is about $0.023 per GB per month. To store 500 GB that would cost about $138 per year without transfer fees.

For that $138 you can be sure that Amazon is replicating your data across multiple facilities and devices. Amazon says that S3 offers 99.999999999% durability. That’s pretty impressive.

But one consideration with using S3 or any other traditional cloud storage solution is that your data is sitting in data centers owned by a single vendor. Of course you could take steps to replicate that data to other providers, but that is kind of a pain. Even then you will still end up with your data sitting behind a relatively small number of vendors, none of whom are really geared toward transparency and openness.

Storj.io was built to address this problem. It’s an open source, distributed object storage platform. Like S3, the model consists of buckets and files in those buckets. The difference is in how your data is stored. When you upload a file to Storj, your file is broken into small pieces called shards, encrypted using keys you hold, and then uploaded to several nodes around the world.

Here’s where it gets really interesting. The nodes that store data are not owned by a single entity. Instead, nodes are run by “storage farmers”. Disk farming is kind of like crypto currency farming, but instead of solving mathematical computations to earn crypto coins, farmers receive micro payments based on how much their space gets utilized. Storj actually leverages the Ethereum blockchain to make this work, and if you are interested in the nitty gritty details, you should check out the whitepaper.

A farmer might be an individual with 50 GB of spare disk space or it could be an organization with lots and lots of space. You don’t know and you don’t really care. Their space gets selected based on a number of factors, which includes things like stability of the node, bandwidth, and total space available. If a farmer tries to tamper with any data on their node they get dropped and they don’t get paid.

Right now Storj is offering 25 GB of free space for one year. After that, their current pricing is $.015 per GB per month. So using my 500 GB example, that’s $90 per year without transfer fees. And if you have some extra storage sitting around, you could become a farmer and offset your costs a little bit.

To be clear, Storj is a tool for developers. After signing up you’ll get presented with a GUI for creating buckets, but when it’s time to start moving data into those buckets you’ll need the API. Right now there is a NodeJS library or you can use a command-line tool provided by a native installer.

This service definitely looks promising, but it is important to know that it is still early. One thing to think about is what happens if farmers start dropping out of the network. When your file is split into shards, each shard is copied to multiple nodes. In my quick test, shards were spread across five nodes, which is plenty to give me confidence that I will be able to get my file back.

If a node drops offline, it is supposed to trigger a replication of your shard to another farmer using one of the remaining good nodes. This works great unless all of the farmers who hold one of your shards drop at once, but with 19,000 farmers and climbing, and assuming your shards are always on multiple nodes, the chances of that happening seem very, very low. The docs say that Storj is working on rolling out additional mirroring strategies. And, you can always use the API to ask Storj which nodes your file is sharded across. It looks like you can make an API call to move a shard yourself, but I haven’t tried that yet.

One last thing to point out is that this is an open source project. You are welcome to contribute. You can even grab the software and run a completely private Storj network, if you want.

I feel like some of my clients are still getting used to the idea of putting their data in the cloud. And some like one throat to choke. A distributed cloud like this may be a tougher sell for conservative customers, even if the security and the durability are there. Still, I love the concept. What do you think?

June 19, 2017

Have you tried the serverless framework?

Last year I was working on a POC. The target stack of the POC was to be 100% native AWS as much as possible. That’s when I came across Serverless. Back then it was still in beta, but I was really happy with it. After the POC was over I moved on to other things. A couple of days ago I was reminded how useful the framework is, so I thought I’d share some of those thoughts here.

Before I continue, a few words about the term, “serverless”. In short, it gets some folks riled up. I don’t want to debate whether or not it’s a useful term. What I like about the concept is that, as a developer, I can focus on my implementation details without worrying as much about the infrastructure the code is running on. In a “serverless” setup, my implementation is broken down into discrete functions that get instantiated and executed when invoked. Of course, there are servers somewhere, but I don’t have to give them a moment’s thought (nor do I have to pay to keep them running, at least not directly).

If your infrastructure provider of choice is AWS, functions run as part of a service offering called Lambda. If you want to expose those functions as RESTful endpoints, you can use the AWS API Gateway. Of course your Lambda functions can make calls to other AWS services such as Dynamo DB, S3, Simple Queue Service, and so on. For my POC, I leveraged all of those. And that’s where the serverless framework really comes in handy.

Anyone that has done anything with AWS knows it can often take a lot of clicks to get everything set up right. The serverless framework makes that easier by allowing me to declare my service, the functions that make up that service, and the resources those functions leverage, all in an easy-to-edit YAML file. Once you get that configuration done, you just tell serverless to deploy it, and it takes care of the rest.

Let’s say you want to create a simple service that returns some JSON. Serverless supports multiple languages including JavaScript, Python, and Java, but for now I’ll do a JavaScript example.

First, I’ll bootstrap the project:

serverless create --template aws-nodejs --path echo-service

The serverless framework creates a serverless.yml file and a sample function in handler.js that echoes back a lot of information about the request. It’s ready to deploy as-is. So, to test it out, I’ll deploy it with:

serverless deploy -v

Behind the scenes, the framework creates a cloud formation template and makes the AWS calls necessary to set everything up on the AWS side. This requires your AWS credentials to be configured, but that’s a one-time thing.

When the serverless framework is done deploying the service and its functions, I can invoke the sample function with:

serverless invoke -f hello -l

Which returns:

{
    "statusCode": 200,
    "body": "{\"message\":\"Go Serverless v1.0! Your function executed successfully!\",\"input\":{}}"
}

To invoke that function via a RESTful endpoint, I’ll edit serverless.yml file and add an HTTP event handler, like this:

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get

And then re-deploy:

serverless deploy -v

Now the function can be hit via curl:

curl https://someid999.execute-api.us-east-1.amazonaws.com/dev/hello

In this case, I showed an HTTP event triggering the function, but you can use other events to trigger functions, like when someone uploads something to S3, posts something to an SNS topic, or on a schedule. See the docs for a complete list.

To add additional functions, just edit handler.js and add a new function, then edit serverless.yml to update the list of functions.

Lambda functions cost nothing unless they are executed. AWS offers a generous free tier. Beyond the first million requests in a month it costs $0.20 per million requests (pricing).

I should also mention that if AWS is not your preferred provider, serverless also works with Azure, IBM, and Google.

Regardless of where you want to run it, if you’ve got 15 minutes you should definitely take a look at Serverless.

May 8, 2017

Thoughts on BeeCon 2017, the community-organized conference for Alfresco enthusiasts

At the end of April a couple hundred Alfresco enthusiasts met in Zaragoza, Spain, for the second annual BeeCon. BeeCon is a conference organized by and for the Alfresco community. Attendees represented every continent except Antarctica (and Boriss, who is Chilean, tried to claim that he also represents Antarctica). I was truly impressed with the broad attendance, especially with those that came from as far away as China and Australia to collaborate, learn, and teach together with the rest of the Alfresco community.

Here are some thoughts I had during the conference…

Where are the North Americans, especially U.S.-based partners?

We increased our attendance by North Americans this year from almost no one to a small handful of the usuals plus a few new delegates, including some from the University of Alberta. The Alfresco community has always been stronger in Europe than in North America for whatever reason, but I’d like to see us significantly improve attendance from North America.

Key to that is partner participation. A formal partner should be setting the example for contributing to the community. This is not simply altruism–a company can help the community and by doing so add top-spin to the value of their partnership. European partners clearly get that. One European partner had 8 or 9 people at the conference. Zia and Micro Strategies did make it. But most North American partners seem to only be interested in selling and marketing events, which is a real shame.

Alfresco is clearly focused on Cloud and ADF

It is clear that the big focus for Alfresco is on two areas: Cloud and the ADF (Application Development Framework).

For cloud, work is under way to essentially develop a cloud-native content repository and associated services. This is more than just spinning up some virtual machines on AWS, Google, or Azure and running Alfresco as we know it. This would leverage the services those platforms provide in a native way to provide content services. Examples include things like using S3 for storage, Dynamo for metadata, and lambdas for things like actions and behaviors. Alfresco is focusing first on AWS, then they’ll look at other cloud providers.

The ADF is the other area of focus. It is often referred to in the same breath as “Angular components” but it is more than just a set of components for that specific client-side framework. The ADF also includes a more general client-side JavaScript library, so if Angular is not your thing, you can still leverage the client-side JavaScript library from your framework of choice. The ADF also includes some project bootstrapping in the form of a Yeoman generator.

There were also several talks on the server-side REST API that is new in 5.2 which is an essential set of services that enable those front-end libraries to work.

Alfresco confirmed that this focus on ADF means the existing Share client is not receiving much, if any, attention at this point. John Newton said that they are working on an “exemplary” or canonical client built with the ADF. The goal, however, is for that example client to be more focused on a specific use case and not a more general document management use case, which is what Share does. In one of the birds-of-a-feather sessions, the group urged Alfresco engineering to not let the example client accidentally become a de facto Share replacement. Alfresco has a history of releasing example or demo systems that then somehow become real products. Hopefully that won’t be repeated. It’s okay to replace Share at some point, just do so intentionally.

I think the advice I gave last year regarding the ADF still holds, so if you are trying to figure out how the ADF affects your customization plans around Alfresco, read that post.

Not Much Else New and Exciting

If you went to BeeCon looking for big product announcements you were likely disappointed. The cloud and ADF stuff is definitely interesting and critical for Alfresco to stay relevant and so it is naturally sucking every last ounce of energy out of the engineering and product teams leaving little room for innovation in other areas.

John Newton’s talk was basically about “Digital Transformation”, which is what Gartner has decided to call ECM. My clients and I don’t really care what Gartner wants to call what we do. We’re still solving the same problems we’ve always solved using essentially the same approaches with newer tools. That’s just how ECM–sorry, I mean Digital Transformation–is. It’s a mature industry. Should we be shocked that Alfresco did not knock us over with an amazing set of new features no one had ever thought of before? He did hint at some natural learning and machine language applications. I would have loved for him to spend most or all of his talk on that.

Support Tools for Community Edition is a Hot Add-On

Here’s a tip. If you are running Enterprise Edition and you have not installed Support Tools, you need to. It is really helpful for those that administer Alfresco servers. Unfortunately, it has historically been Enterprise-only. But last year, Axel Faust and a merry band of hackers participating in the 2016 Global Virtual Hack-a-Thon wrote their own version of Support Tools that works for both Community Edition and Enterprise Edition. In my mind, this could be the single most important add-on written by the community since the JavaScript Console. It’s incredibly useful and it has a lot of people actively participating in its development. Definitely take a look if you haven’t.

Fun at the Hack-a-thon

Speaking of the hack-a-thon, this was the first time I’ve been able to participate in the on-site hack-a-thon for the entire day. It was a lot of fun! My team worked on a little add-on that makes it easier to manage rules (convert local rules to shared, relocate rule sets, etc.).

Like all hack-a-thons it was a little unnerving because you feel a lot of pressure to build something minimally viable with the time and team that you have but I highly recommend the experience. Axel Faust did a great job facilitating the day-long session for the 20 or so attendees from a variety of backgrounds and skillsets.

See You Next Year?

We haven’t identified a location for next year’s conference yet. Before you start shouting out your favorite cities from around the globe, realize a couple of things. The Order of the Bee puts on this conference with an all-volunteer committee. The venue is free or extremely low cost. The free venue together with help from our sponsors allows us to keep the price very low, but it also requires a lot of work from the team which includes a local in-city coordinator to work with the venue, catering, audio-visual, etc. So we do want to hear suggestions, but viable suggestions will take all of that under consideration.

There is some talk of Alfresco resurrecting DevCon. They’ve been a wonderful and supportive sponsor of BeeCon these last two years and as long as the event continues its high signal-to-noise ratio and community inclusiveness (both in terms of the ecosystem and Community Edition, specifically) then it is probably a good thing. Until they commit, we’ll assume there will be a BeeCon 2018, and I hope to see all of you there next year, wherever that may be!

April 24, 2017

Tutorials updated for Alfresco SDK 3.0.0

A week or so ago Alfresco released version 3.0.0 of their Maven-based SDK. The release was pretty significant. There are a variety of places you can learn about the new SDK release:

The official documentation discusses what’s new.
Martin Bergljung wrote a guide here. (Why Google Docs? No idea. Maybe this is a draft?)
The SDK project on Github

I recently revised all of the Alfresco Developer Series tutorials to be up-to-date with SDK 3.

Upgrading each project was mostly smooth. The biggest change with SDK 3.0.0 is that the folder structure has changed slightly. At a high level, stuff that used to go in src/main/amp now goes in src/main/resources. These changes are good–they make the project structure look like any other Maven-based project which helps other developers (and tools) understand what’s going on.

I did have some problems getting the repo to build with an out-of-the-box project which I fixed by using the SNAPSHOT rather than the released build.

Support for integration tests seems to have changed as well. I actually still need to work that one out.

Other than that, it was mostly re-organizing the existing source and updating the pom.xml.

A small complaint is that the number of sample files provided with the SDK has grown significantly, which means the stuff you have to delete every time you create a new project has increased.

I understand that those are there for people just getting started, but no one, not even beginners, needs those files more than once, so I’d rather not see them in the main archetype. Maybe Alfresco should have a separate archetype called “sample project” that would include those, and configure the normal archetype as a completely empty project. Just a thought.

The new SDK is supposed to work with any release back to 4.2 so you should be able to upgrade to the new version for all of your projects.

March 29, 2017

This is How I Work

On any given day I might be writing code, designing the architecture for a content-centric solution, meeting with clients, collaborating with teammates, installing software on servers, writing blog posts, answering questions in forums, writing and signing contracts, or doing bookkeeping. Some days I do all of that. My day is probably similar to that of anyone else doing professional services work as a small business. Here are some of the tools I use to keep all of that going smoothly.

Hardware & Operating System

In 2006 I left Windows behind and never looked back. I ran Ubuntu as my primary desktop for three years, then switched to Mac OS X. I still love Linux, and all of my customers run Linux servers, but for my primary machine, I am most productive with my MacBook Pro and OS X.

I’ll admit that the latest MacBooks shook my faith by incorporating a shiny feature I’ll never use and not upgrading the CPU or RAM. I briefly considered moving back to Linux on something like a System76 or a Lenovo, but I am so deep into the Apple ecosystem in both home and office it may not be practical to switch. I’m hopeful the new MBP’s will get beefed up in terms of RAM and CPU later this year.

Collaborating with teammates and clients

I’ve been using Trello for project and task tracking for a long time and it really agrees with me. I set up Trello teams with some of my clients and it helps keep us all on track.

For real-time collaboration I tend to use Slack. It doesn’t always make sense to do so, but for selected projects, I invite my clients to my corporate Slack team and we use private channels to work on projects. We use Slack’s integrations to get notified when changes happen in Trello or in our codebase which resides in either Git or Bitbucket.

For real-time collaboration via IRC, Jabber, and GChat I use Adium.

Creating content & code

For plain text editing, I use either Aquamacs or Atom depending on what I’m doing. If it’s just a free-form text file, like maybe a blog post or just some rough meeting notes or something like that I’ll use Aquamacs, which is a Mac-specific distribution of Emacs. If I am doing non-compiled coding in something like JSON, XML, Python, Groovy, or JavaScript I’ll typically use Atom.

For more intense JavaScript projects I will often switch to WebStorm and sometimes for Python I’ll use PyCharm instead of Atom.

For Java projects I’ve recently moved from Eclipse to IntelliJ IDEA. I’ve used Eclipse for many, many years, but IntelliJ feels more reliable and polished. I’ve been pretty happy with it so far.

IntelliJ, WebStorm, and PyCharm are all available from JetBrains and the company offers an all-in-one subscription that is worth considering.

I do a lot with markdown. For example, if I’m taking notes on a customer’s installation and those notes will be shared with the customer, rather than just doing those notes in plain text with no structure, I’ll use markdown. Then, to preview the document and render it in PDF I use Marked. This is also handy for previewing Github readme files, but Atom can also be used for that.

Another time saver is using markdown to produce presentations. I wouldn’t necessarily use it for marketing-ready pitches, but for pulling together a quick deck to review thoughts with a customer or presenting a topic at a meetup, markdown is incredibly fast. To actually render and display the presentation from markdown I use Deckset. It produces beautiful presentations with very little effort while maintaining the editing speed that plain text markdown provides.

Sometimes I’ll create a video to illustrate a concept, either to help the community understand a feature or extension technique or to demo some new functionality to a customer when schedules won’t align. Telestream is a wonderful tool for creating such screencasts.

My open source projects live at Github while my closed source projects live at Bitbucket. The primary draw to Bitbucket is the free private repositories, but I really like what Atlassian offers. When git on the command-line just won’t do, I switch to Sourcetree, Atlassian’s visual git client.

Automation, Social & News Feeds

I have a client where I have to manage over 60 servers across several clusters. To automate the provisioning of new nodes, upgrades, and configuration management, I use Ansible. It’s much easier to learn than Chef and it requires no agents to be installed on the servers being managed. Plus, it’s Python-based.

Lots of servers, lots of customers, and lots of business and personal accounts means password management can become an issue. I use KeePassX on my desktop and iKeePass on my iOS devices to keep all of my credentials organized.

I have a ton of different news sources I try to keep up with. Feedly helps a lot. I use it on my mobile devices and on the web.

I have a personal Twitter account and a corporate Twitter account, plus I help out with other accounts from time-to-time. Luckily, Hootsuite makes that easy, and it handles more than just Twitter.

Back-office

When you run your own business there are two things that can be a time suck without the right tools: contracts/forms and bookkeeping. PDF Expert lets me fill in PDF forms and sign contracts and other documents right on my mobile device. It has direct integrations with Google Drive, Dropbox, and other file share services so it is easy to store signed documents wherever I need to.

I used QuickBooks Desktop Professional Services Edition to handle my bookkeeping for several years. But that edition was only for Windows, so I ran it in a Windows VM on my Mac with VMWare, which was kind of a drag. I finally got tired of that and migrated to QuickBooks Online. The migration was completely painless. I just called them up, and within about an hour they had taken my money and moved all of my data without anything getting screwed up. It was a pretty awesome experience.

Physical Office

I’ve been working from home for over ten years, so I appreciate the importance physical space plays in a productive working environment. My desk is a custom-built UPLIFT standing desk with a solid cherry top. I love it and haven’t had a problem with it. I do need to make myself stand up more often, though.

I found that even with the desk raised completely, my cinema display wasn’t quite high enough. So I snagged a Humanscale M8 articulating mount with an Apple VESA adapter. Now I just grab my display and put it exactly where it needs to be.

My Mac hooks up to my display via Thunderbolt, which makes connecting and disconnecting a breeze. But I didn’t like how much real estate the laptop took up on my desk. There are a variety of solutions for this. I went with a Twelve South BookArc vertical desktop stand and that’s worked really well. I have a minor concern about whether or not using a MacBook in a vertical position is bad with regard to heat dissipation but I’ve decided to roll the dice on that.

I love my home office setup, but I do get tired of the same four walls sometimes. To combat that and to just change things up a bit, every week I try to spend some time in a co-working space. Here in my hometown there’s one right on the square in the historic part of downtown that has got a good vibe and is close to good food. You might check Sharedesk to see if there’s something similar near you.

So there you have it. Those are some of the tools I use every day. Got any favorites you’d like to share?

March 12, 2017

Now Available: Alfresco Developer Guide 2nd Edition

After more than 8 years since it was first published, Packt has published a second edition of the Alfresco Developer Guide. Much like the Alfresco product itself, the overall approach and architecture of the book is fundamentally the same, but Ben Chevallereau has given it much-needed updates by adding Share, SDK, and Application Development Framework coverage as well as bringing everything else up-to-date.

Before I talk about what’s in the book, I should mention that although the title says “Alfresco One” the book is applicable to both Community Edition and Enterprise Edition. I honestly have no idea why the “One” got added, but that’s a minor quibble.

The first four chapters are essentially the same, with updates here and there to match recent software releases. Chapter 1 introduces content management and Alfresco. Chapter 2 discusses the Alfresco way to develop extensions and customizations. Chapter 3 dives into content modeling. Chapter 4 explains actions, behaviors, transformers, and extractors.

Chapter 5 is when the first serious departure from the first edition occurs. That’s because when the first edition was published, Alfresco Share was just being created. The second edition uses Chapter 5 to discuss Share customizations, primarily focusing on Surf, but also touching on Aikau where it is relevant.

Chapter 6 further expands on application development by exploring the new Application Development Framework based on Angular2.

Chapter 7 returns to familiar territory with web scripts and Chapter 8 covers advanced workflow, although in the second edition it is exclusively focused on Activiti rather than the old jBPM engine.

Chapter 9 is kind of a grab bag of new feature coverage such as the Search Manager and Smart Folders. It also throws in a light example of developing a mobile client for Alfresco using Appcelerator.

Chapter 10 focuses on security and covers the same topics as the first edition (LDAP, SSO, and custom permissions) but with obvious updates.

The appendices originally in the first edition were cut completely which would have been a pain to update. And with the improvements made in the Alfresco-provided documentation over the last eight years they really no longer add much value.

Overall, I think the book still meets its goal of being the quintessential reference for anyone getting started with Alfresco development.

Many people have come up to me over the years and said, “I got my start with Alfresco thanks to your book,” and that makes me happy because that’s exactly what it was for. I’m glad that this new edition will enable a whole new generation of people to get up-to-speed and join us in the vibrant Alfresco community.

The credit for the hard work on the update goes to Ben–you did a great job. We should also recognize Bindu Wavell for the technical editing, which is a huge task. Thanks so much guys!

January 26, 2017

Join me in Spain for BeeCon, the community-organized Alfresco conference

Registration for BeeCon 2017 is now open. What is BeeCon? It’s a conference focused on Alfresco organized by The Order of the Bee, a grassroots community of Alfresco enthusiasts.

This year the conference is April 25 – 28. We’ll be in Zaragoza, Spain, a beautiful city about 1.5 hours by train from Madrid.

If you’ve ever been to Alfresco DevCon, the conference is a lot like that. The focus is on providing high-quality content free of sales pitches.

Despite being run by volunteers with costs kept to a minimum (it essentially runs as a non-profit), last year’s conference was well-attended and felt very professional and well-planned. I have no doubt that the hard work of the conference committee and the support of our sponsors will result in another proud moment for The Order of the Bee and, more importantly, a productive use of your time.

This year the format changed slightly. We moved the hack-a-thon to the beginning of the conference so it would not conflict with sessions. That night we’ll have a welcome party. Sessions start the next morning. The conference features two and-a-half days of traditional sessions, which are mostly technical, as well as lightning talks, which are always entertaining and informative. The schedule is on the conference site.

BeeCon is planned, organized, and executed entirely by volunteers. Alfresco Software, Inc. and other vendors pay to sponsor the event, but the program is driven by a committee of Order of the Bee members. Speaker selection is based on the merit of the proposal. Do you have an Alfresco story to share? Become a speaker!

For me, BeeCon is a time to lift my head up from my projects and spend time learning what others are doing in this space. It also gives me a chance to physically hang out, chat, and laugh with people I collaborate with online nearly every day. This year, I hope you’ll decide to join us in person. I am looking forward to seeing you in Zaragoza!

October 26, 2016

Alfresco Maven SDK: Declaring other AMPs as dependencies

I’ve seen a few questions about how to configure an Alfresco project to depend on other add-ons when using the Alfresco Maven SDK, so I thought I’d do a quick write-up on that.

When you use the Alfresco Maven SDK Archetypes to bootstrap a project, you can select one of three project types:

alfresco-allinone-archetype: Referred to simply as “All-in-One”, this type, as the name implies, gives you a project that facilitates creating both a repository tier AMP and a Share tier AMP, plus it includes a working Solr installation.
alfresco-amp-archetype: This sets up a project that generates only a repository tier AMP.
share-amp-archetype: This sets up a project that generates only a Share tier AMP.

In my Alfresco Maven SDK Tutorial I use both the alfresco-amp-archetype and the share-amp-archetype but I don’t really talk about the All-in-One archetype at all. The reason is two-fold. First, I think for most beginners, the smaller, more focused archetypes are less confusing. Second, on the vast majority of my client projects, that’s the setup I use. Usually, I feel like the All-in-One archetype is overkill.

However, there is an advantage the All-in-One archetype has over the other two. It is able to pull in other AMPs as dependencies. With the single-purpose archetypes, your goal is simply to build an AMP. You can run integration tests against that AMP using the embedded Tomcat server, but if you want to test the AMP with other AMPs, you have to deal with that manually. Usually I just deploy the AMPs to a test Alfresco installation that already has the other AMPs deployed.

But with the All-in-One archetype, you can declare those other AMPs as dependencies, and Maven will go grab them from a repository and install them into your local test environment using something called an “overlay”.

Here’s an example. Suppose you are developing some customizations that you know will be deployed to an Alfresco server that will be running the following add-ons: Share Site Creators, Share Site Space Templates, Share Login Announcements, and Share Inbound Calendar Invites. You’d like to test your code alongside these add-ons.

One approach would be to go grab the source for each of these projects and build them locally to produce their AMPs (or download pre-built AMPs), then deploy all of those AMPs to a test server, then deploy your custom AMP and test.

But all of those add-ons happen to live on a public Maven artifact repository called Maven Central. So a better alternative might be to simply list those modules as dependencies and let Maven take care of the rest.

Here’s how it works. When you generate an all-in-one project, the structure includes not only folders that contain your repo and Share AMP source, but also folders representing the Alfresco and Share web applications. For example, here are the files and folders in the root folder of a project called “test-allinone-220”:

pom.xml
repo
run.bat
run.sh
runner
share
solr-config
test-allinone-220-repo-amp
test-allinone-220-share-amp

In our example, your custom code for the repo tier would go in test-allinone-220-repo-amp and your Share tier code would go in test-allinone-220-share-amp.

The repo and share directories are used to build the respective WAR files and they each have their own pom.xml. So to add the other add-ons to our setup, first edit repo/pom.xml. You need to add the add-ons as dependencies, like:

<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>inbound-invites-repo</artifactId>
    <version>1.1.0</version>
    <type>amp</type>
</dependency>
<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-space-templates-repo</artifactId>
    <version>1.1.2</version>
    <type>amp</type>
</dependency>
<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>share-login-ann-repo</artifactId>
    <version>0.0.2</version>
    <type>amp</type>
</dependency>
<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-creators-repo</artifactId>
    <version>0.0.5</version>
    <type>amp</type>
</dependency>

Then, add them again to the overlays section (without the “version” element), like this:

<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>inbound-invites-repo</artifactId>
    <type>amp</type>
</overlay>
<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-space-templates-repo</artifactId>
    <type>amp</type>
</overlay>
<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>share-login-ann-repo</artifactId>
    <type>amp</type>
</overlay>
<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-creators-repo</artifactId>
    <type>amp</type>
</overlay>

That covers the Alfresco WAR. Now for the Share WAR, edit share/pom.xml and add the Share tier dependencies, like:

<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>share-login-ann-share</artifactId>
    <version>0.0.2</version>
    <type>amp</type>
</dependency>
<dependency>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-creators-share</artifactId>
    <version>0.0.5</version>
    <type>amp</type>
</dependency>

And then the overlays:

<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>share-login-ann-share</artifactId>
    <type>amp</type>
</overlay>
<overlay>
    <groupId>com.metaversant</groupId>
    <artifactId>share-site-creators-share</artifactId>
    <type>amp</type>
</overlay>

Now you can run the whole thing–your custom AMPs plus all of the dependencies–by running ./run.sh. As part of the build, you should see the dependencies being downloaded from Maven Central. As the server starts up, if you watch the log, you’ll see the modules get initialized. And when you log in, you’ll be able to test all of the functionality together.

When you are ready to deploy to one of your actual server environments, you have a choice. You can either copy all of your AMPs to the amps and amps_share directories, then run bin/apply_amps.sh. Or you can deploy the WAR files that are now sitting under repo/target and share/target. Personally, I think re-applying the AMPs against a fresh WAR on the target server is less confusing and less error-prone, but I’ve seen it done both ways.

If the add-ons you want to depend on are not on a public Maven artifact repository like Maven Central, you can still build them locally, run “mvn install” to install them into your local Maven repository, then depend on them as shown.

October 15, 2016

Activiti founders fork the project to create Flowable, an open source BPM engine

On Thursday the news broke that Activiti had been forked to create a new open source Business Process Management (BPM) engine called Flowable. Activiti is a BPM engine that Alfresco Software, Inc. funded to replace JBoss jBPM. The Activiti engine can automate business processes as a standalone application, but it is also the embedded workflow engine that Alfresco ships as part of its ECM platform.

A fork is when the evolution of an open source project intentionally diverges. Instead of one code base controlled by a single entity, a fork creates a second code base from the first, allowing each project to evolve separately going forward in accordance with their own distinct roadmap and governance model.

Forks happen fairly frequently, and in places like GitHub, where a lot of collaboration around open source software takes place, they happen constantly, by design. But when a significant fork happens to a project, especially one that has commercial backing, like this one, it is worth taking note. Alfresco depends on Activiti for its ECM platform, so now it will be forced to either continue to develop Activiti under their own direction, or switch to the new project like any other upstream dependency.

The news of the fork comes about a month after my blog post reporting that multiple Activiti engineers had resigned from Alfresco. It is now public that those engineers were Joram Barrez and Tijs Rademakers, who were absolutely the heart and soul of the Activiti project. Those departures left a big question mark hanging over the future of Activiti.

Then the big news came on Thursday. Tijs Rademakers released this tweet, announcing a new project, Flowable, with a link to a blog post.

The blog post outlined some high-level reasons for the fork:

“We acknowledge Alfresco’s stewardship of the Activiti.org project, and as employees we enjoyed considerable freedom to develop the project over several years. However, things didn’t work out as we expected or hoped. We came to the conclusion the only way to continue evolving our ideas was to fork.”

The blog post goes on to talk about forks, in general, and their potential impact.

I wanted more detail so I spent some time talking to members of the new Flowable project last week to understand why they forked Activiti and what we can expect going forward.

The first thing that is immediately obvious when talking to any of the former Activiti engineers is how much passion they have for the project. Leaving Alfresco was clearly a hard decision. But the hardest part was feeling disconnected from the software they created and the fear that the project would no longer have the stewardship it enjoyed prior to their departure. “We couldn’t just leave it there,” said one of the engineers. Indeed, looking at the Activiti forum as well as the commits for Activiti on GitHub, it is obvious that Alfresco is still working to rebuild the Activiti team.

After their departure, the former employees continued to discuss with Alfresco Software how they might be able to move forward, but the two sides could not reach an agreement. It soon became obvious to the team that in order to move the project forward they would have to take more direct control, and the only way to do that was to fork.

Specifically, the team started with the latest Activiti release, 5.21.0, and added a new feature called “transient variables” and some other small enhancements as well as bug fixes, and released that as Flowable 5.22.0 (GitHub).

It’s actually not unlike the creation of MariaDB after Oracle acquired Sun (and then became owners of MySQL). In fact, it is the stated goal of the engineers for Flowable to be a drop-in replacement for Activiti, just like MariaDB is for MySQL.

The engineers say they are looking forward to getting back to their open source roots. “In true open source fashion, we want people to get involved in whatever way they can, whether that’s documentation, testing, or code contributions,” said the team. The Flowable project will run as a meritocracy, with trusted developers earning commit rights over time, regardless of their commercial affiliation.

The fork has actually made many people in the community a lot more comfortable with the direction of the project because the engineers who created the software and made the vast majority of the commits during its life are now committed to moving it forward, and this fork gives them the freedom to do it. “Everyone has been supportive so far,” the team said, “Now the burden is on us to start releasing.”

Clearly, Alfresco Software is unhappy with the fork. In a blog post published Friday, John Newton, Alfresco co-founder and CTO, said he was hoping for a different outcome:

“Unfortunately, some of my early friends on the Activiti project have disagreed with our direction and have taken the step of forking the Activiti code. This is disappointing, because the work that they have done is very good and has generally been in the spirit of open source. However, the one thing that we could not continue to give them was exclusive control of the project. I truly wish that we could have found a way to work with them within a community framework.”

John’s post indicates that the company intends to continue to develop Activiti. John writes, “We are redoubling our determination to grow Activiti faster, better and more openly.”

Growing Activiti faster will be key. If Alfresco is slow to field a new Activiti team and then fails to give the market what it wants at the pace it wants it, those stuck on a lagging codebase may become frustrated, particularly if Flowable is able to out-innovate Alfresco. Flowable has the advantage here because they are the original core team, they are more nimble, and they can hit the ground running. But without Alfresco as a sponsor, Flowable engineers must find a way to earn a living as well as move the project forward. Ideally they will be able to make the two intersect by finding clients to pay them to stay focused on the project.

Alfresco’s product and strategy depends heavily on Activiti. From a purely commercial perspective it is understandable they would want as much control over the codebase as possible. However, there are other commercial open source companies that have strategic dependencies on upstream open source projects they don’t fully control. Elastic’s dependency on Lucene comes to mind. If Alfresco could participate in the Flowable project in a similar way it could potentially benefit everyone, but John’s post seems to indicate that is not in the cards. The next few months will tell us just how committed the company is to continuing with Activiti as a separate project.

Personally, regardless of what happens with Activiti, I hope Flowable flourishes. If team member passion was the only predictor of success I’d say it’s a sure bet it will.