
Local-first software: you own your data, in spite of the cloud

I spent a lot of time with photos (Picasa) trying to do this peer to peer - this is what we built in the 2002 era. Here are a few issues:

1. Identity is hard to do on the LAN, so any sharing ends up using the cloud to figure out who has access. Similarly, identity is hard to move around, so representing your Facebook comments feed outside Facebook is difficult to do.

2. Any time you have a "special" server that handles identity, merging, or any other task, it ends up in the effective role of master, even if the rest of the parts of your system are peers. You want all your collaborations in Dropbox to survive their infrastructure being offline? It's tough to do.

3. p2p stalled a bit in the mid-2000s when storage and bandwidth got much cheaper--in a period of just two years (2002-2004), it became 100x cheaper to run a cloud service. But what continued to stall p2p was mobile. Uploading and syncing need to run in the background, and if you're on a bandwidth-limited client or a battery-limited platform like iOS, sync can effectively diverge for months because the applications can't run in the background. So changes you "thought" you made don't show up on other devices.

4. For avoiding mass surveillance, what we are missing from this older time is the ability to make point to point connections (between peers) and encrypt them with forward secrecy, without data at rest in the cloud. Even systems that try to do some encryption for data at rest (e.g., iMessage) keep keys essentially forever, so data can be decrypted if you recover a private key later on. A system that only makes direct connections between peers does not have this issue.

5. Anytime you have multiple peers, you have to worry about old versions (bugs), or even attackers in the network, so it's fundamentally harder than having a shared server that's always up to date and running the latest code.

SSB (Secure Scuttlebutt)[1] has solved all of these problems except the one about data being encrypted with the same key forever.

The apps all currently use one identity per device, but it's not actually hard to use the same identity on two devices, and it's common for multiple apps to share the same identity.

Old versions of software could be a concern, but really that's what versioned APIs are for. It's a solved problem.

The only notable problem with local-first development that I'm aware of currently is storage requirements running up against the limited size of the SSDs in most people's computers.

[1]: https://scuttlebutt.nz/

With regards to identity management, maybe there should be a formalized integration between browsers and password managers such that the concept of "registration" goes away and new logins just automatically create user accounts with default permissions, according to email address.

Centralized identity management was the grand promise of OAuth. It seems that somehow ended up, in most practical applications, as a "login with Facebook" button.

There are a number of startups working on simplified or "passwordless" auth, but it seems that none have gained substantial traction. I'd love to be proven wrong here, though!

Before that there was Microsoft Passport. Federated login for the web is a “problem” various parties have been working on for 20 years.

There's a fundamental problem of "I want my credentials to be validated by some central service, but I don't want to give some big faceless company my information." The centralized login service can be used to track your activity all across the net. It's a lot of power to give to some people you don't know.

Worse, people's fears in this area are completely justified. Precious few companies have proven themselves to be conscientious with our personal data. Some go as far as to repackage and outright sell said data for personal gain.

A decentralized blockchain-like system might work for this, but as far as I know it has not been attempted.

Actually, the problem of centralized login services tracking users across the net was considered and partially solved in 2010 by a team from Google and MIT. This was at a time when OpenID was the most popular federated login system, so the proposal was called PseudoID.

Unfortunately this idea didn't gain much traction, and there are only a few references to it online now, such as this research paper and a YouTube video by one of the authors:

https://ai.google/research/pubs/pub36553

https://www.youtube.com/watch?v=fCBPuGsO_I4

Also, it didn't address all the methods by which a malicious identity provider could track the user, so it would probably have to be extended by having support added in the browser.

"I spent a lot of time with photos (Picasa) trying to do peer to peer..."

As I recall, one of Google's many "failed" projects was a means for transferring large numbers of photos person to person called "HELLO". I reckon this existed for a short time post-2004, then AFAIK disappeared from public view (or morphed into something else). I could be remembering the details incorrectly.

Thanks for taking the time to share this!

The issue I face with PhotoStructure is that people's home networks frequently have throttled upstream rates. I'd love to provide a caching CDN, but I want all content encrypted. I don't want my pipes to see any data from my users.

How would you do perfect forward secrecy when only the library owner's key is available at upload time? Is it possible? If not, it seems that every bit of shared content would have to be re-encrypted and re-uploaded when someone new is granted access to that content.

One option is envelope keys.

Content is encrypted and published by the owner, but the owner uses a new random key to do the encryption. This new random key is the "envelope key."

The owner then takes the envelope key, encrypts it with the public key of recipient A, and publishes the encrypted message containing the envelope key. (This message with the envelope key should also be signed by the owner using the owner's private key.)

Anyone who obtains the envelope key effectively has read-only access to the content. The owner can publish separate messages containing the envelope key, each encrypted to recipient B, C, and D's different public keys; as a recipient, you work out which message is likely to be encrypted to your public key, then decrypt it to get the envelope key.

The owner also needs to sign the content. Signing the content and later verifying the content is done with the owner's public/private keypair, not the envelope key.
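To make that flow concrete, here is a minimal sketch in TypeScript using Node's built-in crypto module. The function and variable names are made up for illustration, and padding/key-format details are left at defaults; this shows the shape of the scheme described above, not a vetted design.

```typescript
import {
  randomBytes, createCipheriv, createDecipheriv,
  publicEncrypt, privateDecrypt, sign, verify,
} from "crypto";

// Owner side: encrypt the content under a fresh random "envelope key".
function publishContent(content: Buffer, ownerPrivateKeyPem: string) {
  const envelopeKey = randomBytes(32);              // one-off symmetric key
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", envelopeKey, iv);
  const ciphertext = Buffer.concat([cipher.update(content), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Sign with the owner's long-term key, not the envelope key.
  const signature = sign("sha256", ciphertext, ownerPrivateKeyPem);
  return { envelopeKey, payload: { iv, ciphertext, tag, signature } };
}

// Owner side: wrap the envelope key for one recipient's public key.
// Repeat for recipients B, C, D... each gets their own wrapped copy.
function wrapKeyFor(recipientPublicKeyPem: string, envelopeKey: Buffer): Buffer {
  return publicEncrypt(recipientPublicKeyPem, envelopeKey);
}

// Recipient side: unwrap the envelope key, verify the owner's signature, decrypt.
function readContent(
  wrappedKey: Buffer,
  payload: { iv: Buffer; ciphertext: Buffer; tag: Buffer; signature: Buffer },
  recipientPrivateKeyPem: string,
  ownerPublicKeyPem: string,
): Buffer {
  const envelopeKey = privateDecrypt(recipientPrivateKeyPem, wrappedKey);
  if (!verify("sha256", payload.ciphertext, ownerPublicKeyPem, payload.signature)) {
    throw new Error("content was not signed by the owner");
  }
  const decipher = createDecipheriv("aes-256-gcm", envelopeKey, payload.iv);
  decipher.setAuthTag(payload.tag);
  return Buffer.concat([decipher.update(payload.ciphertext), decipher.final()]);
}
```

Note that granting a new recipient access later only means publishing one more wrapped copy of the envelope key; the content itself does not need to be re-encrypted or re-uploaded.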

A glaring weakness of envelope keys is that the recipient can share the content freely once they have it. There's no content protection after the content and the envelope key are published.

The recipient can easily just share the envelope key if they want to. But they can also just copy the content (once they have decrypted it).

This gets complex fast. The access control list needs to travel with the data. Revocation can theoretically take forever (until all offline clients get back online). Owner-signed content is great but breaks rule 4 (it should support collaboration).

I've been working on my own personal-use app for my twin and me to collaborate on documents from different continents, so I'm pretty close to this problem.

As far as possible, I'm following a local-first methodology for a recipe search, meal planner, and shopping list application:

https://www.reciperadar.com

There's a 'collaboration' mode which allows peer-to-peer sharing of a session via CRDTs over IPFS. My partner and I select our meals for the week, and then when one of us is doing the shopping, we can mark ingredients as found -- the other person's view reflects those updates in near-real-time.

If either of us loses connectivity, we can continue to use the app, and when data access is restored, those changes are synced with the shared session (with automatic conflict resolution). All data in the shared session is encrypted, and the collaboration link contains the keys.
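As an aside for readers wondering what "automatic conflict resolution" can look like for the "mark an ingredient as found" case, here is a toy state-based CRDT sketch in TypeScript. It is purely illustrative, not reciperadar's or peer-base's actual code; real CRDT libraries use logical clocks and richer data types rather than wall-clock timestamps.

```typescript
// One entry per ingredient: a last-writer-wins register.
type Entry = { found: boolean; timestamp: number; peerId: string };
type ShoppingList = Map<string, Entry>;

// Each peer applies its own edits locally and immediately, even while offline.
function markFound(list: ShoppingList, ingredient: string, peerId: string): void {
  list.set(ingredient, { found: true, timestamp: Date.now(), peerId });
}

// When connectivity returns, merging two replicas is deterministic: the later
// write wins, with peerId as a tie-breaker. Both peers converge to the same
// state regardless of the order in which they merge.
function merge(a: ShoppingList, b: ShoppingList): ShoppingList {
  const result = new Map(a);
  for (const [ingredient, entry] of b) {
    const current = result.get(ingredient);
    if (
      !current ||
      entry.timestamp > current.timestamp ||
      (entry.timestamp === current.timestamp && entry.peerId > current.peerId)
    ) {
      result.set(ingredient, entry);
    }
  }
  return result;
}
```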

Much of this functionality is thanks to peer-base, which is an experimental but extremely useful library:

https://github.com/peer-base/peer-base

A side-benefit of this approach is that all user data can be stored locally (in browser localStorage) - there are no cookies used in communication between the app and server.

(peer-base maintainer here)

This is really cool!

I found the share link, and just tried it between two Chrome browsers, and it worked great!

Thanks for using peer-base! There's a lot of great work happening with js-libp2p recently that would be awesome to incorporate ... I'm hoping to get active developing on it again in the new year. I've got so many ideas for improvements.

Thanks a ton for developing the library :)

If & when you're looking for any more contributors and/or testing, let me know; I'd be glad to pay it back.

I'm hoping to open source the reciperadar application's stack soon and happy for it to be part of any ecosystem of examples (jbenet's peer-chess was a big help to me getting started).

Cool project! Some suggestions:

* Allow "patchcables" with different hooks, so for example a different recipe database, or a different language.

* Allow groups of ingredients, i.e. tag ingredients. For example, if I say I don't want meat, I still get shrimp suggested, which might be what I want but it might also not be what I want.

* In line with that, allow to specify allergens or blacklist ingredients (in line with dietary preference). Allergens are more important than dietary preference.

* Allow weighting of ingredients, such as "80% tomato and 20% bell pepper", or something like "tomato or bell pepper" or "tomato xor bell pepper".

* Some kind of verification reading at store or fridge (bar codes / tags) to remove entries automatically. Been dreaming of this for a long time. The way I envisioned it, stores would share an API, with prices, and customers could use it to compare prices and order appropriately (including taking into account S&H). Then the smart fridge (and other smart storage) would read the incoming ingredients. It would also read outgoing ingredients, and it could detect when they don't re-enter. It could read weight, checking how full the package still is. I had this idea 20 years ago, but I never saw a viable business case in it. You'd need to sit between all these stores, who already claim they have low margins. Plus, if you'd sit between them, how would you earn profit from such software? My take is that if you want to secure privacy, software like this would require an initial investment and then being FOSS, or a subscription and it being partly FOSS.

Thanks for using it and providing feedback; it's great to hear you've considered similar applications :)

I really like the idea of being able to patch in an alternative recipe dataset. It strikes me that a default-offline, peer-to-peer search engine (i.e. 'distributed lunr.js') is technically feasible (even if no such codebase exists at the moment, afaik). That would, in my opinion, be the ideal pluggable data source in terms of achieving local-first behaviour, resilience, and privacy, with the option of fully offline & single-device operation.

Shorter-term, once the application's backend code is open source, at least it'll be easier for anyone to run their own instance (it's a containerized set of Kubernetes microservices, FWIW).

You're completely correct about ingredient searching and handling. There's a lot to do here. One of the upcoming 'large' work items on the roadmap is to build a true knowledge graph over recipes and ingredients, with relationships (like nutritional information, substitutions and pairings) between entities. I've started exploring this space, but it's going to take a while, and I want to do it carefully and thoughtfully.

Allergies in particular are a sensitive topic and I don't want to give users any false sense of safety -- that said, it's also a very valid use case which the app should cater for.

Almost every time we do our shopping here we discuss the pricing and stock-keeping problem you mention. In many ways the application would work best if it already knew what you have in the kitchen (and how fresh your ingredients are).

There may be some use for image recognition and OCR of receipts -- or, perhaps better, at point-of-sale in stores. Keeping the application client-first is a goal, and feasible I believe - tesseract.js exists today, for example.

The project roadmap (and changelogs) will be published on the site in the not-too-distant future, and I'd like to use an issue tracker to track bugs and rank feature requests. I'll keep a note to include items from your feedback and you'll get a credit against them once implemented. Cheers!

Very nice work. I like what I see on many levels!

Let me also echo the idea of a "pantry inventory". Getting something like that working (well) would by itself be a very strong feature. Like others, I have been interested in that for a long time.

I even took a stab at it many years ago, to some success, but abandoned it when we hired a nanny who took care of cooking.

I had a dedicated optical scanner attached to the wall, plugged in via USB to an Arduino microcontroller. It had a single button and a single RGB LED. Pushing the button would "wake up" the system and it would resume its previous mode. Two modes: check-in and check-out. (Pushing the button again toggles the mode. The system reboots by holding the button down for 3 seconds. Auto "sleep" occurred after a set period of inactivity.) The Arduino just sent a POST request via an HTTP API upon a successful SKU scan, and flashed a red or blue light depending on the mode. And then a little CRUD interface I could access from my phone.

There are many SKU datasets out there available, and some amount of management of that data (organization, categorization) would be what could really set you apart.

Thanks! Having to experience the inventory management problem regularly certainly helps inspire & validate solutions :)

The setup you had sounds extremely cool - did you hit any particular challenges around the SKU data wrangling?

I've added a naive rules-based approach to categorizing products into supermarket departments (bakery, fruit & veg, ...) - it's really barebones at the moment, but already helps planning walking around the shops.

I'm not sure where you're based but your Arduino setup reminds me a bit of supermarket self-service checkouts in the UK - you hold each item up to a scanner and get a little 'beep' confirmation for each one during checkout.

Yeah. It was definitely very ad hoc.

A typical grocery store here in the US has about like, what, 20k SKUs? But I personally only really deal with maybe 200 of those, so at the time I just managed the data manually. (Plugged that USB scanner into a laptop and scanned every item into Excel --Scan, type. Scan, type. Then imported that into MySQL.)

I looked into data providers mostly because I wanted item photos. (Ultimately I just scraped images from Target.com cause I could search via SKU and it was super easy.)

Unfortunately that's my only experience with SKU data aggregators.

My scanner was attached just inside my pantry to the left, and it worked well enough. Hands free.

Funny you mention it, cause I'm not a big fan of self-checkouts here in the US. They are ergonomically incorrect, technologically unreliable, and impractical when purchasing more than 15 items. With traditional checkout the conversation is "how are you, did you find everything today?" and with self-checkout they make it clear they know you are a potential thief. shrug

Self-checkout machines in the US are primarily a way to have someone working multiple registers. Working retail, you'll regularly see an entire line stop flat while payment gets processed or a problematic item/customer comes through. Self-checkout solves this problem by only requiring a single employee to be available for up to four registers (in my experience) while customers do the work themselves. Another advantage is that if the store is fully staffed, people with issues (returns, price mismatches, complaints, etc.) are likely to get them resolved faster than if everyone with a small order was ahead of them.

It's entirely possible that I'm wrong, but as a retail worker, that's my understanding.

One of our local hardware stores (Home Depot) went completely self-checkout a couple years ago. It was terrible and the employees hated it. I haven't been back. I might be the minority though.

I'd argue there is a societal effect as well. Those employees are trained on how to spot theft, not on being helpful to the customer. I'm not interested in being treated like I'm in a prison. :) I'm fine waiting in a checkout queue with my kids. They enjoy shopping with me, but it's a small nightmare trying to do it all by myself.

This will invariably lead to people like me having to pay a premium to have an actual employee of the store prepare my bill of sale. --Which, when that time comes, I'll pay.

Also, I really don't want the liability of performance on me. I scan a wrong barcode or I miss something and then I'm in a small room talking to a police officer having to explain myself. No thank you. :)

I like the systems I've seen in Europe (e.g. Carrefour) much more, where you get a handheld scanner and scan your items as you put them in the cart. When it comes time to pay you just put your scanner back and swipe your credit card.

I totally agree. Scan into cart is my favorite self-checkout method I've used. They had trial runs of a system like this in my community but phased them out soon after.

We also tried a couple home delivery services, but it was too unreliable.

What is popular around here is pickup at the store. Submit an order and a "picker" does the work. You either get curbside delivery (small grocery chain) or pickup at a dedicated counter inside (Walmart).

Same here, about self-checkout. Produce - good luck. Awkward item? Won't fit on bagging shelf, error. Barcode won't scan? Struggle like a fool for 2 minutes until the attendant notices.

I never use them, unless it's one small, plain barcoded item.

Yes. As much as I liked having an inventory, it was still "work"... and a very easy thing to drop from a list of employee responsibilities.

I still have the components in a box somewhere. I was thinking about installing it in our basement "cold storage"/"emergency food storage" room for longer term items. But I just don't access those items enough to justify drilling a hole in the concrete wall for DC power.

This is really neat, thank you for sharing and developing! I plan to use this to simplify mealtime in my household.

One note, as I have not used it much: the "Most relevant" sorting scheme (or perhaps just the search function itself) needs some love. I searched for "chicken" as the sole ingredient, and of the 10 results on page one, only the ninth result is a recipe containing chicken. Many of the rest are desserts. I'm guessing this works better when you include multiple ingredients, but I would think that if chicken isn't in a recipe, it wouldn't be included in the results at all. I'm curious to know if you have experienced this.

Thanks for trying it out :)

In response I've made some changes to search indexing for 'chicken'-related ingredients; I explicitly want some terms like 'chicken breast' to fall under the category 'chicken', but yep, 'chicken bouillon' and a few others don't seem relevant.

Hope that helps - glad to keep on narrowing down on any extra cases you find too. There's a small 'feedback' button at the bottom-right-hand corner of the page which sends messages directly to my inbox.

Thank you for the peer-base link. I've been investigating data storage on IPFS for application interop and I love the idea about using an existing library rather than re-inventing one. I will definitely be taking a closer look at this project.

Awesome, hello mtlynch :)

It's great that you found this post; as it happens I'm using your fork of nytimes/ingredient-phrase-tagger (thanks a lot for updating and containerizing it!) in combination with 'ingreedypy' for ingredient parsing.

How's zestful doing? It looks great and I did consider using it; the quantity of parsing I was doing led me to choose the container version just to keep my own operational costs down (albeit at the loss of any model retraining you've done).

Oh, cool! I'm glad it was helpful.

Zestful's doing okay. It's in maintenance mode at the moment, but when I have spare time, I like to tinker with it to bump up the accuracy. Customers seem to mostly use it in big bursts, so it'll be a few thousand requests one month, then the next month, it'll drop to only a few dozen.

If you ever have patches you want to push back upstream to the open source tagger, I'm happy to review. And if you ever want to do a bulk parse of ingredients on Zestful, I can certainly offer volume discounts.

That is a very cool web app. I really like being able to type in several ingredients and get relevant recipes. I have a nutrient tracking oriented cooking web site (cookingspace.com) that I basically haven’t modified in ten years - seeing your site maybe will motivate me to give my own some attention.

Thanks Mark - your philosophies around efficient use of ingredients and enjoying healthy meals match our own very closely. I'm having a read of your blog and will drop you a line via email soon.

Just read through your Google doc, interesting! But what about additional family members, with their own cameras, and no interest in any clever workflow activities :) I'm currently using Google Photos as my main service, and it's working well enough for now: each family member has the Google Photos app, which uploads pics automatically to their own account. We all share our Google Photos with each other. This way I (as main curator) have access to everyone's pics, without anyone having to do anything. Google lets you store the original-size pics, so that is great (not like iCloud that resizes all pics!). Google also adds face recognition, which is very practical, and also provides a good interface for everyone to view the pictures.

Regarding safekeeping: I use the Google Drive interface to back up all my photos to my local Linux storage (a combination of rsync and https://github.com/astrada/google-drive-ocamlfuse to mount Google Drive). This way I always have all original photos locally. Finally I back up everything offsite using Backblaze.

All this relies heavily on Google Photos, but I have my own local backup of all original files. So if I need to change service, it should just be a one-time effort to migrate.

I also rely mostly on Google Photos as my viewing and sharing app. This includes sharing photos with family which is done through Google Photos shared albums.

I'm really the only one who cares about archiving photos so I'll transfer the shared photos from Google Photos to Google Drive (using the "share" functionality from the mobile app).

This kicks off a workflow that simultaneously organizes the shared photo into my library, copies it to Dropbox, and uploads it to my Google Photos library [1]. (I use Google Drive as a transport mechanism to get photos off my phone and onto my Synology.)

Not ideal but once I got it set up it's worked really well.

[1] https://github.com/jmathai/elodie/tree/master/elodie/plugins...

Uploading to both services is still supported through the Backup and Sync app. But once uploaded they are independent copies and deleting from one doesn't delete from the other. I also expect that this support will be deprecated at some point in the future.

Not keeping Drive and Photos in sync really killed it for me. I ended up switching from Google Drive to Dropbox but I still use Google Photos.

I have photos added to my library in Dropbox automatically added to my Google Photos library and this has worked well so far. [1]

[1] https://github.com/jmathai/elodie/tree/75e65901a94e14e6fd1ff...

I see. I was hoping there was still some way to quickly backup from google photos directly to linux.

I'm using resilio sync on all phones to sync to a computer. All the family needs to do is start Sync and wait until it's done. Does not matter on what network they are on, it still syncs.

This looks great and feels a lot like beets [1] for music, only that they use a database. I'll try it when I have the time to re-organize ~20 years of photos.

[1] http://beets.io/

elodie looks amazing! I definitely need to try it out

it was just the thing I was thinking of building recently as it was getting really tiring to manually organize photos

one extra idea that I had: there's a cool project that would enable offline geocoding[1], which would help get rid of API limits while making the reverse geocode queries almost instant

(the included dataset is pretty limited, but it's not hard to extend from an openstreetmap planet dump)

[1] https://pypi.org/project/reverse_geocoder/

I had looked into local geocoding databases but did not want to add a database file in the git repository. reverse_geocoder looks really interesting though and I may have a look at adding that.

For the time being, elodie does cache responses from the MapQuest API. And to reduce the number of API calls it does some approximation by seeing if an entry exists in the cache within 3 miles of the current photo --- if so, it uses that instead of looking up the location. [1]

[1] https://github.com/jmathai/elodie/blob/75e65901a94e14e6fd1ff...
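For anyone curious, that approximation boils down to a haversine distance check against previously cached lookups before calling the API. Here is a rough sketch of the idea, written in TypeScript purely for illustration (elodie itself is Python, and its details differ):

```typescript
// Rough sketch of the "reuse a cached lookup within ~3 miles" idea.
type Coord = { lat: number; lon: number };
type CacheEntry = { coord: Coord; place: string };

const EARTH_RADIUS_MILES = 3959;

// Haversine great-circle distance between two coordinates, in miles.
function distanceMiles(a: Coord, b: Coord): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Only call the reverse-geocoding API when nothing in the cache is close enough.
async function reverseGeocode(
  photoCoord: Coord,
  cache: CacheEntry[],
  lookup: (c: Coord) => Promise<string>, // e.g. a MapQuest API call
): Promise<string> {
  const nearby = cache.find((e) => distanceMiles(e.coord, photoCoord) <= 3);
  if (nearby) return nearby.place;
  const place = await lookup(photoCoord);
  cache.push({ coord: photoCoord, place });
  return place;
}
```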

When I select software these are among the list of things I am looking for generally:

- file formats that won’t lock you in or are even openly hackable (allows you to automate things)

- no clouds that will break the software once it is gone

- local storage with custom syncing or backup options

- strictly no weird data collection or “We own the rights to your data”-Type of terms

So if I get the slightest feeling of a lock-in or unnecessary data collection you are scaring me away, because mentally I would already be looking ahead to the time after you decide to scrap your cloud or abandon your file formats. The data collection bit shows me your users aren't front and center but something else is, which makes your product even less of a good choice.

If your product runs on the web, allowing for self-hosted solutions is also a big plus.

While I fully agree with your selection criteria, please consider the other side of the equation, because engineering (and the world) is all about compromises.

I am the author of a SaaS app (https://partsbox.io/). I export in open formats (JSON), there is no lock-in, it's easy to get all of your data at any time. But the app is online and will remain so. Why? Economics. Maintaining a self-hosted solution is an enormous cost, which most people forget about. You need to create, document, maintain and support an entirely different version of your software (single-user, no billing/invoicing, different email sending, different rights system, etc). And then every time you change something in your software you have to worry about migrations, not just in your database, but in all your clients databases.

I am not saying it's impossible, it's just expensive, especially for companies which are built to be sustainable in the first place (e.g. not VC-funded). Believe me, if you don't have VC money to burn, you will not be experimenting with CRDTs and synchronizing distributed data from a multitude of versions of your application.

I regularly have to explain why there is no on-premises version of my app. The best part is that many people think that an on-premises version should be less expensive than the online version, and come without a subscription.

You raise a valid point. I think what it ultimately boils down to is giving your users the feeling that even if your service stops (and there are many reasons why it could), all the hours they put into your product are not lost. The criteria I listed are factors after all, and how I factor them in depends on the available alternatives, the work I will put into it myself, and whether an online solution actually makes sense there.

A good example is note-taking apps. My notes should be private and I want to be able to read them ten years later. For me your product would need to add something valuable that a filesystem and a bunch of files I synchronize myself can't do. As of now there is no note-taking app I have found where the benefits outweigh the perceived loss of privacy and reliability. The online thing can make sense, but syncing my phone with Nextcloud works even better, so I don't really see why I'd need it.

This is potentially different with an app like yours, because the benefits of using your app vs. using e.g. a spreadsheet seem truly tangible. Using JSON and allowing export at any time is a huge plus. Having it web-only kind of makes sense, as your app seems to be geared toward teams (and any serious parts management makes a lot more sense when you are not alone).

While the additional work that would have to go into documentation and programming if you were to offer this as a self-hosted variant is non-trivial, from my standpoint offering the option to self-host can also be seen as an act of communicating: "Don't worry, whatever happens to this project, you will not lose the time invested".

I am not obsessed with this kind of reliability, but I just want to avoid the future hassle of having to deal with this, especially when I put a lot of time into it.

That's not an unreasonable assumption, right? It's just an extrapolation from how desktop/local software has always worked. Demanding a subscription for something that doesn't, technically, need any maintenance or ongoing costs on your part is a relatively new thing. Yes, of course, people expect a stream of updates these days for security if nothing else. But you could sell a version of your app that's static - urgent bug fixes only - and then sell it to them again when a new major version is released, old school style.

A lot of devs have moved away from that because subscriptions provide better peace of mind and smooth out the income stream, but it's not clear it's really better for users. Certainly they lose some optionality and there's less market pressure on the suppliers to ship big new features that motivate upgrades.

I believe a one-time fee for software is fundamentally unsustainable, and a relic of the past, when we had no networking and our operating systems did not evolve quickly. Every piece of software needs maintenance and support.

Also, a lot of people seem to forget that even if we pretended that support isn't necessary, just the existence of a standalone version imposes ongoing costs: there is more testing, and the scope of changes that can be made is more limited.

Subscriptions are the only sustainable way of maintaining software in the long term. We can either accept that and move on, or keep pretending we "buy" software that will work forever, and then pay every year for a "major new version", which apart from the fundamentally fictional nature of the deal, results in developers cramming in new and unnecessary features instead of focusing on software quality.

> a relic of the past

I could be a potential customer, since my company works in the embedded market and we design and manufacture all our PCBs.

There was a time when the company had a moment of big growth, and we looked into solutions like yours. The question was: what would happen if this guy (not specifically you, others similar to you) closes tomorrow? The answer generally was: you can export all your data and then feed it into your local database. So the matter ended with: OK, we'll stick with our current database. Goodbye.

At the end of the day we saw such services as a glorified data-entry/store/search database that will hardly undergo many modifications or require updates, especially since pretty much none of our distributors (apart from Mouser, Digikey, Avnet, etc.) support data exchange, and POs/offers will be negotiated in the traditional way (also negotiating price reductions). No need for VC funding or anything else.

We need to be able to open a schematic/BOM/whatever in 10 years and be able to operate as if it was created just yesterday, so much of the software we use is "a relic of the past" (in SaaS vendors' eyes), but effective and "the way it's supposed to be".

Online/cloud or subscription-based CADs? Also kicked out of the door.

With this I want to say that people like the article author seem to live in a little box, thinking on what they use daily as web-service-saas-whatever developers, without realizing the world is much bigger than they think it is.

"...I believe a one-time fee for software is fundamentally unsustainable..."

It is pretty much sustainable. Except for a few natural cases, SaaS is just a wet dream of vendors trying to have a constant revenue stream the easy way.

If software needs support/new features you pay for upgrades. Or not, if the old version works just fine for you. In my case I use tons of software for which I only paid once (sometimes I pay for upgrades as well). If I had to pay a monthly fee for each title I use I'd be spending an insane amount of money.

> I believe a one-time fee for software is fundamentally unsustainable

I strongly disagree. I have always, up to this day, sold software on that model, and I will probably always do so. There's nothing unsustainable about it.

What is true is that rent-seeking can be much more profitable. But fortunately for those of us who find it very distasteful, it's not necessary.

what do you do when you've saturated the market?

one-time fee only works if you have a constant stream of new customers into your niche that is sufficient to pay your salary.

with upgrades or subscriptions you can have a group of users who love what you do and are happy to support you without having to hope for new users and constantly market to get them.

> what do you do when you've saturated the market?

If I want to continue active development on it, then I sell upgrades. If not, then I'll continue to do free maintenance releases, but my main business will be from different products.

> one-time fee only works if you have a constant stream of new customers into your niche that is sufficient to pay your salary.

You can continue to get income from existing customers. You just have to provide real value to them in exchange for it.

Worth a thought: how does desktop software achieve backwards compatibility? For example, LibreOffice can work with arbitrary datastores from the 1990s. Meanwhile, with modern web-based software we struggle to maintain compatibility within a single datastore.

> Maintaining a self-hosted solution is an enormous cost, which most people forget about. You need to create, document, maintain and support an entirely different version of your software (single-user, no billing/invoicing, different email sending, different rights system, etc).

I think you're not looking at this from a local-first perspective. From that perspective you can have the same app locally as on the server. There's only one version. Yes, it does require more planning and atypical approaches, but it's 100% doable.

I'm in the planning stage of a local first web-app that will have a server side version, and it's literally going to be the exact same code on both.

I can see some arguments for having _slight_ differences between server and client software but nothing that isn't easy for a solo-dev to maintain. Mostly set it up once and never touch it again type things.

Thinking local-first has fundamentally changed some decisions I normally make without thinking for web apps, but I think it will absolutely be worth it for my users in the end.
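One hypothetical way to structure "the exact same code on both" (names made up for illustration; this is a sketch of the general idea, not a description of the actual project): the application logic depends on a small storage interface, and each environment supplies its own adapter.

```typescript
import { promises as fs } from "fs";

// The app logic depends only on this interface; each environment
// supplies its own adapter, so the rest of the code is identical.
interface DocumentStore {
  get(id: string): Promise<string | null>;
  put(id: string, body: string): Promise<void>;
}

// Browser adapter: localStorage (IndexedDB would work the same way).
class BrowserStore implements DocumentStore {
  async get(id: string): Promise<string | null> {
    return localStorage.getItem(id);
  }
  async put(id: string, body: string): Promise<void> {
    localStorage.setItem(id, body);
  }
}

// Server adapter: the filesystem (or a database) behind the same interface.
class ServerStore implements DocumentStore {
  constructor(private dir: string) {}
  async get(id: string): Promise<string | null> {
    try {
      return await fs.readFile(`${this.dir}/${id}.json`, "utf8");
    } catch {
      return null;
    }
  }
  async put(id: string, body: string): Promise<void> {
    await fs.writeFile(`${this.dir}/${id}.json`, body, "utf8");
  }
}
```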

Very exciting. Could you please elaborate on what technology stack you're planning to use? Language, frameworks, etc.

And how do you plan to achieve that? What will be running on a server side? Will it be headless or server-rendered?

Do you have a GitHub repo with that project?

Sorry for lots of questions and thanks in advance!

> The best part is that many people think that an on-premises version should be less expensive than the online version, and come without a subscription.

Which makes sense IMHO, provided they are not expecting any updates to their on-prem installation. It can just be a fork of your current codebase with no new features or warranty. Maybe you can include some terms for critical security updates but that is about it.

But that, unfortunately, is entirely unrealistic. When I discover critical bugs, I can't leave users out in the cold.

I believe that the whole concept of "software without support" is fundamentally flawed.

Agreed. But if a user agrees to be left "out in the cold", you should allow them to. Given that it is not very costly for you to prepare your software for on-prem, it may be a win-win for all the parties involved.

> Given that it is not very costly for you to prepare your software for on-prem

I have been asserting exactly the opposite in this thread.

Assuming you are not providing any updates (not even security critical ones for argument's sake), you only need to package and add documentation for an on-prem installation. Why is it so costly?

> you only need to package and add documentation

None of which are insignificant.

In addition to that you must now support (and test for) a million configurations, rather than just one.

FUD. Among my products I have desktop software. Sure, I had some pains upon initial release, but it's been literally years since I really needed to test "for a million configurations".

I also have experience developing and maintaining cloud solutions. The total amount of work that goes into large cloud apps and the number of things that can break or not work properly is fairly impressive. Definitely not any less than on-premises.

I think we as engineers owe our users bug fixes, because we messed up at some point. However, new functionality or email/phone support? Well, this is what people are not entitled to.

Only for a limited time though; releasing a piece of software shouldn't be a lifetime commitment.

I sell nothing but on-prem software, and have never left my customers "out in the cold" or without support.

Maybe you should look in the mirror. If you are actively developing software and adding new features - sure, you might also be developing critical bugs.

In my case, for example, I have a few older products (desktop apps that still sell and bring in some money) for which I have source code that has not been touched for years. I'm not interested in developing new features either, as those apps are mature and there is not much ROI in developing new features the whole 2.5 customers are willing to pay for.

Long story short, I do not have any critical bugs there. Maybe some exist and are hiding somewhere, but since nobody has ever discovered them I am totally cool. The only occasional customer complaint I get every once in a while is of the RTFM type. Not real bugs.

A lot of software doesn’t have to be “hosted” anywhere. The first question I usually have when trying out some web-based software is: “Why on earth can’t this be a native application that I just download and run? Why are they shoving a web browser and server into this thing?” With a cross-platform native application, you bypass the whole on-prem vs hosted question entirely.

I wonder how often the answer is simply “We know lots of JavaScript programmers and not a lot of Qt programmers.” Talk about the tail wagging the dog.

As a user, I strongly favor software I buy once, download, and run as-is, forever, untethered to the Internet.

I agree, but would add that there are a number of reasons why your browser as the client is a better long-term strategy that minimizes maintenance. Unless you want a really generic, geeky, bare-bones UI, native UI toolkits don't have a long lifespan. You constantly need to tweak what you've built for changes in the latest version of the OS.

However, that doesn't mean it needs to be "hosted" anywhere. It's trivial to make an app run a little web server that they access with their browser instead of a native or Qt UI.

Browser based interfaces are an excellent bet (probably the best bet) for "run as-is, forever" or as close to "forever" as we can reasonably get right now.

> Believe me, if you don't have VC money to burn, you will not be experimenting with CRDTs and synchronizing distributed data from a multitude of versions of your application.

While you're right that doing this is less convenient, I can say with the personal experience of developing a few such projects on a shoestring that doing this isn't as hard or expensive as you're making out. It's entirely doable. It just requires thoughtful engineering.

Martin Kleppmann is a major inspiration for our startup, Ditto.

We take the local-first concept and p2p to the next level with CRDTs and replication. But what we really do is leverage things like AWDL, mDNS, and/or Bluetooth Low Energy to sync mobile database instances with each other even without internet connectivity. www.ditto.live

Check it out in action!

https://youtu.be/1P2bKEJjdec https://youtu.be/ITUrk_rjnvo

We found that CRDTs, local-first, and eventual consistency REALLY shine on mobile phones, since they constantly experience network partitions.

Very interesting. Will the upcoming server support be end-to-end encrypted? In other words, will the server be able to read the data?

I made "traditional" enterprise-style-apps for small business and have tried to crack the sync of data several times.

Does a resource exist on how to leverage that tech for boring stuff like inventory, invoices, etc.? Hopefully without a total change of stacks (I use PostgreSQL and SQLite as DBs, and need to integrate with some 12+ different DB engines).

Cryptomator [1]. Cross-platform, allows you to encrypt your data in the cloud, and access it transparently.

Thing is, like Syncthing, it lacks a collaborative feature. Nextcloud has it, but only if you have Nextcloud accessible (I want to host only on my LAN). Something like IPFS (or Tor) is a solution to that problem.

[1] https://cryptomator.org/

> it lacks a collaborative feature

It does not, you can share folders with anyone you want without them even needing an account.

There's also Resilio Sync, which gives you the option to have the node/folder on your VPS be encrypted. Hopefully Syncthing will add that feature at some point in the future.

I have been considering spending more money than I should on finding someone on a freelancer site to add untrusted-node support (encrypted file transfer where a given node is never given the key to decrypt the files). That is the number one missing feature in Syncthing for me.

It would unlock scenarios like: share your files with me and I'll share files with you, and neither of us will know the other's files; unless your house burns down, in which case you can get your files back just by hooking the sync back up with the encryption key pulled from your password manager.

I'm on Android so it works, but I also used Nextcloud for some time and had issues with it "forgetting" about the data folder, so I had to re-create it every time it happened. I wasn't happy about that.

With Syncthing I don't (yet) have this problem.

I may be biased as the maintainer of PouchDB but you can do all this today (and for the last 5+ years) with PouchDB.

As for the comment about CouchDB and the "difficulty of getting application-level conflict resolution right", I am not really certain how it applies. You don't have to handle conflicts in Pouch/CouchDB if you don't want to; there is a default model of last write (actually most edits) wins, but you can handle them if needed.
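For anyone who hasn't used it, a minimal sketch of that model with PouchDB; the remote URL is a placeholder and the merge strategy here is deliberately naive (a real app would merge its domain data properly):

```typescript
import PouchDB from "pouchdb";

// Local database; writes land here first, so the app keeps working offline.
const db = new PouchDB("notes");

// Continuous two-way sync with a remote CouchDB once connectivity is available.
db.sync("https://example.com/db/notes", { live: true, retry: true });

// A deterministic winner is picked automatically, but losing revisions are
// kept, so the application can resolve conflicts itself if it wants to.
async function resolveConflicts(docId: string) {
  const doc = await db.get<{ body: string }>(docId, { conflicts: true });
  for (const rev of doc._conflicts ?? []) {
    const losing = await db.get<{ body: string }>(docId, { rev });
    // Application-level merge goes here; as a placeholder, concatenate bodies.
    doc.body = `${doc.body}\n${losing.body}`;
    await db.remove(docId, rev); // discard the losing revision once merged
  }
  await db.put(doc);
}
```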

Hi Dale, I'm one of the local-first paper coauthors. I'm a fan of PouchDB (thanks for that) and the whole CouchDB lineage--the CouchDB book[1] was an early inspiration in my exploration of next-gen storage and collab models.

I've been down the CouchDB/PouchDB path several times with several different engineering teams. Every time we were hopeful and every time we just couldn't get it to work.

As one example, I worked with a small team of engineers to implement CouchDB syncing for the Clue Android and iOS mobile apps a few years back. Some of my experience is written down here[2]. After investing many months of engineering time, including some onsite help from Jan Lehnardt[3] we abandoned this architecture and went with a classic HTTPS/REST API.

Other times and with different teams we've tried variations of Couch including PouchDB with web stack technology including Electron and PWA-ish HTML apps. None of these panned out either. Wish I could give better insights on why--we just can't get it to be reliable, or fast, or find a good way to share data between two users (the collaboration thing is kind of the whole point).

[1]: https://guide.couchdb.org/

[2]: https://medium.com/wandering-cto/mobile-syncing-with-couchba...

[3]: https://neighbourhood.ie/couchdb-support/

I'm currently working on an app using PouchDB, and the approach I've taken is using one database per "project" in my app. I'm not there yet, but I'll use another database to manage users & access control. These aren't things you want to sync anyway; I might even end up with a regular SQL database for this (I haven't decided yet).

I hope this approach avoids most of the pitfalls you mentioned.

Your Git analogy is also spot-on, but I think you don't take it far enough. Creating a repo is cheap, and I believe CouchDB databases are, too (although I'm still very new at this). You seemed hesitant to create too many.

Good point about notifications, though. I think you'll still have to have a server process that manages that kind of thing (and probably inserts notifications into other users' databases).

I've been using a combo of PouchDB/CouchDB in my app[0] for the past few months, and I find it a hard combo to beat at the moment. I just haven't found anything else that works as seamlessly. While going through the article I found I was able to tick most of the boxes thanks to PouchDB.

[0] https://mochi.cards/

> you OWN YOUR data

Like most people here I'm fairly hard-line when it comes to personal data abuses, but I still struggle with the concept of owning data about yourself. It's a confusion I see amongst less technically literate people when a well-meaning person explains to them the importance of some latest data breach, and they try to understand the concept that they owned this data, it was theirs, but now it has been "stolen" or abused in some way.

I would consider going as far to say that framing the data as owned by you is a bad approach, but maybe I'm just being pedantic about the language. Company A does have data about me, but I don't own it, and they have responsibilities to protect it (or delete it if requested), but I don't see any ownership in the equation, especially when the nature of the data can become quite abstract while still maintaining some reference to you.

Not to take away from the intention or sentiment of framing it that way though, I'm just musing.

I don't entirely agree. People may be confused about data, but they are not confused about ownership. They understand ownership quite well. By sticking with that framing, you only have to get them to understand this ephemeral thing. Once you do, they can immediately apply all their concepts of ownership to it. To replace ownership with something else, well, now you've still got to get them to grasp the data, but also this something else, which might necessarily be somewhat abstract.

I'm not even quite convinced they can't grasp the concept. People think they 'own' their digital copy of The Rescuers Down Under, when technically they've leased it. They generally understand they shouldn't post their credit card numbers online. I think it's more likely that they just don't understand the consequences, and even if they did, generally feel dis-empowered to do anything about it. That's a dynamic that has existed since before the internet.

The problem fundamentally is yes it's your data. It's what makes you, you. The reason this is important is because if you can't control this data it can be used against you. At some point in the near future if not already you'll have job applications rejected based on third party data you didn't sign up for and can't opt out of. It will impact your healthcare, your loans, what prices you see when you shop and what items are shown. In essence what makes you, you is used solely for controlling you and the options available to you.

The issue then isn't the data, but the fact that it could be used in those ways. Because if it's legal for companies to use that data (even if it's owned/controlled by the user), they will incentivize people to share it in order to give them better prices/service, and the end result will be the same.

This comment made me think about how uneven the expectations are about whether data is owned by the person who collected it or the person who it was collected about.

I think most people would agree that if you take a photo of someone who agrees to you taking it, you own the rights to that photo. But if a company collects data about someone voluntarily using that company's product, the person owns that data? I don't understand where the line is drawn.

Exactly my train of thought. Most people don't consider the photograph to be data where we likely would.

I don't see the article's idea of ownership as universal control, but more focused on being able to use whatever instance of the data you have without needing to rely on a third party.

For example, if I own a book then I can read it, doodle on it, or lend it out at any time. I can't prevent other people from reading/defacing/whatever their copies of the book.

You could argue in the past that PII data was just name, birthday, SSN, etc. However, nowadays pretty much anyone can be identified by their metadata so that's where I think this broad sense of ownership comes from.

What's ownership? It's the right or ability to decide how something is used and what happens to it. The term fits here.

Consider the concept that lots of businesses operate perfectly fine whilst temporarily having full physical control over physical objects owned by their customers.

Why not treat data the same way?

Yes, it will be very disruptive to some businesses. I hope.

Ownership is precisely the right word to use.

Companies should possess your data, not own it.

"It should support data access for all time." - This is key for me after I had to convert my notes more than once between formats after the original app(s) went into extinction (beloved Circus Ponies Notebook).

That's why I'm designing any new apps around a file format that can be accessed even without the app.

I have a "local-first" Kanban/Trello-style app, "Boards" (http://kitestack.com/boards/), that uses zipped HTML files (to support rich text with images). No collaboration and cross-device support just yet, but it works without a network and saves everything locally.

Boards seems to be Mac only.

"A new Mac app to boost your productivity in school, at work, and for personal projects.", "For macOS 10.14 Mojave and later"

Yes, currently Mac-only, no cross-device support yet, but an iOS app is in planning.

I just tried Boards out. It's really nice and exactly what I wish Trello could be for a single user. Will follow the progress on this for sure.

Thanks for your feedback! I don't have a newsletter signup just yet, but if you'd like to hear about new versions, send me a quick email through the app (Help → Contact Support) and I'll add you to the list.

>It should be fast. We don’t want to make round-trips to a server to interact with the application.

The cloud apps are not slow only because of moving data; there is also the problem that an average server is fast (16-core CPU + 64GB RAM), but if it's used by, let's say, 100 users, it means one user has only 0.16 cores + 0.64GB of memory. So an average laptop (4 cores/4GB) or phone (4 cores/1GB) is way faster. Basically, people buy billions of transistors to use them only as a terminal to the cloud. Not to mention privacy risks.

A week ago, I did a Show HN for skyalt.com. It's a locally accessible database (+ analytics, which is coming soon). I'm still blown away by how fast it is: you can put tens of millions of rows into a single table with many columns on consumer hardware, and you don't pay for scale or attachments.

> If it's used by, let's say, 100 users, it means one user has only 0.16 cores + 0.64GB of memory. So an average laptop (4 cores/4GB) or phone (4 cores/1GB) is way faster.

This is overly simplistic. You're pretending that cores/memory are "allocated" to users, but really, a user might only make a few tens of requests, and the server only needs to spend a second or two servicing each request. On a server with only 100 users, it could very well be the case that a user has all 16 cores + 64gb available at the time they make a request. Also, as another commenter pointed out, you could use a large chunk of that memory for shared resources, and then each request might only need a few mb of memory to service a request.

That's not a fair comparison, since most of the memory usage is just loading the app into memory, and then everyone is sharing the same app already loaded. Web apps don't have to be nearly as slow as they are. It's just that it's easier to make a slow app than a fast one. Also desktop apps are becoming super slow and bloated now thanks to electron.

> Also desktop apps are becoming super slow and bloated now thanks to electron.

I can't quite get this point. From my perspective software engineers love/adore electron applications.

Look at VScode as the example:

- electron based javascript application

- telemetry included

- proprietary build with "open core"

It is literally the most popular code editor right now (p.s. I don't use it). Why, as a tech-savvy user, would you use something you don't like for 5-10 hours each day to do your work?

The only answer I can see is that Electron is not an issue here.

Obvious answer: it's the only IDE-like thing that's both well supported and caters to the huge population of JavaScript-only developers, who generally don't want to use IntelliJ or similar products because they can't customise it using only web-stack skills.

That is, if JetBrains had made JS plugins first class citizens of their products, possibly VS Code wouldn't be as popular as it is.

I use vscode even though I don't particularly like it. Imo rubymine is a better editor but vscode does 90% of the job while not costing $300/year.

I'm curious where you live. Is $300/year a big expense for you? Computers and developer salaries are usually much higher.

Sublime was the leader before VSCode and it had python plugins. VSCode is popular because it works so well with TypeScript

As a user of VSCode, I use it in spite of it being based on Electron because there are no other good alternatives. It is noticeably very slow on many tasks, and has a large startup time. I recently had it basically become unusable when opening a 2MB yacc file and had to switch to Sublime Text to edit the file. I much prefer Sublime Text in terms of speed, but VSCode simply has more and better extensions for different programming languages making it better for day-to-day coding. If somebody were to make a performant text editor that has similar features to VSCode I would switch in a heartbeat.

I was a strong supporter of VSCode for a long period of time. Now I realise there is always an alternative.

After taking time to learn my tools I can't use vscode anymore because of how inefficient and restrictive it is now for my workflow. I mainly prefer tools that can last a lifetime, and the time you've invested now can yield much better results across several decades of usage.

Here is a short list of tools I can't live without: magit, org-mode, undo-tree (actually it was a feature request, and the vscode team said it's too complicated for a broad audience), and the ability to hack your code editor as you wish. The ability to work all day without touching a mouse once.

VSCode is proof that an Electron app can be performant, but that's not the rule, and most people writing Electron apps aren't as talented as the VSCode team.

Also, there's an expectation that an IDE will be somewhat heavyweight. I don't mind if VSCode or IntelliJ grabs a few gigs of RAM, because I live inside those applications and depend on the features they provide.

What I don't want or need is Yet Another Electron Based Markdown Editor that gobbles up half my laptop's memory so that I can edit a hundred line text document.

I do not love electron applications. They gobble RAM. Allocation is the enemy of performance.

I stopped using VS Code because of the billions of files it ships with (does anyone remember DLLs these days or what?) and its performance and debugging capabilities were very disappointing.

Electron makes it easy to build badly-performing apps. VSCode has put a lot of effort into avoiding this.

Sad about the Electron route. I don't consider it to be "desktop" software. Native toolkits are miles faster than the extremely lazy approach of running an entire new browser instance because someone wanted to write javascript for their "app". It's terrible.

In my opinion it's not all about laziness.

You have to consider the cost of building a native app for each platform vs. a cross-platform framework built with JavaScript. A single JavaScript code base for all platforms is a lot cheaper.

I am not sure on that really. I built a cross-platform control software for high-end audio equipment that ran on macOS and Windows and did all of its own drawing (charts, graphs, ALL controls [knobs, buttons, grids, faders etc etc etc], 2D overview with links of all devices, 3D scene of venue layout with OpenGL, diagrams, alerts and floating popup windows). No bitmaps used anywhere, all generated in code. Another guy handled the network stack that it used which was an entirely-bespoke protocol and he wrote the control firmware on the equipment. This was in C++ with wxWidgets. It redrew every 50ms and logged data received so you could keep historic data of all devices on the network.

I am pretty sure that all of these giant companies releasing garbage Electron apps have a bigger budget than the two of us who wrote all of that in 2.5 years.

When I see how abysmal apps like Skype and Slack are, I despair. Colossal amounts of RAM just to display text and pictures.

It might be more convenient for the developer to write in the first language they learned, but it produces giant, bloated applications: more CPU cycles, more allocations, more power consumed, shorter battery life on mobile devices (laptops), more charging of devices, more fossil fuels burned.

The difference between you and the big corps is, you care, they don't.

You care about delivering a high quality and high performance application, big corps just want to sell a product.

Also, just because they have a bigger budget doesn't mean they are all about spending it, they probably want to squeeze it as much as they can.

> Also desktop apps are becoming super slow and bloated now thanks to electron.

The people who build slow electron apps would have built slow native apps, if they could build one.

Kleppmann co-created and was a major contributor to Apache Kafka, along with Jay Kreps and Neha Narkhede. He also co-founded Rapportive, a YC company acquired by LinkedIn, along with Rahul Vohra, who is presently the CEO of Superhuman.

Nearly all of Apple’s first party software works this way.

Notes, Reminders, Pages, Numbers, Keynote, Voice memos, etc

All using iCloud APIs to synchronize what is essentially local-first software.

You could even count Mail, Contacts, and Calendar, although they rely on more established protocols to sync.

Today's SaaS world is largely economically opposed to the idea of data ownership. It's a lot easier to make money by renting people access to their data.

The problem is not inherently technical. The solution must address the fact that software businesses favor cloud solutions and other systems that make it difficult to stop spending money.

Yes. I always think data freedom is more important than software freedom. For example it matters less that MS Word is not free as in freedom when you can open the file in something that is.

And cloud SaaS are enemies of both! After all, in order to have data freedom, you need to have something to open in that other program, and cloud solutions do their best not to give you a proper open/save of files (and even if there's an export function, and even if everything is actually included in the export, it often isn't matched by an import function that can read that export).

For this reason, I avoid using cloud SaaS wherever I can.

Absolutely. There are some exceptions though. Github is the obvious one. Dropbox by nature has your data constantly exported. Google for its sins has Google takeout.

Yes. And the first two I use, treating them mostly as dumb pieces of infrastructure. Arguably, the functionality they provide has a crucial ops component that I'm happy to pay someone else to handle for me. But neither GitHub nor Dropbox locks me into anything.

Google - yes, web e-mail obviously is similar to the above; as for their office suite, I recently found a good excuse to justify shelling out for a proper Microsoft Office subscription (though I don't like that it's a subscription), and I stick to using the faster, locally-available, file-using, much more powerful (if still proprietary) software.

Separating the two so cleanly is inappropriate, IMHO. Software is the access layer around data, hence:

  "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

I would even say that software freedom implies or even requires data freedom, while the reverse does not hold. As an example I don't think that Facebook or Google's data export tools make them exemplars of user choice or control.

Yes, this is it.

We went from local, to local infrastructure/IT, to cloud services in which data is mixed with other data and ownership is nebulous.

The business, as you say, favours this, because the 'downsides' of the model are mostly risks (your data leaking, losing rights to your data, your data being 'sold', etc.), which we don't like to pay for until it's too late.

Unless there are 'big scares' or regulatory requirements etc. I don't see that much changing.

But a single major leak from Salesforce or say Google Docs could see a massive shift in how we think about such things.

If Equifax or even Ashley Madison leaks didn't cause any shift in the software culture, I'm having doubts a breach of Salesforce or G Suite would.

I've been working for a few months on a database called Redwood that's intended to make it easier to build this kind of software. Having spent much of the past couple of years working with libp2p, IPFS, Dat, and similar technologies, I was curious to see what would result if I started from the ground up.

https://github.com/brynbellomy/redwood

So far, the model seems promising. It's fully peer-to-peer, and supports decentralized identity, configurable conflict resolution, read/write access, asset storage, and currently is running across 3 different transports:

- HTTP (augmented with extensions proposed by the Braid project [1][2])

- Libp2p

- WebRTC

I've included two simple demos, a collaborative document editor (well, it's just a textarea at the moment), and a chat app. Would appreciate any feedback or participation that folks are willing to give.

[1] https://github.com/braid-work/braid-spec

[2] https://groups.google.com/forum/#!forum/braid-http

Hi everyone. I've been working on this subject for a few months already.

Thank you OP, your work is wonderful to read, and even though I've spent a few months on the idea already, I hadn't thought of reusing Dropbox or similar. I think exciting things are about to come :)

I'd like to submit Working Group proposal to the IETF.

Why would we need an RPC for Independent Apps?

Independent Apps are surfacing as a solution to the lack of control over our own data. The OAuth framework has allowed for a more secure web, but even though it distinguishes between an identity provider and a resource host, it conflates the resource and service hosts.

Independent Apps should NOT be claimed by a lone company, let's make it something that the web owns.

How would it be structured?

I personally believe there should be multiple subjects treated by the IWA Framework: one being the qualities of independent apps, and the second being how data is accessed. Both of these are currently Topics of Interest for the IETF: https://ietf.org/topics/ - however, the way this Working Group would proceed should be discussed and decided by its members.

Why not submit a single person draft?

I could propose a draft, but it wouldn't carry the same weight as one drafted by a Working Group. As individuals, we are motivated by our own agendas, and the quality of such a draft wouldn't be the same. I'm volunteering, but I'd like to allow other people to join in as well.

You can submit your email here: https://forms.gle/igNdd6rH4MnPK8rb8 . On December 6 I will send the Working Group proposal to the IETF with the people gathered; if accepted, I believe it should remain open for anybody to join.

Isn’t Office 365 the platonic ideal of a local first software (suite) by this definition?

High quality desktop apps, data saved in discrete documented file formats, optional ability to save in the cloud, the presence of collaborative editing, privacy is protected if you’re using it locally only, etc.

any marginally successful "local-first" app is going to go and raise $10m in vc, switch to software as a service, and add an enterprise mode that requires user permissions and data access to be managed on the server

I don't see what's wrong with that. Local First really just means distributed, fault tolerant, and eventually consistent but designed for user devices instead of a cloud "scale" service.

Why couldn't an enterprise run a "device" (a server) which others can easily sync to ("sync.enterprise.com") and which only allows authorized users to access the data they're permitted to see? Maybe using Macaroons or something (a sketch follows below), and devices could still sync locally via Bluetooth, Wi-Fi, or whatever.

Now you have a full backup of everything on that server, which IT could more easily ensure is backed up, secured, etc.

Not to mention the same idea could be used by a normal person just running a NAS at home or a server in DO/AWS/GCS/etc.
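
To make the Macaroon idea above concrete, here's a minimal sketch assuming the pymacaroons library; the hostname, identifier, and caveats are hypothetical, not part of any existing product:

  # Sketch: macaroon-scoped access to a hypothetical enterprise sync server.
  # Assumes the pymacaroons package; all names and caveats are made up.
  from pymacaroons import Macaroon, Verifier

  SECRET_KEY = "known-only-to-sync.enterprise.com"

  # IT mints a token that only allows read access to the "sales" dataset.
  m = Macaroon(location="sync.enterprise.com",
               identifier="employee-42",
               key=SECRET_KEY)
  m.add_first_party_caveat("dataset = sales")
  m.add_first_party_caveat("access = read")
  token = m.serialize()  # handed to the employee's devices

  # The sync server checks the token before serving or accepting data.
  v = Verifier()
  v.satisfy_exact("dataset = sales")
  v.satisfy_exact("access = read")
  print("authorized:", v.verify(Macaroon.deserialize(token), SECRET_KEY))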

Sure, any one company probably will - but there’s a whole market.

As soon as that one company abandons the local-first model, a gap opens, which will (usually, eventually) be filled by a new company offering local-first until that new company does the same.

As long as the companies don’t band together and agree to end it, there should be a company offering that model somewhere somehow.

Even in the situation I described, the original company would leave the free / client side version on the site as free marketing. This is the standard today for enterprise-monetised open source software.

Solving this problem isn't about being local-first. It's about being local-last. You have to be able to make more money by selling a software license than you make by selling equity and chasing user acquisition and retention.

Then we'll see people waking up to the fact that all this proprietary data is a liability and that subscriptions are golden handcuffs, and people will finally get back to making real software again.

Because local-first is not a viable business model compared to the cloud. Software goes where the money is.

You don't have to invent entire new paradigms such as CRDTs for this. Unix is all about site autonomy, no-BS tooling, simplicity, and portability. So for your next project, consider Unix/Linux as the deployment target during development, and only then deploy it to a cloud-hosted Unix cluster, with a local-first but cloud-hosted DB such as PostgreSQL and standardized middleware such as AMQP/RabbitMQ/qpid rather than provider-specific solutions, or at least use de-facto standard protocols such as S3 and MongoDB (if needed) that are supported by multiple clouds. Many people are prematurely committing to k8s and "microservices", but in my experience, even though k8s as such isn't intended as a lock-in strategy, it has the effect of absorbing so much energy in projects (with devs more than happy to spend their time setting up auth, load balancing, and automation rather than business functionality), and then still ending up with a non-portable, incomprehensible mess of configs and deploy scripts, that it just isn't worth it.
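
As a small illustration of "standard middleware over provider-specific services", here's a hedged sketch using PostgreSQL via psycopg2 and an AMQP broker via pika; the hosts, credentials, and table/queue names are placeholders:

  # Portable plumbing: plain PostgreSQL + AMQP instead of provider-specific services.
  # Assumes psycopg2 and pika are installed; connection details are placeholders.
  import json
  import psycopg2
  import pika

  # Any Postgres works: a local dev box, RDS, Cloud SQL, a self-hosted cluster...
  pg = psycopg2.connect("dbname=app user=app password=secret host=localhost")
  with pg, pg.cursor() as cur:
      cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb)")
      cur.execute("INSERT INTO events (payload) VALUES (%s)", [json.dumps({"kind": "signup"})])

  # Any AMQP broker works: RabbitMQ, qpid, or a managed offering.
  mq = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
  channel = mq.channel()
  channel.queue_declare(queue="jobs", durable=True)
  channel.basic_publish(exchange="", routing_key="jobs",
                        body=json.dumps({"task": "send_welcome_email"}))
  mq.close()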

My view on this is a bit different. I see Kubernetes as the abstraction layer on top of the cloud providers. In the last few years I have set up multiple k8s clusters for clients who specifically do not want to be locked in to a certain cloud provider. Once the software is running on top of k8s it is easy to switch cloud providers without changing the software.

Switching to another cloud provider this way is trivial and usually only involves changing the Terraform configuration to set up a k8s cluster on another cloud. All k8s-specific config/deploy files can be reused on the new cluster.

This of course only works if (as you suggest) you stay away from cloud-specific services (SQS, Aurora, ECS, S3) and run everything in-cluster, or use managed services that are available on multiple providers (PostgreSQL via RDS, DigitalOcean managed Postgres, or Cloud SQL on GCP).
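
To illustrate the "same cluster API, different provider" point, here's a hedged sketch using the official kubernetes Python client; the context names are hypothetical and would correspond to whatever your Terraform created:

  # Sketch: the same Kubernetes API calls work no matter which cloud hosts the cluster.
  # Assumes the `kubernetes` client package and a kubeconfig containing both contexts;
  # the context names are made up.
  from kubernetes import client, config

  for ctx in ["aws-cluster", "digitalocean-cluster"]:
      config.load_kube_config(context=ctx)  # only the context changes per provider
      apps = client.AppsV1Api()
      deployments = apps.list_namespaced_deployment(namespace="default")
      print(ctx, [d.metadata.name for d in deployments.items])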

> Switching to another cloud provider this way is trivial

Based on my limited experience, I highly doubt this. Have you actually deployed cross-cloud k8s setups, or is this merely a theoretical statement on your part? Deploying to another cloud provider brings a whole new universe of failure modes and auth quirks, let alone migration and switch-over woes.

I’ve done a couple of cluster switches from k8s on AWS to other providers like DigitalOcean and GCP. As far as I recall we had no issues, and one of those was done in about an hour, where most of the time was spent waiting for pg_dump/restore.

Note that most of these were not production clusters so switch-over was just data restore and DNS changes.

I build clusters from the start to not use cloud-specifics where possible and all cloud-specific configuration is on the cluster edges in terraform which you have to rewrite anyway when switching clouds.

Auth things like IAM permissions are not an issue if everything is “in cluster” and auth/permissions are checked there.

Most of these deployments consist of several application servers, PG databases, redis, rabbitmq etc

Just wanted to point out that iTunes has had a local-focused setup since inception, using an XML format for the library's database.

That seems to still exist with the introduction of Apple Music. So all library data (play counts, skips, file locations etc) are stored locally, but streaming files are hosted remotely.

Although whether this was by accident or design I have no idea.

My side project is a local first (local storage on web) JAMStack. For extra goodness it’s mobile first too.

I really love making apps this way for some reason. I think it’s the focus on just the UI and not worrying about the back end until later.

For this particular app I’d consider “smartwatch first” to have been better, as it’s for fitness!

Data ownership is very important in the business world. The reason we built our IoT products ( www.bevywise.com ) as installable versions is data privacy and ownership.

We see this especially in the manufacturing industry. If not local, we should at least provide private servers and data security.

This. I want this. It happens to me quite often that I feel demotivated or uneasy about using some software because I know I'm producing valuable data specific to me that I could use in the future, but the product stipulates that I don't get to keep the data. iOS apps are often the worst at this. There's no filesystem, so all data has to be kept with an app. I loaded all of the books I wanted to read into iBooks, and then it turned out that the iOS backup skips books that you didn't buy from Apple. Bye bye book collection T_T.

Some of this could also be alleviated by Tim Berners-Lee's pod idea: https://solid.inrupt.com/. But local-first is better. I just want files on my machine.

I've been following a local-first methodology without realising it for an app that I've been developing. It's a workout-tracking app called harder better faster fitter. It's designed for mobile use in the gym.

https://harderbetterfasterfitter.com/

At the moment the app is a local only service and there aren't any backups. Next year I plan to add a backend. I'll be keeping some of the ideas in this article in mind. Currently I'm using the browser's local storage api to store data locally. It mostly works, but will be bolstered significantly with a cloud backup.

I learned of Plan-Systems.org; they’re working towards something like this. Their company is a non-profit, their collaboration tools and protocol are open source, and the service is built on Unity and Unreal, which makes it cross-platform.

I've been working on SPAs that launch from keybase.io public folders and can talk to my local KBFS storage, which is encrypted locally and then distributed to the cloud. This way I can access my own data anywhere I have Keybase installed, using apps I don't need a server to host. It's still all just prototype work for myself, but I'm excited about owning my own data while still having the safety of cloud distribution combined with the security of local encryption.

I've been building apps that meet this criteria for several years now. It's nice to see the concept getting some attention here.

The only thing I'll point out is that CouchDB getting rated "partially meets the idea" seems pretty weak to me. They reference v2.1, but the latest version is 2.3.1, and here's a link to the docs on how conflict resolution is dealt with:

https://docs.couchdb.org/en/latest/replication/conflicts.htm...

If finer grained control is needed it would be up to the developer to implement it, and it really shouldn't be difficult to do that.

In my case, I use PouchDB to perform a "live sync" with all connected users so they all get the latest updates to a document. If a conflict arises it's easy for any one of the users to fix and push it to everyone connected.
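
For anyone curious what resolving a CouchDB conflict actually involves, here's a minimal sketch against the plain HTTP API using requests; the database, document, and field names are made up:

  # Sketch of CouchDB conflict resolution over its HTTP API.
  # Assumes the requests package; database/document/field names are made up.
  import requests

  BASE = "http://admin:password@localhost:5984/mydb"
  doc_id = "shopping-list"

  # 1. Fetch the document along with any conflicting revisions.
  doc = requests.get(f"{BASE}/{doc_id}", params={"conflicts": "true"}).json()
  losing_revs = doc.pop("_conflicts", [])

  # 2. Delete the losing revisions so all replicas converge on one winner.
  for rev in losing_revs:
      requests.delete(f"{BASE}/{doc_id}", params={"rev": rev})

  # 3. Optionally write a merged body as a new revision of the winning document.
  doc["items"] = sorted(set(doc.get("items", [])))  # trivial "merge" for illustration
  requests.put(f"{BASE}/{doc_id}", json=doc)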

I love this idea. Especially the end-to-end encryption for data that passes through a server to enable the ease of cloud computing without relinquishing data ownership.

It also depends on _who_ owns the data. In an enterprise environment the company usually has a vital interest in the data and on-premise deployments are a good way of retaining cloud computing without giving up data ownership. I'm surprised that more SAAS products don't offer on-premise given the privacy and ownership benefits. The tricky part there is making software that is easy to deploy and maintain, which might be the reason that it isn't done more often.

A product like Grammarly that allowed on-premise deployment would side-step a lot of the issues with sending all that data to a third party. I can't imagine a law firm ever being able to (legally) sign up for that.

Maybe on-premise installations happen but are not advertised? At a previous company I experienced one case of a deal large enough to justify one, for a client with sensitive data. Needless to say, operations were not happy, but I don’t have enough information to judge whether it was a good deal for the company in the end.

The Holo / Holochain project was founded with this principle as a primary goal:

https://holo.host

We’ll be implementing CRDTs soon, but the concept of local control of all data, authenticated and encrypted communications, etc. is implemented.

One fundamental difference between apps that support this and those that cannot: agent-centric vs. data-centric design.

Strangely, many “distributed” applications (eg. Bitcoin) didn’t make the “leap” to agent-centricity, and thus missed out on some key enabling optimizations.

As a result — they are forced to implement “global consensus” (expensively), when they didn’t need to, to achieve their goals: a valid global ledger, in Bitcoin’s case.

It turns out that, to implement things like cryptocurrencies, you don’t need everyone, everywhere to agree on a “total order” for every transaction in the entire network!

Agent-centricity, FTW!

So I got fed up with current image hosting solutions the other day, because I realized free image hosting is unsustainable and Imgur has turned into a social network, which is the opposite of what I want.

So, I figured I'd create my own paid one, and am working on https://imgz.org. However, I want to add a free tier for people who are willing to host their own images, and was thinking of writing a daemon that would run on the user's computer and store all their images on a directory there. It would have to be mostly-on, but not always-on, since I'm going to be using a caching CDN.

Is this a good idea? I don't know how many people would know how/want to run this, but it feels empowering from a data ownership perspective. What does everyone here think?
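
For what it's worth, a user-run daemon like that can be sketched with just the Python standard library; the directory, port, and cache lifetime below are assumptions, not a spec for imgz.org:

  # Sketch of a tiny self-hosted image daemon sitting behind a caching CDN.
  # Standard library only; directory, port, and cache lifetime are illustrative.
  import functools
  from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

  IMAGE_DIR = "/home/user/images"  # where the user's originals live

  class ImageHandler(SimpleHTTPRequestHandler):
      def end_headers(self):
          # Let the CDN cache aggressively, so the daemon only has to be "mostly on".
          self.send_header("Cache-Control", "public, max-age=86400")
          super().end_headers()

  handler = functools.partial(ImageHandler, directory=IMAGE_DIR)
  ThreadingHTTPServer(("0.0.0.0", 8080), handler).serve_forever()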

I've been working on PhotoStructure, not so much to be an imgur replacement, but as a way to automatically organize and share my large (many TB) and disorganized (due to failed photo apps and companies) pile of photos and videos. I'm releasing to a new wave of beta users soon if you want to sign up and try it out. It's a self-hosted solution with a web UI. https://blog.photostructure.com/introducing-photostructure/

As one entrepreneur/engineer to another: don't underestimate the legal and logistical effort you'll incur from a caching CDN. People post pirated, abusive, and generally bad things, and if it's on your server, it's (increasingly) your responsibility. DMCA and takedowns will consume non-trivial time, and make simple corporate insurance decidedly not simple (or cheap). It's typical for media hosting companies to hire teams to handle these issues. I was shocked while working at CNET (way back when) when I found out most of a floor (in a large building) was for Webshots' trust and safety team.

> It's a self-hosted solution with a web UI

That sounds great, thanks! Is there an easy way for me to distinguish photographs from my photography work from snapshots I took with my phone?

By the way, your "Get early access" button does nothing on Firefox beta with uBlock/Privacy Badger.

> People post pirated, abusive, and generally bad things

Oh ugh :( I was hoping this would be curtailed by the fact that this service is paid-only, although I now realize I might have to rethink my "accepting cryptocurrency" idea. Thanks for the heads up!

> Is there an easy way for me to distinguish photographs from my photography work from snapshots I took with my phone?

Yeah! You can browse by camera (and by lens).

Thanks for the heads-up on the get early access button issue! The link just scrolls you down to the bottom of the page where the login form is. I use FF with privacy badger and ublock (and a pihole) on linux and android, and both of those work. What OS are you using?

> Yeah! You can browse by camera (and by lens).

That's not entirely helpful because I have multiple cameras... Is there something like a smart category where I can specify multiple cameras, or directories, or something like that?

I'm running 71.0b5 (64-bit) Ubuntu, by the way.

IPFS works fine. I'm on the IPFS team and I use it every day.

Depending on the app, it may or may not be a good fit. The performance you get out of it depends a lot on what features you are using.

There's a whole ecosystem of projects (eg. IPFS Companion, IPFS Desktop, IPFS Cluster) and 3rd-party services which are important to consider when deploying a production-ready app.

There's a lot of work ongoing to solve some of the biggest pain points (eg. content discovery across NATs), so expect the performance profile to improve dramatically in the short term and for it to become an option for many more apps.

> IPFS works fine. I'm on the IPFS team and I use it every day.

I run Eternum.io and IPFS has been such a pain that I am considering just shutting the service down. The node has been consuming so much RAM and CPU (even though it's behind a caching proxy and the gateway should get minimal traffic) that it was disrupting everything else on the server. The memory leaks have been off the charts for ages, so now I just restart the node every day.

I set up IPFS Cluster on that and another machine, with the intention of moving the node to the second machine. I waited for weeks for files to be pinned between the two nodes, but the queued count kept going up and down.

In the end I set the pinning timeout to ten seconds and it finished faster, and still a bunch of files didn't manage to pin, even though the two nodes were directly connected (`ipfs swarm connect`). I shut the first node down anyway because I couldn't deal with it any more. At least now the rest of the stuff on the server isn't flapping every day.

And this is on top of the atrocious pin handling that requires you to keep a connection open to the IPFS daemon for the entire duration of the pin when you want to pin a CID. I opened a ticket years ago to get a sort of download manager in the node so pins could happen asynchronously, but there has been no movement on that at all.

I'm glad it works well for you, but it has been nothing but pain for me. Hell, most of the time the gateway doesn't manage to discover files I have pinned on my local computer.

I really want something like IPFS to succeed, because it has immense potential to literally change the world, but I can't even recommend that people run a node locally because I know it's going to eat their battery and slow their computer down. I don't know why these problems haven't gone away after years of work and millions in funding.

Sorry to hear about your bad experiences.

I think I've experienced some of the same frustrations at times, and I think most of the bugs/problems are well known to the development team and are being actively worked on. It's a very open process. I know there's been a lot of improvement in the year that I've been working on it. The release train moves along slowly at times, especially as we are actively working on improving our testing processes.

Thanks, I'm sure you're working on it, and I hope everything gets solved and IPFS becomes awesome enough to use for archival, distribution, security, censorship resistance, and the whole bunch of other important things it will solve, which made me create Eternum in the first place.

I'm just frustrated with the glacial progress I'm seeing as an end user...

It’s strange that Evernote is omitted from the list - it is a great example of a local-first app.

Their recent-ish history, when their free tier became limited to syncing only a few devices, illustrates that even if software is fully local and supports open formats, having a functional cloud matters, a lot.

I love Evernote and I use it daily, but Evernote lacks end-to-end encryption, which is a pity. I store lots of information in that app and I would be more reassured if I knew that it is only me who can read the data. I would even pay more in order to have that kind of encryption. I think that no current feature of Evernote would be affected by encryption as text recognition in images can be done at the client level.

Evernote is included in one of their “what not to do” images, showing that they can’t handle conflict resolution at all.

But yeah otherwise Evernote is pretty good for a single user.

I wrote documentation for configuration files in xsd with the idea that I could use xslt to display them in a browser on any system and use the same files for validation. This worked a year ago.

Now browsers consider local files that access other local files suspect and will refuse to load anything unless beaten into it. So I now use a Python script to run a simple local HTTP server to view my local files from a single "origin". However, HTTP itself is already considered suspect, and many claim it should be deprecated in favour of HTTPS.

In the future I will have to provide a Let's Encrypt-signed HTTPS server with a valid domain so anyone can view those files in their browser without having to mess with about:config settings or their own certificate cache. The cloud is the future; do not dare to build something that runs locally.
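
For reference, the "python script to run a simple local http server" can be as small as this sketch; the directory, port, and file name are placeholders:

  # Sketch: serve local XML/XSD/XSLT files from one origin so the browser will load them.
  # Standard library only; directory, port, and file name are placeholders.
  import functools
  import webbrowser
  from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

  handler = functools.partial(SimpleHTTPRequestHandler, directory="docs")
  server = ThreadingHTTPServer(("127.0.0.1", 8000), handler)
  webbrowser.open("http://127.0.0.1:8000/config.xml")
  server.serve_forever()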

Is there a documented guarantee of that somewhere? Because I am half tempted to make a browser based UI for a few tools, but would rather use Java if there is even a tiny risk that I get the rug pulled from under my feet again.

I'm surprised nobody has mentioned using git-annex (and Git of course) to manage data (full disclosure: I develop an archival management app which keeps its data in git/git-annex repositories). Of the seven key properties, git-annex gives you 1,2,3,4 and supports 5 and 7; 6 depends on how you store things. Git-annex supports identity in the sense that each clone of the repository has a UUID. You can choose to have a central hub if you like but you don't have to (surprise! you don't really need GitHub!). It comes with caveats of course: Binaries can be synced but only textual data is actually versioned, and once you put something in it will always be there unless you use something like git-filter-branch.
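
To make that workflow concrete, here's a hedged sketch driving git-annex from Python's subprocess module; it assumes git and git-annex are installed, and the repository name, description, and file path are placeholders:

  # Sketch of a git-annex-backed data store (git and git-annex must be installed;
  # repository name, description, and file paths are placeholders).
  import subprocess

  def run(*cmd, cwd="my-archive"):
      return subprocess.run(cmd, cwd=cwd, check=True,
                            capture_output=True, text=True).stdout.strip()

  subprocess.run(["git", "init", "my-archive"], check=True)
  run("git", "annex", "init", "laptop")           # this clone gets its own UUID
  run("git", "annex", "add", "scans/box01.tiff")  # large binary: tracked, not versioned
  run("git", "commit", "-m", "add box 1 scans")
  print("repository UUID:", run("git", "config", "annex.uuid"))

  # With another clone configured as a git remote, content flows both ways:
  # run("git", "annex", "sync", "--content")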

Awesome write-up! This brings me back to the early 2000s, when we typically owned most of our graphic design software. Yes, it was expensive, but there were barely any "cloud" features, and if you wanted a new version you simply downloaded the update or kept the software you were currently using.

My team and I have taken the initiative to offer an email design tool that is first-class software on the OS (https://bairmail.com). The last thing I would say is that developing desktop apps vs. a web app is considerably harder; thus most companies are aware they are saving by controlling software updates, versioning, etc.

Our software Construct 3 (https://editor.construct.net/) meets most of these points, I think. It runs offline, and we never have access to the users' project files. You can save/load locally, and it runs in the browser. Game project files are zips with JSON + raw asset files. No syncing with the server is needed, so it is fast; requiring it is a design mistake that is severely hampering some of our competitors!

I'm not entirely sure how supporting collaboration in real time belongs on this list. Seems like a nice to have that isn't really related to the rest of the list.

Is this something like Blockstack could help with?

https://blockstack.org/

I'm having a hard time understanding the differences between "Local-first technology" and something like Blockstack. I'm not saying BS completely solves the issues pointed out in the blog post, but it seems to me its pretty close.

What do you think?

here is a list of the current apps available: https://app.co/blockstack

The paper says Pinterest meets the "collaboration" ideal but GitHub doesn't. I'm sympathetic to the idea that nothing meets the ideal, but c'mon.

I'm using NextCloud for that. Nextcloud can be basically used in local-first mode (mimicking dropbox).

My phone automatically uploads all pictures to the NextCloud. Then there are apps. For instance I use Nextcloud with the Music app to stream my own mp3s from my Nextcloud to my phone running Ampache.

There are also collaborative editing tools, and various options to edit all sort of documents in a web UI, and always the local editing fallback (or the opposite way, as you see fit).

A more recent issue: YouTube banning Google accounts for posting too many emojis during a live stream. What if all of your business data was on your Google account? You would be done unless you could get unbanned. There is something to be said for using local apps. The engineers I work with are often out of range of any network, so having apps that run on their laptops is crucial.

Very cool. Let me add to all of these great examples in the comments: I published an app focusing on exactly these points just two days ago.

https://stockevents.app/

All tracked stocks stay within the app. You only pull information from the servers and store that information locally for offline use.

I'm the creator of BLOON (https://www.bloon.io). If BLOON had been included in Table 1 in the paper, the values would be:

O O - - O - O

I think BLOON is closer than Git+GitHub.

And great article & paper! They give us a lot of inspiration for improving BLOON. Thank you!

Resilio Sync is a perfect example of such an app.

It's basically a P2P based Dropbox with no accounts, full end-to-end encryption and no folder size limits.

It's not open source, but it can work without a central server if you need it to. It's also amazingly simple to set up, much simpler than Syncthing.

I used Resilio for years and thought the same about Syncthing, until I switched last year; now I actually prefer how Syncthing is set up.

If you pay for Resilio there is an option to add all your folders in one go, but on some computers I don't want to add all of them anyway, so that's not much use to me. With the free version you have to manually add folders one by one, and to do that you need the key, which means you need to copy the keys to a text file and add them on another computer.

With Syncthing, it will detect other Syncthing devices on your home network, so you just add the ones you want and then accept the request from the other device.

Once that is done, you select which folders the device has access to, and then a notification will show up on said device asking you to connect. So basically no fiddling with keys or having to store them somewhere secure.

(This is all presuming I was using Resilio correctly; maybe there was an easier way I was not aware of.)

No, most end users did not understand (or care about) the data ownership problem back then. Times have changed.

Things are happening in that space already! I work at Actyx and we have a production-ready stack for local-first, truly serverless (peer-to-peer) applications. Please take a look at https://actyx.com/ .

If you want an easy way to make a file (or .zip of files) more private before storing it in the cloud for backup/availability purposes, please check out Cloaker: github.com/spieglt/cloaker. It's simple, password-based, drag-and-drop file encryption.
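
As a rough illustration of the general technique (password-based file encryption, not Cloaker's actual implementation), here's a sketch assuming the cryptography package; the file names and password are placeholders:

  # Generic password-based file encryption sketch (NOT how Cloaker itself is built).
  # Assumes the `cryptography` package; file names and password are placeholders.
  import base64, os
  from cryptography.fernet import Fernet
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

  def key_from_password(password: bytes, salt: bytes) -> bytes:
      kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                       salt=salt, iterations=480_000)
      return base64.urlsafe_b64encode(kdf.derive(password))

  salt = os.urandom(16)
  plaintext = open("backup.zip", "rb").read()
  token = Fernet(key_from_password(b"correct horse battery staple", salt)).encrypt(plaintext)

  # Store salt + ciphertext in the cloud; only someone with the password can decrypt.
  open("backup.zip.enc", "wb").write(salt + token)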

Article mentions the following tweet:

What was the last app you used on your laptop that wasn't:

Terminal

VS Code

Web Browser

OS settings

That guy must be living in a very limited / imaginary world. I use a boatload of local software.

I do not have favorites. I use various tools for various activities: mechanical design, electronics, software development, video, photo, data analysis, etc. Way too many to list. Some apps are used very rarely, but I still need them when the time comes.

IPv6 will make it easier to access your data anywhere. I am putting this here because so far nobody has mentioned it.

Why would the section marked “Git+GitHub” have “collaboration” marked as ‘no’? That is what it actually does...

Not related to the actual content, but may I please request you to increase the contrast on your website. Grey on grey is very hard to read.

P2P has miles to go before challenging the reliability, convenience and performance of the cloud.

That said, one undervalued area is partially homomorphic cryptosystems [1], where the cloud never gets to see unencrypted user data.

I hope the future is fast local compute on cached data, with the cloud holding a much larger, encrypted but permissioned data store, offering utility functions like search over encrypted data.

[1]: https://en.m.wikipedia.org/wiki/Homomorphic_encryption
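
As a tiny taste of "compute on data the server can't read", here's a sketch of an additively homomorphic (Paillier) scheme, assuming the python-paillier (phe) package; the numbers are arbitrary:

  # Sketch of a partially (additively) homomorphic cryptosystem via python-paillier.
  from phe import paillier

  public_key, private_key = paillier.generate_paillier_keypair()

  # The client encrypts its numbers and ships only ciphertexts to the cloud.
  a = public_key.encrypt(17)
  b = public_key.encrypt(25)

  # The cloud can add ciphertexts (and multiply by plain constants)
  # without ever seeing 17 or 25.
  total = a + b
  scaled = a * 3

  # Only the key holder can read the results.
  print(private_key.decrypt(total))   # 42
  print(private_key.decrypt(scaled))  # 51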

P2P & CRDTs are definitely production ready already.

HackerNoon & Internet Archive are using (mine) https://github.com/amark/gun already.

Local-first is very much the mantra of the whole dWeb community. I'm liking this naming "local-first" as an evolution to "offline-first".

Ink & Switch had a good article on this:

https://www.inkandswitch.com/local-first.html

Also, for doing end-to-end encryption, we've built some really good tooling around this as well: https://gun.eco/docs/SEA . It wraps WebCrypto and works across browsers, NodeJS, and React Native, so you can build some really cool cross-environment/platform apps now.

You might think so from only reading the title, but what the article/paper actually proposes is using CRDTs to create a synthesis of the advantages (for the user) of both cloud applications and traditional local applications.
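
For readers who haven't met CRDTs, here's a minimal sketch of one of the simplest kinds, a grow-only counter, showing how concurrent offline edits on different devices merge deterministically without a central server (the device names are made up):

  # Minimal G-Counter CRDT: each device increments its own slot, and merging
  # takes the per-device maximum, so replicas converge regardless of merge order.
  class GCounter:
      def __init__(self, device_id: str):
          self.device_id = device_id
          self.counts: dict[str, int] = {}

      def increment(self, n: int = 1) -> None:
          self.counts[self.device_id] = self.counts.get(self.device_id, 0) + n

      def value(self) -> int:
          return sum(self.counts.values())

      def merge(self, other: "GCounter") -> None:
          for dev, n in other.counts.items():
              self.counts[dev] = max(self.counts.get(dev, 0), n)

  laptop, phone = GCounter("laptop"), GCounter("phone")
  laptop.increment(3)  # edits made offline on the laptop
  phone.increment(2)   # edits made offline on the phone
  laptop.merge(phone)
  phone.merge(laptop)
  assert laptop.value() == phone.value() == 5  # both replicas agree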