Federated social identity backup and restore


Version: 0.1.0 (released 7th Feb 2016)

This version:

Latest published version:

Editors:

Contributors:

Repository:

Status of this document

This document is published as a working draft after preliminary discussion in the diaspora* user migration issue and the relevant Loomio discussion. The original spec draft was published in the diaspora* wiki.

Working on the specification

Work on this specification should be done via issues and pull requests in the git repository, which is the recommended way to participate. Comments can also be given through other channels, for example The Federation mailing list or the earlier discussions linked above.

Versioning

This specification document should follow Semantic Versioning, with 1.0 released once the first version has been accepted by both the editor and a possible implementing platform.

Overview

Specification to deal with two common problems with decentralized social sites:

These two problems create a lack of identity security and a lack of continuity for users of these social networks.

The purpose of this specification is to provide a means to protect the identity of users, not their actual content. As such, content such as posts, comments, likes, photos, or any other content-type objects is out of scope for this specification.

NOTE! This specification assumes that servers use public and private keys to verify authorship of content. A platform that does not do so should adopt such keys for the purposes of this specification.

High level concept

Terms and concepts

| Term | Explanation |
| --- | --- |
| User/identity | An object that is backed up or restored. |
| Server | A server that is home to the user/identity. Called ‘pods’ for example in diaspora*. |
| Handle | A network-wide unique identifier for the identity, for example user@domain.tld or https://domain.tld/user. This could also be a GUID, but a human-friendly identifier should be preferred to allow a user-friendly restore process. |

Use cases and flow of actions

The following use cases are related to and describe the flow of actions in this specification.

User creation

Initializing a backup

Manually choosing a backup pod

Manually downloading a backup archive

User requests restore (automatically backed up archive)

User requests restore (manually downloaded backup archive)

User verifies data restore via email

Sending out a moved message

Receiving a moved message (not the user's old home server)

Receiving a moved message (the user's old home server)

Backup server receiving a backup

Becoming a backup server

Sending a backup to a backup server

Backup server discovery

Servers which support this specification should publish a JSON endpoint at .well-known/x-acc-backup-restore. The response from this endpoint should conform to the following schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://the-federation.info/specs/backup-restore#backup_server_discovery",
  "type": "object",
  "properties": {
    "allow_backups": {
      "type": "boolean"
    },
    "allow_new_backups": {
      "type": "boolean"
    }
  },
  "required": [
    "allow_backups"
  ]
}

Servers should query other known servers regularly (at least once per week) to refresh this information.
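As an illustration of the discovery step, the following Python sketch checks a discovery document against the schema above and decides when a known server should be queried again. The function names and the way the last check time is tracked are assumptions of this sketch, not part of the specification.

```python
from datetime import datetime, timedelta, timezone

# "At least once per week" from the text above.
REFRESH_INTERVAL = timedelta(days=7)


def parse_discovery(doc):
    """Validate a discovery document against the schema above.

    Returns (allow_backups, allow_new_backups); the second value is None
    when the optional key is absent.
    """
    if not isinstance(doc, dict):
        raise ValueError("discovery document must be a JSON object")
    if not isinstance(doc.get("allow_backups"), bool):
        raise ValueError("allow_backups is required and must be a boolean")
    if "allow_new_backups" in doc and not isinstance(doc["allow_new_backups"], bool):
        raise ValueError("allow_new_backups must be a boolean when present")
    return doc["allow_backups"], doc.get("allow_new_backups")


def refresh_due(last_checked, now):
    """True when a server was never queried or the last query is a week old."""
    return last_checked is None or now - last_checked >= REFRESH_INTERVAL
```

A server could run `refresh_due` over its list of known servers in a periodic job and call `parse_discovery` on each fetched response.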

TODO: To avoid an extra endpoint, should we use a version of NodeInfo instead?

Data specifications

Backup archive

The archive should, depending on the features offered by the server, contain the following data:

Archive format

The archive should be in JSON format.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://the-federation.info/specs/backup-restore#archive_format",
  "type": "object",
  "properties": {
    "email": {
      "type": "string"
    },
    "content": {
      "type": "string"
    }
  },
  "required": [
    "email",
    "content"
  ]
}

TODO: Define full schema of content or place content keys in the first level.
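To make the archive schema more concrete, the following Python sketch builds and checks an archive. Because the layout of content is still an open TODO, the sketch assumes it holds profile data serialised to a JSON string; the function names and sample fields are hypothetical.

```python
import json


def build_archive(email, profile):
    """Return the archive as a JSON string.

    Assumption: `content` carries the profile data serialised to a JSON
    string, since the schema only states that it is a string.
    """
    return json.dumps({"email": email, "content": json.dumps(profile)})


def load_archive(raw):
    """Parse an archive and check the two required string keys."""
    archive = json.loads(raw)
    if not isinstance(archive, dict):
        raise ValueError("archive must be a JSON object")
    for key in ("email", "content"):
        if not isinstance(archive.get(key), str):
            raise ValueError(f"archive key {key!r} is required and must be a string")
    return archive
```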

Archive encryption

The archive should be strongly encrypted using a passphrase given by the user.

TODO: Define encryption method.

Servers sending backups

Servers which send out backups should store the following extra information for users:

| Column | Type | Example |
| --- | --- | --- |
| Backups opted out | boolean | false |
| Backup server | string | sub.domain.tld |
| Backup server receive route | string | /receive/backups |
| Backup server fail count | integer | 0 |
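The extra per-user information could be modelled as in the following Python sketch. The attribute names, the use of HTTPS for the delivery URL, and the policy of resetting the fail count after a successful delivery are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class BackupSettings:
    """Per-user bookkeeping kept by a server that sends backups."""

    backups_opted_out: bool = False
    backup_server: str = ""          # e.g. "sub.domain.tld"
    backup_receive_route: str = ""   # e.g. "/receive/backups"
    backup_fail_count: int = 0

    def receive_url(self):
        """Full delivery URL; HTTPS is an assumption of this sketch."""
        return f"https://{self.backup_server}{self.backup_receive_route}"

    def record_delivery(self, succeeded):
        """Reset the fail count on success, otherwise increment it (assumed policy)."""
        self.backup_fail_count = 0 if succeeded else self.backup_fail_count + 1
```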

Servers receiving backups

A dedicated table or other storage should be available for storing backup information.

| Column | Type | Example |
| --- | --- | --- |
| Backed up handle (unique key) | string | user@domain.tld |
| Backup content | large text | (encrypted text content) |

Settings should be available to allow advertising backup readiness to other servers.

Servers that have received backups should always allow restoring them, even if they stop allowing new backups to be received.
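The storage rules above can be sketched with a small in-memory store in Python. The class and method names are hypothetical, and the sketch assumes that a server which stops accepting backups refuses every new delivery while still serving restores.

```python
class BackupStore:
    """In-memory stand-in for the backups table, keyed by handle."""

    def __init__(self, allow_new_backups=True):
        self.allow_new_backups = allow_new_backups
        self._backups = {}  # handle -> encrypted backup text

    def store(self, handle, encrypted_content):
        """Store or replace a backup, unless the server stopped accepting them."""
        if not self.allow_new_backups:
            raise PermissionError("this server no longer accepts backups")
        self._backups[handle] = encrypted_content

    def restore(self, handle):
        """Return a stored backup; deliberately ignores allow_new_backups."""
        return self._backups[handle]
```

Using the handle as the dictionary key mirrors the unique key on the backed up handle.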

Delivery of backups

To protect against arbitrary storage of data and to validate backup ownership, the backup delivery needs to be signed with the user's private key. A receiving server should check the signature against the user's public key before storing the backup.

How delivered packages are verified can be left to the individual implementation. However, cross-platform compatibility would benefit from using identical methods.
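The verify-before-store rule can be sketched in Python as follows. Since the signing method is not yet defined, the key lookup and the signature check are passed in as callables; `fetch_public_key`, `verify_signature` and the dictionary used as storage are hypothetical stand-ins, not a prescribed interface.

```python
def receive_backup(package, fetch_public_key, verify_signature, storage):
    """Store a delivered backup only if its signature checks out.

    `fetch_public_key(handle)` returns the sender's public key or None.
    `verify_signature(public_key, signed_backup)` returns True or False.
    Both are placeholders for whatever method a platform chooses.
    """
    handle = package["handle"]
    signed_backup = package["backup"]

    public_key = fetch_public_key(handle)
    if public_key is None:
        return False  # unknown user, nothing to verify against
    if not verify_signature(public_key, signed_backup):
        return False  # signature mismatch, do not store

    storage[handle] = signed_backup
    return True
```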

Platforms are free to restrict what platforms they deliver backups to, for example to ensure users are able to restore their identity to a place with a similar set of features available.

TODO: Give example of signing method.
TODO: Should the signing method be specified in the delivery schema?
TODO: Should backup servers advertise what signing methods they support? (if above)

Scheduling delivery

Servers should aim to back up the identities of users at least once per week.
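A scheduler could select the users to back up roughly as in this Python sketch. The `last_backup_at` field and the dictionary layout of a user record are assumptions of the sketch; only the weekly interval and the opt-out flag come from this specification.

```python
from datetime import datetime, timedelta, timezone

BACKUP_INTERVAL = timedelta(days=7)


def users_due_for_backup(users, now):
    """Return handles of users who have not opted out and are due a backup."""
    due = []
    for user in users:
        if user["backups_opted_out"]:
            continue
        last = user["last_backup_at"]  # hypothetical field, None if never backed up
        if last is None or now - last >= BACKUP_INTERVAL:
            due.append(user["handle"])
    return due
```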

Delivery package schema

The delivery JSON message needs to conform to the following schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://the-federation.info/specs/backup-restore#delivery_package",
  "type": "object",
  "properties": {
    "handle": {
      "type": "string"
    },
    "backup": {
      "type": "string"
    }
  },
  "required": [
    "handle",
    "backup"
  ]
}

The backup value is signed using the user's private key so that the receiver can validate its content. Before signing, it is encrypted using a user-chosen passphrase.
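The order of operations (encrypt first, then sign) can be sketched in Python. Neither the encryption method nor the signing method is defined yet, so both are injected as callables; `encrypt` and `sign` are hypothetical placeholders.

```python
def build_delivery_package(handle, archive, passphrase, private_key, encrypt, sign):
    """Build the delivery package: encrypt the archive, then sign the result.

    `encrypt(data, passphrase)` and `sign(data, private_key)` are
    placeholders for methods this specification has yet to define.
    """
    encrypted = encrypt(archive, passphrase)   # step 1: passphrase encryption
    signed = sign(encrypted, private_key)      # step 2: sign the encrypted data
    return {"handle": handle, "backup": signed}
```

Signing the already encrypted data lets a receiving server check ownership without ever being able to read the archive.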

Status codes

Delivery of backup archive

A successful delivery of a backup should be answered with a 200, 201 or 202 status code. All of these should be counted as a successful delivery of the backup.

Refusal to accept this backup should be indicated with a 403 status code.

Any other error code should be understood as a temporary problem with receiving the backup.
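A sending server could map response codes to outcomes as in this short Python sketch. The outcome labels are hypothetical, and the sketch treats every code that is neither a listed success nor 403 as temporary.

```python
def classify_delivery_response(status_code):
    """Map an HTTP status code to a delivery outcome."""
    if status_code in (200, 201, 202):
        return "delivered"
    if status_code == 403:
        return "refused"   # the server will not accept this backup
    return "retry"         # anything else counts as a temporary problem
```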

Moved messages

On a successful restore of user identity, a moved message should be sent out to all known servers.

The moved message is signed with the user's old private key and should contain the user's new public key.

If the recipient fails to understand the moved message due to lack of support for this specification (indicated by a non-2xx status code response), a retry should be scheduled. Since the user's old private key is not kept, the scheduled retry should contain the fully prepared moved message to send. A server should retry moved messages for a minimum of 6 months. The time between retries can be lengthened over time.
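One possible retry schedule is sketched below in Python. The first delay of one hour, the doubling factor, the weekly cap, and the use of 183 days for six months are all assumptions of this sketch; the specification only asks for at least 6 months of retries with intervals that may grow.

```python
from datetime import timedelta


def retry_delays(first=timedelta(hours=1), factor=2,
                 cap=timedelta(days=7), total=timedelta(days=183)):
    """Return growing delays between retries that cover at least `total`."""
    delays = []
    elapsed = timedelta(0)
    delay = first
    while elapsed < total:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, cap)  # lengthen the interval up to the cap
    return delays
```

Each scheduled retry would carry the already signed message, since it cannot be signed again once the old private key is gone.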

Moved message schema

The moved message JSON needs to conform to the following schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://the-federation.info/specs/backup-restore#moved_message",
  "type": "object",
  "properties": {
    "old_handle": {
      "type": "string"
    },
    "new_handle": {
      "type": "string"
    },
    "new_public_key": {
      "type": "string"
    }
  },
  "required": [
    "old_handle",
    "new_handle",
    "new_public_key"
  ]
}
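A moved message matching this schema could be prepared as in the following Python sketch. The function name is hypothetical, and how the signature accompanies the message is left out because it is not defined here.

```python
import json


def prepare_moved_message(old_handle, new_handle, new_public_key):
    """Serialise a moved message once so the same text can be reused later."""
    message = {
        "old_handle": old_handle,
        "new_handle": new_handle,
        "new_public_key": new_public_key,
    }
    for key, value in message.items():
        if not isinstance(value, str):
            raise ValueError(f"{key} must be a string")
    return json.dumps(message, sort_keys=True)
```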

Receiving a moved message

A server receiving a moved message for an identity that exists locally, either as a local user or as a remote profile, should make the necessary internal changes to map the user to the new location and then discard the old public key and old handle.

The server receiving a moved message should ensure that all stored local and remote content now points to the new handle.
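The remapping could look like the following Python sketch. It assumes the message signature has already been verified against the old public key, and it uses plain dictionaries and lists as hypothetical stand-ins for the profile and content storage.

```python
def apply_moved_message(message, profiles, content):
    """Remap a known identity and its content to the new handle.

    `profiles` maps handle -> profile dict with a "public_key" entry.
    `content` is a list of dicts with an "author" handle.
    Returns False when the old handle is not known locally.
    """
    old, new = message["old_handle"], message["new_handle"]
    if old not in profiles:
        return False

    profile = profiles.pop(old)                        # drop the old handle
    profile["public_key"] = message["new_public_key"]  # replace the old key
    profiles[new] = profile

    for item in content:
        if item["author"] == old:
            item["author"] = new
    return True
```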

Security considerations

A user backup must be encrypted strongly so it can be safely sent for storage to other servers.

A user backup must not be restorable without email confirmation. The dual protection of passphrase AND email confirmation is to avoid identity theft in case the user's passphrase leaks (for example through a database leak on the sending server).

All signed backup deliveries must be verified against a recent public key of the sending profile.

Servers should not allow restoring an uploaded backup archive unless the user to be restored can be found, either as an existing remote profile or by fetching the remote profile. This protects against the upload of faked identity data for identities that have disappeared from the network.
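The restore preconditions from this section can be combined as in the following Python sketch. The function and parameter names are hypothetical; `fetch_profile` stands in for whatever remote discovery a platform uses and returns None when the profile cannot be found.

```python
def restore_allowed(handle, passphrase_ok, email_confirmed,
                    known_profiles, fetch_profile):
    """Decide whether a backup for `handle` may be restored."""
    # Both protections are required: passphrase AND email confirmation.
    if not (passphrase_ok and email_confirmed):
        return False
    # The identity must be known locally or discoverable on the network.
    if handle in known_profiles:
        return True
    return fetch_profile(handle) is not None
```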

User discovery and public key

This specification assumes that the public key of any user to be backed up is available via commonly known discovery routes.

TODO: Should the spec take an opinion on where users can be discovered?

License

This work is licensed under a Creative Commons Attribution 4.0 International License.