Discussion:
[MG] Fwd: Re: Democratizing Blockchain Governance in Versioning
Mikael Sand
2018-04-17 16:31:32 UTC
I had sent this only personally by mistake; sending it to the list as
intended, before sending the reply to the reply that follows in
another mail.

//M

-------- Forwarded Message --------
Subject: Re: [MG] Democratizing Blockchain Governance in Versioning
Date: Sun, 15 Apr 2018 21:09:44 +0300
Apologies for the delay in getting back to you: I only had very
limited Internet access for the past couple of weeks (camping).
1) Since you have to trust the executive branch to actually
*implement* any of the decisions made using the system, why is it so
much to ask to trust it to run the decisionmaking system itself?
One problem I can think of is the deletion or censoring of data; this
needs to be externally monitored and verified in both centralized and
decentralized systems.
Again, if you don't trust the government not to do this, how can you
trust them to do anything else?
And as a practical matter this also isn't a concern anyway IMHO as
long as you use an open voting system where anyone can go in and
verify their votes (surely someone will notice if their vote gets
deleted/changed).
Well, it's not really about trusting them or not, but about removing the
need to trust the government, software, hardware and network at all. I
certainly don't trust the CPU fabs nor the instruction sets (public and
otherwise) available in most modern CPUs, and likewise for almost all
compilers and operating system kernels. seL4 is getting there, but until
we can fabricate our own hardware, or validate hardware produced by
others, there is very little reason to expect anything but several
serious security faults in almost all networked electronics. I'm not
even sure I trust set theory and first-order/predicate logic; they might
still contain a contradiction. I certainly prefer homotopy type theory
and intuitionistic/constructive logic, or some other framework without
the law of excluded middle; I very much prefer my functions computable
and sound.

You can even leak information over the power lines by controlling the
frequencies and loads of otherwise inactive processors, and you can
plant hardware-level backdoors by doping the silicon slightly
differently, on a circuit that looks perfectly fine unless you examine
it very carefully with a scanning tunneling microscope.

On top of this, we have our old friend entropy making our probabilistic
computers degrade over time, noisy communication channels, cosmic rays
causing bit flips, distributed denial of service attacks, governments
shutting down telecommunications infrastructure, etc. If we're trying
to make systems which are to work in catastrophe-ridden situations and
dictatorships as well, then our systems cannot have single points of
failure, and need some degree of Byzantine fault tolerance:
https://en.wikipedia.org/wiki/Byzantine_fault_tolerance

And, if you think further about the problem of how to verify whether
even just your own vote has been distributed, stored and counted
correctly, you'll eventually end up in the space of consensus
algorithms and Byzantine failures.
Alternatively, if all that's wanted is distributed databases and a
decentralized web, one can use https://datproject.org/ and
https://www.scuttlebutt.nz/ as a gossip protocol. No blockchain needed.
You can try it today with https://beakerbrowser.com/
Or use any database that supports sharding on cloud systems (e.g.,
MongoDB on AWS, which is what proxyfor.me is built on). That gets you
the redundancy to deal with outages, DOS attacks, hacks, etc., which
IMHO is 99% of what you need distributed systems for. Note that
"because I don't trust anyone" is not on that list...
Well, having a centralized service that is capable of scaling out and
handling DDoS is one thing. Having it truly p2p and decentralized
without any single points of failure is another; and having it work
whenever any kind of network is available is yet another
(dat/beakerbrowser might be one of the easiest ways to share files
cross-platform on a LAN to this day).

Not trusting anyone seems like a fool's game, but at least I don't trust
everyone; almost always, someone can perceive incentives to exploit the
system, for economic reasons or otherwise. And I certainly do not trust
any self-policing system or government where the citizens cannot audit
both the people who audit the executive branch and the ones who
authorize secret government decisions, wiretaps, dragnets, and other
violations of privacy, with no public oversight of whether those
decisions are in any way just or called for, and which self-censors and
classifies any proof of its own incompetence as secret. Any deletion of
evidence from the public record needs to leave an immutable trail, or
preferably not be possible at all, neither intentionally nor otherwise.

The threat model for something running inside a single commercial
entity, or under the control of an opaque self-policing system, is quite
different from that of an open source p2p consensus algorithm. The
Internet was built to be decentralized and to survive nuclear war.
Currently, centralized commercial entities have taken significant legal
ownership of users' data and achieved global network effects through
convenience, addiction, and a few other tricks, but I certainly hope the
p2p web will keep gaining traction and remove single points of
control/failure/censorship, and I'm willing to work and code for it.
It's not really good enough yet in terms of user experience for the
masses, but it's certainly getting closer by the day.
I hope the previous paragraph makes it clear that we need to monitor that
all data remains immutable, not only our own.
Sorry, I don't follow: Not only do you not trust the executive branch
not to change/delete the data, but you *also* don't trust your fellow
citizen to detect and report this if it happens to them?
Well, anyone running a working consensus algorithm will notice
misbehaving nodes and other Byzantine failures, so I do expect people
to detect it if it happens; the system very much depends on it. Are you
suggesting people do it manually, by hand, whenever they for whatever
reason feel the urge to check if their data has been manipulated? And
using what process? Opening the website and checking their profile? How
does this say anything about how that data has been counted, or how it
relates to other people's data?
Fiat currency, or quite similar systems, can for sure be built on
distributed immutable-log-like systems, as long as the participants
agree on the rules. You can do it without any
proof-of-work/stake/space/elapsed-time if a web of trust exists among
the actors taking part in the economy.
E.g. https://duniter.org/en/ implements the Universal Dividend, as
specified by the theory called Relative Money Theory:
https://duniter.org/en/theoretical/
http://vit.free.fr/TRM/en_US/
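
Just to make the idea concrete, here's a toy sketch of my own (a
simplification for illustration only, not Duniter's actual formula or
code; the parameter c and the account layout are my assumptions): each
period, every living member is credited the same dividend, a fraction
of the money supply per member.

# Toy universal-dividend illustration (NOT Duniter's real rules):
# each period, every member receives UD = c * (M / N), i.e. the money
# supply grows at rate c and the new money is shared equally.

def apply_universal_dividend(balances, c=0.1):
    # balances: dict mapping member id -> account balance
    members = list(balances)
    money_supply = sum(balances.values())
    ud = c * money_supply / len(members)
    for member in members:
        balances[member] += ud
    return ud

balances = {"alice": 1000.0, "bob": 0.0, "carol": 200.0}
ud = apply_universal_dividend(balances)
print(ud, balances)   # everyone gains the same absolute amount per period
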
Again, I submit that if you don't trust the executive branch (and
indeed the entire banking system with all its checks and balances),
then no amount of fancy technology is going to cure what ails
you. And if you do, then the technology we use can be vastly
simplified, improving service levels, reliability, and the ability to
detect and deal with attacks on the system. Indeed, one of the biggest
problems with cryptocurrencies is their very opacity: when someone
develops a way to game the system, it probably won't be discovered
until the damage is so great that the entire economy might collapse as
a result.
At least, every electronic voting system that has been built by/for the
Finnish government has been bug-ridden and full of insane security
problems, while costing hundreds of millions of euros. I'm not sure what
technological simplifications you're suggesting to achieve reliable and
secure software engineering by/for governments. Unless it's based on
some correct-by-construction software design, or otherwise proof-based
on some sound type theory, I'm not sure it'll be sufficient, and that is
certainly not a simplification; it easily makes development time 10x or
more (I have a fair share of experience from trying myself, while I was
studying for my computer science master's degree).

I don't see why we should build systems on principles with a proven
track record of serious security flaws. Why require trusting the
executive branch? As long as the necessary evidence exists in the audit
logs, I don't see how trust even comes into the picture (except for the
gathering of evidence and its integrity); essentially, the laws and
courts will decide whether the executive branch has worked within the
current legal limits, and the evidence and verdicts would be open to
public analysis and questioning. Anyway, Byzantine failures include
attacks by malicious actors and general software/hardware/network
issues, which is much more than a question of trusting the executive
branch behind the software (for which I currently see no relevance).
Besides, Duniter, for example, fundamentally requires a web of trust,
and essentially makes the whole network into the executive/banking
branch, so in this case there is no need to limit it. Perhaps I
misunderstand what you mean by me not trusting the executive branch;
what exactly should it be trusted with?

I presume you didn't look deeper into RMT, or I might not understand
what you mean by trusting the entire banking system. I sure trust it
to mostly behave according to the current laws governing money and
accounting, as in the sources and sinks of debt, interest rates and
fractional reserve currency, and the mathematical consequences of those.
Perhaps you mean whether I consider it to have a good set of rules
acting as checks and balances, which I do not: the currently allowed
state changes, checks and balances guarantee several pathologies and
artificially induce unnecessary boom-and-bust cycles, with the central
banks essentially acting like black holes in the economy. (For some
context, I've worked on making interactive visualizations of various
risk models and the debt of banks, national banks, central banks,
nations, insurance companies and large corporations; for the Finnish
national bank, the European Central Bank, and some insurance
companies...)

The central banks, together with the closest financial institutes in the
economic network/debt graph, receive an unfair advantage by taxing the
added value from the rest of the economy in the form of fractional
reserve interest, merely in exchange for doing the accounting of the
fiat debt money (adding no real value to the economy, only an ever
increasing cost, and being replaceable by certain p2p consensus
algorithms). But this is completely beside the point, which is that
storing the data of an e-democracy platform is entirely possible to do
efficiently with p2p distributed append-only directed acyclic graphs and
consensus algorithms, solving any potential trust issues (reducing them
to public-private key cryptography and the factorization/discrete-log
problem for now), scaling, latency, validity and integrity issues,
single points of failure, etc.

Do you have any technical objections to this? Or does this boil down to
you wanting the executive branch to be trusted to run the software in a
centralized fashion? For what reasons/benefits? To be "vastly
simplified, improving service levels, reliability, and the ability to
detect and deal with attacks on the system"? What becomes simpler, and
for whom? What service level/reliability increases? How do external
users detect a virus or backdoor in the actual running system? How does
anyone verify what source code is used? That the compiler and virtual
machines are working correctly? The OS? Spectre variant 2 patches
applied? What else is running on the same hardware and network, or has
physical access to it? Other side-channel attacks? How would the
centralized admins themselves even do these things? It seems to me it
opens up several classes of vulnerabilities. I'm not sure what your
threat model and security analysis method is here, but I don't see
anything of substance to back up these claimed benefits.
Regards,
Scott
_______________________________________________
Start : a mailing list of the Metagovernment project
http://www.metagovernment.org/
Manage subscription: http://metagovernment.org/mailman/listinfo/start_metagovernment.org
Mikael Sand
2018-04-17 19:00:10 UTC
Managed to send it privately again. I need to pay more attention to
where I'm clicking ;)



-------- Forwarded Message --------
Subject: Re: [MG] Democratizing Blockchain Governance in Versioning
Date: Tue, 17 Apr 2018 21:59:06 +0300
From: Mikael Sand <***@abo.fi>
To: Scott Raney <***@gmail.com>



Oh sorry, I did intend it for the list. I'll resend it, and I guess I
can just reply to your new comments here, since you asked about
reposting. You can repost your mail in between as well if you wish, but
I'll try to keep your replies intact.
Did you intend to send this just to me, or to the metagovernment list
too? Let me know, and I'll repost this message to that list if
needed...
(big snip: sounds a little "conspiracy theoretical" to me...)
I understand your sentiment; unfortunately, the security faults are
documented in published research. I can dig up some peer-reviewed papers
and videos from IT security conferences if you're interested, but some
googling should suffice to find out how to do it yourself with relative
ease (often with source code, sometimes just a rough sketch). The Chaos
Communication Congress, Black Hat and DEF CON conference materials
should have you taking out your tinfoil hat quite fast if you're
sensitive to conspiratorial speculation. But the technology exists, and
it simply accounts for the consequences of the current physical
implementations, hardware and protocol designs, and can mostly be
verified by thinking from first principles and the specifications. I'm
not theorizing about who might be conspiring to use this or not, for
whatever reasons; I just either include these published facts in a
specific IT security threat model or not, depending on the use case; for
most IT systems they're completely irrelevant. What matters is not
whether it is happening, but whether it is theoretically possible at
all, physically possible to do within a certain budget, and whether the
relevant potential actors have incentives to spend that budget on it.

If you work with IT security, you need to take these publications into
account in some of your threat models either way; they can of course
have varying levels of paranoia in their assumptions, and if you include
state actors and intelligence agencies as potential adversaries, it
completely changes the picture. Even the top-level domain resolution of
the DNS system, certificate authorities and Signalling System 7 (used to
set up and route connections, phone calls, SMS, etc.) have known faults
and lack trust in this case. DNSSEC and DNSCrypt help somewhat, but only
keys shared and verified either in physical contact or over
already-secured communication channels and webs of trust have a chance
of handling that, as far as I know. And one-time pads if you need actual
secrecy, of course, but that won't scale before we have a cheap source
of Bell states on a global quantum internet.

Initial trust needs to happen between people who know each other and
meet in real life, then a OpenPGP like web of trust can scale the
network of public keys used for signing the messages (and encrypting if
you need secrecy). Duniter has the most interesting attempt I've seen so
far, for building a web-of-trust and handling the identities and
accounting of who is still living/interacting with the economy in this
manner. The six signatures within the last 100 days and max-distance of
six might not be perfect, but have to start experimenting and measuring
the results somewhere.
Well, having a centralized service being capable of scaling out and handling
DDOS is one thing. Having it truly p2p and decentralized without any single
points of failure is another, further, having it work when any kind of
network is available is yet another (dat/beakerbrowser might be one of the
easiest ways to share files cross-platform on a lan to this day).
The issue of network reliability is really orthogonal to the issue of
governance (i.e., we've become dependent on a functioning Internet for
far more than just the ability to run the government).
Well, if you're going to scale the system out and make it distributed so
it can handle large loads, network partitions and DDoS attacks, then
you'll end up either implementing or using a consensus algorithm of some
sort. Are you familiar with the CAP theorem
(https://en.wikipedia.org/wiki/CAP_theorem)? In short, you can only have
two out of these three: consistency, availability and partition
tolerance. You mentioned MongoDB before, which now supports running a
primary (plus secondary replicas) and providing either BASE (Basically
Available, Soft state, Eventual consistency) semantics or, starting this
summer in v4.0, multi-document ACID (Atomicity, Consistency, Isolation,
Durability) semantics. What conflict resolution strategy are you
suggesting for network partitions? Let's assume AWS, or a large part of
it, goes down for some amount of time: is the service unavailable until
the connections recover? And the service would depend on a functioning
connection to the wider Internet? And thus wouldn't work for organizing
people if e.g. the government, a coup, or a foreign military shuts down
the telecommunications infrastructure?
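
For concreteness, the simplest partition-tolerant answer I know of looks
roughly like this (a minimal sketch of my own, not anything MongoDB or
proxyfor.me actually does): each side keeps accepting writes during the
partition, and on reconnection the replicas merge per key with
last-write-wins, trading consistency during the partition for
availability.

# Minimal last-write-wins merge of two replicas that diverged during a
# network partition. Each replica stores key -> (logical_ts, value);
# on healing, the higher timestamp wins, so the "older" concurrent
# write is silently lost. That is exactly the CAP trade-off at issue.

def lww_merge(replica_a, replica_b):
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

a = {"vote:proposal-1:alice": (3, "yes")}
b = {"vote:proposal-1:alice": (5, "no"),   # written during the partition
     "vote:proposal-2:bob":   (4, "yes")}
print(lww_merge(a, b))   # alice's later "no" wins; bob's vote is kept
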
Not trusting anyone seems like a fools game, but at least I don't trust
everyone, almost always someone can perceive incentives to exploit for
economic reasons or otherwise.
Agreed: And if there's going to be one person you *have* to trust
(outside your close friends and family, of course), it's got to be the
manager of your local government. Without that, I submit that there
*is* no functioning local government.
(snip, more conspiracy theory stuff)
Are you suggesting people do it
manually by hand when they for whatever reason feel the urge to check if
their data has been manipulated? And using what process? Opening the website
and checking their profile? How does this say anything about how that data
has been counted or how it relates to other peoples data?
For example, in proxyfor.me you can download the complete data file
for the vote on any proposal. You can check your vote along with
everyone else's as well as ensuring that everything adds up correctly.
And anyone who can use a spreadsheet (or even a text editor) can do
this. Nothing fancy required, no trust required other than that the
Local Manager has verified that the screen names correspond to real
people. And again, if you don't trust at least that executive, there
*is* no functioning government so this is simply not a problem. And
voting rolls are public so we already *know* how many votes there
should be...
So, let's assume you've downloaded the archive once, and you download it
again to check whether any of the old data has changed, and you notice a
chunk of it is missing or modified. What now? How do we find the
culprit? Was it the manager? Some IT admin? A bug? An attack?
How much traffic can the system handle for this kind of checking?

Let's say a majority of users (or a relatively large number) decided to
download the entire archive once or more per day: is this
cost-effective? And simple? Perhaps from some perspective, but I don't
see how the auditing would be done reliably without the normal verifying
users essentially becoming a DDoS attack once the system gets a large
number of users. And besides, downloading and checking the data would
only detect the issue; it wouldn't say what caused it, nor resolve it
automatically using an algorithm made for distributed systems.
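
Even the manual check is non-trivial; something like the following (a
hypothetical sketch, proxyfor.me publishes no such format that I know
of, and the "vote_id" column is my invention) is the minimum needed just
to detect a change between two downloaded snapshots, and it still says
nothing about who caused it or how to resolve it:

# Detect records added, removed, or modified between two downloaded
# snapshots of a vote archive. Pure detection: it cannot say whether
# the change was a bug, an admin, or an attacker, nor resolve it.
import csv, hashlib

def fingerprint(path, key_field="vote_id"):
    # hypothetical CSV layout: one row per vote, identified by vote_id
    out = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            serialized = repr(sorted(row.items())).encode()
            out[row[key_field]] = hashlib.sha256(serialized).hexdigest()
    return out

def diff_snapshots(old_path, new_path):
    old, new = fingerprint(old_path), fingerprint(new_path)
    return {
        "removed":  sorted(set(old) - set(new)),
        "added":    sorted(set(new) - set(old)),
        "modified": sorted(k for k in set(old) & set(new)
                           if old[k] != new[k]),
    }
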

Let's say each user needs to generate a private and public key, and sign
their votes/actions/data whenever they add/change something in the
service, include a reference to the latest version of the state, persist
this in an event store, and calculate a checksum from the checksum of
the previous event and the contents of the entire new event (like git).
Then we know it was someone in possession of the private key
corresponding to the public key of the user who created/caused the
change. At this stage you would already have what amounts to a directed
acyclic graph, which can very well be stored in MongoDB or essentially
any other persistence layer. Then, if you just add a consensus algorithm
(based e.g. on vector clocks, matrix clocks, interval tree clocks, or
general causal trees), you can make it into a distributed system that
can handle availability, and using something like latest-vote/write-wins
you can handle conflicts on a per-user/private-key event log basis to
get eventual consistency (and CRDTs and/or OT for real-time
collaborative data), thus working in p2p fashion under any network
conditions (even highly unreliable and intermittent ones).
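
A minimal sketch of that per-user signed, hash-chained event log, using
the Python `cryptography` package for ed25519 signatures (my own
illustration of the idea described above, not the design of any
existing system mentioned in this thread):

# Git-like signed event log: each event embeds the checksum of the
# previous event and is signed by the author's private key, so anyone
# can verify integrity (the chain) and authenticity (the signatures)
# offline.
import json, hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

def append_event(log, signing_key, payload):
    prev = log[-1]["checksum"] if log else ""
    author = signing_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw).hex()
    body = {"prev": prev, "author": author, "payload": payload}
    serialized = json.dumps(body, sort_keys=True).encode()
    event = dict(body,
                 checksum=hashlib.sha256(serialized).hexdigest(),
                 signature=signing_key.sign(serialized).hex())
    log.append(event)
    return event

def verify_chain(log):
    prev = ""
    for event in log:
        body = {"prev": event["prev"], "author": event["author"],
                "payload": event["payload"]}
        serialized = json.dumps(body, sort_keys=True).encode()
        if event["prev"] != prev:
            return False                     # broken or reordered chain
        if hashlib.sha256(serialized).hexdigest() != event["checksum"]:
            return False                     # contents tampered with
        public_key = Ed25519PublicKey.from_public_bytes(
            bytes.fromhex(event["author"]))
        try:
            public_key.verify(bytes.fromhex(event["signature"]), serialized)
        except InvalidSignature:
            return False                     # not signed by the claimed key
        prev = event["checksum"]
    return True

log = []
alice = Ed25519PrivateKey.generate()
append_event(log, alice, {"proposal": 42, "vote": "yes"})
append_event(log, alice, {"proposal": 42, "vote": "no"})
assert verify_chain(log)

Per-author conflict resolution can then be as simple as "the latest
verified event wins", while the consensus layer only has to agree on
which events exist.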

Hmm, you intend a local manager to verify the identity of all the people
using the system? This seems like quite another bottleneck. What would
be the process for verifying the identity? Is the local manager the only
one who knows which screen name corresponds to which living person? Or
what's the auditing process here? What happens when we notice our dead
neighbor adding new votes to the data a few months after they passed
away?
At least, each electronic voting system that has been built by/for the
finnish government has been bug ridden and full of insane security problems,
while costing hundreds of millions of euros. I'm not sure what technological
simplifications you're suggesting to achieve reliable and secure software
engineering by/for governments.
Well getting rid of the ridiculous "secret ballot" requirement is a
big one. It's not necessary and trying to impose it just makes the
system non-verifiable. Which perhaps not coincidentally is exactly the
same problem any blockchain-based system has: If the average person
can't go in and validate the vote count, IMHO the system can never be
trusted.
Secret ballots in paperless electronic voting are inherently
incompatible with verifiability of either the tally or
one-person-one-vote. They make sense in paper-trail voting, which is
required for any vote-buying/coercion-sensitive topics and decisions.
But as far as I understand now, any kind of public internet voting is
only suitable for completely open data. I didn't actually mention secret
ballots so far, and I'm not sure why you're bringing them up.
A private group (already knowing each other's public keys) can create
issue-specific keys shared within the group, and use symmetric
cryptography to vote in secret from the public on a public ledger, while
maintaining immutability and the potential to audit the decision history
later on. But this is more relevant to, e.g., a security-conscious board
of directors or some specific interest groups, and will probably be kept
in private "block-chains" or use linked timestamping anyway, which is
nothing new. Calculating signatures and checksums for data integrity
checking had been implemented many times over, in e.g. all kinds of
military and banking databases, bank-to-bank communications, and other
systems with similar threat vectors, long before Bitcoin came along.
Inter-bank comms tend to use some of the best key distribution
mechanisms money can buy; then again, many consumer-facing internet
banking apps and checkout flows have mostly crap security and very basic
flaws, like no escape-character handling combined with user-modifiable
content and hashes. These fall under the "don't give a shit" policy of
banks: insurance covers it, and it's mostly a consumer/client risk
factor. And "blockchain" is mostly just a slightly catchier word that
happened to reach the mainstream because of the popularity and hype
around Bitcoin and other cryptocurrencies.
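
A tiny sketch of the "private group on a public ledger" idea, using the
`cryptography` package's Fernet recipe (just an illustration of the
concept under my own assumptions, not the scheme of any particular
product):

# Members who already know each other's public keys agree on a shared
# symmetric key out of band; the ciphertexts (plus each member's
# signature over them) are what gets appended to the public, immutable
# ledger. Outsiders see only opaque bytes, the group can decrypt, and
# the key can be disclosed later if the decision history must be audited.
from cryptography.fernet import Fernet

group_key = Fernet.generate_key()      # distributed among group members
box = Fernet(group_key)

ballot = b'{"proposal": 7, "vote": "no"}'
ciphertext = box.encrypt(ballot)       # this goes on the public ledger

assert box.decrypt(ciphertext) == ballot   # auditable by key holders
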
Unless it's based on some
correct-by-construction software design or otherwise proof-based on some
sound type theory, I'm not sure it'll be sufficient, and that is certainly
not a simplification, it easily makes development time 10x or more (I have a
fair share of experience from trying myself, while I was studying for my
computer science masters degree).
All software has bugs. The goal is not to produce perfect code, just
good enough code such that it does the job *and* having a proper
design such that you can at least *detect* when a bug/hack/loss has
occurred. Finland's system, along with all blockchain-based systems,
fails to deliver the latter, whereas proxyfor.me delivers it easily.
I'm still confused as to what the auditing process would be in
proxyfor.me. What is the conflict resolution method once someone claims
they have an older backup including data that is missing from, and/or
data that is fabricated in, the current db? What if there is more than
one actor claiming this, with mutually conflicting claims?
How much experience in software development, and in running distributed
systems in production, do you have? I'm very interested in seeing your
design and how it easily verifies the good-enough correctness, and its
ability to detect bugs, hacks and losses. I'm quite skeptical that
you've done any of this yet, though, without actually recreating what
the field of computer scientists and cryptographers has built for these
specific purposes. Perhaps there is some technical jargon I've used that
deserves clarification; please ask if I've used some terms too
ambiguously or unclearly. I'm doing my best to express this as
comprehensibly as I can, and I completely understand if it seems very
confusing.
I sure trust it to mostly
behave according to the current laws governing money and accounting, as in
the sources and sinks of debt, interest rates and fractional reserve
currency; and the mathematical consequences of those.
And that's all I'm assuming you should have to do. Or do with the
voting system (i.e., you should have the source code and all the raw
input and output data and so can verify its operation).
I'm certainly not expecting anyone to trust the *political* components
of monetary policy, which I agree with you is as screwed up as the
rest of government (misrepresentative democracy).
Hmm, well, the source code and a sample of the dataset are not really
enough to ensure the correct functioning of a distributed system. Among
other things, they don't tell you whether the system is refusing to
respond to some users, or whether it only shows the person attempting to
verify the data what they want to see about their own data; they would
still have to check with others to see whether what they see as others'
data is actually the same as what the others see, and establish some
common knowledge: what they know, what others know, who knows that who
knows what, transitively, and so on. I'd much rather have the algorithm
establish this than do it manually. If the specification is a protocol,
you don't have to trust the specific implementations; as long as the
protocol constrains the possible interactions correctly, you can use any
specification-compliant client to interact with the network and verify
which parts of the data have already reached consensus in which parts of
the network.
Do you have any technical objections to this? Or does this boil down to you
wanting the executive branch to be trusted to run a software in a
centralized fashion? For what reasons/benefits? To be "vastly simplified,
improving service levels, reliability, and the ability to detect and deal
with attacks on the system"?
Exactly.
What becomes simpler for who? What service level/reliability increases?
Simpler to understand, simpler to implement, simpler to maintain, even
in the face of attack (AWS, for example, has *vastly* more capability in
dealing with this kind of thing than *any* government has, or could
ever hope to have).
Too simple, in fact, at least to be capable of working as a verifiable
distributed system; unless you intend to have a single primary, in which
case you don't have an actual distributed system with availability, just
some extra scaling-out for more and faster reads when network conditions
are good; or unless you use something like Lamport timestamps, vector
clocks, matrix clocks, version vectors, interval tree clocks, etc. to
record the partial ordering of events and capture the chronological and
causal relationships in a distributed manner; and you don't get
verification of authenticity unless you add signatures, nor verification
of integrity unless you add something like Merkle trees. Which parts of
the process confuse you? Or which part is not simple enough? Or which
parts of these do you think you're achieving without doing the necessary
work? Essentially, a relatively simple interface can describe the entire
API surface needed in a thin client, the signature and checksum
algorithm choices have acceptable reasoning behind them, and otherwise
it's a p2p distributed database like almost any other, except that it
actually works in an existing browser with self-hosting and authoring
capability, like Tim Berners-Lee intended the web to be.
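
For what it's worth, the vector-clock part is only a handful of lines;
here is a generic textbook sketch (not tied to any particular product):

# Textbook vector clocks: each replica keeps a counter per replica id.
# Comparing two clocks tells us whether one event causally precedes the
# other, or whether they are concurrent and therefore need an explicit
# conflict-resolution rule (last-write-wins, a CRDT merge, etc.).

def tick(clock, node):
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(a, b):
    # applied when a replica receives a remote update: element-wise max
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def happened_before(a, b):
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

x = tick({}, "node1")            # event on node1
y = tick(merge(x, {}), "node2")  # node2 sees x, then does its own event
z = tick(x, "node1")             # meanwhile node1 does another event
assert happened_before(x, y) and concurrent(y, z)
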
How do external users detect a virus or
backdoor in the actual running system? How does anyone verify what source
code is used?
You can't, but that's true of blockchain or any other distributed
system. Who do you trust more, one executive that *we* voted into
office and can check up on or dozens or even hundreds of unknown
servers run by who knows who?
Well, with a DAG plus signature and checksum checking, you can use a
consensus algorithm to agree on what data should exist, and it doesn't
matter as long as less than a majority of the network is infected; a
web-of-trust based system can even handle that to a large degree. I
certainly trust cryptography, and the very small likelihood that someone
would possess/deduce/brute-force all the private keys, much more than
any centralized IT system that has neither the required data nor the
algorithms needed for sufficient integrity checking and conflict
resolution.
That the compiler and virtual machines are working correctly?
OS? Spectre variant 2 patches applied? What else is running on the same
hardware and network or has physical access to it? Other side-channel
attacks? How would the centralized admins themselves even do these things?
Again, it's got to be open voting so all this secrecy stuff just goes
away (who cares if another process on the same CPU can read the voting
systems' memory? It's all open and public anyway!)
Well, it's not about someone being able to read it; everybody being able
to read it is actually a requirement rather than something to avoid. A
DAG would also have all the data openly readable (unless you write some
encrypted data instead of actual text in a comment field or something,
or the protocol allows storing arbitrary data, in which case anything
can be stored, of course), and anyone wanting to keep their local copy
of the database up to date can do so, can vote while off-line/off-grid
and distribute the results once they have a connection again, and can
even use an actual sneakernet as the courier of votes/data to/from
remote places.
But rather, it's about the potential for backdoors and user-specific
massaging of what the system responds with, and of what is actually used
in other calculations/responses. Simply put, the users need to be able
to reach consensus about the data in the service; otherwise data can
disappear or change without a trace of who/what caused it, or any way to
resolve the conflict, thus deserving criticism and eroding trust in the
system before it even gets started.
It seems to me it opens up several classes of vulnerabilities. I'm not sure
what your threat model and security analysis method is here. But I don't see
anything of substance to back up these claimed benefits.
Only if you think like a secrecy-obsessed paranoid. If it's all open
nearly all of your reservations simply disappear, and the rest can
easily be dealt with by using the best commercially-available services
rather than trusting JimBob's basement server or those run by the
Russians like the blockchain proponents would propose.
Regards,
Scott
Well, you might have confused me with someone else; I haven't advocated
for secrecy in any public internet voting system or any other kind of
e-democracy platform (perhaps many years ago, for privacy-sensitive
topics, but not since actually looking into building them and reading
some of the security analysis around them). The p2p web is essentially
the opposite of secrecy. And my concerns are mostly about data integrity
and authenticity, to make it good enough that the security-minded people
would be happier with proxyfor.me, and to make it feasible at all to
audit what is going on. I'm not sure how you could have taken any of my
reservations to be about secrecy. The only relation to, or concern
about, secrecy is perhaps that of someone in the executive branch, or an
external malicious actor, attempting to delete/change/add data in
secret.

I'm not sure what you consider "the best commercially available", but
the Merkle tree (https://en.wikipedia.org/wiki/Merkle_tree) seems almost
universally common. I'm not sure which brand of it you would consider
simple enough. But if the best commercial options suffice, it seems it's
OK to use it as long as we don't call it a blockchain; or what really
counts as simple in distributed systems? ;) I think you might have
confused what I'm suggesting with Nakamoto consensus, which I don't
think is a good fit for this. Dat, IPFS and Secure Scuttlebutt (SSB) are
perhaps the most evolved forms of Merkle trees for the p2p web; even
Firefox now lets extensions handle dat://, ipfs:// and ssb:// URLs:
https://blog.mozilla.org/addons/2018/01/26/extensions-firefox-59/. And
DNS integration will keep getting better.
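
For reference, the core of a Merkle tree really is this small; a generic
sketch (not the exact hashing scheme Dat, IPFS or SSB use):

# Compute a Merkle root over a list of records: change, remove, or
# reorder any leaf and the root changes, so two peers can compare whole
# datasets by exchanging a single hash (and then locate the difference
# with logarithmically many more hashes).
import hashlib

def merkle_root(leaves):
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # duplicate the last node if odd
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

votes = [b"alice:proposal-1:yes", b"bob:proposal-1:no"]
print(merkle_root(votes).hex())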

Consensus and distributed systems sure aren't simple to reason about,
that's for sure. But putting an arbitrary limit somewhere such that it's
not even possible to at least eventually resolve conflicts, or such that
the system is unavailable during network partitions, doesn't seem very
great. These things need to be dealt with at the protocol level, instead
of hoping that some IT admins are top notch and keep everything in shape
in a centralized system (especially in government IT, where the
contracts tend to go to the same old friends each time, at least in
Finland), when in reality they tend to be ignorant of many security
issues and don't even admit it, know about it, or think it's an issue.

If you think that the Dat project and the Beaker browser have made bad
technological choices compared to some commercial alternative, I'm very
interested in hearing about it and improving the p2p web accordingly.
But I think you might find that the commercially available services
implement these same categories of general algorithms, and in many cases
the same specific choices, to solve the issues I'm trying to highlight
here.

Sorry for the essay-length ramblings; I hope at least someone finds some
of it useful.

Regards,
Mikael
Ned Conner
2018-04-18 02:07:00 UTC
Mikael, I for one am finding your remarks very useful. You are now on my
list as someone to approach for advice when my project reaches the point
of having to deal with these sorts of issues. (No good essay goes
unpunished ... :-D)
Post by Mikael Sand
(snip: full quote of the previous message)