container: add readme, remove extra formats, remove go-ipld-cbor dependency

Michael Muré
2024-10-07 18:46:19 +02:00
parent 60922ced96
commit 40639b6715
13 changed files with 191 additions and 167 deletions

pkg/container/Readme.md Normal file

@@ -0,0 +1,86 @@
# Token container
## Why do I need that?
Some common situations require packaging multiple tokens together:
- calling a service requires sending an invocation, alongside the matching delegations
- sending a series of revocations
- \<insert your application specific scenario here>
The UCAN specification defines how a single token is serialized (an envelope with a signature, IPLD-encoded as DAG-CBOR), but leaves entirely open how to package multiple tokens together. To be clear, this is the right choice for a specification: several equally valid solutions to that problem exist and can coexist. Any wire format able to hold a list of byte strings would do (CBOR, JSON, CSV...).
**go-ucan**, however, provides an opinionated implementation, which may or may not work in your situation.
Some experiments have been run to find an appropriate format, and two have been selected:
- a **DAG-CBOR** list of byte strings, as a low-overhead option
- a **CAR** file, as a somewhat common way to carry arbitrary blocks of data
Notably, **compression is not included**, even though it works reasonably well. This is because your transport medium might already do it, or should.
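If the transport does not compress, a caller can still layer compression on top of any serializer. The following is a minimal sketch using only the standard library; the `serializer` type and the `writeGzip`, `readGzip` and `roundTrip` helpers are hypothetical names introduced for illustration and are not part of go-ucan. A method such as `Writer.ToCbor` would play the role of the serializer.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// serializer stands in for any function that writes a container to w
// (hypothetical type, used here only for illustration).
type serializer func(w io.Writer) error

// writeGzip compresses whatever serialize produces and writes it to w.
func writeGzip(w io.Writer, serialize serializer) error {
	zw := gzip.NewWriter(w)
	if err := serialize(zw); err != nil {
		zw.Close() // best effort, the serialization error takes precedence
		return err
	}
	// Close flushes the remaining compressed data, so its error is returned.
	return zw.Close()
}

// readGzip decompresses r entirely and returns the original bytes.
func readGzip(r io.Reader) ([]byte, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// roundTrip compresses payload, records the compressed size, then decompresses it.
func roundTrip(payload []byte) (out []byte, compressedLen int, err error) {
	var buf bytes.Buffer
	err = writeGzip(&buf, func(w io.Writer) error {
		_, werr := w.Write(payload)
		return werr
	})
	if err != nil {
		return nil, 0, err
	}
	compressedLen = buf.Len()
	out, err = readGzip(&buf)
	return out, compressedLen, err
}

func main() {
	payload := bytes.Repeat([]byte("token-bytes "), 100)
	out, n, err := roundTrip(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("identical:", bytes.Equal(out, payload), "raw:", len(payload), "compressed:", n)
}
```

Returning the error from `Close` matters here: the gzip writer buffers data, and a failed final flush would otherwise go unnoticed.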
## Wire format considerations
Several possible formats have been explored:
- CAR files (binary or base64)
- DAG-CBOR (binary or base64)
Additionally, gzip and deflate compression have been experimented with.
Below are the results in terms of storage used, first as a percentage overhead and then as a byte overhead over the raw tokens:
| Token count | car | carBase64 | carGzip | carGzipBase64 | cbor | cborBase64 | cborGzip | cborGzipBase64 | cborFlate | cborFlateBase64 |
|-------------|-----|-----------|---------|---------------|------|------------|----------|----------------|-----------|-----------------|
| 1 | 15 | 54 | 7 | 42 | 0 | 35 | \-8 | 22 | \-12 | 16 |
| 2 | 12 | 49 | \-12 | 15 | 0 | 34 | \-25 | 0 | \-28 | \-3 |
| 3 | 11 | 48 | \-21 | 4 | 0 | 34 | \-32 | \-10 | \-34 | \-11 |
| 4 | 10 | 47 | \-26 | \-1 | 0 | 34 | \-36 | \-15 | \-37 | \-17 |
| 5 | 10 | 47 | \-28 | \-4 | 0 | 34 | \-38 | \-18 | \-40 | \-20 |
| 6 | 10 | 47 | \-30 | \-7 | 0 | 34 | \-40 | \-20 | \-40 | \-20 |
| 7 | 10 | 46 | \-31 | \-8 | 0 | 34 | \-41 | \-21 | \-42 | \-22 |
| 8 | 9 | 46 | \-32 | \-10 | 0 | 34 | \-42 | \-22 | \-42 | \-23 |
| 9 | 9 | 46 | \-33 | \-11 | 0 | 34 | \-43 | \-23 | \-43 | \-24 |
| 10 | 9 | 46 | \-34 | \-12 | 0 | 34 | \-43 | \-25 | \-44 | \-25 |
![Overhead %](img/overhead_percent.png)
| Token count | car | carBase64 | carGzip | carGzipBase64 | cbor | cborBase64 | cborGzip | cborGzipBase64 | cborFlate | cborFlateBase64 |
|-------------|-----|-----------|---------|---------------|------|------------|----------|----------------|-----------|-----------------|
| 1 | 64 | 226 | 29 | 178 | 4 | 146 | \-35 | 94 | \-52 | 70 |
| 2 | 102 | 412 | \-107 | 128 | 7 | 288 | \-211 | 0 | \-234 | \-32 |
| 3 | 140 | 602 | \-270 | 58 | 10 | 430 | \-405 | \-126 | \-429 | \-146 |
| 4 | 178 | 792 | \-432 | \-28 | 13 | 572 | \-602 | \-252 | \-617 | \-288 |
| 5 | 216 | 978 | \-582 | \-94 | 16 | 714 | \-805 | \-386 | \-839 | \-418 |
| 6 | 254 | 1168 | \-759 | \-176 | 19 | 856 | \-1001 | \-508 | \-1018 | \-520 |
| 7 | 292 | 1358 | \-908 | \-246 | 22 | 998 | \-1204 | \-634 | \-1229 | \-650 |
| 8 | 330 | 1544 | \-1085 | \-332 | 25 | 1140 | \-1398 | \-756 | \-1423 | \-792 |
| 9 | 368 | 1734 | \-1257 | \-414 | 28 | 1282 | \-1614 | \-894 | \-1625 | \-930 |
| 10 | 406 | 1924 | \-1408 | \-508 | 31 | 1424 | \-1804 | \-1040 | \-1826 | \-1060 |
![img.png](img/overhead_bytes.png)
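The nearly constant cost of the base64 variants follows from base64 arithmetic: every 3 input bytes become 4 output characters, so the encoded form is about a third larger than its input. The sketch below uses only the standard library; `base64Overhead` is a hypothetical helper and the input sizes are arbitrary. The table's figures are slightly higher because they are measured against the raw tokens and so also include the container framing.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Overhead returns how many extra bytes standard, padded base64
// adds when encoding n input bytes.
func base64Overhead(n int) int {
	return base64.StdEncoding.EncodedLen(n) - n
}

func main() {
	// Arbitrary sizes chosen for illustration.
	for _, n := range []int{300, 426, 1000} {
		extra := base64Overhead(n)
		fmt.Printf("%d bytes -> +%d bytes (%.1f%%)\n", n, extra, 100*float64(extra)/float64(n))
	}
}
```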
The following table covers performance, with CPU time and memory allocation per operation:
| | Write ns/op | Read ns/op | Write B/op | Read B/op | Write allocs/op | Read allocs/op |
|-----------------|-------------|------------|------------|-----------|-----------------|----------------|
| car | 8451 | 1474630 | 17928 | 149437 | 59 | 2631 |
| carBase64 | 16750 | 1437678 | 24232 | 151502 | 61 | 2633 |
| carGzip | 320253 | 1581412 | 823887 | 192272 | 76 | 2665 |
| carGzipBase64 | 343305 | 1486269 | 828782 | 198543 | 77 | 2669 |
| cbor | 6419 | 1301554 | 16368 | 138891 | 25 | 2534 |
| cborBase64 | 12860 | 1386728 | 20720 | 140962 | 26 | 2536 |
| cborGzip | 310106 | 1379146 | 822742 | 182003 | 42 | 2585 |
| cborGzipBase64 | 317001 | 1462548 | 827640 | 189283 | 43 | 2594 |
| cborFlate | 327112 | 1555007 | 822473 | 181537 | 40 | 2591 |
| cborFlateBase64 | 311276 | 1456562 | 826042 | 188665 | 41 | 2596 |
(BEWARE: logarithmic scale)
![img.png](img/cpu.png)
![img_1.png](img/alloc_byte.png)
![img_2.png](img/alloc_count.png)
Conclusion:
- CAR files are heavy for this usage, notably because they carry the CIDs of the tokens
- compression works quite well and is worth using even with a single token
- DAG-CBOR outperforms CAR files everywhere, and comes with a tiny overhead of ~3 bytes per token.
**Formats besides DAG-CBOR and CAR, each with or without base64, have been removed. They remain in the git history though.**
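The ~3 bytes per token can be reproduced by hand. A CBOR list costs one head for the array plus one head per byte string, and a byte string between 256 and 65535 bytes long needs a 3-byte head. The sketch below is a hand-rolled encoder using only the standard library, not the go-ipld-prime codec the package uses; `cborHead`, `encodeByteList` and `overhead` are hypothetical names, and the 420-byte token size is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cborHead encodes a CBOR head for the given major type and argument,
// using the shortest form as described in RFC 8949.
func cborHead(major byte, arg uint64) []byte {
	m := major << 5
	switch {
	case arg < 24:
		return []byte{m | byte(arg)}
	case arg <= 0xff:
		return []byte{m | 24, byte(arg)}
	case arg <= 0xffff:
		return binary.BigEndian.AppendUint16([]byte{m | 25}, uint16(arg))
	case arg <= 0xffffffff:
		return binary.BigEndian.AppendUint32([]byte{m | 26}, uint32(arg))
	default:
		return binary.BigEndian.AppendUint64([]byte{m | 27}, arg)
	}
}

// encodeByteList encodes tokens as a CBOR array (major type 4)
// of byte strings (major type 2).
func encodeByteList(tokens [][]byte) []byte {
	out := cborHead(4, uint64(len(tokens)))
	for _, tkn := range tokens {
		out = append(out, cborHead(2, uint64(len(tkn)))...)
		out = append(out, tkn...)
	}
	return out
}

// overhead returns the number of framing bytes added on top of the raw tokens.
func overhead(tokens [][]byte) int {
	raw := 0
	for _, tkn := range tokens {
		raw += len(tkn)
	}
	return len(encodeByteList(tokens)) - raw
}

func main() {
	for n := 1; n <= 3; n++ {
		tokens := make([][]byte, n)
		for i := range tokens {
			tokens[i] = make([]byte, 420) // assumed token size, for illustration only
		}
		fmt.Printf("%d token(s): %d bytes of overhead\n", n, overhead(tokens))
	}
}
```

With these assumptions the overhead is 4, 7 and 10 bytes for 1, 2 and 3 tokens, which matches the `cbor` column of the byte overhead table.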

@@ -9,7 +9,12 @@ import (
"iter"
"github.com/ipfs/go-cid"
cbor "github.com/ipfs/go-ipld-cbor"
"github.com/ipld/go-ipld-prime"
"github.com/ipld/go-ipld-prime/codec/dagcbor"
"github.com/ipld/go-ipld-prime/datamodel"
"github.com/ipld/go-ipld-prime/fluent/qp"
cidlink "github.com/ipld/go-ipld-prime/linking/cid"
"github.com/ipld/go-ipld-prime/node/basicnode"
)
/*
@@ -40,7 +45,7 @@ func writeCar(w io.Writer, roots []cid.Cid, blocks iter.Seq[carBlock]) error {
Roots: roots,
Version: 1,
}
hb, err := cbor.DumpObject(h)
hb, err := h.Write()
if err != nil {
return err
}
@@ -67,11 +72,10 @@ func readCar(r io.Reader) (roots []cid.Cid, blocks iter.Seq2[carBlock, error], e
if err != nil {
return nil, nil, err
}
var h carHeader
if err := cbor.DecodeInto(hb, &h); err != nil {
return nil, nil, fmt.Errorf("invalid header: %v", err)
h, err := readHeader(hb)
if err != nil {
return nil, nil, err
}
if h.Version != 1 {
return nil, nil, fmt.Errorf("invalid car version: %d", h.Version)
}
@@ -183,6 +187,67 @@ type carHeader struct {
Version uint64
}
func init() {
cbor.RegisterCborType(carHeader{})
const rootsKey = "roots"
const versionKey = "version"
func readHeader(data []byte) (*carHeader, error) {
var header carHeader
nd, err := ipld.Decode(data, dagcbor.Decode)
if err != nil {
return nil, err
}
if nd.Length() != 2 {
return nil, fmt.Errorf("malformed car header")
}
rootsNd, err := nd.LookupByString(rootsKey)
if err != nil {
return nil, fmt.Errorf("malformed car header")
}
it := rootsNd.ListIterator()
if it == nil {
return nil, fmt.Errorf("malformed car header")
}
header.Roots = make([]cid.Cid, 0, rootsNd.Length())
for !it.Done() {
_, nd, err := it.Next()
if err != nil {
return nil, err
}
lk, err := nd.AsLink()
if err != nil {
return nil, err
}
switch lk := lk.(type) {
case cidlink.Link:
header.Roots = append(header.Roots, lk.Cid)
default:
return nil, fmt.Errorf("malformed car header")
}
}
versionNd, err := nd.LookupByString(versionKey)
if err != nil {
return nil, fmt.Errorf("malformed car header")
}
version, err := versionNd.AsInt()
if err != nil {
return nil, fmt.Errorf("malformed car header")
}
header.Version = uint64(version)
return &header, nil
}
func (ch *carHeader) Write() ([]byte, error) {
nd, err := qp.BuildMap(basicnode.Prototype.Any, 2, func(ma datamodel.MapAssembler) {
qp.MapEntry(ma, rootsKey, qp.List(int64(len(ch.Roots)), func(la datamodel.ListAssembler) {
for _, root := range ch.Roots {
qp.ListEntry(la, qp.Link(cidlink.Link{Cid: root}))
}
}))
qp.MapEntry(ma, versionKey, qp.Int(1))
})
if err != nil {
return nil, err
}
return ipld.Encode(nd, dagcbor.Encode)
}

@@ -38,3 +38,15 @@ func TestCarRoundTrip(t *testing.T) {
// Bytes equal after the round-trip
require.Equal(t, original, buf.Bytes())
}
func FuzzCarRead(f *testing.F) {
example, err := os.ReadFile("testdata/sample-v1.car")
require.NoError(f, err)
f.Add(example)
f.Fuzz(func(t *testing.T, data []byte) {
_, _, _ = readCar(bytes.NewReader(data))
// only looking for panics
})
}

Binary file not shown. (25 KiB)
Binary file not shown. (23 KiB)
pkg/container/img/cpu.png Normal file
Binary file not shown. (25 KiB)
Binary file not shown. (31 KiB)
Binary file not shown. (29 KiB)

@@ -1,14 +1,11 @@
package container
import (
"compress/flate"
"compress/gzip"
"encoding/base64"
"fmt"
"io"
"github.com/ipfs/go-cid"
cbor "github.com/ipfs/go-ipld-cbor"
"github.com/ipld/go-ipld-prime"
"github.com/ipld/go-ipld-prime/codec/dagcbor"
"github.com/ipld/go-ipld-prime/datamodel"
@@ -20,8 +17,11 @@ import (
var ErrNotFound = fmt.Errorf("not found")
// Reader is a token container reader. It exposes the tokens conveniently decoded.
type Reader map[cid.Cid]token.Token
// GetToken returns an arbitrary decoded token, from its CID.
// If not found, ErrNotFound is returned.
func (ctn Reader) GetToken(cid cid.Cid) (token.Token, error) {
tkn, ok := ctn[cid]
if !ok {
@@ -30,6 +30,7 @@ func (ctn Reader) GetToken(cid cid.Cid) (token.Token, error) {
return tkn, nil
}
// GetDelegation is the same as GetToken but only returns a delegation.Token, with the right type.
func (ctn Reader) GetDelegation(cid cid.Cid) (*delegation.Token, error) {
tkn, err := ctn.GetToken(cid)
if err != nil {
@@ -41,6 +42,8 @@ func (ctn Reader) GetDelegation(cid cid.Cid) (*delegation.Token, error) {
return nil, fmt.Errorf("not a delegation token")
}
// GetInvocation returns the first found invocation.Token.
// If none are found, ErrNotFound is returned.
func (ctn Reader) GetInvocation() (*invocation.Token, error) {
for _, t := range ctn {
if inv, ok := t.(*invocation.Token); ok {
@@ -76,38 +79,7 @@ func FromCarBase64(r io.Reader) (Reader, error) {
return FromCar(base64.NewDecoder(base64.StdEncoding, r))
}
func FromCarGzip(r io.Reader) (Reader, error) {
r2, err := gzip.NewReader(r)
if err != nil {
return nil, err
}
defer r2.Close()
return FromCar(r2)
}
func FromCarGzipBase64(r io.Reader) (Reader, error) {
return FromCarGzip(base64.NewDecoder(base64.StdEncoding, r))
}
func FromCbor(r io.Reader) (Reader, error) {
var raw [][]byte
err := cbor.DecodeReader(r, &raw)
if err != nil {
return nil, err
}
ctn := make(Reader, len(raw))
for _, data := range raw {
err = ctn.addToken(data)
if err != nil {
return nil, err
}
}
return ctn, nil
}
func FromCbor2(r io.Reader) (Reader, error) {
n, err := ipld.DecodeStreaming(r, dagcbor.Decode)
if err != nil {
return nil, err
@@ -140,29 +112,6 @@ func FromCborBase64(r io.Reader) (Reader, error) {
return FromCbor(base64.NewDecoder(base64.StdEncoding, r))
}
func FromCborGzip(r io.Reader) (Reader, error) {
r2, err := gzip.NewReader(r)
if err != nil {
return nil, err
}
defer r2.Close()
return FromCbor(r2)
}
func FromCborGzipBase64(r io.Reader) (Reader, error) {
return FromCborGzip(base64.NewDecoder(base64.StdEncoding, r))
}
func FromCborFlate(r io.Reader) (Reader, error) {
r2 := flate.NewReader(r)
defer r2.Close()
return FromCbor(r2)
}
func FromCborFlateBase64(r io.Reader) (Reader, error) {
return FromCborFlate(base64.NewDecoder(base64.StdEncoding, r))
}
func (ctn Reader) addToken(data []byte) error {
tkn, c, err := token.FromSealed(data)
if err != nil {

@@ -5,6 +5,7 @@ import (
"crypto/rand"
"fmt"
"io"
"strings"
"testing"
"time"
@@ -28,15 +29,8 @@ func TestContainerRoundTrip(t *testing.T) {
}{
{"car", Writer.ToCar, FromCar},
{"carBase64", Writer.ToCarBase64, FromCarBase64},
{"carGzip", Writer.ToCarGzip, FromCarGzip},
{"carGzipBase64", Writer.ToCarGzipBase64, FromCarGzipBase64},
{"cbor", Writer.ToCbor, FromCbor},
{"cborBase64", Writer.ToCborBase64, FromCborBase64},
{"cborGzip", Writer.ToCborGzip, FromCborGzip},
{"cborGzipBase64", Writer.ToCborGzipBase64, FromCborGzipBase64},
{"cborFlate", Writer.ToCborFlate, FromCborFlate},
{"cborFlateBase64", Writer.ToCborFlateBase64, FromCborFlateBase64},
{"cbor2", Writer.ToCbor2, FromCbor2},
} {
t.Run(tc.name, func(t *testing.T) {
tokens := make(map[cid.Cid]*delegation.Token)
@@ -92,6 +86,14 @@ func TestContainerRoundTrip(t *testing.T) {
}
func BenchmarkContainerSerialisation(b *testing.B) {
var duration strings.Builder
var allocByte strings.Builder
var allocCount strings.Builder
for _, builder := range []*strings.Builder{&duration, &allocByte, &allocCount} {
builder.WriteString("car\tcarBase64\tcarGzip\tcarGzipBase64\tcbor\tcborBase64\tcborGzip\tcborGzipBase64\tcborFlate\tcborFlateBase64\n")
}
for _, tc := range []struct {
name string
writer func(ctn Writer, w io.Writer) error
@@ -99,15 +101,8 @@ func BenchmarkContainerSerialisation(b *testing.B) {
}{
{"car", Writer.ToCar, FromCar},
{"carBase64", Writer.ToCarBase64, FromCarBase64},
{"carGzip", Writer.ToCarGzip, FromCarGzip},
{"carGzipBase64", Writer.ToCarGzipBase64, FromCarGzipBase64},
{"cbor", Writer.ToCbor, FromCbor},
{"cborBase64", Writer.ToCborBase64, FromCborBase64},
{"cborGzip", Writer.ToCborGzip, FromCborGzip},
{"cborGzipBase64", Writer.ToCborGzipBase64, FromCborGzipBase64},
{"cborFlate", Writer.ToCborFlate, FromCborFlate},
{"cborFlateBase64", Writer.ToCborFlateBase64, FromCborFlateBase64},
{"cbor2", Writer.ToCbor2, FromCbor2},
} {
writer := NewWriter()

@@ -1,13 +1,10 @@
package container
import (
"compress/flate"
"compress/gzip"
"encoding/base64"
"io"
"github.com/ipfs/go-cid"
cbor "github.com/ipfs/go-ipld-cbor"
"github.com/ipld/go-ipld-prime"
"github.com/ipld/go-ipld-prime/codec/dagcbor"
"github.com/ipld/go-ipld-prime/datamodel"
@@ -15,12 +12,16 @@ import (
"github.com/ipld/go-ipld-prime/node/basicnode"
)
// TODO: should we have a multibase to wrap the cbor? but there is no reader/writer in go-multibase :-(
// Writer is a token container writer. It provides a convenient way to aggregate and serialize tokens together.
type Writer map[cid.Cid][]byte
func NewWriter() Writer {
return make(Writer)
}
// AddSealed includes a "sealed" token (serialized with a ToSealed* function) in the container.
func (ctn Writer) AddSealed(cid cid.Cid, data []byte) {
ctn[cid] = data
}
@@ -41,19 +42,7 @@ func (ctn Writer) ToCarBase64(w io.Writer) error {
return ctn.ToCar(w2)
}
func (ctn Writer) ToCarGzip(w io.Writer) error {
w2 := gzip.NewWriter(w)
defer w2.Close()
return ctn.ToCar(w2)
}
func (ctn Writer) ToCarGzipBase64(w io.Writer) error {
w2 := base64.NewEncoder(base64.StdEncoding, w)
defer w2.Close()
return ctn.ToCarGzip(w2)
}
func (ctn Writer) ToCbor2(w io.Writer) error {
func (ctn Writer) ToCbor(w io.Writer) error {
node, err := qp.BuildList(basicnode.Prototype.Any, int64(len(ctn)), func(la datamodel.ListAssembler) {
for _, bytes := range ctn {
qp.ListEntry(la, qp.Bytes(bytes))
@@ -65,43 +54,8 @@ func (ctn Writer) ToCbor2(w io.Writer) error {
return ipld.EncodeStreaming(w, node, dagcbor.Encode)
}
func (ctn Writer) ToCbor(w io.Writer) error {
raw := make([][]byte, 0, len(ctn))
for _, bytes := range ctn {
raw = append(raw, bytes)
}
return cbor.EncodeWriter(raw, w)
}
func (ctn Writer) ToCborBase64(w io.Writer) error {
w2 := base64.NewEncoder(base64.StdEncoding, w)
defer w2.Close()
return ctn.ToCbor(w2)
}
func (ctn Writer) ToCborGzip(w io.Writer) error {
w2 := gzip.NewWriter(w)
defer w2.Close()
return ctn.ToCbor(w2)
}
func (ctn Writer) ToCborGzipBase64(w io.Writer) error {
w2 := base64.NewEncoder(base64.StdEncoding, w)
defer w2.Close()
return ctn.ToCborGzip(w2)
}
func (ctn Writer) ToCborFlate(w io.Writer) error {
w2, err := flate.NewWriter(w, flate.DefaultCompression)
if err != nil {
return err
}
defer w2.Close()
return ctn.ToCbor(w2)
}
func (ctn Writer) ToCborFlateBase64(w io.Writer) error {
w2 := base64.NewEncoder(base64.StdEncoding, w)
defer w2.Close()
return ctn.ToCborFlate(w2)
}