Represents a potentially large list of independent data elements (typically 'samples' or 'examples').

A 'data example' may be a primitive, an array, a map from string keys to values, or any nested structure of these.

A Dataset represents an ordered collection of elements, together with a chain of transformations to be performed on those elements. Each transformation is a method of Dataset that returns another Dataset, so these may be chained, e.g. const processedDataset = rawDataset.filter(...).map(...).batch(...).

Data loading and transformation is done in a lazy, streaming fashion. The dataset may be iterated over multiple times; each iteration starts the data loading anew and recapitulates the transformations.
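The lazy, restart-on-every-iteration behaviour described above can be modelled in a few lines of plain JavaScript. This is only a toy sketch, not the library's implementation: `LazyDataset`, `fromArray`, and the `mapCalls` counter are hypothetical names introduced for illustration, and synchronous generators stand in for the asynchronous streams a real Dataset uses.

```javascript
// A toy, synchronous stand-in for a lazy dataset. This is NOT the tf.data
// implementation; it only mimics the "re-run on every iteration" behaviour.
class LazyDataset {
  // makeIterator: a function that returns a fresh iterator on each call.
  constructor(makeIterator) {
    this.makeIterator = makeIterator;
  }

  static fromArray(items) {
    return new LazyDataset(() => items[Symbol.iterator]());
  }

  // Returns a new dataset; no element is touched until iteration starts.
  map(transform) {
    const upstream = this;
    return new LazyDataset(function* () {
      for (const x of upstream) yield transform(x);
    });
  }

  filter(predicate) {
    const upstream = this;
    return new LazyDataset(function* () {
      for (const x of upstream) if (predicate(x)) yield x;
    });
  }

  // Each call starts a new pass over the source and re-applies the chain.
  [Symbol.iterator]() {
    return this.makeIterator();
  }
}

let mapCalls = 0;
const squaresOfEvens = LazyDataset.fromArray([1, 2, 3, 4])
  .filter(x => x % 2 === 0)
  .map(x => { mapCalls++; return x * x; });

// Building the chain does no work; the transform has not run yet.
const callsBeforeIteration = mapCalls;

// Two full passes: the transform runs again for every element on each pass.
const firstPass = [...squaresOfEvens];
const secondPass = [...squaresOfEvens];
console.log(callsBeforeIteration, firstPass, secondPass, mapCalls);
```

In this model the transform count stays at zero until the first pass and doubles after the second, which mirrors how repeated iteration over a Dataset repeats the loading and transformation work.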

A Dataset is typically processed as a stream of unbatched examples -- i.e., its transformations are applied one example at a time. Batching produces a new Dataset where each element is a batch. Batching should usually come last in a pipeline, because data transformations are easier to express on a per-example basis than on a per-batch basis.

The code examples below call await dataset.forEachAsync(...) to iterate once over the entire dataset and print its contents.

Methods

  • Groups elements into batches.

    It is assumed that each of the incoming dataset elements has the same structure -- i.e. the same set of keys at each location in an object hierarchy. For each key, the resulting Dataset provides a batched element collecting all of the incoming values for that key.

    • Incoming primitives are grouped into a 1-D Tensor.
    • Incoming Tensors are grouped into a new Tensor where the 0th axis is the batch dimension.
    • Incoming arrays are converted to Tensor and then batched.
    • A nested array is interpreted as an n-D Tensor, so the batched result has n+1 dimensions.
    • An array that cannot be converted to Tensor produces an error.

    If an array should not be batched as a unit, it should first be converted to an object with integer keys.

    Here are a few examples:

    Batch a dataset of numbers:

    const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8]).batch(4);
    await a.forEachAsync(e => e.print());

    Batch a dataset of arrays:

    const b = tf.data.array([[1], [2], [3], [4], [5], [6], [7], [8]]).batch(4);
    await b.forEachAsync(e => e.print());

    Batch a dataset of objects:

    const c = tf.data.array([
      {a: 1, b: 11}, {a: 2, b: 12}, {a: 3, b: 13}, {a: 4, b: 14},
      {a: 5, b: 15}, {a: 6, b: 16}, {a: 7, b: 17}, {a: 8, b: 18}
    ]).batch(4);
    await c.forEachAsync(e => {
      console.log('{');
      for (const key in e) {
        console.log(key + ':');
        e[key].print();
      }
      console.log('}');
    });
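The grouping rules above, including the advice to convert an array to an object with integer keys, can be illustrated with a small plain-JavaScript model. This is a hedged sketch rather than the library's code: `collate` and `toIndexedObject` are hypothetical helpers, and plain arrays stand in for the Tensors that the real method produces.

```javascript
// Toy model of how batching groups same-structured elements. Plain arrays
// stand in for Tensors here; the real method returns tf.Tensor objects.
function collate(elements) {
  const first = elements[0];
  // Plain objects are batched key by key, recursively.
  if (first !== null && typeof first === 'object' && !Array.isArray(first)) {
    const out = {};
    for (const key of Object.keys(first)) {
      out[key] = collate(elements.map(e => e[key]));
    }
    return out;
  }
  // Primitives and arrays are stacked as units along a new leading axis.
  return elements.slice();
}

// Hypothetical helper: turn [x, y] into {0: x, 1: y} so that each position
// is batched separately instead of the array being batched as one unit.
function toIndexedObject(array) {
  return Object.assign({}, array);
}

const asUnits = collate([[1, 2], [3, 4]]);
const perPosition = collate([[1, 2], [3, 4]].map(toIndexedObject));
const perKey = collate([{a: 1, b: 11}, {a: 2, b: 12}]);
console.log(asUnits, perPosition, perKey);
```

Here `asUnits` keeps each inner array intact along the batch axis, while `perPosition` holds one batched entry per array index, which is the effect the integer-key conversion is meant to achieve.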

    Parameters

    • batchSize: number

      The number of elements desired per batch.

    • Optional smallLastBatch: boolean

      Whether to emit the final batch when it has fewer than batchSize elements. Default true.

    Returns Dataset<TensorContainer>

    A Dataset, from which a stream of batches can be obtained.
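To make the effect of smallLastBatch concrete, the following plain-JavaScript sketch chunks an ordinary array the same way. `batchPlain` is a hypothetical helper written for this illustration; it works on arrays rather than Datasets and does not create Tensors.

```javascript
// Splits an array into chunks of batchSize. When smallLastBatch is false,
// a trailing chunk with fewer than batchSize elements is dropped.
function batchPlain(items, batchSize, smallLastBatch = true) {
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    if (batch.length === batchSize || smallLastBatch) {
      batches.push(batch);
    }
  }
  return batches;
}

const items = [1, 2, 3, 4, 5, 6, 7, 8];
const withRemainder = batchPlain(items, 3);           // keeps the short batch
const withoutRemainder = batchPlain(items, 3, false); // drops the short batch
console.log(withRemainder, withoutRemainder);
```

With eight elements and a batch size of three, the default keeps a final batch of two elements, while passing false leaves only the two full batches.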

  • Concatenates this Dataset with another.

    const a = tf.data.array([1, 2, 3]);
    const b = tf.data.array([4, 5, 6]);
    const c = a.concatenate(b);
    await c.forEachAsync(e => console.log(e));

    Parameters

    • dataset: Dataset<T>

      A Dataset to be concatenated onto this one.

    Returns Dataset<T>

    A Dataset.

  • Filters this dataset according to predicate.

    const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
      .filter(x => x % 2 === 0);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • predicate: ((value) => boolean)

      A function mapping a dataset element to a boolean or a Promise for one.

        • (value): boolean
        • Parameters

          • value: T

          Returns boolean

    Returns Dataset<T>

    A Dataset of elements for which the predicate was true.

  • Applies a function to every element of the dataset.

    After the function is applied to a dataset element, any Tensors contained within that element are disposed.

    const a = tf.data.array([1, 2, 3]);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • f: ((input) => void)

      A function to apply to each dataset element.

        • (input): void
        • Parameters

          • input: T

          Returns void

    Returns Promise<void>

    A Promise that resolves after all elements have been processed.

  • Maps this dataset through a 1-to-1 transform.

    const a = tf.data.array([1, 2, 3]).map(x => x*x);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • transform: ((value) => O)

      A function mapping a dataset element to a transformed dataset element.

        • (value): O
        • Parameters

          • value: T

          Returns O

    Returns Dataset<O>

    A Dataset of transformed elements.

  • Maps this dataset through an async 1-to-1 transform.

    const a = tf.data.array([1, 2, 3]).mapAsync(x => new Promise(resolve => {
      setTimeout(() => {
        resolve(x * x);
      }, Math.random() * 1000 + 500);
    }));
    console.log(await a.toArray());

    Parameters

    • transform: ((value) => Promise<O>)

      A function mapping a dataset element to a Promise for a transformed dataset element. This transform is responsible for disposing of any intermediate Tensors, e.g. by wrapping its computation in tf.tidy(); that cannot be automated here (as it is in the synchronous map() case).

        • (value): Promise<O>
        • Parameters

          • value: T

          Returns Promise<O>

    Returns Dataset<O>

    A Dataset of transformed elements.

  • Creates a Dataset that prefetches elements from this dataset.

    Parameters

    • bufferSize: number

    Returns Dataset<T>

    A Dataset.

  • Repeats this dataset count times.

    NOTE: If this dataset is a function of global state (e.g. a random number generator), then different repetitions may produce different elements.

    const a = tf.data.array([1, 2, 3]).repeat(3);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • Optional count: number

    Returns Dataset<T>

    A Dataset.
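The note about global state can be seen in a small plain-JavaScript model. This is a sketch under stated assumptions, not the library's implementation: `repeatPlain`, `statefulSource`, and the `passes` counter are hypothetical names, and only a finite, explicit count is modelled.

```javascript
// Re-creates the source for each repetition, the way re-iterating a lazy
// dataset restarts its loading. makeIterable must return a fresh iterable.
function* repeatPlain(makeIterable, count) {
  for (let i = 0; i < count; i++) {
    yield* makeIterable();
  }
}

// A source that depends on global state: each new pass scales the values
// by the number of passes started so far.
let passes = 0;
const statefulSource = () => {
  passes++;
  return [1, 2, 3].map(x => x * passes);
};

const repeated = [...repeatPlain(statefulSource, 3)];
console.log(repeated);
```

Because the source is rebuilt on each repetition, the three passes in this model yield different values even though the pipeline definition never changes.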

  • Pseudorandomly shuffles the elements of this dataset. This is done in a streaming manner, by sampling from a given number of prefetched elements.

    const a = tf.data.array([1, 2, 3, 4, 5, 6]).shuffle(3);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • bufferSize: number
    • Optional seed: string
    • Optional reshuffleEachIteration: boolean

    Returns Dataset<T>

    A Dataset.
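Streaming shuffling from a bounded buffer can be sketched in plain JavaScript as follows. This is one common way to implement the idea and is offered as an assumption-laden illustration, not as the library's actual algorithm: `bufferedShuffle` and `mulberry32` are names introduced here, and the numeric seed is a simplification of the string seed that the method accepts.

```javascript
// Small deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(a) {
  return function () {
    a |= 0;
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Yields the elements of source in pseudorandom order while holding at
// most bufferSize elements in memory at any time.
function* bufferedShuffle(source, bufferSize, random) {
  const buffer = [];
  for (const item of source) {
    if (buffer.length < bufferSize) {
      buffer.push(item); // still filling the buffer
      continue;
    }
    // Emit a random buffered element and put the new one in its slot.
    const i = Math.floor(random() * buffer.length);
    yield buffer[i];
    buffer[i] = item;
  }
  // Source exhausted: drain what is left in random order.
  while (buffer.length > 0) {
    const i = Math.floor(random() * buffer.length);
    yield buffer.splice(i, 1)[0];
  }
}

const input = [1, 2, 3, 4, 5, 6];
const shuffled = [...bufferedShuffle(input, 3, mulberry32(42))];
console.log(shuffled);
```

A consequence of this design is that an element can move forward in the stream by at most the buffer size, so a small bufferSize gives only a local shuffle; a buffer at least as large as the dataset is needed for a uniform one.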

  • Creates a Dataset that skips count initial elements from this dataset.

    const a = tf.data.array([1, 2, 3, 4, 5, 6]).skip(3);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • count: number

    Returns Dataset<T>

    A Dataset.

  • Creates a Dataset with at most count initial elements from this dataset.

    const a = tf.data.array([1, 2, 3, 4, 5, 6]).take(3);
    await a.forEachAsync(e => console.log(e));

    Parameters

    • count: number

    Returns Dataset<T>

    A Dataset.

  • Collects all elements of this dataset into an array.

    This succeeds only for small datasets that fit in memory. It is useful for testing but should generally be avoided otherwise.

    const a = tf.data.array([1, 2, 3, 4, 5, 6]);
    console.log(await a.toArray());

    Returns Promise<T[]>

    A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.

  • Collects all elements of this dataset into an array, using a prefetch buffer of 100 elements. This is useful for testing, because the prefetch changes the order in which the Promises are resolved along the processing pipeline. This may help expose bugs where results depend on the order of Promise resolution rather than on the logical order of the stream (e.g., due to hidden mutable state).

    Returns Promise<T[]>

    A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.