Create a CSVDataset.
A DataSource providing a chunked, UTF-8-encoded byte stream.
csvConfig: CSVConfig (Optional) A CSVConfig object that contains configurations for reading and decoding CSV file(s); a usage sketch follows this parameter list.
hasHeader: (Optional) A boolean value that indicates whether the first
row of the provided CSV file is a header line with column names, and should
not be included in the data. Defaults to `true`.
columnNames: (Optional) A list of strings that corresponds to
the CSV column names, in order. If provided, it ignores the column
names inferred from the header row. If not provided, infers the column
names from the first row of the records. If hasHeader is false and
columnNames is not provided, this method throws an error.
columnConfigs: (Optional) A dictionary whose keys are column names and whose
values are objects stating whether the column is required, the column's data
type, its default value, and whether the column is a label. If provided, keys
must correspond to names provided in columnNames or inferred from the file
header lines. If isLabel is true for any column, returns an array of two
items: the first item is a dict of features key/value pairs, the second
item is a dict of labels key/value pairs. If no column is marked as a
label, returns a dict of features only.
configuredColumnsOnly: (Optional) If true, only columns provided in
columnConfigs will be parsed and provided during iteration.
delimiter: (Optional) The string used to parse each line of the input
file. Defaults to `,`.
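For example, a configured CSV dataset might be constructed as follows. This is a minimal sketch: the URL is hypothetical and the column name 'medv' is only illustrative.
const csvUrl = 'https://example.com/boston-housing-train.csv';
const csvDataset = tf.data.csv(csvUrl, {
  hasHeader: true,
  columnConfigs: {medv: {isLabel: true}},
  configuredColumnsOnly: false,
  delimiter: ','
});
// Because one column is marked isLabel: true, each element separates
// features from labels as described above.
await csvDataset.take(1).forEachAsync(e => console.log(e));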
Groups elements into batches.
It is assumed that each of the incoming dataset elements has the same
structure -- i.e. the same set of keys at each location in an object
hierarchy. For each key, the resulting Dataset provides a batched
element collecting all of the incoming values for that key.
If an array should not be batched as a unit, it should first be converted to an object with integer keys.
Here are a few examples:
Batch a dataset of numbers:
const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8]).batch(4);
await a.forEachAsync(e => e.print());
Batch a dataset of arrays:
const b = tf.data.array([[1], [2], [3], [4], [5], [6], [7], [8]]).batch(4);
await b.forEachAsync(e => e.print());
Batch a dataset of objects:
const c = tf.data.array([{a: 1, b: 11}, {a: 2, b: 12}, {a: 3, b: 13},
  {a: 4, b: 14}, {a: 5, b: 15}, {a: 6, b: 16}, {a: 7, b: 17},
  {a: 8, b: 18}]).batch(4);
await c.forEachAsync(e => {
  console.log('{');
  for (const key in e) {
    console.log(key + ':');
    e[key].print();
  }
  console.log('}');
});
batchSize: number The number of elements desired per batch.
smallLastBatch: boolean (Optional) Whether to emit the final batch when it has fewer than batchSize elements; see the example below. Defaults to `true`.
A Dataset, from which a stream of batches can be obtained.
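For instance, passing false for smallLastBatch drops a final partial batch (a minimal sketch):
const d = tf.data.array([1, 2, 3, 4, 5]).batch(2, false);
// Emits [1, 2] and [3, 4]; the trailing element 5 is dropped because it
// does not fill a complete batch.
await d.forEachAsync(e => e.print());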
Returns the column names of the CSV dataset. If configuredColumnsOnly is
true, returns the column names in columnConfigs. If configuredColumnsOnly is
false and columnNames is provided, returns columnNames. If
configuredColumnsOnly is false and columnNames is not provided, returns
all column names parsed from the CSV file. For example usage please go to
tf.data.csv; a short sketch is also shown below.
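A minimal sketch, reusing the hypothetical csvDataset from the tf.data.csv example above:
const names = await csvDataset.columnNames();
console.log(names); // an array of column name strings, e.g. including 'medv'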
Concatenates this Dataset with another.
const a = tf.data.array([1, 2, 3]);
const b = tf.data.array([4, 5, 6]);
const c = a.concatenate(b);
await c.forEachAsync(e => console.log(e));
A Dataset to be concatenated onto this one.
A Dataset.
Filters this dataset according to predicate.
const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
  .filter(x => x % 2 === 0);
await a.forEachAsync(e => console.log(e));
A function mapping a dataset element to a boolean or a Promise for one.
A Dataset of elements for which the predicate was true.
Apply a function to every element of the dataset.
After the function is applied to a dataset element, any Tensors contained within that element are disposed.
const a = tf.data.array([1, 2, 3]);
await a.forEachAsync(e => console.log(e));
A function to apply to each dataset element.
A Promise that resolves after all elements have been processed.
Maps this dataset through a 1-to-1 transform.
const a = tf.data.array([1, 2, 3]).map(x => x*x);
await a.forEachAsync(e => console.log(e));
A function mapping a dataset element to a transformed dataset element.
A Dataset of transformed elements.
Maps this dataset through an async 1-to-1 transform.
const a =
  tf.data.array([1, 2, 3]).mapAsync(x => new Promise(function(resolve) {
    setTimeout(() => {
      resolve(x * x);
    }, Math.random() * 1000 + 500);
  }));
console.log(await a.toArray());
A function mapping a dataset element to a Promise for a transformed dataset
element. This transform is responsible for disposing any intermediate
Tensors, e.g. by wrapping its computation in tf.tidy(); that cannot be
automated here (as it is in the synchronous map() case).
A Dataset of transformed elements.
Creates a Dataset that prefetches elements from this dataset.
A Dataset.
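A minimal sketch, prefetching a buffer of 3 elements so that downstream consumption can overlap with data preparation:
const a = tf.data.array([1, 2, 3, 4, 5, 6]).prefetch(3);
await a.forEachAsync(e => console.log(e));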
Repeats this dataset count times.
NOTE: If this dataset is a function of global state (e.g. a random number generator), then different repetitions may produce different elements.
const a = tf.data.array([1, 2, 3]).repeat(3);
await a.forEachAsync(e => console.log(e));
count: number (Optional) The number of times to repeat this dataset.
A Dataset.
Pseudorandomly shuffles the elements of this dataset. This is done in a streaming manner, by sampling from a given number of prefetched elements.
const a = tf.data.array([1, 2, 3, 4, 5, 6]).shuffle(3);
await a.forEachAsync(e => console.log(e));
seed: string (Optional) A seed for the pseudorandom number generator.
reshuffleEachIteration: boolean (Optional) Whether the dataset should be reshuffled each time it is iterated over. Defaults to `true`.
A Dataset.
Creates a Dataset that skips count initial elements from this dataset.
const a = tf.data.array([1, 2, 3, 4, 5, 6]).skip(3);
await a.forEachAsync(e => console.log(e));
A Dataset.
Creates a Dataset with at most count initial elements from this dataset.
const a = tf.data.array([1, 2, 3, 4, 5, 6]).take(3);
await a.forEachAsync(e => console.log(e));
A Dataset.
Collect all elements of this dataset into an array.
Obviously this will succeed only for small datasets that fit in memory. Useful for testing, but it should generally be avoided otherwise.
const a = tf.data.array([1, 2, 3, 4, 5, 6]);
console.log(await a.toArray());
A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.
Collect all elements of this dataset into an array while prefetching 100 elements. This is useful for testing, because the prefetch changes the order in which the Promises are resolved along the processing pipeline. This may help expose bugs where results are dependent on the order of Promise resolution rather than on the logical order of the stream (i.e., due to hidden mutable state).
A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.
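A minimal sketch, assuming this method is exposed as toArrayForTest() as in tfjs-data:
const a = tf.data.array([1, 2, 3]).map(x => x * x);
console.log(await a.toArrayForTest()); // [1, 4, 9], collected with prefetching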
Represents a potentially large collection of delimited text records.
The produced TensorContainers each contain one key-value pair for every column of the table. When a field is empty in the incoming data, the resulting value is undefined, or an error is thrown if that column is required. Values that can be parsed as numbers are emitted as type number; other values are parsed as string. The results are not batched.
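For illustration, a record from a hypothetical file with header "a,b,c" might be parsed as follows (a minimal sketch; the URL is made up):
const ds = tf.data.csv('https://example.com/records.csv');
await ds.take(1).forEachAsync(row => console.log(row));
// e.g. {a: 1.2, b: 'foo', c: undefined} -- numeric fields become numbers,
// other fields remain strings, and empty fields are undefined unless the
// column is marked required.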