Class CSVDataset

Represents a potentially large collection of delimited text records.

The produced TensorContainers each contain one key-value pair for every column of the table. When a field is empty in the incoming data, the resulting value is undefined, or throw error if it is required. Values that can be parsed as numbers are emitted as type number, other values are parsed as string.

The results are not batched.

Doc

Hierarchy

Dataset<TensorContainer>
- CSVDataset

Index

Constructors

constructor

Methods

batch columnNames concatenate filter forEachAsync map mapAsync prefetch repeat shuffle skip take toArray toArrayForTest

Constructors

constructor

new CSVDataset(input, csvConfig?): CSVDataset

Create a CSVDataset.

Parameters

input: DataSource
A DataSource providing a chunked, UTF8-encoded byte stream.

`Optional` csvConfig: CSVConfig

(Optional) A CSVConfig object that contains configurations of reading and decoding from CSV file(s).

hasHeader: (Optional) A boolean value that indicates whether the first
row of provided CSV file is a header line with column names, and should
not be included in the data. Defaults to `true`.

columnNames: (Optional) A list of strings that corresponds to
the CSV column names, in order. If provided, it ignores the column
names inferred from the header row. If not provided, infers the column
names from the first row of the records. If hasHeader is false and
columnNames is not provided, this method throws an error.

columnConfigs: (Optional) A dictionary whose key is column names, value
is an object stating if this column is required, column's data type,
default value, and if this column is label. If provided, keys must
correspond to names provided in columnNames or inferred from the file
header lines. If isLabel is true any column, returns an array of two
items: the first item is a dict of features key/value pairs, the second
item is a dict of labels key/value pairs. If no feature is marked as
label, returns a dict of features only.

configuredColumnsOnly (Optional) If true, only columns provided in
columnConfigs will be parsed and provided during iteration.

delimiter (Optional) The string used to parse each line of the input
file. Defaults to `,`.

Returns CSVDataset

Methods

batch

batch(batchSize, smallLastBatch?): Dataset<TensorContainer>
Groups elements into batches.

It is assumed that each of the incoming dataset elements has the same structure -- i.e. the same set of keys at each location in an object hierarchy. For each key, the resulting Dataset provides a batched element collecting all of the incoming values for that key.
- Incoming primitives are grouped into a 1-D Tensor.
- Incoming Tensors are grouped into a new Tensor where the 0th axis is the batch dimension.
- Incoming arrays are converted to Tensor and then batched.
- A nested array is interpreted as an n-D Tensor, so the batched result has n+1 dimensions.
- An array that cannot be converted to Tensor produces an error.
If an array should not be batched as a unit, it should first be converted to an object with integer keys.

Here are a few examples:

Batch a dataset of numbers:
```
const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8]).batch(4);
await a.forEachAsync(e => e.print());
```
Batch a dataset of arrays:
```
const b = tf.data.array([[1], [2], [3], [4], [5], [6], [7], [8]]).batch(4);
await b.forEachAsync(e => e.print());
```
Batch a dataset of objects:
```
const c = tf.data.array([{a: 1, b: 11}, {a: 2, b: 12}, {a: 3, b: 13},
  {a: 4, b: 14}, {a: 5, b: 15}, {a: 6, b: 16}, {a: 7, b: 17},
  {a: 8, b: 18}]).batch(4);
await c.forEachAsync(e => {
  console.log('{');
  for(var key in e) {
    console.log(key+':');
    e[key].print();
  }
  console.log('}');
})
```
Parameters
- batchSize: number
  The number of elements desired per batch.
- Optional smallLastBatch: boolean
  Whether to emit the final batch when it has fewer than batchSize elements. Default true.
Returns Dataset<TensorContainer>
A Dataset, from which a stream of batches can be obtained.

Doc
Inherited from Dataset.batch
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:113

columnNames

columnNames(): Promise<string[]>
Returns column names of the csv dataset. If configuredColumnsOnly is true, return column names in columnConfigs. If configuredColumnsOnly is false and columnNames is provided, columnNames. If configuredColumnsOnly is false and columnNames is not provided, return all column names parsed from the csv file. For example usage please go to tf.data.csv.

Returns Promise<string[]>
Doc
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/datasets/csv_dataset.d.ts:58

concatenate

concatenate(dataset): Dataset<TensorContainer>
Concatenates this Dataset with another.
```
const a = tf.data.array([1, 2, 3]);
const b = tf.data.array([4, 5, 6]);
const c = a.concatenate(b);
await c.forEachAsync(e => console.log(e));
```
Parameters
- dataset: Dataset<TensorContainer>
  A Dataset to be concatenated onto this one.
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.concatenate
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:129

filter

filter(predicate): Dataset<TensorContainer>
Filters this dataset according to predicate.
```
const a = tf.data.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
  .filter(x => x%2 === 0);
await a.forEachAsync(e => console.log(e));
```
Parameters
- predicate: ((value) => boolean)
  A function mapping a dataset element to a boolean or a Promise for one.
  - - (value): boolean
    - Parameters
      value: TensorContainer
      Returns boolean
Returns Dataset<TensorContainer>
A Dataset of elements for which the predicate was true.

Doc
Inherited from Dataset.filter
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:146

forEachAsync

forEachAsync(f): Promise<void>
Apply a function to every element of the dataset.

After the function is applied to a dataset element, any Tensors contained within that element are disposed.
```
const a = tf.data.array([1, 2, 3]);
await a.forEachAsync(e => console.log(e));
```
Parameters
- f: ((input) => void)
  A function to apply to each dataset element.
  - - (input): void
    - Parameters
      input: TensorContainer
      Returns void
Returns Promise<void>
A Promise that resolves after all elements have been processed.

Doc
Inherited from Dataset.forEachAsync
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:163

map

map<O>(transform): Dataset<O>
Maps this dataset through a 1-to-1 transform.
```
const a = tf.data.array([1, 2, 3]).map(x => x*x);
await a.forEachAsync(e => console.log(e));
```
Type Parameters
- O extends TensorContainer
Parameters
- transform: ((value) => O)
  A function mapping a dataset element to a transformed dataset element.
  - - (value): O
    - Parameters
      value: TensorContainer
      Returns O
Returns Dataset<O>
A Dataset of transformed elements.

Doc
Inherited from Dataset.map
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:179

mapAsync

mapAsync<O>(transform): Dataset<O>
Maps this dataset through an async 1-to-1 transform.
```
const a =
 tf.data.array([1, 2, 3]).mapAsync(x => new Promise(function(resolve){
   setTimeout(() => {
     resolve(x * x);
   }, Math.random()*1000 + 500);
 }));
console.log(await a.toArray());
```
Type Parameters
- O extends TensorContainer
Parameters
- transform: ((value) => Promise<O>)
  A function mapping a dataset element to a Promise for a transformed dataset element. This transform is responsible for disposing any intermediate Tensors, i.e. by wrapping its computation in tf.tidy(); that cannot be automated here (as it is in the synchronous map() case).
  - - (value): Promise<O>
    - Parameters
      value: TensorContainer
      Returns Promise<O>
Returns Dataset<O>
A Dataset of transformed elements.

Doc
Inherited from Dataset.mapAsync
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:203

prefetch

prefetch(bufferSize): Dataset<TensorContainer>
Creates a Dataset that prefetches elements from this dataset.
Parameters
- bufferSize: number
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.prefetch
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:213

repeat

repeat(count?): Dataset<TensorContainer>
Repeats this dataset count times.

NOTE: If this dataset is a function of global state (e.g. a random number generator), then different repetitions may produce different elements.
```
const a = tf.data.array([1, 2, 3]).repeat(3);
await a.forEachAsync(e => console.log(e));
```
Parameters
- Optional count: number
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.repeat
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:232

shuffle

shuffle(bufferSize, seed?, reshuffleEachIteration?): Dataset<TensorContainer>
Pseudorandomly shuffles the elements of this dataset. This is done in a streaming manner, by sampling from a given number of prefetched elements.
```
const a = tf.data.array([1, 2, 3, 4, 5, 6]).shuffle(3);
await a.forEachAsync(e => console.log(e));
```
Parameters
- bufferSize: number
- Optional seed: string
- Optional reshuffleEachIteration: boolean
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.shuffle
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:273

skip

skip(count): Dataset<TensorContainer>
Creates a Dataset that skips count initial elements from this dataset.
```
const a = tf.data.array([1, 2, 3, 4, 5, 6]).skip(3);
await a.forEachAsync(e => console.log(e));
```
Parameters
- count: number
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.skip
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:250

take

take(count): Dataset<TensorContainer>
Creates a Dataset with at most count initial elements from this dataset.
```
const a = tf.data.array([1, 2, 3, 4, 5, 6]).take(3);
await a.forEachAsync(e => console.log(e));
```
Parameters
- count: number
Returns Dataset<TensorContainer>
A Dataset.

Doc
Inherited from Dataset.take
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:291

toArray

toArray(): Promise<TensorContainer[]>
Collect all elements of this dataset into an array.

Obviously this will succeed only for small datasets that fit in memory. Useful for testing and generally should be avoided if possible.
```
const a = tf.data.array([1, 2, 3, 4, 5, 6]);
console.log(await a.toArray());
```
Returns Promise<TensorContainer[]>
A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.

Doc
Inherited from Dataset.toArray
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:308

toArrayForTest

toArrayForTest(): Promise<TensorContainer[]>
Collect all elements of this dataset into an array with prefetching 100 elements. This is useful for testing, because the prefetch changes the order in which the Promises are resolved along the processing pipeline. This may help expose bugs where results are dependent on the order of Promise resolution rather than on the logical order of the stream (i.e., due to hidden mutable state).

Returns Promise<TensorContainer[]>
A Promise for an array of elements, which will resolve when a new stream has been obtained and fully consumed.
Inherited from Dataset.toArrayForTest
- Defined in src/node_modules/@tensorflow/tfjs-node-gpu/node_modules/@tensorflow/tfjs-data/dist/dataset.d.ts:320

Class CSVDataset

Doc

Hierarchy

Index

Constructors

Methods

Constructors

constructor

Parameters

input: DataSource

Optional csvConfig: CSVConfig

Returns CSVDataset

Methods

batch

Parameters

batchSize: number

Optional smallLastBatch: boolean

Returns Dataset<TensorContainer>

Doc

columnNames

Returns Promise<string[]>

Doc

concatenate

Parameters

dataset: Dataset<TensorContainer>

Returns Dataset<TensorContainer>

Doc

filter

Parameters

predicate: ((value) => boolean)

Parameters

value: TensorContainer

Returns boolean

Returns Dataset<TensorContainer>

Doc

forEachAsync

Parameters

f: ((input) => void)

Parameters

input: TensorContainer

Returns void

Returns Promise<void>

Doc

map

Type Parameters

O extends TensorContainer

Parameters

transform: ((value) => O)

Parameters

value: TensorContainer

Returns O

Returns Dataset<O>

Doc

mapAsync

Type Parameters

O extends TensorContainer

Parameters

transform: ((value) => Promise<O>)

Parameters

value: TensorContainer

Returns Promise<O>

Returns Dataset<O>

Doc

prefetch

Parameters

bufferSize: number

Returns Dataset<TensorContainer>

Doc

repeat

Parameters

Optional count: number

Returns Dataset<TensorContainer>

Doc

shuffle

Parameters

bufferSize: number

Optional seed: string

Optional reshuffleEachIteration: boolean

Returns Dataset<TensorContainer>

Doc

`Optional` csvConfig: CSVConfig

`Optional` smallLastBatch: boolean

`Optional` count: number

`Optional` seed: string

`Optional` reshuffleEachIteration: boolean