A preprocessing layer which encodes integer features.
This layer provides options for condensing data into a categorical encoding
when the total number of tokens is known in advance. It accepts integer
values as inputs and outputs a dense representation of those
inputs.
Arguments:
numTokens: The total number of tokens the layer should support. All
inputs to the layer must be integers in the range 0 <= value < numTokens, or an error will be thrown.
outputMode: Specification for the output of the layer.
Defaults to multiHot. Values can be oneHot, multiHot or
count, configuring the layer as follows:
oneHot: Encodes each individual element in the input into an
array of numTokens size, containing a 1 at the element index. If
the last dimension has size 1, the layer encodes along that dimension.
Otherwise, it appends a new dimension for the
encoded output.
multiHot: Encodes each sample in the input into a single array
of numTokens size, containing a 1 for each vocabulary term
present in the sample. Treats the last dimension as the sample
dimension: if the input shape is (..., sampleLength), the output shape
will be (..., numTokens).
count: Like multiHot, but the int array contains a count of
the number of times the token at that index appeared in the sample.
For all output modes, currently only output up to rank 2 is supported.
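The three output modes can be illustrated on a single 1D sample. The sketch below is plain TypeScript and does not call the layer itself. The helper names checkRange, encodeOneHot, encodeCount and encodeMultiHot are hypothetical, written only to mirror the behavior described above, and are not the layer's implementation.

```typescript
// Hypothetical helpers mirroring the documented output modes.
// They are an illustration, not the layer's actual implementation.

// Reject any value outside 0 <= value < numTokens, as the layer is documented to do.
function checkRange(values: number[], numTokens: number): void {
  for (const v of values) {
    if (!Number.isInteger(v) || v < 0 || v >= numTokens) {
      throw new RangeError(`Input ${v} is outside the range 0 <= value < ${numTokens}`);
    }
  }
}

// oneHot: one array of size numTokens per input element, with a 1 at the element index.
function encodeOneHot(values: number[], numTokens: number): number[][] {
  checkRange(values, numTokens);
  return values.map((v) => {
    const row = new Array<number>(numTokens).fill(0);
    row[v] = 1;
    return row;
  });
}

// count: one array per sample, holding how often each token appeared.
function encodeCount(sample: number[], numTokens: number): number[] {
  checkRange(sample, numTokens);
  const out = new Array<number>(numTokens).fill(0);
  for (const v of sample) {
    out[v] += 1;
  }
  return out;
}

// multiHot: like count, but each entry is clamped to 0 or 1.
function encodeMultiHot(sample: number[], numTokens: number): number[] {
  return encodeCount(sample, numTokens).map((c) => (c > 0 ? 1 : 0));
}

const sample = [0, 1, 1, 3];
const oneHot = encodeOneHot(sample, 4);     // shape (4, 4): a new dimension is appended
const multiHot = encodeMultiHot(sample, 4); // [1, 1, 0, 1]
const counts = encodeCount(sample, 4);      // [1, 2, 0, 1]
console.log(oneHot, multiHot, counts);
```

Note that token 1 appears twice in the sample: multiHot records only its presence, while count records the value 2.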
Call arguments:
inputs: A 1D or 2D tensor of integer inputs.
countWeights: A tensor in the same shape as inputs indicating the
weight for each sample value when summing up in count mode. Not used
in multiHot or oneHot modes.
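As a rough illustration of countWeights, the sketch below sums a weight per input value instead of adding 1 for each occurrence. It is plain TypeScript under the assumption that weights are simply accumulated per token index; the helper name encodeWeightedCount is hypothetical and is not part of the layer's API.

```typescript
// Hypothetical helper: count mode where each input value contributes its own weight.
// Assumes weights are accumulated per token index; not the layer's actual implementation.
function encodeWeightedCount(
  sample: number[],
  countWeights: number[],
  numTokens: number
): number[] {
  if (sample.length !== countWeights.length) {
    throw new Error("countWeights must have the same shape as inputs");
  }
  const out = new Array<number>(numTokens).fill(0);
  sample.forEach((token, i) => {
    if (!Number.isInteger(token) || token < 0 || token >= numTokens) {
      throw new RangeError(`Input ${token} is outside the range 0 <= value < ${numTokens}`);
    }
    out[token] += countWeights[i]; // add this value's weight rather than 1
  });
  return out;
}

// Token 1 appears twice, with weights 1 and 2, so its entry sums to 3.
const weighted = encodeWeightedCount([0, 1, 1, 3], [0.5, 1, 2, 4], 4);
console.log(weighted); // [0.5, 3, 0, 4]
```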