Comathematics

Trying strange things out on neural networks.


Project maintained by comath Hosted on GitHub Pages — Theme by mattgraham

Introduction

I’ve been working on implementing this paper, with some adjustments. I first implemented it at the Python level of TensorFlow, but that produced far too many tiny operations and I couldn’t get it numerically stable. I figured it would be difficult to ever get that version working, and since it wasn’t going to scale well I decided to implement a modification of the loss as a couple of custom ops. As the TensorFlow documentation on custom ops is not very informative, I wrote this up. It’s meant to be a complement to the official documentation and to the actual implementation I wrote here, so it will focus on explaining snippets from the GitHub repository.

Registering the Operation

How does it work? Why?

REGISTER_OP("AverageEmbeddingMask")
	.Attr("T: {float32, float64}")
	.Attr("S: {int32, int64}")
  .Input("embedding: T")
  .Input("rle_mask: S")
  .Output("swarm_loss: float32")
  .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c){
});

This registers the op with TensorFlow. The op we’re building has 2 inputs and 1 output. The embedding input has to be either a float or a double, and the rle_mask has to be either an int or a long. We also fix the output type as a float. The current description of this is in the TensorFlow documentation. This is all I’ve needed to do with registration so far; if I run into more issues I will update this later. However, the SetShapeFn is where much of the initial verification for the op takes place. It also sets the output shape for ops further down the graph.
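
If you don’t need custom shape logic, TensorFlow ships canned shape functions in tensorflow/core/framework/common_shape_fns.h. A minimal sketch, assuming an op whose output should simply mirror its input’s shape (the op name here is hypothetical, for illustration only):

REGISTER_OP("PassThroughExample")  // hypothetical op, not part of this project
    .Input("in: float32")
    .Output("out: float32")
    .SetShapeFn(::tensorflow::shape_inference::UnchangedShape);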

How do I make a certain output?

You have to define a lambda that computes the output shape using the shape inference library. Shapes are computed through the InferenceContext object that is passed to the lambda. For example, if I wanted an op whose output has all but the last dimension of its input, we would use the following:

SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
  shape_inference::ShapeHandle loss_shape;
  // Keep all but the last dimension of the first input.
  TF_RETURN_IF_ERROR(c->Subshape(c->input(0), 0, -1, &loss_shape));
  c->set_output(0, loss_shape);
  return Status::OK();
});

To walk through this step by step: the first line is the function handle that REGISTER_OP calls to determine the output shapes, and the rest is the lambda. c is the inference context object, holding all the information needed to determine the type and shape of the input and output tensors. It holds input and output arrays of ShapeHandles. The input arrays are populated from this op’s inputs, but we need to populate the output arrays ourselves. To do this we create an empty ShapeHandle to hold the output of Subshape.

The InferenceContext has several functions attached to it that don’t interact with its internal data but do manipulate shapes, such as Subshape. This function takes the start and end of the range of dimensions you want to keep from the first input to the op, c->input(0), and writes the result into loss_shape; for example, an input of shape [8, 64, 64, 16] would yield [8, 64, 64]. It actually returns a Status (defined in lib/core/status.h), and TF_RETURN_IF_ERROR is the catch macro for TensorFlow’s internal error handling. We finally set the first output to have the same shape as loss_shape and return an OK status indicating that there were no errors.

How do I require a certain input?

SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
  ShapeHandle input;
  TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 4, &input));

  ShapeHandle rle_input;
  TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 4, &rle_input));

  DimensionHandle rle_end_dim;
  TF_RETURN_IF_ERROR(c->WithValue(c->Dim(rle_input, 3), 2, &rle_end_dim));
  return Status::OK();
});

The first input is meant to hold an image whose pixels have each been transformed into an embedding. I want a batch dimension, an X and a Y image dimension, and an N-dimensional embedding at each pixel, so the input has to be a rank 4 tensor. This is accomplished with the WithRank function; note, however, that it will also return successfully if the rank of the tensor is unknown. As this implementation was targeted at convex cells, I could use run length encoding (hence the rle) since the cells’ masks are all Y-simple. So the second input needs to have shape [BatchSize, MaxMasks, DimX, 2], and we can verify the last entry with a DimensionHandle.
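
To make the encoding concrete, here is a sketch of how I read that layout; the index names are mine, and the {start, end} interpretation of the last dimension is an assumption based on the Y-simple masks described above:

// rle_mask has shape [BatchSize, MaxMasks, DimX, 2].
// For batch b, mask m, and column x, the run of pixels belonging to the
// mask is assumed to span rows y_start <= y < y_end:
//   y_start = rle_mask(b, m, x, 0)
//   y_end   = rle_mask(b, m, x, 1)
// A Y-simple mask has exactly one such run per column, which is what
// makes this encoding sufficient.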

Both shapes and dimensions are handled through the reference objects ShapeHandle and DimensionHandle. As far as I can tell this interface is not really meant for cross-verifying dimensions between tensors. More intricate error messages can be written in the op’s implementation, but I would rather do all my shape verification here so that if I have multiple implementations (GPU and CPU) I don’t have to duplicate code.
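
That said, InferenceContext does expose a Merge function that can relate dimensions across inputs. A minimal sketch, assuming we want the batch dimensions of the two inputs above to agree (this is my addition, not part of the op as written):

DimensionHandle batch_dim;
// Fails with a Status error if the two batch dimensions are both known
// and disagree; otherwise merges them into batch_dim.
TF_RETURN_IF_ERROR(c->Merge(c->Dim(input, 0), c->Dim(rle_input, 0), &batch_dim));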

In the Op Implementation

template <typename Device, typename T, typename S>
class MaskedAveragesOp : public OpKernel {
 public:
  explicit MaskedAveragesOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
  // Internals go here
  }
};

To implement the actual op we need to extend the base class OpKernel. As this op has no internal state to worry about, we just need to define the Compute function. The only thing passed to it is the OpKernelContext object, which contains a list of inputs that are allocated and ready to use, and a list of outputs that we need to allocate in our Compute. The data types of the outputs are already filled in by TensorFlow as defined by the op’s registration, so we only need to supply the shapes here.
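
The class still has to be tied to the op name and a device before TensorFlow will use it. A sketch of the standard registration macro, assuming a CPUDevice alias for Eigen::ThreadPoolDevice and one instantiation per type pair (the exact set of instantiations is up to you):

typedef Eigen::ThreadPoolDevice CPUDevice;

// One registration per (T, S) combination allowed by the op's Attrs.
REGISTER_KERNEL_BUILDER(Name("AverageEmbeddingMask")
                            .Device(DEVICE_CPU)
                            .TypeConstraint<float>("T")
                            .TypeConstraint<int32>("S"),
                        MaskedAveragesOp<CPUDevice, float, int32>);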

How do I get my inputs and manipulate their shapes?

const Tensor& input_tensor = context->input(0);
const Tensor& rle_tensor = context->input(1);
TensorShape input_shape = input_tensor.shape();
TensorShape rle_shape = rle_tensor.shape();
TensorShape mean_shape = input_shape;
mean_shape.RemoveDimRange(1,-2);
mean_shape.InsertDim(1,rle_shape.dim_size(1));

A tensor consists of a TensorShape and a TensorBuffer. To create a tensor we need to determine a shape and then ask TensorFlow to allocate an appropriately sized buffer for that shape. Do not try to do your own allocation: TensorFlow has its own memory management system backed by jemalloc. This is incompatible with malloc, and if you mix them there will be a segfault. As the numpy arrays returned by TensorFlow are wrappers around the TensorFlow buffer, this even applies when passing an output to an external library. So, when in TensorFlow you have to use the internal memory management, and when passing things from TensorFlow to another C/C++ library treat the buffers gingerly.

Now that we are in an op, the library we use to manipulate shapes is TensorShape rather than the shape inference library. It has all the pieces we need to build the tensor shape for our output. This op computes the average embedding of each mask, so the result needs to be of shape [BatchSize, MaxMasks, EmbeddingDim]. The last two lines accomplish this, slicing and gluing to make the appropriate shape: RemoveDimRange(1, -2) drops the inner image dimensions, and InsertDim(1, ...) inserts the MaxMasks dimension taken from the rle input’s shape.
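
A worked example with hypothetical sizes, to make the slicing and gluing concrete:

// Hypothetical shapes: BatchSize = 4, DimX = DimY = 64, EmbeddingDim = 16,
// MaxMasks = 10.
TensorShape s({4, 64, 64, 16});
s.RemoveDimRange(1, -2);  // removes dims 1 and 2, leaving [4, 16]
s.InsertDim(1, 10);       // inserts MaxMasks at index 1: [4, 10, 16]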

How do I allocate for output or scratch?

Tensor* output_tensor = NULL;
OP_REQUIRES_OK(context, context->allocate_output(0, loss_shape,
                                                 &output_tensor));

The output tensors are already set up in the OpKernelContext with the correct type, though they are not allocated and have no shape yet. The empty first output object already exists in the context object. So, for the first output, index 0, we pass a shape and a reference to a pointer, giving us a means of getting back the complete, allocated tensor.

Tensor mean_tensor_temp;
OP_REQUIRES_OK(context, context->allocate_temp(input_tensor.dtype(),
                                               mean_shape, &mean_tensor_temp));

This is the temporary version of the above. The type cannot be inferred from the context here, so it needs to be passed in; there’s an enumeration in types.pb.h that holds all the types, the most useful being DT_FLOAT and DT_INT32. Remember the Tensor object is just a reference, so this function provides the tensor with the type and the shape and then allocates the appropriate internal buffer.
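
A short sketch of the enum-based variant, allocating int32 scratch space and zeroing it through the Eigen view (the size here is arbitrary):

Tensor scratch;
OP_REQUIRES_OK(context, context->allocate_temp(DT_INT32, TensorShape({16}),
                                               &scratch));
// flat<T>() exposes the buffer as a rank-1 Eigen::TensorMap.
scratch.flat<int32>().setZero();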

Next Steps

We still need to define an actual implementation of the operation, and we need to debug it. The next post is on Eigen::Tensor and the odd error messages you get when it fails to compile.
