In the TensorFlow Lite Android demo code for image classification, the images are first converted to ByteBuffer format for better performance. This conversion from bitmap to floating point format, and the subsequent conversion to a byte buffer, seems to be an expensive operation (loops, bitwise operators, float mem-copy, etc.). We were trying to implement the same logic with OpenCV to gain some speed advantage. The following code runs without error, but due to some logical error in the conversion, the output of the model (to which this data is fed) seems to be incorrect. The input of the model is supposed to be RGB with data type float[1,197,197,3].

How can we speed up this process of bitmap to byte buffer conversion using opencv (or any other means)?

Standard Bitmap to ByteBuffer Conversion:-

/** Writes Image data into a {@code ByteBuffer}. */
  private void convertBitmapToByteBuffer(Bitmap bitmap) {
    if (imgData == null) {
      return;
    }
    imgData.rewind();


    bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());



    long startTime = SystemClock.uptimeMillis();

    // Convert the image to floating point.
    int pixel = 0;

    for (int i = 0; i < getImageSizeX(); ++i) {
      for (int j = 0; j < getImageSizeY(); ++j) {
        final int val = intValues[pixel++];

        imgData.putFloat(((val>> 16) & 0xFF) / 255.f);
        imgData.putFloat(((val>> 8) & 0xFF) / 255.f);
        imgData.putFloat((val & 0xFF) / 255.f);
      }
    }

    long endTime = SystemClock.uptimeMillis();
    Log.d(TAG, "Timecost to put values into ByteBuffer: " + Long.toString(endTime - startTime));
  }

OpenCV Bitmap to ByteBuffer :-

    /** Writes Image data into a {@code ByteBuffer}. */
      private void convertBitmapToByteBuffer(Bitmap bitmap) {
        if (imgData == null) {
          return;
        }
        imgData.rewind();


        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());

        long startTime = SystemClock.uptimeMillis();


        Mat bufmat = new Mat(197,197,CV_8UC3);
        Mat newmat = new Mat(197,197,CV_32FC3);


        Utils.bitmapToMat(bitmap,bufmat);
        Imgproc.cvtColor(bufmat,bufmat,Imgproc.COLOR_RGBA2RGB);

        List<Mat> sp_im = new ArrayList<Mat>(3);


        Core.split(bufmat,sp_im);

        sp_im.get(0).convertTo(sp_im.get(0),CV_32F,1.0/255/0);
        sp_im.get(1).convertTo(sp_im.get(1),CV_32F,1.0/255.0);
        sp_im.get(2).convertTo(sp_im.get(2),CV_32F,1.0/255.0);

        Core.merge(sp_im,newmat);



        //bufmat.convertTo(newmat,CV_32FC3,1.0/255.0);
        float buf[] = new float[197*197*3];


        newmat.get(0,0,buf);

        //imgData.wrap(buf).order(ByteOrder.nativeOrder()).getFloat();
        imgData.order(ByteOrder.nativeOrder()).asFloatBuffer().put(buf);


        long endTime = SystemClock.uptimeMillis();
        Log.d(TAG, "Timecost to put values into ByteBuffer: " + Long.toString(endTime - startTime));
      }
    • The TensorFlow Lite Android demo now includes a 'TensorImage' data type, in its recent support library, for loading bitmaps into the model.
  1. I believe that 255/0 in your code is a copy/paste mistake, not real code.
  2. I wonder what the timecost of the pure Java solution is, especially when you weigh it against the timecost of inference. For me, with a slightly larger bitmap for Google's mobilenet_v1_1.0_224, the naïve float buffer preparation was less than 5% of inference time.
  3. I could quantize the tflite model (with the same tflite_convert utility that generated the .tflite file from .h5). There could actually be three quantization operations, but I only used two: --inference_input_type=QUANTIZED_UINT8 and --post_training_quantize.
    • The resulting model is about 25% of the size of the float32 one, which is an achievement on its own.
    • The resulting model runs about twice as fast (at least on some devices).
    • And the resulting model consumes uint8 inputs. This means that instead of imgData.putFloat(((val>> 16) & 0xFF) / 255.f) we write imgData.put((byte) ((val>> 16) & 0xFF)), and so on.
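For illustration, the uint8 packing described in the last bullet could look roughly like the sketch below. The class and method names (`Uint8Packing`, `packRgbBytes`) are hypothetical and not part of the demo; the sketch assumes the pixels come from `Bitmap.getPixels` in ARGB_8888 layout and that the model expects interleaved R, G, B bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Uint8Packing {
    /**
     * Packs ARGB pixels (as returned by Bitmap.getPixels) into a direct
     * buffer holding one byte per channel in R, G, B order. Alpha is dropped.
     */
    static ByteBuffer packRgbBytes(int[] argbPixels) {
        ByteBuffer out = ByteBuffer.allocateDirect(argbPixels.length * 3);
        out.order(ByteOrder.nativeOrder());
        for (int i = 0, n = argbPixels.length; i < n; ++i) {
            final int val = argbPixels[i];
            // ByteBuffer.put takes a byte, so the masked int needs a cast.
            out.put((byte) ((val >> 16) & 0xFF)); // red
            out.put((byte) ((val >> 8) & 0xFF));  // green
            out.put((byte) (val & 0xFF));         // blue
        }
        out.rewind(); // leave the buffer ready to be read from the start
        return out;
    }
}
```

In a real app the buffer would normally be allocated once and reused across frames rather than allocated per call as it is here.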

By the way, I don't think that your formulae are correct. To achieve best accuracy when float32 buffers are involved, we use

putFloat(byteval / 256f)

where byteval is an int in the range [0, 255].

    • Nice to know. I hope we can live with the quantized model; it's faster for us than GPU and also significantly smaller. BTW, on the face of it, quantizing the input only should not lose accuracy (that is, converting bytes to floats cannot improve accuracy), and the resulting model may still work with GpuDelegate.
    • It depends on the scenario or use case; in our case of semantic segmentation, a small accuracy loss really shows in the output. Accuracy loss depends on the level of quantization. The GPU supports only float32 (float16 is still experimental). Weights-only quantization reduced the size, but some operators were not supported in tflite GPU (dequantize). You may see a difference between quantized CPU and GPU if you use bigger images/models; the GPU will be faster. By the way, what do you mean by input-only quantization? Is it quantized UINT8 input? This won't run on the GPU. We tried the two methods you mentioned earlier, but they didn't work.
    • Thanks, it was a silly mistake. I changed 255/0 to 255.0. Now it works perfectly and is twice as fast. Actually, I was too ambitious in trying to optimize this part of the TF code; I was trying to remove the costly nested loops. We were running other code in parallel to this conversion, and as a result the time taken increased fourfold. We tried those two quantization methods, but they did not work properly. We would also lose accuracy in this scenario, which we can't afford. We need to try quantization-aware training. We are using the GPU delegate, and tflite runs only float models on the GPU.
    • You don't need nested loops for i / for j in the simple Java implementation. Switching to a single loop with a precalculated limit could give you a significant boost.
    • It's faster, but not as fast as the current OpenCV method. Looks like OpenCV has NEON-optimized low-level vectorized operations ...
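As a rough sketch of the single-loop suggestion from the comments, the pure Java conversion might be written as below. The names `FloatPacking`, `packRgbFloats` and `scratch` are hypothetical. Filling a reusable float array and copying it in one bulk `FloatBuffer.put` (the same call the OpenCV version in the question uses) is an assumption about what may help, not a measured result; the per-channel formula is the one from the original demo code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FloatPacking {
    /**
     * Converts ARGB pixels to normalized R, G, B floats with a single loop,
     * then copies them into imgData with one bulk put.
     * scratch must hold at least argbPixels.length * 3 floats and
     * imgData at least four times that many bytes.
     */
    static void packRgbFloats(int[] argbPixels, float[] scratch, ByteBuffer imgData) {
        final int n = argbPixels.length; // loop limit computed once
        int o = 0;
        for (int i = 0; i < n; ++i) {
            final int val = argbPixels[i];
            scratch[o++] = ((val >> 16) & 0xFF) / 255.f; // red
            scratch[o++] = ((val >> 8) & 0xFF) / 255.f;  // green
            scratch[o++] = (val & 0xFF) / 255.f;         // blue
        }
        imgData.rewind();
        // The float view starts at imgData's current position (0 after rewind);
        // writing through the view does not move imgData's own position.
        imgData.order(ByteOrder.nativeOrder()).asFloatBuffer().put(scratch, 0, n * 3);
    }
}
```

In the demo from the question, `argbPixels` would correspond to `intValues` after `bitmap.getPixels(...)`, and `scratch` would be allocated once alongside `imgData`.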