Pdfium – Pattern Shading Integer Overflows

  • Author: Google Security Research
    Date: 2018-02-15
  • Category:
    Platform:
  • Source: https://www.exploit-db.com/exploits/44082/

    This vulnerability relies on several minor oversights in the handling of shading patterns in pdfium. I'll try to detail all of the issues that could be fixed to harden the code against similar issues.
    
    The DrawXShading functions in cpdf_renderstatus.cpp rely on a helper function to compute the number of output components resulting from applying multiple shading functions. Note that all of these functions appear to be vulnerable; the rest of this report discusses the specifics of triggering a heap-overflow using DrawRadialShading.
    
    uint32_t CountOutputs(
        const std::vector<std::unique_ptr<CPDF_Function>>& funcs) {
      uint32_t total = 0;
      for (const auto& func : funcs) {
        if (func)
          total += func->CountOutputs(); // <-- Issue #1: integer overflow here
      }
      return total;
    }
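
    As a rough illustration of how Issue #1 could be closed, the sketch below sums the output counts and reports failure instead of wrapping. It is not pdfium's actual fix: FakeFunction, CountResult and CountOutputsChecked are hypothetical names standing in for CPDF_Function and the helper above.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for CPDF_Function; only the output count matters here.
struct FakeFunction {
  uint32_t outputs;
  uint32_t CountOutputs() const { return outputs; }
};

// Hypothetical result type: `ok` is false if the sum did not fit in uint32_t.
struct CountResult {
  bool ok;
  uint32_t total;
};

// Sums the output counts, stopping as soon as the next addition would wrap.
CountResult CountOutputsChecked(const std::vector<FakeFunction>& funcs) {
  uint32_t total = 0;
  for (const auto& func : funcs) {
    const uint32_t n = func.CountOutputs();
    if (n > std::numeric_limits<uint32_t>::max() - total)
      return {false, 0};  // the unchecked version would wrap around here
    total += n;
  }
  return {true, total};
}
```

    With this shape, a caller such as DrawRadialShading could bail out when ok is false rather than sizing a buffer from a wrapped total.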
    
    The lack of integer overflow checking would not be an issue if the parser enforced the constraints that the PDF specification places on these functions (namely that the /Function entry in a radial shading pattern should be either a 1-n function or an array of n 1-1 functions), as these preconditions would preclude any overflow from occurring. However, we can see in the loading code for CPDF_ShadingPattern that there is no such validation.
    
    bool CPDF_ShadingPattern::Load() {
      if (m_ShadingType != kInvalidShading)
        return true;

      CPDF_Dictionary* pShadingDict =
          m_pShadingObj ? m_pShadingObj->GetDict() : nullptr;
      if (!pShadingDict)
        return false;

      m_pFunctions.clear();
      CPDF_Object* pFunc = pShadingDict->GetDirectObjectFor("Function");
      if (pFunc) {

        // Issue #2: we never validate that the signatures of the parsed Function object
        // match the expected signatures for the shading type that we're parsing.

        if (CPDF_Array* pArray = pFunc->AsArray()) {
          m_pFunctions.resize(std::min<size_t>(pArray->GetCount(), 4));
          for (size_t i = 0; i < m_pFunctions.size(); ++i)
            m_pFunctions[i] = CPDF_Function::Load(pArray->GetDirectObjectAt(i));
        } else {
          m_pFunctions.push_back(CPDF_Function::Load(pFunc));
        }
      }
      CPDF_Object* pCSObj = pShadingDict->GetDirectObjectFor("ColorSpace");
      if (!pCSObj)
        return false;

      CPDF_DocPageData* pDocPageData = document()->GetPageData();
      m_pCS = pDocPageData->GetColorSpace(pCSObj, nullptr);
      if (m_pCS)
        m_pCountedCS = pDocPageData->FindColorSpacePtr(m_pCS->GetArray());

      m_ShadingType = ToShadingType(pShadingDict->GetIntegerFor("ShadingType"));

      // We expect to have a stream if our shading type is a mesh.
      if (IsMeshShading() && !ToStream(m_pShadingObj.Get()))
        return false;

      return true;
    }
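
    To make Issue #2 concrete, here is a minimal sketch of the kind of signature check the loader above does not perform. It assumes that n in the rule quoted earlier is the number of components of the shading's colour space, which is my reading of the PDF specification. FunctionSignature and SignaturesMatchShading are hypothetical names, not pdfium APIs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of a parsed function: its input and output counts.
struct FunctionSignature {
  uint32_t inputs;
  uint32_t outputs;
};

// Checks the radial shading rule for a colour space with `components`
// components: either one 1-in, n-out function, or an array of n functions
// that are each 1-in, 1-out.
bool SignaturesMatchShading(const std::vector<FunctionSignature>& funcs,
                            uint32_t components) {
  if (funcs.empty() || components == 0)
    return false;
  if (funcs.size() == 1)
    return funcs[0].inputs == 1 && funcs[0].outputs == components;
  if (funcs.size() != components)
    return false;
  for (const auto& f : funcs) {
    if (f.inputs != 1 || f.outputs != 1)
      return false;
  }
  return true;
}
```

    Under a check like this, the total output count can never exceed the number of colour components, so the sum in CountOutputs has no room to overflow.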
    
    Assuming that we can create function objects with very large output sizes, we can then reach the following code (in cpdf_renderstatus.cpp) when rendering something using the pattern:
    
    void DrawRadialShading(const RetainPtr<CFX_DIBitmap>& pBitmap,
                           CFX_Matrix* pObject2Bitmap,
                           CPDF_Dictionary* pDict,
                           const std::vector<std::unique_ptr<CPDF_Function>>& funcs,
                           CPDF_ColorSpace* pCS,
                           int alpha) {

      // ... snip ...

      uint32_t total_results =
          std::max(CountOutputs(funcs), pCS->CountComponents());

      // NB: CountOutputs overflows here. result_array will be a stack buffer (if
      // we return a resulting size less than 16) or a heap buffer (if the size
      // is larger).

      CFX_FixedBufGrow<float, 16> result_array(total_results);
      float* pResults = result_array;
      memset(pResults, 0, total_results * sizeof(float));
      uint32_t rgb_array[SHADING_STEPS];
      for (int i = 0; i < SHADING_STEPS; i++) {
        float input = (t_max - t_min) * i / SHADING_STEPS + t_min;
        int offset = 0;
        for (const auto& func : funcs) {
          if (func) {
            int nresults;

            // Here we've desynchronised the size of the memory pointed to by
            // pResults from the actual output size of the functions, so this
            // can write outside the allocated buffer.

            if (func->Call(&input, 1, pResults + offset, &nresults))
              offset += nresults;
          }
        }
        float R = 0.0f;
        float G = 0.0f;
        float B = 0.0f;
        pCS->GetRGB(pResults, &R, &G, &B);
        rgb_array[i] =
            FXARGB_TODIB(FXARGB_MAKE(alpha, FXSYS_round(R * 255),
                                     FXSYS_round(G * 255), FXSYS_round(B * 255)));
      }
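
    Independently of the counting bug, the inner loop above trusts that each call fits in the space that remains. A minimal sketch of a per-call bound check follows. StubFunction and EvaluateBounded are hypothetical stand-ins used only for illustration, and the real CPDF_Function::Call has a different signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CPDF_Function: a call writes `outputs` floats.
struct StubFunction {
  uint32_t outputs;
  void Call(float* results) const {
    for (uint32_t i = 0; i < outputs; ++i)
      results[i] = 0.5f;
  }
};

// Evaluates each function into a buffer of `capacity` floats, refusing any
// call whose outputs would not fit in the space that is left.
// Returns false as soon as one call is refused.
bool EvaluateBounded(const std::vector<StubFunction>& funcs,
                     std::size_t capacity) {
  std::vector<float> results(capacity, 0.0f);
  std::size_t offset = 0;
  for (const auto& func : funcs) {
    if (func.outputs > capacity - offset)
      return false;  // this call would write past the end of the buffer
    func.Call(results.data() + offset);
    offset += func.outputs;
  }
  return true;
}
```

    Because offset never exceeds capacity, the subtraction cannot underflow, and a function that reports more outputs than the buffer can hold is rejected before it writes anything.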
    
    Now we need to revisit our earlier assumption, that we can create function objects with large output sizes.
    
    The following code handles parsing of function objects:
    
    bool CPDF_Function::Init(CPDF_Object* pObj) {
      CPDF_Stream* pStream = pObj->AsStream();
      CPDF_Dictionary* pDict = pStream ? pStream->GetDict() : pObj->AsDictionary();

      CPDF_Array* pDomains = pDict->GetArrayFor("Domain");
      if (!pDomains)
        return false;

      m_nInputs = pDomains->GetCount() / 2;
      if (m_nInputs == 0)
        return false;

      m_pDomains = FX_Alloc2D(float, m_nInputs, 2);
      for (uint32_t i = 0; i < m_nInputs * 2; i++) {
        m_pDomains[i] = pDomains->GetFloatAt(i);
      }
      CPDF_Array* pRanges = pDict->GetArrayFor("Range");
      m_nOutputs = 0;
      if (pRanges) {
        m_nOutputs = pRanges->GetCount() / 2;
        m_pRanges = FX_Alloc2D(float, m_nOutputs, 2); // <-- avoid this call
        for (uint32_t i = 0; i < m_nOutputs * 2; i++)
          m_pRanges[i] = pRanges->GetFloatAt(i);
      }
      uint32_t old_outputs = m_nOutputs;
      if (!v_Init(pObj))
        return false;
      if (m_pRanges && m_nOutputs > old_outputs) {
        m_pRanges = FX_Realloc(float, m_pRanges, m_nOutputs * 2); // <-- avoid this call
        if (m_pRanges) {
          memset(m_pRanges + (old_outputs * 2), 0,
                 sizeof(float) * (m_nOutputs - old_outputs) * 2);
        }
      }
      return true;
    }
    
    We can only have 4 functions, so we need m_nOutputs to be pretty large. Ideally we also don't want our PDF file to contain arrays of size 0x100000000 // 4, since this would mean multiple-gigabyte PDFs. Note also that any call to the FX_ allocation functions will fail on very large values, so ideally we need to follow the case old_outputs == m_nOutputs == 0, avoiding the final FX_Realloc call and allowing an arbitrarily large m_nOutputs.
    
    It turns out that there is a function subtype that allows this: the exponential interpolation function type, implemented in cpdf_expintfunc.cpp.
    
    bool CPDF_ExpIntFunc::v_Init(CPDF_Object* pObj) {
      CPDF_Dictionary* pDict = pObj->GetDict();
      if (!pDict)
        return false;

      CPDF_Array* pArray0 = pDict->GetArrayFor("C0");
      if (m_nOutputs == 0) {
        m_nOutputs = 1;
        if (pArray0) {
          fprintf(stderr, "C0 %zu\n", pArray0->GetCount());
          m_nOutputs = pArray0->GetCount();
        }
      }

      CPDF_Array* pArray1 = pDict->GetArrayFor("C1");
      m_pBeginValues = FX_Alloc2D(float, m_nOutputs, 2);
      m_pEndValues = FX_Alloc2D(float, m_nOutputs, 2);
      for (uint32_t i = 0; i < m_nOutputs; i++) {
        m_pBeginValues[i] = pArray0 ? pArray0->GetFloatAt(i) : 0.0f;
        m_pEndValues[i] = pArray1 ? pArray1->GetFloatAt(i) : 1.0f;
      }

      m_Exponent = pDict->GetFloatFor("N");
      m_nOrigOutputs = m_nOutputs;
      if (m_nOutputs && m_nInputs > INT_MAX / m_nOutputs) // <-- can't be *too* large
        return false;

      m_nOutputs *= m_nInputs; // <-- but it can be pretty large

      // Issue #3: This is probably not the place, but it probably makes sense to
      // bound m_nInputs and m_nOutputs to some large-but-not-that-large value in
      // CPDF_Function::Init

      return true;
    }
    
    So, by providing a function object without a /Range entry, but with large /C0 and /Domain arrays, we can construct a function object with about INT_MAX outputs.
    
    7 0 obj
    <<
    /FunctionType 2
    /Domain [
    0.0 1.0
    ... repeat many times ...
    0.0 1.0
    ]
    /C0 [
    0.0
    ... repeat many times ...
    0.0 
    ]
    /N 1
    >>
    endobj
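
    To get a feel for the sizes involved, the sketch below mirrors the arithmetic of CPDF_ExpIntFunc::v_Init for an object like the one above, with no /Range entry and a /C0 array present. ExpIntOutputs is a hypothetical helper, and the array sizes in the usage note are illustrative numbers of mine, not values taken from the attached PoC.

```cpp
#include <climits>
#include <cstdint>

// Mirrors the output-count arithmetic of CPDF_ExpIntFunc::v_Init when there
// is no /Range entry and /C0 is present. Returns the resulting m_nOutputs,
// or 0 if the INT_MAX check would reject the function.
uint32_t ExpIntOutputs(uint32_t domain_pairs, uint32_t c0_entries) {
  const uint32_t inputs = domain_pairs;  // m_nInputs = /Domain count / 2
  uint32_t outputs = c0_entries;         // m_nOutputs = /C0 count
  if (outputs && inputs > INT_MAX / outputs)
    return 0;  // "can't be *too* large"
  outputs *= inputs;  // "but it can be pretty large"
  return outputs;
}
```

    For example, ExpIntOutputs(65535, 32768) yields 2147450880, just under INT_MAX, from a /Domain array of 131070 numbers and a /C0 array of 32768 numbers, while ExpIntOutputs(65536, 32768) is rejected. The file stays small even though the reported output count is enormous.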
    
    At this point it looks like we have quite an annoying exploitation primitive: we can write a huge amount of data out of bounds, but that data will be calculated as an interpolation between its input coordinates, and it will be a really, really big memory corruption.
    
    It turns out that the point made earlier at Issue #2 about validating the signatures of the functions is relevant again here. If we look at the call site in DrawRadialShading, we can see that all of the functions are called with a single input parameter. Now look at the start of CPDF_Function::Call:
    
    bool CPDF_Function::Call(float* inputs,
                             uint32_t ninputs,
                             float* results,
                             int* nresults) const {
      if (m_nInputs != ninputs)
        return false;
    
    We can see that any attempt to call a function with the wrong number of input parameters will simply fail, letting us control precisely the size and contents of our overflow.
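
    The interaction between the wrapped count and the input check can be modelled in a few lines. This is a simplified sketch: ModelFunction, AllocatedFloats and WrittenFloats are hypothetical names, and it assumes a function whose call succeeds writes exactly its reported number of outputs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a parsed function: input and output counts only.
struct ModelFunction {
  uint32_t inputs;
  uint32_t outputs;
};

// Buffer size DrawRadialShading would request: every function's outputs are
// summed in uint32_t arithmetic, so the total can wrap around.
uint32_t AllocatedFloats(const std::vector<ModelFunction>& funcs,
                         uint32_t cs_components) {
  uint32_t total = 0;
  for (const auto& f : funcs)
    total += f.outputs;  // wraps modulo 2^32
  return total > cs_components ? total : cs_components;
}

// Floats written per shading step: only functions whose input count matches
// the single supplied input get past the check in CPDF_Function::Call.
uint64_t WrittenFloats(const std::vector<ModelFunction>& funcs) {
  uint64_t written = 0;
  for (const auto& f : funcs) {
    if (f.inputs == 1)
      written += f.outputs;
  }
  return written;
}
```

    With illustrative values of two functions of 65535 inputs and 2147450880 outputs each, plus one function of 1 input and 65600 outputs, the model allocates 64 floats but writes 65600. The two large functions only pad the count and never write, while the third determines what is written.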
    
    The attached PoC will crash under ASAN with the following stack-trace, and without ASAN it will crash during the free of the corrupted heap block.
    
    Proof of Concept:
    https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/44082.zip