Asked  2 Years ago    Answers:  5   Viewed   152 times

there is already similar questions on , but none of their solutions have been working for me. I'm trying to grab a page on LoveIt.com with cURL, but it returns me a 404 error, while the url works fine in the browser :

        $url = 'http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV';

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt ($curl, CURLOPT_HEADER, false);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_REFERER,'http://loveit.com/');

Here's the header I receive :

Array ( [url] => http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV [content_type] => text/html; charset=utf-8 [http_code] => 404 [header_size] => 667 [request_size] => 172 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 0.320466 [namelookup_time] => 0.000326 [connect_time] => 0.119046 [pretransfer_time] => 0.119089 [size_upload] => 0 [size_download] => 499 [speed_download] => 1557 [speed_upload] => 0 [download_content_length] => 499 [upload_content_length] => 0 [starttransfer_time] => 0.320438 [redirect_time] => 0 [certinfo] => Array ( ) [primary_ip] => --- [primary_port] => 80 [local_ip] => --- [local_port] => 53837 [redirect_url] => )

I read that some website had protections against this kind of scripts; and I did test some solutions proposed, but none worked for me (CURLOPT_USERAGENT,CURLOPT_REFERER...)

Any ideas of what's happening here ?

I would like to backup my LoveIt account, that's why i'm making this (no exports functions and no replies from LoveIt.com about the health of the website)

 Answers

4

I quickly checked the said page with LiveHeaders enabled and I noticed bunch of cookies set. I suspect that, since it's not "normal" url, you need to hand those cookies while being redirected otherwise you end being kicked out with 404. Use CURLOPT_COOKIEJAR with your cURL instance at start. See: http://php.net/manual/pl/function.curl-setopt.php

Monday, November 7, 2022
 
stellz
 
3

You should only call curl_close() when you know you're done with that particular handle, or if switching from its current state to a new one (ie: changing a ton of options via curl_setopt() would be faster by going from a clean new handle than your current "dirty" one.

The cookiejar/file options are only strictly necessary for maintaining cookies between seperate curl handles/invokations. Each one's independent of the others, so the cookie files are the only way to share between them.

Monday, September 26, 2022
3

You need to add the Curl libraries to the command line PHP.ini.

You can probably just copy the file C:wampbinapacheApache2.2.xbinphp.ini to c:wampbinphpphp5.3.10php.ini (adjust for the actual directories on your system).

Sunday, September 25, 2022
 
brodieg
 
1

Someone posted an answer that helped me, but then deleted it! To paraphrase the answer:

Adding the privacy keys to the plist without knowing what is triggering the error from Apple is not the best way to solve this.

In my case, when I did a grep search I found that there is some reference to CoreBluetooth.framework inside my project.pbxproj. I removed the reference and the build and TestFlight upload succeeded!

To search use the following command:

grep -r -a CoreBluetooth.framework ProjectFolder
Wednesday, August 31, 2022
3

To answer this question I have used the following answers:

  • Get the item that was clicked: https://.com/a/9012576

  • Get the XPATH given the item: https://.com/a/58677712

  • Run js scripts and send the information to python: https://.com/a/55294356

Joining all these parts you get the following solution:

├── main.py
└── xpath_from_element.js

main.py

import os

from PyQt5 import QtCore, QtWidgets, QtWebEngineWidgets, QtWebChannel

from jinja2 import Template

CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))


class Element(QtCore.QObject):
    def __init__(self, name, parent=None):
        super(Element, self).__init__(parent)
        self._name = name

    @property
    def name(self):
        return self._name

    def script(self):
        return ""


class WebEnginePage(QtWebEngineWidgets.QWebEnginePage):
    def __init__(self, parent=None):
        super(WebEnginePage, self).__init__(parent)
        self.loadFinished.connect(self.onLoadFinished)
        self._objects = []
        self._scripts = []

    def add_object(self, obj):
        self._objects.append(obj)

    @QtCore.pyqtSlot(bool)
    def onLoadFinished(self, ok):
        print("Finished loading: ", ok)
        if ok:
            self.load_qwebchannel()
            self.add_objects()

    def load_qwebchannel(self):
        file = QtCore.QFile(":/qtwebchannel/qwebchannel.js")
        if file.open(QtCore.QIODevice.ReadOnly):
            content = file.readAll()
            file.close()
            self.runJavaScript(content.data().decode())
        if self.webChannel() is None:
            channel = QtWebChannel.QWebChannel(self)
            self.setWebChannel(channel)

    def add_objects(self):
        if self.webChannel() is not None:
            objects = {obj.name: obj for obj in self._objects}
            self.webChannel().registerObjects(objects)
            _script = """
            {% for obj in objects %}
            var {{obj}};
            {% endfor %}
            new QWebChannel(qt.webChannelTransport, function (channel) {
            {% for obj in objects %}
                {{obj}} = channel.objects.{{obj}};
            {% endfor %}
            }); 
            """
            self.runJavaScript(Template(_script).render(objects=objects.keys()))
            for obj in self._objects:
                if isinstance(obj, Element):
                    self.runJavaScript(obj.script())


class Helper(Element):
    xpathClicked = QtCore.pyqtSignal(str)

    def script(self):
        js = ""
        file = QtCore.QFile(os.path.join(CURRENT_DIR, "xpath_from_element.js"))
        if file.open(QtCore.QIODevice.ReadOnly):
            content = file.readAll()
            file.close()
            js = content.data().decode()

        js += """
        document.addEventListener('click', function(e) {
            e = e || window.event;
            var target = e.target || e.srcElement;
            var xpath = Elements.DOMPath.xPath(target, false); 
            {{name}}.receive_xpath(xpath);
        }, false);"""
        return Template(js).render(name=self.name)

    @QtCore.pyqtSlot(str)
    def receive_xpath(self, xpath):
        self.xpathClicked.emit(xpath)


if __name__ == "__main__":
    import sys

    app = QtWidgets.QApplication(sys.argv)

    xpath_helper = Helper("xpath_helper")
    xpath_helper.xpathClicked.connect(lambda xpath: print("clicked", xpath))
    view = QtWebEngineWidgets.QWebEngineView()
    page = WebEnginePage()
    page.add_object(xpath_helper)
    view.setPage(page)
    view.load(QtCore.QUrl("https://www.qt.io"))
    view.show()
    sys.exit(app.exec_())

xpath_from_element.js

// Copyright 2018 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

Elements = {};
Elements.DOMPath = {};

/**
 * @param {!Node} node
 * @param {boolean=} optimized
 * @return {string}
 */
Elements.DOMPath.xPath = function (node, optimized) {
    if (node.nodeType === Node.DOCUMENT_NODE) {
        return '/';
    }

    const steps = [];
    let contextNode = node;
    while (contextNode) {
        const step = Elements.DOMPath._xPathValue(contextNode, optimized);
        if (!step) {
            break;
        }  // Error - bail out early.
        steps.push(step);
        if (step.optimized) {
            break;
        }
        contextNode = contextNode.parentNode;
    }

    steps.reverse();
    return (steps.length && steps[0].optimized ? '' : '/') + steps.join('/');
};

/**
 * @param {!Node} node
 * @param {boolean=} optimized
 * @return {?Elements.DOMPath.Step}
 */
Elements.DOMPath._xPathValue = function (node, optimized) {
    let ownValue;
    const ownIndex = Elements.DOMPath._xPathIndex(node);
    if (ownIndex === -1) {
        return null;
    }  // Error.

    switch (node.nodeType) {
        case Node.ELEMENT_NODE:
            if (optimized && node.getAttribute('id')) {
                return new Elements.DOMPath.Step('//*[@id="' + node.getAttribute('id') + '"]', true);
            }
            ownValue = node.localName;
            break;
        case Node.ATTRIBUTE_NODE:
            ownValue = '@' + node.nodeName;
            break;
        case Node.TEXT_NODE:
        case Node.CDATA_SECTION_NODE:
            ownValue = 'text()';
            break;
        case Node.PROCESSING_INSTRUCTION_NODE:
            ownValue = 'processing-instruction()';
            break;
        case Node.COMMENT_NODE:
            ownValue = 'comment()';
            break;
        case Node.DOCUMENT_NODE:
            ownValue = '';
            break;
        default:
            ownValue = '';
            break;
    }

    if (ownIndex > 0) {
        ownValue += '[' + ownIndex + ']';
    }

    return new Elements.DOMPath.Step(ownValue, node.nodeType === Node.DOCUMENT_NODE);
};

/**
 * @param {!Node} node
 * @return {number}
 */
Elements.DOMPath._xPathIndex = function (node) {
    // Returns -1 in case of error, 0 if no siblings matching the same expression,
    // <XPath index among the same expression-matching sibling nodes> otherwise.
    function areNodesSimilar(left, right) {
        if (left === right) {
            return true;
        }

        if (left.nodeType === Node.ELEMENT_NODE && right.nodeType === Node.ELEMENT_NODE) {
            return left.localName === right.localName;
        }

        if (left.nodeType === right.nodeType) {
            return true;
        }

        // XPath treats CDATA as text nodes.
        const leftType = left.nodeType === Node.CDATA_SECTION_NODE ? Node.TEXT_NODE : left.nodeType;
        const rightType = right.nodeType === Node.CDATA_SECTION_NODE ? Node.TEXT_NODE : right.nodeType;
        return leftType === rightType;
    }

    const siblings = node.parentNode ? node.parentNode.children : null;
    if (!siblings) {
        return 0;
    }  // Root node - no siblings.
    let hasSameNamedElements;
    for (let i = 0; i < siblings.length; ++i) {
        if (areNodesSimilar(node, siblings[i]) && siblings[i] !== node) {
            hasSameNamedElements = true;
            break;
        }
    }
    if (!hasSameNamedElements) {
        return 0;
    }
    let ownIndex = 1;  // XPath indices start with 1.
    for (let i = 0; i < siblings.length; ++i) {
        if (areNodesSimilar(node, siblings[i])) {
            if (siblings[i] === node) {
                return ownIndex;
            }
            ++ownIndex;
        }
    }
    return -1;  // An error occurred: |node| not found in parent's children.
};

/**
 * @unrestricted
 */
Elements.DOMPath.Step = class {
    /**
     * @param {string} value
     * @param {boolean} optimized
     */
    constructor(value, optimized) {
        this.value = value;
        this.optimized = optimized || false;
    }

    /**
     * @override
     * @return {string}
     */
    toString() {
        return this.value;
    }
};
Wednesday, November 9, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 

Browse Other Code Languages