Replies: 3 comments 5 replies
Hello,

To test such things, I suggest you make a clean HTML document that you control.

First problem: if you test only against the example scenario you have given, there is no `position-relative` in the response:

$ curl -sL 'https://www.ebay-kleinanzeigen.de/s-giessen/kettcar-daytona/k0l4710r50' | grep -i 'position-relative'

I believe this is due to a cookie portal, so you need to include custom cookies in the FreshRSS advanced settings for this feed. But based on what you write, you seem to have gotten past this problem.

Second problem: the HTML of that page is severely broken, with more or less random opening and closing tags:

https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.ebay-kleinanzeigen.de%2Fs-giessen%2Fkettcar-daytona%2Fk0l4710r50

This means that our HTML parser will not necessarily produce a DOM identical to the one you see in your browser. So you need to use safer expressions (try to stick to elements that are properly closed), or use an extension that can clean the source prior to processing.

Quick, not perfect, example:

E.g.

P.S. Tested with an offline copy of the page, as I did not bother fixing the cookie thing.
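To get a rough idea of where a page's tags are unbalanced before writing XPath against it, a small check like the following may help. It is only a sketch using Python's standard `html.parser`; it is not how FreshRSS parses pages, and the names `TagBalanceChecker` and `check` are made up for illustration. It is also stricter than HTML itself, which allows some closing tags (such as `</li>`) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track open elements on a stack and record mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check(html):
    """Return a list of human-readable balance problems (empty if none)."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


good = check("<ul><li>a</li></ul>")
bad = check("<div><ul><li>a</div></ul>")
```

Here `good` is empty, while `bad` lists the closing tags that arrive out of order and the elements left open. Any element that shows up in such a list is a poor anchor for an XPath expression, because different parsers may repair it differently.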
I think I have a similar problem. I'm trying to use XPath to scrape a literal XML document.

<array>
  <object></object>
  <object>
    <array>
      <object></object>
    </array>
  </object>
</array>

The problem is that
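For a structure like the one above, the choice between a child step and a descendant step decides whether the nested `object` is included. A minimal sketch with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and is not the engine FreshRSS uses; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

XML = """<array>
  <object></object>
  <object>
    <array>
      <object></object>
    </array>
  </object>
</array>"""

root = ET.fromstring(XML)  # root is the outer <array>

# Child step only: roughly /array/object in full XPath.
top_level = root.findall("object")

# Descendant step: roughly //object in full XPath.
all_objects = root.findall(".//object")

# Explicit multi-segment path: roughly /array/object/array/object.
nested = root.findall("object/array/object")

print(len(top_level), len(all_objects), len(nested))
```

The descendant form also picks up the `object` inside the inner `array`, while the child form returns only the two top-level elements.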
A simplified XML case: https://gist.githubusercontent.com/mgnsk/39ffd7f8e0a24038373a53f76c317d93/raw/0c772312121ce089bc4c14082ae54e8882298d74/freshrss_test.xml
Now the same XML, but the inner array has a `name` attribute, which is similar to the actual XML I dealt with: https://gist.githubusercontent.com/mgnsk/0aab124cfd3516cb8ccddb8d39d5c5a7/raw/e9fe30a5cdedcc736027353e08b4be994f91beb2/freshrss_test.xml
I was finally able to solve it with
First of all, thank you for this awesome piece of software!
It would be amazing if more than one path segment could be supported in the XPath `finding news items` section. Something like `//foo` works like a charm and I'm using it a lot, but if you need to select a path like `//foo/bar/bas`, it doesn't work. The same applies when providing the full path, like `/foo/bar/bas` or `/html/body/foo/bar/bas`.

The reason is that a website sometimes has multiple sections and I do not want all `bas` items in the RSS feed, so I cannot simply use something like `//bas`. Only those under `//foo/bar/bas` should be selected, not those under `//foo/baz/bas`.

Example data:

/html/body/foo/bar/bas/1
/html/body/foo/bar/bas/2
/html/body/foo/bar/bas/...
/html/body/foo/baz/bas/1
/html/body/foo/baz/bas/2
/html/body/foo/baz/bas/...

Now, as mentioned, I want only `//foo/bar/bas` in the output.

Currently, I think it is only possible to provide one path segment including a child in the `finding news items` section, right? I mean, something like `//foo` works but `//foo/bar/bas` does not, right?

To provide a real-world scenario: I want to add an RSS feed for https://www.ebay-kleinanzeigen.de/s-giessen/kettcar-daytona/k0l4710r50 where I only want the results above the `Alternative Anzeigen in der Umgebung` ("alternative ads in the area") section. The solution would be `//div[@class="position-relative"]/ul/li/article`, because the result needs to be relative to the `position-relative` div; otherwise I get all `article`s in the result.
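To make the intended selection concrete, here is a minimal sketch of what the multi-segment expressions should return on well-formed input. It uses Python's standard `xml.etree.ElementTree` (a subset of XPath, not the FreshRSS engine), and both documents are hypothetical stand-ins: `DOC` mirrors the example data, and `LISTING` is a simplified guess at the shape of the listing page, not its real markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the example data in the post.
DOC = """<html><body>
  <foo>
    <bar><bas>1</bas><bas>2</bas></bar>
    <baz><bas>3</bas><bas>4</bas></baz>
  </foo>
</body></html>"""

# Hypothetical stand-in for the listing page; the real markup differs.
LISTING = """<html><body>
  <div class="position-relative">
    <ul><li><article>wanted</article></li></ul>
  </div>
  <div class="other">
    <ul><li><article>unwanted</article></li></ul>
  </div>
</body></html>"""


def texts(xml_source, path):
    """Return the text of every element matched by an ElementTree path."""
    root = ET.fromstring(xml_source)
    return [element.text for element in root.findall(path)]


all_bas = texts(DOC, ".//bas")          # every bas, from both sections
bar_bas = texts(DOC, ".//foo/bar/bas")  # only the bas under foo/bar
articles = texts(LISTING, ".//div[@class='position-relative']/ul/li/article")

print(all_bas, bar_bas, articles)
```

Note that `[@class='position-relative']` is an exact string comparison, so it would not match a div whose `class` attribute lists several classes; in full XPath 1.0, `contains(@class, 'position-relative')` is the usual looser alternative.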