Skip to content Skip to sidebar Skip to footer

Parsing Web Javascript Content To String Using Android

I would like to read the content of a website into a string. I started by using jsoup as follows: private void getWebsite() { new Thread(new Runnable() { @Override

Solution 1:

I'd suggest looking at the network tab in chrome developer tools and then submitting the request to load up the URL ... you'll see a lot of requests going back/forth.

Two that seem to contain relevant content are:

https://merhav.nli.org.il/primo_library/libweb/webservices/rest/primo-explore/v1/pnxs?blendFacetsSeparately=false&getMore=0&inst=NNL&lang=iw_IL&limit=10&newspapersActive=false&newspapersSearch=false&offset=0&pcAvailability=true&q=any,contains,%D7%94%D7%90%D7%A8%D7%99+%D7%A4%D7%95%D7%98%D7%A8&qExclude=&qInclude=&refEntryActive=false&rtaLinks=true&scope=Local&skipDelivery=Y&sort=rank&tab=default_tab&vid=NLI

which requires a token to access token which comes from:

https://merhav.nli.org.il/primo_library/libweb/webservices/rest/v1/guestJwt/NNL?isGuest=true&lang=iw_IL&targetUrl=https%253A%252F%252Fmerhav.nli.org.il%252Fprimo-explore%252Fsearch%253Ftab%253Ddefault_tab%2526search_scope%253DLocal%2526vid%253DNLI%2526lang%253Diw_IL%2526query%253Dany%252Ccontains%252C%2525D7%252594%2525D7%252590%2525D7%2525A8%2525D7%252599%252520%2525D7%2525A4%2525D7%252595%2525D7%252598%2525D7%2525A8&viewId=NLI

.. which likely requires the JSessoinId which comes from:

https://merhav.nli.org.il/primo_library/libweb/webservices/rest/v1/configuration/NLI

.. so in order to replicate the chain of calls you could use JSoup to make these (and any other relevant) HTTP GET requests, pull out the relevant HTTP headers (typically: session, referer, accept and some other cookie values potentially)

Its not going to be straight forward, but you're essentially looking for a url on the page in one of the JSON responses from one of the network requests:

enter image description here

Once you know which request you want to recreate, you just have to work back up the list of requests and try to recreate them.

This one is not an easy one and would require a lot of time to recreate - my advice if you're going to attempt it, forget trying to parse HTML, try to rebuild/recreate the chain of 3 or so HTTP requests to the back end to get the relevant JSON and parse that. You can often pick apart the website but this ones a big job

Post a Comment for "Parsing Web Javascript Content To String Using Android"