Add HTTP agent support for connection pooling#619
Conversation
Adds optional httpAgent and httpsAgent parameters to prepareRequestFunction result, enabling connection reuse to upstream proxies. Changes: - Add httpAgent/httpsAgent to PrepareRequestFunctionResult type - Add httpAgent/httpsAgent to HandlerOpts type in server, forward, and chain - Pass agents to http.request() and https.request() in forward.ts - Pass agents to upstream CONNECT requests in chain.ts - Thread agents through prepareRequestHandling() in server.ts Benefits: - Enables connection pooling with keepAlive - Supports sticky IP for residential proxies - Backward compatible (agents are optional) - Maintains existing getConnectionStats() functionality 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Hey @paveldudka this looks cool, thanks for the PR, our team will review it |
|
Hello @paveldudka, |
yeah, differences in socket behavior across platforms. Should be fixed now. |
|
The tests are still failing, pls @paveldudka |
oof. Was fixing linter failure and messed too much with imports, so broke tests... |
There was a problem hiding this comment.
Hi @paveldudka! Big thanks for your contribution and quick fixes 🎉
Here are few issues from my side that should be addressed before merge:
- Missing test coverage for CONNECT tunneling (chain handler). Currently all targets in tests are HTTP. Please add a tests for HTTPS target.
- Missing HTTPS upstream proxy tests (your last test claims that it verifies HTTPS agent alognside with HTTP, however no code there verifies that httpsAgent is actually used). It would be good to have this covered as separate test.
- Document SOCKS limitation. Agents don't work with SOCKS upstream proxies (forward_socks.ts, chain_socks.ts). Would be good to add some JSDoc or mention it in README.
- Nice to have: Add usage example of agents into README.
Also, you can treat my comments for code itself as nice to have. Overall implementation looks good. If you'll manage them - good, if not - also good ;)
Let me know if you'll need any clarification.
| httpAgent.destroy(); | ||
| }); | ||
|
|
||
| it('uses separate agents for HTTP and HTTPS upstream proxies', async () => { |
There was a problem hiding this comment.
Same as https://github.com/apify/proxy-chain/pull/619/files#r2524036066.
What about instead to verify behavior when using HTTP vs HTTPS upstream proxies (like connection characteristics, actual HTTPS connections being made) rather than checking which internal method was invoked?
| httpsAgent.destroy(); | ||
| }); | ||
|
|
||
| it('reuses connections with keepAlive agents (sticky IP simulation)', async () => { |
There was a problem hiding this comment.
What about instead of verifing implementation (hooks into 'free' event, inspect socket implementation details like remoteAddress and port, compare socketIds) to verify an observable effect of connection pooling (behavior)?
What we can do instead is to count actual connections at the upstream proxy server. With this we can verify that fewer connections are made than requests (observable behavior of pooling).
What do you think?
There was a problem hiding this comment.
that's a great idea. Lemme dwell on it
|
@bliuchak , reworked tests to move away from testing implementation details |
|
@paveldudka , thanks for adding examples and document limitation 👏 However the main issue is still there - lack of test coverage for HTTPS upstream proxies (both forward and chain mode).
Could you please add these two tests? |
@bliuchak , If I understand the pipeline correctly, in forward mode (HTTP requests), the proxy terminates each request, allowing the agent to pool TCP connections for subsequent requests. In chain mode (HTTPS via CONNECT), each CONNECT establishes a long-lived tunnel for the client's TLS connection to the target - these tunnels can't be "reused" for different targets, so each requires its own upstream proxy connection. But at this point Im hitting my domain knowledge limitations in this space, so will defer to you to decide how to proceed. Locally I have a forked version of proxy-chain and we use http upstream, so in the short term, we are unblocked |
|
@paveldudka, I got your point. To be fair we don't have much HTTPS tests with tooling yet (especially for HTTPS upstreams). What would I propose is to merge this as-is and open a new ticket to add missing coverage later (I can create it). Here are my arguments:
@jirimoravcik What do you think? |
jirimoravcik
left a comment
There was a problem hiding this comment.
Thank you, looks great.
One small thing, could we bump the package version in package.json? Thanks
I agree |
There was a problem hiding this comment.
@paveldudka please bump package version to 2.6.0 and we're gonna merge it ;)
|
FYI #620 |
sounds good. Thanks a lot for a thorough review! @bliuchak |
Problem
Currently, proxy-chain creates a new TCP connection to the upstream proxy for every request. This causes:
Solution
Add optional
httpAgentandhttpsAgentparameters toprepareRequestFunctionresult, enabling users to provide persistent HTTP agents withkeepAlive: true.Changes
httpAgent?andhttpsAgent?toPrepareRequestFunctionResulttypeserver.ts,forward.ts,chain.ts)http.request()andhttps.request()callsBenefits
getConnectionStats(), events, etc. still workUsage
Testing
Added comprehensive test suite in
test/http-agent.js:getConnectionStats()preservationAll tests passing ✅