* fix: patch failing summarizer tests for anthropic claude 3.5
* fix: carveout for gemini-2.5-flash because it doesn't do the send_message tool call
* fix: deprecate old gemini test now that model is unavailable
* fix: patch flash flakiness
* fix: relax the gemini 2.5 flash test
* fix: relax the gemini 2.5 flash test again
* fix: another flash fix
* fix: relax gpt-4o-mini
* fix: swap 4o-mini for 4.1
* fix: drop 4o-mini