arun() - Crawl4AI Documentation (v0.6.x) (2025)

In Crawl4AI’s latest configuration model, nearly all parameters that once went directly to arun() are now part of CrawlerRunConfig. When calling arun(), you provide:

```python
await crawler.arun(
    url="https://example.com",
    config=my_run_config
)
```

Below is an organized look at the parameters that can go inside CrawlerRunConfig, divided by their functional areas. For Browser settings (e.g., headless, browser_type), see BrowserConfig.
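For orientation, here is a minimal sketch of how the two config objects are typically split; the parameter values are illustrative, not required:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

# Global browser settings live in BrowserConfig...
browser_config = BrowserConfig(headless=True, browser_type="chromium")

# ...while per-crawl behavior lives in CrawlerRunConfig.
run_config = CrawlerRunConfig(verbose=True)

async def demo():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        return await crawler.arun(url="https://example.com", config=run_config)

asyncio.run(demo())
```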

1. Core Usage

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    run_config = CrawlerRunConfig(
        verbose=True,                  # Detailed logging
        cache_mode=CacheMode.ENABLED,  # Use normal read/write cache
        check_robots_txt=True,         # Respect robots.txt rules
        # ... other parameters
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com",
            config=run_config
        )
        # Check if blocked by robots.txt
        if not result.success and result.status_code == 403:
            print(f"Error: {result.error_message}")
```

Key Fields:

  • verbose=True logs each crawl step.
  • cache_mode decides how to read/write the local crawl cache.

2. Cache Control

cache_mode (default: CacheMode.ENABLED)
Use a value from the built-in CacheMode enum:

  • ENABLED: Normal caching—reads if available, writes if missing.
  • DISABLED: No caching—always refetch pages.
  • READ_ONLY: Reads from cache only; no new writes.
  • WRITE_ONLY: Writes to cache but doesn’t read existing data.
  • BYPASS: Skips reading cache for this crawl (though it might still write if set up that way).

```python
run_config = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS
)
```

Additional flags:

  • bypass_cache=True acts like CacheMode.BYPASS.
  • disable_cache=True acts like CacheMode.DISABLED.
  • no_cache_read=True acts like CacheMode.WRITE_ONLY.
  • no_cache_write=True acts like CacheMode.READ_ONLY.
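These flags map onto the enum; a small sketch of the equivalences (approximate, per the list above):

```python
from crawl4ai import CrawlerRunConfig, CacheMode

# Enum forms covering the same behaviors as the legacy flags above
read_write = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)     # normal read/write
no_cache   = CrawlerRunConfig(cache_mode=CacheMode.DISABLED)    # ~ disable_cache=True
write_only = CrawlerRunConfig(cache_mode=CacheMode.WRITE_ONLY)  # ~ no_cache_read=True
read_only  = CrawlerRunConfig(cache_mode=CacheMode.READ_ONLY)   # ~ no_cache_write=True
bypass     = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)      # ~ bypass_cache=True
```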

3. Content Processing & Selection

3.1 Text Processing

```python
run_config = CrawlerRunConfig(
    word_count_threshold=10,    # Ignore text blocks with fewer than 10 words
    only_text=False,            # If True, tries to remove non-text elements
    keep_data_attributes=False  # Keep or discard data-* attributes
)
```

3.2 Content Selection

```python
run_config = CrawlerRunConfig(
    css_selector=".main-content",   # Focus on the .main-content region only
    excluded_tags=["form", "nav"],  # Remove entire tag blocks
    remove_forms=True,              # Specifically strip <form> elements
    remove_overlay_elements=True,   # Attempt to remove modals/popups
)
```

3.3 Link Handling

```python
run_config = CrawlerRunConfig(
    exclude_external_links=True,          # Remove external links from final content
    exclude_social_media_links=True,      # Remove links to known social sites
    exclude_domains=["ads.example.com"],  # Exclude links to these domains
    exclude_social_media_domains=["facebook.com", "twitter.com"],  # Extend the default list
)
```

3.4 Media Filtering

```python
run_config = CrawlerRunConfig(
    exclude_external_images=True  # Strip images from other domains
)
```
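A rough sketch of checking what survives filtering, assuming the standard CrawlResult fields result.media and result.links (the URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def inspect_filtering():
    config = CrawlerRunConfig(
        exclude_external_images=True,
        exclude_external_links=True,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        # media and links are dictionaries on the returned CrawlResult
        print("images kept:", len(result.media.get("images", [])))
        print("internal links:", len(result.links.get("internal", [])))
        print("external links:", len(result.links.get("external", [])))  # expected to be empty here

asyncio.run(inspect_filtering())
```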

4. Page Navigation & Timing

4.1 Basic Browser Flow

```python
run_config = CrawlerRunConfig(
    wait_for="css:.dynamic-content",  # Wait for .dynamic-content
    delay_before_return_html=2.0,     # Wait 2s before capturing final HTML
    page_timeout=60000,               # Navigation & script timeout (ms)
)
```

Key Fields:

  • wait_for: either "css:selector" or "js:() => boolean",
    e.g. js:() => document.querySelectorAll('.item').length > 10 (see the sketch below).
  • mean_delay & max_range: define random delays between requests in arun_many() calls.
  • semaphore_count: concurrency limit when crawling multiple URLs.
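For example, a minimal sketch of a JS-based wait condition (the .item selector is hypothetical):

```python
from crawl4ai import CrawlerRunConfig

run_config = CrawlerRunConfig(
    # Wait until more than 10 ".item" elements exist before capturing the HTML
    wait_for="js:() => document.querySelectorAll('.item').length > 10",
    page_timeout=60000,
)
```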

4.2 JavaScript Execution

```python
run_config = CrawlerRunConfig(
    js_code=[
        "window.scrollTo(0, document.body.scrollHeight);",
        "document.querySelector('.load-more')?.click();"
    ],
    js_only=False
)
```
  • js_code can be a single string or a list of strings. 
  • js_only=True means “I’m continuing in the same session with new JS steps, no new full navigation.”

4.3 Anti-Bot

```python
run_config = CrawlerRunConfig(
    magic=True,
    simulate_user=True,
    override_navigator=True
)
```
  • magic=True tries multiple stealth features.
  • simulate_user=True mimics mouse movements or random delays.
  • override_navigator=True fakes some navigator properties (like user agent checks).

5. Session Management

session_id:

```python
run_config = CrawlerRunConfig(
    session_id="my_session123"
)
```
If re-used in subsequent arun() calls, the same tab/page context is continued (helpful for multi-step tasks or stateful browsing).
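A rough two-step sketch of session reuse (URL and selector are hypothetical; js_only is described in 4.2):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def two_step_session():
    async with AsyncWebCrawler() as crawler:
        # Step 1: navigate normally and keep the page open under a session id
        first = await crawler.arun(
            url="https://example.com/page1",
            config=CrawlerRunConfig(session_id="my_session123"),
        )
        # Step 2: run more JS in the same tab, without a fresh navigation
        second = await crawler.arun(
            url="https://example.com/page1",
            config=CrawlerRunConfig(
                session_id="my_session123",
                js_code="document.querySelector('.load-more')?.click();",
                js_only=True,
            ),
        )
        print(first.success, second.success)

asyncio.run(two_step_session())
```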

6. Screenshot, PDF & Media

```python
run_config = CrawlerRunConfig(
    screenshot=True,                         # Grab a screenshot as base64
    screenshot_wait_for=1.0,                 # Wait 1s before capturing
    pdf=True,                                # Also produce a PDF
    image_description_min_word_threshold=5,  # If analyzing alt text
    image_score_threshold=3,                 # Filter out low-score images
)
```
Where they appear:

  • result.screenshot → Base64 screenshot string.
  • result.pdf → Byte array with PDF data.
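As a small sketch of persisting those outputs (file names are arbitrary; result is assumed to come from an arun() call with screenshot=True and pdf=True):

```python
import base64

# result.screenshot is a base64 string, result.pdf is raw bytes (see above)
if result.screenshot:
    with open("page.png", "wb") as f:
        f.write(base64.b64decode(result.screenshot))
if result.pdf:
    with open("page.pdf", "wb") as f:
        f.write(result.pdf)
```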

7. Extraction Strategy

For advanced data extraction (CSS/LLM-based), set extraction_strategy:

```python
run_config = CrawlerRunConfig(
    extraction_strategy=my_css_or_llm_strategy
)
```

The extracted data will appear in result.extracted_content.
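Since the structured strategies return extracted_content as a JSON string, a small sketch of consuming it (result is assumed to come from a crawl configured with a JSON-producing strategy):

```python
import json

if result.success and result.extracted_content:
    items = json.loads(result.extracted_content)  # typically a list of dicts
    print(f"Extracted {len(items)} items")
```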

8. Comprehensive Example

Below is a snippet combining many parameters:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

async def main():
    # Example schema
    schema = {
        "name": "Articles",
        "baseSelector": "article.post",
        "fields": [
            {"name": "title", "selector": "h2", "type": "text"},
            {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"}
        ]
    }

    run_config = CrawlerRunConfig(
        # Core
        verbose=True,
        cache_mode=CacheMode.ENABLED,
        check_robots_txt=True,  # Respect robots.txt rules

        # Content
        word_count_threshold=10,
        css_selector="main.content",
        excluded_tags=["nav", "footer"],
        exclude_external_links=True,

        # Page & JS
        js_code="document.querySelector('.show-more')?.click();",
        wait_for="css:.loaded-block",
        page_timeout=30000,

        # Extraction
        extraction_strategy=JsonCssExtractionStrategy(schema),

        # Session
        session_id="persistent_session",

        # Media
        screenshot=True,
        pdf=True,

        # Anti-bot
        simulate_user=True,
        magic=True,
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com/posts", config=run_config)
        if result.success:
            print("HTML length:", len(result.cleaned_html))
            print("Extraction JSON:", result.extracted_content)
            if result.screenshot:
                print("Screenshot length:", len(result.screenshot))
            if result.pdf:
                print("PDF bytes length:", len(result.pdf))
        else:
            print("Error:", result.error_message)

if __name__ == "__main__":
    asyncio.run(main())
```

What we covered:

1. Crawling the main content region, ignoring external links.
2. Running JavaScript to click ".show-more".
3. Waiting for ".loaded-block" to appear.
4. Generating a screenshot & PDF of the final page.
5. Extracting repeated "article.post" elements with a CSS-based extraction strategy.

9. Best Practices

1. Use BrowserConfig for global browser settings (headless, user agent).
2. Use CrawlerRunConfig for the specific crawl's needs: content filtering, caching, JS, screenshots, extraction, etc.
3. Keep your parameters consistent across run configs, especially if you're part of a large codebase with multiple crawls.
4. Limit concurrency (semaphore_count) if the site or your system can't handle the load.
5. For dynamic pages, set js_code or scan_full_page so all content gets loaded (see the sketch below).
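For point 5, a minimal sketch of a full-page scan, assuming scan_full_page and scroll_delay are available in your version:

```python
from crawl4ai import CrawlerRunConfig

run_config = CrawlerRunConfig(
    scan_full_page=True,  # scroll through the whole page to trigger lazy-loaded content
    scroll_delay=0.5,     # pause (seconds) between scroll steps; assumed parameter name
)
```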

10. Conclusion

All parameters that used to be direct arguments to arun() now belong in CrawlerRunConfig. This approach:

  • Makes code clearer and more maintainable. 
  • Minimizes confusion about which arguments affect global vs. per-crawl behavior. 
  • Allows you to create reusable config objects for different pages or tasks.

For a full reference, check out the CrawlerRunConfig Docs. 

Happy crawling with your structured, flexible config approach!
