Floki: HTML parser with CSS selectors
Elixir HTML parser with CSS selectors and multiple parser backends.
Learn more about Floki
Floki is an HTML parsing library for Elixir that converts HTML documents into structured node trees and supports CSS selector queries. It uses mochiweb_html as its default parser backend, and can be configured to use alternative parsers such as fast_html (C-based, built on lexbor) or html5ever (Rust-based). The library represents each HTML node as a tuple containing the tag name, attributes, and child nodes, and provides functions for parsing, searching, and manipulating HTML content. Common use cases include web scraping, HTML processing, and document transformation in Elixir applications.
Multiple Parser Backends
Supports three HTML parsers, mochiweb_html (the default), fast_html (C-based), and html5ever (Rust-based), which offer different performance and correctness trade-offs.
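A minimal sketch of choosing a backend, assuming the snippet runs as a standalone script (hence `Mix.install`) and that only the default parser is available. The `:html_parser` option and the `Floki.HTMLParser.Mochiweb` module are Floki's own; the sample HTML is made up for illustration.

```elixir
# Assumption: standalone script; Mix.install fetches Floki from Hex.
Mix.install([{:floki, "~> 0.37"}])

html = "<ul><li>one</li><li>two</li></ul>"

# Uses the configured backend (mochiweb_html unless configured otherwise).
{:ok, default_tree} = Floki.parse_document(html)

# Selects a backend for this call only via the :html_parser option.
{:ok, explicit_tree} =
  Floki.parse_document(html, html_parser: Floki.HTMLParser.Mochiweb)

# To switch globally, a project's config/config.exs could contain:
#   config :floki, :html_parser, Floki.HTMLParser.FastHtml
# (this also requires adding the :fast_html dependency).

IO.inspect(default_tree == explicit_tree)
```

Passing the parser per call keeps the choice local, which can be handy when only some inputs need a stricter HTML5 parser.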
CSS Selector Support
Implements CSS selector syntax for node searching including attribute selectors, combinators, and pseudo-selectors for flexible HTML querying.
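The three selector families mentioned above can be sketched as follows. The HTML snippet and variable names are hypothetical, and the script form with `Mix.install` is an assumption made so the example is self-contained.

```elixir
# Assumption: standalone script; Mix.install fetches Floki from Hex.
Mix.install([{:floki, "~> 0.37"}])

# Hypothetical navigation markup used only for this illustration.
html =
  ~s(<ul><li><a href="/home">Home</a></li>) <>
    ~s(<li><a href="https://example.com/docs">Docs</a></li>) <>
    ~s(<li><span>Plain</span></li></ul>)

{:ok, doc} = Floki.parse_document(html)

# Attribute selector: links whose href starts with "https".
external = Floki.find(doc, "a[href^='https']")

# Child combinator: anchors that are direct children of an li.
links = Floki.find(doc, "li > a")

# Pseudo-selector: the li that is the first child of its parent.
first = Floki.find(doc, "li:first-child")

IO.inspect(Floki.text(external))
IO.inspect(length(links))
IO.inspect(Floki.text(first))
```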
Tuple-Based Representation
Uses a simple tuple structure {tag_name, attributes, children_nodes} to represent HTML nodes, making it easy to pattern match and manipulate in Elixir.
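Because nodes are plain tuples, they can be destructured and rebuilt with ordinary pattern matching. The sketch below assumes a standalone script; the `" big"` class suffix is an arbitrary illustrative change.

```elixir
# Assumption: standalone script; Mix.install fetches Floki from Hex.
Mix.install([{:floki, "~> 0.37"}])

{:ok, doc} = Floki.parse_document(~s(<p class="headline">Floki</p>))

# Each element is a {tag_name, attributes, children} tuple.
[{tag, attrs, children}] = Floki.find(doc, "p")

# Attributes are a list of {name, value} string pairs.
{"class", class} = List.keyfind(attrs, "class", 0)

# Build a modified node by hand and render it back to HTML.
updated = {tag, [{"class", class <> " big"}], children}
rendered = Floki.raw_html(updated)

IO.puts(rendered)
# prints: <p class="headline big">Floki</p>
```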
# Parse HTML document and find elements using CSS selectors
html = """
<html>
<body>
<section id="content">
<p class="headline">Floki</p>
<span class="headline">Enables search using CSS selectors</span>
<a href="https://github.com/philss/floki">Github page</a>
</section>
</body>
</html>
"""
{:ok, document} = Floki.parse_document(html)
# Find elements by CSS selector
Floki.find(document, "p.headline")
# => [{"p", [{"class", "headline"}], ["Floki"]}]
This version adds initial support for the `:has` pseudo-selector, which enables finding elements that contain other elements matching a selector. Examples:
- `"div:has(h1)"`
- `"div:has(h1, p, span)"`
- `"div:has(p.foo)"`
- `"div:has(img[src='https://example.com'])"`
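A short sketch of `:has` in use, assuming an installed Floki release that already includes this pseudo-selector. The markup and the ids `a`, `b`, and `c` are invented for the example.

```elixir
# Assumption: the resolved Floki version includes :has support.
Mix.install([{:floki, "~> 0.37"}])

# Hypothetical markup: three divs with different contents.
html =
  ~s(<section><div id="a"><h1>Title</h1></div>) <>
    ~s(<div id="b"><p class="foo">Text</p></div>) <>
    ~s(<div id="c"></div></section>)

{:ok, doc} = Floki.parse_document(html)

# Divs that contain an h1.
with_h1 = Floki.find(doc, "div:has(h1)")

# Divs that contain either an h1 or a p with class "foo".
with_any = Floki.find(doc, "div:has(h1, p.foo)")

IO.inspect(Floki.attribute(with_h1, "id"))
IO.inspect(Enum.sort(Floki.attribute(with_any, "id")))
```

Note that the selector returns the containing `div` elements, not the inner `h1` or `p` nodes.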
Move the regex declaration from a module attribute into the function body. This fix keeps Floki compatible with the upcoming OTP 28.
- Add Elixir 1.18 to the CI workflow by @philss in
- Bump ex_doc from 0.35.1 to 0.36.1 by @dependabot in
- Bump ex_doc from 0.36.1 to 0.37.1 by @dependabot in
- Fix versions we describe in README.md by @philss in
- Bump credo from 1.7.10 to 1.7.11 by @dependabot in
v0.37.0
- Add `Floki.css_escape/1` -
- Fix bug propagating identity encoder in `raw_html/2` -
- Remove support for Elixir 1.13 and OTP 22.
- Drop support for Elixir 1.13 by @philss in
- Bump credo from 1.7.8 to 1.7.9 by @dependabot in
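As a rough illustration of the `Floki.css_escape/1` addition listed above, the sketch below escapes a value before embedding it in a selector string. The id `user.name:1` is hypothetical, and the exact escaped output is deliberately not shown because it depends on the library's escaping rules.

```elixir
# Assumption: standalone script; Mix.install fetches Floki from Hex.
Mix.install([{:floki, "~> 0.37"}])

# Hypothetical id containing characters that are special in CSS selectors.
raw_id = "user.name:1"

# Escape the value so it can be embedded in a selector string.
escaped = Floki.css_escape(raw_id)
selector = "#" <> escaped

IO.inspect(selector)
```

Escaping matters when ids or class names come from user data, since an unescaped `.` or `:` would otherwise be read as selector syntax.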