
Floki: HTML parser with CSS selectors

Elixir HTML parser with CSS selectors and multiple parser backends.

Overall rank: #441 · Developer Tools rank: #97 · Stars: 2.1K · Forks: 163

Learn more about Floki

Floki is an HTML parsing library for Elixir that converts HTML documents into structured node trees and supports CSS selector queries. By default it uses the mochiweb_html parser, but it can be configured to use alternative backends such as fast_html (C-based, built on lexbor) or html5ever (Rust-based). The library represents each HTML node as a tuple containing the tag name, attributes, and child nodes, and provides functions for parsing, searching, and manipulating HTML content. Common use cases include web scraping, HTML processing, and document transformation in Elixir applications.
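As a rough illustration of the backend choice described above, the sketch below selects a parser for a single call through the `:html_parser` option. It assumes Floki is fetched with `Mix.install/1` in a script or IEx session, and it uses the default `Floki.HTMLParser.Mochiweb` module so that no extra dependency is needed; swapping in `Floki.HTMLParser.FastHtml` or `Floki.HTMLParser.Html5ever` would additionally require adding the `fast_html` or `html5ever` package.

```elixir
# Assumption: run as a standalone script or in IEx; the version requirement is illustrative.
Mix.install([{:floki, "~> 0.38"}])

html = "<p class=\"headline\">Floki</p>"

# Choose the parser backend for this call only.
{:ok, document} = Floki.parse_document(html, html_parser: Floki.HTMLParser.Mochiweb)

# A project-wide default would normally go in config/config.exs instead, e.g.:
#   config :floki, :html_parser, Floki.HTMLParser.Html5ever

IO.inspect(document, label: "parsed tree")
```

Setting the backend per call keeps the choice local, which can be handy when comparing how different parsers treat the same malformed input.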

Floki

1. Multiple Parser Backends

Supports three HTML parsers: mochiweb_html (the default), fast_html (C-based), and html5ever (Rust-based), each with different performance and correctness trade-offs.

2. CSS Selector Support

Implements CSS selector syntax for searching nodes, including attribute selectors, combinators, and pseudo-selectors, for flexible HTML querying.

3. Tuple-Based Representation

Uses a simple tuple structure {tag_name, attributes, children_nodes} to represent HTML nodes, which makes them easy to pattern match and manipulate in Elixir.
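To make the tuple representation concrete, here is a minimal sketch that destructures a parsed node with ordinary pattern matching. The `NodeInfo` module and its `describe/1` function are hypothetical helpers written for this illustration and are not part of Floki; the script again assumes Floki is installed through `Mix.install/1`.

```elixir
# Assumption: run as a standalone script or in IEx; the version requirement is illustrative.
Mix.install([{:floki, "~> 0.38"}])

html = ~s(<a href="https://github.com/philss/floki" class="link">Github page</a>)
{:ok, [node]} = Floki.parse_fragment(html)

# Destructure the {tag_name, attributes, children_nodes} tuple directly.
{tag, attrs, children} = node

# Attributes are a list of {name, value} string pairs.
href = List.keyfind(attrs, "href", 0)

defmodule NodeInfo do
  # Hypothetical helper: dispatch on the node shape via function heads.
  def describe({"a", _attrs, _children} = link), do: "link: " <> Floki.text(link)
  def describe({tag, _attrs, _children}), do: "element: " <> tag
  def describe(text) when is_binary(text), do: "text: " <> text
end

IO.puts(NodeInfo.describe(node))
```

Because nodes are plain tuples, lists, and strings, no special accessor API is needed to walk or rewrite a tree.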


# Parse HTML document and find elements using CSS selectors
html = """
<html>
  <body>
    <section id="content">
      <p class="headline">Floki</p>
      <span class="headline">Enables search using CSS selectors</span>
      <a href="https://github.com/philss/floki">Github page</a>
    </section>
  </body>
</html>
"""

{:ok, document} = Floki.parse_document(html)

# Find elements by CSS selector
Floki.find(document, "p.headline")
# => [{"p", [{"class", "headline"}], ["Floki"]}]
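The example above uses a single tag-plus-class selector. As a hedged follow-up sketch, the script below runs a few other selector kinds against a trimmed copy of the same markup and extracts text and attribute values with `Floki.text/1` and `Floki.attribute/2`. It assumes Floki is installed through `Mix.install/1`.

```elixir
# Assumption: run as a standalone script or in IEx; the version requirement is illustrative.
Mix.install([{:floki, "~> 0.38"}])

html = """
<section id="content">
  <p class="headline">Floki</p>
  <span class="headline">Enables search using CSS selectors</span>
  <a href="https://github.com/philss/floki">Github page</a>
</section>
"""

{:ok, document} = Floki.parse_document(html)

# A class selector without a tag name matches both the <p> and the <span>.
headlines = Floki.find(document, ".headline")

# An id selector, a descendant combinator, and an attribute-prefix selector combined.
links = Floki.find(document, "#content a[href^='https://']")

# Pull out attribute values and text content from the matched nodes.
hrefs = Floki.attribute(links, "href")
title = document |> Floki.find("p.headline") |> Floki.text()

IO.inspect({length(headlines), hrefs, title})
```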


v0.38.0

This version adds initial support for the `:has` pseudo-selector. It is a great addition that enables finding elements that contain other elements matching a given selector. Examples of supported selectors:

  • `"div:has(h1)"`
  • `"div:has(h1, p, span)"`
  • `"div:has(p.foo)"`
  • `"div:has(img[src='https://example.com'])"`
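The selectors listed above can be tried with a short script such as the following sketch. The markup and the `id` values are made up for the illustration, and the script assumes a Floki release that includes `:has` support (v0.38.0 or later) installed through `Mix.install/1`.

```elixir
# Assumption: run as a standalone script or in IEx with Floki >= 0.38.0.
Mix.install([{:floki, "~> 0.38"}])

# Hypothetical markup: three <div> elements with different children.
html = """
<div id="a"><h1>Title</h1></div>
<div id="b"><p class="foo">Text</p></div>
<div id="c"><span>Other</span></div>
"""

{:ok, document} = Floki.parse_document(html)

# Helper that returns the ids of the <div> elements matched by a selector.
ids_for = fn selector ->
  document |> Floki.find(selector) |> Floki.attribute("id")
end

with_h1 = ids_for.("div:has(h1)")
with_any = ids_for.("div:has(h1, p, span)")
with_foo = ids_for.("div:has(p.foo)")

IO.inspect({with_h1, with_any, with_foo})
```

Note that `:has` selects the containing element (here the `div`), not the inner element named in the parentheses.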
v0.37.1

Move regex declaration from module tag to inside function. This is a fix to be compatible with the upcoming OTP 28.

  • Add Elixir 1.18 to the CI workflow by @philss
  • Bump ex_doc from 0.35.1 to 0.36.1 by @dependabot
  • Bump ex_doc from 0.36.1 to 0.37.1 by @dependabot
  • Fix versions we describe in README.md by @philss
  • Bump credo from 1.7.10 to 1.7.11 by @dependabot
v0.37.0

  • Add `Floki.css_escape/1`
  • Fix bug propagating identity encoder in `raw_html/2`
  • Remove support for Elixir 1.13 and OTP 22.
  • Drop support for Elixir 1.13 by @philss
  • Bump credo from 1.7.8 to 1.7.9 by @dependabot
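As a tentative sketch of where `Floki.css_escape/1` might be used, the script below escapes an identifier containing a character with special meaning in selectors before building an id selector from it. The markup and the `user:42` id are hypothetical, the script assumes Floki v0.37.0 or later installed through `Mix.install/1`, and it deliberately makes no claim about the exact escaped string or about what the query returns.

```elixir
# Assumption: run as a standalone script or in IEx with Floki >= 0.37.0.
Mix.install([{:floki, "~> 0.38"}])

# Hypothetical markup: the id contains ":", which is significant in CSS selectors.
html = ~s(<div id="user:42">Jane</div>)
{:ok, document} = Floki.parse_document(html)

# Escape the raw identifier before interpolating it into a selector.
escaped = Floki.css_escape("user:42")
selector = "#" <> escaped

result = Floki.find(document, selector)

IO.inspect(escaped, label: "escaped identifier")
IO.inspect(result, label: "matches")
```

Escaping is mainly useful when identifiers or class names come from untrusted or generated input rather than being written by hand.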

