Floki: HTML parser with CSS selectors

Elixir HTML parser with CSS selectors and multiple parser backends.

Stars: 2.1K · Forks: 163 · 7-day stars: +2 · 7-day forks: 0
Learn more about Floki

Floki is an HTML parsing library for Elixir that converts HTML documents into node trees and queries them with CSS selectors. By default it uses the mochiweb_html parser, and it can be configured to use fast_html (a C-based parser built on lexbor) or html5ever (Rust-based) instead. HTML elements are represented as tuples of tag name, attributes, and child nodes, and the library provides functions for parsing, searching, and manipulating that tree. Common use cases include web scraping, HTML processing, and document transformation in Elixir applications.

Floki

1. Multiple Parser Backends

Supports three HTML parsers: mochiweb_html (the default), fast_html (C-based), and html5ever (Rust-based), which offer different performance and correctness trade-offs.
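As a rough sketch of how a backend might be selected, the fragment below sets the parser application-wide in a config file. It assumes the chosen backend's package (here fast_html) has also been added to the project's dependencies; the comment shows the per-call alternative.

```elixir
# config/config.exs
import Config

# Assumption: the :fast_html package is listed in mix.exs alongside :floki.
# Without this setting, Floki falls back to its default mochiweb_html parser.
config :floki, :html_parser, Floki.HTMLParser.FastHtml

# A parser can also be chosen for a single call instead of globally:
#   Floki.parse_document(html, html_parser: Floki.HTMLParser.Html5ever)
```

Setting the parser per call can be handy when only a few documents need the stricter HTML5 handling of an alternative backend.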

2. CSS Selector Support

Implements CSS selector syntax for node searching, including attribute selectors, combinators, and pseudo-selectors.
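To make the three selector families concrete, here is a minimal sketch that runs one query of each kind. The HTML snippet and variable names are invented for illustration, and the script assumes Floki can be fetched with Mix.install (which needs network access).

```elixir
Mix.install([{:floki, "~> 0.38"}])

# Illustrative markup, not taken from the Floki documentation.
html = """
<html>
  <body>
    <ul id="menu">
      <li class="item"><a href="/home">Home</a></li>
      <li class="item"><a href="https://example.com">External</a></li>
      <li class="item disabled">Archive</li>
    </ul>
  </body>
</html>
"""

{:ok, document} = Floki.parse_document(html)

# Attribute selector: anchors whose href starts with "https".
external = Floki.find(document, "a[href^='https']")

# Child combinator: li elements that are direct children of #menu.
items = Floki.find(document, "#menu > li")

# Pseudo-selector: the li that is the first child of its parent.
first = Floki.find(document, "li:first-child")

IO.inspect(Floki.attribute(external, "href"))
IO.inspect(length(items))
IO.inspect(Floki.text(first))
```

Each call to Floki.find/2 returns a list of matching nodes, so the results can be passed straight to helpers such as Floki.attribute/2 and Floki.text/1.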

3. Tuple-Based Representation

Uses a simple tuple structure {tag_name, attributes, children_nodes} to represent HTML nodes, making it easy to pattern match and manipulate in Elixir.
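The tuple shape can be destructured with ordinary pattern matching. The sketch below assumes the default parser and default options (attributes as a list of pairs); the sample anchor element is made up for illustration.

```elixir
Mix.install([{:floki, "~> 0.38"}])

# Parse a single illustrative element; the result is a list with one node.
{:ok, [node]} = Floki.parse_document(~s(<a href="/docs" class="link">Docs</a>))

# An element node is a three-element tuple: tag name, attributes, children.
{tag, attrs, children} = node

# Attributes are {name, value} pairs, so List.keyfind/3 can look one up.
{"href", href} = List.keyfind(attrs, "href", 0)

IO.inspect({tag, href, children})
# => {"a", "/docs", ["Docs"]}
```

Because nodes are plain tuples and lists, standard Enum and List functions work on them without any Floki-specific wrappers.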


# Parse HTML document and find elements using CSS selectors
html = """
<html>
  <body>
    <section id="content">
      <p class="headline">Floki</p>
      <span class="headline">Enables search using CSS selectors</span>
      <a href="https://github.com/philss/floki">Github page</a>
    </section>
  </body>
</html>
"""

{:ok, document} = Floki.parse_document(html)

# Find elements by CSS selector
Floki.find(document, "p.headline")
# => [{"p", [{"class", "headline"}], ["Floki"]}]


v0.38.0

Adds initial support for the :has pseudo-selector for finding elements containing specific child elements.

  • Support for div:has(h1), div:has(h1, p, span), div:has(p.foo), and div:has(img[src='url']) selectors
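
A small sketch of the simplest form listed above, div:has(h1), assuming Floki 0.38.0 or later is available through Mix.install. The markup and the ids are invented for illustration.

```elixir
Mix.install([{:floki, "~> 0.38"}])

# Two illustrative divs; only the first contains an h1.
html = """
<html>
  <body>
    <div id="with-title"><h1>Title</h1><p>Intro</p></div>
    <div id="plain"><p>No heading here</p></div>
  </body>
</html>
"""

{:ok, document} = Floki.parse_document(html)

# :has keeps the div elements that contain a matching h1.
with_h1 = Floki.find(document, "div:has(h1)")

IO.inspect(Floki.attribute(with_h1, "id"))
```

Since the release notes describe this support as initial, more complex arguments to :has may not behave as they do in browsers.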
v0.37.1

Moves a regex declaration from a module attribute into the function body, a fix for compatibility with the upcoming OTP 28.

  • Add Elixir 1.18 to the CI workflow
  • Bump ex_doc from 0.35.1 to 0.37.1
  • Fix versions we describe in README.md
  • Bump credo from 1.7.10 to 1.7.11
v0.37.0

Adds a CSS escape function, fixes a raw_html encoding bug, and drops support for older Elixir versions.

  • Add Floki.css_escape/1 function
  • Fix bug propagating identity encoder in raw_html/2
  • Remove support for Elixir 1.13 and OTP 22
  • Bump credo from 1.7.8 to 1.7.9