How to extract blocks from Gutenberg

So Gutenberg, the future of WordPress content editing, allows users to create, add and re-use blocks in posts and pages in a nice UI. These blocks are added in WordPress’ the_content inside HTML-comments to ensure backwards-compatibility. For WP YouTube Lyte I needed to extract information to be able to replace Gutenberg embedded YouTube with Lytes and I took the regex-approach. Ugly but efficient, no?

But what if you need a more failsafe method to extract Gutenberg block-data from a post? I took a hard look at the Gutenberg code and came up with this little proof-of-concept to extract all data in a nice little (or big) array:

function gutenprint($html) {
  	// check if we need to and can load the Gutenberg PEG parser
  	if ( !class_exists("Gutenberg_PEG_Parser") && file_exists(WP_PLUGIN_DIR."/gutenberg/lib/load.php") ) {

  	if ( class_exists("Gutenberg_PEG_Parser") && is_single() ) {
	  // do the actual parsing
	  $parser = new Gutenberg_PEG_Parser;
	  $result = $parser->parse(  _gutenberg_utf8_split( $html ) );
	  // we need to see the HTML, not have it rendered, so applying htmlentities
		function (&$result) { $result = htmlentities($result); }

	  // and dump the array to the screen
	  echo "<h1>Gutenprinter reads:</h1><pre>";
	  echo "</pre>";
	} else {
	  echo "Not able to load Gutenberg parser, are you sure you have Gutenberg installed?";
  	// remove filter to avoid double output
  	// and return the HTML
	return $html;

I’m not going to use it for WP YouTube Lyte as I consider the overhead not worth it (for now), but who know it could be useful for others?

2 thoughts on “How to extract blocks from Gutenberg

Leave a Reply

Your email address will not be published. Required fields are marked *