nokogiri on jruby: a bonus to my unofficial tutorial summary - yo

Well then,
I have more or less finished my unofficial summary of the nokogiri tutorial on jruby, but nokogiri, or rather html in general (xml and xhtml included), runs deep.

My reworked summary of the tutorial only scratched the surface. To be honest, I am still learning and do not understand it well myself.

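For the finer points of the nokogiri api, I suppose the options are to look at the api listing on the official site (http://nokogiri.org/), to explore the methods using ruby's reflection features, or failing that, to read the source.
(There may well be a more detailed explanation on some English-language site.)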

● Trying out public_methods on Nokogiri::HTML::Document.



$KCODE = 'UTF8'  # Ruby 1.8 only; has no effect on Ruby 1.9 and later

require 'rubygems'
require 'nokogiri'
require 'pp'  # provides pretty_inspect

@doc = Nokogiri::HTML("<html><body><h1>Mr. Belvedere Fan Club</h1></body></html>")

puts "Class of @doc: #{@doc.class}"
puts "Public methods of @doc: #{@doc.public_methods.sort.pretty_inspect}"
# Output omitted.

I became interested in nokogiri because I tried Rails half for fun and found that xhtml, like Rails itself, was awfully hard. I wondered whether xhtml could be handled more easily.

Of course, there are HTML editors. On ubuntu, for example, KompoZer is available.
http://sourceforge.jp/magazine/10/07/16/098249
http://kompozer.cssmaid.net/

Even so, they come with various restrictions and do not let me work the way I want. I decided to look into nokogiri in the hope of finding a freer, simpler way to handle xhtml.

I am still researching this area myself. If you find something out, please report it on your blog or anywhere else.

● A rough foundation for experimenting with xhtml

# ○ The base xhtml file



$KCODE = 'UTF8'  # Ruby 1.8 only; has no effect on Ruby 1.9 and later
require 'rubygems'
require 'nokogiri'
@doc = Nokogiri::HTML <<-EOHTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

<title>タイトル</title>

<style type="text/css">

</style>

</head>
<body>

<h1>見出し１</h1>

<p>段落１</p>

<p>段落２</p>

</body>
</html>

EOHTML
puts "■ Output of the base xhtml file"

puts @doc.to_html


=begin
Output

■ Output of the base xhtml file

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

<title>タイトル</title>

<style type="text/css">

</style>

</head>

<body>

<h1>見出し１</h1>

<p>段落１</p>

<p>段落２</p>

</body>

</html>
=end


puts "■ Searching for the tags the processing is based on (using the search method)"
puts "□ Searching for the html tag"

puts @doc.search("html")
puts "□ Searching for the head tag"

puts @doc.search("head")
puts "□ Searching for the body tag"

puts @doc.search("body")
puts "■ Searching with xpath"

tag_contents = @doc.xpath("//p")

puts '■ Elements of @doc.xpath("//p")'

tag_contents.each{|elem| puts elem}
=begin

Output

■ Searching for the tags the processing is based on (using the search method)

□ Searching for the html tag

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

<title>タイトル</title>

<style type="text/css">

</style>

</head>

<body>

<h1>見出し１</h1>

<p>段落１</p>

<p>段落２</p>

</body>

</html>

□ Searching for the head tag

<head>

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

<title>タイトル</title>

<style type="text/css">

</style>

</head>

□ Searching for the body tag

<body>

<h1>見出し１</h1>

<p>段落１</p>

<p>段落２</p>

</body>

■ Searching with xpath

■ Elements of @doc.xpath("//p")

<p>段落１</p>

<p>段落２</p>

Note:
tag_contents = @doc.xpath("//p")[0]

or

tag_contents = @doc.xpath("//p")[1]

can also be used to pick out a single element.
The following form is also available.
css_elems = @doc.css("html head")
=end
puts '■ Elements of @doc.css("html head")'

css_elems = @doc.css("html head")

css_elems.each{|elem| puts elem}
=begin

Output

■ Elements of @doc.css("html head")

<head>

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

<title>タイトル</title>

<style type="text/css">

</style>

</head>

=end


puts "■ Getting the list of children of the tags above"
puts "□ Children of the html tag"

puts @doc.search("html").children
puts "□ Children of the head tag"

puts @doc.search("head").children
puts "□ Children of the body tag"

puts @doc.search("body").children
# Output omitted.

○ Moving nodes (xml elements), changing attributes, creating new nodes, and wrapping
are covered in my earlier unofficial tutorial summary.

That's all for now.