Gossamer Forum

Extracting data from an html

Extracting data from an html
Hi,

Does anyone know an easy way (or where to look) to extract the data values from a <FORM></FORM> within an HTML page, as they would be if the form were sent by POST or GET?

I have the raw form within the HTML, as it would be displayed by a browser, and I need to get the form values out of it, e.g.:
<select name="reg" class="inputclass25" id="reg">
<option value="">- Reg -</option>
<option value="A" >A</option>
<option value="L" SELECTED >L</option>
<option value="55" >55</option>
</select>

How would I get the value L from this form, and so on?

Cheers for any time

Norvin
Re: [Syte] Extracting data from an html
What is the "end result" that you are trying to accomplish?

If you are not actually going to *submit* the form, you could use JavaScript and/or AJAX:

<script type="text/javascript">
// "select" is the <select> element whose value just changed
function doSomething(select) {
    if (select.value == "L") {
        alert('something');
    }
}
</script>

<select ... onchange="doSomething(this)">
...
</select>
Re: [Syte] Extracting data from an html
Code:
if (document.forms[0].reg.value == "L") {
    // do something
}

If that's all you want to do, you might just give reg an onchange event handler (something like "if (this.value == 'L') { /* do something */ }"). But if you need to do multiple things, you might consider looping through all of the form fields...
Code:
for (var i = 0; i < document.forms[0].elements.length; i++) {
    // do something with document.forms[0].elements[i].value
}

Or even loop through all of the elements of all of the forms on the page:

Code:
for (var x = 0; x < document.forms.length; x++) {
for (var y = 0; y < document.forms[x].elements.length; y++) {
// do something with document.forms[x].elements[y].value
}
}

Are you even trying to do this client side? If you are trying to do it server side and you are using something like PHP, or Perl with the CGI module, you'd probably use (in PHP):
Code:
if (isset($_POST['reg']) && $_POST['reg'] == 'L') { // or $_GET['reg'] if the form method is GET
    // do something
}

Or, if you don't want to incur the overhead of full CGI support and are using a language that doesn't extract the parameters for you, you might check for /\breg=L\b/ in the POST data (read from STDIN) or in the QUERY_STRING environment variable.
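The /\breg=L\b/ shortcut is quick but fragile: it also matches a-reg=L (the "-" counts as a word boundary) and reg=L%20x (whose value is really "L x"), and it ignores URL encoding. A minimal sketch of decoding the pairs properly instead, written in JavaScript with the standard URLSearchParams class (built into modern browsers and Node.js); the helper name paramEquals is made up for this example:

```javascript
// Return true if a urlencoded string (a QUERY_STRING, or a POST body of type
// application/x-www-form-urlencoded) contains name=value after decoding.
// paramEquals is a hypothetical helper name used only for illustration.
function paramEquals(encoded, name, value) {
  const params = new URLSearchParams(encoded);
  // getAll() also copes with repeated names, e.g. several checkboxes sharing one name.
  return params.getAll(name).includes(value);
}

console.log(paramEquals("name=Norvin&reg=L", "reg", "L")); // true
console.log(paramEquals("a-reg=L", "reg", "L"));           // false, although the regex matches
console.log(paramEquals("reg=L%20x", "reg", "L"));         // false, the value is "L x"
```

The same idea applies in any language: split on "&", split each pair on the first "=", and URL-decode both halves before comparing.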

Please read http://www.catb.org/~esr/faqs/smart-questions.html by Eric S. Raymond.
Re: [mkp] Extracting data from an html
Thanks for the response.

Yes, I am on a RaQ4 using Perl, and I have a file that is raw HTML containing a <form> </form>.
I need to parse the HTML and extract the values from the form, rather than having the page displayed and then hitting the submit button each time to POST the data.
I need to get the variables, i.e.
$name
$address
and so on, from the form embedded in the HTML.
So I have something like:

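open(MLIST, "/home/sites/site5/web/aa/html_and _form.html") or die "Can't open the form file: $!";
my $myform = do { local $/; <MLIST> };   # read the whole file into one string
close(MLIST);

# Pseudocode for what I want (JavaScript-style, not valid Perl):
# for (var i = 0; i < $myform[0].elements.length; i++) {
#     $myform[0].elements[i].value;
# }
### I need to get the form contents (values) and put them into variables, i.e. $name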


Cheers Again


Norvin
Re: [Syte] Extracting data from an html
I'm not sure why you can't just accept the parameters from POST or GET, but if you're sure parsing the document is what you need, the only stable, non-hack approach is to actually parse the document, i.e., not just try to extract values with a couple of regexes.

Consider the HTML::Form module, for example.
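HTML::Form is a Perl module; as an illustration of what such a module has to do once the document has been parsed, here is a minimal JavaScript sketch of the rules a browser follows when it turns a form into GET/POST data. It assumes form looks like a DOM HTMLFormElement (in a browser, document.forms[0]; raw HTML text would have to be parsed first, e.g. with DOMParser). The function name formPairs is made up for this example, and the rules are simplified: no file uploads and no submit buttons, since only the button actually clicked would be sent.

```javascript
// Collect [name, value] pairs from a form roughly the way a browser does on submit.
// `form` is assumed to be shaped like a DOM HTMLFormElement (e.g. document.forms[0]).
// formPairs is a hypothetical name; the rules below are a simplified subset.
function formPairs(form) {
  const pairs = [];
  // Input types that never contribute a value in this simplified model.
  const skippedInputTypes = new Set(["submit", "reset", "button", "image", "file"]);

  for (const el of Array.from(form.elements)) {
    // Unnamed or disabled controls are never sent.
    if (!el.name || el.disabled) continue;

    const tag = el.tagName.toUpperCase();
    if (tag === "SELECT") {
      // Every selected <option> is sent (possibly several for <select multiple>).
      for (const opt of Array.from(el.options)) {
        if (opt.selected) pairs.push([el.name, opt.value]);
      }
    } else if (tag === "TEXTAREA") {
      pairs.push([el.name, el.value]);
    } else if (tag === "INPUT") {
      const type = (el.type || "text").toLowerCase();
      if (skippedInputTypes.has(type)) continue;
      // Checkboxes and radio buttons are only sent when checked.
      if ((type === "checkbox" || type === "radio") && !el.checked) continue;
      pairs.push([el.name, el.value]);
    }
    // Other form-associated elements (<button>, <fieldset>, <object>, <output>) are skipped.
  }
  return pairs;
}
```

In a browser, new URLSearchParams(formPairs(document.forms[0])).toString() would then give the GET-style query string (reg=L for the select in the first post). Modern browsers can also do this natively with new FormData(form); the sketch just makes the rules explicit, e.g. that the SELECTED option is the one whose value is sent.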


Just some notes on file handling: it's typically not a good idea to slurp documents of arbitrary length unless you are absolutely sure that they will all fit in RAM (and you actually gain something by doing so). If you were going to hack your HTML document up with regular expressions, you would need to make sure that your form elements were not split across multiple lines (e.g., no inputs like <input type=text\nname=whatever\nvalue=something>). In fact, 'while' has some special magic just for [more] efficient file handling:

Code:
while (<FILE>) {
# $_ automagically holds the last line read
}


I'd never slurp an entire file unless it were actually useful (like if I have to look at certain lines many times, but I don't know which lines those are ahead of time) or the file were really tiny and might be read/written by multiple processes at once. Otherwise, I use the equivalent of that 'while' loop or some other IO spiff I know works.

Also, /<input name="(.*?)" value="(.*?)">/ is more efficient on a 100-character line than on a 5000-character file, especially since single lines are usually in the 10 to 80 character range, while HTML files can get well above 10KB (10,240 characters).

Last edited by mkp: May 3, 2006, 12:18 AM
Re: [Syte] Extracting data from an html
CGI.pm is your friend =)

Code:
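use strict;
use warnings;
use CGI;

my $IN  = CGI->new;             # parses the GET or POST parameters
my $reg = $IN->param('reg');    # e.g. "L" for the form in the first post
$reg = '' unless defined $reg;  # no reg parameter was sent

print $IN->header();
print "Reg number: ", $IN->escapeHTML($reg);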

Hope that helps.

Cheers

Andy (mod)
andy@ultranerds.co.uk

Re: [Andy] Extracting data from an html
But is s/he trying to get the POST/GET variables, or to parse the HTML file? The question is not too clear, and the JavaSHcript (JavaScript + sh?) example didn't really help matters.

Last edited by mkp: May 23, 2006, 8:41 PM
Re: [mkp] Extracting data from an html
First off, thanks for all the feedback. I have managed to accomplish this, but in a very roundabout and poorly done way, so I am still banging my head on it.
Perhaps I have not explained what I am attempting to do very well.

Basically, a person types in their code number or username/password and hits the submit button, and their details appear on screen in an HTML page as a normal form, ready to be resubmitted if the submit button is hit.
So I have an HTML page with a form containing the person's details. I now need to extract (parse) the details from the form (within the HTML page), such as the name and address values, so I can put them into another DBMan database.
In other words: take the values from a displayed HTML form and send those details to another database if they are correct, or edit them first and then send them to the new database.
In essence, I need the form's values extracted from the HTML into variables so I can post them to another DBMan.

Cheers again for any time

Norvin
Re: [Syte] Extracting data from an html
So let me get this straight:
  1. Person enters data into and submits form
  2. Server generates HTML document with variables from the form
  3. User submits HTML document generated in step 2 to the server
  4. Server parses HTML document generated in step 2 and stores information in a database
Yeah, it sounds really roundabout to me too. Why don't you just store the variables in the database at step 2? Or, in step 3 the user could re-submit the same form they already submitted (i.e., "Confirm that you entered stuff correctly"), and then you store the data in the database in step 4.

I don't understand where HTML parsing comes into play. Just use CGI.
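The "confirm" variant of step 3 needs no HTML parsing at all: the script that handles step 1 echoes the submitted values back as hidden fields, so pressing Confirm delivers them to the server as ordinary parameters. A minimal sketch in JavaScript; confirmForm, escapeHtml and the /cgi-bin/save.cgi action are hypothetical names chosen for illustration. The values are HTML-escaped so that a quote or "<" in someone's address cannot break the page.

```javascript
// Escape text for use inside HTML element content or a double-quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build a "please confirm" form that shows the submitted values and carries
// them in hidden inputs, so Confirm re-submits them as normal parameters.
// `fields` is a plain object such as { name: "Norvin", reg: "L" }.
function confirmForm(action, fields) {
  const shown = [];
  const hidden = [];
  for (const [name, value] of Object.entries(fields)) {
    shown.push(`<p>${escapeHtml(name)}: ${escapeHtml(value)}</p>`);
    hidden.push(`<input type="hidden" name="${escapeHtml(name)}" value="${escapeHtml(value)}">`);
  }
  return [
    `<form method="post" action="${escapeHtml(action)}">`,
    ...shown,
    ...hidden,
    `<input type="submit" value="Confirm">`,
    `</form>`,
  ].join("\n");
}

console.log(confirmForm("/cgi-bin/save.cgi", { name: "Norvin", reg: "L" }));
```

Since the user can edit hidden fields before confirming, the script behind step 4 should validate the values again rather than trust them; the same design works with CGI.pm on the Perl side.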