(Part 2)
def preprocess(data):
    # Tokenize (context, question) pairs; the context is sequence 0, so the
    # answer's character offsets can be mapped into it below
    inputs = tokenizer(data["context"], data["question"], truncation=True, padding="max_length", max_length=384)
    start_positions = []
    end_positions = []
    for i, answer in enumerate(data["answers"]):
        # Character span of the answer inside the context
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        # Map character offsets to token indices (requires a fast tokenizer);
        # fall back to index 0 ([CLS]) if the answer was truncated away
        start_token = inputs.char_to_token(i, start_char)
        end_token = inputs.char_to_token(i, end_char - 1)
        start_positions.append(start_token if start_token is not None else 0)
        end_positions.append(end_token if end_token is not None else 0)
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
What is happening here?
1) The context and question texts are tokenized.
2) start_positions and end_positions mark where the answer begins and where it ends within the input.
4. Setting the data format:
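As a toy illustration (the context and answer below are made up, not taken from the real dataset), this is how an answer span is derived from a SQuAD-style annotation, where answer_start is a character offset into the context:

```python
# Made-up SQuAD-style example: answer_start is a character offset into context
context = "The Eiffel Tower is in Paris."
answer = {"text": ["Paris"], "answer_start": [23]}

start_char = answer["answer_start"][0]
end_char = start_char + len(answer["text"][0])
print(context[start_char:end_char])  # "Paris"
```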
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "start_positions", "end_positions"])
Here the dataset is converted to PyTorch tensor format, because the Hugging Face Trainer API works with PyTorch.
5. Training parameters:
training_args = TrainingArguments(
    output_dir="./quick_model",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    save_steps=500,
    evaluation_strategy="no",
    fp16=torch.cuda.is_available(),
)
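For a rough sense of scale, a back-of-the-envelope step count for these settings (a sketch: the 87,599-example size of the SQuAD v1.1 train split is an assumption here; the 1% subset, batch size 8, and 1 epoch come from the text):

```python
import math

# 87,599 is the assumed SQuAD v1.1 train-split size; 1% subset per the text
num_examples = 87_599 // 100                 # 875 examples
steps_per_epoch = math.ceil(num_examples / 8)  # batch size 8
total_steps = steps_per_epoch * 1              # 1 epoch
print(total_steps)  # 110 optimizer steps
```

With only ~110 optimizer steps, a single epoch finishes in minutes even on a laptop GPU, which is the point of subsampling the dataset.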
To keep training manageable, a single epoch (one pass over the dataset) is used.
If a GPU is available, FP16 acceleration is enabled.
6. Training the model:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
Here the model is trained on the SQuAD dataset.
The Trainer class automatically loads the data in batches, tracks the model, and runs the optimization.
Expected results:
✅ The model learns to find answers in text
✅ Training on 1% of the data is fast (roughly 5–10 minutes)
✅ The trained model can be reused later.
Conclusion:
This code builds a question-answering (QA) system by training a DistilBERT model on the SQuAD dataset. Training is optimized for an HP Victus laptop (a small data subset, 1 epoch). Once the model is trained, it can be given new questions and used to find the answers!
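At inference time, a QA model scores every token as a possible start and as a possible end of the answer, and the answer is the highest-scoring valid span (start ≤ end). A toy, model-free sketch of that selection step (the tokens and scores below are made up, standing in for real model logits):

```python
# Made-up tokens and start/end logits standing in for a real QA model's output
tokens = ["Paris", "is", "the", "capital", "of", "France", "."]
start_scores = [9.1, 0.2, 0.1, 0.3, 0.1, 1.5, 0.0]
end_scores   = [8.7, 0.1, 0.2, 0.4, 0.2, 2.0, 0.1]

# Pick the (start, end) pair with the highest combined score, requiring start <= end
best = max(
    ((s, e) for s in range(len(tokens)) for e in range(s, len(tokens))),
    key=lambda p: start_scores[p[0]] + end_scores[p[1]],
)
answer = " ".join(tokens[best[0]:best[1] + 1])
print(answer)  # "Paris"
```

This is exactly what `start_positions` and `end_positions` teach the model during training: which token should get the highest start score and which the highest end score.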